Dive Into Python

Appendix A. Further reading

Chapter 1. Getting To Know Python

1.3. Documenting functions
- Python Style Guide discusses how to write a good docstring.
- Python Tutorial discusses conventions for spacing in docstrings.
1.4. Everything is an object
- Python Reference Manual explains exactly what it means to say that everything in Python is an object.
- eff-bot summarizes Python objects.
1.5. Indenting code
- Python Reference Manual discusses cross-platform indentation issues and shows various indentation errors.
- Python Style Guide discusses good indentation style.
1.6. Testing modules
- Python Reference Manual discusses the low-level details of importing modules.
1.7. Dictionaries 101
- How to Think Like a Computer Scientist teaches about dictionaries and shows how to use dictionaries to model sparse matrices.
- Python Knowledge Base has lots of example code using dictionaries.
- Python Cookbook discusses how to sort the values of a dictionary by key.
- Python Library Reference summarizes all the dictionary methods.
1.8. Lists 101
- How to Think Like a Computer Scientist teaches about lists and makes an important point about passing lists as function arguments.
- Python Tutorial shows how to use lists as stacks and queues.
- Python Knowledge Base answers common questions about lists and has lots of example code using lists.
- Python Library Reference summarizes all the list methods.
1.9. Tuples 101
- How to Think Like a Computer Scientist teaches about tuples and shows how to concatenate tuples.
- Python Knowledge Base shows how to sort a tuple.
- Python Tutorial shows how to define a tuple with one element.
1.10. Defining variables
- Python Reference Manual shows examples of when you can skip the line continuation character and when you have to use it.
1.11. Assigning multiple values at once
- How to Think Like a Computer Scientist shows how to use multi-variable assignment to swap the values of two variables.
1.12. Formatting strings
- Python Library Reference summarizes all the string formatting format characters.
- Effective AWK Programming discusses all the format characters and advanced string formatting techniques like specifying width, precision, and zero-padding.
1.13. Mapping lists
- Python Tutorial discusses another way to map lists using the built-in map function.
- Python Tutorial shows how to do nested list comprehensions.
1.14. Joining lists and splitting strings
- Python Knowledge Base answers common questions about strings and has lots of example code using strings.
- Python Library Reference summarizes all the string methods.
- Python Library Reference documents the string module.
- The Whole Python FAQ explains why join is a string method instead of a list method.

Chapter 2. The Power Of Introspection

2.2. Optional and named arguments
- Python Tutorial discusses exactly when and how default arguments are evaluated, which matters when the default value is a list or an expression with side effects.
2.3. type, str, dir, and other built-in functions
- Python Library Reference documents all the built-in functions and all the built-in exceptions.
2.5. Filtering lists
- Python Tutorial discusses another way to filter lists using the built-in filter function.
2.6. The peculiar nature of and and or
- Python Cookbook discusses alternatives to the and-or trick.
2.7. Using lambda functions
- Python Knowledge Base discusses using lambda to call functions indirectly.
- Python Tutorial shows how to access outside variables from inside a lambda function. (PEP 227 explains how this will change in future versions of Python.)
- The Whole Python FAQ has examples of obfuscated one-liners using lambda.

Chapter 3. An Object-Oriented Framework

3.2. Importing modules using from module import
- eff-bot has more to say on import module vs. from module import.
- Python Tutorial discusses advanced import techniques, including from module import *.
3.3. Defining classes
- Learning to Program has a gentler introduction to classes.
- How to Think Like a Computer Scientist shows how to use classes to model compound datatypes.
- Python Tutorial has an in-depth look at classes, namespaces, and inheritance.
- Python Knowledge Base answers common questions about classes.
3.4. Instantiating classes
- Python Library Reference summarizes built-in attributes like __class__.
- Python Library Reference documents the gc module, which gives you low-level control over Python's garbage collection.
3.5. UserDict: a wrapper class
- Python Library Reference documents the UserDict module and the copy module.
3.7. Advanced special class methods
- Python Reference Manual documents all the special class methods.
3.9. Private functions
- Python Tutorial discusses the inner workings of private variables.
3.10. Handling exceptions
- Python Tutorial discusses exceptions, including raising your own exceptions and handling multiple exceptions at once.
- Python Library Reference summarizes all the built-in exceptions.
- Python Library Reference documents the getpass module.
- Python Library Reference documents the traceback module, which provides low-level access to exception attributes after an exception is raised.
- Python Reference Manual discusses the inner workings of the try...except block.
3.11. File objects
- Python Tutorial discusses reading and writing files, including how to read a file one line at a time into a list.
- eff-bot discusses efficiency and performance of various ways of reading a file.
- Python Knowledge Base answers common questions about files.
- Python Library Reference summarizes all the file object methods.
3.13. More on modules
- Python Tutorial discusses exactly when and how default arguments are evaluated.
- Python Library Reference documents the sys module.
3.14. The os module
- Python Knowledge Base answers questions about the os module.
- Python Library Reference documents the os module and the os.path module.

Chapter 4. HTML Processing

4.9. Regular expressions 101
- Regular Expression HOWTO teaches about regular expressions and how to use them in Python.
- Python Library Reference summarizes the re module.

Chapter 5. Unit Testing

5.1. Diving in
- This site has more on Roman numerals, including a fascinating history of how Romans and other civilizations really used them (short answer: haphazardly and inconsistently).
5.2. Introducing romantest.py
- The PyUnit home page has an in-depth discussion of using the unittest framework, including advanced features not covered in this chapter.
- Python Library Reference summarizes the unittest module.
- ExtremeProgramming.org discusses why you should write unit tests.
- The Portland Pattern Repository has an ongoing discussion of unit tests, including a standard definition, why you should code unit tests first, and several in-depth case studies.
5.14. Summary
- XProgramming.com has links to download unit testing frameworks for many different languages.

Appendix B. A 5-minute review

Chapter 1. Getting To Know Python

1.1. Diving in
Here is a complete, working Python program.
1.2. Declaring functions
Python has functions like most other languages, but it does not have separate header files like C++ or interface/implementation sections like Pascal. When you need a function, just declare it and code it.
1.3. Documenting functions
You can document a Python function by giving it a docstring.
1.4. Everything is an object
A function, like everything else in Python, is an object.
1.5. Indenting code
Python functions have no explicit begin or end, no curly braces that would mark where the function code starts and stops. The only delimiter is a colon (“:”) and the indentation of the code itself.
1.6. Testing modules
Python modules are objects and have several useful attributes. You can use this to easily test your modules as you write them.
1.7. Dictionaries 101
One of Python's built-in datatypes is the dictionary, which defines one-to-one relationships between keys and values.
1.8. Lists 101
Lists are Python's workhorse datatype. If your only experience with lists is arrays in Visual Basic or (God forbid) the datastore in Powerbuilder, brace yourself for Python lists.
1.9. Tuples 101
A tuple is an immutable list. A tuple can not be changed in any way once it is created.
1.10. Defining variables
Python has local and global variables like most other languages, but it has no explicit variable declarations. Variables spring into existence by being assigned a value, and are automatically destroyed when they go out of scope.
1.11. Assigning multiple values at once
One of the cooler programming shortcuts in Python is using sequences to assign multiple values at once.
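A minimal sketch of the shortcut, using made-up variable names. The first assignment unpacks a tuple into three names; the second swaps two values without a temporary variable.

```python
# Unpack a three-element tuple into three names in one statement.
v = ("a", "b", "e")
x, y, z = v

# Swap two values; the right-hand side is packed into a tuple first.
left, right = 1, 2
left, right = right, left
```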
1.12. Formatting strings
Python supports formatting values into strings. Although this can include very complicated expressions, the most basic usage is to insert values into a string with the %s placeholder.
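As a small illustration (the values here are invented), the % operator takes a format string on the left and a tuple of values on the right. The second line hints at the more complicated forms, such as width, precision, and zero-padding.

```python
key = "uid"
value = "sa"

# Each %s is replaced by the matching element of the tuple.
pair = "%s=%s" % (key, value)

# Total width 8, two decimal places, padded on the left with zeros.
padded = "%08.2f" % 3.14159
```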
1.13. Mapping lists
One of the most powerful features of Python is the list comprehension, which provides a compact way of mapping a list into another list by applying a function to each of the elements of the list.
1.14. Joining lists and splitting strings
You have a list of key-value pairs in the form key=value, and you want to join them into a single string. To join any list of strings into a single string, use the join method of a string object.
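Here is one way this might look. The dictionary contents are illustrative, and sorting the items is only there to make the result repeatable.

```python
params = {"server": "mpilgrim", "database": "master", "uid": "sa"}

# Build a list of "key=value" strings, sorted by key for a stable order.
pairs = ["%s=%s" % (k, v) for k, v in sorted(params.items())]

# join is a method of the separator string, not of the list.
joined = ";".join(pairs)
```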
1.15. Summary
The odbchelper.py program and its output should now make perfect sense.

Chapter 2. The Power Of Introspection

2.1. Diving in
Here is a complete, working Python program. You should understand a good deal about it just by looking at it. The numbered lines illustrate concepts covered in Getting To Know Python. Don't worry if the rest of the code looks intimidating; you'll learn all about it throughout this chapter.
2.2. Optional and named arguments
Python allows function arguments to have default values; if the function is called without the argument, the argument gets its default value. Furthermore, arguments can be specified in any order by using named arguments. Stored procedures in SQL Server Transact/SQL can do this; if you're a SQL Server scripting guru, you can skim this part.
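A short sketch of both ideas. The function describe and its arguments are hypothetical, made up only to show the calling conventions.

```python
def describe(obj, spacing=10, collapse=1):
    """Hypothetical function: obj is required, the other two are optional."""
    return (obj, spacing, collapse)

# Only the required argument; the others take their defaults.
defaults = describe("x")

# Skip spacing but override collapse by naming it.
named = describe("x", collapse=0)

# Named arguments may appear in any order, even the required one.
reordered = describe(spacing=15, obj="x")
```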
2.3. type, str, dir, and other built-in functions
Python has a small set of extremely useful built-in functions. All other functions are partitioned off into modules. This was actually a conscious design decision, to keep the core language from getting bloated like other scripting languages (cough cough, Visual Basic).
2.4. Getting object references with getattr
You already know that Python functions are objects. What you don't know is that you can get a reference to a function without knowing its name until run-time, using the getattr function.
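For instance, the method name below is held in a string, as if it had arrived at run-time from somewhere else. The list and the name are illustrative.

```python
li = ["Larry", "Curly"]

# The name is just a string; it could come from user input or a config file.
method_name = "append"

# getattr returns a reference to the bound method li.append.
method = getattr(li, method_name)
method("Moe")
```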
2.5. Filtering lists
As you know, Python has powerful capabilities for mapping lists into other lists, via list comprehensions. This can be combined with a filtering mechanism, where some elements in the list are mapped while others are skipped entirely.
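A brief sketch with an invented list: the if clause at the end of the list comprehension decides which elements get mapped and which are skipped.

```python
li = ["a", "mpilgrim", "foo", "b", "c", "b", "d", "d"]

# Keep only the elements longer than one character.
long_names = [elem for elem in li if len(elem) > 1]

# Filter and map in one step: skip short elements, uppercase the rest.
upper_long = [elem.upper() for elem in li if len(elem) > 1]
```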
2.6. The peculiar nature of and and or
In Python, and and or perform boolean logic as you would expect, but they do not return boolean values; they return one of the actual values they are comparing.
2.7. Using lambda functions
Python supports an interesting syntax that lets you define one-line mini-functions on the fly. Borrowed from Lisp, these so-called lambda functions can be used anywhere a function is required.
2.8. Putting it all together
The last line of code, the only one we haven't deconstructed yet, is the one that does all the work. But by now the work is easy, because everything we need is already set up just the way we need it. All the dominoes are in place; it's time to knock them down.
2.9. Summary
The apihelper.py program and its output should now make perfect sense.

Chapter 3. An Object-Oriented Framework

3.1. Diving in
Here is a complete, working Python program. Read the docstrings of the module, the classes, and the functions to get an overview of what this program does and how it works. As usual, don't worry about the stuff you don't understand; that's what the rest of the chapter is for.
3.2. Importing modules using from module import
Python has two ways of importing modules. Both are useful, and you should know when to use each. One way, import module, you've already seen in chapter 1. The other way accomplishes the same thing but works in subtly and importantly different ways.
3.3. Defining classes
Python is fully object-oriented: you can define your own classes, inherit from your own or built-in classes, and instantiate the classes you've defined.
3.4. Instantiating classes
Instantiating classes in Python is straightforward. To instantiate a class, simply call the class as if it were a function, passing the arguments that the __init__ method defines. The return value will be the newly created object.
3.5. UserDict: a wrapper class
As you've seen, FileInfo is a class that acts like a dictionary. To explore this further, let's look at the UserDict class in the UserDict module, which is the ancestor of our FileInfo class. This is nothing special; the class is written in Python and stored in a .py file, just like our code. In particular, it's stored in the lib directory in your Python installation.
3.6. Special class methods
In addition to normal class methods, there are a number of special methods which Python classes can define. Instead of being called directly by your code (like normal methods), special methods are called for you by Python in particular circumstances or when specific syntax is used.
3.7. Advanced special class methods
There are more special methods than just __getitem__ and __setitem__. Some of them let you emulate functionality that you may not even know about.
3.8. Class attributes
You already know about data attributes, which are variables owned by a specific instance of a class. Python also supports class attributes, which are variables owned by the class itself.
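To make the distinction concrete, here is a sketch using a hypothetical Counter class: count belongs to the class and is shared, while label belongs to each instance.

```python
class Counter:
    """Hypothetical class used only to contrast the two kinds of attribute."""
    count = 0  # class attribute, shared by every instance

    def __init__(self, label):
        self.label = label             # data attribute, owned by this instance
        self.__class__.count += 1      # update the shared class attribute

first = Counter("first")
second = Counter("second")
```

After both instances are created, Counter.count, first.count, and second.count all refer to the same value, while each label stays separate.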
3.9. Private functions
Like most languages, Python has the concept of private functions, which can not be called from outside their module; private class methods, which can not be called from outside their class; and private attributes, which can not be accessed from outside their class. Unlike most languages, whether a Python function, method, or attribute is private or public is determined entirely by its name.
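A minimal sketch of the naming rule for class methods, assuming a hypothetical Vault class: a name that starts with two underscores (and does not end with two) is treated as private.

```python
class Vault:
    """Hypothetical class with one private and one public method."""

    def __secret(self):
        # Two leading underscores and no trailing ones: private by name.
        return 42

    def reveal(self):
        # Inside the class, the private method is called normally.
        return self.__secret()

v = Vault()
revealed = v.reveal()

# From outside the class, the same name cannot be found.
try:
    v.__secret()
    blocked = False
except AttributeError:
    blocked = True
```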
3.10. Handling exceptions
Like many object-oriented languages, Python has exception handling via try...except blocks.
3.11. File objects
Python has a built-in function, open, for opening a file on disk. open returns a file object, which has methods and attributes for getting information about and manipulating the opened file.
3.12. for loops
Like most other languages, Python has for loops. The only reason you haven't seen them until now is that Python is good at so many other things that you don't need them as often.
3.13. More on modules
Modules, like everything else in Python, are objects. Once imported, you can always get a reference to a module through the global dictionary sys.modules.
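As a quick illustration, using the standard os module as the example: the entry in sys.modules is the very same object that the import statement bound to the name os.

```python
import os
import sys

# sys.modules maps module names (strings) to module objects.
same_object = sys.modules["os"] is os

# The lookup works with a name that is only known at run-time, too.
module_name = "os"
looked_up = sys.modules[module_name]
```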
3.14. The os module
The os module has lots of useful functions for manipulating files and processes, and os.path has functions for manipulating file and directory paths.
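A small sketch of the path functions; the directory and file names are invented, and nothing here touches the disk.

```python
import os.path

# join uses the right separator for the current platform.
path = os.path.join("music", "album", "track01.mp3")

# split separates the directory part from the file name.
dirname, filename = os.path.split(path)

# splitext separates the file name from its extension.
root, ext = os.path.splitext(filename)
```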
3.15. Putting it all together
Once again, all the dominoes are in place. We've seen how each line of code works. Now let's step back and see how it all fits together.
3.16. Summary
The fileinfo.py program should now make perfect sense.

Chapter 4. HTML Processing

4.1. Diving in
I often see questions on comp.lang.python like “How can I list all the [headers|images|links] in my HTML document?” “How do I [parse|translate|munge] the text of my HTML document but leave the tags alone?” “How can I [add|remove|quote] attributes of all my HTML tags at once?” This chapter will answer all of these questions.
4.2. Introducing sgmllib.py
HTML processing is broken into three steps: breaking down the HTML into its constituent pieces, fiddling with the pieces, and reconstructing the pieces into HTML again. The first step is done by sgmllib.py, a part of the standard Python library.
4.3. Extracting data from HTML documents
To extract data from HTML documents, subclass the SGMLParser class and define methods for each tag or entity you want to capture.
4.4. Introducing BaseHTMLProcessor.py
SGMLParser doesn't produce anything by itself. It parses and parses and parses, and it calls a method for each interesting thing it finds, but the methods don't do anything. SGMLParser is an HTML consumer: it takes HTML and breaks it down into small, structured pieces. As you saw in the previous section, you can subclass SGMLParser to define classes that catch specific tags and produce useful things, like a list of all the links on a web page. Now we'll take this one step further by defining a class that catches everything SGMLParser throws at it and reconstructs the complete HTML document. In technical terms, this class will be an HTML producer.
4.5. locals and globals
Python has two built-in functions, locals and globals, which provide dictionary-based access to local and global variables.
4.6. Dictionary-based string formatting
There is an alternative form of string formatting that uses dictionaries instead of tuples of values.
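For example (the dictionary and the greet function are hypothetical), each %(name)s placeholder is looked up by key in the dictionary on the right. Because locals returns a dictionary, it can be used the same way.

```python
params = {"server": "mpilgrim", "uid": "sa"}

# Placeholders are matched by key, so order in the dictionary is irrelevant.
message = "%(uid)s is connected to %(server)s" % params

def greet(name):
    """Hypothetical function: formats using its own local variables."""
    return "Hello, %(name)s" % locals()

greeting = greet("world")
```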
4.7. Quoting attribute values
A common question on comp.lang.python is “I have a bunch of HTML documents with unquoted attribute values, and I want to properly quote them all. How can I do this?” (This is generally precipitated by a project manager who has found the HTML-is-a-standard religion joining a large project and proclaiming that all pages must validate against an HTML validator. Unquoted attribute values are a common violation of the HTML standard.) Whatever the reason, unquoted attribute values are easy to fix by feeding HTML through BaseHTMLProcessor.
4.8. Introducing dialect.py
Dialectizer is a simple (and silly) descendant of BaseHTMLProcessor. It runs blocks of text through a series of substitutions, but it makes sure that anything within a <pre>...</pre> block passes through unaltered.
4.9. Regular expressions 101
Regular expressions are a powerful (and fairly standardized) way of searching, replacing, and parsing text with complex patterns of characters. If you've used regular expressions in other languages (like Perl), you should skip this section and just read the summary of the re module to get an overview of the available functions and their arguments.
4.10. Putting it all together
Sorry, you've reached the end of the chapter that's been written so far. Please check back at https://book.diveintopython.org/ to see if there are any updates.

Chapter 5. Unit Testing

5.1. Diving in
In previous chapters, we “dived in” by immediately looking at code and trying to understand it as quickly as possible. Now that you have some Python under your belt, we're going to step back and look at the steps that happen before the code gets written.
5.2. Introducing romantest.py
Now that we've completely defined the behavior we expect from our conversion functions, we're going to do something a little unexpected: we're going to write a test suite that puts these functions through their paces and makes sure that they behave the way we want them to. You read that right: we're going to write code that tests code that we haven't written yet.
5.3. Testing for success
The most fundamental part of unit testing is constructing individual test cases. A test case answers a single question about the code it is testing.
5.4. Testing for failure
It is not enough to test that our functions succeed when given good input; we must also test that they fail when given bad input. And not just any sort of failure; they must fail in the way we expect.
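A rough sketch of what such a test might look like with the unittest module, assuming a current Python 3 interpreter. The names OutOfRangeError and to_roman_stub are hypothetical stand-ins, not the code from this chapter; the stub only validates its input.

```python
import unittest

class OutOfRangeError(ValueError):
    """Hypothetical exception for integers that cannot be converted."""

def to_roman_stub(n):
    """Hypothetical stand-in for a converter: it only checks the range."""
    if not 0 < n < 4000:
        raise OutOfRangeError("number out of range (must be 1..3999)")
    return "not implemented in this sketch"

class ToRomanBadInput(unittest.TestCase):
    def test_too_large(self):
        # The test passes only if this specific exception is raised.
        self.assertRaises(OutOfRangeError, to_roman_stub, 4000)

    def test_zero(self):
        self.assertRaises(OutOfRangeError, to_roman_stub, 0)

# Run the two test cases and collect the outcome in a result object.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(ToRomanBadInput)
result = unittest.TestResult()
suite.run(result)
```

Note that assertRaises is given the function and its arguments separately, so that it can call the function itself and watch for the exception.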
5.5. Testing for sanity
Often, you will find that a unit of code contains a set of reciprocal functions, usually in the form of conversion functions where one converts A to B and the other converts B to A. In these cases, it is useful to create a “sanity check” to make sure that you can convert A to B and back to A without losing decimal precision, incurring rounding errors, or triggering any other sort of bug.
5.6. roman.py, stage 1
Now that our unit test is complete, it's time to start writing the code that our test cases are attempting to test. We're going to do this in stages, so we can see all the unit tests fail, then watch them pass one by one as we fill in the gaps in roman.py.
5.7. roman.py, stage 2
Now that we have the framework of our roman module laid out, it's time to start writing code and passing test cases.
5.8. roman.py, stage 3
Now that toRoman behaves correctly with good input (integers from 1 to 3999), it's time to make it behave correctly with bad input (everything else).
5.9. roman.py, stage 4
Now that toRoman is done, it's time to start coding fromRoman. Thanks to our rich data structure that maps individual Roman numerals to integer values, this is no more difficult than the toRoman function.
5.10. roman.py, stage 5
Now that fromRoman works properly with good input, it's time to fit in the last piece of the puzzle: making it work properly with bad input. That means finding a way to look at a string and determine if it's a valid Roman numeral. This is inherently more difficult than validating numeric input in toRoman, but we have a powerful tool at our disposal: regular expressions.
5.11. Handling bugs
Despite your best efforts to write comprehensive unit tests, bugs happen. What do I mean by “bug”? A bug is a test case you haven't written yet.
5.12. Handling changing requirements
Despite your best efforts to pin your customers to the ground and extract exact requirements from them on pain of horrible nasty things involving scissors and hot wax, requirements will change. Most customers don't know what they want until they see it, and even if they do, they aren't that good at articulating what they want precisely enough to be useful. And even if they do, they'll want more in the next release anyway. So be prepared to update your test cases as requirements change.
5.13. Refactoring
The best thing about comprehensive unit testing is not the feeling you get when all your test cases finally pass, or even the feeling you get when someone else blames you for breaking their code and you can actually prove that you didn't. The best thing about unit testing is that it gives you the freedom to refactor mercilessly.
5.14. Summary
Unit testing is a powerful concept which, if properly implemented, can both reduce maintenance costs and increase flexibility in any long-term project. It is also important to understand that unit testing is not a panacea, a Magic Problem Solver, or a silver bullet. Writing good test cases is hard, and keeping them up to date takes discipline (especially when customers are screaming for critical bug fixes). Unit testing is not a replacement for other forms of testing, including functional testing, integration testing, and user acceptance testing. But it is feasible, and it does work, and once you've seen it work, you'll wonder how you ever got along without it.

Appendix C. Tips and tricks

Chapter 1. Getting To Know Python

1.1. Diving in


	In the Python IDE on Windows, you can run a module with File->Run... (Ctrl-R). Output is displayed in the interactive window.


	In the Python IDE on Mac OS, you can run a module with Python->Run window... (Cmd-R), but there is an important option you must set first. Open the module in the IDE, pop up the module's options menu by clicking the black triangle in the upper-right corner of the window, and make sure “Run as __main__” is checked. This setting is saved with the module, so you only have to do this once per module.


	On UNIX-compatible systems (including Mac OS X), you can run a module from the command line: `python odbchelper.py`

1.2. Declaring functions


	In Visual Basic, functions (that return a value) start with `function`, and subroutines (that do not return a value) start with `sub`. There are no subroutines in Python. Everything is a function, all functions return a value (even if it's `None`), and all functions start with `def`.


	In Java, C++, and other statically-typed languages, you must specify the datatype of the function return value and each function argument. In Python, you never explicitly specify the datatype of anything. Based on what value you assign, Python keeps track of the datatype internally.

1.3. Documenting functions


	Triple quotes are also an easy way to define a string with both single and double quotes, like `qq/.../` in Perl.


	Many Python IDEs use the `docstring` to provide context-sensitive documentation, so that when you type a function name, its `docstring` appears as a tooltip. This can be incredibly helpful, but it's only as good as the `docstring`s you write.

1.4. Everything is an object


	`import` in Python is like `require` in Perl. Once you `import` a Python module, you access its functions with `module.function`; once you `require` a Perl module, you access its functions with `module::function`.

1.5. Indenting code
Python uses carriage returns to separate statements and a colon and indentation to separate code blocks. C++ and Java use semicolons to separate statements and curly braces to separate code blocks.

1.6. Testing modules


	Like C, Python uses `==` for comparison and `=` for assignment. Unlike C, Python does not support in-line assignment, so there's no chance of accidentally assigning the value you thought you were comparing.


	On MacPython, there is an additional step to make the `if __name__` trick work. Pop up the module's options menu by clicking the black triangle in the upper-right corner of the window, and make sure “Run as __main__” is checked.

1.7. Dictionaries 101


	A dictionary in Python is like a hash in Perl. In Perl, variables which store hashes always start with a `%` character; in Python, variables can be named anything, and Python keeps track of the datatype internally.


	A dictionary in Python is like an instance of the `Hashtable` class in Java.


	A dictionary in Python is like an instance of the `Scripting.Dictionary` object in Visual Basic.


	Dictionaries have no concept of order among elements. It is incorrect to say that the elements are “out of order”; they are simply unordered. This is an important distinction which will annoy you when you want to access the elements of a dictionary in a specific, repeatable order (like alphabetical order by key). There are ways of doing this; they're just not built into the dictionary.
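One possible way of getting a repeatable order, sketched with an invented dictionary: sort the keys yourself, then look up each value in that order.

```python
params = {"uid": "sa", "database": "master", "server": "mpilgrim"}

# sorted() returns the keys as a new list in alphabetical order.
keys_in_order = sorted(params)

# Walk the sorted keys and pair each one with its value.
ordered_items = [(key, params[key]) for key in keys_in_order]
```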

1.8. Lists 101


	A list in Python is like an array in Perl. In Perl, variables which store arrays always start with the `@` character; in Python, variables can be named anything, and Python keeps track of the datatype internally.


	A list in Python is much more than an array in Java (although it can be used as one if that's really all you want out of life). A better analogy would be to the `Vector` class, which can hold arbitrary objects and can expand dynamically as new items are added.


	There is no boolean datatype in Python. In a boolean context (like an `if` statement), `0` is false and all other numbers are true. This extends to other datatypes, too. An empty string (`""`), an empty list (`[]`), and an empty dictionary (`{}`) are all false; all other strings, lists, and dictionaries are true.

1.9. Tuples 101


	Tuples can be converted into lists, and vice-versa. The built-in `tuple` function takes a list and returns a tuple with the same elements, and the `list` function takes a tuple and returns a list. In effect, `tuple` freezes a list, and `list` thaws a tuple.
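A short sketch of freezing and thawing, with made-up values. Trying to change the frozen tuple raises an exception; the thawed list can be changed freely.

```python
li = ["a", "b"]

# Freeze: the tuple has the same elements but cannot be modified.
frozen = tuple(li)
try:
    frozen[0] = "z"
    rejected = False
except TypeError:
    rejected = True

# Thaw: the new list is a separate object that accepts changes.
thawed = list(frozen)
thawed.append("c")
```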

1.10. Defining variables


	When a command is split among several lines with the line continuation marker (“`\`”), the continued lines can be indented in any manner; Python's normally stringent indentation rules do not apply. If your Python IDE auto-indents the continued line, you should probably accept its default unless you have a burning reason not to.


	Strictly speaking, expressions in parentheses, straight brackets, or curly braces (like defining a dictionary) can be split into multiple lines with or without the line continuation character (“`\`”). I like to include the backslash even when it's not required because I think it makes the code easier to read, but that's a matter of style.

1.12. Formatting strings
String formatting in Python uses the same syntax as the sprintf function in C.

1.14. Joining lists and splitting strings


	`join` only works on lists of strings; it does not do any type coercion. `join`ing a list that has one or more non-string elements will raise an exception.


	`anystring.split(delimiter, 1)` is a useful technique when you want to search a string for a substring and then work with everything before the substring (which ends up in the first element of the returned list) and everything after it (which ends up in the second element).
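Both tips can be seen in a small sketch; the strings are invented for illustration.

```python
entry = "server=mpilgrim;extra=a=b"

# Split once only: everything after the first "=" stays in one piece.
before, after = entry.split("=", 1)

# join does no type coercion, so a non-string element raises an exception.
try:
    ";".join(["a", 1])
    join_failed = False
except TypeError:
    join_failed = True
```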

Chapter 2. The Power Of Introspection

2.2. Optional and named arguments
The only thing you have to do to call a function is specify a value (somehow) for each required argument; the manner and order in which you do that is up to you.

2.3. type, str, dir, and other built-in functions


	Python comes with excellent reference manuals, which you should peruse thoroughly to learn all the modules Python has to offer. But whereas in most languages you would find yourself referring back to the manuals (or man pages, or, God help you, MSDN) to remind yourself how to use these modules, Python is largely self-documenting.

2.6. The peculiar nature of and and or
The and-or trick, bool and a or b, will not work like the C expression bool ? a : b when a is false in a boolean context.
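To see the failure, take an a that is an empty string. The sketch below also shows one common workaround (wrapping both values in lists, which are never false when they hold an element) and the conditional expression that later versions of Python added; the variable names are illustrative.

```python
a = ""
b = "second"

# a is false in a boolean context, so the trick falls through to b.
broken = 1 and a or b

# Workaround: [a] is a non-empty list, so it is always true.
safe = (1 and [a] or [b])[0]

# Later versions of Python provide a real conditional expression.
modern = a if 1 else b
```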

2.7. Using lambda functions


	`lambda` functions are a matter of style. Using them is never required; anywhere you could use them, you could define a separate normal function and use that instead. I use them in places where I want to encapsulate specific, non-reusable code without littering my code with a lot of little one-line functions.

2.8. Putting it all together
In SQL, you would use IS NULL instead of = NULL to compare a null value. In Python, there is no special syntax; you use == None just like any other comparison.

Chapter 3. An Object-Oriented Framework

3.2. Importing modules using from module import
from module import in Python is like use module in Perl; import module in Python is like require module in Perl.

3.3. Defining classes


	The `pass` statement in Python is like an empty set of braces (`{}`) in Java or C.


	In Python, the ancestor of a class is simply listed in parentheses immediately after the class name. There is no special keyword like `extends` in Java.


	Although I won't discuss it in depth in this book, Python supports multiple inheritance. In the parentheses following the class name, you can list as many ancestor classes as you like, separated by commas.


	By convention, the first argument of any class method (the reference to the current instance) is called `self`. This argument fills the role of the reserved word `this` in C++ or Java, but `self` is not a reserved word in Python, merely a naming convention. Nonetheless, please don't call it anything but `self`; this is a very strong convention.


	When defining your class methods, you must explicitly list `self` as the first argument for each method, including `__init__`. When you call a method of an ancestor class from within your class, you must include the `self` argument. But when you call your class method from outside, you do not specify anything for the `self` argument; you skip it entirely, and Python automatically adds the instance reference for you. I am aware that this is confusing at first; it's not really inconsistent, but it may appear inconsistent because it relies on a distinction (between bound and unbound methods) that you don't know about yet.


	`__init__` methods are optional, but when you define one, you must remember to explicitly call the ancestor's `__init__` method. This is more generally true: whenever a descendant wants to extend the behavior of the ancestor, the descendant method must explicitly call the ancestor method at the proper time, with the proper arguments.
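A minimal sketch of a descendant calling its ancestor's `__init__` explicitly. The Base and Child classes are hypothetical; note that `self` is passed by hand in the ancestor call but omitted when the class is instantiated from outside.

```python
class Base:
    """Hypothetical ancestor class."""
    def __init__(self, name):
        self.name = name

class Child(Base):
    """Hypothetical descendant that extends the ancestor's __init__."""
    def __init__(self, name, size):
        # Explicit call to the ancestor method, including self.
        Base.__init__(self, name)
        self.size = size

# No value is given for self here; Python supplies the new instance.
c = Child("track01", 1024)
```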

3.4. Instantiating classes
In Python, simply call a class as if it were a function to create a new instance of the class. There is no explicit `new` operator as in C++ or Java.

3.5. UserDict: a wrapper class


	In the Python IDE on Windows, you can quickly open any module in your library path with File->Locate... (Ctrl-L).


	Java and PowerBuilder support function overloading by argument list; that is, one class can have multiple methods with the same name but a different number of arguments, or arguments of different types. Other languages (most notably PL/SQL) even support function overloading by argument name; that is, one class can have multiple methods with the same name and the same number of arguments of the same type but different argument names. Python supports neither of these; it has no form of function overloading whatsoever. An `__init__` method is an `__init__` method is an `__init__` method, regardless of its arguments. There can be only one `__init__` method per class, and if a descendant class has an `__init__` method, it always overrides the ancestor `__init__` method, even if the descendant defines it with a different argument list.


	Always assign an initial value to all of an instance's data attributes in the `__init__` method. It will save you hours of debugging later.
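A short sketch may make the lack of overloading concrete. The `Greeter` class is hypothetical; it shows a second definition replacing the first rather than overloading it, an optional argument standing in for the overload, and `__init__` giving every data attribute an initial value.

```python
class Greeter:
    def __init__(self, greeting="hello"):
        # Every data attribute gets an initial value here.
        self.greeting = greeting
        self.count = 0

    def greet(self):
        return self.greeting

    # Same name again: this does not overload, it replaces the method above.
    def greet(self, name="world"):
        self.count += 1
        return "%s, %s" % (self.greeting, name)

g = Greeter()
print(g.greet())           # the optional argument covers the no-argument call
print(g.greet("Python"))
```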

3.6. Special class methods
When accessing data attributes within a class, you need to qualify the attribute name: `self.attribute`. When calling other methods within a class, you need to qualify the method name: `self.method`.

3.7. Advanced special class methods


	In Java, you determine whether two string variables reference the same physical memory location by using `str1 == str2`. This is called object identity, and it is written in Python as `str1 is str2`. To compare string values in Java, you would use `str1.equals(str2)`; in Python, you would use `str1 == str2`. Java programmers who have been taught to believe that the world is a better place because `==` in Java compares by identity instead of by value may have a difficult time adjusting to Python's lack of such “gotchas”.


	While other object-oriented languages only let you define the physical model of an object (“this object has a `GetLength` method”), Python's special class methods like `__len__` allow you to define the logical model of an object (“this object has a length”).
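Both notes can be illustrated briefly. The sketch below uses lists rather than strings for the identity comparison, because Python may reuse string objects internally and that would blur the point; the `Playlist` class is hypothetical.

```python
class Playlist:
    def __init__(self, songs):
        self.songs = list(songs)

    def __len__(self):
        # Defining __len__ lets the built-in len() work on instances.
        return len(self.songs)

a = [1, 2, 3]
b = list(a)          # a separate list with equal contents
print(a == b)        # compares values
print(a is b)        # compares identity
print(len(Playlist(["x", "y"])))
```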

3.8. Class attributes


	In Java, both static variables (called class attributes in Python) and instance variables (called data attributes in Python) are defined immediately after the class definition (one with the `static` keyword, one without). In Python, only class attributes can be defined here; data attributes are defined in the `__init__` method.
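As a rough sketch of where each kind of attribute is defined (the `Counter` class is hypothetical):

```python
class Counter:
    count = 0                    # class attribute, shared by all instances

    def __init__(self):
        self.label = "unnamed"   # data attribute, one per instance
        Counter.count += 1       # class attributes are reached through the class

first = Counter()
second = Counter()
print(Counter.count)
```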

3.9. Private functions


	If the name of a Python function, class method, or attribute starts with (but doesn't end with) two underscores, it's private; everything else is public.


	In Python, all special methods (like `__setitem__`) and built-in attributes (like `__doc__`) follow a standard naming convention: they both start with and end with two underscores. Don't name your own methods and attributes this way; it will only confuse you (and others) later.


	Python has no concept of protected class methods (accessible only in their own class and descendant classes). Class methods are either private (accessible only in their own class) or public (accessible from anywhere).
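The naming rule can be tried out directly. In this sketch the `Parser` class and its methods are hypothetical.

```python
class Parser:
    def __parse(self):           # starts with two underscores: private
        return "parsed"

    def run(self):
        return self.__parse()    # reachable from inside the class

p = Parser()
print(p.run())
try:
    p.__parse()                  # not reachable under that name from outside
except AttributeError:
    print("no attribute named __parse from outside the class")
```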

3.10. Handling exceptions
Python uses `try...except` to handle exceptions and `raise` to generate them. Java and C++ use `try...catch` to handle exceptions, and `throw` to generate them.
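A minimal sketch of both halves follows. The `parse_age` function is hypothetical, and the `except ... as err` spelling is the one used by later Python versions, not necessarily the releases this book covers.

```python
def parse_age(text):
    if not text.isdigit():
        # raise generates the exception
        raise ValueError("not a number: %r" % text)
    return int(text)

try:
    age = parse_age("abc")
except ValueError as err:
    # except handles it
    age = None
    message = str(err)

print(age, message)
```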

3.14. The os module


	Whenever possible, you should use the functions in `os` and `os.path` for file, directory, and path manipulations. These modules are wrappers for platform-specific modules, so functions like `os.path.split` work on UNIX, Windows, Mac OS, and any other supported Python platform.
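For instance, a path can be built and taken apart without hard-coding a separator. This is only a sketch, and the directory and file names are invented.

```python
import os.path

# join uses the separator that is correct for the current platform
path = os.path.join("music", "album", "song.mp3")

directory, filename = os.path.split(path)
name, extension = os.path.splitext(filename)
print(directory, filename, name, extension)
```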

Chapter 4. HTML Processing

4.2. Introducing sgmllib.py
Python 2.0 had a bug where `SGMLParser` would not recognize declarations at all (`handle_decl` would never be called), which meant that `DOCTYPE`s were silently ignored. This is fixed in Python 2.1.
In the Python IDE on Windows, you can specify command line arguments in the “Run script” dialog. Separate multiple arguments with spaces.

4.4. Introducing BaseHTMLProcessor.py


	The HTML specification requires that all non-HTML (like client-side JavaScript) must be enclosed in HTML comments, but not all web pages do this properly (and all modern web browsers are forgiving if they don't). `BaseHTMLProcessor` is not forgiving; if script is improperly embedded, it will be parsed as if it were HTML. For instance, if the script contains less-than and equals signs, `SGMLParser` may incorrectly think that it has found tags and attributes. `SGMLParser` always converts tags and attribute names to lowercase, which may break the script, and `BaseHTMLProcessor` always encloses attribute values in double quotes (even if the original HTML document used single quotes or no quotes), which will certainly break the script. Always protect your client-side script within HTML comments.

4.5. locals and globals


	Python 2.2 will introduce a subtle but important change that affects the namespace search order: nested scopes. In Python 2.0, when you reference a variable within a nested function or `lambda` function, Python will search for that variable in the current (nested or `lambda`) function's namespace, then in the module's namespace. Python 2.2 will search for the variable in the current (nested or `lambda`) function's namespace, then in the parent function's namespace, then in the module's namespace. Python 2.1 can work either way; by default, it works like Python 2.0, but you can add the following line of code at the top of your module to make your module work like Python 2.2: `from __future__ import nested_scopes`


	Using the `locals` and `globals` functions, you can get the value of arbitrary variables dynamically, providing the variable name as a string. This mirrors the functionality of the `getattr` function, which allows you to access arbitrary functions dynamically by providing the function name as a string.
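The two notes in this section can be sketched together. All names below are hypothetical, and the code assumes a Python version where nested scopes are already the default (2.2 or later), so no `__future__` import is needed.

```python
greeting = "hello"

def make_adder(n):
    def add(x):
        # n is not local to add; it is found in make_adder's namespace.
        return x + n
    return add

def show(name):
    count = 3
    # locals() and globals() return dictionaries keyed by variable name.
    return locals()["count"], globals()[name]

upper = getattr("spam", "upper")   # getattr does the same for attributes and methods
print(make_adder(5)(10), show("greeting"), upper())
```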

4.6. Dictionary-based string formatting


	Using dictionary-based string formatting with `locals` is a convenient way of making complex string formatting expressions more readable, but it comes with a price. There is a slight performance hit in making the call to `locals`. Ordinarily, it's not enough to worry about, but if you have a string formatting expression in a loop (including a list comprehension), you should probably stick with the normal tuple-based form.
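For comparison, here are the two forms side by side. The function names are hypothetical, and the sketch makes no attempt to measure the performance difference mentioned above.

```python
def describe(name, size):
    # locals() supplies the mapping, so %(name)s and %(size)d are looked up by key.
    return "%(name)s is %(size)d bytes" % locals()

def describe_tuple(name, size):
    # The tuple-based form fills the placeholders by position.
    return "%s is %d bytes" % (name, size)

print(describe("a.txt", 10))
print(describe_tuple("a.txt", 10))
```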

Chapter 5. Unit Testing

5.2. Introducing romantest.py
`unittest` is available in Python 2.1 and later. Python 2.0 users can download it from `pyunit.sourceforge.net`.

5.8. roman.py, stage 3


	The most important thing that comprehensive unit testing can tell you is when to stop coding. When all the unit tests for a function pass, stop coding the function. When all the unit tests for an entire module pass, stop coding the module.

5.10. roman.py, stage 5
When all your tests pass, stop coding.

5.13. Refactoring
Whenever you are going to use a regular expression more than once, you should compile it to get a pattern object, then call the methods on the pattern object directly.
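A brief sketch of that advice follows; the pattern and the sample strings are invented for illustration.

```python
import re

# Compile once to get a pattern object...
word = re.compile(r"\b[A-Z][a-z]+\b")

# ...then call methods on the pattern object directly.
matches = word.findall("Dive Into Python")
first = word.search("learn Python today")
print(matches, first.group())
```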

Appendix D. List of examples

Chapter 1. Getting To Know Python

Chapter 2. The Power Of Introspection

Chapter 3. An Object-Oriented Framework

Chapter 4. HTML Processing

Chapter 5. Unit Testing

Appendix E. Revision history

Revision History
Revision 3.4	31 May 2001
Added `roman.py`, stage 5. Added Handling bugs. Added Handling changing requirements. Added Refactoring. Added Summary. Fixed yet another stylesheet bug that was dropping nested `</span>` tags.
Revision 3.3	24 May 2001
Added Diving in. Added Introducing `romantest.py`. Added Testing for success. Added Testing for failure. Added Testing for sanity. Added `roman.py`, stage 1. Added `roman.py`, stage 2. Added `roman.py`, stage 3. Added `roman.py`, stage 4. Tweaked stylesheets in an endless quest for complete Netscape/Mozilla compatibility.
Revision 3.2	3 May 2001
Added Introducing `dialect.py`. Added Regular expressions 101. Fixed bug in `handle_decl` method that would produce incorrect declarations (adding a space where it couldn't be). Fixed bug in CSS (introduced in 2.9) where body background color was missing.
Revision 3.1	18 Apr 2001
Added code in `BaseHTMLProcessor.py` to handle declarations, now that Python 2.1 supports them. Added note about nested scopes in `locals` and `globals`. Fixed obscure bug in Example 4.1. `BaseHTMLProcessor.py` where attribute values with character entities would not be properly escaped. Now recommending (but not requiring) Python 2.1, due to its support of declarations in `sgmllib.py`. Updated download links on the home page to point to Python 2.1, where available. Moved to versioned filenames, to help people who redistribute the book.
Revision 3.0	16 Apr 2001
Fixed minor bug in code listing in HTML Processing. Added link to Chinese translation on home page.
Revision 2.9	13 Apr 2001
Added `locals` and `globals`. Added Dictionary-based string formatting. Tightened code in HTML Processing, specifically `ChefDialectizer`, to use fewer and simpler regular expressions. Fixed a stylesheet bug that was inserting blank pages between chapters in the PDF version. Fixed a script bug that was stripping the `DOCTYPE` from the home page. Added link to Python Cookbook, and added a few links to individual recipes in Further reading. Switched to Google for searching on `https://book.diveintopython.org/`. Upgraded to version 1.36 of the DocBook XSL stylesheets, which was much more difficult than it sounds. There may still be lingering bugs.
Revision 2.8	26 Mar 2001
Added Extracting data from HTML documents. Added Introducing `BaseHTMLProcessor.py`. Added Quoting attribute values. Tightened up code in The Power Of Introspection, using the built-in function `callable` instead of manually checking types. Moved Importing modules using `from module import` from The Power Of Introspection to An Object-Oriented Framework. Fixed typo in code example in Diving in (added colon). Added several additional downloadable example scripts. Added Windows Help output format.
Revision 2.7	16 Mar 2001
Added Introducing `sgmllib.py`. Tightened up code in HTML Processing. Changed code in Getting To Know Python to use `items` method instead of `keys`. Moved Assigning multiple values at once section to Getting To Know Python. Edited note about `join` string method, and provided a link to the new entry in The Whole Python FAQ that explains why `join` is a string method instead of a list method. Rewrote The peculiar nature of `and` and `or` to emphasize the fundamental nature of `and` and `or` and de-emphasize the `and-or` trick. Reorganized language comparisons into `note`s.
Revision 2.6	28 Feb 2001
The PDF and Word versions now have colorized examples, an improved table of contents, and properly indented `tip`s and `note`s. The Word version is now in native Word format, compatible with Word 97. The PDF and text versions now have fewer problems with improperly converted special characters (like trademark symbols and curly quotes). Added link to download Word version for UNIX, in case some twisted soul wants to import it into StarOffice or something. Fixed several `note`s which were missing titles. Fixed stylesheets to work around bug in Internet Explorer 5 for Mac OS which caused colorized words in the examples to be displayed in the wrong font. (Hello?!? Microsoft? Which part of `<pre>` don't you understand?) Fixed archive corruption in Mac OS downloads. In first section of each chapter, added link to download examples. (My access logs show that people skim or skip the two pages where they could have downloaded them (the home page and Preface), then scramble to find a download link once they actually start reading.) Tightened the home page and Preface even more, in the hopes that someday someone will read them. Soon I hope to get back to actually writing this book instead of debugging it.
Revision 2.5	23 Feb 2001
Added More on modules. Added The `os` module. Moved Example 3.35. Splitting pathnames from Assigning multiple values at once to The `os` module. Added Putting it all together. Added Summary. Added Diving in. Fixed program listing in Example 3.28. Iterating through a dictionary which was missing a colon.
Revision 2.4.1	12 Feb 2001
Changed newsgroup links to use “news:” protocol, now that `deja.com` is defunct. Added file sizes to download links.
Revision 2.4	12 Feb 2001
Added “further reading” links in most sections, and collated them in Further reading. Added URLs in parentheses next to external links in text version.
Revision 2.3	9 Feb 2001
Rewrote some of the code in An Object-Oriented Framework to use class attributes and a better example of multi-variable assignment. Reorganized An Object-Oriented Framework to put the class sections first. Added Class attributes. Added Handling exceptions. Added File objects. Merged the “review” section in An Object-Oriented Framework into Diving in. Colorized all program listings and examples. Fixed important error in Declaring functions: functions that do not explicitly return a value return `None`, so you can assign the return value of such a function to a variable without raising an exception. Added minor clarifications to Documenting functions, Everything is an object, and Defining variables.
Revision 2.2	2 Feb 2001
Edited Getting object references with `getattr`. Added titles to `xref` tags, so they can have their cute little tooltips too. Changed the look of the revision history page. Fixed problem I introduced yesterday in my HTML post-processing script that was causing invalid HTML character references and breaking some browsers. Upgraded to version 1.29 of the DocBook XSL stylesheets.
Revision 2.1	1 Feb 2001
Rewrote the example code of The Power Of Introspection to use `getattr` instead of `exec` and `eval`, and rewrote explanatory text to match. Added example of list operators in Lists 101. Added links to relevant sections in the summary lists at the end of each chapter (Summary and Summary).
Revision 2.0	31 Jan 2001
Split Special class methods into three sections, `UserDict`: a wrapper class, Special class methods, and Advanced special class methods. Changed notes on garbage collection to point out that Python 2.0 and later can handle circular references without additional coding. Fixed UNIX downloads to include all relevant files.
Revision 1.9	15 Jan 2001
Removed introduction to Getting To Know Python. Removed introduction to The Power Of Introspection. Removed introduction to An Object-Oriented Framework. Edited text ruthlessly. I tend to ramble.
Revision 1.8	12 Jan 2001
Added more examples to Assigning multiple values at once. Added Defining classes. Added Instantiating classes. Added Special class methods. More minor stylesheet tweaks, including adding titles to `link` tags, which, if your browser is cool enough, will display a description of the link target in a cute little tooltip.
Revision 1.71	3 Jan 2001
Made several modifications to stylesheets to improve browser compatibility.
Revision 1.7	2 Jan 2001
Added introduction to Getting To Know Python. Added introduction to The Power Of Introspection. Added review section to An Object-Oriented Framework [later removed]. Added Private functions. Added `for` loops. Added Assigning multiple values at once. Wrote scripts to convert book to new output formats: one single HTML file, PDF, Microsoft Word 97, and plain text. Registered the `diveintopython.org` domain and moved the book there, along with links to download the book in all available output formats for offline reading. Modified the XSL stylesheets to change the header and footer navigation that displays on each page. The top of each page is branded with the domain name and book version, followed by a breadcrumb trail to jump back to the chapter table of contents, the main table of contents, or the site home page.
Revision 1.6	11 Dec 2000
Added Putting it all together. Finished The Power Of Introspection with Summary. Started An Object-Oriented Framework with Diving in.
Revision 1.5	22 Nov 2000
Added The peculiar nature of `and` and `or`. Added Using `lambda` functions. Added appendix that lists section abstracts. Added appendix that lists tips. Added appendix that lists examples. Added appendix that lists revision history. Expanded example of mapping lists in Mapping lists. Encapsulated several more common phrases into entities. Upgraded to version 1.25 of the DocBook XSL stylesheets.
Revision 1.4	14 Nov 2000
Added Filtering lists. Added `dir` documentation to `type`, `str`, `dir`, and other built-in functions. Added `in` example in Tuples 101. Added additional note about `if __name__` trick under MacPython. Switched to the SAXON XSLT processor from Michael Kay of ICL. Upgraded to version 1.24 of the DocBook XSL stylesheets. Added db-html processing instructions with explicit filenames of each chapter and section, to allow deep links to content even if I add or re-arrange sections later. Made several common phrases into entities for easier reuse. Changed several `literal` tags to `constant`.
Revision 1.3	9 Nov 2000
Added section on dynamic code execution. Added links to relevant section/example wherever I refer to previously covered concepts. Expanded introduction of chapter 2 to explain what the function actually does. Explicitly placed example code under the GNU General Public License and added appendix to display license. Changed links to licenses to use `xref` tags, now that I know how to use them.
Revision 1.2	6 Nov 2000
Added first four sections of chapter 2. Tightened up preface even more, and added link to Mac OS version of Python. Filled out examples in "Mapping lists" and "Joining strings" to show logical progression. Added output in chapter 1 summary.
Revision 1.1	31 Oct 2000
Finished chapter 1 with sections on mapping and joining, and a chapter summary. Toned down the preface, added links to introductions for non-programmers. Fixed several typos.
Revision 1.0	30 Oct 2000
Initial publication

Appendix F. About the book

This book was written in DocBook XML using Emacs, and converted to HTML using the SAXON XSLT processor from Michael Kay of ICL with a customized version of Norman Walsh's XSL stylesheets. From there, it was converted to PDF using HTMLDoc, and to plain text using w3m. Program listings and examples were colorized using an updated version of Just van Rossum's pyfontify.py, which is included in the example scripts.

If you're interested in learning more about DocBook for technical writing, you can download the XML source for this book (Windows, UNIX, Mac OS), which also includes the customized XSL stylesheets. You should also read the canonical book, DocBook: The Definitive Guide. If you're going to do any serious writing in DocBook, I would recommend subscribing to the DocBook mailing lists.



	Python uses carriage returns to separate statements and a colon and indentation to separate code blocks. C++ and Java use semicolons to separate statements and curly braces to separate code blocks.


	String formatting in Python uses the same syntax as the `sprintf` function in C.


	The only thing you have to do to call a function is specify a value (somehow) for each required argument; the manner and order in which you do that is up to you.


	The `and-or` trick, `bool and a or b`, will not work like the C expression `bool ? a : b` when `a` is false in a boolean context.
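The failure case is easy to reproduce. In this sketch the `choose` function is hypothetical, and the last line uses the conditional expression that later Python versions provide; it is shown only for contrast and is not part of the tip above.

```python
def choose(flag, a, b):
    return flag and a or b           # the and-or trick

print(choose(True, "yes", "no"))     # a is true in a boolean context: gives a
print(choose(True, "", "no"))        # a is false: falls through to b
print("" if True else "no")          # later conditional expression: gives a
```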

