10 Python pitfalls

Original: http://zephyrfalcon.org/labs/python_pitfalls.html

(or however many I'll find ;-)

These are not necessarily warts or flaws; rather, they are (side effects of) language features that often trip up newbies, and sometimes experienced programmers. Incomplete understanding of some core Python behavior may cause people to get bitten by these.

This document is meant as a guideline for those who are new to Python. It's better to learn about the pitfalls early than to encounter them in production code shortly before a deadline. :-} It is *not* meant to criticize the language; as noted, most of these pitfalls are not due to language flaws.

1. Inconsistent indentation

OK, this is a cheesy one to start with. However, many newbies come from languages where whitespace "doesn't matter", and are in for a rude surprise when they find out the hard way that their inconsistent indentation practices are punished by Python.

Solution: Indent consistently. Use all spaces, or all tabs, but don't mix them. A decent editor helps.

2. Assignment, aka names and objects

People coming from statically typed languages like Pascal and C often assume that Python variables and assignment work the same as in their language of choice. At first glance, it does indeed look the same:

a = b = 3
a = 4
print a, b  # prints: 4 3

However, then they run into trouble when using mutable objects. Often this goes hand in hand with a claim that Python treats mutable and immutable objects differently.

a = [1, 2, 3]
b = a
a.append(4)
print b
# b is now [1, 2, 3, 4] as well

What is going on is that a statement like a = [1, 2, 3] does two things: 1. it creates an object, in this case a list, with value [1, 2, 3]; 2. it binds name a to it in the local namespace. b = a then binds b to the same list (which is already referenced by a). Once you realize this, it is less difficult to understand what a.append(4) does... it changes the list referenced by both a and b.

The idea that mutable and immutable objects are treated differently when doing assignment is incorrect. When doing a = 3 and b = a, the exact same thing happens as with the list. a and b now refer to the same object, an integer with value 3. However, because integers are immutable, you don't run into side effects.

Solution: Read this. To get rid of unwanted side effects, copy explicitly (using a copy method, the copy module, the slice operator, etc.). Python never copies implicitly.
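A minimal sketch of the copying options mentioned above. The variable names are made up for illustration, and the code is written so that it should run on both Python 2 and Python 3. Note the difference between a shallow copy (new outer list, same inner objects) and a deep copy.

```python
import copy

a = [1, 2, 3]
alias = a                 # no copy: both names refer to the same list
sliced = a[:]             # shallow copy via the slice operator
copied = copy.copy(a)     # shallow copy via the copy module

a.append(4)               # visible through alias, not through the copies

nested = [[1, 2], [3, 4]]
shallow = nested[:]               # new outer list, but the same inner lists
deep = copy.deepcopy(nested)      # the inner lists are copied as well
nested[0].append(99)              # visible through shallow, not through deep
```

After this runs, alias has four items while sliced and copied still have three; shallow[0] has picked up the 99, deep[0] has not.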

3. The += operator

In languages like C, augmented assignment operators like += are a shorthand for a longer expression. For example,

x += 42;

is syntactic sugar for

x = x + 42;

So, you might think that it's the same in Python. Sure enough, it seems that way at first:

a = 1
a = a + 42
# a is 43
a = 1
a += 42
# a is 43

However, for mutable objects, x += y is not necessarily the same as x = x + y. Consider lists:

>>> z = [1, 2, 3]
>>> id(z)
24213240
>>> z += [4]
>>> id(z)
24213240
>>> z = z + [5]
>>> id(z)
24226184

x += y changes the list in-place, having the same effect as the extend method. x = x + y creates a new list and rebinds it to x, which is something else. A subtle difference that can lead to subtle and hard-to-catch bugs.
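The difference is easier to see when a second name refers to the same list. The following is only a sketch with made-up names; it avoids print so it should run unchanged on Python 2 and Python 3.

```python
a = [1, 2, 3]
b = a                          # b is a second name for the same list

a += [4]                       # in place: the list b refers to changes too
same_after_iadd = a is b       # still the same object
b_after_iadd = list(b)         # snapshot of b: [1, 2, 3, 4]

a = a + [5]                    # builds a new list and rebinds a only
same_after_add = a is b        # a and b have now parted ways
```

At the end, a is [1, 2, 3, 4, 5] but b is still [1, 2, 3, 4].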

Not only that, it also leads to surprising behavior when mixing mutable and immutable containers:

>>> t = ([],)
>>> t[0] += [2, 3]
Traceback (most recent call last):
  File "<input>", line 1, in ?
TypeError: object doesn't support item assignment
>>> t
([2, 3],)

Sure enough, tuples don't support item assignment -- but after applying the +=, the list inside the tuple *did* change! The reason is again that += changes in-place. The item assignment doesn't work, but when the exception occurs, the item has already been changed in place.

This is one pitfall that I personally consider a wart.

Solution: depending on your stance on this, you can: avoid += altogether; use it for integers only; or just live with it. :-)

4. Class attributes vs instance attributes

At least two things can go wrong here. First of all, newbies regularly stick attributes in a class (rather than an instance), and are surprised when the attributes are shared between instances:

>>> class Foo:
...     bar = []
...     def __init__(self, x):
...         self.bar.append(x)
...
>>> f = Foo(42)
>>> g = Foo(100)
>>> f.bar, g.bar
([42, 100], [42, 100])

This is not a wart, though, but a nice feature that can be useful in many situations. The misunderstanding springs from the fact that class attributes have been used rather than instance attributes, possibly because instance attributes are created differently than in other languages. In C++, Object Pascal, etc., you declare them in the class body.

Another (small) pitfall is that self.foo can refer to two things: the instance attribute foo, or, in absence of that, the class attribute foo. Compare:

>>> class Foo:
...     a = 42
...     def __init__(self):
...         self.a = 43
...
>>> f = Foo()
>>> f.a
43

and

>>> class Foo:
...     a = 42
...
>>> f = Foo()
>>> f.a
42

In the first example, f.a refers to the instance attribute, with value 43. It overrides the class attribute a with value 42. In the second example, there is no instance attribute a, so f.a refers to the class attribute.

The following code combines the two:

>>> class Foo:
...     bar = []
...     def __init__(self, x):
...         self.bar = self.bar + [x]
...
>>> f = Foo(42)
>>> g = Foo(100)
>>> f.bar
[42]
>>> g.bar
[100]

In self.bar = self.bar + [x], the self.bars are not the same... the second one refers to the class attribute bar, then the result of the expression is bound to the instance attribute.

Solution: This distinction can be confusing, but is not incomprehensible. Use class attributes when you want to share something between multiple class instances. To avoid ambiguity, you can refer to them as self.__class__.name rather than self.name, even if there is no instance attribute with that name. Use instance attributes for attributes unique to the instance, and refer to them as self.name.
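As a sketch of that advice (the count attribute is hypothetical, added only to have something worth sharing): create per-instance data in __init__, and go through the class explicitly when you mean the shared attribute.

```python
class Foo:
    count = 0                          # class attribute: shared by all instances

    def __init__(self, x):
        self.bar = [x]                 # instance attribute: a fresh list per instance
        self.__class__.count += 1      # explicitly update the shared class attribute

f = Foo(42)
g = Foo(100)
```

Here f.bar and g.bar are separate lists, while Foo.count is 2 after both instances have been created.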

Update: Several people noted that #3 and #4 can be combined for even more twisted fun:

>>> class Foo:
...     bar = []
...     def __init__(self, x):
...         self.bar += [x]
...
>>> f = Foo(42)
>>> g = Foo(100)
>>> f.bar
[42, 100]
>>> g.bar
[42, 100]

Again, the reason for this behavior is that self.bar += something is not the same as self.bar = self.bar + something. self.bar refers to Foo.bar here, so f and g update the same list.

5. Mutable default arguments

This one bites beginners over and over again. It's really a variant of #2, combined with unexpected behavior of default arguments. Consider this function:

>>> def popo(x=[]):
...     x.append(666)
...     print x
...
>>> popo([1, 2, 3])
[1, 2, 3, 666]
>>> x = [1, 2]
>>> popo(x)
[1, 2, 666]
>>> x
[1, 2, 666]

This was expected. But now:

>>> popo()
[666]
>>> popo()
[666, 666]
>>> popo()
[666, 666, 666]

Maybe you expected that the output would be [666] in all cases... after all, when popo() is called without arguments, it takes [] as the default argument for x, right? Wrong. The default argument is bound *once*, when the function is *created*, not when it's called. (In other words, for a function f(x=[]), x is *not* bound whenever the function is called. x got bound to [] when we defined f, and that's it.) So if it's a mutable object, and it has changed, then the next function call will take this same list (which has different contents now) as its default argument.

Solution: This behavior can occasionally be useful. In general, just watch out for unwanted side effects.
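If you do want a fresh list on every call, a common idiom is to use None as the default and create the list inside the function. This is a sketch of a variant of popo, not the original: it returns the list instead of printing it, so the result is easy to check on any Python version.

```python
def popo(x=None):
    # None is a sentinel meaning "no argument was given"
    if x is None:
        x = []            # a new list is created on every call
    x.append(666)
    return x

first = popo()
second = popo()
```

Both first and second are now [666], and they are two different list objects.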

6. UnboundLocalError

According to the reference manual, this error occurs if a name "refers to a local variable that has not been bound". That sounds cryptic. It's best illustrated by a small example:

>>> def p():
...     x = x + 2
...
>>> p()
Traceback (most recent call last):
  File "<input>", line 1, in ?
  File "<input>", line 2, in p
UnboundLocalError: local variable 'x' referenced before assignment

Inside p, the statement x = x + 2 cannot be resolved, because the x in the expression x + 2 has no value yet. That seems reasonable; you can't refer to a name that hasn't been bound yet. But now consider:

>>> x = 2
>>> def q():
...     print x
...     x = 3
...     print x
...
>>> q()
Traceback (most recent call last):
  File "<input>", line 1, in ?
  File "<input>", line 2, in q
UnboundLocalError: local variable 'x' referenced before assignment

You'd think that this piece of code would be valid -- first it prints 2 (for the global variable x), then assigns the local variable x to 3, and prints it (3). This doesn't work though. This is because of scoping rules, explained by the reference manual:

"If a name is bound in a block, it is a local variable of that block. If a name is bound at the module level, it is a global variable. (The variables of the module code block are local and global.) If a variable is used in a code block but not defined there, it is a free variable.

When a name is not found at all, a NameError exception is raised. If the name refers to a local variable that has not been bound, a UnboundLocalError exception is raised."

In other words: a variable in a function can be local or global, but not both. (No matter if you rebind it later.) In the example above, Python determines that x is local (according to the rules). But upon execution it encounters print x, and x doesn't have a value yet... hence the error.

Note that a function body of just print x or x = 3; print x would have been perfectly valid.

Solution: Don't mix local and global variables like this.

7. Floating point rounding errors

When using floating point numbers, printing their values may have surprising results. To make matters more interesting, the str() and repr() representations may differ. An example says it all:

>>> c = 0.1
>>> c
0.10000000000000001
>>> repr(c)
'0.10000000000000001'
>>> str(c)
'0.1'

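Many decimal fractions, 0.1 among them, cannot be represented exactly in base 2 (which is what computer hardware uses), so the stored value is only an approximation. repr() shows enough digits to reveal that approximation, while str() rounds to fewer digits.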

Solution: Read the tutorial for more information.
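The same approximation shows up when comparing results. A small sketch (the tolerance 1e-9 is an arbitrary choice for illustration, and the decimal module is a swapped-in alternative, not something discussed above):

```python
from decimal import Decimal

total = 0.1 + 0.2
exact_match = (total == 0.3)                # False: both sides are binary approximations
close_enough = abs(total - 0.3) < 1e-9      # True: compare with a tolerance instead

# Decimal works in base 10, so these particular fractions are represented exactly
decimal_match = (Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))
```

Comparing with a tolerance is usually enough; Decimal is worth considering when exact decimal arithmetic matters, for example with money.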

8. String concatenation

This is a different kind of pitfall. In many languages, concatenating strings with the + operator or something similar might be quite efficient. For example, in Pascal:

var S : String;
for I := 1 to 10000 do begin
S := S + Something(I);
end;

(As an aside: this piece of code assumes a string type that can hold more than 255 characters, which was the maximum in Turbo Pascal. ;-)

Similar code in Python is likely to be highly inefficient. Since Python strings are immutable (as opposed to Pascal strings), a new string is created for every iteration (and old ones are thrown away). This may result in unexpected performance hits. Using string concatenation with + or += is OK for small changes, but it's usually not recommended in a loop.

Solution: If at all possible, create a list of values, then use string.join (or the join() method) to glue them together as one long string. Sometimes this can result in dramatic speedups.

To illustrate this, a simple benchmark. (timeit is a simple function that runs another function and returns how long it took to complete, in seconds.)

>>> def f():
...     s = ""
...     for i in range(100000):
...         s = s + "abcdefg"[i % 7]
...
>>> timeit(f)
23.7819999456
>>> def g():
...     z = []
...     for i in range(100000):
...         z.append("abcdefg"[i % 7])
...     return ''.join(z)
...
>>> timeit(g)
0.343000054359
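The timeit helper itself is not shown here. A minimal sketch of what such a function might look like, assuming a single run measured with wall-clock time, is below; it is a guess at the idea, not the code that produced the numbers above. (The standard library also has a timeit module, which is the more careful tool for real measurements.)

```python
import time

def timeit(func):
    """Call func once and return the elapsed wall-clock time in seconds."""
    start = time.time()
    func()
    return time.time() - start

def g():
    z = []
    for i in range(100000):
        z.append("abcdefg"[i % 7])
    return ''.join(z)

elapsed = timeit(g)       # a float; the value depends entirely on your machine
```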

Update: This was fixed in CPython 2.4. According to the What's New in Python 2.4 page: "String concatenations in statements of the form s = s + "abc" and s += "abc" are now performed more efficiently in certain circumstances. This optimization won't be present in other Python implementations such as Jython, so you shouldn't rely on it; using the join() method of strings is still recommended when you want to efficiently glue a large number of strings together."

9. Binary mode for files

Or rather, it's *not* using binary mode that can cause confusion. Some operating systems, like Windows, distinguish between binary files and text files. To illustrate this, files in Python can be opened in binary mode or text mode:

f1 = open(filename, "r")  # text
f2 = open(filename, "rb") # binary

In text mode, lines may be terminated by any newline/carriage return character (\n, \r, or \r\n). Binary mode does not do this. Also, on Windows, when reading from a file in text mode, newlines are represented by Python as \n (universal); in binary mode, it's \r\n. Reading a piece of data may therefore yield very different results in these modes.

There are also systems that don't have the text/binary distinction. On Unix, for example, files are always opened in binary mode. Because of this, some code written on Unix may open a file in mode 'r', which has different results when run on Windows. Or, someone coming from Unix may use the 'r' flag on Windows, and be puzzled about the results.

Solution: Use the correct flags -- 'r' for text mode (even on Unix), 'rb' for binary mode.
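A sketch of a binary round trip, using a temporary file so it does not touch anything of yours (the sample data is made up). In binary mode the bytes come back exactly as written, on every platform. What text mode returns for the same file depends on the platform and the Python version, so it is deliberately not shown.

```python
import os
import tempfile

data = b"one\r\ntwo\n"        # raw bytes, including a Windows-style line ending

fd, path = tempfile.mkstemp()
os.close(fd)
try:
    f = open(path, "wb")      # binary mode: bytes are written unchanged
    f.write(data)
    f.close()

    f = open(path, "rb")      # binary mode: bytes are read back unchanged
    raw = f.read()
    f.close()
finally:
    os.remove(path)           # clean up the temporary file
```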

10. Catching multiple exceptions

Sometimes you want to catch multiple exceptions in one except clause. An obvious idiom seems to be:

try:
    ...something that raises an error...
except IndexError, ValueError:
    # expects to catch IndexError and ValueError
    # wrong!

This doesn't work though... the reason becomes clear when comparing this to:

>>> try:
...     1/0
... except ZeroDivisionError, e:
...     print e
...
integer division or modulo by zero

The first "argument" in the except clause is the exception class, the second one is an optional name, which will be used to bind the actual exception instance that has been raised. So, in the erroneous code above, the except clause catches an IndexError, and binds the name ValueError to the exception instance. Probably not what we want. ;-)

This works better:

try:
    ...something that raises an error...
except (IndexError, ValueError):
    # does catch IndexError and ValueError

Solution: When catching multiple exceptions in one except clause, use parentheses to create a tuple with exceptions.
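A runnable sketch of the tuple form, with a made-up lookup function. It uses the "as" spelling to bind the exception instance, which Python 2.6 and later accept alongside the comma form shown above.

```python
def lookup(seq, text):
    try:
        return seq[int(text)]            # int() may raise ValueError, indexing IndexError
    except (IndexError, ValueError) as e:
        # e is bound to whichever exception instance was raised
        return type(e).__name__

found = lookup([10, 20], "1")            # 20
too_far = lookup([10, 20], "5")          # 'IndexError'
not_a_number = lookup([10, 20], "abc")   # 'ValueError'
```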