Writing Idiomatic Python (3.x)
Jeff Knupp
2013
No part of this book may be reproduced in any form or by any electronic or mechanical means without permission in
writing from the author.
For my girls, Elissa and Alexandra. Thanks for putting up with your husband/daddy living at Starbucks every weekend.
-- Love, Dad
December, 2013
Preface
While I'm not usually one for aphorisms, this one strikes a chord with me. Maybe it's because I've spent my professional
career writing software at huge companies, but I have yet to inherit code that didn't eventually cause me to curse the
original author at some point. Everyone (besides you, of course, dear reader) struggles to write code that's easy to
maintain. When Python became popular, many thought that, because of its terseness, it would naturally lead to more
maintainable software.
Alas, maintainability is not an emergent property of using an expressive language. Badly written Python code is just
as unmaintainable as badly written C++, Perl, Java and all the rest of the languages known for their, ahem, readability.
Terse code is not a free lunch.
So what do we do? Resign ourselves to maintaining code we can't understand? Rant on Twitter and The Daily WTF
about the awful code we have to work on? What must we do to stop the pain?
It's that simple. Idioms in a programming language are a sort of lingua franca to let future readers know exactly what
we're trying to accomplish. We may document our code extensively, write exhaustive unit tests, and hold code reviews
three times a day, but the fact remains: when someone else needs to make changes, the code is king. If that someone
is you, all the documentation in the world won't help you understand unreadable code. After all, how can you even be
sure the code is doing what the documentation says?
We're usually reading someone else's code because there's a problem. But idiomatic code helps here, too. Even if it's
wrong, when code is written idiomatically, it's far easier to spot bugs. Idiomatic code reduces the cognitive load on
the reader. After learning a language's idioms, you'll spend less time wondering ``Wait, why are they using a named
tuple there'' and more time understanding what the code actually does.
After you learn and internalize a language's idioms, reading the code of a like-minded developer feels like speed
reading. You're no longer stopping at every line, trying to figure out what it does while struggling to keep in mind what
came before. Instead, you'll find yourself almost skimming the code, thinking things like `OK, open a file, transform
the contents to a sorted list, generate the giant report in a thread-safe way.' When you have that level of insight into
code someone else wrote, there's no bug you can't fix and no enhancement you can't make.
All of this sounds great, right? There's only one catch: you have to know and use a language's idioms to benefit. Enter
Writing Idiomatic Python. What started as a hasty blog post of idioms (fueled largely by my frustration while fixing the
code of experienced developers new to Python) is now a full-fledged eBook.
I hope you find the book useful. It is meant to be a living document, updated in near-real time with corrections,
clarifications, and additions. If you find an error in the text or have difficulty deciphering a passage, please feel free
to email me at jeff@jeffknupp.com. With their permission, I'll be adding the names of all who contribute bug fixes and
clarifications to the appendix.
Cheers,
Jeff Knupp
January, 2013
Change List
– Previously, the use of print() as a function in the Python 2.7+ edition code samples was a source of
confusion as the import statement enabling it was omitted for brevity
• ``Conventions'' section outlining a number of conventions used in the book. Specifically, those listed were
identified by readers as a source of confusion.
• 11 new idioms and 3 new sections in total
– New idioms have been added across the board in a number of different sections. Many of these are quite
basic and thus important idioms to understand.
– New Idiom: ``Chain comparisons to make if statements more concise''
– New Idiom: ``Use if and else as a short ternary operator replacement''
– New Idiom: ``Learn to treat functions as values''
– New Idiom: ``Use return to evaluate expressions as well as return values''
– New Idiom: ``Use return to evaluate expressions in addition to return values''
– New Idiom: ``Learn to use `keyword arguments' properly''
– New Idiom: ``Make use of appropriate `assert' methods in unit tests''
– New Idiom: ``Use a try block to determine if a package is available''
– New Idiom: ``Use tuples to organize a long list of modules to import''
– New Idiom: ``Make your Python scripts directly executable''
– New Idiom: ``Use sys.argv to reference command line parameters''
– New Idiom: ``Use ord to get the ASCII code of a character and chr to get the character from an ASCII
code''
– New Idiom: ``Make use of negative indices''
– New Idiom: ``Prefer list comprehensions to the built-in map() and filter() functions''
– New Idiom: ``Use the built-in function sum to calculate the sum of a list of values''
– New Idiom: ``Use all to determine if all elements of an iterable are True''
– New Idiom: ``Use __repr__ for a machine-readable representation of a class''
– New Idiom: ``Use the isinstance function to determine the type of an object''
– New Idiom: ``Use multiple assignment to condense variables all set to the same value''
– New Idiom: ``Use the isinstance function to determine the type of an object''
This book adopts a number of conventions for convenience and readability purposes. A few in particular bear
mentioning explicitly to clear up any confusion:
• Each idiom that includes a sample shows both the idiomatic way of implementing the code and the ``harmful''
way. In many cases, the code listed as ``harmful'' is not harmful in the sense that writing code in that manner
will cause problems. Rather, it is simply an example of how one might write the same code non-idiomatically.
You may use the ``harmful'' examples as templates to search for in your own code. When you find code like
that, replace it with the idiomatic version.
• print() is used as a function in both editions of the book. In the 2.7+ edition, there is an idiom devoted to
using print in this way through the statement from __future__ import print_function. In all other
code samples, this import statement is omitted for brevity.
• In some code samples, PEP-8 and/or PEP-257 are violated to accommodate formatting limitations or for brevity.
In particular, most functions in code samples do not contain docstrings. Here again, the book has an explicit
idiom regarding the consistent use of docstrings; they are omitted for brevity. This may change in future versions.
• All code samples, if they were part of a stand-alone script, would include an if __name__ == '__main__'
statement and a main() function, as described by the idioms ``Use the if __name__ == '__main__'
pattern to allow a file to be both imported and run directly'' and ``Use sys.exit in your script to return proper
error codes''. These statements are omitted in code samples for brevity.
Contents
Dedication
Preface
Change List
Version 1.1, February 2, 2013
Version 1.2, February 17, 2013
Version 1.3, June 16, 2013
Version 1.4, November 22, 2013
Version 1.5, December 13, 2013
Conventions
Contents
2.7 Classes
2.7.1 Use the isinstance function to determine the type of an object
2.7.2 Use leading underscores in function and variable names to denote ``private'' data
2.7.3 Use properties to ``future-proof'' your class implementation
2.7.4 Use __repr__ for a machine-readable representation of a class
2.7.5 Define __str__ in a class to show a human-readable representation
2.8 Context Managers
2.8.1 Use a context manager to ensure resources are properly managed
2.9 Generators
2.9.1 Prefer a generator expression to a list comprehension for simple iteration
2.9.2 Use a generator to lazily load infinite sequences
4 General Advice
5 Contributors
Chapter 1
1.1 If Statements
1.1.1.1 Harmful
1.1.1.2 Idiomatic
if x <= y <= z:
    return True
CHAPTER 1. CONTROL STRUCTURES AND FUNCTIONS 2
1.1.2 Avoid placing conditional branch code on the same line as the colon
Using indentation to indicate scope (like you already do everywhere else in Python) makes it easy to determine what
will be executed as part of a conditional statement. if, elif, and else statements should always be on their own
line. No code should follow the :.
1.1.2.1 Harmful
name = 'Jeff'
address = 'New York, NY'
if name: print(name)
print(address)
1.1.2.2 Idiomatic
name = 'Jeff'
address = 'New York, NY'
if name:
    print(name)
print(address)
1.1.3.1 Harmful
is_generic_name = False
name = 'Tom'
if name == 'Tom' or name == 'Dick' or name == 'Harry':
    is_generic_name = True
1.1.3.2 Idiomatic
name = 'Tom'
is_generic_name = name in ('Tom', 'Dick', 'Harry')
• None
• False
• zero for numeric types
• empty sequences
• empty dictionaries
• a value of 0 or False returned when either __len__ or __bool__ is called
Everything else is considered True (and thus most things are implicitly True). The last condition for determining
False, by checking the value returned by __len__ or __bool__ (named __nonzero__ in Python 2), allows you to define how ``truthiness'' should
work for any class you create.
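As a small sketch of that last rule (the Playlist and Flag classes are hypothetical, invented for this illustration): in Python 3 the hook that defines truthiness is __bool__, and __len__ is consulted only when __bool__ is not defined.

```python
class Playlist:
    """Hypothetical container: truthiness falls back to __len__."""

    def __init__(self, songs):
        self.songs = list(songs)

    def __len__(self):
        return len(self.songs)


class Flag:
    """Hypothetical class that defines its truthiness explicitly."""

    def __init__(self, enabled):
        self.enabled = bool(enabled)

    def __bool__(self):
        return self.enabled


if not Playlist([]):
    print('An empty playlist is falsy')
if Flag(True):
    print('An enabled flag is truthy')
```

Neither class is ever compared to True or False; the if statement asks the object itself.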
if statements in Python make use of ``truthiness'' implicitly, and you should too. Instead of checking if a variable
foo is True like this
if foo == True:
you should simply check if foo:.
There are a number of reasons for this. The most obvious is that if your code changes and foo becomes an int
instead of True or False, your if statement still works as a check against zero. But at a deeper level, the reasoning
is based on the difference between equality and identity. Using == determines if two objects have the same
value (as defined by their __eq__ method). Using is determines if the two objects are actually the same underlying
object.
Note that while there are cases where is works as if it were comparing for equality, these are special cases and
shouldn't be relied upon.
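A minimal sketch of the distinction, using throwaway names (first, second, alias) chosen only for this illustration:

```python
first = [1, 2, 3]
second = [1, 2, 3]   # equal contents, but a separate list object
alias = first        # another name for the same object

print(first == second)   # True: the values are equal
print(first is second)   # False: two distinct objects
print(first is alias)    # True: one object, two names
```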
As a consequence, avoid comparing directly to False and None and empty sequences like [], {}, and (). If a list
named my_list is empty, calling if my_list: will evaluate to False.
There are times, however, when comparing directly to None is not just recommended, but required. A function checking
if an argument whose default value is None was actually set must compare directly to None like so:
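One way such a function might look is sketched here. The name insert_value and the container parameter are assumptions made for illustration; only position comes from the discussion that follows.

```python
def insert_value(container, value, position=None):
    """Insert value at position, or append it when no position is given."""
    if position is not None:
        # Position 0 is a legitimate request, so test against None
        # explicitly instead of relying on truthiness.
        container.insert(position, value)
    else:
        container.append(value)
    return container


print(insert_value(['b', 'c'], 'a', position=0))  # ['a', 'b', 'c']
print(insert_value(['a', 'b'], 'c'))              # ['a', 'b', 'c']
```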
What's wrong with if position:? Well, if someone wanted to insert into position 0, the function would act as if
position hadn't been set, since 0 evaluates to False. Note the use of is not: comparisons against None (a singleton
in Python) should always use is or is not, not == (from PEP8).
1.1.4.1 Harmful
def number_of_evil_robots_attacking():
    return 10

def should_raise_shields():
    # "We only raise Shields when one or more giant robots attack,
    # so I can just return that value..."
    return number_of_evil_robots_attacking()

if should_raise_shields() == True:
    raise_shields()
    print('Shields raised')
else:
    print('Safe! No giant robots attacking')
1.1.4.2 Idiomatic
def number_of_evil_robots_attacking():
    return 10

def should_raise_shields():
    # "We only raise Shields when one or more giant robots attack,
    # so I can just return that value..."
    return number_of_evil_robots_attacking()

if should_raise_shields():
    raise_shields()
    print('Shields raised')
else:
    print('Safe! No giant robots attacking')
1.1.5.1 Harmful
foo = True
value = 0
if foo:
    value = 1
print(value)
1.1.5.2 Idiomatic
foo = True
value = 1 if foo else 0
print(value)
1.2.1 Use the enumerate function in loops instead of creating an ``index'' variable
Programmers coming from other languages are used to explicitly declaring a variable to track the index of a container
in a loop. For example, in C++:
1.2.1.1 Harmful
1.2.1.2 Idiomatic
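A minimal sketch of the difference, using a hypothetical list of names (my_container and the two result lists are invented for this illustration):

```python
my_container = ['Larry', 'Moe', 'Curly']

# Non-idiomatic: maintain the index by hand.
manual = []
index = 0
for element in my_container:
    manual.append('{} {}'.format(index, element))
    index += 1

# Idiomatic: enumerate yields (index, element) pairs.
idiomatic = []
for index, element in enumerate(my_container):
    idiomatic.append('{} {}'.format(index, element))

print(idiomatic)  # ['0 Larry', '1 Moe', '2 Curly']
```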
1.2.2.1 Harmful
1.2.2.2 Idiomatic
In the scenario below, we are running a report to check if any of the email addresses our users registered are malformed
(users can register multiple addresses). The idiomatic version is more concise thanks to not having to deal with the
has_malformed_email_address flag. What's more, even if another programmer wasn't familiar with the for ...
else idiom, our code is clear enough to teach them.
1.2.3.1 Harmful
1.2.3.2 Idiomatic
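The scenario described above can be sketched roughly as follows. The helper email_is_malformed and its rule (no '@' means malformed) are assumptions made purely for illustration. The else clause belongs to the for loop and runs only when the loop ends without hitting break.

```python
def email_is_malformed(address):
    """Hypothetical check: treat any address without an '@' as malformed."""
    return '@' not in address


def check_addresses(email_addresses):
    for address in email_addresses:
        if email_is_malformed(address):
            result = 'Has a malformed email address'
            break
    else:
        # Reached only if no break occurred, so no flag is needed.
        result = 'All email addresses are valid'
    return result


print(check_addresses(['jeff@example.com', 'not-an-address']))
print(check_addresses(['jeff@example.com']))
```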
1.3 Functions
1.3.1 Avoid using a mutable object as the default value for a function argument
When the Python interpreter encounters a function definition, default arguments are evaluated to determine their
value. This evaluation, however, occurs only once. Calling the function does not trigger another evaluation of the
arguments. Since the computed value is used in all subsequent calls to the function, using a mutable object as a
default value often yields unintended results.
A mutable object is one whose value can be changed directly. list, dict, set, and most class instances are mutable.
We can mutate a list by calling append on it. The object has been changed to contain the appended element.
Immutable objects, by contrast, can not be altered after they are created. string, int, and tuple objects are all
examples of immutable objects. We can't directly change the value of a string, for example. All string operations
that would alter a string instead return new string objects.
So why does this matter for default arguments? Recall that the value initially computed for a default argument is
reused each time the function is called. For immutable objects like strings, this is fine, since there is no way for us
to change that value directly. For mutable objects, however, changing the value of a default argument will be reflected
in subsequent calls to the function. In the example below (taken from the official Python tutorial), an empty list is
used as a default argument value. If the function adds an element to that list, the list argument will still contain
that element the next time the function is called. The list argument is not ``reset'' to an empty list; rather, the same
list object is used for every call to that function.
1.3.1.1 Harmful
def f(a, L=[]):
    L.append(a)
    return L
print(f(1))
print(f(2))
print(f(3))
# [1, 2, 3]
1.3.1.2 Idiomatic
print(f(1))
print(f(2))
print(f(3))
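A minimal sketch of the usual fix, following the approach recommended in the Python tutorial: default the argument to None and create the list inside the function, so each call that omits the argument gets a fresh list.

```python
def f(a, L=None):
    # A new list is created on every call that omits L,
    # so nothing is shared between calls.
    if L is None:
        L = []
    L.append(a)
    return L


print(f(1))  # [1]
print(f(2))  # [2]
print(f(3))  # [3]
```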
Knowing this, it's simple and more concise to simply return the result of an expression rather than creating a new
variable to hold that result only to return the variable on the next line.
1.3.2.1 Harmful
1.3.2.2 Idiomatic
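As an illustration, here are two hypothetical functions (the names are invented here) that answer the same question. The first stores the answer in a variable before returning it; the second returns the expression directly.

```python
def all_equal_verbose(a, b, c):
    result = False
    if a == b and b == c:
        result = True
    return result


def all_equal(a, b, c):
    # The chained comparison already evaluates to True or False.
    return a == b == c


print(all_equal(1, 1, 1))  # True
print(all_equal(1, 2, 1))  # False
```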
This is easily accomplished through the use of keyword arguments. Keyword arguments are distinguished from
``normal'' arguments by the presence of an = and a default value. In the case of print, the definition of the function
would have a signature similar to the following: def print(*values, sep=' '). *values are the values you'd
like to print. sep is a keyword argument with a default value of ' '.
The most useful property of keyword arguments is the fact that they are optional in every function call. Thus, it's
possible to use keyword arguments to add additional information in special cases and the default value in the normal
case. If they were required for all calls, you would always need to supply a value for sep (which would almost always
be ' '), a pointless burden.
1.3.3.1 Harmful
1.3.3.2 Idiomatic
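A small sketch of the same pattern in our own code. join_values is a hypothetical helper, invented here, that mirrors the sep keyword argument of print.

```python
def join_values(*values, sep=' '):
    """Hypothetical helper: join values the way print separates them."""
    return sep.join(str(value) for value in values)


# The common case needs no extra argument...
print(join_values(1, 2, 3))            # 1 2 3
# ...and the special case names exactly what it changes.
print(join_values(1, 2, 3, sep=', '))  # 1, 2, 3
```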
The idiom is also useful when maintaining backwards compatibility in an API. If our function accepts arbitrary argu-
ments, we are free to add new arguments in a new version while not breaking existing code using fewer arguments.
As long as everything is properly documented, the ``actual'' parameters of a function are not of much consequence.
Of course, that's not to say that we should simply stop using named parameters in functions. Indeed, this should be
our default. There are, however, a number of situations where the use of *args and **kwargs is useful or necessary.
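A rough sketch of the backwards-compatibility case. The arithmetic and the optional 'baz_coefficient' setting are invented for illustration and are not taken from any real API. Callers written before the option existed keep working, because a missing keyword simply falls back to a default.

```python
def make_api_call(foo, bar, baz, *args, **kwargs):
    # 'baz_coefficient' is a hypothetical optional setting; older
    # callers never pass it, so fall back to a neutral default.
    baz_coefficient = kwargs.get('baz_coefficient', 1)
    return (foo + bar + baz) * baz_coefficient


print(make_api_call(1, 2, 3))                     # 6
print(make_api_call(1, 2, 3, baz_coefficient=2))  # 12
```

Using kwargs.get rather than kwargs['baz_coefficient'] is a deliberate choice in this sketch: indexing would raise a KeyError for every caller that omits the option.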
1.3.4.1 Harmful
def so_many_options():
    # I can tack on new parameters, but only if I make
    # all of them optional...
    def make_api_call(foo, bar, baz, qux=None, foo_polarity=None,
                      baz_coefficient=None, quux_capacitor=None,
                      bar_has_hopped=None, true=None, false=None,
                      file_not_found=None):
        # ... and so on ad infinitum
        return file_not_found

def version_graveyard():
    # ... or I can create a new function each time the signature
    # changes.
    def make_api_call_v2(foo, bar, baz, qux):
        ...

    def make_api_call_v4(
            foo, bar, baz, qux, foo_polarity, baz_coefficient):
        return make_api_call_v3(
            foo, bar, baz, qux, foo_polarity) * baz_coefficient

    def make_api_call_v5(
            foo, bar, baz, qux, foo_polarity,
            baz_coefficient, quux_capacitor):
        # I don't need 'foo', 'bar', or 'baz' anymore, but I have to
        # keep supporting them...
        return baz_coefficient * quux_capacitor

    def make_api_call_v6(
            foo, bar, baz, qux, foo_polarity, baz_coefficient,
            quux_capacitor, bar_has_hopped):
        if bar_has_hopped:
            baz_coefficient *= -1
        return make_api_call_v5(foo, bar, baz, qux,
                                foo_polarity, baz_coefficient,
                                quux_capacitor)

    def make_api_call_v7(
            foo, bar, baz, qux, foo_polarity, baz_coefficient,
            quux_capacitor, bar_has_hopped, true):
        return true

    def make_api_call_v8(
            foo, bar, baz, qux, foo_polarity, baz_coefficient,
            quux_capacitor, bar_has_hopped, true, false):
        return false

    def make_api_call_v9(
            foo, bar, baz, qux, foo_polarity, baz_coefficient,
            quux_capacitor, bar_has_hopped,
            true, false, file_not_found):
        return file_not_found
1.3.4.2 Idiomatic
def new_hotness():
    def make_api_call(foo, bar, baz, *args, **kwargs):
        # Now I can accept any type and number of arguments
        # without worrying about breaking existing code.
        baz_coefficient = kwargs['the_baz']
1.3.5.1 Harmful
def print_addition_table():
    for x in range(1, 3):
        for y in range(1, 3):
            print(str(x + y) + '\n')

def print_subtraction_table():
    for x in range(1, 3):
        for y in range(1, 3):
            print(str(x - y) + '\n')

def print_multiplication_table():
    for x in range(1, 3):
        for y in range(1, 3):
            print(str(x * y) + '\n')

def print_division_table():
    for x in range(1, 3):
        for y in range(1, 3):
            print(str(x / y) + '\n')

print_addition_table()
print_subtraction_table()
print_multiplication_table()
print_division_table()
1.3.5.2 Idiomatic
import operator as op
def print_table(operator):
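A minimal sketch of the idea, assuming the table-printing loops from the harmful version are reused unchanged: the operation is passed in as a value, so one function replaces four. The helper table_lines is an addition made here so the results can be inspected; it is not part of the original listing.

```python
import operator as op


def table_lines(operation):
    """Return the 2x2 table for a binary operation as a list of strings."""
    return [str(operation(x, y))
            for x in range(1, 3)
            for y in range(1, 3)]


def print_table(operation):
    for line in table_lines(operation):
        print(line + '\n')


# Functions are values, so they can sit in a tuple and be passed along.
for operation in (op.add, op.sub, op.mul, op.truediv):
    print_table(operation)
```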
1.4 Exceptions
Because (in other languages) deciding when to raise an exception is partly a matter of taste (and thus experience),
novices tend to overuse them. This overuse of exceptions leads to a number of problems: the control flow of a program
is more difficult to follow, they create a burden on calling code when allowed to propagate up a call chain, and in many
languages they impose a stiff performance penalty. These facts have led to a general vilification of exceptions. Many
organizations have explicit coding standards that forbid their use (see, for example, Google's official C++ Style Guide).
Python takes a different view. Exceptions can be found in almost every popular third-party package, and the Python
standard library makes liberal use of them. In fact, exceptions are built into fundamental parts of the language itself.
For example, did you know that any time you use a for loop in Python, you're using exceptions?
That may sound odd, but it's true: exceptions are used for control flow throughout the Python language. Have you
ever wondered how for loops know when to stop? For things like lists that have an easily determined length the
question seems trivial. But what about generators, which could produce values ad infinitum?
Any time you use for to iterate over an iterable (basically, all sequence types and anything that defines
__iter__() or __getitem__()), it needs to know when to stop iterating. Take a look at the code below:
words = ['exceptions', 'are', 'useful']
for word in words:
    print(word)
How does for know when it's reached the last element in words and should stop trying to get more items? The answer
may surprise you: the list's iterator raises a StopIteration exception.
In fact, all iterables follow this pattern. When a for statement is first evaluated, it calls iter() on the object being
iterated over. This creates an iterator for the object, capable of returning the contents of the object in sequence.
For the call to iter() to succeed, the object must either support the iteration protocol (by defining __iter__()) or
the sequence protocol (by defining __getitem__()).
As it happens, both protocols are required to raise an exception when the items to iterate over are exhausted.
The iterator returned by __iter__() raises the StopIteration exception from its __next__() method, as discussed
earlier, and __getitem__() raises the IndexError exception. This is how for knows when to stop.
So whenever you're wondering if it's OK to use exceptions in Python, just remember this: for all but the most trivial
programs, you're probably using them already.
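To make the mechanism concrete, here is a sketch of roughly what a for loop does behind the scenes, followed by a class that supports only the sequence protocol. Countdown is a hypothetical class invented for this illustration.

```python
words = ['exceptions', 'are', 'useful']

# Roughly what "for word in words: print(word)" does for us.
iterator = iter(words)
while True:
    try:
        word = next(iterator)
    except StopIteration:
        break  # the iterator is exhausted, so the loop ends
    print(word)


class Countdown:
    """Hypothetical sequence-protocol class: only __getitem__ is defined."""

    def __init__(self, start):
        self.start = start

    def __getitem__(self, index):
        if index >= self.start:
            raise IndexError(index)  # signals that nothing is left
        return self.start - index


print(list(Countdown(3)))  # [3, 2, 1]
```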
Code written in this manner must ask a number of different questions before it is convinced it's OK to do something.
More importantly, once all questions have been answered to its satisfaction, the code assumes whatever it is about to
do will succeed.
The if statements littered throughout the function give both the programmer and readers of the code a false sense
of security. The programmer has checked everything she can think of that would prevent her code from working,
so clearly nothing can go wrong, right? Someone reading the code would similarly assume that, with all those if
statements, the function must handle all possible error conditions. Calling it shouldn't require any error handling.
Code written in this style is said to be written in a ``Look Before You Leap (LBYL)'' style. Every (thought of)
pre-condition is explicitly checked. There's an obvious problem with this approach: if the code doesn't ask all of the right
questions, bad things happen. It's rarely possible to anticipate everything that could go wrong. What's more, as the
Python documentation astutely points out, code written in this style can fail badly in a multi-threaded environment. A
condition that was true in an if statement may be false by the next line.
Alternatively, code written according to the principle, ``[It's] Easier to Ask for Forgiveness than Permission (EAFP),''
assumes things will go well and catches exceptions if they don't. It puts the code's true purpose front-and-center,
increasing clarity. Rather than seeing a string of if statements and needing to remember what each checked before
you even know what the code wants to do, EAFP-style code presents the end goal first. The error handling code that
follows is easier to read; you already know the operation that could have failed.
1.4.2.1 Harmful
def get_log_level(config_dict):
    if 'ENABLE_LOGGING' in config_dict:
        if config_dict['ENABLE_LOGGING'] != True:
            return None
        elif not 'DEFAULT_LOG_LEVEL' in config_dict:
            return None
        else:
            return config_dict['DEFAULT_LOG_LEVEL']
    else:
        return None
1.4.2.2 Idiomatic
def get_log_level(config_dict):
    try:
        if config_dict['ENABLE_LOGGING']:
            return config_dict['DEFAULT_LOG_LEVEL']
    except KeyError:
        # if either value wasn't present, a
        # KeyError will be raised, so
        # return None
        return None
Exceptions have tracebacks and messages for a reason: to aid in debugging when something goes wrong. If you
``swallow'' an exception with a bare except clause, you suppress genuinely useful debugging information. If you
need to know whenever an exception occurs but don't intend to deal with it (say for logging purposes), add a bare
raise to the end of your except block. The bare raise re-raises the exception that was caught. This way, your
code runs and the user still gets useful information when something goes wrong.
Of course, there are valid reasons one would need to ensure some block of code never generates an exception. Almost
none of the idioms described in this book are meant to be mechanically followed: use your head.
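The log-and-re-raise case mentioned above might look something like this. parse_port is a hypothetical helper invented for illustration; logging.exception is the standard library call that records the message together with the current traceback.

```python
import logging


def parse_port(text):
    """Hypothetical helper: convert a configuration value to an int."""
    try:
        return int(text)
    except ValueError:
        # Record the failure, then let the same exception propagate
        # so the caller still sees the original traceback.
        logging.exception('Could not parse port from %r', text)
        raise
```

Note that the except clause names the exception it expects; the bare raise at the end re-raises it unchanged.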
1.4.3.1 Harmful
import requests

def get_json_response(url):
    try:
        r = requests.get(url)
        return r.json()
    except:
        print('Oops, something went wrong!')
        return None
1.4.3.2 Idiomatic
import requests

def get_json_response(url):
    return requests.get(url).json()
2.1 Variables
2.1.1 Use multiple assignment to condense variables all set to the same value
Python supports multiple assignment, in which a number of variables are all set to the same value (rather than each
assignment being on its own line). This is one case where concision aids readability rather than hampering it.
2.1.1.1 Harmful
x = 'foo'
y = 'foo'
z = 'foo'
2.1.1.2 Idiomatic
x = y = z = 'foo'
CHAPTER 2. WORKING WITH DATA 27
2.1.2 Avoid using a temporary variable when performing a swap of two values
There is no reason to swap using a temporary variable in Python. We can use tuples to make our intention more clear.
2.1.2.1 Harmful
foo = 'Foo'
bar = 'Bar'
temp = foo
foo = bar
bar = temp
2.1.2.2 Idiomatic
foo = 'Foo'
bar = 'Bar'
(foo, bar) = (bar, foo)
2.2 Strings
2.2.1 Chain string functions to make a simple series of transformations more clear
When applying a simple sequence of transformations on some datum, chaining the calls in a single expression is often
more clear than creating a temporary variable for each step of the transformation. Too much chaining, however, can
make your code harder to follow. ``No more than three chained functions'' is a good rule of thumb.
2.2.1.1 Harmful
2.2.1.2 Idiomatic
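A short sketch of the trade-off, using a made-up book_info string. Both versions apply the same three transformations; the chained one reads as a single step.

```python
book_info = ' The Three Musketeers: Alexandre Dumas'

# One temporary assignment per step.
formatted = book_info.strip()
formatted = formatted.upper()
formatted = formatted.replace(':', ' by')

# The same three transformations, chained.
chained = book_info.strip().upper().replace(':', ' by')

print(chained)  # THE THREE MUSKETEERS by ALEXANDRE DUMAS
```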
2.2.2 Use ''.join when creating a single string for list elements
It's faster, uses less memory, and you'll see it everywhere anyway. Note that the two quotes represent the delimiter
between list elements in the string we're creating. '' just means we wish to concatenate the elements with no
characters between them.
2.2.2.1 Harmful
2.2.2.2 Idiomatic
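A minimal sketch with a made-up result_list. The loop builds a new string on every pass, while join produces the result in one call; changing the delimiter is just a matter of calling join on a different string.

```python
result_list = ['True', 'False', 'File not found']

# Repeated concatenation creates a new string on each iteration.
result_string = ''
for result in result_list:
    result_string += result

# join builds the final string in a single call.
joined = ''.join(result_list)
comma_separated = ', '.join(result_list)

print(joined)           # TrueFalseFile not found
print(comma_separated)  # True, False, File not found
```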
2.2.3 Use ord to get the ASCII code of a character and chr to get the character from
an ASCII code
It's often useful to be able to make use of the ASCII value of a character (in string hashing, for example). It is likewise
useful to be able to translate in the ``other direction'', from ASCII code to character.
Python has two oft-overlooked built-in functions, chr and ord, that are used to perform these translations.
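A quick sketch of the round trip between the two:

```python
print(ord('a'))       # 97
print(chr(97))        # a
print(chr(ord('z')))  # z
```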
2.2.3.1 Harmful
hash_value = 0
character_hash = {
    'a': 97,
    'b': 98,
    'c': 99,
    # ...
    'y': 121,
    'z': 122,
}
for e in some_string:
    hash_value += character_hash[e]
return hash_value
2.2.3.2 Idiomatic
hash_value = 0
for e in some_string:
    hash_value += ord(e)
return hash_value
The clearest and most idiomatic way to format strings is to use the format function. Like old-style formatting, it takes
a format string and replaces placeholders with values. The similarities end there, though. With the format function,
we can use named placeholders, access their attributes, and control padding and string width, among a number of
other things. The format function makes string formatting clean and concise.
2.2.4.1 Harmful
def get_formatted_user_info_worst(user):
    # Tedious to type and prone to conversion errors
    return 'Name: ' + user.name + ', Age: ' + \
        str(user.age) + ', Sex: ' + user.sex

def get_formatted_user_info_slightly_better(user):
    # No visible connection between the format string placeholders
    # and values to use. Also, why do I have to know the type?
    # Don't these types all have __str__ functions?
    return 'Name: %s, Age: %i, Sex: %c' % (
        user.name, user.age, user.sex)
2.2.4.2 Idiomatic
def get_formatted_user_info(user):
    # Clear and concise. At a glance I can tell exactly what
    # the output should be. Note: this string could be returned
    # directly, but the string itself is too long to fit on the
    # page.
    output = 'Name: {user.name}, Age: {user.age}, Sex: {user.sex}'.format(user=user)
    return output
2.3 Lists
There are also (usually) performance benefits to using a list comprehension (or alternatively, a generator
expression) due to optimizations in the CPython interpreter.
2.3.1.1 Harmful
some_other_list = range(10)
some_list = list()
for element in some_other_list:
    if is_prime(element):
        some_list.append(element + 5)
2.3.1.2 Idiomatic
some_other_list = range(10)
some_list = [element + 5
             for element in some_other_list
             if is_prime(element)]
2.3.2.1 Harmful
def get_suffix(word):
    word_length = len(word)
    return word[word_length - 2:]
2.3.2.2 Idiomatic
def get_suffix(word):
    return word[-2:]
2.3.3 Prefer list comprehensions to the built-in map() and filter() functions.
Python is a language that has evolved over time. As such, it still has vestiges of its past self lingering in the language
proper. One example is the map and filter functions. While there are times when the use of these functions is
appropriate, almost all usage can (and should) be replaced by a list comprehension. List comprehensions are both
more concise and more readable, a winning combination in my, erm…, book.
2.3.3.1 Harmful
def is_odd(number):
    return number % 2 == 1
2.3.3.2 Idiomatic
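A minimal sketch of the comparison, assuming we want the squares of the odd numbers in a small range. the_list and the two result names are invented for this illustration.

```python
def is_odd(number):
    return number % 2 == 1


the_list = range(1, 8)

# With map and filter: read inside-out to follow what happens.
with_map_filter = list(map(lambda n: n * n, filter(is_odd, the_list)))

# With a list comprehension: transformation, source, condition, in order.
with_comprehension = [n * n for n in the_list if is_odd(n)]

print(with_comprehension)  # [1, 9, 25, 49]
```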
2.3.4 Use the built-in function sum to calculate the sum of a list of values
This may seem like a surprising idiom to those who use sum regularly. What would be more surprising is how often
novice Python developers re-invent the sum function. If a function is built-in, it's usually best to use it rather than
rolling your own version.
2.3.4.1 Harmful
2.3.4.2 Idiomatic
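A minimal sketch of the re-invention next to the built-in, using a made-up range of numbers:

```python
the_list = range(1, 11)

# Re-inventing sum by hand.
the_sum = 0
for element in the_list:
    the_sum += element

# The built-in does the same thing in one call.
print(sum(the_list))  # 55
```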
2.3.5.1 Harmful
def contains_zero(iterable):
    for e in iterable:
        if e == 0:
            return True
    return False
2.3.5.2 Idiomatic
def contains_zero(iterable):
    # 0 is "Falsy," so this works
    return not all(iterable)
2.3.6.1 Harmful
2.3.6.2 Idiomatic
2.4 Dictionaries
The naive alternative in Python is to write a series of if...else statements. This gets old quickly. Thankfully,
functions are first-class objects in Python, so we can treat them the same as any other variable. This is a very powerful
concept, and many other powerful concepts use first-class functions as a building block.
So how does this help us with switch...case statements? Rather than trying to emulate the exact functionality, we
can take advantage of the fact that functions are first-class objects and can be stored as values in a dict. Returning
to the calculator example, storing the string operator (e.g. ``+'') as the key and its associated function as the value,
we arrive at a clear, readable way to achieve the same functionality as switch...case.
This idiom is useful for more than just picking a function to dispatch using a string key. It can be generalized to anything
that can be used as a dict key, which in Python is just about everything. Using this method, one could create a Factory
class that chooses which type to instantiate via a parameter. Or it could be used to store states and their transitions
when building a state machine. Once you fully appreciate the power of "everything is an object", you'll find elegant
solutions to once-difficult problems.
2.4.1.1 Harmful
2.4.1.2 Idiomatic
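A minimal sketch of the dict-based dispatch described above. The OPERATIONS and apply_operation names are assumptions of mine; the functions themselves come from the standard library's operator module:

```python
import operator

# Map each operator string to the function that implements it
OPERATIONS = {
    '+': operator.add,
    '-': operator.sub,
    '*': operator.mul,
    '/': operator.truediv,
}

def apply_operation(left_operand, right_operand, operator_symbol):
    """Look up the function for operator_symbol and call it."""
    operation = OPERATIONS.get(operator_symbol)
    if operation is None:
        raise ValueError('Unknown operator: {}'.format(operator_symbol))
    return operation(left_operand, right_operand)
```

Supporting a new operator then means adding one entry to the dict rather than another branch to an if...else chain.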
2.4.2.1 Harmful
log_severity = None
if 'severity' in configuration:
    log_severity = configuration['severity']
else:
    log_severity = 'Info'
2.4.2.2 Idiomatic
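The check-then-assign pattern can likely be collapsed into a single call to dict.get, which takes a default to return when the key is missing. A sketch, with an illustrative configuration dict that I've assumed:

```python
configuration = {'name': 'example'}  # illustrative; note there is no 'severity' key

# Returns configuration['severity'] if present, otherwise 'Info'
log_severity = configuration.get('severity', 'Info')
```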
2.4.3.1 Harmful
user_email = {}
for user in users_list:
    if user.email:
        user_email[user.name] = user.email
2.4.3.2 Idiomatic
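The loop can be expressed as a dict comprehension. In this sketch, the User namedtuple and the sample data are hypothetical stand-ins for whatever objects users_list really holds:

```python
from collections import namedtuple

# Hypothetical stand-in for the real user objects
User = namedtuple('User', ['name', 'email'])
users_list = [User('alice', 'alice@example.com'), User('bob', '')]

# Build the name -> email mapping, skipping users with no email
user_email = {user.name: user.email for user in users_list if user.email}
```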
2.5 Sets
Don't worry; you don't need a degree in math to understand or use sets. You just need to remember a few simple
operations:
Symmetric Difference The set of elements in either A or B, but not both A and B (written A ^ B in Python).
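For reference, here is a short sketch of Python's set operators applied to two small, made-up sets:

```python
a = {1, 2, 3, 4}
b = {3, 4, 5}

union = a | b                 # elements in a, b, or both
intersection = a & b          # elements in both a and b
difference = a - b            # elements in a but not in b
symmetric_difference = a ^ b  # elements in exactly one of a and b
```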
When working with lists of data, a common task is finding the elements that appear in all of the lists. Any time you
need to choose elements from two or more sequences based on properties of sequence membership, look to use a
set.
Below, we'll explore some typical examples.
2.5.1.1 Harmful
def get_both_popular_and_active_users():
    # Assume the following two functions each return a
    # list of user names
    most_popular_users = get_list_of_most_popular_users()
    most_active_users = get_list_of_most_active_users()
    popular_and_active_users = []
    for user in most_active_users:
        if user in most_popular_users:
            popular_and_active_users.append(user)
    return popular_and_active_users
2.5.1.2 Idiomatic
def get_both_popular_and_active_users():
    # Assume the following two functions each return a
    # list of user names
    return (set(get_list_of_most_active_users())
            & set(get_list_of_most_popular_users()))
2.5.2.1 Harmful
users_first_names = set()
for user in users:
    users_first_names.add(user.first_name)
2.5.2.2 Idiomatic
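A set comprehension does the same job in one expression. The User namedtuple and sample data here are hypothetical, included only to make the sketch self-contained:

```python
from collections import namedtuple

# Hypothetical stand-in for the real user objects
User = namedtuple('User', ['first_name', 'last_name'])
users = [User('Ann', 'Lee'), User('Bob', 'Ray'), User('Ann', 'Kim')]

# Duplicates collapse automatically because the result is a set
users_first_names = {user.first_name for user in users}
```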
Continuing the example, we may have an existing display function that accepts a sequence and displays its
elements in one of many formats. After creating a set from our original list, will we need to change our display
function?
Nope. Assuming our display function is implemented reasonably, our set can be used as a drop-in replacement for
a list. This works thanks to the fact that a set, like a list, is an Iterable and can thus be used in a for loop,
list comprehension, etc.
2.5.3.1 Harmful
unique_surnames = []
for surname in employee_surnames:
    if surname not in unique_surnames:
        unique_surnames.append(surname)
2.5.3.2 Idiomatic
unique_surnames = set(employee_surnames)
2.6 Tuples
When working with tuples in this way, each index in the tuple has a specific meaning. In our database example
(where a tuple represents a single result row) each index corresponds to a specific column. Writing code that forces
you to remember which index corresponds to which column (i.e. result[3] is the salary column) is confusing and
error prone.
Luckily, the collections module has an elegant solution: collections.namedtuple. A namedtuple is a normal
tuple with a few extra capabilities. Most importantly, namedtuples give you the ability to access fields by names
rather than by index.
collections.namedtuple is a powerful tool for increasing the readability and maintainability of code. The example
above was but one of the many cases where collections.namedtuple is useful.
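To make the idea concrete, here is a small sketch. The EmployeeRow field names and the sample row are my assumptions (chosen so that index 3 is the salary column, as in the text); a real result row would come from a database cursor:

```python
from collections import namedtuple

# Field names are assumed for illustration
EmployeeRow = namedtuple(
    'EmployeeRow', ['first_name', 'last_name', 'department', 'salary'])

row = ('Ada', 'Lovelace', 'Engineering', 120000)  # stand-in for a DB result row
employee = EmployeeRow._make(row)

# Access by name instead of remembering that index 3 is the salary
salary = employee.salary
```

Because a namedtuple is still a tuple, employee[3] continues to work for any code that relies on indexes.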
2.6.1.1 Harmful
2.6.1.2 Idiomatic
def print_employee_information(db_connection):
    db_cursor = db_connection.cursor()
    results = db_cursor.execute('select * from employees').fetchall()
    for row in results:
        employee = EmployeeRow._make(row)
2.6.2.1 Harmful
2.6.2.2 Idiomatic
2.6.3.1 Harmful
2.6.3.2 Idiomatic
This is one of those patterns you'll see all the time when reading Python code in the standard library or third-party
packages. It's also an idiom that sounds obvious when you hear or see it, but is not something those new to the
language typically divine on their own.
2.6.4.1 Harmful
from collections import Counter

STATS_FORMAT = """Statistics:
Mean: {mean}
Median: {median}
Mode: {mode}"""

def calculate_mean(value_list):
    return float(sum(value_list) / len(value_list))

def calculate_median(value_list):
    return value_list[int(len(value_list) / 2)]

def calculate_mode(value_list):
    return Counter(value_list).most_common(1)[0][0]

print(STATS_FORMAT.format(mean=mean, median=median,
                          mode=mode))
2.6.4.2 Idiomatic
from collections import Counter

STATS_FORMAT = """Statistics:
Mean: {mean}
Median: {median}
Mode: {mode}"""

def calculate_statistics(value_list):
    mean = float(sum(value_list) / len(value_list))
    median = value_list[int(len(value_list) / 2)]
    mode = Counter(value_list).most_common(1)[0][0]
    return (mean, median, mode)
2.7 Classes
2.7.1.1 Harmful
def get_size(some_object):
    """Return the "size" of *some_object*, where size = len(some_object) for
    sequences, size = some_object for integers and floats, and size = 1 for
    True, False, or None."""
    try:
        return len(some_object)
    except TypeError:
        if some_object in (True, False, None):
            return 1
        else:
            return int(some_object)

print(get_size('hello'))
print(get_size([1, 2, 3, 4, 5]))
print(get_size(10.0))
2.7.1.2 Idiomatic
def get_size(some_object):
    if isinstance(some_object, (list, dict, str, tuple)):
        return len(some_object)

print(get_size('hello'))
print(get_size([1, 2, 3, 4, 5]))
print(get_size(10.0))
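The listing above covers only the sequence branch. As a sketch of how the remaining cases described in the docstring might be handled with isinstance, here is one possible completion. The branch order and the final TypeError are my assumptions, not necessarily the author's:

```python
def get_size(some_object):
    """Return len() for containers, 1 for True/False/None,
    and the integer value for ints and floats."""
    if isinstance(some_object, (list, dict, str, tuple)):
        return len(some_object)
    # bool must be checked before int, because bool is a subclass of int
    if some_object is None or isinstance(some_object, bool):
        return 1
    if isinstance(some_object, (int, float)):
        return int(some_object)
    raise TypeError(
        'Unsupported type: {}'.format(type(some_object).__name__))
```

Each branch states exactly which types it handles, so an unexpected type fails loudly instead of slipping through an except clause.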
2.7.2 Use leading underscores in function and variable names to denote "private" data
All attributes of a class, be they data or functions, are inherently "public" in Python. A client is free to add attributes
to a class after it's been defined. In addition, if the class is meant to be inherited from, a subclass may unwittingly
change an attribute of the base class. Lastly, it's generally useful to be able to signal to users of your class that certain
portions are logically public (and won't be changed in a backwards incompatible way) while other attributes are purely
internal implementation artifacts and shouldn't be used directly by client code using the class.
A number of widely followed conventions have arisen to make the author's intention more explicit and help avoid
unintentional naming conflicts. While the following two idioms are commonly referred to as "nothing more than
conventions," both of them, in fact, alter the behavior of the interpreter when used.
First, attributes to be "protected", which are not meant to be used directly by clients, should be prefixed with a single
underscore. Second, "private" attributes not meant to be accessible by a subclass should be prefixed by two
underscores. Of course, these are (mostly) merely conventions. Nothing would stop a client from being able to access your
"private" attributes, but the convention is so widely used you likely won't run into developers that purposely choose not
to honor it. It's just another example of the Python community settling on a single way of accomplishing something.
Before, I hinted that the single and double underscore prefixes were more than mere conventions. Few developers
are aware of the fact that prepending underscores to names does actually do something. Prepending a single
underscore to a module-level name means that the symbol won't be imported by from foo import * (unless the module
explicitly lists it in __all__). Prepending two underscores to an
attribute name invokes Python's name mangling. This has the effect of making it far less likely someone who subclasses
your class will inadvertently replace your class's attribute with something unintended. If Foo is a class, the definition
def __bar() will be "mangled" to _Foo__bar, following the pattern _classname__attributename.
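Name mangling is easy to observe directly. In this small sketch (the class and attribute names are purely illustrative), the attribute assigned as self.__bar inside the class is actually stored under the mangled name:

```python
class Foo:
    def __init__(self):
        self.__bar = 1  # stored on the instance as _Foo__bar

foo = Foo()

# The instance dictionary shows the name the interpreter really used
stored_names = list(vars(foo))
```

Outside the class body, foo.__bar raises AttributeError, while foo._Foo__bar still works. That's why I say these are mostly conventions: mangling prevents accidents, not determined access.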
2.7.2.1 Harmful
class Foo():
    def __init__(self):
        self.id = 8
        self.value = self.get_value()

    def get_value(self):
        pass

    def should_destroy_earth(self):
        return self.id == 42

class Baz(Foo):
    def get_value(self, some_new_parameter):
        """Since 'get_value' is called from the base class's
        __init__ method and the base class definition doesn't
        take a parameter, trying to create a Baz instance will
        fail.
        """
        pass

class Qux(Foo):
    """We aren't aware of Foo's internals, and we innocently
    create an instance attribute named 'id' and set it to 42.
    This overwrites Foo's id attribute and we inadvertently
    blow up the earth.
    """
    def __init__(self):
        super(Qux, self).__init__()
        self.id = 42
        # No relation to Foo's id, purely coincidental

q = Qux()
b = Baz() # Raises 'TypeError'
q.should_destroy_earth() # returns True
q.id == 42 # returns True
2.7.2.2 Idiomatic
import pytest

class Foo():
    def __init__(self):
        """Since 'id' is of vital importance to us, we don't
        want a derived class accidentally overwriting it. We'll
        prepend with double underscores to introduce name
        mangling.
        """
        self.__id = 8

    def get_value(self):
        pass

    def should_destroy_earth(self):
        return self.__id == 42

class Baz(Foo):
    def get_value(self, some_new_parameter):
        pass

class Qux(Foo):
    def __init__(self):
        """Now when we set 'id' to 42, it's not the same 'id'
        that 'should_destroy_earth' is concerned with. In fact,
        if you inspect a Qux object, you'll find it doesn't
        have an __id attribute. So we can't mistakenly change
        Foo's __id attribute even if we wanted to.
        """
        self.id = 42
        # No relation to Foo's id, purely coincidental
        super(Qux, self).__init__()

q = Qux()
b = Baz() # Works fine now
q.should_destroy_earth() # returns False
q.id == 42 # returns True
with pytest.raises(AttributeError):
    getattr(q, '__id')
2.7.3.1 Harmful
class Product():
    def __init__(self, name, price):
        self.name = name
        # We could try to apply the tax rate here, but the object's price
        # may be modified later, which erases the tax
        self.price = price
2.7.3.2 Idiomatic
class Product():
    def __init__(self, name, price):
        self.name = name
        self._price = price

    @property
    def price(self):
        # now if we need to change how price is calculated, we can do it
        # here (or in the "setter" and __init__)
        return self._price * TAX_RATE

    @price.setter
    def price(self, value):
        # The "setter" function must have the same name as the property
        self._price = value
2.7.4.1 Harmful
class Foo():
    def __init__(self, bar=10, baz=12, cache=None):
        self.bar = bar
        self.baz = baz
        self._cache = cache or {}

    def __str__(self):
        return 'Bar is {}, Baz is {}'.format(self.bar, self.baz)

def log_to_console(instance):
    print(instance)
2.7.4.2 Idiomatic
class Foo():
    def __init__(self, bar=10, baz=12, cache=None):
        self.bar = bar
        self.baz = baz
        self._cache = cache or {}

    def __str__(self):
        return '{}, {}'.format(self.bar, self.baz)

    def __repr__(self):
        return 'Foo({}, {}, {})'.format(self.bar, self.baz, self._cache)

def log_to_console(instance):
    print(instance)
2.7.5.1 Harmful
class Point():
    def __init__(self, x, y):
        self.x = x
        self.y = y

p = Point(1, 2)
print(p)
2.7.5.2 Idiomatic
class Point():
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __str__(self):
        return '{0}, {1}'.format(self.x, self.y)

p = Point(1, 2)
print(p)
There are a number of classes in the standard library that support or use a context manager. In addition,
user-defined classes can be easily made to work with a context manager by defining __enter__ and __exit__ methods.
Functions may be wrapped with context managers through the contextlib module.
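Both approaches can be sketched briefly. The Recorder class and the recording function are hypothetical names of mine; __enter__, __exit__, and contextlib.contextmanager are the real hooks:

```python
from contextlib import contextmanager


class Recorder:
    """Hypothetical class that records when its block starts and ends."""

    def __init__(self):
        self.events = []

    def __enter__(self):
        self.events.append('enter')
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.events.append('exit')
        return False  # do not swallow exceptions raised in the block


@contextmanager
def recording(events):
    """Function-based equivalent built with contextlib."""
    events.append('enter')
    try:
        yield events
    finally:
        events.append('exit')


with Recorder() as recorder:
    recorder.events.append('work')

function_events = []
with recording(function_events):
    function_events.append('work')
```

In both cases the cleanup step runs when the with block ends, even if the block raises an exception.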
2.8.1.1 Harmful
2.8.1.2 Idiomatic
2.9 Generators
Your first instinct should be to build and iterate over the sequence in place. A list comprehension seems ideal,
but there's an even better Python built-in: a generator expression.
The main difference? A list comprehension generates a list object and fills in all of the elements immediately.
For large lists, this can be prohibitively expensive. The generator returned by a generator expression, on the
other hand, generates each element "on-demand". That list of uppercase user names you want to print out? Probably
not a problem. But what if you wanted to write out the title of every book known to the Library of Congress? You'd
likely run out of memory in generating your list comprehension, while a generator expression won't bat
an eyelash. A logical extension of the way generator expressions work is that you can use them on infinite
sequences.
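Here is a small sketch of that last point, using itertools.count as the infinite sequence (my choice for illustration). A list comprehension over it would never finish; the generator expression only computes the values actually requested:

```python
import itertools

# Nothing is computed yet; each square is produced on demand
squares = (n * n for n in itertools.count(1))

# Pull just the first five values from the infinite sequence
first_five = list(itertools.islice(squares, 5))
```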
2.9.1.1 Harmful
2.9.1.2 Idiomatic
In both cases, generators are your friend. A generator is a special type of coroutine which returns an iterable.
The state of the generator is saved, so that the next call into the generator continues where it left off. In the
examples below, we'll see how to use a generator to help in each of the cases mentioned above.
2.9.2.1 Harmful
def get_twitter_stream_for_keyword(keyword):
    """Gets the 'live stream', but only at the moment
    the function is initially called. To get more entries,
    the client code needs to keep calling
    'get_twitter_stream_for_keyword'. Not ideal.
    """
    imaginary_twitter_api = ImaginaryTwitterAPI()
    if imaginary_twitter_api.can_get_stream_data(keyword):
        return imaginary_twitter_api.get_stream(keyword)

current_stream = get_twitter_stream_for_keyword('#jeffknupp')
for tweet in current_stream:
    process_tweet(tweet)

def get_list_of_incredibly_complex_calculation_results(data):
    return [first_incredibly_long_calculation(data),
            second_incredibly_long_calculation(data),
            third_incredibly_long_calculation(data),
            ]
2.9.2.2 Idiomatic
def get_twitter_stream_for_keyword(keyword):
    """Now, 'get_twitter_stream_for_keyword' is a generator
    and will continue to generate Iterable pieces of data
    one at a time until 'can_get_stream_data(keyword)' is
    False (which may be never).
    """
    imaginary_twitter_api = ImaginaryTwitterAPI()
    while imaginary_twitter_api.can_get_stream_data(keyword):
        yield imaginary_twitter_api.get_stream(keyword)

def get_list_of_incredibly_complex_calculation_results(data):
    """A simple example to be sure, but now when the client
    code iterates over the call to
    'get_list_of_incredibly_complex_calculation_results',
    we only do as much work as necessary to generate the
    current item.
    """
    yield first_incredibly_long_calculation(data)
    yield second_incredibly_long_calculation(data)
    yield third_incredibly_long_calculation(data)
Chapter 3
3.1 Formatting
3.1.1 Use all capital letters when declaring global constant values
To distinguish constants defined at the module level (or global in a single script) from imported names, use all
uppercase letters.
3.1.1.1 Harmful
seconds_in_a_day = 60 * 60 * 24
# ...
def display_uptime(uptime_in_seconds):
    percentage_run_time = (
        uptime_in_seconds/seconds_in_a_day) * 100
    # "Huh!? Where did seconds_in_a_day come from?"
CHAPTER 3. ORGANIZING YOUR CODE 68
3.1.1.2 Idiomatic
SECONDS_IN_A_DAY = 60 * 60 * 24
# ...
def display_uptime(uptime_in_seconds):
    percentage_run_time = (
        uptime_in_seconds/SECONDS_IN_A_DAY) * 100
    # "Clearly SECONDS_IN_A_DAY is a constant defined
    # elsewhere in this module."

# ...
uptime_in_seconds = 60 * 60 * 24
display_uptime(uptime_in_seconds)
Basically everything not listed should follow the variable/function naming conventions of "Words joined by an
underscore".
3.1.3.1 Harmful
3.1.3.2 Idiomatic
3.2 Documentation
Writing documentation for all of a class's public methods and everything exported by a module (including the
module itself) may seem like overkill, but there's a very good reason to do it: helping documentation tools. Third-party
software like Sphinx is able to automatically generate documentation (in formats like HTML, LaTeX, man pages, etc.)
from properly documented code. If you decide to use such a tool, all of your classes' public methods and all exported
functions in a module will automatically have entries in the generated documentation. If you've only written
documentation for half of these, the documentation is far less useful to an end user. Imagine trying to read the official Python
documentation for itertools if half of the functions listed only their signature and nothing more.
In addition, this is one of those rules that helps remove a cognitive burden on the programmer. By following this
rule, you never have to ask yourself "does this function merit a docstring?" or try to determine the threshold for
documentation. Just follow the rule and don't worry about it. Of course, use common sense if there's a good reason
not to write documentation for something.
The formatting rules help both documentation tools and IDEs. Using a predictable structure to your documentation
allows it to be parsed in a useful way. For example, the first line of a docstring should be a one-sentence summary.
If more lines are necessary, they are separated from the first line by a blank line. This allows documentation tools and
IDEs to present a summary of the code in question and hide more detailed documentation if it's not needed. There's
really no good reason not to follow the formatting rules (that I can think of), and you're only helping yourself by doing
so.
3.2.1.1 Harmful
def calculate_statistics(value_list):
3.2.1.2 Idiomatic
def calculate_statistics(value_list):
    """Return a tuple containing the mean, median,
    and mode of a list of integers

    Arguments:
    value_list -- a list of integer values

    """
    <The body of the function>
3.2.2.1 Harmful
def calculate_mean(numbers):
    """Return the mean of a list of numbers"""
3.2.2.2 Idiomatic
def calculate_mean(numbers):
    """Return the mean of a list of numbers"""
    return sum(numbers) / len(numbers)
3.2.3.1 Harmful
def is_prime(number):
    """Mod all numbers from 2 -> number and return True
    if the value is never 0"""
3.2.3.2 Idiomatic
def is_prime(number):
    """Return True if number is prime"""
3.3 Imports
Many choose to arrange the imports in (roughly) alphabetical order. Others think that's ridiculous. In reality, it doesn't
matter. What matters is that you do choose a standard order (and follow it, of course).
3.3.1.1 Harmful
import os.path
import concurrent.futures
from flask import render_template
if __name__ == '__main__':
3.3.1.2 Idiomatic
Relative imports specify a module relative to the current module's location on the file system. If you are the module
package.sub_package.module and need to import package.other_module, you can do so using the dotted
relative import syntax: from ..other_module import foo. A single . represents the current package a
module is contained in (like in a file system). Each additional . is taken to mean "the parent package of", one level
per dot. Note that relative imports must use the from ... import ... style. import foo is always treated
as an absolute import.
import package.other_module (possibly with an as clause to alias the module to a shorter name).
Why, then, should you prefer absolute imports to relative? Relative imports clutter a module's namespace. By writing
from foo import bar, you've bound the name bar in your module's namespace. To those reading your code, it will
not be clear where bar came from, especially if used in a complicated function or large module. foo.bar, however,
makes it perfectly clear where bar is defined. The Python programming FAQ goes so far as to say, "Never use relative
package imports."
3.3.2.1 Harmful
# My location is package.sub_package.module
# and I want to import package.other_module.
# The following should be avoided:
3.3.2.2 Idiomatic
# My location is package.sub_package.another_sub_package.module
# and I want to import package.other_module.
# Either of the following are acceptable:
import package.other_module
import package.other_module as other
3.3.3 Do not use from foo import * to import the contents of a module.
Considering the previous idiom, this one should be obvious. Using an asterisk in an import (as in from foo import
*) is an easy way to clutter your namespace. This may even cause issues if there are clashes between names you
define and those defined in the package.
But what if you have to import a number of names from the foo package? Simple. Make use of the fact that parentheses
can be used to group import statements. You won't have to write 10 lines of import statements from the same module,
and your namespace remains (relatively) clean.
Better yet, simply use absolute imports. If the package/module name is too long, use an as clause to shorten it.
3.3.3.1 Harmful
3.3.3.2 Idiomatic
# or even better...
import foo
The idiomatic way to perform this check is through the use of a try/except block. If an exception is raised, the
package does not exist and the fallback package should be imported. If the import succeeds, it is imported using
as with some name that will be used regardless of whether the target package or backup package is used. See the
example for more details.
3.3.4.1 Harmful
import cProfile
# Uh-oh! The user doesn't have cProfile installed! Raise an exception
# here...
print(cProfile.__all__)
3.3.4.2 Idiomatic
try:
    import cProfile as profiler
except ImportError:
    import profile as profiler
print(profiler.__all__)
A little-known usage of the import statement allows the list of names being imported to be wrapped in parentheses. The standard
line-continuation rules then apply. This makes it easy to logically group a long list of import targets.
3.3.5.1 Harmful
3.3.5.2 Idiomatic
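A short sketch of the parenthesized form, using os.path purely as a convenient standard-library example (the particular names imported are my choice):

```python
# Parentheses let the list of names span several lines
# without backslash continuations
from os.path import (
    basename,
    dirname,
    join,
    splitext,
)

path = join('archives', 'test.txt')
name, extension = splitext(basename(path))
parent = dirname(path)
```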
This is very commonly done in libraries and frameworks, like Flask, which aim to create simple interfaces for client
code.
3.4.1.1 Harmful
3.4.1.2 Idiomatic
# __init__.py:
# client code:
from gizmo import Gizmo, GizmoHelper
3.4.2 Use modules for encapsulation where other languages would use Objects
While Python certainly supports Object Oriented programming, it does not require it. Most experienced Python
programmers (and programmers in general using a language that facilitates it) use classes and polymorphism
relatively sparingly. There are a number of reasons why.
Most data that would otherwise be stored in a class can be represented using the simple list, dict, and set types.
Python has a wide variety of built-in functions and standard library modules optimized (both in design and
implementation) to interact with them. One can make a compelling case that classes should be used only when necessary and
almost never at API boundaries.
In Java, classes are the basic unit of encapsulation. Each file represents a Java class, regardless of whether that makes
sense for the problem at hand. If I have a handful of utility functions, into a "Utility" class they go! If we don't
intuitively understand what it means to be a "Utility" object, no matter. Of course, I exaggerate, but the point is clear.
Once one is forced to make everything a class, it is easy to carry that notion over to other programming languages.
In Python, groups of related functions and data are naturally encapsulated in modules. If I'm using an MVC web
framework to build "Chirp", I may have a package named chirp with model, view, and controller modules.
If "Chirp" is an especially ambitious project and the code base is large, those modules could easily be packages
themselves. The controller package may have a persistence module and a processing module. Neither of
those need be related in any way other than the sense that they intuitively belong under controller.
If all of those modules became classes, interoperability immediately becomes an issue. We must carefully and precisely
determine the methods we will expose publicly, how state will be updated, and the way in which our class supports
testing. And instead of a dict or list, we have Processing and Persistence objects we must write code to
support.
Note that nothing in the description of "Chirp" necessitates the use of any classes. Simple import statements make
code sharing and encapsulation easy. Passing state explicitly as arguments to functions keeps everything loosely
coupled. And it becomes far easier to receive, process, and transform data flowing through our system.
To be sure, classes may be a cleaner or more natural way to represent some "things". In many instances, Object
Oriented Programming is a handy paradigm. Just don't make it the only paradigm you use.
3.5.1.1 Harmful
import sys
import os
FIRST_NUMBER = float(sys.argv[1])
SECOND_NUMBER = float(sys.argv[2])
3.5.1.2 Idiomatic
import sys
import os
first_number = float(sys.argv[1])
second_number = float(sys.argv[2])
if second_number != 0:
    print(divide(first_number, second_number))
3.5.3.1 Harmful
if __name__ == '__main__':
    import sys
3.5.3.2 Idiomatic
def main():
    import sys
    if len(sys.argv) < 2:
        # Calling sys.exit with a string automatically
        # prints the string to stderr and exits with
        # a value of '1' (error)
        sys.exit('You forgot to pass an argument')
    argument = sys.argv[1]
    result = do_stuff(argument)
    if not result:
        # We can also exit with just the return code
        sys.exit(1)
    do_stuff_with_result(result)
3.5.4.1 Harmful
import argparse

if __name__ == '__main__':
    parser = argparse.ArgumentParser(usage="my_cat.py <filename>")
    parser.add_argument('filename', help='The name of the file to use')
    # parse_args() reads sys.argv[1:] by default and returns a Namespace
    parsed = parser.parse_args()
    print(open(parsed.filename).read())
3.5.4.2 Idiomatic
import sys

if __name__ == '__main__':
    try:
        print(open(sys.argv[1]).read())
    except IndexError:
        print('You forgot the file name!')
Chapter 4
General Advice
The index is fully searchable and contains both Python 2 and Python 3 based packages. Of course, not all packages
are created equal (or equally maintained), so be sure to check when the package was last updated. A package with
documentation hosted externally on a site like ReadTheDocs is a good sign, as is one for which the source is available
on a site like GitHub or Bitbucket.
Now that you've found a promising-looking package, how do you install it? By far the most popular tool to manage
third-party packages is pip. A simple pip install <package name> will download the latest version of the package and
install it in your site-packages directory. If you need the bleeding edge version of a package, pip is also capable
of installing directly from a DVCS like git or mercurial.
If you create a package that seems generally useful, strongly consider giving back to the Python community by
publishing it to PyPI. Doing so is a straightforward process, and future developers will (hopefully) thank you.
CHAPTER 4. GENERAL ADVICE 90
Making use of the standard library has two primary benefits. Most obviously, you save yourself a good deal of time
when you don't have to implement a piece of functionality from scratch. Just as important is the fact that those who
read or maintain your code will have a much easier time doing so if you use packages familiar to them.
Remember, the purpose of learning and writing idiomatic Python is to write clear, maintainable, and bug-free code.
Nothing ensures those qualities in your code more easily than reusing code written and maintained by core Python
developers. As bugs are found and fixed in the standard library, your code improves with each Python release without
you lifting a finger.
4.2.2 Use functions in the os.path module when working with directory paths
When writing simple command-line scripts, new Python programmers often perform herculean feats of string
manipulation to deal with file paths. Python has an entire module dedicated to functions on path names: os.path.
Using os.path reduces the risk of common errors, makes your code portable, and makes your code much easier to
understand.
4.2.2.1 Harmful
import os
from datetime import date

filename_to_archive = 'test.txt'
new_filename = 'test.bak'
target_directory = './archives'
today = date.today()
os.mkdir('./archives/' + str(today))
os.rename(
    filename_to_archive,
    target_directory + '/' + str(today) + '/' + new_filename)
4.2.2.2 Idiomatic
import os
from datetime import date

current_directory = os.getcwd()
filename_to_archive = 'test.txt'
new_filename = os.path.splitext(filename_to_archive)[0] + '.bak'
target_directory = os.path.join(current_directory, 'archives')
today = date.today()
new_path = os.path.join(target_directory, str(today))
if os.path.isdir(target_directory):
    if not os.path.exists(new_path):
        os.mkdir(new_path)
    os.rename(
        os.path.join(current_directory, filename_to_archive),
        os.path.join(new_path, new_filename))
4.3 Testing
For most, the standard library's unittest module will be sufficient. It's a fully featured and reasonably user-friendly
testing framework modeled after JUnit. Most of Python's standard library is tested using unittest, so it is quite
capable of testing reasonably large and complex projects. Among other things, it includes:
If you find the unittest module lacking in functionality or writing test code not as intuitive as you'd like, there are
a number of third-party tools available. The two most popular are nose and py.test, both freely available on PyPI.
Both are actively maintained and extend the functionality offered by unittest.
If you decide to use one of them, choosing which is largely a matter of support for the functionality required by your
project. Otherwise, it's mostly a matter of taste regarding the style of test code each package supports. This book,
for example, has used both tools at various points of its development, switching based on changing test requirements.
When you do make a decision, even if it's to use the unittest module, familiarize yourself with all of the capabilities
of the tool you chose. Each has a long list of useful features. The more you take advantage of these features, the less
time you'll spend inadvertently implementing a feature which you weren't aware the tool you use already supports.
There's no good reason to shoehorn test code and application code into the same file, but there are a number of
reasons not to. The documentation on Python's unittest module succinctly enumerates these, so I'll simply list their
reasons here:
• The test module can be run standalone from the command line.
• The test code can more easily be separated from shipped code.
• There is less temptation to change test code to fit the code it tests without a good reason.
• Test code should be modified much less frequently than the code it tests.
• Tested code can be refactored more easily.
• Tests for modules written in C must be in separate modules anyway, so why not be consistent?
• If the testing strategy changes, there is no need to change the source code.
As a general rule, if the official Python documentation strongly suggests something, it can safely be considered
idiomatic Python.
To refactor code is to restructure it without changing its observable behavior. Imagine we have a function that calculates
various statistics about students' test scores and outputs the results in nicely-formatted HTML. This single function
might be refactored into two smaller functions: one to perform the calculations and the other to print the results as
HTML. After the refactoring, the resulting HTML output will be the same as before, but the structure of the code itself
was changed to increase readability. Refactoring is a deep topic and a full discussion is outside the scope of this book,
but it's likely something you'll find yourself doing often.
As you're making changes, though, how do you know if you've inadvertently broken something? And how do you know
which portion of code is responsible for the bug? Automated unit testing is your canary in the mine shaft of refactoring.
It's an early warning system that lets you know something has gone wrong. Catching bugs quickly is important; the
sooner you catch a bug, the easier it is to fix.
Unit tests are the specific type of tests helpful for this purpose. A unit test is different from other types of tests in that
it tests small portions of code in isolation. Unit tests may be written against functions, classes, or entire modules, but
they test the behavior of the code in question and no more. Thus, if the code makes database queries or network
connections, these are simulated in a controlled way by mocking those resources. The goal is to be absolutely sure
that, if a test fails, it's because of the code being tested and not due to some unrelated code or resource.
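To make the idea of mocking concrete, here is a minimal sketch using the standard library's `unittest.mock`. The `fetch_scores` function, the `query` method on the connection, and the SQL string are all assumptions made up for this example; the point is only that the test never touches a real database.

```python
import unittest
from unittest import mock


def fetch_scores(connection, student_id):
    """Hypothetical function under test: look up a student's scores."""
    rows = connection.query(
        "SELECT score FROM scores WHERE student = ?", student_id)
    return [row["score"] for row in rows]


class TestFetchScores(unittest.TestCase):
    def test_returns_scores_from_query(self):
        # Stand-in for a real database connection; nothing is contacted.
        connection = mock.Mock()
        connection.query.return_value = [{"score": 90}, {"score": 75}]

        self.assertEqual(fetch_scores(connection, 42), [90, 75])
        # Verify the code under test issued the query we expected.
        connection.query.assert_called_once_with(
            "SELECT score FROM scores WHERE student = ?", 42)
```

If this test fails, the fault lies in `fetch_scores` itself, since the only other participant is a mock whose behavior we control completely.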
It should be clear, then, how unit tests are helpful while refactoring. Since only the internal structure of the code is
changing, tests that examine the code's output should still pass. If such a test fails, it means that the refactoring
introduced unintended behavior. Sometimes you'll need to make changes to your tests themselves, of course,
depending on the scope of the code changes you're making. In general, though, passing unit tests is a good litmus test
for determining if your refactoring broke anything.
Lastly, your unit tests should be automated so that running them and interpreting the results requires no thought on
your part. Just kick off the tests and make sure all the tests still pass. If you need to manually run specific tests based
on what part of the system you're changing, you run the risk of running the wrong tests or missing a bug introduced
in a portion of the code you didn't intend to affect (and thus didn't test). As with most developer productivity tools, the
purpose is to reduce the cognitive burden on the developer and increase reliability and repeatability.
4.3.4.1 Harmful
class Test(unittest.TestCase):
    def test_adding_positive_ints(self):
        """Does adding together two positive integers work?"""
        self.assertTrue(my_addition(2, 2) == 4)

    def test_increment(self):
        """Does increment return a value greater than what was passed as an
        argument?"""
        self.assertTrue(increment(1) > 1)

    def test_divisors_of_prime_number(self):
        self.assertTrue(get_divisors(11) is None)
4.3.4.2 Idiomatic
class Test(unittest.TestCase):
    def test_adding_positive_ints(self):
        """Does adding together two positive integers work?"""
        self.assertEqual(my_addition(2, 2), 4)

    def test_increment(self):
        """Does increment return a value greater than what was passed as an
        argument?"""
        self.assertGreater(increment(1), 1)

    def test_divisors_of_prime_number(self):
        self.assertIsNone(get_divisors(11))
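A practical payoff of the more specific assertion methods shows up when a test fails. The sketch below (the `get_failure_message` helper is hypothetical, written only for this comparison) captures the message each style produces for the same deliberately wrong comparison.

```python
import unittest


def get_failure_message(assertion):
    """Run one assertion and return the message it fails with, if any."""
    try:
        assertion()
    except AssertionError as error:
        return str(error)
    return None


# A bare TestCase instance is enough to call the assertion methods directly.
case = unittest.TestCase()
generic = get_failure_message(lambda: case.assertTrue(2 + 2 == 5))
specific = get_failure_message(lambda: case.assertEqual(2 + 2, 5))

print(generic)   # False is not true
print(specific)  # 4 != 5
```

The `assertTrue` message only tells you that something was false. The `assertEqual` message shows both values, which is usually the first thing you'd want to know.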
Chapter 5
Contributors
I actively solicit feedback on bugs, typos, grammatical and spelling errors, and unclear portions of the text. The
following awesome individuals have greatly helped improve the quality of this text by emailing me about an issue
they found.
• R. Daneel Olivaw
• Jonathon Capps
• Michael G. Lerner
• Daniel Smith
• Arne Sellmann
• Nick Van Hoogenstyn
• Kiran Gangadharan
• Brian Brechbuhl
• Mike White
• Brandon Devine
• Johannes Krampf
• Marco Kaulea
• Florian Bruhin
• Brian Forst
• Daniel J. Lauk
• Seth Mason
• Martin Falatic
• Christian Clauss