Software Developer Journal - Python Starter Kit (13 - 2013)


Editor's Note

Dear Readers,
Our 'Python Starter Kit' is prepared to help you start your adventure with Python programming. We did our best to gather the best authors, who will show you how to begin.

To start with, we have tutorials that will teach you step by step. In Python: A Guide for Beginners, Mohit Saxena introduces Python as a programming language, and Sotaya Yakubu, in Starting Python Programming and the Use of Docstring and dir(), covers the basics you need for a good start. Everything you need to know to open up the possibilities of Django is in Alberto Paro's article, Beginning with Django.

Once you have the basics, you will be ready for the slightly more advanced subjects our authors have prepared. Learn how to write better tests in Better Django Unit Testing Using Factories Instead of Fixtures with Anton Sipos. Another library worth knowing is covered in the Python Fabric article by Renato Candido. W. Matthew Wilson will introduce you to the Python logging module; after that, read Python, Web Security and Django by Steve Lott, an interesting and pleasant tutorial on why security is so important.

As you learn more and more about Python, George Psarakis and his Building a Console 2-player Chess Board Game in Python will teach you Python's object-oriented concepts and how to interact with user input on the command line. If you want to know how to write an app, read Write a Web App and Learn Python by Adam Nelson.

For those who want more, we have an excellent article by Yves J. Hilpisch, Efficient Data and Financial Analytics with Python, which will help you face today's data analytics challenges.

Our tutorial Test-Driven Development With Python by Josh VanderLinden will guide you through the mysteries of testing; with such a well-commented guide you will really enjoy learning. Finally, find out what Python iterators are as Saad Bin Akhlaq shows you their secrets.

I hope you will enjoy learning with us! For more, take a look at our Python Programming issue, and enjoy what Software Developer's Journal has prepared for you!

To stay in touch, follow us on Twitter and like us on Facebook.

Karolina Rekun
& the SDJ Team

Team

Editor in Chief: Karolina Rekun
[email protected]

Special thanks to our Beta testers and Proofreaders who helped us with this issue. Our magazine would not exist without your assistance and expertise.

Publisher: Paweł Marciniak
Managing Director: Ewa Dudzic
Production Director: Andrzej Kuca
[email protected]
Art. Director: Ireneusz Pogroszewski
[email protected]
DTP: Ireneusz Pogroszewski
Marketing Director: Ewa Dudzic

Publisher: Hakin9 Media SK
02-676 Warsaw, Poland
Postepu 17D
Phone: 1 917 338 3631
http://en.sdjournal.org/

Whilst every effort has been made to ensure the highest quality of the magazine, the editors make no warranty, expressed or implied, concerning the results of the content's usage. All trademarks presented in the magazine were used for informative purposes only.

All rights to trademarks presented in the magazine are reserved by the companies which own them.

DISCLAIMER!
The techniques described in our magazine may be used in private, local networks only. The editors hold no responsibility for the misuse of the techniques presented or any data loss.
Contents

Python: A Guide for Beginners
Mohit Saxena
Python is an easy and powerful programming language. It has highly efficient data structures with object-oriented programming approach. Its neat syntax and dynamic typing makes it more efficient. It is the best programming language for rapid application development for many platforms.

Starting Python Programming and the Use of Docstring and dir()
Sotaya Yakubu
Python is an interpreted language and features dynamic system with an automatic memory management. It can be used as a full fledged language, or integrated as a scripting language in another such as C, Java etc. The language itself is not limited to a specific programming paradigm; different styles of coding can be used in this language, such as imperative, object-oriented, functional and procedural styles.

Beginning with Django
Alberto Paro
What are the success keys for a web framework? Is it easy to use? Is it easy to deploy? Does it provide user satisfaction? Django framework is more than these answers because, in my opinion, it is one of the few frameworks that is able to hit its goal: it "makes it easier to build better Web apps more quickly and with less code".

Better Django Unit Testing Using Factories Instead of Fixtures
Anton Sipos
Unit testing is the key practice for improving software quality. Even though most of us agree with this in principle, all too often when things get difficult programmers end up skipping writing tests. We end up being pragmatic rather than principled, especially when deadlines are involved.

Using Python Fabric to Automate GNU/Linux Server Configuration Tasks
Renato Candido
Fabric is a Python library and command-line tool for automating tasks of application deployment and system administration via SSH. It provides tools for executing local and remote shell commands and for transferring files through SSH and SFTP, respectively.

The Python Logging Module is Much Better Than Print Statements
W. Matthew Wilson
So I'm forcing myself to use logging in every script I do, no matter how trivial it is, so I can get comfortable with the python standard library logging module. So far, I'm really happy with it. I'll start with a script that uses print statements and revise it a few times and show off how logging is a better solution.

Python, Web Security and Django
Steve Lott
Two of the pillars of security are Authentication (who are you?) and Authorization (what are you allowed to do?). Best security practice is never to store a password that can be easily recovered. A hash can be undone eventually, but encryption means all passwords are exposed once the encryption key is available.

Building a Console 2-player Chess Board Game in Python
George Psarakis
Python is a very powerful language particularly for writing server-side backend scripts, although one can also use it for web development tasks through the Django framework and it is gaining popularity in that field as well. A very thorough and complete documentation, the huge variety of libraries and open-source projects (easily installed with the package managers) and the huge knowledge base in Q&A sites like StackOverflow and mailing lists are among the main characteristics to which the widespread use of Python can be attributed.

Write a Web App and Learn Python: Background and Primer for Tackling the Django Tutorial
Adam Nelson
A 'framework' is a set of tools and libraries that facilitates the development of a certain type of application. Web frameworks facilitate the development of web applications by allowing languages like Python or Ruby to take advantage of standard methods to complete tasks like interacting with HTTP payloads, or tracking users throughout a site, or constructing basic HTML pages. Leveraging this scaffolding, a developer can focus on creating a web application instead of doing a deep dive on HTTP internals and other lower-level technologies.

Efficient Data and Financial Analytics with Python
Dr. Yves J. Hilpisch
Decision makers and analysts being faced with such an environment cannot rely anymore on traditional approaches to process data or to make decisions. In the past, these areas were characterized by highly structured processes which were repeated regularly or when needed.

Test-Driven Development With Python
Josh VanderLinden
Software development is easier and more accessible now than it ever has been. Unfortunately, rapid development speeds offered by modern programming languages make it easy for us as programmers to overlook the possible error conditions in our code and move on to other parts of a project. Automated tests can provide us with a level of certainty that our code really does handle various situations the way we expect it to, and these tests can save hundreds upon thousands of man-hours over the course of a project's development lifecycle.

Python Iterators, Iterables, and the Itertool Module
Saad Bin Akhlaq
Python makes a distinction between iterables and iterators, and it is essential to know the difference between them. Iterators are stateful objects: they know how far through their sequence they are. Once they reach the end, that is it. Iterables are able to create iterators on demand. The itertools module includes a set of functions for working with iterable datasets.
Python: A Guide for Beginners

Python is an easy and powerful programming language. It has highly efficient data structures with an object-oriented programming approach. Its neat syntax and dynamic typing make it efficient to work with. It is a very good programming language for rapid application development on many platforms.

The Python interpreter and its extensive standard library are freely available in source form. The interpreter is easy to extend: new functions and data types can be implemented in C or C++. Python is also suitable as an extension language for customizable applications.

Python was written by the Dutch computer programmer Guido van Rossum (who has worked at Google). Python is an object-oriented programming language that is widely used for software and application development. It integrates easily with various other tools and languages. It has a rich set of libraries that beginners can learn easily as well. Many Python developers believe that Python delivers high quality in software development, support and maintenance.

Figure 1. Python Logo

Here are some advantages of using Python as a coding language:

• Python comes with simple syntax, which lets you write code using only a few keywords.
• Python is an object-oriented language, so everything in Python is an object.
• Python has advanced object-oriented design elements which allow programmers to write large programs.
• Python has a comprehensive standard library which helps programmers write almost any kind of code.
• Its libraries range from industry-standard encryption to 3D graphics.
• It can be easily installed in a variety of environments such as desktops, cloud servers or handheld devices.

In this article you will learn about the basics of Python, such as system requirements, installation, basic mathematical operations and some examples of writing code in Python. This article is intended to help you learn to code in Python (Figure 2).

If you are new to computers, you first need to learn how to operate one and how the machine sees your program. Those who already know computer operations and operating systems can jump directly into coding. But before you start coding, make sure that you are equipped with an editor; it will help you familiarize yourself with the basics of Python coding. You also need to understand the basics of writing and running a program. When you execute a Python program, the Python interpreter converts your code into a form that the computer can read and act on.

System requirement for Python
Python runs on Mac OS X 10.8, Mac OS X 10.7, Mac OS X 10.6, and Unix systems and services. Windows does not require Python natively and so does not come with a version of Python pre-installed. However, the CPython team provides compiled Windows installers with each new release of Python (Figure 3).

Getting started with Python

As Python is an interpreted language, programmers don't need a compiler. Python is pre-installed on Linux and Mac operating systems; you just need to run it. Type "python3" (or "python", depending on the installed version) to get started. If you need the interpreter, you can simply download it from www.python.org/download/ (Figure 4).
Python 3 is a user-friendly version, and you can easily get started with it. Once you have downloaded the interpreter, go through the instructions carefully to install it. You also need a code editor to get started with coding. For Windows users, Notepad can be a good option for writing code. On Linux, almost every text editor offers syntax highlighting. Mac users can use TextWrangler to write code in Python.
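
Once the interpreter is installed, a short script can confirm which version actually runs. This is only a minimal sketch assuming Python 3; the function name describe_interpreter is made up for illustration and is not part of Python.

```python
import sys

def describe_interpreter():
    """Return a short description of the interpreter running this script."""
    # sys.version_info holds the major and minor version numbers.
    major, minor = sys.version_info[0], sys.version_info[1]
    return "Running Python {0}.{1}".format(major, minor)

print(describe_interpreter())
```

Saving these lines in a file and running it with the interpreter shows whether the command you typed starts the version you expected.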

About the Author
The writers' team at Wide Vision Technologies is well versed in basic computer operations and in writing for a web audience. The team has been writing articles, blogs and website content for the past five years. Each team member has at least two years of experience in writing for the web.

Figure 2. How the computer sees Python
Figure 3. System requirement for Python
Figure 4. Python and other similar languages

Writing the first program
To start writing your first program in Python, open the text editor and write:

print("Hello, How are you?")

After this, save the file; you can name it "hello.py". On Windows, open a command prompt: click the Start button, choose Run, and type "cmd". Then navigate to the directory where you saved your first program and type "python hello.py" (without quotes). This tells you whether Python is installed and working properly. You can now start writing more advanced code (Listing 1).

Listing 1. Simple code example of Python (a C extension function; the equivalent Python is shown in the comments)

#include <Python.h>

// def insert_powers(numbers, n):
//     powers = (n, n*n, n*n*n)
//     numbers[n] = powers
//     return powers

static PyObject *
insert_powers(PyObject *self, PyObject *args)
{
    PyObject *numbers;
    int n;

    // Unpack the arguments: an object ("O") and an int ("i")
    if (!PyArg_ParseTuple(args, "Oi", &numbers, &n)) {
        return NULL;
    }

    // Build the tuple (n, n*n, n*n*n)
    PyObject *powers = Py_BuildValue("(iii)", n, n*n, n*n*n);

    // Equivalent to Python: numbers[n] = powers
    if (PySequence_SetItem(numbers, n, powers) < 0) {
        return NULL;
    }

    return powers;
}

Arithmetic operators
Python has arithmetic operators such as addition, subtraction, multiplication, and division. You can use these standard operators with numbers to write arithmetic code.

Operators with Strings
Python also supports the addition operator on strings, for example:

helloworld = "hello" + " " + "world"

Python supports multiplying a string to build a string with a repeated sequence, for example:

lotsofhellos = "hello" * 10

Operators with Lists
In Python you can join lists with the addition operator, for example:

even_numbers = [4,6,8]
odd_numbers = [3,5,7]
all_numbers = odd_numbers + even_numbers

As with strings, Python supports creating new lists with a repeating sequence using the multiplication operator, for example:

print([1,2,3] * 3)

Now it's time to try some simple mathematics in Python. Here are the basic operators and how you can use them.

Table 1. Basic mathematical operations and examples

Command   Name             Example   Output
+         Addition         4+4       8
-         Subtraction      8-2       6
*         Multiplication   4*3       12
/         Division         18/2      9 (9.0 in Python 3)
%         Remainder        19%3      1
**        Exponent         2**4      16

The usual order of mathematical operations applies in Python as well. Here is the order, from highest to lowest priority (multiplication, division and remainder share one priority level, as do addition and subtraction):

• Parentheses ()
• Exponents **
• Multiplication *
• Division /
• Remainder %
• Addition +
• Subtraction -

Here are some simple try-it-yourself examples of mathematical code in Python:

>>> 1 + 2 * 3
7
>>> (1 + 2) * 3
9

In the first example above, the machine first calculates 2 * 3 and then adds 1 to it, because multiplication has a higher priority than addition. In the second, the machine first calculates 1 + 2 and then multiplies the result by 3, because parentheses have a higher priority than any operator. Operators of equal priority, such as addition and subtraction, are evaluated from left to right unless parentheses say otherwise, and the innermost parentheses are calculated first. For example:

>>> 4 - 40 - 3
-39
>>> 4 - (40 - 3)
-33

In the first line, 4 - 40 is evaluated first and then 3 is subtracted. In the second, 40 - 3 is evaluated first and the result is subtracted from 4.
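
The precedence rules discussed here can be checked in a short script. The following is a minimal sketch assuming Python 3; the variable names are chosen only for illustration.

```python
# Each name records one expression from the precedence discussion.
multiplication_first = 1 + 2 * 3     # 2 * 3 is evaluated before the addition
parentheses_first = (1 + 2) * 3      # the parenthesised sum is evaluated first
left_to_right = 4 - 40 - 3           # same priority, so (4 - 40) - 3
inner_first = 4 - (40 - 3)           # parentheses force 40 - 3 first
remainder = 19 % 3                   # what is left after dividing 19 by 3
power = 2 ** 4                       # 2 raised to the fourth power

print(multiplication_first, parentheses_first)  # 7 9
print(left_to_right, inner_first)               # -39 -33
print(remainder, power)                         # 1 16
```

When in doubt about the order, adding parentheses costs nothing and makes the intent explicit.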
Python is one of the high-level languages available today. It is one of the easiest languages to learn and use, and at the same time it is very popular. Many professional programmers use it to create dynamic and extensive programs. Google, Industrial Light and Magic, the New York Stock Exchange, and other such giants use Python. If you have your own computer, you can download and install it easily. Python is free; you can start coding in Python now!
For more information visit: www.widevisiontechnologies.com/.

Mohit Saxena
Starting Python Programming and the Use of Docstring and dir()

In this article, I will be talking about Python as a general-purpose programming language designed for easy integration, readability and, most of all, ease in expressing concepts in a few lines of code. We will also do a lot of practice on the basics of Python programming, after which we will take a look at docstrings and dir() and how they can be used to learn about new APIs.

Python is an interpreted language and features a dynamic type system with automatic memory management. It can be used as a full-fledged language, or integrated as a scripting language into another such as C, Java, etc. The language itself is not limited to a specific programming paradigm; different styles of coding can be used, such as imperative, object-oriented, functional and procedural styles. There are several areas in which Python is used, such as:

• mathematics,
• scientific research,
• system administration,
• desktop application development with Tkinter etc.,
• web application development in frameworks such as Django,
• and, recently, mobile application development with the Kivy framework, Scripting Layer for Android, and Python for Android.

Python Interpreter
As you know by now, Python is an interpreted language, and its interpreter runs on multiple platforms such as Windows, Linux, Mac OS X, other UNIX distributions, and SL4A, which contains a Python interpreter that runs on Android. Linux and Mac OS X come with Python 2.7 preinstalled. If you are using Windows, you can download IDLE (Python IDE), which has lots of features besides the interpreter. In this tutorial, we will be using the interactive programming environment, which can be accessed through the terminal in Linux and other Unix distributions, and also through IDLE.

Getting Started
Enough chit-chat. If you are using Windows, I presume you have installed IDLE; once you open it, it gives you an interactive environment with the Python prompt >>> instantly. For Linux/Unix users, open the terminal and type

$ python

at your shell prompt, press Enter, and you should have the Python prompt >>>. Note: when you define functions or blocks with more than one line, the interpreter shows ... which means a continuation.

Number, Variables and Operators
Let's play with variables, numbers and arithmetic operators. Calculations in Python are pleasant because no special features or syntax are needed; simple addition, subtraction and multiplication are as straightforward as using a calculator.
First of all, let's talk about variables. Variables are containers (memory locations) that can store known or unknown quantities. This allows us to manipulate quantities without having to explicitly define them every time they are needed. Note that Python is dynamically typed: a variable's type is determined by the value it holds, not declared in advance, e.g.:
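
To see how a variable's type follows its contents, the built-in type() function can be used. This is a minimal sketch assuming Python 3 (the article's own sessions use Python 2); the name mixed_types_allowed is only illustrative.

```python
num = 10
print(type(num).__name__)    # int

num = "Name"                 # the same name now refers to a string
print(type(num).__name__)    # str

# Types are still enforced at run time: adding a string and an integer fails.
try:
    num + 5
    mixed_types_allowed = True
except TypeError:
    mixed_types_allowed = False

print(mixed_types_allowed)   # False
```

The type is attached to the value rather than to the name, which is why the same variable can hold an integer at one moment and a string at the next.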

num = 10        # the variable 'num' is an integer
num = 'Name'    # the variable 'num' is now a string

Now that we have our Python prompt, we are going to do some calculations and store the results in variables.

>>> a = 2          # variable 'a' stores 2
>>> b = 3          # variable 'b' stores 3
>>> sum = a+b      # variable 'sum' stores the value of a+b

Now "sum" contains the sum of "a" and "b". How do we know if this actually worked? Let's print the value of "sum" and see:

>>> print sum
5
>>> sub = 50 - 20
>>> print sub
30

Yes, it's as easy as that. Unlike C, Java, etc., you do not need to compile your code in order to see the output; this is an interpreted language, and in the interactive programming environment we get the output immediately. Let's do some multiplication.

>>> product = 3*6
>>> print product
18

Division and Modulo:

>>> div = 5/2
>>> print div
2

I know you want to ask how 5/2 became 2. The answer is 2 because, in Python 2, dividing two integers rounds the result down to the nearest integer. If we want the answer as a float, we can divide as in Listing 1.
Now, how do we find the square of a number? In Python the power operator is ** rather than ^, which some other languages use. Let's try it and see:

>>> square = 5**2
>>> print square
25

You can try as many examples as you want.
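
The session above is Python 2, where dividing two integers rounds down. As a hedged aside for readers on Python 3 (an assumption; the article itself uses Python 2), the sketch below shows how the same calculations behave there. The variable names are illustrative only.

```python
true_division = 5 / 2               # in Python 3, / always gives a float
floor_division = 5 // 2             # // rounds down, like 5/2 on integers in Python 2
quotient, remainder = divmod(5, 2)  # quotient and remainder in one call
square = 5 ** 2                     # ** is the power operator

print(true_division)        # 2.5
print(floor_division)       # 2
print(quotient, remainder)  # 2 1
print(square)               # 25
```

Using // whenever integer division is intended keeps code behaving the same way on both versions.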

Listing 1. Division and modulo

>>> div = 5/2.0
>>> print div
2.5
>>> mod = 10 % 2
>>> print mod
0

Importing Modules
Now, what if we want to calculate the square root of a number? Unfortunately, square root is not one of Python's built-in functions (the built-in functions can be found here: http://docs.python.org/2/library/functions.html#raw%5Finput), but fortunately there are lots of tools provided with Python, and one of those is the math module.
A module is a file that contains variable declarations, function implementations, classes, etc. We can make use of these functions and variables by importing the module into our environment. Let's get to practice; this is how you import a module into your environment:

>>> import math

Now we have imported that module with all its tools. Somewhere in it is the square root function, which we can call using:

>>> math.sqrt(25)
5.0

You see that we used math.sqrt(). What if we just want to write sqrt()? There is a way: we import sqrt as shown in Listing 2.

Listing 2. Importing individual functions

>>> from math import sqrt
>>> root = sqrt(36)
>>> print root
6.0
>>> from math import pow
>>> pow(5, 2)
25.0

Strings and Input
We can equally store strings in variables:

>>> name = "Jane Doe"
>>> print name
Jane Doe

We can also concatenate strings with the + operator, like this:

>>> print "Jane" + " " + "Doe"
Jane Doe

In some cases we do not want to hard-code data into our program but want it to be supplied by the user. In this case we can use raw_input():

>>> yourName = raw_input('Enter name: ')
Enter name: jane
>>> print yourName
jane

Note: in Python 2 there is another way of taking user input, the input() function, but I don't advise using it until you really know what you are doing. Whatever the user types into input() gets evaluated; if you expect the string '3', for instance, input() evaluates it and gives you an integer, and that can cause a whole lot of trouble. So just avoid it.

Enough with the basics, let's get down to some data structures.

Lists
Lists are very similar to arrays; they can store elements of any type and contain as many elements as you want. Let's take a look at declaring a list and storing elements in it:

>>> myList = []

This declares an empty list, and you can populate it with elements using the "append" method provided by the list object (see Listing 3).

Listing 3. Adding elements to a list

>>> myList.append(1)
>>> myList.append(2)
>>> myList.append(3)
>>> print myList
[1, 2, 3]

We can also print the element at a specific position, like this:

>>> print myList[0]
1

You can learn more about other list functions here: http://docs.python.org/2/tutorial/datastructures.html.

Condition and Iteration
Conditions are an important aspect of programming; even in real life we use conditions every day. For example: I want to buy milk and I have only 20 bucks. I go through each shop and check whether the price of the milk is less than or equal to the amount I have; if so I buy it, else I move on to the next shop. The same applies in programming.
Iteration, on the other hand, is a way of going through all the elements in a list or sequence, or of repeating a particular process over and over again. This can be very useful for decision making, since we often have a lot of options and need to go through each one and evaluate it to find the best. In this document we will make use of the for loop; however, there are other methods of iteration such as the while loop (Listing 4).

Listing 4. Iterating through list elements and checking for a condition

>>> goods = ['milk', 'steak', 'Sugar']
>>> for i in goods:
...     if i == 'milk':
...         print i
...     else:
...         print 'Not milk'
...
milk
Not milk
Not milk

This may be a bit new to some. What was done here is that we go through each element in the "goods" list using a for loop; the variable "i" takes the value of each element, one after the other, until there are no more elements, and the condition is evaluated at each stage.

Functions
Functions are a way to divide our code into a modular structure, to enable reuse, improve readability and save time. If a particular process is written over and over again, the code becomes bulky and inefficient; when we define a function, we can simply call it whenever it is needed.
I will show you how a function is written:

>>> def function(args):
...     print args

This is a simple function that prints whatever is passed to it, and you can test it by running this:

>>> function('name')
name

It prints out what you passed to it. We can also return values from a function. For instance, let's write a function that takes in two numbers, adds them together and returns the value:

>>> def add(a, b):
...     return a+b

And this is it; we can call add() with two arguments:

>>> add(2, 3)
5

The interactive prompt echoes the returned value, but in a script nothing would be displayed, because we did not print it. Now let's store what is returned in a variable and print it out.

>>> sum = add(2, 4)
>>> print sum
6

Comments
Commenting code is a good practice for programmers. It helps whoever is reading your code know what you were doing, and it is helpful when you come back to modify or update your code. Comments in Python are stripped out during parsing, and we write a comment by putting # before the text we want to be commented. Like this:

>>> #this is a comment
>>>

This does nothing, because once the interpreter encounters # it ignores everything after it on that line.

Docstring
You have now learned how to use variables, operators, imported modules, conditional statements, iteration, functions and lists. Let's introduce something called a docstring.
A docstring is a string literal used to document code; it usually states what a particular function, class or module does. Unlike comments or other types of documentation, a docstring is not stripped from the source code during parsing; it is retained and can be inspected while the program runs. This allows us to document our code completely within the source code. A docstring is written between three opening and three closing quotes, e.g. """contents""". Let's see how this is written in Listing 5.

Listing 5. Format of Docstring

"""
Source defining the animal class, containing one
method and another separate
method
"""
class Animal(object):
    def talk(self):
        """ Method that shows how animals talk """

def mate(animal):
    """ Method for mating animals """

Now what if we want to view the docstring of a function, to learn what that function does or how it is used? We can use the help() function, which prints the docstring of that function. Let's see Listing 6.

Listing 6. Using help() to learn more about a function usage and definition (printing docstring)

>>> import math
>>> help(math.pow)
Help on built-in function pow in module math:

pow(...)
    pow(x,y)

    Return x**y (x to the power of y).

The last two lines of the output are the docstring. See that: we learnt a lot about the pow() function by printing its docstring using help().

Viewing Functions of a Module (dir())
What if you have several modules at your disposal but have no idea what they contain, and you are too lazy to go through a bunch of source code? dir() is a function that can be used to view the functions defined in a module, or the methods applicable to certain objects.
Let's try some practice:

>>> lis = []
>>> dir(lis)
['append', 'count', 'extend', 'index', 'insert', 'pop', 'remove', 'reverse', 'sort']

First we defined a list object and then passed it to dir(), which returned the methods that are applicable to this particular object (special methods whose names start with double underscores are omitted here).
The same goes for the math module in Listing 7. We had no idea what functions the math module contains, but importing it into our environment and passing the module object to dir() reveals all the functions in the module. The same goes for functions: the pow function has attributes of its own, which we can view using dir().

Summary
This document is just an introduction to Python. It is designed to make you comfortable with the environment and with some concepts, tricks and methods of the Python programming language. This will help you to learn more advanced topics on your own. I advise you to keep practicing and creating different tasks for yourself. That is the only way you will become a good software developer.

Sotaya Yakubu
Sotaya Yakubu has been an active contributor to open source projects, working as a freelance software developer with several companies and individuals, such as Mediaprizma kft, for the past five years. He is also involved in the development of mobile frameworks and in Artificial Intelligence research, mainly to develop and improve expert and surveillance systems. He is also a writer, and some of his work can be found at plaixes.blogspot.com. Contact: [email protected].

Listing 7. Use of dir() to learn more about functions and modules

>>> import math
>>> dir(math)
['__doc__', '__name__', 'acos', 'asin', 'atan', 'atan2', 'ceil', 'cos', 'cosh', 'degrees', 'e',
 'exp', 'fabs', 'floor', 'fmod', 'frexp', 'hypot', 'ldexp', 'log', 'log10', 'modf',
 'pi', 'pow', 'radians', 'sin', 'sinh', 'sqrt', 'tan', 'tanh']
>>> dir(math.pow)
['__call__', '__class__', '__cmp__', '__delattr__', '__doc__', '__getattribute__', '__hash__',
 '__init__', '__module__', '__name__', '__new__', '__reduce__', '__reduce_ex__',
 '__repr__', '__self__', '__setattr__', '__str__']
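
The docstring and dir() ideas from this section can be tried together in a few lines. The sketch below is only an illustration: the function area_of_circle is a made-up name, not part of the article's listings, and the code is written for a current Python 3 interpreter, so the exact attribute names returned by dir() may differ from the Python 2 output shown in Listing 7.

```python
import math


def area_of_circle(radius):
    """Return the area of a circle with the given radius."""
    return math.pi * radius ** 2


# The docstring is kept on the function object as __doc__;
# help(area_of_circle) shows the same text with some extra formatting.
print(area_of_circle.__doc__)

# dir() returns a list of attribute names, so it can be searched like any list.
print("pow" in dir(math))
print("__doc__" in dir(area_of_circle))
```

Because dir() returns an ordinary list of strings, it is handy for quick checks such as "does this module offer a function with that name?" before reading any source code.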

14 13/2013
Beginning
with Django
In this article we’ll see the basics of using the Django framework to build web applications. Django is a variation of MVC (Model View Controller); we’ll learn how to configure a project, create a Django app, and work with the ORM (Object-Relational Mapper), the routing (URL dispatching), the views (the Django “Controller” part), the templates and a taste of the admin interface.

What are the keys to success for a web framework? Is it easy to use? Is it easy to deploy? Does it provide user satisfaction? The Django framework is more than these answers because, in my opinion, it is one of the few frameworks able to hit its goal: it “makes it easier to build better Web apps more quickly and with less code”. There are a lot of good web frameworks, but few of them provide all the “batteries included” that are required to create complex and “custom” web applications.
Django started as an editorial project at the Lawrence Journal-World newspaper, created by Adrian Holovaty and Simon Willison, with a marked MVC approach. The complete separation of models, views and templates allows fast replacement of its components and increases modularity.
It’s often described as a “batteries included” framework because it has built-in cache support, authentication, pluggable user models, generic content types, pluggable middlewares, signals, pagination, syndication feeds, logging, security enhancements (clickjacking protection, Cross Site Request Forgery protection, cryptographic signing) and many other features.
In this article we’ll cover the main functionalities: we’ll start by setting up an environment and then we’ll create a simple application.
NOTE: The code of this article is available on GitHub at https://github.com/aparo/mybookstore.

Setting up a Django Environment
When developing with Python, a good practice is to create a virtual environment that stores the Python interpreter itself and all the related project libraries. To create a virtual environment, I suggest using the virtualenvwrapper scripts available at https://pypi.python.org/pypi/virtualenvwrapper for Unix/Mac OS X users (or https://pypi.python.org/pypi/virtualenvwrapper-win for Windows). After installing virtualenvwrapper, we can create an environment called sdjournal by typing:

mkvirtualenv --no-site-packages --clear -p /usr/bin/python2 sdjournal

This command creates a Python virtual environment called sdjournal with no references to other installed libraries (--no-site-packages --clear), using the Python 2.x interpreter.
Note: Django works with both Python 2.x and Python 3.x, but many third-party applications are developed for Python 2.x, so the 2.x version is the safer choice.
In the future, to access the virtual environment in a shell, you need to activate it:

workon sdjournal

and to move into its directory:

cdvirtualenv

Now that we have a virtualenv, we can install Django with pip:

pip install django
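
Before installing packages it can be reassuring to confirm that the shell really is using the virtual environment. The following is a small sketch, not part of the original article: the helper name in_virtualenv is an assumption made for illustration, and it relies only on attributes of the standard sys module.

```python
import sys


def in_virtualenv():
    """Return True when the running interpreter belongs to a virtual environment."""
    # Environments created by the classic virtualenv tool set sys.real_prefix;
    # the venv module (and newer virtualenv releases) set sys.base_prefix instead.
    base_prefix = getattr(sys, "base_prefix", sys.prefix)
    return hasattr(sys, "real_prefix") or base_prefix != sys.prefix


print("virtual environment active:", in_virtualenv())
print("interpreter prefix:", sys.prefix)
```

Run inside the activated sdjournal environment, the printed prefix should point into the environment directory rather than to the system-wide Python installation.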


At the time of writing, this installs Django version 1.5.1. A good practice is to also install packages to manage database changes (South: http://south.aeracode.org/), to do simple image manipulation (Pillow, a fork of PIL: https://pypi.python.org/pypi/Pillow) and to improve the Python command line (IPython):

pip install south Pillow ipython

Now that the environment and some base libraries are installed, we can create a simple Django project (a book store). Be sure to be in the virtualenv directory (cdvirtualenv) and type:

django-admin.py startproject mybookstore

Once Django is installed, the django-admin.py command is available in the virtualenv. It executes many administrative commands, covering project management, database management, i18n (translation) management, …
The syntax is django-admin.py <command>, so the startproject command creates a working stub structure with files such as:

mybookstore/manage.py
mybookstore/mybookstore
mybookstore/mybookstore/__init__.py
mybookstore/mybookstore/settings.py
mybookstore/mybookstore/urls.py
mybookstore/mybookstore/wsgi.py

The manage.py file is similar to django-admin.py, but local to the project.
The settings.py file is the core of all Django settings. As it contains a large number of options, I’ll point out the most important ones:

• Set up the database. Django relies on a database, which must be configured before it can work. The database settings live in the DATABASES dictionary. We’ll use SQLite as the database because it is very simple to configure and it ships with the standard Python distribution.

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.sqlite3',
        'NAME': 'mybookstore.db',
        'USER': '',
        'PASSWORD': '',
        'HOST': '',
        'PORT': ''
    }
}

• Set up the media directory, which contains uploaded media files. It is controlled by the MEDIA_ROOT setting: we’ll set it to the media directory in the virtualenv root.

import os  # needed by the path settings below
MEDIA_ROOT = os.path.join(os.path.dirname(os.getcwd()), "media")

• Set up the static directory, which contains static files such as images, JavaScript and CSS. It is controlled by the STATIC_ROOT setting: we’ll set it to the “static” directory in the virtualenv root.

STATIC_ROOT = os.path.join(os.path.dirname(os.getcwd()), "static")

• Set up the installed applications. The INSTALLED_APPS setting must list all the applications that we want installed and available in the current project.

INSTALLED_APPS = (
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.sites',
    'django.contrib.messages',
    'django.contrib.staticfiles',
    'django.contrib.admin'
)

This is the minimal setup required to configure Django.

Creating the first app
A Django app is often completely reusable, mainly because models and views rarely change. The template part (HTML) of a web application is generally not fully reusable, because it often needs to be themed or customized by users, so it is generally overridden with custom templates.
A new application is created with the django-admin.py or manage.py command within the project directory. For example, to create a new bookshop app, you need to type:

python manage.py startapp bookshop

This command creates a new app/directory called bookshop containing these files:

• __init__.py: a special Python file, which turns a standard directory into a package.
• models.py: the file that will contain the models of this app. Initially it contains no objects.
• tests.py: the unittest file for this application. This file contains a test stub to start with.

en.sdjournal.org 17
• views.py: this file contains the views used in this application. The generated file is empty.

Generally an app directory also contains other files, such as:

• admin.py: contains the administrative interface definition for the application. We’ll have a quick look at it at the end of the article.
• urls.py: contains the app’s custom URL routing.
• migrations directory/package: present if the South app is installed and the app contains migrations. This directory stores all the model changes.
• management directory/package: contains scripts that are executed on syncdb, and custom application commands.
• static directory: contains application-related static files (e.g. js, css, images).
• templates directory: contains the HTML templates used for rendering.
• templatetags: contains custom template tags and filters used for rendering in this application.

Now that we have created an application, we must add it to the INSTALLED_APPS list to enable it. In settings.py, the INSTALLED_APPS setting will be:

INSTALLED_APPS = (
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.sites',
    'django.contrib.messages',
    'django.contrib.staticfiles',
    'django.contrib.admin',
    'bookshop'
)

Django takes care of creating missing tables and populating the initial database with the syncdb command:

python manage.py syncdb

The first time it is executed, if there is no superuser, the command offers to create one and guides the user through the creation of an admin account.
The syncdb command creates the database if it’s missing; if some tables are missing, it creates them along with their sequences, indices and foreign key constraints.
Now our almost complete application can be run in development mode, using Django’s built-in server, with the following command:

python manage.py runserver

It starts a server listening on localhost port 8000, so just navigate to http://127.0.0.1:8000 to see your site.
NOTE: for most common tasks the Django user doesn’t need to know SQL, because the Django ORM handles it transparently across multiple DBMSs (Oracle, MySQL, PostgreSQL, SQLite).

Creating the first models
As an example, we will create a simple Book Shop that stores authors, their books and tags related to books. The following ER diagram shows the model relations: Figure 1.

Figure 1. Creating a book shop

This schema is easily converted to Django models (Listing 1).

The models.py file
Creating a Django model is very easy: every model derives from models.Model and is composed of several typed fields, such as:

• CharField (mapped to SQL VARCHAR): used for small pieces of text
• TextField (mapped to SQL TEXT): used for text of undefined size
• DateField: used to store date values
• IntegerField: used to store integer values
• BooleanField: used to store boolean values (True/False); if it must also handle the null value, you should use NullBooleanField
• FloatField: used to store floating point values
• ForeignKey: used to store references to other models
• ManyToManyField: used to manage a many-to-many relation (Django automatically creates the accessory tables needed to manage it)

These are the most common field types. Django allows you to extend them, so on the web there are a lot of special fields for managing borderline cases.
Every field type has its own parameters. The most common ones are:

• default: used to set a default value for the field
• blank (True/False): allows an empty value in the web interface


• null (True/False): allows the field to be set to null
• max_length (CharField or derived): sets the maximum string size

After having defined the models and added the new app to INSTALLED_APPS in settings.py, it’s possible to create the tables in the database. The command is again:

python manage.py syncdb

This command creates the required tables, sequences and indices for the currently installed applications.

Listing 1. bookshop/models.py – our bookshop models

from django.db import models
from django.utils.translation import ugettext_lazy as _


class Author(models.Model):
    name = models.CharField(max_length=50)
    surname = models.CharField(max_length=50)

    class Meta:
        unique_together = [("name", "surname")]
        ordering = ["name", "surname"]
        verbose_name = _('Author')
        verbose_name_plural = _('Authors')

    def __unicode__(self):
        return u'%s %s' % (self.name, self.surname)


class Tag(models.Model):
    name = models.CharField(max_length=50, unique=True)

    class Meta:
        ordering = ["name"]
        verbose_name = _('Tag')
        verbose_name_plural = _('Tags')

    def __unicode__(self):
        return u'%s' % self.name


class Book(models.Model):
    title = models.CharField(max_length=250)
    description = models.TextField(default="", blank=True, null=True)
    release = models.DateField(auto_now_add=True)
    in_stock = models.IntegerField(default=0)
    available = models.BooleanField(default=True)
    price = models.FloatField(default=0.0)
    author = models.ForeignKey(Author)
    tags = models.ManyToManyField(Tag, blank=True, null=True)

    class Meta:
        ordering = ["title", "author"]
        verbose_name = _('Book')
        verbose_name_plural = _('Books')

    def __unicode__(self):
        # Translate the format string first, then fill in the values,
        # so that __unicode__ returns a plain unicode string.
        return _(u'%(title)s of %(author)s') % {
            'title': self.title, 'author': self.author}

Creating the First Views
Now we can start to design the URLs and the views that are required to show our books.
Django lets you control the URLs in the urls.py file. There are several urls.py files in a Django project: one global file for the whole project (in our example mybookstore/urls.py) and, typically, one for every app.
In our app (bookshop/urls.py) we’ll create two URLs: one to access the list of books and another one to show a book detail view (Listing 2).

Listing 2. bookshop/urls.py – our bookshop urls

from django.conf.urls import patterns, url

urlpatterns = patterns('bookshop.views',
    url(r'^$', "index", name='index'),
    url(r'^(?P<book_id>\d+)$', "detail", name='detail')
)

Django URL control is based on regular expressions. In our example, the first url entry registers a pattern matching the empty string, the view “index” (expanded to “bookshop.views.index”) and a name for this URL. The second url entry captures a numeric value, “book_id”, which is passed as a variable to the “detail” view (formally “bookshop.views.detail”), and also names the URL.
During URL dispatching Django selects the view to serve by regular expression matching. A view function is a simple Python function that returns a

Response object or one derived from it. We need to define two views, “index” and “detail” (Listing 3).
The “index” view needs to show all the available books: we create a context with a books queryset and we render it with an HTML template. The queryset, accessible for every model through the objects attribute, is an ORM element that allows executing queries on data without using SQL. The Django ORM takes care of creating and executing the SQL code. In “index”, Book.objects.all() retrieves all the book objects.
The detail view takes a parameter book_id, passed by the URL routing, and creates a context with a variable “book” that contains the Book data. In this case we use the queryset method get(), which executes a query with the given parameters and either returns a Book object or raises an exception. If there is no book with pk equal to the book_id variable, an HTTP 404 error is returned: this fallback protects against malicious URL manipulation by users.

Listing 3. bookshop/views.py – our bookshop index and detail views

from django.shortcuts import render
from django.http import Http404
from bookshop.models import Book


def index(request):
    context = {'books': Book.objects.all()}
    return render(request, 'shop/index.html', context)


def detail(request, book_id):
    try:
        book = Book.objects.get(pk=book_id)
    except Book.DoesNotExist:
        raise Http404
    return render(request, 'shop/detail.html', {'book': book})

Creating the Templates
We have the data to render in the context; now we need to write some HTML fragments to render this data.
Django templates are generally simple HTML files with special placeholders:

• {{value}} or {{value0.method1.value2}} are used to display objects, fields or complex nested values. Django automatically tries to convert the object to text. A failure is handled transparently and nothing is printed.
• {{value|filter}} is used to pass a value through a filter: a value transformation such as text formatting or number and date/time formatting. A filter returns a value that can be passed to another filter.
• {% tagname … %} is used to process tags: functions that extend HTML capabilities. (See https://docs.djangoproject.com/en/dev/ref/templates/builtins/ for the built-in ones.)

Generally the templates of an application live in the templates subdirectory of the same application. The shop/index.html page will have a template similar to Listing 4.

Listing 4. shop/index.html – template used to render the index page

<!DOCTYPE html>{% load i18n %}
<html><head><title>{% trans "Index of books" %}</title></head>
<body>
<h1>{% trans "Book List" %}</h1>
<ul>
{% for book in books %}
    <li><a href="{% url "shop:detail" book.pk %}">
    {{ book.title }} {% trans "by" %} {{ book.author }}</a>
    </li>
{% empty %}
    <li>{% trans "No books" %}</li>
{% endfor %}
</ul>
</body>
</html>

The shop/detail.html template is also very simple: Listing 5. The Django tags used in these templates are:

• load: loads a tag library into the rendering context. I loaded i18n to localize strings automatically (translate strings into your local language).
• trans: marks a string to be translated into the local language.
• for…endfor: iterates over a value.
• url: performs a URL reverse, given a namespace:url-name and optional values.
• empty: a shortcut to render some text if no books are available.
• if .. else .. endif: checks whether a condition is true.

Template tags and filters are very powerful tools; online there are a lot of libraries that extend the template engine for executing ajax, pagination, …
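
To make the URL dispatching and the 404 fallback described in this section more concrete, here is a minimal sketch that uses only the standard re module. It is not how Django is implemented; the names ROUTES, NotFound and dispatch are invented for this illustration, and the two view functions are simplified stand-ins that return plain strings instead of rendering templates.

```python
import re


# Simplified stand-ins for the "index" and "detail" views.
def index():
    return "list of books"


def detail(book_id):
    return "book %s" % book_id


# (compiled pattern, view) pairs, checked in order like the entries of urlpatterns.
ROUTES = [
    (re.compile(r"^$"), index),
    (re.compile(r"^(?P<book_id>\d+)$"), detail),
]


class NotFound(Exception):
    """Plays the role of Http404 in this sketch."""


def dispatch(path):
    for pattern, view in ROUTES:
        match = pattern.search(path)
        if match:
            # Named groups of the regular expression become keyword arguments.
            return view(**match.groupdict())
    raise NotFound(path)


print(dispatch(""))
print(dispatch("42"))
```

A path such as "abc" matches neither pattern, so dispatch raises NotFound. Note also that the captured book_id arrives as a string, which is why the detail view in Listing 3 hands it to the ORM instead of doing arithmetic on it.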


Listing 5. shop/detail.html – template used to render the detail page

<!DOCTYPE html>{% load i18n %}
<html><head><title>{% trans "Book" %} – {{ book.title }}</title></head>
<body>
<a href="{% url "shop:index" %}">{% trans "Books Index" %}</a>
<table>
<tr>
    <td>{% trans "Title" %}</td><td>{{ book.title }}</td>
    {% if book.description %}<td>{% trans "Description" %}</td><td>{{ book.description }}</td>{% endif %}
    <td>{% trans "Price" %}</td><td>{{ book.price }}</td>
    <td>{% trans "Available" %}</td><td>{% if book.available %}{% trans "Yes" %}{% else %}{% trans "No" %}{% endif %}</td>
</tr>
</table>
</body>
</html>

The results are shown in the following images (Figure 2 and Figure 3).

Figure 2. The shop/index.html template after inserting some books

Figure 3. The shop/detail.html template rendering a book

In this article we have chosen to keep the templates simple. It’s very easy to create attractive sites using a CSS toolkit such as Twitter Bootstrap or other JavaScript/CSS web frameworks such as YUI or jQuery.

Populating data with admin interface
The final step required to build any serious application is an admin interface in which to insert, edit and delete your application data. Django, using reflection, lets you create a simple admin interface with a few lines of code.
To activate the admin interface, the admin module discovery and the admin URLs must be registered in the main urls file (mybookstore/urls.py) (Listing 6).

Listing 6. mybookstore/urls.py – global project urls

from django.conf.urls import patterns, include, url
from django.contrib import admin
from django.views.generic import RedirectView
admin.autodiscover()

urlpatterns = patterns('',
    url(r'^$', RedirectView.as_view(url="shop/")),
    url(r'^shop/', include('bookshop.urls', namespace="shop")),
    url(r'^admin/', include(admin.site.urls)),
)

To register some models in the admin interface, a new file in our application directory is required: bookshop/admin.py (Listing 7).
Registering a model in the admin is very simple: it’s enough to call the admin.site.register method with the model that we want to register.
It’s possible to customize the admin per model by passing a second value (a class derived from admin.ModelAdmin) that contains some extra information for rendering the admin. In the example we have used:

• list_display: contains a list of field names that must be shown in the admin list view table
• search_fields: contains a list of field names to be used for searching items

• list_filter: contains a list of field names to be used for filtering

Listing 7. bookshop/admin.py – bookshop admin file

from django.contrib import admin
from bookshop.models import Book, Author, Tag


class BookAdmin(admin.ModelAdmin):
    list_display = ('title', 'author', 'available', "in_stock", "price")
    search_fields = ['title', "description"]
    list_filter = ('available', "in_stock", "price")

admin.site.register(Book, BookAdmin)
admin.site.register(Author)
admin.site.register(Tag)

The following images show the admin book list view and the admin editing view (Figure 4 and Figure 5).

Figure 4. Django Admin – Book List Page

Figure 5. Django Admin – Book Add-Edit Page

Conclusions
In this article we have had a quick look at how easy and powerful Django is. We have seen the installation, the creation of an application, the basic models, views and templates of Django, and the admin interface setup. These elements are the skeleton on which to build anything from simple sites to big and complex ones.
If you are impatient, the tutorials and documentation on the Django site are good places to start; otherwise, in the next articles we’ll go deeper into the features covered here and introduce many others, such as the cache, user/group management, middlewares, custom filters and tags, …

On The Web
• https://pypi.python.org/pypi/virtualenvwrapper – virtualenvwrapper page
• https://pypi.python.org/pypi/virtualenvwrapper-win – virtualenvwrapper for Windows page
• http://south.aeracode.org/ – intelligent schema and data migrations for Django projects
• http://ipython.org/ – Python interpreter power-up
• https://pypi.python.org/pypi/Pillow/ – Python Imaging Library
• https://pypi.python.org/pypi/pip – Python package installer
• https://www.djangoproject.com/ – Django site
• https://docs.djangoproject.com/en/1.5/ – Django documentation
• https://www.djangopackages.com/ – archive of categorized Django packages

Alberto Paro
Alberto Paro is the CTO at the Net Planet, a big-data company working on advanced knowledge management (NoSQL, NLP, log analysis, CMS and KMS). He’s an engineer from Politecnico di Milano, specialized in multi-user and multi-device web applications. In his spare time he writes books for Packt Publishing and works on open source projects hosted on GitHub, mainly django-nonrel, ElasticSearch and pyES.
Better Django Unit Testing
Using Factories Instead of
Fixtures
Best practices always stress writing unit tests for your
applications. But writing useful tests for a Django web
application can be difficult, particularly if your data model has
lots of related models. In this article we will demonstrate how to
make writing these tests easier using model factories instead of
Django’s data fixtures.

“If it’s not tested it’s broken.” (Bruce Eckel)

Unit testing is a key practice for improving software quality. Even though most of us agree with this in principle, all too often when things get difficult programmers end up skipping writing tests. We end up being pragmatic rather than principled, especially when deadlines are involved. The solution to writing more tests, then, is not to grit our teeth and muscle through it; the solution is to use the proper tools to make writing our tests easier. For this article we will focus on testing Django applications. Django is a popular web framework for the Python programming language. The standard method for testing Django applications requires you to create ‘fixtures’ – serialized forms of your data in separate files. While this is workable in simple applications, as we will see it becomes unwieldy as your data model becomes more complex. Fixtures have the following difficulties:

• You must also include all related data, even if it is not relevant to the test.
• Writing in serialized notation (such as JSON) requires a “mental shift” from writing Python code.
• Test data lives in a file separate from the test code.
• If you need a large amount of redundant data in your tests, you’ll likely need to write a separate script to generate it rather than write it all by hand.
• When your data model changes, you will need to rewrite most or all of your fixtures.

Testing with Fixtures
To illustrate these concepts let’s try an example web application. We’ll use a simple blogging application. Full code for this application can be found at: https://github.com/aisipos/SampleBlog/.
The entities in this blog will be Users, Posts, Categories, and Comments. We’ll create a Django application known as ‘blog’, and create our model classes like this: Listing 1. To keep things simple, we will reuse Django’s built-in auth.User model. For the purposes of our discussion, we’ll pretend it looks like this: Listing 2.
Now suppose we need to write a test relating to the Post model. Let’s assume we want to write a test to verify that a view that renders a post shows the post’s category correctly. At a minimum, this requires several objects: at least a Post, a Category, and a User. The standard method of testing Django applications requires placing these into a ‘fixture’. Fixtures are serialized data in disk files, which can be stored in JSON, XML, or YAML format. Fixtures can be created by hand, but this is not recommended. Django provides a command to serialize the data in your current database: python manage.py dumpdata.
By default, this serializes the data in JSON format to standard output. For our data model, if we wanted a fixture to test a Post, the smallest fixture we could use might look something like this: Listing 3.
We could use this fixture in our test case, but already some questions may have come to your mind:

• How do I make this test data in the first place before calling dumpdata?

• How can I reuse this fixture if a different test case
needs slightly different test data?
• What happens when my data model changes?

Runtime data creation instead of fixtures


We could answer these questions using fixtures, but there is an easier way to create your test data: instead of using fixtures, we build our data at runtime. You could generate the data above by calling the model constructors individually like so (with some code left out for brevity): Listing 4.
For simple data models, this may certainly be workable. However, when models have many different fields and related models span multiple levels, we have to specify a lot of data even for simple test cases. We can reduce this burden by writing “object factory” classes that let us specify default values for object creation. This allows us to specify only the data we make assertions about in our tests, which simplifies writing these tests.
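
As a rough illustration of the “object factory” idea, the sketch below writes two factories by hand. To stay runnable without a database it uses plain dataclasses rather than Django models, and every name in it (Category, Post, make_category, make_post) is an assumption made for this example, not code from the SampleBlog project.

```python
import itertools
from dataclasses import dataclass


@dataclass
class Category:
    name: str
    description: str


@dataclass
class Post:
    title: str
    body: str
    category: Category


# A shared counter keeps the generated default values distinct.
_counter = itertools.count(1)


def make_category(**overrides):
    """Build a Category, filling in every field the caller did not specify."""
    n = next(_counter)
    fields = {"name": "category-%d" % n, "description": "description-%d" % n}
    fields.update(overrides)
    return Category(**fields)


def make_post(**overrides):
    """Build a Post and, unless one is supplied, its related Category."""
    n = next(_counter)
    fields = {"title": "title-%d" % n, "body": "body-%d" % n}
    fields.update(overrides)
    if "category" not in fields:
        fields["category"] = make_category()
    return Post(**fields)


# The test only states the one value it will make an assertion about.
post = make_post(category=make_category(name="TestCategory"))
print(post.category.name)
```

The design choice to notice is that defaults live in one place, so adding a field to a model means touching one factory instead of every test that builds that model.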

Using the model-mommy factory library


We could write our own object factory classes by hand, but luckily there are libraries available to do this for us. Two examples in the Python community are model-mommy and Factory Boy. Both take their inspiration (and their names) from the libraries ObjectDaddy and FactoryGirl in the Ruby community. In this article we’ll use the excellent model-mommy library, written by Vanderson Mota dos Santos and available at https://github.com/vandersonmota/model_mommy. It can be installed in your Python virtual environment by running:

pip install model_mommy

Let’s return to the task of creating a unit test for ensuring the category name shows up when rendering a post. In this case, the only piece of test data we care about is the ‘name’ field of the category. Using model_mommy, we can write the entire test with just this code: Listing 5. Note that this test case isn’t using fixtures at all; all the data for this test case is generated by this single line:

post = mommy.make(Post, category__name=’TestCategory’)

In this one line, model-mommy has made for us a Post, a Category, and a User. We have specified the type of object and the name of the category in the arguments to mommy.make, but nothing else. We didn’t need to write any object factory class by hand. Model-mommy has filled all the unspecified fields with auto-generated data.
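
The keyword category__name used above follows a double-underscore convention: the part before the underscores names a related object and the part after names a field on it. The sketch below shows one way such keywords could be split into a nested structure. It is plain Python written for illustration only; split_lookups is a hypothetical helper and is not model-mommy's actual implementation.

```python
def split_lookups(**kwargs):
    """Turn category__name='x' style keywords into nested dictionaries."""
    nested = {}
    for key, value in kwargs.items():
        # Everything before the last "__" names related objects; the last part is the field.
        *parents, field = key.split("__")
        target = nested
        for parent in parents:
            target = target.setdefault(parent, {})
        target[field] = value
    return nested


print(split_lookups(title="Hello", category__name="TestCategory"))
```

Under this reading, the call in the test asks for a Post whose related category has the name 'TestCategory', while every other field of both objects is left for the library to fill in.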
We don’t control this data (although as we’ll see lat-
er, we can tell model-mommy how to generate these

fields), but for this test case this data is irrelevant since easy to see by quick visual inspection that the as-
we are not making any assertions about it. Compared sertions match the data.
to using fixtures, some advantages may be immedi-
ately obvious: Tests written in this style are quicker to write and eas-
ier to read compared to using a fixture. Further, let’s
• We didn’t have to separately make a Post, Catego- suppose we add a field ‘hometown’ to the User mod-
ry, and User model instance, model_mommy can el. If we are using fixtures, we have to regenerate ev-
make an entire object graph in one invocation. ery fixture that contains a User instance. With model-
• We didn’t have to generate any data ahead of time, mommy, model-mommy will end up creating new Us-
all the data is made inside the test itself. ers with hometown fields automatically populated. You
• Since all the test data is inside the test itself, it is only need to specify a hometown in tests that make as-

Listing 1. Django application”blog”


username = models.CharField()
class Tag(models.Model): password = models.CharField()
“”” first_name = models.CharField()
One tag, represented as a single string last_name = models.CharField()
“”” email = models.CharField()
tag = models.CharField(max_length=50)
Listing 3. Model of data
class Category(models.Model):
“”” [
Categories for posts {
“”” “fields”: {
name = models.CharField(max_length=50) “description”: “TestDesciption”,
description = models.CharField(max_ “name”: “TestCategory”
length=300) },
“model”: “blog.category”,
class Post(models.Model): “pk”: 1
“”” },
Represent a single blog post {
“”” “fields”: {
title = models.CharField(max_length=300) “body”: “Test Body”,
body = models.TextField() “category”: 1,
date = models.DateTimeField() “date”: “2013-08-09T00:21:32.766Z”,
user = models.ForeignKey(User) “tags”: [],
category = models.ForeignKey(Category) “title”: “Test Post”,
tags = models.ManyToManyField(Tag) “user”: 1
},
class Comment(models.Model): “model”: “blog.post”,
“”” “pk”: 1
Represent one comment on one blog post }
“”” {
body = models.CharField(max_length=256) “fields”: {
date = models.DateTimeField() “email”: “[email protected]”,
post = models.ForeignKey(Post) “first_name”: “test”,
user = models.ForeignKey(User) “last_name”: “user”,
“password”: “”,
Listing 2. User model in Django “username”: “TestUser”
},
class User(models.Model): “model”: “blog.user”,
“”” “pk”: 1
Represent one user },
“”” ]

26 13/2013
Django Unit Testing

sertions about it, which presumably you will write only after you create the new field. All of your existing tests should continue to run.

Model-mommy Basic Usage
The basic usage of model-mommy is fairly simple. The typical use involves calling mommy.make to create an instance of a model. You pass in as arguments all of the fields that you care about. Model-mommy will auto-generate the rest for you. Here’s an example:

new_model = mommy.make(Model, field1=value1, field2=value2, …)

This instructs model-mommy to make an instance of a hypothetical Model class, specifying values for field1 and field2. If Model contains other fields, model-mommy will automatically generate values for these fields. The instance is persisted in the configured database immediately, thus it will be visible to subsequent code. You can use the mommy.prepare method if you don’t want the new instance to be persisted in the database.

Model-mommy will create any foreign-key related models that you don’t specify automatically. If you need to specify fields on these auto-generated models, you can tell model-mommy to create these fields in one step using a double underscore notation similar to the Django ORM:

new_model = mommy.make(Model, related__field='test')
assert new_model.related.field == 'test'
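To make the idea concrete, here is a toy sketch in plain Python 3 of what a model factory does: it fills in every field you leave out, and it routes related__field keyword arguments to a related object. This is not model-mommy’s implementation and it does not touch a database; the Category and Post dataclasses, the make helper and the random_text generator are hypothetical names invented for this illustration.

```python
import random
import string
from dataclasses import dataclass, fields, is_dataclass


def random_text(length=12):
    """Return a random, non-human-readable string (stand-in for generated data)."""
    return "".join(random.choice(string.ascii_letters) for _ in range(length))


@dataclass
class Category:
    name: str
    description: str


@dataclass
class Post:
    title: str
    body: str
    category: Category


def make(model, **attrs):
    """Build an instance of `model`, generating every field not given in `attrs`.

    Keyword arguments such as category__name='x' are forwarded to the
    related model, mimicking the double underscore notation.
    """
    direct, nested = {}, {}
    for key, value in attrs.items():
        if "__" in key:
            related, _, field_name = key.partition("__")
            nested.setdefault(related, {})[field_name] = value
        else:
            direct[key] = value

    values = {}
    for f in fields(model):
        if f.name in direct:
            values[f.name] = direct[f.name]
        elif is_dataclass(f.type):
            # Related model: create it too, passing along any nested fields.
            values[f.name] = make(f.type, **nested.get(f.name, {}))
        else:
            values[f.name] = random_text()
    return model(**values)


post = make(Post, title="Test Post", category__name="TestCategory")
```

In this sketch post.body and post.category.description come out as random strings, which mirrors the behaviour described above for fields you do not specify.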

Listing 4. Building data at runtime

category = Category(name='TestCategory', description='test')
category.save()
user = User(username='TestUser', email='[email protected]', …)
user.save()
post = Post(user=user, category=category, body='test', …)
post.save()

Listing 5. Test with model_mommy

from django.test import TestCase
from model_mommy import mommy

class BlogTests(TestCase):
    def test_post_displays_category(self):
        """
        Test view category page
        """
        # Make a post and category
        post = mommy.make(Post, category__name='TestCategory')
        # Request the posts view page
        response = self.client.get('/post/{}'.format(post.id))
        self.assertContains(response, 'TestCategory')

Using this notation, you can often create data for a test in a single line of code. However, if you are generating many fields, it can be easier to generate data in multiple steps:

new_user = mommy.make(User, username='testuser', email='t@t.com')
new_post = mommy.make(Post, body='test', user=new_user)

Using model-mommy recipes
For the fields you do not specify, model-mommy will auto-generate a random value. These will not be human-readable. For instance:

>>> mommy.make(Category).name
'MhInizJgWlLYrNFVkxRgsTyOXHHaOfqhrHrQbeGRADBEjzBTJI'

If you do want to control how model-mommy generates unspecified fields, you can define a “Recipe” that tells model-mommy how to generate fields you want specified: Listing 6.

Listing 6. Specifying fields

>>> from model_mommy import mommy
>>> from model_mommy.recipe import seq, Recipe
>>> category_recipe = Recipe(Category, name=seq('Test'))
>>> category_recipe.make().name
'Test1'
>>> category_recipe.make().name
'Test2'

In the above example we use the seq function, which allows you to make unique values for multiple instances.
Recipes can also use callables to programmatically generate fields. Recipes can also use other recipes to create foreign

keys. Suppose we wanted to be able to create multiple posts, all with unique dates and unique users. We could do this as follows: Listing 7.

Listing 7. Programmatically generating fields

>>> from model_mommy import mommy
>>> from model_mommy.recipe import seq, Recipe, foreign_key
>>> from datetime import datetime
>>> user_recipe = Recipe(User, username=seq('testuser'))
>>> post_recipe = Recipe(Post, date=datetime.now, user=foreign_key(user_recipe))
>>> post_recipe.make().user.username
'testuser1'
>>> post_recipe.make().user.username
'testuser2'
>>> post_recipe.make().date
datetime.datetime(2013, 8, 9, 0, 17, 17, 132454)

For simple test cases, you can get by without needing to specify recipes. However, if you need more control over how model-mommy generates data, recipes can help you accomplish this.

Test cases with larger amounts of data
Suppose we coded our blog so that every user who had 50 posts or more had the words ‘gold star’ printed on their profile page. This would be difficult and repetitive to do with fixtures, but is very easy to do with model-mommy: Listing 8.

Listing 8. Model-mommy test for the “gold star”

from django.test import TestCase
from model_mommy import mommy

class BlogTests(TestCase):
    def test_gold_star(self):
        """
        Test gold star appearing in user page
        """
        user = mommy.make(User)
        posts = mommy.make(Post, user=user, _quantity=50)
        response = self.client.get('/user/{}'.format(user.username))
        self.assertContains(response, 'gold star')

In this example we used model-mommy’s shortcut of passing the ‘_quantity’ argument to mommy.make to create many models at once. We could have just as easily created the models in our own loop, but using _quantity can be convenient. We tell the make function to generate each one with the same generated user. Model-mommy will automatically generate categories for all of our posts, since we didn’t specify one on invocation.

If we had wanted to do this with a fixture, we’d have to write a script to generate a large amount of test data, and use the dumpdata management command to turn this data into a JSON fixture. Most likely we’d have to check both this script and the resulting fixture into our project’s source control, and change them if the schema of User or Post ever changed. Using model-mommy, all these steps are replaced with one line of code.

Summary
Using specific examples, I’ve shown how using model-mommy can make your Django unit tests much more concise, simpler, and robust. We covered some basic patterns of how to use model-mommy to build simple test cases with simple as well as repeated data. I’d like to thank Vanderson Mota dos Santos and the entire model-mommy development community for their helpful contribution to the Django development community. Hopefully the methods shown in this article can greatly simplify the writing of tests in your Django applications, leading to better test coverage and more robust code. More importantly, by removing unnecessary data and boilerplate, it just makes writing tests more fun.

Anton Sipos
Anton Sipos has been programming computers since they were 8 bits old. He has professional experience in systems ranging from microcontrollers to high traffic servers. He is an active contributor in the Python open-source community. His musings on programming can be found at http://softwarefuturism.com. You can reach him at [email protected].

Using Python Fabric to Automate GNU/Linux Server Configuration Tasks
Fabric is a Python library and command-line tool for automating tasks of application deployment or system administration via SSH. It provides a basic suite of operations for executing local or remote shell commands and transferring files.

Fabric (http://www.fabfile.org) is a Python library and command-line tool for automating tasks of application deployment and system administration via SSH. It provides tools for executing local and remote shell commands and for transferring files through SSH and SFTP, respectively. With these tools, it is possible to write application deployment or system administration scripts, so that these tasks can be performed by executing a single command.
In order to work with Fabric, you should have Python and the Fabric library installed on your local computer. The examples in this article assume a Debian-based distribution (such as Ubuntu, Linux Mint and others).
As Python is shipped by default on most of the GNU/Linux distributions, you probably won’t need to install it. Regarding the Fabric library, you may use pip to install it. Pip is a command line tool for installing and managing Python packages. On Debian-based distributions, it can be installed with apt-get via the python-pip package:

$ sudo apt-get install python-pip

After installing it, you may update it to the latest version using pip itself:

$ sudo pip install pip --upgrade

After that, you may use pip to install Fabric:

$ sudo pip install fabric

To work with Fabric, you must have SSH installed and properly configured with the necessary user’s permissions on the remote servers you want to work on. In the examples, we will consider a Debian system with IP address 192.168.250.150 and a user named “administrator” with sudo powers, which are required only for performing actions that need superuser rights. One way to use Fabric is to create a file called fabfile.py containing one or more functions that represent the tasks we want to execute; for example, take a look at Listing 1.
In this example, we have defined two tasks called “remote_info” and “local_info”, which are used to retrieve local and remote systems information through the command “uname -a”. Also, we have defined the host user and address we would like to use to connect to the remote server using a special dictionary called “env”.
Having this defined, it is possible to execute one of the tasks using the shell command fab. For example, to execute the task “local_info”, from within the directory where fabfile.py is located, you may call:

$ fab local_info

which gives the output shown on Listing 2.
Similarly, you could execute the task called “remote_info”, calling:

$ fab remote_info

In this case, Fabric will ask for the password of the user “administrator”, as it is connecting to the server via SSH, as shown on Listing 3.
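As a rough mental model of what a task such as “local_info” does, the following standard-library sketch runs a command on the local machine and returns its output. It is only an analogy and not Fabric’s implementation; run_local is a hypothetical helper name, and the example assumes Python 3.7 or later.

```python
import subprocess
import sys


def run_local(command):
    """Run `command` (a list of arguments) locally and return its stripped output.

    Raises subprocess.CalledProcessError if the command exits with an error.
    """
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout.strip()


if __name__ == "__main__":
    # Portable stand-in for `uname -a`: ask Python itself for the system details.
    print(run_local([sys.executable, "-c",
                     "import platform; print(' '.join(platform.uname()))"]))
```

On a GNU/Linux machine, calling run_local(["uname", "-a"]) would return the same kind of line that appears at the end of Listing 2.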


Listing 1. A basic fabfile. File: fabfile.py

# -*- coding: utf-8 -*-

from fabric.api import *

env.hosts = ['192.168.250.150']
env.user = 'administrator'

def remote_info():
    run('uname -a')

def local_info():
    local('uname -a')

Listing 2. Output of fab local_info

[192.168.250.150] Executing task 'local_info'
[localhost] local: uname -a
Linux renato-laptop 3.2.0-23-generic #36-Ubuntu SMP Tue Apr 10 20:39:51 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

Listing 3. Output of fab remote_info

[192.168.250.150] Executing task 'remote_info'
[192.168.250.150] run: uname -a
[192.168.250.150] Login password for 'administrator':
[192.168.250.150] out: Linux debian-vm 2.6.32-5-686 #1 SMP Sun May 6 04:01:19 UTC 2012 i686 GNU/Linux
[192.168.250.150] out:

Done.
Disconnecting from 192.168.250.150... done.

There are lots of parameters that can be used with the fab command. To obtain a list with a brief description of them, you can run fab --help. For example, running fab -l lists the Fabric tasks available in the fabfile.py file. With the fabfile.py file shown on Listing 1, running fab -l gives the output of Listing 4.

Listing 4. Output of fab -l

Available commands:

    local_info
    remote_info

As in the previous example, in the file fabfile.py the function run() may be used to run a shell command on a remote server and the function local() may be used to run a shell command on the local computer. Besides these, there are some other functions that can be used in fabfile.py:

• sudo('shell command'): to run a shell command on the remote server using sudo,
• put('local path', 'remote path'): to send a file from a local path on the local computer to the remote path on the remote server,
• get('remote path', 'local path'): to get a file from a remote path on the remote server to the local path on the local computer.

Also, it is possible to set many other details about the remote connection with the dictionary “env”. To see a full list of “env” vars that can be set, visit: http://docs.fabfile.org/en/1.6/usage/env.html#full-list-of-env-vars.

Among the possible settings, it is worth commenting on some of them:

• user: defines which user will be used to connect to the remote server;
• hosts: a Python list with the addresses of the hosts that Fabric will connect to in order to perform the tasks. There may be more than one host, e.g.,

env.hosts = ['192.168.250.150', '192.168.250.151']

• host_string: with this setting, it is possible to configure a user and a host at once, e.g.

env.host_string = "[email protected]"

As could be noticed from the previous example, Fabric will ask for the user’s password to connect to the remote server.
However, for automated tasks, it is useful to make Fabric run the tasks without prompting for any user input. To avoid typing the user’s password, it is possible to use the env.password setting, which specifies the password to be used by Fabric, e.g.

env.password = 'mysupersecureadministratorpassword'

If the server uses SSH keys instead of passwords to authenticate users (actually, this is a good practice concerning the server’s security), it is possible to use the setting env.key_filename to specify the SSH key to be used. Considering that the public key ~/.ssh/id_rsa.pub is installed on the remote server, you just need to add the following line to fabfile.py:

env.key_filename = '~/.ssh/id_rsa'

It is also a good security practice to forbid the root user from logging in remotely on the servers and to allow the necessary users to execute superuser tasks using the sudo command. On a Debian system, to allow the “administrator” user to perform superuser tasks using sudo, first you have to install the package sudo, using:

# apt-get install sudo

and then add the “administrator” user to the group “sudo”, which can be done with:

# adduser administrator sudo

Having this done, you could use the sudo() function in Fabric scripts to run commands with sudo powers. For example, to create a mydir directory within /home, you may use the fabfile.py file shown on Listing 5.

Listing 5. Script to create a directory. File: fabfile.py

# -*- coding: utf-8 -*-

from fabric.api import *

env.hosts = ['192.168.250.150']
env.user = 'administrator'
env.key_filename = '~/.ssh/id_rsa'

def create_dir():
    sudo('mkdir /home/mydir')

And call

$ fab create_dir

which will ask for the password of the user “administrator” to perform the sudo tasks, as shown on Listing 6.

Listing 6. Output of fab create_dir

[192.168.250.150] Executing task 'create_dir'
[192.168.250.150] sudo: mkdir /home/mydir
[192.168.250.150] out:
[192.168.250.150] out: We trust you have received the usual lecture from the local System
[192.168.250.150] out: Administrator. It usually boils down to these three things:
[192.168.250.150] out:
[192.168.250.150] out: #1) Respect the privacy of others.
[192.168.250.150] out: #2) Think before you type.
[192.168.250.150] out: #3) With great power comes great responsibility.
[192.168.250.150] out:
[192.168.250.150] out: sudo password:
[192.168.250.150] out:

Done.
Disconnecting from 192.168.250.150... done.

When using SSH keys to log in to the server, you can use the env.password setting to specify the sudo password, to avoid having to type it when you call the Fabric script. In the previous example, adding:

env.password = 'mysupersecureadministratorpassword'

would be enough to make the script run without the need of user intervention.

Listing 7. Example fabfile using an SSH key with a passphrase. File: fabfile.py

# -*- coding: utf-8 -*-

from fabric.api import *

env.hosts = ['192.168.250.150']
env.user = 'administrator'
env.key_filename = '~/.ssh/id_rsa2'

def remote_info():
    run('uname -a')

def create_dir():
    sudo('mkdir /home/mydir')

Listing 8. Output of fab remote_info

[192.168.250.150] Executing task 'remote_info'
[192.168.250.150] run: uname -a
[192.168.250.150] Login password for 'administrator':
[192.168.250.150] out: Linux debian-vm 2.6.32-5-686 #1 SMP Sun May 6 04:01:19 UTC 2012 i686 GNU/Linux
[192.168.250.116] out:

Done.
Disconnecting from 192.168.250.150... done.


However, some SSH keys are created using a passphrase, which is required to log in to the server. Fabric treats these passphrases and passwords similarly, which can sometimes cause confusion. To illustrate Fabric’s behavior, consider that the user named “administrator” is able to log in to a remote server only by using his/her key named ~/.ssh/id_rsa2.pub, created using a passphrase, and the Fabric file shown on Listing 7.
In this case, calling:

$ fab remote_info

makes Fabric ask for a “Login password”. However, as you will notice, this “Login password” refers to the passphrase needed to log in using the SSH key, as shown on Listing 8.
In this case, if you specify the env.password setting, it will be used as the SSH passphrase and, when running the create_dir script, Fabric will ask for the password of the user “administrator”. To avoid typing any of these passwords, you may define env.password as the SSH passphrase and, within the function that uses sudo(), redefine it as the user’s password, as shown on Listing 9.

Listing 9. Example fabfile using an SSH key with a passphrase. Improved to avoid the need of user intervention. File: fabfile.py

# -*- coding: utf-8 -*-

from fabric.api import *

env.hosts = ['192.168.250.150']
env.user = 'administrator'
env.key_filename = '~/.ssh/id_rsa2'
env.password = 'sshpassphrase'

def remote_info():
    run('uname -a')

def create_dir():
    env.password = 'mysupersecureadministratorpassword'
    sudo('mkdir /home/mydir')

Alternatively, you could specify the authentication settings from within the task function, as shown on Listing 10.

Listing 10. Another example fabfile using an SSH key with a passphrase. Improved to avoid the need of user intervention. File: fabfile.py

# -*- coding: utf-8 -*-

from fabric.api import *

env.hosts = ['192.168.250.150']

def create_dir():
    env.user = 'administrator'
    env.key_filename = '~/.ssh/id_rsa2'
    env.password = 'sshpassphrase'
    run(':')
    env.password = 'mysupersecureadministratorpassword'
    sudo('mkdir /home/mydir')

In this example, the command : does not do anything. It only serves as a trick to enable setting env.password twice: first to the SSH passphrase, required for login, and then to the user’s password, required for performing sudo tasks.
If necessary, it is possible to use Python’s with statement (learn about it at http://www.python.org/dev/peps/pep-0343/) to specify the env settings. A compatible create_dir() task using the with statement is shown on Listing 11.

Listing 11. Example using Python’s with statement. File: fabfile.py

# -*- coding: utf-8 -*-

from fabric.api import *

env.hosts = ['192.168.250.150']

def create_dir():
    with settings(user = 'administrator',
                  key_filename = '~/.ssh/id_rsa2',
                  password = 'sshpassphrase'):
        run(':')
        env.password = 'mysupersecureadministratorpassword'
        sudo('mkdir /home/mydir')

The fab command is useful for performing system administration and application deployment tasks from a shell console. However, sometimes you may want to execute tasks from within your Python scripts. To do this, you may simply call the Fabric functions from your Python code. To build a script that runs a specific task automatically, such as create_dir() shown previously, you create a Python script as shown on Listing 12.

Listing 12. Python script using Fabric. File: mypythonscript.py

#! /usr/bin/env python
# -*- coding: utf-8 -*-

from fabric.api import *

def create_dir():
    with settings(host_string = '[email protected]',
                  key_filename = '~/.ssh/id_rsa2',
                  password = 'sshpassphrase'):
        run(':')
        env.password = 'mysupersecureadministratorpassword'
        sudo('mkdir /home/mydir')

if __name__ == '__main__':
    create_dir()

To conclude, we show a more practical example of a Python script that uses Fabric to deploy a very basic HTML application on a server. The script shown on Listing 13 creates a tarball from the local HTML files at ~/website, sends it to the server, expands the tarball, moves the files to the proper directory (/var/www/website) and restarts the server. Hope this article helped you learn a bit about using Fabric to automate some of your tasks!

Listing 13. A very basic deploy example. File: deployhtml.py

#! /usr/bin/env python
# -*- coding: utf-8 -*-

from fabric.api import *

def deploy_html():
    with settings(host_string = '[email protected]',
                  key_filename = '~/.ssh/id_rsa2',
                  password = 'sshpassphrase'):
        run(':')
        env.password = 'mysupersecureadministratorpassword'
        local('cd ~; tar -czvf website.tar.gz ./website/*')
        put('~/website.tar.gz', '~')
        run('tar -xzvf ~/website.tar.gz')
        sudo('mv /home/administrator/website /var/www')
        sudo('chown -R www-data:www-data /var/www/website')
        sudo('/etc/init.d/apache2 restart')
        local('rm ~/website.tar.gz')

if __name__ == '__main__':
    deploy_html()
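The with settings(...) blocks used in these listings follow a general pattern: a context manager overrides some values for the duration of a block and restores the previous ones afterwards. The sketch below imitates that pattern with a plain dictionary in Python 3. It is not Fabric’s code; env here is an ordinary dict and temporary_settings is a hypothetical name chosen for this illustration.

```python
from contextlib import contextmanager

# Stand-in for a settings object: an ordinary dictionary of connection settings.
env = {"user": "root", "password": None}

# Marker for keys that did not exist before the block was entered.
_MISSING = object()


@contextmanager
def temporary_settings(**overrides):
    """Apply `overrides` to env inside the with block, then restore the old values."""
    previous = {key: env.get(key, _MISSING) for key in overrides}
    env.update(overrides)
    try:
        yield env
    finally:
        for key, old_value in previous.items():
            if old_value is _MISSING:
                env.pop(key, None)
            else:
                env[key] = old_value


with temporary_settings(user="administrator", password="sshpassphrase"):
    inside = dict(env)           # snapshot taken while the overrides are active
    env["password"] = "changed"  # in this sketch, overridden keys are reset on exit

after = dict(env)
```

The try/finally inside the generator is what guarantees the restore step even when the body of the with block raises an exception.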
As we have seen, with Fabric it is possible to automate the execution of tasks that can be done by executing shell commands locally and remotely, using SSH. It is also possible to use Fabric’s features in other Python scripts and to perform dynamic tasks, enabling the developer to automate virtually anything that can be automated. The main goal of this article was to show Fabric’s basic features and to offer a solution for different scenarios of remote connections, regarding different types of authentication. From this point, you may customize your Fabric tasks to your needs using basically the functions local(), run(), and sudo() to run shell commands and put() and get() to transfer files.

Renato Candido
Renato Candido is a free (as in freedom) {software, hardware and culture} enthusiast, who works as a technology consultant at Liria Technology, Brazil, trying to solve people’s (technical) problems using these sorts of tools (he actually thinks the world would be a little better if all resources were free as in freedom). He is an electronics engineer, and enjoys learning things related to signal processing and computer science (and he actually thinks that there could be self-driving cars and speaking robots designed exclusively with free resources). To know a bit more about him, visit: http://www.renatocandido.org.
The Python Logging Module is Much Better Than Print Statements
A while back, I swore off adding print statements to my code while debugging. I forced myself to use the Python debugger to see values inside my code. I’m really glad I did it. Now I’m comfortable with all those cute single-letter commands that remind me of gdb. The pdb module and the command-line pdb.py script are both good friends now.

However, every once in a while, I find myself lapsing back into cramming a bunch of print statements into my code because they’re just so easy. Sometimes I don’t want to walk through my code using breakpoints. I just need to know a simple value when the script runs.
The bad thing is when I write in a bunch of print statements, then debug the problem, then comment out or remove all those print statements, then run into a slightly different bug later, and find myself adding in all those print statements again. So I’m forcing myself to use logging in every script I do, no matter how trivial it is, so I can get comfortable with the Python standard library logging module. So far, I’m really happy with it. I’ll start with a script that uses print statements and revise it a few times and show off how logging is a better solution. Here is the original script, where I use print statements to watch what happens: Listing 1.

Listing 1. Python standard library logging module

# This is a.py
def g():
    1 / 0

def f():
    print "inside f!"
    try:
        g()
    except Exception, ex:
        print "Something awful happened!"
    print "Finishing f!"

if __name__ == "__main__":
    f()

Running the script yields this output:

$ python a.py
inside f!
Something awful happened!
Finishing f!

It turns out that rewriting that script to use logging instead just ain’t that hard: Listing 2.

Listing 2. Rewriting python standard library logging module

# This is b.py.
import logging

# Log everything, and send it to stderr.
logging.basicConfig(level=logging.DEBUG)

def g():
    1/0

def f():
    logging.debug("Inside f!")
    try:
        g()
    except Exception, ex:
        logging.exception("Something awful happened!")
    logging.debug("Finishing f!")

if __name__ == "__main__":
    f()

And here is the output: Listing 3.

Listing 3. Output in Python logging module

$ python b.py
DEBUG 2007-09-18 23:30:19,912 debug 1327 Inside f!
ERROR 2007-09-18 23:30:19,913 error 1294 Something awful happened!
Traceback (most recent call last):
  File "b.py", line 22, in f
    g()
  File "b.py", line 14, in g
    1/0
ZeroDivisionError: integer division or modulo by zero
DEBUG 2007-09-18 23:30:19,915 debug 1327 Finishing f!

Note how we got that pretty view of the traceback when we used the exception method. Doing that with prints wouldn’t be very much fun. So, at the cost of a few extra lines, we got something pretty close to print statements, which also gives us better views of tracebacks. But that’s really just the tip of the iceberg. This is the same script written again, but I’m defining a custom logger object, and I’m using a more detailed format: Listing 4.

Listing 4. Custom logger object

# This is c.py
import logging

# Make a global logging object.
x = logging.getLogger("logfun")
x.setLevel(logging.DEBUG)
h = logging.StreamHandler()
f = logging.Formatter("%(levelname)s %(asctime)s %(funcName)s %(lineno)d %(message)s")
h.setFormatter(f)
x.addHandler(h)

def g():
    1/0

def f():
    logfun = logging.getLogger("logfun")
    logfun.debug("Inside f!")
    try:
        g()
    except Exception, ex:
        logfun.exception("Something awful happened!")
    logfun.debug("Finishing f!")

if __name__ == "__main__":
    f()

And the output: Listing 5. Now I will change how the script handles the different types of log messages. Debug messages will go to a text file, and error messages will be emailed to me so that I am forced to pay attention to them (Listing 6). Lots of really great handlers exist in the logging.handlers module. You can log by sending HTTP gets or posts, you can send UDP packets, you can write to a local file, etc.

W. Matthew Wilson
Matt started his career doing economic research and statistical analysis. Then he realized he had an aptitude for programming after working with tools like SAS, Perl, and the UNIX operating system. He spent the next several years taking interesting graduate courses in computer science at night while working as a developer and then a technical lead for a team of developers. In 2007, Matt walked out of the relative security of the corporate world and then co-founded OnShift, a web application that helps employers intelligently manage their shift-based work force.

Listing 5. Output to custom logger object

$ python c.py
DEBUG 2007-09-18 23:32:27,157 f 23 Inside f!
ERROR 2007-09-18 23:32:27,158 exception 1021 Something awful happened!
Traceback (most recent call last):
  File "c.py", line 27, in f
    g()
  File "c.py", line 17, in g
    1/0
ZeroDivisionError: integer division or modulo by zero
DEBUG 2007-09-18 23:32:27,159 f 33 Finishing f!

Listing 6. Handling the different types of log messages

# This is d.py
import logging, logging.handlers

# Make a global logging object.
x = logging.getLogger("logfun")
x.setLevel(logging.DEBUG)

# This handler writes everything to a file.
h1 = logging.FileHandler("/var/log/myapp.log")
f = logging.Formatter("%(levelname)s %(asctime)s %(funcName)s %(lineno)d %(message)s")
h1.setFormatter(f)
h1.setLevel(logging.DEBUG)
x.addHandler(h1)

# This handler emails me anything that is an error or worse.
h2 = logging.handlers.SMTPHandler('localhost', 'logger@tplus1.com', ['[email protected]'], 'ERROR log')
h2.setLevel(logging.ERROR)
h2.setFormatter(f)
x.addHandler(h2)

def g():
    1/0

def f():
    logfun = logging.getLogger("logfun")
    logfun.debug("Inside f!")
    try:
        g()
    except Exception, ex:
        logfun.exception("Something awful happened!")
    logfun.debug("Finishing f!")

if __name__ == "__main__":
    f()
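The routing in Listing 6 depends on each handler having its own level. The following self-contained sketch, written for Python 3, shows the same mechanism with two in-memory streams standing in for the log file and the mail server, so it can be tried without write access to /var/log or an SMTP host. The logger name logfun_demo and the stream variable names are assumptions made for this illustration.

```python
import io
import logging

# In-memory stand-ins for the log file and the e-mail channel.
everything = io.StringIO()
errors_only = io.StringIO()

logger = logging.getLogger("logfun_demo")
logger.setLevel(logging.DEBUG)
logger.propagate = False  # keep the demo records away from the root logger

formatter = logging.Formatter("%(levelname)s %(funcName)s %(message)s")

# Like h1 in Listing 6: receives every record from DEBUG upwards.
h1 = logging.StreamHandler(everything)
h1.setLevel(logging.DEBUG)
h1.setFormatter(formatter)
logger.addHandler(h1)

# Like h2 in Listing 6: only receives ERROR and worse.
h2 = logging.StreamHandler(errors_only)
h2.setLevel(logging.ERROR)
h2.setFormatter(formatter)
logger.addHandler(h2)


def f():
    logger.debug("Inside f!")
    try:
        1 / 0
    except ZeroDivisionError:
        # exception() logs at ERROR level and appends the traceback.
        logger.exception("Something awful happened!")
    logger.debug("Finishing f!")


f()
```

After the call, everything.getvalue() holds both debug lines plus the error with its traceback, while errors_only.getvalue() holds only the error and traceback. Swapping a stream for logging.FileHandler or logging.handlers.SMTPHandler changes the destination but not the level filtering.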

Python, Web Security and Django

Web sites must operate securely. Once we get past the basics of asking users to log in, what other use cases are there? It turns out that almost everything is security-related. Security must be a pervasive feature of our design. Details vary, so we’ll focus on Django.

Lots of folks like to wring their hands over the Big Vague Concept (BVC) they call “security”. Because it’s nothing more than a BVC, there’s a lot of quibbling. We’ll try to move past the vagueness to concrete and interesting stuff. We’ll focus on Python and Django, specifically.

It’s important to avoid wasting hours trying to detail all the business risks and costs. I’ve had the misfortune of sitting through meetings where managers spout the “We don’t know what we don’t know” objection to implementing a RESTful web services interface. This leads them to the fallback plan of trying to quantify risk. Their objection amounts to “We don’t know every possible vulnerability; therefore we don’t know how to secure every possible vulnerability; therefore we should stop development right now!”

The OWASP top-ten list is a good place to start. It’s a focused list of specific vulnerabilities: https://www.owasp.org/index.php/Category:OWASP_Top_Ten_Project.

This list provides a lot of evidence that an architecture based on Apache plus Django plus Python (using mod_wsgi for glue) prevents almost all of these vulnerabilities. Other Python-based web frameworks will do almost as well as Django. One secret (besides using Python) is relying on Apache for the “heavy lifting”. Apache must be used to serve the static content without any interaction from Django. It acts as a kind of cache. The application processing is deployed via the Web Server Gateway Interface (WSGI). mod_wsgi can run this in a separate process (Figure 1).

This architecture has a number of other benefits regarding scalability and manageability. But this article is about security. So let’s review some use cases for web security considerations: specifically users, passwords, authentication and authorization.

Basics
Two of the pillars of security are Authentication (who are you?) and Authorization (what are you allowed to do?).

Authentication is not something to be invented. It’s something to be used. In our preferred architecture, with an Apache/Django application, the Django authentication system works nicely for identity management. It supports a simple model of users, groups and passwords. It can be easily extended to add user profiles.

Django handles passwords properly. This cannot be emphasized enough. Django uses a sophisticated state-of-the-art hash of the password. Not encryption. I’ll repeat that for folks who still think encrypted passwords are a good idea.

Always use a hash of a password. Never use encryption

Best security practice is never to store a password that can be easily recovered. A hash can be undone eventually, but encryption means all passwords are exposed once the encryption key is available. The Django auth module includes methods that properly hash raw passwords, in case you have the urge to implement your own login page: https://docs.djangoproject.com/en/dev/ref/contrib/auth/#django.contrib.auth.models.User.set_password.

Better Authentication
Better than Django’s internal authentication is something like Forge Rock Open AM. This takes identity management out of Django entirely: http://forgerock.com/what-we-offer/open-identity-stack/openam/.

While this adds components to the architecture, it’s also a blessed simplification. All of the username and password folderol is delegated to the Open AM server. Any time a page is visited without a valid Open AM token, the response from a Django app must be a simple redirect to the Open AM login server. Even the user stories are simplified by assuming a valid, active user.

The bottom line is this: authentication is a solved problem. This is something we shouldn’t reinvent. Not only is it solved, but it’s easy to get wrong when trying to reinvent it.

Best practice is to download or purchase an established product for identity management and use it for all authentication.

Authorization
The Authorization problem is more nuanced, and more interesting than Authentication. Once we know who the user is, we still have to determine what they’re allowed to do. This varies a lot. A small change to the organization, or a business process, or available data can have a ripple effect through the authorization rules.

We have to emphasize these two points:

• Security includes Authorization.
• Authorization pervades every feature.

In the case of Django, there are multiple layers of authorization testing. We have settings, we have checks in each view function and we have middleware classes to perform server-wide checks. All of this is important and we’ll look at each piece in some detail.

When we define our data model with Django, each model class has an implicit set of three permissions (add, change and delete, with codenames of the form add_<model>, change_<model> and delete_<model>). We can add to this basic list, if we have requirements that aren’t based on simple Add, Change, Delete (or CRUD) processing.

Each view function can test to see if the current user (or user’s group) has the required permission. This is done through a simple @permission_required decorator on the relevant view functions: https://docs.djangoproject.com/en/1.4/topics/auth/#the-permission-required-decorator.

There are two small problems with this. First, permissions wind up statically loaded into the database. Second, it’s rarely enough information for practical – and nuanced – problems.

The static database loading means that we have to be careful when making changes to the data model or the permissions assigned to groups and users.

We’ll often need to write an admin script that deletes and rebuilds the group-level permissions that we have defined. For example, we may have an “actuaries” group and an “underwriters” group which have different sets of permissions on the data model in an application. That application needs a permission_rebuild admin script that deletes and reinserts the various permissions for each group.

The second problem requires a number of additional design patterns.
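
As a framework-free illustration of the per-view check described above, the sketch below wraps a function in a decorator that refuses to run it unless the user holds a named permission. It is an analogy, not Django’s implementation: DemoUser, PermissionDenied, requires_permission and the claims.change_policy permission string are all hypothetical names chosen for the example. Only the app_label.codename shape of the permission string is borrowed from Django.

```python
from functools import wraps


class PermissionDenied(Exception):
    """Raised when the user lacks the permission a view requires."""


class DemoUser(object):
    """Hypothetical stand-in for a user object; not Django's User model."""

    def __init__(self, name, permissions):
        self.name = name
        self.permissions = frozenset(permissions)

    def has_perm(self, perm):
        return perm in self.permissions


def requires_permission(perm):
    """Build a decorator that checks `perm` before running the view."""
    def decorator(view):
        @wraps(view)
        def checked_view(user, *args, **kwargs):
            if not user.has_perm(perm):
                raise PermissionDenied(
                    "{0} lacks {1}".format(user.name, perm))
            return view(user, *args, **kwargs)
        return checked_view
    return decorator


# Hypothetical view: only users holding the permission may run it.
@requires_permission("claims.change_policy")
def edit_policy(user, policy_id):
    return "{0} edited policy {1}".format(user.name, policy_id)
```

Keeping the check in a decorator means the rule sits next to the view it protects, so a change to the authorization rules touches one line per view rather than the body of each function.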

Figure 1. The Apache and Django Architecture

Additional User Features
Django’s pre-1.5 user profile mechanism (the AUTH_PROFILE_MODULE setting and get_profile()) can be used to provide all of the additional authorization information. For release 1.5, a customized User model is used instead.

Here’s an example. In a recent project, we eventually figured out that we have some “big picture” authorizations. Our sales folks realized that some clusters of application features can be identified as “products” (or “options” or “features” or something cooler-sounding). These aren’t smallish things like Django models. They aren’t largish things like whole sites. They’re intermediate things based on what customers like to pay for (and not pay for).

They might be third-party data integration, which requires a more complex contract with pass-through costs. It might be additional database fields for their unique business process.

What’s made this easy for us is that we used an “instance-per-customer-organization” model. Each of our customer organizations has their own Django instance with their own pool of users, their own database and their own settings file. Apache is used to redirect the URLs for each Django instance.

Each one of our “big picture” features (or products or options) is tied to a customer organization, which is, in turn, tied to a Django settings file. The features are enabled via contract terms and conditions; the sales folks would offer upgrades or additional services, and we would enable or disable features.

(We could have done this with the Django sites model, but that means that customer data would be commingled in a common database. That was difficult to sell.)

Some of these “features” map directly to Django applications. Authorization is handled two ways. First, the application view functions all refuse to work if the user’s contract doesn’t include the option.

A decorator based on the built-in user_passes_test decorator simplifies this. The subtlety is that we’re using relatively static settings data as well as the user’s group and profile (Listing 1).

Listing 1. A decorator based on the built-in user_passes_test

    from functools import wraps

    from django.conf import settings
    from django.shortcuts import redirect

    def client_has_feature_x(function, login_url="/login/"):
        @wraps(function)
        def func_with_check(request, *args, **kwargs):
            # is_authenticated() is a method in Django 1.x
            # (it became a property in Django 1.10).
            if (request.user.is_authenticated()
                    and settings.FEATURE_X_ENABLED
                    and request.user.get_profile().has_feature_X):
                return function(request, *args, **kwargs)
            return redirect(
                "{0}?next={1}".format(login_url, request.path))
        return func_with_check

For Django 1.5 and newer, get_profile() isn’t used; instead a customized User model is used: https://docs.djangoproject.com/en/1.5/topics/auth/customizing/#extending-user.

The second way to enforce the feature mapping is to enable or disable the entire application in the customer’s settings file. This is a simple administrative step: enable an application, restart that customer’s mod_wsgi instance, and let them use their shiny, new web site.

And yes, this is a form of security. It’s not directly related to passwords. It’s related to features, functions, what data users can see and what data users can modify.

Database Feature Enablement
We can create a model for contract terms and conditions. This allows us to map users or groups to specific features identified in the database. While this can seem handy, it’s less than ideal. The problem with keeping configuration data in the database is that it’s data. It’s not code. In order to map data to processing, we are often tempted to use a welter of if statements to sort out what should and should not happen. Adding lots of if statements to enable and disable features increases complexity and reduces maintainability. For these reasons, we’d like to minimize the use of if statements.

More Complexity
Sadly, some of the “features” our sales folks identified are only a small part of a Django application. In one case, it cut across several applications. Drat. We have several choices to implement these features.

Option 1 is to use template changes to conceal or reveal the feature. This is the closest fit with the way Django works. The data is available, it’s just not shown unless the customer’s settings provide the proper set of templates on the template search path.

This can also be enforced in the code by making the template name dependent on the customer settings. Building the template name in code has the advantage of slightly simpler unit testing, since no settings change is required for the various test cases.

    name = settings.FEATURE_W_APP1_TEMPLATE_NAME
    render_to_response("app1/{0}.html".format(name),
        data,
        context_instance=RequestContext(request))

Option 2 is to isolate a simple feature into a single class and write two subclasses of the feature: an active, enabled implementation and a disabled implementation. We can then configure the enabled or disabled subclass in the customer’s settings.

This is the most Pythonic, since it’s a very common OO programming practice. Picking a class to instantiate at run time is simply this:

    feature_class = eval(settings.FEATURE_X_CLASS_NAME)
    feature_x = feature_class()

This is the easiest to test, also, since it’s simple object-oriented programming.

For those who don’t like eval() a more complex mapping can be used.

    feature_class = {
        'option': class, 'option': class, …
    }[settings.FEATURE_X_CLASS_NAME]
    feature_x = feature_class()

Option 3 is to isolate a more complex feature into a single module and write several versions of this module. We can then decide which version to import.

When the feature involves integration of external services, this is ideal. For testing purposes, we’ll need to mock this module. We wind up with three implementations: active, inactive, and mock.

    feature_y = __import__(settings.FEATURE_Y_MODULE,
        globals(), locals(), [], -1)

Now, the selected module is known as feature_y throughout the application.

Option 4 is to refactor an application into two applications: one version with the feature enabled and a nearly identical version without the feature enabled.

The best way to tackle this option is to write an abstract “super app”. This super app needs a plug-in method or class for each feature which may (or may not) be available to a customer. We can create concrete Django apps which both have a structure like this:

    import feature_z_super
    class App2_View(feature_z_super.App2_View):
        etc.

The App2_View subclass of feature_z_super.App2_View is a concrete implementation of the abstract class. All of the features are handled properly.

The idea is that our customer’s settings will include the concrete app module. The concrete app module will depend on the abstract “super app” code, plus the specific extensions to either enable the feature or work around the missing feature. When we need to make common changes, we can change the abstract “super app” and know that the changes will correctly propagate to the concrete implementations.

In both cases, it’s very Django to have the application configured dynamically in the settings file.
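
The class-selection and module-selection options above can be sketched with the standard library alone. The following is a minimal illustration, assuming the settings values are plain strings; FEATURE_X_CLASSES, EnabledFeatureX, DisabledFeatureX, build_feature_x and load_feature_module are hypothetical names for this example and are not part of Django.

```python
import importlib


class EnabledFeatureX(object):
    """Hypothetical active implementation of a feature."""

    def describe(self):
        return "feature X is on"


class DisabledFeatureX(object):
    """Hypothetical stand-in used when the feature is not licensed."""

    def describe(self):
        return "feature X is off"


# Explicit registry: only names listed here can ever be instantiated,
# which is the main advantage over eval().
FEATURE_X_CLASSES = {
    "enabled": EnabledFeatureX,
    "disabled": DisabledFeatureX,
}


def build_feature_x(class_name):
    """Instantiate the class named by a (hypothetical) settings value."""
    try:
        feature_class = FEATURE_X_CLASSES[class_name]
    except KeyError:
        raise ValueError("unknown feature class: {0!r}".format(class_name))
    return feature_class()


def load_feature_module(dotted_name):
    """Import a module by its dotted name and return that module."""
    return importlib.import_module(dotted_name)
```

One design point worth noting: importlib.import_module("package.module") returns the named submodule itself, whereas a bare __import__("package.module") with an empty fromlist returns the top-level package, which is an easy mistake to make when the module name comes from a settings file.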

Figure 2. Django Processing Pipeline

RESTful Services
RESTful web services are slightly different from the default Django requests. REST requests expect XML or JSON replies instead of HTML replies. There will be more than GET or POST requests. Additionally, RESTful web services don’t rely on cookies to maintain state. Otherwise, REST requests are processed very much like other Django requests.

One school of thought is to provide the RESTful API as a separate server. The Django “front-end” makes RESTful requests to a Django “back-end”. This architecture makes it possible to build Adobe Flex or JavaScript front-end presentations that work with the same underlying data as the HTML presentation.

Another school of thought is to provide the RESTful API in parallel with the Django HTML interface. Since the RESTful view functions and the HTML view functions are part of the same application module, it’s easy to use unit testing to assure that both HTML and REST interfaces provide the same results.

In either case, we need authentication on the RESTful API. This authentication doesn’t involve a redirect to a login page, or the use of cookies. Each request must provide the required information. HTTP provides two standard forms of authentication: BASIC and DIGEST. While we can move beyond the standard, it doesn’t seem necessary.

The idea behind DIGEST authentication is to provide hashed username and password credentials on an otherwise unsecured connection. DIGEST requires a dialog so the server can provide a “nonce” which is hashed in with username and password. If the client’s hash agrees with the server’s expectation, the credentials are good. The “back-and-forth” aspect of this makes it unpleasantly slow.

When SSL is used, however, BASIC authentication works very nicely. BASIC is much easier to implement because it’s just a username and password (base64-encoded, not encrypted) sent in a request header. This means that RESTful requests must be done through HTTPS and certificates must be actively managed.

Here’s where middleware fits into the Django pipeline. This shows typical HTML-based view functions. A RESTful interface won’t depend on template rendering. Instead, it will simply return JSON or XML documents dumped as text (Figure 2).

It’s easy to use a Django middleware class to strip out the HTTP Authorization header, parse the username and password from the credentials and perform a Django logon to update the request.

Here’s a sample Middleware class (assuming Python 2.7.5). This example handles all requests prior to URL parsing; it’s suitable for a purely RESTful server. In the case of mixed REST and HTML, process_view should be used instead of process_request, and only RESTful views should be authenticated this way. HTML view functions should be left alone for Django’s own authentication middleware (Listing 2). If you’re using Django 1.5 and Python 3.2, the base64 decode is slightly different.

    base64.b64decode(auth).decode("ASCII")

The ASCII decode is essential because the decoded auth header will be bytes, not a proper Unicode string.

Note that a password is not stored anywhere. We rely on Django’s password management via a hash and password matching. We also rely on SSL to keep the credentials secret.

In the case that you’re using an Open AM identity management server, this changes very slightly.

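Listing 2. Requests prior to URL parsing

    import base64

    from django.contrib.auth import authenticate, login
    from django.http import HttpResponse


    class REST_Authentication(object):

        def process_request(self, request):
            if not request.is_secure():
                return HttpResponse("Not Secure", status=500)
            if request.method not in ("GET", "POST", "PUT", "DELETE"):
                return HttpResponse("Not Supported", status=500)
            # Django exposes the Authorization header as HTTP_AUTHORIZATION.
            # Its value is the scheme name, a space, then the
            # base64-encoded username ":" password.
            header = request.META.get("HTTP_AUTHORIZATION", "")
            scheme, _, auth = header.partition(" ")
            if scheme.lower() != "basic":
                return HttpResponse("Invalid", status=401)
            # Split on the first colon only; passwords may contain colons.
            username, _, password = base64.b64decode(auth).partition(":")
            user = authenticate(
                username=username, password=password)
            if user is not None and user.is_active:
                login(request, user)
                return None  # Continue middleware stack
            return HttpResponse("Invalid", status=401)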
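
The header handling in Listing 2 can also be isolated into a small function that needs no Django, which makes it easy to unit test. This is a sketch under two assumptions: the value follows the standard BASIC form (the word Basic, a space, then base64 of username:password), and the caller has already fetched it from request.META["HTTP_AUTHORIZATION"]. The name parse_basic_auth is hypothetical.

```python
import base64
import binascii


def parse_basic_auth(header_value):
    """Return (username, password) from a BASIC Authorization header value.

    Returns None when the value is missing, uses another scheme,
    or cannot be decoded.
    """
    if not header_value:
        return None
    scheme, _, encoded = header_value.partition(" ")
    if scheme.lower() != "basic" or not encoded:
        return None
    try:
        # b64decode returns bytes; decode to get a text string.
        decoded = base64.b64decode(encoded.strip()).decode("utf-8")
    except (binascii.Error, TypeError, UnicodeDecodeError):
        return None
    # Split on the first colon only; the password may contain colons.
    username, separator, password = decoded.partition(":")
    if not separator:
        return None
    return username, password
```

A middleware class would call this and pass the two values to authenticate(), returning a 401 response when the result is None. Under mod_wsgi the Authorization header typically only reaches the application when the WSGIPassAuthorization directive is set to On.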

What changes is the implementation of the authenticate() method. You’ll provide your own authentication backend which passes the credentials to the Open AM server for authentication: https://docs.djangoproject.com/en/1.5/topics/auth/customizing/#writing-an-authentication-backend.

Summary
What we’ve seen are several of the squares used in playing Buzzword Bingo. We’ve looked at “Defense in Depth”: having multiple checks to assure that only the right features are available to the right people. Perhaps the most important thing is this:

Always use a hash of a password. Never use encryption
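
To make the rule concrete, here is a minimal standard-library sketch of salted, iterated password hashing with PBKDF2, the same family of algorithm Django’s default hasher has used since release 1.4. It is an illustration only: in a Django project the built-in set_password() and check_password() should be used instead. The storage format, the function names and the ITERATIONS value are assumptions made for this example and do not match Django’s own format.

```python
import hashlib
import hmac
import os
from binascii import hexlify


ITERATIONS = 100000  # illustrative work factor, not a recommendation


def hash_password(raw_password, salt=None, iterations=ITERATIONS):
    """Return 'iterations$salt$hash'; the raw password is never kept."""
    if salt is None:
        salt = hexlify(os.urandom(16)).decode("ascii")
    digest = hashlib.pbkdf2_hmac(
        "sha256", raw_password.encode("utf-8"),
        salt.encode("ascii"), iterations)
    return "{0}${1}${2}".format(
        iterations, salt, hexlify(digest).decode("ascii"))


def check_password(raw_password, stored):
    """Re-hash the candidate with the stored salt; compare in constant time."""
    iterations, salt, _ = stored.split("$")
    candidate = hash_password(raw_password, salt, int(iterations))
    return hmac.compare_digest(candidate, stored)
```

There is no decrypt step anywhere: checking a login means hashing the candidate again with the stored salt and comparing the results, so no key exists whose theft would expose every password at once.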

We always want to use a trusted identity manager: either the User model in Django or a good third-party implementation. We can easily implement Single Sign-on (SSO) using a third-party identity manager.

If we use the Secure Socket Layer (SSL), then credentials for RESTful web services are easy to work with.

Django supplies at least three levels of authorization control: group membership, Django settings to select applications and templates and the middleware processing pipeline. To these three levels, we can easily add our own customized settings.

We prefer to rely on Django group memberships and standard settings. This allows us to tweak permissions through the auth module. We can implement higher-level “product” or “feature” authorizations. We have a variety of design patterns: template selection, class hierarchies and class selection, dynamic module imports, and even dynamic application configuration.

We can use the database. We can create a many-to-many relationship between the Django Profile model and a table of license terms and conditions with expiration dates. Or (for Django 1.5) we can extend the User model to include this relationship. Using the database, however, must be done carefully, since it often leads to a confusing collection of if statements.

We should feel confident using Django’s Middleware Classes to create a layered approach to security. It’s a simple and elegant way to assure that all requests are handled uniformly. Django rocks. This makes it easy to fine-tune the available bits and pieces to match the marketing and sales pitch and the legal terms and conditions in the contracts and statements of work.

Steve Lott
The author has been a software developer for over 35 years. Most recently, he’s been developing Python applications for actuaries, including complex data sources, flexible schema design, and a secure RESTful API.

Building a Console 2-player
Chess Board Game in
Python
Python is a very powerful language, particularly for writing
server-side backend scripts, although one can also use it
for web development tasks through the Django framework
(https://www.djangoproject.com), and it is gaining popularity
in that field as well. Thorough and complete documentation,
the huge variety of libraries and open-source projects,
easily installed with the package managers
(https://pypi.python.org/pypi/pip and https://pypi.python.org/pypi/setuptools),
and the huge knowledge base in Q&A sites like StackOverflow
(http://stackoverflow.com/questions/tagged/python) and
mailing lists are among the main characteristics to
which the widespread use of Python can be attributed.

We will be building a console-based 2-player chess board using Python. For those not familiar with the game of chess, you should probably first take a quick glance at the Wikipedia article (https://en.wikipedia.org/wiki/Chess) before diving into any code details. You can find the code on Github (https://github.com/georgepsarakis/python-chess-board) as well as instructions on how to get it and run it. For any questions or feedback feel free to open an issue (https://github.com/georgepsarakis/python-chess-board/issues/new – you will need a Github account to do this though). The code is packed in a single file to make it easier to find and view alternate code segments. In addition, I have included several comments to make it easier to walk yourself through the code. The script is tested on Linux so no guarantees can be made for running on a Windows machine (anyone willing to test it and make any necessary modifications is more than welcome to make a pull request!).

The concept is pretty simple, as stated in the README.md as well:

• You enter the usernames of the two players.
• White and Black are randomly assigned to each player.
• Timer starts for White since they play first.
• Once you hit “Enter”, you will be prompted to enter a move following the convention PIECE POSITION -> TARGET SQUARE; for example, B2 -> B3 will move the white pawn one position forward.
• The move is checked and
  • if approved then the new board state is printed and timer starts for the other player.
  • if rejected timer restarts for the current player and a new move is requested.
• Process repeats.

Object Modeling
Quite briefly, Chess requires a board consisting of 8x8 squares, 16 White pieces and 16 Black pieces. Each player is assigned to a color, quite similarly to a general leading an army. Piece types are (in parentheses is the number of items in each set):

• King (x1)
• Queen (x1)
• Bishop (x2)
• Knight (x2)
• Rook (x2)
• Pawn (x8)

After some brief consideration, we need 4 objects to describe the problem at hand in a simplistic manner.

Modeling Pieces
Pieces require the following properties to describe their behavior:

• Ability for diagonal, straight, L-shaped movement on the board. L-shaped (or Gamma-shaped, from the Greek letter Γ) movement is performed only by Knights.
• Ability to pass over other pieces in their movement path. Actually, only Knights are allowed to do this.
• Limitation on the number of squares that can be traversed in each move. Pawns and Kings can move one square distance, Knights make standard L-shaped moves and the rest of the pieces can move freely as long as their path is unobstructed.
• The color of the piece (Black or White).
• The type, which can be any of 'Rook', 'Knight', 'Pawn', 'King', 'Queen', 'Bishop'.

This is the only information we need to construct an instance of a chess piece. Basically the type of the piece actually determines the rest of the properties, except for the color of course, which is explicitly set according to which piece set the piece belongs to (Listing 1).

Listing 1. Piece Class

    class Piece(object):
        '''
        Object model of a chess piece
        We keep information on what kind of movements the piece is able
        to make (straight, diagonal, gamma), how many squares it can cross
        in a single move, its type (of course) and the color
        (white or black).
        '''
        DirectionDiagonal = False
        DirectionStraight = False
        DirectionGamma = False
        LeapOverPiece = False
        MaxSquares = 0
        Color = None
        Type = None
        AvailableTypes = ['Rook', 'Knight', 'Pawn', 'King', 'Queen', 'Bishop']
        Types_Direction_Map = {
            'Rook': ['straight'],
            'Knight': ['gamma'],
            'Pawn': ['straight'],
            'King': ['straight', 'diagonal'],
            'Queen': ['straight', 'diagonal'],
            'Bishop': ['diagonal'],
        }
        Types_MaxSquares_Map = {
            'Rook': 0,
            'Pawn': 1,
            'King': 1,
            'Queen': 0,
            'Bishop': 0,
            'Knight': -1,
        }

        def __init__(self, **kwargs):
            ''' Constructor for a new chess piece '''
            self.Type = kwargs['Type']
            # Perform a basic check for the type
            if self.Type not in self.AvailableTypes:
                raise Exception('Unknown Piece Type')
            self.Color = kwargs['Color']
            directions = self.Types_Direction_Map[self.Type]
            # Check allowed directions for movement
            self.DirectionDiagonal = 'diagonal' in directions
            self.DirectionGamma = 'gamma' in directions
            self.DirectionStraight = 'straight' in directions
            # Determine if there is a limitation on the number of
            # squares per move
            self.MaxSquares = self.Types_MaxSquares_Map[self.Type]
            # Only Knights can move over other pieces
            if self.Type == 'Knight':
                self.LeapOverPiece = True

        def __str__(self):
            ''' Returns the piece's string representation: color and type '''
            # Note: 'King' and 'Knight' both start with K, so both print as K.
            return self.Color[0].lower() + self.Type[0].upper()

As we can see, the constructor (__init__ method) receives the Type and Color parameters through the **kwargs keyworded argument list (you can read more on keyworded and non-keyworded variable length argument lists on this blog article – http://www.saltycrane.com/blog/2008/01/how-to-use-args-and-kwargs-in-python/).

Modeling the board Squares
The board squares are described by their position in the board, row and column. Now, instead of adding the position properties to the Piece class, which is possible but would make things far more complicated, we add a Piece property to the Square class which stores an instance of the Piece class.

As we observe, the constructor simply receives the position information, in zero-based index format. That means that we convert column notation A..H to 0..7 and rows from 1..8 to 0..7 as well, as a global convention when internally referring to rows and columns. This makes it easier to handle in loops. There are four static methods which handle these conversions:

• row_index – receives a row index and returns the index according to our convention
• column_index – ascii_uppercase is a string containing uppercase ASCII characters sorted alphabetically. Thus using the index method returns the position of the letter in the string.
• index_column – inverse of column_index
• position – returns the string representation of a square position in chess notation, for example A1

Static methods in Python are denoted with the @staticmethod decorator and we do not pass the internal class instance variable self. If you want to read more on decorators (and I strongly advise you to do so) you can take a look at the manual entry – http://docs.python.org/2/glossary.html#term-decorator and the Python Wiki – http://wiki.python.org/moin/PythonDecorators.

We are also using the __str__ special method of Python objects, which returns a string with what we wish to be the string representation of an object. The specific method is not used in our code, but is a good point to explain the meaning of the __str__ special method since we will be using it heavily further on. There are a number of special methods regarding object representation, comparisons between instances etc. and if you want to read more on the subject I would strongly refer you to the manual (http://docs.python.org/2/reference/datamodel.html#special-method-names).

Creating the board
Our board consists of a collection of square object instances. We choose to represent this with a dictionary, where the keys are the square coordinates in chess notation (for example A1) and the values are Square object instances. This will make lookups easier than storing the squares in a list (or a list of lists where each entry of the outer list is a column and the referred list represents a row or vice versa).

We could also hard-code the setup of the board but where is the fun in that? Apart from our laziness and distaste for hard-coded solutions (which in fact can bring much trouble in the future), giving it some broader thinking, it would be better to create our own setup format and mini-parser conventions so that a more generic solution is constructed; this will facilitate possible “Save Game” and “Load Game” features that we may want to add to our game in the future. We will be using a dictionary whose keys are piece types and values are strings containing the square positions where pieces are to be placed.

• Black and White positions are separated with a “|” character
• Ranges use a “:” separator. For example, for the initial setup, to set our pawns we want the entire second row occupied by white pawns. Thus the notation in https://github.com/georgepsarakis/python-chess-board/blob/master/chessboard.py#L165. Ranges can be either vertical or horizontal, not mixed, so the row or the column must be constant.
• Distinct positions use a “,” separator, for example for the Knight positioning (https://github.com/georgepsarakis/python-chess-board/blob/master/chessboard.py#L168)

When instantiated, the board object just needs to create the square objects, so the constructor is responsible for instantiating the Square objects and placing them accordingly in the Squares dictionary. The setup method uses the private method __parse_range (Python’s private methods are somewhat different from other languages since they remain implicitly accessible for public calls – you can read more here http://www.diveintopython.net/object_oriented_framework/private_functions.html) which receives a string and returns a list of tuples with the square coordinates and finally pieces are set on the square positions on the board. Quite simply, another method called add_piece allows us to construct and attach a Piece instance to one of the board’s squares.

The most complicated and useful method is the __str__ special method since it returns the string representation of the board’s current state, which of course

is essential to the gameplay. Our board is drawn with these components:

• A row at the top and the bottom containing the column names (A-H).
• Each row starts and ends with the row number.
• Squares are enclosed with “|” and “-” characters.
• Pieces are represented with their color’s initial letter in lowercase and the initial letter of the piece type in uppercase. For example a White Rook becomes wR.

We start by looping rows inversely since we are printing top-to-bottom and we want White pieces to be in the bottom of the board always. Each square while building a row is separated with the “|” character and we print a row consisting of “-” characters with the full row length which serves as a separator. Just a reminder here: in Python we can produce a string by repeating another string N times, simply multiplying it with an integer value. Our board now looks like this: Figure 1.

Finally the Game class
The Game class is a class that contains actions that refer to the gameplay. Properties include the players, a variable that holds a Board instance and a dictionary named Timers for storing timer info for each user. Instantiating a Game object randomly assigns colors to users (with the randint function) and also instantiates and sets up a board for our game. The timer display requires a helper function, the time_format method which displays time elapsed for the current user in human readable format (MM:SS).

The way the timer works is pretty straightforward; entering the while-loop and checking if the second is changed (https://github.com/georgepsarakis/python-chess-board/blob/master/chessboard.py#L312) it prints the time elapsed from the start of game for the user.

the timeout to be one second and contents are available in the r variable. Just by hitting “Enter” (https://github.com/georgepsarakis/python-chess-board/blob/master/chessboard.py#L334) the timer stops and current time is stored in the Timers dictionary under the key of the current user.

The largest portion of game logic resides in the move_piece method. This method begins by requesting user input; a move in our convention requires specifying the source square with the piece in chess notation and the target square where it should move. These two positions are separated by a dash and greater sign characters (loosely resembling an arrow). For example “B2->B3” will move the white pawn from B2 to B3. If a user enters the string “quit” the game terminates.

In order to perform the move, a number of checks must be made and either the move is approved or rejected. In the first case, the game modifies the board accordingly and starts the timer for the other player, waiting for the next move. The following checks are performed:

• Whether the piece square is actually occupied and if occupied if it belongs to the user.
• If the move is in a straight line, a diagonal line or an L-shaped (gamma-shaped) pattern. These checks require calculation of the absolute distance between starting and target rows and columns (https://github.com/georgepsarakis/python-chess-board/blob/master/chessboard.py#L373 and https://github.com/georgepsarakis/python-chess-board/blob/master/chessboard.py#L374). Straight line movement is easily detected if the starting and target columns are the same or starting and target rows are equal; in the first case the piece moves in a vertical line on the board otherwise in a horizontal. The condition for diagonal moves is that the absolute column and row distances must be equal. At last, L-shaped moves (valid only for Knights) are detected if either a row or
Now the user needs a way of stopping the timer. For column distance is equal to 2 and the other coordi-
prompting the user for input but with a timeout we are nate difference is equal to 1. So if we have a row dis-
using the select function of the Python build-in select tance of 2, then the column distance must be 1 also.
module (you can read more on this module in the man- • Checking if the piece’s path is blocked by oth-
ual http://docs.python.org/2/library/select.html). We set er pieces. This check is performed only if the
LeapOverPiece property of the moving piece is
False. We must first construct the list of squares
that must be crossed by the piece in order to ac-
complish the move, thus we distinguish our cas-
es in respect to the type of movement; whether it
is happening on a straight line (https://github.com/
georgepsarakis/python-chess-board/blob/master/
chessboard.py#L402) or a diagonal (https://github.
com/georgepsarakis/python-chess-board/blob/
master/chessboard.py#L418). L-shaped moves are
performed by Knights which incidentally can leap
Figure 1. The board with all the pieces in its initial state over pieces as well.

• Having the path that outlines the move, we can first implement the check on the permitted number of squares for this piece (https://github.com/georgepsarakis/python-chess-board/blob/master/chessboard.py#L424).
• Looping over the path, we check if any of the squares is already occupied by another piece (https://github.com/georgepsarakis/python-chess-board/blob/master/chessboard.py#L427).
• We must also check if the target square is occupied by a piece which belongs to the current user (https://github.com/georgepsarakis/python-chess-board/blob/master/chessboard.py#L432).

Finally, using the set_piece method of the Square class, we set the Piece property of the piece’s current square to None and attach the moved piece to the target square, thus completing our move. The next step is to change the player. The method returns a tuple consisting of a Boolean value, which is False if the move is rejected, and a string containing an error message informing the user why the move cannot be performed.

The string representation of the game is displayed with the __str__ special method. It uses the board’s string representation along with two extra lines printing the usernames. The format method of the string class is used in order to center-align the usernames (format is the preferred method of string formatting with variable substitutions and alignment – http://docs.python.org/2/library/string.html#string-formatting).

Putting it all together
The game starts when an instance of the Game class is created (https://github.com/georgepsarakis/python-chess-board/blob/master/chessboard.py#L475). The argparse module (http://docs.python.org/dev/library/argparse.html) provides us with an easy way to pass command line parameters to our scripts; here we can pass the usernames via the --user1 and --user2 parameters. We could also, for example, pass the maximum number of seconds for the timer, so that once a player exceeds that playing time, he loses.

After the game is created, a printout of the board is given in its initial state, and the players’ alternation works with a simple while-loop:

• The timer is started for the current player.
• A move is requested via the move_piece method, which we analyzed previously.
• If the move is accepted, the new state of the board is printed and the first two steps are performed for the other user.
• If the move is rejected, the player is prompted with a message to play again and the process repeats for the same player by repeating the first two steps (new loop).
• The process repeats until a user quits.

Summary
In this tutorial we have walked through the Python code that builds a simplistic version of a chess game between two human players. We explored some aspects of object modeling and gained some experience in creating and interacting with Python objects. Dealing with user text input, displaying the board on the console and displaying the timer were some of the interface difficulties, while outlining the game process, setting up the board with the pieces and validating user moves were amongst the algorithmic challenges we faced here. Of course, this is not a complete game implementation but rather a working example; it would definitely require much more error handling and validation, as well as incorporating all the chess rules. Some thoughts on expanding the code:

• Adding a --timer parameter and restricting the user’s game time to this number of seconds.
• Keeping a history of the moves and displaying lost pieces for each user.
• Adding check and checkmate detection.
• A play-with-computer feature (!) – building a chess engine is very difficult, unfortunately.

George Psarakis
George Psarakis studied Mechanical Engineering and completed an MSc in Computational Mechanics. He has been working intensively with PHP, MySQL, Python & BASH on Linux machines since 2007 to develop efficient backend scripts, monitoring tools, Web administration panels for infrastructure purposes and performing server administration tasks. His interests include NoSQL databases, learning new languages, mastering Python, PHP, MySQL and starting new projects on Github (https://github.com/georgepsarakis). You can find him on Twitter @georgepsarakis.
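The geometric move checks described in the chess article above (straight, diagonal and L-shaped patterns, based on absolute row and column distances) can be sketched in a few lines. This is only an illustration under stated assumptions: the function names parse_move and classify_move are hypothetical and are not taken from chessboard.py, and the input is assumed to be well formed.

```python
def parse_move(text):
    """Split user input such as "B2->B3" into (source, target) squares."""
    source, target = text.strip().split("->")
    return source, target


def classify_move(source, target):
    """Classify a move by its geometry.

    Returns "straight", "diagonal", "L-shaped" or None (hypothetical labels).
    """
    # Columns are letters A-H and rows are digits 1-8 (chess notation).
    col_dist = abs(ord(target[0].upper()) - ord(source[0].upper()))
    row_dist = abs(int(target[1]) - int(source[1]))

    if col_dist == 0 and row_dist == 0:
        return None  # the piece did not move at all
    if col_dist == 0 or row_dist == 0:
        return "straight"  # same column (vertical) or same row (horizontal)
    if col_dist == row_dist:
        return "diagonal"
    if sorted((col_dist, row_dist)) == [1, 2]:
        return "L-shaped"  # one distance is 2, the other is 1
    return None


print(classify_move(*parse_move("B2->B3")))
```

A real implementation would additionally have to reject malformed input and squares outside the board before classifying the move.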
Write a Web App and Learn
Python
Background and Primer for Tackling the Django Tutorial
While many resources exist online for anyone interested in taking
on Python, as with many programming languages, the best way to
get started is often by getting your feet wet on an actual project.
Over the past 15 years, I have been involved in many aspects of
web development from building out internal intranet applications
on Microsoft ASP to writing Perl and PHP for large web sites.

Building off past experiences as CTO for various New York based startups and my most recent effort to launch a cloud infrastructure solution for African startups headquartered in Nairobi, Kili.io, I have become a big proponent of Python, by far one of the easiest programming languages to write, read and extend with superior speed.

Python is an extremely flexible language that allows students and serious programmers to accomplish diverse tasks varying from SMS gateways to web applications to basic data visualization. Existing resources on Python that new and experienced programmers can turn to include: Learn Python the Hard Way by Zed Shaw (http://learnpythonthehardway.org/), educational sites like Udacity (https://www.udacity.com/course/cs101) and the Python.org website itself, which has a thorough tutorial (http://docs.python.org/3.3/tutorial/) and some of the best overall documentation (http://docs.python.org/3.3/reference/index.html) available for any programming language. However, when speaking to students trying to get started with the language, a common complaint is that available command line tutorials do not always provide concrete and detailed recommendations for initial setup. This article aims to introduce the most widely used framework for Python web development, Django, complementing what is already online (https://docs.djangoproject.com/en/1.5/intro/tutorial01/), and to provide practical advice for getting you started on your own project.

What’s a Framework?
A ‘framework’ is a set of tools and libraries that facilitates the development of a certain type of application. Web frameworks facilitate the development of web applications by allowing languages like Python or Ruby to take advantage of standard methods to complete tasks like interacting with HTTP payloads, tracking users throughout a site, or constructing basic HTML pages. Leveraging this scaffolding, a developer can focus on creating a web application instead of doing a deep dive on HTTP internals and other lower-level technologies.

While the dominant web framework for the Ruby language is Rails, Python has many different web frameworks including Bottle, web.py, and Flask, with the vast majority of Python web applications being developed right now using the mature framework Django. Django is a full-stack web framework which includes an Object Relational Mapper (so you can use Python syntax to access values in a relational database), a template renderer (so you can insert variables into an HTML page that will then be populated before the page is sent to the browser), and various additional utilities like date parsers, form handlers, and cache helpers.

Learning Django via their tutorial can be one of the easiest ways to get started with Python and really isn’t much more difficult. I advise this approach to all people new to Python and think it’s the best way to get going.

Getting Started
Before starting this tutorial, you should try to have a current version of either Mac OS X or Ubuntu Linux. These are the easiest operating systems on which to develop for the web and the most well supported in terms of documentation and setup guides. If you get lost, you’ll be much happier to be on one of these two platforms.


If you’re a fan of another Linux distribution, you shouldn’t have too many problems. If you want to use Windows though, while doable, this is certainly not advised. Not all Python libraries are easily installed on Windows, and since most Python developers use OS X or Linux, you’ll run into fewer surprises on those platforms as you go along. If you have Windows and don’t want to dual-boot your machine, get VirtualBox (https://www.virtualbox.org/) and install Ubuntu 13.04 inside a virtual machine. The software is free and widely used, and running Ubuntu inside of Windows is one of the most common scenarios for VirtualBox.

A Word on Text Editors
In order to write a program, you’re going to need a text editor. People have been using vi (or vim) and emacs for decades with great success. If you are comfortable with these editors, great. I mostly use vi when on the server and use Sublime Text (http://www.sublimetext.com/) on my laptop. More powerful editors called Integrated Development Environments (IDEs) also exist, like PyDev (Eclipse – http://pydev.org/), IDLE (included with Python – http://docs.python.org/2/library/idle.html), and PyCharm (http://www.jetbrains.com/pycharm/). IDEs can interpret the code you write and suggest many fixes and solutions to your programming needs (including links to documentation). Some people find this overbearing and slow; others really benefit from it. A brief overview and set of tutorials on Sublime, my preferred and middle-of-the-road text editor, can be found here (https://tutsplus.com/course/improve-workflow-in-sublime-text-2/).

Initial Steps & Installing Package Managers
The first thing you need to do in order to get started is to create a development folder in your home directory to hold all of your related work and projects. I like to have a folder called dev/ in my home directory. As a first step for this basic best practice, open up the terminal and type cd to get to your home folder and then mkdir -p dev (the -p exits silently in case you already have a dev folder). Then type cd dev to get to your new working folder.

If you’re on a Mac, type brew. Since that probably won’t work on the first try, follow the installation instructions at http://brew.sh/. On Ubuntu (or any Debian-based system), you’ll type apt-get or aptitude at the command line in order to install software. These package managers facilitate the installation of pretty much any open source software you could ever want, and they take care of dependencies: if one package (like Django) requires another one to be on the system as a dependency (like Python), aptitude or brew will install both of the packages automatically. RPM is the package manager for Red Hat-related distributions.

Downloading and Installing Python
Operating system-level package managers like Aptitude and Brew help install software like Python itself, but if you want to install Python libraries, you will need to use pip, the Python installer. Once you have your package managers installed, you can install Python by simply typing brew install python or apt-get install python. With Homebrew this also gives you the pip installer; on Ubuntu, pip typically comes as a separate package (apt-get install python-pip). If you have trouble, different installation methods for pip can be found here (http://www.pip-installer.org/en/latest/installing.html).

And Django?
Once Python and pip are installed, you can look at the Python Package Index for all the different packages available to install. Installing Django from here is as easy as typing pip install Django into the terminal. More information can be found in the docs (https://docs.djangoproject.com/en/1.5/topics/install/), but installing via pip should work just fine.

Should I really be installing these Python libraries globally?
Installing development libraries globally can cause headaches to a developer working on many different projects. One of the great disadvantages of libraries that are globally available is that when you have a client whose production system is on Django 1.4 and another one whose production system is on Django 1.5, you have to pick one for your system and hope problems with compatibility are not an issue.

The solution to this problem is a package called virtualenv (http://www.virtualenv.org/en/latest/), which lets you set up complete Python environments inside of folders so that a developer can run with different versions of libraries on different projects. This solution has proven so successful that there’s now a workflow tool built to use it called virtualenvwrapper (http://virtualenvwrapper.readthedocs.org/en/latest/). While both of these are usually important to the professional Python developer, they are not always critical until you’re working on a full-scale production project.

And what about a Database?
A database, the piece of software that holds onto all of your data, is critical for any Django project’s backend. MySQL and PostgreSQL are popular and the most widely used open source relational databases around, but can be harder to set up and maintain for a beginner. The simplest database available and one I often recommend is called SQLite. It stores the whole database in ordinary files (for example, in your home directory) and can be installed through the package managers above by typing brew install sqlite (or apt-get install sqlite).

Final Thoughts
The above information is not meant to be all-encompassing, but hopefully provides some basic information and background on getting started with Python and the Django Tutorial. Often the best resource for getting further into online tutorials is experimenting with project-related tasks and peer advice for when you get stuck. Having an easily accessed community of support at your fingertips is also one of the best things about Python by far and you should feel free to post comments and questions to... at...

Lastly, for any new or longtime Python enthusiasts, I’m happy to respond to emails, IMs and coffees if you ever make it to Nairobi.

Hopefully you now have a background on how to get started with Python with the Django tutorial. So, get to it and write in with any questions you have so we can help you out.

Adam Nelson
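To make the SQLite recommendation from the database section above a little more concrete, here is a minimal sketch using the sqlite3 module from Python’s standard library (not Django’s ORM). The file name, table name and sample row are made up for illustration, and the database is created in a temporary folder rather than the home directory so that nothing is left behind.

```python
import os
import sqlite3
import tempfile

# Hypothetical file name; SQLite keeps the whole database in this one file.
db_path = os.path.join(tempfile.mkdtemp(), "example.sqlite3")

conn = sqlite3.connect(db_path)  # creates the file if it does not exist yet
conn.execute("CREATE TABLE poll (id INTEGER PRIMARY KEY, question TEXT)")
conn.execute("INSERT INTO poll (question) VALUES (?)", ("What's up?",))
conn.commit()
conn.close()

# Reopen the file to show that the data lives on disk, not in the process.
conn = sqlite3.connect(db_path)
rows = conn.execute("SELECT id, question FROM poll").fetchall()
conn.close()

print(rows)
```

No server process has to be installed or started for this to work, which is the main reason SQLite is convenient while working through a tutorial.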
Efficient Data and Financial
Analytics with Python

In this article, we talk about first steps in Python programming and show you how to start as easily as possible. You will see how user-friendly Python is and why that makes it so popular in the world of programmers.

What you should have
• Desktop PC or notebook with modern browser (Firefox, Chrome, Safari)
• Free account for Web-based analytics environment Wakari (http://www.wakari.io)

The data and financial analytics environment has changed dramatically over the last years and it is still changing at a fast pace. Among the major trends to be observed are:

• big data: be it in terms of volume, complexity or velocity, available data is growing drastically; new technologies, an increasingly connected world, more sophisticated data gathering techniques and devices as well as new cultural attitudes towards social media are among the drivers of this trend
• real-time economy: today, decisions have to be made in real-time, business strategies are much shorter lived and the need to cope faster with the ever increasing amount and complexity of decision-relevant data steadily increases

Decision makers and analysts faced with such an environment cannot rely anymore on traditional approaches to process data or to make decisions. In the past, these areas were characterized by highly structured processes which were repeated regularly or when needed.

For example, on the data processing side, it was and still is quite common to transfer operational data into separate data warehouses for analytics purposes by executing weekly or monthly batch processes. Similarly, with regard to decision making, time-consuming, yearly strategy and budgeting processes still seem to be common practice among the majority of larger companies.

While these approaches might still be valid for certain industries, big data and the real-time economy demand much more agile and interactive data analytics and decision making. One extreme example illustrating this is high-frequency trading of financial securities, where data has to be analyzed on a massive scale and decisions sometimes have to be made in milliseconds. This is only possible by making use of high performance technology and by applying automated, algorithmic decision processes. While this might seem extreme for most other business areas, the need for more interactive analytics and faster decisions has become a quite common phenomenon.

Typical Data-Related Problems
Corporations, decision makers and analysts acknowledging the changing environment and setting out to do something about it generally face a number of problems:

• sources: data typically comes from different sources, like from the Web, from in-house databases or it is generated in-memory, e.g. for simulation purposes
• formats: data is generally available in different formats, like SQL databases/tables, Excel files, CSV files, arrays, proprietary formats


• structure: data typically comes differently structured, be it unstructured, simply indexed, hierarchically indexed, in table form, in matrix form, in multidimensional arrays
• completeness: real-world data is generally incomplete, i.e. there is missing data (e.g. along an index) or multiple series of data cannot be aligned correctly (e.g. two time series with different time indexes)
• conventions: for some types of data there are many “competing” conventions with regard to formatting, like for dates and time
• interpretation: some data sets contain information that can be easily and intelligently interpreted, like a time index, others not, like texts
• performance: reading, streamlining, aligning, analyzing – i.e. processing – (big) data sets might be slow

In addition to these data-oriented problems, there typically are organizational issues that have to be considered:

• departments: the majority of companies is organized in departments with different technologies, databases, etc., leading to “data silos”
• analytics skills: analytical and business skills in general are possessed by people working in line functions (e.g. production) or administrative functions (e.g. finance)
• technical skills: technical skills, like retrieving data from databases and visualizing them, are generally possessed by people in technology functions (e.g. development, systems operations)

In the past, companies have spent huge amounts of money to cope with these problems around ever increasing data volumes. In 2011, companies around the world spent an estimated 100 bn USD on data center infrastructure and 24 bn USD on database software (Source: Gartner Group as reported in Bloomberg Businessweek, 2 July 2012, “Data Centers – Revenge of the Nerdiest Nerds”). This illustrates that improvements in data management and analytics can pay off quite well. Small cost savings, faster implementation approaches or more efficient data analytics processes can have a huge impact on the bottom line of any business.

Python as Analytics Environment
Getting Started with Python
In recent years, Python has positioned itself more and more as the environment of choice for efficient data and financial analytics. A fundamental stack for data analytics with Python should comprise at least:

• Python (http://www.python.org),
• NumPy (http://www.numpy.org),
• SciPy (http://www.scipy.org),
• matplotlib (http://www.matplotlib.org),
• pandas (http://pandas.pydata.org) and
• PyTables (http://www.pytables.org)

In addition, the powerful interactive development environment IPython (http://www.ipython.org) makes development and interactive analytics much more convenient and productive. All code presented in the following is Python 2.7.

However, you will in general need additional libraries, so it is best to install a complete scientific Python distribution like Anaconda (www.continuum.io/anaconda) or to use a pre-configured, browser-based analytics environment like Wakari (http://www.wakari.io).

Addressing some Typical Problems
Before we go into some specific examples, the library pandas shall be highlighted as a useful tool to cope with typical problems regarding available data. pandas can, among others, help with the following issues:

• sources: pandas reads data directly from different data sources such as SQL databases or JSON based APIs
• formats: pandas can process input data in different formats like CSV files or Excel files; it can also generate output in different formats like CSV, HTML or JSON
• structure: pandas’ strength lies in structured data formats, like time series and panel data
• completeness: pandas automatically deals with missing data in most circumstances, e.g. computing sums even if there are a few or many “not a number”, i.e. missing, values
• conventions/interpretation: for example, pandas can interpret and convert different date-time formats to Python datetime objects and/or timestamps
• performance: the majority of pandas classes, methods and functions is implemented in a performance-oriented fashion making heavy use of the Python/C compiler Cython (http://www.cython.org)

pandas is a canonical example of Python being an efficiency driver for data analytics (for more details refer to the book McKinney, Wes (2012): Python for Data Analysis. O’Reilly). First, it is open source and free of cost. Second, through a high level programming approach with built-in convenience functions it makes writing and maintaining code much faster and less costly. Third, it shows high performance in many disciplines, reducing execution times for typical analytics tasks and therewith time-to-insights.

pandas itself uses NumPy arrays as the basic building block. It also tightly integrates with PyTables for data storage and retrieval. All three libraries are illustrated by specific examples in what follows.
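Two of the pandas capabilities listed above, the handling of missing values and the interpretation of different date formats, can be illustrated with a very small sketch. The price values are made up, and the snippet is written so that it should also run on a current Python 3 installation with NumPy and pandas available; it is an illustration, not part of the examples that follow.

```python
import numpy as np
import pandas as pd

# A short, made-up price series with two missing observations.
prices = pd.Series([100.0, np.nan, 102.5, np.nan, 101.0])

total = prices.sum()       # missing values are skipped by default
observed = prices.count()  # number of non-missing entries

# Different spellings of the same date are parsed to the same timestamp.
us_style = pd.Timestamp("7/28/2008")
iso_style = pd.Timestamp("2008-07-28")

print(total)
print(observed)
print(us_style == iso_style)
```

The sum is computed over the three available observations only, so no manual filtering of the “not a number” entries is needed.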

Analytics Examples from Finance
Two examples from finance show how efficient Python can be when it comes to typical financial analytics tasks. The first is the implementation of a Monte Carlo algorithm, simulating the future development of a stock price. The second is the analysis of two historical stock price time series.

Monte Carlo Simulation
Not only in finance, but in almost any other science, like Physics or Chemistry, Monte Carlo simulation is an important numerical method. In fact, it is among the top 10 most important numerical algorithms of the 20th century (Cf. SIAM News, Volume 33, Number 4).

A typical financial analytics task is to simulate the evolution of the price of a company’s stock over time. This could be necessary, for example, in the context of the valuation of an option on the stock or the estimation of certain risk measures. Assuming that the stock price S follows a geometric Brownian motion, the respective stochastic differential equation (SDE) is given by

$$ dS_t = r S_t \, dt + \sigma S_t \, dZ_t $$

where Z is a Brownian motion. To simulate the SDE, we use the discretization

$$ S_t = S_{t - \Delta t} \exp\left( \left( r - 0.5 \sigma^2 \right) \Delta t + \sigma \sqrt{\Delta t} \, z_t \right) $$

where z is a standard normally distributed random variable and it holds 0 < t ≤ T with T the final time horizon (for details, refer to the book Hilpisch, Yves (2013): Derivatives Analytics with Python. Visixion GmbH, http://www.visixion.com).

To get mathematically reliable results, a high number I of simulated stock price paths in combination with a fine enough time grid is generally needed. This makes the Monte Carlo simulation approach rather compute intensive. For one million stock price paths with 50 time intervals each, this leads to 50 million single computations, each involving exponentiation, square roots and the draw of a (pseudo-)random number. The following is a pure Python implementation of the respective simulation algorithm, making heavy use of lists and for-loops (Listing 1).

Listing 1. Monte Carlo Simulation: Pure Python Code

    #
    # Simulating Geometric Brownian Motion with Python
    #
    from time import time
    from math import exp, sqrt, log
    from random import gauss

    t0 = time()
    # Parameters
    S0 = 100; r = 0.05; sigma = 0.2
    T = 1.0; M = 50; dt = T / M; I = 1000000

    # Simulating I paths with M time steps
    S = []
    for i in range(I):
        path = []
        for t in range(M + 1):
            if t == 0:
                path.append(S0)
            else:
                z = gauss(0.0, 1.0)
                St = path[t - 1] * exp((r - 0.5 * sigma ** 2) * dt
                                       + sigma * sqrt(dt) * z)
                path.append(St)
        S.append(path)

    # Calculating the absolute log return
    av = sum([path[-1] for path in S]) / I
    print("Absolute Log Return %7.3f" % log(av / S0))
    print("Duration in Seconds %7.3f" % (time() - t0))

The execution of the script yields the following output:

    Absolute Log Return 0.050
    Duration in Seconds 115.589

The absolute log return over one year is correct with 5%, so the discretization obviously works well. The execution takes almost 2 minutes in this case.

Although the Monte Carlo simulation is quite easily implemented in pure Python, NumPy is especially designed to handle such operations. To this end, note that our end product S is a list of one million lists with 51 entries each. This can be seen as a matrix – or a rectangular array – of size 1,000,000 x 51. And NumPy’s major strength is to process data structures of this kind.

Therefore, the following Python script illustrates the implementation of the same algorithm, this time based on NumPy’s array manipulation capabilities (Listing 2). The execution of this script gives:

    Absolute Log Return 0.050
    Duration in Seconds 5.046

Apart from being equally exact, we can say the following:

• code: the NumPy version of the simulation is much more compact – involving only one loop instead of two – and is therefore more readable and easier to maintain
• speed: the NumPy code executes roughly 23 times faster than pure Python (115.589 versus 5.046 seconds)
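Both scripts report an absolute log return of 0.050. A short derivation, using only the discretization given above and the fact that z is standard normally distributed, indicates why this is the value to expect for r = 0.05 and T = 1.0 (with M the number of time steps, so that M Δt = T):

```latex
\begin{aligned}
\mathbb{E}\left[ e^{\sigma \sqrt{\Delta t}\, z_t} \right]
  &= e^{0.5 \sigma^2 \Delta t}
  && \text{since } z_t \sim N(0, 1) \\
\mathbb{E}\left[ S_t \mid S_{t - \Delta t} \right]
  &= S_{t - \Delta t} \, e^{(r - 0.5 \sigma^2) \Delta t} \, e^{0.5 \sigma^2 \Delta t}
   = S_{t - \Delta t} \, e^{r \Delta t} \\
\mathbb{E}\left[ S_T \right]
  &= S_0 \, e^{r M \Delta t} = S_0 \, e^{r T} \\
\log \frac{\mathbb{E}\left[ S_T \right]}{S_0}
  &= r T = 0.05 \cdot 1.0 = 0.05
\end{aligned}
```

The simulated average over one million paths is an estimate of the expectation in the third line, which is why its log return comes out close to 5%.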

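For readers who want to try the simulation without waiting minutes or installing NumPy, the following is a scaled-down sketch of the same discretization using only the standard library. The function name simulate_terminal_prices, the fixed seed and the reduced number of 5,000 paths are assumptions made for illustration; only the terminal prices are kept, since they are all that the absolute log return needs.

```python
from math import exp, log, sqrt
from random import Random


def simulate_terminal_prices(S0, r, sigma, T, M, I, seed=0):
    """Simulate I terminal prices of a geometric Brownian motion with M steps each."""
    rng = Random(seed)  # fixed seed so that repeated runs give the same numbers
    dt = T / M
    drift = (r - 0.5 * sigma ** 2) * dt
    vol = sigma * sqrt(dt)
    terminal = []
    for _ in range(I):
        S = S0
        for _ in range(M):
            # One step of the discretization: multiply by the exponential factor.
            S *= exp(drift + vol * rng.gauss(0.0, 1.0))
        terminal.append(S)
    return terminal


prices = simulate_terminal_prices(S0=100.0, r=0.05, sigma=0.2, T=1.0, M=50, I=5000)
log_return = log(sum(prices) / len(prices) / 100.0)
print("Absolute Log Return %7.3f" % log_return)
```

With so few paths the estimate is noisier than with one million paths, so the printed value should only be expected to land near 0.05, not exactly on it.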

Listing 2. Monte Carlo Simulation: Python + NumPy Code

#
# Simulating Geometric Brownian Motion with NumPy
#
from time import time
import numpy as np

t0 = time()
# Parameters
S0 = 100; r = 0.05; sigma = 0.2
T = 1.0; M = 50; dt = T / M; I = 1000000

# Simulating I paths with M time steps
S = np.zeros((M + 1, I))
S[0] = S0
for t in range(1, M + 1):
    z = np.random.standard_normal(I)
    S[t] = S[t - 1] * np.exp((r - 0.5 * sigma ** 2) * dt
                             + sigma * np.sqrt(dt) * z)

# Calculating the absolute log return
# (np.log and np.sum are used because bare log and sum are not imported here)
print "Absolute Log Return %6.3f" % np.log(np.sum(S[-1]) / I / S0)
print "Duration in Seconds %6.3f" % (time() - t0)

In terms of efficiency with regard to our financial analytics example, we have gained twofold by applying NumPy: we have to write less code, and that code executes faster. The faster execution results from the fact that the NumPy library is to a large extent implemented in C and Fortran. This means that loops that are delegated to the NumPy level are executed at the speed of C code.

The simulation algorithm can be shortened even further by applying a mathematical "trick". Using the log version of the discretization scheme, we can avoid loops on the Python level completely. The respective simulation algorithm boils down to two lines of code (Listing 3).

Listing 3. Monte Carlo Simulation: Compact NumPy Code

#
# Simulating Geometric Brownian Motion with NumPy (log Version)
#
from numpy import *

# Parameters as before
# Simulating I paths with M time steps
S = S0 * exp(cumsum((r - 0.5 * sigma ** 2) * dt
        + sigma * sqrt(dt) * random.standard_normal((M + 1, I)), axis=0))
S[0] = S0

This code has almost the same execution speed as the previous NumPy version but is obviously even more compact. As a matter of software design and also taste, it might even be a little too concise when it comes to readability and maintenance.

No matter which approach is used, matplotlib helps with the convenient visualization of the simulation results. The following code plots the first 10 simulated paths from the NumPy array S and also the average over all one million paths at each point in time (Listing 4).

The result from this code is shown in Figure 1, with the thicker red line being the average over all paths.

Interactive Time Series Analytics
Time series, i.e. data labeled by date and/or time information, can be found in any business area and any scientific field. The processing of such data is therefore an important analytics discipline. In what follows, we want to analyze a pair of stocks, namely those of Apple Inc. and Google Inc. The library we use for this is pandas, which is especially designed to handle time series data efficiently. The following is an interactive session with IPython.

In:
import numpy as np
import pandas as pd
import pandas.io.data as web

Listing 4. Monte Carlo Simulation: Code to Generate Plot

#
# Plotting 10 Stock Price Paths + Average
#
import matplotlib.pyplot as plt
plt.plot(S[:, :10])
plt.plot(np.sum(S, axis=1) / I, 'r', lw=2.0)
plt.grid(True)
plt.title('Stock Price Paths')
plt.show()

Figure 1. 10 simulated stock price paths and the average over all paths (red line)
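
The mathematical "trick" behind the compact version in Listing 3 can be checked on a single path without NumPy. The following is only a sketch under stated assumptions: the helper names gbm_increments, gbm_path_stepwise and gbm_path_log are hypothetical, the parameters are those of the listings, and only the standard library is used. It illustrates that multiplying step by step and exponentiating a running sum of the log increments produce the same path.

```python
import math
import random

def gbm_increments(r, sigma, dt, steps, rng):
    """Draw the log increments of one geometric Brownian motion path."""
    drift = (r - 0.5 * sigma ** 2) * dt
    vol = sigma * math.sqrt(dt)
    return [drift + vol * rng.gauss(0.0, 1.0) for _ in range(steps)]

def gbm_path_stepwise(s0, increments):
    """Build the path by repeated multiplication, as in the loop version."""
    path = [s0]
    for inc in increments:
        path.append(path[-1] * math.exp(inc))
    return path

def gbm_path_log(s0, increments):
    """Build the same path from the cumulative sum of the log increments."""
    path = [s0]
    running = 0.0
    for inc in increments:
        running += inc
        path.append(s0 * math.exp(running))
    return path

# Parameters as in the listings; a fixed seed keeps the run reproducible.
S0, r, sigma, T, M = 100.0, 0.05, 0.2, 1.0, 50
incs = gbm_increments(r, sigma, T / M, M, random.Random(42))
stepwise = gbm_path_stepwise(S0, incs)
via_log = gbm_path_log(S0, incs)

# Number of path values and the largest deviation between both versions
print(len(stepwise), max(abs(a - b) for a, b in zip(stepwise, via_log)))
```

Both helper functions use exactly M increments and keep S0 as the first value. Listing 3 as printed draws M + 1 rows of increments and then overwrites the first row with S0, so each later row appears to carry one increment more than in the loop version; drawing M rows and prepending S0 would line the two versions up exactly.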

pandas can retrieve stock price information directly from http://finance.yahoo.com:

In:
GOOG = web.DataReader('GOOG', 'yahoo', start='7/28/2008')
AAPL = web.DataReader('AAPL', 'yahoo', start='7/28/2008')

The analysis was implemented on 3 August 2013, and the starting date is chosen to get about five years of stock price data. GOOG and AAPL are now pandas DataFrame objects that contain a time index and a number of different time series. Let's have a look at the five most recent records of the Google data:

In:
GOOG.tail()

Out:
Date          Open    High     Low   Close   Volume  Adj Close
2013-07-29  884.90  894.82  880.89  882.27  1891900     882.27
2013-07-30  885.46  895.61  880.87  890.92  1755600     890.92
2013-07-31  892.99  896.51  886.18  887.75  2072900     887.75
2013-08-01  895.00  904.55  895.00  904.22  2124500     904.22
2013-08-02  903.44  907.00  900.82  906.57  1713900     906.57

We are only interested in the "Close" data of both stocks, so we generate a third DataFrame, using the respective columns of the other DataFrame objects. We can do this by calling the DataFrame function and providing a dictionary specifying what we want from the two other objects. Both time series are normalized to start at 100, while the time index is automatically inferred from the input.

In:
DATA = pd.DataFrame({'AAPL' : AAPL['Close'] / AAPL['Close'].ix[0],
                     'GOOG' : GOOG['Close'] / GOOG['Close'].ix[0]}) * 100
DATA.head()

Out:
Date              AAPL        GOOG
2008-07-28  100.000000  100.000000
2008-07-29  101.735751  101.255449
2008-07-30  103.549223  101.169517
2008-07-31  102.946891   99.293679
2008-08-01  101.463731   98.059188

Calling the plot method of the DataFrame class generates a plot of the time series data.

In:
DATA.plot()

Figure 2 shows the resulting figure. Although the Apple stock price recently decreased sharply, it nevertheless outperformed Google over this particular time period.

Figure 2. Apple and Google stock prices from 28 July 2008 to 2 August 2013; both time series normalized to start at 100

Figure 3. Scatter plot of Google and Apple stock price returns from 28 July 2008 to 2 August 2013; red line is the OLS regression result with y = 0.0005 + 0.67 x

It is a stylized fact that prices of technology stocks are highly positively correlated. This means, roughly speaking, that they tend to perform in tandem: when the price of one stock rises (falls), the other stock price is likely to rise (fall) as well. To analyze whether this is the case with Apple and Google stocks, we first add log return columns to our DataFrame.

In:
DATA['AR'] = np.log(DATA['AAPL'] / DATA['AAPL'].shift(1))
DATA['GR'] = np.log(DATA['GOOG'] / DATA['GOOG'].shift(1))
DATA.tail()

Out:


Date              AAPL        GOOG        AR        GR
2013-07-29  290.019430  184.915744  0.015302 -0.003485
2013-07-30  293.601036  186.728706  0.012274  0.009757
2013-07-31  293.089378  186.064302 -0.001744 -0.003564
2013-08-01  295.777202  189.516264  0.009129  0.018383
2013-08-02  299.572539  190.008803  0.012750  0.002596

We next want to implement an ordinary least squares (OLS) regression analysis (Listing 5).

Listing 5. Implementing an OLS analysis

In:
model = pd.ols(y=DATA['AR'], x=DATA['GR'])
model

Out:
-------------------------Summary of Regression Analysis-------------------------

Formula: Y ~ <x> + <intercept>

Number of Observations: 1263
Number of Degrees of Freedom: 2

R-squared: 0.3578
Adj R-squared: 0.3573

Rmse: 0.0179

F-stat (1, 1261): 702.6634, p-value: 0.0000

Degrees of Freedom: model 1, resid 1261

-----------------------Summary of Estimated Coefficients------------------------
Variable       Coef    Std Err     t-stat    p-value    CI 2.5%   CI 97.5%
--------------------------------------------------------------------------------
x            0.6715     0.0253      26.51     0.0000     0.6218     0.7211
intercept    0.0005     0.0005       1.05     0.2959    -0.0005     0.0015
---------------------------------End of Summary---------------------------------

The regression confirms a clear positive relationship between the two return series: the estimated slope is about 0.67, and the R-squared of 0.36 corresponds to a correlation of roughly 0.6. This is readily illustrated by a scatter plot of the returns and the resulting linear regression line (Listing 6). Figure 3 shows the resulting output of this code.

Listing 6. Scatter plot of the returns and the resulting linear regression line

In:
import matplotlib.pyplot as plt
plt.plot(DATA['GR'], DATA['AR'], 'b.')
x = np.linspace(plt.axis()[0], plt.axis()[1] + 0.01)
plt.plot(x, model.beta[1] + model.beta[0] * x, 'r', lw=2)
plt.grid(True); plt.axis('tight')
plt.xlabel('Google Stock Returns'); plt.ylabel('Apple Stock Returns')

All in all, we need about 10 lines of code to retrieve five years of stock price data for two stocks, to plot this data, to calculate and add the daily log returns for both stocks and to conduct a least squares regression. Some additional lines of code yield a custom scatter plot of the return data plus the linear regression line. This illustrates that Python in combination with pandas is highly efficient when it comes to interactive financial analytics. In addition, the high-level programming model reduces the technical skills an analyst needs to a minimum. As a rule of thumb, one can say that every analytical question and/or analytics step can be translated to one or two lines of Python/pandas code.

Performance and Memory Issues
Performance and memory management are important issues for data analytics. On the one hand, since Python is an interpreted language, just-in-time compiling can provide a means for notable speed-ups. On the other hand, today's common data sets often exceed the memory capacity generally available at single computing nodes/machines. A solution to this might be to use out-of-memory approaches. In addition, depending on the typical analytics tasks to be implemented, one should consider carefully which hardware approach to follow.

Just-in-Time Compiling
A number of typical analytics algorithms demand a large number of iterations over data sets, which then results in (nested) loop structures. The Monte Carlo algorithm is an example of this. In that case, using NumPy and avoiding loops on the Python level yields a significant increase in execution speed. NumPy is really strong when it comes to fully populated matrices/arrays of rectangular form. However, not all algorithms can be beneficially cast into such a structural set-up.
We illustrate the use of the just-in-time compiler Numba (http://numba.pydata.org) to speed up pure Python code through an interactive IPython session.
The following is an example function with a nested loop structure where the inner loop increases in multiplicative fashion with the outer loop.

In:
import math
def f(n):
    iter = 0.0
    for i in range(n):
        for j in range(n * i):
            iter += math.sin(math.pi / 2)  # math.pi, since a bare pi is not defined
    return int(iter)

It returns the number of iterations, with the counting being made a bit more compute intensive than usual. Let's measure execution speed for this function by using the IPython magic function %time.

In:
n = 400
%time f(n)

Out:
CPU times: user 1min 16s, sys: 0 ns, total: 1min 16s
Wall time: 1min 16s
31920000

32 million loops take about 76 seconds to execute. Let's see what we can get from just-in-time compiling with Numba.

In:
import numba as nb
f_nb = nb.autojit(f)

Two lines of code suffice to compile the pure Python function into a Python-callable function compiled to machine code.

In:
n = 400
%time f_nb(n)

Out:
CPU times: user 41 ms, sys: 0 ns, total: 41 ms
Wall time: 40.2 ms
31920000L

This time, the same number of loops takes only 40 milliseconds to execute, a speed-up of almost 1,900 times. The remarkable aspects are that this speed-up is reached by only two additional lines of code and that no changes to the Python function are necessary.
Although this algorithm could in principle be implemented by using standard NumPy arrays, the array would have to be of shape 16,000 x 16,000 or approximately 2 GB in size. In addition, due to the very nature of the nested loop there would not be much potential to vectorize it. Moreover, operating with a higher n might lead to a memory demand that is too high.

In:
n = 1500
%time f_nb(n)

Out:
CPU times: user 2.13 s, sys: 0 ns, total: 2.13 s
Wall time: 2.13 s
1686375000L

For n = 1,500 the algorithm loops more than 1.6 billion times, with the last inner loop looping 1,500 x 1,499 = 2,248,500 times. With this parametrization, the typical NumPy approach is not applicable anymore. However, the Numba-compiled function does the job in a little more than 2 seconds.
In summary, we can say the following:

• code: two lines of code suffice to generate a compiled version of a loop-heavy pure Python algorithm
• speed: the Numba-compiled function executes about 1,900 times faster than pure Python
• memory: Numba preserves the memory efficiency of the algorithm since it only needs to store a single floating point number, and not a large array of floats

Out-of-Memory Operations
Just-in-time compiling obviously helps to implement custom algorithms that are fast and memory efficient. However, there are in general data sets that exceed available memory, like large arrays which might grow over time, and on which one has to implement numerical operations resulting in output that again might exceed available memory.
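
The basic idea behind out-of-memory processing can be sketched with the standard library alone: read the input from disk in fixed-size chunks, apply the numerical operation to each chunk, and write the results straight back to disk, so that neither input nor output is ever held in memory as a whole. This is an illustration of the principle only and not how any particular library works internally; the function name transform_on_disk, the file names, the raw binary file format, the chunk size and the element-wise expression are all assumptions made for this example.

```python
import math
import os
import tempfile
from array import array

def transform_on_disk(src_path, dst_path, func, chunk_items=1024):
    """Apply func element-wise to a file of raw float64 values, chunk by chunk."""
    item_size = array('d').itemsize
    total = os.path.getsize(src_path) // item_size
    done = 0
    with open(src_path, 'rb') as src, open(dst_path, 'wb') as dst:
        while done < total:
            count = min(chunk_items, total - done)
            chunk = array('d')
            chunk.fromfile(src, count)  # only this chunk is held in memory
            array('d', (func(x) for x in chunk)).tofile(dst)
            done += count
    return total

# Small demonstration with 10,000 values; real use cases would be far larger.
workdir = tempfile.mkdtemp()
src_file = os.path.join(workdir, 'input.bin')
dst_file = os.path.join(workdir, 'output.bin')
values = array('d', (0.001 * i - 5.0 for i in range(10000)))
with open(src_file, 'wb') as fh:
    values.tofile(fh)

processed = transform_on_disk(src_file, dst_file,
                              lambda x: 3 * math.sin(x) + abs(x) ** 0.5)

# Read the result back only to inspect it in this small example.
result = array('d')
with open(dst_file, 'rb') as fh:
    result.fromfile(fh, processed)
print(processed, os.path.getsize(dst_file))
```

The chunk size is the knob that trades memory for the number of read and write calls: a larger chunk means fewer I/O operations but a larger peak memory footprint.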


The library PyTables, which is based on the HDF5 standard (http://www.hdfgroup.org/HDF5/), offers a number of routes to implement out-of-memory calculations. Suppose you have a computing node with 512 MB of RAM, like with a free account of Wakari. Assume further that you have an array called ear which is of size 700 MB or larger. On this array, you might want to calculate the Python expression

3 * sin(ear) + abs(ear) ** 0.5

Using pure NumPy would lead to four temporary arrays of the size of ear and to an additional result array of the same size. This is anything but memory efficient. The library numexpr (https://code.google.com/p/numexpr/) resolves this problem by optimizing, parallelizing and compiling numerical expressions like these and by avoiding temporary arrays, which leads to significant speed-ups in general and much better use of memory. However, in this case it does not solve the problem since even the input array does not fit into the memory.
PyTables offers a solution through the Expr module, which is similar in spirit to numexpr but works with disk-based arrays. Let's have a look at a respective IPython session:

In:
import numpy as np
import tables as tb
h5 = tb.openFile('data.h5', 'w')

This opens a PyTables/HDF5 database where we can store our example data.

In:
n = 600
ear = h5.createEArray(h5.root, 'ear', atom=tb.Float64Atom(), shape=(0, n))

This creates a disk-based array with name ear that is expandable in the first dimension and has a fixed width of 600 in the second dimension.

In:
rand = np.random.standard_normal((n, n))
for i in range(250):
    ear.append(rand)
ear.flush()

This populates the disk-based array with (pseudo-)random numbers. We do it via looping to generate an array which is larger than the memory size.

In:
ear

Out:
/ear (EArray(150000, 600)) ''
  atom := Float64Atom(shape=(), dflt=0.0)
  maindim := 0
  flavor := 'numpy'
  byteorder := 'little'
  chunkshape := (13, 600)

We can easily get the size of this array on disk by:

In:
ear.size_on_disk

Out:
720033600L

The array has a size of more than 700 MB. We need a disk-based results store for our numerical calculation since the result does not fit in the memory of 512 MB either.

In:
out = h5.createEArray(h5.root, 'out', atom=tb.Float64Atom(), shape=(0, n))

Now, we can use the Expr module to evaluate the numerical expression from above: Listing 7.

Listing 7. Expr module to evaluate the numerical expression

In:
expr = tb.Expr('3 * sin(ear) + abs(ear) ** 0.5')
expr.setOutput(out, append_mode=True)
%time expr.eval()

Out:
CPU times: user 2.29 s, sys: 1.51 s, total: 3.8 s
Wall time: 34.6 s

/out (EArray(150000, 600)) ''
  atom := Float64Atom(shape=(), dflt=0.0)
  maindim := 0
  flavor := 'numpy'
  byteorder := 'little'
  chunkshape := (13, 600)

This code calculates the expression and writes the result to the out array on disk. Doing all the calculations plus writing 700+ MB of output takes about 35 seconds in this case. This might not seem too fast, but it made possible a calculation which would otherwise have been impossible on the given hardware. Finally, you should close your database.

In:
h5.close()

The example illustrates that PyTables makes it possible to implement an array operation that would require at least 1.4 GB of RAM with NumPy and numexpr on a machine with only 512 MB of RAM.

Scaling-Out vs. Scaling-Up
Although the majority of today's business and research data analytics efforts are confronted with "big" data, single analytics tasks generally use data (sub-)sets that fall in the "mid" data category. A recent study concluded:
"Our measurements as well as other recent work shows that the majority of real-world analytic jobs process less than 100 GB of input, but popular infrastructures such as Hadoop/MapReduce were originally designed for petascale processing. We claim that a single "scale-up" server can process each of these jobs and do as well or better than a cluster in terms of performance, cost, power, and server density" (Raja Appuswamy et al. (2013): "Nobody Ever Got Fired for Buying a Cluster." Microsoft Research, Cambridge UK).
In terms of frequency, analytics tasks generally process no more than a couple of gigabytes of data. And this is a sweet spot for Python and its performance libraries like NumPy, pandas, PyTables, Numba, IOPro, etc.
Companies, research institutes and others involved in data analytics should therefore analyze first what specific tasks have to be accomplished in general and then decide on the hard-/software architecture in terms of

• scaling-out: cluster with many commodity nodes or
• scaling-up: single or few powerful servers with many CPU cores, possibly a GPU and large amounts of memory.

Two examples underpin this observation. First, the out-of-memory calculation of the numerical expression with PyTables takes 35 seconds on a standard node in the cloud. Using a different hardware set-up, like hybrid disk drives or SSD, can significantly improve I/O speed, which is the bottleneck for out-of-memory calculations. For example, the same operation takes only 9 seconds on a different machine with a hybrid disk drive.
Similarly, having enough RAM available, allowing for the in-memory evaluation of the same numerical expression, saves even more time. To this end, we read the complete disk-based array into the memory of a machine with enough memory and then implement the calculation with NumPy and numexpr.

In:
h5 = tb.openFile('data.h5', 'r')
arr = h5.root.ear.read()
h5.close()

Now, the whole data set is in the memory and can be processed there.

In:
import numexpr as ne
%time res = ne.evaluate('3 * sin(arr) + abs(arr) ** 0.5')

Out:
CPU times: user 6.37 s, sys: 264 ms, total: 6.64 s
Wall time: 881 ms

In terms of hardware, the following components generally help improve performance:

• storage: better storage hardware, like hybrid drives or SSD, can improve disk-based I/O operations significantly; in the example, by a factor of about 4
• memory: larger memory allows more analytics tasks to be implemented in-memory, avoiding generally slower disk-based I/O operations completely (apart from maybe reading the input data from disk)
• CPU: using multi-core CPUs allows for the parallelization of such calculations; in the example case, numexpr used eight threads of a four core CPU to parallelize the execution of the code; in-memory, parallel execution then leads to a speed-up of about 40 times relative to the original out-of-memory, disk-based calculation (881 milliseconds vs. 34.6 seconds).

The Future of Python-based Analytics
Python has evolved from a high-level scripting language to an environment for efficient and high performing data and financial analytics. Python, in combination with such libraries as pandas or Numba, has the potential to revolutionize analytics as we know it today. At our company Continuum Analytics, the vision for Python-based data analytics is the following:
"To revolutionize data analytics and visualization by moving high-level Python code and domain expertise closer to data. This vision rests on four pillars:

• simplicity: advanced, powerful analytics, accessible to domain experts and business users via a simplified programming paradigm
• interactivity: interactive analysis and visualization of massive data sets
• collaboration: collaborative, shareable analysis (data, code, results, graphics)
• scale: out-of-core, distributed data processing"

Continuum Analytics is actively involved in a number of open source and other Python-related projects, a small selection of which have been introduced in this article, that aim at realizing this vision. Among them are:


• Anaconda: this open source Python distribution contains the most important Python libraries and tools needed (like NumPy, SciPy, PyTables, pandas, IPython) to set up a consistent Python analytics environment on desktops/notebooks and/or servers/cloud nodes.
• Wakari: this Web-based solution allows the deployment of Python and e.g. the use of IPython Notebooks via public or private clouds (currently, the cloud version of Wakari is operated by Continuum Analytics on Amazon EC2); in that way, Python can be deployed across an organization by using standard browsers only, thereby avoiding the need for costly software distribution and maintenance; in addition, Wakari offers a number of functions to easily share both analytics code and results within an organization or with the general public.
• Blaze: this open source library is designed to be a combination of the high performing array library NumPy and the fast hierarchical database PyTables; Blaze allows the use of very large arrays which are potentially disk-based and distributed among a number of computing nodes; for example, multiplications of two arrays, each of size 400 GB, become possible with this approach.
• NumbaPro: this commercial just-in-time compiler relies on LLVM (Low Level Virtual Machine), a compiler infrastructure written in C++; for example, as shown in this article, loop-heavy algorithms can experience speed-ups of up to 1,900 times by being compiled with Numba; the Pro version adds capabilities to generate parallel, vectorized code for both CPUs and GPUs.
• IOPro: this commercial library provides optimized SQL, NoSQL and CSV interfaces for NumPy, SciPy and PyTables, leading to high performance I/O operations with Python.
• Bokeh: visualization of large data sets generally is a difficult and/or slow task, in particular when the data set has to be transferred via Web or intranet; the open source library Bokeh addresses this problem and allows the browser-based, interactive visualization of large data sets.

In the end, Python in combination with these and similar libraries and tools will make possible data infrastructures that are like large, interconnected Data Webs, in the same way as technologies like URL, HTTP or HTML made possible the World Wide Web. Using solutions like Wakari, businesses and other institutions can then implement data analytics processes on such a Data Web that are characterized by agility, interactivity and collaboration.
In the end, decision makers will be able to process, analyze and visualize big data interactively and in real-time, contributing to the bottom line of those companies that are able to systematically deploy these new Python-based technologies and approaches.

Dr. Yves J. Hilpisch
Managing Director Europe of Continuum Analytics, Inc., Austin, TX, USA. Lecturer Mathematical Finance at Saarland University, Saarbruecken, Germany. Ph.D. in Mathematical Finance. http://www.hilpisch.com, [email protected], http://www.twitter.com/dyjh.

Test-Driven Development With Python

Software development is easier and more accessible now than it ever has been. Unfortunately, rapid development speeds offered by modern programming languages make it easy for us as programmers to overlook the possible error conditions in our code and move on to other parts of a project. Automated tests can provide us with a level of certainty that our code really does handle various situations the way we expect it to, and these tests can save hundreds upon thousands of man-hours over the course of a project's development lifecycle.

Automated testing is a broad topic; there are many different types of automated tests that one might write and use. In this article we'll be concentrating on unit testing and, to some degree, integration testing using Python 3 and a methodology known as "test-driven development" (referred to as "TDD" from this point forward). Using TDD, you will learn how to spend more time coding than you spend manually testing your code.
To get the most out of this article, you should have a fair understanding of common programming concepts. For starters, you should be familiar with variables, functions, classes, methods, and Python's import mechanism. We will be using some neat features in Python, such as context managers, decorators, and monkey-patching. You don't necessarily need to understand the intricacies of these features to use them for testing.

Josh VanderLinden is a life-long technology enthusiast, who started programming at the age of ten. Josh has worked primarily in web development, but he also has experience with network monitoring and systems administration. He has recently gained a deep appreciation for automated testing and TDD.

The main idea behind TDD is, as the name implies, that your tests drive your development efforts. When presented with a new requirement or goal, TDD would have you run through a series of steps:

• add a new (failing) test,
• run your entire test suite and see the new test fail,
• write code to satisfy the new test,
• run your entire test suite again and see all tests pass,
• refactor your code,
• repeat.

There are several advantages to writing tests for your code before you write the actual code. One of the most valuable is that this process forces you to really consider what you want the program to do before you start deciding how it will do so. This can help prepare you for unforeseen difficulties integrating your code with existing code or systems. You could also unearth possible conflicts between requirements that are delivered to you to fulfill.
Another incredibly appealing advantage of TDD is that you gain a higher level of confidence in the code that you've written. You can quickly detect bugs that new development efforts might introduce when combined with older, historically stable code. This high level of confidence is great not only for you as a developer, but also for your supervisors and clients.
The best way to learn anything like this is to do it yourself. We're going to build a simple game of Pig, relying on TDD to gain a high level of confidence that our game


will do what we want it to long before we actually play it. Some of the basic tasks our game should be able to handle include the following:

• allow players to join,
• roll a six-sided die,
• track points for each player,
• prompt players for input,
• end the game when a player reaches 100 points.

We'll tackle each one of those tasks, using the TDD process outlined above. Python's built-in unittest library makes it easy to describe our expectations using assertions. There are many different types of assertions available in the standard library, most of which are pretty self-explanatory given a mild understanding of Python. For the rest, we have Python's wonderful documentation [1]. We can assert that values are equal, an object is an instance of a given class, a string matches a regular expression, a specific exception is raised under certain conditions, and much more.
With unittest, we can group a series of related tests into subclasses of unittest.TestCase. Within those subclasses, we can add a series of methods whose names begin with test. These test methods should be designed to work independently of the other test methods. Any dependency between one test method and another will be brittle and introduces the potential to cause a chain reaction of failed tests when running your test suite in its entirety.
So let's take a look at the structure for our project and get into the code to see all of this in action.

pig/
    game.py
    test_game.py

Both files are currently empty. To get started, let's add an empty test case to test_game.py to prepare for our game of Pig: Listing 1.

Listing 1. An empty TestCase subclass

from unittest import TestCase


class GameTest(TestCase):

    pass

The Game Of Pig
The rules of Pig are simple: a player rolls a single die. If they roll anything other than one, they add that value to their score for that turn. If they roll a one, any points they've accumulated for that turn are lost. A player's turn is over when they roll a one or they decide to hold. When a player holds before rolling a one, they add their points for that turn to their total points. The first player to reach 100 points wins the game.
For example, if player A rolls a three, player A may choose to roll again or hold. If player A decides to roll again and they roll another three, their total score for the turn is six. If player A rolls again and rolls a one, their score for the turn is zero and it becomes player B's turn. Player B may roll a six and decide to roll again. If player B rolls another six on the second roll and decides to hold, player B will add 12 points to their total score. It then becomes the next player's turn.
We'll design our game of Pig as its own class, which should make it easier to reuse the game logic elsewhere in the future.

Joining The Game
Before anyone can play a game, they have to be able to join it, correct? We need a test to make sure that works: Listing 2.

Listing 2. Our first test

from unittest import TestCase

import game


class GameTest(TestCase):

    def test_join(self):
        """Players may join a game of Pig"""

        pig = game.Pig('PlayerA', 'PlayerB', 'PlayerC')
        self.assertEqual(pig.get_players(), ('PlayerA', 'PlayerB', 'PlayerC'))

Listing 3. Running our first test

E
====================================================
ERROR: test_join (test_game.GameTest)
Players may join a game of Pig
----------------------------------------------------
Traceback (most recent call last):
  File "./test_game.py", line 11, in test_join
    pig = game.Pig('PlayerA', 'PlayerB', 'PlayerC')
AttributeError: 'module' object has no attribute 'Pig'

----------------------------------------------------
Ran 1 test in 0.000s

FAILED (errors=1)
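
The assertion types mentioned earlier (equality, instance checks, regular-expression matches, expected exceptions) can be sketched in one small test case. This is a standalone illustration, separate from the Pig project: the class AssertionExamples and its method names are made up for this example, and the suite is run programmatically here only to keep the example self-contained, whereas the article runs tests from the command line.

```python
import unittest


class AssertionExamples(unittest.TestCase):

    def test_equal(self):
        """Two values are equal"""
        self.assertEqual(2 + 2, 4)

    def test_instance(self):
        """An object is an instance of a given class"""
        self.assertIsInstance((1, 2), tuple)

    def test_regex(self):
        """A string matches a regular expression"""
        self.assertRegex('PlayerA', r'^Player[A-Z]$')

    def test_raises(self):
        """A specific exception is raised inside the with block"""
        with self.assertRaises(ZeroDivisionError):
            1 / 0


# Collect the four test methods and run them without command-line handling.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(AssertionExamples)
outcome = unittest.TextTestRunner(verbosity=0).run(suite)
print(outcome.testsRun, outcome.wasSuccessful())
```

Each test method stands on its own and shares no state with the others, which matches the independence between test methods recommended above.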

We simply instantiate a new Pig game with some player names. Next, we check to see if we're able to get an expected value out of the game. As mentioned earlier, we can describe our expectations using assertions: we assert that certain conditions are met. In this case, we're asserting equality with TestCase.assertEqual. We want the players who start a game of Pig to equal the same players returned by Pig.get_players. The TDD steps suggest that we should now run our test suite and see what happens.
To do that, run the following command from your project directory:

python -m unittest

It should detect that the test_game.py file has a unittest.TestCase subclass in it and automatically run any tests within the file. Your output should be similar to this: Listing 3.
We had an error! The E on the first line of output indicates that a test method had some sort of Python error. This is obviously a failed test, but there's a little more to it than just our assertion failing. Looking at the output a bit more closely, you'll notice that it's telling us that our game module has no attribute Pig. This means that our game.py file doesn't have the class that we tried to instantiate for the game of Pig.
It is very easy to get errors like this when you practice TDD. Not to worry; all we need to do at this point is stub out the class in game.py and run our test suite again. A stub is just a function, class, or method definition that does nothing other than create a name within the scope of the program (Listing 4).

Listing 4. Stubbing code that we plan to test

class Pig:

    def __init__(self, *players):
        pass

    def get_players(self):
        """Return a tuple of all players"""
        pass

When we run our test suite again, the output should be a bit different: Listing 5.

Listing 5. The test fails for the right reason

F
======================================================
FAIL: test_join (test_game.GameTest)
Players may join a game of Pig
------------------------------------------------------
Traceback (most recent call last):
  File "./test_game.py", line 12, in test_join
    self.assertEqual(pig.get_players(), ('PlayerA', 'PlayerB', 'PlayerC'))
AssertionError: None != ('PlayerA', 'PlayerB', 'PlayerC')

------------------------------------------------------
Ran 1 test in 0.000s

FAILED (failures=1)

Much better. Now we see F on the first line of output, which is what we want at this point. This indicates that we have a failing test method, or that one of the assertions within the test method did not pass. Inspecting the additional output, we see that we have an AssertionError. The return value of our Pig.get_players method is currently None, but we expect the return value to be a tuple

Listing 6. Implementing code to satisfy the test

class Pig:

    def __init__(self, *players):
        self.players = players

    def get_players(self):
        """Returns a tuple of all players"""
        return self.players

Listing 7. The test is satisfied

.
------------------------------------------------------
Ran 1 test in 0.000s

OK

Listing 8. Test for the roll of a six-sided die

    def test_roll(self):
        """A roll of the die results in an integer between 1 and 6"""

        pig = game.Pig('PlayerA', 'PlayerB')

        for i in range(500):
            r = pig.roll()
            self.assertIsInstance(r, int)
            self.assertTrue(1 <= r <= 6)


with player names. Now, following the TDD process, we need to satisfy this test. No more, no less (Listing 6). And we need to verify that we’ve satisfied the test: Listing 7.

Excellent! The dot (.) on the first line of output indicates that our test method passed. The return value of Pig.get_players is exactly what we want it to be. We now have a high level of confidence that players may join a game of Pig, and we will quickly know if that stops working at some point in the future. There’s nothing more to do with this particular part of the game right now. We’ve satisfied our basic requirement. Let’s move on to another part of the game.

Rolling The Die
The next critical piece of our game has to do with how players earn points. The game calls for a single six-sided die. We want to be confident that a player will always roll a value between one and six. Here’s a possible test for that requirement (Listing 8).

Since we’re relying on “random” numbers, we test the result of the roll method repeatedly. Our assertions all happen within the loop because it’s important that we always get an integer value from a roll and that the value is within our range of one to six. It’s not bulletproof, but it should give us a fair level of confidence anyway. Don’t forget to stub out the new Pig.roll method so that our test fails rather than raising an error (Listing 9 and Listing 10).

Let’s check the output. There is a new F on the first line of output. For each test method in our test suite, we should expect to see some indication that the respective methods are executed. So far we’ve seen three common indicators:

• E, which indicates that a test method ran but had a Python error,
• F, which indicates that a test method ran but one of our assertions within that method failed,
• ., which indicates that a test method ran and that all assertions passed successfully.

There are other indicators, but these are the three we’ll deal with for the time being. The next TDD step is to satisfy the test we’ve just written. We can use Python’s built-in random library to make short work of this new Pig.roll method (Listing 11 and Listing 12).

Checking Scores
Players might want to check their score mid-game, so let’s add a test to make sure that’s possible. Again, don’t

Listing 9. Stub of our new Pig.roll method

def roll(self):
    """Return a number between 1 and 6"""
    pass

Listing 10. Die rolling test fails

.F
======================================================
FAIL: test_roll (test_game.GameTest)
A roll of the die results in an integer between 1 and 6
------------------------------------------------------
Traceback (most recent call last):
  File "./test_game.py", line 21, in test_roll
    self.assertIsInstance(r, int)
AssertionError: None is not an instance of <class 'int'>

------------------------------------------------------
Ran 2 tests in 0.001s

FAILED (failures=1)

Listing 11. Implementing the roll of a die

import random

def roll(self):
    """Return a number between 1 and 6"""
    return random.randint(1, 6)

Listing 12. Implementation meets our expectations

..
------------------------------------------------------
Ran 2 tests in 0.003s

OK

Listing 13. Test that each player has a default score

def test_scores(self):
    """Player scores can be retrieved"""

    pig = game.Pig('PlayerA', 'PlayerB', 'PlayerC')
    self.assertEqual(
        pig.get_score(),
        {
            'PlayerA': 0,
            'PlayerB': 0,
            'PlayerC': 0
        }
    )

forget to stub out the new Pig.get_score method (Listing 13 and Listing 14).

Note that ordering in dictionaries is not guaranteed in the Python version used here (dictionaries only preserve insertion order from Python 3.7 onward), so your keys might not be printed out in the same order that you typed them in your code. And now to satisfy the test (Listing 15 and Listing 16).

The test has been satisfied. We can move on to another piece of code now if we’d like, but let’s remember the fifth step from our TDD process. Let’s try refactoring some code that we already know is working and make sure our assertions still pass.

Python’s dictionary object has a neat little method called fromkeys that we can use to create a new dictionary from a list of keys. Additionally, we can use this method to set the default value for all of the keys that we specify. Since we’ve already got a tuple of player names, we can pass that directly into the dict.fromkeys method (Listing 17 and Listing 18).

The fact that our test still passes illustrates a few very important concepts to understand about valuable automated testing. The most useful unit tests will treat the production code as a “black box”. We don’t want to test implementation. Rather, we want to test the output of a unit of code given known input.

Testing the internal implementation of a function or method is asking for trouble. In our case, we found a way to leverage functionality built into Python to refactor our code. The end result is the same. Had we tested the specific low-level implementation of our Pig.get_score definition, the test could have easily broken after refactoring despite the code still ultimately doing what we want.

The idea of validating the output of a unit of code when given known input encourages another valuable practice. It stimulates the desire to design our code with more single-purpose functions and methods. It also discourages the inclusion of side effects.

In this context, side effects can mean that we’re changing internal variables or state which could influence the behavior of other units of code. If we only deal with input values and return values, it’s very easy to reason about the behavior of our code. Side effects are not always bad, but they can introduce some interesting conditions at runtime that are difficult to reproduce for automated testing.

It’s much easier to confidently test smaller, single-purpose units of code than it is to test massive blocks of code. We can achieve more complex behavior by chaining together the smaller units of code, and we can have a high level of confidence in these compositions because we know the underlying units meet our expectations.
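The refactoring discussed above can also be checked in isolation. The following is only a minimal sketch: scores_with_loop and scores_with_fromkeys are hypothetical helper names that do not exist in the game module, and they merely mirror the two __init__ variants so the results can be compared from the outside, the way a black-box test would.

```python
def scores_with_loop(players):
    """Build the default scores with an explicit loop."""
    scores = {}
    for player in players:
        scores[player] = 0
    return scores


def scores_with_fromkeys(players):
    """Build the default scores with dict.fromkeys."""
    return dict.fromkeys(players, 0)


players = ('PlayerA', 'PlayerB', 'PlayerC')

# A black-box check only compares results, so it holds for both versions.
expected = {'PlayerA': 0, 'PlayerB': 0, 'PlayerC': 0}
assert scores_with_loop(players) == expected
assert scores_with_fromkeys(players) == expected
```

One caveat worth keeping in mind: dict.fromkeys assigns the same object to every key. That is harmless for an immutable default such as 0, but a mutable default such as a list would be shared by all players.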

Listing 14. Default score is not implemented

..F
======================================================
FAIL: test_scores (test_game.GameTest)
Player scores can be retrieved
------------------------------------------------------
Traceback (most recent call last):
  File "./test_game.py", line 33, in test_scores
    'PlayerC': 0
AssertionError: None != {'PlayerB': 0, 'PlayerC': 0, 'PlayerA': 0}

------------------------------------------------------
Ran 3 tests in 0.004s

FAILED (failures=1)

Listing 15. First implementation for default scores

def __init__(self, *players):
    self.players = players

    self.scores = {}
    for player in self.players:
        self.scores[player] = 0

def get_score(self):
    """Return the score for all players"""
    return self.scores

Listing 16. Checking our default scores implementation

...
------------------------------------------------------
Ran 3 tests in 0.004s

OK

Listing 17. Another way to handle default scores

def __init__(self, *players):
    self.players = players
    self.scores = dict.fromkeys(self.players, 0)

Listing 18. The new implementation is acceptable

...
------------------------------------------------------
Ran 3 tests in 0.003s

OK


Prompt Players For Input
Now we’ll get into something more interesting by testing user input. This brings up a rather large stumbling block that many encounter when learning how to test their code: external systems. External systems may include databases, web services, local filesystems, and countless others.

During testing, we don’t want to have to rely on our test computer, for example, being on a network, connected to the Internet, having routes to a database server, or making sure that a database server itself is online. Depending on all of those external systems being online is brittle and error-prone for automated testing.

In our case, user input can be considered an external system. We don’t control the values given to us by the user, but we want to be able to deal with those values. Prompting the user for input each and every time we launch our test suite would adversely affect our tests in multiple ways. For example, the tests would suddenly take much longer, and the user would have to enter the same values each time they run the tests.

We can leverage a concept called “mocking” to keep all sorts of external systems from influencing our tests in bad ways. We can mock, or fake, the user input using known values, which will keep our tests running quickly and consistently. We’ll use a fabulous library that is built into Python (as of version 3.3) called mock, available as unittest.mock, for this.

Let’s begin testing user input by testing that we can prompt for player names. We’ll implement this one as a standalone function that is separate from the Pig class. First of all, we need to modify our import line in test_game.py so we can use the mock library (Listings 19-21).

The mock library is extremely powerful, but it can take a while to get used to. Here we’re using it to mock the return value of multiple calls to Python’s built-in input function through mock’s side_effect feature. When you specify a list as the side effect of a mocked object, you’re specifying the return value for each call to that mocked object. For each call to input, the first value will be removed from the list and used as the return value of the call.

In our code the first call to input will consume and return 'A', leaving ['M', 'Z', ''] as the remaining return values. We add an additional empty value as a side effect to signal when we’re done entering player names. And we don’t expect the empty value to appear as a player name.

Note that if you supply fewer return values in the side_effect list than you have calls to the mocked object, the code will raise a StopIteration exception. Say, for example, that you set the side_effect to [1] but that you called input twice in the code. The first time you call input, you’d get the 1 back. The second time you call input, it would raise the exception, indicating that our side_effect list has nothing more to return.

We’re able to use this mocked input function through what’s called a context manager. That is the block that begins with the keyword with. A context manager basically handles the setup and teardown for the block of code it contains. In this example, the mock.patch context manager will handle the temporary patching of the built-in input function while we run game.get_player_names(). After the code in the with block has been executed, the context manager will roll back the input function to its

Listing 19. Importing the mock library

from unittest import TestCase, mock

Listing 20. Introducing mocked objects

def test_get_player_names(self):
    """Players can enter their names"""

    fake_input = mock.Mock(side_effect=['A', 'M', 'Z', ''])

    with mock.patch('builtins.input', fake_input):
        names = game.get_player_names()

    self.assertEqual(names, ['A', 'M', 'Z'])

Listing 21. Stub function that we will test

def get_player_names():
    """Prompt for player names"""
    pass

Listing 22. The new test fails

F...
======================================================
FAIL: test_get_player_names (test_game.GameTest)
Players can enter their names
------------------------------------------------------
Traceback (most recent call last):
  File "./test_game.py", line 45, in test_get_player_names
    self.assertEqual(names, ['A', 'M', 'Z'])
AssertionError: None != ['A', 'M', 'Z']

------------------------------------------------------
Ran 4 tests in 0.004s

FAILED (failures=1)

original, built-in state. This is very important, particularly if the code in the with block raises some sort of error. Even in conditions such as these, the changes to the input function will be reverted, allowing other code that may depend on the original functionality of input (or whatever object we have mocked) to proceed as expected.

Let’s run the test suite to make sure our new test fails (Listing 22). Well that was easy! Here’s a possible way to satisfy this test: Listing 23 and Listing 24.

Would you look at that?! We’re able to test user input without slowing down our tests much at all!

Notice, however, that we have passed a parameter to the input function. This is the prompt that appears on the screen when the program asks for player names. Let’s say we want to make sure that it’s actually printing out what we expect it to print out (Listing 25).

This time we’re mocking the input function a bit differently. Instead of defining a new mock.Mock object explicitly, we’re letting the mock.patch context manager define one for us with certain side effects. When you use the context manager in this way, you’re able to obtain the implicitly-created mock object (a mock.MagicMock by default) using the as keyword. We have assigned the mocked input function to a variable called fake, and we simply call the same code as in our previous test.

After running that code, we check to see if our fake input function was called with certain arguments using the assert_has_calls method on our mock object. Notice that we aren’t checking the result of the get_player_names function here, since we’ve already done that in another test (Listing 26).

Perfect. It works as we expect it to. One thing to take away from this example is that there does not need to be a one-to-one ratio of test methods to actual pieces of code. Right now we’ve got two test methods for the very same get_player_names function. It is often good to have multiple test methods for a single unit of code if that code may behave differently under various conditions.

Also note that we didn’t exactly follow the TDD process for this last test. The code for which we wrote the test had already been implemented to satisfy an earlier test. It is acceptable to veer away from the TDD process, particularly if we want to validate assumptions that have been made along the way. When we implemented the original get_player_names function, we assumed that the prompt would look the way we wanted it to look. Our latest test simply proves that our assumptions were correct. And now we will be able to quickly detect if the prompt begins misbehaving at some point in the future.

To Hold or To Roll
Now it’s time to write a test for different branches of code for when a player chooses to hold or roll again. We want to make sure that our roll_or_hold method will only return roll or hold and that it won’t error out with invalid input (Listing 27).
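Before moving on, the mocking behavior used in the last two tests can be tried on its own. The sketch below assumes nothing from the game module: ask_twice is a hypothetical helper invented for illustration. It shows values being consumed from side_effect, the prompts being checked with assert_has_calls, the built-in input being restored after the with block, and StopIteration being raised once the values run out.

```python
import builtins
from unittest import mock


def ask_twice():
    """Hypothetical helper: call input() twice and return both answers."""
    return [input('First: '), input('Second: ')]


original_input = builtins.input

# Each call to the patched input() returns the next value from side_effect.
with mock.patch('builtins.input', side_effect=['A', 'B']) as fake:
    answers = ask_twice()

assert answers == ['A', 'B']
fake.assert_has_calls([mock.call('First: '), mock.call('Second: ')])

# Leaving the with block restores the real built-in function.
assert builtins.input is original_input

# With only one value available, the second call raises StopIteration.
exhausted = False
with mock.patch('builtins.input', side_effect=['only one']):
    try:
        ask_twice()
    except StopIteration:
        exhausted = True

assert exhausted
# The patch is undone even though the block raised an exception inside.
assert builtins.input is original_input
```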

Listing 23. Getting a list of player names from the user

def get_player_names():
    """Prompt for player names"""

    names = []

    while True:
        value = input("Player {}'s name: ".format(len(names) + 1))
        if not value:
            break

        names.append(value)

    return names

Listing 24. Our implementation meets expectations

....
------------------------------------------------------
Ran 4 tests in 0.004s

OK

Listing 25. Test that the correct prompt appears on screen

def test_get_player_names_stdout(self):
    """Check the prompts for player names"""

    with mock.patch('builtins.input', side_effect=['A', 'B', '']) as fake:
        game.get_player_names()

    fake.assert_has_calls([
        mock.call("Player 1's name: "),
        mock.call("Player 2's name: "),
        mock.call("Player 3's name: ")
    ])

Listing 26. All tests pass

.....
------------------------------------------------------
Ran 5 tests in 0.005s

OK


This example shows yet another option that we have for mocking objects. We’ve “decorated” the test_roll_or_hold method with @mock.patch('builtins.input'). When we use this option, we basically turn the entire contents of the method into the block within a context manager. The builtins.input function will be a mocked object throughout the entire method.

Also notice that the test method needs to accept an additional parameter, which we’ve called fake_input. When you mock objects with decorators in this way, your test methods must accept an additional parameter for each mocked object.

This time we’re expecting to prompt the player to see whether they want to roll again or hold to end their turn. We set the side_effect of our fake_input mock to include our expected values of r (roll) and h (hold) in both lower and upper case, along with some input that we don’t know how to use.

When we run the test suite with this new test (after stubbing out our roll_or_hold method), it should fail (Listing 28). Fantastic! Notice how I get excited when I see a failing test? It means that the TDD process is working. Eventually you will enjoy seeing failed tests as well. Trust me.

And to satisfy our new test, we could use something like this: Listing 29. Run the test suite (Listing 30). We know that our new code works. Even better than that, we know that we haven’t broken any existing functionality.

Refactoring Tests
Since we’re doing so much with user input, let’s take a few minutes to refactor our tests to use a common mock for the built-in input function before proceeding with our testing (Listing 31).

A lot has changed in our tests code-wise, but the behavior should be exactly the same as before. Let’s review the changes (Listing 32).

We have defined a global mock.Mock instance called INPUT. This will be the variable that we use in place of the various uses of mocked input. We are also using mock.patch as a class decorator now, which will allow all test methods within the class to access the mocked input function through our INPUT global.
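The difference between the two decorator styles can be seen side by side in a small, self-contained sketch. Everything named here (ask, SHARED_INPUT, and the two test classes) is hypothetical and stands in for the article's game code and INPUT global; only mock.patch and unittest are real library features.

```python
import unittest
from unittest import mock

SHARED_INPUT = mock.Mock()  # hypothetical name, plays the role of INPUT


def ask():
    """Hypothetical helper that reads one line from the user."""
    return input('> ')


class DecoratedMethod(unittest.TestCase):

    @mock.patch('builtins.input')
    def test_extra_argument(self, fake_input):
        # patch() created the mock and passed it in as an argument.
        fake_input.return_value = 'r'
        self.assertEqual(ask(), 'r')


@mock.patch('builtins.input', SHARED_INPUT)
class DecoratedClass(unittest.TestCase):

    def test_no_extra_argument(self):
        # We supplied the mock ourselves, so nothing extra is passed in.
        SHARED_INPUT.return_value = 'h'
        self.assertEqual(ask(), 'h')


# Run both test cases programmatically instead of from the command line.
loader = unittest.defaultTestLoader
suite = unittest.TestSuite()
suite.addTests(loader.loadTestsFromTestCase(DecoratedMethod))
suite.addTests(loader.loadTestsFromTestCase(DecoratedClass))
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

When patch is used as a class decorator, it wraps every method whose name starts with test, which is why the second class needs no per-method decorator.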

Listing 27. Player can choose to roll or hold

@mock.patch('builtins.input')
def test_roll_or_hold(self, fake_input):
    """Player can choose to roll or hold"""

    fake_input.side_effect = ['R', 'H', 'h', 'z', '12345', 'r']

    pig = game.Pig('PlayerA', 'PlayerB')

    self.assertEqual(pig.roll_or_hold(), 'roll')
    self.assertEqual(pig.roll_or_hold(), 'hold')
    self.assertEqual(pig.roll_or_hold(), 'hold')
    self.assertEqual(pig.roll_or_hold(), 'roll')

Listing 28. Test fails with stub

....F.
======================================================
FAIL: test_roll_or_hold (test_game.GameTest)
Player can choose to roll or hold
------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python3.3/unittest/mock.py", line 1087, in patched
    return func(*args, **keywargs)
  File "./test_game.py", line 67, in test_roll_or_hold
    self.assertEqual(pig.roll_or_hold(), 'roll')
AssertionError: None != 'roll'

------------------------------------------------------
Ran 6 tests in 0.007s

FAILED (failures=1)

Listing 29. Implementing the next action prompt

def roll_or_hold(self):
    """Return 'roll' or 'hold' based on user input"""

    action = ''
    while True:
        value = input('(R)oll or (H)old? ')
        if value.lower() == 'r':
            action = 'roll'
            break
        elif value.lower() == 'h':
            action = 'hold'
            break

    return action

Listing 30. All tests pass

......
------------------------------------------------------
Ran 6 tests in 0.006s

OK

This decorator is a bit different from the one we used earlier. Instead of allowing a mock object to be implicitly created for us, we’re specifying our own instance. The value in this solution is that you don’t have to modify the method signatures for each test method to accept the mocked input function. Instead, any test method that needs to access the mock may use the INPUT global (Listing 33).

We’ve added a setUp method to our class. This method name has a special meaning when used with Python’s unittest library. The setUp method will be executed before each and every test method within the class. There’s a similar special method called tearDown that is executed after each and every test method within the class.

These methods are useful for getting things into a state such that our tests will run successfully or cleaning up after our tests. We’re using the setUp method to reset our mocked input function. This means that the calls recorded during one test method are cleared from the mock before the next test starts. Note that reset_mock() does not clear side_effect by default, so each test method that reads input assigns its own INPUT.side_effect (Listing 34).

The test_get_player_names test method no longer defines its own mock object. The context manager is also not necessary anymore, since the entire method is effectively executed within a context manager because we’ve decorated the entire class. All we need to do is specify the side effects, or list of return values, for our mocked input function. The test_get_player_names_stdout test method has also been updated in a similar fashion (Listing 35).

Finally, our test_roll_or_hold test method no longer has its own decorator. Also note that the additional parameter to the method is no longer necessary.

Listing 31. Refactoring test code

from unittest import TestCase, mock

import game

INPUT = mock.Mock()


@mock.patch('builtins.input', INPUT)
class GameTest(TestCase):

    def setUp(self):
        INPUT.reset_mock()

    def test_join(self):
        """Players may join a game of Pig"""

        pig = game.Pig('PlayerA', 'PlayerB', 'PlayerC')
        self.assertEqual(pig.get_players(), ('PlayerA', 'PlayerB', 'PlayerC'))

    def test_roll(self):
        """A roll of the die results in an integer between 1 and 6"""

        pig = game.Pig('PlayerA', 'PlayerB')

        for i in range(500):
            r = pig.roll()
            self.assertIsInstance(r, int)
            self.assertTrue(1 <= r <= 6)

    def test_scores(self):
        """Player scores can be retrieved"""

        pig = game.Pig('PlayerA', 'PlayerB', 'PlayerC')
        self.assertEqual(
            pig.get_score(),
            {
                'PlayerA': 0,
                'PlayerB': 0,
                'PlayerC': 0
            }
        )

    def test_get_player_names(self):
        """Players can enter their names"""

        INPUT.side_effect = ['A', 'M', 'Z', '']

        names = game.get_player_names()

        self.assertEqual(names, ['A', 'M', 'Z'])

    def test_get_player_names_stdout(self):
        """Check the prompts for player names"""

        INPUT.side_effect = ['A', 'B', '']

        game.get_player_names()

        INPUT.assert_has_calls([
            mock.call("Player 1's name: "),
            mock.call("Player 2's name: "),
            mock.call("Player 3's name: ")
        ])

    def test_roll_or_hold(self):
        """Player can choose to roll or hold"""

        INPUT.side_effect = ['R', 'H', 'h', 'z', '12345', 'r']

        pig = game.Pig('PlayerA', 'PlayerB')

        self.assertEqual(pig.roll_or_hold(), 'roll')
        self.assertEqual(pig.roll_or_hold(), 'hold')
        self.assertEqual(pig.roll_or_hold(), 'hold')
        self.assertEqual(pig.roll_or_hold(), 'roll')

Listing 32. Global mock.Mock object and class decoration

INPUT = mock.Mock()


@mock.patch('builtins.input', INPUT)
class GameTest(TestCase):

Listing 33. Reset global mocks before each test method

def setUp(self):
    INPUT.reset_mock()
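What reset_mock does and does not clear is easy to verify directly. The snippet below is a minimal sketch using a throwaway mock named fake (a hypothetical name, not the article's INPUT global). It relies only on documented unittest.mock behavior: by default reset_mock clears the recorded calls but leaves side_effect alone.

```python
from unittest import mock

fake = mock.Mock(side_effect=['first', 'second'])

assert fake('prompt') == 'first'
assert fake.call_count == 1

fake.reset_mock()

# The recorded calls are gone...
assert fake.call_count == 0
assert fake.call_args_list == []

# ...but the side_effect values carry on where they stopped.
assert fake('prompt') == 'second'

# Assigning a new list, as each test method does, replaces any leftovers.
fake.side_effect = ['fresh']
assert fake('prompt') == 'fresh'
```

This is why every test method that reads input sets INPUT.side_effect itself rather than counting on setUp to clear it. On Python 3.6 and later, reset_mock(side_effect=True) clears the side effect as well.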

Listing 34. Updating existing test methods to use global mock

def test_get_player_names(self):
    """Players can enter their names"""

    INPUT.side_effect = ['A', 'M', 'Z', '']

    names = game.get_player_names()

    self.assertEqual(names, ['A', 'M', 'Z'])

def test_get_player_names_stdout(self):
    """Check the prompts for player names"""

    INPUT.side_effect = ['A', 'B', '']

    game.get_player_names()

    INPUT.assert_has_calls([
        mock.call("Player 1's name: "),
        mock.call("Player 2's name: "),
        mock.call("Player 3's name: ")
    ])

Listing 35. Using the global mock

def test_roll_or_hold(self):
    """Player can choose to roll or hold"""

    INPUT.side_effect = ['R', 'H', 'h', 'z', '12345', 'r']

    pig = game.Pig('PlayerA', 'PlayerB')

    self.assertEqual(pig.roll_or_hold(), 'roll')
    self.assertEqual(pig.roll_or_hold(), 'hold')
    self.assertEqual(pig.roll_or_hold(), 'hold')
    self.assertEqual(pig.roll_or_hold(), 'roll')

Listing 36. Refactoring has not broken our tests

......
------------------------------------------------------
Ran 6 tests in 0.005s

OK

Listing 37. Testing actual gameplay

def test_gameplay(self):
    """Users may play a game of Pig"""

    INPUT.side_effect = [
        # player names
        'George',
        'Bob',
        '',

        # roll or hold
        'r', 'r',             # George
        'r', 'r', 'r', 'h',   # Bob
        'r', 'r', 'r', 'h',   # George
    ]

    pig = game.Pig(*game.get_player_names())
    pig.roll = mock.Mock(side_effect=[
        6, 6, 1,        # George
        6, 6, 6, 6,     # Bob
        5, 4, 3, 2,     # George
    ])

    self.assertRaises(StopIteration, pig.play)

    self.assertEqual(
        pig.get_score(),
        {
            'George': 14,
            'Bob': 24
        }
    )

When you find that you are mocking the same thing in many different test methods, as we were doing with the input function, a refactor like what we’ve just done can be a good idea. Your test code becomes much cleaner and more consistent. As your test suite continues to grow, just like with any code, you need to be able to maintain it. Abstracting out common code, both in your tests and in your production code, early on will help you and others to maintain and understand the code.

Now that we’ve reviewed the changes, let’s verify that our tests haven’t broken (Listing 36). Wonderful. All is well with our refactored tests.

Tying It All Together
We have successfully implemented the basic components of our Pig game. Now it’s time to tie everything together into a game that people can play. What we’re about to do could be considered a sort of integration test. We aren’t integrating with any external systems, but we’re going to combine all of our work up to this point together. We want to be sure that the previously tested units of code will operate nicely when meshed together (Listing 37).

This test method is different from our previous tests in a few ways. First, we’re dealing with two mocked objects. We’ve got our usual mocked input function, but we’re also monkey patching our game’s roll method. We want this additional mock so that we’re dealing with known values as opposed to randomly generated integers.

Instead of monkey patching the Pig.roll method, we could have mocked the random.randint function. However, doing so would be walking the fine and dangerous line of relying on the underlying implementation of our Pig.roll method. If we ever changed our algorithm for rolling a die and our tests mocked random.randint, our test would likely fail.

Our first course of action is to specify the values that we want to have returned from both of these mocked functions. For our input, we’ll start with prompting for player names and also include some “roll or hold” responses. Next we instantiate a Pig game and define some not-so-random values that the players will roll. All we are interested in checking for now is that players each take turns rolling and that their scores are adjusted according to the rules of the game. We don’t need to worry just yet about a player winning when they earn 100 or more points.

We’re using the self.assertRaises() method because we know that neither player will obtain at least 100 points given the side effect values for each mock. As discussed earlier, we know that the game will exhaust our list of return values and expect that the mock library itself (not our game!) will raise the StopIteration exception.

After defining our input values and “random” roll values, we run through the game long enough for the players to earn some points. Then we check that each player has the expected number of points. Our test is relying

Listing 38. Test fails with the stub

F......
======================================================
FAIL: test_gameplay (test_game.GameTest)
Users may play a game of Pig
------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python3.3/unittest/mock.py", line 1087, in patched
    return func(*args, **keywargs)
  File "./test_game.py", line 99, in test_gameplay
    self.assertRaises(StopIteration, pig.play)
AssertionError: StopIteration not raised by play

------------------------------------------------------
Ran 7 tests in 0.007s

FAILED (failures=1)

Listing 39. Gameplay implementation

from itertools import cycle

def play(self):
    """Start a game of Pig"""

    for player in cycle(self.players):
        print('Now rolling: {}'.format(player))
        action = 'roll'
        turn_points = 0

        while action == 'roll':
            value = self.roll()
            if value == 1:
                print('{} rolled a 1 and lost {} points'.format(player, turn_points))
                break

            turn_points += value
            print('{} rolled a {} and now has {} points for this turn'.format(
                player, value, turn_points
            ))

            action = self.roll_or_hold()

        self.scores[player] += turn_points


on the fact that our assertions up to this point are passing. So let’s take a look at our failing test (again, after stubbing the new play method): Listing 38.

Marvelous, the test fails, exactly as we want it to. Let’s fix that by implementing our game (Listing 39).

So the core of any game is that all players take turns. We will use Python’s built-in itertools library to make that easy. This library has a cycle function, which returns the values of an iterable in order and starts over from the beginning whenever it reaches the end. All we need to do is pass our tuple of player names into cycle(). Obviously, there are other ways to achieve this same functionality, but this is probably the easiest option.

Next, we print the name of the player who is about to roll and set the number of points earned during the turn to zero. Since each player gets to choose to roll or hold most of the time, we roll the die within a while loop. That is to say, while the user chooses to roll, execute the code block within the while statement.

The first step in that loop is to roll the die. Because of the values that we specified in our test for the roll() method, we know exactly what will come of each roll of the die. Per the rules of Pig, we need to check if the rolled value is a one. If so, the player loses all points earned for the turn and it becomes the next player’s turn. The break statement allows us to break out of the while loop, but continue within the for loop.

If the rolled value is something other than one, we add the value to the player’s points for the turn. Then we use our roll_or_hold method to see if the user would like to roll again or hold. When the user chooses to roll again, action is set to 'roll', which satisfies the condition for the while loop to iterate again. If the user chooses to hold, action is set to 'hold', which does not satisfy the while loop condition.

When a player’s turn is over, either from rolling a one or choosing to hold, we add the points they earned during their turn to their overall score. The for loop and the itertools.cycle function take care of moving on to the next player and starting all over again.

Let’s run our test to see if our code meets our expectations (Listing 40).

Oh boy. This is not quite what we expected. First of all, we see the output of all of the print functions in our game, which makes it difficult to see the progression of our tests. Additionally, our player scores did not quite end up as we wanted them to.

Let’s fix the broken scores problem first. Notice that George has many more points than we expected: he ended up with 26 points instead of the 14 that he should have earned. This suggests that he still earned points for a turn when he shouldn’t have. Let’s inspect that block of code: Listing 41.

Ah hah! We display that the player loses their turn points when they roll a one, but we don’t actually have code to do that. Let’s fix that: Listing 42. Now to verify that this fixes the problem (Listing 43).

Listing 40. Broken implementation and print output in test results

F......Now rolling: George
George rolled a 6 and now has 6 points for this turn
George rolled a 6 and now has 12 points for this turn
George rolled a 1 and lost 12 points
Now rolling: Bob
Bob rolled a 6 and now has 6 points for this turn
Bob rolled a 6 and now has 12 points for this turn
Bob rolled a 6 and now has 18 points for this turn
Bob rolled a 6 and now has 24 points for this turn
Now rolling: George
George rolled a 5 and now has 5 points for this turn
George rolled a 4 and now has 9 points for this turn
George rolled a 3 and now has 12 points for this turn
George rolled a 2 and now has 14 points for this turn
Now rolling: Bob

======================================================
FAIL: test_gameplay (test_game.GameTest)
Users may play a game of Pig
------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python3.3/unittest/mock.py", line 1087, in patched
    return func(*args, **keywargs)
  File "./test_game.py", line 105, in test_gameplay
    'Bob': 24
AssertionError: {'George': 26, 'Bob': 24} != {'George': 14, 'Bob': 24}
- {'Bob': 24, 'George': 26}
?                       ^^

+ {'Bob': 24, 'George': 14}
?                       ^^

------------------------------------------------------
Ran 7 tests in 0.009s

FAILED (failures=1)

Listing 41. The culprit

if value == 1:
    print('{} rolled a 1 and lost {} points'.format(player, turn_points))
    break
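The effect of the missing reset can be reproduced outside the game with a small sketch. The score_turn helper below is hypothetical (it is not part of the article's game.py); it only mirrors the shape of the roll loop and assumes the player holds once the scripted rolls run out.

```python
def score_turn(rolls, reset_on_one=True):
    """Return the points banked for one turn of Pig.

    Hypothetical helper: `rolls` is the scripted sequence of die values,
    and the player is assumed to hold when the rolls run out.
    """
    turn_points = 0
    for value in rolls:
        if value == 1:
            if reset_on_one:
                turn_points = 0  # the statement missing from the broken version
            break
        turn_points += value
    return turn_points


# George's two turns from the test: 6, 6, 1 and then 5, 4, 3, 2.
broken_total = score_turn([6, 6, 1], reset_on_one=False) + score_turn([5, 4, 3, 2])
fixed_total = score_turn([6, 6, 1]) + score_turn([5, 4, 3, 2])
print(broken_total, fixed_total)
```

Without the reset, the 12 points from the lost turn are still banked, which accounts for the 26 in the failing assertion instead of the expected 14.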

Perfect. The scores end up as we expect. The only problem now is that we still see all of the output of the print function, which clutters our test output. There are many ways to hide this output. Let’s use mock to hide it.
One option for hiding output with mock is to use a decorator. If we want to be able to assert that certain strings or patterns of strings will be printed to the screen, we could use a decorator similar to what we did previously with the input function:

@mock.patch('builtins.print')
def test_something(self, fake_print):

Alternatively, if we don’t care to make any assertions about what is printed to the screen, we can use a decorator such as:

@mock.patch('builtins.print', mock.Mock())
def test_something(self):

The first option requires an additional parameter to the decorated test method while the second option requires no change to the test method signature. Since we aren’t particularly interested in testing the print function right now, we’ll use the second option (Listing 44).
Let’s see if the test output has been cleaned up at all with our updated test (Listing 45).
Isn’t mock wonderful? It is so very powerful, and we’re only scratching the surface of what it offers.

Winning The Game
The final piece to our game is that one player must be able to win the game. As it stands, our game will con-

Listing 42. The solution

if value == 1:
    print('{} rolled a 1 and lost {} points'.format(player, turn_points))
    turn_points = 0
    break

Listing 43. Acceptable implementation still with print output

.......Now rolling: George
George rolled a 6 and now has 6 points for this turn
George rolled a 6 and now has 12 points for this turn
George rolled a 1 and lost 12 points
Now rolling: Bob
Bob rolled a 6 and now has 6 points for this turn
Bob rolled a 6 and now has 12 points for this turn
Bob rolled a 6 and now has 18 points for this turn
Bob rolled a 6 and now has 24 points for this turn
Now rolling: George
George rolled a 5 and now has 5 points for this turn
George rolled a 4 and now has 9 points for this turn
George rolled a 3 and now has 12 points for this turn
George rolled a 2 and now has 14 points for this turn
Now rolling: Bob

------------------------------------------------------
Ran 7 tests in 0.007s

OK

Listing 44. Suppressing print output

@mock.patch('builtins.print', mock.Mock())
def test_gameplay(self):
    """Users may play a game of Pig"""

    INPUT.side_effect = [
        # player names
        'George',
        'Bob',
        '',

        # roll or hold
        'r', 'r',  # George
        'r', 'r', 'r', 'h',  # Bob
        'r', 'r', 'r', 'h',  # George
    ]

    pig = game.Pig(*game.get_player_names())
    pig.roll = mock.Mock(side_effect=[
        6, 6, 1,  # George
        6, 6, 6, 6,  # Bob
        5, 4, 3, 2,  # George
    ])

    self.assertRaises(StopIteration, pig.play)

    self.assertEqual(
        pig.get_score(),
        {
            'George': 14,
            'Bob': 24
        }
    )

Listing 45. All tests pass with no print output

.......
------------------------------------------------------
Ran 7 tests in 0.007s

OK
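The two decorator forms for patching print can be compared side by side in a standalone sketch. The greet function and the PrintPatchingTest class are assumptions made up for illustration; only mock.patch and the 'builtins.print' target come from the article.

```python
import unittest
from unittest import mock


def greet(name):
    """Hypothetical function under test; all it does is print."""
    print('Hello, {}!'.format(name))


class PrintPatchingTest(unittest.TestCase):

    @mock.patch('builtins.print')
    def test_with_injected_mock(self, fake_print):
        # patch creates the mock and passes it in as an extra argument,
        # so the test can assert on what was printed.
        greet('George')
        fake_print.assert_called_with('Hello, George!')

    @mock.patch('builtins.print', mock.Mock())
    def test_with_explicit_replacement(self):
        # We supplied the replacement ourselves, so no extra argument is
        # passed in; the output is simply swallowed.
        greet('Bob')


suite = unittest.defaultTestLoader.loadTestsFromTestCase(PrintPatchingTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Both tests run without writing either greeting to the screen; only the first form gives the test a handle on the mock.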

Test-Driven Development With Python

Listing 46. Check that a player may indeed win the game

@mock.patch('builtins.print')
def test_winning(self, fake_print):
    """A player wins when they earn 100 points"""

    INPUT.side_effect = [
        # player names
        'George',
        'Bob',
        '',

        # roll or hold
        'r', 'r',  # George
    ]

    pig = game.Pig(*game.get_player_names())
    pig.roll = mock.Mock(side_effect=[2, 2])

    pig.scores['George'] = 97
    pig.scores['Bob'] = 96

    pig.play()

    self.assertEqual(
        pig.get_score(),
        {
            'George': 101,
            'Bob': 96
        }
    )
    fake_print.assert_called_with('George won the game with 101 points!')

Listing 47. Players currently cannot win

.......E
======================================================
ERROR: test_winning (test_game.GameTest)
A player wins when they earn 100 points
------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python3.3/unittest/mock.py", line 1087, in patched
    return func(*args, **keywargs)
  File "./test_game.py", line 130, in test_winning
    pig.play()
  File "./game.py", line 50, in play
    value = self.roll()
  File "/usr/lib/python3.3/unittest/mock.py", line 846, in __call__
    return _mock_self._mock_call(*args, **kwargs)
  File "/usr/lib/python3.3/unittest/mock.py", line 904, in _mock_call
    result = next(effect)
StopIteration

------------------------------------------------------
Ran 8 tests in 0.011s

FAILED (errors=1)

Listing 48. First attempt to allow winning

def play(self):
    """Start a game of Pig"""

    for player in cycle(self.players):
        print('Now rolling: {}'.format(player))
        action = 'roll'
        turn_points = 0

        while action == 'roll':
            value = self.roll()
            if value == 1:
                print('{} rolled a 1 and lost {} points'.format(player, turn_points))
                turn_points = 0
                break

            turn_points += value
            print('{} rolled a {} and now has {} points for this turn'.format(
                player, value, turn_points
            ))

            action = self.roll_or_hold()

        self.scores[player] += turn_points
        if self.scores[player] >= 100:
            print('{} won the game with {} points!'.format(
                player, self.scores[player]
            ))
            return

tinue indefinitely. There’s nothing to check when a player’s score reaches or exceeds 100 points. To make our lives easier, we’ll assume that the players have already played a few rounds (so we don’t need to specify a billion input values or “random” roll values) (Listing 46).
The setup for this test is very similar to what we did for the previous test. The primary difference is that we set the scores for the players to be near 100. We also want to check some portion of the screen output, so we changed the method decorator a bit. We’ve introduced a new call with our screen output check: Mock.assert_called_with(). This handy method will check that the most recent call to our mocked object had certain parameters. Our assertion is checking that the last thing our print function is invoked with is the winning string. What happens when we run the test as it is (Listing 47)?
Hey, there’s the StopIteration exception that we discussed a couple of times before. We’ve only specified two roll values, which should be just enough to push George’s score over 100. The problem is that the game continues, even when George’s score exceeds the maximum, and our mocked Pig.roll method runs out of return values. We don’t want to use the TestCase.assertRaises method here. We expect the game to end after any player’s score reaches 100 points, which means the Pig.roll method should not be called anymore. Let’s try to satisfy the test (Listing 48).
After each player’s turn, we check to see if the player’s score is 100 or more. Seems like it should work, right? Let’s check (Listing 49).
Hmmm... We get the same StopIteration exception. Why do you suppose that is? We’re just checking to see if a player’s total score reaches 100, right? That’s true, but we’re only doing it at the end of a player’s turn. We need to check to see if they reach 100 points during their turn, not when they lose their turn points or decide to hold. Let’s try this again (Listing 50).
We’ve moved the total score check into the while loop, after the check to see if the player rolled a one. How does our test look now (Listing 51)?

Listing 49. Same error as before; players still cannot win

.......E
======================================================
ERROR: test_winning (test_game.GameTest)
A player wins when they earn 100 points
------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python3.3/unittest/mock.py", line 1087, in patched
    return func(*args, **keywargs)
  File "./test_game.py", line 130, in test_winning
    pig.play()
  File "./game.py", line 50, in play
    value = self.roll()
  File "/usr/lib/python3.3/unittest/mock.py", line 846, in __call__
    return _mock_self._mock_call(*args, **kwargs)
  File "/usr/lib/python3.3/unittest/mock.py", line 904, in _mock_call
    result = next(effect)
StopIteration

------------------------------------------------------
Ran 8 tests in 0.011s

FAILED (errors=1)

Listing 50. Winning check needs to happen elsewhere

def play(self):
    """Start a game of Pig"""

    for player in cycle(self.players):
        print('Now rolling: {}'.format(player))
        action = 'roll'
        turn_points = 0

        while action == 'roll':
            value = self.roll()
            if value == 1:
                print('{} rolled a 1 and lost {} points'.format(player, turn_points))
                turn_points = 0
                break

            turn_points += value
            print('{} rolled a {} and now has {} points for this turn'.format(
                player, value, turn_points
            ))

            if self.scores[player] + turn_points >= 100:
                self.scores[player] += turn_points
                print('{} won the game with {} points!'.format(
                    player, self.scores[player]
                ))
                return

            action = self.roll_or_hold()

        self.scores[player] += turn_points
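Two mock behaviours drive the winning test: a mock whose side_effect list is used up raises StopIteration on the next call, and assert_called_with compares only against the most recent call. The sketch below shows both in isolation; the roll and fake_print names are stand-ins, not the article's test code.

```python
from unittest import mock

# A stand-in for the die: two scripted rolls, then nothing left.
roll = mock.Mock(side_effect=[2, 2])
first, second = roll(), roll()

try:
    roll()  # a third roll has no scripted value
    exhausted = False
except StopIteration:
    exhausted = True

# assert_called_with only looks at the most recent call.
fake_print = mock.Mock()
fake_print('Now rolling: George')
fake_print('George won the game with 101 points!')
fake_print.assert_called_with('George won the game with 101 points!')

try:
    fake_print.assert_called_with('Now rolling: George')
    matched_earlier_call = True
except AssertionError:
    matched_earlier_call = False
```

This is why a game loop that asks for one roll too many surfaces as a StopIteration error, and why the winning message has to be the last thing printed for the assertion to hold.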


Playing From the Command Line
It would appear that our basic Pig game is now complete. We’ve tested and implemented all of the basics of the game. But how can we play it ourselves? We should probably make the game easy to run from the command line. But first, we need to describe our expectations in a test (Listing 52). This test starts out much like our recent gameplay tests by defining some return values for our mocked input function. After that, though, things are very much different. We see that multiple context managers can be used with one with statement. It’s also possible to do multiple nested with statements, but that depends on your preference.
The first object we’re mocking is the built-in print function. Again, this way of mocking objects is very similar to mocking with class or method decorators. Since we will be invoking the game from the command line, we won’t be able to easily inspect the internal state of our Pig game instance for scores. As such, we’re mocking print so that we can check screen output with our expectations.
We’re also patching our Pig.roll method as before, only this time we’re using a new mock.patch.object function. Notice that all of our uses of mock.patch thus far have been passed a simple string as the first parameter. This time we’re passing an actual object as the first parameter and a string as the second parameter.

Listing 51. Players may now win the game

........
------------------------------------------------------
Ran 8 tests in 0.009s

OK

Listing 52. Command line invocation

def test_command_line(self):
    """The game can be invoked from the command line"""

    INPUT.side_effect = [
        # player names
        'George',
        'Bob',
        '',

        # roll or hold
        'r', 'r', 'h',  # George
        # Bob immediately rolls a 1
        'r', 'h',  # George
        'r', 'r', 'h'  # Bob
    ]

    with mock.patch('builtins.print') as fake_print, \
            mock.patch.object(game.Pig, 'roll') as die:

        die.side_effect = cycle([6, 2, 5, 1, 4, 3])

        self.assertRaises(StopIteration, game.main)

    # check output
    fake_print.assert_has_calls([
        mock.call('Now rolling: George'),
        mock.call('George rolled a 6 and now has 6 points for this turn'),
        mock.call('George rolled a 2 and now has 8 points for this turn'),
        mock.call('George rolled a 5 and now has 13 points for this turn'),
        mock.call('Now rolling: Bob'),
        mock.call('Bob rolled a 1 and lost 0 points'),
        mock.call('Now rolling: George'),
        mock.call('George rolled a 4 and now has 4 points for this turn'),
        mock.call('George rolled a 3 and now has 7 points for this turn'),
        mock.call('Now rolling: Bob'),
        mock.call('Bob rolled a 6 and now has 6 points for this turn'),
        mock.call('Bob rolled a 2 and now has 8 points for this turn'),
        mock.call('Bob rolled a 5 and now has 13 points for this turn')
    ])

Listing 53. Expected failure

F........
======================================================
FAIL: test_command_line (test_game.GameTest)
The game can be invoked from the command line
------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python3.3/unittest/mock.py", line 1087, in patched
    return func(*args, **keywargs)
  File "./test_game.py", line 162, in test_command_line
    self.assertRaises(StopIteration, game.main)
AssertionError: StopIteration not raised by main

------------------------------------------------------
Ran 9 tests in 0.013s

FAILED (failures=1)
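The combination of several context managers in one with statement, mock.patch.object, and assert_has_calls can be tried on a much smaller example. The Die class and announce_rolls function below are hypothetical stand-ins for the article's Pig class and main function; the point is only that the code under test creates its own instance, so the method has to be patched on the class.

```python
import random
from unittest import mock


class Die:
    """Hypothetical stand-in for the article's Pig class."""

    def roll(self):
        return random.randint(1, 6)


def announce_rolls(count):
    """Creates its own Die, so callers never get hold of the instance."""
    die = Die()
    for _ in range(count):
        print('Rolled a {}'.format(die.roll()))


# Two patches in a single with statement. patch.object takes the object that
# owns the attribute (here the class) and the attribute name as a string.
with mock.patch('builtins.print') as fake_print, \
        mock.patch.object(Die, 'roll') as fake_roll:
    fake_roll.side_effect = [6, 2, 5]
    announce_rolls(3)

# The mocks still remember their calls after the patches are undone.
fake_print.assert_has_calls([
    mock.call('Rolled a 6'),
    mock.call('Rolled a 2'),
    mock.call('Rolled a 5'),
])
```

Once the with block exits, Die.roll is the real method again, so later tests are unaffected by the patch.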

The mock.patch.object function allows us to mock members of another object. Again, since we won’t have direct access to the Pig instance, we can’t monkey patch the Pig.roll the way we did previously. The outcome of this method should be the same as the other method.
Being the lazy programmers that we are, we’ve chosen to use the itertools.cycle function again to continuously return some value back for each roll of the die. Since we don’t want to specify roll-or-hold values for an entire game of Pig, we use TestCase.assertRaises to say we expect mock to raise a StopIteration exception when there are no additional return values for the input mock.
I should mention that testing screen output as we’re doing here is not exactly the best idea. We might change the strings, or we might later add more print calls. Either case would require that we modify our test itself, and that’s added overhead. Having to maintain production code is a chore by itself, and adding test case maintenance to that is not exactly appealing.
That said, we will push forward with our test this way for now. We should run our test suite now, but be sure to stub out the new main function in game.py first (Listing 53).
We haven’t implemented our main function yet, so none of the mocked input values are consumed, and no StopIteration exception is raised. Just as we expect for now. Let’s write some code to launch the game from the command line now (Listing 54). Hey, that code looks pretty familiar, doesn’t it? It’s pretty much the same code we’ve used in previous gameplay test methods. Awesome!
There’s one small bit of magic code that we’ve added at the bottom. That if statement is the way that you allow a Python script to be invoked from the command line. Let’s run the test again to make sure the main function does what we expect (Listing 55).

Listing 54. Basic command line entry point

def main():
    """Launch a game of Pig"""

    game = Pig(*get_player_names())
    game.play()


if __name__ == '__main__':
    main()

Listing 55. All tests pass

.........
--------------------------------------------------
Ran 9 tests in 0.014s

OK

Beauty! At this point, you should be able to invoke your very own Pig game on the command line by running:

python game.py

Isn’t that something? We waited to manually run the game until we had written and satisfied tests for all of the basic requirements for a game of Pig. The first time we play it ourselves, the game just works!

Reflecting On Our Pig
Now that we’ve gone through that exercise, we need to think about what all of this new-found TDD experience means for us. All tests passing absolutely does not mean the code is bug-free. It simply means that the code meets the expectations that we’ve described in our tests. There are plenty of situations that we haven’t covered in our tests or handled in our code. Can you think of anything that is wrong with our game right now? What will happen if you don’t enter any player names? What if you only enter one player name? Will the game be able to handle a large number of players?
We can make assumptions and predictions about how the code will behave under such conditions, but wouldn’t it be nice to have a high level of confidence that the code will handle each scenario as we expect?

What Now?
Now that we have a functional game of Pig, here are some tasks that you might consider implementing to practice TDD.

• accept player names via the command line (without the prompt),
• bail out if only one player name is given,
• allow the maximum point value to be specified on the command line,
• allow players to see their total score when choosing to roll or hold,
• track player scores in a database,
• print the runner-up when there are three or more players,
• turn the game into an IRC bot.

The topics covered in this article should have given you a good enough foundation to write tests for each one of these additional tasks.

Josh VanderLinden
Josh VanderLinden is a life-long technology enthusiast, who started programming at the age of ten. Josh has worked primarily in web development, but he also has experience with network monitoring and systems administration. He has recently gained a deep appreciation for automated testing and TDD.

Python Iterators, Iterables, and the Itertools Module
Python makes a distinction between iterables and iterators, and it is essential to know the difference between them. Iterators are stateful objects: they know how far through their sequence they are, and once they reach the end, that is it. Iterables are able to create iterators on demand. The itertools module includes a set of functions for working with iterable datasets.

Most of us are familiar with how Python for loops work: for a wide range of applications you can just write for item in container: and do something with each item. But what happens under the hood, and how could we create containers of our own? Let us dive in and see.
In Python, iterables and iterators have distinct meanings. Iterables are anything that can be looped over. An iterable defines the __iter__ method, which returns an iterator, or it may define the __getitem__ method for indexed lookup (raising an IndexError when the index is no longer valid). So an iterable type is something you can treat abstractly as a series of values, like a list (each item) or a file (each line). One iterable can have many iterators: a list might have backwards, forwards and every_n iterators, or a file might have lines (for ASCII files) and bytes (for each byte) iterators, depending on the file’s encoding. Iterators are objects that support the iterator protocol, which means that both __iter__ and next() (__next__ in Python 3) have to be defined. The __iter__ method of an iterator returns the iterator itself and is implicitly called at the start of the loop, and the next() method returns the next value every time it is invoked. In fewer words: an iterable can be given to a for loop, and an iterator dictates what each iteration of the loop returns (Listing 1 and Listing 2).
Some types, like file, are iterables that are also their own iterators, which is a common source of confusion. But that arrangement actually makes sense: the iterator needs to know the details of how files are read and buffered, so it might as well live in the file where it can access all that information without breaking the abstraction (Listing 3).
Why the distinction? An iterable object is just something that it might make sense to treat as a collection, somehow, in an abstract way. An iterator lets you specify exactly what it means to iterate over a type, without tying that type’s “iterableness” to any one specific iteration mode. Python has no interfaces, but this concept of separating interface (“this object supports X”) from implementation (“doing X means Y and Z”) has been carried over from languages that do, and it turns out to be very useful.

Itertools Module
The itertools module defines a number of fast and highly efficient functions for working with sequence-like datasets. The functions in the itertools module are so efficient because the data is not all stored in memory; each value is produced only when it is needed, which reduces memory usage and thus reduces the side effects of working with huge datasets and increases performance.
chain(iter1, iter2, iter3, ...) returns a single iterator that yields the items of each iterable passed as an argument, one iterable after another.

>>> import itertools
>>> from itertools import *
>>> for i in chain(['a', 'b', 'c'], [1, 2, 3], ['x', 'y', 'z']):
        print i,
a b c 1 2 3 x y z

combinations(iterable, n) takes two arguments, an iterable and the length of each combination, and returns all possible n-length combinations of the elements in that iterable.

Python Iterators

>>> for i in itertools.combinations(['a', 'b', 'c'], 2):
        print i,
('a', 'b') ('a', 'c') ('b', 'c')

combinations_with_replacement(iterable, n) is similar to combinations, but it allows individual elements to be repeated within a combination.

>>> for i in itertools.combinations_with_replacement(['a', 'b', 'c'], 2):
        print i,
('a', 'a') ('a', 'b') ('a', 'c') ('b', 'b') ('b', 'c') ('c', 'c')

compress(data, selector) takes two iterables as arguments and returns an iterator with only those values in data whose corresponding element in the selector is true.

>>> for i in itertools.compress(['lion', 'tiger', 'panther', 'leopard'], [1, 0, 0, 1]):
        print i,
lion leopard

count(start, step): both the start and step arguments are optional, and the default start argument is 0. It returns consecutive integers if no step argument is provided. There is no upper bound, so you will have to provide a condition to stop the iteration.

>>> for i in itertools.count(1, 2):
        if i > 10:
            break
        print i,
1 3 5 7 9

cycle(iterable) returns an iterator that indefinitely cycles over the contents of the iterable argument it is given. It can consume a lot of memory if the argument is a huge iterable.

>>> p = 0
>>> for i in itertools.cycle([1, 2, 3]):
        p += 1
        if p > 20: break
        print i,
1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2

dropwhile(condition, iterator) returns an iterator that starts once the condition becomes false for the very first time. After the condition becomes false, it returns the rest of the values in the iterator until it is exhausted.

>>> for i in itertools.dropwhile(lambda x: x<5, [1, 2, 3, 4, 5, 6, 7, 8, 9]):
        print i,
5 6 7 8 9
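The claim that itertools produces values only when they are needed can be checked with a short sketch. It is written for Python 3 (the interactive examples in this article use Python 2 print statements), and the variable names are illustrative only.

```python
import sys
from itertools import count, islice, takewhile

# count() is infinite, yet nothing is computed until a value is requested.
evens = count(0, 2)
first_five = list(islice(evens, 5))

# takewhile cuts an infinite stream off as soon as the predicate fails.
small_squares = list(takewhile(lambda n: n < 50, (n * n for n in count(1))))

# A materialized list grows with its length; a lazy iterator object does not.
as_list = list(range(100000))
as_iterator = islice(count(), 100000)
list_is_bigger = sys.getsizeof(as_list) > sys.getsizeof(as_iterator)

print(first_five, small_squares, list_is_bigger)
```

Note that sys.getsizeof reports only the size of the container object itself, so the comparison is a rough indication rather than a full memory measurement.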

Listing 1. Under the hood for loop looks like this

iterable = [1, 2, 3]
iterator = iterable.__iter__()
try:
    while True:
        item = next(iterator)  # calls iterator.next() (__next__ in Python 3)
        # Loop body
        print "iterator returned: %d" % item
except StopIteration:
    pass  # End loop

Listing 2. For example, a list and string are iterables but they are not iterators

>>> a = [1, 2, 3, 4, 5]
>>> a.__iter__
<method-wrapper '__iter__' of list object at 0x02A16828>
>>> a.next()
Traceback (most recent call last):
  File "<pyshell#76>", line 1, in <module>
    a.next()
AttributeError: 'list' object has no attribute 'next'
>>> iter(a)
<listiterator object at 0x02A26DD0>
>>> iter(a).next()
1

Listing 3. Example of a file object

# Not the real implementation
class file(object):
    def __iter__(self):
        # Called when something asks for this type's iterator.
        # this makes it iterable
        return self
    def __next__(self):
        # Called when this object is queried for its next value.
        # this makes it an iterator.
        if self.has_next_line():
            return self.get_next_line()
        else:
            raise StopIteration
    def next(self):
        # Python 2.x compatibility
        return self.__next__()
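To answer the question of how to create a container of our own, here is a minimal sketch of an iterable that hands out a separate iterator object each time it is looped over. The Countdown and CountdownIterator classes are invented for illustration, and the code targets Python 3, where the protocol method is named __next__.

```python
class Countdown:
    """Iterable: it holds the data and hands out fresh iterators."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        return CountdownIterator(self.start)


class CountdownIterator:
    """Iterator: it remembers how far through the sequence it is."""

    def __init__(self, current):
        self.current = current

    def __iter__(self):
        return self  # an iterator returns itself

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


launch = Countdown(3)
first_pass = list(launch)
second_pass = list(launch)  # works again: each loop gets a new iterator

it = iter(launch)
next(it)              # consume one value
remaining = list(it)  # only the values that are left
exhausted = list(it)  # a spent iterator stays spent
```

Keeping the position in the iterator rather than in the container is what allows the same Countdown to be looped over repeatedly, or by two loops at once.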

groupby() returns consecutive groups of values that share a common key.

>>> for key, igroup in itertools.groupby(xrange(12), lambda x: x/5):
        print key, list(igroup)
0 [0, 1, 2, 3, 4]
1 [5, 6, 7, 8, 9]
2 [10, 11]

ifilter(condition, iterable) will return an iterator over those elements of the iterable for which the condition is true. This is different from dropwhile, which returns all the elements after the condition first becomes false; ifilter tests the condition for every element.

>>> for i in itertools.ifilter(lambda x: x>5, [1, 2, 3, 4, 5, 6, 7, 8, 2.5, 3.5]):
        print i,
6 7 8

imap(function, iter1, iter2, iter3, ...) will return an iterator whose values are the results of calling the function with one argument taken from each iterable. It will stop when the shortest iterable is exhausted.

>>> for i in imap(lambda x, y: (x, y, x*y), xrange(5), xrange(5, 8)):
        print '%d * %d = %d' %i
0 * 5 = 0
1 * 6 = 6
2 * 7 = 14

islice(iterable, start, stop, step) will return an iterator with selected items from the input iterator by index. The start argument defaults to 0 and the step argument defaults to 1 if not given.

>>> for i in itertools.islice(count(), 20, 30, 2):
        print i,
20 22 24 26 28

izip(iter1, iter2, iter3, ...) will return an izip object whose next() will return a tuple with the i-th element from each of the iterables given as arguments. It will raise a StopIteration error when the shortest iterable is exhausted.

>>> for i in izip([1, 2, 3], ['a', 'b', 'c'], ['z', 'y']):
        print i
(1, 'a', 'z')
(2, 'b', 'y')

izip_longest(iter1, iter2, ..., fillvalue=None) is similar to izip, but it will iterate until the longest iterable is exhausted; when the shorter iterables are exhausted, fillvalue is substituted in their place.

>>> for i in itertools.izip_longest([1, 2, 3], ['a', 'b', 'c'], ['z', 'y'], fillvalue='hello'):
        print i
(1, 'a', 'z')
(2, 'b', 'y')
(3, 'c', 'hello')

permutations(iterable, n) will return the n-length permutations of the input iterable.

>>> for i in itertools.permutations([1, 2, 3, 4], 2):
        print i,
(1, 2) (1, 3) (1, 4) (2, 1) (2, 3) (2, 4) (3, 1) (3, 2) (3, 4) (4, 1) (4, 2) (4, 3)

product(iter1, iter2, ...) will return the Cartesian product of the input iterables.

>>> for i in itertools.product([1, 2, 3], ['a', 'b', 'c']):
        print i,
(1, 'a') (1, 'b') (1, 'c') (2, 'a') (2, 'b') (2, 'c') (3, 'a') (3, 'b') (3, 'c')

repeat(object, n) will return the object n times; if n is not given, it returns the object endlessly.

>>> for i in itertools.repeat('a', 5):
        print i,
a a a a a

starmap(function, iterable) returns an iterator whose elements are the results of mapping the function to the elements of the iterable. It is used instead of imap when the elements of the iterable are already grouped into tuples.

>>> for i in itertools.starmap(lambda x, y: x**y, [(2, 3), (4, 2)]):
        print i,
8 16
>>> for i in itertools.imap(lambda x, y: x**y, [(2, 3), (4, 2)]):
        print i,

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: <lambda>() takes exactly 2 arguments (1 given)

takewhile(condition, iterable) is the opposite of dropwhile: it will return an iterator whose values are items from the input iterator for as long as the condition is true. It will stop as soon as the condition becomes false for the first time.

>>> for i in itertools.takewhile(lambda x: x<5, [1, 2, 3, 4, 5, 6, 7, 2, 3, 4]):
        print i,
1 2 3 4

tee(iterator, n=2) will return n (defaults to 2) independent iterators of the input iterator.

>>> s = 0
>>> p = '123ab'
>>> for i in itertools.tee(p, 3):
        print 'iterator %d: ' %s,
        s += 1
        for q in i:
            print q,
        print '\n'
iterator 0: 1 2 3 a b
iterator 1: 1 2 3 a b
iterator 2: 1 2 3 a b

Summary
So I believe by now you must have a clear understanding of Python iterators and iterables. The huge advantage of iterators is that they have an almost constant memory footprint. The itertools module can be very handy in hacking competitions because of its efficiency and speed.

Saad Bin Akhlaq
Saad Bin Akhlaq is a software engineer at Plivo communications pvt. Ltd., where he is working on automating the infrastructure and debugging issues if they arise. In his free time he loves sketching and photography. Visit Saad’s blog at saadbinakhlaq.wordpress.com and you can also contact him directly at [email protected].
