Syllabus Informatique
Copyright Notice
This book contains large parts that are based on the book
How To Think Like a Computer Scientist --- Learning with Python 3.
The following is a copy of the license of this book.
Copyright (C) Peter Wentworth, Jeffrey Elkner, Allen B. Downey and Chris Meyers.
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3
0. PREAMBLE
The purpose of this License is to make a manual, textbook, or other functional
and
useful document "free" in the sense of freedom: to assure everyone the
effective
freedom to copy and redistribute it, with or without modifying it,
either commercially or
noncommercially. Secondarily, this License preserves for
the author and publisher a way
to get credit for their work, while not being
considered responsible for modifications
made by others.
This License is a kind of "copyleft", which means that derivative works of the
document
must themselves be free in the same sense. It complements the GNU
General Public
License, which is a copyleft license designed for free software.
We have designed this License in order to use it for manuals for free software,
because
free software needs free documentation: a free program should come with
manuals
providing the same freedoms that the software does. But this License is
not limited to
software manuals; it can be used for any textual work,
regardless of subject matter or
whether it is published as a printed book. We
recommend this License principally for
works whose purpose is instruction or
reference.
A "Modified Version" of the Document means any work containing the Document or
a
portion of it, either copied verbatim, or with modifications and/or
translated into another
language.
The "Invariant Sections" are certain Secondary Sections whose titles are
designated, as
being those of Invariant Sections, in the notice that says that
the Document is released
under this License. If a section does not fit the
above definition of Secondary then it is
not allowed to be designated as
Invariant. The Document may contain zero Invariant
Sections. If the Document
does not identify any Invariant Sections then there are none.
The "Cover Texts" are certain short passages of text that are listed, as
Front-Cover
Texts or Back-Cover Texts, in the notice that says that the
Document is released under
this License. A Front-Cover Text may be at most 5
words, and a Back-Cover Text may
be at most 25 words.
Examples of suitable formats for Transparent copies include plain ASCII without
markup,
Texinfo input format, LaTeX input format, SGML or XML using a publicly
available DTD,
and standard-conforming simple HTML, PostScript or PDF designed
for human
modification. Examples of transparent image formats include PNG, XCF
and JPG.
Opaque formats include proprietary formats that can be read and
edited only by
proprietary word processors, SGML or XML for which the DTD
and/or processing tools
are not generally available, and the machine-generated
HTML, PostScript or PDF
produced by some word processors for output purposes
only.
The "Title Page" means, for a printed book, the title page itself, plus such
following
pages as are needed to hold, legibly, the material this License
requires to appear in the
title page. For works in formats which do not have
any title page as such, "Title Page"
means the text near the most prominent
appearance of the work's title, preceding the
beginning of the body of the
text.
The "publisher" means any person or entity that distributes copies of the
Document to
the public.
A section "Entitled XYZ" means a named subunit of the Document whose title
either is
precisely XYZ or contains XYZ in parentheses following text that
translates XYZ in
another language. (Here XYZ stands for a specific section
name mentioned below, such
as "Acknowledgements", "Dedications",
"Endorsements", or "History".) To "Preserve the
Title" of such a section when
you modify the Document means that it remains a section
"Entitled XYZ"
according to this definition.
The Document may include Warranty Disclaimers next to the notice which states
that
this License applies to the Document. These Warranty Disclaimers are
considered to be
included by reference in this License, but only as regards
disclaiming warranties: any
other implication that these Warranty Disclaimers
may have is void and has no effect on
the meaning of this License.
2. VERBATIM COPYING
You may copy and distribute the Document in any medium, either commercially or
noncommercially, provided that this License, the copyright notices, and the
license
notice saying this License applies to the Document are reproduced in
all copies, and
that you add no other conditions whatsoever to those of this
License. You may not use
technical measures to obstruct or control the reading
or further copying of the copies
you make or distribute. However, you may
accept compensation in exchange for copies.
If you distribute a large enough
number of copies you must also follow the conditions in
section 3.
You may also lend copies, under the same conditions stated above, and you may
publicly display copies.
3. COPYING IN QUANTITY
If you publish printed copies (or copies in media that commonly have printed
covers) of
the Document, numbering more than 100, and the Document's license
notice requires
Cover Texts, you must enclose the copies in covers that carry,
clearly and legibly, all
these Cover Texts: Front-Cover Texts on the front
cover, and Back-Cover Texts on the
back cover. Both covers must also clearly
and legibly identify you as the publisher of
these copies. The front cover must
present the full title with all words of the title equally
prominent and
visible. You may add other material on the covers in addition. Copying
with
changes limited to the covers, as long as they preserve the title of the
Document
and satisfy these conditions, can be treated as verbatim copying in
other respects.
If the required texts for either cover are too voluminous to fit legibly, you
should put the
first ones listed (as many as fit reasonably) on the actual
cover, and continue the rest
onto adjacent pages.
If you publish or distribute Opaque copies of the Document numbering more than
100,
you must either include a machine-readable Transparent copy along with
each Opaque
copy, or state in or with each Opaque copy a computer-network
location from which the
general network-using public has access to download
using public-standard network
protocols a complete Transparent copy of the
Document, free of added material. If you
use the latter option, you must take
reasonably prudent steps, when you begin
distribution of Opaque copies in
quantity, to ensure that this Transparent copy will
remain thus accessible at
the stated location until at least one year after the last time
you distribute
an Opaque copy (directly or through your agents or retailers) of that
edition
to the public.
It is requested, but not required, that you contact the authors of the Document
well
before redistributing any large number of copies, to give them a chance to
provide you
with an updated version of the Document.
4. MODIFICATIONS
You may copy and distribute a Modified Version of the Document under the
conditions of
sections 2 and 3 above, provided that you release the Modified
Version under precisely
this License, with the Modified Version filling the
role of the Document, thus licensing
distribution and modification of the
Modified Version to whoever possesses a copy of it.
In addition, you must do
these things in the Modified Version:
A. Use in the Title Page (and on the covers, if any) a title distinct from
that of the
Document, and from those of previous versions (which should, if
there were any,
be listed in the History section of the Document). You may
use the same title as a
previous version if the original publisher of that
version gives permission.
B. List on the Title Page, as authors, one or more persons or entities
responsible
for authorship of the modifications in the Modified Version,
together with at least
five of the principal authors of the Document (all of
its principal authors, if it has
fewer than five), unless they release you
from this requirement.
C. State on the Title page the name of the publisher of the Modified Version,
as
the publisher.
D. Preserve all the copyright notices of the Document.
E. Add an appropriate copyright notice for your modifications adjacent to the
other copyright notices.
F. Include, immediately after the copyright notices, a license notice giving
the
public permission to use the Modified Version under the terms of this
License, in
the form shown in the Addendum below.
G. Preserve in that license notice the full lists of Invariant Sections and
required
Cover Texts given in the Document's license notice.
H. Include an unaltered copy of this License.
I. Preserve the section Entitled "History", Preserve its Title, and add to it
an item
stating at least the title, year, new authors, and publisher of the
Modified Version
as given on the Title Page. If there is no section Entitled
"History" in the
Document, create one stating the title, year, authors, and
publisher of the
Document as given on its Title Page, then add an item
describing the Modified
Version as stated in the previous sentence.
J. Preserve the network location, if any, given in the Document for public
access
to a Transparent copy of the Document, and likewise the network
locations given
in the Document for previous versions it was based on. These
may be placed in
the "History" section. You may omit a network location for a
work that was
published at least four years before the Document itself, or if
the original
publisher of the version it refers to gives permission.
K. For any section Entitled "Acknowledgements" or "Dedications", Preserve the
Title of the section, and preserve in the section all the substance and tone
of each
of the contributor acknowledgements and/or dedications given therein.
L. Preserve all the Invariant Sections of the Document, unaltered in their
text and
in their titles. Section numbers or the equivalent are not
considered part of the
section titles.
M. Delete any section Entitled "Endorsements". Such a section may not be
included in the Modified Version.
N. Do not retitle any existing section to be Entitled "Endorsements" or to
conflict
in title with any Invariant Section.
O. Preserve any Warranty Disclaimers.
You may add a section Entitled "Endorsements", provided it contains nothing but
endorsements of your Modified Version by various parties—for example,
statements of
peer review or that the text has been approved by an organization
as the authoritative
definition of a standard.
You may add a passage of up to five words as a Front-Cover Text, and a passage
of up
to 25 words as a Back-Cover Text, to the end of the list of Cover Texts
in the Modified
Version. Only one passage of Front-Cover Text and one of
Back-Cover Text may be
added by (or through arrangements made by) any one
entity. If the Document already
includes a cover text for the same cover,
previously added by you or by arrangement
made by the same entity you are
acting on behalf of, you may not add another; but you
may replace the old one,
on explicit permission from the previous publisher that added
the old one.
The author(s) and publisher(s) of the Document do not by this License give
permission
to use their names for publicity for or to assert or imply
endorsement of any Modified
Version.
5. COMBINING DOCUMENTS
You may combine the Document with other documents released under this License,
under the terms defined in section 4 above for modified versions, provided that
you
include in the combination all of the Invariant Sections of all of the
original documents,
unmodified, and list them all as Invariant Sections of your
combined work in its license
notice, and that you preserve all their Warranty
Disclaimers.
The combined work need only contain one copy of this License, and multiple
identical
Invariant Sections may be replaced with a single copy. If there are
multiple Invariant
Sections with the same name but different contents, make the
title of each such section
unique by adding at the end of it, in parentheses,
the name of the original author or
publisher of that section if known, or else
a unique number. Make the same adjustment
to the section titles in the list of
Invariant Sections in the license notice of the combined
work.
In the combination, you must combine any sections Entitled "History" in the
various
original documents, forming one section Entitled "History"; likewise
combine any
sections Entitled "Acknowledgements", and any sections Entitled
"Dedications". You
must delete all sections Entitled "Endorsements".
6. COLLECTIONS OF DOCUMENTS
You may make a collection consisting of the Document and other documents
released
under this License, and replace the individual copies of this License
in the various
documents with a single copy that is included in the collection,
provided that you follow
the rules of this License for verbatim copying of each
of the documents in all other
respects.
You may extract a single document from such a collection, and distribute it
individually
under this License, provided you insert a copy of this License
into the extracted
document, and follow this License in all other respects
regarding verbatim copying of
that document.
8. TRANSLATION
Translation is considered a kind of modification, so you may distribute
translations of the
Document under the terms of section 4. Replacing Invariant
Sections with translations
requires special permission from their copyright
holders, but you may include
translations of some or all Invariant Sections in
addition to the original versions of these
Invariant Sections. You may include
a translation of this License, and all the license
notices in the Document, and
any Warranty Disclaimers, provided that you also include
the original English
version of this License and the original versions of those notices and
disclaimers. In case of a disagreement between the translation and the original
version
of this License or a notice or disclaimer, the original version will
prevail.
9. TERMINATION
You may not copy, modify, sublicense, or distribute the Document except as
expressly
provided under this License. Any attempt otherwise to copy, modify,
sublicense, or
distribute it is void, and will automatically terminate your
rights under this License.
However, if you cease all violation of this License, then your license from a
particular
copyright holder is reinstated (a) provisionally, unless and until
the copyright holder
explicitly and finally terminates your license, and (b)
permanently, if the copyright holder
fails to notify you of the violation by
some reasonable means prior to 60 days after the
cessation.
Termination of your rights under this section does not terminate the licenses
of parties
who have received copies or rights from you under this License. If
your rights have been
terminated and not permanently reinstated, receipt of a
copy of some or all of the same
material does not give you any rights to use
it.
11. RELICENSING
"Massive Multiauthor Collaboration Site" (or "MMC Site") means any World Wide
Web
server that publishes copyrightable works and also provides prominent
facilities for
anybody to edit those works. A public wiki that anybody can edit
is an example of such a
server. A "Massive Multiauthor Collaboration" (or
"MMC") contained in the site means
any set of copyrightable works thus
published on the MMC site.
under the terms of the GNU Free Documentation License, Version 1.3
If you have Invariant Sections, Front-Cover Texts and Back-Cover Texts, replace
the
"with … Texts." line with this:
with the Invariant Sections being LIST THEIR TITLES, with the
Front-Cover Texts being LIST, and with the Back-Cover Texts being LIST.
If you have Invariant Sections without Cover Texts, or some other combination
of the
three, merge those two alternatives to suit the situation.
Foreword
This is the foreword of "How To Think Like a Computer Scientist --- Learning with
Python 3"
By David Beazley
Despite Python's appeal to many different communities, you may still wonder why
Python? or why teach programming with Python? Answering these questions is no
simple task---especially when popular opinion is on the side of more
masochistic
alternatives such as C++ and Java. However, I think the most
direct answer is that
programming in Python is simply a lot of fun and more
productive.
Although Python is still a young and evolving language, I believe that it has a
bright
future in education. This book is an important step in that direction.
David Beazley
University of Chicago
Author of the Python Essential Reference
Contributor List
This is the contributor list of "How To Think Like a Computer Scientist --- Learning with
Python 3"
To paraphrase the philosophy of the Free Software Foundation, this book is free
like free
speech, but not necessarily free like free pizza. It came about
because of a collaboration
that would not have been possible without the GNU
Free Documentation License. So we
would like to thank the Free Software
Foundation for developing this license and, of
course, making it available to
us.
We would also like to thank the more than 100 sharp-eyed and thoughtful readers
who
have sent us suggestions and corrections over the past few years. In the
spirit of free
software, we decided to express our gratitude in the form of a
contributor list.
Unfortunately, this list is not complete, but we are doing
our best to keep it up to date. It
was also getting too large to include
everyone who sends in a typo or two. You have our
gratitude, and you have the
personal satisfaction of making a book you found useful
better for you and
everyone else who uses it. New additions to the list for the 2nd edition
will
be those who have made on-going contributions.
If you have a chance to look through the list, you should realize that each
person here
has spared you and all subsequent readers from the confusion of a
technical error or a
less-than-transparent explanation, just by sending us a
note.
Impossible as it may seem after so many corrections, there may still be errors
in this
book. If you should stumble across one, we hope you will take a minute
to contact us.
The email address (for the Python 3 version of the book)
is [email protected].
Substantial changes made due to your suggestions will add you to the next
version of
the contributor list (unless you ask to be omitted). Thank you!
Second Edition
An email from Mike MacHenry set me straight on tail recursion. He not only
pointed out an error in the presentation, but suggested how to correct it.
It wasn't until 5th Grade student Owen Davies came to me in a Saturday
morning
Python enrichment class and said he wanted to write the card game,
Gin Rummy,
in Python that I finally knew what I wanted to use as the case
study for the object-oriented
programming chapters.
A special thanks to pioneering students in Jeff's Python Programming class
at
GCTAA during the 2009-2010 school year: Safath
Ahmed, Howard Batiste, Louis
Elkner-Alfaro, and Rachel Hancock. Your
continual and thoughtful feedback led
to changes in most of the chapters of
the book. You set the standard for the
active and engaged learners that will
help make the new Governor's Academy
what it is to become. Thanks to you
this is truly a student-tested text.
Thanks in a similar vein to the students in Jeff's Computer Science
class at the
HB-Woodlawn program during the 2007-2008 school year: James
Crowley,
Joshua Eddy, Eric Larson, Brian McGrail, and Iliana Vazuka.
Ammar Nabulsi sent in numerous corrections from Chapters 1 and 2.
Aldric Giacomoni pointed out an error in our definition of the Fibonacci
sequence
in Chapter 5.
Roger Sperberg sent in several spelling corrections and pointed out a twisted
piece of logic in Chapter 3.
Adele Goldberg sat down with Jeff at PyCon 2007 and gave him a list of
suggestions and corrections from throughout the book.
Ben Bruno sent in corrections for chapters 4, 5, 6, and 7.
Carl LaCombe pointed out that we incorrectly used the term commutative in
chapter 6 where symmetric was the correct term.
Alessandro Montanile sent in corrections for errors in the code examples and
text
in chapters 3, 12, 15, 17, 18, 19, and 20.
Emanuele Rusconi found errors in chapters 4, 8, and 15.
Michael Vogt reported an indentation error in an example in chapter 6, and
sent in
a suggestion for improving the clarity of the shell vs. script
section in chapter 1.
First Edition
Lloyd Hugh Allen sent in a correction to Section 8.4.
Yvon Boulianne sent in a correction of a semantic error in Chapter 5.
Fred Bremmer submitted a correction in Section 2.1.
Jonah Cohen wrote the Perl scripts to convert the LaTeX source for this book
into
beautiful HTML.
Michael Conlon sent in a grammar correction in Chapter 2 and an improvement
in
style in Chapter 1, and he initiated discussion on the technical aspects
of
interpreters.
Benoit Girard sent in a correction to a humorous mistake in Section 5.6.
Courtney Gleason and Katherine Smith wrote horsebet.py, which was used as a
case study in an earlier version of the book. Their program can now be found
on
the website.
Lee Harr submitted more corrections than we have room to list here, and
indeed
he should be listed as one of the principal editors of the text.
James Kaylin is a student using the text. He has submitted numerous
corrections.
David Kershaw fixed the broken catTwice function in Section 3.10.
Eddie Lam has sent in numerous corrections to Chapters 1, 2, and 3. He also
fixed the Makefile so that it creates an index the first time it is run and
helped us
set up a versioning scheme.
Man-Yong Lee sent in a correction to the example code in Section 2.4.
David Mayo pointed out that the word unconsciously in Chapter 1 needed to be
changed to subconsciously.
Chris McAloon sent in several corrections to Sections 3.9 and 3.10.
Matthew J. Moelter has been a long-time contributor who sent in numerous
corrections and suggestions to the book.
Simon Dicon Montford reported a missing function definition and several typos
in
Chapter 3. He also found errors in the increment function in Chapter 13.
John Ouzts corrected the definition of return value in Chapter 3.
Kevin Parks sent in valuable comments and suggestions as to how to improve
the distribution of the book.
David Pool sent in a typo in the glossary of Chapter 1, as well as kind words
of
encouragement.
Michael Schmitt sent in a correction to the chapter on files and
exceptions.
Robin Shaw pointed out an error in Section 13.1, where the printTime function
was used in an example without being defined.
Paul Sleigh found an error in Chapter 7 and a bug in Jonah Cohen's Perl
script
that generates HTML from LaTeX.
Craig T. Snydal is testing the text in a course at Drew University.
He has
contributed several valuable suggestions and corrections.
Ian Thomas and his students are using the text in a programming course. They
are the first ones to test the chapters in the latter half of the book, and
they have
made numerous corrections and suggestions.
Keith Verheyden sent in a correction in Chapter 3.
Peter Winstanley let us know about a longstanding error in our Latin in
Chapter 3.
Chris Wrobel made corrections to the code in the chapter on file I/O and
exceptions.
Moshe Zadka has made invaluable contributions to this project. In addition to
writing the first draft of the chapter on Dictionaries, he provided continual
guidance in the early stages of the book.
Christoph Zwerschke sent several corrections and pedagogic
suggestions, and
explained the difference between gleich and
selbe.
James Mayer sent us a whole slew of spelling and typographical
errors, including
two in the contributor list.
Hayden McAfee caught a potentially confusing inconsistency between two
examples.
Angel Arnal is part of an international team of translators working on the
Spanish
version of the text. He has also found several errors in the English
version.
Tauhidul Hoque and Lex Berezhny created the illustrations in Chapter 1 and
improved many of the other illustrations.
Dr. Michele Alzetta caught an error in Chapter 8 and sent some interesting
pedagogic comments and suggestions about Fibonacci and Old Maid.
Andy Mitchell caught a typo in Chapter 1 and a broken example in Chapter 2.
Kalin Harvey suggested a clarification in Chapter 7 and caught some typos.
Christopher P. Smith caught several typos and is helping us prepare to update
the book for Python 2.2.
David Hutchins caught a typo in the Foreword.
Gregor Lingl is teaching Python at a high school in Vienna, Austria. He is
working
on a German translation of the book, and he caught a couple of bad
errors in
Chapter 5.
Julie Peters caught a typo in the Preface.
Preface
This is the preface of "How To Think Like a Computer Scientist --- Learning with Python 3"
By Jeffrey Elkner
This book owes its existence to the collaboration made possible by the Internet
and the
free software movement. Its three authors---a college professor, a high
school teacher,
and a professional programmer---never met face to face to work
on it, but we have been
able to collaborate closely, aided by many other folks
who have taken the time and
energy to send us their feedback.
We think this book is a testament to the benefits and future possibilities of
this kind of
collaboration, the framework for which has been put in place by
Richard Stallman and
the Free Software Foundation.
Two years later, I was convinced that C++ was a poor choice to use for
introducing
students to computer science. While it is certainly a very powerful
programming
language, it is also an extremely difficult language to learn and
teach. I found myself
constantly fighting with C++'s difficult syntax and
multiple ways of doing things, and I
was losing many students unnecessarily as
a result. Convinced there had to be a better
language choice for our first-year
class, I went looking for an alternative to C++.
I needed a language that would run on the machines in our GNU/Linux lab as well
as on
the Windows and Macintosh platforms most students have at home. I wanted
it to be
free software, so that students could use it at home regardless of
their income. I wanted
a language that was used by professional programmers,
and one that had an active
developer community around it. It had to support
both procedural and object-oriented
programming. And most importantly, it had
to be easy to learn and teach. When I
investigated the choices with these goals
in mind, Python stood out as the best
candidate for the job.
I asked one of Yorktown's talented students, Matt Ahrens, to give Python a try.
In two
months he not only learned the language but wrote an application called
pyTicket that
enabled our staff to report technology problems via the Web. I
knew that Matt could not
have finished an application of that scale in so short
a time in C++, and this
accomplishment, combined with Matt's positive
assessment of Python, suggested that
Python was the solution I was looking for.
Finding a textbook
Having decided to use Python in both of my introductory computer science
classes the
following year, the most pressing problem was the lack of an
available textbook.
Free documents came to the rescue. Earlier in the year, Richard Stallman had
introduced me to Allen Downey. Both of us had written to Richard expressing an
interest
in developing free educational materials. Allen had already written a
first-year computer
science textbook, How to Think Like a Computer Scientist.
When I read this book, I
knew immediately that I wanted to use it in my class.
It was the clearest and most
helpful computer science text I had seen. It
emphasized the processes of thought
involved in programming rather than the
features of a particular language. Reading it
immediately made me a better
teacher.
How to Think Like a Computer Scientist was not just an excellent book, but it
had been
released under the GNU public license, which meant it could be used
freely and
modified to meet the needs of its user. Once I decided to use
Python, it occurred to me
that I could translate Allen's original Java version
of the book into the new language.
While I would not have been able to write a
textbook on my own, having Allen's book to
work from made it possible for me to
do so, at the same time demonstrating that the
cooperative development model
used so well in software could also work for educational
materials.
Working on this book for the last two years has been rewarding for both my
students
and me, and my students played a big part in the process. Since I
could make instant
changes whenever someone found a spelling error or difficult
passage, I encouraged
them to look for mistakes in the book by giving them a
bonus point each time they made
a suggestion that resulted in a change in the
text. This had the double benefit of
encouraging them to read the text more
carefully and of getting the text thoroughly
reviewed by its most important
critics, students using it to learn computer science.
For the second half of the book on object-oriented programming, I knew that
someone
with more real programming experience than I had would be needed to do
it right. The
book sat in an unfinished state for the better part of a year
until the open source
community once again provided the needed means for its
completion.
I received an email from Chris Meyers expressing interest in the book. Chris
is a
professional programmer who started teaching a programming course last
year using
Python at Lane Community College in Eugene, Oregon. The prospect of
teaching the
course had led Chris to the book, and he started helping out with
it immediately. By the
end of the school year he had created a companion
project on our Website at
http://openbookproject.net called Python for Fun and was
working with some of my
most advanced students as a master teacher, guiding
them beyond where I could take
them.
The first example from the text illustrates this point. It is the traditional
hello, world
program, which in the Java version of the book looks like this:
class Hello {
    public static void main(String[] args) {
        System.out.println("Hello, world.");
    }
}
Even though this is a trivial example, the advantages of Python stand out.
Yorktown's
Computer Science I course has no prerequisites, so many of the
students seeing this
example are looking at their first program. Some of them
are undoubtedly a little
nervous, having heard that computer programming is
difficult to learn. The Java version
has always forced me to choose between two
unsatisfying options: either to explain the class Hello,
public static void main, String[] args, {, and } statements and risk confusing
or intimidating some of the students right at the start, or to tell them,
"Just don't worry about all of that stuff now; we will talk about it later,"
and risk the same thing. The
educational objectives at this point in the
course are to introduce students to the idea of
a programming statement and to
get them to write their first program, thereby
introducing them to the
programming environment. The Python program has exactly
what is needed to do
these things, and nothing more.
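As a minimal sketch of the contrast described above: the Python listing is not
reproduced in this copy, so the exact greeting text below is an assumption, but
the whole program can plausibly be a single statement.

```python
# Assumed Python counterpart of the Java "hello, world" program discussed above.
# The greeting text is illustrative; no class or main method is required.
print("Hello, world!")
```

Running this file with a Python 3 interpreter prints the greeting, so a student
sees a result before any further syntax has to be explained.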
Comparing the explanatory text of the program in each version of the book
further
illustrates what this means to the beginning student. There are
seven paragraphs of
explanation of Hello, world! in the Java version; in the
Python version, there are only a
few sentences. More importantly, the missing
six paragraphs do not deal with the big
ideas in computer programming but with
the minutiae of Java syntax. I found this same
thing happening throughout the
book. Whole paragraphs simply disappear from the
Python version of the text
because Python's much clearer syntax renders them
unnecessary.
Another example of how Python aids in the teaching and learning of programming
is in
its syntax for functions. My students have always had a great deal of
difficulty
understanding functions. The main problem centers around the
difference between a
function definition and a function call, and the related
distinction between a parameter
and an argument. Python comes to the rescue
with syntax that is nothing short of
beautiful. Function definitions begin with
the keyword def, so I simply tell my students,
When you define a function,
begin with def, followed by the name of the function that you
are defining;
when you call a function, simply call (type) out its name. Parameters go
with
definitions; arguments go with calls. There are no return types, parameter
types, or
reference and value parameters to get in the way, so I am now able to
teach functions in
less than half the time that it previously took me, with
better comprehension.
Using Python improved the effectiveness of our computer science program for all
students. I saw a higher general level of success and a lower level of
frustration than I
experienced teaching with either C++ or Java. I moved faster
with better results. More
students left the course with the ability to create
meaningful programs and with the
positive attitude toward the experience of
programming that this engenders.
Building a community
I have received email from all over the globe from people using this book to
learn or to
teach programming. A user community has begun to emerge, and many
people have
been contributing to the project by sending in materials for the
companion Website at
http://openbookproject.net/pybiblio.
With the continued growth of Python, I expect the growth in the user community
to
continue and accelerate. The emergence of this user community and the
possibility it
suggests for similar collaboration among educators have been the
most exciting parts of
working on this project for me. By working together, we
can increase the quality of
materials available for our use and save valuable
time. I invite you to join our community
and look forward to hearing from you.
Please write to me at [email protected].
Jeffrey Elkner
Governor's Career and Technical Academy in Arlington
Arlington, Virginia
The goal of this book is to teach you to think like a computer scientist. This
way of
thinking combines some of the best features of mathematics, engineering,
and natural
science. Like mathematicians, computer scientists use formal
languages to denote ideas
(specifically computations). Like engineers, they
design things, assembling components
into systems and evaluating tradeoffs
among alternatives. Like scientists, they observe
the behavior of complex
systems, form hypotheses, and test predictions.
The single most important skill for a computer scientist is problem
solving. Problem
solving means the ability to formulate problems, think
creatively about solutions, and
express a solution clearly and accurately. As
it turns out, the process of learning to
program is an excellent opportunity to
practice problem-solving skills. That's why this
chapter is called, The way of
the program.
As you might infer from the name high-level language, there are also
low-level
languages, sometimes referred to as machine languages or assembly
languages.
Loosely speaking, computers can only execute programs written in
low-level languages.
Thus, programs written in a high-level language have to be
translated into something
more suitable before they can run.
Almost all programs are written in high-level languages because of their advantages.
First, it is much easier to program in a high-level language: programs take less time
to write, they are shorter and easier to read, and they are more likely to be
correct. Second, high-level languages are portable, meaning that they can
run on different kinds of computers with few or no modifications.
The engine that translates and runs Python is called the Python Interpreter.
There are two ways to use it: immediate mode and script mode. In immediate mode, you type
Python expressions that are executed immediately. This looks as follows:
>>> 2 + 2
4
>>>
The >>> is called the Python prompt. The interpreter uses the prompt to indicate that it
is ready for
instructions. We typed 2 + 2, and the interpreter evaluated our expression,
and replied 4,
and on the next line it gave a new prompt, indicating that it is ready for
more input.
Alternatively, you can write a program in a file and use the interpreter to
execute the
contents of the file. Such a file is called a script. Scripts have the
advantage that they
can be saved to disk, printed, and so on.
Working directly in the interpreter is convenient for testing short bits of code because
you
get immediate feedback. Think of it as scratch paper used to help you work out
problems. Anything longer than a few lines should be put into a script.
What is a program?
A program is a sequence of instructions that specifies how to perform a
computation.
The computation might be something mathematical, such as solving a
system of
equations or finding the roots of a polynomial, but it can also be a
symbolic computation,
such as searching and replacing text in a document or
(strangely enough) compiling a
program.
The details look different in different languages, but a few basic instructions
appear in
just about every language:
input
Get data from the keyboard, a file, or some other device.
output
Display data on the screen or send data to a file or other device.
math
Perform basic mathematical operations like addition and multiplication.
conditional execution
Check for certain conditions and execute the appropriate sequence of
statements.
repetition
Perform some action repeatedly, usually with some variation.
Believe it or not, that's pretty much all there is to it. Every program you've
ever used, no
matter how complicated, is made up of instructions that look more
or less like these.
Thus, we can describe programming as the process of
breaking a large, complex task
into smaller and smaller subtasks until the
subtasks are simple enough to be performed
with sequences of these basic
instructions.
That may be a little vague, but we will come back to this topic later when we
talk about
algorithms.
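As a minimal sketch, the five basic instructions can all be seen in one very small program. The task (adding up the positive numbers in a line of text) and the names raw_text and total are assumptions chosen for illustration; the text is stored in a variable so that the example runs without anyone typing at the keyboard.

```python
# Hypothetical example: add up the positive numbers in a small data set.
raw_text = "4 -1 7 0 3"          # input: stands in for text typed at the keyboard

total = 0
for word in raw_text.split():    # repetition: handle each number in turn
    number = int(word)
    if number > 0:               # conditional execution: skip non-positive values
        total = total + number   # math: addition

print("Sum of positive numbers:", total)   # output
```

The loop and the test used here are explained in detail in later chapters; for now it is enough to recognise which line plays which role.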
What is debugging?
Programming is a complex process, and because it is done by human beings, it
often
leads to errors. Programming errors are called
bugs and the process of tracking them
down and correcting them is called
debugging. Use of the term bug to describe small
engineering difficulties
dates back to at least 1889, when Thomas Edison had a bug with
his phonograph.
Three kinds of errors can occur in a program: syntax errors, runtime errors, and
semantic errors. It is useful to
distinguish between them in order to track them down
more quickly.
Syntax errors
Python can only execute a program if the program is syntactically correct;
otherwise, the
process fails and returns an error message. Syntax refers
to the structure of a program
and the rules about that structure. For example,
in English, a sentence must begin with
a capital letter and end with a period.
this sentence contains a syntax error. So does
this one
For most readers, a few syntax errors are not a significant problem, which is
why we can
read the poetry of E. E. Cummings without problems.
Python is not so forgiving. If there
is a single syntax error anywhere in your
program, Python will display an error message
and quit, and you will not be able
to run your program. During the first few weeks of your
programming career, you
will probably spend a lot of time tracking down syntax errors.
As you gain
experience, though, you will make fewer errors and find them faster.
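To see what a syntax error looks like without breaking our own script, we can hand a faulty line to Python as text and ask it to parse that text. This is only a sketch: the faulty line and the names broken_source and parsed_ok are illustrative assumptions, and the exact wording of the error message differs between Python versions.

```python
# A line with a missing closing parenthesis, kept in a string so that
# this script itself remains syntactically correct.
broken_source = 'print("Hello, World!"'

try:
    compile(broken_source, "<example>", "exec")   # parse the text without running it
    parsed_ok = True
except SyntaxError as error:
    parsed_ok = False
    print("Python refused to parse the line:", error.msg)
```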
Runtime errors
The second type of error is a runtime error, so called because the error does
not appear
until you run the program. These errors are also called
exceptions because they
usually indicate that something exceptional (and
bad) has happened.
Runtime errors are rare in the simple programs you will see in the first few
chapters, so
it might be a while before you encounter one.
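A small illustration, assuming a division by zero as the exceptional event: the program below is syntactically correct, so Python starts running it, and the problem only shows up when the division is executed. The try and except lines are there so that the sketch reports the error instead of stopping; they are covered much later.

```python
numerator = 10
denominator = 0

try:
    result = numerator / denominator   # syntactically fine, fails when executed
except ZeroDivisionError as error:
    result = None
    print("Runtime error:", error)
```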
Semantic errors
The third type of error is the semantic error. If there is a semantic error
in your
program, it will run successfully, in the sense that the computer will
not generate any
error messages, but it will not do the right thing. It will do
something else. Specifically, it
will do what you told it to do.
The problem is that the program you wrote is not the program you wanted to
write. The
meaning of the program (its semantics) is wrong. Identifying
semantic errors can be
tricky because it requires you to work backward by
looking at the output of the program
and trying to figure out what it is doing.
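As a hypothetical example of a semantic error, suppose we want the average of two numbers. Both lines below run without any error message, but only the second one computes what we meant.

```python
a = 4
b = 6

wrong_average = a + b / 2      # evaluates as a + (b / 2): runs, but is not the average
right_average = (a + b) / 2    # what was actually intended

print(wrong_average)   # 7.0
print(right_average)   # 5.0
```

Python did exactly what it was told in both cases; the mistake is in what we told it.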
Experimental debugging
One of the most important skills you will acquire is debugging. Although it
can be
frustrating, debugging is one of the most intellectually rich,
challenging, and interesting
parts of programming.
In some ways, debugging is like detective work. You are confronted with clues,
and you
have to infer the processes and events that led to the results you see.
Debugging is also like an experimental science. Once you have an idea what is
going
wrong, you modify your program and try again. If your hypothesis was
correct, then you
can predict the result of the modification, and you take a
step closer to a working
program. If your hypothesis was wrong, you have to
come up with a new one. As
Sherlock Holmes pointed out, When you have
eliminated the impossible, whatever
remains, however improbable, must be the
truth. (A. Conan Doyle, The Sign of Four)
For some people, programming and debugging are the same thing. That is,
programming is the process of gradually debugging a program until it does what
you
want. The idea is that you should start with a program that does
something and make
small modifications, debugging them as you go, so that you
always have a working
program.
Later chapters will make more suggestions about debugging and other programming
practices.
Formal languages tend to have strict rules about syntax. For example, 3+3=6
is a
syntactically correct mathematical statement, but 3=+6$ is not.
H2O is a syntactically
correct chemical name, but 2Zz is
not.
The second type of syntax rule pertains to the structure of a statement --- that
is, the
way the tokens are arranged. The statement 3=+6$ is structurally
illegal because you
can't place a plus sign immediately after an equal sign.
Similarly, molecular formulas
have to have subscripts after the element name,
not before. And in our Python example,
if we omitted the comma, or if we changed the two
parentheses around to say
print)"Happy New Year for ",2013( our statement would still
have six legal and valid
tokens, but the structure is illegal.
Although formal and natural languages have many features in common --- tokens,
structure, syntax, and semantics --- there are many differences:
ambiguity
Natural languages are full of ambiguity, which people deal
with by
using contextual clues and other information. Formal
languages are
designed to be nearly or completely
unambiguous, which means that any
statement has exactly
one meaning, regardless of context.
redundancy
In order to make up for ambiguity and reduce
misunderstandings, natural
languages employ lots of
redundancy. As a result, they are often
verbose. Formal
languages are less redundant and more concise.
literalness
Formal languages mean exactly what they say. On the other
hand, natural languages
are full of idiom and metaphor. If
someone says, "The
other shoe fell", there is probably no
shoe and nothing falling.
You'll need to find the
original joke
to understand the idiomatic meaning of the other shoe falling.
Yahoo! Answers thinks it knows!
People who grow up speaking a natural language---everyone---often have a hard
time
adjusting to formal languages. In some ways, the difference between formal
and natural
language is like the difference between poetry and prose, but more
so:
poetry
Words are used for their sounds as well as for their meaning,
and the
whole poem together creates an effect or emotional
response. Ambiguity
is not only common but often deliberate.
prose
The literal meaning of words is more important, and the
structure
contributes more meaning. Prose is more amenable
to analysis than
poetry but still often ambiguous.
program
The meaning of a computer program is unambiguous and
literal, and can
be understood entirely by analysis of the
tokens and structure.
Here are some suggestions for reading programs (and other formal languages).
First,
remember that formal languages are much more dense than natural
languages, so it
takes longer to read them. Also, the structure is very
important, so it is usually not a
good idea to read from top to bottom, left to
right. Instead, learn to parse the program in
your head, identifying the tokens
and interpreting the structure. Finally, the details
matter. Little things
like spelling errors and bad punctuation, which you can get away
with in
natural languages, can make a big difference in a formal language.
print("Hello, World!")
This is an example of using the print function, which doesn't actually print
anything on
paper. It displays a value on the screen. In this case, the result shown
is
Hello, World!
The quotation marks in the program mark the beginning and end of the value;
they don't
appear in the result.
Comments
As programs get bigger and more complicated, they get more difficult to read.
Formal
languages are dense, and it is often difficult to look at a piece of
code and figure out
what it is doing, or why.
For this reason, it is a good idea to add notes to your programs to explain in
natural
language what the program is doing.
#---------------------------------------------------
You'll also notice that we've left a blank line in the program. Blank lines
are also ignored
by the interpreter, but comments and blank lines can make your
programs much easier
for humans to parse. Use them liberally!
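A minimal sketch of how comments and blank lines look in a script follows. The variable name greeting and the wording of the comments are assumptions made for this illustration.

```python
# This hypothetical script greets the user.
# Everything after a # on a line is ignored by the interpreter.

greeting = "Hello, World!"   # a comment can also follow a statement

print(greeting)
```

Removing the comments and the blank lines would not change what the program does, only how easy it is to read.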
Glossary
algorithm
A set of specific steps for solving a category of problems.
bug
An error in a program.
comment
Information in a program that is meant for other programmers
(or anyone
reading the source code) and has no effect on the
execution of the
program.
debugging
The process of finding and removing any of the three kinds of
programming errors.
exception
Another name for a runtime error.
formal language
Any one of the languages that people have designed for
specific
purposes, such as representing mathematical ideas
or computer programs;
all programming languages are
formal languages.
high-level language
A programming language like Python that is designed to be
easy for
humans to read and write.
immediate mode
A style of using Python where we type expressions at the
command prompt, and
the results are shown immediately.
Contrast with script, and see the
entry under Python shell.
interpreter
The engine that executes your Python scripts or expressions.
low-level language
A programming language that is designed to be easy for a
computer to
execute; also called machine language or
assembly language.
natural language
Any one of the languages that people speak that evolved
naturally.
object code
The output of the compiler after it translates the program.
parse
To examine a program and analyze the syntactic structure.
portability
A property of a program that can run on more than one kind
of computer.
print function
A function used in a program or script that causes the Python
interpreter to
display a value on its output device.
problem solving
The process of formulating a problem, finding a solution, and
expressing the solution.
program
A sequence of instructions that specifies to a computer
actions and
computations to be performed.
Python shell
An interactive user interface to the Python interpreter. The
user of a
Python shell types commands at the prompt (>>>),
and presses the return
key to send these commands
immediately to the interpreter for
processing. The word shell
comes from Unix. In Thonny, the Interpreter Window is where
we'd do the immediate mode interaction.
runtime error
An error that does not occur until the program has started to
execute
but that prevents the program from continuing.
script
A program stored in a file (usually one that will be
interpreted).
semantic error
An error in a program that makes it do something other than
what the
programmer intended.
semantics
The meaning of a program.
source code
A program in a high-level language before being compiled.
syntax
The structure of a program.
syntax error
An error in a program that makes it impossible to parse ---
and
therefore impossible to interpret.
token
One of the basic elements of the syntactic structure of a
program,
analogous to a word in a natural language.
References
[ThinkCS] How To Think Like a Computer Scientist --- Learning with Python 3
If you use the Python interpreter, this program is initially stored on the disk drive; when
you start the interpreter, it is loaded into the main memory of the computer, so that the
processor can execute it.
The main memory of the computer stores everything that the processor needs to have
access to in order to execute a program. This includes not only the Python interpreter
and the instructions of the program that the processor is executing, but also the intermediate
results of a calculation; after all, in most cases the calculation that we ask a computer to
do is so complex that it needs memory to keep track of its intermediate steps.
To organize its calculations well, the Python interpreter organizes the memory in a
specific manner, of which we will see more details later in this syllabus. Core ideas are
the following:
Parts of the memory are given names; these names can be used to refer to that
part of the memory.
The information that is stored in a certain part of the memory has a value, for
instance 4 or 3.0, and a type, for instance a text or a number.
To calculate information that can be stored in a part of the memory, we write
expressions in Python programs.
To decide the order in which calculations are performed, Python programs consist of
statements that are put in a certain order.
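These four ideas can be sketched in a few lines. The names width, height and area are assumptions chosen for the illustration; each line is a statement, the right-hand sides are expressions, and the type function reports the type of the value a name refers to.

```python
width = 4                 # statement: the name width now refers to the value 4
height = 3.0              # a second name, referring to a value of another type

area = width * height     # the expression on the right is evaluated, then stored

print(width, type(width))     # 4 <class 'int'>
print(height, type(height))   # 3.0 <class 'float'>
print(area, type(area))       # 12.0 <class 'float'>
```

Swapping the order of the statements would matter: area cannot be calculated before width and height have been given values.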
If you are not sure what class a value falls into, Python has a function
called type which
can tell you.
<class 'str'>
>>> type(17)
<class 'int'>
Not surprisingly, strings belong to the class str and integers belong to the
class int. Less
obviously, numbers with a decimal point belong to a class
called float, because these
numbers are represented in a format called
floating-point. At this stage, you can treat
the words class and type
interchangeably. We'll come back to a deeper understanding of
what a class
is in later chapters.
>>> type(3.2)
<class 'float'>
What about values like "17" and "3.2"? They look like numbers, but they
are in quotation
marks like strings.
>>> type("17")
<class 'str'>
>>> type("3.2")
<class 'str'>
They're strings!
Strings in Python can be enclosed in either single quotes (') or double quotes
("), or
three of each (''' or """).
<class 'str'>
<class 'str'>
<class 'str'>
<class 'str'>
>>>
... lines."""
>>> print(message)
span several
lines.
>>>
'This is a string.'
'And so is this.'
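The quoting styles can be compared side by side in a short script. This is a sketch with illustrative variable names (single, double, triple, quote_inside); the point is that all of them produce values of the same type, and that only triple-quoted strings may run over several lines.

```python
single = 'This is a string.'
double = "And so is this."
triple = """Triple-quoted strings
can span several
lines."""

# A string in double quotes may contain single quotes, and vice versa.
quote_inside = "What's up, Doc?"

print(type(single), type(double), type(triple))
print(triple)
line_count = len(triple.splitlines())   # 3
```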
When you type a large integer, you might be tempted to use commas between
groups of
three digits, as in 42,000. This is not a legal integer in
Python, but it does mean
something else, which is legal:
>>> 42000
42000
>>> 42,000
(42, 0)
Well, that's not what we expected at all! Because of the comma, Python chose to
treat
this as a pair of values. We'll come back to learn about pairs later.
But, for the moment,
remember not to put commas or spaces in your integers, no matter
how big they are.
Also revisit what we said in the previous chapter: formal languages are
strict, the
notation is concise, and even the smallest change might
mean something quite different
from what you intended.
Variables
One of the most powerful features of a programming language is the ability to
store
values in the memory of the computer. In Python this is done by manipulating variables.
A variable is a name that refers to a value stored in the memory of the computer.
>>> message = "What's up, Doc?"
>>> n = 17
>>> pi = 3.14159
This example makes three assignments. The first assigns the string value "What's
up,
Doc?" to a variable named message. The second gives the integer
17 to n, and the third
assigns the floating-point number 3.14159 to
a variable called pi.
After executing these instructions, hence, in the memory of the computer we have three
variables; each variable has a name (such as message), a type (such as str) and a value
(such as "What's up, Doc?"). The assignment statement effectively changes the contents
of the memory of the computer.
The assignment token, =, should not be confused with equals, which uses
the token
==. The assignment statement binds a name, on the
left-hand side of the operator, to a
value, on the right-hand side.
This is why you will get an error if you enter:
>>> 17 = n
Tip
When reading or writing code, say to yourself
"n is assigned 17"
or "n gets the value 17".
Don't say "n equals 17".
A common way to represent variables on paper is to write the name with an arrow
pointing to the variable's value. This kind of figure is called a state
snapshot because it
shows what state each of the variables is in at a particular
instant in time. (Think of it as
the variable's state of mind).
This diagram shows the result of executing the assignment
statements:
If you ask the interpreter to evaluate a variable, it will produce the value that is currently
linked to the variable:
>>> message
"What's up, Doc?"
>>> n
17
>>> pi
3.14159
We use variables in a program to "remember" things, perhaps the current score at the
football game.
But variables are variable. This means they can change over time, just
like the scoreboard at a football game.
You can assign a value to a variable, and later
assign a different value to the same variable.
(This is different from maths. In maths, if
you give `x` the value 3, it
cannot change to link to a different value half-way through
your calculations!)
>>> day = "Thursday"
>>> day
'Thursday'
>>> day = "Friday"
>>> day
'Friday'
>>> day = 21
>>> day
21
You'll notice we changed the value of day three times, and on the third assignment we
even
made it refer to a value that was of a different type.
A great deal of programming is about having the computer remember things, e.g. the
number of missed calls on your phone,
and then arranging to update or change the
variable when you miss another call.
There are some situations in which names beginning with an underscore have
special
meaning, so a safe rule for beginners is to start all names with a letter.
It turns out that class is one of the Python keywords. Keywords define
the language's
syntax rules and structure, and they cannot be used as variable names.
Python has thirty-something keywords (and every now and again improvements to
Python
introduce or eliminate one or two):
and as assert async await break
class continue def del elif else
except finally for from global if
import in is lambda nonlocal not
or pass raise return try while
with yield True False None
You might want to keep this list handy. If the interpreter complains about one
of your
variable names and you don't know why, see if it is on this list.
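Because the list changes a little between Python versions, it can be safer to ask the interpreter itself. The sketch below uses the standard keyword module; the name message is just an example of an ordinary variable name.

```python
import keyword

print(keyword.kwlist)                 # the keywords of the running interpreter
print(len(keyword.kwlist))            # the count varies between Python versions

print(keyword.iskeyword("class"))     # True: cannot be used as a variable name
print(keyword.iskeyword("message"))   # False: fine as a variable name
```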
Programmers generally choose names for their variables that are meaningful to
the
human readers of the program ---
they help the programmer document, or remember,
what the variable is used for.
Caution!
Statements
A statement is an instruction that the Python interpreter can execute.
In this chapter we
have seen the assignment statement. There are however many other forms of
statements. Another example is the function call that we saw in the previous chapter:
print("Hello, World!")
The effect of this statement was to print a string on the screen of the computer.
print(3)
And
x = 3
This last statement will store the value 3 in the memory of the computer, such that it can
be used later in the program. The first statement prints the value 3 on the screen of the
user, but this value is not stored for later use.
We will see more details on functions later; some other kinds of statements that
we'll see
shortly are while statements, for statements, if statements,
and import statements.
(There are other kinds too!)
Evaluating expressions
An expression is a combination of values, variables, operators, and calls to functions. If
you
type an expression at the Python prompt, the interpreter evaluates it and
displays
the result:
>>> 1 + 1
2
>>> len("hello")
5
In this example len is a built-in Python function that returns the number of characters in a
string.
We've previously seen the print and the type functions, so this is our third example
of a function!
>>> 17
17
>>> y = 3.14
>>> x = len("hello")
>>> x
5
>>> y
3.14
The following are all legal Python expressions whose meaning is more or less
clear:
>>> 2 ** 3
8
>>> 3 ** 2
9
>>> minutes = 645
>>> hours = minutes / 60
>>> hours
10.75
Oops! In Python 3, the division operator / always yields a floating point result.
What we
might have wanted to know was how many whole hours there are, and how many
minutes remain.
Python gives us two different flavors of the division operator.
The
second, called floor division uses the token //.
Its result is always a whole number ---
and if it has to adjust the number it always
moves it to the left on the number line. So 6 //
4 yields 1, but -6 // 4 might surprise you!
>>> 7 / 4
1.75
>>> 7 // 4
1
>>> minutes = 645
>>> hours = minutes // 60
>>> hours
10
Take care that you choose the correct flavor of the division operator. If you're
working
with expressions where you need floating point values, use the division operator
that
does the division accurately.
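The behaviour of floor division for negative numbers, and the whole-hours question raised above, can be checked with a short sketch. The name remaining is an assumption for this example, and the % operator used to obtain the leftover minutes gives the remainder of a division.

```python
print(6 // 4)     # 1   (1.5 moves left to 1)
print(-6 // 4)    # -2  (-1.5 moves left to -2, not to -1)

minutes = 645
hours = minutes // 60        # whole hours
remaining = minutes % 60     # minutes left over
print(hours, remaining)      # 10 45
```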
Order of operations
When more than one operator appears in an expression, the order of evaluation
depends on the rules of precedence. Python follows the same precedence
rules for its
mathematical operators that mathematics does. The acronym PEMDAS
is a useful way
to remember the order of operations:
>>> 2 ** 3 ** 2     # The right-most ** operator is applied first
512
>>> (2 ** 3) ** 2   # Use parentheses to force the order you want!
64
The immediate mode command prompt of Python is great for exploring and
experimenting
with expressions like this.
Glossary
assignment statement
A statement that assigns a value to a name (variable). To the
left of
the assignment operator, =, is a name. To the right of
the
assignment token is an expression which is evaluated by
the Python
interpreter and then assigned to the name. The
difference between the
left and right hand sides of the
assignment statement is often
confusing to new
programmers. In the following assignment:
n = n + 1
References
[ThinkCS] How To Think Like a Computer Scientist --- Learning with Python 3
Conditionals
Source: this section is heavily based on Chapter 5 of [ThinkCS].
Until now, we have seen how to write programs that consist of assignment statements.
In these programs, we always execute all statements in the program one after the other.
Programs get much more interesting when we can test conditions and change the
program behaviour depending on the outcome of the tests. That's what this
chapter is
about.
A core idea in the conditional execution of statements is that we want to execute certain
parts of code only when a condition is satisfied. Essentially, we want to test whether a
condition is satisfied; if so, we do one thing; if not, we do another thing.
<class 'bool'>
>>> type(true)
True
>>> 5 == 6
False
>>> j = "hel"
True
In the first statement, the two operands evaluate to equal values, so the expression
evaluates
to True; in the second statement, 5 is not equal to 6, so we get False.
The == operator is one of six common comparison operators which all produce
a bool
result; here are all six:
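x == y               # Produce True if ... x is equal to y
x != y               # ... x is not equal to y
x > y                # ... x is greater than y
x < y                # ... x is less than y
x >= y               # ... x is greater than or equal to y
x <= y               # ... x is less than or equal to y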
Although these operations are probably familiar, the Python symbols are
different from
the mathematical symbols. A common error is to use a single
equal sign (=) instead of a
double equal sign (==). Remember that =
is an assignment operator and == is a
comparison operator. Also, there is
no such thing as =< or =>.
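A quick way to get a feel for the comparison operators is to try all of them on the same pair of values. The values 5 and 6 and the name result are arbitrary choices for this sketch.

```python
x = 5
y = 6

print(x == y)   # False
print(x != y)   # True
print(x < y)    # True
print(x <= y)   # True
print(x > y)    # False
print(x >= y)   # False

result = (x == 5)     # == compares; = stores the outcome in result
print(type(result))   # <class 'bool'>
```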
Like any other types we've seen so far, Boolean values can be assigned to
variables,
printed, etc.
>>> age = 18
>>> print(old_enough_to_get_driving_licence)
True
>>> type(old_enough_to_get_driving_licence)
<class 'bool'>
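Written out as a complete script, the idea looks as follows. The threshold of 17 years, stored in the name MINIMUM_AGE, is an assumption made for this sketch; the legal age differs from country to country.

```python
MINIMUM_AGE = 17    # assumed threshold, for illustration only

age = 18
old_enough_to_get_driving_licence = age >= MINIMUM_AGE

print(old_enough_to_get_driving_licence)         # True
print(type(old_enough_to_get_driving_licence))   # <class 'bool'>
```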
Logical operators
There are three logical operators, and, or, and not,
that allow us to build more complex
Boolean expressions from simpler Boolean expressions. The
semantics (meaning) of
these operators is similar to their meaning in English.
For example, x > 0 and x < 10
produces True only if x is greater than 0 and
at the same time, x is less than 10.
n % 2 == 0 or n % 3 == 0 is True if either of the conditions is True,
that is, if the number n is
divisible by 2 or it is divisible by 3. (What do
you think happens if n is divisible by both 2
and by 3 at the same time?
Will the expression yield True or False? Try it in your Python
interpreter.)
The expression on the left of the or operator is evaluated first: if the result is True,
Python
does not (and need not) evaluate the expression on the right --- this is called short-
circuit evaluation.
Similarly, for the and operator, if the expression on the left yields False,
Python does not
evaluate the expression on the right.
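Short-circuit evaluation can be made visible with a right-hand expression that would fail if it were evaluated. In this sketch (the names safe_and and safe_or are assumptions) the division 10 / x would cause a runtime error because x is 0, yet the script runs without problems, since the division is never carried out.

```python
x = 0

# 10 / x would fail with a runtime error, but it is never evaluated:
# x != 0 is False, so 'and' stops there.
safe_and = x != 0 and 10 / x > 1     # False

# x == 0 is True, so 'or' stops there and skips the division too.
safe_or = x == 0 or 10 / x > 1       # True

print(safe_and, safe_or)   # False True
```

This pattern, testing that a value is safe to use before using it in the same expression, is common in practice.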
Truth Tables
A truth table is a small table that allows us to list all the possible inputs,
and to give the
results for the logical operators. Because the and and or
operators each have two
operands, there are only four rows in a truth table that
describes the semantics of and.
a      b      a and b
False  False  False
False  True   False
True   False  False
True   True   True
The third logical operator, not, only takes a single operand, so its truth table
only has two
rows:
a  not a
F  T
T  F
n * 0 == 0
y and x == x and y
x and True == x
True and x == x
x and x == x
x or False == x
False or x == x
y or x == x or y
x or True == True
True or x == True
x or x == x
not (not x) == x
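Since x and y can each only be True or False, these identities can be checked by simply trying every combination. The sketch below does so with assert, which stops the program if a condition is False. Note that the lines above are written as algebra: in actual Python code, == binds more tightly than and and or, so the parentheses in the sketch are needed. The name laws_hold is an assumption for the example.

```python
for x in (False, True):
    for y in (False, True):
        assert (x and y) == (y and x)
        assert (x or y) == (y or x)
    assert (x and True) == x
    assert (x or False) == x
    assert (x or True) == True
    assert (x and x) == x
    assert (x or x) == x
    assert (not (not x)) == x

laws_hold = True
print("All listed identities hold for every combination of x and y.")
```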
Conditional execution
In order to write useful programs, we almost always need the ability to check
conditions
and change the behavior of the program accordingly. Conditional
statements give us
this ability. The simplest form is the if
statement:
if x % 2 == 0:
    print("Did you know that 2 is the only even number that is prime?")
else:
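if BOOLEAN EXPRESSION:
    STATEMENTS_1        # Executed if the condition evaluates to True
else:
    STATEMENTS_2        # Executed if the condition evaluates to False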
As with the function definition from the last chapter and other compound
statements like
for, the if statement consists of a header line and a body. The header
line begins with
the keyword if followed by a Boolean expression and ends with
a colon (:).
The indented statements that follow are called a block. The first
unindented statement
marks the end of the block.
Each of the statements inside the first block is executed in order if the
Boolean expression evaluates to True. The entire first block of statements
is skipped if the Boolean expression evaluates to False, and instead
all the statements indented under the else clause are executed.
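A complete, runnable sketch of an if statement with both clauses is given below. The starting value 7 and the name parity are assumptions for the illustration; changing x to an even number makes the other block run.

```python
x = 7    # hypothetical value; try an even number too

if x % 2 == 0:
    parity = "even"
    print(x, "is even.")
else:
    parity = "odd"
    print(x, "is odd.")

print("This line runs in both cases.")
```

Exactly one of the two indented blocks is executed; the last line is not indented, so it belongs to neither block.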
There is no limit on the number of statements that can appear under the two clauses of
an
if statement, but there has to be at least one statement in each block. Occasionally, it
is useful
to have a section with no statements (usually as a place keeper, or scaffolding,
for code we haven't written yet). In that case, we can use the pass statement, which
does
nothing except act as a placeholder.
else:
    pass
Another form of the if statement is one in which the else clause is omitted entirely.
In this
case, when the condition evaluates to True, the statements are
executed, otherwise the
flow of execution continues to the statement after the if.
if x < 0:
    x = 42
In this case, the print function that outputs the square root is the one after the if --- not
because we left a blank line, but because of the way the code is indented. Note too that
the function call math.sqrt(x) will give an error unless we have an import math statement,
usually placed near the top of our script.
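Put together as a full script, this could look like the sketch below. The starting value -9, the wording of the messages and the name root are assumptions; the essential points are the import math line at the top and the indentation that decides which statements belong to the if.

```python
import math

x = -9    # hypothetical starting value

if x < 0:
    print("The negative number", x, "is not valid here.")
    x = 42

# Not indented, so this runs whether or not the block above was executed.
root = math.sqrt(x)
print("The square root of", x, "is", root)
```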
Python terminology
However, before introducing the while statement, it is good to remind ourselves of some
important aspects of assignments.
airtime_remaining = 15
print(airtime_remaining)
airtime_remaining = 7
print(airtime_remaining)
The output of this program is:
15
7
Note too that an equality test is symmetric, but assignment is not. For example,
if a == 7
then 7 == a. But in Python, the statement a = 7
is legal and 7 = a is not.
a = 5
The third line changes the value of a but does not change the value of
b, so they are no
longer equal. (In some programming languages, a different
symbol is used for
assignment, such as <- or :=, to avoid confusion.) Some
people also think that variable
was an unfortunate word to choose, and instead
we should have called them
assignables. Python chooses to
follow common terminology and token usage, also
found in languages like C, C++, Java, and C#,
so we use the tokens = for assignment,
== for equality, and we talk of variables.
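The three-line example discussed above can be sketched as follows. The value assigned on the third line is an assumption, since only the first line of the example is shown.

```python
a = 5
b = a          # a and b now refer to the same value
print(a == b)  # True
a = 3          # only a is rebound; b still refers to 5
print(a == b)  # False
```

An equality that holds at one point in a program does not have to hold later, because assignment only changes the variable on its left-hand side.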
Updating variables
When an assignment statement is executed, the right-hand side expression (i.e. the
expression that comes after the assignment token) is evaluated first. This produces a
value.
Then the assignment is made, so that the variable on the left-hand side now
refers
to the new value.
One of the most common forms of assignment is an update, where the new
value of the
variable depends on its old value.
n = 5
n = 3 * n + 1
Line 2 means get the current value of n, multiply it by three and add one, and assign the
answer to n, thus making n refer to the value.
So after executing the two lines above, n
will point/refer to the
integer 16.
If you try to get the value of a variable that has never been assigned to, you'll get an
error:
>>> w = x + 1
Traceback (most recent call last):
NameError: name 'x' is not defined
Before you can update a variable, you have to initialize it to some starting value,
usually
with a simple assignment:
runs_scored = 0
...
runs_scored = runs_scored + 1
Here is a fragment of code that demonstrates the use of the while statement:
n = 4
ss = 0
v = 1
while v <= n:
    ss = ss + v
    v = v + 1
print(ss)
When executing this code, the value 10 (1+2+3+4) will be printed on the screen.
You can almost read the while statement as if it were English. It means,
while v is less than or equal to n, continue executing the body of the loop. Within
the body, each time, add v to the running sum ss and increment v. When v passes n,
the loop ends and the accumulated sum is printed.
More precisely, the flow of execution for a while statement is as follows:
Evaluate the condition at line 4, yielding a value which is either False or True.
If the value is False, exit the while statement and continue execution at the next
statement (line 7 in this case).
If the value is True, execute each of the statements in the body (lines 5 and 6)
and then go back to the while statement at line 4.
The body consists of all of the statements indented below the while keyword.
Notice that if the loop condition is False the first time we get to the
loop, the statements in the body of the loop are never executed.
The body of the loop should change the value of one or more variables so that
eventually the condition becomes false and the loop terminates. Otherwise the
loop will
repeat forever, which is called an infinite loop. An endless
source of amusement for
computer scientists is the observation that the
directions on shampoo, "lather, rinse,
repeat", are an infinite loop.
In the case here, we can prove that the loop terminates because we
know that the value
of n is finite, and we can see that the value of v
increments each time through the loop,
so eventually it will have to exceed n. In
other cases, it is not so easy, even impossible in
some cases,
to tell if the loop will ever terminate.
i = 1
while i <= n:
    STATEMENTS
    i = i + 1
In this case, we will execute the indicated STATEMENTS n times. However, note that it
is possible to use any Boolean expression as a condition; the following is hence also a
valid program:
while x > 0:
    STATEMENTS
    x = x // 2
In this code, we keep halving a number x using integer division, for as long as x is
still greater than 0. Because repeatedly halving a positive integer eventually yields 0,
the loop terminates. The power of the while statement is hence that we are very free in how
we specify how many times a block of statements is repeated!
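As a small illustration of a loop whose number of repetitions is not written down explicitly, the sketch below counts how often a number can be halved before it reaches 0. The starting value 40 and the variable name halvings are assumptions made for this example.

```python
x = 40          # hypothetical starting value
halvings = 0
while x > 0:
    x = x // 2  # x becomes 20, 10, 5, 2, 1, 0 in turn
    halvings = halvings + 1
print(halvings)  # 6
```

The condition x > 0 is guaranteed to become False, because integer division by 2 makes a positive x strictly smaller on every pass.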
Glossary
block
A group of consecutive statements with the same
indentation.
body
The block of statements in a compound statement that
follows the
header.
Boolean algebra
Some rules for rearranging and reasoning about Boolean
expressions.
Boolean expression
An expression that is either true or false.
Boolean value
There are exactly two Boolean values: True and False.
Boolean
values result when a Boolean expression is
evaluated by the Python
interpreter. They have type bool.
branch
One of the possible paths of the flow of execution determined
by
conditional execution.
comparison operator
One of the six operators that compares two values: ==, !=, >,
<, >=, and <=.
condition
The Boolean expression in a conditional statement that
determines which
branch is executed.
conditional statement
A statement that controls the flow of execution depending on
some
condition. In Python the keywords if, elif, and else are
used for conditional statements.
counter
A variable used to count something, usually initialized to zero
and
incremented in the body of a loop.
increment
Both as a noun and as a verb, increment means to increase
by 1.
infinite loop
A loop in which the terminating condition is never satisfied.
indefinite iteration
A loop where we just need to keep going until some condition
is met.
A while statement is used for this case.
iteration
Repeated execution of a set of programming statements.
loop
The construct that allows us to repeatedly execute a
statement or a group of statements until a terminating
condition is satisfied.
loop variable
A variable used as part of the terminating condition of a loop.
logical operator
One of the operators that combines Boolean expressions:
and,
or, and not.
nesting
One program structure within another, such as a conditional
statement
inside a branch of another conditional statement.
prompt
A visual cue that tells the user that the system is ready to
accept input data.
truth table
A concise table of Boolean values that can describe the
semantics
of an operator.
type conversion
An explicit function call that takes a value of one type and
computes a
corresponding value of another type.
References
[ThinkCS] How To Think Like a Computer Scientist --- Learning with Python 3
The int function can take a floating point number or a string, and turn
it into an int. For
floating point numbers, it discards the decimal portion
of the number --- a process we
call truncation towards zero on
the number line. Let us see this in action:
>>> int(3.14)
3
>>> int(3.9999) # This doesn't round to the closest int!
3
>>> int(3.0)
3
>>> int(-3.999) # Note that the result is closer to zero
-3
10
2345
17
>>> int("23 bottles")
This last case doesn't look like a number --- what do we expect?
Traceback (most recent call last):
ValueError: invalid literal for int() with base 10: '23 bottles'
The type converter float can turn an integer, a float, or a syntactically legal
string into a float:
>>> float(17)
17.0
>>> float("123.45")
123.45
The type converter str turns its argument into a string:
>>> str(17)
'17'
>>> str(123.45)
'123.45'
Operations on strings
In general, you cannot perform mathematical operations on strings, even if the
strings
look like numbers. The following are illegal (assuming that message
has type string):
>>> message - 1 # Error
Interestingly, the + operator does work with strings, but for strings,
the + operator
represents concatenation, not addition.
Concatenation means joining the two operands
by linking them end-to-end. For example:
fruit = "banana"
baked_good = " nut bread"
print(fruit + baked_good)
The output of this program is banana nut bread. The space before the word
nut is part of
the string, and is necessary to produce the space between
the concatenated strings.
Input
There is a built-in function in Python for getting input from the user:
n = input("Please enter your name: ")
The user of the program can enter the name and press enter, and when this happens
the
text that has been entered is returned from the input function, and in this
case assigned
to the variable n.
Even if you asked the user to enter their age, you would get back a string like "17".
It would be your job, as the programmer, to convert that string into an int or a float,
using the int or float converter functions we saw earlier.
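A short sketch of that conversion step. To keep the example runnable without a keyboard, the variable response stands in for the string that input would return; the variable names and the prompt in the comment are assumptions.

```python
response = "17"        # stands in for input("How old are you? ")
print(type(response))  # <class 'str'>

age = int(response)    # convert the text to a whole number
next_year = age + 1    # arithmetic only works after the conversion
print("Next year you will be", next_year)
```

Trying response + 1 without the conversion would raise a TypeError, because a string and an integer cannot be added.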
Composition
One of the most useful features of programming languages is their ability to
take small
building blocks and compose them into larger chunks.
For example, we know how to get the user to enter some input, we know how to
convert
the string we get into a float, we know how to write a complex expression, and
we know
how to print values. Let's put these together in a small four-step program that
asks the
user to input a value for the radius of a circle, and then
computes the area of the circle
from the formula.
r = float(response)
Now let's compose the first two lines into a single line of code, and compose the
second
two lines into another line of code.
Such compact code may not be most understandable for humans, but it does
illustrate
how we can compose bigger chunks from our building blocks.
If you're ever in doubt about whether to compose code or fragment it into smaller steps,
try to make it as simple as you can for the human to follow. My choice would
be the first
case above, with four separate steps.
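The two styles can be compared in a small sketch. The function names are hypothetical, math.pi is used for the constant in the area formula, and the string argument stands in for what input would return, so that the example runs without user interaction.

```python
import math

def area_in_steps(response):
    # Separate steps: convert, compute, then hand back the result.
    r = float(response)
    area = math.pi * r ** 2
    return area

def area_composed(response):
    # The same computation composed into a single expression.
    return math.pi * float(response) ** 2

print(area_in_steps("2"))
print(area_composed("2"))
```

Both functions compute the same value; the first is easier to follow and to debug, the second is shorter.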
>>> q = 7 // 3     # integer division
>>> print(q)
2
>>> r = 7 % 3      # remainder of the division
>>> print(r)
1
The modulus operator turns out to be surprisingly useful in larger programs. For
example, you can
check whether one number is divisible by another---if x % y is zero,
then
x is divisible by y.
Also, you can extract the right-most digit or digits from a number. For
example, x % 10
yields the right-most digit of x (in base 10).
Similarly x % 100 yields the last two digits.
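Both uses can be tried out in a few lines. The value 7284 is an arbitrary number chosen for illustration.

```python
x = 7284                      # arbitrary example value

print(x % 2 == 0)             # True: 7284 is divisible by 2
print(x % 5 == 0)             # False: 7284 is not divisible by 5

last_digit = x % 10           # right-most digit
last_two = x % 100            # last two digits
print(last_digit, last_two)   # 4 84
```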
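total_secs = int(input("How many seconds, in total? "))
hours = total_secs // 3600
secs_still_remaining = total_secs % 3600
minutes = secs_still_remaining // 60
secs_finally_remaining = secs_still_remaining % 60
print("Hrs=", hours, "mins=", minutes, "secs=", secs_finally_remaining)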
Chained conditionals
We have now seen how to combine types and expressions in more complex statements.
Similarly, we can also combine if statements in more complex manners. The basic if
statement had two branches. Sometimes there are more than two possibilities and we
need more than two
branches. One way to express a computation like that is a chained
conditional:
if x < y:
    STATEMENTS_A
elif x > y:
    STATEMENTS_B
else:
    STATEMENTS_C
if choice == "a":
    function_one()
elif choice == "b":
    function_two()
elif choice == "c":
    function_three()
else:
    print("Invalid choice.")
Each condition is checked in order. If the first is false, the next is checked,
and so on. If
one of them is true, the corresponding branch executes, and the
statement ends. Even if
more than one condition is true, only the first true
branch executes.
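The rule that only the first true branch executes can be observed in a small sketch. The function name size_label and the thresholds are assumptions made for this example.

```python
def size_label(x):
    if x > 100:
        return "large"
    elif x > 10:      # also true when x > 100, but never reached in that case
        return "medium"
    else:
        return "small"

print(size_label(500))   # large
print(size_label(50))    # medium
print(size_label(5))     # small
```

For x = 500 both conditions are true, yet only the first branch runs, which is why the order of the conditions matters.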
Nested conditionals
One conditional can also be nested within another. (It is the same theme of
composability, again!) We could have written
the previous example as follows:
if x < y:
    STATEMENTS_A
else:
    if x > y:
        STATEMENTS_B
    else:
        STATEMENTS_C
Although the indentation of the statements makes the structure apparent, nested
conditionals very quickly become difficult to read. In general, it is a good
idea to avoid
them when we can.
if x < 10:
x = int(input("Give a number: "))
while x != 0:
    if x < 0:
        print(-x)
    else:
        print(x)
    x = int(input("Give a number: "))
In this code, we continue to ask the user for a number, as long as the user does not
enter the number 0. For each such number, we check whether it is positive or negative,
and adapt our printing process to the situation. It is perfectly possible to nest the if
condition within the while loop.
Also the reverse type of nesting is possible, where we put a while loop within an if block.
Logical opposites
We have already seen how to combine Boolean expressions using and, or and not.
Combinations of such expressions can quickly become complex. It is important to then
reflect on whether it is possible to simplify such expressions.
Each of the six relational operators has a logical opposite: for example,
suppose we can get a driving licence when our age is greater than or equal to 17;
then we cannot get the driving licence when our age is less than 17.
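The driving licence example can be written both with not and with the opposite operator. The sketch below also checks the three pairs of opposites (== and !=, < and >=, > and <=) on a handful of small integers; the variable names are assumptions.

```python
age = 16   # hypothetical value

cannot_drive_with_not = not (age >= 17)
cannot_drive_opposite = age < 17      # < is the logical opposite of >=
print(cannot_drive_with_not, cannot_drive_opposite)   # True True

# Each operator paired with its logical opposite gives the same answer.
for a in range(-3, 4):
    for b in range(-3, 4):
        assert (not (a == b)) == (a != b)
        assert (not (a < b)) == (a >= b)
        assert (not (a > b)) == (a <= b)
```

Replacing a not in front of a comparison by the opposite operator usually makes the condition easier to read.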
Two powerful simplification laws (called de Morgan's laws) that are often
helpful when
dealing with complicated Boolean expressions are:
not (x and y) == (not x) or (not y)
not (x or y) == (not x) and (not y)
For example, suppose we can slay the dragon only if our magic
lightsabre sword is
charged to 90% or higher,
and we have 100 or more energy units in our protective
shield.
We find this fragment of Python code in the game:
else:
else:
We could also get rid of the not by swapping around the then and
else parts of the
conditional. So here is a third version, also equivalent:
if (sword_charge >= 0.90) and (shield_energy >= 100):
else:
This version is probably the best of the three, because it very closely matches
the initial
English statement. Clarity of our code (for other humans),
and making it easy to see that
the code does what was expected should always
be a high priority.
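The equivalence of the three versions can be checked mechanically. In this sketch the function names and the Boolean return values are assumptions; only the conditions on sword_charge and shield_energy are taken from the text.

```python
def can_slay_v1(sword_charge, shield_energy):
    # Version with not around the combined condition.
    if not ((sword_charge >= 0.90) and (shield_energy >= 100)):
        return False
    else:
        return True

def can_slay_v2(sword_charge, shield_energy):
    # After applying de Morgan's law and the logical opposites.
    if (sword_charge < 0.90) or (shield_energy < 100):
        return False
    else:
        return True

def can_slay_v3(sword_charge, shield_energy):
    # Branches swapped, so no negation is needed at all.
    if (sword_charge >= 0.90) and (shield_energy >= 100):
        return True
    else:
        return False

print(can_slay_v1(0.95, 120), can_slay_v2(0.95, 120), can_slay_v3(0.95, 120))
print(can_slay_v1(0.95, 50), can_slay_v2(0.95, 50), can_slay_v3(0.95, 50))
```

Testing a few values on both sides of each threshold is a quick way to gain confidence that a simplification did not change the behaviour.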
Tip
Once our program works, we should play around a bit trying to polish it
up.
Write good comments. Think about whether the code would be
clearer with
different variable names. Could we have done it more
elegantly? Should
we rather use a function? Can we simplify the
conditionals?
We think of our code as our creation, our work of art! We make it great.
Type conversion
We've had a first look at this earlier in this chapter. Seeing it again won't hurt!
Many Python types come with a built-in function that attempts to convert values
of another type into its own type. The int function, for example,
takes any value and converts it to an integer, if possible, or complains
otherwise:
>>> int("32")
32
>>> int("Hello")
Traceback (most recent call last):
ValueError: invalid literal for int() with base 10: 'Hello'
>>> int(-2.3)
-2
>>> int(3.99999)
3
>>> int("42")
42
>>> int(1.0)
1
The float function converts integers and strings to floating-point numbers:
>>> float(32)
32.0
>>> float("3.14159")
3.14159
>>> float(1)
1.0
It may seem odd that Python distinguishes the integer value 1 from the
floating-point
value 1.0. They may represent the same number, but they
belong to different types. The
reason is that they are represented differently
inside the computer.
>>> str(32)
'32'
>>> str(3.14149)
'3.14149'
>>> str(True)
'True'
>>> str(true)
Traceback (most recent call last):
NameError: name 'true' is not defined
str will work with any value and convert it into a string. As
mentioned earlier, True is a Boolean value; true is just an ordinary variable name,
and is not defined here, so we get an error.
Glossary
chained conditional
A conditional branch with more than two possible flows of
execution. In
Python chained conditionals are written with if
... elif ... else
statements.
composition
The ability to combine simple expressions and statements
into compound
statements and expressions in order to
represent complex computations
concisely.
concatenate
To join two strings end-to-end.
modulus operator
An operator, denoted with a percent sign (%), that works on
integers and yields the remainder when one number is
divided by
another.
References
[ThinkCS] How To Think Like a Computer Scientist --- Learning with Python 3
Iteration
Source: this section is heavily based on Chapter 7 of [ThinkCS].
Until now we have seen that the while statement can be used for repeatedly executing
lines of code. Repeated execution of a set of statements is called iteration. Because
iteration is so common, we are going to dive deeper in this topic in this chapter. Among
others, we are going to introduce the for statement --- another way to have your
program
do iteration, useful in slightly different circumstances.
runs_scored = 0
while runs_scored < 5:
    print(runs_scored)
    runs_scored = runs_scored + 1
In this code, we first initialize the variable runs_scored, and then repeatedly increment
this variable, until a stop condition is reached. We print the values 0, 1, 2, 3 and 4; then
we stop the repetition when runs_scored reaches the value 5. We exploit here the fact
that the assignment statement in the loop modifies the value of the variable runs_scored
in the memory of the computer.
Note that overall, we repeat the 3rd line in this program 5 times: 1 time when
runs_scored==0, one time when runs_scored==1, ..., and 1 time when runs_scored==4
-- 5 in total.
The while statement can always be used to repeat statements. However, it is not always
the shortest approach. Python offers an alternative solution: the for loop.
The for notation of Python allows us to write this more concisely as follows:
for runs_scored in [0, 1, 2, 3, 4]:
    print(runs_scored)
In this code, [0,1,2,3,4] is a list that specifies the values that are going to be assigned to
the variable runs_scored; the block of code in the for loop is executed for each value in
this list.
While in this example our list consists of numbers, this need not be the case. We can
also iterate over other types. For instance:
print(invitation)
Running through all the items in a list is called traversing the list,
or traversal.
The variable f in the for statement at line 1 is called the loop variable.
We could
have chosen any other variable name instead.
Lines 2 and 3 are the loop body. The loop body is always
indented. The
indentation determines exactly what statements are "in the body of the loop".
On each iteration or pass of the loop, we consider one of the elements in the list,
and execute the body of the loop for that value.
At the end of each execution of the body of the loop, Python returns
to the for
statement, to assign the next value to f.
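A loop with the structure described above (a for statement on line 1 with loop variable f, followed by a two-line body) might look like the sketch below. The names in the list and the invitation text are assumptions made for this example.

```python
for f in ["Ann", "Bob", "Chen"]:
    invitation = "Hi " + f + ". Please come to my party on Saturday!"
    print(invitation)
```

The body is executed three times, once for each string in the list, and f refers to a different name on each pass.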
As the program executes, the interpreter always keeps track of which statement is
about to be executed. We call this the control flow, or the flow of execution
of the program.
When humans execute programs, they often use their finger to point
to each statement
in turn. So we could think of control flow as "Python's moving finger".
running_total = 0
running_total = running_total + x
print(running_total)
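The fragment above accumulates a running total, but the loop header is not shown. A complete sketch, in which the list of numbers is an assumption:

```python
running_total = 0
for x in [5, 4, 7]:                    # hypothetical list of values
    running_total = running_total + x  # update: new value depends on the old one
print(running_total)                   # 16
```

The variable must be initialized to 0 before the loop, because each pass through the body reads its previous value.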
for runs_scored in [0, 1, 2, 3, 4]:
    print(runs_scored)
It is cumbersome to have to write all the numbers explicitly. Fortunately, Python has a
shorthand notation for this, which is the following:
for runs_scored in range(5):
    print(runs_scored)
This code will produce exactly the same output as the previous code. Simple, isn't it?
Actually, not really; most beginning programmers have a very hard time using this
notation correctly.
The issue is that we write range(5) with the value 5 to create a sequence that starts at 0 and
ends at 4. There is a good reason for this: the sequence specified by range(5) contains 5
elements. These 5 elements are 0, 1, 2, 3 and 4.
print(runs_scored + 1)
If you are confused by this, you are not alone; you are almost certain to make mistakes
with this. However, it is extremely important to understand this correctly for your
programs to work correctly.
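One way to check what range produces is to convert it to a list. The sketch below also shows one possible way to print the numbers 1 to 5, by adding 1 to the loop variable.

```python
print(list(range(5)))        # [0, 1, 2, 3, 4]

for runs_scored in range(5):
    print(runs_scored + 1)   # prints 1, 2, 3, 4, 5, one per line
```

After the loop, runs_scored still refers to 4, the last value of the sequence; the value 5 itself is never produced.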
Let's look at a sequence that has fascinated and foxed mathematicians for many years.
They still cannot answer even quite simple questions about this.
This Python code captures that algorithm, where we calculate the sequence for n = 19:
n = 19
while n != 1:
    print(n, end=", ")
    if n % 2 == 0:        # n is even
        n = n // 2
    else:                 # n is odd
        n = n * 3 + 1
print(n, end=".\n")
Notice first that the print function inside the loop has an extra argument end=", ". This
tells the print function to follow the printed string with whatever the programmer
chooses (in this case, a comma followed by a space), instead of ending the line. So
each time something is printed in the loop, it is printed on the same output line, with
the numbers separated by commas. The call to print(n, end=".\n") after the loop terminates
will then print the final value of n followed by a period and a newline character.
(The newline character \n is covered later.)
The condition for continuing with this loop is n != 1, so the loop will continue running until
it reaches its termination condition, (i.e. n == 1).
Each time through the loop, the program outputs the value of n and then
checks whether
it is even or odd. If it is even, the value of n is divided
by 2 using integer division. If it is
odd, the value is replaced by n * 3 + 1.
Here is the output of this program:
19, 58, 29, 88, 44, 22, 11, 34, 17, 52, 26, 13, 40, 20, 10, 5, 16, 8, 4, 2, 1.
Particular values aside, the interesting question was first posed by a German
mathematician called Lothar Collatz: the Collatz conjecture (also known as
the 3n + 1
conjecture), is that this sequence terminates for all positive
values of n. So far, no one
has been able to prove it or disprove it!
(A conjecture is a statement that might be true,
but nobody knows for sure.)
Think carefully about what would be needed for a proof or disproof of the conjecture
"All
positive integers will eventually converge to 1 using the Collatz rules".
With fast
computers we have been able to test every integer up to very
large values, and so far,
they have all eventually ended up at 1.
But who knows? Perhaps there is some as-yet
untested number which does not reduce to 1.
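Such a test is easy to sketch for small starting values. The function name collatz_steps and the upper limit of 1000 are assumptions made for this example; real searches go far beyond this range.

```python
def collatz_steps(n):
    """Count how many steps the Collatz rules need to bring n down to 1."""
    steps = 0
    while n != 1:
        if n % 2 == 0:     # n is even
            n = n // 2
        else:              # n is odd
            n = n * 3 + 1
        steps = steps + 1
    return steps

# Each call returns only if the sequence for that start value reaches 1.
for start in range(1, 1001):
    collatz_steps(start)

print(collatz_steps(19))   # 20
```

That the loop finishes for every start value up to 1000 is evidence, not a proof: a test like this can never cover all positive integers.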
You'll notice that if you don't stop when you reach 1, the sequence gets into
its own
cyclic loop: 1, 4, 2, 1, 4, 2, 1, 4 ... So one possibility is that there might
be other cycles
that we just haven't found yet.
Wikipedia has an informative article about the Collatz conjecture. The sequence
also
goes under other names (hailstone sequence, wondrous numbers, etc.),
and you'll
find out just how many integers have already been tested by computer,
and found to
converge!
So any problem like "iterate this weather model for 1000 cycles", or
"search this
list of words", "find all prime numbers up to 10000" suggest
that a for loop is best.
We call the first case definite iteration --- we know ahead of time some
definite bounds for
what is needed. The latter case is called indefinite
iteration --- we're not sure
how many iterations we'll need --- we cannot
even establish an upper bound!
Tracing a program
To write effective computer programs, and to build a good conceptual
model of program
execution, a programmer needs to develop the ability
to trace the execution of a
computer program. Tracing involves becoming the
computer and following the flow of
execution through a sample program run,
recording the state of all variables and any
output the program generates after
each instruction is executed.
To keep track of all this as you hand trace a program, make a column heading on
a
piece of paper for each variable created as the program runs and another one
for
output. Our trace so far would look something like this:
n     output printed so far
--    ---------------------
3     3,
10
The complete trace looks like this:
n     output printed so far
--    ---------------------
3     3,
10    3, 10,
5     3, 10, 5,
16    3, 10, 5, 16,
8     3, 10, 5, 16, 8,
4     3, 10, 5, 16, 8, 4,
2     3, 10, 5, 16, 8, 4, 2,
1     3, 10, 5, 16, 8, 4, 2, 1.
Tracing can be a bit tedious and error prone (that's why we get computers to do
this stuff
in the first place!), but it is an essential skill for a programmer
to have. From this trace
we can learn a lot about the way our code works. We
can observe that as soon as n
becomes a power of 2, for example, the program
will require log2(n) executions of the
loop body to complete. We can
also see that the final 1 will not be printed as output
within the body of the loop,
which is why we put the special print function at the end.
There are also some great visualization tools becoming available to help you
trace and
understand small fragments of Python code. The one we recommend is at
http://www.pythontutor.com/visualize.html.
Counting digits
Let us consider another example where a while loop is necessary instead of a for loop.
n = int(input("Give a number: "))
count = 0
while n != 0:
    count = count + 1
    n = n // 10
print(count)
If the user enters 710, the code will print 3. Trace the execution of this
code (perhaps using the single step function in Thonny, or the
Python visualizer, or on some paper) to convince yourself that it works.
Note that even though we have a counter here, we cannot use a for loop! In a for loop,
we need to specify in advance all the elements we are going to iterate over, but in this
case we don't know the end value of count before the loop is executed!
If we wanted to only count digits that are either 0 or 5, adding a conditional
before
incrementing the counter will do the trick:
n = int(input("Give a number: "))
count = 0
while n > 0:
    digit = n % 10
    if digit == 0 or digit == 5:
        count = count + 1
    n = n // 10
print(count)
Confirm that when we enter 1055030250, the result 7 is printed. What happens when the
user enters 0?
Abbreviated assignment
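Incrementing a variable is so common that Python provides an abbreviated syntax
for it:
>>> count = 0
>>> count += 1
>>> count
1
>>> count += 1
>>> count
2
count += 1 is an abbreviation for count = count + 1. The increment value does not have to be 1:
>>> n = 2
>>> n += 5
>>> n
7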
There are similar abbreviations for -=, *=, /=, //= and %=:
>>> n = 2
>>> n *= 5
>>> n
10
>>> n -= 4
>>> n
6
>>> n //= 2
>>> n
3
>>> n %= 2
>>> n
1
n = int(input("Give a number: "))
count = 0
while n > 0:
    digit = n % 10
    if digit == 0 or digit == 5:
        count += 1
    n //= 10
print(count)
The break statement is used to immediately leave the body of its loop. The next
statement to be executed is the first one after the body:
print(i)
print("done")
This prints:
12
16
done
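A self-contained sketch that produces the output above. The list of values and the test for odd numbers are assumptions, since the loop header is not shown.

```python
for i in [12, 16, 17, 24, 29]:   # hypothetical list of values
    if i % 2 == 1:               # if the number is odd ...
        break                    # ... immediately leave the loop
    print(i)
print("done")
```

The values 24 and 29 are never examined: once break is executed for 17, the rest of the list is skipped.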
for and while loops do their tests at the start, before executing
any part of
the body. They're called pre-test loops, because the test
happens before
(pre) the body.
break and return are our tools for adapting this standard
behaviour.
A typical example is a problem where the user has to input numbers to be summed.
To
indicate that there are no more inputs, the user enters a special value, often
the value
-1, or the empty string. This needs a middle-exit loop pattern:
input the next number,
then test whether to exit, or else process the number:
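total = 0
while True:
    response = input("Enter the next number. (Leave blank to end) ")
    if response == "":
        break
    total += int(response)
print("The total of the numbers you entered is", total)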
Convince yourself that this fits the middle-exit loop flowchart: line 3
does some useful
work, lines 4 and 5 can exit the loop, and if they don't
line 6 does more useful work
before the next iteration starts.
The while bool-expr: uses the Boolean expression to determine whether to iterate again.
True is a trivial Boolean expression, so while True: means always do
the loop body again.
This is a language idiom --- a convention that
most programmers will recognize
immediately. Since the expression on line 2
will never terminate the loop, (it is a dummy
test) the programmer must arrange to
break (or return) out of the loop body elsewhere,
in some other way (i.e. in lines 4 and 5 in
this sample). A clever compiler or interpreter will understand that line 2 is a
fake test that must always succeed, so it won't even generate a test, and our flowchart
never even puts the diamond-shaped dummy test box at the top of the loop!
Similarly, by just moving the if condition: break to the end of the loop body we
create a
pattern for a post-test loop. Post-test loops are used when you want to
be sure that the
loop body always executes at least once (because the first test
only happens at the end
of the execution of the first loop body).
This is useful, for example, if we want to play an
interactive game against
the user --- we always want to play at least one game:
while True:
    play_the_game_once()
    response = input("Play again? (yes or no) ")
    if response != "yes":
        break
print("Goodbye!")
Hint: Think about where you want the exit test to happen
Once you've recognized that you need a loop to repeat something, think
about its terminating condition --- when will I want to stop iterating?
Then
figure out whether you need to do the test before starting
the first (and
every other) iteration, or at the end of
the first (and every other) iteration,
or perhaps in
the middle of each iteration. Interactive programs that
require input
from the user or read from files often need to exit their loops
in the
middle or at the end of an iteration, when it becomes clear that
there is
no more data to process, or the user doesn't want to play our
game anymore.
An example
The following program implements a simple guessing game:
guesses = 0
msg = ""
while True:
guesses += 1
else:
break
This program makes use of the mathematical law of trichotomy (given real
numbers a
and b, exactly one of these three must be true: a > b, a < b, or a == b).
Also notice the use of the msg variable, initially an empty string, on lines 6, 12 and 14.
Each time through the loop we extend the message being displayed: this allows us to
display the program's feedback right at the same place as we're asking for the next
guess.
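The structure of such a game can be sketched in a form that runs without a keyboard. Here a list of scripted guesses stands in for the calls to input, and the function name play, its parameters and the message texts are all assumptions. The scripted list must contain the secret number, otherwise pop raises an IndexError.

```python
def play(number, scripted_guesses):
    remaining = list(scripted_guesses)
    guesses = 0
    msg = ""
    while True:
        guess = remaining.pop(0)   # stands in for int(input(msg + "Guess: "))
        guesses += 1
        if guess > number:
            msg += str(guess) + " is too high.\n"
        elif guess < number:
            msg += str(guess) + " is too low.\n"
        else:                      # by trichotomy, guess == number
            break
    return guesses, msg

count, feedback = play(42, [50, 25, 42])
print(count)
print(feedback)
```

The final else needs no condition: if the guess is neither too high nor too low, it must be equal to the number.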
print(i)
print("done")
This prints:
12
16
24
30
done
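The output above is consistent with a loop that uses the continue statement to skip odd values. A sketch, with an assumed list of values:

```python
for i in [12, 16, 17, 24, 29, 30]:   # hypothetical list of values
    if i % 2 == 1:                   # if the number is odd ...
        continue                     # ... skip the rest of this iteration
    print(i)
print("done")
```

Unlike break, continue does not leave the loop: the odd values 17 and 29 are skipped, and the loop carries on with the next element.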
Repeat this calculation a few times using your calculator. Can you
see why each
iteration brings your estimate a little closer? One of the amazing
properties of this
particular algorithm is how quickly it converges to an accurate
answer --- a great
advantage for doing it manually.
By using a loop and repeating this formula until the better approximation gets close
enough to the previous one, we can write a function for computing the square root.
(In
fact, this is how your calculator finds square roots --- it may have a slightly
different
formula and method, but it is also based on repeatedly improving its
guesses.)
Ideally, we'd like the old and new guess to be exactly equal to each other when we stop.
But exact equality is a tricky notion in computer arithmetic when real numbers are
involved.
Because real numbers are not represented absolutely accurately (after all, a
number like pi or the
square root of two has an infinite number of decimal places
because it is irrational), we
need to formulate the stopping test for the loop by asking "is
a close enough to b"?
This stopping condition can be coded like this:
break
Notice that we take the absolute value of the difference between a and b!
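A small sketch of why a closeness test is needed. The tolerance 0.001 is an assumption; a smaller value gives a stricter test.

```python
a = 0.1 + 0.2
b = 0.3

print(a == b)               # False: the two floats differ in their last bits
print(abs(a - b) < 0.001)   # True: they are close enough

close_enough = abs(a - b) < 0.001
```

Without abs, the test a - b < 0.001 would also succeed whenever a is much smaller than b, which is not what "close" means.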
while True:
return better
approx = better
5.00000000002
See if you can improve the approximations by changing the stopping condition. Also,
step through the algorithm (perhaps by hand, using your calculator) to see how many
iterations were needed before it achieved this level of accuracy for sqrt(25).
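A minimal sketch of such a square root function, assuming the standard Newton update better = (approx + n / approx) / 2, a first guess of n / 2 and a tolerance of 0.001. The function name newton_sqrt is hypothetical, and the sketch only handles positive n.

```python
def newton_sqrt(n):
    approx = n / 2.0                 # start with a rough first guess
    while True:
        better = (approx + n / approx) / 2.0
        if abs(approx - better) < 0.001:
            return better            # old and new guess are close enough
        approx = better

print(newton_sqrt(25.0))
print(newton_sqrt(49.0))
```

Making the tolerance smaller costs an extra iteration or two and yields a result closer to the exact square root.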
Algorithms
Newton's method is an example of an algorithm: it is a mechanical process
for solving a
category of problems (in this case, computing square roots).
Some kinds of knowledge are not algorithmic. For example, learning dates
from history
or your multiplication tables involves memorization of specific
solutions.
But the techniques you learned for addition with carrying, subtraction
with borrowing,
and long division are all algorithms. Or if you are an avid Sudoku
puzzle solver, you
might have some specific set of steps that you always follow.
One of the characteristics of algorithms is that they do not require any intelligence to
carry out. They are mechanical processes in which each step follows from the
last
according to a simple set of rules. And they're designed to solve a
general class or
category of problems, not just a single problem.
Glossary
algorithm
A step-by-step process for solving a category of problems.
body
The statements inside a loop.
breakpoint
A place in your program code where program execution will
pause (or break),
allowing you to inspect the state of the
program's variables, or single-step
through individual
statements, executing them one at a time.
bump
Programmer slang. Synonym for increment.
continue statement
A statement that causes the remainder of the current iteration
of a loop to be skipped.
The flow of execution goes back to
the top of the loop, evaluates the condition,
and if this is true
the next iteration of the loop will begin.
cursor
An invisible marker that keeps track of where the next
character will
be printed.
decrement
Decrease by 1.
development plan
A process for developing a program. In this chapter, we
demonstrated a
style of development based on developing
code to do simple, specific
things and then encapsulating
and generalizing.
encapsulate
To divide a large complex program into components (like
functions) and
isolate the components from each other (by
using local variables, for
example).
escape sequence
An escape character, \, followed by one or more printable
characters
used to designate a nonprintable character.
generalize
To replace something unnecessarily specific (like a constant
value)
with something appropriately general (like a variable
or parameter).
Generalization makes code more versatile,
more likely to be reused, and
sometimes even easier to write.
initialization (of a variable)
To initialize a variable is to give it an initial value.
Since in
Python variables don't exist
until they are assigned values,
they are initialized when they are
created. In other
programming languages this is not the case, and
variables
can be created without being initialized, in which case they
have either default or garbage values.
loop body
Any number of statements nested inside a loop. The nesting
is indicated
by the fact that the statements are indented
under the for loop statement.
loop variable
A variable used as part of a for loop. It is assigned a different
value on
each iteration of the loop.
middle-test loop
A loop that executes some of the body, then tests for the exit
condition,
and then may execute some more of the body. We
don't have a special
Python construct for this case, but can
use while and break together.
nested loop
A loop inside the body of another loop.
newline
A special character that causes the cursor to move to the
beginning of
the next line.
post-test loop
A loop that executes the body, then tests for the exit
condition. We don't have a special
Python construct for this,
but can use while and break together.
pre-test loop
A loop that tests before deciding whether to execute its
body. for and while
are both pre-test loops.
range
A built-in function in Python for generating sequences of
integers. It is especially
useful when we need to write a for
loop that executes a fixed number of times.
single-step
A mode of interpreter execution where you are able to
execute your
program one step at a time, and inspect the
consequences of that step.
Useful for debugging and building
your internal mental model of what is
going on.
tab
A special character that causes the cursor to move to the
next tab stop
on the current line.
trichotomy
Given any real numbers a and b, exactly one of the following
relations holds: a < b, a > b, or a == b. Thus when you can
establish that two of the relations are false, you can assume
the
remaining one is true.
trace
To follow the flow of execution of a program by hand,
recording the
change of state of the variables and any output
produced.
terminating condition
A condition that occurs which causes a loop to stop repeating
its body.
In the for loops we saw in this chapter, the
terminating condition
has been when there are no more
elements to assign to the loop variable.
References
[ThinkCS] How To Think Like a Computer Scientist --- Learning with Python 3
There are many modules in Python that provide very powerful features that we
can use
in our own programs. Some of these can send email, or fetch web pages.
The one we'll
look at in this chapter allows us to create turtles and get them
to draw shapes and
patterns.
The turtles are fun, but the real purpose of the chapter is to teach ourselves
a little more
Python, and to develop our theme of computational thinking,
or thinking like a computer
scientist. Most of the Python covered here
will be explored in more depth later.
We then create and open what it calls a screen (we would prefer to call it
a window),
which we assign to variable wn. Every window contains
a canvas, which is the area
inside the window on which we can draw.
In line 3 we create a turtle. The variable alex is made to refer to this turtle.
So these first three lines have set things up, and we're ready to get our turtle to draw on our
canvas.
An object can have various methods --- things it can do --- and it can also have
attributes (sometimes called properties). For example, each turtle has
a color
attribute. The method invocation
alex.color("red") will make alex red, and drawing will be
red too.
(Note the word color is spelled the American way!)
The color of the turtle, the width of its pen, the position of the
turtle within the window,
which way it is facing, and so on are all part of its
current state. Similarly, the window
object has a background color, and
some text in the title bar, and a size and position on
the screen. These are all
part of the state of the window object.
Quite a number of methods exist that allow us to modify the turtle and the
window
objects. We'll just show a couple. In this program we've only commented those
lines that
are different from the previous example (and we've used a different
variable name for
this turtle):
import turtle
wn = turtle.Screen()
tess = turtle.Turtle()
tess.forward(50)
tess.left(120)
tess.forward(50)
wn.mainloop()
When we run this program, a new window pops up, and will remain on the
screen until
we close it.
Extend this program ...
wn.bgcolor("lightgreen")
tess.color("hotpink")
tess.pensize(5)
tess.left(120)
tess.forward(80)
tess.left(120)
tess.forward(80)
alex.left(90)
alex.forward(50)
alex.left(90)
alex.forward(50)
alex.left(90)
alex.forward(50)
alex.left(90)
wn.mainloop()
Here is what happens when alex completes his rectangle, and tess completes her
triangle:
There are 360 degrees in a full circle. If we add up all the turns that a turtle
makes,
no matter what steps occurred between the turns, we can easily figure
out if they
add up to some multiple of 360. This should convince us that alex is
facing in
exactly the same direction as he was when he was first created.
(Geometry
conventions have 0 degrees facing East, and that is the case here
too!)
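The bookkeeping described here can be checked in a few lines of plain Python. This is only an illustrative sketch: the function faces_original_direction and the list alex_turns are names made up for this example, and it assumes every turn is a left turn measured in degrees.

```python
def faces_original_direction(turns):
    """Return True if the left turns (in degrees) add up to a multiple of 360."""
    return sum(turns) % 360 == 0

# alex turns left by 90 degrees four times while drawing his square.
alex_turns = [90, 90, 90, 90]
print(faces_original_direction(alex_turns))       # True

# Leaving out the last turn leaves alex facing a different way.
print(faces_original_direction(alex_turns[:3]))   # False
```

The same check works for tess: three left turns of 120 degrees also add up to 360.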
We could have left out the last turn for alex, but that would not have been
as
satisfying. If we're asked to draw a closed shape like a
square or a rectangle, it is
a good idea to
complete all the turns and to leave the turtle back where it started,
facing the
same direction as it started in.
This makes reasoning about the
program and composing chunks of code into bigger programs
easier for us
humans!
We did the same with tess: she drew her triangle, and turned through a full 360
degrees.
Then we turned her around and moved her aside. Even the blank line
18
is a hint about how the programmer's mental chunking is working:
in big terms,
tess' movements were chunked as "draw the triangle"
(lines 12-17) and then
"move away from the origin" (lines 19 and 20).
One of the key uses for comments is to record our mental chunking, and big
ideas.
They're not always explicit in the code.
And, uh-huh, two turtles may not be enough for a herd. But the important idea is
that the
turtle module gives us a kind of factory that lets us create as many turtles
as we
need. Each instance has its own state and behaviour.
for i in [0,1,2,3]:
    alex.forward(50)
    alex.left(90)
Some observations:
While "saving some lines of code" might be convenient, it is not the big deal here.
What is much more important is that we've found a "repeating pattern" of
statements,
and reorganized our program to repeat the pattern. Finding the
chunks and somehow
getting our programs arranged around those chunks is a
vital
skill in computational thinking.
The values [0,1,2,3] were provided to make the loop body execute 4 times.
We
could
have used any four values, but these are the conventional ones to use. We
could have also used range:
for i in range(4):
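To see that the two forms are interchangeable, we can look at the values that range produces. This small sketch needs no turtle; the list visited is a name introduced only for this illustration.

```python
# range(4) produces the same four values as the list [0, 1, 2, 3].
print(list(range(4)))            # [0, 1, 2, 3]

visited = []
for i in range(4):
    visited.append(i)            # record each value the loop variable takes
print(visited == [0, 1, 2, 3])   # True
```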
Our little trick earlier to make sure that alex did the final turn to complete
360
degrees has paid off: if we had not done that, then we would not have been
able
to use a loop for the fourth side of the square.
It would have become a "special
case",
different from the other sides. When possible, we'd much prefer to make
our code fit a general pattern, rather than have to create a special case.
for i in range(4):
    alex.forward(50)
    alex.left(90)
By now you should be able to see how to change our previous program so that
tess can
also use a for loop to draw her equilateral triangle.
for c in ["yellow", "red", "purple", "blue"]:   # colors here are illustrative
    alex.color(c)
    alex.forward(50)
    alex.left(90)
A variable can also be assigned a value that is a list. So lists can also be used in
more
general situations, not only in the for loop. The code above could be rewritten like this:
# Assign a list to a variable (colors here are illustrative)
clrs = ["yellow", "red", "purple", "blue"]
for c in clrs:
    alex.color(c)
    alex.forward(50)
    alex.left(90)
This suggests that we don't need both a left and a right turn method --- we could be
minimalists, and just have one method. There is also a backward
method. (If you are
very nerdy, you might enjoy saying alex.backward(-100) to
move alex forward!)
Part of thinking like a scientist is to understand more of the structure and rich
relationships in our field. So revising a few basic facts about
geometry and number lines,
and spotting the relationships between left, right,
backward, forward, negative and
positive distances or angles values is a good start
if we're going to play with turtles.
A turtle's pen can be picked up or put down. This allows us to move a turtle
to a different
place without drawing a line. The methods are
alex.penup()
alex.pendown()
Every turtle can have its own shape. The ones available "out of the box"
are arrow,
blank, circle, classic, square, triangle, turtle.
alex.shape("turtle")
We can speed up or slow down the turtle's animation speed. (Animation controls how
quickly the turtle turns and moves forward). Speed settings range
from 1
(slowest) to 10 (fastest). But if we set the speed to 0, it has
a special meaning --- turn off
animation and go as fast as possible.
alex.speed(10)
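import turtle
wn = turtle.Screen()
wn.bgcolor("lightgreen")
tess = turtle.Turtle()
tess.shape("turtle")
tess.color("blue")
tess.penup()                # assumed: pen is up, so only the stamps show
size = 20
for i in range(30):
    tess.stamp()            # leave an impression on the canvas
    size = size + 3         # assumed step: move a little further each time
    tess.forward(size)
    tess.right(24)          # assumed turn angle
wn.mainloop()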
Be careful now! How many times was the body of the loop executed? How many turtle
images do we see on the screen? All except one of the shapes we see on the screen
here
are footprints created by stamp. But the program still only has one turtle
instance ---
can you figure out which one here is the real tess? (Hint: if you're not
sure, write a new
line of code after the for loop to change tess' color,
or to put her pen down and draw a
line, or to change her shape, etc.)
Glossary
attribute
Some state or value that belongs to a particular object. For
example, tess has
a color.
canvas
A surface within a window where drawing takes place.
instance
An object of a certain type, or class. tess and alex are different
instances of
the class Turtle.
method
A function that is attached to an object. Invoking or activating
the method
causes the object to respond in some way, e.g.
forward is the method
when we say tess.forward(100).
invoke
An object has methods. We use the verb invoke to mean
activate the
method. Invoking a method is done by putting
parentheses after the method
name, with some possible
arguments. So tess.forward() is an invocation
of the forward
method.
module
A file containing Python definitions and statements intended
for use in other
Python programs. The contents of a module
are made available to the other
program by using the import
statement.
object
A "thing" to which a variable can refer. This could be a screen
window,
or one of the turtles we have created.
References
[ThinkCS] How To Think Like a Computer Scientist --- Learning with Python 3
Functions
Source: this section is heavily based on Chapter 4 of [ThinkCS].
Motivation
In an earlier chapter, we saw the following code:
n = int(input("Please enter a number: "))
count = 0
while n > 0:
    count = count + 1
    n = n // 10
print(count)
This code allowed the user to type in a number, performed a calculation on this number,
and then printed the outcome of this calculation.
In this specific case, we were counting the number of digits in the decimal
representation of an integer.
While this code is readable and has the desired functionality, most programmers would not
consider it good code.
As a programmer you will often have to write programs that provide more, and more
complex, functionality than this. For instance, suppose we are implementing a calculator
that should offer operations such as addition,
subtraction, multiplication, division, sine, cosine...; such a program would become
very long very quickly. We need some approach to structure code if many different
functionalities have to be implemented.
The core building block in Python for organising the functionality of your code is the
function: we divide our code into functions.
In Python, a function is a named sequence of
statements
that belong together. Its primary purpose is to help us
organize programs
into chunks that match how we think about
the problem.
For our earlier example, this is one way to write this code in an equivalent manner using
a function:
def ndigits(n):
    count = 0
    while n > 0:
        count = count + 1
        n = n // 10
    return count

number = int(input("Please enter a number: "))
print(ndigits(number))
What we have done here is give a name, ndigits, to our calculation. We
have specified that the code of this calculation operates on a variable n. At the end of
the calculation, the return statement indicates that the outcome of the calculation is what is stored in the
variable count.
Subsequently, in the unnamed part of our code we ask the user of our program to type
in a number; for this number, we execute the ndigits function, and we print the outcome
of the calculation. Note that when we execute this program, it will start by asking the
user to give a number; the code within our function ndigits is not (yet) executed: this will
only happen after the user has typed in a number, in the last line of the program.
The structure of this code is better. We have separated the user interaction from the
calculation; it is now clear which chunk of the code calculates the number of digits.
Now it is easier to extend the program, for instance, consider this program:
def ndigits(n):
    count = 0
    while n > 0:
        count = count + 1
        n = n // 10
    return count

def sumdigits(n):
    digitsum = 0
    while n > 0:
        digitsum = digitsum + n % 10
        n = n // 10
    return digitsum

number = int(input("Please enter a number: "))
choice = int(input("Enter 1 or 2 to choose a calculation: "))
if choice == 1:
    print(ndigits(number))
if choice == 2:
    print(sumdigits(number))
Here we have added a second functionality to our program. Study for yourself what this
functionality does!
Moreover, it is easier to reuse code now. For instance, consider the following code:
def ndigits(n):
    count = 0
    while n > 0:
        count = count + 1
        n = n // 10
    return count

for i in [344,23,3493]:
    print(ndigits(i))
Our function ndigits has not changed; we only changed how it is used. In this particular
case, we print the number of digits for 3 integers.
It is sometimes said that programmers are lazy; this is true. Most programmers want to
write the same code only once. Functions allow you to do so and be lazy.
The core insight we had here is that the calculation done in ndigits could be used in a
number of different ways; it is useful to put this code together and give it a name, such
that we can reuse it.
Syntax
The general syntax for a function definition is:
def NAME( PARAMETERS ):
    STATEMENTS
We can make up any names we want for the functions we create, except that
we can't
use a name that is a Python keyword, and the names must follow the rules
for legal
identifiers.
There can be any number of statements inside the function, but they have to be
indented from the def. In the examples in this book, we will use the
standard indentation
of four spaces. Function definitions are the second of
several compound statements
we will see, all of which have the same
pattern:
1. A header line which begins with a keyword and ends with a colon.
2. A body consisting of one or more Python statements, each
indented the same
amount --- the Python style guide recommends 4 spaces --- from
the header line.
We've already seen the for loop which follows this pattern.
So looking again at the function definition, the keyword in the header is def, which is
followed by the name of the function and some parameters enclosed in
parentheses.
The parameter list may be empty, or it may contain any number of
parameters
separated from one another by commas. In either case, the parentheses are required.
The parameters specify what information, if any, we have to provide in order to use the
new function.
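As a small illustration of these rules, here are three function definitions with zero, one, and two parameters. The names greet, square, and area are invented for this sketch and do not appear elsewhere in the chapter.

```python
def greet():
    """No parameters: the parentheses are still required."""
    return "Hello!"

def square(x):
    """One parameter."""
    return x * x

def area(width, height):
    """Two parameters, separated by a comma."""
    return width * height

print(greet())       # Hello!
print(square(4))     # 16
print(area(3, 5))    # 15
```

Each definition follows the same pattern: a header line starting with def and ending with a colon, followed by an indented body.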
A Turtle Example
Suppose we're working with turtles, and a common operation we need is to draw
squares. "Draw a square" is an abstraction, or a mental
chunk, of a number of smaller
steps. So let's write a function to capture the pattern
of this "building block":
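import turtle
def draw_square(t, sz):
    """Make turtle t draw a square with sides of length sz."""
    for i in range(4):
        t.forward(sz)
        t.left(90)

wn = turtle.Screen()
wn.bgcolor("lightgreen")
alex = turtle.Turtle()
draw_square(alex, 50)
wn.mainloop()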
Docstrings are the key way to document our functions in Python, and
the documentation part is important: whoever calls our function
shouldn't need to know what is going on in the function or how it
works. They just need to know what arguments our function takes, what it
does, and what the expected result is.
That is enough to be able to use the function without having to look
underneath. This goes back to the concept of abstraction, which
we'll talk more about later.
Defining a new function does not make the function run. To do that we need a
function
call. We've already seen how to call some built-in functions like
print, range and int.
Function calls contain the name of the function being
executed followed by a list of
values, called arguments, which are assigned
to the parameters in the function
definition. So in the second last line of
the program, we call the function, and pass alex
as the turtle to be manipulated,
and 50 as the size of the square we want. While the
function is executing, then, the
variable sz refers to the value 50, and the variable t refers
to the same
turtle instance that the variable alex refers to.
Once we've defined a function, we can call it as often as we like, and its
statements will
be executed each time we call it. And we could use it to get
any of our turtles to draw a
square. In the next example, we've changed the draw_square
function a little, and we get
tess to draw 15 squares, with some variations.
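import turtle

def draw_multicolor_square(t, sz):
    """Make turtle t draw a multi-color square of size sz."""
    for i in ["red", "purple", "hotpink", "blue"]:   # assumed colors
        t.color(i)
        t.forward(sz)
        t.left(90)

wn = turtle.Screen()
wn.bgcolor("lightgreen")
tess = turtle.Turtle()
tess.pensize(3)
size = 20                   # assumed size of the smallest square
for i in range(15):
    draw_multicolor_square(tess, size)
    size = size + 10        # assumed variation: grow the square each time
wn.mainloop()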
So we eventually come up with this rather nice code that can draw a rectangle.
def draw_rectangle(t, w, h):
    """Get turtle t to draw a rectangle of width w and height h."""
    for i in range(2):
        t.forward(w)
        t.left(90)
        t.forward(h)
        t.left(90)
The parameter names are deliberately chosen as single letters to ensure they're not
misunderstood.
In real programs, once we've had more experience, we will insist on
better variable names than this.
But the point is that the program doesn't "understand"
that we're drawing a rectangle, or that the
parameters represent the width and the
height. Concepts like rectangle, width, and height are
the meaning we humans have, not
concepts that the program or the computer understands.
So far, it may not be clear why it is worth the trouble to create all of these
new functions.
Actually, there are a lot of reasons, but this example
demonstrates two:
Flow of execution
In order to ensure that a function is defined before its first use, we have to
know the
order in which statements are executed, which is called the flow of
execution. We've
already talked about this a little in an earlier chapter.
Execution always begins at the first statement of the program. Statements are
executed
one at a time, in order from top to bottom.
Function definitions do not alter the flow of execution of the program, but
remember that
statements inside the function are not executed until the
function is called. Although it is
not common, we can define one function
inside another. In this case, the inner definition
isn't executed until the
outer function is called.
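A short sketch can make the point about nested definitions concrete. The names outer, inner, and events are hypothetical and serve only this illustration; the list records the order in which things happen.

```python
events = []

def outer():
    events.append("outer starts")

    def inner():                      # this def only runs when outer is called
        events.append("inner runs")

    inner()
    events.append("outer ends")

events.append("before calling outer")
outer()
print(events)
```

Nothing inside outer, including the definition of inner, takes effect until the call outer() is reached.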
Function calls are like a detour in the flow of execution. Instead of going to
the next
statement, the flow jumps to the first line of the called function,
executes all the
statements there, and then comes back to pick up where it left
off.
That sounds simple enough, until we remember that one function can call
another. While
in the middle of one function, the program might have to execute
the statements in
another function. But while executing that new function, the
program might have to
execute yet another function!
What's the moral of this sordid tale? When we read a program, don't read from
top to
bottom. Instead, follow the flow of execution.
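To practise following the flow of execution, here is a small sketch in which one function calls another, which calls a third. The function names and the list trace are made up for this example; each function records when it is entered and when it resumes.

```python
trace = []

def third():
    trace.append("third")

def second():
    trace.append("second: before third")
    third()
    trace.append("second: after third")

def first():
    trace.append("first: before second")
    second()
    trace.append("first: after second")

first()
for step in trace:
    print(step)
```

Although third is defined at the top of the file, its body is the third thing to run, and each caller picks up exactly where it left off once the function it called is finished.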
Let us go back and see how this works with the program above that
draws 15
multicolor squares. First, we're going to add one line of magic
below
the import statement --- not strictly necessary, but it will make our
lives
much simpler, because it prevents stepping into the module
containing
the turtle code.
import turtle
__import__("turtle").__traceable__ = False
Now we're ready to begin. Put the mouse cursor on the line number of the
line
where we create the turtle screen, and double click. A red circle will
appear; this indicates we wish to stop the execution of the program at this
line. Subsequently, start debugging the program by pressing the "debug"
button. This will run the Python
program up to, but not including, the line
where we have put the red circle. Our program
will "break" now, and
provide a highlight on the next line to be executed, something like this:
At this point we can press the F7 key (step into) repeatedly to single step
through
the code. Observe as we execute lines 10, 11, 12, ... how the
turtle window gets
created, how its canvas color is changed, how the title
gets changed, how the turtle is created on the canvas, and then how the
flow of execution gets into the loop, and from there into the function,
and
into the function's loop, and then repeatedly through the body of that loop.
While we do this, we also see the values of the variables, and can confirm
that
their values match our conceptual model of what is happening.
After a few loops, when we're about to execute line 20 and we're starting
to get bored, we can use the key F6
to "step over" the function we are
calling. This executes all the statements in the function, but without
having to step through each one. We always have the choice to either
"go
for the detail", or to "take the high-level view" and execute the function as
a single chunk.
There are some other options, including one that allows us to resume
execution without further stepping.
>>> abs(5)
5
>>> abs(-5)
5
In this example, the arguments to the abs function are 5 and -5.
Some functions take more than one argument. For example the built-in function
pow
takes two arguments, the base and the exponent. Inside the function,
the values that are
passed get assigned to variables called parameters.
>>> pow(2, 3)
8
>>> pow(7, 4)
2401
Another built-in function that takes more than one argument is max.
>>> max(7, 11)
11
>>> max(4, 1, 17, 2, 12)
17
>>> max(3 * 11, 5**3, 512 - 9, 1024**0)
503
max can be passed any number of arguments, separated by commas, and will
return the
largest value passed. The arguments can be either simple values or
expressions. In the
last example, 503 is returned, since it is larger than 33,
125, and 1.
So an important difference between these functions and one like draw_square is that
draw_square was not executed because we wanted it to compute a value --- on the
contrary,
we wrote draw_square because we wanted it to execute a sequence of steps
that caused
the turtle to draw.
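The difference can be seen without any turtles. In this sketch (say_twice and double are names invented here), the first function only carries out steps, while the second computes a value and returns it. A function with no return statement gives back the special value None.

```python
def say_twice(msg):
    """Carries out steps (printing) but has no return statement."""
    print(msg)
    print(msg)

def double(x):
    """Computes a value and hands it back to the caller."""
    return 2 * x

result_a = say_twice("hi")   # prints hi twice
result_b = double(21)

print(result_a)   # None
print(result_b)   # 42
```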
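def final_amt(p, r, n, t):
    """
    Apply the compound interest formula to p to produce the final amount.
    """
    a = p * (1 + r/n) ** (n*t)
    return a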
This is a bit messy with all these decimal places, but remember that
Python
doesn't understand that we're working with money: it just does the calculation to
the best of its ability, without rounding. Later we'll see how to format the string
that
is printed in such a way that it does get nicely rounded to two decimal places
before printing.
Notice something else very important here. The name of the variable we pass as an
argument --- toInvest --- has nothing to do with the name of the parameter
--- p. It is as if
p = toInvest is executed when final_amt is called.
It doesn't matter what the value was
named in
the caller, in final_amt its name is p.
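A minimal sketch of this point follows. The amount, rate, compounding frequency, and number of years are illustrative values chosen for this example, and savings is a second, made-up variable name.

```python
def final_amt(p, r, n, t):
    """Apply the compound interest formula to p."""
    a = p * (1 + r/n) ** (n*t)
    return a

toInvest = 10000.0                       # illustrative amount
fnl = final_amt(toInvest, 0.08, 12, 5)   # inside final_amt, this value is called p

savings = 10000.0                        # a different name in the caller
same = final_amt(savings, 0.08, 12, 5)

print(fnl == same)                       # True: only the value matters
```

Whatever the variable is called in the caller, the function only ever sees the value, under the name p.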
These short variable names are getting quite tricky, so perhaps we'd prefer one of these
versions instead:
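def final_amt_v2(principalAmount, nominalPercentageRate,
                 numTimesPerYear, years):
    a = principalAmount * (1 + nominalPercentageRate /
                           numTimesPerYear) ** (numTimesPerYear*years)
    return a

def final_amt_v3(amt, rate, compounded, years):
    a = amt * (1 + rate/compounded) ** (compounded*years)
    return a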
They all do the same thing. Use your judgement to write code that can be best
understood by other humans!
Short variable names are more economical and
sometimes make
code easier to read:
E = mc² would not be nearly so memorable if
Einstein had
used longer variable names! If you do prefer short names,
make sure you
also have some comments to enlighten the reader
about what the variables are used for.
a = p * (1 + r/n) ** (n*t)
return a
>>> a
Turtles Revisited
Now that we have fruitful functions, we can focus our attention on
reorganizing our code
so that it fits more nicely into our mental chunks.
This process of rearrangement is called
refactoring the code.
Two things we're always going to want to do when working with turtles
is to create the
window for the turtle, and to create one or more turtles.
We could write some functions
to make these tasks easier in future:
def make_window(colr, ttle):
    """
    Set up the window with the given background color and title.
    """
    w = turtle.Screen()
    w.bgcolor(colr)
    w.title(ttle)
    return w

def make_turtle(colr, sz):
    """
    Set up a turtle with the given color and pensize.
    """
    t = turtle.Turtle()
    t.color(colr)
    t.pensize(sz)
    return t

tess = make_turtle("hotpink", 5)
alex = make_turtle("black", 1)
dave = make_turtle("yellow", 2)
The trick about refactoring code is to anticipate which things we are likely to want to
change
each time we call the function: these should become the parameters, or
changeable parts,
of the functions we write.
We can get a turtle to display text on the canvas at the turtle's current position.
The method to do that is
alex.write("Hello").
We can fill a shape (circle, semicircle, triangle, etc.) with a color. It is a two-step
process.
First we call the method alex.begin_fill(), then we draw the shape, then
we call alex.end_fill().
We've previously set the color of our turtle --- we can now also set its fill color,
which need not
be the same as the turtle and the pen color. We use
alex.color("blue","red") to set the turtle
to draw in blue, and fill in red.
Ok, so can we get tess to draw a bar chart? Let us start with some data to be charted:
xs = [48,117,200,240,160,260,220]
Corresponding to each data measurement, we'll draw a simple rectangle of that height,
with a fixed width.
t.left(90)
t.right(90)
t.right(90)
...
draw_bar(tess, v)
Ok, not fantastically impressive, but it is a nice start! The important thing here
was the
mental chunking, or how we broke the problem into smaller pieces. Our chunk
is to draw
one bar, and we wrote a function to do that. Then, for the whole
chart, we repeatedly
called our function.
Next, at the top of each bar, we'll print the value of the data.
We'll do this in the body of draw_bar, by adding t.write(' ' + str(height))
as the new third line of
the body.
We've put a little space in front of the number, and turned the
number into a
string. Without this extra space we tend
to cramp our text awkwardly against the bar to
the left.
The result looks a lot better now:
And now we'll add two lines to fill each bar. Our final program now looks like this:
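import turtle

def draw_bar(t, height):
    t.begin_fill()              # start filling this bar
    t.left(90)
    t.forward(height)
    t.write(' ' + str(height))  # print the value at the top of the bar
    t.right(90)
    t.forward(40)
    t.right(90)
    t.forward(height)
    t.left(90)
    t.end_fill()                # stop filling this bar
    t.forward(10)

wn = turtle.Screen()
wn.bgcolor("lightgreen")
tess = turtle.Turtle()
tess.color("blue", "red")
tess.pensize(3)
xs = [48,117,200,240,160,260,220]
for a in xs:
    draw_bar(tess, a)
wn.mainloop()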
The first source of information is the website of Python itself. For the version of Python we
are using, this documentation can be found at https://docs.python.org/3/. For instance,
the documentation of the range function can be found here:
https://docs.python.org/3/library/stdtypes.html#typesseq-range.
When computers appeared on the scene, one of the initial reactions was, "This is
great!
We can use the computers to generate the tables, so there will be no
errors." That
turned out to be true (mostly) but shortsighted. Soon thereafter,
computers and
calculators were so pervasive that the tables became obsolete.
Well, almost. For some operations, computers use tables of values to get an
approximate answer and then perform computations to improve the approximation.
In
some cases, there have been errors in the underlying tables, most famously
in the table
the Intel Pentium processor chip used to perform floating-point division.
Although a log table is not as useful as it once was, it still makes a good
example. The
following program outputs a sequence of values in the
left column and 2 raised to the
power of that value in the right column:
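A minimal sketch of such a program, assuming the values 0 to 12 in the left column, could look like this:

```python
for x in range(13):          # the values 0 to 12
    print(x, "\t", 2**x)     # "\t" is the tab escape sequence
```

The string "\t" passed to print contains nothing but the tab escape sequence.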
An escape sequence can appear anywhere in a string; in this example, the tab
escape
sequence is the only thing in the string.
The tab character shifts the cursor to the right until it reaches one of the
tab stops. Tabs
are useful for making columns of text line up, as in the output
of the previous program:
0 1
1 2
2 4
3 8
4 16
5 32
6 64
7 128
8 256
9 512
10 1024
11 2048
12 4096
Because of the tab characters between the columns, the position of the second
column
does not depend on the number of digits in the first column.
Two-dimensional tables
Let's now make our example a little more complex.
A two-dimensional table is a table where you read the value at the intersection
of a row
and a column. A multiplication table is a good example. Let's say you
want to print a
multiplication table for the values from 1 to 6.
A good way to start is to write a loop that prints the multiples of 2, all on
one line:
for i in range(6):
    print(2 * (i+1), end="   ")
print()
Again, the extra end="   " argument in the print function suppresses the newline, and
uses three spaces instead. After the
loop completes, the call to print at line 3 finishes the
current line, and starts a new line.
2 4 6 8 10 12
So far, so good. We now have some pieces of code that we wish to put in functions. The
next step is to encapsulate and generalize.
def print_multiples(n):
    for i in range(6):
        print(n * (i+1), end="   ")
    print()
To encapsulate, all we had to do was add the first line, which declares the
name of the
function and the parameter list. To generalize, all we had to do
was replace the value 2
with the parameter n.
If we call this function with the argument 2, we get the same output as before.
With the
argument 3, the output is:
3 6 9 12 15 18
With the argument 4, the output is:
4 8 12 16 20 24
By now you can probably guess how to print a multiplication table --- by
calling
print_multiples repeatedly with different arguments. In fact, we
can use another loop:
for i in range(6):
    print_multiples(i+1)
Notice how similar this loop is to the one inside print_multiples. All we
did was replace
the print function with a function call.
The output of this program is a multiplication table:
1 2 3 4 5 6
2 4 6 8 10 12
3 6 9 12 15 18
4 8 12 16 20 24
5 10 15 20 25 30
6 12 18 24 30 36
More encapsulation
To demonstrate encapsulation again, let's take the code from the last section
and wrap it
up in a function:
def print_mult_table():
    for i in range(6):
        print_multiples(i+1)
This development plan is particularly useful if you don't know how to divide
the program
into functions when you start writing. This approach lets you
design as you go along.
Local variables
You might be wondering how we can use the same variable, i, in both
print_multiples and
print_mult_table. Doesn't it cause problems when
one of the functions changes the value
of the variable?
The answer is no, because the i in print_multiples and the i in
print_mult_table are not the
same variable.
Variables created inside a function definition are local; you can't access a
local variable
from outside its home function. That means you are free to have
multiple variables with
the same name as long as they are not in the same
function.
Python examines all the statements in a function --- if any of them assign a value
to a
variable, that is the clue that Python uses to make the variable a local variable.
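The following sketch illustrates the rule. It reuses the two functions from this chapter (with a smaller table to keep the output short) and then tries to read i from outside both functions; the flag outside_sees_i is a name introduced only for this example.

```python
def print_multiples(n):
    for i in range(6):                # this i is local to print_multiples
        print(n * (i+1), end="   ")
    print()

def print_mult_table():
    for i in range(3):                # a different, unrelated i
        print_multiples(i+1)

print_mult_table()

try:
    i                                 # no variable i exists outside the functions
    outside_sees_i = True
except NameError:
    outside_sees_i = False

print(outside_sees_i)                 # False
```

The loop in print_mult_table keeps counting correctly even though print_multiples changes its own i on every call.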
The stack diagram for this program shows that the two variables named i are
not the
same variable. They can refer to different values, and changing one
does not affect the
other.
It is common and perfectly legal to have different local variables with the
same name. In
particular, names like i and j are used frequently as
loop variables. If you avoid using
them in one function just because you used
them somewhere else, you will probably
make the program harder to read.
More generalization
As another example of generalization, imagine you wanted a program that would
print a
multiplication table of any size, not just the six-by-six table. You
could add a parameter
to print_mult_table:
def print_mult_table(high):
    for i in range(high):
        print_multiples(i+1)
With the argument 7, it displays:
1 2 3 4 5 6
2 4 6 8 10 12
3 6 9 12 15 18
4 8 12 16 20 24
5 10 15 20 25 30
6 12 18 24 30 36
7 14 21 28 35 42
This is fine, except that we probably want the table to be square --- with the
same
number of rows and columns. To do that, we add another parameter to
print_multiples to
specify how many columns the table should have.
for i in range(high):
print()
def print_mult_table(high):
    for i in range(high):
        print_multiples(i+1, high)
Notice that when we added a new parameter, we had to change the first line of
the
function (the function heading), and we also had to change the place where
the function
is called in print_mult_table.
1 2 3 4 5 6 7
2 4 6 8 10 12 14
3 6 9 12 15 18 21
4 8 12 16 20 24 28
5 10 15 20 25 30 35
6 12 18 24 30 36 42
7 14 21 28 35 42 49
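The two-parameter version can be sketched as follows. The body of print_multiples here is an assumption (the separator passed to end in particular), written to match the tables shown above rather than copied from the earlier section:

```python
def print_multiples(n, high):
    # Print the first `high` multiples of n on a single line.
    for i in range(high):
        print(n * (i + 1), end="   ")   # separator is an assumption
    print()                             # finish the line

def print_mult_table(high):
    # One row per value 1..high, each row with `high` columns.
    for i in range(high):
        print_multiples(i + 1, high)

print_mult_table(3)   # prints a 3-by-3 table
```

Because the number of columns is now an argument, the same function serves tables of any size.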
When you generalize a function appropriately, you often get a program with
capabilities
you didn't plan. For example, you might notice that, because ab =
ba, all the entries in
the table appear twice. You could save ink by printing
only half the table. To do that, you
only have to change one line of
print_mult_table. Change
    print_multiples(i+1, high)
to
    print_multiples(i+1, i+1)
1
2 4
3 6 9
4 8 12 16
5 10 15 20 25
6 12 18 24 30 36
7 14 21 28 35 42 49
Functions
A few times now, we have mentioned all the things functions are good for. Let's
summarize this:
1. Capturing your mental chunking. Breaking your complex tasks into sub-tasks, and
giving the sub-tasks a meaningful name is a powerful mental technique. Look back
at the example that illustrated the post-test loop: we assumed that we had a function
called play_the_game_once. This chunking allowed us to put aside the details
of the particular game --- is it a card game, or noughts and crosses, or a role playing
game --- and simply focus on one isolated part of our program logic --- letting the player
choose whether they want to play again.
2. Dividing a long program into functions allows you to separate parts of the
program, debug them in isolation, and then compose them into a whole.
3. Functions facilitate the use of iteration.
4. Well-designed functions are often useful for many programs. Once you write
and
debug one, you can reuse it.
Glossary
argument
A value provided to a function when the function is called.
This value
is assigned to the corresponding parameter in the
function. The argument
can be the result of an expression
which may involve operators,
operands and calls to other
fruitful functions.
body
The second part of a compound statement. The body
consists of a
sequence of statements all indented the same
amount from the beginning
of the header. The standard
amount of indentation used within the
Python community is 4
spaces.
compound statement
A statement that consists of two parts:
keyword ... :
    statement
    statement ...
docstring
A special string that is attached to a function as its __doc__
attribute.
Tools can use docstrings to provide documentation
or hints for the programmer.
When we get to modules,
classes, and methods, we'll see that docstrings can also be
used there.
flow of execution
The order in which statements are executed during a
program run.
frame
A box in a stack diagram that represents a function call. It
contains
the local variables and parameters of the function.
function
A named sequence of statements that performs some useful
operation.
Functions may or may not take parameters and
may or may not produce a
result.
function call
A statement that executes a function. It consists of the name
of the
function followed by a list of arguments enclosed in
parentheses.
function composition
Using the output from one function call as the input to
another.
function definition
A statement that creates a new function, specifying its name,
parameters, and the statements it executes.
fruitful function
A function that returns a value when it is called.
header line
The first part of a compound statement. A header line begins
with a keyword and
ends with a colon (:)
import statement
A statement which permits functions and variables defined in
another Python
module to be brought into the environment of
another script. To use the
features of the turtle, we need to
first import the turtle module.
lifetime
Variables and objects have lifetimes --- they are created at
some point during
program execution, and will be destroyed
at some time.
local variable
A variable defined inside a function. A local variable can only
be used
inside its function. Parameters of a function are also
a special kind
of local variable.
parameter
A name used inside a function to refer to the value which was
passed
to it as an argument.
refactor
A fancy word to describe reorganizing our program code,
usually to make
it more understandable. Typically, we have a
program that is already working,
then we go back to "tidy it
up". It often involves choosing better variable
names, or
spotting repeated patterns and moving that code into a
function.
stack diagram
A graphical representation of a stack of functions, their
variables,
and the values to which they refer.
traceback
A list of the functions that are executing, printed when a
runtime
error occurs. A traceback is also commonly referred
to as a
stack trace, since it lists the functions in the order in
which they
are stored in the
runtime stack.
void function
The opposite of a fruitful function: one that does not return a
value. It is
executed for the work it does, rather than for the
value it returns.
References
[ThinkCS] How To Think Like a Computer Scientist --- Learning with Python 3
Fruitful functions
Source: this section is heavily based on Chapter 6 of [ThinkCS].
Return values
The built-in functions we have used, such as abs, pow, int, max, and range,
have
produced results. Calling each of these functions generates a value, which
we usually
assign to a variable or use as part of an expression.
biggest = max(3, 7, 2, 5)
x = abs(3 - 11) + 10
We also wrote our own function to return the final amount for a compound interest
calculation.
In this chapter, we are going to write more functions that return values, which we
will call
fruitful functions, for want of a better name. The first example
is area, which returns the
area of a circle with the given radius:
def area(radius):
    b = 3.14159 * radius**2
    return b
We have seen the return statement before, but in a fruitful function the
return statement
includes a return value. This statement means: evaluate
the return expression, and
then return it immediately as the result (the fruit)
of this function. The expression
provided can be arbitrarily complicated,
so we could have written this function like this:
def area(radius):
    return 3.14159 * radius**2
On the other hand, temporary variables like b above often make debugging
easier.
def absolute_value(x):
    if x < 0:
        return -x
    else:
        return x
Another way to write the above function is to leave out the else and just
follow the if
condition by the second return statement.
def absolute_value(x):
    if x < 0:
        return -x
    return x
Think about this version and convince yourself it works the same as the first
one.
Code that appears after a return statement, or any other place the flow of
execution can
never reach, is called dead code, or unreachable code.
def bad_absolute_value(x):
    if x < 0:
        return -x
    elif x > 0:
        return x
>>> print(bad_absolute_value(0))
None
All Python functions return None whenever they do not return another value.
    for wd in xs:
        if len(wd) == 2:
            return wd
    return ""
'is'
''
Single-step through this code and convince yourself that in the first test case
that we've
provided, the function returns while processing the second element
in the list: it does not
have to traverse the whole list.
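The early exit can be seen in a complete, runnable form. This is a sketch: the function name first_two_letter_word and the two test lists are assumptions chosen to match the results shown above.

```python
def first_two_letter_word(words):
    for wd in words:
        if len(wd) == 2:
            return wd      # leaves the loop and the function at once
    return ""              # reached only when no word has two letters

print(first_two_letter_word(["This", "is", "a", "dead", "parrot"]))  # is
print(first_two_letter_word(["I", "like", "cheese"]))                # (empty line)
```

In the first call the function returns at the second element, so "a", "dead" and "parrot" are never examined.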
Return vs print
The purpose of a fruitful function is to do a calculation, the result of which is used later in
another part of the program; as we have seen now, return is used to return a value to the
rest of the program.
However, in earlier chapters, we have often used functions such as print and input. For
instance, we wrote:
count = 0
while n > 0:
    count = count + 1
    n = n // 10
print(count)
def ndigits(n):
    count = 0
    while n > 0:
        count = count + 1
        n = n // 10
    return count
At the exam, we will (almost) always ask you to write programs that interact with our
evaluation system, not with some hypothetical user. Hence, if at the exam we ask you to
write a function that returns the number of digits in the decimal representation of an
integer, code without return is absolutely incorrect; you will not get any points for it!
Similarly, the following answers are all incorrect:
def ndigits(n):
    count = 0
    while n > 0:
        count = count + 1
        n = n // 10
    print(count)

def ndigits(n):
    count = 0
    while n > 0:
        count = count + 1
        n = n // 10
    print(count)
The core problem with these implementations is that input and print are functions that
interact with the user, but not with the rest of the program. In all functions that use print,
print will display the result of the calculation on the screen, but it will not provide this
result to the rest of the program. If at the exam the rest of this program has to calculate
a grade for your answer, without return it has nothing to work on!
Hence, be very careful with the use of print; when writing functions, you will need return
most of the time.
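The difference becomes visible as soon as the caller tries to use the result. In the sketch below, ndigits_printing is a hypothetical name for the print-based variant, introduced only for comparison:

```python
def ndigits(n):
    count = 0
    while n > 0:
        count = count + 1
        n = n // 10
    return count               # hands the value back to the caller

def ndigits_printing(n):       # hypothetical name for the incorrect variant
    count = 0
    while n > 0:
        count = count + 1
        n = n // 10
    print(count)               # shows the value, but returns None

total = ndigits(2024) + ndigits(7)   # the returned values can be combined
print(total)                          # 5

shown = ndigits_printing(2024)        # displays 4 on the screen
print(shown is None)                  # True: nothing came back
```

An expression such as ndigits_printing(2024) + 1 would fail with a TypeError, because None cannot be added to an integer.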
Program development
At this point, you should be able to look at complete functions and tell what
they do.
Also, if you have been doing the exercises, you have written some
small functions. As
you write larger functions, you might start to have more
difficulty, especially with runtime
and semantic errors.
To deal with increasingly complex programs, we are going to suggest a technique
called
incremental development. The goal of incremental development is to
avoid long
debugging sessions by adding and testing only a small amount of code
at a time.
As an example, suppose we want to find the distance between two points, given
by the coordinates (x1, y1) and (x2, y2). By the Pythagorean theorem, the distance is:
distance = \sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}
The first step is to consider what a distance function should look like in
Python. In other
words, what are the inputs (parameters) and what is the output
(return value)?
In this case, the two points are the inputs, which we can represent using four
parameters. The return value is the distance, which is a floating-point value.
Already we can write an outline of the function that captures our thinking so far:
def distance(x1, y1, x2, y2):
    return 0.0
>>> distance(1, 2, 4, 6)
0.0
We chose these values so that the horizontal distance equals 3 and the vertical
distance
equals 4; that way, the result is 5 (the hypotenuse of a 3-4-5
triangle). When testing a
function, it is useful to know the right answer.
At this point we have confirmed that the function is syntactically correct, and
we can
start adding lines of code. After each incremental change, we test the
function again. If
an error occurs at any point, we know where it must be --- in
the last line we added.
def distance(x1, y1, x2, y2):
    dx = x2 - x1
    dy = y2 - y1
    return 0.0
If we call the function with the arguments shown above, when the flow of execution
gets
to the return statement, dx should be 3 and dy should be 4.
We can check that this is the
case in Thonny by putting the cursor on
the return statement, and putting a breakpoint
on that line.
Then we inspect the variables dx and dy by hovering the mouse above
them, to confirm that the function is getting the right parameters and performing the first
computation correctly. If not, there are only a few lines to check.
def distance(x1, y1, x2, y2):
    dx = x2 - x1
    dy = y2 - y1
    dsquared = dx*dx + dy*dy
    return 0.0
Again, we could run the program at this stage and check the value of dsquared (which
should be 25).
Finally, using the fractional exponent 0.5 to find the square root,
we compute and return
the result:
def distance(x1, y1, x2, y2):
    dx = x2 - x1
    dy = y2 - y1
    dsquared = dx*dx + dy*dy
    result = dsquared**0.5
    return result
If that works correctly, you are done. Otherwise, you might want to inspect the
value of
result before the return statement.
When you start out, you might add only a line or two of code at a time. As you
gain more
experience, you might find yourself writing and debugging bigger
conceptual chunks.
Either way, stepping through your code one line at a time and
verifying that each step
matches your expectations can save you a lot of
debugging time. As you improve your
programming skills you should find yourself
managing bigger and bigger chunks: this is
very similar to the way we learned to read
letters, syllables, words, phrases, sentences,
paragraphs, etc., or the way we learn
to chunk music --- from individual notes to chords,
bars, phrases, and so on.
1. Start with a working skeleton program and make small incremental changes. At
any
point, if there is an error, you will know exactly where it is.
2. Use temporary variables to refer to intermediate values so that you
can easily
inspect and check them.
3. Once the program is working, relax, sit back, and play around with your options.
(There is interesting research that links "playfulness" to better understanding,
better learning, more enjoyment, and a more positive mindset about
what you can
achieve --- so spend some time fiddling around!)
You might want to consolidate
multiple statements into one bigger compound expression,
or rename the
variables you've used, or see if you can make the function shorter.
A good
guideline is to aim for making code as easy as possible for others to read.
Here is another version of the function. It makes use of a square root function
that is in
the math module (we'll learn about modules shortly). Which do you
prefer? Which looks
"closer" to the Pythagorean formula we started out with?
import math

def distance(x1, y1, x2, y2):
    return math.sqrt((x2-x1)**2 + (y2-y1)**2)

>>> distance(1, 2, 4, 6)
5.0
You must have a clear solution to the problem, and must know what should
happen before you can debug a program. Work on solving the problem
on a
piece of paper (perhaps using a flowchart to record the steps you take)
before
you concern yourself with
writing code. Writing a program doesn't solve the
problem --- it simply automates
the manual steps you would take. So first make
sure you have
a pen-and-paper manual solution that works.
Programming then is
about making those manual steps happen automatically.
Do not write chatterbox functions. A chatterbox is a fruitful
function that, in
addition to its primary task, also asks the user for input,
or prints output, when it
would be more useful
if it simply shut up and did its work quietly.
For example, we've seen built-in functions like range,
max and abs. None of these
would be useful building blocks for other
programs if they prompted the user for
input, or printed their results while
they performed their tasks.
So we wish to repeat our statement earlier in this chapter: avoid calling print and
input functions inside
fruitful functions, unless the primary purpose of your
function is to
perform input and output. The one exception
to this rule might be to
temporarily sprinkle some calls to print into
your code to help debug and
understand what is happening when the code runs,
but these will then be
removed once you get things working; once more, also at the exam, these
statements are not considered.
Composition
As you should expect by now, you can call one function from within another.
This ability
is called composition.
As an example, we'll write a function that takes two points, the center of the
circle and a
point on the perimeter, and computes the area of the circle.
Assume that the center point is stored in the variables xc and yc, and
the perimeter point
is in xp and yp. The first step is to find the
radius of the circle, which is the distance
between the two points.
Fortunately, we've just written a function, distance, that does just
that,
so now all we have to do is use it:
The second step is to find the area of a circle with that radius and return it.
Again we will
use one of our earlier functions:
result = area(radius)
return result
result = area(radius)
return result
The temporary variables radius and result are useful for development,
debugging, and
single-stepping through the code to inspect what is happening,
but once the program is
working, we can make it more concise by
composing the function calls:
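Both the step-by-step and the composed versions can be sketched as below. The names area_from_points and area_from_points_concise are assumptions made for this illustration; distance and area follow the definitions given earlier in the chapter.

```python
def distance(x1, y1, x2, y2):
    dx = x2 - x1
    dy = y2 - y1
    return (dx*dx + dy*dy) ** 0.5

def area(radius):
    return 3.14159 * radius**2

def area_from_points(xc, yc, xp, yp):          # hypothetical name
    radius = distance(xc, yc, xp, yp)          # step 1: find the radius
    result = area(radius)                      # step 2: find the area
    return result

def area_from_points_concise(xc, yc, xp, yp):  # hypothetical name
    # Same computation, with the two calls composed into one expression.
    return area(distance(xc, yc, xp, yp))

print(area_from_points(1, 2, 4, 6) == area_from_points_concise(1, 2, 4, 6))  # True
```

The longer version is easier to single-step through; the shorter one is easier to read once you trust the building blocks.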
Boolean functions
Functions can return Boolean values, which is often convenient for hiding
complicated
tests inside functions. For example:
def is_divisible(x, y):
    if x % y == 0:
        return True
    else:
        return False
We can make the function more concise by taking advantage of the fact that the
condition of the if statement is itself a Boolean expression. We can return
it directly,
avoiding the if statement altogether:
def is_divisible(x, y):
    return x % y == 0
>>> is_divisible(6, 4)
False
>>> is_divisible(6, 3)
True
else:
if is_divisible(x, y) == True:
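Since a Boolean function already yields True or False, its result can serve directly as the condition of an if statement; comparing it with True adds nothing. A minimal sketch, in which describe is a hypothetical helper used only for illustration:

```python
def is_divisible(x, y):
    return x % y == 0

def describe(x, y):                 # hypothetical helper
    if is_divisible(x, y):          # no "== True" needed
        return "divisible"
    else:
        return "not divisible"

print(describe(6, 3))   # divisible
print(describe(6, 4))   # not divisible
```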
We'll have more to say about style as our programs become more complex, but a
few
pointers will be helpful already:
Unit testing also forces the programmer to think about the different cases
that the
function needs to handle. You also only have to type the tests once
into the script, rather
than having to keep entering the same test data over
and over as you develop your
code.
Extra code in your program which is there because it makes debugging or testing
easier
is called scaffolding.
Let's start with the absolute_value function that we wrote earlier in this
chapter. Recall that
we wrote a few different versions, the last of which was
incorrect, and had a bug. Would
tests have caught this bug?
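import sys

def test(did_pass):
    """ Print the result of a test. """
    linenum = sys._getframe(1).f_lineno   # Get the caller's line number.
    if did_pass:
        msg = "Test at line {0} ok.".format(linenum)
    else:
        msg = "Test at line {0} FAILED.".format(linenum)
    print(msg)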
There is also some slightly tricky string formatting using the format method
which we will
gloss over for the moment, and cover in detail in a future chapter.
But with this function
written, we can proceed to construct our test suite:
def test_suite():
    """ Run the suite of tests for code in this module (this file).
    """
    test(absolute_value(17) == 17)
    test(absolute_value(-17) == 17)
    test(absolute_value(0) == 0)
    test(absolute_value(3.14) == 3.14)
    test(absolute_value(-3.14) == 3.14)
Here you'll see that we've constructed five tests in our test suite. We could run this
against the first or second versions (the correct versions) of absolute_value,
and we'd get
output similar to the following:
But let's say you change the function to an incorrect version like this:
def absolute_value(n):   # Buggy version
    if n < 0:
        return 1
    elif n > 0:
        return n
Can you find at least two mistakes in this code? Our test suite can! We get:
There is a built-in Python statement called assert that does almost the same
as our test
function (except the program stops when the first assertion fails).
You may want to read
about it, and use it instead of our test function.
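As a sketch of that alternative, the same five checks can be written with assert. Each statement does nothing when its condition is true and raises an AssertionError, stopping the program, when it is false:

```python
def absolute_value(x):
    if x < 0:
        return -x
    return x

# Each assert passes silently when the condition holds.
assert absolute_value(17) == 17
assert absolute_value(-17) == 17
assert absolute_value(0) == 0
assert absolute_value(3.14) == 3.14
assert absolute_value(-3.14) == 3.14
print("All assertions passed.")
```

Run against the buggy version, the second assertion would fail and the remaining ones would never be checked.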
if x <= 0:
return
result = x**0.5
Glossary
Boolean function
A function that returns a Boolean value. The only possible
values of the bool type are False and True.
chatterbox function
A function which interacts with the user (using input or print)
when
it should not. Silent functions that just convert their
input arguments into
their output results are usually the most
useful ones.
composition (of functions)
Calling one function from within the body of another, or using
the
return value of one function as an argument to the call of
another.
dead code
Part of a program that can never be executed, often because
it appears
after a return statement.
fruitful function
A function that yields a return value instead of None.
incremental development
A program development plan intended to simplify debugging
by adding and
testing only a small amount of code at a time.
None
A special Python value. One use in Python is that it is
returned
by functions that do not execute a return statement
with a return argument.
return value
The value provided as the result of a function call.
scaffolding
Code that is used during program development to assist with
development and debugging. The unit test code that we
added in this chapter is an example of scaffolding.
temporary variable
A variable used to store an intermediate value in a complex
calculation.
test suite
A collection of tests for some code you have written.
unit testing
An automatic procedure used to validate that individual units
of code
are working properly. Having a test suite is extremely
useful when somebody
modifies or extends the code: it
provides a safety net against
going backwards by putting
new bugs into previously working code.
The term regression
testing is often used to capture this idea that we
don't want to
go backwards!
wrapping code in a function
The process of adding a function header and parameters to a
sequence
of program statements is often referred to as
"wrapping the code in
a function". This process is very useful
whenever the program
statements in question are going to
be used multiple times. It is
even more useful when it allows
the programmer to express their mental
chunking, and how
they've broken a complex problem into pieces.
References
[ThinkCS] How To Think Like a Computer Scientist --- Learning with Python 3
Modules
Source: this section is heavily based on Chapter 12 of [ThinkCS].
We have also shown you how to access help. The help system contains
a listing of all
the standard modules that are available with Python.
Play with help!
Random numbers
We often want to use random numbers in programs. Here are a few typical uses:
To play a game of chance where the computer needs to throw some dice, pick a
number, or flip a coin,
To shuffle a deck of playing cards randomly,
To allow/make an enemy spaceship appear at a random location and start
shooting at the player,
To simulate possible rainfall when we make a computerized model for
estimating
the environmental impact of building a dam,
For encrypting banking sessions on the Internet.
Python provides a module random that helps with tasks like this. You can
look it up using
help, but here are the key things we'll do with it:
import random
rng = random.Random()
The randrange method call generates an integer between its lower and upper
argument,
using the same semantics as range --- so the lower bound is included, but
the upper
bound is excluded. All the values have an equal probability of occurring
(i.e. the results
are uniformly distributed). Like range, randrange can
also take an optional step argument.
So let's assume we needed a random odd number less
than 100, we could say:
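A minimal sketch of both uses of randrange follows. The variable names dice_throw and random_odd are chosen for illustration:

```python
import random

rng = random.Random()                   # create a random number generator

dice_throw = rng.randrange(1, 7)        # one of 1, 2, 3, 4, 5, 6
random_odd = rng.randrange(1, 100, 2)   # start at 1, step by 2: an odd number below 100

print(dice_throw, random_odd)
```

The step argument works exactly as it does for range: the candidates are 1, 3, 5, ..., 99.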
The random method returns a floating point number in the interval [0.0, 1.0) --- the
square bracket means "closed interval on the left" and the round parenthesis means
"open interval on the right". In other words, 0.0 is possible, but all returned
numbers will
be strictly less than 1.0. It is usual to scale the results after
calling this method, to get
them into an interval suitable for your application. In the
case shown here, we've
converted the result of the method call to a number in
the interval [0.0, 5.0). Once more,
these are uniformly distributed numbers --- numbers
close to 0 are just as likely to occur
as numbers close to 0.5, or numbers close to 1.0.
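Scaling is a single multiplication. In this sketch the name delay_in_seconds is an assumption about what the scaled value might be used for:

```python
import random

rng = random.Random()

# random() gives a float in [0.0, 1.0); multiplying by 5.0 stretches it to [0.0, 5.0).
delay_in_seconds = rng.random() * 5.0
print(delay_in_seconds)
```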
This example shows how to shuffle a list. (shuffle cannot work directly
with a lazy
promise, so notice that we had to convert the range object
using the list type converter
first.)
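A short sketch of shuffling, assuming a deck represented by the numbers 0 to 51:

```python
import random

rng = random.Random()

cards = list(range(52))   # convert the range object into a real list first
rng.shuffle(cards)        # rearranges the list in place; returns None
print(cards)
```

Note that shuffle modifies the list it is given rather than returning a new one.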
This alternative way of creating a random number generator gives an explicit seed
value
to the object. Without this argument, the system probably uses something based
on the
time. So grabbing some random numbers from drng today will give you
precisely the
same random sequence as it will tomorrow!
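The repeatability of a seeded generator can be checked with a small experiment. The seed value 123 and the helper five_throws are assumptions made for this sketch:

```python
import random

def five_throws(seed):
    drng = random.Random(seed)    # generator with an explicit seed
    throws = []
    for i in range(5):
        throws.append(drng.randrange(1, 7))
    return throws

# Two generators built from the same seed produce the same sequence.
print(five_throws(123) == five_throws(123))   # True
```

Repeatable sequences are useful when testing: a failing run can be reproduced exactly.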
Picking balls from bags, throwing dice, shuffling a pack of
cards
Here is an example to generate a list containing n random ints between a
lower and an
upper bound:
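import random

def make_random_ints(num, lower_bound, upper_bound):
    """
    Generate a list containing num random ints between lower_bound
    and upper_bound. upper_bound is an open bound.
    """
    rng = random.Random()    # Create a random number generator
    result = []
    for i in range(num):
        result.append(rng.randrange(lower_bound, upper_bound))
    return result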
[8, 1, 8, 5, 6]
But what if you don't want duplicates? If you wanted 5 distinct months,
then this
algorithm is wrong. In this case a good algorithm is to generate the
list of possibilities,
shuffle it, and slice off the number of elements you want:
xs = list(range(1,13)) # Make list 1..12 (there are no duplicates)
In statistics courses, the first case --- allowing duplicates --- is usually
described as
pulling balls out of a bag with replacement --- you put the drawn
ball back in each time,
so it can occur again. The latter case, with no duplicates,
is usually described as pulling
balls out of the bag without replacement. Once the
ball is drawn, it doesn't go back to be
drawn again. TV lotto games work like this.
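The two ways of drawing can be put side by side. This is a sketch using month numbers 1 to 12; the names with_dups and without_dups are chosen for illustration:

```python
import random

rng = random.Random()

# With replacement: each draw is independent, so repeats are possible.
with_dups = []
for i in range(5):
    with_dups.append(rng.randrange(1, 13))

# Without replacement: shuffle all the possibilities, then slice off five.
xs = list(range(1, 13))
rng.shuffle(xs)
without_dups = xs[:5]

print(with_dups)
print(without_dups)
```

The second list can never contain a repeated value, because every month number occurs exactly once in xs.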
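def make_random_ints_no_dups(num, lower_bound, upper_bound):
    """
    Generate a list containing num random ints between lower_bound
    and upper_bound, with no duplicates. upper_bound is an open bound.
    """
    result = []
    rng = random.Random()
    for i in range(num):
        while True:
            candidate = rng.randrange(lower_bound, upper_bound)
            if candidate not in result:
                break
        result.append(candidate)
    return result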
xs = make_random_ints_no_dups(5, 1, 10000000)
print(xs)
Even this function has its pitfalls. Can you spot what is going to happen in
this case?
xs = make_random_ints_no_dups(10, 1, 6)
The time module has a function, perf_counter, that is recommended for timing programs.
(Older versions of this example used time.clock, which was removed in Python 3.8.)
The way to use it is to call perf_counter and assign the result to a variable, say t0,
just before you start executing the code you want to measure. Then after execution, call
perf_counter again (this time we'll save the result in variable t1). The difference
t1-t0 is the time elapsed, and is a measure of how fast your program is running.
Let's try a small example. Python has a built-in sum function that can sum the
elements in a list. We can also write our own. How do we think they would compare
for speed? We'll try to do the summation of a list [0, 1, 2 ...] in both cases, and
compare the results:
import time

def do_my_sum(xs):
    total = 0
    for v in xs:
        total += v
    return total

sz = 10000000        # Lots of items, so the timing is measurable
testdata = range(sz)

t0 = time.perf_counter()
my_result = do_my_sum(testdata)
t1 = time.perf_counter()
print("my_result    = {0} (time taken = {1:.4f} seconds)"
      .format(my_result, t1-t0))

t2 = time.perf_counter()
their_result = sum(testdata)
t3 = time.perf_counter()
print("their_result = {0} (time taken = {1:.4f} seconds)"
      .format(their_result, t3-t2))
So our function runs about 57% slower than the built-in one.
Generating and summing
up ten million elements in under a second is not too shabby!
The math module
The math module contains the kinds of mathematical functions you'd typically find on
your
calculator (sin, cos, sqrt, asin, log, log10) and some mathematical constants
like pi
and e:
3.141592653589793
2.718281828459045
1.4142135623730951
1.5707963267948966
1.0
3.141592653589793
As in almost all other programming languages, angles are expressed in radians
rather than degrees. There are two functions, radians and degrees, to
convert between these two popular ways of measuring angles.
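The calls below are a sketch of how such values can be obtained; the particular expressions are assumptions, chosen because they are standard math functions and constants:

```python
import math

print(math.pi)                      # 3.141592653589793
print(math.e)                       # 2.718281828459045
print(math.sqrt(2.0))               # 1.4142135623730951
print(math.radians(90))             # 1.5707963267948966 (90 degrees in radians)
print(math.sin(math.radians(90)))   # 1.0
print(math.degrees(math.pi / 2))    # converts a quarter turn back to degrees
```

Passing degrees directly to sin is a common mistake: math.sin(90) computes the sine of 90 radians.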
Notice another difference between this module and our use of random and turtle:
in
random and turtle we create objects and we call methods on the object. This is
because
objects have state --- a turtle has a color, a position, a heading, etc.,
and every random
number generator has a seed value that determines its next result.
Mathematical functions are "pure" and don't have any state --- calculating the square
root of
2.0 doesn't depend on any kind of state or history about what happened in the
past.
So the functions are not methods of an object ---
they are simply functions that are
grouped together in a module called math.
Creating your own modules
All we need to do to create our own modules is to save our script as
a file with a .py
extension. Suppose, for example, this script is
saved as a file named seqtools.py:
We can now use our module, both in scripts we write, or in the interactive Python
interpreter. To do so, we
must first import the module.
>>> seqtools.remove_at(4, s)
'A sting!'
The use of modules makes it possible to break up very large programs into
manageable
sized parts, and to keep related parts together.
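As an illustration, a module file of this kind could look like the sketch below. The body of remove_at is an assumption, written so that it reproduces the result 'A sting!' shown above:

```python
# Contents of a hypothetical seqtools.py

def remove_at(pos, seq):
    # Return a copy of seq with the element at index pos left out.
    return seq[:pos] + seq[pos+1:]

# From another script in the same folder one would write:
#     import seqtools
#     seqtools.remove_at(4, "A string!")
# Here the function is called directly so the sketch runs on its own.
print(remove_at(4, "A string!"))   # A sting!
```

Slicing works on strings and lists alike, so the same function can be reused for both.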
Namespaces
A namespace is a collection of identifiers that belong to
a module, or to a function, (and
as we will see soon, in classes too). Generally,
we like a namespace to hold "related"
things, e.g. all the math functions, or all
the typical things we'd do with random numbers.
Each module has its own namespace, so we can use the same identifier name in
multiple modules without causing an identification problem.
# module1.py
answer = 42
# module2.py
We can now import both modules and access question and answer in each:
import module1
import module2
print(module1.question)
print(module2.question)
print(module1.answer)
print(module2.answer)
42
def f():
    n = 7
    print("printing n inside of f:", n)

def g():
    n = 42
    print("printing n inside of g:", n)

n = 11
f()
g()
printing n inside of f: 7
printing n inside of g: 42
The three n's here do not collide since they are each in a different
namespace --- they
are three names for three different variables, just like
there might be three different
instances of people, all called "Bruce".
Local scope refers to identifiers declared within a function. These identifiers are
kept
in the namespace that belongs to the function, and each function has its own
namespace.
Global scope refers to all the identifiers declared within the current module, or
file.
Built-in scope refers to all the identifiers built into Python --- those like range and
min that can be used without having to import anything, and are (almost) always
available.
Python (like most other computer languages) uses precedence rules: the same name
could occur in
more than one of these scopes, but the innermost, or local scope, will
always take
precedence over the global scope, and the global scope always gets used
in preference to the
built-in scope. Let's start with a simple example:
def range(n):
    return 123*n

print(range(10))
What gets printed? We've defined our own function called range, so there
is now a
potential ambiguity. When we use range, do we mean our own one,
or the built-in one?
Using the scope lookup rules determines this: our own
range function, not the built-in
one, is called, because our function range
is in the global namespace, which takes
precedence over the built-in names.
So although names like range and min are built-in, they can be "hidden"
from your use if
you choose to define your own variables or functions that reuse
those names. (It is a
confusing practice to redefine built-in names --- so to be
a good programmer you need
to understand the scope rules and understand
that you can do nasty things that will
cause confusion, and then you avoid doing them!)
n = 10
m = 3
def f(n):
    m = 7
    return 2*n+m

print(f(5), n, m)
This prints 17 10 3. The reason is that the two variables m and n in lines 1 and 2
are
outside the function in the global namespace. Inside the function, new variables
called n
and m are created just for the duration of the execution of f. These are
created in the
local namespace of function f. Within the body of f, the scope lookup rules
determine
that we use the local variables m and n. By contrast, after we've returned from f,
the n
and m arguments to the print function refer to the original variables
on lines 1 and 2, and
these have not been changed in any way by executing function f.
Notice too that the def puts name f into the global namespace here. So it can be
called
on line 7.
What is the scope of the variable n on line 1? Its scope --- the region in which it is
visible
--- is lines 1, 2, 6, 7. It is hidden from view in lines 3, 4, 5 because of the
local variable n.
Modules contain functions as well as attributes, and the dot operator is used
to access
them in the same way. seqtools.remove_at refers to the
remove_at function in the seqtools
module.
Here just the single identifier math is added to the current namespace. If you want to
access one of the functions in the module, you need to use the dot notation to get to it.
x = sqrt(10)
The names are added directly to the current namespace, and can be used without
qualification. The name
math is not itself imported, so trying to use the qualified form
math.sqrt would give an error.
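The common import forms can be compared in one place. This is a sketch; in a real program you would normally pick one form per module rather than mix them:

```python
# Form 1: only the name math is added; functions need the qualified form.
import math
x = math.sqrt(10)

# Form 2: the listed names are added directly and used without qualification.
from math import cos, sin, sqrt
y = sqrt(10)

# Form 3: the module is imported under a shorter alias.
import math as m
z = m.sqrt(10)

print(x == y == z)   # True: all three refer to the same function
```

If form 2 were the only import in a program, the name math itself would be undefined there.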
>>> m.pi
3.141592653589793
But hey, with nice editors that do auto-completion, and fast fingers,
that's a small price!
def area(radius):
    import math
    return math.pi * radius * radius
Here we imported math, but we imported it into the local namespace of area.
So the
name is usable within the function body, but not in the enclosing script,
because it is not
in the global namespace.
Glossary
attribute
A variable defined inside a module (or class or instance -- as
we will
see later). Module attributes are accessed by using
the dot
operator (.).
dot operator
The dot operator (.) permits access to attributes and
functions of
a module (or attributes and methods of a class or
instance -- as we
have seen elsewhere).
fully qualified name
A name that is prefixed by some namespace identifier and
the dot operator, or
by an instance object, e.g. math.sqrt or
tess.forward(10).
import statement
A statement which makes the objects contained in a module
available for
use within another module. There are two forms
for the import
statement. Using hypothetical modules named
mymod1 and mymod2
each containing
functions f1 and f2,
and variables v1 and v2, examples
of these two forms
include:
import mymod1
>>> s.upper()
'THIS IS A STRING.'
>>>
import string
instead of
standard library
A library is a collection of software used as tools in the
development
of other software. The standard library of a
programming language is
the set of such tools that are
distributed with the core programming
language. Python
comes with an extensive standard library.
References
[ThinkCS] How To Think Like a Computer Scientist --- Learning with Python 3
Strings
Source: this section is heavily based on Chapter 8 of [ThinkCS].
For example:
>>> tt = ss.upper()
>>> tt
'HELLO, WORLD!'
>>> m = fruit[1]
>>> print(m)
The expression fruit[1] selects character number 1 from fruit, and creates a new
string
containing just this one character. The variable m refers to the result.
When we display
m, we could get a surprise:
>>> m = fruit[0]
>>> print(m)
We can use enumerate to visualize the indices that can be used to access a string:
>>> list(enumerate(fruit))
[(0, 'b'), (1, 'a'), (2, 'n'), (3, 'a'), (4, 'n'), (5, 'a')]
Note that indexing returns a string --- Python has no special type for a single character.
It is just a string of length 1.
We've also seen lists previously. The same indexing notation works to extract elements
from
a list:
>>> prime_nums = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31]
>>> prime_nums[4]
11
>>> friends[3]
'Angelina'
Length
The len function, when applied to a string, returns the number of characters in a string:
>>> fruit = "banana"
>>> len(fruit)
6
To get the last letter of a string, you might be tempted to try something like
this:
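sz = len(fruit)
last = fruit[sz]       # ERROR!
That won't work. It causes the runtime error IndexError: string index out of range,
because there is no character at index 6 in "banana": the indices start at 0 and
run up to 5. To get the last character, we have to subtract 1 from the length of
fruit: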
sz = len(fruit)
last = fruit[sz-1]
Alternatively, we can use negative indices, which count backward from the
end of the
string. The expression fruit[-1] yields the last letter,
fruit[-2] yields the second to last, and
so on.
As you might have guessed, indexing with a negative index also works like this for lists.
We won't use negative indexes in the rest of these notes --- not many computer
languages
use this idiom, and you'll probably be better off avoiding it. But there is plenty
of
Python code out on the Internet that will use this trick, so it is best to know that it
exists.
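So that you can recognise the idiom when you meet it, here is a short sketch comparing the two ways of reaching the end of a sequence (the list primes is only an illustration):

```python
fruit = "banana"
last = fruit[len(fruit) - 1]     # explicit index of the last character
also_last = fruit[-1]            # negative index: counts from the end
second_to_last = fruit[-2]

primes = [2, 3, 5, 7]
last_prime = primes[-1]          # the same notation works for lists

print(last, also_last, second_to_last, last_prime)
```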
Traversal and the for loop
A lot of computations involve processing a string one character at a time.
Often they
start at the beginning, select each character in turn, do something
to it, and continue
until the end. This pattern of processing is called a
traversal. One way to encode a
traversal is with a while statement:
ix = 0
while ix < len(fruit):
    letter = fruit[ix]
    print(letter)
    ix += 1
This loop traverses the string and displays each letter on a line by itself.
The loop
condition is ix < len(fruit), so when ix is equal to the
length of the string, the condition is
false, and the body of the loop is not
executed. The last character accessed is the one
with the index
len(fruit)-1, which is the last character in the string.
But we've previously seen how the for loop can easily iterate over
the elements in a list
and it can do so for strings as well:
for c in fruit:
    print(c)
Each time through the loop, the next character in the string is assigned to the
variable c.
The loop continues until no characters are left. Here we
can see the expressive power
the for loop gives us compared to the
while loop when traversing a string.
The following example shows how to use concatenation and a for loop to
generate an
abecedarian series. Abecedarian refers to a series or list in which
the elements appear
in alphabetical order. For example, in Robert McCloskey's
book Make Way for
Ducklings, the names of the ducklings are Jack, Kack, Lack,
Mack, Nack, Ouack, Pack,
and Quack. This loop outputs these names in order:
prefixes = "JKLMNOPQ"
suffix = "ack"
for p in prefixes:
    print(p + suffix)
Jack
Kack
Lack
Mack
Nack
Oack
Pack
Qack
Of course, that's not quite right because Ouack and Quack are misspelled.
You'll fix this
as an exercise below.
Slices
A substring of a string is obtained by taking a slice. Similarly, we can
slice a list to refer
to some sublist of the items in the list:
>>> s = "Pirates of the Caribbean"
>>> print(s[0:7])
Pirates
>>> print(s[11:14])
the
>>> print(s[15:24])
Caribbean
>>> print(friends[2:4])
['Brad', 'Angelina']
The operator [n:m] returns the part of the string from the n'th character
to the m'th
character, including the first but excluding the last. This
behavior makes sense if you
imagine the indices
pointing between the characters, as in the following diagram:
If you imagine this as a piece of paper, the slice operator [n:m] copies out
the part of the
paper between the n and m positions. Provided m and n are
both within the bounds of
the string, your result will be of length (m-n).
Three tricks are added to this: if you omit the first index (before the colon),
the slice
starts at the beginning of the string (or list). If you omit the second index,
the slice
extends to the end of the string (or list). Similarly, if you provide a value for
m that is bigger
than the length of the string (or list), the slice will take all the
values up to the end. (It
won't give an "out of range" error like the normal indexing operation
does.) Thus:
>>> fruit = "banana"
>>> fruit[:3]
'ban'
>>> fruit[3:]
'ana'
>>> fruit[3:999]
'ana'
String comparison
The comparison operators work on strings. To see if two strings are equal:
if word == "banana":
else:
This is similar to the alphabetical order you would use with a dictionary,
except that all
the uppercase letters come before all the lowercase letters. As
a result, a capitalized word such as "Zebra" compares as smaller than "banana".
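A brief sketch of this ordering, using an illustrative list of words; comparing lowercase copies is one common way (not the only one) to get a dictionary-like order:

```python
words = ["banana", "Zebra", "apple"]

zebra_first = "Zebra" < "banana"              # True: "Z" sorts before "b"
default_order = sorted(words)                 # uppercase-initial word first
ignoring_case = sorted(words, key=str.lower)  # compare lowercase versions

print(zebra_first)
print(default_order)
print(ignoring_case)
```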
print(greeting)
Instead of producing the output Jello, world!, this code produces the
runtime error
TypeError: 'str' object does not support item assignment.
Strings are immutable, which means you can't change an existing string. The
best you
can do is create a new string that is a variation on the original:
print(new_greeting)
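A minimal sketch of this behaviour, assuming the original string was "Hello, world!" (the variable message is only there to capture the error text):

```python
greeting = "Hello, world!"

try:
    greeting[0] = "J"                 # item assignment is not allowed on str
except TypeError as err:
    message = str(err)

new_greeting = "J" + greeting[1:]     # build a new string instead
print(message)
print(new_greeting)
print(greeting)                       # the original string is unchanged
```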
A find function
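What does the following function do?
def find(strng, ch):
    """
      Find and return the index of ch in strng.
      Return -1 if ch does not occur in strng.
    """
    ix = 0
    while ix < len(strng):
        if strng[ix] == ch:
            return ix
        ix += 1
    return -1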
test(find("Compsci", "p") == 3)
test(find("Compsci", "C") == 0)
test(find("Compsci", "i") == 6)
test(find("Compsci", "x") == -1)
If the character doesn't appear in the string, then the program exits the loop
normally
and returns -1.
This pattern of computation is sometimes called a eureka traversal or
short-circuit
evaluation, because as soon as we find what we are looking for,
we can cry "Eureka!",
take the short-circuit, and stop looking.
def count_a(text):
    count = 0
    for c in text:
        if c == "a":
            count += 1
    return count
test(count_a("banana") == 3)
To understand why optional parameters are useful, consider the following example.
Suppose we wish to find the occurrences of a character in a
string, only considering the
locations in the string after a certain starting
location. We can modify the find function,
adding a third parameter for the
starting position in the search string:
def find2(strng, ch, start):
    ix = start
    while ix < len(strng):
        if strng[ix] == ch:
            return ix
        ix += 1
    return -1
test(find2("banana", "a", 2) == 3)
The call find2("banana", "a", 2) now returns 3, the index of the first
occurrence of "a" in
"banana" starting the search at index 2. What does
find2("banana", "n", 3) return? If you
said 4, there is a good chance you
understand how find2 works.
def find(strng, ch, start=0):
    ix = start
    while ix < len(strng):
        if strng[ix] == ch:
            return ix
        ix += 1
    return -1
def find(strng, ch, start=0, end=None):
    ix = start
    if end is None:
        end = len(strng)
    while ix < end:
        if strng[ix] == ch:
            return ix
        ix += 1
    return -1
The optional value for end is interesting: we give it a default value None if the
caller does
not supply any argument. In the body of the function we test what end is,
and if the caller
did not supply any argument, we reassign end to be the length of the string.
If the caller
has supplied an argument for end, however, the caller's value will be used in the loop.
The semantics of start and end in this function are precisely the same as they are in
the
range function.
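To make the range-like meaning of start and end concrete, here is a self-contained sketch of such a function together with a few calls; the test word and the variable names are illustrative, and the comparison with the built-in str.find is only a sanity check:

```python
def find(strng, ch, start=0, end=None):
    """Return the index of ch in strng[start:end], or -1 if it is absent."""
    if end is None:
        end = len(strng)
    ix = start
    while ix < end:
        if strng[ix] == ch:
            return ix
        ix += 1
    return -1

word = "banana"
first = find(word, "a")            # search the whole string
later = find(word, "a", 2)         # start looking at index 2
cut = find(word, "a", 2, 3)        # end is exclusive: only index 2 is examined
builtin = [word.find("a"), word.find("a", 2), word.find("a", 2, 3)]

print(first, later, cut, builtin)
```

As with range, the start position is included and the end position is excluded.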
test(find(ss, "s") == 7)
test(find(ss, "s", 7) == 7)
True
False
True
False
True
True
True
True
True
def remove_vowels(s):
    vowels = "aeiouAEIOU"
    s_sans_vowels = ""
    for x in s:
        if x not in vowels:
            s_sans_vowels += x
    return s_sans_vowels
test(remove_vowels("compsci") == "cmpsc")
test(remove_vowels("aAbEefIijOopUus") == "bfjps")
test(ss.find("s") == 7)
test(ss.find("s", 7) == 7)
test(ss.find("s", 8) == 13)
test(ss.find(".") == len(ss)-1)
The built-in find method is more general than our version. It can find
substrings, not just
single characters:
>>> "banana".find("nan")
2
>>> "banana".find("na", 3)
4
Usually we'd prefer to use the methods that Python provides rather than reinvent
our
own equivalents. But many of the built-in functions and methods make good
teaching
exercises, and the underlying techniques you learn are your building blocks
to becoming
a proficient programmer.
>>> wds
punctuation = "!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~"
def remove_punctuation(s):
    s_sans_punct = ""
    for letter in s:
        if letter not in punctuation:
            s_sans_punct += letter
    return s_sans_punct
import string
def remove_punctuation(s):
    s_without_punct = ""
    for letter in s:
        if letter not in string.punctuation:
            s_without_punct += letter
    return s_without_punct
Composing together this function and the split method from the previous section
makes
a useful combination --- we'll clean out the punctuation, and
split will clean out the
newlines and tabs while turning the string into
a list of words:
my_story = """
Pythons are constrictors, which means that they will 'squeeze' the life
out of their prey. They coil themselves around their prey and with
each breath the creature takes the snake will squeeze a little tighter
until they stop breathing completely. Once the heart stops the prey
stomach except for fur or feathers. What do you think happens to the fur,
feathers, beaks, and eggshells? The 'extra stuff' gets passed out as ---
wds = remove_punctuation(my_story).split()
print(wds)
The output:
As indicated earlier, there are other useful string methods, but this book isn't intended to
be a reference manual. You can find all necessary information in the Python Library
Reference
online.
print(s1)
name = "Alice"
age = 10
print(s2)
n1 = 4
n2 = 5
print(s3)
The template string contains place holders, ... {0} ... {1} ... {2} ... etc.
The format method
substitutes its arguments into the place holders.
The numbers in the place holders are
indexes that determine which argument
gets substituted --- make sure you understand
line 6 above!
But there's more! Each of the replacement fields can also contain a format
specification ---
it is always introduced by the : symbol (Line 11 above uses one.)
This
modifies how the substitutions are made into the template, and can control things like:
whether the field is aligned to the left <, center ^, or right >
the width allocated to the field within the result string (a number like 10)
the type of conversion (we'll initially only force conversion to float, f, as we did in
line 11 of the code above, or perhaps we'll ask integer numbers to be converted
to hexadecimal using x)
if the type conversion is a float, you can also specify how many decimal places
are wanted
(typically, .2f is useful for working with currencies to two decimal
places.)
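The options listed above can be tried out in a few lines. This is only a sketch with made-up values; the vertical bars are there to make the field width visible:

```python
name = "Alice"

left = "|{0:<10}|".format(name)       # left-aligned in a field of width 10
center = "|{0:^10}|".format(name)     # centered
right = "|{0:>10}|".format(name)      # right-aligned
price = "{0:.2f}".format(3.14159)     # float with two decimal places
hexa = "{0:x}".format(255)            # integer shown in hexadecimal

print(left)
print(center)
print(right)
print(price, hexa)
```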
Let's do a few simple and common examples that should be enough for most needs. If
you need to
do anything more esoteric, use help and read all the powerful, gory details.
n1 = "Paris"
n2 = "Whitney"
n3 = "Hilton"
print("|||{0:<15}|||{1:^15}|||{2:>15}|||Born in {3}|||"
.format(n1,n2,n3,1981))
.format(123456))
"""
print("i\ti**2\ti**3\ti**5\ti**10\ti**20")
This program prints out a table of various powers of the numbers from 1 to 10.
(This
assumes that the tab width is 8. You might see
something even worse than this if your
tab width is set to 4.)
In its current form it relies on the tab character (\t) to align the
columns of values, but this breaks down when the values in the table get larger
than the
tab width:
i i**2 i**3 i**5 i**10 i**20
1 1 1 1 1 1
2 4 8 32 1024 1048576
One possible solution would be to change the tab width, but the first column
already has
more space than it needs. The best solution would be to set the
width of each column
independently. As you may have guessed by now, string
formatting provides a much
nicer solution. We can also right-justify each field:
layout = "{0:>4}{1:>6}{2:>6}{3:>8}{4:>13}{5:>24}"
Running this version produces the following (much more satisfying) output:
   i  i**2  i**3    i**5        i**10                   i**20
   1     1     1       1            1                       1
   2     4     8      32         1024                 1048576
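For reference, a complete program along these lines might look as follows. This is a sketch built around the layout string shown above; collecting the lines in a list called rows is an assumption made here for convenience:

```python
layout = "{0:>4}{1:>6}{2:>6}{3:>8}{4:>13}{5:>24}"

# The header line uses the same layout as the rows of numbers.
rows = [layout.format("i", "i**2", "i**3", "i**5", "i**10", "i**20")]
for i in range(1, 11):
    rows.append(layout.format(i, i**2, i**3, i**5, i**10, i**20))

print("\n".join(rows))
```

Because every field has a fixed width, all lines have the same length and the columns stay aligned even for the largest values.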
Summary
This chapter introduced a lot of new ideas. The following summary
may prove helpful in
remembering what you learned.
indexing ([])
Access a single character in a string using its position
(starting from
0). Example: "This"[2] evaluates to "i".
length function (len)
Returns the number of characters in a string. Example:
len("happy") evaluates to 5.
for loop traversal (for)
Traversing a string means accessing each character in the
string, one
at a time. For example, the following for loop:
for ch in "Example":
...
Glossary
compound data type
A data type in which the values are made up of components,
or elements,
that are themselves values.
default value
The value given to an optional parameter if no argument for it
is
provided in the function call.
docstring
A string constant on the first line of a function or module
definition
(and as we will see later, in class and method
definitions as well).
Docstrings provide a convenient way to
associate documentation with
code. Docstrings are also used
by programming tools to provide interactive help.
dot notation
Use of the dot operator, ., to access methods and attributes
of an object.
immutable data value
A data value which cannot be modified. Assignments to
elements or
slices (sub-parts) of immutable values cause a
runtime error.
index
A variable or value used to select a member of an ordered
collection, such as
a character from a string, or an element
from a list.
mutable data value
A data value which can be modified. The types of all mutable
values
are compound types. Lists and dictionaries are
mutable; strings
and tuples are not.
optional parameter
A parameter written in a function header with an assignment
to a
default value which it will receive if no corresponding
argument is
given for it in the function call.
short-circuit evaluation
A style of programming that shortcuts extra work as soon as
the
outcome is known with certainty. In this chapter our find
function returned as soon as it found what it was looking for;
it
didn't traverse all the rest of the items in the string.
slice
A part of a string (substring) specified by a range of indices.
More
generally, a subsequence of any sequence type in
Python can be created
using the slice operator
(sequence[start:stop]).
traverse
To iterate through the elements of a collection, performing a
similar
operation on each.
whitespace
Any of the characters that move the cursor without printing
visible
characters. The constant string.whitespace contains all
the
white-space characters.
References
[ThinkCS] How To Think Like a Computer Scientist --- Learning with Python 3
Lists
Source: this section is heavily based on Chapter 11 of [ThinkCS].
List values
There are several ways to create a new list; the simplest is to enclose the
elements in
square brackets ([ and ]):
The first example is a list of four integers. The second is a list of three
strings. The
elements of a list don't have to be the same type. The following
list contains a string, a
float, an integer, and
(amazingly) another list:
We have already seen that we can assign list values to variables or pass lists as
parameters to functions:
>>> an_empty_list = []
Accessing elements
The syntax for accessing the elements of a list is the same as the syntax for
accessing
the characters of a string --- the index operator: [] (not to
be confused with an empty
list). The expression inside the brackets specifies
the index. Remember that the indices
start at 0:
>>> numbers[0]
17
5
>>> numbers[1.0]
If you try to access or assign to an element that does not exist, you get a runtime
error:
>>> numbers[2]
print(horsemen[i])
Each time through the loop, the variable i is used as an index into the
list, printing the
i'th element. This pattern of computation is called a
list traversal.
The above sample doesn't need or use the index i for anything besides getting
the items
from the list, so this more direct version --- where the for loop gets
the items --- might be
preferred:
horsemen = ["war", "famine", "pestilence", "death"]
for h in horsemen:
    print(h)
List length
The function len returns the length of a list, which is equal to the number
of its elements.
If you are going to use an integer index to access the list,
it is a good idea to use this
value as the upper bound of a
loop instead of a constant. That way, if the size of the list
changes, you
won't have to go through the program changing all the loops; they will
work
correctly for any size list:
for i in range(len(horsemen)):
    print(horsemen[i])
Although a list can contain another list, the nested list still counts as a
single element in
its parent list. The length of this list is 4:
4
List membership (Optional topic)
in and not in are Boolean operators that test membership in a sequence. We
used them
previously with strings, but they also work with lists and
other sequences:
True
False
True
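A short sketch of membership tests on a list; the list and the tested words are illustrative:

```python
horsemen = ["war", "famine", "pestilence", "death"]

present = "pestilence" in horsemen        # an element of the list
absent = "debauchery" in horsemen         # not an element
negated = "debauchery" not in horsemen    # the opposite test

print(present, absent, negated)
```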
Using this produces a more elegant version of the nested loop program we previously
used
to count the number of students doing Computer Science
in the section
nested_data:
students = [
counter = 0
if "CompSci" in subjects:
counter += 1
List operations
The + operator concatenates lists:
>>> a = [1, 2, 3]
>>> b = [4, 5, 6]
>>> c = a + b
>>> c
[1, 2, 3, 4, 5, 6]
>>> [0] * 4
[0, 0, 0, 0]
>>> [1, 2, 3] * 3
[1, 2, 3, 1, 2, 3, 1, 2, 3]
The first example repeats [0] four times. The second example repeats the
list [1, 2, 3]
three times.
List slices
The slice operations we saw previously with strings let us work with sublists:
>>> a_list = ["a", "b", "c", "d", "e", "f"]
>>> a_list[1:3]
['b', 'c']
>>> a_list[:4]
['a', 'b', 'c', 'd']
>>> a_list[3:]
['d', 'e', 'f']
>>> a_list[:]
['a', 'b', 'c', 'd', 'e', 'f']
>>> fruit
>>> my_list
>>> a_list
We can also remove elements from a list by assigning an empty list to them:
>>> a_list[1:3] = []
>>> a_list
And we can add elements to a list by squeezing them into an empty slice at the
desired
location:
>>> a_list
>>> a_list
List deletion
Using slices to delete list elements can be error-prone.
Python provides an alternative
that is more readable.
The del statement removes an element from a list:
>>> a
['one', 'three']
>>> a_list
['a', 'f']
As usual, the sublist selected by slice contains all the elements up to, but not including,
the second
index.
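The two forms of del can be sketched as follows, assuming starting lists that lead to the results shown above (the initial contents are an assumption):

```python
a = ["one", "two", "three"]
del a[1]                      # remove the element at index 1

a_list = ["a", "b", "c", "d", "e", "f"]
del a_list[1:5]               # remove indices 1, 2, 3 and 4, but not 5

print(a)
print(a_list)
```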
a = "banana"
b = "banana"
we know that a and b will refer to a string object with the letters
"banana". But we don't
know yet whether they point to the same string object.
There are two possible ways the Python interpreter could arrange its memory:
In one case, a and b refer to two different objects that have the same
value. In the
second case, they refer to the same object.
We can test whether two names refer to the same object using the is
operator:
>>> a is b
True
This tells us that both a and b refer to the same object, and that it
is the second of the
two state snapshots that accurately describes the relationship.
Since strings are immutable, Python optimizes resources by making two names
that
refer to the same string value refer to the same object.
>>> a = [1, 2, 3]
>>> b = [1, 2, 3]
>>> a == b
True
>>> a is b
False
a and b will then have the same value but do not refer to the same object.
Aliasing
Since variables refer to objects, if we assign one variable to another, both
variables refer
to the same object:
>>> a = [1, 2, 3]
>>> b = a
>>> a is b
True
Because the same list has two different names, a and b, we say that it
is aliased.
Changes made with one alias affect the other:
>>> b[0] = 5
>>> a
[5, 2, 3]
Cloning lists
If we want to modify a list and also keep a copy of the original, we need to be
able to
make a copy of the list itself, not just the reference. This process is
sometimes called
cloning, to avoid the ambiguity of the word copy.
>>> a = [1, 2, 3]
>>> b = a[:]
>>> b
[1, 2, 3]
Taking any slice of a creates a new list. In this case the slice happens to
consist of the
whole list. So now the relationship is like this:
Now we are free to make changes to b without worrying that we'll inadvertently be
changing a:
>>> b[0] = 5
>>> a
[1, 2, 3]
BODY
print(friend)
It almost reads like English: For (every) friend in (the list of) friends,
print (the name of
the) friend.
for number in range(20):
    if number % 3 == 0:
        print(number)
The first example prints all the multiples of 3 between 0 and 19. The second
example
expresses enthusiasm for various fruits.
for i in range(len(xs)):
    xs[i] = xs[i]**2
List parameters
Passing a list as an argument actually passes a reference to the list, not a
copy or clone
of the list. So parameter passing creates an alias for you: the caller
has one variable
referencing the list, and the called function has an alias, but there
is only one underlying
list object.
For example, the function below takes a list as an
argument and multiplies
each element in the list by 2:
def double_stuff(a_list):
    """ Overwrite each element in a_list with double its value. """
    for i in range(len(a_list)):
        a_list[i] = 2 * a_list[i]
things = [2, 5, 9]
double_stuff(things)
print(things)
Since the list object is shared by two frames, we drew it between them.
If a function modifies the items of a list parameter, the caller sees the change.
Use the Python visualizer!
We've already mentioned the Python visualizer
at http://www.pythontutor.com/visualize.html.
It
is a very useful tool for building a good
understanding of references, aliases,
assignments,
and passing arguments to
functions. Pay special attention to cases where
you clone
a list or have two separate lists, and
cases where there is only one underlying list,
but more than one variable is aliased to
reference the list.
List methods
The dot operator can also be used to access built-in methods of list objects. We'll
start
with the most useful method for adding something onto the end of an existing list:
>>> mylist = []
>>> mylist.append(5)
>>> mylist.append(27)
>>> mylist.append(3)
>>> mylist.append(12)
>>> mylist
[5, 27, 3, 12]
append is a list method which adds the argument passed to it to the end of
the list. We'll
use it heavily when we're creating new lists.
Continuing with this example, we show
several other list methods:
>>> mylist
2
>>> mylist.extend([5, 9, 5, 11]) # Put whole list onto end of mylist
>>> mylist
6
>>> mylist.reverse()
>>> mylist
>>> mylist.sort()
>>> mylist
>>> mylist
Experiment and play with the list methods shown here, and read their documentation
until
you feel confident that you understand how they work. You are not required to learn
these
functions by heart and are always free to check Python's documentation to
understand
what these functions are doing.
Python also provides built-in functions that can be applied to lists. One example is
the
sorted function.
[3, 5, 1, 27]
>>> mylist
The difference between sorted and sort is that sorted returns a sorted version
of the list,
and keeps the original list unmodified.
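The difference can be seen in a small sketch (the variable names ordered and result are illustrative):

```python
mylist = [3, 5, 1, 27]

ordered = sorted(mylist)      # returns a new, sorted list
untouched = list(mylist)      # snapshot: mylist has not changed so far

result = mylist.sort()        # sorts mylist in place and returns None

print(ordered, untouched, mylist, result)
```

A common mistake is to write mylist = mylist.sort(), which replaces the list by None.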
A pure function does not produce side effects. It communicates with the
calling
program only through parameters, which it does not modify, and a return
value. Here is
double_stuff written as a pure function:
def double_stuff(a_list):
    """ Return a new list which contains doubles of the elements in a_list. """
    new_list = []
    for value in a_list:
        new_elem = 2 * value
        new_list.append(new_elem)
    return new_list
>>> xs = double_stuff(things)
>>> things
[2, 5, 9]
>>> xs
[4, 10, 18]
An early rule we saw for assignment said "first evaluate the right hand side, then
assign
the resulting value to the variable". So it is quite safe to assign the function
result to the
same variable that was passed to the function:
>>> things
Anything that can be done with modifiers can also be done with pure
functions.
In fact, some programming languages only allow pure
functions. There is some
evidence that programs that use pure functions
are faster to develop and less
error-prone than programs that use
modifiers. Nevertheless, modifiers are
convenient at times, and in some
cases, functional programs are less efficient.
loop
append it to result
Let us show another use of this pattern. Assume you already have a function
is_prime(x)
that can test if x is prime. Write a function
to return a list of all prime numbers less than
n:
def primes_lessthan(n):
    result = []
    for i in range(2, n):
        if is_prime(i):
            result.append(i)
    return result
>>> wds
>>> song.split("ai")
>>> s = glue.join(wds)
>>> s
'The;rain;in;Spain...'
The list that you glue together (wds in this example) is not modified. Also, as these
next
examples show, you can use empty glue or multi-character strings as glue:
>>> "".join(wds)
'TheraininSpain...'
>>> xs
["C", "r", "u", "n", "c", "h", "y", " ", "F", "r", "o", "g"]
>>> "".join(xs)
'Crunchy Frog'
def f(n):
    """ Find the first positive integer between 101 and less
        than n that is divisible by 21
    """
    for i in range(101, n):
        if i % 21 == 0:
            return i
test(f(110) == 105)
test(f(1000000000) == 105)
You'll sometimes find the lazy range wrapped in a call to list. This forces
Python to turn
the lazy promise into an actual list:
range(0, 10)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
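A minimal sketch of the difference between the lazy range object and the list built from it:

```python
lazy = range(0, 10)        # a promise: no numbers are produced yet
numbers = list(lazy)       # forces all the values into an actual list

print(lazy)
print(numbers)
```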
def double_stuff(a_list):
    """ Return a new list which contains doubles of the elements in a_list. """
    new_list = []
    for value in a_list:
        new_elem = 2 * value
        new_list.append(new_elem)
    return new_list
While this function is correct, the developers of Python felt that this notation
is longer
than desirable: in Python one often wishes to create lists based on
other lists. For this
reason, a shorter notation was introduced in the language.
Any program written using
this shorter notation can also be written without it; however,
programs are often much shorter
when you use it:
def double_stuff(a_list):
"""
{2* value | value ∈ a_list}
{2* value | value ∈ a_list, value ≥ 3}
"""
{xy | x ∈ {0, 1, 2}, y ∈ {0, 2, 4}, xy ≥ 3}
numbers = [1, 3, 5]
Here, we apply the str function on every number in the numbers list,
and put the result of
this function call in the new list.
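The set-style descriptions above translate quite directly into Python's list comprehension notation. The following is a sketch; the result names are illustrative:

```python
numbers = [1, 3, 5]

strings = [str(n) for n in numbers]                        # apply str to each
doubled = [2 * value for value in numbers]                 # transform each
doubled_large = [2 * value for value in numbers if value >= 3]   # with a filter
products = [x * y for x in [0, 1, 2] for y in [0, 2, 4] if x * y >= 3]

print(strings)
print(doubled)
print(doubled_large)
print(products)
```

Unlike a mathematical set, a list keeps duplicates, so a value that can be produced in two ways appears twice in products.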
Glossary
aliases
Multiple variables that contain references to the same object.
clone
To create a new object that has the same value as an
existing object.
Copying a reference to an object creates an
alias but doesn't clone the
object.
delimiter
A character or string used to indicate where a string should
be split.
element
One of the values in a list (or other sequence). The bracket
operator
selects elements of a list. Also called item.
immutable data value
A data value which cannot be modified. Assignments to
elements or
slices (sub-parts) of immutable values cause a
runtime error.
index
An integer value that indicates the position of an item in a list.
Indexes start from 0.
item
See element.
list
A collection of values, each in a fixed position within the list.
Like the other types str, int, float, etc., there is also a
list type-converter function that tries to turn whatever argument
you
give it into a list.
list traversal
The sequential accessing of each element in a list.
modifier
A function which changes its arguments inside the function
body. Only
mutable types can be changed by modifiers.
mutable data value
A data value which can be modified. The types of all mutable
values
are compound types. Lists and dictionaries are
mutable; strings
and tuples are not.
nested list
A list that is an element of another list.
object
A thing to which a variable can refer.
pattern
A sequence of statements, or a style of coding something
that has
general applicability in a number of different
situations. Part of
becoming a mature Computer Scientist is
to learn and establish the
patterns and algorithms that form
your toolkit. Patterns often
correspond to your "mental
chunking".
promise
An object that promises to do some work or deliver some
values if
they're eventually needed, but it lazily puts off doing
the work immediately.
Calling range produces a promise.
pure function
A function which has no side effects. Pure functions only
make changes
to the calling program through their return
values.
sequence
Any of the data types that consist of an ordered collection of
elements, with
each element identified by an index.
side effect
A change in the state of a program made by calling a
function. Side
effects can only be produced by modifiers.
step size
The interval between successive elements of a linear
sequence. The
third (and optional) argument to the range
function is called the
step size. If not specified, it defaults to
1.
References
[ThinkCS] How To Think Like a Computer Scientist --- Learning with Python 3
Nested Datastructures
Source: this section combines elements of Chapters 7 and 11 of [ThinkCS].
Nested lists
A nested list is a list that appears as an element in another list. In this
list, the element
with index 3 is a nested list:
>>> print(nested[3])
[10, 20]
To extract an element from the nested list, we can proceed in two steps:
>>> elem[0]
10
20
Bracket operators evaluate from left to right, so this expression gets the
3'th element of
nested and extracts the 1'th element from it.
Matrices
Nested lists are often used to represent matrices. For example, consider this matrix:
>>> mx[1]
[4, 5, 6]
Or we can extract a single element from the matrix using the double-index form:
>>> mx[1][2]
6
The first index selects the row, and the second index selects the column.
Although this
way of representing matrices is common, it is not the only
possibility. A small variation is
to use a list of columns instead of a list of
rows. Later we will see a more radical
alternative using a dictionary.
def matrix(n):
    m = []
    for j in range(n):
        l = []
        for i in range(n):
            l.append(0)
        m.append(l)
    return m
In this code, m is the matrix we are going to return. Within this list,
we are creating n
nested lists. Each of these nested lists is a list
of n 0s.
def matrix_incorrect(n):
    l = []
    for i in range(n):
        l.append(0)
    m = []
    for j in range(n):
        m.append(l)
    return m
At first sight, it may seem that this code is correct. If you execute
print(matrix_incorrect(3)) you will indeed see this output:
[[0, 0, 0], [0, 0, 0], [0, 0, 0]]
However, the data structure created by this function has undesirable properties.
If we
execute the following code:
m = matrix_incorrect(3)
m[0][0] = 1
print(m)
we get the following output:
[[1, 0, 0], [1, 0, 0], [1, 0, 0]]
This is most likely not the output that you were looking for; indeed, any modification
of
one row will also lead to a modification of the other rows.
The explanation for this can be found in how Python deals with objects and references.
The append method takes as argument a reference to an object, and will add a reference
to this object to the list. In our code, we have first created a list for which l is
a reference.
Subsequently, we add this reference three times to the list m. As a result,
all three
elements in the list m point towards the same underlying list. As
m[0], m[1], m[2] all point
towards the same list, any change to m[0] will
also be visible when printing m[1] and
m[2].
Our original code did not have this problem, as we created a new nested list n times.
def matrix_incorrect(n):
    l = []
    for i in range(n):
        l.append(0)
    m = []
    for j in range(n):
        m.append(list(l))
    return m
The main difference here is that we use the list(l) construction. This construction
will first
create a copy of l: it will create a new list object that contains the same
elements as l;
the reference to this new object is added to the list m.
As we create n copies of the
original list l, all these nested lists are now independent
of each other.
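The difference between adding the same reference several times and adding copies can be checked with the is operator. This sketch uses hypothetical names (row, shared, copies) rather than the functions above:

```python
row = [0, 0, 0]

shared = []
for j in range(3):
    shared.append(row)            # the same list object, three times

copies = []
for j in range(3):
    copies.append(list(row))      # a fresh copy each time

shared[0][0] = 1                  # changes the one underlying list
copies[0][0] = 1                  # changes only the first copy

print(shared)
print(copies)
print(shared[0] is shared[1], copies[0] is copies[1])
```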
Glossary
nested list
A list that is an element of another list.
References
[ThinkCS] How To Think Like a Computer Scientist --- Learning with Python 3
Tuples
Source: this section is heavily based on Chapter 9 of [ThinkCS].
Tuples are useful for representing what other languages often call records ---
some related information that belongs together, like your student record.
>>> julia = ("Julia", "Roberts", 1967, "Eat Pray Love", 2010, "Actress", "Atlanta, Georgia")
There is no description of what each of these fields means, but we can guess. A tuple
lets us "chunk" together related information and use it as a single thing.
Tuples are very similar to lists and support the same sequence operations as lists. The
index operator
selects an element from a tuple:
>>> julia[2]
1967
However, tuples are not the same as lists. Next to the different syntax (parentheses
instead of square brackets), an important difference is that tuples are immutable.
If we try to use item assignment to modify one of the elements of the
tuple, we get an error:
>>> julia[0] = "X"
TypeError: 'tuple' object does not support item assignment
>>> julia
("Julia", "Roberts", 1967, "Eat Pray Love", 2010, "Actress", "Atlanta, Georgia")
To create a tuple with a single element (but you're probably not likely
to do that too
often), we have to include the final comma, because without
the final comma, Python
treats the (5) below as an integer in parentheses:
>>> tup = (5,)
>>> type(tup)
<class 'tuple'>
>>> x = (5)
>>> type(x)
<class 'int'>
Given the similarity between Python's tuples and lists, a good question is why tuples
exist at
all. Why doesn't Python only have lists?
There are two good reasons for the
existence of tuples.
One is the performance of the code. Tuples are more efficient than lists: they consume
less memory and the run time required to create them is smaller.
Why is this? We skimmed over this till now, but allocating memory in a computer
is not a
trivial operation; essentially, each time a program requires more memory, the operating
system will have to search for a piece of memory that is still unused. This also applies to
lists.
If the operating system had to look for a new piece of memory each time an
element is added to a list,
adding elements to a list would be a rather slow operation. To
avoid this, Python
tries to be intelligent: it will anticipate the addition of elements in a list
by reserving more
memory than necessary at the moment of creation. The benefit is that
adding elements
is now faster. The side effect is however that lists will consume
more
memory than necessary. Python's tuples avoid this. As a tuple will never change,
we
know its memory consumption will never change. Hence, Python does not need to
anticipate future additions to the tuple.
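The memory argument can be explored with sys.getsizeof, which reports the size in bytes of the container object itself. This is only a sketch under the assumption that the code runs on the standard CPython interpreter; the exact numbers differ between Python versions, so none are shown here.

```python
import sys

values = list(range(10))
as_list = list(values)
as_tuple = tuple(values)

# Size of the container objects themselves, not of the integers they refer to.
print(sys.getsizeof(as_list), sys.getsizeof(as_tuple))

# A list that grows through append typically reserves room for future elements.
grown = []
for v in values:
    grown.append(v)
print(sys.getsizeof(grown))
```

On CPython the tuple is reported as smaller than both lists holding the same ten elements.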
The second reason for having tuples relates to the readability of code written using
tuples.
Consider this piece of code, in which julia is a tuple whose first element is "Julia":
do_something ( julia )
print ( julia[0] )
For this piece of code, we can be sure that whatever the functionality of the function
do_something is,
at the end the string "Julia" will be printed. This makes it easy to
understand what the last line of
code is doing.
Now suppose that julia is a list instead of a tuple:
do_something ( julia )
print ( julia[0] )
In this code, we can no longer be sure about what julia[0] will print. Consider this
implementation of
the do_something function:
def do_something ( l ):
    l[0] = "Hugh"
This function will change the list julia, and as a result the code will print "Hugh". Hence,
to understand what the line print ( julia[0] ) does, we will need to check the
documentation or source
code of the function do_something. For tuples, this is not
necessary: by using tuples, the programmer
can communicate to another reader of the
code that this data is not supposed to be changed.
Indeed, any function that you will
apply on this tuple, and that would try to change it, will
yield an error message, hence
making it easier to debug the code as well.
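The contrast between the two cases can be tried out directly. The sketch below uses shortened, hypothetical versions of the julia data (julia_list and julia_tuple) and the do_something function from above.

```python
def do_something(l):
    l[0] = "Hugh"   # attempts item assignment on its argument

julia_list = ["Julia", "Roberts", 1967]
do_something(julia_list)
print(julia_list[0])        # Hugh: the list was changed by the function

julia_tuple = ("Julia", "Roberts", 1967)
try:
    do_something(julia_tuple)
except TypeError as error:
    print("TypeError:", error)
print(julia_tuple[0])       # Julia: the tuple is unchanged
```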
Tuple assignments
Python has a very powerful tuple assignment feature that allows a tuple of variables
on
the left of an assignment to be assigned values from a tuple
on the right of the
assignment. (We already saw this used for pairs, but it generalizes.)
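As a sketch, assuming julia is the seven-element tuple shown earlier in this section, such an assignment could look as follows. The variable names on the left are illustrative and not taken from the original text.

```python
julia = ("Julia", "Roberts", 1967, "Eat Pray Love", 2010, "Actress", "Atlanta, Georgia")

# Seven variables on the left, seven elements in the tuple on the right.
(name, surname, birth_year, movie, movie_year, profession, birth_place) = julia

print(name, surname, birth_year)
```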
This does the equivalent of seven assignment statements, all on one easy line.
One
requirement is that the number of variables on the left must match the number
of
elements in the tuple.
>>> b = ("Bob", 19, "CS")
>>> (name, age, studies) = b
>>> name
'Bob'
>>> age
19
>>> studies
'CS'
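Once in a while, it is useful to swap the values of two variables. With
conventional assignment statements, we have to use a temporary variable:
temp = a
a = b
b = temp
Tuple assignment solves this problem neatly:
a, b = b, a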
The left side is a tuple of variables; the right side is a tuple of values.
Each value is
assigned to its respective variable. All the expressions on the
right side are evaluated
before any of the assignments. This feature makes
tuple assignment quite versatile.
Naturally, the number of variables on the left and the number of values on the
right have
to be the same:
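A small sketch of what happens when the numbers do not match: Python raises a ValueError, which is caught here only so that the message can be printed.

```python
message = None
try:
    (a, b, c, d) = (1, 2, 3)   # four variables, three values
except ValueError as error:
    message = str(error)

print("ValueError:", message)
```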
import math

def f(r):
    """ Return (circumference, area) of a circle of radius r """
    c = 2 * math.pi * r
    a = math.pi * r * r
    return (c, a)
students = [
Tuple items can themselves be other tuples. For example, we could improve
the
information about our movie stars to hold the full date of birth rather
than just the year,
and we could have a list of some of her movies and dates that they
were made, and so
on:
julia_more_info = ( ("Julia", "Roberts"), (8, "October", 1967),
[ ("Duplicity", 2009),
Notice in this case that the tuple has just five elements --- but each of those in turn
can
be another tuple, a list, a string, or any other kind of Python value.
This property is
known as being heterogeneous, meaning that it can
be composed of elements of
different types.
xs = [1, 2, 3, 4, 5]
for i in range(len(xs)):
    xs[i] = xs[i]**2
While correct, this type of list traversal is so common, that Python provides a nicer way
to implement it:
xs = [1, 2, 3, 4, 5]
for (i, val) in enumerate(xs):
    xs[i] = val**2
This code exploits lists-of-tuples: enumerate generates pairs of (index, value) during
the list traversal. Try this next example to see more clearly how enumerate
works:
for (i, v) in enumerate(["banana", "apple", "pear", "lemon"]):
    print(i, v)
The output is:
0 banana
1 apple
2 pear
3 lemon
Another common type of program one may wish to write is the following:
xs = [1, 2, 3, 4, 5]
ys = [3, 4, 5, 6, 7]
for i in range(len(xs)):
    print (xs[i],ys[i])
Using enumerate, this can be written as:
xs = [1, 2, 3, 4, 5]
ys = [3, 4, 5, 6, 7]
for i, val in enumerate(xs):
    print (val,ys[i])
However, most programmers would not consider this to be a very clean solution.
Python
provides the zip function to write this code more elegantly:
xs = [1, 2, 3, 4, 5]
ys = [3, 4, 5, 6, 7]
for x, y in zip(xs,ys):
    print (x,y)
Like a zipper, the zip function combines elements of two given lists pairwise,
and provides the tuples that represent pairs from the two given lists.
In combination with the enumerate function, one can now write code like the following:
xs = [1, 2, 3, 4, 5]
ys = [3, 4, 5, 6, 7]
for i, (x, y) in enumerate(zip(xs,ys)):
    xs[i] = x**2
    ys[i] = y**2
Observe that in this code, the zip function generates pairs of elements
from the xs and ys
lists. The enumerate function subsequently
adds the indexes of the pairs in this list.
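To inspect what zip and enumerate produce, their results can be turned into lists. This is a small illustrative sketch; the names pairs and indexed_pairs are not part of the original text.

```python
xs = [1, 2, 3, 4, 5]
ys = [3, 4, 5, 6, 7]

pairs = list(zip(xs, ys))
print(pairs)              # [(1, 3), (2, 4), (3, 5), (4, 6), (5, 7)]

indexed_pairs = list(enumerate(zip(xs, ys)))
print(indexed_pairs[0])   # (0, (1, 3))
```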
Glossary
data structure
An organization of data for the purpose of making it easier to
use.
immutable data value
A data value which cannot be modified. Assignments to
elements or
slices (sub-parts) of immutable values cause a
runtime error.
mutable data value
A data value which can be modified. The types of all mutable
values
are compound types. Lists and dictionaries are
mutable; strings
and tuples are not.
tuple
An immutable data value that contains related elements.
Tuples are used
to group together related data, such as a
person's name, their age,
and their gender.
tuple assignment
An assignment to all of the elements in a tuple using a single
assignment statement. Tuple assignment occurs
simultaneously rather than
in sequence, making it useful for
swapping values.
heterogeneous list
A list that contains elements of different types.
generators
Functions that will generate lists
References
[ThinkCS] How To Think Like a Computer Scientist --- Learning with Python 3
The main purpose of coding is to create an executable program. Code should however
not only
execute; it should also satisfy a number of additional requirements:
You wish to add a feature to a program that you wrote one year ago.
You write a program that a collaborator needs to modify later on.
You are contacted about a bug in a program that you wrote a year ago.
You wish to add a feature to a library written by somebody else.
You are asked to maintain source code that was written by somebody else.
In all these cases, it is not only important that code executes; it is also important that
your code is of sufficiently good quality to support these other requirements. Writing
code of good quality is not easy. Some programmers that understand Python very well,
will still write code that is very hard to read. Consider this fragment of bad code:
f=type((lambda:(lambda:None for n in range(len(((((),(((),())))))))))().next
())
n=f(u(int(wwpd[4][1]),int(wwpd[7][1]),int(wwpd[6][1]),int(wwpd[9][1]),w
wpd[2][1],
(None,wwpd[10][1],wwpd[13][1],wwpd[11][1],wwpd[15][1]),(wwpd[20]
[1],wwpd[21][1]),
(wwpd[16][1],wwpd[17][1],wwpd[18][1],wwpd[11][1],wwpd[19][1]),ww
pd[22][1],wwpd[25][1],int(wwpd[4][1]),wwpd[0][1]),
{wwpd[27][1]:__builtins__,wwpd[28][1]:wwpd[29][1]})
FIGHT = f(u(int(wwpd[4][1]),int(wwpd[4][1]),int(wwpd[5][1]),int(wwpd[9]
[1]),wwpd[3][1],
{wwpd[14][1]:c,wwpd[24][1]:urlopen,wwpd[27][1]:__builtins__,wwpd
[28][1]:wwpd[29][1]})
FIGHT(msg)
Maintaining, modifying or building on this code is very difficult, even for the most
experienced programmer. Hence, it is important that while you learn how to program,
you also pay attention to how to write good code. We don't want you to create code such
as the one in this example!
In this chapter, we will introduce you to the practice of writing specifications for the
functions used in a program,
followed by tests that check whether the source code
meets the specifications.
Specifications in Python
Consider the following code:
def f ( n ):
    for i in range(2,n):
        if n % i == 0:
            return False
    return True
We could try to read this code in order to understand it. However, this would require
quite some effort from every programmer using this function. Better is to give the
function an interpretable name and add documentation; as we have seen earlier in this
syllabus, we can do so by adding a block of comments after the function definition:
def prime ( n ):
    """ Returns True if n is a prime number. """
    for i in range(2,n):
        if n % i == 0:
            return False
    return True
This code is already easier to understand. However, its documentation is still not very
precise. For instance, the user of the code could wonder whether it is possible to
execute this code for n=0, n=-1, or n=0.5. In many projects it is desirable to make the
specification of each parameter as precise as possible. One way of doing this is as
follows:
def prime ( n ):
    """
    pre: n is a positive integer
    post: returns True if n is a prime number, False otherwise
    """
    for i in range(2,n):
        if n % i == 0:
            return False
    return True
In this case we have used the docstring to make the specification more precise.
In most parts of this course, we will use preconditions and postconditions to specify the
functionality of a function in more detail. Specifications of preconditions and
postconditions are precise, but not too long either.
In practice, larger software projects impose even more structure on the content of
docstrings. One approach that is often used is that of Google Style docstrings, which
were originally used by Google in its Python projects, but which are now also used in
many other projects. The following code illustrates a Google Style docstring:
def prime ( n ):
    """ Determines whether a number is prime.
    Args:
        n: a positive integer
    Returns:
        True if n is a prime number, False otherwise
    """
    for i in range(2,n):
        if n % i == 0:
            return False
    return True
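In general, a Google Style docstring has the following structure:
def function_name ( parameters ):
    """ Summary of the function.
    Args:
        parameter: description of the parameter
    Returns:
        description of the return value
    """
    function code
The following function provides another example:
def zipper ( l1, l2 ):
    """ Zips two lists, such that elements of l1 and l2 are interleaved.
    This function zips two given lists l1 and l2 into a new list
    [l1[0],l2[0],l1[1],l2[1],...,l1[n],l2[n]].
    Args:
        l1: a list
        l2: a list of the same length as l1
    Returns:
        a new list in which the elements of l1 and l2 are interleaved
    """
    new_list = []
    for i in range(len(l1)):
        new_list.append ( l1[i] )
        new_list.append ( l2[i] )
    return new_list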
Note that the docstring in this example is very verbose. In practice, one will not
encounter such long docstrings for many short functions. Still, many programmers
consider such long documentation the best approach. In this course we will from time to
time ask you explicitly to provide such long Google Style docstrings; where this is not
indicated explicitly, you may write shorter docstrings, such as using pre- and
postconditions. However, you are always expected to provide an informative docstring.
Tests in Python
Now that we know how to write a proper specification for a function in Python, the next
question is how we ensure that code satisfies the requirements specified in the
docstring. A very common approach (that we will also use during the exams) is that of
running unit tests.
A unit test is a piece of code that tests whether a function operates
as intended. This is an example of one unit test, for the prime function provided above:
if prime ( 10 ) == True:
    print ( "10 should not be prime" )
This code runs the prime function and evaluates its return value: if the function returns
True it has made an error, and we display an error message.
Note that there are many implementations of the prime function that will pass this test,
while they are incorrect. For instance, this code will not give an error for the test case:
def prime ( n ):
    return False
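Clearly, one test does not suffice to prove that an implementation of prime is correct.
We can add more tests:
if prime ( 8 ) == True:
    print ( "8 should not be prime" )
if prime ( 5 ) == False:
    print ( "5 should be prime" )
if prime ( 3 ) == False:
    print ( "3 should be prime" )
This raises the question of how many tests are needed to be sure that an implementation
is correct.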
To answer this question, you will need to know more about theoretical computer science;
you will need to study questions of computability. This is the subject of another course
and will not be discussed in detail here. However, important to know is that in general, it
is impossible to determine a finite set of tests to determine the correctness of a function
that accepts a large number of different inputs. Hence, while tests can provide strong
evidence that a function is implemented correctly, they are never sufficient evidence. In
practice it often happens that you start with a small set of tests, while you discover later
on that the code is still incorrect. In this case, the proper approach is to add test cases.
Writing tests is so common in Python, that Python provides special notation to simplify
their specification. A common approach is the one offered by the unittest module;
however, you will need to know more about Python before you will be able to use this
module. For most of this course, we will therefore rely on a simpler approach that is
provided by the assert statement. Using this statement, the earlier set of tests can be
written more compactly:
assert prime ( 10 ) == False, "10 should not be prime"
Or even shorter:
If we execute this code for an incorrect implementation of prime, Python will give a
message such as this one:
The execution of the code will stop immediately at the test case that fails. Hence, for any
given assert statement, Python will test whether the condition provided is satisfied; if not,
it will print the message provided and will stop the execution immediately.
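The behaviour of a failing assert can be observed with a deliberately incorrect implementation. In this sketch the AssertionError is caught only so that the message can be shown without ending the program; the messages for 5 and 8 are illustrative additions.

```python
def prime(n):
    # Deliberately incorrect implementation, used only to trigger a failing test.
    return True

message = None
try:
    assert prime(5) == True, "5 should be prime"
    assert prime(10) == False, "10 should not be prime"
    assert prime(8) == False, "8 should not be prime"   # never reached
except AssertionError as error:
    message = str(error)

print("AssertionError:", message)   # AssertionError: 10 should not be prime
```

Only the first failing assert is reported; the test for 8 is never executed.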
For our zipper function we can now write test cases in a similar fashion:
assert zipper ( [1,2], [3,4] ) == [1,3,2,4], "[1,2], [3,4] not zipped correctly"
Note that these cases only test lists of lengths smaller than 3; it could be good to add
some larger test cases as well to reduce the chances that an incorrect implementation
still passes all the tests.
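Consider the following incorrect implementation, followed by a test:
def prime ( n ):
    """ Determines whether a number is prime.
    Args:
        n: a positive integer
    Returns:
        True if n is a prime number, False otherwise
    """
    return True

assert prime ( 10 ) == False, "10 should not be prime"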
If we execute this code, we will get one error message before the code terminates. For a
correct implementation of the prime function, the code will execute without printing a
message.
Testing becomes slightly more complex if we wish to separate a program and its tests in
separate files. Suppose we have this program, stored in a file prime.py:
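def prime ( n ):
    """ Determines whether a number is prime.
    Args:
        n: a positive integer
    Returns:
        True if n is a prime number, False otherwise
    """
    return True

# print the result for a few example values
for i in range ( 1, 10 ):
    print ( i, prime ( i ) )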
How can we test the prime function in a separate program? We could consider writing a
second program, as follows:
from prime import *
However, this code has undesirable behavior: when executing the import statement, it
will also execute the print statements in the prime.py program. The reason for this is that
when executing the import statement, Python will execute all code in the prime.py file,
including the print statements. How can we avoid that the print statements are
executed? The standard solution in Python for this is to modify the prime.py file as
follows:
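def prime ( n ):
    """ Determines whether a number is prime.
    Args:
        n: a positive integer
    Returns:
        True if n is a prime number, False otherwise
    """
    return True

if __name__ == "__main__":
    # print the result for a few example values
    for i in range ( 1, 10 ):
        print ( i, prime ( i ) )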
In this code, the code below if __name__ == "__main__": is only executed when the file
is run directly, and not when it is imported from another file. This allows a second
program to test the functions in the program without executing the print statements in the file.
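Consider again our prime function:
def prime ( n ):
    """ Determines whether a number is prime.
    Args:
        n: a positive integer
    Returns:
        True if n is a prime number, False otherwise
    """
    for i in range(2,n):
        if n % i == 0:
            return False
    return True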
Note that in the specification we indicated that this function should only be executed on
parameters that represent positive integers.
While a perfect programmer would hence
not use this function in any other context, no programmer is perfect. The incorrect use of
a function can sometimes cause bugs that are very hard to track down. To help a
programmer, it can be good to check that the function is used correctly. We can use
assert here as well:
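def prime ( n ):
    """ Determines whether a number is prime.
    Args:
        n: a positive integer
    Returns:
        True if n is a prime number, False otherwise
    """
    assert type ( n ) == int and n >= 1, "n must be a positive integer"
    for i in range(2,n):
        if n % i == 0:
            return False
    return True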
With this added line, every time a programmer uses the function with an argument of an
incorrect type, or with a value smaller than 1, the code will stop and give an error.
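The effect of such a check can be tried on a few arguments. The sketch below assumes a precondition of the form described above; the exact wording of the message is our own choice. The AssertionError is caught only to collect the results in a list.

```python
def prime(n):
    # Precondition check; the message text is an assumption.
    assert type(n) == int and n >= 1, "n must be a positive integer"
    for i in range(2, n):
        if n % i == 0:
            return False
    return True

results = []
for argument in [7, 0, 0.5, "7"]:
    try:
        results.append(prime(argument))
    except AssertionError as error:
        results.append("AssertionError: " + str(error))

print(results)
```

Only the first argument satisfies the precondition; the three others are rejected before the loop in prime is reached.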
In this course, we will not require you to add asserts to all your functions; however, from
time to time we will explicitly ask you to do so.
def zipper ( l1, l2 ):
    """ Zips two lists, such that elements of l1 and l2 are interleaved.
    This function zips two given lists l1 and l2 into a new list
    [l1[0],l2[0],l1[1],l2[1],...,l1[n],l2[n]].
    Args:
        l1: a list
        l2: a list of the same length as l1
    Returns:
        a new list in which the elements of l1 and l2 are interleaved
    """
    assert type ( l1 ) == list and type ( l2 ) == list and len ( l1 ) == len ( l2 ), "l1 and l2 must be two lists of equal length"
    new_list = []
    for i in range(len(l1)):
        new_list.append ( l1[i] )
        new_list.append ( l2[i] )
    return new_list
Note that the asserts that are added in the code should correspond closely to the
information provided in the docstring.
Other Considerations
At this point you may have the impression that adding specifications, tests and asserts is
sufficient to write code of good quality. Certainly it helps. However, it is not sufficient.
During this course and during your study you will encounter additional approaches for
writing code of good quality. We wish to mention a number of recommendations here:
Make sure that all functions and variables have names that are easy to
understand.
Use a proper layout for your code, including white spaces. The gold standard for
this is the so-called PEP 8 [Pep8] specification. It is highly recommended that
your code conforms to PEP 8 standards, so we encourage you to read it.
Make sure that your functions are not too long, and that each function has a
clearly defined task.
Avoid copy-pasting code: if you need to copy a piece of code, consider whether it
would make sense to put that piece of code in a separate function.
References
[Pep8] PEP 8 -- Style Guide for Python Code
Search Algorithms
Source: this section is not based on external sources.
Linear search
As we have seen in the previous section, lists can be used to store collections of
data of
arbitrary length. Often one is faced with the problem of finding
information in a list.
Consider the following example. We are given a list
of computers in a large IT
department:
computer_names = ["apple", "pear", "cherry", "banana", "mango", "grape",
"peach"]
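To check whether a computer with a given name exists, we can use the in operator:
if "pear" in computer_names:
    print ( "pear is in the list" )
else:
    print ( "pear is not in the list" )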
Easy, right?
How can a computer find a name in a list? The simplest approach is the following:
def linear_search ( name, list_of_names ):
    for list_name in list_of_names:
        if name == list_name:
            return True
    return False
Using this function, instead of
if name in list_of_names:
we can write
if linear_search(name, list_of_names):
What are the drawbacks of the linear_search function (as well as the in operator)?
Let
us execute our function on some examples to understand this.
Let us first consider the
example
if linear_search("apple", computer_names):
In this example, our algorithm is very fast. The first name that we will take from the list
is
apple. We will find out that this name is equal to the name we are looking for.
The linear
search function will immediately return the value True.
The situation is very different if we search for a name that does not occur in the list. Our algorithm will first look at apple;
as we are
not looking for apples, it will continue with pear, then cherry, ... and so on,
till we arrive at
the end of the list. The algorithm will then return False.
Hence, in the worst case, our algorithm will need to look at all elements in the list.
If our
list has only 7 elements, this is not problematic.
However, suppose that we would use this algorithm to search among all names of users
on
Facebook (2.3 billion in total). In that case the algorithm would take very long.
If we
would be looking for a name that is not present on Facebook, we
would have to retrieve
all 2.3 billion names from the list before we can answer
True or False!
As in the worst case we would need to look at every element in the list one
after the
other, this algorithm is known as a linear search algorithm.
The number of elements
that we look at in the worst case is a linear function
of the number of elements in the list.
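The best and worst case can be made concrete by counting comparisons. The function linear_search_count below is a hypothetical variant of linear_search, written only for this illustration, that also returns the number of elements it looked at.

```python
def linear_search_count(name, list_of_names):
    """Return (found, comparisons) for a linear search of name."""
    comparisons = 0
    for list_name in list_of_names:
        comparisons += 1
        if name == list_name:
            return True, comparisons
    return False, comparisons

computer_names = ["apple", "pear", "cherry", "banana", "mango", "grape",
                  "peach"]
print(linear_search_count("apple", computer_names))   # (True, 1)
print(linear_search_count("orange", computer_names))  # (False, 7)
```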
This raises the question whether we can do better. Fortunately, in some cases
we can.
Binary search
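Assume that we would have stored our computer names in a sorted order:
sorted_computer_names = ["apple", "banana", "cherry", "grape", "mango", "peach", "pear"]
Note that we can also obtain such a sorted list using the sorted function:
sorted_computer_names = sorted ( computer_names )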
On these sorted lists, we can use an approach that is inspired by how we search
for
words in a dictionary: we start with the middle page of the dictionary, and based on
the
words at that page, decide where we continue looking in the dictionary.
The following
code in Python shows this approach:
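def binary_search ( name, list_of_names ):
    first = 0
    last = len(list_of_names)-1
    found = False
    while first <= last and not found:
        middle = (first + last)//2
        if list_of_names[middle] == name:
            found = True
        else:
            if name < list_of_names[middle]:
                last = middle-1
            else:
                first = middle+1
    return found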
This code is a little bit more complex than the code we have seen till now!
Let's start with a simple case to understand how this code works.
Consider
binary_search("grape",sorted_computer_names). Before the start
of the while loop, the value
for last is 6. As 0<6, we
calculate the middle next. The outcome of this is 3. This yields
the following
situation:
+-------+--------+--------+-------+-------+-------+------+
| apple | banana | cherry | grape | mango | peach | pear |
+-------+--------+--------+-------+-------+-------+------+
    0       1        2        3       4       5       6
first = 0, middle = 3, last = 6
We retrieve the value at position 3 of sorted_computer_names next, which happens
to be equal to
grape. At this moment, the code stops and returns True.
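Now consider binary_search("orange",sorted_computer_names). We start in the same
situation:
+-------+--------+--------+-------+-------+-------+------+
| apple | banana | cherry | grape | mango | peach | pear |
+-------+--------+--------+-------+-------+-------+------+
    0       1        2        3       4       5       6
first = 0, middle = 3, last = 6
Here, we
determine that "orange" >= "grape". We set first=4, giving the following
situation:
+-------+--------+--------+-------+-------+-------+------+
| apple | banana | cherry | grape | mango | peach | pear |
+-------+--------+--------+-------+-------+-------+------+
    0       1        2        3       4       5       6
first = 4, last = 6
Basically, the program has decided at this moment that if the word orange is
present in
the sorted list, it must be at position 4 or higher.
As first<=last, we continue and calculate the new middle, which is 5:
+-------+--------+--------+-------+-------+-------+------+
| apple | banana | cherry | grape | mango | peach | pear |
+-------+--------+--------+-------+-------+-------+------+
    0       1        2        3       4       5       6
first = 4, middle = 5, last = 6
As "orange" < "peach", we set last=4:
+-------+--------+--------+-------+-------+-------+------+
| apple | banana | cherry | grape | mango | peach | pear |
+-------+--------+--------+-------+-------+-------+------+
    0       1        2        3       4       5       6
first = 4, last = 4
As first<=last, we continue:
+-------+--------+--------+-------+-------+-------+------+
| apple | banana | cherry | grape | mango | peach | pear |
+-------+--------+--------+-------+-------+-------+------+
    0       1        2        3       4       5       6
first = 4, middle = 4, last = 4
As "orange" >= "mango", we set first=5:
+-------+--------+--------+-------+-------+-------+------+
| apple | banana | cherry | grape | mango | peach | pear |
+-------+--------+--------+-------+-------+-------+------+
    0       1        2        3       4       5       6
last = 4, middle = 4, first = 5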
In words, the algorithm believes now that if orange is in the list, it must be after
mango, at
position 5 or higher. As at this moment last<first, the
algorithm stops and returns False.
It is instructive to consider the number of elements in the list that the algorithm
compared with orange:
grape
peach
mango
One can wonder how many elements would be considered in the worst case
for a sorted
list
of any other length. To understand how we can generalize our results
to other lists,
let us first consider a list of 15 elements:
+---+---+---+---+---+---+---+---+---+---+---+----+----+----+----+----+
| | | | | | | | | | | | | | | | |
+---+---+---+---+---+---+---+---+---+---+---+----+----+----+----+----+
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
What would our algorithm do in this case? There are three cases:
We stop, as the element we are looking for is in the middle. Clearly, this is not the
worst case.
We move last to position 7; in this case we are effectively continuing the algorithm
on a part of the list having 7 elements.
We move first to position 9; in this case we are effectively continuing the
algorithm on a part of the list having 7 elements as well.
In the last two cases, we know that the worst case is that we are looking at 3 elements.
In total, we
hence will look at 4 elements in the worst case.
We can continue this argument. On a list of 31 elements, the worst case number of
elements considered will be 5.
On a list of 63 elements, it will be 6:
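+--------------------+---------------------------------------+
| Number of elements | Elements considered in the worst case |
+--------------------+---------------------------------------+
| 7                  | 3                                     |
| 15                 | 4                                     |
| 31                 | 5                                     |
| 63                 | 6                                     |
| 127                | 7                                     |
| 255                | 8                                     |
| ...                | ...                                   |
| 4294967295         | 32                                    |
| 2^k - 1            | k                                     |
+--------------------+---------------------------------------+
In general, on a sorted list of 2^k - 1 elements, binary search considers at most k
elements. This efficiency is the result of two aspects: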
The organisation of data: we did not put the elements in the list in an arbitrary
order; the elements had to be sorted.
The algorithm operating on the data: we tuned our algorithm such that it worked
much better for the chosen organisation of the data.
These two aspects are the topics studied in courses on algorithms and data structures.
A good understanding of algorithms and data structures can in practice help a lot
to
develop programs that work efficiently. You will be able to study these
topics in more
detail in later courses.
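The worst-case counts discussed in this section can be checked with a small experiment. The function binary_search_count is a hypothetical variant of binary search, written for this illustration, that returns the number of elements compared with the name we look for.

```python
def binary_search_count(name, list_of_names):
    """Return the number of elements compared with name."""
    first = 0
    last = len(list_of_names) - 1
    comparisons = 0
    while first <= last:
        middle = (first + last) // 2
        comparisons += 1
        if list_of_names[middle] == name:
            return comparisons
        if name < list_of_names[middle]:
            last = middle - 1
        else:
            first = middle + 1
    return comparisons

for k in [3, 4, 5, 6]:
    n = 2**k - 1
    numbers = list(range(n))
    # Searching for a value larger than all elements forces the worst case.
    print(n, binary_search_count(n, numbers))
```

For the lists of 7, 15, 31 and 63 elements this prints the counts 3, 4, 5 and 6.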
Binary Search on Complex Data structures
The code that we saw in the previous section is useful if we want to test that a
given
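element occurs in a sorted list. We can use this to write programs such as
if binary_search ( "pear", sorted_computer_names ):
    print ( "pear is in the list" )
else:
    print ( "pear is not in the list" )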
In other words, programs in which we only wish to test the presence of a given
element.
Often, however, one does not only want to check that a given computer
exists, but one
also wants to retrieve information associated with the
computer, such as its operating
system.
("pear", "Linux")]
Hence, our list now consists of tuples, where the first part of the tuple stores
the
computer name and the second part stores the operating system.
We cannot use our existing binary_search to look for the data associated with
a given
name. In our binary_search algorithm, we perform a test
like
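if list_of_names[middle] == name:
This test no longer works, as the elements of the list are now tuples instead of names.
We can adapt the function as follows:
def binary_search ( name, list_of_names ):
    first = 0
    last = len(list_of_names)-1
    found = False
    while first <= last and not found:
        middle = (first + last)//2
        middle_name, middle_info = list_of_names[middle]
        if middle_name == name:
            found = True
        else:
            if name < middle_name:
                last = middle-1
            else:
                first = middle+1
    if found:
        return middle_info
    else:
        return None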
Observe that in this code, we assume that our list now contains tuples,
each of which we
can unpack. If the element we are looking for exists,
we return the associated value;
otherwise, we made the choice to
return the special value None. We can now write code
such as:
else:
print ( "Your machine does not exist or your operating system is not supported")
("pear", ("Linux","DE"))]
print ( "Welcome! Your operating system is " + os + " and your country " + country )
else:
Our binary search algorithm only works for data that is sorted. Assume we are given
more complex
data. How can we sort this data? Fortunately, the sorted function that we
saw earlier, also works on
tuples. In this case, it will use a lexicographical order, in which
the order between
two tuples is determined by the second element in the tuple if the first
elements are equal.
[(1, 2), (3, 4), (3, 5), (6, 2), (6, 3)]
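An output of this form could be obtained as follows. The input list pairs is a hypothetical example, chosen so that the result matches the list shown above.

```python
pairs = [(6, 3), (3, 5), (1, 2), (6, 2), (3, 4)]   # hypothetical, unsorted input
sorted_pairs = sorted(pairs)
print(sorted_pairs)   # [(1, 2), (3, 4), (3, 5), (6, 2), (6, 3)]
```

The tuples (3, 4) and (3, 5) have the same first element, so their second elements decide the order.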
Note that the order of information in a tuple is important when using the sorted function.
If
the information is not in the correct order, one solution is to recreate the data
in the
desired order. For instance, to sort our computer_names on operating system, we
can
write
Recreating the data just to order it differently may seem a little complex. Furthermore,
our indexing (cnoc[1][0]) becomes cumbersome and hard to read.
We will see later in
the course that there is an alternative solution to this
problem.
Corner cases
In our explanation, we made our lives easy by making a number of implicit assumptions:
It is important to ask yourself what the code would do if these restrictions no longer hold.
We will not discuss these questions in detail in this reference. However, as small
exercises
consider doing the following:
Glossary
algorithm
An unambiguous specification of how to solve a class of
problems.
binary search
A search algorithm that searches by repeatedly splitting a list
in two parts
linear search
A search algorithm that considers all elements of a list in the
worst case
search algorithm
An algorithm for searching an element that fulfills a well-
defined set of requirements
Files
Source: this section is based on both [ThinkCS] and [PythonForBeginners].
About files
While a program is running, its data is stored in random access memory (RAM).
RAM is
fast and inexpensive, but it is also volatile, which means that when
the program ends,
or the computer shuts down, data in RAM disappears. To make
data available the next
time the computer is turned on and the program
is started, it has to be written to a non-
volatile storage medium,
such as a hard drive, USB drive, or CD-RW.
Data on non-volatile storage media is stored in named locations on the media
called
files. By reading and writing files, programs can save information
between program
runs.
Working with files is a lot like working with a notebook. To use a notebook,
it has to be
opened. When done, it has to be closed. While the
notebook is open, it can either be
read from or written to. In either case,
the notebook holder knows where they are. They
can read the whole notebook in its
natural order or they can skip around.
All of this applies to files as well. To open a file, we specify its name and
indicate
whether we want to read or write.
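file = open("testfile.txt","w")
file.write("Hello World\n")
file.write("This is our new text file\n")
file.write("and this is another line.\n")
file.write("Why? Because we can.\n")
file.close()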
Opening a file creates what we call a file handle. In this example, the variable file
refers
to the new handle object. Our program calls methods on the handle, and this makes
changes to the actual file which is usually located on our disk.
On line 1, the open function takes two arguments. The first is the name of the file, and
the second is the mode. Mode "w" means that we are opening the file for
writing.
Closing the file handle (line 6) tells the system that we are done writing and makes
the
disk file available for reading by other programs (or by our own program).
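The following sketch writes a file and then reads it back to check its contents. To avoid overwriting any existing file, it works in a temporary directory created with tempfile.mkdtemp; this directory and the variable names are assumptions made for the illustration.

```python
import os
import shutil
import tempfile

directory = tempfile.mkdtemp()          # scratch directory, removed at the end
path = os.path.join(directory, "testfile.txt")

file = open(path, "w")                  # mode "w": open for writing
file.write("Hello World\n")
file.write("This is our new text file\n")
file.close()

file = open(path, "r")                  # mode "r": open for reading
content = file.read()
file.close()

print(content, end="")
shutil.rmtree(directory)
```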
file.read()
For example, the following Python code would print out the file we have just created on
the console.
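file = open("testfile.txt","r")
print(file.read())
file.close ()
The output of this command will display all the text inside the file, the same text we told
the interpreter to add earlier:
Hello World
This is our new text file
and this is another line.
Why? Because we can.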
We can also read only a given number of characters:
file = open("testfile.txt","r")
print(file.read(5))
file.close ()
Notice how we’re using the same file.read() method, only this time we specify the
number of characters to process.
This time the text displayed will be:
Hello
Finally, if you would want to read the file line by line – as opposed to pulling the content
of the entire file in a string at once – then you can use the readline() method.
Why would
you want to use something like this?
Let’s say you only want to see the first line of the
file, or the third. You would execute the readline() method as many times as needed to
get the data you were looking for.
Each time you run the method, it will return a string of
characters that contains the next line of information from the file. For example:
file = open("testfile.txt","r")
print(file.readline())
print(file.readline())
file.close ()
This command would print the first two lines of the file, like so:
Hello World

This is our new text file
Note that an empty line is printed between these two lines. This is because, by default,
the print() command always prints a newline after every string. The string that we are
printing here, however, ends with a newline itself: this newline was read from the input
file, and was not removed by Python.
print(file.readline(),end="")
print(file.readline(),end="")
file.close ()
Now we get the same result but without empty lines in between:
Hello World
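The reading methods discussed so far can be tried together in one small sketch. The file name demo.txt, its two-line content and the temporary directory are assumptions made only for this illustration; they are not the file used in the text above.

```python
import os
import tempfile

# Hypothetical two-line file, created only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")
file = open(path, "w")
file.write("Hello World\nThis is a second line\n")
file.close()

file = open(path, "r")
whole = file.read()          # the entire content as one string
file.close()

file = open(path, "r")
first_five = file.read(5)    # only the first 5 characters
file.close()

file = open(path, "r")
line1 = file.readline()      # first line, including its trailing newline
line2 = file.readline()      # second line
file.close()

print(first_five)
print(line1, end="")         # end="" avoids printing a second newline
print(line2, end="")
```

Note that the file is reopened before each experiment: a file object remembers how far it has already read, so a second read() on the same handle would start where the first one stopped.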
print(file.readlines())
file.close ()
The output you would get is a list containing each line as a separate element:
['Hello World\n', 'This is our new text file\n', 'and this is another line.\n', 'Why? Becau
se we can.\n']
If you now want to retrieve, for example, the third line in the file, you could use
the following code (we use the index 2 instead of 3 since the first element of a list is at
position 0):
print(file.readlines()[2])
which prints:
print(line,end='')
file.close ()
While correct, this code is not very memory efficient: it reads the entire file into a list,
and then traverses this list.
When you want to read all the lines of a file with a for loop in a faster and more
memory-efficient manner, Python provides a way that is
both simple and easy to read:
print(line,end='')
file.close ()
In this case, Python avoids loading the entire file into memory.
Note how we again called print with a second argument (end=''), to avoid undesired
blank lines.
The code above will print:
Hello World
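To make the difference between the two approaches concrete, here is a minimal sketch. The file demo.txt and its content are assumptions for this illustration only. Both approaches yield the same lines, but only the first one keeps all of them in memory at once.

```python
import os
import tempfile

# Hypothetical file, created only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")
file = open(path, "w")
file.write("Hello World\nThis is our new text file\n")
file.close()

# Approach 1: readlines() builds a list holding every line of the file.
file = open(path, "r")
all_lines = file.readlines()
file.close()

# Approach 2: looping over the file object yields one line at a time.
streamed = []
file = open(path, "r")
for line in file:
    print(line, end="")
    streamed.append(line)   # kept here only so we can compare the results
file.close()
```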
file.write("This is a test\n")
file.close()
This will amend our current file to include the two new lines of text. If you don't believe it,
open the changed file in your text editor,
or write a Python code fragment to print its
current contents.
Closing a File
When you’re done reading or writing a file, it is good practice to call the close() method.
By calling this method, you tell the operating system that your program has finished
working on the file, and that the file can now be read or written by other programs on
your computer. For instance, as long as your program is reading a file, your operating
system may decide not to allow other programs to change the file.
While in principle you could keep a file open during the whole execution of the program,
it is a matter of good manners towards other programs to close your files when you don't
need access to them any more. For this reason, in our examples we are always closing
our files.
It’s important to understand that when you use the close() method, any further attempts
to use the file object will fail.
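A small sketch of what "will fail" means in practice. The file name is an assumption for this illustration, and the try ... except construction used to observe the failure is explained in the section on exceptions.

```python
import os
import tempfile

# Hypothetical file, created only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")
file = open(path, "w")
file.write("Hello World\n")
file.close()

print(file.closed)           # True: the handle has been closed

try:
    file.write("more text\n")    # using a closed file raises ValueError
    failed = False
except ValueError as error:
    failed = True
    print("Cannot write:", error)
```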
lines_of_text = ["One line of text here\n", "and another line here\n", "and yet another
here\n", "and so on and so forth\n"]
file.writelines(lines_of_text)
file.close()
data = file.readlines()
words = line.split()
print(words)
The output for this will be something like (depending on what your testfile currently
contains):
The reason the words are presented in this manner is because they are stored – and
returned – as a list.
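As a sketch of this idea, the following code splits every line of a file into words. The file name and content are assumptions made for this illustration.

```python
import os
import tempfile

# Hypothetical file, created only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")
file = open(path, "w")
file.write("Hello World\nThis is our new text file\n")
file.close()

file = open(path, "r")
data = file.readlines()
file.close()

all_words = []
for line in data:
    words = line.split()     # split on whitespace; the newline disappears
    print(words)
    all_words.append(words)
```

Calling split() without arguments splits on any run of whitespace, which is why the trailing newline of each line does not show up in the resulting lists.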
f = open("somefile.zip", "rb")    # input file name is an assumption
g = open("thecopy.zip", "wb")

while True:
    buf = f.read(1024)
    if len(buf) == 0:
        break
    g.write(buf)

f.close()
g.close()
There are a few new things here. In lines 1 and 2 we added a "b"
to the mode to tell
Python that the files are binary rather than
text files. In line 5, we see read can take an
argument which
tells it how many bytes to attempt to read from the file. Here we
chose
to read and write up to 1024 bytes on each iteration of the loop. When
we get back an
empty buffer from our attempt to read, we know we can
break out of the loop and close
both the files.
Directories
Files on non-volatile storage media are organized by a set of rules known as a
file
system. File systems are made up of files and directories, which
are containers for
both files and other directories.
When we create a new file by opening it and writing, the new file goes in the
current
directory (wherever we were when we ran the program). Similarly, when
we open a file
for reading, Python looks for it in the current directory.
If we want to open a file somewhere else, we have to specify the path to
the file, which
is the name of the directory (or folder) where the file is
located:
>>> print(wordlist[:6])
This (Unix) example opens a file named words that resides in a directory named dict,
which resides in share, which resides in usr, which resides in the top-level directory
of the system, called /. It then reads each line into a list using readlines, and prints
out the first 6 elements of that list.
Glossary
delimiter
A sequence of one or more characters used to specify the
boundary
between separate parts of text.
directory
A named collection of files, also called a folder. Directories
can
contain files and other directories, which are referred to
as
subdirectories of the directory that contains them.
file
A named entity, usually stored on a hard drive, floppy disk, or
CD-ROM,
that contains a stream of characters.
file system
A method for naming, accessing, and organizing files and the
data they
contain.
handle
An object in our program that is connected to an underlying
resource (e.g. a file).
The file handle lets our program
manipulate/read/write/close the actual
file that is on our disk.
mode
A distinct method of operation within a computer program.
Files in
Python can be opened in one of four modes: read
("r"), write
("w"), append ("a"), and read and write ("+").
non-volatile memory
Memory that can maintain its state without power. Hard
drives, flash
drives, and rewritable compact disks (CD-RW)
are each examples of
non-volatile memory.
path
A sequence of directory names that specifies the exact
location of a
file.
text file
A file that contains printable characters organized into lines
separated by newline characters.
socket
One end of a connection allowing one to read and write
information to or from another computer.
volatile memory
Memory which requires an electrical current to maintain
state. The
main memory or RAM of a computer is volatile.
Information stored in
RAM is lost when the computer is
turned off.
References
[ThinkCS] How To Think Like a Computer Scientist --- Learning with Python 3
[PythonForBeginners] https://www.pythonforbeginners.com/files/reading-and-writing-files-in-python
Exceptions
Source: this section is based on [ThinkCS]
Catching exceptions
Whenever a runtime error occurs, it creates an exception object. The program stops
running at this point and Python prints out the traceback, which ends with a message
describing the exception that occurred.
>>> print(55/0)
>>> a = []
>>> print(a[5])
In each case, the error message on the last line has two parts: the type of
error before
the colon, and specifics about the error after the colon.
For example, we might prompt the user for the name of a file and then try to
open it. If
the file doesn't exist, we don't want the program to crash; we want
to handle the
exception. We can do this using the
try statement to "wrap" a region of code:
try:
f = open(filename, "r")
lines = f.readlines ()
f.close ()
except:
The try statement executes and monitors the statements in the first block. If no
exceptions occur, it skips the block under the except clause. If any exception occurs,
it
executes the statements in the except clause and then continues.
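The behaviour described above can be sketched as follows. The function name read_lines, the message and the file name are hypothetical. We also name the exception type OSError in the except clause, a refinement that is discussed later in this section.

```python
def read_lines(filename):
    """ Return the lines of the file, or an empty list if it cannot be opened. """
    try:
        f = open(filename, "r")
        lines = f.readlines()
        f.close()
    except OSError:
        # Executed only when opening or reading the file failed.
        print("Cannot open the file", filename)
        lines = []
    return lines

# Hypothetical name of a file that does not exist.
result = read_lines("this_file_does_not_exist_12345.txt")
print(result)
```

The program does not crash: after the except clause has run, execution simply continues with the return statement.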
def get_age():
if age < 0:
raise my_error
return age
if age < 0:
raise my_error
return age
def contains_digit(s):
"""
for l in s:
if l in "0123456789":
return True
return False
def get_username():
if contains_digit(name):
raise my_error
return name
age = get_age ()
username = get_username ()
try:
except:
Hence, whereas a return statement in a function always returns to the place where the
function was called, a raise statement breaks out of multiple function calls, until it
reaches a place where the exception is handled using a try ... except block. We call this
"unwinding the call stack".
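The following minimal sketch illustrates this unwinding. The function names check_age, make_profile and safe_profile are hypothetical and the values are passed as arguments instead of being read from the keyboard, so that the example can run unattended.

```python
def check_age(age):
    if age < 0:
        # Create a new exception object and raise it.
        raise ValueError("{0} is not a valid age".format(age))
    return age

def make_profile(name, age):
    # No try ... except here: an exception raised in check_age
    # passes straight through this function.
    checked = check_age(age)
    return name + " is " + str(checked)

def safe_profile(name, age):
    try:
        return make_profile(name, age)
    except ValueError:
        # The exception skipped the rest of check_age and make_profile.
        return "invalid profile"

print(safe_profile("Ann", 42))
print(safe_profile("Bob", -1))
```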
If we call the get_age function without a try ... except block, we get this output:
>>> get_age()
42
>>> get_age()
The error message includes the exception type and the additional information
that was
provided when the exception object was first created.
We can also print the more specific error message using this code:
try:
print ( "Your username is " + username + " your age is " + str(age))
print ( error )
In this case, the try ... except block will only catch an exception of
the type ValueError. It will store information regarding this exception, as
created by the raise statement, in the variable error, which we can subsequently print.
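A minimal sketch of catching a specific exception type and inspecting its message. The functions parse_age and describe are hypothetical and take the text as an argument instead of asking the user for it.

```python
def parse_age(text):
    age = int(text)          # int() itself raises ValueError for non-numeric text
    if age < 0:
        raise ValueError("{0} is not a valid age".format(age))
    return age

def describe(text):
    try:
        return "Your age is " + str(parse_age(text))
    except ValueError as error:
        # error refers to the exception object created by raise.
        return "Problem: " + str(error)

print(describe("42"))
print(describe("-3"))
print(describe("abc"))
```

Exceptions of any other type would not be caught by this except clause and would continue to unwind the call stack.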
With statement
As pointed out earlier, a common situation in which exceptions are useful, is when
working with files. We saw this code earlier:
try:
f = open(filename, "r")
lines = f.readlines ()
f.close ()
except:
In this example, the program will print the message There is no file ... when the file does
not exist.
Let us now consider this variation, in which we use the get_username function of the
previous section:
try:
f = open(filename, "r")
username = get_username ()
for line in f:
if line == username:
f.close ()
except IOError:
except ValueError:
In this code, two different exceptions can occur: one is related to a file error, the other to
the provision of an incorrect name. These two types of errors are distinguished by
having two except statements; each of these will catch the corresponding type of error.
Many tricky things are happening in this code: we have one try ... except block for
different types of errors, depending on whether or not the file exists we will ask for a
name, and so on. One thing is happening in this code that makes it particularly
undesirable. As stated earlier, it is considered good practice for a program to close every
file that it opens.
In the program above, if an incorrect username is entered, the program will raise an
exception, and jump towards printing the message Incorrect name provided without
executing the close() instruction: after all, the close() statement is only executed after we
have successfully finished the get_username() function.
To resolve this issue, the proper way to combine exception handling with file processing
is as follows:
filename = input("Enter a file name: ")
try:
username = get_username ()
for line in f:
if line == username:
except IOError:
except ValueError:
In this code there is no close() statement any more! Instead, we have used the
with
open(filename, "r") as f: construction. What does this construction do?
Essentially, it
associates the result of open(filename, "r") to f, and executes
the block of code
username = get_username ()
for line in f:
if line == username:
if the file was opened successfully. Two things can then happen:
the code executes successfully; in this case, the file will be closed automatically
when the code is finished.
the code raises an exception; in this case, the file will be closed before the
execution is passed on to the exception handler.
Hence, the with statement can be used to ensure that a file is automatically closed in all
circumstances, whether good or bad.
Many Python programmers nowadays use with every time they open a file, as by using
this statement, one does not need to think about closing a file any more: it will always
happen after the specified piece of code is finished.
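To check this behaviour ourselves, here is a small sketch in which an exception is raised inside a with block. The file name, its content and the variable handle (used only to inspect the file object afterwards) are assumptions for this illustration.

```python
import os
import tempfile

# Hypothetical file with two names, created only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "names.txt")
with open(path, "w") as f:
    f.write("alice\nbob\n")

handle = None
try:
    with open(path, "r") as f:
        handle = f                       # remember the file object
        for line in f:
            if line.strip() == "bob":
                raise ValueError("stopping early")
except IOError:
    print("There is no file", path)
except ValueError as error:
    print("Caught:", error)

print(handle.closed)                     # the file was closed despite the exception
```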
Let’s take a look at another example, which prints all the data in a file, line by line:
data = file.readlines()
print(line, end='')
Notice that in the above example we didn’t use the file.close() method because the with
statement will automatically call that for us upon execution.
It really makes things a lot
easier, doesn’t it?
Glossary
exception
An error that occurs at runtime.
handle an exception
To prevent an exception from causing our program to crash,
by wrapping
the block of code in a try ... except construct.
raise
To create a deliberate exception by using the raise statement.
References
[ThinkCS]
How To Think Like a Computer Scientist --- Learning with Python 3
Dictionaries
Source: this section is heavily based on [ThinkCS].
All of the compound data types we have studied in detail so far --- strings,
lists, and
tuples --- are sequence types, which use integers as indices to access
the values they
contain within them.
Dictionaries are yet another kind of compound type. They are Python's
built-in
mapping type. They map keys, which can be any immutable type,
to values, which can
be any type (heterogeneous), just like the elements
of a list or tuple. In other languages,
they are called associative
arrays since they associate a key with a value.
One way to create a dictionary is to start with the empty dictionary and add
key:value
pairs. The empty dictionary is denoted {}:
>>> eng2sp = {}
>>> print(eng2sp)
The key:value pairs of the dictionary are separated by commas. Each pair
contains a
key and a value separated by a colon.
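As a brief sketch, here are two ways to build the same dictionary. The pair "two": "dos" appears in this section; the pair "one": "uno" and the variable name same are assumptions added for the illustration.

```python
# Start with the empty dictionary and add key:value pairs one by one.
eng2sp = {}
eng2sp["one"] = "uno"
eng2sp["two"] = "dos"
print(eng2sp)

# Alternatively, write all the pairs at once, separated by commas.
same = {"one": "uno", "two": "dos"}
print(eng2sp == same)
```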
Hashing
The order of the pairs may not be what was expected. Python uses
complex algorithms, designed for very fast access, to determine
where the key:value pairs are stored in a dictionary. Since Python 3.7,
a dictionary keeps its pairs in the order in which they were inserted;
in older versions we had to think of this ordering as unpredictable.
You also might wonder why we use dictionaries at all when the same
concept of mapping a key to a value could be implemented using a
list of
tuples:
The reason is that dictionaries are very fast, both to update and to search.
They are implemented using a technique called hashing, which allows us to access
a value very quickly, and to remove and add values quickly.
By contrast, the list of tuples implementation is slow, either to update or to search.
If we wanted to find a value associated with a key in an unordered list, we would
have to iterate over every tuple. What if the key wasn't even in the list? We would
have to get to the end of it to find out.
If we wanted to add a value with a key in an ordered list, we would have to move
all elements in the list whenever the new value needs to go at the beginning of the
list.
It doesn't matter what order we write the pairs. The values in a dictionary are
accessed
with keys, not with indices, so there is no need to care about
ordering.
>>> print(eng2sp["two"])
dos
Lists, tuples, and strings have been called sequences, because their items
occur in
order. The dictionary is the first compound type that we've
seen that is not a sequence,
so we can't index or slice a dictionary.
Dictionary operations
The del statement removes a key:value pair from a dictionary. For example,
the following
dictionary contains the names of various fruits and the number of
each fruit in stock:
>>> print(inventory)
If someone buys all of the pears, we can remove the entry from the dictionary:
>>> print(inventory)
Or if we're expecting more pears soon, we might just change the value
associated with
pears:
>>> inventory["pears"] = 0
>>> print(inventory)
>>> print(inventory)
>>> len(inventory)
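The operations of this section can be sketched in one run. Only the key "pears" comes from the text; the other fruits, all the quantities and the variable after_delete are assumptions for this illustration.

```python
# Hypothetical stock levels.
inventory = {"apples": 430, "bananas": 312, "oranges": 525, "pears": 217}

del inventory["pears"]              # someone buys all of the pears
after_delete = dict(inventory)      # snapshot, kept only for inspection
print(inventory)

inventory["pears"] = 0              # expecting more pears soon
inventory["bananas"] += 200         # a new shipment of bananas arrives
print(inventory)

count = len(inventory)              # number of key:value pairs
print(count)
```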
Dictionary methods
Dictionaries have a number of useful built-in methods.
The keys method returns what Python 3 calls a view of its underlying keys.
A view object
has some similarities to the range object we saw earlier ---
it is a lazy promise, to deliver
its elements when they're needed by the
rest of the program. We can iterate over the
view, or turn the view into a
list like this:
for k in eng2sp.keys(): # The order of the k's is not defined
ks = list(eng2sp.keys())
print(ks)
for k in eng2sp:
print("Got key", k)
The values method is similar; it returns a view object which can be turned
into a list:
>>> list(eng2sp.values())
The items method also returns a view, which promises a list of tuples --- one
tuple for
each key:value pair:
>>> list(eng2sp.items())
Tuples are often useful for getting both the key and the value at the same
time while we
are looping:
This produces:
True
False
>>> "tres" in eng2sp # Note that 'in' tests keys, not values.
False
...
KeyError: 'dog'
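A compact sketch of the methods and tests discussed above. The content of eng2sp is partly assumed ("two": "dos" and the value "tres" appear in the text, the rest is hypothetical), and the try ... except block is only there to show the KeyError without stopping the program.

```python
eng2sp = {"one": "uno", "two": "dos", "three": "tres"}

keys = list(eng2sp.keys())        # turn the views into lists
values = list(eng2sp.values())
pairs = list(eng2sp.items())      # a list of (key, value) tuples

for k, v in eng2sp.items():       # tuple assignment inside the loop
    print("Got", k, "that maps to", v)

has_one = "one" in eng2sp
has_tres = "tres" in eng2sp       # False: 'in' tests keys, not values

try:
    eng2sp["dog"]                 # looking up a missing key
    missing = None
except KeyError as error:
    missing = error.args[0]       # the key that was not found
```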
If we want to modify a dictionary and keep a copy of the original, use the
copy method.
For example, opposites is a dictionary that contains pairs
of opposites:
>>> opposites["right"]
'left'
>>> opposites["right"]
'left'
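The difference between an alias and a copy can be sketched as follows. The initial content of opposites and the replacement values are assumptions for this illustration.

```python
# Hypothetical initial content.
opposites = {"up": "down", "right": "wrong", "yes": "no"}

alias = opposites               # a second name for the same dictionary
copy = opposites.copy()         # a fresh dictionary with the same pairs

alias["right"] = "left"         # also visible through opposites
copy["right"] = "privilege"     # opposites is not affected

print(opposites["right"])
print(copy["right"])
```

Because alias and opposites refer to the same object, a change made through one name is seen through the other; the copy is an independent object.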
Sparse matrices
We previously used a list of lists to represent a matrix. That is a good choice
for a matrix
with mostly nonzero values, but consider a sparse matrix like this one:
[0, 0, 0, 0, 0],
[0, 2, 0, 0, 0],
[0, 0, 0, 0, 0],
[0, 0, 0, 3, 0]]
An alternative is to use a dictionary. For the keys, we can use tuples that
contain the row
and column numbers. Here is the dictionary representation of
the same matrix:
We only need three key:value pairs, one for each nonzero element of the matrix.
Each
key is a tuple, and each value is an integer.
Notice that the syntax for the dictionary representation is not the same as the
syntax for
the nested list representation. Instead of two integer indices, we
use one index, which is
a tuple of integers.
KeyError: (1, 3)
The first argument is the key; the second argument is the value get should
return if the
key is not in the dictionary:
>>> matrix.get((1, 3), 0)
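Here is a minimal sketch of the dictionary representation. The entries (2, 1): 2 and (4, 3): 3 follow the rows shown above; the first entry, (0, 3): 1, and the helper to_nested_list are assumptions added for this illustration.

```python
# Only the nonzero elements are stored; (0, 3): 1 is an assumed entry.
matrix = {(0, 3): 1, (2, 1): 2, (4, 3): 3}

print(matrix[(2, 1)])             # an element that is stored
print(matrix.get((1, 3), 0))      # a missing element counts as zero

def to_nested_list(sparse, rows, cols):
    """ Rebuild the list-of-lists representation (hypothetical helper). """
    result = []
    for r in range(rows):
        row = []
        for c in range(cols):
            row.append(sparse.get((r, c), 0))
        result.append(row)
    return result

dense = to_nested_list(matrix, 5, 5)
print(dense)
```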
Counting letters
In the exercises in Strings we wrote a function that counted the number of occurrences
of a
letter in a string. A more general version of this problem is to form a
frequency table
of the letters in the string, that is, how many times each letter
appears.
Such a frequency table might be useful for compressing a text file. Because different
letters appear with different frequencies, we can compress a file by using
shorter codes
for common letters and longer codes for letters that appear less
frequently.
>>> letter_counts = {}
...
>>> letter_counts
We start with an empty dictionary. For each letter in the string, we find the
current count
(possibly zero) and increment it. At the end, the dictionary
contains pairs of letters and
their frequencies.
>>> letter_items.sort()
>>> print(letter_items)
Notice in the first line we had to call the type conversion function list.
That turns the
promise we get from items into a list, a step that is
needed before we can use the list's
sort method.
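The whole procedure can be sketched as follows; the string "Mississippi" is an assumption chosen for the illustration.

```python
letter_counts = {}
for letter in "Mississippi":
    # get returns the current count, or 0 if the letter is not a key yet.
    letter_counts[letter] = letter_counts.get(letter, 0) + 1

# items() returns a view; convert it to a list before sorting.
letter_items = list(letter_counts.items())
letter_items.sort()
print(letter_items)
```

The tuples are sorted by letter, and uppercase letters come before lowercase ones.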
Glossary
call graph
A graph consisting of nodes which represent function frames
(or invocations),
and directed edges (lines with arrows)
showing which frames gave
rise to other frames.
dictionary
A collection of key:value pairs that maps from keys to values.
The keys
can be any immutable value, and the associated
value can be of any type.
immutable data value
A data value which cannot be modified. Assignments to
elements or
slices (sub-parts) of immutable values cause a
runtime error.
key
A data item that is mapped to a value in a dictionary. Keys
are used
to look up values in a dictionary. Each key must be
unique
across the dictionary.
key:value pair
One of the pairs of items in a dictionary. Values are looked up
in a
dictionary by key.
mapping type
A mapping type is a data type comprised of a collection of
keys and
associated values. Python's only built-in mapping
type is the
dictionary. Dictionaries implement the
associative
array
abstract data type.
mutable data value
A data value which can be modified. The types of all mutable
values
are compound types. Lists and dictionaries are
mutable; strings
and tuples are not.
References
[ThinkCS] How To Think Like a Computer Scientist --- Learning with Python 3
Object-oriented programming
Python is an object-oriented programming language, which means that it supports
many of the features of
the [object_oriented_programming] paradigm.
Object-oriented programming (OOP) has its roots in the 1960s, but it wasn't until the
mid
1980s that it became a mainstream [programming_paradigm] used in the creation
of
new software. It was developed as a way to handle the rapidly increasing
size and
complexity of software systems, and to make it easier to modify and maintain these
large and complex systems over time.
Up to now, most of the programs we have been writing in this course used a
[procedural_programming]
style. In the procedural programming paradigm the focus is
on writing functions or procedures, which
operate on data. In object-oriented
programming, the focus is on creating
objects which group both data and the functions
or methods, which operate on that data. We have
already seen examples of objects
such as turtles and strings.
An object definition often corresponds to some object or
concept in the real
world, and the functions (methods) that operate on (the data
encapsulated in) that
object correspond to the ways those real-world objects can
interact.
Some of the typical operations that one associates with points might be
calculating the
distance of a point from the origin, or from another point,
or finding a midpoint of two
points, or asking if a point falls within a
given rectangle or circle. We'll shortly see how
we can organise these
operations together with the data.
A natural way to represent a point in Python is with two numeric values. The
question,
then, is how to group these two values into a compound object. The
quick and dirty
solution is to use a tuple, for example we could write
p = (0,0) and q = (1,1), and for some
applications
that might be a good choice. But we would still need to define dedicated
procedures to do something useful with these tuples representing points.
class Point:
    """ The Point class represents and manipulates x,y coordinates. """

    def __init__(self):
        self.x = 0
        self.y = 0
Although class definitions like the one above can appear anywhere in a program,
they
are usually put near
the beginning (after the import statements). Some programmers and
languages
prefer to put every class in a file or module of its own --- we won't do that for
now.
The syntax rules for a class
definition are the same as for other compound
statements. There is a header
which begins with the keyword, class, followed by the
name of the class,
and ending with a colon. Indentation levels tell us where the class
ends.
We can use our new Point class now to create two Point objects:
print(p.x, p.y, q.x, q.y) # Each point object has its own x and y
The variables p and q above are assigned references to two new Point objects.
A
function like Turtle() or Point() that creates a new object instance
from its corresponding
class is called a constructor. Every class automatically provides a
constructor function
which is named the same as the class.
The combined process of "construct me a new object" and "get its settings initialised
to
the factory default settings" is called instantiation.
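A minimal, self-contained sketch of instantiation, repeating the class definition so that it can be run on its own. The docstring of __init__ is an assumption.

```python
class Point:
    """ The Point class represents and manipulates x,y coordinates. """

    def __init__(self):
        """ Initialise a new point at the origin (assumed docstring). """
        self.x = 0
        self.y = 0

p = Point()         # calling the constructor creates and initialises an object
q = Point()         # a second, distinct object

print(p.x, p.y, q.x, q.y)   # each point object has its own x and y
print(p is q)               # False: p and q refer to different objects
```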
Attributes
Object instances have both attributes (the data contained in the instance) and
methods (the operations that act on that data). Whereas the methods are the same for
all objects of the same class (we will see in a later section how to define such methods),
the attribute values are specific to each particular instance of that class. For that reason,
attributes are sometimes also referred to as instance variables. Of course, initially
they are initialised to the same factory default settings, but once an object has been
created, we can modify its attribute values by using the following dot notation:
>>> p.x = 3
>>> p.y = 4
This sets the x attribute of the object instance p to the value 3 and its y attribute to the
value 4.
The variable p refers to a Point object, which contains two attributes x and y.
Each
attribute contains a number.
>>> print(p.y)
4
>>> x = p.x
>>> print(x)
The expression p.x means, "Go to the point object that p refers to and get the
value of its
attribute named x". In this case, we assign that value to a global variable named x.
There is no conflict between the variable named x (in the global namespace)
and the
attribute named x (in the namespace belonging to the instance). The
purpose of the dot
notation is to fully qualify which variable we are referring to
unambiguously.
We can use dot notation as part of any expression, so the following statements
are
legal:
print(distance_from_origin)
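The following sketch puts the dot notation to work. It repeats the simple Point class so that it runs on its own; the particular expression used for the distance is an assumption for this illustration.

```python
class Point:
    """ The Point class represents and manipulates x,y coordinates. """

    def __init__(self):
        self.x = 0
        self.y = 0

p = Point()
q = Point()

p.x = 3             # modify the attributes of p only
p.y = 4

x = p.x             # a global variable x, distinct from the attribute p.x

# Attributes can be used in any expression.
distance_from_origin = (p.x ** 2 + p.y ** 2) ** 0.5
print(distance_from_origin)
print(q.x, q.y)     # q still has its default values
```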
We can make our class constructor more general by placing extra parameters into
the
__init__ method, as shown in this example:
class Point:
""" The Point class represents and manipulates x,y coordinates. """
passed as arguments
"""
self.x = x
self.y = y
The x and y parameters here are both optional. If the caller does not
supply any
arguments for x and y, they'll get the default values of 0.
Here is our improved class in
action:
>>> p = Point(4, 2)
>>> q = Point(6, 3)
4 3 0
Below you can find another memory diagram depicting the three objects that have
been
created in the computer's memory.
If we are really fussy, we would argue that the __init__ method's docstring
is inaccurate. Indeed, __init__ doesn't create the object (i.e. set aside
memory for it); it's the constructor that does that. The __init__ method
just initialises the object to its factory-default settings after its creation.
So we're writing the docstring so that it makes the most sense when it
pops up to help the programmer who is using our Point class:
Let's add another method, distance_from_origin, to our class Point to see better how
methods
work:
class Point:
""" The Point class represents and manipulates x,y coordinates. """
passed as arguments
"""
self.x = x
self.y = y
def distance_from_origin(self):
@pre: -
"""
When defining a method, it must always have a first parameter that refers to the
instance being
manipulated, i.e. the object itself. For that reason it is customary to name
this parameter self.
Now let's create a few point instances, look at their attributes, and call our new
distance
calculation method on them. (Note that we must execute our new class definition above
first,
to make our modified Point class available to the interpreter.)
>>> p = Point(3, 4)
>>> p.x
3
>>> p.y
4
>>> p.distance_from_origin()
5.0
>>> q.x
5
>>> q.y
12
>>> q.distance_from_origin()
13.0
>>> r = Point()
>>> r.x
0
>>> r.y
0
>>> r.distance_from_origin()
0.0
Notice that, although the method distance_from_origin(self) was defined with a first
parameter self, the caller of distance_from_origin() does not explicitly supply an
argument to match this self parameter; nevertheless, the instance on which the method
is called will be bound to self automatically, behind our back. Remember: when you
define a method in a class, you should add a first parameter self representing the
instance being manipulated; when calling the method, you should drop that parameter,
as it will be filled in automatically behind your back.
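A self-contained sketch of the class with its method may help here. The docstrings and the use of math.sqrt are assumptions; the last line calls the method through the class, which makes visible the argument that Python normally fills in for self.

```python
import math

class Point:
    """ The Point class represents and manipulates x,y coordinates. """

    def __init__(self, x=0, y=0):
        """ Initialise a new point with the coordinates passed as arguments. """
        self.x = x
        self.y = y

    def distance_from_origin(self):
        """ Return the distance of this point from the origin. """
        return math.sqrt(self.x ** 2 + self.y ** 2)

p = Point(3, 4)
via_instance = p.distance_from_origin()      # self is bound to p automatically
via_class = Point.distance_from_origin(p)    # the same call, with self explicit
print(via_instance, via_class)
```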
def print_point(pt):
class Point:
def to_string(self):
>>> p = Point(3, 4)
>>> print(p.to_string())
(3, 4)
But doesn't there already exist a str type converter that can
turn an object into a string?
Yes! And doesn't print
automatically use this when printing things? Yes again!
But these
automatic mechanisms do not (yet) seem to do exactly what we want:
>>> str(p)
>>> print(p)
Rather than printing the contents of the object they print a unique
reference to the
object.
Luckily Python has a clever trick to fix this. If we call our new
method __str__ (with a
double underscore character before and after
the method name) instead of to_string, the
Python interpreter
will use our code instead of the default str function whenever
it needs
to convert a Point to a string.
Let's redo this now:
class Point:
(3, 4)
>>> print(p)
(3, 4)
Special methods like __str__ (and also the __init__ method introduced before)
are called [magic_methods] in Python. Typically, whenever you define your own new
classes, you may want to implement such a __str__ method on them, to be able to
easily inspect objects of those classes by printing them.
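A minimal sketch of a Point class with a __str__ method; the exact format string is an assumption chosen to produce the output (3, 4) shown above.

```python
class Point:
    """ The Point class represents and manipulates x,y coordinates. """

    def __init__(self, x=0, y=0):
        self.x = x
        self.y = y

    def __str__(self):
        """ Return a printable representation of this point. """
        return "({0}, {1})".format(self.x, self.y)

p = Point(3, 4)
text = str(p)       # str now uses our __str__ method
print(text)
print(p)            # and so does print
```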
"""
"""
mx = (p1.x + p2.x)/2
my = (p1.y + p2.y)/2
>>> p = Point(3, 4)
>>> r = midpoint(p, q)
>>> print(r)
(4.0, 8.0)
class Point:
# ...
"""
"""
mx = (self.x + target.x)/2
my = (self.y + target.y)/2
This method is almost identical to the function, aside from some renaming.
Its usage might be like this:
>>> p = Point(3, 4)
>>> r = p.halfway(q)
>>> print(r)
(4.0, 8.0)
While this example assigns each point to a variable, this need not be done.
Just as
function calls are composable, method calls and object instantiation
are also
composable, leading to this alternative that uses no variables:
(4.0, 8.0)
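The function and the method can be compared side by side in the following sketch. The coordinates of q and the docstrings are assumptions, chosen so that the result matches the output (4.0, 8.0) shown above.

```python
class Point:
    """ The Point class represents and manipulates x,y coordinates. """

    def __init__(self, x=0, y=0):
        self.x = x
        self.y = y

    def __str__(self):
        return "({0}, {1})".format(self.x, self.y)

    def halfway(self, target):
        """ Return the point halfway between this point and target. """
        mx = (self.x + target.x) / 2
        my = (self.y + target.y) / 2
        return Point(mx, my)

def midpoint(p1, p2):
    """ Return the midpoint of points p1 and p2 (function version). """
    mx = (p1.x + p2.x) / 2
    my = (p1.y + p2.y) / 2
    return Point(mx, my)

p = Point(3, 4)
q = Point(5, 12)                          # assumed coordinates
r1 = midpoint(p, q)                       # function call
r2 = p.halfway(q)                         # method call
r3 = Point(3, 4).halfway(Point(5, 12))    # composed, without variables
print(r1, r2, r3)
```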
A change of perspective
The original syntax for a function call, print_time(current_time), suggests that the
function
is the active agent. It says something like, "Hey, print_time!
Here's an object for you to
print."
This change in perspective might be more polite, but it may not initially
be obvious that it
is useful. But sometimes shifting responsibility from
the functions onto the objects makes
it possible to write more versatile
functions, and makes it easier to maintain and reuse
code.
Glossary
attribute
One of the named data items that makes up an object.
Another word for attribute is instance variable.
class
A user-defined compound type. A class can also be thought
of as a
template or factory for the objects that are instances
of it.
constructor
A class can also be seen as a "factory" for making objects of
a certain kind.
Every class thus provides a constructor
method, called by the same name as the class, for
making
new instances of this kind. If the class has an initialiser
method, this method
is used to get the attributes (i.e. the
state) of the new instance properly set up.
initialiser method
A special method in Python (called __init__)
that is invoked
automatically to set a newly created object's
attributes to
their initial (factory-default) state.
instance
An object whose type is of some class. The words instance
and object are used
interchangeably.
instance variable
Since the attribute values of an object are specific to that
particular object
(i.e., another object of the same class may
have another value for that attribute),
they are sometimes
also referred to as instance variables.
instantiate
To create an instance of a class, and to run its initialiser
method.
instance method
A function that is defined inside a class definition and is
invoked on
instances of that class.
magic method
Magic methods are special methods like __init__ or __str__
that you can define
to add some magic to your classes. For
example Python magically knows that when a new
object
gets constructed it should call the __init__ method to initialise
the
attributes of the newly created object, or that when you
print an object, it should call
the __str__ method to get a
printable string representation of the object.
Magic methods
are always surrounded by double underscores.
method
If it is clear from the context we will often refer to an instance
method simply as a method. (We will learn later that there are
also class methods, which are not the same as
instance methods.)
object
A compound data type that is often used to model a thing or
concept in
the real world. It bundles together the data and the
operations that
are relevant for that kind of data. The words
instance and object are used
interchangeably.
object-oriented programming
A powerful style of programming in which data and the
operations
that manipulate it are organized into objects.
object-oriented language
A language that provides features, such as user-defined
classes and
inheritance, that facilitate object-oriented
programming.
References
[ThinkCS] How To Think Like a Computer Scientist --- Learning with Python 3
[object_oriented_programming] http://en.wikipedia.org/wiki/Object-oriented_programming
[programming_paradigm] http://en.wikipedia.org/wiki/Programming_paradigm
[procedural_programming] http://en.wikipedia.org/wiki/Procedural_programming
[magic_methods] https://rszalski.github.io/magicmethods/
[docstring] https://www.python.org/dev/peps/pep-0257/
[format] https://www.python.org/dev/peps/pep-3101/#id16
Rectangles
Suppose we want a class to represent rectangles located
somewhere in the Cartesian
X-Y plane. What information do we have
to provide in order to specify such a rectangle?
To simplify things,
let us assume that the rectangle is always oriented either vertically or
horizontally, never at an angle.
There are a few possibilities: we could specify the center of the rectangle
(two
coordinates) and its size (width and height); or we could specify one of
the corners and
the size; or we could specify two opposing corners. A
conventional choice is to specify
the upper-left corner of the rectangle, and
its size.
As with the Point class before, we'll define a new class Rectangle,
and provide it with an
initialiser method __init__ and a string converter method __str__.
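Also, don't forget that the first parameter of every method must be a reference to self.
class Rectangle:
    def __init__(self, pos, w, h):
        """
        @pre:  pos is a Point object; w and h are numbers
        @post: a rectangle has been created with upper-left corner pos,
               width w and height h, the values passed as arguments
        """
        self.corner = pos
        self.width = w
        self.height = h

    def __str__(self):
        """
        @pre:  -
        @post: returns a string of the form (corner, width, height)
        """
        return "({0}, {1}, {2})".format(self.corner, self.width, self.height)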
Note how, to specify the upper-left corner, we embedded a Point object (as was defined
in the previous section) within our new Rectangle object.
We create two new Rectangle
objects, and then print them, which produces:
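As a minimal sketch of this step, assume a Point class that prints as (x, y), as in the previous section, and a rectangle that prints as (corner, width, height). The output format, the second name bomb and the dimensions used here are illustrative assumptions, not taken from the text.

```python
class Point:
    """A point in the plane (assumed to match the previous section)."""

    def __init__(self, x=0, y=0):
        self.x = x
        self.y = y

    def __str__(self):
        return "({0}, {1})".format(self.x, self.y)


class Rectangle:
    """A rectangle given by its upper-left corner, width and height."""

    def __init__(self, pos, w, h):
        self.corner = pos      # an embedded Point object
        self.width = w
        self.height = h

    def __str__(self):
        # Assumed output format: (corner, width, height)
        return "({0}, {1}, {2})".format(self.corner, self.width, self.height)


box = Rectangle(Point(0, 0), 100, 200)
bomb = Rectangle(Point(100, 80), 5, 10)
print("box:", box)      # box: ((0, 0), 100, 200)
print("bomb:", bomb)    # bomb: ((100, 80), 5, 10)
```

Because the corner is itself an object, its own __str__ method is used when the rectangle is converted to a string.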
The dot operator can be composed (chained). For example, the expression box.corner.x
means: "Go to the
object that box refers to, select its attribute named corner, then go to
that object and select its attribute named x".
From the figure it can easily be seen that printing box.corner.x would produce:
>>> print(box.corner.x)
When reasoning about the state of objects (their attributes and the values they contain),
we strongly encourage you to draw such memory diagrams.
box.width += 50
box.height += 100
The memory diagram below sketches the effect of making these modifications (in
green):
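class Rectangle:
    # ...
    def grow(self, delta_width, delta_height):
        """
        @pre:  delta_width and delta_height are numbers
        @post: this rectangle's width was grown to its original width plus
               delta_width, and its height to its original height plus
               delta_height; nothing is returned
        """
        self.width += delta_width
        self.height += delta_height

    def move(self, dx, dy):
        """
        @pre:  dx and dy are numbers
        @post: this rectangle's corner was moved by dx horizontally and dy
               vertically; nothing is returned
        """
        self.corner.x += dx
        self.corner.y += dy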
>>> print(r)
>>> print(r)
print(r)
Again, a memory diagram may help you to better visualise how the state of the rectangle
and point objects get modified (the part in blue corresponds to the definition of the
rectangle object, in pink the effect of growing it, in green the effect of moving it):
Sameness
The meaning of the word "same" seems perfectly clear until we give it some
thought,
and then we realize there is more to it than we initially expected.
For example, if we say, "Alice and Bob have the same mother", we mean that her
mother and his
are the same person.
If we say, however, "Alice and Bob have the same
car", we probably mean that her car
and his are the same make and model, but that they
are two different cars.
But if we say, "Alice and Bob share the same car", we probably
mean that they
actually share the usage of a single car.
When we talk about objects, there is a similar ambiguity. For example, if two
Points are
the same, does that mean they are two point objects that contain the same data
(coordinates) or that they are actually the same object?
We can use the is operator to find out if two references refer to the same object:
>>> p1 = Point(3, 4)
>>> p2 = Point(3, 4)
>>> p1 is p2
False
In this example, even though p1 and p2 contain the same coordinates, they are not the
same object. If we assign the value of p1 to a new variable named p3, however, then the
two variables are aliases
of (refer to) the same object:
>>> p3 = p1
>>> p1 is p3
True
False
The reason is that whenever you call the Point(3, 4) constructor you create a new
distinct
point object that happens to have the values 3 and 4 for its x and y coordinates. But the
two objects are distinct and stored in different memory locations.
"""
"""
Now if we try to run the comparisons above again, but using same_coordinates
as a
comparator rather than the is operator, we can see that they are all
considered the
same:
True
True
True
Of course, if two variables refer to the same object (as is the case with p1 and p3), they
have both
shallow and deep equality.
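A possible definition of the same_coordinates comparator, given here as a sketch: it assumes only that Point objects carry x and y attributes, and it contrasts the is operator (shallow equality) with a comparison of the data (deep equality).

```python
class Point:
    def __init__(self, x=0, y=0):
        self.x = x
        self.y = y


def same_coordinates(p1, p2):
    """Deep equality: True if p1 and p2 hold the same coordinates."""
    return (p1.x == p2.x) and (p1.y == p2.y)


p1 = Point(3, 4)
p2 = Point(3, 4)   # a distinct object holding the same data
p3 = p1            # an alias of p1

shallow = (p1 is p2, p1 is p3)                               # (False, True)
deep = (same_coordinates(p1, p2), same_coordinates(p1, p3))  # (True, True)
print(shallow, deep)
```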
Beware of ==
p1 = Point(4, 2)
p2 = Point(4, 2)
l1 = [2,3]
l2 = [2,3]
This outputs:
So we conclude that even though the two lists (or tuples, etc.) are distinct
objects
with different memory addresses, for lists the == operator tests for
deep equality,
while in the case of points it makes a shallow test.
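To make the difference concrete, the sketch below compares two points and two lists with ==. A user-defined class that does not say otherwise inherits an == that behaves like is. The second class, EqPoint, is a hypothetical variant (not part of the course code) showing that defining the magic method __eq__ turns == into a deep test.

```python
class Point:
    def __init__(self, x=0, y=0):
        self.x = x
        self.y = y


class EqPoint:
    """Hypothetical variant of Point whose == compares coordinates."""

    def __init__(self, x=0, y=0):
        self.x = x
        self.y = y

    def __eq__(self, other):
        return self.x == other.x and self.y == other.y


p1 = Point(4, 2)
p2 = Point(4, 2)
l1 = [2, 3]
l2 = [2, 3]

points_equal = (p1 == p2)                       # False: default == is a shallow test
lists_equal = (l1 == l2)                        # True: lists compare their contents
eq_points_equal = (EqPoint(4, 2) == EqPoint(4, 2))  # True: __eq__ compares the data
print(points_equal, lists_equal, eq_points_equal)
```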
Copying
Aliasing (different variables referring to the same object) can make a program difficult to
read, because changes made in
one place might have unexpected effects in another
place. It is hard
to keep track of all the variables that might refer to a given object.
>>> import copy
>>> p1 = Point(3, 4)
>>> p2 = copy.copy(p1)
>>> p1 is p2
False
>>> same_coordinates(p1, p2)
True
Once we import the copy module, we can use its copy function to make
a new Point. p1
and p2 do not refer to the same object;
they are distinct point objects that
contain the same data.
p2 is a newly created object whose data is copied
from p1.
>>> b2 = copy.copy(b1)
If we create a rectangle b1 in the usual way, and then make a copy b2,
using copy, the
resulting memory diagram looks like this:
This is almost certainly not what we want. In this case, invoking
grow on one of the
Rectangle objects would not affect the other
(since the grow method only acts on the
width and height attributes
which were copied),
but invoking move on either Rectangle
object would affect the other!
That would be very weird, since the rectangles would
share their
upper-left corner but not their size attributes.
>>> b1.move(10,10)
>>> print(b2.corner)
(10,10)
In the example above, although we didn't explicitly move b2, we can see that its corner
object has changed as a side-effect of moving b1. This behavior is
confusing and error-prone.
The problem is that the shallow copy of the rectangle object has created an alias
to the Point that represents the corner, rather than making a copy of that point.
Fortunately, the copy module also contains a function named deepcopy that
copies not
only the object but also any embedded objects (recursively). It won't be
surprising to
learn that this operation is called a deep copy.
>>> b2 = copy.deepcopy(b1)
>>> b1.move(10,10)
>>> print(b1.corner)
(10,10)
>>> print(b2.corner)
(0,0)
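The difference between the two kinds of copy can be checked side by side. The sketch below uses cut-down versions of Point and Rectangle (only the methods needed here) and moves the original after making both a shallow and a deep copy.

```python
import copy


class Point:
    def __init__(self, x=0, y=0):
        self.x = x
        self.y = y

    def __str__(self):
        return "({0}, {1})".format(self.x, self.y)


class Rectangle:
    def __init__(self, pos, w, h):
        self.corner = pos
        self.width = w
        self.height = h

    def move(self, dx, dy):
        self.corner.x += dx
        self.corner.y += dy


b1 = Rectangle(Point(0, 0), 100, 200)
shallow = copy.copy(b1)        # new Rectangle, same embedded Point
deep = copy.deepcopy(b1)       # new Rectangle with its own Point

b1.move(10, 10)

print(shallow.corner is b1.corner)   # True: the corner is aliased
print(shallow.corner)                # (10, 10)
print(deep.corner)                   # (0, 0)
```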
Glossary
deep copy
To copy the contents of an object as well as any embedded
objects, and
any objects embedded in them, and so on;
implemented by the
deepcopy function in the copy module.
deep equality
Equality of values, or two references that point to (potentially
different)
objects that have the same attribute values.
shallow copy
To copy the contents of an object, including any references to
embedded
objects; implemented by the copy function in the
copy module.
shallow equality
Equality of references, or two references that point to the
same object.
string converter method
A magic method in Python (called __str__) that produces an
informal string
representation of an object. For example, this
is the string that will be printed
when calling the print function
on that object.
References
[ThinkCS] How To Think Like a Computer Scientist --- Learning with Python 3
Even more object-oriented programming
Source: this section is heavily based on the first half of Chapter 21 of [ThinkCS] though
adapted to better fit with the contents, terminology and notations of this particular
course.
Now that we've seen the basics of object-oriented programming and have
created our
own first Point and Rectangle classes, let's take things
yet a step further.
MyTime
As another example of a user-defined class, we'll define a class called MyTime
that
records the time of day. We provide an __init__ method to ensure
that every instance is
created with appropriate attributes and initialisation.
The class definition looks like this:
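class MyTime:
    def __init__(self, hrs, mins, secs):
        """
        @pre:  hrs, mins and secs are integers
        @post: a new MyTime object has been created, initialised to
               hrs hours, mins minutes and secs seconds
        """
        self.hours = hrs
        self.minutes = mins
        self.seconds = secs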
We can then create and instantiate a new MyTime object by calling the
constructor with
the necessary arguments for the initialisation method:
tim1 = MyTime(11, 59, 30)
We leave it as an exercise for the reader (please do it!) to also add a __str__
method so
that MyTime objects can print themselves decently.
For example, the object above
should print as 11:59:30.
(If you don't know how to do this, look at the Rectangle class of
the
previous chapter for inspiration.)
Pure functions
In the next few sections, we'll write two versions of a function called
add_time, which
calculates the sum of two MyTime objects. They will demonstrate
two kinds of functions:
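pure functions and modifiers.
def add_time(t1, t2):
    """
    @pre:  t1 and t2 are MyTime objects
    @post: returns a new MyTime object holding the sum of t1 and t2
    """
    h = t1.hours + t2.hours
    m = t1.minutes + t2.minutes
    s = t1.seconds + t2.seconds
    sum_t = MyTime(h, m, s)
    return sum_t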
Here is an example of how to use this function. We'll create two MyTime
objects:
current_time, which contains the current time; and bread_time,
which contains the amount
of time it takes for a breadmaker to make bread. Then
we'll use add_time to figure out
when the bread will be done.
>>> print(done_time)
12:49:30
The problem is that this function does not deal with cases where the number of
seconds
or minutes adds up to more than sixty. When that happens, we have to
carry the extra
seconds into the minutes column or the extra minutes into the
hours column.
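The sketch below shows both the working case and the failing one. The input times are illustrative values; the first pair is chosen so that the sum matches the 12:49:30 printed above, the second so that both the seconds and the minutes overflow.

```python
class MyTime:
    def __init__(self, hrs, mins, secs):
        self.hours = hrs
        self.minutes = mins
        self.seconds = secs


def add_time(t1, t2):
    """Naive pure function: adds column by column, without carrying."""
    h = t1.hours + t2.hours
    m = t1.minutes + t2.minutes
    s = t1.seconds + t2.seconds
    return MyTime(h, m, s)


done_time = add_time(MyTime(9, 14, 30), MyTime(3, 35, 0))
print(done_time.hours, done_time.minutes, done_time.seconds)   # 12 49 30

late = add_time(MyTime(9, 45, 50), MyTime(0, 30, 20))
print(late.hours, late.minutes, late.seconds)                  # 9 75 70
```

The second result should have been 10:16:10; without carrying, the minutes and seconds columns simply overflow.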
Here's an improved version of the function. (We left out its specification,
because it
would get pretty big and we will soon propose a better alternative
solution.)
def add_time(t1, t2):
    h = t1.hours + t2.hours
    m = t1.minutes + t2.minutes
    s = t1.seconds + t2.seconds
    if s >= 60:
        s -= 60
        m += 1
    if m >= 60:
        m -= 60
        h += 1
    sum_t = MyTime(h, m, s)
    return sum_t
This function is already starting to get bigger, and still doesn't work
for all possible
cases. Later we will
suggest an alternative approach that yields better code.
Modifiers
There are times when it is useful for a function to modify one or more of the
objects it
gets as parameters. Usually, the caller keeps a reference to the
objects it passes, so
any changes the function makes are visible to the caller.
Functions that work this way
are called modifiers.
For example, increment, which adds a given number of seconds to a MyTime object,
when written as a modifier, could behave like this:
>>> t = MyTime(10,20,30)
>>> increment(t,70)
>>> print(t)
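10:21:40
def increment(t, secs):
    """
    @pre:  t is a MyTime object; secs is a non-negative integer
    @post: t has been incremented by secs seconds;
           nothing is returned
    """
    t.seconds += secs
    if t.seconds >= 60:
        t.seconds -= 60
        t.minutes += 1
    if t.minutes >= 60:
        t.minutes -= 60
        t.hours += 1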
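The first line performs the basic operation; the remainder deals with the
special cases
we saw before.
Note that this function has no return statement, nor does it need to create
a new object. It
simply modifies the state of the MyTime object t that was
passed as first parameter to the
function.
A version that carries at most once per column still goes wrong when secs is
much greater than 60. Repeating each carry in a loop, until the value is back
below 60, fixes that:
def increment(t, secs):
    t.seconds += secs
    while t.seconds >= 60:
        t.seconds -= 60
        t.minutes += 1
    while t.minutes >= 60:
        t.minutes -= 60
        t.hours += 1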
This function is now correct when secs is not negative, but it is still
neither a particularly
good nor an efficient solution.
>>> t = MyTime(10,20,30)
>>> increment(t,100)
>>> print(t)
10:22:10
class MyTime:
    # ...
    def increment(self, seconds):
        self.seconds += seconds
        while self.seconds >= 60:
            self.seconds -= 60
            self.minutes += 1
        while self.minutes >= 60:
            self.minutes -= 60
            self.hours += 1
Now we can invoke increment using the dot syntax for invoking a method,
instead of
writing increment(current_time,500) :
>>> current_time.increment(500)
>>> print(current_time)
12: 6:50
The object current_time on which the method is invoked gets assigned to the first
parameter, self. The second parameter, seconds gets the value 500.
An "Aha!" moment
An "Aha!" moment is that moment or instant at which the solution to a problem suddenly
becomes clear.
Often a high-level insight into a problem can make the programming
much easier.
Our "Aha!" moment consists of the insight that a MyTime object is actually a
three-digit
number in base 60!
The "seconds" correspond to the units, the "minutes" to the sixties,
and the hours to the thirty-six hundreds.
Indeed, 12h03m30s corresponds to 12*3600 + 3*60
+ 30 = 43410 seconds.
When we were writing the add_time and increment functions and methods,
we were
effectively doing addition in base 60, which explains why we had to carry
over remaining
digits from one column to the next.
This observation suggests another approach to the entire problem --- we can
convert a
MyTime object into a single number (in base 10, representing the
seconds) and take
advantage of the fact that the computer knows how to do
arithmetic with numbers. The
following method can be added to the MyTime
class to convert any instance into a
corresponding number of seconds:
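class MyTime:
    # ...
    def to_seconds(self):
        """
        @pre:  -
        @post: returns the number of seconds represented by this MyTime object
        """
        return self.hours * 3600 + self.minutes * 60 + self.seconds
>>> print(current_time)
11:58:30
>>> seconds = current_time.to_seconds()
>>> print(seconds)
43110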
Now, all we need is a way to convert from an integer, representing the time in seconds,
back to a MyTime object.
Supposing we have tsecs seconds, some integer division and
modulus operators
can do this for us:
secs = leftoversecs % 60
You might have to think a bit to convince yourself that this technique to
convert from one
base to another is correct. Remember that the // operator
represents integer division and
that the modulus operator % calculates the
remainder of integer division.
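A worked sketch of that conversion, using the names tsecs and leftoversecs from the text. The names hrs and mins and the sample value 43110 (the number of seconds in 11:58:30) are chosen here for illustration.

```python
tsecs = 43110                 # 11:58:30 expressed in seconds

hrs = tsecs // 3600           # whole hours: 11
leftoversecs = tsecs % 3600   # seconds left once the hours are removed: 3510
mins = leftoversecs // 60     # whole minutes in what is left: 58
secs = leftoversecs % 60      # remaining seconds: 30

print(hrs, mins, secs)        # 11 58 30
```

Converting back, 11 * 3600 + 58 * 60 + 30 gives 43110 again, which is a quick way to check the technique.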
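class MyTime:
    def __init__(self, hrs, mins, secs):
        """
        In case the values of mins and secs are outside the range
        0-59, the resulting MyTime object will be normalised.
        """
        totalsecs = hrs * 3600 + mins * 60 + secs
        self.hours = totalsecs // 3600
        leftoversecs = totalsecs % 3600
        self.minutes = leftoversecs // 60
        self.seconds = leftoversecs % 60
With this normalising initialiser, add_time can be rewritten as follows:
def add_time(t1, t2):
    """ Returns a new MyTime object holding the sum of t1 and t2. """
    secs = t1.to_seconds() + t2.to_seconds()
    return MyTime(0, 0, secs)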
This version is much shorter than the original, and it is much easier to
demonstrate or
reason that it is correct. Notice that we didn't have to
do anything for carrying over
seconds or minutes that are too large; that
is handled automatically by our new initialiser
method now.
(Isn't that just wonderful?)
>>> print(done_time)
12:49:30
The final question that remains is how we can rewrite the increment
method that we
wrote before, without having to reimplement the logic that we have now
put into our new
initialiser method. The answer to this question is in the question:
what if we simply
call the __init__ method from within the increment
method, so as to reuse its logic? This
can be done surprisingly easily:
    def increment(self, secs):
        """
        @post: this MyTime object has been incremented by secs seconds;
               nothing is returned
        """
        self.__init__(self.hours, self.minutes, self.seconds + secs)
Again, the carrying over of seconds or minutes that are too large is handled
automatically by the initialiser method. It is important to observe that,
as opposed to the
add_time function, we are not creating a new MyTime object here.
We are simply calling
__init__ to assign a new state to the existing instance (self).
>>> current_time.increment(500)
>>> print(current_time)
12: 6:50
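Putting the pieces of this section together gives the following sketch. It assumes an initialiser that normalises through a total number of seconds, as described above; the variable same_object is introduced here only to show that increment changes the existing instance rather than creating a new one.

```python
class MyTime:
    def __init__(self, hrs, mins, secs):
        # Normalise by going through the total number of seconds.
        totalsecs = hrs * 3600 + mins * 60 + secs
        self.hours = totalsecs // 3600
        leftoversecs = totalsecs % 3600
        self.minutes = leftoversecs // 60
        self.seconds = leftoversecs % 60

    def to_seconds(self):
        return self.hours * 3600 + self.minutes * 60 + self.seconds

    def increment(self, secs):
        # Reuse the normalisation logic on the existing object.
        self.__init__(self.hours, self.minutes, self.seconds + secs)


current_time = MyTime(11, 58, 30)
same_object = current_time          # an alias, to check identity afterwards
current_time.increment(500)

print(current_time.hours, current_time.minutes, current_time.seconds)  # 12 6 50
print(same_object is current_time)                                     # True
```

A side benefit of this design is that a negative increment also works, as long as the total number of seconds stays non-negative.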
Generalisation
In some ways, converting from base 60 to base 10 and back is harder than just
dealing
with time. Base conversion is more abstract; our intuition for dealing
with time is better.
However, if we have the insight to treat time objects as base 60 numbers and make the
investment of writing the conversions, we get a program that is shorter,
easier to read
and debug, and more reliable.
It is also easier to add features later. For example, imagine subtracting two
MyTime
objects to find the duration between them. The naive approach would be to
implement
subtraction with borrowing. Using the conversion functions would be
easier and more
likely to be correct.
Ironically, sometimes making a problem harder (or more general) makes the
programming easier, because there are fewer special cases and fewer opportunities
for
error.
Computer scientists tend to do the opposite. We will argue that there are
many integer
operations that are simply not meaningful for dominoes, or
for days of the century. So
we'll often define new specialised types, like
MyTime, because we can restrict,
control, and specialise the operations
that are possible. Object-oriented programming
is particularly popular
because it gives us a good way to bundle methods and specialised data
into a new type. (We call such a type an abstract data type.)
class MyTime:
In case the values of mins and secs are outside the range
0-59, the resulting MyTime object will be normalised,
"""
self.minutes = leftoversecs // 60
self.seconds = leftoversecs % 60
def __str__(self) :
"""
@pre: -
"""
"""
nothing is returned
"""
self.__init__(self.hours,self.minutes,self.seconds+secs)
def to_seconds(self):
"""
@pre: -
"""
Binary operations
We will now add a few more interesting methods to this class. Let us start with
an after
method, which compares two times and tells us whether the first
time is strictly after the
second, e.g.
>>> t1 = MyTime(10, 55, 12)
True
if current_time.after(done_time):
We can almost read the invocation like English: If the current time is after the
done time,
then...
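To implement this method, we can again use our "Aha!" insight of the previous chapter
and reduce both times to seconds, which yields a very compact method definition:
class MyTime:
    # ...
    def after(self, other):
        """ Returns True if this time is strictly after other, False otherwise. """
        return self.to_seconds() > other.to_seconds()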
This is a great way to code this: if we want to tell if the first time is
after the second time,
turn them both into integers and compare the integers.
Operator overloading
Some languages, including Python, make it possible to have different meanings for
the
same operator when applied to different types. For example, + in Python
means quite
different things for integers and for strings. This feature is called
operator overloading.
It is especially useful when programmers can also overload the operators for their
own
user-defined types.
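class MyTime:
    # ...
    def __add__(self, other):
        """
        @post: returns a new MyTime object containing the sum of the
               seconds of self and other
        """
        return MyTime(0, 0, self.to_seconds() + other.to_seconds())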
As usual, the first parameter self is the MyTime object on which the method is invoked.
The second parameter is conveniently named other to distinguish it from
self. To add two
MyTime objects, we create and return a new MyTime object
that contains their sum in
seconds.
(Remember from the previous chapter that the __init__ method normalises
MyTime objects
by converting their value in seconds to hours, minutes and seconds.)
Now, when we apply the + operator to MyTime objects, Python magically invokes
the
__add__ method that we have written:
>>> t3 = t1 + t2
>>> print(t3)
5: 6:12
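A self-contained sketch of the two binary operations discussed so far, after and __add__, both built on to_seconds. The two input times are illustrative values chosen so that their sum is 5 hours, 6 minutes and 12 seconds, as in the output above.

```python
class MyTime:
    def __init__(self, hrs, mins, secs):
        totalsecs = hrs * 3600 + mins * 60 + secs
        self.hours = totalsecs // 3600
        leftoversecs = totalsecs % 3600
        self.minutes = leftoversecs // 60
        self.seconds = leftoversecs % 60

    def to_seconds(self):
        return self.hours * 3600 + self.minutes * 60 + self.seconds

    def after(self, other):
        """True if this time is strictly after other."""
        return self.to_seconds() > other.to_seconds()

    def __add__(self, other):
        """Invoked by Python for the expression self + other."""
        return MyTime(0, 0, self.to_seconds() + other.to_seconds())


t1 = MyTime(1, 15, 42)
t2 = MyTime(3, 50, 30)
t3 = t1 + t2                                  # same as t1.__add__(t2)

print(t3.hours, t3.minutes, t3.seconds)       # 5 6 12
print(t3.after(t1), t1.after(t1))             # True False
```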
For the next couple of exercises we'll go back to the Point class defined
when we first
introduced objects (in chapter Classes and Objects – the Basics),
and overload some of
its operators. Firstly, adding
two points adds their respective (x, y) coordinates:
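class Point:
    # ...
    def __add__(self, other):
        """ Returns a new Point whose coordinates are the sums of the coordinates of this point and other. """
        return Point(self.x + other.x, self.y + other.y)
>>> p = Point(3, 4)
>>> q = Point(5, 7)
>>> r = p + q
>>> print(r)
(8, 11)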
"""
"""
"""
"""
The result is a new Point whose coordinates are a multiple of the original
coordinates. If
other is a type that cannot be multiplied by a
floating-point number, then __rmul__ will
yield an error.
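The two multiplication methods discussed here can be sketched as follows, assuming that __mul__ computes the dot product of two points and that __rmul__ performs scalar multiplication when the point is the right operand. The Point class is reduced to what the example needs.

```python
class Point:
    def __init__(self, x=0, y=0):
        self.x = x
        self.y = y

    def __str__(self):
        return "({0}, {1})".format(self.x, self.y)

    def __mul__(self, other):
        """Dot product; other is assumed to be a Point."""
        return self.x * other.x + self.y * other.y

    def __rmul__(self, other):
        """Scalar multiplication, used for `other * self` when the left
        operand does not know how to multiply itself by a Point."""
        return Point(other * self.x, other * self.y)


p1 = Point(3, 4)
p2 = Point(5, 7)

dot = p1 * p2        # calls p1.__mul__(p2)
scaled = 2 * p2      # int cannot handle a Point, so p2.__rmul__(2) is called

print(dot)           # 43
print(scaled)        # (10, 14)
```

With these definitions the order of the operands matters: p2 * 2 calls __mul__ with a number as other, which fails because a number has no attribute x.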
>>> p1 = Point(3, 4)
>>> p2 = Point(5, 7)
>>> print(p1 * p2)
43
>>> print(2 * p2)
(10, 14)
>>> print(p2 * 2)
AttributeError: 'int' object has no attribute 'x'
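If you wonder whether we could avoid this error and make __mul__ work as well
when the
second argument is a number, the answer is yes:
    def __mul__(self, other):
        """ Dot product if other is a Point, scalar multiplication otherwise. """
        if type(other) is Point:
            return self.x * other.x + self.y * other.y
        else:
            return Point(other * self.x, other * self.y)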
Polymorphism
Most of the methods we have written so far only work for a specific type. When we
create a new object, we write methods that operate on that type. But there are certain
operations that we may want to apply to many types,
such as the arithmetic operators +
and * in the previous section. If many types
support the same set of operations, we can
write functions that work on any of
those types.
For example, the multadd operation (which is common in linear algebra)
takes three
parameters; it multiplies the first two and then adds the third. We
can write it in Python
like this:
def multadd(x, y, z):
    return x * y + z
This function will work for any values of x and y that can be multiplied
and for any value
of z that can be added to the product.
>>> multadd(3, 2, 1)
7
>>> p1 = Point(3, 4)
>>> p2 = Point(5, 7)
>>> print(multadd(2, p1, p2))
(11, 15)
>>> print(multadd(p1, p2, 1))
44
In the first case, the Point p1 is multiplied by a scalar 2 and then added to
another Point
p2. In the second case, the dot product of p1 and p2 yields a numeric value,
so the third
parameter also has to be a numeric value.
Functions like +, * and multadd that can work with arguments of different types are called
polymorphic. In object-oriented programming, polymorphism (from the Greek meaning
"having multiple forms") is the characteristic of being able to assign a different meaning
or usage to something in different contexts. In this case, what varies is the
types of the arguments taken by the function.
def front_and_back(front):
    import copy
    back = copy.copy(front)
    back.reverse()
    print(str(front) + str(back))
Because the reverse method is a modifier, we first make a copy of the list before
reversing it. That way, this function doesn't modify the list it gets as a
parameter.
>>> my_list = [1, 2, 3, 4]
>>> front_and_back(my_list)
[1, 2, 3, 4][4, 3, 2, 1]
Since copy works on any object, and we have already written a __str__ method
for Point
objects, all we need to add is a reverse method to the Point class,
which we define as a
method that swaps the values of the x and y attributes of a point:
    def reverse(self):
        """
        @pre:  -
        @post: the x and y coordinates of this point have been swapped
        """
        (self.x, self.y) = (self.y, self.x)
After this, we can try to pass Point objects to the front_and_back function:
>>> p = Point(3, 4)
>>> front_and_back(p)
(3, 4)(4, 3)
The most interesting polymorphism is often the unintentional kind, where we discover
that a function which we have already written can be applied to a type for which we
never planned it.
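The whole section can be replayed with one small program. This is a sketch under two assumptions: the Point class combines the operator methods of the previous section with the reverse method above, and front_and_back returns its string instead of printing it (a small deviation, made so that the result can be checked).

```python
import copy


class Point:
    def __init__(self, x=0, y=0):
        self.x = x
        self.y = y

    def __str__(self):
        return "({0}, {1})".format(self.x, self.y)

    def __add__(self, other):
        return Point(self.x + other.x, self.y + other.y)

    def __mul__(self, other):
        if type(other) is Point:
            return self.x * other.x + self.y * other.y      # dot product
        return Point(other * self.x, other * self.y)        # scalar product

    def __rmul__(self, other):
        return Point(other * self.x, other * self.y)

    def reverse(self):
        (self.x, self.y) = (self.y, self.x)


def multadd(x, y, z):
    return x * y + z


def front_and_back(front):
    back = copy.copy(front)      # copy first: reverse is a modifier
    back.reverse()
    return str(front) + str(back)


p1 = Point(3, 4)
p2 = Point(5, 7)
print(multadd(3, 2, 1))              # 7
print(multadd(2, p1, p2))            # (11, 15)
print(multadd(p1, p2, 1))            # 44
print(front_and_back([1, 2, 3, 4]))  # [1, 2, 3, 4][4, 3, 2, 1]
print(front_and_back(p1))            # (3, 4)(4, 3)
```

Neither multadd nor front_and_back mentions a type: they work for any argument that supports the operations used in their bodies.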
Glossary
dot product
An operation defined in linear algebra that multiplies two
points
and yields a numeric value.
duck typing
If all of the operations on arg inside the body of a function
f(arg)
can be applied to a given type, then the function can be
applied to
an argument arg of that type.
operator overloading
Extending built-in operators ( +, -, *, >, <, etc.)
so that they do
different things for different types of arguments. We've
seen
earlier how + is overloaded for numbers and strings,
and
here we've shown how to further overload it for user-defined
types
using magic methods.
polymorphic
A function that can operate on more than one type. Notice
the subtle
distinction: overloading has different functions (all
with the same name)
for different types, whereas a
polymorphic function is a single function
that can work for a
range of types.
scalar multiplication
An operation defined in linear algebra that multiplies each of
the
coordinates of a Point by a numeric value.
References
[ThinkCS] How To Think Like a Computer Scientist --- Learning with Python 3
Collections of objects
Source: this section is heavily based on Chapter 22 of [ThinkCS] though adapted to
better fit with the contents, terminology and notations of this particular course.
Composition
By now, we have seen several examples of composition. One
example is using a
method invocation as part of an expression. Another
example is the nested structure of
statements: we can put an if statement
within a while loop, within another if statement,
and so on.
Having seen this pattern, and having learned about lists and objects, we
should not be
surprised to learn that we can create lists of objects. We can
also create objects that
contain lists (as attributes); we can create lists
that contain lists; we can create objects
that contain objects; and so on.
In this chapter and the next, we will look at some examples of these
combinations, using
Card objects as an example.
Card objects
If you are not familiar with common playing cards, now would be a good time to
get a
deck, or else this chapter might not make much sense. There are
fifty-two cards in a
deck, each of which belongs to one of four suits and one
of thirteen ranks. The suits are
Spades ♠︎, Hearts ♥︎, Diamonds ♦︎, and Clubs ♣︎(in
descending order in the bridge game).
The ranks are Ace (1), 2, 3, 4, 5, 6, 7, 8, 9, 10,
Jack, Queen, and King. Depending on
the game that we are playing, the rank of
Ace may be higher than King or lower than 2.
The rank is sometimes called the face-value of the card.
If we want to define a new object to represent a playing card, it is obvious
what its
attributes should be: rank and suit. It is not as obvious what
type these attributes should
have. One possibility is to use strings containing
words like "Spade" for suits and "Queen"
for ranks. One problem with
this implementation is that it would not be easy to compare
cards to see which
had a higher rank or suit.
Spades <--> 3
Hearts <--> 2
Diamonds <--> 1
Clubs <--> 0
An obvious feature of this mapping is that the suits map to integers in order,
so we can
compare suits by comparing integers. The mapping for ranks is fairly
obvious; each of
the numerical ranks maps to the corresponding integer (and Ace to 1), and for
face
cards:
Jack <--> 11
Queen <--> 12
King <--> 13
Using such an encoding of suits and ranks as integers, the class definition for
the Card
type looks like this:
class Card:
    def __init__(self, suit, rank):
        self.suit = suit
        self.rank = rank
To create some objects, representing say the 3 of Clubs (0) and the Jack (11) of
Diamonds (1), use these commands:
three_of_clubs = Card(0, 3)
card1 = Card(1, 11) # Jack of Diamonds
In the first case above, the first argument, 0, represents the suit Clubs.
In the second
case above, the second argument, 11, represents the Jack.
In the next chapter we will assume that we have saved the Card class,
and the upcoming Deck class, in a file called Cards.py.
Class attributes
In order to print Card objects in a way that people can easily read, we
want to map the
integer codes back onto words. A natural way to do that is with
lists of strings. We
assign these lists to class attributes (or class variables)
at the top of the class
definition:
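class Card:
    suits = ["Clubs", "Diamonds", "Hearts", "Spades"]
    ranks = ["narf", "Ace", "2", "3", "4", "5", "6", "7",
             "8", "9", "10", "Jack", "Queen", "King"]

    def __init__(self, suit, rank):
        self.suit = suit
        self.rank = rank

    def __str__(self):
        return Card.ranks[self.rank] + " of " + Card.suits[self.suit]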
A class attribute or class variable is defined outside of any method, and it can be
accessed from
any of the methods in the class.
To access a class attribute, you have to
use the dot notation.
Card.ranks refers to the class attribute ranks defined in the class
Card.
Inside __str__, we can use the suits and ranks list to map the numerical
values of suit and
rank to strings. For example, the expression
Card.suits[self.suit] means: use the instance
variable suit from the object
self as an index into the class attribute named suits of the
class Card, and select the
corresponding string.
The reason for the "narf" value (which is an acronym for "not a real face-value")
as the
first element in ranks is to act as a
place keeper for the zero-eth element of the list, which
will never be used.
The only valid ranks are 1 to 13. This wasted item is not entirely
necessary.
We could have started at 0, by putting rank 1 at position 0 in the list, and so
on,
but it is much less confusing to encode the rank 2 as integer 2, 3 as 3, and so on.
With the methods we have so far, we can create and print cards:
>>> print(card1)
Jack of Diamonds
We can access a class variable directly via its class, like we did before:
>>> print(Card.suits[1])
Diamonds
Again, it can be useful to draw a memory diagram like the one below to clearly
understand that instance variables are stored in the instance and that class
variables
are part of the class definition:
>>> print(card1.suits[1])
Diamonds
def __str__(self):
>>> print(card2)
3 of Diamonds
>>> print(card2.suits[1])
Diamonds
However, because every Card instance references the same class attribute, we have an
aliasing situation. The disadvantage is that if we modify a class attribute,
this modification affects every instance of that class.
For example, if we decide that
Jack of Diamonds should
really be called Jack of Swirly Whales, we could do this:
>>> print(card1)
The problem is that all of the Diamonds just became Swirly Whales:
>>> print(card2)
3 of Swirly Whales
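The aliasing can be reproduced with the sketch below. It assumes the suits and ranks lists described earlier (Clubs at index 0 up to Spades at index 3, and "narf" as placeholder at index 0 of ranks), and it assumes that the renaming is done by assigning through an instance, which is one way to trigger the effect.

```python
class Card:
    # Class attributes: one list of each, shared by all Card instances.
    suits = ["Clubs", "Diamonds", "Hearts", "Spades"]
    ranks = ["narf", "Ace", "2", "3", "4", "5", "6", "7",
             "8", "9", "10", "Jack", "Queen", "King"]

    def __init__(self, suit, rank):
        self.suit = suit
        self.rank = rank

    def __str__(self):
        return Card.ranks[self.rank] + " of " + Card.suits[self.suit]


card1 = Card(1, 11)    # Jack of Diamonds
card2 = Card(1, 3)     # 3 of Diamonds

before = str(card1)                          # "Jack of Diamonds"
shared = card1.suits is Card.suits and card2.suits is Card.suits   # True

card1.suits[1] = "Swirly Whales"             # modifies the one shared list

after_change = (str(card1), str(card2))
print(before, shared, after_change)
```

Accessing suits through an instance finds the class attribute, so the assignment changes the single list that every card uses.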
We'd like to start with a single method named cmp that captures the logic of ordering.
By
convention, a comparison method takes two parameters, self and other,
and returns 1 if
the first object is greater, -1 if the second object is greater,
and 0 if they are equal to
each other.
Some types are completely ordered, which means that we can compare any two
elements and tell which is bigger. For example, the integers and the
floating-point
numbers are completely ordered. Some types are unordered, which
means that there is
no meaningful way to say that one element is bigger than
another. For example, the
fruits are unordered, which is why we cannot compare
apples and oranges, and we
cannot meaningfully order a collection of images,
or a collection of cellphones.
return 0
Note that in this ordering, Aces (1) appear lower than Deuces (2).
    def __eq__(self, other):
        # equality
        return self.cmp(other) == 0

    def __ne__(self, other):
        # not equal
        return self.cmp(other) != 0
With this machinery in place, the relational operators now work as we'd like them to:
False
True
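A sketch of how cmp and the relational operators can fit together. The ordering used here, suits first and ranks only to break ties, is an assumption made for the example; other games may need a different rule. The three cards are illustrative.

```python
class Card:
    def __init__(self, suit, rank):
        self.suit = suit
        self.rank = rank

    def cmp(self, other):
        # Assumed ordering: compare suits first, then ranks.
        if self.suit > other.suit:
            return 1
        if self.suit < other.suit:
            return -1
        if self.rank > other.rank:
            return 1
        if self.rank < other.rank:
            return -1
        return 0                     # same suit and same rank

    # Each relational operator delegates to cmp.
    def __eq__(self, other):
        return self.cmp(other) == 0

    def __ne__(self, other):
        return self.cmp(other) != 0

    def __lt__(self, other):
        return self.cmp(other) < 0

    def __le__(self, other):
        return self.cmp(other) <= 0

    def __gt__(self, other):
        return self.cmp(other) > 0

    def __ge__(self, other):
        return self.cmp(other) >= 0


card1 = Card(1, 11)    # Jack of Diamonds
card2 = Card(1, 3)     # 3 of Diamonds
card3 = Card(1, 11)    # another Jack of Diamonds

print(card1 < card2)               # False
print(card1 == card3)              # True
print(Card(0, 13) < Card(1, 1))    # True: any Club is below any Diamond
```

Keeping the ordering logic in a single method means that a change of rule only has to be made in one place.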
Decks
Now that we have objects to represent Cards, the next logical step is to
define a class to
represent a Deck. Of course, a deck is made up of cards,
so each Deck object will
contain a list of cards as an attribute. Some card
games will need at least two different
decks --- a red deck and a blue deck.
The following is a class definition for the Deck class. The initialisation
method creates
the attribute cards and generates the standard pack of
fifty-two cards:
class Deck:
    def __init__(self):
        self.cards = []
        for suit in range(4):
            for rank in range(1, 14):
                self.cards.append(Card(suit, rank))
The easiest way to populate the deck is with a nested loop. The outer loop
enumerates
the suits from 0 to 3. The inner loop enumerates the ranks from 1 to
13. (Remember that
range(m, n) generates integers from m up to, but not including, n.) Since the outer loop
iterates four times, and the inner loop iterates
thirteen times, the total number of times
the body is executed is 52
(13 * 4). Each iteration creates a new instance of Card with
the current suit and rank, and appends that card to the cards list.
(Remember that
whenever the Card constructor method is invoked
a new instance of class Card is
created.)
red_deck = Deck()
blue_deck = Deck()
class Deck:
    ...
    def print_deck(self):
        for card in self.cards:
            print(card)
Here, and from now on, the ellipsis (...) indicates that we have omitted
the other methods
in the class.
>>> red_deck.print_deck()
class Deck:
    ...
    def __str__(self):
        s, spaces = "", ""
        for c in self.cards:
            s = s + spaces + str(c) + "\n"
            spaces = spaces + " "
        return s
Second, instead of using the print function to print the cards, we use the
str function to
get their string representation.
Passing an object as an argument to str, i.e. str(c), is
equivalent to
invoking the __str__ method on the object, i.e. c.__str__().
Thirdly, we are using the variables s and spaces as accumulators. Initially,
s and spaces
are empty strings. Each time through the loop, a new string is
generated and
concatenated to the old value of s to get the new value.
Similarly, each time through the
loop a single space is added to spaces
to increase the indentation level.
When the loop
ends, s finally contains the complete string representation of the
Deck, which looks like
this:
>>> print(red_deck)
Ace of Clubs
2 of Clubs
3 of Clubs
4 of Clubs
5 of Clubs
6 of Clubs
7 of Clubs
8 of Clubs
9 of Clubs
10 of Clubs
Jack of Clubs
Queen of Clubs
King of Clubs
Ace of Diamonds
2 of Diamonds
...
And so on. Even though the result appears on 52 lines, it is one long string
that contains
newlines.
rng.randrange(0, len(self.cards))
An easy way to shuffle the deck is by traversing the cards and swapping each
card with
a randomly chosen one. It is possible that the card will be swapped
with itself, but that is
fine. In fact, if we precluded that possibility, the
order of the cards would be less than
entirely random:
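class Deck:
    ...
    def shuffle(self):
        import random
        rng = random.Random()        # create a random number generator
        num_cards = len(self.cards)
        for i in range(num_cards):
            j = rng.randrange(i, num_cards)
            (self.cards[i], self.cards[j]) = (self.cards[j], self.cards[i])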
>>> red_deck.shuffle()
>>> print(red_deck)
Rather than assuming that there are fifty-two cards in the deck, we get the
actual length
of the list and store it in num_cards. This avoids having
hardcoded numbers in the code,
so that the algorithm is more generic and can
be reused easily for other sizes of decks
(such as those used for the blackjack
card game).
While this is a good shuffling method, a random number generator object also
has a
shuffle method that can shuffle elements in a list, in place.
So we could rewrite this
function to use the one provided for us:
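class Deck:
    ...
    def shuffle(self):
        import random
        rng = random.Random()        # create a random number generator
        rng.shuffle(self.cards)      # use its shuffle method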
class Deck:
    ...
    def remove(self, card):
        if card in self.cards:
            self.cards.remove(card)
            return True
        else:
            return False
To deal cards, we want to remove and return the top card. The list method
pop provides
a convenient way to do that:
class Deck:
    ...
    def pop(self):
        return self.cards.pop()
Actually, pop removes the last card in the list, so we are in fact
dealing from the bottom
of the deck.
One more operation that we are likely to want is the Boolean function
is_empty, which
returns True if the deck contains no more cards:
class Deck:
    ...
    def is_empty(self):
        return self.cards == []
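The Deck methods of this chapter can be exercised together as in the sketch below. The Card class is reduced to its attributes plus an __eq__ method, which is an assumption needed for remove: the test card in self.cards compares cards by value only if == has been overloaded.

```python
import random


class Card:
    def __init__(self, suit, rank):
        self.suit = suit
        self.rank = rank

    def __eq__(self, other):
        # Lets `card in self.cards` compare values instead of identities.
        return self.suit == other.suit and self.rank == other.rank


class Deck:
    def __init__(self):
        self.cards = []
        for suit in range(4):
            for rank in range(1, 14):
                self.cards.append(Card(suit, rank))

    def shuffle(self):
        rng = random.Random()
        num_cards = len(self.cards)
        for i in range(num_cards):
            j = rng.randrange(i, num_cards)      # j may be equal to i
            self.cards[i], self.cards[j] = self.cards[j], self.cards[i]

    def remove(self, card):
        if card in self.cards:
            self.cards.remove(card)
            return True
        return False

    def pop(self):
        return self.cards.pop()

    def is_empty(self):
        return self.cards == []


deck = Deck()
size_before = len(deck.cards)            # 52
deck.shuffle()
first_try = deck.remove(Card(3, 1))      # True: the Ace of Spades was there
second_try = deck.remove(Card(3, 1))     # False: it has already been removed
dealt = 0
while not deck.is_empty():
    deck.pop()
    dealt += 1
print(size_before, first_try, second_try, dealt)   # 52 True False 51
```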
Glossary
accumulator
A variable used in a loop to accumulate a series of values,
such as by
concatenating them onto a string or adding them
to a running sum.
class attribute
A variable that is defined inside a class definition but outside
any
method. Class attributes are accessible from any method
in the class
and are shared by all instances of the class.
class variable
synonym for class attribute
encode
To represent one type of value using another type of value by
constructing a mapping between them.
magic methods for relational operators
__eq__ (equal) overloads the == operator
__le__ (less than or equal) overloads the <= operator
__ge__ (greater than or equal) overloads the >= operator
__lt__ (less than) overloads the < operator
__gt__ (greater than) overloads the > operator
__ne__ (not equal) overloads the != operator
References
[ThinkCS] How To Think Like a Computer Scientist --- Learning with Python 3
Inheritance
Source: this section is heavily based on Chapter 23 of [ThinkCS] though adapted to
better fit with the contents, terminology and notations of this particular course.
Inheritance
The language feature most often associated with object-oriented programming is
inheritance. Inheritance is the ability to define a new class that is a
modified version of
an existing class.
The primary advantage of this feature is that you can add new methods to a
class
without modifying the existing class. It is called inheritance because
the new class
inherits all of the methods of the existing class. Extending this
metaphor, the existing
class is sometimes called the parent class. The new
class is called the child class or
sometimes subclass.
On the other hand, inheritance can sometimes make programs difficult to read.
When a
method is invoked, it is sometimes not clear where to find its definition,
since the
relevant code may be scattered among several classes.
If the natural structure of a
problem does not lend itself to
inheritance, sometimes a more elegant solution without
using inheritance
is more appropriate. In general, as a computer scientist it is good to
know a few different programming paradigms so that you can always choose the one
that is most suited for the problem at hand.
We add the code in this chapter to our Cards.py file from the previous chapter.
In the
class definition, the name of the parent class appears in parentheses:
class Hand(Deck):
    pass
This statement indicates that the new Hand class inherits from the existing
Deck class.
Such an empty child class would provide exactly the same behaviour as its super class.
(In other words, instances of the child class understand exactly the same methods as
instances of the super class.) This is not very useful, unless we add a few additional
methods and instance variables.
class Hand(Deck):
    def __init__(self, name=""):
        self.cards = []
        self.name = name
For just about any card game, it is necessary to add and remove cards from a
hand.
Removing cards is already taken care of, since Hand inherits
remove from Deck. (In other
words, since the super or parent class Deck
already implements the method remove, any
instance of class Hand
will automatically understand that method as well.) But we still
have to implement an add method:
class Hand(Deck):
    ...
    def add(self, card):
        self.cards.append(card)
Again, the ellipsis ... indicates that we have omitted other methods. The list
append
method adds the new card to the end of the list of cards held in the hand.
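To see inherited and newly added methods side by side, here is a small self-contained sketch. It is deliberately simplified: this Deck takes its cards as a constructor argument and plain strings stand in for Card objects, both of which are assumptions made only to keep the example short.

```python
class Deck:
    def __init__(self, cards):
        self.cards = list(cards)

    def remove(self, card):
        # remove the card if present and report whether it was found
        if card in self.cards:
            self.cards.remove(card)
            return True
        else:
            return False

class Hand(Deck):
    def __init__(self, name=""):
        self.cards = []            # a hand starts out empty
        self.name = name

    def add(self, card):
        self.cards.append(card)    # new behaviour, defined only in Hand

hand = Hand("kim")
hand.add("Ace of Hearts")
hand.add("2 of Spades")
removed = hand.remove("Ace of Hearts")   # remove is inherited from Deck
```

The Hand class never defines remove, yet the call works because Python looks the method up in the parent class when the child class does not provide it.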
Dealing cards
Now that we have a Hand class, we want to deal cards from the Deck into
hands. It is not
immediately obvious whether this method should go in the
Hand class or in the Deck
class, but since it operates on a single deck
and (possibly) several hands, it is more
natural to put it in Deck.
deal should be fairly general, since different games will have different
requirements. We
may want to deal out the entire deck at once or add one card
to each hand.
deal takes two parameters, a list (or tuple) of hands and the total number
of cards to
deal. If there are not enough cards in the deck, the method deals
out all of the cards and
stops:
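class Deck:
    ...
    def deal(self, hands, num_cards=999):
        num_hands = len(hands)
        for i in range(num_cards):
            if self.is_empty():
                break                    # stop when the deck runs out of cards
            card = self.pop()            # take a card from the deck
            hand = hands[i % num_hands]  # whose turn is next?
            hand.add(card)               # add the card to that hand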
The loop variable i goes from 0 to num_cards-1. Each time through the
loop, a card is
removed from the deck using the list method pop, which
removes and returns the last
item in the list.
The modulus operator (%) allows us to deal cards in a round robin (one
card at a time to
each hand). When i is equal to the number of hands in the
list, the expression i %
num_hands wraps around to the beginning of the list
(index 0).
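The round robin effect of the modulus operator is easiest to see with small numbers. The sketch below is a standalone illustration and not the deal method itself: the function name deal_round_robin is hypothetical, integers stand in for cards and plain lists stand in for hands.

```python
def deal_round_robin(cards, num_hands, num_cards):
    """Deal up to num_cards cards from the end of the list cards,
    one at a time, to num_hands hands in turn."""
    hands = [[] for _ in range(num_hands)]
    for i in range(num_cards):
        if not cards:                      # out of cards: stop dealing
            break
        card = cards.pop()                 # removes and returns the last item
        hands[i % num_hands].append(card)  # i % num_hands cycles 0, 1, 2, 0, 1, ...
    return hands

# Ten cards, three hands, and a request for more cards than are available.
hands = deal_round_robin(list(range(10)), 3, 999)
# hands is now [[9, 6, 3, 0], [8, 5, 2], [7, 4, 1]]
```

The first hand receives one card more than the others because ten cards cannot be divided evenly over three hands.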
Printing a Hand
To print the contents of a hand, we can take advantage of the
__str__ method inherited
from Deck. For example:
>>> deck = Deck()
>>> deck.shuffle()
>>> deck.deal([hand], 5)
>>> print(hand)
2 of Spades
3 of Spades
4 of Spades
Ace of Hearts
9 of Clubs
class Hand(Deck):
    ...
    def __str__(self):
        s = "Hand " + self.name
        if self.is_empty():
            s += " is empty\n"
            return s
        else:
            s += " contains\n"
            return s + Deck.__str__(self)
Otherwise, the program appends the word contains and the string
representation of the
Deck, computed by invoking the __str__ method in
the Deck class on self.
>>> deck.shuffle()
>>> deck.deal([hand], 5)
>>> print(hand)
2 of Spades
3 of Spades
4 of Spades
Ace of Hearts
9 of Clubs
class Hand(Deck):
    ...
    def __str__(self):
        s = "Hand " + self.name
        if self.is_empty():
            s += " is empty\n"
            return s
        else:
            s += " contains\n"
            return s + super().__str__()
The only change with respect to the previous implementation is the last line.
Rather than referring to the super class Deck explicitly, the built-in super()
function allows us to refer to that super class implicitly. Also note that we no
longer have to pass self as an argument when making such a super call.
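The two ways of calling the parent's __str__ can be compared directly. In this sketch the class names HandExplicit and HandImplicit are hypothetical, the Deck class is a stripped-down stand-in, and strings are used in place of Card objects.

```python
class Deck:
    def __init__(self):
        self.cards = []

    def is_empty(self):
        return self.cards == []

    def __str__(self):
        s = ""
        for card in self.cards:
            s += str(card) + "\n"
        return s

class HandExplicit(Deck):
    def __init__(self, name=""):
        super().__init__()
        self.name = name

    def __str__(self):
        s = "Hand " + self.name
        if self.is_empty():
            return s + " is empty\n"
        # explicit reference to the parent class: self must be passed
        return s + " contains\n" + Deck.__str__(self)

class HandImplicit(Deck):
    def __init__(self, name=""):
        super().__init__()
        self.name = name

    def __str__(self):
        s = "Hand " + self.name
        if self.is_empty():
            return s + " is empty\n"
        # implicit reference through super(): self is passed automatically
        return s + " contains\n" + super().__str__()

a = HandExplicit("kim")
b = HandImplicit("kim")
a.cards.append("2 of Spades")
b.cards.append("2 of Spades")
```

Both hands produce the same string. The super() variant has the advantage that the name of the parent class appears only once, in the class header.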
class CardGame:
    def __init__(self):
        self.deck = Deck()
        self.deck.shuffle()
This is the first case we have seen where the initialisation method performs a
significant
computation, beyond initialising attributes. For more complex classes,
like this one, that
will often be the case. (As a side note, the initialisation
method of a subclass will also
often refine the initialisation
method of its parent class using a super call. That is not the
case here since
CardGame is not a subclass.)
To implement specific games, we can inherit from CardGame and add features
for the
new game. As an example, we'll write a simulation for the [OldMaid] card game.
The object of Old Maid is to get rid, as soon as possible, of all the cards in your hand.
You do this by matching cards by rank and colour.
For example, the 4 of Clubs ♣︎
matches the 4 of
Spades ♠︎ since they have the same rank (4) and both suits (♣︎,♠︎) are
black.
The Jack of Hearts ♥︎ matches the Jack of Diamonds ♦︎ since both Jacks are of the
red colour.
Before starting the game, the Queen of Clubs is removed from the deck.
(Other variants
of the [OldMaid] game exist where the card removed
from the deck is another one, but
that doesn't change the essence of the game.)
As a consequence of having removed the
Queen of Clubs, its corresponding card,
the Queen of Spades, will never be matched
during the game. The player who
remains with this card, the old maid, at the end of the
game, loses the game.
The 51 remaining cards are now dealt to the players in a round robin fashion.
After the
deal, all players can discard all matching pairs of cards they have in
their hand.
When no more matches can be made, the actual play begins. In turn, each player
picks
a card (without looking) from his closest neighbor to the left who still has
cards. If the
chosen card matches a card in the player's own hand, he can discard
this pair from his
hand. Otherwise, the chosen card is added to the player's hand.
Eventually, as the game
continues, all possible matches are made, except for the
Queen of Spades (for which no
match exists, as the Queen of Clubs was removed from
the deck before starting the
game). The player who remains with the Queen of Spades
in his hand loses the game.
(This game is particular in the sense that it has a
unique loser, not a winner.)
In our computer simulation of the game, the computer will play all hands.
Unfortunately,
some funny nuances of the real game are lost. In a real game, the
player with the Old
Maid goes to some effort to get their closest neighbor to pick that
card, by displaying it a
little more prominently, or perhaps failing to display
it more prominently, or even failing to
fail to display that card more
prominently. The computer simply picks a neighbor's card
at random.
OldMaidHand class
A hand for playing the Old Maid game requires some abilities beyond the general
abilities of a Hand, such as the ability to remove matching cards from the hand.
We will
therefore define a new class, OldMaidHand, that inherits from Hand
to reuse its
functionality, and provides an additional method called
remove_matches:
class OldMaidHand(Hand):
    def remove_matches(self):
        count = 0
        original_cards = self.cards.copy()
        for i in range(0, len(original_cards)):
            card = original_cards[i]
            for j in range(i+1, len(original_cards)):
                match = original_cards[j]
                # a match has the same rank and the other suit of the same colour
                if match.rank == card.rank and match.suit == 3 - card.suit:
                    self.cards.remove(card)
                    self.cards.remove(match)
                    count += 1
                    break
        return count
We start by making a copy of the list of cards, so that we can traverse the
copy while
removing cards from the original. Since self.cards will be modified
in the loop, we don't
want to use it to control the traversal. Python (or any other
programming language, for
that matter) can get quite confused if it is traversing
a list that is changing while being
traversed!
For each card in our hand (outer loop), we iterate over all the remaining cards in our
hand
(inner loop) to check whether they match that card. In the inner loop, we are smart
and only
consider cards after the current card being compared, since all the ones before
have already
been compared.
We have a match if the match has the same rank and the other suit of the same
color.
Conveniently, the expression 3 - card.suit turns a Club ♣︎ (suit 0) into a Spade ♠︎
(suit 3)
and a Diamond ♦︎ (suit 1) into a Heart ♥︎ (suit 2). You should satisfy
yourself that the
opposite operations also work. This clever trick works
because of how we encoded suits
as numbers. A clever encoding
often may make certain operations surprisingly easy.
Whenever we find a match, we remove both the card and its match
from our hand, and
jump out of the inner loop, since no other matches
for this card will be found.
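The 3 - suit trick can be checked in isolation. The sketch below is not the OldMaidHand method: it assumes cards are represented as plain (suit, rank) tuples, with suits encoded as in the text (0 Clubs, 1 Diamonds, 2 Hearts, 3 Spades), and the function names are hypothetical.

```python
def other_suit_of_same_colour(suit):
    # 0 (Clubs) <-> 3 (Spades) are black, 1 (Diamonds) <-> 2 (Hearts) are red
    return 3 - suit

def remove_matches(cards):
    """Return (remaining_cards, number_of_matches) for a list of
    (suit, rank) tuples, without modifying the list passed in."""
    remaining = list(cards)          # work on a copy, traverse the original
    count = 0
    for (suit, rank) in cards:
        match = (other_suit_of_same_colour(suit), rank)
        # the card itself may already have been removed as someone's match
        if (suit, rank) in remaining and match in remaining:
            remaining.remove((suit, rank))
            remaining.remove(match)
            count += 1
    return remaining, count

# 6 of Hearts, 6 of Diamonds, 9 of Clubs, 3 of Hearts, 3 of Spades
hand = [(2, 6), (1, 6), (0, 9), (2, 3), (3, 3)]
remaining, count = remove_matches(hand)
# remaining is [(0, 9), (2, 3), (3, 3)] and count is 1
```

The two sixes are both red and are discarded together, while the two threes stay in the hand because one is red and the other is black.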
>>> print(hand)
2 of Hearts
6 of Diamonds
9 of Clubs
6 of Hearts
Jack of Diamonds
7 of Diamonds
10 of Spades
7 of Clubs
3 of Hearts
7 of Hearts
3 of Spades
10 of Clubs
8 of Clubs
3 matches found
>>> print(hand)
2 of Hearts
9 of Clubs
Jack of Diamonds
7 of Clubs
3 of Hearts
3 of Spades
8 of Clubs
Alternative implementation
Here's an alternative and slightly more compact implementation of the remove_matches
method.
Which one you prefer is a matter of personal taste.
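class OldMaidHand(Hand):
    def remove_matches(self):
        count = 0
        original_cards = self.cards.copy()
        for card in original_cards:
            # the only possible match: same rank, other suit of the same colour
            match = Card(3 - card.suit, card.rank)
            if match in self.cards:
                self.cards.remove(card)
                self.cards.remove(match)
                count += 1
        return count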
OldMaidGame class
Now we can turn our attention to the game itself. OldMaidGame is a subclass
of
CardGame. Since __init__ is inherited from CardGame,
a new OldMaidGame object already
contains a new shuffled deck.
OldMaidGame defines a new method called play that takes
a list of player
names as a parameter. Calling this play method launches the game:
OldMaidGame().play(["kim","charles","siegfried"])
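class OldMaidGame(CardGame):
    ...
    def play(self, names):
        # remove the Queen of Clubs
        queen_clubs = Card(0,12)
        self.deck.remove(queen_clubs)
        # make a hand for each player
        self.hands = []
        for name in names:
            self.hands.append(OldMaidHand(name))
        # deal the cards
        self.deck.deal(self.hands)
        self.print_hands()
        # remove the initial matches
        matches = self.remove_all_matches()
        self.print_hands()
        # play until all 50 matchable cards (25 pairs) have been matched
        turn = 0
        num_players = len(names)
        while matches < 25:
            matches += self.play_one_turn(turn)
            turn = (turn + 1) % num_players
        self.print_hands()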
Some of the steps of the game have been separated into methods.
The auxiliary method
print_hands is pretty straightforward:
class OldMaidGame(CardGame):
    ...
    def print_hands(self):
        for hand in self.hands:
            print(hand)
class OldMaidGame(CardGame):
    ...
    def remove_all_matches(self):
        count = 0
        for hand in self.hands:
            count += hand.remove_matches()
        return count
The variable turn keeps track of which player's turn it is. It starts at 0
and increases by
one each time; when it reaches num_players, the modulus
operator wraps it back around
to 0.
The method play_one_turn takes a parameter that indicates whose turn it is.
The return
value is the number of matches made during this turn:
class OldMaidGame(CardGame):
    ...
    def play_one_turn(self, i):
        if self.hands[i].is_empty():
            return 0
        neighbor = self.find_neighbor(i)
        picked_card = self.hands[neighbor].pop()
        self.hands[i].add(picked_card)
        count = self.hands[i].remove_matches()
        self.hands[i].shuffle()
        return count
If a player's hand is empty, that player is out of the game, so he or she does
nothing and
0 matches are returned.
Otherwise, a turn consists of finding the first player on the left that has
cards, taking one
card from the neighbor, and checking for matches. Before
returning, the cards in the
hand are shuffled so that the next player's choice
is random.
The method find_neighbor starts with the player to the immediate left and
continues
around the circle until it finds a player that still has cards:
class OldMaidGame(CardGame):
    ...
    def find_neighbor(self, i):
        num_hands = len(self.hands)
        for offset in range(1, num_hands):
            neighbor = (i + offset) % num_hands
            if not self.hands[neighbor].is_empty():
                return neighbor
If find_neighbor ever went all the way around the circle without finding
cards, it would
return None and cause an error elsewhere in the program.
Fortunately, we can prove
that that will never happen (as long as the end of
the game is detected correctly).
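The search around the circle can be tested without any cards at all. The following sketch is a simplified stand-in for the method: it is a plain function, the hands are represented only by their sizes, and the names hand_sizes and offset are assumptions introduced for this illustration.

```python
def find_neighbor(hand_sizes, i):
    """Return the index of the first player after player i (going around
    the circle) who still has cards, or None if there is no such player."""
    num_hands = len(hand_sizes)
    for offset in range(1, num_hands):       # 1 .. num_hands-1: never player i
        neighbor = (i + offset) % num_hands  # wrap around the end of the list
        if hand_sizes[neighbor] > 0:
            return neighbor
    return None                              # made explicit here

sizes = [3, 0, 2]                    # player 1 has no cards left
first = find_neighbor(sizes, 0)      # skips player 1, so the result is 2
second = find_neighbor(sizes, 2)     # wraps around to player 0
nobody = find_neighbor([3, 0, 0], 0) # everyone else is out: None
```

The last call shows the situation the text warns about: when no other player has cards, the result is None, so the caller must make sure the game has ended before asking for a neighbor.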
Glossary
ancestor class
A parent class, or an ancestor of the parent class.
child class
A new class created by inheriting from an existing class;
also
called a subclass.
inheritance
The ability to define a new class that is a modified version of
a
previously defined class.
method overriding
When, in addition to overwriting a method higher up the
hierarchy,
the implementation of that new method also
refines the old one,
by making use of it through a super call,
and doing a bit more.
method overwriting
When a method defined in a child class replaces the
implementation
of a method with the same name defined in a
parent or ancestor class.
parent class
The class from which a child class inherits; also called a superclass.
subclass
Another word for child class.
superclass
Another word for parent class.
super call
A super call can be used to gain access to inherited methods
– from a parent
or ancestor class – that have been
overridden in a child class.
This can either be done by
explicitly referring to that parent class,
or implicitly by using
the special super() function.
References
[ThinkCS] How To Think Like a Computer Scientist --- Learning with Python 3
[OldMaid] https://en.wikipedia.org/wiki/Old_maid_(card_game)
Linked lists
Source: this section is largely based on Chapter 24 of [ThinkCS] though adapted to
better fit with the contents, terminology and notations of this particular course. In
particular, some of the code in this chapter has been adapted to use a more object-
oriented style.
Embedded references
We have seen examples of attributes that refer to other objects.
For example, the
CardGame class referred to a Deck object
as one of its attributes. We call such objects
contained in another one
embedded references.
We have also seen examples of data structures, such as lists and tuples.
For example,
Deck objects contain a list of Card objects.
Linked lists are made up of Node objects, where each node (the last node excepted)
contains a reference to the next node in the linked list. In addition, each node carries
a
unit of data called its cargo.
class Node:
    def __init__(self, cargo=None, next=None):
        """
        @pre:  -
        """
        self.cargo = cargo
        self.next = next

    def __str__(self):
        """
        @pre:  -
        """
        return str(self.cargo)
To test the implementation so far, we can create a Node object and print it:
>>> node = Node("test")
>>> print(node)
test
To make it more interesting, we will now try to create a linked list with three nodes.
First
we create each of the three nodes.
This code creates three nodes, but we don't have a linked list yet because the nodes
are
not linked. The memory diagram looks like this:
To link the nodes, we have to make the first node refer to the second one and the
second one to the third:
The next reference of the third node remains None, which indicates that it is the end
of
the linked list. Now the memory diagram looks like this:
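The two steps just described, creating the nodes and then linking them, can be sketched as follows. The Node class is repeated in a minimal form (without docstrings) so that the example runs on its own; the cargo values 1, 2 and 3 are chosen to agree with the output shown later in this chapter.

```python
class Node:
    def __init__(self, cargo=None, next=None):
        self.cargo = cargo
        self.next = next

    def __str__(self):
        return str(self.cargo)

# Step 1: create three separate nodes; none of them refers to another yet.
node1 = Node(1)
node2 = Node(2)
node3 = Node(3)

# Step 2: link them, first to second and second to third.
node1.next = node2
node2.next = node3
# node3.next is still None, which marks the end of the linked list.
```

From node1 alone every other node can now be reached by following the next references.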
Now you know how to create nodes and link them into lists. What might be less
clear at
this point is why.
To pass a linked list as a parameter, we only have to pass a reference to its first
node.
For example, the function print_list below takes a single node as an
argument. Starting
with the head of a linked list, it prints each node until it gets
to the end:
def print_list(node):
    """
    Prints the cargo of this node and of each node it is linked to.
    @post: Has printed a space-separated list of the form "a b c ... ",
    """
    while node is not None:
        print(node, end=" ")
        node = node.next
    print()
>>> print_list(node1)
1 2 3
Inside print_list we have a reference to the first node of the linked list.
From there, to get
to the next nodes, we can use the next attribute
of each node.
To traverse a linked list, it
is common to use a loop variable like node to
refer to each of the nodes in succession.
This diagram shows the different values that the node variable takes on:
Linked lists and recursion
Since the linked list data structure is defined as a class, it would have been
more natural
to define the print_list function as a method on the Node
class. When doing so, the
method needs to be defined in a recursive way, by
first printing the cargo of its head and
then recursively invoking the print_list
method on the next node, until no more nodes are
left:
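class Node:
    ...
    def print_list(self):
        """
        @pre:  The linked data structure of which this node is the head
               contains no loops.
        @post: Has printed a space-separated list of the form "a b c ... ",
        """
        print(self, end=" ")      # print the cargo of the head
        tail = self.next          # the remainder of the list
        if tail is not None :     # as long as the end of the list was not reached
            tail.print_list()     # recursively print the tail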
>>> node1.print_list()
1 2 3
1. Separate the list into two pieces: its first node (called the
head); and the
remainder (called the tail).
2. Print the tail backward.
3. Print the head.
The code which implements this algorithm looks surprisingly similar to the code
of the
print_list method above, the only difference being that now the head is printed
after the
recursive call, instead of before :
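class Node:
    ...
    def print_backward(self):
        """
        Recursively prints the cargo of each node connected to this node (in
        opposite order), then prints the cargo of this node as last value.
        @pre:  The linked data structure of which this node is the head
               contains no loops.
        """
        tail = self.next          # the remainder of the list
        if tail is not None :     # as long as the end of the list was not reached
            tail.print_backward() # first print the tail backward
        print(self, end=" ")      # then print the cargo of the head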
>>> node1.print_backward()
3 2 1
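Because printing is hard to inspect programmatically, the same head and tail recursion can also be written so that it returns a Python list. This is a sketch under that assumption: the method names to_list and to_list_backward are hypothetical and do not appear in the course code.

```python
class Node:
    def __init__(self, cargo=None, next=None):
        self.cargo = cargo
        self.next = next

    def to_list(self):
        # head first, then the result of the recursive call on the tail
        tail = self.next
        if tail is None:
            return [self.cargo]
        return [self.cargo] + tail.to_list()

    def to_list_backward(self):
        # recursive call on the tail first, then the head
        tail = self.next
        if tail is None:
            return [self.cargo]
        return tail.to_list_backward() + [self.cargo]

node3 = Node(3)
node2 = Node(2, node3)
node1 = Node(1, node2)

forward = node1.to_list()             # [1, 2, 3]
backward = node1.to_list_backward()   # [3, 2, 1]
```

As with the printing methods, the only difference between the two is whether the head is handled before or after the recursive call.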
Infinite lists
There is nothing to prevent a node from referring back to an earlier node in
the list,
including itself. For example, this figure shows a list with two
nodes, one of which refers
to itself:
For example, when we write print(self, end=" ") we seem to regard self
as referring to a
single node that is the head of this linked list,
and we use the print function to print the
value of its cargo.
On the other hand, when assigning self.next to a variable named tail,
we seem to be
regarding self.next not as a single node but rather as the
entire linked list that has the
next node as first node.
Modifying lists
There are two ways to modify a linked list. Obviously, we can change the cargo
of one of
its nodes, but the more interesting operations are the ones that add,
remove, or reorder
nodes.
As an example, let's write a method that removes the second node in the list
and returns
a reference to the removed node:
class Node:
    ...
    def remove_second(self):
        """
        """
        first = self
        second = self.next
        # make the first node refer to the third
        first.next = second.next
        # separate the second node from the rest of the list
        second.next = None
        return second
We are using temporary variables first and second here to make the code more readable.
Here is
how to use this method:
>>> node1.print_list()
1 2 3
>>> removed = node1.remove_second()
>>> removed.print_list()
2
>>> node1.print_list()
1 3
class Node:
    ...
    def print_backward_nicely(self):
        """
        @pre:  The linked data structure of which this node is the head
               contains no loops.
        @post: Has printed a space-separated list of the form "[ ... c b a ]",
        """
        print("[", end=" ")
        self.print_backward()
        print("]")
class LinkedList:
    def __init__(self):
        """
        @pre:  -
        """
        self.length = 0
        self.head = None
class LinkedList:
    ...
    def add(self, cargo):
        """
        Adds a new Node with given cargo to the front of this LinkedList.
        Nothing is returned.
        """
        node = Node(cargo)
        node.next = self.head
        self.head = node
        self.length += 1
The LinkedList class also provides a natural place to put wrapper functions like our
method print_backward_nicely, which we can make a method of the LinkedList class:
class LinkedList:
    ...
    def print_backward(self):
        """
        Prints the contents of this LinkedList and its nodes, back to front.
        @post: Has printed a space-separated list of the form "[ ... c b a ]",
               where "a", "b", "c", ... are the string representation of each node.
               A space is printed after and before the opening and closing bracket.
        """
        print("[", end=" ")
        if self.head is not None:     # an empty list is printed as "[ ]"
            self.head.print_backward()
        print("]")
class LinkedList:
    ...
    def print(self):
        """
        @post: Has printed a space-separated list of the form "[ a b c ... ]",
               where "a", "b", "c", ... are the string representation of each node.
               A space is printed after and before the opening and closing bracket.
        """
        print("[", end=" ")
        if self.head is not None:     # an empty list is printed as "[ ]"
            self.head.print_list()
        print("]")
The code below illustrates how to create and print linked lists using this new LinkedList
class.
>>> l = LinkedList()
>>> print(l.length)
0
>>> l.print()
[ ]
>>> l.add(3)
>>> l.add(2)
>>> l.add(1)
>>> l.print()
[ 1 2 3 ]
>>> l.print_backward()
[ 3 2 1 ]
The full code of this LinkedList class and its corresponding Node class
are provided in an
appendix. As opposed to the code above, in this appendix we
also hid the attributes and
provided some accessor and mutator methods
to access and modify these attributes.
Other useful methods can be added to this LinkedList class, such as a method
to
remove the first element of a list. We leave this as an exercise to the reader.
Invariants
Some lists are well formed; others are not. For example, if a list contains a
loop, it will
cause many of our methods to crash, so we might want to require
that lists contain no
loops. Another requirement is that the length value
in the LinkedList object should be
equal to the actual number of nodes in
the list.
Requirements like these are called invariants because, ideally, they should
be true of
every object all the time. Specifying invariants for objects is a
useful programming
practice because it makes it easier to prove the correctness
of code, check the integrity
of data structures, and detect errors.
One thing that is sometimes confusing about invariants is that there are times
when they
are violated. For example, in the middle of add, after we
have added the node but before
we have incremented length, the invariant is
violated. This kind of violation is acceptable;
in fact, it is often impossible
to modify an object without violating an invariant for at least
a little while.
Normally, we require that every method that violates an invariant must
restore
the invariant.
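Invariants such as these can be checked by code. The sketch below adds a hypothetical method check_invariant to a minimal LinkedList; it is not part of the course code, and it detects loops by remembering which nodes it has already visited, which is one simple technique among several.

```python
class Node:
    def __init__(self, cargo=None, next=None):
        self.cargo = cargo
        self.next = next

class LinkedList:
    def __init__(self):
        self.length = 0
        self.head = None

    def add(self, cargo):
        node = Node(cargo)
        node.next = self.head
        self.head = node        # invariant temporarily violated here ...
        self.length += 1        # ... and restored here

    def check_invariant(self):
        """Return True if the list contains no loops and length equals
        the actual number of nodes."""
        seen = set()
        count = 0
        node = self.head
        while node is not None:
            if id(node) in seen:    # we have been here before: a loop
                return False
            seen.add(id(node))
            count += 1
            node = node.next
        return count == self.length

l = LinkedList()
l.add(3)
l.add(2)
l.add(1)
ok = l.check_invariant()    # True for a well-formed list
```

Calling such a check at the end of every method that modifies the list, for instance in an assert statement during testing, helps to catch a method that forgets to restore the invariant.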
Glossary
cargo
An item of data contained in a node. (The data carried by the
node.)
collection
A collection is a data structure that assembles multiple
objects
into a single entity.
data structure
A mechanism for grouping and organising data
to make it
easier to use.
embedded reference
A reference to another object stored in an attribute of an
object.
fundamental ambiguity theorem
A reference to a list node can be treated as a single object or
as the
first in a list of nodes.
helper
A method that is not invoked directly by a caller but is used
by
another method to perform part of an operation.
Also
called auxiliary method.
invariant
An assertion that should be true of an object at all times
(except
perhaps while the object is being modified).
link
An embedded reference used to link one object to another.
linked list
A data structure that implements a collection of elements
using a sequence of linked nodes.
node
An element of a linked list, usually implemented as an object
that carries a unit of data (its cargo) and
that contains an
embedded reference (a link)
to another object of the same
type.
precondition
An assertion that must be true in order for a method to work
correctly.
recursive data structure
A recursive data structure, such as a linked list, is a data
structure
that can be defined in terms of itself. For example,
we can say that
a linked list is either the empty list, or a node
that carries a cargo
and a link to a linked list, containing the
remaining data.
recursive method
A recursive method is a method that invokes itself,
typically
on a subset of the data on which it was originally invoked.
singleton
A linked list with a single node.
wrapper
A method that acts as a middleman between a caller and a
helper method,
often making the method easier or less error-
prone to invoke.
References
[ThinkCS] How To Think Like a Computer Scientist --- Learning with Python 3
def __init__(self,c,z,pr,po) :
self.country = c
self.zone = z
self.prefix = pr
self.postfix = po
def call(self) :
def __str__(self):
>>> print(number_kim)
+32(0)10/479111
>>> print(number_kim)
+32(0)10/479111
>>> number_kim.call()
NokiaPhone class
class NokiaPhone :
def __init__(self,s,p,t,n) :
self.marque = "Nokia"
self.serie = s
self.poids = p
self.taille = t
self.numero = n
def print_type(self) :
def print_specs(self) :
self.print_type()
def print(self) :
self.print_specs()
print(self.numero)
def __str__(self) :
>>> phone_kim.print()
Nokia 5110
Poids: 170 g
Taille: 132x48x31 mm
+32(0)10/479111
>>> phone_kim.print_specs()
Nokia 5110
Poids: 170 g
Taille: 132x48x31 mm
>>> phone_kim.print_type()
Nokia 5110
>>> print(phone_kim)
Nokia 5110
Poids: 170 g
Taille: 132x48x31 mm
>>> print(nokia_charles)
Nokia 7.1
Poids: 160 g
Taille: 150x71x8 mm
>>> nokia_siegfried.print_specs()
Nokia 5110
Poids: 170 g
Taille: 132x48x31 mm
>>> print(phone_kim)
Alcatel 5110
Poids: 170 g
Taille: 132x48x31 mm
True
False
>>> print(nokia_kim)
Nokia 5110
Poids: 200 g
Taille: 132x48x31 mm
>>> print(nokia_siegfried)
Nokia 5110
Poids: 170 g
Taille: 132x48x31 mm
>>> kim_phone.print()
Nokia 5110
Poids: 170 g
Taille: 132x48x31 mm
+32(0)10/479000
>>> siegfried_phone.print()
Nokia 5110
Poids: 170 g
Taille: 132x48x31 mm
+32(0)10/479000
>>> charles_phone.print()
Nokia 5110
Poids: 170 g
Taille: 132x48x31 mm
+32(0)10/479111
class Compte :
    def __init__(self, titulaire) :
        self.titulaire = titulaire
        self.solde = 0
Now let us create an object of this class, representing someone's bank account:
>>> a = Compte("kim")
>>> print(a.titulaire)
kim
>>> print(a.solde)
0
>>> a.solde = 10
>>> print(a.solde)
10
>>> print(a.solde)
1010
class Compte :
    def __init__(self, titulaire) :
        self.__titulaire = titulaire
        self.__solde = 0
As you can see, you cannot easily access the instance attributes
of an object of this
class, such as an account's balance, anymore:
>>> a = Compte("kim")
>>> a.__titulaire
>>> a.__solde
Accessor methods
But oh, wait a minute, we need to be able to at least get access to it from
the inside, so
we need accessor methods that allow us to access these values,
and while we are at it
let's add a method to be able to print the account too:
class Compte :
# initialiser
self.__titulaire = titulaire
self.__solde = 0
# accessor
def titulaire(self):
return self.__titulaire
# accessor
def solde(self):
return self.__solde
# print representation
def __str__(self) :
If you don't know or don't remember how the format() method works on strings,
look it
up, it's not so crucial for this example; we could easily have
written the __str__ method
without it, but it leads to more compact code.
Also note how we use the accessor methods titulaire() and solde() in the __str__
method as well. This makes it easier to change the internal variable if we want to.
>>> a = Compte("kim")
>>> print(a)
>>> print(a.titulaire())
kim
Mutator methods
Objects carry their own state and can provide their own methods to manipulate that
state. We will now add two such mutator methods (so called because they mutate the
state of the object): one for withdrawing money from the account, and another for
depositing money into the account:
class Compte :
# initialiser
self.__titulaire = titulaire
self.__solde = 0
# accessor
def titulaire(self):
return self.__titulaire
# accessor
def solde(self):
return self.__solde
# string representation
def __str__(self) :
# *** No modifications above! Only the methods below were added... ***
# mutator
self.__solde += somme
return self.solde()
# mutator
self.__solde -= somme
return self.solde()
else :
Now we can add or remove money from an account with the newly added methods:
>>> print(compte_charles)
100
>>> print(compte_charles.retirer(90))
10
>>> print(compte_charles.retirer(50))
Solde insuffisant
>>> print(compte_charles.titulaire())
Charles
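Putting the pieces of this section together, a compact version of the class with hidden attributes, accessors and mutators might look as follows. This is a sketch under assumptions: the method names deposer and retirer are taken from the calls used in this appendix, but the exact condition used to refuse a withdrawal is an assumption, and the __str__ method is left out.

```python
class Compte:
    def __init__(self, titulaire):
        self.__titulaire = titulaire   # hidden attributes (name-mangled by Python)
        self.__solde = 0

    # accessors
    def titulaire(self):
        return self.__titulaire

    def solde(self):
        return self.__solde

    # mutators
    def deposer(self, somme):
        self.__solde += somme
        return self.solde()

    def retirer(self, somme):
        # assumption: a withdrawal is refused when it exceeds the balance
        if self.solde() >= somme:
            self.__solde -= somme
            return self.solde()
        else:
            return "Solde insuffisant"

compte_charles = Compte("Charles")
```

From outside the class the balance can now only be read through solde() and changed through deposer() and retirer(), so the class keeps control over what counts as a valid change of state.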
Class variables
While an object's instance variables carry the state of the object
that is specific to each
particular instance, sometimes it is also
useful to have a state that is shared by all the
objects of a same
class.
For example, all account objects may share the same interest
rate.
Such shared state common to all instances of a same class can
be declared in a
class variable, or class attribute,
defined within the class:
class Compte :
# class variable
taux_interet = 0.02
self.__titulaire = titulaire
self.__solde = 0
def titulaire(self):
return self.__titulaire
def solde(self):
return self.__solde
def __str__(self) :
self.__solde += somme
return self.solde()
self.__solde -= somme
return self.solde()
else :
>>> print(compte_kim)
>>> print(compte_siegfried)
>>> print(compte_kim)
>>> print(compte_siegfried)
Shadowing
Attention! It is possible for an instance variable to have the same name
as a class
variable. Here, we add a new instance variable to an object
that will shadow the value
of the class variable.
>>> compte_kim.taux_interet = 0.03
Asking the object for that variable will now return the value of the
newly assigned
instance variable:
>>> print(compte_kim.taux_interet)
0.03
Even though the class variable still exists with its old value:
>>> print(Compte.taux_interet)
0.04
Asking other objects of this class for the value of that variable will still return
the value of that class variable (in these other objects, the class variable wasn't
shadowed by an instance variable).
>>> print(compte_siegfried.taux_interet)
0.04
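The whole shadowing scenario can be reproduced in a few lines. This sketch uses a reduced Compte class with a public class variable; the assignment of 0.04 to the class variable is an assumption made so that the values agree with the output shown above.

```python
class Compte:
    taux_interet = 0.02            # class variable, shared by all instances

    def __init__(self, titulaire):
        self.titulaire = titulaire

compte_kim = Compte("kim")
compte_siegfried = Compte("siegfried")

Compte.taux_interet = 0.04         # changes the shared class variable
compte_kim.taux_interet = 0.03     # creates a NEW instance variable on compte_kim

kim_rate = compte_kim.taux_interet              # 0.03: the instance variable
class_rate = Compte.taux_interet                # 0.04: class variable untouched
siegfried_rate = compte_siegfried.taux_interet  # 0.04: no shadowing here
```

Looking at vars(compte_kim) shows that taux_interet is now stored in the instance itself, whereas vars(compte_siegfried) contains no such entry.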
__taux_interet = 0.02
@classmethod
def taux_interet(cls):
return cls.__taux_interet
@classmethod
def set_taux_interet(cls,nouveau_taux):
cls.__taux_interet = nouveau_taux
self.__titulaire = titulaire
self.__solde = 0
def titulaire(self):
return self.__titulaire
def solde(self):
return self.__solde
def __str__(self) :
self.__solde += somme
return self.solde()
self.__solde -= somme
return self.solde()
else :
>>> Compte.taux_interet()
0.02
Note that, in Python, you can invoke the class method on the instance too!
That may be
a bit confusing, but what happens is that Python first tries to
send the method to the
object instance, and if it cannot find an instance
method with that name it will invoke it
instead as a class method on
the class of that instance.
>>> compte_kim.taux_interet()
0.02
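Here is a reduced, runnable version of the class with only the parts needed to experiment with class methods. The instance methods of the account are omitted for brevity, which is a simplification of this sketch.

```python
class Compte:
    __taux_interet = 0.02              # hidden class variable

    @classmethod
    def taux_interet(cls):
        return cls.__taux_interet      # cls is the class, not an instance

    @classmethod
    def set_taux_interet(cls, nouveau_taux):
        cls.__taux_interet = nouveau_taux

    def __init__(self, titulaire):
        self.__titulaire = titulaire

compte_kim = Compte("kim")
via_class = Compte.taux_interet()         # invoked on the class
via_instance = compte_kim.taux_interet()  # invoked on an instance: same method
```

In both calls the parameter cls is bound to the class Compte, so a change made through set_taux_interet is visible to every instance.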
Inheritance
Now let us consider a special kind of account, a checking account, which inherits
from the general account type and adds one additional method.
To indicate that a class inherits from another one, put the name of the parent
class in parentheses after the class name:
class CompteCourant(Compte) :
    def transferer(self,compte,montant) :
        res = self.retirer(montant)
        compte.deposer(montant)
        return res
>>> compte_kim.deposer(100)
>>> compte_kim.transferer(compte_charles,50)
>>> print(compte_kim.solde())
50
>>> print(compte_charles.solde())
50
>>> print(compte_kim.transferer(compte_charles,60))
Solde insuffisant
Method overriding
We can also redefine existing methods. For example, we can redefine the method for
withdrawing money so that it charges 0.10 Euro extra for every cash withdrawal.
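class CompteCourant(Compte) :
    __frais_retirer = 0.10
    def transferer(self,compte,montant) :
        res = self.retirer(montant)
        compte.deposer(montant)
        return res
    def retirer(self, somme) :
        # explicit call to the method of the superclass, passing self
        return Compte.retirer(self, somme + self.__frais_retirer)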
Note how this method retirer overrides a method with the same
name already defined in
the superclass Compte. In fact, for
its implementation, this method makes use of the
method defined on that
superclass, by explicitly calling the method on that class and
passing
self as argument. This seems to have the desired effect:
>>> print(compte_kim.deposer(1000))
1000
>>> print(compte_kim.retirer(10))
989.9
>>> print(compte_kim.retirer(10))
979.8
Super call
The above implementation of the method retirer seems to work,
but it would be nicer to avoid the explicit call to Compte.retirer.
Simply replacing that call by a call to self.retirer does not work, however:
upon calling the method you would get a RecursionError (maximum recursion depth exceeded).
The reason is that, rather than calling the method on the superclass,
the method would
call itself (which would recursively call itself, and so on).
However, there is a better way to call the method on the super class,
by using a super
call with the special method super():
class CompteCourant(Compte) :
__frais_retirer = 0.10
@classmethod
def frais_retirer(cls):
return cls.__frais_retirer
super().__init__(titulaire)
self.__banque = banque
def __str__(self) :
In fact, writing
is equivalent to writing
Also note how we extended the class definition with two other
super calls. One in the
__init__ method to initialise an
additional instance variable representing the name of the
bank,
and another in the __str__ method. Both of these methods
are defined in terms of
their corresponding methods on the
superclass, by making a super call.
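The three super calls described above can be put together in one self-contained sketch. The body of Compte, the format of the strings returned by __str__ and the bank name "Banque X" are assumptions made for the illustration; only the overall structure follows the text.

```python
class Compte:
    def __init__(self, titulaire):
        self.__titulaire = titulaire
        self.__solde = 0

    def titulaire(self):
        return self.__titulaire

    def solde(self):
        return self.__solde

    def deposer(self, somme):
        self.__solde += somme
        return self.solde()

    def retirer(self, somme):
        if self.__solde >= somme:
            self.__solde -= somme
            return self.solde()
        else:
            return "Solde insuffisant"

    def __str__(self):
        # hypothetical format, chosen for this sketch
        return "Compte de {0} : solde = {1}".format(self.titulaire(), self.solde())


class CompteCourant(Compte):
    __frais_retirer = 0.10

    def __init__(self, titulaire, banque):
        super().__init__(titulaire)      # let Compte initialise titulaire and solde
        self.__banque = banque

    def retirer(self, somme):
        # super call: runs Compte.retirer on self, without naming Compte
        return super().retirer(somme + self.__frais_retirer)

    def __str__(self):
        return super().__str__() + " ; banque = " + self.__banque


compte_kim = CompteCourant("Kim", "Banque X")
print(compte_kim.deposer(1000))
print(compte_kim.retirer(10))
print(compte_kim)
```

A practical benefit of super() is that the name of the parent class appears only once, in the class header, so changing the parent class later does not require touching the method bodies.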
>>> print(compte_kim.deposer(1000))
1000
>>> print(compte_kim.retirer(10))
989.9
>>> print(compte_kim.retirer(10))
979.8
>>> print(compte_kim)
self.suit = suit
self.rank = rank
def __str__(self):
return 0
# equality
return self.cmp(other) == 0
# not equal
return self.cmp(other) != 0
def __init__(self):
self.cards = []
self.cards.append(Card(suit, rank))
def print_deck(self):
print(card)
def __str__(self):
s,spaces = "",""
for c in self.cards:
return s
def shuffle(self):
import random
num_cards = len(self.cards)
for i in range(num_cards):
j = rng.randrange(i, num_cards)
def shuffle2(self):
import random
if card in self.cards:
self.cards.remove(card)
return True
else:
return False
def pop(self):
return self.cards.pop()
def is_empty(self):
return self.cards == []
num_hands = len(hands)
for i in range(num_cards):
if self.is_empty():
def __init__(self):
self.deck = Deck()
self.deck.shuffle()
self.cards = []
self.name = name
def __str__(self):
if self.is_empty():
s += " is empty\n"
return s
else:
s += " contains\n"
self.cards.append(card)
return self
def remove_matches(self):
queen_clubs = Card(0,12)
self.deck.remove(queen_clubs)
self.hands = []
self.hands.append(OldMaidHand(name))
self.deck.deal(self.hands)
self.print_hands()
matches = self.remove_all_matches()
self.print_hands()
turn = 0
num_players = len(names)
matches += self.play_one_turn(turn)
self.print_hands()
def print_hands(self):
print(hand)
def remove_all_matches(self):
count = 0
count += hand.remove_matches()
return count
if self.hands[i].is_empty():
return 0
neighbor = self.find_neighbor(i)
picked_card = self.hands[neighbor].pop()
self.hands[i].add(picked_card)
count = self.hands[i].remove_matches()
self.hands[i].shuffle()
return count
num_hands = len(self.hands)
if not self.hands[neighbor].is_empty():
return neighbor
5 of Diamonds
10 of Diamonds
Ace of Clubs
Ace of Spades
Jack of Hearts
4 of Clubs
3 of Clubs
King of Diamonds
4 of Diamonds
10 of Clubs
Ace of Hearts
5 of Hearts
Queen of Diamonds
Jack of Spades
Jack of Diamonds
5 of Clubs
9 of Clubs
5 of Spades
6 of Clubs
Queen of Spades
8 of Spades
2 of Clubs
6 of Spades
9 of Hearts
8 of Hearts
10 of Hearts
9 of Diamonds
7 of Hearts
10 of Spades
9 of Spades
3 of Diamonds
Jack of Clubs
7 of Spades
3 of Hearts
7 of Clubs
6 of Diamonds
3 of Spades
King of Hearts
2 of Spades
2 of Diamonds
7 of Diamonds
2 of Hearts
4 of Hearts
King of Clubs
4 of Spades
8 of Clubs
King of Spades
8 of Diamonds
Queen of Hearts
6 of Hearts
Ace of Diamonds
10 of Diamonds
4 of Clubs
3 of Clubs
King of Diamonds
4 of Diamonds
10 of Clubs
Ace of Hearts
Queen of Diamonds
Jack of Spades
5 of Clubs
9 of Clubs
5 of Spades
Queen of Spades
8 of Spades
2 of Clubs
8 of Hearts
10 of Hearts
7 of Hearts
10 of Spades
9 of Spades
Jack of Clubs
7 of Spades
7 of Clubs
3 of Spades
King of Hearts
2 of Spades
7 of Diamonds
4 of Hearts
4 of Spades
8 of Clubs
8 of Diamonds
Queen of Hearts
Ace of Diamonds
Player0:
Player1:
Player2:
Player0:
Player1:
Player2:
Player0:
Player1:
Player2:
Player0:
Player1:
Player2:
Player0:
Player1:
Player2:
Player0:
Player1:
Player2:
Player0:
Player1:
Player2:
Player0:
Player1:
Player2:
Player0:
Player1:
Player2:
Player0:
Player1:
Player2:
Player0:
Player1:
Player2:
Player0:
Player1:
Player2:
Player0:
Queen of Spades
def __init__(self):
"""
@pre: -
"""
self.__length = 0
self.__head = None
def size(self):
"""
@pre: -
@post: Returns the number of nodes (possibly zero) contained in this linked list.
"""
return self.__length
def first(self):
"""
@pre: -
"""
return self.__head
Adds a new Node with given cargo to the front of this LinkedList.
Nothing is returned.
"""
node = Node(cargo,self.__head)
self.__head = node
self.__length += 1
def print(self):
"""
@post: Has printed a space-separated list of the form "[ a b c ... ]",
where "a", "b", "c", ... are the string representation of each
A space is printed after and before the opening and closing bracket,
"""
self.__head.print_list()
print("]")
def print_backward(self):
"""
Prints the contents of this LinkedList and its nodes, back to front.
@post: Has printed a space-separated list of the form "[ ... c b a ]",
where "a", "b", "c", ... are the string representation of each
A space is printed after and before the opening and closing bracket,
"""
self.__head.print_backward()
print("]")
Node class
class Node:
"""
@pre: -
"""
self.__cargo = cargo
self.__next = next
def value(self):
"""
@pre: -
"""
return self.__cargo
def set_value(self,value):
"""
@pre: -
"""
self.__cargo = value
def next(self):
"""
@pre: -
"""
return self.__next
def set_next(self,node):
"""
@pre: -
@post: The next node of this node has been set to node.
"""
self.__next = node
def __str__(self):
"""
"""
return str(self.value())
def __eq__(self,other):
"""
"""
else :
return False
def print_list(self):
"""
@pre: -
@post: Has printed a space-separated list of the form "a b c ... ",
"""
if tail is not None : # as long as the end of the list has not been reached
def print_backward(self):
"""
Recursively prints the cargo of each node connected to this node (in opposite order),
@pre: -
where a is my cargo (self), b is the cargo of the next node, and so on.
The nodes are printed in opposite order: the last node's value
is printed first.
"""
if tail is not None : # as long as the end of the list has not been reached
>>> l.print()
[ ]
>>> l.print_backward()
[ ]
>>> print(l.size())
0
>>> l.add(3)
>>> l.add(2)
>>> l.add(1)
>>> l.print()
[ 1 2 3 ]
>>> l.print_backward()
[ 3 2 1 ]
>>> print(l.size())
3
2 - Higher-order functions
Recursion
Source: this section is heavily based on Chapter 18 of [ThinkCS].
Drawing Fractals
A fractal is a drawing that has self-similar structure,
which can be defined in terms of
itself. [Fractal]
Let us start by looking at the famous Koch fractal. An order 0 Koch fractal is simply
a
straight line of a given size.
An order 1 Koch fractal is obtained like this: instead of drawing just one line,
draw
instead four smaller segments, as in the pattern shown below.
Now what would happen if we repeated this Koch pattern again on each of the order 1
segments?
We'd get an order 2 Koch fractal.
Now let us think about it the other way around. To draw a Koch fractal
of order 3, we can
simply draw four order 2 Koch fractals. But each of these
in turn needs four order 1
Koch fractals, and each of those in turn needs four
order 0 fractals. Ultimately, the only
drawing that will take place is
at order 0. This is very simple to code up in Python:
"""
"""
t.forward(size)
else:
t.left(60)
t.right(120)
t.left(60)
"Of course, to actually run the code above we still need to"
import turtle
window = turtle.Screen()
t = turtle.Turtle()
t.speed(0)
t.penup()
t.forward(-150)
t.pendown()
window.mainloop()
The key thing that is new here is that as long as order is not zero,
koch calls itself
recursively to get its job done.
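To inspect the recursion without opening a turtle window, one can record the moves instead of drawing them. The function koch_moves below is a hypothetical helper, not part of the original code; it is a sketch that mirrors the recursive structure described above and returns a list of ("forward", size) and ("left", angle) tuples.

```python
def koch_moves(order, size):
    """Return the list of turtle moves for a Koch fractal, without drawing."""
    if order == 0:
        # base case: a single straight segment
        return [("forward", size)]
    moves = []
    # four smaller fractals, separated by turns of 60, -120 and 60 degrees
    for angle in [60, -120, 60, 0]:
        moves.extend(koch_moves(order - 1, size / 3))
        if angle != 0:
            moves.append(("left", angle))
    return moves


for order in range(4):
    segments = [m for m in koch_moves(order, 300) if m[0] == "forward"]
    print(order, len(segments))
```

Each additional order multiplies the number of straight segments by four, which is exactly the "four order n-1 fractals" argument made in the text.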
def koch(t, order, size):
    if order == 0:
        t.forward(size)
    else:
        for angle in [60, -120, 60, 0]:
            koch(t, order-1, size/3)
            t.left(angle)
One way to think about this is to convince yourself that the function
works
correctly when you call it for an order 0 fractal. Then do
a mental leap of
faith, saying "I will assume that Python
will handle correctly the four
recursive level 0 calls for me
in the else-branch, so I don't need to think
about that detail. So all I need to focus on
now is how to draw an order 1
fractal assuming that the order 0 one is
already working."
If this mode of thinking works (and you should practice it!), then take
it to the next level. Aha! Now I can see that it will work when called
for order 2, under the assumption that it is already working for order 1.
And, in general, if I can assume the order n-1 case works, can I just
solve the order n problem?
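def koch_0(t, size):
    t.forward(size)
def koch_1(t, size):
    for angle in [60, -120, 60, 0]:
        koch_0(t, size/3)
        t.left(angle)
def koch_2(t, size):
    for angle in [60, -120, 60, 0]:
        koch_1(t, size/3)
        t.left(angle)
def koch_3(t, size):
    for angle in [60, -120, 60, 0]:
        koch_2(t, size/3)
        t.left(angle)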
This might be a useful hint to build your understanding. The mental goal
is, however, to be able to do the abstraction!
Suppose it is election time and that we are helping to count votes as they come in.
Votes
arriving from individual districts, cities, agglomerations and provinces
are sometimes
reported as a sum total of votes and sometimes as a list
of subtotals of votes. After
considering how best to store this incoming data, we
decide to use a nested number list,
which we define as follows:
A nested number list is a list whose elements are either:
a. numbers
b. nested number lists
Notice how in the above definition, the term nested number list is used to define itself.
Recursive definitions like this are quite common in mathematics and
computer
science. They provide a concise and powerful way to describe
recursive data
structures that are partially composed of smaller and
simpler instances of themselves.
The definition is not circular, nor infinite, since at some
point we will reach a list that
does not have any lists as elements.
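The recursive definition translates almost word for word into a recursive check. The function is_nested_number_list below is a hypothetical helper written for this illustration; it assumes that "numbers" means Python int or float values (excluding booleans).

```python
def is_nested_number_list(x):
    """Return True if x is a list whose elements are numbers or nested number lists."""
    if not isinstance(x, list):
        return False
    for element in x:
        if isinstance(element, list):
            # case b: the element must itself be a nested number list
            if not is_nested_number_list(element):
                return False
        elif isinstance(element, bool) or not isinstance(element, (int, float)):
            # neither a number (case a) nor a list (case b)
            return False
    return True


print(is_nested_number_list([1, 2, [11, 13], 8]))
print(is_nested_number_list([1, "two", 3]))
```

The recursion stops as soon as a list without sublists is reached, which is why the definition is not circular.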
Now suppose our job is to write a function that will sum all of the values in a
nested
number list. Python has a built-in function which finds the sum of a
sequence of
numbers:
11
For our nested number list, however, sum will not work:
The problem is that the third element of this list, [11, 13], is itself a
list, so it cannot just
be added to 1, 2, and 8.
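The difference between the two situations can be reproduced with a short sketch; the variable names flat and nested are chosen here for the illustration.

```python
flat = [1, 2, 8]
nested = [1, 2, [11, 13], 8]

print(sum(flat))          # the three numbers add up to 11

try:
    sum(nested)
    error_message = None
except TypeError as error:
    # the sublist [11, 13] cannot be added to a number with +
    error_message = str(error)

print(error_message)
```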
Thanks to recursion, the Python code needed to sum the values of a nested number list
is
surprisingly short:
def r_sum(nested_num_list):
    """
    post: returns 0 for empty lists or the sum of all encountered values,
    """
    tot = 0
    for element in nested_num_list:
        if isinstance(element,list):
            tot += r_sum(element)     # recursive case: sum the sublist
        else:
            tot += element            # base case: a plain number
    return tot
35
The example above has a base case (the else branch) which does not lead to a
recursive call: the case where the element is not a (sub-) list. Without
a base case, you'll
have infinite recursion, and your program will not work.
Recursion is truly one of the most beautiful and elegant tools in computer
science.
def r_max(nxs):
    """
    """
    largest = None
    first_time = True
    for e in nxs:
        if isinstance(e,list):
            val = r_max(e)
        else:
            val = e
        if first_time or val > largest:
            largest = val
            first_time = False
    return largest
Again here we have a base case: the else branch, in which val is simply set to the
element e without any recursive call. If we don't supply a base case,
Python stops after reaching a maximum recursion depth and raises a runtime
error. See how this happens, by running this little script which we will call
infinite_recursion.py:
def recursion_depth(number):
    print("{0}, ".format(number), end="")
    recursion_depth(number + 1)

recursion_depth(0)
After watching the messages flash by, you will be presented with the end of a
long
traceback that ends with a message like the following:
We would certainly never want something like this to happen to a user of one of
our
programs, so it is good programming practice to write error handling code
that could
handle such errors when they arise.
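A minimal sketch of such error handling is shown below. It relies on the built-in RecursionError exception of Python 3; the wrapper safe_call is a hypothetical helper introduced for this example only.

```python
def recursion_depth(number):
    # no base case: this function never stops calling itself
    return recursion_depth(number + 1)


def safe_call(function, argument):
    """Call function(argument); return None if the recursion limit is exceeded."""
    try:
        return function(argument)
    except RecursionError:
        return None


result = safe_call(recursion_depth, 0)
print(result)
```

Catching the error keeps the program alive, but it does not repair the function: the real fix is still to add a base case.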
Fibonacci numbers
The famous Fibonacci sequence 0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, ...
[FibonacciNumber] was devised by
Fibonacci (1170-1250), who used it to model the
breeding of (pairs of) rabbits.
If, in generation 8 you had 21 pairs in total, of which 13
were adults,
then next generation the adults will all have bred new children,
and the
previous children will have grown up to become adults.
So in generation 9 you'll have
13+21=34, of which 21 are adults.
This model to explain rabbit breeding made the simplifying assumption that rabbits
never died.
Scientists often make (over-)simplifying assumptions and restrictions
to
make some headway with the problem.
If we number the terms of the sequence from 0, we can describe each term recursively
as the sum of the previous two terms:
fib(0) = 0
fib(1) = 1
fib(n) = fib(n-1) + fib(n-2) for n >= 2
def fib(n):
    """
    """
    if n <= 1:
        return n
    t = fib(n-1) + fib(n-2)
    return t
test(fib(0) == 0)
test(fib(1) == 1)
test(fib(2) == 1)
test(fib(3) == 2)
test(fib(4) == 3)
test(fib(5) == 5)
test(fib(6) == 8)
test(fib(7) == 13)
test(fib(8) == 21)
test(fib(9) == 34)
test(fib(10) == 55)
test(fib(11) == 89)
test(fib(12) == 144)
from time import process_time
t0 = process_time()
n = 35
result = fib(n)
t1 = process_time()
print("fib({0}) = {1}, ({2:.2f} secs)".format(n, result, t1-t0))
Memoization
If you play around a bit with the fib function from the previous section, you
will notice
that the bigger the argument you provide, the longer the
function takes to run.
Furthermore, the run time increases very quickly. On one
of our machines, fib(20)
finishes instantly, fib(30) takes
about a second, and fib(40) takes roughly forever.
A call graph shows some function frames (instances when the function has
been
invoked), with lines connecting each frame to
the frames of the functions it calls. At the
top of the graph, fib
with n = 4 calls fib with n = 3 and n = 2. In turn,
fib with n = 3 calls fib
with n = 2 and n = 1.
And so on.
Count how many times fib(0) and fib(1) are called. This is
an inefficient solution to the
problem, and it gets far worse as the argument
gets bigger.
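Instead of counting by hand on the call graph, the calls can be counted by the program itself. The sketch below adds a Counter (from the standard collections module) to the naive fib; the name calls is an assumption of this example.

```python
from collections import Counter

calls = Counter()   # calls[n] = number of times fib(n) has been invoked


def fib(n):
    calls[n] += 1
    if n <= 1:
        return n
    return fib(n - 1) + fib(n - 2)


fib(4)
print(calls[0], calls[1], sum(calls.values()))
```

For n = 4 this reports 2 calls of fib(0), 3 calls of fib(1) and 9 calls in total; the totals grow very quickly with n.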
A good solution is to keep track of values that have already been computed by
storing
them in a dictionary. A previously computed value that is stored for
later use is called a
memo. Here is an implementation of fib
using memos:
alreadyknown = {0: 0, 1: 1}

def fib(n):
    if n not in alreadyknown:
        new_value = fib(n-1) + fib(n-2)
        alreadyknown[n] = new_value
    return alreadyknown[n]
>>> fib(100)
354224848179261915075
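As an alternative to managing the dictionary by hand, the standard library offers the functools.lru_cache decorator, which stores the results of previous calls automatically. This is a different technique from the one shown above, given here only as a sketch; the name fib_cached is chosen for the example.

```python
from functools import lru_cache


@lru_cache(maxsize=None)     # keep every computed value, like the memo dictionary
def fib_cached(n):
    if n <= 1:
        return n
    return fib_cached(n - 1) + fib_cached(n - 2)


print(fib_cached(100))
```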
import os
def get_dirlist(path):
"""
This returns just the names, not the full path to the names.
"""
dirlist = os.listdir(path)
dirlist.sort()
return dirlist
"""
post: Prints the path and the names of all entries (files or directories
"""
dirlist = get_dirlist(path)
for f in dirlist:
Calling the function print_files with some initial path or folder name will produce an
output similar to this:
print_files(r"c:\python31\Lib\site-packages\pygame\examples")
| __init__.py
| aacircle.py
| aliens.py
| arraydemo.py
| blend_fill.py
| blit_blends.py
| camera.py
| chimp.py
| cursors.py
| data
| | alien1.png
| | alien2.png
| | alien3.png
...
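The recursive traversal that produces such a listing can be sketched as follows. To keep the example easy to check, the hypothetical function list_files returns the lines instead of printing them, and it is run on a small temporary directory whose file names are made up for the illustration.

```python
import os
import tempfile


def list_files(path, prefix="| "):
    """Return one line per entry under path, indented by nesting depth."""
    lines = []
    for name in sorted(os.listdir(path)):
        lines.append(prefix + name)
        fullname = os.path.join(path, name)
        if os.path.isdir(fullname):
            # recursive case: a directory contains further entries
            lines.extend(list_files(fullname, prefix + "| "))
    return lines


with tempfile.TemporaryDirectory() as root:
    open(os.path.join(root, "aliens.py"), "w").close()
    os.mkdir(os.path.join(root, "data"))
    open(os.path.join(root, "data", "alien1.png"), "w").close()
    listing = list_files(root)

print("\n".join(listing))
```

The base case is implicit here: an entry that is not a directory does not trigger a further call.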
Glossary
base case
A branch of the conditional statement in a recursive function
that does
not give rise to further recursive calls.
infinite recursion
A function that calls itself recursively without ever reaching
any base
case. Eventually, infinite recursion causes a
runtime error.
recursion
The process of calling a function that is already executing.
recursive call
The statement that calls an already executing function.
Recursion can
also be indirect --- function f can call g which
calls h,
and h could make a call back to f --- or mutual ---
function f
calls g and g makes a call back to f.
recursive definition
A definition which defines something in terms of itself. To be
useful
it must include base cases which are not recursive. In
this way it
differs from a circular definition. Recursive
definitions often
provide an elegant way to express complex
data structures, like a directory
that can contain other
directories, or a menu that can contain other menus.
References
[ThinkCS] How To Think Like a Computer Scientist --- Learning with Python 3
[Fractal] https://en.wikipedia.org/wiki/Fractal
[FibonacciNumber] https://en.wikipedia.org/wiki/Fibonacci_number
Higher-Order Functions
Source: This section is heavily based on Section 1.6 of [SICP]. It does not appear in
[ThinkCS].
We have seen before that functions are abstractions that describe compound operations
independent of the particular values of their arguments. For example, when defining a
function square,
def square(x):
    return x * x
we are not talking about the square of a particular number, but rather about a method for
obtaining the square of any number x. Of course we could get along without ever
defining this function, by always writing expressions such as:
>>> 3 * 3
9
>>> 5 * 5
25
and never mentioning square explicitly. This practice would suffice for simple
computations like square, but would become arduous for more complex examples. In
general, lacking function definition would put us at the disadvantage of forcing us to
work always at the level of the particular operations that happen to be primitives in the
language (multiplication, in this case) rather than in terms of higher-level operations. Our
programs would be able to compute squares, but our language would lack the ability to
express the concept of squaring. One of the things we demand from a powerful
programming language is the ability to build abstractions by assigning names to
common patterns and then to work in terms of these abstractions directly. Functions
provide this ability.
As we will see in the following examples, there are common programming patterns that
recur in code, but are used with a number of different functions. These patterns can also
be abstracted, by giving them names.
Functions as Arguments
Consider the following three functions, which all compute summations. The first,
sum_naturals, computes the sum of natural numbers up to n:
def sum_naturals(n):
    total, k = 0, 1
    while k <= n:
        total, k = total + k, k + 1
    return total
>>> sum_naturals(100)
5050
The second, sum_cubes, computes the sum of the cubes of natural numbers up to n.
def sum_cubes(n):
    total, k = 0, 1
    while k <= n:
        total, k = total + pow(k, 3), k + 1
    return total
>>> sum_cubes(100)
25502500
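The third, pi_sum, computes the sum of terms in a series that converges to pi very slowly.
def pi_sum(n):
    total, k = 0, 1
    while k <= n:
        total, k = total + 8 / (k * (k + 2)), k + 4
    return total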
>>> pi_sum(100)
3.121594652591009
These three functions clearly share a common underlying pattern. They are for the most
part identical, differing only in their name, the function of k used to compute the term to
be added, and the function that provides the next value of k. We could generate each of
the functions by filling in the slots <name>, <term> and <next> in the following template:
def <name>(n):
    total, k = 0, 1
    while k <= n:
        total, k = total + <term>(k), <next>(k)
    return total
The presence of such a common pattern is strong evidence that there is a useful
abstraction waiting to be brought to the surface. Each of these functions is a summation
of terms. As program designers, we would like our language to be powerful enough so
that we can write a function that expresses the concept of summation itself rather than
only functions that compute particular sums. We can do so readily in Python by taking
the common template shown above and transforming the "slots" into formal parameters
of a more general summation function.
def summation(n, term, next):
    total, k = 0, 1
    while k <= n:
        total, k = total + term(k), next(k)
    return total
Notice that summation takes as its arguments the upper bound n together with the
functions term and next. We can use summation just as we would any function, and it
expresses summations succinctly. For example, we could rewrite our earlier definition of
sum_cubes(n) by making use of summation as follows:
def cube(k):
    return pow(k, 3)
def successor(k):
    return k + 1
def sum_cubes(n):
    return summation(n, cube, successor)
>>> sum_cubes(3)
36
Using as term function an identity function that returns its argument, we can also sum
integers.
def identity(k):
    return k
def sum_naturals(n):
    return summation(n, identity, successor)
>>> sum_naturals(10)
55
We can even define pi_sum piece by piece, using our summation abstraction to combine
components.
def pi_term(k):
    denominator = k * (k + 2)
    return 8 / denominator
def pi_next(k):
    return k + 4
def pi_sum(n):
    return summation(n, pi_term, pi_next)
>>> pi_sum(1e6)
3.1415906535898936
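Once summation exists, a new kind of sum costs only a few lines. The sketch below restates summation so that it runs on its own and adds a sum of squares; sum_squares is a hypothetical extra example that does not appear in the original text.

```python
def summation(n, term, next):
    # 'next' shadows the Python built-in of the same name inside this function;
    # the parameter name follows the text
    total, k = 0, 1
    while k <= n:
        total, k = total + term(k), next(k)
    return total


def successor(k):
    return k + 1


def square(k):
    return k * k


def sum_squares(n):
    return summation(n, square, successor)


print(sum_squares(3))    # 1 + 4 + 9 = 14
```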
To illustrate this mechanism, in this subsection we will build an abstraction for a general
method of computation known as iterative improvement, and use it to compute the
golden ratio [GoldenRatio]. An iterative improvement algorithm begins with a guess of a
solution to an equation. It then repeatedly applies an update function to improve that
guess, and applies a test to check whether the current guess is "close enough" to the
expected solution to be considered correct.
def iter_improve(update, test, guess=1):
    while not test(guess):
        guess = update(guess)
    return guess
The test function typically checks whether two functions, f and g, are near to each other
for a particular value of guess. Testing whether f(x) is near to g(x) is again a general
method of computation.
A common way to test for approximate equality in programs is to compare the absolute
value of the difference between numbers to a small tolerance value.
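The two tests described here can be sketched as follows. The names near and approx_eq are used elsewhere in this section; their exact signatures and the default tolerance of 1e-5 are assumptions of this sketch.

```python
def approx_eq(x, y, tolerance=1e-5):
    """True if x and y differ by less than tolerance."""
    return abs(x - y) < tolerance


def near(x, f, g):
    """True if f(x) and g(x) are approximately equal."""
    return approx_eq(f(x), g(x))


print(approx_eq(0.1 + 0.2, 0.3))   # True, although 0.1 + 0.2 == 0.3 is False
print(near(2, lambda x: x * x, lambda x: x + 2))
```

Comparing with a tolerance avoids the pitfalls of testing floating-point numbers for exact equality.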
The golden ratio, often called phi, is a number that appears frequently in nature, art, and
architecture.
It can be found by applying the update phi = 1 + 1/phi repeatedly until
phi^2 = phi + 1 holds (approximately). [GoldenRatio]
In other words, we can compute the golden ratio via
iter_improve using the golden_update function phi = 1 + 1/phi, and it converges when its
successor phi + 1 is near to its square phi^2.
def golden_update(guess):
    return 1 + 1/guess
def golden_test(guess):
    return near(guess, square, successor)
Calling iter_improve with the arguments golden_update and golden_test will compute an
approximation to the golden ratio.
>>> approx_phi = iter_improve(golden_update, golden_test)
>>> approx_phi
1.6180371352785146
This extended worked-out example illustrates two related big ideas in computer science.
First, naming and functions allow us to abstract away a vast amount of complexity. While
each individual function definition was quite trivial, the computational process set in
motion is quite intricate. Second, it is only by virtue of the fact that we have an extremely
general evaluation procedure that small components can be composed into complex
processes.
To conclude this example, it would be good if we could check the correctness of our new
general method iter_improve.
The computation of the golden ratio provides such a test:
because we used iter_improve to compute the golden ratio,
we only need to compare
that computed value with its exact closed-form solution phi = (1 + square_root(5))/2.
[GoldenRatio]
def square_root(x):
def near_test():
assert near(phi, square, successor), 'phi * phi is not near phi + 1'
def iter_improve_test():
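The test functions above are incomplete in this copy. A self-contained sketch of such a check could look like the following; it assumes a tolerance of 1e-5 in approx_eq, uses math.sqrt for the closed-form value, and the body of iter_improve_test is a guess at the intent rather than the original code.

```python
from math import sqrt


def approx_eq(x, y, tolerance=1e-5):
    return abs(x - y) < tolerance


def near(x, f, g):
    return approx_eq(f(x), g(x))


def square(x):
    return x * x


def successor(x):
    return x + 1


def iter_improve(update, test, guess=1):
    while not test(guess):
        guess = update(guess)
    return guess


def golden_update(guess):
    return 1 + 1 / guess


def golden_test(guess):
    return near(guess, square, successor)


phi = (1 + sqrt(5)) / 2                                  # exact closed form
approx_phi = iter_improve(golden_update, golden_test)    # iterative approximation


def near_test():
    assert near(phi, square, successor), 'phi * phi is not near phi + 1'


def iter_improve_test():
    assert approx_eq(phi, approx_phi), 'phi differs from its approximation'


near_test()
iter_improve_test()
print(phi, approx_phi)
```

Both tests pass silently; an AssertionError with the given message would signal a problem in near or in iter_improve.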
Let's consider a new problem: computing the square root of a number. It can be shown
that repeated application of the following update function converges to the square root of
x:
def average(x, y):
    return (x + y)/2
def square_root(x):
def update(g):
def test(guess):
return approx_eq(square(guess), x)
>>> square_root(81)
9.000000000007091
Like local variable assignment, local function definitions only affect the body of the
function in which they are defined.
These local functions will only be visible and usable
while square_root is being evaluated. Moreover, these local def statements won't even
get evaluated until square_root is called. Their definition is part of the evaluation of
square_root.
Lexical scope. Locally defined functions have access to the name bindings in the local
scope in which they are defined. In this example, the nested function test can make use
of the nested function approx_eq because it is defined in the same scope. Similarly, the
expression iter_improve(update, test) in the body of square_root can make use of the locally
defined functions update and test. Furthermore, the nested functions update and test can
refer to the name x, which is a formal parameter of their enclosing function square_root.
(Upon calling square_root, this formal parameter will be bound to the actual value passed
as parameter when calling the function.) This discipline of sharing names among nested
definitions is called lexical scoping: all inner functions have access to the names in the
environment where they are defined (not where they are called).
Nested scopes. Whenever a name cannot be found in a local scope, it will be looked up
in the surrounding scope.
For example, in the function definition of update nested inside
the definition of square_root , a function named average is being referred to. Upon calling
this update function, it will first look for this name in its own local scope (it could have
been that average would have been defined as a local function nested inside the
definition of update itself). Since it doesn't find any definition of the name average there, it
goes to the surrounding scope, that is, the lexical scope in which the update function was
defined (the body of the square_root function). Again, there doesn't seem to be any
function named average defined there (there are only the functions approx_eq and test
defined there). Again, the name lookup goes one level up, reaching the global
environment in which square_root itself was initially defined. Luckily a definition of the
average function is finally found there.
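The lookup chain just described can be followed in a complete sketch of square_root. The nested definitions mirror the ones named in the text (update, approx_eq, test); the signature and tolerance of the nested approx_eq are assumptions, since its body is not shown in this copy.

```python
def average(x, y):              # global: found last in the lookup chain
    return (x + y) / 2


def square(x):
    return x * x


def iter_improve(update, test, guess=1):
    while not test(guess):
        guess = update(guess)
    return guess


def square_root(x):
    def update(g):
        # 'average' is not local to update, nor to square_root: it is found globally.
        # 'x' is found one level up, in the environment of square_root.
        return average(g, x / g)

    def approx_eq(a, b, tolerance=1e-5):
        return abs(a - b) < tolerance

    def test(guess):
        # 'approx_eq' is found in the enclosing scope of square_root
        return approx_eq(square(guess), x)

    return iter_improve(update, test)


print(square_root(81))
print(square_root(256))
```

Note that iter_improve never sees x: the value travels with update and test, because they were defined inside square_root.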
Let us now illustrate how all this works with a picture. Suppose we evaluate the following
expression:
>>> square_root(256)
16.00000000000039
In the global environment, the functions square_root, iter_improve and square are defined.
When we evaluate square_root(256), a new local environment is created that contains a
binding of the formal parameter x of the square_root function to the value 256.
Furthermore, the square_root function defines three nested functions update, approx_eq
and test. These function definitions are also added to this new local environment (in the
picture below, for conciseness, only update is shown). Notice how the local definitions of
these functions keep a pointer back to the local environment in which they were defined.
We will see soon that this is the essence of the mechanism of lexical scoping: all
expressions within these inner functions need to have access to the names in the
environment where they were defined.
After these nested function definitions, the expression return iter_improve(update, test) in
the body of the square_root function needs to be evaluated. The name update, which is
passed as an argument to iter_improve ,
is looked up and resolved to the newly defined
function. The same happens for the name test.
With these bindings for update and test , the function call iter_improve(update, test) now
gets evaluated.
For this evaluation, a new local environment is created where update and
test are bound to these functions (again, in the picture below, only update is shown), and
where guess is bound to its default value 1. Since iter_improve was defined in the global
environment, this local environment points to the global environment, so that it can
lookup unresolved names in the environment where the function iter_improve that is
being called was originally defined.
Within the body of iter_improve, in the while condition, we must apply the update function
to the initial guess of 1. This final application again creates a new local environment for
update that contains only a binding of its formal parameter g bound to the value 1.
The most crucial part of this evaluation procedure is to find out to what other
environment this new local environment should point. This is highlighted by the blue
arrows in the diagram. The environment created for the update call, will be scoped within
the environment in which update was defined, which can be found by following the blue
link back from the update function to its environment of definition (which was the
environment created when evaluating square_root(256) and that still contains a binding
for x).
In this way, the body of update can resolve a value for x. Hence, we realize two key
advantages of lexical scoping in Python.
1. The names of a local function do not interfere with names external to the function
in which it is defined, because the local function name will be bound in the current
local environment in which it is defined, rather than the global environment.
2. A local function can access the environment of the enclosing function. This is
because the body of the local function is evaluated in an environment that
extends the evaluation environment in which it is defined.
The update function thus implicitly carries with it some data: the values referenced in the
environment in which it was defined. Because they enclose information in this way,
locally defined functions are often called closures.
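Python makes the enclosed data visible through the __closure__ attribute of a function. The sketch below uses a hypothetical make_adder function, unrelated to square_root, to show that each returned function carries its own value of n.

```python
def make_adder(n):
    def add(x):
        return x + n        # n is looked up in the environment of make_adder
    return add


add_three = make_adder(3)
add_ten = make_adder(10)

print(add_three(4))
print(add_ten(4))
# The enclosed value is stored in a cell attached to the function object
print(add_three.__closure__[0].cell_contents)
```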
def compose1(f, g):
    def h(x):
        return f(g(x))
    return h
>>> add_one_and_square = compose1(square, successor)
>>> add_one_and_square(12)
169
The 1 in compose1 indicates that the composed functions and returned result all take 1
argument. This naming convention isn't enforced by the interpreter; the 1 is just part of
the function name.
Lambda Expressions
So far, every time we want to define a new function, we need to give it a name. But for
other types of expressions, we don’t need to associate intermediate products with a
name. That is, we can compute a*b + c*d without having to name the subexpressions
a*b or c*d, or the full expression a*b + c*d. In Python, we can create function values on
the fly using lambda expressions, which evaluate to unnamed functions. A lambda
expression evaluates to a function that has a single return expression as its body.
Assignment and control statements are not allowed.
def compose1(f,g):
    return lambda x : f(g(x))
Some programmers find that using unnamed functions from lambda expressions is
shorter and more direct. However, compound lambda expressions are notoriously
illegible, despite their brevity. The following definition is correct, but some programmers
have trouble understanding it quickly.
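As an illustration of such a compound lambda expression (a sketch, since the example is not reproduced in this copy), compose1 can be written entirely without def:

```python
# A lambda that takes f and g and returns another lambda
compose1 = lambda f, g: lambda x: f(g(x))

square = lambda x: x * x
successor = lambda x: x + 1

add_one_and_square = compose1(square, successor)
print(add_one_and_square(12))    # square(successor(12)) = 13 * 13 = 169
```

The definition is short, but the reader has to untangle two nested lambdas to see which arguments belong to which function.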
In general, Python style prefers explicit def statements to lambda expressions, but
allows them in cases where a simple function is needed as an argument or return value.
Such stylistic rules are merely guidelines; you can program any way you wish. However,
as you write programs, think about the audience of people who might read your program
one day. If you can make your program easier to interpret, you will do those people a
favor.
The term lambda is a historical accident resulting from the incompatibility of written
mathematical notation and the constraints of early type-setting systems.
Despite their unusual etymology, lambda expressions and the corresponding formal
language for function application, the lambda calculus, are fundamental computer
science concepts shared far beyond the Python programming community. You will very
likely encounter them in other programming languages or other computer science courses.
Newton's method is a classic iterative approach to finding the arguments x for which a
single-argument mathematical function f(x) yields a return value of 0. In other words, the
values of x for which that function f cuts the x-axis (f(x) = 0). These values are called the
roots of that function. Finding a root of a single-argument mathematical function is often
equivalent to solving a related math problem. For example:
A comment before we proceed: it is easy to take for granted the fact that we know how
to compute square roots and logarithms. Not just Python, but your phone, your pocket
calculator, and perhaps even your watch can do so for you. However, part of learning
computer science is understanding how quantities like these can be computed, and the
general approach presented here is applicable to solving a large class of equations
beyond those built into Python.
Before even beginning to understand Newton's method, we can start programming; this
is the power of functional abstractions. We simply translate our previous statements into
code.
def square_root(a):
Of course, we cannot apply any of these functions yet until we define find_root, and so
we need to understand how Newton's method works.
Like the algorithm we saw before, Newton's method is also an iterative improvement
algorithm. It improves a guess of the root for any function that is differentiable (in the
mathematical sense). Notice that both of our functions of interest change smoothly;
graphing x versus f(x) for f(x) = square(x) - 16 and f(x) = pow(2, x) - 32
on a 2-dimensional plane shows that both functions produce a smooth curve without
kinks that crosses the x-axis (f(x)=0) at the appropriate point.
Because they are smooth (differentiable), these curves can be approximated by a line at
any point. Newton's method follows these linear approximations to find function roots.
Imagine a line through the point (x, f(x)) that has the same slope as the curve for
function f(x) at that point. Such a line is called the tangent, and its slope is called the
derivative of f at x.
This line's slope is the ratio of the change in function value to the change in function
argument. Hence, translating x by f(x) divided by the slope will give the argument value
at which this tangent line touches 0.
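The geometric argument above can be written out as a short derivation. This is only a sketch in standard notation: the symbols x_0 (the current guess), x_1 (the next guess) and f'(x_0) (the slope of the tangent at x_0) are introduced here for illustration and do not appear in the text.

```latex
\begin{align*}
  % Tangent line to f at the current guess x_0
  y &= f(x_0) + f'(x_0)\,(x - x_0) \\
  % The next guess x_1 is where this line reaches y = 0
  0 &= f(x_0) + f'(x_0)\,(x_1 - x_0) \\
  % Solving for x_1 (requires a non-zero slope)
  x_1 &= x_0 - \frac{f(x_0)}{f'(x_0)}, \qquad f'(x_0) \neq 0
\end{align*}
```

The last line is the "translate x by f(x) divided by the slope" step, with the sign made explicit.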
Our newton_update function expresses the computational process of following this
tangent line to 0. We approximate the derivative of the function by computing its slope
over a very small interval.
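def approx_derivative(f, x, delta=1e-5):  # helper name and delta assumed
    df = f(x + delta) - f(x)
    return df/delta
def newton_update(f):
    def update(x):
        return x - f(x) / approx_derivative(f, x)
    return update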
Finally, we can define the find_root function in terms of newton_update, our iterative
improvement algorithm, and a test to see if f(x) is near 0. We supply a larger initial
guess to improve the performance for logarithm.
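def find_root(f, initial_guess=10):  # default value assumed
    def test(x):
        return approx_eq(f(x), 0)
    return iter_improve(newton_update(f), test, initial_guess)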
>>> square_root(16)
4.000000000026422
>>> logarithm(32, 2)
5.000000094858201
And to verify that these values are correct, you can test:
>>> square(square_root(16))
16.00000000021138
>>> pow(2, logarithm(32, 2))
32.0000021040223
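For readers who want to run the whole example in one go, the pieces can be assembled as in the sketch below. The definitions of square, approx_eq and iter_improve are assumptions made so the block is self-contained; the versions used earlier in this text may differ in detail, as may the default values of tolerance, delta and initial_guess.

```python
def square(x):
    return x * x

def approx_eq(x, y, tolerance=1e-5):
    """Assumed helper: True when x and y differ by less than tolerance."""
    return abs(x - y) < tolerance

def iter_improve(update, test, guess=1):
    """Assumed form of iterative improvement: update the guess until test passes."""
    while not test(guess):
        guess = update(guess)
    return guess

def approx_derivative(f, x, delta=1e-5):
    """Slope of f over the small interval [x, x + delta]."""
    df = f(x + delta) - f(x)
    return df / delta

def newton_update(f):
    """Return the function that follows the tangent of f down to 0."""
    def update(x):
        return x - f(x) / approx_derivative(f, x)
    return update

def find_root(f, initial_guess=10):
    def test(x):
        return approx_eq(f(x), 0)
    return iter_improve(newton_update(f), test, initial_guess)

def square_root(a):
    return find_root(lambda x: square(x) - a)

def logarithm(a, base=10):
    return find_root(lambda x: pow(base, x) - a)

print(square_root(16))    # a value very close to 4
print(logarithm(32, 2))   # a value very close to 5
```

The exact digits printed depend on the tolerance and on the interval used to approximate the derivative.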
As you experiment with Newton's method, be aware that it will not always converge. The
initial guess of iter_improve must be sufficiently close to the root, and various conditions
about the function must be met. Despite this shortcoming, Newton's method is a
powerful general computational method for solving differentiable equations. In fact, very
fast algorithms for logarithms and large integer division employ variants of the
technique.
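Because convergence is not guaranteed, it can be useful to bound the number of updates. The sketch below is one possible way to do that; the function find_root_bounded and its parameters max_steps and tolerance are hypothetical names, not part of the code above. The cubic used in the second call is a common textbook illustration: starting from the guess 0, the updates keep jumping between values near 0 and near 1 and never approach the real root.

```python
def approx_derivative(f, x, delta=1e-5):
    """Slope of f over the small interval [x, x + delta]."""
    return (f(x + delta) - f(x)) / delta

def find_root_bounded(f, initial_guess, max_steps=50, tolerance=1e-5):
    """Newton's method that gives up after max_steps updates.

    Returns an approximate root, or None if none was found in time.
    """
    guess = initial_guess
    for _ in range(max_steps):
        if abs(f(guess)) < tolerance:
            return guess
        guess = guess - f(guess) / approx_derivative(f, guess)
    return None

# Converges quickly: x*x - 16 has a root at 4.
print(find_root_bounded(lambda x: x * x - 16, 10))

# Does not converge from the guess 0, so None is printed.
print(find_root_bounded(lambda x: x**3 - 2*x + 2, 0))
```

Returning None is just one design choice; raising an exception would work equally well.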
Python awards functions full first-class status, and the resulting gain in expressive power
is enormous. Control structures, on the other hand, do not have this status: you cannot
pass if to a function the way you can pass sum.
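As a small illustration of what first-class status allows, the sketch below binds a function to a new name, stores functions in a list, passes them as arguments, and returns one as a result. The names apply_to_all and make_adder are hypothetical and chosen only for this example.

```python
def apply_to_all(f, values):
    """Apply the function f to each element of values and collect the results."""
    return [f(v) for v in values]

def make_adder(n):
    """Return a new function that adds n to its argument."""
    return lambda x: x + n

total = sum                           # a function bound to a new name
operations = [abs, make_adder(10)]    # functions stored in a list
results = [apply_to_all(op, [-3, 4]) for op in operations]

print(total([1, 2, 3]))   # 6
print(results)            # [[3, 4], [7, 14]]
```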
Function Decorators
Python provides special syntax to apply higher-order functions as part of executing a def
statement, called a decorator. Perhaps the most common example is a trace.
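def trace1(f):
    def wrapped(x):
        # Print the function name and its argument before the call (format assumed)
        print('->', f.__name__, '(', x, ')')
        return f(x)
    return wrapped
@trace1
def triple(x):
    return 3 * x

>>> triple(12)
-> triple ( 12 )
36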
In this example, a higher-order function trace1 is defined, which returns a function that
precedes a call to its argument with a call to print that outputs the argument. The def
statement for triple has an annotation, @trace1, which affects the execution rule for def.
As usual, the function triple is created. However, the name triple is not bound to this
function. Instead, the name triple is bound to the returned function value of calling trace1
on the newly defined triple function. In fact, in code this decorator is equivalent to:
def triple(x):
    return 3 * x
triple = trace1(triple)
If you want, try applying the @trace1 annotation to the Fibonacci function fib(n) before
calling it with some value of n, to observe how many recursive calls it leads to.
@trace1
def fib(n):
    if n <= 1:
        return n
    return fib(n - 1) + fib(n - 2)

>>> fib(5)
Decorators can be used for tracing, for selecting which functions to call when a program
is run from the command line, and many other things.
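A decorator can also count calls instead of printing them, which makes the number of recursive calls easy to read off. The sketch below uses a hypothetical decorator count_calls that stores the counter as an attribute on the wrapper function; this is only one of several ways to keep such a counter.

```python
def count_calls(f):
    """Hypothetical decorator: count how often the decorated function is called."""
    def wrapped(x):
        wrapped.calls += 1
        return f(x)
    wrapped.calls = 0
    return wrapped

@count_calls
def fib(n):
    if n <= 1:
        return n
    return fib(n - 1) + fib(n - 2)

print(fib(5))      # 5
print(fib.calls)   # 15
```

The recursive calls are counted as well, because inside the body the name fib already refers to the decorated function.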
Extra for experts. The actual rule is that the decorator symbol @ may be followed by an
expression (@trace1 is just a simple expression consisting of a single name). Any
expression producing a suitable value is allowed. For example, with a suitable definition,
you could define a decorator check_range so that decorating a function definition with
@check_range(1, 10) would cause the function's results to be checked to make sure they
are integers between 1 and 10. The call check_range(1,10) would return a function that
would then be applied to the newly defined function before it is bound to the name in the
def statement.
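One possible definition of such a check_range decorator is sketched below. The text does not say what should happen when a result is out of range; raising a ValueError is an assumption made for this example, as is the sample function double.

```python
def check_range(low, high):
    """Return a decorator that checks results are integers in [low, high]."""
    def decorator(f):
        def wrapped(*args, **kwargs):
            result = f(*args, **kwargs)
            # Assumed behavior: reject bad results by raising ValueError
            if not isinstance(result, int) or not (low <= result <= high):
                raise ValueError(
                    f"{f.__name__} returned {result!r}, "
                    f"expected an integer between {low} and {high}"
                )
            return result
        return wrapped
    return decorator

@check_range(1, 10)
def double(x):
    return 2 * x

print(double(4))   # 8
try:
    double(7)      # 14 is outside the allowed range
except ValueError as error:
    print("rejected:", error)
```

Here check_range(1, 10) is evaluated first and returns decorator, which is then applied to the newly defined double before the name is bound.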
Glossary
higher-order functions
Higher-order functions are functions that can accept other functions as arguments
or that return functions as values.
general methods of computation
Higher-order functions can serve as powerful abstraction mechanisms to express general
methods of computation, independent of the particular functions they call. These
higher-order functions can then be supplied with particular functions to produce more
specific computations. For example, a higher-order function that expresses the high-level
computational process of iterative improvement could be customized, by providing the
right functions as arguments, into a method for computing an approximation of the golden
ratio.
nested functions
Nested function definitions are functions that are defined locally in the body of another
function definition. Nested function definitions have two main advantages. Firstly,
because the functions are defined locally, they don't clutter the global namespace with
the names of many small auxiliary functions. Secondly, since the functions are scoped
within the body of another function, they have access to all parameters and variables
declared locally inside that other function. Because of that, those nested functions
often require fewer parameters than if they had been defined globally.
lexical scope
The discipline of sharing names among nested definitions is called lexical scoping:
all nested function definitions have access to the names visible in their environment of
definition (as opposed to the environment where they were called).
nested scope
Since function definitions can be nested inside other function definitions, which can
again be nested inside other function definitions, we can have multiple layers of nested
lexical scopes. In such cases, name resolution (i.e., the process of looking up names for
variables or functions) will proceed layer by layer from the innermost scope where a
function was defined until it eventually reaches the outermost global namespace.
shadowing
Related to nested scopes, shadowing refers to a situation where the same (variable or
function) name is defined in two scopes that overlap. Whenever that happens, the name in
the outer scope is hidden because the name in the more deeply nested scope overrides it.
The outer name is said to be shadowed by the inner one.
function as returned value
In Python, just as it is possible to create functions that take other functions as
arguments, it is possible to write functions whose returned values are themselves
functions.
lambda expression
Lambda expressions are a way to define new functions without needing to give them a name.
A lambda expression evaluates to a function that has a single return expression as its
body. Lambda expressions in Python are quite limited: they are only useful for simple,
one-line functions that evaluate and return a single expression. Assignment and control
statements are not allowed.
first-class functions
In general, programming languages impose restrictions on how certain language elements
can be manipulated. Elements with the fewest restrictions are said to have first-class
status. Some of the "rights and privileges" of first-class language elements are that
they can be bound to names, be passed as arguments to functions, be returned as the
results of functions, and be included in data structures. Since all this is the case for
functions in Python, functions are first-class elements in Python.
function decorators
By definition, a function decorator is a function that takes another function and extends
the behavior of the latter function without explicitly modifying it. Function decorators
provide a simple syntax for calling higher-order functions, by simply annotating the
definition of a function with the higher-order function that needs to be applied to it
when that function is defined.
References
[SICP] SICP in Python. This book is derived from the classic textbook "Structure and
Interpretation of Computer Programs" by Abelson, Sussman, and Sussman.
John DeNero originally modified it for Python in 2011. It is licensed under the
Creative Commons Attribution-ShareAlike 3.0 license.
[ThinkCS] How To Think Like a Computer Scientist --- Learning with Python 3
[GoldenRatio] https://en.wikipedia.org/wiki/Golden_ratio