Python For Data Analytics
Rodrigo Belo
[email protected]
Spring 2015
Introduction
Instructor
Rodrigo Belo
Researcher at Carnegie Mellon University and at Católica-Lisbon, Portugal
PhD in Technological Change and Entrepreneurship from Carnegie
Mellon University
Research Interests: Social Networks and Technology in
Educational Settings
Background: Undergraduate degree in Computer Science and
Engineering, 5 years as Software Engineer
Email: [email protected]
Course Description
Each class will start with the introduction of a concept or tool and end with
in-class hands-on exercises using example datasets.
Throughout the course students will apply these techniques to do their
Homework and their Term Project.
Learning Objectives
Process and store data in the appropriate format for future analysis
Source Materials
Textbooks:
1
Online references:
1
Grading
Individual Assignments: 40%
Assignments will be done by individual students and posted on Blackboard.
Specific assignments will appear approx. 1 week prior to due date.
Late Work
If a work is delivered t seconds late, its score is adjusted by multiplying it by
$1 - \left( \frac{t}{24 \cdot 5 \cdot 60 \cdot 60} \right)^4$
[Figure: maximum grade (0 to 100) as a function of the number of days late]
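A minimal sketch of the late-work adjustment, assuming the multiplier reads 1 - (t / (24 * 5 * 60 * 60))^4 and is floored at zero after five days. The minus sign, the floor, and the function names are assumptions, not taken from the slide.

```python
# Assumed reading of the slide: multiplier = 1 - (t / five_days) ** 4,
# where t is the lateness in seconds. The floor at 0.0 is an assumption.
FIVE_DAYS_IN_SECONDS = 24 * 5 * 60 * 60  # 432000 seconds


def late_multiplier(t_seconds):
    """Return the factor applied to a score delivered t_seconds late."""
    ratio = float(t_seconds) / FIVE_DAYS_IN_SECONDS
    return max(0.0, 1.0 - ratio ** 4)


def adjusted_score(score, t_seconds):
    """Return the score after the lateness adjustment (hypothetical helper)."""
    return score * late_multiplier(t_seconds)


ONE_DAY_IN_SECONDS = 24 * 60 * 60
for days in (0, 1, 2.5, 5):
    factor = late_multiplier(days * ONE_DAY_IN_SECONDS)
    print("%.1f days late -> multiplier %.4f" % (days, factor))
```

Under this reading a work delivered one day late keeps almost all of its score (multiplier 0.9984), while the multiplier drops quickly as the fifth day approaches.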
Why Python?
Python is one of the most popular dynamic languages, along with Ruby,
Perl, R, and others
Python has a large and active scientific computing community
Adoption of Python has increased significantly since the 2000s in both
industry and academia
Python started as a general-purpose programming language, but its data
manipulation libraries make it a first-class citizen in data manipulation
and analysis
Excellent choice as a single language for building data-centric
applications
Python as Glue
Python Essentials
   broilers  other_chicken  turkey
0        89            NaN     NaN
1        72            NaN     NaN
2        75            NaN     NaN
3        66            NaN     NaN
4        78            NaN     NaN
[Figure: beef plotted against date; x-axis from 1945 to 2005, y-axis from 0 to 3000]
IPython is the component that ties everything together. Aside from the
standard terminal shell, IPython provides:
IPython notebook: HTML notebook for connecting to IPython through
a web browser
GUI console with inline plotting, multiline editing and syntax
highlighting
Infrastructure for interactive parallel and distributed computing
Mac OS X and Linux distributions come with a Python distribution, but not
necessarily with all the required libraries
New users can install Anaconda (http://continuum.io/downloads) or
Canopy (https://store.enthought.com/downloads/)
To install IPython (and Python) follow the instructions on
http://ipython.org/install.html
You will need IPython notebook
More details at
http://nbviewer.ipython.org/github/rasbt/python_reference/blob/master/tutorials/key_differences_between_python_2_and_3.ipynb
However, there is still a considerable amount of code written in Python 2.x,
making it the de facto standard
In this course we will be using Python 2.x
There are many editors and IDEs that you can use to edit Python code
PyDev (plugin for Eclipse)
Python Tools for Visual Studio
PyCharm
IPython notebook
Emacs
Vim
You can find more IDEs on
https://wiki.python.org/moin/IntegratedDevelopmentEnvironments
In [7]: da<Tab>
data
date2num
datestr2num
datetime
datetime64
datetime_as_string
datetime_data
In [7]: data
Out[7]:
{0: 0.0016908926460949773,
1: 0.39596065989527957,
2: -0.9295711814640477,
3: 2.1076302341719058,
4: -0.6391315204450737,
5: 1.7496783252859787,
6: -0.5307855278794061,
7: 0.38045583368270064}
Using a question mark (?) before or after a variable will display some
general information about the object:
In [3]: b?
Type:        list
String form: [1, 2, 3, 45]
Length:      4
Docstring:
list() -> new empty list
list(iterable) -> new list initialized from iterable's items
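Outside IPython, roughly the same information can be gathered with built-ins. The sketch below is only an approximation of what `b?` displays; the `info` dictionary is a hypothetical helper, and the code is written to run on both Python 2.x and 3.x.

```python
b = [1, 2, 3, 45]

# Collect the fields that IPython's `b?` displays, using plain built-ins
info = {
    "Type": type(b).__name__,
    "String form": repr(b),
    "Length": len(b),
}
for label in ("Type", "String form", "Length"):
    print("%s: %s" % (label, info[label]))

# The docstring shown by `?` is stored on the type itself
doc = type(b).__doc__
print(doc)
```

The built-in `help(b)` prints a longer, paginated version of the same documentation.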
Any file can be run as a Python program inside the environment of your
IPython session using the %run command
# ipython_script_test.py
def my_function(x, y, z):
    return (x + y) / z

aa = 5

%run ipython_script_test
print aa
print my_function(3.0, 4, 5)
5
1.4
IPython provides very strong integration with the operating system shell:
Command                 Description
output = !cmd args      Run cmd and store the stdout in output
%alias alias_name cmd   Define an alias for a system (shell) command
%bookmark               Utilize IPython's directory bookmarking system
%cd directory           Change system working directory to passed directory
%pwd                    Return the current system working directory
%dirs                   Return a list containing the current directory stack
%dhist                  Print the history of visited directories
%env                    Return the system environment variables as a dict
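For scripts that run outside IPython, a few of these commands have rough standard-library counterparts. The sketch below is an approximation under that assumption, not how IPython implements them. The child command is a hypothetical stand-in that simply prints one line.

```python
import os
import subprocess
import sys

# Rough counterpart of %pwd
cwd = os.getcwd()

# Rough counterpart of %env: a plain dict copy of the environment
env = dict(os.environ)

# Rough counterpart of `output = !cmd args`: run a command and keep
# its stdout as a list of lines. The command here is a hypothetical example.
cmd = [sys.executable, "-c", "print('hello from a subprocess')"]
output = subprocess.check_output(cmd, universal_newlines=True).splitlines()

print(cwd)
print(len(env))
print(output)
```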
Python Language
Multiplication
print 5 * 4
print 3.1 * 2
20
6.2
Float division
print 4.0 / 2.0
print 1.0/3.1
2.0
0.322580645161
Integer division
print 4 / 2
print 1/3
2
0
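In Python 2.x, `/` between two integers truncates, which is why `1/3` prints 0 above. In Python 3.x the same expression returns 0.333.... A minimal sketch of spellings that behave the same under both versions:

```python
# Floor division: explicit, and identical under Python 2.x and 3.x
print(7 // 2)   # 3
print(1 // 3)   # 0

# Making one operand a float forces true division in both versions
print(7 / 2.0)  # 3.5
print(float(1) / 3)

# divmod returns the quotient and the remainder together
quotient, remainder = divmod(7, 2)
print(quotient)
print(remainder)
```

In Python 2.x, placing `from __future__ import division` at the top of a file makes `/` behave as it does in Python 3.x.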
Exponentiation
print 3. ** 2
print 3**2
print 2 ** 0.5
9.0
9
1.41421356237
Square Root
import numpy as np
print np.sqrt(2)
1.41421356237
Exponential
import numpy as np
print np.exp(1)
2.71828182846
Logarithm
import numpy as np
print np.log(10)    # natural logarithm
print np.log10(10)  # base 10
2.30258509299
1.0
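For scalar values, the same functions are available in the standard library's `math` module, so the results above can be checked without NumPy. A minimal sketch:

```python
import math

print(math.sqrt(2))
print(math.exp(1))
print(math.log(10))     # natural logarithm
print(math.log10(10))   # base 10
print(math.log(8, 2))   # logarithm in an arbitrary base (here base 2)
```

The NumPy versions additionally accept whole arrays and apply the function element by element.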
Variable Assignment
Boolean
Strings
Strings can be enclosed in single quotes or double quotes
Single quotes
'Hello World'
'Hello World'
'Isn\'t it nice to have a computer that talks to you?'
"Isn't it nice to have a computer that talks to you?"
Double quotes
"Hello World"
'Hello World'
"Isn't it nice to have a computer that talks to you?"
"Isn't it nice to have a computer that talks to you?"
You can use triple quotes for strings that span multiple lines
print """\
Hello
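The snippet above is cut off in the slide. A complete sketch of a triple-quoted string could look as follows; the text inside the quotes is hypothetical, and the code runs on both Python 2.x and 3.x.

```python
# The backslash right after the opening quotes suppresses the first newline.
# The contents of the string are a hypothetical example.
message = """\
Hello
World
"""
print(message)

# Each line break inside the quotes is a real newline character
lines = message.splitlines()
print(lines)
```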
Strings can be indexed (subscripted), with the first character having index 0
mystring = "Hello World"
print mystring[0]
print mystring[6:10]
H
Worl
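A few more slicing forms, as a minimal sketch on the same string: omitted bounds, negative indices, and a step. The variable names are illustrative only.

```python
mystring = "Hello World"

first_word = mystring[:5]         # start omitted: from the beginning
second_word = mystring[6:]        # end omitted: up to the end
last_char = mystring[-1]          # negative indices count from the end
reversed_string = mystring[::-1]  # a step of -1 walks backwards

print(first_word)
print(second_word)
print(last_char)
print(reversed_string)

# Strings are immutable: assigning to an index raises TypeError
try:
    mystring[0] = "J"
except TypeError as error:
    print("cannot assign: %s" % error)
```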
Lists
Tuples
Sets
Dictionaries
Control Flows
if statements
x = 42
if x > 10:
    print x
else:
    print 10
42
for statements
words = ['cat', 'window', 'defenestrate']
for w in words:
    print w, len(w)
cat 3
window 6
defenestrate 12
a = ['Mary', 'had', 'a', 'little', 'lamb']
for i in range(len(a)):
    print i, a[i]
0 Mary
1 had
2 a
3 little
4 lamb
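The index-based loop above can also be written with the built-in `enumerate`, which yields the index and the item together. A minimal sketch that runs on both Python 2.x and 3.x; the `pairs` list is only there to show what `enumerate` produces.

```python
a = ['Mary', 'had', 'a', 'little', 'lamb']

pairs = []
for i, word in enumerate(a):
    # i is the position, word is the element at that position
    print("%d %s" % (i, word))
    pairs.append((i, word))
```

This avoids the `range(len(a))` construction and the separate lookup `a[i]`.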
Defining Functions
Lambda Functions
You can define "lambda" functions, which are also known as inline or
anonymous functions.
The syntax is lambda var:f(var)
print map(lambda x: x ** 2, [0, 1, 2])
[0, 1, 4]
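A lambda is equivalent to a small named function and is often passed as a `key` argument. The sketch below illustrates both uses. The word list is borrowed from the control-flow slide, and `list(...)` is wrapped around `map` so the code also runs on Python 3.x, where `map` returns an iterator.

```python
# A lambda and the equivalent named function
square = lambda x: x ** 2


def square_def(x):
    return x ** 2


squares = list(map(square, [0, 1, 2]))
print(squares)

# A lambda as a sort key: order words by their length
words = ['window', 'cat', 'defenestrate']
by_length = sorted(words, key=lambda w: len(w))
print(by_length)
```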
Nested Functions
map
def cube(x): return x * x * x
map(cube, range(10))
[0, 1, 8, 27, 64, 125, 216, 343, 512, 729]
reduce
def add(x, y): return x + y
reduce(add, range(10))
45
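The examples above rely on Python 2.x built-ins. A version that runs on both Python 2.x and 3.x can be sketched as follows: `reduce` is imported from `functools` (available since Python 2.6), and `map` is wrapped in `list(...)` because it returns an iterator in Python 3.x.

```python
from functools import reduce  # built-in in Python 2.x, functools-only in 3.x


def cube(x):
    return x * x * x


def add(x, y):
    return x + y


cubes = list(map(cube, range(10)))
total = reduce(add, range(10))

# An optional third argument gives reduce a starting value
sum_of_cubes = reduce(add, cubes, 0)

print(cubes)
print(total)
print(sum_of_cubes)
```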
List Comprehensions
Class Objects
x = MyClass()
print x.i
print x.f()
12345
hello world
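The definition of `MyClass` does not appear in the extracted slide. A minimal definition consistent with the output above could look like the sketch below. The class body is an assumption, modelled on the usual introductory example in the Python tutorial.

```python
class MyClass(object):
    """A simple example class (assumed definition, not from the slide)."""

    i = 12345  # class attribute shared by all instances

    def f(self):
        # self refers to the instance the method is called on
        return 'hello world'


x = MyClass()  # calling the class creates a new instance
print(x.i)
print(x.f())
```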
Exercises