Python for Data Analytics

Lectures 1 & 2: The Python Language and Environment

Rodrigo Belo
[email protected]

Spring 2015

Introduction

Instructor

Rodrigo Belo
Researcher at Carnegie Mellon University and at Católica-Lisbon, Portugal
PhD in Technological Change and Entrepreneurship from Carnegie Mellon University
Research Interests: Social Networks and Technology in Educational Settings
Background: Undergraduate degree in Computer Science and Engineering, 5 years as Software Engineer
Email: [email protected]

Course Description

This course introduces Python as a tool to collect, process, and analyze
large data sets from a variety of sources to create information that
guides business decision making.

Students will become familiar with Python as a language and as a
platform to integrate different technologies and techniques for data
analytics, including:
Collection of online information;
Tools and strategies for data storage; and
Data analysis methods.

Each class will start with the introduction of a concept or tool and end with
in-class hands-on exercises using example datasets.
Throughout the course students will apply these techniques to do their
Homework and their Term Project.

Learning Objectives

Upon completion of this course, the student will be able to:

1. Use Python as a general-purpose programming language
2. Collect data available online in an automated fashion
3. Process and store data in the appropriate format for future analysis
4. Apply data analytics tools to extract relevant information

Source Materials

Textbooks:
1. Main: McKinney (2012), Python for Data Analysis, O'Reilly
2. Other: Russell (2011), Mining the Social Web, O'Reilly

Online references:
1. Python 2 Documentation: https://docs.python.org/2/
2. pandas online reference: http://pandas.pydata.org/pandas-docs/stable/
3. ggplot online reference: http://ggplot.yhathq.com

Grading
Individual Assignments: 40%
Assignments will be done by individual students and posted on Blackboard.
Specific assignments will appear approx. 1 week prior to due date.

Term Project: 30%
The term project will be done in teams of 2 or 3 people and will involve the
application of the methods covered in class.
Students will identify a question they would like to answer using publicly
available data, gather the data from an online source, store it, and analyze
it using some of the methods shown in class.

Final Exam: 30%
May 6, 6pm

Late Work
If a work is delivered t seconds late, its score is adjusted by multiplying it by

$$\left(1 - \frac{t}{24 \cdot 5 \cdot 60 \cdot 60}\right)^{4}$$

[Figure: maximum grade (0 to 100) as a function of the number of days late]
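As a rough sketch of how this penalty behaves, the multiplier can be computed as below, reading the late-work formula as (1 - t/(24*5*60*60)) raised to the fourth power. The name late_multiplier is hypothetical, and holding the multiplier at zero once five days have passed is an assumption; the slide only gives the formula and the plot.

```python
SECONDS_IN_FIVE_DAYS = 24 * 5 * 60 * 60  # denominator of the late-work formula


def late_multiplier(t):
    """Return the score multiplier for work delivered t seconds late."""
    if t <= 0:
        return 1.0
    # Assumption: the multiplier stays at zero after five days.
    remaining = max(0.0, 1.0 - float(t) / SECONDS_IN_FIVE_DAYS)
    return remaining ** 4


one_day = 24 * 60 * 60
print(late_multiplier(one_day))  # one day late keeps (4/5)**4 of the score
```

Under this reading, a submission that is one day late keeps roughly 41% of its score, which shows how quickly the fourth power makes the penalty grow.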

Basic Concepts and Environment


Why Python?

Python is one of the most popular dynamic languages, along with Ruby, Perl, R, and others
Python has a large and active scientific computing community
Adoption of Python has increased significantly since the 2000s, both in industry and in the academic community
Python started as a general-purpose programming language, but its data manipulation libraries make it a first-class citizen in data manipulation and analysis
Excellent choice as a single language for building data-centric applications

Python as Glue

Python integrates easily with C, C++, and FORTRAN, languages in which many routines are implemented
Most programs consist of small portions of code where most of the time is spent, and large portions of glue code that doesn't run often
In many cases the execution time of glue code is irrelevant
Python can be used both as a prototyping language and as a production language

Python Essentials

Some of the essential Python libraries and tools:


NumPy
SciPy
pandas
ggplot
IPython


Python Essentials: NumPy

NumPy (Numerical Python) is the foundational package for scientific computing in Python. It provides, among other things:
A fast and efficient multidimensional array object: ndarray
Functions for performing element-wise computations with arrays or mathematical operations between arrays
Linear algebra operations, Fourier transform, and random number generation
Tools for connecting C, C++, and Fortran code to Python

Python Essentials: SciPy


SciPy is a collection of packages addressing a number of different standard
problem domains in scientific computing:
scipy.integrate: numerical integration routines and differential
equation solvers
scipy.linalg: linear algebra routines and matrix decompositions
extending beyond those provided in numpy.linalg.
scipy.optimize: function optimizers (minimizers) and root finding
algorithms
scipy.signal: signal processing tools
scipy.sparse: sparse matrices and sparse linear system solvers
scipy.stats: standard continuous and discrete probability
distributions (density functions, samplers, continuous distribution
functions), various statistical tests, and more descriptive statistics


Python Essentials: pandas


pandas provides data structures and functions designed to make working with structured data fast, easy and expressive
DataFrame is the primary object of this library
A two-dimensional object that resembles a table with rows and columns
meat[:5]
         date  beef  veal  pork  lamb_and_mutton  broilers  other_chicken  turkey
0  1944-01-01   751    85  1280               89       NaN            NaN     NaN
1  1944-02-01   713    77  1169               72       NaN            NaN     NaN
2  1944-03-01   741    90  1128               75       NaN            NaN     NaN
3  1944-04-01   650    89   978               66       NaN            NaN     NaN
4  1944-05-01   681   106  1029               78       NaN            NaN     NaN

Python Essentials: ggplot


ggplot is a graphics library that makes it easy to create graphics
from ggplot import *
ggplot(aes(x='date', y='beef'), data=meat) + \
    geom_line() + \
    stat_smooth(colour='blue', span=0.2)

[Figure: beef (y-axis, 0 to 3000) plotted against date (x-axis, 1945 to 2005)]

Python Essentials: IPython

IPython is the component that ties everything together. Aside from the
standard terminal, IPython shell provides:
IPython notebook: HTML notebook for connecting to IPython through
a web browser
GUI console with inline plotting, multiline editing and syntax
highlighting
Infrastructure for interactive parallel and distributed computing


Installation and Setup

Mac OS X and Linux distributions come with Python installed, but not necessarily with all the required libraries
New users can install Anaconda (http://continuum.io/downloads) or Canopy (https://store.enthought.com/downloads/)
To install IPython (and Python), follow the instructions at http://ipython.org/install.html
You will need IPython notebook

Python 2 and Python 3

The Python community is currently undergoing a transition from the Python 2 series of interpreters to the Python 3 series
Until the appearance of Python 3.0, all Python code was backwards compatible
The community decided that in order to move the language forward, certain backwards-incompatible changes were necessary

Python 3.x is a cleaned-up version of Python 2.x
Many inconsistencies were removed in the new version
2.x: print "The answer is", 2*2
3.x: print("The answer is", 2*2)

More details at
http://nbviewer.ipython.org/github/rasbt/python_reference/blob/master/tutorials/key_differences_between_python_2_and_3.ipynb
However, there is still a considerable amount of code written in Python 2.x, making it the de facto standard
In this course we will be using Python 2.x
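One way to soften the print difference is to write print calls that behave the same on both series. The following is a small sketch, not something the slides prescribe: a parenthesized print with a single, pre-formatted string is valid in 2.x and 3.x alike. The variable names are illustrative.

```python
# A parenthesized print with one argument works in Python 2.x and 3.x.
answer = 2 * 2
message = "The answer is {0}".format(answer)
print(message)  # prints: The answer is 4
```

Formatting the string first avoids the case where Python 2 would treat print("a", b) as printing a tuple.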

Integrated Development Environments (IDEs)

There are many editors and IDEs that you can use to edit Python
PyDev (plugin for Eclipse)
Python Tools for Visual Studio
PyCharm
IPython notebook
Emacs
Vim
You can find more IDEs at
https://wiki.python.org/moin/IntegratedDevelopmentEnvironments

IPython: An Interactive Computing and Development Environment

IPython Basics - Prompt

$ ipython --pylab
Python 2.7.6 | 64-bit | (default, Jun 4 2014, 16:42:26)
Type "copyright", "credits" or "license" for more information.
IPython 2.1.0 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.
Using matplotlib backend: MacOSX
In [1]: 3 + 4
Out[1]: 7
In [2]: data = {i : randn() for i in range(8)}
In [3]: data
Out[3]:
{0: 0.36680003627745555,
1: 0.5231034512314581,
2: 0.6300895261779402,
3: -0.9115682057027865,
4: -1.7244460134107902,
5: 0.3829479256814315,
6: 0.4718660373870812,
7: -0.23438875074129756}
In [4]: data[3]
Out[4]: -0.9115682057027865


IPython Basics - Tab Completion

In [7]: da<Tab>
data         datetime            datetime_data
date2num     datetime64
datestr2num  datetime_as_string
In [7]: data
Out[7]:
{0: 0.0016908926460949773,
1: 0.39596065989527957,
2: -0.9295711814640477,
3: 2.1076302341719058,
4: -0.6391315204450737,
5: 1.7496783252859787,
6: -0.5307855278794061,
7: 0.38045583368270064}


IPython Basics - Introspection

Using a question mark (?) before or after a variable will display some
general information about the object:
In [3]: b?
Type:        list
String form: [1, 2, 3, 45]
Length:      4
Docstring:
list() -> new empty list
list(iterable) -> new list initialized from iterable's items

? can also be used before or after a function name

? has a final usage, which is for searching the IPython namespace in a
manner similar to the standard UNIX or Windows command line:
In [4]: import numpy as np
In [5]: np.*load*?
np.load
np.loads
np.loadtxt
np.pkgload

IPython Basics - The %run Command

Any file can be run as a Python program inside the environment of your
IPython session using the %run command
# ipython_script_test.py
def my_function(x, y, z):
    return (x + y) / z

aa = 5

%run ipython_script_test
print aa
print my_function(3.0, 4, 5)
5
1.4

IPython Basics - The %paste Command

The %paste command pastes code copied to the clipboard, keeping its
indentation
The following code will not work if simply pasted:
x = 5
y = 7
if (x > 5):
    x += 1
    y = 8

Pasted directly at the prompt, the block is not executed as written and y ends up as 8:
>>> y
8

With %paste, the block runs as written and y stays 7:
>>> %paste
x = 5
y = 7
if (x > 5):
    x += 1
    y = 8
## -- End pasted text --
>>> y
7

IPython Basics - Interacting with the OS

IPython provides very strong integration with the operating system shell:

Command                 Description
output = !cmd args      Run cmd and store the stdout in output
%alias alias_name cmd   Define an alias for a system (shell) command
%bookmark               Utilize IPython's directory bookmarking system
%cd directory           Change system working directory to passed directory
%pwd                    Return the current system working directory
%dirs                   Return a list containing the current directory stack
%dhist                  Print the history of visited directories
%env                    Return the system environment variables as a dict

IPython Basics - IPython GUI

Starting an IPython GUI:
ipython qtconsole --pylab=inline

IPython Basics - IPython Notebook

Starting the IPython notebook server:
ipython notebook --pylab=inline

Python Language


Python as a Calculator - Basic Math

Python can be used as a basic calculator

Addition and subtraction
print 2 + 4
print 8.1 - 5
6
3.1

Multiplication
print 5 * 4
print 3.1 * 2
20
6.2

Source: John Kitchin <[email protected]> http://kitchingroup.cheme.cmu.edu/pycse/pycse.html

Integer division is not the same as float division

Float division
print 4.0 / 2.0
print 1.0/3.1
2.0
0.322580645161

Integer division
print 4 / 2
print 1/3
2
0

Be careful when performing integer division: in Python 2, dividing two integers with / discards the fractional part

Source: John Kitchin <[email protected]> http://kitchingroup.cheme.cmu.edu/pycse/pycse.html
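To make the intent explicit regardless of the operand types, one option is to choose the division operator deliberately. This is a minimal sketch with illustrative variable names: floor division (//) always discards the fractional part, and converting one operand to float always gives a fractional result, in both Python 2.x and 3.x.

```python
# Floor division keeps only the whole part of the quotient.
quotient = 1 // 3

# Converting one operand to float forces a fractional result.
ratio = float(1) / 3

print(quotient)
print(round(ratio, 4))
```

Writing the conversion explicitly avoids silently getting 0 when both inputs happen to be integers.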

Exponentiation
print 3. ** 2
print 3**2
print 2 ** 0.5
9.0
9
1.41421356237

Source: John Kitchin <[email protected]> http://kitchingroup.cheme.cmu.edu/pycse/pycse.html

Advanced Mathematical Operations

Some more advanced mathematical operations require the numpy package

Square Root
import numpy as np
print np.sqrt(2)
1.41421356237

Source: John Kitchin <[email protected]> http://kitchingroup.cheme.cmu.edu/pycse/pycse.html

Exponential and logarithmic functions

Exponential
import numpy as np
print np.exp(1)
2.71828182846

Logarithm
import numpy as np
print np.log(10)    # natural logarithm
print np.log10(10)  # base 10
2.30258509299
1.0

Source: John Kitchin <[email protected]> http://kitchingroup.cheme.cmu.edu/pycse/pycse.html

Variable Assignment

The equal sign (=) is used to assign a value to a variable


width = 20
height = 5 * 9
width * height
900


Python Language Types


Boolean

Python has a built-in boolean type:


print width == 20
print width == 30
True
False


Strings
Strings can be enclosed in single quotes or double quotes

Single quotes
'Hello World'
'Hello World'
'Isn\'t it nice to have a computer that talks to you?'
"Isn't it nice to have a computer that talks to you?"

Double quotes
"Hello World"
'Hello World'
"Isn't it nice to have a computer that talks to you?"
"Isn't it nice to have a computer that talks to you?"

You can concatenate strings with the + sign:
"Hello" + "World"
'HelloWorld'
aa = "Hello"
bb = "World"
aa + bb
'HelloWorld'

Strings are immutable:
aa = "Hello" + "World"
print aa
aa[5] = 'R'
HelloWorld
Traceback (most recent call last):
  File "<ipython-input-4454-3668d02c561e>", line 3, in <module>
    aa[5] = 'R'
TypeError: 'str' object does not support item assignment
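Because a string cannot be changed in place, the usual approach is to build a new string from pieces of the old one. The sketch below reuses aa from the slide; the name bb and the choice of replacing the character at index 5 are only for illustration.

```python
aa = "Hello" + "World"

# Slicing copies the parts we keep; concatenation builds a new string.
bb = aa[:5] + "R" + aa[6:]

print(aa)  # the original string is unchanged
print(bb)
```

The original value bound to aa is untouched, which is exactly what immutability guarantees.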

You can use triple quotes for strings that span multiple lines
print """\
Hello
    World"""
Hello
    World

Triple quotes are often used to provide function documentation

Strings can be indexed (subscripted), with the first character having index 0
mystring = "Hello World"
print mystring[0]
print mystring[6:10]
H
Worl

There is no separate character type. A character is simply a string of size one
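Slices can also leave out an endpoint or count from the end of the string with negative indices. A small sketch, reusing mystring from the slide; the other variable names are illustrative.

```python
mystring = "Hello World"

last_char = mystring[-1]    # negative indices count from the end
first_word = mystring[:5]   # omitted start means "from the beginning"
second_word = mystring[6:]  # omitted end means "to the end"

print(first_word)
print(second_word)
print(last_char)
```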

Lists

Lists are a compound data type in Python
They can be written as a list of comma-separated values (items) between square brackets
They might contain items of different types
squares = [1, 4, 9, 16, 25]
squares
[1, 4, 9, 16, 25]

Lists can be indexed like strings
squares = [1, 4, 9, 16, 25]
print squares[1]
print squares[-3]
print squares[-3:]
4
9
[9, 16, 25]

Lists are mutable (unlike strings)
letters = ['a', 'b', 'c', 'd', 'e', 'f', 'g']
print letters
letters[2:5] = ['C', 'D', 'E']  # replace some values
print letters
letters[2:5] = []  # now remove them
print letters
['a', 'b', 'c', 'd', 'e', 'f', 'g']
['a', 'b', 'C', 'D', 'E', 'f', 'g']
['a', 'b', 'f', 'g']

Lists can be used as stacks:
stack = [3, 4, 5]
stack.append(6)
stack.append(7)
stack
[3, 4, 5, 6, 7]
stack.pop()
7
stack
[3, 4, 5, 6]

Tuples

A tuple is like a list, but it can be written without enclosing brackets.
Tuples are immutable; you cannot change their values.
a = 3, 4, 5, [7, 8], 'cat'
print a[0], a[-1]
a[-1] = 'dog'
3 cat
Traceback (most recent call last):
  File "<ipython-input-4538-8e67474f43ae>", line 3, in <module>
    a[-1] = 'dog'
TypeError: 'tuple' object does not support item assignment

Source: John Kitchin <[email protected]> http://kitchingroup.cheme.cmu.edu/pycse/pycse.html
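Immutability applies to the tuple itself, not to mutable objects stored inside it. The following sketch reuses the tuple a from the slide; error_message is an illustrative name. It shows that the list element can still grow while reassigning a slot fails.

```python
a = 3, 4, 5, [7, 8], 'cat'

# Allowed: the list stored inside the tuple is itself mutable.
a[3].append(9)

# Not allowed: rebinding one of the tuple's slots.
error_message = None
try:
    a[-1] = 'dog'
except TypeError as err:
    error_message = str(err)

print(a)
print(error_message)
```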

Sets

A set is an unordered collection with no duplicate elements
basket = ['apple', 'orange', 'apple', 'pear', 'orange', 'banana']
fruit = set(basket)  # create a set without duplicates
fruit
{'apple', 'banana', 'orange', 'pear'}
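Sets are typically used for membership tests and for comparing collections. A brief sketch building on the basket example; the citrus set and the other new names are made up for illustration.

```python
basket = ['apple', 'orange', 'apple', 'pear', 'orange', 'banana']
fruit = set(basket)

has_orange = 'orange' in fruit     # fast membership test
citrus = set(['orange', 'lemon'])  # hypothetical second set
common = fruit & citrus            # intersection: items in both sets

# Sets have no order, so sort before printing for a stable display.
print(sorted(fruit))
print(sorted(common))
```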

Dictionaries

A dictionary can be thought of as an unordered set of key: value pairs
phone_list = {'jack': 4123324098, 'jill': 4120294139}
phone_list
{'jack': 4123324098, 'jill': 4120294139}

phone_list['rodrigo'] = 4120293473
phone_list
{'jack': 4123324098, 'jill': 4120294139, 'rodrigo': 4120293473}

You can access all the keys and values of a dictionary:
print phone_list.keys()
print phone_list.values()
['rodrigo', 'jill', 'jack']
[4120293473, 4120294139, 4123324098]
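Since the key order of a dictionary should not be relied on, a common pattern is to iterate over the sorted keys, and to use get when a key may be absent. A minimal sketch based on phone_list; the key 'maria' is a hypothetical missing entry.

```python
phone_list = {'jack': 4123324098, 'jill': 4120294139}
phone_list['rodrigo'] = 4120293473

# Iterate in a predictable order by sorting the keys first.
for name in sorted(phone_list):
    print("{0}: {1}".format(name, phone_list[name]))

# get returns None instead of raising KeyError for a missing key.
missing = phone_list.get('maria')
print(missing)
```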

Python Language Control Structures


Control Flows

if statements
x = 42
if x > 10:
    print x
else:
    print 10
42

for statements
words = ['cat', 'window', 'defenestrate']
for w in words:
    print w, len(w)
cat 3
window 6
defenestrate 12

a = ['Mary', 'had', 'a', 'little', 'lamb']
for i in range(len(a)):
    print i, a[i]
0 Mary
1 had
2 a
3 little
4 lamb
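The index-based loop over range(len(a)) can also be written with the built-in enumerate, which yields the index and the item together. This is offered as an alternative sketch, not as the form used in the slides; pairs is an illustrative name.

```python
a = ['Mary', 'had', 'a', 'little', 'lamb']

pairs = []
for i, word in enumerate(a):
    # enumerate yields (index, item) tuples, starting at index 0.
    pairs.append((i, word))
    print("{0} {1}".format(i, word))
```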

Python Language Functions


Defining Functions

You can create functions using the keyword def
import numpy as np

def f(x):
    return x ** 3 - np.log(x)

print f(3)
print f(5.1)
25.9013877113
131.02175946

Source: John Kitchin <[email protected]> http://kitchingroup.cheme.cmu.edu/pycse/pycse.html

Functions can receive more than one argument
def func(x, y):
    "return product of x and y"
    return x * y

print func(2, 3)
6

Source: John Kitchin <[email protected]> http://kitchingroup.cheme.cmu.edu/pycse/pycse.html

Functions - Optional and Keyword Arguments

You can create default values for arguments:
def func(a, n=2):
    "compute the nth power of a"
    return a ** n

# three different ways to call the function
print func(2)
print func(2, 3)
print func(2, n=4)
4
8
16

Source: John Kitchin <[email protected]> http://kitchingroup.cheme.cmu.edu/pycse/pycse.html

Defining a function with two optional arguments
def func(a=1, n=2):
    "compute the nth power of a"
    return a ** n

# three different ways to call the function
print func()
print func(2, 4)
print func(n=4, a=2)
1
16
16

Source: John Kitchin <[email protected]> http://kitchingroup.cheme.cmu.edu/pycse/pycse.html

We can define a function that receives an arbitrary number of arguments
with the *args syntax:
def func(*args):
    sum = 0
    for arg in args:
        sum += arg
    return sum

print func(1, 2, 3, 4)
10

Source: John Kitchin <[email protected]> http://kitchingroup.cheme.cmu.edu/pycse/pycse.html

We can define a function that receives an arbitrary number of keyword
arguments with the **kwargs syntax:
def func(**kwargs):
    for kw in kwargs:
        print '{0} = {1}'.format(kw, kwargs[kw])

func(t1=6, color='blue')
color = blue
t1 = 6
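The *args and **kwargs forms can be combined with regular parameters in one signature. The sketch below is illustrative only: the function describe and its output format are made up to show how positional extras land in a tuple and keyword extras in a dict.

```python
def describe(label, *args, **kwargs):
    """Return a one-line summary of the arguments received."""
    total = 0
    for arg in args:  # extra positional arguments arrive as a tuple
        total += arg
    # Extra keyword arguments arrive as a dict; sort keys for stable output.
    options = ["{0}={1}".format(key, kwargs[key]) for key in sorted(kwargs)]
    return "{0}: total={1} options={2}".format(label, total, ", ".join(options))


summary = describe("demo", 1, 2, 3, t1=6, color="blue")
print(summary)
```

Sorting the keyword names is a deliberate choice here, because the iteration order of the keyword dict is not something to depend on.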

Lambda Functions

You can define "lambda" functions, which are also known as inline or
anonymous functions.
The syntax is lambda var: f(var)
print map(lambda x: x ** 2, [0, 1, 2])
[0, 1, 4]

Source: John Kitchin <[email protected]> http://kitchingroup.cheme.cmu.edu/pycse/pycse.html
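Lambda functions are often passed as short key functions, for example when sorting. A small sketch, reusing the words list from the control flow slide; by_length is an illustrative name.

```python
words = ['cat', 'window', 'defenestrate']

# The lambda computes the sort key for each word: its length.
by_length = sorted(words, key=lambda w: len(w), reverse=True)

print(by_length)
```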

Nested Functions

You can nest functions inside of functions
def wrapper(x):
    a = 4
    def func(x, a):
        return a * x
    return func(x, a)

print wrapper(5)
20

Source: John Kitchin <[email protected]> http://kitchingroup.cheme.cmu.edu/pycse/pycse.html

Functional Programming Tools


filter
def f(x):
    return x % 3 == 0 or x % 5 == 0

filter(f, range(2, 25))
[3, 5, 6, 9, 10, 12, 15, 18, 20, 21, 24]

map
def cube(x): return x * x * x
map(cube, range(10))
[0, 1, 8, 27, 64, 125, 216, 343, 512, 729]

reduce
def add(x, y): return x + y
reduce(add, range(10))
45
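The calls above return lists in Python 2. In Python 3, filter and map return iterators and reduce lives in the functools module, so a version that behaves the same on both series could look like the sketch below. The result names (multiples, cubes, total) are illustrative.

```python
from functools import reduce  # a builtin in 2.x; must be imported in 3.x


def f(x):
    return x % 3 == 0 or x % 5 == 0


def cube(x):
    return x * x * x


def add(x, y):
    return x + y


# Wrapping in list() materializes the results on Python 3 as well.
multiples = list(filter(f, range(2, 25)))
cubes = list(map(cube, range(10)))
total = reduce(add, range(10))

print(multiples)
print(cubes)
print(total)
```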

List Comprehensions

List comprehensions provide a shortcut to create lists from existing
structures:
squares = [x ** 2 for x in range(10)]
print squares
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
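A comprehension can also carry a condition, which lets it express the same results as the filter and map examples from the functional programming slide. A short sketch with illustrative variable names:

```python
# Equivalent of filter(f, range(2, 25)) with the test written inline.
multiples = [x for x in range(2, 25) if x % 3 == 0 or x % 5 == 0]

# Equivalent of map(cube, range(10)).
cubes = [x ** 3 for x in range(10)]

print(multiples)
print(cubes)
```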

Python Language Class System


Class Objects

Class objects support two kinds of operations: attribute references and
instantiation.
class MyClass:
    """A simple example class"""
    i = 12345
    def f(self):
        return 'hello world'

x = MyClass()
print x.i
print x.f()
12345
hello world
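To separate the two operations, the sketch below references an attribute on the class itself and then on an instance. Deriving from object and assigning x.i are additions for illustration, not part of the slide's example.

```python
class MyClass(object):
    """A simple example class"""
    i = 12345

    def f(self):
        return 'hello world'


class_value = MyClass.i  # attribute reference on the class object
x = MyClass()            # instantiation creates a new instance
x.i = 54321              # sets an attribute on this instance only

print(MyClass.i)
print(x.i)
print(x.f())
```

Assigning x.i creates an instance attribute that hides the class attribute for x, while MyClass.i keeps its original value.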

Exercises
