METU Computer Engineering
2024
CEng 240 – Spring 2021
Week 14 13
Scientific and Engineering Libraries
Part 2: Pandas and Matplotlib
Sinan Kalkan
This Week
• Scientific and Engineering Libraries
  ◦ Pandas for data handling and analysis
  ◦ Matplotlib for plotting
2020 S. Kalkan - CEng 240 2
Outline
• Overview
• Installation
• DataFrames
• Accessing data in DataFrames
• Modifying data in DataFrames
• Analyzing data in DataFrames
• Presenting data in DataFrames
Overview
• A handy library for:
  ◦ working with files of different formats
  ◦ manipulating & analyzing data
• Data types & structures for:
  ◦ tables, especially numerical tables
  ◦ time series
• The name comes from “panel data”
Installation
• On your Linux environment:
    $ pip install pandas
  or
    $ conda install pandas
• On Windows/Mac: install Anaconda first
• On Colab, it is already installed
• import pandas as pd
Supported Files
• A wide collection of file formats
• Each format has a reader and a writer
For an up-to-date list:
https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html
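The reader/writer pairing can be sketched as follows. This is a minimal illustration, not the slide's own code: the table contents are made up, and in-memory strings stand in for files so that the example runs anywhere.

```python
import io
import pandas as pd

# Hypothetical sample table; the names and grades are invented for illustration.
df = pd.DataFrame({"Name": ["Jack", "Amanda"], "Grade": [40.2, 30.0]})

# Writers are DataFrame methods named to_<format>().
csv_text = df.to_csv(index=False)          # returns a string when no path is given
json_text = df.to_json(orient="records")

# Readers are top-level functions named read_<format>().
from_csv = pd.read_csv(io.StringIO(csv_text))
from_json = pd.read_json(io.StringIO(json_text))

print(from_csv)
print(from_json)
```

With real files, the same calls take a file name instead, e.g. df.to_csv("grades.csv") and pd.read_csv("grades.csv"), where grades.csv is a hypothetical file name.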
Data Frames
• Similar to NumPy’s ndarray data type, Pandas has a fundamental data type called DataFrame
• A DataFrame can be created by:
  ◦ loading data from a file (using a reader)
  ◦ calling the constructor pd.DataFrame()
Data Frames
Loading data from files
This produces the following output:
For more information about the CSV file format,
have a look at the File Handling chapter.
Sample file ‘ch10_example.csv’ at:
https://raw.githubusercontent.com/sinankalkan/CENG240/master/figures/ch10_example.csv
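A minimal sketch of loading CSV data with pd.read_csv(). The rows below are invented stand-ins and are not the contents of ch10_example.csv; the 'Age' column in particular is an assumption made for illustration.

```python
import io
import pandas as pd

# Stand-in for a CSV file on disk; the rows are invented for this sketch.
csv_text = """Name,Grade,Age
Jack,40.2,20
Amanda,30,25
Mary,60.2,19
"""

# read_csv() accepts a path, a URL, or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

print(df)         # the table, with the first line used as column names
print(df.shape)   # (number of rows, number of columns)
```

To load an actual file, pass its name or URL as the first argument instead of the StringIO object.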
Data Frames
Loading data from files
More on pd.read_csv():
• Automatically loads column headers
• If your file does not have a header, use: pd.read_csv(filename, header=None)
• If you want to read specific columns, use: pd.read_csv(filename, usecols=['column name 1', ...])
• For more information & control, see help(pd.read_csv)
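The two options above can be tried on small in-memory examples. This is a sketch under the assumption of made-up data; the variable names are illustrative only.

```python
import io
import pandas as pd

# Invented data: the same table with and without a header line.
csv_with_header = "Name,Grade,Age\nJack,40.2,20\nAmanda,30,25\n"
csv_without_header = "Jack,40.2,20\nAmanda,30,25\n"

# header=None: the first line is treated as data; columns are numbered 0, 1, 2.
no_header = pd.read_csv(io.StringIO(csv_without_header), header=None)

# usecols: keep only the listed columns.
two_cols = pd.read_csv(io.StringIO(csv_with_header), usecols=["Name", "Grade"])

print(no_header)
print(two_cols)
```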
Data Frames
Create a DataFrame from Python data
Use the pd.DataFrame() function:
If you need keys/names for each row, then:
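A minimal sketch of both cases, assuming made-up student data (the column names 'Grade' and 'Age' and the row names are illustrative):

```python
import pandas as pd

# Hypothetical rows: [Grade, Age] for each student.
rows = [[40.2, 20], [30.0, 25], [60.2, 19]]

# Without row names, the rows get the integer labels 0, 1, 2.
df = pd.DataFrame(rows, columns=["Grade", "Age"])

# The index argument gives each row a key/name.
named = pd.DataFrame(rows, columns=["Grade", "Age"],
                     index=["Jack", "Amanda", "Mary"])

print(df)
print(named)
```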
Data Frames
Create a DataFrame from Python data
It is also possible to create the columns of data in a dictionary and pass that to the pd.DataFrame() function:
Note that the column names are taken from the keys of the dictionary.
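A short sketch of the dictionary form, again with invented values. Each key becomes a column name and each list becomes that column's data:

```python
import pandas as pd

# Hypothetical columns; every list must have the same length.
columns = {
    "Grade": [40.2, 30.0, 60.2],
    "Age": [20, 25, 19],
}

# Optional row names, passed through the index argument.
df = pd.DataFrame(columns, index=["Jack", "Amanda", "Mary"])

print(df)
print(list(df.columns))   # the dictionary keys, in insertion order
```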
Accessing Data
Column-wise access
• Use column names & row names like keys in a dictionary
• df['Name'] returns the 'Name' column
  ◦ You can then use an integer index or a named index (key) to pick a row within that column
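A minimal sketch of column-wise access on made-up data. One assumption to note: the positional lookup is written with .iloc, because plain integer indexing on a Series that has named rows is ambiguous and its behavior differs between Pandas versions.

```python
import pandas as pd

# Hypothetical data with named rows.
df = pd.DataFrame({"Grade": [40.2, 30.0, 60.2], "Age": [20, 25, 19]},
                  index=["Jack", "Amanda", "Mary"])

grades = df["Grade"]            # one column, returned as a Series

by_name = grades["Amanda"]      # named index (key)
by_position = grades.iloc[1]    # integer position (second row)

print(grades)
print(by_name, by_position)
```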
Accessing Data
Row-wise access
• df.iloc[<row index>]
  ◦ for integer indexes
• df.loc[<row name>]
  ◦ for named indexes
• Row & column indexing can be combined:
  ◦ df.loc['Amanda', 'Grade']
  ◦ df.iloc[1, 1]
• With integer indexes, Python’s slicing ([start:end:step]) can be used
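These access patterns can be sketched on a small made-up table ('Amanda' and 'Grade' follow the slide; the other names and the 'Age' column are assumptions):

```python
import pandas as pd

df = pd.DataFrame({"Grade": [40.2, 30.0, 60.2], "Age": [20, 25, 19]},
                  index=["Jack", "Amanda", "Mary"])

row_by_position = df.iloc[1]        # second row, by integer index
row_by_name = df.loc["Amanda"]      # same row, by name

cell_by_name = df.loc["Amanda", "Grade"]   # row name + column name
cell_by_position = df.iloc[1, 1]           # row 1, column 1 (Amanda's Age)

every_other_row = df.iloc[0:3:2]    # slicing with start:end:step

print(row_by_name)
print(cell_by_name, cell_by_position)
print(every_other_row)
```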
Modifying Data
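• Modifying data is straightforward
• Be careful with chained indexing (two indexing steps in a row)
• There is no guarantee whether df['Grade'] returns a copy of the 'Grade' column or direct access to it, so a chained assignment may change a temporary copy instead of the DataFrame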
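The pitfall can be illustrated with a sketch. It uses a different chained form than df['Grade'] (a boolean filter followed by a column selection), chosen because a filter reliably returns a new object; the data and the threshold are made up.

```python
import warnings
import pandas as pd

df = pd.DataFrame({"Grade": [40.2, 30.0, 60.2]},
                  index=["Jack", "Amanda", "Mary"])

with warnings.catch_warnings():
    # Pandas warns about this pattern; the warning is silenced here
    # only to keep the output short.
    warnings.simplefilter("ignore")
    # Chained indexing: the filter creates a temporary object, so the
    # assignment changes that temporary and df stays as it was.
    df[df["Grade"] < 50]["Grade"] = 50.0

print(df["Grade"].tolist())   # the original grades are still there
```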
Modifying Data
• Specify the row & column in one step
• Avoid chained indexing when modifying data
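A minimal sketch of single-step modification with .loc and .iloc, on invented data:

```python
import pandas as pd

df = pd.DataFrame({"Grade": [40.2, 30.0, 60.2]},
                  index=["Jack", "Amanda", "Mary"])

# Row and column are given together, so Pandas writes into df itself.
df.loc["Amanda", "Grade"] = 45.0     # by names
df.iloc[0, 0] = 50.0                 # by integer positions

# A condition can take the place of the row name.
df.loc[df["Grade"] < 50, "Grade"] = 50.0

print(df)
```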
Analyzing Data
• Pandas provides many facilities for analyzing the data in a DataFrame
• df.describe()
• df.value_counts()
• df.max() or df.min()
• df.sort_values(by=<column name>)
• df.nlargest(<n>, <column name>)
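The functions listed above can be tried on a small made-up table. In this sketch value_counts() is applied to a single column, which is one common way to use it.

```python
import pandas as pd

# Hypothetical data; two students share the same grade on purpose.
df = pd.DataFrame({"Grade": [40.2, 30.0, 60.2, 85.0, 60.2],
                   "Age": [20, 25, 19, 30, 28]},
                  index=["Jack", "Amanda", "Mary", "John", "Susan"])

summary = df.describe()               # count, mean, std, min, quartiles, max
counts = df["Grade"].value_counts()   # how often each grade occurs
highest = df.max()                    # column-wise maximum
by_grade = df.sort_values(by="Grade") # rows ordered by grade, ascending
top2 = df.nlargest(2, "Grade")        # the two rows with the largest grades

print(summary)
print(counts)
print(top2)
```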
Presenting Data
• plot() function
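A minimal sketch of plotting directly from a DataFrame with df.plot(), which draws through Matplotlib. The data is invented, and the Agg backend is selected only so that the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")   # no display needed; remove this line in Colab/Jupyter
import pandas as pd

df = pd.DataFrame({"Grade": [40.2, 30.0, 60.2], "Age": [20, 25, 19]},
                  index=["Jack", "Amanda", "Mary"])

# Default: a line plot with one line per numeric column.
ax = df.plot()
ax.set_title("Grades and ages")

# The kind argument selects other plot types; y selects a column.
bars = df.plot(kind="bar", y="Grade")
bars.set_ylabel("Grade")
```

Both calls return a Matplotlib axes object, so the plot can be adjusted further with Matplotlib functions.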
Outline
• Overview
• Installation
• Anatomy of a figure/plot
• Preparing your data
• Drawing single plots
• Drawing multiple plots
• Changing elements of a plot
Overview
• A drawing library for Python
• A free and open-source alternative to MATLAB
• Allows 2D & 3D plots
Installation
• On your Linux environment:
    $ pip install matplotlib
  or
    $ conda install matplotlib
• On Windows/Mac: install Anaconda first
• On Colab, it is already installed
• import matplotlib.pyplot as plt
Anatomy of a plot
• Canvas / drawing area
  ◦ scatter plot, line plot, ...
• Axes
  ◦ ticks, tick labels, axis labels
• Figure title
• Legend
Figure from: https://matplotlib.org/tutorials/introductory/usage.html
Preparing your data
• Matplotlib expects NumPy arrays
• Convert your data to NumPy
  ◦ If your data is a Python data type, use NumPy’s array() function to do the conversion
  ◦ If your data is a DataFrame, use df.values, e.g.:
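Both conversions can be sketched as follows, using made-up values:

```python
import numpy as np
import pandas as pd

# A plain Python list becomes a NumPy array with np.array().
grades_list = [40.2, 30.0, 60.2]
grades_array = np.array(grades_list)

# A DataFrame (or a single column) exposes its data through .values.
df = pd.DataFrame({"Grade": grades_list, "Age": [20, 25, 19]})
table = df.values              # 2-D array: one row per DataFrame row
column = df["Grade"].values    # 1-D array: one column

print(grades_array)
print(table.shape, column.shape)
```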
Drawing single plots
Drawing in an Object-Oriented Style
• Create a figure object and an axes object
• Use their member functions & variables
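A minimal sketch of the object-oriented style. The plotted function and the labels are arbitrary choices, and the Agg backend is selected only so that the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")   # no display needed; remove this line in Colab/Jupyter
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)

# Create the figure and axes objects, then call their member functions.
fig, ax = plt.subplots()
ax.plot(x, np.sin(x), label="sin(x)")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("Object-oriented style")
ax.legend()
```

In an interactive session, plt.show() displays the figure; fig.savefig() with a file name writes it to disk.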
Drawing single plots
Drawing in a Pyplot Style
• Use matplotlib.pyplot directly
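A minimal sketch of the Pyplot style, where the module-level functions act on the current figure and axes. The plotted function and labels are arbitrary choices for illustration.

```python
import matplotlib
matplotlib.use("Agg")   # no display needed; remove this line in Colab/Jupyter
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)

# No figure or axes variables: pyplot keeps track of the current ones.
plt.figure()
plt.plot(x, np.cos(x), label="cos(x)")
plt.xlabel("x")
plt.ylabel("y")
plt.title("Pyplot style")
plt.legend()
```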
Drawing multiple plots
• This example uses the object-oriented approach
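A minimal sketch of multiple plots in the object-oriented approach, assuming a 1x2 layout and two arbitrary functions:

```python
import matplotlib
matplotlib.use("Agg")   # no display needed; remove this line in Colab/Jupyter
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)

# subplots(rows, cols) returns the figure and an array of axes objects.
fig, axes = plt.subplots(1, 2, figsize=(8, 3))

axes[0].plot(x, np.sin(x))
axes[0].set_title("sin(x)")

axes[1].plot(x, np.cos(x))
axes[1].set_title("cos(x)")

fig.tight_layout()   # keep titles and labels from overlapping
```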
Drawing multiple plots
[Two code panels: multiple plots in Pyplot style; multiple plots in OOP style]
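A sketch of the Pyplot-style variant, assuming a 1x2 layout. Here plt.subplot(rows, cols, index) selects the subplot that the following pyplot calls draw into, instead of holding on to axes objects as the OOP style does.

```python
import matplotlib
matplotlib.use("Agg")   # no display needed; remove this line in Colab/Jupyter
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 100)

plt.figure(figsize=(8, 3))

plt.subplot(1, 2, 1)        # first cell of a 1x2 grid becomes current
plt.plot(x, np.sin(x))
plt.title("sin(x)")

plt.subplot(1, 2, 2)        # second cell becomes current
plt.plot(x, np.cos(x))
plt.title("cos(x)")

plt.tight_layout()
```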
Changing plot elements
• All elements of a plot are changeable
  ◦ ticks, tick labels, ...
  ◦ line/dot color, line/dot size, shape, ...
  ◦ legends, titles, ...
  ◦ font style, size, ...
  ◦ LaTeX support
• See
  ◦ help(plt.plot)
  ◦ https://matplotlib.org/2.1.1/contents.html
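A few of these elements can be changed as in the sketch below. The specific colors, sizes and labels are arbitrary choices; the $...$ labels use Matplotlib's built-in math rendering, which understands a subset of LaTeX.

```python
import matplotlib
matplotlib.use("Agg")   # no display needed; remove this line in Colab/Jupyter
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 2 * np.pi, 30)

fig, ax = plt.subplots()

# Line color, style and width; marker shape and size; legend label.
ax.plot(x, np.sin(x), color="red", linestyle="--", linewidth=2,
        marker="o", markersize=4, label=r"$\sin(x)$")

# Tick positions and tick labels.
ax.set_xticks([0, np.pi, 2 * np.pi])
ax.set_xticklabels(["0", r"$\pi$", r"$2\pi$"])

# Font size and style of the axis label and the title.
ax.set_xlabel("x", fontsize=12)
ax.set_title("Customized plot", fontsize=14, fontstyle="italic")

ax.legend(loc="upper right")
```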
Examples (from the book)
• Create a simple CSV file using your favorite spreadsheet editor (e.g. Microsoft Excel or Google Spreadsheets), with your exams and their grades as two separate columns. Save the file, upload it to the Colab notebook and do the following:
  ◦ Load the file using Pandas.
  ◦ Calculate the mean of your exam grades.
  ◦ Calculate the standard deviation of your grades.
• Using Matplotlib, generate the following plots with suitable names for the axes and the titles.
  ◦ Draw the following four functions in separate single plots: sin(x), cos(x), tan(x), cot(x).
  ◦ Draw these four functions in a single plot.
  ◦ Draw a multiple 2x2 plot where each subplot is one of the four functions.
Final Words:
Important Concepts
• Pandas, DataFrame, loading files with Pandas.
• Accessing and modifying content in DataFrames.
• Analyzing and presenting data in DataFrames.
• Matplotlib and different ways to make plots.
• Drawing single and multiple plots. Changing elements of a plot.
THAT’S ALL FOLKS!
STAY HEALTHY