Unit 5 Plotting: matplotlib in Python
Plotting: matplotlib
import numpy as np
np.random.seed(10)
Python has many plotting libraries. Here we discuss some of the simplest
ones, matplotlib and seaborn. Matplotlib is in a sense a very basic plotting
library, oriented toward vectors rather than datasets (in this sense comparable to base-R
plotting). But it is very widely used, and with some effort it lets you create
very nice-looking plots. It is also easier to tinker with the lower-level features in
matplotlib than in the higher-level, data-oriented libraries.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
2/5/24, 9:43 AM
4.1 Matplotlib
The matplotlib.pyplot module (referred to as plt in the code below) provides
basic plotting functions such as scatterplots and line plots. Both kinds of
function are called with x and y vectors. Here is a demonstration
of a simple scatterplot:
x = np.random.normal(size=50)
y = np.random.normal(size=50)
_ = plt.scatter(x, y)
_ = plt.show()
https://faculty.washington.edu/otoomet/machinelearning-py/plotting-matplotlib-and-seaborn.html#plotting-matplotlib 2/15
2/5/24, 9:43 AM Chapter 4 Plotting: matplotlib and seaborn | Machine learning in python
x = np.linspace(-5, 5, 100)
y = np.sin(x)
_ = plt.plot(x, y)
_ = plt.show()
Most of the functionality should be clear by now, but here are a few notes:
The first lines create a linear sequence of 100 numbers between -5 and 5
and compute the sine of these numbers.
Line plots are made using plt.plot, which also takes the arguments x and y.
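The two notes above can be tried out in a small sketch. The second curve, the label and linestyle arguments, and the call to plt.legend are additions for illustration and are not part of the original example:

```python
import numpy as np
import matplotlib.pyplot as plt

# 100 evenly spaced numbers from -5 to 5 (both endpoints included)
x = np.linspace(-5, 5, 100)
y_sin = np.sin(x)
y_cos = np.cos(x)   # second curve, added for illustration only

# plt.plot takes x and y, and returns a list of the line objects it drew
lines_sin = plt.plot(x, y_sin, label="sin(x)")
lines_cos = plt.plot(x, y_cos, label="cos(x)", linestyle="--")
_ = plt.legend()    # uses the labels given above
_ = plt.show()
```

Calling plt.plot twice before plt.show puts both curves on the same axes.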
x = np.random.normal(size=50)
y = np.random.normal(size=50)
_ = plt.scatter(x, y,
                color="red",        # dot color
                edgecolor="black",  # dot outline color
                alpha=0.5           # transparency
                )
_ = plt.xlabel("x") # axis labels
_ = plt.ylabel("y")
_ = plt.title("Random dots") # main label
_ = plt.xlim(-5, 5) # axis limits
_ = plt.ylim(-5, 5)
_ = plt.show()
Most of the features demonstrated above are obvious from the code and
comments. However, some explanations are still needed:
The argument color sets the dot color when given as a color name, like
“red” or “black”. There is also another way to specify colors, the argument c,
described below.
All the additional functions return an object that we store in a temporary
variable in order to avoid printing.
All the additional functions in plt are executed before the actual plot is
drawn on screen. In particular, although we specify the axis limits after
plt.scatter, they still apply to the scatterplot.
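The point about ordering can be checked directly. The sketch below assumes the standard pyplot interface, where plt.gca() returns the axes that the plt functions are currently modifying. It reads the axis limits before and after plt.xlim is called; the variable names are made up for this illustration:

```python
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(10)
x = np.random.normal(size=50)
y = np.random.normal(size=50)

_ = plt.scatter(x, y)
auto_xlim = plt.gca().get_xlim()    # limits matplotlib chose from the data
_ = plt.xlim(-5, 5)                 # set after the scatter call
_ = plt.ylim(-5, 5)
final_xlim = plt.gca().get_xlim()   # limits the finished figure will use
print(auto_xlim, final_xlim)
_ = plt.show()                      # the figure is drawn only here
```

Nothing is drawn until plt.show, so every setting made before that call ends up in the final figure.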
x = np.random.normal(size=50)
y = np.random.normal(size=50)
z = np.random.choice([1,2,3], size=50)
_ = plt.scatter(x, y,
c=z # color made of variable "z"
)
_ = plt.show()
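Now the dots have different colors, depending on the value of z. Note that
values passed to c for color mapping must be numeric; arbitrary string labels (such as category names) will not work, although c does also accept a sequence of valid color names.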
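If the grouping variable consists of string labels, one possible approach is to convert the labels to integer codes first. The sketch below uses pd.factorize for this; the labels "low", "mid" and "high" are hypothetical and invented for the illustration:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

np.random.seed(10)
x = np.random.normal(size=50)
y = np.random.normal(size=50)
# hypothetical string labels, invented for this illustration
labels = np.random.choice(["low", "mid", "high"], size=50)

# factorize maps every distinct label to an integer code 0, 1, 2, ...
# and also returns the distinct labels in order of first appearance
codes, categories = pd.factorize(labels)

_ = plt.scatter(x, y, c=codes)   # numeric codes are mapped to colors
_ = plt.show()
```

The array categories tells which label each code stands for, so categories[codes] recovers the original labels.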
4.1.3 Histograms
Histograms are a quick and easy way to get an overview of 1-D data
distributions. These can be plotted using plt.hist. As hist returns bin
data, one may want to assign the result to a temporary variable to avoid
spurious printing in IPython-based environments (such as notebooks):
x = np.random.normal(size=1000)
_ = plt.hist(x)
_ = plt.show()
Not surprisingly, the histogram of normal random variables looks like, well, a
normal curve.
We may tune the picture somewhat by using the argument bins to specify the
desired number of bins, and make the bins more distinct by specifying
edgecolor.
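A minimal sketch of such a tuned histogram follows. The particular values, 30 bins and white edges, are assumptions chosen for illustration:

```python
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(10)
x = np.random.normal(size=1000)

# plt.hist returns the bin counts, the bin edges, and the drawn bars
counts, edges, patches = plt.hist(x,
                                  bins=30,            # number of bins
                                  edgecolor="white")  # outline of each bar
_ = plt.show()
```

With 30 bins there are 31 bin edges, and the counts add up to the number of data points.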
The seaborn library is designed for plotting data, not vectors of numbers. It is built
on top of matplotlib and has only limited functionality outside of that library.
Hence, in order to achieve the desired results with seaborn, one has to rely on
some matplotlib functionality for adjusting the plots. Seaborn is typically
imported as sns:
import seaborn as sns
df = pd.DataFrame({"x": np.random.normal(size=50),
"y": np.random.normal(size=50),
"z": np.random.choice([1,2,3], size=50)})
_ = sns.scatterplot(x="x", y="y", hue="z", data=df)
_ = plt.show()
Here the arguments x, y and hue are not data vectors as in the case of matplotlib,
but names of data variables. These are looked up in the data frame specified
with the argument data.
For some reason, seaborn insists on a legend entry for the z value
“0”, even though no such value exists in the data:
df.z.unique()
## array([2, 3, 1])
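One possible workaround is to tell seaborn that z is a categorical variable. This assumes the extra legend entry arises because a numeric hue is treated as a continuous quantity, which may depend on the seaborn version. The column name z_cat below is hypothetical and used only for this sketch:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

np.random.seed(10)
df = pd.DataFrame({"x": np.random.normal(size=50),
                   "y": np.random.normal(size=50),
                   "z": np.random.choice([1, 2, 3], size=50)})

# hypothetical helper column: same values as "z", stored as categories
df["z_cat"] = df["z"].astype("category")

ax = sns.scatterplot(x="x", y="y", hue="z_cat", data=df)
# collect the legend texts so that we can inspect them
legend_labels = [t.get_text() for t in ax.get_legend().get_texts()]
print(legend_labels)
_ = plt.show()
```

With a categorical hue, the legend is built from the categories actually present in the column, so no entry for a non-existent value should appear.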
4.2.1.1 Scatterplot
We plot the northern sea ice extent (measured in km²) for September (the month
of the yearly minimum) and March (the yearly maximum) through the years. We put
both months on the same plot using different markers.
The plot shows two sets of dots: circles for March and crosses for September.
Note that seaborn automatically adds default labels for the marker types. We
also use matplotlib’s plt.ylim to set the limits for the y-axis.
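A plot of this kind can be sketched as follows. The data frame below is a synthetic placeholder generated from random numbers, not the real ice measurements; only the column names (time, extent, month, region) follow the text. The y-axis limits 0 and 17 are likewise assumptions:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

np.random.seed(10)
years = np.arange(1980, 2020)
n = len(years)
# SYNTHETIC placeholder values, not real ice extent measurements
ice = pd.DataFrame({
    "time": np.tile(years, 2),
    "month": np.repeat([3, 9], n),
    "region": "N",
    "extent": np.concatenate([np.random.normal(15, 0.5, size=n),
                              np.random.normal(6, 0.5, size=n)]),
})

# keep March and September of the northern region
subset = ice[ice.month.isin([3, 9]) & (ice.region == "N")]
ax = sns.scatterplot(x="time", y="extent",
                     style="month",   # different marker for each month
                     data=subset)
_ = plt.ylim(0, 17)                   # y-axis limits, set through matplotlib
ylim = ax.get_ylim()
_ = plt.show()
```

The style argument names the column that decides the marker type, in the same way as hue names the column that decides the color.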
Note that the code is exactly the same as in the scatterplot example, except that we
use sns.lineplot instead of sns.scatterplot. As a result, the plot is made
of lines, not dots, and the style option controls the line style, not the marker style.
Seaborn has a handy plot type, sns.regplot, that allows one to add a
regression line to the plot. Here we plot the September ice extent and add a trend
line (regression line) to the plot. We also change the default colors using the
scatter_kws and line_kws arguments:
_ = sns.regplot(x="time", y="extent",
                # the "edgecolor" value is cut off in the source; "black" is an assumption
                scatter_kws={"color": "blue", "alpha": 0.5, "edgecolor": "black"},
                line_kws={"color": "black"},
                data=ice[ice.month.isin([9]) & (ice.region == "N")])
_ = plt.show()
Unfortunately, regplot does not accept arguments like style for splitting data
into two groups.
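One way around this limitation is to call regplot once for each group and draw both on the same axes. The sketch below does this with synthetic placeholder data (random numbers, not real ice measurements); the colors and labels are arbitrary choices. Seaborn's lmplot, which accepts a hue argument, may be another option.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

np.random.seed(10)
years = np.arange(1980, 2020)
n = len(years)
# SYNTHETIC placeholder values, not real ice extent measurements
ice = pd.DataFrame({
    "time": np.tile(years, 2),
    "month": np.repeat([3, 9], n),
    "extent": np.concatenate([np.random.normal(15, 0.5, size=n),
                              np.random.normal(6, 0.5, size=n)]),
})

fig, ax = plt.subplots()
# one regplot call per group, all drawn on the same axes
for month, color in [(3, "blue"), (9, "red")]:
    part = ice[ice.month == month]
    _ = sns.regplot(x="time", y="extent", data=part,
                    color=color, label=f"month {month}", ax=ax)
n_lines = len(ax.lines)   # one regression line per group
_ = ax.legend()
_ = plt.show()
```

Each call fits its own regression line, so the two groups get separate trend lines.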
Note that distplot does not use the data-frame-centric approach. Unlike
regplot or lineplot, it takes its input in vector form.