
Unit 5 Plotting: matplotlib in python

The document provides an overview of two popular Python plotting libraries, Matplotlib and Seaborn. It discusses the basic functionalities of Matplotlib for creating scatterplots, line plots, and histograms, as well as how to enhance these plots with customization options. Seaborn is introduced as a higher-level data-oriented library built on top of Matplotlib, offering additional functionalities for visualizing data with ease.

Uploaded by

xech.170
Copyright © All Rights Reserved

2/5/24, 9:43 AM

Plotting: matplotlib

import numpy as np
np.random.seed(10)

Python has many plotting libraries. Here we discuss two of the simplest ones, matplotlib and seaborn. Matplotlib is, in a sense, a very basic plotting library, oriented toward vectors rather than datasets (in this sense comparable to base-R plotting). But it is very widely used, and with some effort it allows one to create very nice-looking plots. It is also easier to tinker with the lower-level features in matplotlib than in the more high-level, data-oriented libraries.

Seaborn is such a high-level, data-oriented plotting library (comparable to ggplot in R in this sense). It has ready-made functionality to pick variables from datasets and to modify the visual properties of lines and points depending on other values in the data.

We assume you have imported the following modules:

import numpy as np
import pandas as pd


4.1 Matplotlib

Matplotlib is designed to be similar to the plotting functionality in the popular matrix language MATLAB. This library is geared toward scientific plotting. In these notes we are mainly interested in the pyplot module, but matplotlib contains more functionality, e.g. handling of images. Typically we import pyplot as plt:

import matplotlib.pyplot as plt

This page is compiled using matplotlib 3.1.2.

4.1.1 Introductory examples

The module provides the basic plot types, such as scatterplots and line plots; both of these functions should be called with x and y vectors. Here is a demonstration of a simple scatterplot:

x = np.random.normal(size=50)
y = np.random.normal(size=50)
_ = plt.scatter(x, y)
_ = plt.show()

Chapter 4 Plotting: matplotlib and seaborn | Machine learning in python
https://faculty.washington.edu/otoomet/machinelearning-py/plotting-matplotlib-and-seaborn.html#plotting-matplotlib

This small example demonstrates several functions:

First, we create 50 random points using numpy.

plt.scatter creates a scatterplot (point plot). It takes arguments x and y for the horizontal and vertical placement of the dots.

plt.scatter returns an object, and we may want to assign it to a temporary variable to avoid printing it. We use the variable name _ (just an underscore) for a value we do not really want to store but only want to keep from being printed.

The scatterplot automatically computes a suitable axis range.

plt.show makes the plot visible. It may not be necessary, depending on the environment you use. For instance, when you run a notebook cell, the plot is made visible automatically at the end of the cell. However, if you want to make two plots inside one cell, you still need to call plt.show to tell matplotlib that you are done with the first plot and it is time to show it.

Finally, we also assign the result of plt.show to a temporary variable. plt.show actually returns None, which is not printed anyway, so this is not strictly needed but keeps the code uniform.
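When the code runs as a plain Python script rather than in a notebook, there may be no screen to show the plot on. A minimal sketch under that assumption: it selects matplotlib's non-interactive Agg backend, checks what plt.scatter actually returns (a PathCollection holding the dots), and writes the figure to a file with plt.savefig instead of calling plt.show. The file name scatter.png is only an illustrative choice.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.collections import PathCollection

np.random.seed(10)
x = np.random.normal(size=50)
y = np.random.normal(size=50)

points = plt.scatter(x, y)      # the returned object holds all 50 markers
print(type(points).__name__)    # PathCollection
plt.savefig("scatter.png")      # hypothetical output file; plt.show() would display it instead
plt.close()                     # discard the figure so the next plot starts clean
```

The marker positions can be read back with points.get_offsets(), which returns a 50 × 2 array of (x, y) pairs.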

Next, here is a simple example of a line plot:



x = np.linspace(-5, 5, 100)
y = np.sin(x)
_ = plt.plot(x, y)
_ = plt.show()

Most of the functionality should be clear by now, but here are a few notes:

The first lines create a linear sequence of 100 numbers between -5 and 5 and compute the sine of these numbers.

Line plots are drawn with plt.plot, which also takes arguments x and y.
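Successive plt.plot calls draw onto the same axes, so several lines can share one plot. A sketch, assuming we want the sine and cosine together: the label arguments are picked up by plt.legend, and linestyle distinguishes the two lines. The Agg backend line is only there so the snippet also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(-5, 5, 100)

plt.figure()
# each plt.plot call adds one more line to the current axes
plt.plot(x, np.sin(x), label="sin(x)")
plt.plot(x, np.cos(x), linestyle="--", label="cos(x)")
plt.legend()       # builds the legend from the label= strings
ax = plt.gca()     # the axes both lines were drawn on
```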

4.1.2 Tuning plots

Matplotlib offers a number of arguments and additional functions to improve the look of the plots. Below we demonstrate a few:


x = np.random.normal(size=50)
y = np.random.normal(size=50)
_ = plt.scatter(x, y,
                color="red",        # dot color
                edgecolor="black",  # color of the dot outline
                alpha=0.5           # transparency
                )
_ = plt.xlabel("x") # axis labels
_ = plt.ylabel("y")
_ = plt.title("Random dots") # main label
_ = plt.xlim(-5, 5) # axis limits
_ = plt.ylim(-5, 5)
_ = plt.show()

Most of the features demonstrated above are obvious from the code and comments. However, some explanations are still needed:

Argument color sets the dot color when specified as a color name, like “red” or “black”. There is also another way to specify colors, c, see below.


Alpha denotes transparency, with alpha=0 being completely transparent (invisible) and alpha=1 being completely opaque (the default).

All the additional functions return an object that we store in a temporary variable in order to avoid printing.

All these plt calls modify the same current figure, and the figure is only drawn on screen afterwards. In particular, even though we specify the axis limits after plt.scatter, they still apply to the scatterplot.
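The point about axis limits can be checked directly. The sketch below (headless Agg backend assumed) sets plt.xlim after plt.scatter and reads the limits back from the current axes with plt.gca(). It then repeats the tuned plot in matplotlib's object-oriented style, where the same settings are methods of an explicit Axes object (ax.set_xlabel, ax.set_xlim, and so on) instead of plt functions acting on the current figure.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np

x = np.random.normal(size=50)
y = np.random.normal(size=50)

# pyplot style: every plt.* call modifies the same current axes
plt.figure()
plt.scatter(x, y, color="red", edgecolor="black", alpha=0.5)
plt.xlim(-5, 5)               # set after the scatter, still applies to it
ax_pyplot = plt.gca()
print(ax_pyplot.get_xlim())   # the limits -5 and 5 set above

# object-oriented style: the same settings as methods of an explicit Axes
fig, ax = plt.subplots()
ax.scatter(x, y, color="red", edgecolor="black", alpha=0.5)
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("Random dots")
ax.set_xlim(-5, 5)
ax.set_ylim(-5, 5)
```

The object-oriented form is convenient once a figure has several panels, because each Axes is addressed explicitly instead of relying on whichever one is current.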

Sometimes we want to make the color of the dots dependent on another variable. In this case we can use argument c instead of color:

x = np.random.normal(size=50)
y = np.random.normal(size=50)
z = np.random.choice([1,2,3], size=50)
_ = plt.scatter(x, y,
                c=z  # color made of variable "z"
                )
_ = plt.show()


Now the dots have different colors, depending on the value of z. Note that the values passed to the c argument must be numbers (or valid color specifications such as color names); arbitrary category labels given as strings will not work.
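Because c needs numbers, a categorical variable stored as strings has to be converted to integer codes first. A minimal sketch, assuming a hypothetical string variable group with levels "a", "b" and "c": np.unique(..., return_inverse=True) returns the distinct levels together with each observation's code, the cmap argument picks the color map, and a colorbar labeled with the original levels serves as a key. The Agg backend line only makes the sketch runnable without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np

x = np.random.normal(size=50)
y = np.random.normal(size=50)
group = np.random.choice(["a", "b", "c"], size=50)   # hypothetical string labels

# levels: sorted distinct labels; codes: index of each label within levels
levels, codes = np.unique(group, return_inverse=True)

plt.figure()
points = plt.scatter(x, y, c=codes, cmap="viridis")
cbar = plt.colorbar(points, ticks=range(len(levels)))
cbar.set_ticklabels(list(levels))   # show "a", "b", "c" instead of 0, 1, 2
```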

4.1.3 Histograms

Histograms are a quick and easy way to get an overview of 1-D data distributions. They can be plotted using plt.hist. As plt.hist returns the bin data, one may want to assign the result to a temporary variable to avoid spurious printing in IPython-based environments (such as notebooks):

x = np.random.normal(size=1000)
_ = plt.hist(x)
_ = plt.show()

Not surprisingly, the histogram of normal random variables looks like, well, a
normal curve.
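To see what the returned bin data look like, we can unpack the return value of plt.hist instead of discarding it. It is a tuple of the bin counts, the bin edges and the drawn bar objects. The sketch below (Agg backend assumed) uses 30 bins, so there are 31 edges, and because every observation falls inside the plotted range the counts add up to the sample size.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np

x = np.random.normal(size=1000)

plt.figure()
counts, edges, patches = plt.hist(x, bins=30, edgecolor="w")
# counts:  number of observations in each bin
# edges:   bin boundaries, one more than the number of bins
# patches: the drawn rectangles, one per bin
print(len(counts), len(edges), int(counts.sum()))   # 30 31 1000
```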

We may tune the picture somewhat using the argument bins to specify the desired number of bins, and make the bins more distinct by specifying edgecolor:

_ = plt.hist(x, bins=30, edgecolor="w")  # "w" stands for white
_ = plt.show()
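If the histogram is to be compared with a theoretical density curve, both need to be on the same scale. A sketch assuming that goal: density=True rescales the bars so that their total area is 1, and the standard normal density, computed directly with numpy here rather than with a statistics library, is drawn on top with plt.plot (Agg backend assumed).

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np

x = np.random.normal(size=1000)

plt.figure()
# density=True: bar heights are scaled so the total bar area equals 1
counts, edges, _ = plt.hist(x, bins=30, density=True, edgecolor="w")

# standard normal density on a fine grid over the histogram range
grid = np.linspace(edges[0], edges[-1], 200)
pdf = np.exp(-grid**2 / 2) / np.sqrt(2 * np.pi)
plt.plot(grid, pdf, color="black")

area = np.sum(counts * np.diff(edges))   # total bar area, 1 up to rounding
```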

4.2 Seaborn: data oriented plotting

The seaborn library is designed for plotting data, not vectors of numbers. It is built on top of matplotlib and has only limited functionality outside of that library. Hence, in order to achieve the desired results with seaborn, one has to rely on some matplotlib functionality for adjusting the plots. Seaborn is typically imported as sns:

import seaborn as sns

Below we use seaborn 0.10.0.

Here is a usage example with a data frame of three random variables:


df = pd.DataFrame({"x": np.random.normal(size=50),
                   "y": np.random.normal(size=50),
                   "z": np.random.choice([1,2,3], size=50)})
_ = sns.scatterplot(x="x", y="y", hue="z", data=df)
_ = plt.show()

Note the similarities and differences compared to matplotlib:

The information is fed to seaborn using arguments x, y, and hue (and more) that determine the horizontal and vertical location of the dots and their color (“hue”).

Unlike in matplotlib, these arguments are not data vectors but variable names, which are looked up in the data frame specified with the argument data.

Seaborn automatically provides the axis labels and the legend.

If needed, the plot can be further adjusted with matplotlib functionality; here we just use plt.show() to display it.
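Seaborn plotting functions such as sns.scatterplot return the matplotlib Axes they draw on, so matplotlib adjustments can also be applied to that object directly. A minimal sketch rebuilding the random data frame from above, assuming the headless Agg backend so it runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from matplotlib.axes import Axes

df = pd.DataFrame({"x": np.random.normal(size=50),
                   "y": np.random.normal(size=50),
                   "z": np.random.choice([1, 2, 3], size=50)})

plt.figure()
ax = sns.scatterplot(x="x", y="y", hue="z", data=df)   # returns a matplotlib Axes
# ordinary matplotlib Axes methods adjust the seaborn plot
ax.set_title("Random dots")
ax.set_xlim(-5, 5)
```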


For some reason, seaborn insists that there should be a legend entry for the z value “0”, even though no such value exists in the data:

df.z.unique()

## array([2, 3, 1])
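One plausible explanation, which we have not checked for every seaborn version, is that z is numeric, so seaborn treats hue as a continuous variable and picks "nice" round legend values instead of the observed ones. A common workaround is to make the variable categorical, for instance by converting it to strings. The sketch below does this in a new column z_cat (a name chosen here for illustration), fixes the legend order with hue_order, and uses the Agg backend so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

df = pd.DataFrame({"x": np.random.normal(size=50),
                   "y": np.random.normal(size=50),
                   "z": np.random.choice([1, 2, 3], size=50)})

# string values are treated as categories: one legend entry per level
df["z_cat"] = df["z"].astype(str)

plt.figure()
ax = sns.scatterplot(x="x", y="y", hue="z_cat",
                     hue_order=["1", "2", "3"], data=df)
labels = [t.get_text() for t in ax.get_legend().get_texts()]
```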

4.2.1 Different plot types

The plotting functions of seaborn are largely comparable to those of matplotlib, but the names may differ. Seaborn also offers additional plot types, such as density plots and scatterplots with an added regression line.

4.2.1.1 Scatterplot

The example above already demonstrated a scatterplot. Here we make another scatterplot using sea ice extent data, this time demonstrating marker types (style). The dataset looks like this:

ice = pd.read_csv("../data/ice-extent.csv.bz2", sep="\t")
ice.head(3)

##    year  month data-type region  extent   area         time
## 0  1978     11   Goddard      N   11.65   9.04  1978.875000
## 1  1978     11   Goddard      S   15.90  11.69  1978.875000
## 2  1978     12   Goddard      N   13.67  10.90  1978.958333


We plot the northern sea ice extent (measured in millions of km²) for September (the month of the yearly minimum) and March (the yearly maximum) through the years. We put both months on the same plot using different markers:

_ = sns.scatterplot(x="time", y="extent", style="month",
                    data=ice[ice.month.isin([3,9]) & (ice.region == "N")])
_ = plt.ylim(0, 17)
_ = plt.show()

The plot shows two sets of dots: circles for March and crosses for September. Note that seaborn automatically adds a legend for the marker types. We also use matplotlib’s plt.ylim to set the limits of the y-axis.
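The row selection used in these examples combines two pandas boolean masks: isin tests membership in a list of months, and & requires both conditions to hold. The sketch below runs the same filter on a tiny made-up stand-in for the ice data frame (only the column names match the real file; all values are arbitrary). The parentheses around ice.region == "N" are necessary because & binds more tightly than == in Python.

```python
import pandas as pd

# tiny hypothetical stand-in for the ice data; values are arbitrary
ice = pd.DataFrame({"month":  [3, 3, 9, 9, 6],
                    "region": ["N", "S", "N", "S", "N"],
                    "extent": [1.0, 2.0, 3.0, 4.0, 5.0]})

# month is March or September AND region is north
mask = ice.month.isin([3, 9]) & (ice.region == "N")
subset = ice[mask]
print(subset.month.tolist())   # [3, 9]
```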

4.2.1.2 Line plot

Here we replicate the previous example using a line plot:


_ = sns.lineplot(x="time", y="extent", style="month",
                 data=ice[ice.month.isin([3,9]) & (ice.region == "N")])
_ = plt.ylim(0, 17)
_ = plt.show()

Note that the code is exactly the same as in the scatterplot example, except that we use sns.lineplot instead of sns.scatterplot. As a result, the plot is made of lines, not dots, and the style option controls the line style, not the marker style.

4.2.1.3 Regression line on scatterplot

Seaborn has a handy plot type, sns.regplot, that adds a regression line to a scatterplot. Here we plot the September ice extent and add a trend line (regression line) to the plot. We also change the default colors using the scatter_kws and line_kws arguments:


_ = sns.regplot(x="time", y="extent",
scatter_kws = {"color":"blue", "alpha":0.5, "edgecolor":"
line_kws={"color":"black"},
data=ice[ice.month.isin([9]) & (ice.region == "N")])
_ = plt.show()

Unfortunately, regplot does not accept arguments like style for splitting data
into two groups.
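One workaround, sketched here under the assumption that a separate fit per group is acceptable, is to call sns.regplot once per group on the same axes, passing ax= explicitly together with a different marker and label for each call. The data below are synthetic stand-ins for the March and September extents (arbitrary trends plus noise), not the real file; ci=None turns off the confidence band to keep the plot simple, and the Agg backend lets the sketch run without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; omit in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# synthetic two-group data; all numbers are arbitrary
rng = np.random.default_rng(0)
years = np.arange(1980, 2020)
n = len(years)
df = pd.DataFrame({
    "time": np.concatenate([years, years]),
    "month": np.repeat([3, 9], n),
    "extent": np.concatenate([15.5 - 0.04 * (years - 1980),
                              7.5 - 0.08 * (years - 1980)])
              + rng.normal(scale=0.3, size=2 * n),
})

fig, ax = plt.subplots()
# one regplot call per group, all drawn on the same axes
for month, marker in [(3, "o"), (9, "x")]:
    sns.regplot(x="time", y="extent", data=df[df.month == month],
                ax=ax, marker=marker, ci=None, label=f"month {month}")
ax.legend()
```

Seaborn's figure-level sns.lmplot also accepts a hue argument, which fits one regression line per group in a single call.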

4.2.1.4 Histograms and density plots

Seaborn can draw both kernel density plots and histograms using sns.distplot. By default, the function shows a histogram overlaid with a kernel density line, but either of them can be turned off. Both plots can be customized further with additional keywords.


_ = sns.distplot(ice[ice.month.isin([9]) & (ice.region == "N")].extent,
                 bins=10,
                 kde=False,  # no density
                 hist_kws={"edgecolor":"black"})
_ = plt.show()

_ = sns.distplot(ice[ice.month.isin([9]) & (ice.region == "N")].extent,
                 hist=False)  # no histogram
_ = plt.show()


Note that distplot does not use the data-frame-centric approach: unlike regplot or lineplot, it takes its input in vector form instead.
