
Unit 4_Working With Graphs _python


UNIT 4

WORKING WITH GRAPHS

VISHNU PRIYA P M | PYTHON | V BCA 1


DATA WRANGLING

Data wrangling in Python refers to the process of cleaning, transforming, and preparing raw or
messy data for analysis, visualization, or machine learning tasks. It involves a series of
operations that make the data more structured, complete, and suitable for the intended
analysis. Python provides various libraries and tools for performing these tasks efficiently.
Here are some common steps and techniques involved in data wrangling in Python:



1. Data Loading: Load the raw data into Python using libraries like Pandas (for structured
data), NumPy (for numerical data), or specialized libraries for other data formats like CSV,
Excel, JSON, or databases.

import pandas as pd

# Load data from a CSV file


df = pd.read_csv('data.csv')
2. Data Exploration: Get a preliminary understanding of the data by examining its structure,
summary statistics, and identifying missing values.

# Display the first few rows of the DataFrame


print(df.head())

# Get basic summary statistics


print(df.describe())

# Check for missing values


print(df.isnull().sum())
3. Data Cleaning:

Handle missing values by imputing them or dropping rows/columns with missing data.
Remove duplicates.
Correct data errors and inconsistencies.

# Drop rows with missing values


df = df.dropna()

# Remove duplicates
df = df.drop_duplicates()

# Correct data errors


df['column_name'] = df['column_name'].apply(correct_function)
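
The cleaning steps above can be sketched on a small made-up table. The column names, the normalize_city helper (standing in for correct_function), and the choice of median imputation are all assumptions for illustration, not part of the original slides.

```python
import pandas as pd

# Hypothetical raw data: one missing price and inconsistent city spellings
df = pd.DataFrame({
    "city": [" delhi", "Delhi ", "MUMBAI", "Mumbai"],
    "price": [100.0, None, 300.0, 500.0],
})

def normalize_city(value):
    """Hypothetical correction function: trim whitespace, use title case."""
    return value.strip().title()

# Correct inconsistencies in a text column
df["city"] = df["city"].apply(normalize_city)

# Impute the missing price with the column median instead of dropping the row
df["price"] = df["price"].fillna(df["price"].median())

print(df)
```

Imputing keeps all four rows, whereas dropna() would discard the row with the missing price.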



4. Data Transformation:

Convert data types.


Normalize or scale numerical data.
Encode categorical variables.
Create new features or variables.
# Convert data types
df['numeric_column'] = df['numeric_column'].astype(float)

# Normalize numerical data


df['numeric_column'] = (
    (df['numeric_column'] - df['numeric_column'].mean())
    / df['numeric_column'].std()
)

# Encode categorical variables


df = pd.get_dummies(df, columns=['categorical_column'])

# Create new features


df['new_feature'] = df['feature1'] * df['feature2']
5. Data Aggregation and Grouping:

Aggregate data by grouping based on certain attributes.


Calculate summary statistics for groups.

# Group by a categorical variable and calculate the mean


grouped_data = df.groupby('category_column')['numeric_column'].mean()
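
A group can also be summarized with several statistics at once. The sketch below uses made-up values in the same category_column and numeric_column; the data is an assumption for illustration.

```python
import pandas as pd

# Hypothetical data with two categories
df = pd.DataFrame({
    "category_column": ["a", "a", "b", "b", "b"],
    "numeric_column": [1.0, 3.0, 2.0, 4.0, 6.0],
})

# agg() accepts a list of statistic names and returns one column per statistic
summary = df.groupby("category_column")["numeric_column"].agg(
    ["mean", "min", "max", "count"]
)
print(summary)
```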

6. Data Visualization:
Use libraries like Matplotlib or Seaborn to visualize the data, detect patterns, and gain insights.

import matplotlib.pyplot as plt

# Create a histogram
plt.hist(df['numeric_column'])
plt.xlabel('Numeric Column')
plt.ylabel('Frequency')
plt.show()
7. Data Export:

Save the cleaned and transformed data to a new file if necessary.

# Export cleaned data to a CSV file


df.to_csv('cleaned_data.csv', index=False)



COMBINING AND MERGING DATA SETS IN PYTHON
Combining and merging data sets in Python is a common operation in data analysis and
manipulation. You can achieve this using various libraries, with the most popular one being
pandas. Here, I'll provide an overview of how to combine and merge data sets using pandas.

Combining Data Sets

1. Concatenation:
Concatenation is used to combine data frames either row-wise or column-wise.

a. Row-wise concatenation:
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2'], 'B': ['B0', 'B1', 'B2']})
df2 = pd.DataFrame({'A': ['A3', 'A4', 'A5'], 'B': ['B3', 'B4', 'B5']})

result = pd.concat([df1, df2], axis=0)  # Concatenate along rows (axis=0)


b. Column-wise concatenation:

result = pd.concat([df1, df2], axis=1) # Concatenate along columns (axis=1)
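
One detail worth seeing in a small sketch: row-wise concatenation keeps each frame's original index, so labels repeat unless ignore_index=True is passed. The two-row frames here are shortened, made-up versions of df1 and df2.

```python
import pandas as pd

df1 = pd.DataFrame({"A": ["A0", "A1"], "B": ["B0", "B1"]})
df2 = pd.DataFrame({"A": ["A2", "A3"], "B": ["B2", "B3"]})

# Default: the original row labels are kept, so 0 and 1 appear twice
stacked = pd.concat([df1, df2], axis=0)

# ignore_index=True renumbers the rows 0..n-1
renumbered = pd.concat([df1, df2], axis=0, ignore_index=True)

print(stacked.index.tolist())
print(renumbered.index.tolist())
```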

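2. Appending:

Appending adds the rows of one DataFrame to the end of another. The DataFrame.append() method
was deprecated in pandas 1.4 and removed in pandas 2.0, so use pd.concat() instead.

result = pd.concat([df1, df2])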



Merging Data Sets
Merging is used to combine data frames based on common columns or indices.

1. Inner Join:
result = pd.merge(df1, df2, on='key_column', how='inner')

2. Left Join:
result = pd.merge(df1, df2, on='key_column', how='left')

3. Right Join:
result = pd.merge(df1, df2, on='key_column', how='right')

4. Outer Join:
result = pd.merge(df1, df2, on='key_column', how='outer')

5. Merging on Multiple Columns:


You can merge on multiple columns by passing a list of column names to the on parameter.
result = pd.merge(df1, df2, on=['key_column1', 'key_column2'], how='inner')
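
To see how the four join types differ, here is a minimal sketch with two made-up frames that share a key_column. The frame names, the extra columns, and the values are assumptions for illustration.

```python
import pandas as pd

# Keys 2 and 3 appear in both frames; key 1 only on the left, key 4 only on the right
left = pd.DataFrame({"key_column": [1, 2, 3], "name": ["Ann", "Ben", "Cy"]})
right = pd.DataFrame({"key_column": [2, 3, 4], "score": [80, 90, 70]})

# Count the rows each join type produces
row_counts = {
    how: len(pd.merge(left, right, on="key_column", how=how))
    for how in ["inner", "left", "right", "outer"]
}
print(row_counts)

# An outer join keeps unmatched keys and fills the missing side with NaN
print(pd.merge(left, right, on="key_column", how="outer"))
```

The inner join keeps only the keys present in both frames, the left and right joins keep every key of one side, and the outer join keeps every key from either side.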



DATA TRANSFORMATION

Data transformation is the process of converting raw data into a format that is more suitable for
analysis, modeling, or machine learning. It is an essential step in any data science project, and
Python is a popular programming language for data transformation.

There are many different types of data transformation, but some common examples include:

Cleaning and preprocessing: This involves removing errors and inconsistencies from the data,
as well as converting the data to a consistent format.
Feature engineering: This involves creating new features from the existing data, or
transforming existing features in a way that is more informative for the task at hand.
Encoding categorical data: Categorical data, such as text or labels, needs to be converted to
numerical data before it can be used by many machine learning algorithms.
Scaling and normalization: This involves transforming the data so that all features are on a
similar scale, which can improve the performance of machine learning algorithms.
There are a number of different Python libraries that can be used for data transformation, but
the most popular one is Pandas. Pandas is a powerful library for data manipulation and
analysis, and it provides a wide range of functions for data transformation.
import pandas as pd

# Load the data


df = pd.read_csv('data.csv')

# Clean the data


df = df.dropna() # Drop rows with missing values
df['age'] = df['age'].astype('int') # Convert the 'age' column to integers

# Create a new feature


df['age_group'] = df['age'].apply(lambda x: 'young' if x < 18 else 'adult')

# Encode categorical data


df['gender'] = df['gender'].map({'male': 1, 'female': 0})

# Scale the data


from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
df[['height', 'weight']] = scaler.fit_transform(df[['height', 'weight']])
# Save the transformed data
df.to_csv('transformed_data.csv', index=False)
1. Data Cleaning:
Data cleaning involves handling missing values, removing duplicates, and correcting errors in
your dataset.

Handling Missing Values:


Pandas provides functions like dropna(), fillna(), and interpolate() to handle missing values.

import pandas as pd

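# Remove rows with missing values (returns a new DataFrame; df itself is unchanged)
df_dropped = df.dropna()

# Fill missing values with a specific value
df_filled = df.fillna(0)

# Interpolate missing values in numeric columns
df_interpolated = df.interpolate()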
Removing Duplicates:

Use drop_duplicates() to remove duplicate rows from your DataFrame.
df = df.drop_duplicates()
2. Data Filtering:
Filtering allows you to select a subset of data based on certain conditions.

# Filter rows where a condition is met


filtered_df = df[df['column_name'] > 10]
3. Data Aggregation:
Aggregation involves summarizing data by grouping it based on certain criteria.

# Group by a column and calculate aggregate statistics


grouped_df = df.groupby('category_column')['numeric_column'].mean()
4. Data Transformation:
Data transformation includes operations like converting data types, scaling values, or applying
mathematical functions.

# Convert data types


df['numeric_column'] = df['numeric_column'].astype(float)

# Scaling values (e.g., Min-Max scaling)


from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()

df['scaled_column'] = scaler.fit_transform(df[['numeric_column']])
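
For intuition, the two scalings mentioned in this unit can be written out by hand with pandas alone. This is a sketch on made-up numbers, not a replacement for the scikit-learn scalers; note that StandardScaler divides by the population standard deviation (ddof=0), while Series.std() uses ddof=1 by default.

```python
import pandas as pd

s = pd.Series([10.0, 20.0, 30.0, 50.0])  # hypothetical numeric column

# Min-max scaling: maps the smallest value to 0 and the largest to 1
min_max = (s - s.min()) / (s.max() - s.min())

# Standardization (z-scores): mean 0, unit variance, using the population std
standardized = (s - s.mean()) / s.std(ddof=0)

print(min_max.tolist())
print(standardized.round(3).tolist())
```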
5. One-Hot Encoding:
Convert categorical variables into a numerical format using one-hot encoding.

encoded_df = pd.get_dummies(df, columns=['categorical_column'])

6. Reshaping Data:
Reshaping data includes tasks like pivoting, melting, or stacking/unstacking for better analysis.

# Pivot a DataFrame
pivoted_df = df.pivot(index='row_column', columns='column_column', values='value_column')

# Melt a DataFrame
melted_df = pd.melt(df, id_vars=['id_column'], value_vars=['var1', 'var2'], var_name='variable',
value_name='value')
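
A small round trip may make pivoting and melting more concrete. The student, subject, and marks columns below are made up for illustration.

```python
import pandas as pd

# Long format: one row per (student, subject) pair
long_df = pd.DataFrame({
    "student": ["Ann", "Ann", "Ben", "Ben"],
    "subject": ["math", "art", "math", "art"],
    "marks": [90, 75, 60, 85],
})

# Pivot to wide format: one row per student, one column per subject
wide_df = long_df.pivot(index="student", columns="subject", values="marks")
print(wide_df)

# Melt back to long format
back_to_long = wide_df.reset_index().melt(
    id_vars=["student"], value_vars=["art", "math"],
    var_name="subject", value_name="marks",
)
print(back_to_long)
```

pivot() requires each index/column pair to be unique; if the data contains duplicates, pivot_table() with an aggregation function is the usual alternative.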



7. Text Data Processing:
For text data, you can perform transformations such as tokenization, stemming, and stop-word
removal using libraries like NLTK or spaCy.

import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

nltk.download('stopwords')
nltk.download('punkt')

# Tokenization and stop-word removal


stop_words = set(stopwords.words('english'))
df['text_column'] = df['text_column'].apply(
    lambda x: ' '.join(
        word for word in word_tokenize(x) if word.lower() not in stop_words
    )
)


DETECTING AND FILTERING OUTLIERS IN PYTHON

What are Outliers in Python?


Before diving deep into the concept of outliers, let us understand the origin of raw data.

Raw data that is fed to a system is usually generated from surveys and extraction of data from
real-time actions on the web. This may give rise to variations in the data and there exists a
chance of measurement error while recording the data.

An outlier is a point or set of data points that lie away from the rest of the data values of the
dataset. That is, it is a data point(s) that appear away from the overall distribution of data
values in a dataset.

The methods below apply to continuous (numeric) values, so the detection and removal of
outliers described here concern numeric variables such as regression features and targets.

Outliers diverge from the overall, well-structured distribution of the data elements. They can
be regarded as abnormal values that lie away from the rest of the distribution.

Why is it necessary to remove outliers from the data?
As discussed above, outliers are data points that lie away from the usual distribution of the
data. They have the following effects on the overall data distribution:

They inflate the standard deviation of the data.
They pull the mean of the data toward themselves.
They skew the data.
They bias the accuracy estimates of a machine learning model.
They distort the distribution and summary statistics of the dataset.

Detection of Outliers – IQR approach


The outliers in the dataset can be detected by the below methods:

•Z-score
•Scatter Plots
•Interquartile range(IQR)



1. Visual Inspection:
Start by visualizing your data using histograms, box plots, scatter plots, or other visualization
techniques. Outliers often appear as points far from the main cluster or as values outside the
whiskers of box plots. Visualization can help you identify potential outliers.

import matplotlib.pyplot as plt


import seaborn as sns

# Box plot to visualize outliers


sns.boxplot(x=df['column_name'])
plt.show()

2. Z-Score:
The Z-score measures how far a data point is from the mean in terms of standard deviations. You
can use the Z-score to detect outliers. Typically, data points with a Z-score greater than a
threshold (e.g., 2 or 3) are considered outliers.

from scipy import stats

z_scores = stats.zscore(df['column_name'])
outliers = df[abs(z_scores) > 2]
filtered_data = df[abs(z_scores) <= 2]
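
The Z-score rule can also be written with pandas alone, which shows what is being computed. The values are made up, with one obviously extreme reading; ddof=0 is used here to match the default of scipy.stats.zscore.

```python
import pandas as pd

# Hypothetical measurements; 95 is far from the rest
values = pd.Series([10, 12, 11, 13, 12, 11, 10, 12, 13, 95])

# Z-score: distance from the mean in units of standard deviation
z = (values - values.mean()) / values.std(ddof=0)

outliers = values[z.abs() > 2]
filtered = values[z.abs() <= 2]

print(z.round(2).tolist())
print(outliers.tolist())
```

In this sample the extreme value has a Z-score just under 3, so a threshold of 3 would not flag it. A single extreme point inflates the standard deviation it is measured against, which is one reason the threshold is often lowered for small datasets.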
3. IQR (Interquartile Range) Method:
The IQR method involves calculating the IQR (the difference between the 75th percentile and
the 25th percentile) and identifying outliers as values outside a specified range.

Q1 = df['column_name'].quantile(0.25)
Q3 = df['column_name'].quantile(0.75)
IQR = Q3 - Q1

lower_bound = Q1 - 1.5 * IQR


upper_bound = Q3 + 1.5 * IQR

outliers = df[(df['column_name'] < lower_bound) | (df['column_name'] > upper_bound)]


filtered_data = df[(df['column_name'] >= lower_bound) & (df['column_name'] <=
upper_bound)]
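
The same bounds can be wrapped in a small helper and checked on concrete numbers. The function name iqr_bounds, its parameter k, and the sample values are assumptions for illustration.

```python
import pandas as pd

def iqr_bounds(series, k=1.5):
    """Return (lower, upper) bounds as Q1 - k*IQR and Q3 + k*IQR."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Hypothetical measurements; 95 is far from the rest
values = pd.Series([10, 12, 11, 13, 12, 11, 10, 12, 13, 95])

lower, upper = iqr_bounds(values)
outliers = values[(values < lower) | (values > upper)]
filtered = values[values.between(lower, upper)]  # inclusive on both ends

print(lower, upper)
print(outliers.tolist())
```

Because quartiles are barely affected by a single extreme value, this rule tends to be more robust than the Z-score on small samples.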



4. Tukey's Fences:
Tukey's fences are built the same way as the IQR bounds: Q1 - k * IQR and Q3 + k * IQR. With
k = 1.5 they are the IQR rule above; a wider multiplier such as k = 3 flags only extreme
("far out") values.

Q1 = df['column_name'].quantile(0.25)
Q3 = df['column_name'].quantile(0.75)

lower_fence = Q1 - 3 * (Q3 - Q1)


upper_fence = Q3 + 3 * (Q3 - Q1)

outliers = df[(df['column_name'] < lower_fence) | (df['column_name'] > upper_fence)]


filtered_data = df[(df['column_name'] >= lower_fence) & (df['column_name'] <= upper_fence)]



5. Machine Learning-Based Methods:
You can also use machine learning models, such as Isolation Forest or One-Class SVM, to detect
outliers in your data.

from sklearn.ensemble import IsolationForest

clf = IsolationForest(contamination=0.05)  # Adjust contamination based on your dataset

# fit_predict returns -1 for outliers and 1 for inliers
labels = clf.fit_predict(df[['column_name']])
outliers = df[labels == -1]
filtered_data = df[labels == 1]



STRING MANIPULATION

String manipulation in Python involves performing various operations on strings, such as
concatenation, slicing, searching, replacing, formatting, and more. Python provides a rich set of
string manipulation methods and functions that make it easy to work with text data. Here are
some common string manipulation techniques in Python:

1. String Concatenation:
You can concatenate strings using the + operator or by using the str.join() method.

str1 = "Hello"
str2 = "World"
result = str1 + ", " + str2 # Using the + operator
words = ["Hello", "World"]
result = ", ".join(words) # Using join
2. String Slicing:
String slicing allows you to extract substrings from a string based on their positions.

text = "Python Programming"


substring = text[7:18] # Extract "Programming"

3. String Searching:
You can search for substrings within a string using methods like str.find(), str.index(), or regular
expressions with the re module.

text = "Python is a powerful programming language"


position = text.find("powerful") # Find the position of "powerful"
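
The three search approaches named above behave differently when the substring is missing, which a short sketch makes visible. The variable names and the regular expression are chosen for illustration.

```python
import re

text = "Python is a powerful programming language"

found = text.find("powerful")  # index of the first match
missing = text.find("Java")    # find() returns -1 when there is no match

# index() raises ValueError instead of returning -1
try:
    text.index("Java")
    index_raised = False
except ValueError:
    index_raised = True

# Regular expression: first word starting with a lowercase "p"
match = re.search(r"\bp\w+", text)

print(found, missing, index_raised, match.group())
```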

4. String Replacement:
Replace specific substrings within a string using the str.replace() method.

text = "Python is a great programming language"


new_text = text.replace("great", "powerful") # Replace "great" with "powerful"
5. String Formatting:
You can format strings using f-strings (Python 3.6+), the .format() method, or the % operator.

name = "Alice"
age = 30
formatted_str = f"My name is {name} and I am {age} years old."
name = "Bob"
age = 25
formatted_str = "My name is {} and I am {} years old.".format(name, age)
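
The % operator mentioned above is not shown in the examples, so here is a brief sketch of it alongside two common format specifications. The values are made up.

```python
name = "Carol"
age = 41
height = 1.6789

# %-style formatting: %s for strings, %d for integers
percent_style = "My name is %s and I am %d years old." % (name, age)

# Format specifications work in f-strings and in .format()
rounded = f"{height:.2f} m"       # two decimal places
padded = "{:>8}".format(name)     # right-align in a field of width 8

print(percent_style)
print(rounded)
print(repr(padded))
```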

6. String Splitting:
Split a string into a list of substrings using the str.split() method.

text = "Python,Java,C++,JavaScript"
languages = text.split(",") # Split by comma



7. String Stripping:
Remove leading and trailing whitespace characters using str.strip() or str.lstrip() and str.rstrip().

text = " Python is awesome! "


cleaned_text = text.strip() # Remove leading and trailing spaces

8. String Case Conversion:


Convert the case of a string using methods like str.lower(), str.upper(), or str.capitalize().

text = "Hello World"


lower_case = text.lower()
upper_case = text.upper()
capitalized = text.capitalize()



Syntax                       Function
string.upper()               Transforms all the characters of the string to uppercase.
string.lower()               Transforms all the characters of the string to lowercase.
string.title()               Transforms the first letter of each word to uppercase and the rest of the characters to lowercase.
string.swapcase()            Transforms uppercase characters to lowercase and vice versa.
string.capitalize()          Transforms the first character of the string to uppercase and the rest to lowercase.
string.isupper()             Returns True if all the alphabetic characters of the string are uppercase.
string.islower()             Returns True if all the alphabetic characters of the string are lowercase.
string.endswith(value)       Returns True if the string ends with the specified value.
string.startswith(value)     Returns True if the string starts with the specified value.
string.index('character')    Returns the position of the first occurrence of the character; raises ValueError if it is not found.


VECTORIZED STRING FUNCTIONS
Vectorized string functions in pandas allow you to efficiently perform operations on string data
within a pandas DataFrame or Series. These functions are accessed through the .str attribute of
a pandas Series and enable you to apply string operations element-wise. Here are some
commonly used vectorized string functions in pandas:
import pandas as pd

series = pd.Series(['Alice', 'Bob', 'Carol'])
series_lowercase = series.str.lower()
print(series_lowercase)

Output:
0    alice
1      bob
2    carol
dtype: object



Some of the most commonly used vectorized string functions in Pandas include:

str.lower(): Convert all strings to lowercase.


str.upper(): Convert all strings to uppercase.
str.strip(): Remove whitespace from the beginning and end of all strings.
str.split(): Split all strings into a list of strings, using a specified separator.
str.replace(): Replace all occurrences of a specified substring with another substring in all
strings.
str.contains(): Return a boolean Series indicating whether each string contains a specified
substring.
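
Several of these functions can be chained, and missing values pass through as missing rather than raising an error. The sample names below are made up for illustration.

```python
import pandas as pd

# Hypothetical messy names, including a missing entry
names = pd.Series(["  Alice Smith ", "bob JONES", None])

# Chain .str methods: trim whitespace, then normalize the case
cleaned = names.str.strip().str.title()

# na=False treats missing values as "does not contain"
has_smith = cleaned.str.contains("Smith", na=False)

# Split on spaces, then take the first piece of each list
first_names = cleaned.str.split(" ").str[0]

print(cleaned.tolist())
print(has_smith.tolist())
print(first_names.tolist())
```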
Vectorized string functions can also be used to perform more complex string operations, such as
regular expression matching and extraction. For example, the following code uses the
str.extract() function to extract the first name from each email address in a Series:

import pandas as pd

series = pd.Series(['alice@example.com', 'bob@example.com', 'carol@example.com'])
first_names = series.str.extract(r'(?P<first_name>\w+)@example.com')
print(first_names)

Output:
  first_name
0      alice
1        bob
2      carol

PLOTTING AND VISUALIZATION

Plotting and visualization are crucial for understanding and communicating data. In Python, there
are several libraries for creating plots and visualizations, with Matplotlib, Seaborn, and Plotly
being some of the most popular ones. Here's an overview of how to create plots and
visualizations in Python:
Matplotlib
Matplotlib is an easy-to-use, low-level data visualization library that is built on NumPy arrays. It
consists of various plots like scatter plot, line plot, histogram, etc. Matplotlib provides a lot of
flexibility.
Installation:

pip install matplotlib



Basic Line Plot:
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

plt.plot(x, y)
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.title('Line Plot')
plt.show()



Seaborn:
Seaborn is built on top of Matplotlib and provides a high-level interface for creating informative
and attractive statistical graphics.

Installation:

pip install seaborn

Basic Scatter Plot:


import matplotlib.pyplot as plt
import seaborn as sns

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

sns.scatterplot(x=x, y=y)
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.title('Scatter Plot')
plt.show()
Plotly:
Plotly is a powerful library for creating interactive and web-based visualizations. It is often used
for creating dashboards and web applications.

Installation:

pip install plotly

Basic Bar Chart:

import plotly.express as px

data = {'Category': ['A', 'B', 'C', 'D'],
        'Values': [10, 20, 15, 30]}

fig = px.bar(data, x='Category', y='Values', title='Bar Chart')


fig.show()



Other Libraries:
Pandas: Pandas also provides basic plotting functionality through the .plot() method for
DataFrames, making it convenient for quick exploratory data analysis.

Bokeh: Bokeh is another library for interactive web-based visualizations and is well-suited for
creating interactive dashboards.

Altair: Altair is a declarative statistical visualization library for Python, making it easy to create
complex visualizations with concise code.

ggplot (ggpy): ggplot is a Python implementation of the popular ggplot2 package from R, which
uses a grammar of graphics to create plots.



MATPLOTLIB API PRIMER

Matplotlib is a popular Python library for creating static, animated, and interactive
visualizations. When creating plots with Matplotlib, you can customize the appearance of your
lines, markers, and line styles. Here's a primer on how to do that:

Colors:
Matplotlib allows you to specify colors for lines, markers, and other plot elements in several
ways:

Named Colors: You can use named colors like 'red', 'blue', 'green', etc.
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

plt.plot(x, y, color='red', label='Red Line')

RGB Values: You can use RGB tuples, with each component between 0 and 1, to specify colors.
plt.plot(x, y, color=(0.1, 0.2, 0.3), label='Custom Color')

Hexadecimal Colors: You can also use hexadecimal color codes.
plt.plot(x, y, color='#FF5733', label='Hex Color')


Markers:
Markers are used to indicate specific data points on a plot. You can customize markers in
Matplotlib:

plt.plot(x, y, marker='o', markersize=8, markerfacecolor='yellow', markeredgecolor='black',
         label='Custom Marker')

marker: Specifies the marker style (e.g., 'o' for circles, 's' for squares, 'x' for crosses).
markersize: Sets the size of the marker.
markerfacecolor: Sets the marker's fill color.
markeredgecolor: Sets the marker's edge color.



Line Styles:
You can customize the line style of your plot:

plt.plot(x, y, linestyle='--', linewidth=2, label='Dashed Line')


linestyle: Specifies the line style (e.g., '-', '--', '-.', ':').
linewidth: Sets the width of the line.



Line Styles:
You can customize the appearance of lines in your plot using various line styles, markers, and
colors. Here's an example:

import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

# Customizing line style with color, marker, and linestyle


plt.plot(x, y, color='blue', marker='o', linestyle='--', markersize=8, label='Custom Line')

plt.xlabel('X-axis Label')
plt.ylabel('Y-axis Label')
plt.title('Custom Line Style')
plt.legend()
plt.show()

In the above example:
color: Sets the line color.
marker: Specifies the marker style (e.g., 'o' for circles, 's' for squares).
linestyle: Sets the line style ('--' for dashed, ':' for dotted, etc.).
markersize: Adjusts the size of markers.


Ticks and Labels:
You can customize tick locations and labels on the x and y axes using xticks() and yticks()
functions:
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

plt.plot(x, y, label='Line Plot')

# Customizing x-axis ticks and labels


plt.xticks([1, 2, 3, 4, 5], ['A', 'B', 'C', 'D', 'E'])

# Customizing y-axis ticks and labels


plt.yticks([2, 4, 6, 8, 10], ['Low', 'Medium', 'High', 'Very High', 'Max'])

plt.xlabel('X-axis Label')
plt.ylabel('Y-axis Label')
plt.title('Custom Ticks and Labels')
plt.legend()
plt.show()

In this example, xticks() and yticks() specify the locations and labels for the ticks on the
x and y axes.

Legends:
To add legends to your plot, you can use the legend() function. You should also label your plotted
lines or data points using the label parameter when creating the plot.

import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y1 = [2, 4, 6, 8, 10]
y2 = [1, 3, 5, 7, 9]

plt.plot(x, y1, label='Line 1')


plt.plot(x, y2, label='Line 2')

plt.xlabel('X-axis Label')
plt.ylabel('Y-axis Label')
plt.title('Legend Example')
plt.legend()
plt.show()

In this example, label is provided when creating each line, and legend() is called to display
the legend.

ANNOTATIONS AND DRAWING ON A SUBPLOT

Annotations and drawing on a subplot in Matplotlib allow you to add textual or graphical
elements to your plots to provide additional information or highlight specific points of interest.
Here's how you can add annotations and draw on a subplot:



Adding Text Annotations:
You can add text annotations to your plot using the
text() function. Here's an example:

import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

plt.plot(x, y, label='Line Plot')

# Adding a text annotation


plt.text(3, 7, 'Annotation Here', fontsize=12,
color='red')

plt.xlabel('X-axis Label')
plt.ylabel('Y-axis Label')
plt.title('Text Annotation Example')
plt.legend()
plt.show()

In this example, plt.text(x, y, text, fontsize, color) is used to add a text annotation at
coordinates (3, 7) with the specified text, fontsize, and color.

Adding Arrows with Annotations:
You can add arrows to point to specific locations on your plot using the annotate() function. Here's
an example:

import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

plt.plot(x, y, label='Line Plot')

# Adding an arrow with annotation


plt.annotate('Important Point', xy=(3, 6), xytext=(4, 8), fontsize=12,
arrowprops=dict(arrowstyle='->', color='blue'))

plt.xlabel('X-axis Label')
plt.ylabel('Y-axis Label')
plt.title('Arrow Annotation Example')
plt.legend()
plt.show()

In this example, plt.annotate(text, xy, xytext, fontsize, arrowprops) is used to add an arrow with
a text annotation. xy specifies the point being pointed to, and xytext specifies the location of
the annotation text.

Drawing Shapes:
You can draw various shapes, lines, and polygons on a subplot using Matplotlib's plotting
functions. For example, to draw a rectangle:

import matplotlib.pyplot as plt


import matplotlib.patches as patches

# Create a subplot
fig, ax = plt.subplots()

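# Add a rectangle with its lower-left corner at (1, 2), width 2, and height 4
rectangle = patches.Rectangle((1, 2), 2, 4, linewidth=2, edgecolor='red', facecolor='none')
ax.add_patch(rectangle)

# Patches added with add_patch() do not rescale the axes, so set the limits explicitly
ax.set_xlim(0, 5)
ax.set_ylim(0, 8)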

plt.xlabel('X-axis Label')
plt.ylabel('Y-axis Label')
plt.title('Drawing Shapes Example')
plt.show()
In this example, we create a subplot, build a rectangle using patches.Rectangle(), and then add
it to the plot with ax.add_patch().

SAVING PLOTS TO FILE

You can save plots created with Matplotlib to various file formats such as PNG, PDF, SVG, and
more using the savefig() function. Here's how to save a plot to a file:

import matplotlib.pyplot as plt

# Create and customize your plot


x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
plt.plot(x, y, label='Line Plot')
plt.xlabel('X-axis Label')
plt.ylabel('Y-axis Label')
plt.title('Saved Plot Example')
plt.legend()

# Save the plot to a file (e.g., PNG)
plt.savefig('saved_plot.png', dpi=300)
In this example, plt.savefig('saved_plot.png', dpi=300) saves the current plot to a PNG file
named "saved_plot.png" with a resolution of 300 dots per inch (dpi). You can specify the file
format by changing the file extension (e.g., ".pdf" for PDF, ".svg" for SVG).

Common parameters for savefig():

fname: The file name and path where the plot will be saved.
dpi: The resolution in dots per inch (by default the figure's own dpi, which is 100 unless changed).
format: The file format (e.g., 'png', 'pdf', 'svg').
bbox_inches: Specifies which part of the figure to save. The default saves the whole figure; use 'tight' to trim surplus whitespace around the plot.
transparent: If True, the plot will have a transparent background.
orientation: 'portrait' or 'landscape'; according to the Matplotlib documentation this is currently supported only by the PostScript backend.
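
Several of these parameters can be combined in one call. The sketch below saves to an in-memory buffer rather than a file so it leaves nothing on disk; the Agg backend line is an assumption that lets it run without a display, and passing a file name such as 'saved_plot.png' instead of the buffer works the same way.

```python
import io

import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3, 4, 5], [2, 4, 6, 8, 10], label="Line Plot")
ax.legend()

# Save as PNG at 150 dpi, trimming surplus whitespace, with a transparent background
buffer = io.BytesIO()
fig.savefig(buffer, format="png", dpi=150, bbox_inches="tight", transparent=True)
plt.close(fig)  # release the figure once it has been saved

png_bytes = buffer.getvalue()
print(len(png_bytes), "bytes written")
```

When saving and displaying the same figure, call savefig() before show(), since the figure may be cleared once the window is closed.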



PLOTTING FUNCTIONS IN PANDAS.
Pandas provides a number of plotting functions that can be used to create a variety of different
plots and visualizations. These functions are simple to use and can be used to create plots with
just a few lines of code.

The plotting functions are available directly on Series and DataFrame objects through the
plot() method. No separate import of pandas.plotting is needed, although Matplotlib must be
installed because pandas uses it as the default plotting backend. The plot() method takes a
number of keyword arguments, which can be used to control the appearance of the plot.

Here is a simple example of how to use the Pandas plot() function to create a line chart:
import pandas as pd
import matplotlib.pyplot as plt
# Create a Series
series = pd.Series([2, 4, 6, 8, 10])
# Create a line chart of the Series
series.plot()

# Show the plot


plt.show()
Line Plot:
To create a line plot of a Series or DataFrame, simply call the .plot() method on the data:

import pandas as pd
import matplotlib.pyplot as plt

# Create a DataFrame
data = {'x': [1, 2, 3, 4, 5], 'y': [2, 4, 6, 8, 10]}
df = pd.DataFrame(data)

# Create a line plot
df['y'].plot()
plt.xlabel('X-axis Label')
plt.ylabel('Y-axis Label')
plt.title('Line Plot')
plt.show()

Scatter Plot:
Scatter plots are created similarly, but you specify kind='scatter':

df.plot(kind='scatter', x='x', y='y')
plt.xlabel('X-axis Label')
plt.ylabel('Y-axis Label')
plt.title('Scatter Plot')
plt.show()

Bar Plot:
To create a bar plot, you can use kind='bar':

df.plot(kind='bar', x='x', y='y')
plt.xlabel('X-axis Label')
plt.ylabel('Y-axis Label')
plt.title('Bar Plot')
plt.show()

Histogram:
For histograms, use kind='hist':

df['y'].plot(kind='hist', bins=5)
plt.xlabel('Values')
plt.ylabel('Frequency')
plt.title('Histogram')
plt.show()

Customization:
You can further customize your plots by using Matplotlib functions after calling .plot(). Additionally,
you can create a grid of subplots with plt.subplots() and pass each axis to .plot() through the ax
argument to have more control over the layout of multiple plots.

fig, axes = plt.subplots(nrows=2, ncols=2)

df['y'].plot(ax=axes[0, 0], title='Line Plot')


df.plot(kind='scatter', x='x', y='y', ax=axes[0, 1], title='Scatter Plot')
df.plot(kind='bar', x='x', y='y', ax=axes[1, 0], title='Bar Plot')
df['y'].plot(kind='hist', bins=5, ax=axes[1, 1], title='Histogram')

plt.tight_layout()
plt.show()
