Unit 4_Working With Graphs _python

The document provides a comprehensive overview of data wrangling in Python, detailing key processes such as data loading, cleaning, transformation, aggregation, and visualization using libraries like Pandas and Matplotlib. It also covers techniques for combining and merging datasets, detecting and filtering outliers, and string manipulation methods. The content serves as a practical guide for preparing and analyzing data for various applications in data science.

Uploaded by

ashupersonal12345

UNIT 4

WORKING WITH GRAPHS

VISHNU PRIYA P M | PYTHON | V BCA 1

DATA WRANGLING

Data wrangling in Python is the process of cleaning, transforming, and preparing raw or messy
data for analysis, visualization, or machine learning tasks using the Python programming
language. It involves a series of operations that make the data more structured, complete, and
suitable for the intended analysis. Python provides various libraries and tools for performing
data wrangling tasks efficiently. Here are some common steps and techniques involved in data
wrangling in Python:

1. Data Loading: Load the raw data into Python using libraries such as Pandas (for structured,
tabular data in formats like CSV, Excel, JSON, or SQL databases) or NumPy (for numerical
arrays).

import pandas as pd

# Load data from a CSV file

df = pd.read_csv('data.csv')
2. Data Exploration: Get a preliminary understanding of the data by examining its structure,
summary statistics, and identifying missing values.

# Display the first few rows of the DataFrame

print(df.head())

# Get basic summary statistics

print(df.describe())

# Check for missing values

print(df.isnull().sum())
3. Data Cleaning:

Handle missing values by imputing them or dropping rows/columns with missing data.
Remove duplicates.
Correct data errors and inconsistencies.

# Drop rows with missing values

df = df.dropna()

# Remove duplicates
df = df.drop_duplicates()

# Correct data errors (correct_function is a placeholder for your own cleaning function)

df['column_name'] = df['column_name'].apply(correct_function)

4. Data Transformation:

Convert data types.

Normalize or scale numerical data.
Encode categorical variables.
Create new features or variables.
# Convert data types
df['numeric_column'] = df['numeric_column'].astype(float)

# Normalize numerical data (z-score: subtract the mean, divide by the standard deviation)

df['numeric_column'] = (df['numeric_column'] - df['numeric_column'].mean()) / df['numeric_column'].std()

# Encode categorical variables

df = pd.get_dummies(df, columns=['categorical_column'])

# Create new features

df['new_feature'] = df['feature1'] * df['feature2']
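
The transformation steps above use placeholder column names. As a minimal sketch on made-up data (the columns height_cm and city are assumptions chosen only for illustration), z-score normalization and one-hot encoding might look like this:

```python
import pandas as pd

# Toy data; the column names are illustrative only
df = pd.DataFrame({
    "height_cm": [150.0, 160.0, 170.0, 180.0],
    "city": ["Pune", "Delhi", "Pune", "Mumbai"],
})

# Z-score normalization: (value - mean) / standard deviation
height = df["height_cm"]
df["height_z"] = (height - height.mean()) / height.std()

# One-hot encode the categorical column into indicator columns
encoded = pd.get_dummies(df, columns=["city"])

print(encoded.columns.tolist())
print(df["height_z"].tolist())
```

After normalization the new column has a mean of about 0 and a standard deviation of about 1, and each city becomes its own indicator column.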
5. Data Aggregation and Grouping:

Aggregate data by grouping based on certain attributes.

Calculate summary statistics for groups.

# Group by a categorical variable and calculate the mean

grouped_data = df.groupby('category_column')['numeric_column'].mean()
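
To see grouping and aggregation on concrete values, here is a small sketch with invented data (the department and salary columns are assumptions). It uses agg() to compute several summary statistics per group in one call:

```python
import pandas as pd

df = pd.DataFrame({
    "department": ["sales", "sales", "hr", "hr", "hr"],
    "salary": [100, 200, 50, 70, 90],
})

# Several summary statistics per group in one call
summary = df.groupby("department")["salary"].agg(["mean", "min", "max", "count"])
print(summary)
```

The result is a DataFrame indexed by department, with one column per statistic.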

6. Data Visualization:
Use libraries like Matplotlib or Seaborn to visualize the data, detect patterns, and gain insights.

import matplotlib.pyplot as plt

# Create a histogram
plt.hist(df['numeric_column'])
plt.xlabel('Numeric Column')
plt.ylabel('Frequency')
plt.show()
7. Data Export:

Save the cleaned and transformed data to a new file if necessary.

# Export cleaned data to a CSV file

df.to_csv('cleaned_data.csv', index=False)


COMBINING AND MERGING DATA SETS IN PYTHON
Combining and merging data sets in Python is a common operation in data analysis and
manipulation. You can achieve this using various libraries, with the most popular one being
pandas. Here, I'll provide an overview of how to combine and merge data sets using pandas.

Combining Data Sets

1. Concatenation:
Concatenation is used to combine data frames either row-wise or column-wise.

a. Row-wise concatenation:
import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2'], 'B': ['B0', 'B1', 'B2']})
df2 = pd.DataFrame({'A': ['A3', 'A4', 'A5'], 'B': ['B3', 'B4', 'B5']})

result = pd.concat([df1, df2], axis=0) # Concatenate along rows (axis=0)

b. Column-wise concatenation:

result = pd.concat([df1, df2], axis=1) # Concatenate along columns (axis=1)
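
One detail worth checking when concatenating is what happens to the index. The sketch below, using two small illustrative frames, shows that row-wise concatenation keeps the original index labels unless ignore_index=True is passed, and that column-wise concatenation aligns rows on the index:

```python
import pandas as pd

df1 = pd.DataFrame({"A": ["A0", "A1"], "B": ["B0", "B1"]})
df2 = pd.DataFrame({"A": ["A2", "A3"], "B": ["B2", "B3"]})

# Row-wise: original index labels are kept, so 0 and 1 appear twice
stacked = pd.concat([df1, df2], axis=0)
print(stacked.index.tolist())      # [0, 1, 0, 1]

# ignore_index=True renumbers the rows 0..n-1
renumbered = pd.concat([df1, df2], axis=0, ignore_index=True)
print(renumbered.index.tolist())   # [0, 1, 2, 3]

# Column-wise: rows are aligned on the index, columns are placed side by side
side_by_side = pd.concat([df1, df2], axis=1)
print(side_by_side.shape)          # (2, 4)
```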

2. Appending:

Appending adds the rows of one DataFrame to another. DataFrame.append() was deprecated in
pandas 1.4 and removed in pandas 2.0, so use pd.concat() instead.

result = pd.concat([df1, df2]) # Replaces the removed df1.append(df2)

Merging Data Sets
Merging is used to combine data frames based on common columns or indices.

1. Inner Join:
result = pd.merge(df1, df2, on='key_column', how='inner')

2. Left Join:
result = pd.merge(df1, df2, on='key_column', how='left')

3. Right Join:
result = pd.merge(df1, df2, on='key_column', how='right')

4. Outer Join:
result = pd.merge(df1, df2, on='key_column', how='outer')

5. Merging on Multiple Columns:

You can merge on multiple columns by passing a list of column names to the on parameter.
result = pd.merge(df1, df2, on=['key_column1', 'key_column2'], how='inner')
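
The join types are easiest to compare on a tiny example. The following sketch uses two invented frames that share a key_column (the names and scores are assumptions) and records which keys survive each kind of join:

```python
import pandas as pd

left = pd.DataFrame({"key_column": [1, 2, 3], "name": ["Asha", "Ben", "Chen"]})
right = pd.DataFrame({"key_column": [2, 3, 4], "score": [85, 90, 75]})

# Collect the keys that remain after each join type
keys_by_join = {}
for how in ["inner", "left", "right", "outer"]:
    merged = pd.merge(left, right, on="key_column", how=how)
    keys_by_join[how] = merged["key_column"].tolist()
print(keys_by_join)

# indicator=True adds a _merge column showing where each row came from
audit = pd.merge(left, right, on="key_column", how="outer", indicator=True)
print(audit[["key_column", "_merge"]])
```

The inner join keeps only keys 2 and 3, the left and right joins keep every key of their own side, and the outer join keeps all four keys, filling the missing cells with NaN.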


DATA TRANSFORMATION

Data transformation is the process of converting raw data into a format that is more suitable for
analysis, modeling, or machine learning. It is an essential step in any data science project, and
Python is a popular programming language for data transformation.

There are many different types of data transformation, but some common examples include:

Cleaning and preprocessing: This involves removing errors and inconsistencies from the data,
as well as converting the data to a consistent format.
Feature engineering: This involves creating new features from the existing data, or
transforming existing features in a way that is more informative for the task at hand.
Encoding categorical data: Categorical data, such as text or labels, needs to be converted to
numerical data before it can be used by many machine learning algorithms.
Scaling and normalization: This involves transforming the data so that all features are on a
similar scale, which can improve the performance of machine learning algorithms.
There are a number of different Python libraries that can be used for data transformation, but
the most popular one is Pandas. Pandas is a powerful library for data manipulation and
analysis, and it provides a wide range of functions for data transformation.
import pandas as pd

# Load the data

df = pd.read_csv('data.csv')

# Clean the data

df = df.dropna() # Drop rows with missing values
df['age'] = df['age'].astype('int') # Convert the 'age' column to integers

# Create a new feature

df['age_group'] = df['age'].apply(lambda x: 'young' if x < 18 else 'adult')

# Encode categorical data

df['gender'] = df['gender'].map({'male': 1, 'female': 0})

# Scale the data

from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
df[['height', 'weight']] = scaler.fit_transform(df[['height', 'weight']])
# Save the transformed data
df.to_csv('transformed_data.csv', index=False)
1. Data Cleaning:
Data cleaning involves handling missing values, removing duplicates, and correcting errors in
your dataset.

Handling Missing Values:

Pandas provides functions like dropna(), fillna(), and interpolate() to handle missing values.

import pandas as pd

# Each method returns a new DataFrame, so assign the result to keep it

# Remove rows with missing values

df_dropped = df.dropna()

# Fill missing values with a specific value

df_filled = df.fillna(0)

# Interpolate missing values (linear interpolation by default)

df_interpolated = df.interpolate()
Removing Duplicates:

Use drop_duplicates() to remove duplicate rows from your DataFrame.
df = df.drop_duplicates()
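
To compare the missing-value strategies side by side, here is a minimal sketch on a short made-up Series. The mean imputation on the last line is an extra option not listed above, included only as a common alternative:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0, np.nan, 5.0])

print(s.dropna().tolist())        # [1.0, 3.0, 5.0]
print(s.fillna(0).tolist())       # [1.0, 0.0, 3.0, 0.0, 5.0]
print(s.interpolate().tolist())   # [1.0, 2.0, 3.0, 4.0, 5.0]

# Imputing with the mean of the observed values is another common choice
print(s.fillna(s.mean()).tolist())  # [1.0, 3.0, 3.0, 3.0, 5.0]
```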
2. Data Filtering:
Filtering allows you to select a subset of data based on certain conditions.

# Filter rows where a condition is met

filtered_df = df[df['column_name'] > 10]
3. Data Aggregation:
Aggregation involves summarizing data by grouping it based on certain criteria.

# Group by a column and calculate aggregate statistics

grouped_df = df.groupby('category_column')['numeric_column'].mean()
4. Data Transformation:
Data transformation includes operations like converting data types, scaling values, or applying
mathematical functions.

# Convert data types

df['numeric_column'] = df['numeric_column'].astype(float)

# Scaling values (e.g., Min-Max scaling)

from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler()

df['scaled_column'] = scaler.fit_transform(df[['numeric_column']])
5. One-Hot Encoding:
Convert categorical variables into a numerical format using one-hot encoding.

encoded_df = pd.get_dummies(df, columns=['categorical_column'])

6. Reshaping Data:
Reshaping data includes tasks like pivoting, melting, or stacking/unstacking for better analysis.

# Pivot a DataFrame
pivoted_df = df.pivot(index='row_column', columns='column_column', values='value_column')

# Melt a DataFrame
melted_df = pd.melt(df, id_vars=['id_column'], value_vars=['var1', 'var2'], var_name='variable',
value_name='value')
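
Pivoting and melting are inverse operations, which a round trip makes visible. This sketch uses an invented marks table (the student, subject, and marks columns are assumptions) and converts it from long to wide form and back:

```python
import pandas as pd

long_df = pd.DataFrame({
    "student": ["Asha", "Asha", "Ben", "Ben"],
    "subject": ["math", "physics", "math", "physics"],
    "marks": [90, 80, 70, 60],
})

# Long -> wide: one row per student, one column per subject
wide_df = long_df.pivot(index="student", columns="subject", values="marks")
print(wide_df)

# Wide -> long again: melt the subject columns back into rows
back_to_long = pd.melt(
    wide_df.reset_index(),
    id_vars=["student"],
    value_vars=["math", "physics"],
    var_name="subject",
    value_name="marks",
)
print(back_to_long)
```

Note that pivot() requires each index/column pair to be unique; duplicated pairs raise an error.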


7. Text Data Processing:
For text data, you can perform transformations such as tokenization, stemming, and stop-word
removal using libraries like NLTK or spaCy.

import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

nltk.download('stopwords')
nltk.download('punkt')

# Tokenization and stop-word removal

stop_words = set(stopwords.words('english'))

# Keep only the tokens that are not English stop words
df['text_column'] = df['text_column'].apply(
    lambda x: ' '.join(word for word in word_tokenize(x) if word.lower() not in stop_words))

DETECTING AND FILTERING OUTLIERS IN PYTHON

What are Outliers in Python?

Before diving deep into the concept of outliers, let us understand the origin of raw data.

Raw data that is fed to a system is usually generated from surveys and extraction of data from
real-time actions on the web. This may give rise to variations in the data and there exists a
chance of measurement error while recording the data.

An outlier is a point or set of data points that lie away from the rest of the data values of the
dataset. That is, it is a data point(s) that appear away from the overall distribution of data
values in a dataset.

The outlier detection methods described here apply to continuous (numeric) values, such as the
features or target of a regression problem; they are not meaningful for categorical labels.

Basically, outliers diverge from the overall, well-structured distribution of the data elements
and can be regarded as abnormal values that lie away from the rest of the data.

Why is it necessary to remove outliers from the data?
As discussed above, outliers are data points that lie away from the usual distribution of the
data. They have the following effects on the overall data distribution:

They affect the overall standard deviation of the data.
They distort the overall mean of the data.
They can skew the data.
They bias the accuracy estimates of a machine learning model.
They affect the distribution and statistics of the dataset.

Detection of Outliers – IQR approach

The outliers in the dataset can be detected by the below methods:

•Z-score
•Scatter Plots
•Interquartile range(IQR)


1. Visual Inspection:
Start by visualizing your data using histograms, box plots, scatter plots, or other visualization
techniques. Outliers often appear as points far from the main cluster or as values outside the
whiskers of box plots. Visualization can help you identify potential outliers.

import matplotlib.pyplot as plt

import seaborn as sns

# Box plot to visualize outliers

sns.boxplot(x=df['column_name'])
plt.show()

2. Z-Score:
The Z-score measures how far a data point is from the mean in terms of standard deviations. You
can use the Z-score to detect outliers. Typically, data points with a Z-score greater than a
threshold (e.g., 2 or 3) are considered outliers.

from scipy import stats

z_scores = stats.zscore(df['column_name'])
outliers = df[abs(z_scores) > 2]
filtered_data = df[abs(z_scores) <= 2]
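
The Z-score filter can also be written with Pandas alone. The sketch below uses invented values and assumes the population standard deviation (ddof=0), which is also the default of scipy.stats.zscore:

```python
import pandas as pd

values = pd.Series([10, 12, 11, 13, 12, 11, 10, 12, 11, 95])

# Population z-score (ddof=0), matching the scipy.stats.zscore default
z_scores = (values - values.mean()) / values.std(ddof=0)

outliers = values[z_scores.abs() > 2]
filtered = values[z_scores.abs() <= 2]
print(outliers.tolist())   # [95]
```

In a sample this small the Z-score of 95 stays just under 3, so a threshold of 3 would not flag it. The threshold should therefore be chosen with the sample size in mind.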
3. IQR (Interquartile Range) Method:
The IQR method involves calculating the IQR (the difference between the 75th percentile and
the 25th percentile) and identifying outliers as values outside a specified range.

Q1 = df['column_name'].quantile(0.25)
Q3 = df['column_name'].quantile(0.75)
IQR = Q3 - Q1

lower_bound = Q1 - 1.5 * IQR

upper_bound = Q3 + 1.5 * IQR

outliers = df[(df['column_name'] < lower_bound) | (df['column_name'] > upper_bound)]

filtered_data = df[(df['column_name'] >= lower_bound) & (df['column_name'] <=
upper_bound)]
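
As a worked example of the IQR rule, the sketch below wraps the bounds in a small helper. The function name iqr_bounds and its multiplier parameter k are hypothetical, introduced only for this illustration, and the data is invented:

```python
import pandas as pd

def iqr_bounds(series, k=1.5):
    """Return (lower, upper) bounds that lie k * IQR beyond the quartiles."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

values = pd.Series([10, 12, 11, 13, 12, 11, 10, 12, 11, 95])

lower, upper = iqr_bounds(values)          # Q1 = 11, Q3 = 12, IQR = 1
print(lower, upper)                        # 9.5 13.5

outliers = values[(values < lower) | (values > upper)]
print(outliers.tolist())                   # [95]
```

Passing a larger k widens the bounds, so fewer points are flagged.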


4. Tukey's Fences:
Tukey's fences are the IQR-based bounds themselves. A multiplier of 1.5, as in the previous
method, marks ordinary outliers; a multiplier of 3, used below, flags only extreme ("far out")
values.

Q1 = df['column_name'].quantile(0.25)
Q3 = df['column_name'].quantile(0.75)

lower_fence = Q1 - 3 * (Q3 - Q1)

upper_fence = Q3 + 3 * (Q3 - Q1)

outliers = df[(df['column_name'] < lower_fence) | (df['column_name'] > upper_fence)]

filtered_data = df[(df['column_name'] >= lower_fence) & (df['column_name'] <= upper_fence)]


5. Machine Learning-Based Methods:
You can also use machine learning models, such as Isolation Forest or One-Class SVM, to detect
outliers in your data.

from sklearn.ensemble import IsolationForest

clf = IsolationForest(contamination=0.05) # Adjust contamination based on your dataset

# fit_predict returns -1 for outliers and 1 for inliers
labels = clf.fit_predict(df[['column_name']])
outliers = df[labels == -1]
filtered_data = df[labels == 1]

STRING MANIPULATION

String manipulation in Python involves performing various operations on strings, such as
concatenation, slicing, searching, replacing, formatting, and more. Python provides a rich set of
string manipulation methods and functions that make it easy to work with text data. Here are
some common string manipulation techniques in Python:

1. String Concatenation:
You can concatenate strings using the + operator or by using the str.join() method.

str1 = "Hello"
str2 = "World"
result = str1 + ", " + str2 # Using the + operator
words = ["Hello", "World"]
result = ", ".join(words) # Using join
2. String Slicing:
String slicing allows you to extract substrings from a string based on their positions.

text = "Python Programming"

substring = text[7:18] # Extract "Programming"

3. String Searching:
You can search for substrings within a string using methods like str.find(), str.index(), or regular
expressions with the re module.

text = "Python is a powerful programming language"

position = text.find("powerful") # Find the position of "powerful"
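
The three search approaches behave differently when the substring is missing, which the following sketch illustrates. The regular expression pattern is just one possible example:

```python
import re

text = "Python is a powerful programming language"

print(text.find("powerful"))   # 12
print(text.find("Java"))       # -1: find() signals "not found" with -1

# index() raises ValueError instead of returning -1
try:
    text.index("Java")
except ValueError:
    print("substring not found")

# Regular expression search: first word starting with a lowercase "p"
match = re.search(r"\bp\w+", text)
print(match.group(), match.start())   # powerful 12
```

Use find() when a missing substring is an expected case, and index() when it should be treated as an error.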

4. String Replacement:
Replace specific substrings within a string using the str.replace() method.

text = "Python is a great programming language"

new_text = text.replace("great", "powerful") # Replace "great" with "powerful"
5. String Formatting:
You can format strings using f-strings (Python 3.6+), the .format() method, or the % operator.

name = "Alice"
age = 30
formatted_str = f"My name is {name} and I am {age} years old."
name = "Bob"
age = 25
formatted_str = "My name is {} and I am {} years old.".format(name, age)

6. String Splitting:
Split a string into a list of substrings using the str.split() method.

text = "Python,Java,C++,JavaScript"
languages = text.split(",") # Split by comma


7. String Stripping:
Remove leading and trailing whitespace characters using str.strip() or str.lstrip() and str.rstrip().

text = " Python is awesome! "

cleaned_text = text.strip() # Remove leading and trailing spaces

8. String Case Conversion:

Convert the case of a string using methods like str.lower(), str.upper(), or str.capitalize().

text = "Hello World"

lower_case = text.lower()
upper_case = text.upper()
capitalized = text.capitalize()
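
The case-conversion methods differ in subtle ways, so a quick check on a mixed-case string may help. In particular, capitalize() lowercases everything after the first character, while title() capitalizes each word:

```python
text = "hello World"

print(text.capitalize())  # Hello world
print(text.title())       # Hello World
print(text.swapcase())    # HELLO wORLD

# Case checks and prefix/suffix tests return booleans
print(text.upper().isupper(), text.islower())            # True False
print(text.startswith("hello"), text.endswith("World"))  # True True
```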


Syntax                      Function
string.upper()              Transforms all the characters of the string into uppercase.
string.lower()              Transforms all the characters of the string into lowercase.
string.title()              Transforms the first letter of each word into uppercase and the rest of the characters into lowercase.
string.swapcase()           Transforms uppercase characters into lowercase and vice versa.
string.capitalize()         Transforms the first character of the string into uppercase and the rest into lowercase.
string.isupper()            Returns True if all the alphabetic characters of the string are uppercase.
string.islower()            Returns True if all the alphabetic characters of the string are lowercase.
string.endswith()           Returns True if the string ends with a specific value.
string.startswith()         Returns True if the string starts with a specific value.
string.index('character')   Returns the position of the first occurrence of the character.

VECTORIZED STRING FUNCTIONS
Vectorized string functions in pandas allow you to efficiently perform operations on string data
within a pandas DataFrame or Series. These functions are accessed through the .str attribute of
a pandas Series and enable you to apply string operations element-wise. Here are some
commonly used vectorized string functions in pandas:
import pandas as pd

series = pd.Series(['Alice', 'Bob', 'Carol'])

series_lowercase = series.str.lower()

print(series_lowercase)

Output:
0    alice
1      bob
2    carol
dtype: object


Some of the most commonly used vectorized string functions in Pandas include:

str.lower(): Convert all strings to lowercase.

str.upper(): Convert all strings to uppercase.
str.strip(): Remove whitespace from the beginning and end of all strings.
str.split(): Split all strings into a list of strings, using a specified separator.
str.replace(): Replace all occurrences of a specified substring with another substring in all
strings.
str.contains(): Return a boolean Series indicating whether each string contains a specified
substring.
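
As a brief sketch of the functions listed above, the following applies several of them to a small invented Series of names. Each call works element-wise and returns a new Series:

```python
import pandas as pd

names = pd.Series(["  Alice Smith ", "bob jones", "Carol  "])

cleaned = names.str.strip()                 # trim surrounding whitespace
print(cleaned.str.upper().tolist())
print(cleaned.str.split(" ").tolist())      # list of parts per element
print(cleaned.str.replace("o", "0", regex=False).tolist())
print(cleaned.str.contains("o").tolist())   # [False, True, True]
```

Passing regex=False to str.replace() treats the pattern as a literal string rather than a regular expression.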
Vectorized string functions can also be used to perform more complex string operations, such as
regular expression matching and extraction. For example, the following code uses the
str.extract() function to extract the first name from each email address in a Series:

import pandas as pd

series = pd.Series(['alice@example.com', 'bob@example.com', 'carol@example.com'])
first_names = series.str.extract(r'(?P<first_name>\w+)@example\.com')

print(first_names)

Output:
  first_name
0      alice
1        bob
2      carol
PLOTTING AND VISUALIZATION

Plotting and visualization are crucial for understanding and communicating data. In Python, there
are several libraries for creating plots and visualizations, with Matplotlib, Seaborn, and Plotly
being some of the most popular ones. Here's an overview of how to create plots and
visualizations in Python:
Matplotlib:
Matplotlib is an easy-to-use, low-level data visualization library built on NumPy arrays. It
supports many plot types, such as scatter plots, line plots, and histograms, and provides a lot
of flexibility.

Installation:

pip install matplotlib

Basic Line Plot:
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

plt.plot(x, y)
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.title('Line Plot')
plt.show()


Seaborn:
Seaborn is built on top of Matplotlib and provides a high-level interface for creating informative
and attractive statistical graphics.

Installation:

pip install seaborn

Basic Scatter Plot:

import matplotlib.pyplot as plt
import seaborn as sns

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

sns.scatterplot(x=x, y=y)
plt.xlabel('X-axis')
plt.ylabel('Y-axis')
plt.title('Scatter Plot')
plt.show()
Plotly:
Plotly is a powerful library for creating interactive and web-based visualizations. It is often used
for creating dashboards and web applications.

Installation:

pip install plotly

Basic Bar Chart:

import plotly.express as px

data = {'Category': ['A', 'B', 'C', 'D'],

'Values': [10, 20, 15, 30]}

fig = px.bar(data, x='Category', y='Values', title='Bar Chart')

fig.show()


Other Libraries:
Pandas: Pandas also provides basic plotting functionality through the .plot() method for
DataFrames, making it convenient for quick exploratory data analysis.

Bokeh: Bokeh is another library for interactive web-based visualizations and is well-suited for
creating interactive dashboards.

Altair: Altair is a declarative statistical visualization library for Python, making it easy to create
complex visualizations with concise code.

ggplot (ggpy): ggplot is a Python implementation of the popular ggplot2 package from R, which
uses a grammar of graphics to create plots.


MATPLOTLIB API PRIMER

Matplotlib is a popular Python library for creating static, animated, and interactive
visualizations. When creating plots with Matplotlib, you can customize the appearance of your
lines, markers, and line styles. Here's a primer on how to do that:

Colors:
Matplotlib allows you to specify colors for lines, markers, and other plot elements in several
ways:

Named Colors: You can use named colors like 'red', 'blue', 'green', etc.
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

plt.plot(x, y, color='red', label='Red Line')

RGB Values: You can use RGB tuples (each component between 0 and 1) to specify colors.
plt.plot(x, y, color=(0.1, 0.2, 0.3), label='Custom Color')
Hexadecimal Colors: You can also use hexadecimal color codes.

plt.plot(x, y, color='#FF5733', label='Hex Color')
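
The three color notations describe the same thing, and Matplotlib's colors module can convert between them. A short sketch of that equivalence:

```python
from matplotlib import colors

# A hex code and an RGB tuple are two spellings of the same color
rgb = colors.to_rgb("#FF5733")
print(rgb)
print(colors.to_hex(rgb))        # #ff5733

# Named colors resolve to RGB tuples as well
print(colors.to_rgb("red"))      # (1.0, 0.0, 0.0)
```

This can be handy for checking which RGB values a named or hexadecimal color actually stands for.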

Markers:
Markers are used to indicate specific data points on a plot. You can customize markers in
Matplotlib:

plt.plot(x, y, marker='o', markersize=8, markerfacecolor='yellow', markeredgecolor='black',

label='Custom Marker')
marker: Specifies the marker style (e.g., 'o' for circles, 's' for squares, 'x' for crosses).
markersize: Sets the size of the marker.
markerfacecolor: Sets the marker's fill color.
markeredgecolor: Sets the marker's edge color.


Line Styles:
You can customize the line style of your plot:

plt.plot(x, y, linestyle='--', linewidth=2, label='Dashed Line')

linestyle: Specifies the line style (e.g., '-', '--', '-.', ':').
linewidth: Sets the width of the line.


Line Styles:
You can customize the appearance of lines in your plot using various line styles, markers, and
colors. Here's an example:

import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

# Customizing line style with color, marker, and linestyle

plt.plot(x, y, color='blue', marker='o', linestyle='--', markersize=8, label='Custom Line')

plt.xlabel('X-axis Label')
plt.ylabel('Y-axis Label')
plt.title('Custom Line Style')
plt.legend()
plt.show()

In the above example:
color: Sets the line color.
marker: Specifies the marker style (e.g., 'o' for circles, 's' for squares).
linestyle: Sets the line style ('--' for dashed, ':' for dotted, etc.).
markersize: Adjusts the size of markers.

Ticks and Labels:
You can customize tick locations and labels on the x and y axes using xticks() and yticks()
functions:
import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

plt.plot(x, y, label='Line Plot')

# Customizing x-axis ticks and labels

plt.xticks([1, 2, 3, 4, 5], ['A', 'B', 'C', 'D', 'E'])

# Customizing y-axis ticks and labels

plt.yticks([2, 4, 6, 8, 10], ['Low', 'Medium', 'High', 'Very High', 'Max'])

plt.xlabel('X-axis Label')
plt.ylabel('Y-axis Label')
plt.title('Custom Ticks and Labels')
plt.legend()
plt.show()

In this example:
xticks() and yticks() specify the locations and labels for the ticks on the x and y axes,
respectively.
Legends:
To add legends to your plot, you can use the legend() function. You should also label your plotted
lines or data points using the label parameter when creating the plot.

import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y1 = [2, 4, 6, 8, 10]
y2 = [1, 3, 5, 7, 9]

plt.plot(x, y1, label='Line 1')

plt.plot(x, y2, label='Line 2')

plt.xlabel('X-axis Label')
plt.ylabel('Y-axis Label')
plt.title('Legend Example')
plt.legend()
plt.show()

In this example:
label is provided when creating each line, and
legend() is called to display the legend.
ANNOTATIONS AND DRAWING ON A SUBPLOT

Annotations and drawing on a subplot in Matplotlib allow you to add textual or graphical
elements to your plots to provide additional information or highlight specific points of interest.
Here's how you can add annotations and draw on a subplot:


Adding Text Annotations:
You can add text annotations to your plot using the
text() function. Here's an example:

import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

plt.plot(x, y, label='Line Plot')

# Adding a text annotation

plt.text(3, 7, 'Annotation Here', fontsize=12,
color='red')

plt.xlabel('X-axis Label')
plt.ylabel('Y-axis Label')
plt.title('Text Annotation Example')
plt.legend()
plt.show()

In this example, plt.text(x, y, text, fontsize, color) is used to add a text annotation at
coordinates (3, 7) with the specified text, fontsize, and color.
Adding Arrows with Annotations:
You can add arrows to point to specific locations on your plot using the annotate() function. Here's
an example:

import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

plt.plot(x, y, label='Line Plot')

# Adding an arrow with annotation

plt.annotate('Important Point', xy=(3, 6), xytext=(4, 8), fontsize=12,
arrowprops=dict(arrowstyle='->', color='blue'))

plt.xlabel('X-axis Label')
plt.ylabel('Y-axis Label')
plt.title('Arrow Annotation Example')
plt.legend()
plt.show()
In this example, plt.annotate(text, xy, xytext, fontsize, arrowprops) is used to add an arrow with
a text annotation. xy specifies the point being pointed to, and xytext specifies the location of
the text.
Drawing Shapes:
You can draw various shapes, lines, and polygons on a subplot using Matplotlib's plotting
functions. For example, to draw a rectangle:

import matplotlib.pyplot as plt

import matplotlib.patches as patches

# Create a subplot
fig, ax = plt.subplots()

# Add a rectangle: lower-left corner at (1, 2), width 2, height 4
rectangle = patches.Rectangle((1, 2), 2, 4, linewidth=2, edgecolor='red', facecolor='none')
ax.add_patch(rectangle)

# Set axis limits that contain the rectangle (the default view of an empty plot is 0 to 1)
ax.set_xlim(0, 5)
ax.set_ylim(0, 8)

plt.xlabel('X-axis Label')
plt.ylabel('Y-axis Label')
plt.title('Drawing Shapes Example')
plt.show()
In this example, we create a subplot, create a rectangle using patches.Rectangle(), and then add
it to the plot with ax.add_patch().
SAVING PLOTS TO FILE

You can save plots created with Matplotlib to various file formats such as PNG, PDF, SVG, and
more using the savefig() function. Here's how to save a plot to a file:

import matplotlib.pyplot as plt

# Create and customize your plot

x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
plt.plot(x, y, label='Line Plot')
plt.xlabel('X-axis Label')
plt.ylabel('Y-axis Label')
plt.title('Saved Plot Example')
plt.legend()

# Save the plot to a file (e.g., PNG)

plt.savefig('saved_plot.png', dpi=300)
In this example, plt.savefig('saved_plot.png', dpi=300) saves the current plot to a PNG file
named "saved_plot.png" with a resolution of 300 dots per inch (dpi). You can specify the file
format by changing the file extension (e.g., ".pdf" for PDF, ".svg" for SVG).
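
As a minimal sketch of saving the same figure in two formats, the example below writes a PNG and an SVG into a temporary directory. The temporary directory and the non-interactive Agg backend are assumptions made so that the snippet runs without a display and without cluttering the working folder:

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend: render to files without a window
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3, 4, 5], [2, 4, 6, 8, 10], label="Line Plot")
ax.legend()

out_dir = tempfile.mkdtemp()  # hypothetical output location for this demo
png_path = os.path.join(out_dir, "saved_plot.png")
svg_path = os.path.join(out_dir, "saved_plot.svg")

# The file extension selects the output format
fig.savefig(png_path, dpi=300, bbox_inches="tight")
fig.savefig(svg_path, transparent=True)
plt.close(fig)

print(os.path.getsize(png_path) > 0, os.path.getsize(svg_path) > 0)
```

When a plot is also shown on screen, call savefig() before plt.show(), since the figure may be cleared once the window is closed.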

Common parameters for savefig():

fname: The file name and path where the plot will be saved.
dpi: The resolution in dots per inch. By default the figure's own dpi is used, which is 100 unless changed.
format: The file format (e.g., 'png', 'pdf', 'svg'). If omitted, it is inferred from the file extension.
bbox_inches: Specifies which part of the figure to save. The default saves the whole figure; use 'tight' to trim the surrounding whitespace.
transparent: If True, the plot will have a transparent background.
orientation: 'portrait' or 'landscape'; currently only supported by the PostScript backend.


PLOTTING FUNCTIONS IN PANDAS.
Pandas provides a number of plotting functions that can be used to create a variety of different
plots and visualizations. These functions are simple to use and can be used to create plots with
just a few lines of code.

The plotting functions are available directly on Series and DataFrame objects through the plot()
method. No separate import is needed, although Matplotlib must be installed because Pandas uses
it to draw the plots. (The pandas.plotting module holds additional helpers such as
scatter_matrix.) The plot() method takes a number of keyword arguments, which can be used to
control the appearance of the plot.

Here is a simple example of how to use the Pandas plot() function to create a line chart:
import pandas as pd
import matplotlib.pyplot as plt
# Create a Series
series = pd.Series([2, 4, 6, 8, 10])
# Create a line chart of the Series
series.plot()

# Show the plot

plt.show()
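
The plot() method returns the Matplotlib Axes object it drew on, so the result can be customized through that object instead of the global plt functions. A small sketch, assuming the Agg backend so that it runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3, 4, 5], "y": [2, 4, 6, 8, 10]})

# plot() returns the Axes it drew on, which can be customized afterwards
ax = df.plot(x="x", y="y", kind="line", title="Line Plot", legend=False)
ax.set_xlabel("X-axis Label")
ax.set_ylabel("Y-axis Label")

print(type(ax).__name__, ax.get_title())
plt.close(ax.figure)
```

Working with the returned Axes is especially useful when several plots share one figure.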
Line Plot:
To create a line plot of a Series or DataFrame, simply call the .plot() method on the data:

import pandas as pd
import matplotlib.pyplot as plt

# Create a DataFrame
data = {'x': [1, 2, 3, 4, 5], 'y': [2, 4, 6, 8, 10]}
df = pd.DataFrame(data)

# Create a line plot

df['y'].plot()
plt.xlabel('X-axis Label')
plt.ylabel('Y-axis Label')
plt.title('Line Plot')
plt.show()

Scatter Plot:
Scatter plots are created similarly, but you specify kind='scatter':

df.plot(kind='scatter', x='x', y='y')
plt.xlabel('X-axis Label')
plt.ylabel('Y-axis Label')
plt.title('Scatter Plot')
plt.show()
Bar Plot:
To create a bar plot, you can use kind='bar':

df.plot(kind='bar', x='x', y='y')

plt.xlabel('X-axis Label')
plt.ylabel('Y-axis Label')
plt.title('Bar Plot')
plt.show()

Histogram:
For histograms, use kind='hist':

df['y'].plot(kind='hist', bins=5)

plt.xlabel('Values')
plt.ylabel('Frequency')
plt.title('Histogram')
plt.show()

Customization:
You can further customize your plots by using Matplotlib functions after calling .plot(). Additionally,
you can create subplots using the plt.subplots() function and pass each Axes to .plot() through
the ax parameter to have more control over the layout of multiple plots.

fig, axes = plt.subplots(nrows=2, ncols=2)

df['y'].plot(ax=axes[0, 0], title='Line Plot')

df.plot(kind='scatter', x='x', y='y', ax=axes[0, 1], title='Scatter Plot')
df.plot(kind='bar', x='x', y='y', ax=axes[1, 0], title='Bar Plot')
df['y'].plot(kind='hist', bins=5, ax=axes[1, 1], title='Histogram')

plt.tight_layout()

No ratings yet
DR Kruti Dangarwala CSE & IT Department Svmit: Python For Data Science Unit 5: Data Wrangling
91 pages
Prac 7
No ratings yet
Prac 7
5 pages
CSE445 NSU Week_3
No ratings yet
CSE445 NSU Week_3
48 pages
Python Notes by Prof T
No ratings yet
Python Notes by Prof T
10 pages
dm(2)
No ratings yet
dm(2)
3 pages
Lecture 4 Data Pre-Processing
No ratings yet
Lecture 4 Data Pre-Processing
43 pages
Advanced Python Lab
No ratings yet
Advanced Python Lab
17 pages
DevOps Session 3 Pandas.pptx
No ratings yet
DevOps Session 3 Pandas.pptx
33 pages
Statistical Transform Data Cleaning
No ratings yet
Statistical Transform Data Cleaning
30 pages
DAwHPC L03 Data Cleaning Practical
No ratings yet
DAwHPC L03 Data Cleaning Practical
43 pages
Exp1
No ratings yet
Exp1
3 pages
# (Data Preprocessing) : (Cheatsheet)
No ratings yet
# (Data Preprocessing) : (Cheatsheet)
10 pages
Dsbda Ass2
No ratings yet
Dsbda Ass2
49 pages
Lesson 2 - Data Preprocessing
100% (1)
Lesson 2 - Data Preprocessing
72 pages
Advanced Python Programming Data Science: The University of Sheffield
No ratings yet
Advanced Python Programming Data Science: The University of Sheffield
55 pages
Report
No ratings yet
Report
18 pages
Cheat Sheet: The Pandas Dataframe Object I: Preliminaries Get Your Data Into A Dataframe
No ratings yet
Cheat Sheet: The Pandas Dataframe Object I: Preliminaries Get Your Data Into A Dataframe
12 pages
Building Good Training Sets UNIT 1 PART2
No ratings yet
Building Good Training Sets UNIT 1 PART2
46 pages
Scala Data Analysis Cookbook (new): Navigate the world of data analysis, visualization, and machine learning with over 100 hands-on Scala recipes
From Everand
Scala Data Analysis Cookbook (new): Navigate the world of data analysis, visualization, and machine learning with over 100 hands-on Scala recipes
Arun Manivannan
No ratings yet
Learning Pandas 2.0: A Comprehensive Guide to Data Manipulation and Analysis for Data Scientists and Machine Learning Professionals
From Everand
Learning Pandas 2.0: A Comprehensive Guide to Data Manipulation and Analysis for Data Scientists and Machine Learning Professionals
Matthew Rosch
No ratings yet
The Essential R Reference
From Everand
The Essential R Reference
Mark Gardener
No ratings yet
Advanced C Concepts and Programming: First Edition
From Everand
Advanced C Concepts and Programming: First Edition
Gayatri
3/5 (1)
Data Science Programming In Python
From Everand
Data Science Programming In Python
Anita Raichand
No ratings yet
Python for Data Science: A Hands-On Introduction
From Everand
Python for Data Science: A Hands-On Introduction
Yuli Vasiliev
No ratings yet
THE SQL LANGUAGE: Master Database Management and Unlock the Power of Data (2024 Beginner's Guide)
From Everand
THE SQL LANGUAGE: Master Database Management and Unlock the Power of Data (2024 Beginner's Guide)
JAMIE POWERS
No ratings yet
Illuminating Data: A hands on guide to data visualization in R
From Everand
Illuminating Data: A hands on guide to data visualization in R
Eman Ahmad
No ratings yet
Darwin and Deep Time: Temporal Scales and The Naturalist's Imagination
No ratings yet
Darwin and Deep Time: Temporal Scales and The Naturalist's Imagination
11 pages
JHS Calendar of Activities 2022-23
No ratings yet
JHS Calendar of Activities 2022-23
4 pages
Science Diagram Rubric
No ratings yet
Science Diagram Rubric
1 page
Romeo and Juliet Essay Prompt and Rubric
No ratings yet
Romeo and Juliet Essay Prompt and Rubric
3 pages
User Guide
No ratings yet
User Guide
1,657 pages
Yassine MISSAOUI EOP REPORT
No ratings yet
Yassine MISSAOUI EOP REPORT
69 pages
ENG10-Q3-Module 4 - Lesson 1-Critiquing-a-Literary-Selection-1 DIGITAL
100% (2)
ENG10-Q3-Module 4 - Lesson 1-Critiquing-a-Literary-Selection-1 DIGITAL
24 pages
Paper 4 - WS-6
No ratings yet
Paper 4 - WS-6
8 pages
Codigos SKEY para Modelagem
100% (1)
Codigos SKEY para Modelagem
8 pages
Volume XLVIII February 2012: Interim Edition of Roman Pontifical To Be Published
No ratings yet
Volume XLVIII February 2012: Interim Edition of Roman Pontifical To Be Published
4 pages
A Spotlight On The New .NET Connector 3.0
No ratings yet
A Spotlight On The New .NET Connector 3.0
9 pages
Final List1
No ratings yet
Final List1
41 pages
Umrah Guide
No ratings yet
Umrah Guide
7 pages
Readme
No ratings yet
Readme
7 pages
iNLPCenter Life Metaphor
No ratings yet
iNLPCenter Life Metaphor
15 pages
Clerical Operations_ Alphabetization
No ratings yet
Clerical Operations_ Alphabetization
37 pages
Introduction To ADB'S Management Action Record System (Mars) and Lessons Database
No ratings yet
Introduction To ADB'S Management Action Record System (Mars) and Lessons Database
25 pages
Setting Up Internet Access Server On Basis of Mikrotik Routeros and Isp Billing System Netup Utm5
No ratings yet
Setting Up Internet Access Server On Basis of Mikrotik Routeros and Isp Billing System Netup Utm5
9 pages
Os Lab Manual
No ratings yet
Os Lab Manual
111 pages
Unbound Solution ... Sardar Dler
No ratings yet
Unbound Solution ... Sardar Dler
14 pages
Streaming Stored Audio & Video
No ratings yet
Streaming Stored Audio & Video
12 pages
Os Unit II Unit 2
No ratings yet
Os Unit II Unit 2
42 pages
Chapter 3 PDF
No ratings yet
Chapter 3 PDF
30 pages
Sta5176 SAS Exam1 Fall 2024 Report Nafees
No ratings yet
Sta5176 SAS Exam1 Fall 2024 Report Nafees
8 pages
Chapter 5 - Verbs - Présent
No ratings yet
Chapter 5 - Verbs - Présent
34 pages
IPLINFO
No ratings yet
IPLINFO
61 pages