DVA Practical

The document provides a comprehensive introduction to data visualization, emphasizing its importance in analytics and detailing common chart types such as bar, line, scatter, histogram, box plot, and pie chart. It also covers tools and libraries like Matplotlib, Seaborn, and Plotly for creating various visualizations, alongside techniques for dataset loading, exploration, cleaning, and preparation. Additionally, it discusses advanced visualization techniques, multivariate analysis, and time series analysis using real-world datasets.

Uploaded by

laxmipriya1521
Copyright
© © All Rights Reserved

Q1. Introduction to Data Visualization


• Understand the importance of data visualization in analytics.
• Overview of common chart types: bar, line, scatter, histogram, box plot, pie chart.

Answer:

What is Data Visualization?

Data visualization is the graphical representation of information and data. It helps in:

• Understanding patterns and trends in the data

• Communicating insights clearly and effectively

• Making data-driven decisions

Why is it Important?

• Simplifies complex data

• Reveals patterns that aren't obvious in raw data

• Helps detect outliers and anomalies

• Facilitates storytelling with data

Common Chart Types

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load dataset (file name assumed: a local copy of the Titanic CSV)
df = pd.read_csv('titanic.csv')

# Preview data
df.head()

1. Bar Chart

Use: To compare quantities across categories.

# Count of passengers by class
sns.countplot(data=df, x='Pclass')
plt.title('Passenger Count by Class')
plt.xlabel('Class')
plt.ylabel('Count')
plt.show()

2. Line Chart

Use: To track changes over time.

# The Titanic data has no real time column, so simulate one from PassengerId.
# A separate column is used so that the original IDs are not overwritten.
df['SimDate'] = pd.to_datetime(df['PassengerId'], unit='D', origin='1900-01-01')
df.groupby(df['SimDate'].dt.year)['Fare'].mean().plot()
plt.title('Average Fare Over Time')
plt.xlabel('Year')
plt.ylabel('Average Fare')
plt.show()


3. Scatter Plot

Use: To show the relationship between two numeric variables.

sns.scatterplot(data=df, x='Age', y='Fare')
plt.title('Age vs Fare')
plt.show()

4. Histogram
Use: To view the distribution of a single numeric variable.

sns.histplot(data=df, x='Age', bins=30, kde=True)
plt.title('Age Distribution')
plt.show()
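To see what a histogram actually counts, the binning step can be sketched with NumPy alone. The ages below are hypothetical values chosen for illustration, not rows from the Titanic file.

```python
import numpy as np

# Hypothetical ages (illustration only)
ages = np.array([2, 18, 22, 25, 29, 31, 35, 40, 58, 71])

# Split the range 0-80 into 4 equal-width bins and count the values in each
counts, edges = np.histogram(ages, bins=4, range=(0, 80))

for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"{left:.0f}-{right:.0f}: {count}")
```

Each bar in a histogram is one of these counts; changing the number of bins changes the bar heights, so it is worth trying a few values.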

5. Box Plot

Use: To show distribution and detect outliers.

sns.boxplot(data=df, x='Pclass', y='Age')
plt.title('Age Distribution by Class')
plt.show()
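A box plot marks points as outliers when they fall more than 1.5 times the interquartile range (IQR) beyond the quartiles, which is the default whisker rule in Matplotlib and Seaborn. The sketch below applies the same rule by hand to a hypothetical list of ages.

```python
import pandas as pd

# Hypothetical ages (illustration only); 80 is a deliberate outlier
ages = pd.Series([22, 25, 27, 29, 30, 31, 33, 35, 38, 80], name="Age")

q1 = ages.quantile(0.25)
q3 = ages.quantile(0.75)
iqr = q3 - q1

# Whiskers reach at most 1.5 * IQR beyond the quartiles
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = ages[(ages < lower_fence) | (ages > upper_fence)]
print(outliers.tolist())
```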

6. Pie Chart

Use: To show the proportions of a whole.

# Pie chart of survival (sort_index() keeps the counts in 0, 1 order so they match the labels)
survived_counts = df['Survived'].value_counts().sort_index()
labels = ['Not Survived', 'Survived']
plt.pie(survived_counts, labels=labels, autopct='%1.1f%%', startangle=140)
plt.title('Survival Rate')
plt.axis('equal')
plt.show()
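The percentages printed on a pie chart can also be computed directly, which is a quick way to check the chart. This is a minimal sketch with hypothetical survival flags rather than the real dataset.

```python
import pandas as pd

# Hypothetical survival flags (0 = not survived, 1 = survived)
survived = pd.Series([0, 0, 0, 1, 1, 0, 1, 0], name="Survived")

# normalize=True returns shares instead of raw counts;
# sort_index() fixes the order to 0, 1
shares = survived.value_counts(normalize=True).sort_index()
percentages = (shares * 100).round(1)
print(percentages.to_dict())
```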
Q2. Tools and Libraries for Visualization

• Introduction to Python libraries: Matplotlib, Seaborn, and Plotly.


• Install necessary libraries and understand their use cases.

Answer:

Library | Use Case | Strengths
Matplotlib | Base library for all plots | Highly customizable, good for static charts
Seaborn | Statistical visualization | Clean, attractive default themes, simplifies complex plots
Plotly | Interactive plots | Great for dashboards and web apps

Installing the Libraries

Open your terminal or Jupyter Notebook and install the following:

pip install matplotlib seaborn plotly

1. Matplotlib – The Foundation

Overview: It is the base library used to create static, animated, and interactive plots in Python.

import matplotlib.pyplot as plt

# Simple line chart
x = [1, 2, 3, 4]
y = [10, 20, 25, 30]

plt.plot(x, y)
plt.title("Simple Line Plot")
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
plt.grid(True)
plt.show()
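Matplotlib also offers an object-oriented style, where the figure and axes are created explicitly and every call goes through the axes object. The sketch below is one possible way to draw a simple line plot in that style. The "Agg" backend line is an assumption added so that the code also runs without a display; drop it in a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, safe on machines without a display
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
y = [10, 20, 25, 30]

# Create the figure and one set of axes explicitly
fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(x, y, marker="o")
ax.set_title("Simple Line Plot")
ax.set_xlabel("X-axis")
ax.set_ylabel("Y-axis")
ax.grid(True)
fig.tight_layout()
```

This style is usually easier to manage once a figure contains more than one subplot, because each axes object is customized separately.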

2. Seaborn – Built on Matplotlib

Overview: Makes it easier to create beautiful and informative statistical plots.

import seaborn as sns
import matplotlib.pyplot as plt

# Load example dataset
df = sns.load_dataset('tips')

# Seaborn scatter plot
sns.scatterplot(data=df, x='total_bill', y='tip', hue='sex')
plt.title("Total Bill vs Tip by Gender")
plt.show()


3. Plotly – For Interactive Plots

Overview: Best for interactive, zoomable, and hoverable plots. Excellent for web apps and dashboards.

import plotly.express as px

# Load built-in dataset
df = px.data.iris()

# Interactive scatter plot
fig = px.scatter(df, x='sepal_width', y='sepal_length', color='species', title="Iris Sepal Dimensions")
fig.show()

Note: Plotly figures are displayed with fig.show(), which works in Jupyter Notebooks and typically opens a browser tab when run from a script. There is no need for plt.show().
Q3. Dataset Loading and Exploration
• Load real-world datasets using Pandas.
• Use .head(), .tail(), .info(), .describe() to explore data.

Answer:

Loading a Dataset

import pandas as pd

# Load Titanic dataset (file name assumed: a local copy of the Titanic CSV)
df = pd.read_csv("titanic.csv")

# Show the first 5 rows
df.head()

Exploring the Dataset

.head() – View the first few rows
df.head(3)  # First 3 rows

.tail() – View the last few rows
df.tail(3)  # Last 3 rows

.info() – Overview of columns, data types, non-null counts
df.info()

.describe() – Summary statistics for numeric columns
df.describe()
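The four exploration methods can be tried without any file by building a small DataFrame in code. The rows below are hypothetical and only mimic a few Titanic columns.

```python
import pandas as pd

# Hypothetical rows that mimic a few Titanic columns
df = pd.DataFrame({
    "Pclass": [1, 3, 2, 3, 1],
    "Age": [38.0, 26.0, None, 35.0, 54.0],
    "Fare": [71.28, 7.92, 13.00, 8.05, 51.86],
})

print(df.head(2))        # first two rows
print(df.tail(2))        # last two rows
df.info()                # prints its own report; Age shows 4 non-null values out of 5
summary = df.describe()  # count, mean, std, min, quartiles, max per numeric column
print(summary.loc[["count", "mean"]])
```

Note that .describe() skips missing values, so the count for Age is lower than the number of rows.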


Q4. Understanding Variable Types

• Differentiate between categorical, numerical, discrete, and continuous variables.


• Identify types of variables in a dataset.

Answer:

Types of Variables

Type | Description | Examples
Categorical | Represent categories or groups | Gender, Class, Embarked
Numerical | Represent measurable quantities | Age, Fare
➤ Discrete | Countable values (integers) | Number of siblings, Pclass
➤ Continuous | Measurable values (fractions allowed) | Age, Fare

Let's Work with the Titanic Dataset

import pandas as pd

# Load dataset (file name assumed: a local copy of the Titanic CSV)
df = pd.read_csv('titanic.csv')
df.head()

Identify Variable Types

# Check data types
df.dtypes
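The data types can be turned into a first guess at the variable type of each column. The helper below is a rough heuristic written for illustration (the function name classify and the sample rows are hypothetical), not a rule built into pandas.

```python
import pandas as pd

# Hypothetical sample with Titanic-style column names
df = pd.DataFrame({
    "Sex": ["male", "female", "female", "male"],
    "Pclass": [3, 1, 3, 2],
    "SibSp": [1, 1, 0, 0],
    "Age": [22.0, 38.0, 26.0, 35.5],
    "Fare": [7.25, 71.28, 7.92, 8.05],
})

def classify(series: pd.Series) -> str:
    """Rough heuristic: let the dtype decide the variable type."""
    if pd.api.types.is_integer_dtype(series):
        return "numerical (discrete)"
    if pd.api.types.is_float_dtype(series):
        return "numerical (continuous)"
    return "categorical"

variable_types = {col: classify(df[col]) for col in df.columns}
for col, kind in variable_types.items():
    print(f"{col}: {kind}")
```

The dtype is only a starting point. An integer-coded column such as Pclass is reported as discrete here, although it labels groups and is often treated as categorical when plotting.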
Q5. Data Cleaning and Preparation for Visualization
• Handle missing values, remove duplicates, and convert data types.
• Prepare clean data for analysis and plotting.

Answer:

Step 1: Handling Missing Values

Identify Missing Values

df.isnull().sum()

Drop or Fill Missing Values

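Drop rows with missing values (when a column has too many nulls to fill, or the rows aren't crucial):

df_cleaned = df.dropna(subset=['Embarked'])

Fill missing values (with mean, median, or mode):

# Assign the result back instead of using inplace=True on a column selection,
# which newer pandas versions warn about
df['Age'] = df['Age'].fillna(df['Age'].median())
df['Embarked'] = df['Embarked'].fillna(df['Embarked'].mode()[0])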

Step 2: Removing Duplicates

# Check and remove duplicates
print("Duplicates:", df.duplicated().sum())
df.drop_duplicates(inplace=True)

Step 3: Convert Data Types

Ensure columns are in the correct format:

# Convert Survived to category
df['Survived'] = df['Survived'].astype('category')

# Convert Embarked to category
df['Embarked'] = df['Embarked'].astype('category')

# Confirm changes
df.dtypes

Clean Data Ready!

# Final check (df.info() prints its own report, so it is not wrapped in print())
df.info()
print(df.isnull().sum())
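The three cleaning steps can be collected into one function so that the same preparation is applied every time. The sketch below runs on a small hypothetical frame; the function name clean and the sample rows are illustrative only.

```python
import pandas as pd

# Hypothetical frame with a missing age, a missing port, and one duplicated row
raw = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 0],
    "Age": [22.0, None, 26.0, 35.0, 35.0],
    "Embarked": ["S", "C", None, "S", "S"],
})

def clean(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()  # leave the raw data untouched
    # Step 1: fill missing values (median for numbers, mode for categories)
    out["Age"] = out["Age"].fillna(out["Age"].median())
    out["Embarked"] = out["Embarked"].fillna(out["Embarked"].mode()[0])
    # Step 2: remove duplicate rows
    out = out.drop_duplicates().reset_index(drop=True)
    # Step 3: convert data types
    out["Survived"] = out["Survived"].astype("category")
    out["Embarked"] = out["Embarked"].astype("category")
    return out

cleaned = clean(raw)
print(cleaned)
print("Missing values left:", cleaned.isnull().sum().sum())
```

Working on a copy keeps the original data available for comparison, which helps when checking how many rows or values each step changed.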
Q6. Creating Basic Plots Using Matplotlib

• Plot line charts, bar charts, histograms using Matplotlib.

• Customize plots with titles, labels, legends, and colors.

Answer:

import pandas as pd
import matplotlib.pyplot as plt

# Load dataset (file name assumed: a local copy of the Titanic CSV)
df = pd.read_csv("titanic.csv")

1. Line Chart

# Average fare by class
fare_by_class = df.groupby('Pclass')['Fare'].mean()

# Plot line chart
plt.plot(fare_by_class.index, fare_by_class.values, color='green', marker='o', linestyle='--')
plt.title('Average Fare by Passenger Class')
plt.xlabel('Passenger Class')
plt.ylabel('Average Fare')
plt.grid(True)
plt.xticks([1, 2, 3])
plt.show()

2. Bar Chart

# Count of passengers per class
class_counts = df['Pclass'].value_counts().sort_index()

# Bar chart
plt.bar(class_counts.index, class_counts.values, color=['skyblue', 'salmon', 'lightgreen'])
plt.title('Passenger Count by Class')
plt.xlabel('Passenger Class')
plt.ylabel('Count')
plt.xticks([1, 2, 3])
plt.show()

3. Histogram

# Drop missing values in 'Age'
ages = df['Age'].dropna()

# Histogram
plt.hist(ages, bins=20, color='purple', edgecolor='black')
plt.title('Age Distribution of Passengers')
plt.xlabel('Age')
plt.ylabel('Frequency')
plt.grid(axis='y', alpha=0.5)
plt.show()
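A legend becomes useful once a chart shows more than one series. The sketch below plots two lines and labels them; the fare values are illustrative numbers, not computed from the dataset, and the "Agg" backend line is an assumption so that the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend; remove in a notebook
import matplotlib.pyplot as plt

# Hypothetical average fares per class (illustrative numbers only)
classes = [1, 2, 3]
fare_male = [67.0, 20.0, 13.0]
fare_female = [106.0, 22.0, 16.0]

plt.figure(figsize=(6, 4))
# The label argument supplies the text shown in the legend
plt.plot(classes, fare_male, color='steelblue', marker='o', label='Male')
plt.plot(classes, fare_female, color='darkorange', marker='s', linestyle='--', label='Female')
plt.title('Average Fare by Class and Sex (illustrative data)')
plt.xlabel('Passenger Class')
plt.ylabel('Average Fare')
plt.xticks(classes)
legend = plt.legend(title='Sex', loc='upper right')
```

Call plt.show() at the end when running interactively. Without a label on each series, plt.legend() has nothing to display.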
Q7. Advanced Visualization Using Seaborn

• Create scatter plots, box plots, violin plots, and pair plots.

• Use hue, style, and palette for deeper analysis.

Answer:

import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt

# Load Titanic dataset (built-in dataset)
df = sns.load_dataset('titanic')

1. Scatter Plot

sns.scatterplot(data=df, x='age', y='fare', hue='sex', style='class', palette='Set2')
plt.title("Age vs Fare by Gender and Class")
plt.show()

2. Box Plot

sns.boxplot(data=df, x='class', y='age', hue='sex', palette='coolwarm')
plt.title("Age Distribution by Class and Gender")
plt.show()

3. Violin Plot

sns.violinplot(data=df, x='class', y='age', hue='sex', split=True, palette='muted')
plt.title("Age Distribution by Class and Gender (Violin Plot)")
plt.show()

4. Pair Plot

sns.pairplot(df[['age', 'fare', 'survived', 'sex']], hue='sex', palette='husl')
plt.suptitle("Pairwise Relationships", y=1.02)
plt.show()


Q8. Multivariate Analysis with Seaborn

• Heatmaps and correlation matrices to analyze relationships between multiple variables.

• Apply sns.heatmap() and sns.pairplot().

Answer:

import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt

# Load dataset
df = sns.load_dataset('titanic')

1. Correlation Matrix

# Select numeric columns only
num_df = df.select_dtypes(include='number')

# Compute correlation matrix
corr_matrix = num_df.corr()

# Display correlation matrix
print(corr_matrix)
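Each cell of the correlation matrix is a Pearson correlation coefficient (the default method of .corr()). To make that concrete, the sketch below computes one coefficient by hand on hypothetical numbers and compares it with the value pandas returns.

```python
import numpy as np
import pandas as pd

# Hypothetical numeric sample (illustration only)
df = pd.DataFrame({
    "age": [22.0, 38.0, 26.0, 35.0, 54.0],
    "fare": [7.25, 71.28, 7.92, 53.10, 51.86],
})

# Pearson r: sum of products of deviations, divided by the
# square root of the product of the summed squared deviations
x = df["age"] - df["age"].mean()
y = df["fare"] - df["fare"].mean()
r_manual = (x * y).sum() / np.sqrt((x ** 2).sum() * (y ** 2).sum())

r_pandas = df.corr().loc["age", "fare"]
print(round(r_manual, 4), round(r_pandas, 4))
```

The two numbers should agree. Values near +1 or -1 indicate a strong linear relationship, and values near 0 indicate a weak one.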

2. Heatmap Using sns.heatmap()

plt.figure(figsize=(10, 6))
sns.heatmap(corr_matrix, annot=True, fmt=".2f", cmap="coolwarm", linewidths=0.5)
plt.title("Correlation Heatmap - Titanic Numeric Features")
plt.show()

3. Pairplot (Again, But for Multivariate)

sns.pairplot(df[['age', 'fare', 'pclass', 'survived']], hue='survived', palette='Set1')
plt.suptitle("Pairwise Plot of Age, Fare, Pclass, and Survival", y=1.02)
plt.show()
Q9. Time Series and Trend Analysis

• Plot time-based data using Pandas and Matplotlib.

• Perform trend analysis and plot rolling averages.

• Select a real dataset (e.g., COVID-19, IPL stats, sales data).

Answer:

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

# Load dataset (file name assumed: a local copy of the Titanic CSV)
df = pd.read_csv("titanic.csv")

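A. Plotting Time-Based Data

# The Titanic data has no date column, so simulate one: each passenger gets
# a random date within the 100 days up to April 15, 1912.
# (One date per row would give a count of 1 on every day and a flat line.)
dates = pd.date_range(end="1912-04-15", periods=100)
rng = np.random.default_rng(42)
df['Date'] = dates[rng.integers(0, len(dates), size=len(df))]

# Sort by date
df.sort_values('Date', inplace=True)

# Group by date and count passengers
daily_passengers = df.groupby('Date').size()

# Plot daily passenger entries
plt.figure(figsize=(12, 5))
daily_passengers.plot(kind='line', title='Simulated Passenger Entries Over Time')
plt.xlabel("Date")
plt.ylabel("Number of Passengers")
plt.grid(True)
plt.show()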

B. Rolling Averages (Trend Smoothing)

# 7-day rolling average
rolling_avg = daily_passengers.rolling(window=7).mean()

plt.figure(figsize=(12, 5))
plt.plot(daily_passengers, label='Daily Count')
plt.plot(rolling_avg, label='7-Day Rolling Average', color='red')
plt.title("Trend of Simulated Passenger Entries (with Smoothing)")
plt.xlabel("Date")
plt.ylabel("Passenger Count")
plt.legend()
plt.grid(True)
plt.show()
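A rolling average has no value until the window is full, which is why a smoothed line starts later than the raw line. The sketch below shows this on ten hypothetical daily counts and also shows min_periods, an optional parameter of rolling() that allows partial windows.

```python
import pandas as pd

# Hypothetical daily counts over ten days (illustration only)
dates = pd.date_range("1912-04-01", periods=10, freq="D")
daily = pd.Series([5, 8, 6, 10, 7, 9, 12, 11, 8, 10], index=dates)

# Strict window: NaN until 7 values are available
rolling_strict = daily.rolling(window=7).mean()

# Partial window: averages whatever is available so far
rolling_partial = daily.rolling(window=7, min_periods=1).mean()

print("Missing values at the start:", rolling_strict.isna().sum())
print("First full 7-day average:", round(rolling_strict.iloc[6], 2))
print("First partial average:", rolling_partial.iloc[0])
```

A larger window gives a smoother trend line but reacts more slowly to real changes, so the window size should match the cycle in the data (for example, 7 for weekly patterns).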
