AD3301 - Data Exploration and Visualization (DEV) Lab

The document outlines the vision and mission of R.V.S College of Engineering and its Department of Artificial Intelligence and Data Science, emphasizing the development of competent engineers through high-quality education and industry collaboration. It includes program-specific outcomes, educational objectives, and a detailed list of experiments for a course on Data Exploration and Visualization, along with course outcomes. Additionally, it provides installation instructions for Python and examples of exploratory data analysis using email datasets.


R.V.S COLLEGE OF ENGINEERING


DINDIGUL-624 005.

Register number __________________________

Certified that this is a bonafide record of work done by

Mr/Ms. _________________________________________________________________ in

_________________________ Semester ____________________________ Laboratory in

the Department of _______________________________________ during the academic

year 20 - 20

STAFF-IN-CHARGE HEAD OF THE DEPARTMENT

Submitted for the Anna University Practical Examination conducted on _________

Internal Examiner External Examiner


R.V.S. COLLEGE OF ENGINEERING

Vision

Our VISION is to produce highly competent engineers and quality technocrats through continual improvement of the standards of excellence in teaching, training and research for the economic and societal development of the nation.

Mission

RVSCOE Strives:

• To provide high quality education through a dynamic curriculum, state-of-the-art facilities, and well trained faculty to prepare students for successful engineering careers.
• To promote research and innovation by providing opportunities and resources to both
faculty and students to contribute to the advancement of engineering and technology.
• To forge strong partnerships with industries to enhance practical learning, internships
and employment opportunities for students.
• To instill strong ethical values and social responsibility in students to prepare them to
be conscientious engineers and leaders in society.
DEPARTMENT OF ARTIFICIAL INTELLIGENCE AND DATA SCIENCE

Vision

To establish a centre of excellence by nurturing Artificial Intelligence and Data Science engineers through continuous education, research and industrial collaboration to serve the needs of society.

Mission

To provide a nurturing environment together with competent human resources in order to develop a diverse pool of skilled Artificial Intelligence and Data Science engineers who uphold professional readiness for the industry.

1. By providing state-of-the-art facilities and cutting-edge technology, we create an optimal learning and research environment, enhancing the capabilities of our Artificial Intelligence and Data Science students and faculty.

2. By providing avenues for industry partnerships and research collaborations, we facilitate practical experience and innovation, enabling our students to address real-world challenges in Artificial Intelligence and Data Science.

3. By providing a strong foundation in professional ethics and commitment to social responsibility, we equip our students with the values needed to make a positive societal impact as industry-ready professionals in the field of Artificial Intelligence and Data Science.

PROGRAMME SPECIFIC OUTCOMES (PSO)

PSO 1: Apply the concepts and practical knowledge in analysis, design and
development of Artificial intelligence and Data science solutions to address real world
problems and meet the challenges of society.

PSO 2: Develop computational knowledge and project development skills using advanced tools and techniques for employment, higher studies and research in Artificial Intelligence and Data Science with ethical values.
PROGRAMME EDUCATIONAL OBJECTIVES (PEO)

• PEO1: Develop the next generation of highly skilled graduates equipped with a strong
knowledge in Artificial intelligence and Data Science for creating innovative solutions
to society’s pressing challenges.

• PEO2: To motivate students to pursue higher studies in various disciplines of Artificial Intelligence and Data Science with the aim of building careers in research and development, academia, industry and entrepreneurship.

• PEO3: Produce engineers who are professional entrepreneurs and capable of self-learning to excel in their careers.

• PEO4 : To prepare graduates who excel in diverse teams upholding professional ethics
and societal responsibilities.

Program Outcomes (PO)

PO-1: Engineering knowledge: Apply the knowledge of mathematics, science, engineering fundamentals, and an engineering specialization to the solution of complex engineering problems.

PO-2: Problem analysis: Identify, formulate, review research literature, and analyze
complex engineering problems reaching substantiated conclusions using first principles
of mathematics, natural sciences and engineering sciences.

PO-3: Design/development of solutions: Design solutions for complex engineering problems and design system components or processes that meet the specified needs with appropriate consideration for the public health and safety and the cultural, societal, and environmental considerations.

PO-4: Conduct investigations of complex problems: Use research-based knowledge and research methods including design of experiments, analysis and interpretation of data, and synthesis of the information to provide valid conclusions for complex problems.
PO-5: Modern tool usage: Create, select and apply appropriate techniques, resources
and modern engineering and IT tools including prediction and modelling to complex
engineering activities with an understanding of the limitations.

PO-6: The engineer and society: Apply reasoning informed by the contextual knowledge to assess societal, health, safety, legal and cultural issues and the consequent responsibilities relevant to the professional engineering practice.

PO-7: Environment and sustainability: Understand the impact of the professional engineering solutions in societal and environmental contexts and demonstrate the knowledge of, and need for, sustainable development.

PO-8: Ethics: Apply ethical principles and commit to professional ethics and
responsibilities and norms of the engineering practice.

PO-9: Individual & team work: Function effectively as an individual, and as a member or leader in diverse teams and in multidisciplinary settings.

PO-10: Communication: Communicate effectively on complex engineering activities with the engineering community and with society at large, such as being able to comprehend and write effective reports and design documentation, make effective presentations, and give and receive clear instructions.

PO-11: Project management and finance: Demonstrate knowledge and understanding of the engineering and management principles and apply these to one's own work, as a member and leader in a team, to manage projects and in multidisciplinary environments.

PO-12: Life-long Learning: Recognize the need for, and have the preparation and
ability to engage in independent and life-long learning in the broadest context of
technological change.
AD3301 - DATA EXPLORATION AND VISUALIZATION

LIST OF EXPERIMENTS
1. Install the data Analysis and Visualization tool: R/ Python /Tableau Public/ Power BI.

2. Perform exploratory data analysis (EDA) on datasets such as an email data set. Export all your
emails as a dataset, import them into a pandas data frame, visualize them and get different
insights from the data.

3. Working with Numpy arrays, Pandas data frames, basic plots using Matplotlib.

4. Explore various variable and row filters in R for cleaning data. Apply various plot features
in R on sample data sets and visualize.

5. Perform Time Series Analysis and apply the various visualization techniques.

6. Perform Data Analysis and representation on a Map using various Map data sets with mouse
rollover effect, user interaction, etc.

7. Build cartographic visualization for multiple datasets involving various countries of the
world, states and districts in India, etc.

8. Perform EDA on Wine Quality Data Set.

9. Use a case study on a data set and apply the various EDA and visualization techniques and
present an analysis report.


COURSE OUTCOMES:
At the end of this course, the students will be able to:
CO1: Understand the fundamentals of exploratory data analysis.
CO2: Implement the data visualization using Matplotlib.
CO3: Perform univariate data exploration and analysis.
CO4: Apply bivariate data exploration and analysis.
CO5: Use Data exploration and visualization techniques for multivariate and time series data
CONTENTS

S.No Date Title of the Experiment Page No. Mark sign



Exp No: 1
INSTALL THE DATA ANALYSIS AND VISUALIZATION TOOL: PYTHON
Date:

Aim:

To install the data analysis and visualization tool Python.

How to Install Python


Python is a popular high-level, general-purpose programming language that enables rapid development as well as effective system integration. Python has two main versions, Python 2 and Python 3, which differ significantly.

New versions of Python with periodic changes are released and identified by version numbers. At the time of writing, Python is at version 3.11.3.

Installation on Windows

Visit https://www.python.org to download the latest release of Python. In this process, we will install Python 3.11.3 on our Windows operating system. Clicking the above link opens the following page.

Step - 1: Select the Python version to download.

Click on the download button to download the Python executable (exe) file.

If you want to download a specific version of Python, scroll down further to see the different releases of Python 2 and Python 3, and click the download button next to the version number you want.

Step - 2: Click on Install Now

Double-click the downloaded executable file. The following window will open. Tick the "Add Python to PATH" check box so that the Python path is set automatically. We can also select Customize installation to choose the desired installation location and features; the "Install launcher for all users" option must be checked.

Under the advanced options, tick the "Install Python 3.11 for all users" checkbox, which is not checked by default. This automatically checks the "Precompile standard library" option, and the installation location changes accordingly. Since it can be changed later, we leave the install location at its default. Then click the Install button to start the installation.

Step - 3: Installation in Progress

The setup is in progress. All the Python libraries, packages, and other default files will be installed on our system. Once the installation is successful, a page saying "Setup was successful" will appear.

Step - 4: Verifying the Python Installation

To verify whether Python is installed on our system:
o Go to the "Start" button and search for "cmd".
o In the command prompt, type "python --version".
o If Python is installed successfully, the installed version is displayed.
o If not, the error "'python' is not recognized as an internal or external command, operable program or batch file." is printed.
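
As a quick cross-check, the installed interpreter can also report its own version from inside Python; a minimal sketch (the exact string depends on the release that was installed, 3.11.3 in this manual):

import sys
print(sys.version)        # e.g. "3.11.3 (main, ...)"
print(sys.version_info)   # structured version information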

We are now ready to work with Python.

Step - 5: Opening IDLE

Now, to work on our first Python program, we open the interactive interpreter prompt (IDLE). To open it, go to "Start", type IDLE, and click Open to start working in IDLE.

Result:

Thus the installation of Python was successfully completed.


Exp No: 2
EXPLORATORY DATA ANALYSIS – EMAIL DATASET
Date:

AIM:
To perform exploratory data analysis (EDA) on an email data set: export all emails as a dataset, import them into a pandas data frame, visualize them and get different insights from the data.
ALGORITHM:
STEP 1: Export your emails as a dataset: Export emails from your email client as a CSV file; alternatively, if using Gmail, export your data using Google Takeout and download it in MBOX format.

STEP 2: Load the email data into a pandas data frame: Use Python's mailbox module (or the mail-parser library) to convert it into a pandas dataframe.

STEP 3: Data cleaning and preprocessing: Clean the dataset by handling missing values, renaming columns and extracting useful information.

STEP 4: Perform initial data exploration: Explore unique values by checking for unique senders, subjects and labels, and count the number of emails from each sender.

STEP 5: Data visualization: Plot the number of emails received over time to identify trends, plot the top senders based on the number of emails, and analyse the distribution of email reception times throughout the day.

PROGRAM:
import mailbox
import pandas as pd

# Load the mbox file
mbox = mailbox.mbox('path_to_your_mbox_file.mbox')

# Convert to DataFrame
data = {
    "from": [],
    "subject": [],
    "date": [],
    "body": []
}

for msg in mbox:
    data["from"].append(msg["from"])
    data["subject"].append(msg["subject"])
    data["date"].append(msg["date"])
    if msg.is_multipart():
        # Concatenate all text/plain parts of a multipart message
        body = ''
        for part in msg.get_payload():
            if part.get_content_type() == 'text/plain':
                body += part.get_payload()
        data["body"].append(body)
    else:
        data["body"].append(msg.get_payload())

df = pd.DataFrame(data)
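
If the emails were exported as a CSV file instead (STEP 1), loading is simpler. A minimal sketch, assuming a hypothetical export file emails_export.csv with 'from', 'subject' and 'date' columns:

import pandas as pd
df = pd.read_csv('emails_export.csv')                      # hypothetical CSV export
df['date'] = pd.to_datetime(df['date'], errors='coerce')   # unparseable dates become NaT
print(df[['from', 'subject', 'date']].head())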
EDA on the DataFrame:
Basic Statistics:
print(df.info())
print(df.describe())
print(df.isnull().sum())
Time Series Analysis:
df['date'] = pd.to_datetime(df['date'])
df.set_index('date', inplace=True)
df['subject'].resample('M').count().plot(title="Monthly Email Count")
Sender analysis:
df['from'].value_counts().head(10).plot(kind='bar', title="Top 10 Email Senders")
Word Cloud of Subjects or Bodies:
from wordcloud import WordCloud
import matplotlib.pyplot as plt

wordcloud = WordCloud(width=800, height=400).generate(" ".join(df['subject'].dropna()))


plt.figure(figsize=(10, 5))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis('off')
plt.show()
Sentiment Analysis:
from textblob import TextBlob

df['sentiment'] = df['body'].apply(lambda x: TextBlob(x).sentiment.polarity)


df['sentiment'].hist(bins=20)

OUTPUT:

RESULT:
Thus exploratory data analysis (EDA) on the email dataset was performed and the data was visualized to obtain different insights.
Exp No: 3
NUMPY ARRAYS, PANDAS DATA FRAMES, BASIC PLOTS USING MATPLOTLIB
Date:

AIM:
To understand and implement fundamental data manipulation and visualization techniques using the Python libraries NumPy, pandas and Matplotlib.

ALGORITHM:
NumPy
STEP 1: Create arrays using NumPy.
STEP 2: Access elements in the array.
STEP 3: Retrieve elements using the slice operation.
STEP 4: Compute calculations on the array.
Pandas
STEP 1: Create a dataframe from a dictionary.
STEP 2: Display the first and last few rows of the dataframe.
STEP 3: Select a specific column and filter rows based on a condition.
STEP 4: Calculate descriptive statistics for numeric columns.
Matplotlib:
STEP 1: Import the matplotlib library.
STEP 2: Define the x and y data.
STEP 3: Label the axes.
STEP 4: Visualize the data using line plots, bar charts, scatter plots, etc.

PROGRAM:
NUMPY:
1)
import numpy as np
np.random.seed(0) # seed for reproducibility
x1 = np.random.randint(10, size=6) # One-dimensional array
x2 = np.random.randint(10, size=(2, 3)) # Two-dimensional array
x3 = np.random.randint(10, size=(3, 4, 5)) # Three-dimensional array
print("X1,", x1)
print("X2,", x2)
print("X3,", x3)
#1D
print("x1 ndim: ", x1.ndim)
print("x1 shape:", x1.shape)
print("x1 size: ", x1.size)
#2D
print("x2 ndim: ", x2.ndim)
print("x2 shape:", x2.shape)
print("x2 size: ", x2.size)
#3D
print("x3 ndim: ", x3.ndim)
print("x3 shape:", x3.shape)
print("x3 size: ", x3.size)
2) To get input from a user and store it as a NumPy array, you can use Python's input() function.
import numpy as np
# Define how many numbers the user should enter
n = int(input("How many numbers do you want to enter? "))
# Collect the numbers in a list
numbers = []
for i in range(n):
    num = float(input(f"Enter number {i+1}: "))  # Use int() for integers
    numbers.append(num)
# Convert the list to a NumPy array
arr = np.array(numbers)
print(arr)

1. Creating a NumPy Array

import numpy as np

# Creating a 1D array

arr = np.array([1, 2, 3, 4, 5])

print(arr)
# Creating a 2D array

arr2d = np.array([[1, 2, 3], [4, 5, 6]])

print(arr2d)

2. Merging and Concatenating Arrays


#Concatenation along an axis

# Creating arrays

arr1 = np.array([[1, 2], [3, 4]])

arr2 = np.array([[5, 6], [7, 8]])

# Concatenating along axis 0 (vertically)

merged_vertically = np.concatenate((arr1, arr2), axis=0)

print(merged_vertically)

# Concatenating along axis 1 (horizontally)

merged_horizontally = np.concatenate((arr1, arr2), axis=1)

print(merged_horizontally)

#Concatenating 1D arrays

arr1 = np.array([1, 2, 3])

arr2 = np.array([4, 5, 6])

concatenated_arr = np.concatenate((arr1, arr2))

print(concatenated_arr)

3. NumPy dtype (Data Type)


#Checking the data type of an array

arr = np.array([1, 2, 3])

print(arr.dtype)

arr_float = np.array([1.5, 2.5, 3.5])

print(arr_float.dtype)

#Specifying dtype while creating an array


arr_int = np.array([1, 2, 3], dtype='int32')

print(arr_int.dtype)

arr_float = np.array([1, 2, 3], dtype='float64')

print(arr_float.dtype)

#Converting the data type of an array

arr = np.array([1.5, 2.3, 3.7])

arr_int = arr.astype('int') # Convert float to int

print(arr_int)

arr_bool = arr.astype('bool') # Convert float to bool (non-zero values become True)

print(arr_bool)

#Array Slicing: Accessing Subarrays
x = np.arange(10)
print(x)
print(x[:5])     # first five elements
print(x[5:])     # elements after index 5
print(x[4:7])    # middle subarray
print(x[::2])    # every other element
print(x[1::2])   # every other element, starting at index 1
print(x[::-1])   # all elements, reversed
print(x[5::-2])  # reversed every other from index 5
#Reshaping of Arrays
grid = np.arange(1, 10).reshape((3, 3))
print(grid)

Aggregations: Min, Max, and Everything in Between


import numpy as np
L = np.random.random(100)
x= sum(L)
y= min(L)
z= max(L)
print(x, y, z)
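
NumPy also provides its own vectorized aggregation routines, which are considerably faster than the Python built-ins on large arrays. A small sketch using the same array L as above (not part of the original listing):

print(np.sum(L), np.min(L), np.max(L))   # NumPy function form
print(L.sum(), L.min(), L.max())         # equivalent array-method form
print(L.mean(), L.std())                 # other common aggregates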

Pandas
1)
import pandas as pd
import numpy as np
ser = pd.Series()
print("Pandas Series: ", ser)
# simple array
data = np.array(['r', 'a', 'g', 'z'])
ser = pd.Series(data)
print("Pandas Series:\n", ser)
#creating dataframe
import pandas as pd
df = pd.DataFrame()
print(df)
# list of strings
lst = ['apple', 'orange', 'kiwi', 'grapes']
df = pd.DataFrame(lst)
print(df)
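
The Pandas steps of the algorithm also call for viewing rows, selecting columns, filtering and descriptive statistics; here is a minimal sketch of those operations on a small hypothetical dataframe:

import pandas as pd
df = pd.DataFrame({'name': ['A', 'B', 'C', 'D'],
                   'marks': [78, 92, 65, 88]})
print(df.head(2))            # first few rows
print(df.tail(2))            # last few rows
print(df['marks'])           # select a specific column
print(df[df['marks'] > 80])  # filter rows based on a condition
print(df.describe())         # descriptive statistics for numeric columns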
Matplotlib
#Line plot
import matplotlib.pyplot as plt
x=[1,2,3,4]
y=[3,5,2,7]
plt.plot(x,y,marker='o',linestyle='-',color='blue')
plt.title('sample line chart')
plt.xlabel('x axis')
plt.ylabel('y axis')
plt.show()

#Barchart
import matplotlib.pyplot as plt
categories=['A','B','C']
values=[10,20,30]
plt.bar(categories,values,color='red')
plt.title('sample bar chart')
plt.xlabel(' categories')
plt.ylabel('values')
plt.show()

#Scatterplot
import matplotlib.pyplot as plt
import numpy as np
# Generate random data
x = np.random.rand(50) # 50 random points for x-axis
y = np.random.rand(50) # 50 random points for y-axis
# Create a scatter plot
plt.scatter(x, y, color='blue', marker='o', alpha=0.7)
plt.title("Simple Scatter Plot")
plt.xlabel("X-axis")
plt.ylabel("Y-axis")
# Show the plot
plt.show()

#Histogram
import matplotlib.pyplot as plt
import numpy as np
# Generate random data
data = np.random.randn(1000) # 1000 random values from a normal distribution
# Create a histogram
plt.hist(data, bins=30, color='skyblue', edgecolor='black', alpha=0.7)
plt.title("Histogram of Random Data")
plt.xlabel("Value")
plt.ylabel("Frequency")
# Show the plot
plt.show()

OUTPUT:
#Numpy
1)
X1, [5 0 3 3 7 9]
X2, [[3 5 2]
[4 7 6]]
X3, [[[8 8 1 6 7]
[7 8 1 5 9]
[8 9 4 3 0]
[3 5 0 2 3]]

[[8 1 3 3 3]
[7 0 1 9 9]
[0 4 7 3 2]
[7 2 0 0 4]]
[[5 5 6 8 4]
[1 4 9 8 1]
[1 7 9 9 3]
[6 7 2 0 3]]]

#1D
x1 ndim: 1
x1 shape: (6,)
x1 size: 6
#2D
x2 ndim: 2
x2 shape: (2, 3)
x2 size: 6

#3D
x3 ndim: 3
x3 shape: (3, 4, 5)
x3 size: 60
2) How many numbers do you want to enter?
#Creating a NumPy Array
[1 2 3 4 5]
[[1 2 3]
[4 5 6]]
#Concatenation along an axis
[[1 2]
[3 4]
[5 6]
[7 8]]
[[1 2 5 6]
[3 4 7 8]]

#Concatenating 1D arrays
[1 2 3 4 5 6]

# Checking the data type of an array


int64
float64
# Specifying dtype while creating an array
int32
float64
# Converting the data type of an array
[1 2 3]
[ True True True]
# Array Slicing: Accessing Subarrays
[0 1 2 3 4 5 6 7 8 9]
[0 1 2 3 4]
[5 6 7 8 9]
[4 5 6]
[0 2 4 6 8]
[1 3 5 7 9]
[9 8 7 6 5 4 3 2 1 0]
[5 3 1]
# Reshaping of Arrays
[[1 2 3]
[4 5 6]
[7 8 9]]

Aggregations: Min, Max, and Everything in Between

54.02436999434842 0.0009524633405987304 0.9972463379054916

#Pandas

Pandas Series: Series([], dtype: object)


Pandas Series:
0 r
1 a
2 g
3 z
dtype: object
Empty DataFrame
Columns: []
Index: []
0
0 apple
1 orange
2 kiwi
3 grapes

#Matplotlib

RESULT:
Thus the Python programs using NumPy, Pandas and Matplotlib were executed successfully.
Exp No: 4
EXPLORING VARIOUS VARIABLE AND ROW FILTERS IN R FOR CLEANING DATA
Date:

AIM:
To explore various variable and row filters in R for cleaning data and apply various plot features
in R on sample datasets and visualize.
ALGORITHM:
STEP 1:Load the data: Load iris dataset and ensure it is in the right format.
STEP 2: Data exploration and variable conversion: Verify the type of each variable and convert Species to a factor if necessary.
STEP 3: Filter data by rows: Apply row filters to select rows that meet specific criteria and combine
multiple conditions for refined filtering.
STEP 4: Filter data by columns: Use select() to isolate specific variables for analysis and remove
unnecessary columns to simplify the dataset.
STEP 5: Data cleaning and transformation: Handle missing values by replacing or removing NA (not available) values.
STEP 6: Data visualization: Use ggplot2 to create various plots and explore relationships between variables in the iris dataset.
PROGRAM:
# Import necessary libraries
library(dplyr)
library(ggplot2)
library(plotly)
data(iris)
ggplot(data = iris) + geom_point(aes(x = Sepal.Length, y = Sepal.Width, color = Species))
#Selecting Variables
selected_vars <- select(iris, Sepal.Length, Sepal.Width)
selected_vars
ggplot(data = selected_vars) + geom_point(aes(x = Sepal.Length, y = Sepal.Width))
#dropping Variables
dropped_vars <- select(iris, -Species)
dropped_vars
ggplot(data = dropped_vars) + geom_point(aes(x = Sepal.Length, y = Sepal.Width))
# Renaming Variables
renamed_vars <- rename(iris, Length = Sepal.Length, Width = Sepal.Width)
renamed_vars
ggplot(data = renamed_vars) + geom_point(aes(x = Length, y = Width))
# Filtering Rows based on Conditions
filtered_data <- filter(iris, Petal.Width > 0.2)
filtered_data
ggplot(data = filtered_data) + geom_point(aes(x = Sepal.Length, y = Sepal.Width))
# Filtering Rows based on Multiple Conditions
filtered_data <- filter(iris, Petal.Width > 0.2 & Sepal.Length > 5)
filtered_data
ggplot(data = filtered_data) + geom_point(aes(x = Sepal.Length, y = Sepal.Width))
# Sorting Rows based on Variables
sorted_data <- arrange(iris, Sepal.Length)
sorted_data
ggplot(data = sorted_data) + geom_point(aes(x = Sepal.Length, y = Sepal.Width))
# Plot Features
# Base R - Scatter Plot
plot(iris$Sepal.Length, iris$Sepal.Width)
# Base R - Bar Plot
barplot(iris$Sepal.Width, names.arg = iris$Species)
# Base R - Histogram
hist(iris$Sepal.Length)
# ggplot2 - Scatter Plot
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width)) + geom_point()
# ggplot2 - Bar Plot (mean Sepal Width by Species)
ggplot(iris, aes(x = factor(Species), y = Sepal.Width)) +
  geom_bar(stat = "summary", fun = "mean") +
  labs(title = "Mean Sepal Width by Species", x = "Species", y = "Mean Sepal Width") +
  theme_minimal()
# ggplot2 - Histogram
ggplot(iris, aes(x = Sepal.Length)) + geom_histogram()
OUTPUT:
RESULT:
Thus the program for cleaning and visualizing the data using R was executed.
Exp No: 5
TIME SERIES ANALYSIS
Date:

AIM:
To perform Time Series Analysis using temperature dataset and apply the various
visualization techniques.

ALGORITHM:
STEP 1: Load the Data: Load a time series dataset and ensure it is in the right format (e.g., datetime
index).
STEP 2: Visualize Raw Data: Use line plots to understand the overall trend and patterns.
STEP 3: Decomposition: Break down the series into trend, seasonal, and residual components.
STEP 4: Seasonality Analysis: Identify recurring seasonal patterns.
STEP 5: Autocorrelation and Partial Autocorrelation: Visualize lag relationships to detect
seasonality and trend.
STEP 6: Rolling Statistics: Smooth the series to better understand trends and identify anomalies.
PROGRAM:
# Import necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from statsmodels.tsa.seasonal import seasonal_decompose
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf

# Step 1: Load and Inspect Data


# Load your dataset (e.g., monthly temperature data)
# Replace 'your_data.csv' with your dataset path
data = pd.read_csv('Temperature.csv', index_col='Date', parse_dates=True)
data = data.asfreq('M') # Monthly frequency (adjust as needed)

# Step 2: Basic Line Plot


plt.figure(figsize=(14,6))
plt.plot(data, label='Original Data')
plt.title("Time Series Data")
plt.xlabel("Date")
plt.ylabel("Values")
plt.legend()
plt.show()
# Handle missing values - Option 1: Forward-fill
data = data.fillna(method='ffill')

# Option 2: Interpolate the missing values


data = data.interpolate(method='linear')

# Option 3: Drop missing values (if minimal)


data = data.dropna()

# Step 3: Decompose the Time Series


# This gives us a look at the trend, seasonality, and residuals
decomposition = seasonal_decompose(data, model='additive')
fig, (ax1, ax2, ax3, ax4) = plt.subplots(4, 1, figsize=(12, 10))
decomposition.observed.plot(ax=ax1, legend=False, color='blue')
ax1.set_ylabel('Observed')
decomposition.trend.plot(ax=ax2, legend=False, color='orange')
ax2.set_ylabel('Trend')
decomposition.seasonal.plot(ax=ax3, legend=False, color='green')
ax3.set_ylabel('Seasonal')
decomposition.resid.plot(ax=ax4, legend=False, color='red')
ax4.set_ylabel('Residuals')
plt.tight_layout()
plt.show()

# Step 4: Autocorrelation and Partial Autocorrelation


# These plots reveal the relationship between the current value and past values (lags)
plt.figure(figsize=(14,6))
plot_acf(data.dropna(), lags=24)
plt.title("Autocorrelation")
plt.show()
plt.figure(figsize=(14,6))
plot_pacf(data.dropna(), lags=24)
plt.title("Partial Autocorrelation")
plt.show()

# Step 5: Rolling Statistics (Smoothing the Series)


# Applying rolling mean for trend visualization
data['Rolling Mean'] = data['Temp'].rolling(window=12).mean() # 12-month rolling mean
data['Rolling Std'] = data['Temp'].rolling(window=12).std() # 12-month rolling std
plt.figure(figsize=(14,6))
plt.plot(data['Temp'], label='Original Data')
plt.plot(data['Rolling Mean'], label='12-Month Rolling Mean', color='orange')
plt.plot(data['Rolling Std'], label='12-Month Rolling Std', color='red')
plt.title("Rolling Mean & Standard Deviation")
plt.xlabel("Date")
plt.ylabel("Values")
plt.legend()
plt.show()
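
STEP 4 (seasonality analysis) can also be inspected directly with a month-wise box plot; a small sketch, assuming the same 'Temp' column and imports as above:

data['Month'] = data.index.month
plt.figure(figsize=(12, 5))
sns.boxplot(x='Month', y='Temp', data=data)  # distribution of temperatures for each calendar month
plt.title("Month-wise Distribution of Temperature")
plt.show()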
OUTPUT:
RESULT:
Thus Time Series Analysis was performed on the temperature dataset and the various visualization techniques were applied. These visualizations illuminate the fundamental characteristics of the time series, which is essential for building accurate models and making informed decisions.
Exp No: 6
PERFORM DATA ANALYSIS AND VISUALIZATION ON A MAP WITH PYTHON, INCLUDING USER INTERACTIONS LIKE MOUSE ROLLOVER EFFECTS
Date:
AIM:

To perform data analysis and visualization on a map using Python. The project includes
creating an interactive map representation that responds to user interactions, such as mouse rollover,
displaying information, and customizing visual styles.

ALGORITHM:

STEP 1: Load and Inspect Dataset: Load a map-based dataset (e.g., locations with geographical data
like latitude and longitude).

STEP 2: Data Cleaning and Preprocessing: Handle missing values, format data types, and filter
relevant data.

STEP 3: Data Analysis: Analyze data to get insights for visualization (e.g., counts by region,
clustering, or heat maps).

STEP 4: Create Base Map: Initialize a base map centered on a relevant region or with global
coordinates.

STEP 5: Map Visualization with Interactivity:

• Add points or regions with mouse rollover effects.
• Include information tooltips on hover.
• Customize map markers, regions, or layers.

STEP 6: Display and Save Map: Display the interactive map in a Jupyter notebook or save it as an
HTML file.

PROCEDURE:

pip install folium plotly pandas geopandas

import pandas as pd

import folium

from folium import plugins

import geopandas as gpd

from plotly import express as px


# Step 1: Load the Dataset

# Here, we use a sample dataset with geographic locations. Replace 'your_data.csv' with the actual file
path.

data = pd.read_csv('India Cities LatLng.csv')  # Ensure this file has columns like 'lat', 'lng' and 'country'

# Step 2: Basic Data Preprocessing

# Check if there are any missing values and handle them

data.dropna(subset=['lat', 'lng'], inplace=True)

# Step 3: Perform Data Analysis (e.g., count per location, grouping, etc.)

location_counts = data['country'].value_counts().reset_index()

location_counts.columns = ['Location', 'Count']

# Step 4: Initialize a Base Map (Centered on Mean Lat and Long)

mean_lat, mean_lon = data['lat'].mean(), data['lng'].mean()

m = folium.Map(location=[mean_lat, mean_lon], zoom_start=5, tiles="CartoDB positron")

# Step 5: Add Data Points with Mouse Rollover Effects

# Add points with tooltips for each location

for _, row in data.iterrows():
    folium.CircleMarker(
        location=(row['lat'], row['lng']),
        radius=6,
        color='blue',
        fill=True,
        fill_color='blue',
        tooltip=f"<b>{row['country']}</b><br>Additional Info: {row.get('additional_info', 'N/A')}"
    ).add_to(m)

# Step 6: Add a Heatmap for Visual Representation (optional, for density-based insights)

heat_data = data[['lat', 'lng']].values.tolist()

plugins.HeatMap(heat_data).add_to(m)

# Optional: Add Clustering for Large Datasets

marker_cluster = plugins.MarkerCluster().add_to(m)

for _, row in data.iterrows():
    folium.Marker(
        location=(row['lat'], row['lng']),
        popup=f"<b>{row['country']}</b><br>Count: {location_counts[location_counts['Location'] == row['country']]['Count'].values[0]}"
    ).add_to(marker_cluster)

# Step 7: Save Map as HTML or Display

m.save("interactive_map.html") # View the map in your browser

m
OUTPUT:

RESULT:

Thus data analysis and visualization on a map with mouse rollover effects and user interaction was performed. The interactive map can be viewed in a web browser as an HTML file, allowing for easy sharing and use in presentations.
Exp No: 7
BUILD CARTOGRAPHIC VISUALIZATION FOR MULTIPLE DATASETS INVOLVING VARIOUS COUNTRIES OF THE WORLD, STATES AND DISTRICTS IN INDIA, ETC.
Date:

AIM:

To create cartographic visualizations using multiple datasets that involve geographic boundaries at various levels, such as countries worldwide, states in India, and districts in India. The goal is to build interactive maps to visualize and compare data distributions across different regions.

ALGORITHM:

STEP 1: Import Libraries: Import geopandas and matplotlib.pyplot for handling geospatial data and
plotting maps.

STEP 2: Load Shapefiles:

• Load the world shapefile to represent global boundaries.
• Load the India states shapefile to show individual state boundaries within India.
• Load the India districts shapefile to show district boundaries within India.

STEP 3: Initialize Plot: Create a side-by-side plot layout with three subplots.

STEP 4: Plot Maps:

• Plot the world map in the first subplot, coloring it in light grey.
• Plot the Indian states map in the second subplot with states highlighted in light blue and black edges.
• Plot the Indian districts map in the third subplot with districts highlighted in light green and black edges.

STEP 5: Display Plot: Show the combined map visualization.

PROGRAM:

import geopandas as gpd

import matplotlib.pyplot as plt

# Load world countries shapefile

world = gpd.read_file('/content/ne_110m_admin_0_countries.shp')

# Load India states shapefile

india_states = gpd.read_file('/content/india_India_Country_Boundary.shp')
# Load India districts shapefile

india_districts = gpd.read_file('/content/india_District_level_2.shp') # Replace with your actual path

# Create a plot with multiple layers

fig, ax = plt.subplots(1, 3, figsize=(20, 10))

# Plot world map with country boundaries

world.plot(ax=ax[0], color='lightgrey', edgecolor='black')

ax[0].set_title('World Map')

# Plot India map with state boundaries

india_states.plot(ax=ax[1], color='lightblue', edgecolor='black')

ax[1].set_title('India States Map')

# Plot India map with district boundaries

india_districts.plot(ax=ax[2], color='lightgreen', edgecolor='black')

ax[2].set_title('India Districts Map')

plt.tight_layout()

plt.show()
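
The maps above show only the boundaries. To compare a data distribution across regions, as the aim describes, a column of the shapefile's attribute table can drive a choropleth. A minimal sketch, assuming the Natural Earth world shapefile carries its usual 'POP_EST' (population estimate) attribute:

fig, ax = plt.subplots(figsize=(12, 6))
world.plot(column='POP_EST', cmap='OrRd', legend=True,
           edgecolor='black', linewidth=0.3, ax=ax)  # shade countries by population estimate
ax.set_title('World Population Estimate (choropleth)')
plt.show()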
OUTPUT:

RESULT:

Thus cartographic visualizations were built for multiple datasets: a world map with country boundaries, an India map showcasing individual states so that viewers can see state boundaries clearly, and an India map showing districts, with clear distinctions among world countries, Indian states, and districts.
Exp No: 8
Perform Exploratory Data Analysis (EDA) on Wine Quality Data Set
Date:

AIM

To perform Exploratory Data Analysis (EDA) on the Wine Quality dataset to understand data
distributions, correlations, and patterns that may affect wine quality.

ALGORITHM

STEP 1: Load the Dataset: Import and load the Wine Quality dataset.

STEP 2: Data Overview: Inspect dataset structure, types, and basic statistics.

STEP 3: Data Cleaning: Check for and handle any missing or duplicate values.

STEP 4: Data Exploration:

• Analyze distributions of numerical features.
• Examine correlations between features and the target variable (quality).

STEP 5: Visualization:

• Plot histograms to visualize feature distributions.
• Use box plots to observe outliers in the data.
• Create a heatmap to visualize feature correlations.

PROGRAM:

import pandas as pd

import matplotlib.pyplot as plt

import seaborn as sns

# Step 1: Load the dataset (assuming the file is named 'winequality.csv')

# You can download it from the UCI Machine Learning Repository:
# https://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/

data = pd.read_csv('/content/winequality/winequality-red.csv', delimiter=";")

# Step 2: Overview of the Data

print("Dataset Overview:")
print(data.info())

print("\nDescriptive Statistics:")

print(data.describe())

# Step 3: Data Cleaning (check for missing or duplicate values)

print("\nMissing Values per Column:")

print(data.isnull().sum())

print("\nDuplicate Rows:", data.duplicated().sum())

data.drop_duplicates(inplace=True)

# Step 4: Data Exploration and Visualization

# Distribution of Quality

plt.figure(figsize=(8, 5))

sns.countplot(x='quality', data=data, palette='viridis')

plt.title("Wine Quality Distribution")

plt.show()

# Histogram of Features

data.hist(bins=15, figsize=(15, 10), layout=(4, 3), edgecolor='black')

plt.suptitle("Feature Distributions")

plt.show()

# Correlation Matrix

plt.figure(figsize=(12, 8))

sns.heatmap(data.corr(), annot=True, cmap='coolwarm', fmt='.2f', linewidths=0.5)


plt.title("Feature Correlation Heatmap")

plt.show()

# Boxplot to Detect Outliers in Key Features

plt.figure(figsize=(15, 10))

sns.boxplot(data=data, palette="Set2")

plt.xticks(rotation=45)

plt.title("Boxplots of Features to Detect Outliers")

plt.show()

# Step 5: Relationship between Quality and Key Features

plt.figure(figsize=(12, 6))

sns.barplot(x='quality', y='alcohol', data=data, palette="magma")

plt.title("Alcohol vs. Wine Quality")

plt.show()
OUTPUT:
RESULT:

Thus exploratory data analysis on the Wine Quality dataset was performed. This EDA helps identify important features and relationships, providing insights for further analysis or model building.
Exp No: 9
USE A CASE STUDY ON A DATA SET AND APPLY THE VARIOUS EDA AND VISUALIZATION TECHNIQUES AND PRESENT AN ANALYSIS REPORT
Date:

AIM:

To use a case study on a data set (the Iris dataset) and apply the various EDA and visualization techniques and present an analysis report.

PROCEDURE:

Import Libraries:

Start by importing the necessary libraries and loading the dataset.

Descriptive Statistics:

Compute and display descriptive statistics.

Check for Missing Values:

Verify if there are any missing values in the dataset.

Visualize Data Distributions:

Visualize the distribution of numerical variables.

Correlation Heatmap:

Examine the correlation between numerical variables.

Boxplots for Categorical Variables:

Use boxplots to visualize the distribution of features by species.

Violin Plots:

Combine box plots with kernel density estimation for better visualization.

Correlation between Features:

Visualize pair-wise feature correlations.

Conclusion and Summary:

Summarize key findings and insights from the analysis.


This case study provides a comprehensive analysis of the Iris dataset, including data exploration,
descriptive statistics, visualization of data distributions, correlation analysis, and feature-specific
visualizations.
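
Since no listing is given for this experiment, the following is a minimal sketch of the procedure above in Python, assuming seaborn's built-in copy of the Iris dataset:

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Import libraries and load the dataset
iris = sns.load_dataset('iris')

# Descriptive statistics and missing-value check
print(iris.describe())
print(iris.isnull().sum())

# Visualize data distributions of numerical variables
iris.hist(bins=15, figsize=(10, 6), edgecolor='black')
plt.suptitle("Feature Distributions")
plt.show()

# Correlation heatmap of numerical variables
plt.figure(figsize=(6, 5))
sns.heatmap(iris.drop(columns='species').corr(), annot=True, cmap='coolwarm')
plt.title("Feature Correlation Heatmap")
plt.show()

# Boxplot and violin plot of a feature by species
sns.boxplot(x='species', y='petal_length', data=iris)
plt.show()
sns.violinplot(x='species', y='petal_length', data=iris)
plt.show()

# Pair-wise feature correlations
sns.pairplot(iris, hue='species')
plt.show()

The analysis report would then summarize the key findings, for example which features separate the species most clearly and which features are strongly correlated.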
