
Ex:1

Date:
Install DAV in Python

AIM:
The Aim is to install the data analysis and visualization tools in Python.

PROGRAM:
Installation
The easiest way to install pandas is to use pip:
pip install pandas
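
To confirm that the installation worked, the installed version can be printed (a minimal check; the exact version number will vary):

# quick sanity check after installation
import pandas as pd
print(pd.__version__)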

Creating A DataFrame in Pandas


# importing pandas (needed for the examples below)
import pandas as pd

# assigning two series to s1 and s2
s1 = pd.Series([1, 2])
s2 = pd.Series(["Ashish", "Sid"])
# framing the series objects into a data frame
df = pd.DataFrame([s1, s2])
# show the data frame
df

# data framing in another way


# taking index and column values
dframe = pd.DataFrame([[1, 2], ["Ashish", "Sid"]],
                      index=["r1", "r2"],
                      columns=["c1", "c2"])
dframe

# framing in another way


# dict-like container
dframe = pd.DataFrame({
    "c1": [1, "Ashish"],
    "c2": [2, "Sid"]})
dframe

OUTPUT:

RESULT:
Thus, the program to install the data analysis and visualization packages is given above.
Ex:2 Performing EDA on datasets
Date:
AIM:
The Aim is to perform exploratory data analysis (EDA) on datasets such as an email data set: export all your emails as a dataset, import them into a pandas data frame, visualize them, and draw different insights from the data.

PROGRAM:
import pandas as pd
# Reading the CSV file
df = pd.read_csv("Iris.csv")
# Printing top 5 rows
df.head()

Now, let's also look at the columns and their data types. For this,
df.info()
Let's see whether our dataset contains any duplicates. The pandas drop_duplicates() method helps in removing duplicates from the data frame.
data = df.drop_duplicates(subset="Species")
data
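
Before (or instead of) dropping rows, it can also help to count exact duplicates and missing values; a small optional sketch using the same df (not part of the original listing):

# count fully duplicated rows, missing values per column, and rows per species
print(df.duplicated().sum())
print(df.isnull().sum())
print(df['Species'].value_counts())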

Visualizing the target column

Our target column will be the Species column because at the end we will need
the result according to the species only. Let’s see a countplot for species.
# importing packages
import seaborn as sns
import matplotlib.pyplot as plt
sns.countplot(x='Species', data=df)
plt.show()

OUTPUT:
RESULT:
Thus, the program to perform EDA on datasets is given above.
EX:3 Working with NumPy arrays, pandas data frames and basic plots of Matplotlib
Date:

AIM:
The Aim is to work with NumPy arrays, pandas data frames, and basic plots using Matplotlib.

PROGRAM:

#Working with numpy arrays


import numpy as np

li = [1, 2, 3, 4]
numpyArr = np.array(li)
print(numpyArr)

#pandas series (a small data frame sketch follows below)
import pandas as pd
import numpy as np
info = np.array(['P','a','n','d','a','s'])
a = pd.Series(info)
print(a)
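
The snippet above builds a pandas Series; for the data frame part of this exercise, a small illustrative sketch (the column names here are chosen purely for illustration) could be:

# building a small data frame from a dict of columns
df = pd.DataFrame({"name": ["Ashish", "Sid"], "marks": [85, 90]})
print(df)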

#basic plots of matplotlib


# importing the required module
import matplotlib.pyplot as plt

# x axis values
x = [1,2,3]
# corresponding y axis values
y = [2,4,1]

# plotting the points


plt.plot(x, y)
# naming the x axis
plt.xlabel('x - axis')
# naming the y axis
plt.ylabel('y - axis')

# giving a title to my graph


plt.title('My first graph!')

# function to show the plot


plt.show()
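
As a further basic Matplotlib plot, the same values can be drawn as a bar chart; a minimal sketch (reusing the x and y values defined above):

# bar chart of the same x and y values
import matplotlib.pyplot as plt

x = [1, 2, 3]
y = [2, 4, 1]
plt.bar(x, y)
plt.xlabel('x - axis')
plt.ylabel('y - axis')
plt.title('My first bar chart!')
plt.show()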

OUTPUT:
1
[1 2 3 4]

2
0 P
1 a
2 n
3 d
4 a
5 s
dtype: object

3
RESULT:
Thus, the program for working with NumPy, pandas and Matplotlib is given above.
EX:4 Exploring various variables and row filters in R for cleaning data
Date:

AIM:
The Aim is to explore various variable and row filters in R for cleaning data, apply various plot features in R on sample data sets, and visualize the results.

PROGRAM:
# The following code is implemented in R
#1

head(airquality)

#2

mean(airquality$Wind)

#3

mean(airquality$Solar.R, na.rm = TRUE)

#4

summary(airquality)

#5
We can get a clear visual of the irregular data using a boxplot.

boxplot(airquality)

#6
Removing irregular (missing) data with the is.na() method.

New_df = airquality
New_df$Ozone = ifelse(is.na(New_df$Ozone),
                      median(New_df$Ozone, na.rm = TRUE),
                      New_df$Ozone)

OUTPUT:
1

2 9.95751633986928
3 185.931506849315
4 9.957163387999

5
6

RESULT:
Thus, the program for exploring various variables and row filters in R
for cleaning data is given above.
EX:5 Perform TSA and apply various visualization techniques
Date:

AIM:
The Aim is to perform Time Series Analysis and apply the various
visualization techniques.

PROGRAM:
#1
import pandas as pd

# reading the database

data = pd.read_csv("tips.csv")

# printing the top 10 rows


display(data.head(10))

#2 scatterplot
import pandas as pd
import matplotlib.pyplot as plt

# reading the database


data = pd.read_csv("tips.csv")

# Scatter plot with day against tip


plt.scatter(data['day'], data['tip'])

# Adding Title to the Plot


plt.title("Scatter Plot")

# Setting the X and Y labels


plt.xlabel('Day')
plt.ylabel('Tip')

plt.show()

#3 Line chart
import pandas as pd
import matplotlib.pyplot as plt

# reading the database


data = pd.read_csv("tips.csv")

# Line plot of the tip and size columns

plt.plot(data['tip'])
plt.plot(data['size'])

# Adding Title to the Plot

plt.title("Line Chart")

# Setting the X and Y labels

plt.xlabel('Index')
plt.ylabel('Value')

plt.show()

#4 seaborn
# importing packages
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd

# reading the database


data = pd.read_csv("tips.csv")

# draw lineplot
sns.lineplot(x="sex", y="total_bill", data=data)

# setting the title using Matplotlib


plt.title('Title using Matplotlib Function')

plt.show()
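
As one more visualization technique, a histogram of the bill amounts can be drawn with seaborn; a minimal sketch (assumes the same tips.csv and seaborn 0.11 or newer for histplot):

# histogram of total_bill with a density curve overlaid
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

data = pd.read_csv("tips.csv")
sns.histplot(x='total_bill', data=data, kde=True)
plt.title('Histogram of total_bill')
plt.show()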

OUTPUT:
1)
2)

3)
4)

RESULT:
Thus, the program for performing TSA and the various visualization techniques is given above.
EX:6 Data analysis and representation on a map using various map data sets with mouse rollover effect
Date:

AIM:
The Aim is to perform data analysis and representation on a map using various map data sets with a mouse rollover effect and user interaction.

PROGRAM:

#1
import numpy as np
import pandas as pd
import folium as fo
data = pd.read_csv("https://raw.githubusercontent.com/amankharwal/Website-data/master/Volcano.csv")
print(data.head())

#2
lat = list(data["Latitude"])
lon = list(data["Longitude"])
name = list(data["Name"])

volcano = fo.FeatureGroup(name="Volcano")
for a, b, c in zip(lat, lon, name):
    volcano.add_child(fo.Marker(location=[a, b], popup=c,
                                icon=fo.Icon(color='blue')))

fo.Map().add_child(volcano)
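
The aim mentions a mouse rollover effect; in folium this can be approximated with the tooltip argument of Marker, which shows the text on hover. A minimal sketch reusing lat, lon and name from above (the output file name is illustrative):

# markers with hover tooltips (mouse rollover) instead of click-only popups
volcano_hover = fo.FeatureGroup(name="Volcano (hover)")
for a, b, c in zip(lat, lon, name):
    volcano_hover.add_child(fo.Marker(location=[a, b], tooltip=c,
                                      icon=fo.Icon(color='red')))

m = fo.Map()
m.add_child(volcano_hover)
m.save("volcano_map.html")  # open the saved HTML file in a browser to interact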
OUTPUT:
1
Year Month ... TOTAL_HOUSES_DESTROYED TOTAL_HOUSES_DESTROYED_DESCRIPTION
0 2010 1 ... NaN NaN
1 2010 3 ... NaN NaN
2 2010 5 ... 3.0 1.0
3 2010 5 ... NaN NaN
4 2010 8 ... NaN 1.0

[5 rows x 36 columns]

RESULT:
Thus, the program to perform data analysis and representation on a map using various map data sets is given above.
EX:7 Build cartographic visualizations for multiple datasets involving various countries of the world
Date:

AIM:
The Aim is to build cartographic visualization for multiple datasets
involving various countries of the world and states and districts in India.

PROGRAM:

import pandas as pd
import altair as alt
from vega_datasets import data
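
The listing below refers to a base chart called map and a zipcodes dataset that are not defined in the original; a minimal sketch of the assumed definitions, using the usual vega_datasets sources, is:

# assumed base layers (not shown in the original listing)
world = alt.topo_feature(data.world_110m.url, 'countries')              # world country shapes
map = alt.Chart(world).mark_geoshape(fill='lightgray', stroke='white')  # base world map (name matches the listing below)
zipcodes = data.zipcodes.url                                            # US zip code point data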

#1 cylindrical projection
minimap = map.properties(width=225, height=225)
alt.hconcat(
minimap.project(type='equirectangular').properties(title='equirectangular'),
minimap.project(type='mercator').properties(title='mercator'),
minimap.project(type='transverseMercator').properties(title='transverseMercator'),
minimap.project(type='naturalEarth1').properties(title='naturalEarth1')
).properties(spacing=10).configure_view(stroke=None)

#2 Azimuthal projections

minimap = map.properties(width=180, height=180)


alt.hconcat(
minimap.project(type='azimuthalEqualArea').properties(title='azimuthalEqualArea'),
minimap.project(type='azimuthalEquidistant').properties(title='azimuthalEquidistant'),
minimap.project(type='orthographic').properties(title='orthographic'),
minimap.project(type='stereographic').properties(title='stereographic'),
minimap.project(type='gnomonic').properties(title='gnomonic')
).properties(spacing=10).configure_view(stroke=None)

#3 Point maps

alt.Chart(zipcodes).transform_filter(
'-150 < datum.longitude && 22 < datum.latitude && datum.latitude < 55'
).transform_calculate(
digit='datum.zip_code[0]'
).mark_line(
strokeWidth=0.5
).encode(
longitude='longitude:Q',
latitude='latitude:Q',
color='digit:N',
order='zip_code:O'
).project(
type='albersUsa'
).properties(
width=900,
height=500
).configure_view(
stroke=None
)
OUTPUT:

1)

2)
3)

RESULT:
Thus, the program to build cartographic visualizations for multiple datasets involving various countries of the world is given above.
EX: 8 Perform EDA on wine quality data set
Date:

AIM:
The Aim is to perform EDA on Wine Quality Data Set.

PROGRAM:
#1

!pip install --upgrade seaborn


import numpy as np
import pandas as pd
from sklearn import datasets
import matplotlib.pyplot as plt
import seaborn as sns

%matplotlib inline
wine_dataset = datasets.load_wine()
df = pd.DataFrame(wine_dataset['data'], columns=wine_dataset['feature_names'])
df.head()
df['target'] = wine_dataset['target']
wine_dataset['target_names']
print(wine_dataset['DESCR'])
df.shape
df.dtypes
df.info()
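
Two further summary steps that commonly belong in EDA can be added at this point (a small sketch using the same df; not in the original listing):

# summary statistics and class balance of the target column
print(df.describe())
print(df['target'].value_counts())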

#2
Scatterplot - To understand the distribution of wine classes
# Create an array of numbers
numbers = np.arange(df.shape[0])

fig, axes = plt.subplots(nrows=13, ncols=2, figsize=(16, 40), squeeze=False)

for i, column in enumerate(df.columns, start=0):
    if column != "target":
        sns.scatterplot(x=column, y=numbers, data=df, hue='target', ax=axes[i, 0])
        sns.scatterplot(x=column, y='target', data=df, hue='target', ax=axes[i, 1])

fig.tight_layout(pad=3.0)

OUTPUT:
#1
Wine recognition dataset
------------------------

**Data Set Characteristics:**

:Number of Instances: 178 (50 in each of three classes)


:Number of Attributes: 13 numeric, predictive attributes and the class
:Attribute Information:
- Alcohol
- Malic acid
- Ash
- Alcalinity of ash
- Magnesium
- Total phenols
- Flavanoids
- Nonflavanoid phenols
- Proanthocyanins
- Color intensity
- Hue
- OD280/OD315 of diluted wines
- Proline
- class:
- class_0
- class_1
- class_2

:Summary Statistics:

============================= ==== ===== ======= =====
                               Min   Max    Mean     SD
============================= ==== ===== ======= =====
Alcohol: 11.0 14.8 13.0 0.8
Malic Acid: 0.74 5.80 2.34 1.12
Ash: 1.36 3.23 2.36 0.27
Alcalinity of Ash: 10.6 30.0 19.5 3.3
Magnesium: 70.0 162.0 99.7 14.3
Total Phenols: 0.98 3.88 2.29 0.63
Flavanoids: 0.34 5.08 2.03 1.00
Nonflavanoid Phenols: 0.13 0.66 0.36 0.12
Proanthocyanins: 0.41 3.58 1.59 0.57
Colour Intensity: 1.3 13.0 5.1 2.3
Hue: 0.48 1.71 0.96 0.23
OD280/OD315 of diluted wines: 1.27 4.00 2.61 0.71
Proline: 278 1680 746 315
:Missing Attribute Values: None
:Class Distribution: class_0 (59), class_1 (71), class_2 (48)
:Creator: R.A. Fisher
:Donor: Michael Marshall (MARSHALL%[email protected])
:Date: July, 1988

This is a copy of UCI ML Wine recognition datasets.


https://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data

The data is the results of a chemical analysis of wines grown in the same
region in Italy by three different cultivators. There are thirteen different
measurements taken for different constituents found in the three types of
wine.

#df.dtypes
Out[9]:
alcohol float64
malic_acid float64
ash float64
alcalinity_of_ash float64
magnesium float64
total_phenols float64
flavanoids float64
nonflavanoid_phenols float64
proanthocyanins float64
color_intensity float64
hue float64
od280/od315_of_diluted_wines float64
proline float64
target int64
dtype: object

#df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 178 entries, 0 to 177
Data columns (total 14 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 alcohol 178 non-null float64
1 malic_acid 178 non-null float64
2 ash 178 non-null float64
3 alcalinity_of_ash 178 non-null float64
4 magnesium 178 non-null float64
5 total_phenols 178 non-null float64
6 flavanoids 178 non-null float64
7 nonflavanoid_phenols 178 non-null float64
8 proanthocyanins 178 non-null float64
9 color_intensity 178 non-null float64
10 hue 178 non-null float64
11 od280/od315_of_diluted_wines 178 non-null float64
12 proline 178 non-null float64
13 target 178 non-null int64
dtypes: float64(13), int64(1)
memory usage: 19.6 KB
#2

RESULT:
Thus, the program to perform EDA on the wine quality data set is given above.
EX:9 Case study on a data set applying various EDA and visualization techniques
Date:

AIM:
The Aim is to take a case study on a data set, apply the various EDA and visualization techniques, and present an analysis report.

Let us consider a credit card fraud case study.

PROGRAM:
Importing Necessary Packages

# Filtering Warnings
import warnings
warnings.filterwarnings('ignore')

# Other imports
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from plotly.subplots import make_subplots
import plotly.graph_objects as go
pd.set_option('display.max_columns', 300) #Setting column display limit
plt.style.use('ggplot') #Applying style to graphs
df1 = pd.read_csv("application_data.csv")
df1.head()
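
Before plotting, the size of the data and the extent of missing values can be checked (a small sketch, assuming the same df1):

# shape of the data and the ten columns with the most missing values
print(df1.shape)
print(df1.isnull().sum().sort_values(ascending=False).head(10))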
OUTPUT DATA SET:

Let’s check the distribution of the target variable visually using a pie chart.

count1 = 0
count0 = 0
for i in df1['TARGET'].values:
    if i == 1:
        count1 += 1
    else:
        count0 += 1

count1 = (count1/len(df1['TARGET']))*100
count0 = (count0/len(df1['TARGET']))*100

x = ['Defaulted Population (TARGET=1)', 'Non-Defaulted Population (TARGET=0)']


y = [count1, count0]

explode = (0.1, 0) # only "explode" the 1st slice

fig1, ax1 = plt.subplots()


ax1.pie(y, explode=explode, labels=x, autopct='%1.1f%%',
shadow=True, startangle=110)
ax1.axis('equal') # Equal aspect ratio ensures that pie is drawn as a circle.
plt.title('Data imbalance',fontsize=25)
plt.show()
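
The counting loop above can also be replaced by pandas' value_counts; a minimal equivalent sketch (assuming the same df1):

# percentage of each TARGET class, computed directly with pandas
target_share = df1['TARGET'].value_counts(normalize=True) * 100
print(target_share)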
OUTPUT:

RESULT:
Thus, the program for the credit card fraud detection case study has been implemented, and the results are visually represented above.
