Dev Lab Manual Org
EX:1 Install DAV in Python
Date:
AIM:
The Aim is to install the data analysis and visualization (DAV) tools in
Python.
PROGRAM:
Installation
The easiest way to install pandas is to use pip:
pip install pandas
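After installation, it can help to confirm that the packages can be found by Python. The following is a minimal sketch using only the standard library. The package list beyond pandas is an assumption based on the libraries used in the later exercises, and check_installed is a hypothetical helper name.

```python
import importlib.util

# Packages to check; everything except pandas is an assumption drawn from
# the libraries used in the later exercises.
packages = ["pandas", "numpy", "matplotlib", "seaborn"]


def check_installed(names):
    """Return a dict mapping each package name to True if Python can find it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


status = check_installed(packages)
for name, found in status.items():
    print(f"{name}: {'installed' if found else 'missing'}")
```

Any package reported as missing can be installed the same way as pandas, for example with pip install numpy.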
OUTPUT:
RESULT:
Thus, the command to install the data analysis and visualization package
is given above.
EX:2 Performing EDA on datasets
Date:
AIM:
The Aim is to perform exploratory data analysis (EDA) on datasets such as an
email data set. Export all your emails as a dataset, import them into a pandas
data frame, visualize them and get different insights from the data.
PROGRAM:
import pandas as pd
# Reading the CSV file
df = pd.read_csv("Iris.csv")
# Printing top 5 rows
df.head()
Now, let's also look at the columns and their data types. For this, we use the info() method.
df.info()
Let’s see if our dataset contains any duplicates or not.
The pandas drop_duplicates() method removes duplicates from the data
frame. With subset="Species", only the first row of each species is kept.
data = df.drop_duplicates(subset="Species")
data
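The difference between counting duplicate rows and dropping them by a subset of columns can be seen on a small frame. This is a minimal sketch; the column SepalLengthCm and all values are hypothetical stand-ins for the Iris data.

```python
import pandas as pd

# Hypothetical miniature of the Iris data; values are made up for illustration.
df = pd.DataFrame({
    "SepalLengthCm": [5.1, 5.1, 4.9, 7.0, 6.3],
    "Species": ["setosa", "setosa", "setosa", "versicolor", "virginica"],
})

# Count rows that are exact copies of an earlier row.
n_duplicate_rows = int(df.duplicated().sum())

# Drop only fully identical rows.
unique_rows = df.drop_duplicates()

# With subset="Species", only the first row of each species survives.
one_per_species = df.drop_duplicates(subset="Species")

print(n_duplicate_rows)
print(len(unique_rows), len(one_per_species))
```

Checking df.duplicated().sum() first shows whether the data really contains repeated rows before anything is removed.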
Our target column will be the Species column because at the end we will need
the result according to the species only. Let’s see a countplot for species.
# importing packages
import seaborn as sns
import matplotlib.pyplot as plt
sns.countplot(x='Species', data=df)
plt.show()
OUTPUT:
RESULT:
Thus, the program to perform EDA on datasets is given above.
EX:3 Working with numpy arrays, pandas data frames and basic
Date: plots of matplotlib.
AIM:
The Aim is to work with NumPy arrays, pandas data frames, and basic plots
using Matplotlib.
PROGRAM:
# numpy array
import numpy as np
li = [1, 2, 3, 4]
numpyArr = np.array(li)
print(numpyArr)
# pandas Series
import pandas as pd
info = np.array(['P','a','n','d','a','s'])
a = pd.Series(info)
print(a)
# x axis values
x = [1,2,3]
# corresponding y axis values
y = [2,4,1]
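The x and y values above can be drawn as a basic Matplotlib line plot. This is a minimal sketch: the axis labels, title, output file name, and the headless Agg backend are all assumptions, not part of the original program.

```python
import matplotlib
matplotlib.use("Agg")  # assumed headless backend; omit this line in a notebook
import matplotlib.pyplot as plt

# x axis values and corresponding y axis values from the exercise
x = [1, 2, 3]
y = [2, 4, 1]

fig, ax = plt.subplots()
(line,) = ax.plot(x, y)          # draw the points joined by a line
ax.set_xlabel("x - axis")        # hypothetical label
ax.set_ylabel("y - axis")        # hypothetical label
ax.set_title("Basic line plot")  # hypothetical title
fig.savefig("basic_plot.png")    # hypothetical file name
```

In an interactive session, plt.show() can be used instead of saving the figure to a file.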
OUTPUT:
1
[1 2 3 4]
2
0 P
1 a
2 n
3 d
4 a
5 s
dtype: object
3
RESULT:
Thus, the program for working with NumPy, pandas and Matplotlib is given
above.
EX:4 Exploring various variables and row filter in R for cleaning
Date: data.
AIM:
The Aim is to explore various variable and row filters in R for cleaning
data. Apply various plot features in R on sample data sets and visualize.
PROGRAM:
# The code in this exercise should be run in R.
#1
head(airquality)
#2
mean(airquality$Wind)
#3
mean(airquality$Solar.R, na.rm = TRUE)
#4
summary(airquality)
#5
# We can get a clear visual of the irregular data using a boxplot.
boxplot(airquality)
#6
# Remove irregular (missing) data using is.na().
New_df = airquality
New_df$Ozone = ifelse(is.na(New_df$Ozone),
                      median(New_df$Ozone,
                             na.rm = TRUE),
                      New_df$Ozone)
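Since the other exercises in this manual use Python, the same median-imputation idea can be sketched in pandas for comparison. This is not part of the R program; the values below are hypothetical and only stand in for a column with missing readings.

```python
import numpy as np
import pandas as pd

# Hypothetical values standing in for a column with missing readings.
ozone = pd.Series([41.0, 36.0, np.nan, 18.0, np.nan, 28.0], name="Ozone")

# median() skips NaN by default, similar to na.rm = TRUE in R.
median_value = ozone.median()

# fillna() replaces only the missing entries and leaves the rest unchanged.
ozone_clean = ozone.fillna(median_value)

print(median_value)
print(ozone_clean.tolist())
```

The median is often preferred over the mean here because it is less affected by extreme values.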
OUTPUT:
1
2 9.95751633986928
3 185.931506849315
4 9.957163387999
5
6
RESULT:
Thus, the program for exploring various variables and row filters in R
for cleaning data is given above.
EX:5 Perform TSA and apply various visualization techniques
Date:
AIM:
The Aim is to perform Time Series Analysis and apply the various
visualization techniques.
PROGRAM:
#1
import pandas as pd
data = pd.read_csv("tips.csv")
#2 scatterplot
import pandas as pd
import matplotlib.pyplot as plt
plt.show()
#3 Line chart
import pandas as pd
import matplotlib.pyplot as plt
plt.show()
#4 seaborn
# importing packages
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
# draw lineplot
sns.lineplot(x="sex", y="total_bill", data=data)
plt.show()
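The aim mentions time series analysis, which the tips data does not really exercise. As a minimal sketch of two common time series steps in pandas, the following uses a synthetic daily series; the dates, values, and window sizes are all assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical daily series: ten days of made-up values 0.0 .. 9.0.
dates = pd.date_range("2023-01-01", periods=10, freq="D")
series = pd.Series(np.arange(10, dtype=float), index=dates, name="value")

# A 3-day rolling mean smooths short-term fluctuations; the first two
# entries are NaN because the window is not yet full.
rolling_mean = series.rolling(window=3).mean()

# Resample into 5-day bins and take the mean of each bin.
five_day_mean = series.resample("5D").mean()

print(rolling_mean.tail(3))
print(five_day_mean)
```

Either result can be passed to a line plot, for example with rolling_mean.plot(), to compare the smoothed series against the raw one.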
OUTPUT:
1)
2)
3)
4)
RESULT:
Thus, the program for performing TSA and applying various visualization
techniques is given above.
EX: 6 Data analysis and representation on a map using various map
Date: data sets with mouse rollover effect
AIM:
The Aim is to perform data analysis and representation on a map using
various map data sets with mouse rollover effect and user interaction.
PROGRAM:
#1
import numpy as np
import pandas as pd
import folium as fo
data = pd.read_csv("https://raw.githubusercontent.com/amankharwal/Website-data/master/Volcano.csv")
print(data.head())
#2
lat = list(data["Latitude"])
lon = list(data["Longitude"])
name = list(data["Name"])
volcano = fo.FeatureGroup(name="Volcano")
for a, b, c in zip(lat, lon, name):
    volcano.add_child(fo.Marker(location=[a, b], popup=c,
                                icon=fo.Icon(color='blue')))
fo.Map().add_child(volcano)
OUTPUT:
1
Year Month ... TOTAL_HOUSES_DESTROYED TOTAL_HOUSES_DESTROYED_DESCRIPTION
0 2010 1 ... NaN NaN
1 2010 3 ... NaN NaN
2 2010 5 ... 3.0 1.0
3 2010 5 ... NaN NaN
4 2010 8 ... NaN 1.0
[5 rows x 36 columns]
RESULT:
Thus, the program to perform data analysis and representation on a map
using various map data sets is given above.
EX:7 Build cartographic visualization for multiple datasets
Date: involving various countries of world.
AIM:
The Aim is to build cartographic visualization for multiple datasets
involving various countries of the world and states and districts in India.
PROGRAM:
import pandas as pd
import altair as alt
from vega_datasets import data
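#1 cylindrical projection
# Assumed setup (missing in the original): a world map built from the
# world_110m TopoJSON file in vega_datasets.
world = data.world_110m.url
world_map = alt.Chart(alt.topo_feature(world, 'countries')).mark_geoshape(
    fill='#2a1d0c', stroke='#706545', strokeWidth=0.5
)
minimap = world_map.properties(width=225, height=225)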
alt.hconcat(
minimap.project(type='equirectangular').properties(title='equirectangular'),
minimap.project(type='mercator').properties(title='mercator'),
minimap.project(type='transverseMercator').properties(title='transverseMercator'),
minimap.project(type='naturalEarth1').properties(title='naturalEarth1')
).properties(spacing=10).configure_view(stroke=None)
#2 Azimuthal projections
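#3 Point maps
# Assumed data source (missing in the original): US zip codes from vega_datasets.
zipcodes = data.zipcodes.url
alt.Chart(zipcodes).transform_filter(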
'-150 < datum.longitude && 22 < datum.latitude && datum.latitude < 55'
).transform_calculate(
digit='datum.zip_code[0]'
).mark_line(
strokeWidth=0.5
).encode(
longitude='longitude:Q',
latitude='latitude:Q',
color='digit:N',
order='zip_code:O'
).project(
type='albersUsa'
).properties(
width=900,
height=500
).configure_view(
stroke=None
)
OUTPUT:
1)
2)
3)
RESULT:
Thus, the program to build cartographic visualization for multiple datasets
involving various countries of the world is given above.
EX: 8 Perform EDA on wine quality data set
Date:
AIM:
The Aim is to perform EDA on Wine Quality Data Set.
PROGRAM:
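#1
%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import datasets
wine_dataset = datasets.load_wine()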
df = pd.DataFrame(wine_dataset['data'], columns=wine_dataset['feature_names'])
df.head()
df['target'] = wine_dataset['target']
wine_dataset['target_names']
print(wine_dataset['DESCR'])
df.shape
df.dtypes
df.info()
#2
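# Scatterplot - to understand the distribution of wine classes
# Create an array of numbers
numbers = np.arange(df.shape[0])
# Assumed reconstruction: the plotting lines are missing in the original, so a
# single scatter of the class label against the row index stands in for them.
fig, ax = plt.subplots()
ax.scatter(numbers, df['target'])
fig.tight_layout(pad=3.0)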
OUTPUT:
#1
Wine recognition dataset
------------------------
:Summary Statistics:
The data is the results of a chemical analysis of wines grown in the same
region in Italy by three different cultivators. There are thirteen different
measurements taken for different constituents found in the three types of
wine.
#df.dtypes
Out[9]:
alcohol float64
malic_acid float64
ash float64
alcalinity_of_ash float64
magnesium float64
total_phenols float64
flavanoids float64
nonflavanoid_phenols float64
proanthocyanins float64
color_intensity float64
hue float64
od280/od315_of_diluted_wines float64
proline float64
target int64
dtype: object
#df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 178 entries, 0 to 177
Data columns (total 14 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 alcohol 178 non-null float64
1 malic_acid 178 non-null float64
2 ash 178 non-null float64
3 alcalinity_of_ash 178 non-null float64
4 magnesium 178 non-null float64
5 total_phenols 178 non-null float64
6 flavanoids 178 non-null float64
7 nonflavanoid_phenols 178 non-null float64
8 proanthocyanins 178 non-null float64
9 color_intensity 178 non-null float64
10 hue 178 non-null float64
11 od280/od315_of_diluted_wines 178 non-null float64
12 proline 178 non-null float64
13 target 178 non-null int64
dtypes: float64(13), int64(1)
memory usage: 19.6 KB
#2
RESULT:
Thus, the program to perform EDA on wine quality data set is given
above.
EX:9 Case study on a data set and apply various EDA and
Date: visualization technique
AIM:
The Aim is to use a case study on a data set and apply the various EDA
and visualization techniques and present an analysis report.
PROGRAM:
# Importing necessary packages
# Filtering warnings
import warnings
warnings.filterwarnings('ignore')
# Others
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from plotly.subplots import make_subplots
import plotly.graph_objects as go
pd.set_option('display.max_columns', 300) #Setting column display limit
plt.style.use('ggplot') #Applying style to graphs
df1 = pd.read_csv("application_data.csv")
df1.head()
OUTPUT DATA SET:
Let’s check the distribution of the target variable visually using a pie chart.
count1 = 0
count0 = 0
for i in df1['TARGET'].values:
    if i == 1:
        count1 += 1
    else:
        count0 += 1
count1 = (count1/len(df1['TARGET']))*100
count0 = (count0/len(df1['TARGET']))*100
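The percentages computed by the loop can also be obtained with value_counts and then drawn as the pie chart mentioned above. This is a minimal sketch on a hypothetical TARGET column; the values, labels, title, file name, and the headless Agg backend are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # assumed headless backend; omit this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical TARGET column: eight 0s and two 1s.
target = pd.Series([0, 0, 0, 1, 0, 0, 1, 0, 0, 0], name="TARGET")

# Share of each class in percent, equivalent to the counting loop.
percentages = target.value_counts(normalize=True).sort_index() * 100

fig, ax = plt.subplots()
ax.pie(
    percentages,
    labels=[f"TARGET = {k}" for k in percentages.index],
    autopct="%1.1f%%",  # print the percentage inside each wedge
)
ax.set_title("Distribution of TARGET")  # hypothetical title
fig.savefig("target_distribution.png")  # hypothetical file name
```

A strongly unequal split between the two classes is worth noting in the analysis report, since it affects how later results should be read.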
RESULT:
Thus, the program for the case study, credit card fraud detection, has
been implemented and is also visually represented above.