
23CS5PCDEV

This document outlines the examination details for the Data Exploration and Visualization course at B.M.S. College of Engineering, including instructions, course code, and maximum marks. It contains a series of questions divided into five units, covering topics such as exploratory data analysis, data transformation, cross tabulation, linear and non-linear scales, and web scraping. Students are required to answer five full questions, selecting one from each unit, and are instructed on handling missing data.


U.S.N.

B.M.S. College of Engineering, Bengaluru-560019


Autonomous Institute Affiliated to VTU

June 2025 Semester End Main Examinations


Programme: B.E. Semester: V
Branch: Computer Science and Engineering Duration: 3 hrs.
Course Code: 23CS5PCDEV Max Marks: 100
Course: Data Exploration and Visualization.
Instructions: 1. Answer any FIVE full questions, choosing one full question from each unit.
2. Missing data, if any, may be suitably assumed.

Important Note: On completing your answers, compulsorily draw diagonal cross lines on the remaining blank pages. Revealing of identification or appeal to the evaluator will be treated as malpractice.

UNIT - I CO PO Marks

1 a) List the different measurement scales used in exploratory data analysis, explaining each with an example. CO1 PO1 10
b) Discuss the steps performed in EDA and explain each step in detail. CO1 PO1 10

OR

2 a) Describe the aims of exploratory data analysis and differentiate between exploratory and confirmatory data analysis. CO1 PO1 10
b) With a neat diagram, explain the classification of exploratory data analysis. CO1 PO1 10

UNIT - II
3 a) Perform the following transformation operations on the two pandas DataFrames shown below, and write the code and the output for each operation. CO2 PO2 10
i) Inner join ii) Left outer join iii) Full outer join iv) Right outer join v) Append

DataFrame 1:
ID  NAME
1   Alice
2   Bob
3   Charlie

DataFrame 2:
ID  Age
2   25
3   30
4   28

b) Illustrate any five transformation techniques applied in data transformation, with an example for each technique. CO1 PO1 10
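The join operations in question 3 a) can be sketched with `pd.merge`. This is a minimal sketch rather than a model answer: the variable names `df1` and `df2` are assumptions, and `pd.concat` stands in for the append step because `DataFrame.append` was removed in pandas 2.0.

```python
import pandas as pd

# The two frames from question 3 a); the names df1 and df2 are assumptions.
df1 = pd.DataFrame({"ID": [1, 2, 3], "NAME": ["Alice", "Bob", "Charlie"]})
df2 = pd.DataFrame({"ID": [2, 3, 4], "Age": [25, 30, 28]})

# Each join keeps a different set of IDs.
inner = pd.merge(df1, df2, on="ID", how="inner")  # IDs present in both frames
left = pd.merge(df1, df2, on="ID", how="left")    # every ID from df1
outer = pd.merge(df1, df2, on="ID", how="outer")  # IDs from either frame
right = pd.merge(df1, df2, on="ID", how="right")  # every ID from df2

# Append: stack the rows of df2 under df1; unmatched columns become NaN.
appended = pd.concat([df1, df2], ignore_index=True)

for name, frame in [("inner", inner), ("left", left), ("outer", outer),
                    ("right", right), ("append", appended)]:
    print(f"--- {name} ---")
    print(frame)
```

Rows without a match on the other side carry NaN in the missing column, which is why `Age` is empty for ID 1 in the left join.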

OR
4 a) Apply the concept of discretization to the data “height” shown below by creating four bins, and apply the binning technique to the data “ages”. Write the Python code to create bins of equal width and of equal frequency. CO2 PO2 8
height = [10, 20, 31, 54, 51, 15, 18, 34, 41, 53]
b) Suppose you have a dataset containing information about customers' purchases at a store. The dataset (customer_data.csv) includes the columns 'Customer_ID', 'Age', 'Gender' and 'Purchase_Amount'. Your task is to perform random sampling to select a subset of 20 customers from this dataset for a survey. CO2 PO2 6
c) Apply the binning technique to the data “ages” shown below. Write the Python code to create bins and categorize the values into different age groups. CO2 PO2 6
ages = [22, 35, 47, 50, 28, 19, 65, 37, 42, 51]
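One possible approach to question 4 is sketched below using `pd.cut`, `pd.qcut` and `DataFrame.sample`. The age-group edges and labels are illustrative assumptions, and because customer_data.csv is not available here, a randomly generated stand-in frame with the same column names is used for the sampling step.

```python
import numpy as np
import pandas as pd

height = [10, 20, 31, 54, 51, 15, 18, 34, 41, 53]
ages = [22, 35, 47, 50, 28, 19, 65, 37, 42, 51]

# Equal width: four intervals of identical width spanning min..max.
equal_width = pd.cut(height, bins=4)

# Equal frequency: four quantile-based intervals with roughly equal counts.
equal_freq = pd.qcut(height, q=4)

# Age groups with explicit edges; edges and labels are assumptions.
age_edges = [0, 25, 40, 60, 100]
age_labels = ["Youth", "Adult", "Middle-aged", "Senior"]
age_groups = pd.cut(ages, bins=age_edges, labels=age_labels)

# Hypothetical stand-in for customer_data.csv (values are synthetic).
rng = np.random.default_rng(0)
customers = pd.DataFrame({
    "Customer_ID": range(1, 51),
    "Age": rng.integers(18, 70, size=50),
    "Gender": rng.choice(["F", "M"], size=50),
    "Purchase_Amount": rng.uniform(10, 500, size=50).round(2),
})

# Simple random sample of 20 customers without replacement.
survey = customers.sample(n=20, random_state=42)

print(pd.Series(equal_width).value_counts(sort=False))
print(pd.Series(equal_freq).value_counts(sort=False))
print(list(age_groups))
print(survey.head())
```

With the real file, the stand-in frame would be replaced by `pd.read_csv("customer_data.csv")`; the `random_state` argument only makes the sample reproducible.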

UNIT - III
5 a) Discuss the concept of cross tabulation in pandas and explain how it differs from a pivot table. Write a Python program to demonstrate the same. CO2 PO2 10
b) Consider a dataset representing sales data for orders purchased. The dataset includes the following columns:

Write a Python program that performs the following tasks: CO2 PO2 10
i. Display the original dataset with missing values.
ii. Analyze the missing values in the 'salesman_id' column and discuss possible reasons for their absence.
iii. Choose a method to fill in the missing values in the 'salesman_id' column and implement it.
iv. Display the dataset after filling in the missing values.
v. Calculate the total number of missing values in the DataFrame.
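The contrast between `pd.crosstab` and `pivot_table`, together with the missing-value tasks, can be sketched as follows. The column list for this dataset is not reproduced in the paper, so every column other than `salesman_id`, and all the values, are hypothetical. Filling with a sentinel of 0 is only one possible strategy.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in data: only 'salesman_id' is named in the question.
orders = pd.DataFrame({
    "ord_no": [1, 2, 3, 4, 5, 6],
    "region": ["North", "South", "North", "South", "North", "South"],
    "product": ["Pen", "Pen", "Book", "Book", "Pen", "Book"],
    "purch_amt": [150.0, 270.0, 65.0, 110.0, 948.0, 240.0],
    "salesman_id": [5002, np.nan, 5001, np.nan, 5002, 5003],
})

# crosstab counts how often each pair of categories occurs together.
counts = pd.crosstab(orders["region"], orders["product"])

# pivot_table aggregates a value column over the same two categories.
totals = orders.pivot_table(index="region", columns="product",
                            values="purch_amt", aggfunc="sum")

# Missing-value analysis: per column and for the whole frame.
missing_per_column = orders.isna().sum()
total_missing = int(orders.isna().sum().sum())

# Assumed fill strategy: 0 marks "no salesman recorded".
filled = orders.copy()
filled["salesman_id"] = filled["salesman_id"].fillna(0).astype(int)

print(orders)
print(counts)
print(totals)
print(missing_per_column)
print(filled)
print("Total missing values:", total_missing)
```

The key difference is that `crosstab` tabulates frequencies by default, while `pivot_table` summarizes a numeric column with an aggregation function.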

OR
6 a) Consider a dataset representing date-wise sales data for various regions. The dataset includes the following columns:

Write a Python program using pandas that performs the following tasks: CO2 PO2 10
i. Load the given dataset into a pandas DataFrame.
ii. Create a pivot table that shows the total sales for each product across different regions.
iii. Calculate the average sales for each product.
iv. Identify the manager with the highest sales_amt.
v. Determine the product that contributed the most to the sales in each region.
b) Explain the various measures of dispersion and classify the different measures of skewness and kurtosis, with examples. CO2 PO2 10
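The tasks in question 6 can be sketched on a small stand-in frame. The column list is not reproduced in the paper, so the column names other than `sales_amt` are inferred from the task wording, and all values and manager names are hypothetical. "Manager with the highest sales_amt" is read here as the manager on the single largest sale, which is an assumption.

```python
import pandas as pd

# Hypothetical date-wise sales data; names and values are illustrative only.
sales = pd.DataFrame({
    "date": pd.to_datetime(["2025-01-01", "2025-01-01", "2025-01-02",
                            "2025-01-02", "2025-01-03", "2025-01-03"]),
    "region": ["East", "East", "West", "West", "East", "West"],
    "manager": ["Asha", "Ravi", "Asha", "Ravi", "Ravi", "Asha"],
    "product": ["TV", "Phone", "TV", "Phone", "TV", "Phone"],
    "sales_amt": [200, 150, 300, 100, 250, 400],
})

# ii. Total sales for each product across regions.
pivot = sales.pivot_table(index="product", columns="region",
                          values="sales_amt", aggfunc="sum", fill_value=0)

# iii. Average sales for each product.
avg_by_product = sales.groupby("product")["sales_amt"].mean()

# iv. Manager on the row with the largest sales_amt.
top_manager = sales.loc[sales["sales_amt"].idxmax(), "manager"]

# v. Product with the largest total in each region.
top_product_by_region = pivot.idxmax(axis=0)

# 6 b) Common dispersion and shape measures for the sales amounts.
s = sales["sales_amt"]
spread = {
    "range": s.max() - s.min(),
    "variance": s.var(),
    "std": s.std(),
    "iqr": s.quantile(0.75) - s.quantile(0.25),
    "skewness": s.skew(),
    "kurtosis": s.kurt(),
}

print(pivot)
print(avg_by_product)
print(top_manager)
print(top_product_by_region)
print(spread)
```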

UNIT - IV
7 a) Differentiate between the different types of linear and non-linear scales, explaining each scale with an example. Plot the population densities (assuming your own values in crores) of ten different states in India using a logarithmic scale and display the output. CO3 PO3 10
b) Consider the data below and plot a suitable distribution. Pick a suitable kernel smoothing function to plot the values of the dataset and explain how this function is used to smooth the data values. CO3 PO3 10

Age     Count   Age     Count
0-5     36      41-45   54
6-10    19      46-50   50
11-15   18      51-55   26
16-20   99      56-60   22
21-25   139     61-65   16
26-30   121     66-70   3
31-35   76      31-35   3
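Kernel smoothing of the binned age data in question 7 b) can be sketched with a weighted Gaussian kernel placed at each band's midpoint. This is a sketch under stated assumptions: the bandwidth of 5 years is an arbitrary choice, the helper name `weighted_gaussian_kde` is hypothetical, and the final table row, which repeats the label 31-35, is left out because its intended band is unclear.

```python
import numpy as np

# Age bands and counts from the table (last, duplicated 31-35 row omitted).
bands = [(0, 5), (6, 10), (11, 15), (16, 20), (21, 25), (26, 30), (31, 35),
         (41, 45), (46, 50), (51, 55), (56, 60), (61, 65), (66, 70)]
counts = np.array([36, 19, 18, 99, 139, 121, 76, 54, 50, 26, 22, 16, 3],
                  dtype=float)

# Represent each band by its midpoint, weighted by its share of the total.
midpoints = np.array([(lo + hi) / 2 for lo, hi in bands])
weights = counts / counts.sum()


def weighted_gaussian_kde(x, centers, weights, bandwidth):
    """Weighted Gaussian kernel density estimate evaluated at points x."""
    z = (x[:, None] - centers[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z ** 2) / (bandwidth * np.sqrt(2 * np.pi))
    return kernels @ weights


bandwidth = 5.0  # assumed smoothing width in years
grid = np.linspace(-20, 95, 461)  # evaluation points, step 0.25
density = weighted_gaussian_kde(grid, midpoints, weights, bandwidth)

peak_age = grid[np.argmax(density)]
print("Smoothed density peaks near age", peak_age)
```

Each observation contributes a small bell curve, and their weighted sum gives a smooth density. A larger bandwidth gives a smoother but less detailed curve. The `grid` and `density` arrays can be passed to any line-plot function.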

OR
8 a) Consider the table below, which gives the probabilities for different values of the variable X. Calculate the cumulative distribution function and plot the data points against the cumulative probabilities obtained. Also explain the quantile-quantile (Q-Q) plot and how the values are plotted using different intervals of the normal distribution. CO3 PO3 10

X     1     4     6     2     5     3
P(x)  0.1   0.3   0.02  0.2   0.02  0.2

b) Analyze the data below on the test scores of students in different subjects. Draw a suitable visualization that represents this data. CO3 PO3 5

Subject  Class A  Class B
Math     85       78
Science  90       84
English  88       86

c) Examine the following boxplot and answer the questions. CO2 PO2 5

• Which mode of transportation has a higher median travel time?
• Calculate the interquartile range (IQR) for both Car and Bus.
• Identify if there are any outliers in the travel time of the Bus.
• Compare the variability in travel time between Car and Bus.
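The cumulative distribution function in question 8 a) is obtained by sorting the values of X and taking a running sum of P(x). A minimal sketch follows. Note that the probabilities as printed sum to 0.84 rather than 1, so the rescaled version is an assumption made in the spirit of the instruction that missing data may be suitably assumed.

```python
from itertools import accumulate

# Values and probabilities exactly as printed in the question.
x = [1, 4, 6, 2, 5, 3]
p = [0.1, 0.3, 0.02, 0.2, 0.02, 0.2]

# Sort by x so that the running sum is taken in increasing order of x.
pairs = sorted(zip(x, p))
xs = [value for value, _ in pairs]
cdf = list(accumulate(prob for _, prob in pairs))

# The printed probabilities do not sum to 1; rescaling is an assumption.
total = cdf[-1]
normalized_cdf = [c / total for c in cdf]

for value, raw, scaled in zip(xs, cdf, normalized_cdf):
    print(f"x <= {value}: F(x) = {raw:.2f} (rescaled {scaled:.3f})")
```

Plotting `xs` against `cdf` as a step plot gives the required CDF graph.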
UNIT - V

9 a) Write suitable code to perform web scraping on the URL http://www.geeksforgeeks.org and print the HTML document using a suitable library. Also perform web scraping on the URL http://example.com to print the XML document. CO2 PO2 10
b) Differentiate between serialization and deserialization in pandas. With example Python code, create a frame of 100 random numbers and store the frame in HDF5 binary format. CO2 PO2 10
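A web-scraping answer to question 9 a) would typically fetch a page and then parse it. The sketch below uses only the Python standard library (`urllib.request` and `html.parser`) instead of third-party packages such as requests or BeautifulSoup, which is a substitution made here so that the example has no external dependencies. The helper names `fetch`, `TitleAndLinks` and `summarize`, and the `sample` markup, are hypothetical.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen


def fetch(url, timeout=10):
    """Download a page and return its body as text (needs network access)."""
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


class TitleAndLinks(HTMLParser):
    """Collect the page title and every href found in <a> tags."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def summarize(html_text):
    """Return (title, links) extracted from an HTML string."""
    parser = TitleAndLinks()
    parser.feed(html_text)
    return parser.title.strip(), parser.links


# Offline demonstration on a small, made-up HTML snippet.
sample = ("<html><head><title>Example Domain</title></head><body>"
          "<a href='https://www.iana.org/domains/example'>More</a>"
          "</body></html>")
print(summarize(sample))
```

With network access, `print(fetch("http://example.com"))` prints the raw document and `summarize(fetch("http://example.com"))` extracts its title and links.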
OR

10 a) Explain the concept of hierarchical indexing. Create a multilevel index for a random series of 12 numbers, with corresponding row and column labels and a dimension of four by three (4x3), by writing suitable Python code. Display the output of the Python code. CO3 PO3 8
b) Perform stacking and unstacking operations by creating a suitable data frame of two by three (2x3 matrix). Label the columns (a, b, c) and the rows person1 and person2. CO3 PO3 6
c) Create two two-dimensional arrays using NumPy and perform the following operations on them: i) addition ii) cross product iii) dot product iv) subtraction. Display the results obtained from the above operations. CO3 PO3 6
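The three parts of question 10 can be sketched together. The label names, the seed and the array values are illustrative assumptions. The arrays are 2x3 so that the row-wise cross product is defined for three-component vectors, and the dot product is taken against the transpose so that the shapes agree.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# 10 a) Twelve random numbers under a two-level (4 x 3) index.
index = pd.MultiIndex.from_product([["a", "b", "c", "d"], [1, 2, 3]],
                                   names=["outer", "inner"])
series = pd.Series(rng.random(12), index=index)
table = series.unstack()  # inner level becomes columns: 4 rows x 3 columns

# 10 b) Stack a 2 x 3 frame into a long Series, then unstack it again.
people = pd.DataFrame([[1, 2, 3], [4, 5, 6]],
                      index=["person1", "person2"],
                      columns=["a", "b", "c"])
stacked = people.stack()      # MultiIndex of (row label, column label)
restored = stacked.unstack()  # back to the 2 x 3 layout

# 10 c) Element-wise and vector operations on two 2-D arrays.
u = np.array([[1, 0, 0], [0, 1, 0]])
v = np.array([[0, 1, 0], [0, 0, 1]])
added = u + v
subtracted = u - v
crossed = np.cross(u, v)  # cross product of corresponding rows
dotted = u @ v.T          # 2 x 2 matrix of row-by-row dot products

print(series)
print(table)
print(stacked)
print(restored)
print(added, subtracted, crossed, dotted, sep="\n")
```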
