Section A
Assume the following libraries have been imported:
import numpy as np
import pandas as pd
1. (i) A teacher wants to store the marks of four students in two subjects where
marks are random numbers between 40 and 60 (both inclusive). Write Python
code using an appropriate data structure to store marks. Also display the
average marks of the four students.
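One possible approach is sketched below. It assumes a pandas DataFrame as the data structure and reads "average marks" as the per-student mean across the two subjects. The student and subject labels and the fixed seed are illustrative and do not come from the question.

```python
import numpy as np
import pandas as pd

# Assumption: a fixed seed is used only so the sketch is repeatable.
rng = np.random.default_rng(42)

# 4 students x 2 subjects; the upper bound 61 is exclusive, so 60 is included.
marks = pd.DataFrame(
    rng.integers(40, 61, size=(4, 2)),
    index=["S1", "S2", "S3", "S4"],        # hypothetical student labels
    columns=["Subject1", "Subject2"],      # hypothetical subject labels
)

# Average marks of each student across the two subjects.
average_marks = marks.mean(axis=1)
print(marks)
print(average_marks)
```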
(ii) Consider a DataFrame df as shown below:
df
Using df determine the output of the following code snippet:
print('Shape of dataFrame: ', df.shape)
df_filtered = df.dropna(thresh=2, axis=0)
print('New Frame:\n', df_filtered)
print('Shape of New Frame:', df_filtered.shape)
(iii) Determine the output of the following code snippet:
arr1 = np.array([[1, 2, 3], [4, 5, 6]])
arr2 = np.array([10, 20, 30])
arr3 = arr1 + arr2
print(arr3)
arr4 = [Link](1,0)
print(arr4)
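A minimal sketch of the broadcasting step, assuming arr1 and arr2 are created with np.array. The final arr4 step is left out because its call cannot be recovered from the text.

```python
import numpy as np

arr1 = np.array([[1, 2, 3], [4, 5, 6]])   # shape (2, 3)
arr2 = np.array([10, 20, 30])             # shape (3,)

# arr2 is broadcast across each row of arr1 before the element-wise addition.
arr3 = arr1 + arr2
print(arr3)
# [[11 22 33]
#  [14 25 36]]
```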
(iv) Differentiate between simple random sampling and stratified random
sampling. Give one example for each type of sampling.
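The two sampling schemes can be illustrated with a short sketch. The population below is hypothetical, and the use of groupby(...).sample (available in pandas 1.1 and later) is one convenient way to stratify, not the only one.

```python
import pandas as pd

# Hypothetical population: 8 students from two departments.
population = pd.DataFrame({
    "student": [f"S{i}" for i in range(1, 9)],
    "dept": ["Science"] * 6 + ["Arts"] * 2,
})

# Simple random sampling: every row has the same chance of being selected,
# so the department mix of the sample is left to chance.
simple_sample = population.sample(n=4, random_state=0)

# Stratified random sampling: draw the same fraction within each department,
# so the sample preserves the 6:2 department proportions.
stratified_sample = population.groupby("dept").sample(frac=0.5, random_state=0)

print(simple_sample)
print(stratified_sample)
```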
(v) Consider the following two variables:
data = {'Student_Name': ['S1', 'S2', 'S3', 'S4',
        'S5', 'S6', 'S7', 'S8', 'S9', 'S10'],
        'Score': [85, 72, 92, 65, 78, 88, 45, 60, 70, 98]}
grades = {'score_ranges': ['0-60', '60-70',
          '70-80', '80-90', '90-100'],
          'letter_grades': ['F', 'D', 'C', 'B', 'A']}
Write Python code to do the following:
a) Create a DataFrame Student containing students’ exam scores using the
above dictionary data.
b) Add a column Grade_Obt to the DataFrame Student.
c) Categorize each student’s score into a letter grade (‘A’, ‘B’, ‘C’, ‘D’, ‘F’)
based on the score ranges given in grades above.
d) Display names of students getting grade ‘A’.
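Parts a) to d) could be approached with pd.cut, as in the sketch below. The question does not say which range a boundary score such as 60 or 70 belongs to, so the sketch assumes each range includes its lower bound and excludes its upper bound. The bin edges, with 101 as the top edge so that a score of 100 would still map to ‘A’, are part of that assumption.

```python
import pandas as pd

data = {"Student_Name": ["S1", "S2", "S3", "S4", "S5",
                         "S6", "S7", "S8", "S9", "S10"],
        "Score": [85, 72, 92, 65, 78, 88, 45, 60, 70, 98]}
grades = {"score_ranges": ["0-60", "60-70", "70-80", "80-90", "90-100"],
          "letter_grades": ["F", "D", "C", "B", "A"]}

# a) DataFrame of exam scores.
Student = pd.DataFrame(data)

# b), c) Assumed bin edges: [0, 60), [60, 70), [70, 80), [80, 90), [90, 101).
bin_edges = [0, 60, 70, 80, 90, 101]
Student["Grade_Obt"] = pd.cut(
    Student["Score"],
    bins=bin_edges,
    right=False,                      # include the left edge, exclude the right
    labels=grades["letter_grades"],
)

# d) Names of students with grade 'A'.
grade_a_names = Student.loc[Student["Grade_Obt"] == "A", "Student_Name"].tolist()
print(grade_a_names)
```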
(vi) Write Python code to load the titanic dataset from seaborn library into
a data frame and replace the missing values in each column by the mean of
that column.
3. (i) Distinguish between unimodal, bimodal and multimodal distribution. Use a
diagram to illustrate your answer.
(ii) Determine the output of the following code snippet using the DataFrames df1
and df2:
df1 df2
merged_df1 = pd.merge(df1, df2, how='outer')
print("Merged DataFrame 1:")
print(merged_df1)
merged_df2 = pd.merge(df1, df2, how='outer', on=["A"])
print("Merged DataFrame 2:")
print(merged_df2)
(iii) Given the list: my_list = list(range(0,6))
a) Create a numpy array One_D using my_list. Convert this one-
dimensional array into a two-dimensional array Two_D with 3 rows.
b) Create another numpy array Tran_Two_D that is the transpose of the
array Two_D.
c) Replace all odd numbers in Two_D with -1.
d) Print the sum of the arrays Two_D and Tran_Two_D.
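A hedged sketch of parts a) to d) follows. Two points are assumptions rather than part of the question: the transpose is copied so that the later replacement in Two_D does not also change Tran_Two_D (a plain .T returns a view), and part d) is read as the total of each array, since shapes (3, 2) and (2, 3) cannot be added element-wise.

```python
import numpy as np

my_list = list(range(0, 6))

# a) One-dimensional array, then a two-dimensional view with 3 rows.
One_D = np.array(my_list)
Two_D = One_D.reshape(3, -1)        # shape (3, 2)

# b) Transpose; .copy() keeps it independent of later edits to Two_D.
Tran_Two_D = Two_D.T.copy()         # shape (2, 3)

# c) Replace all odd numbers in Two_D with -1 using a boolean mask.
Two_D[Two_D % 2 == 1] = -1

# d) Assumed reading: print the total of each array separately.
print(Two_D.sum(), Tran_Two_D.sum())
```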
4. (i) Write Python code for the following:
a) Create a numpy array ArrayNum with 5 rows and 4 columns to store
random numbers from 0 to 1.
b) Compute the mean and standard deviation of each row in the array
ArrayNum.
c) Convert the ArrayNum into a DataFrame df and name the columns as
‘A’, ‘B’, ‘C’ and ‘D’ respectively. Name the rows as ‘One’,
‘Two’, ‘Three’, ‘Four’ and ‘Five’ respectively.
d) Find the correlation between columns A and C of df.
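One way to work through parts a) to d) is sketched below. The fixed seed is an assumption made for repeatability, and the standard deviation shown is NumPy's default population form (ddof=0).

```python
import numpy as np
import pandas as pd

# Assumption: seeded generator so the sketch gives the same numbers each run.
rng = np.random.default_rng(0)

# a) 5 x 4 array of uniform random numbers in [0, 1).
ArrayNum = rng.random((5, 4))

# b) Mean and standard deviation of each row (axis=1 works across columns).
row_means = ArrayNum.mean(axis=1)
row_stds = ArrayNum.std(axis=1)

# c) DataFrame with named columns and rows.
df = pd.DataFrame(
    ArrayNum,
    columns=["A", "B", "C", "D"],
    index=["One", "Two", "Three", "Four", "Five"],
)

# d) Pearson correlation between columns A and C.
corr_A_C = df["A"].corr(df["C"])
print(row_means, row_stds, corr_A_C)
```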
(ii) Consider the following data frame df_Sales containing sales data for
multiple products across different regions for four quarters of a year:
df_Sales
Write Python code for the following:
a) Create a boxplot for the column ‘Sales_in_INRLakhs’ in
df_Sales. Give an appropriate title to the plot and save the plot to a file on disk.
b) Create a hierarchical index for df_Sales such that data is arranged
region-wise and quarter-wise within each region.
c) Using df_Sales, find total sales in the northern region in the 2nd quarter.
d) Update df_Sales such that the Sales_in_INRLakhs for product
‘A’ is increased by 25%.
5. (i) Consider a DataFrame df as shown below:
df
Determine the output of the following code snippet using the given DataFrame
df:
filled_df1 = df.fillna(100)
filled_df2 = [Link]()
filled_df3 = [Link]()
print("filled df1:")
print(filled_df1)
print("filled df2:")
print(filled_df2)
print("filled df3:")
print(filled_df3)
print(filled_df3.value_counts())
(ii) Consider the following Series having details of four products:
sales = pd.Series({'Product A': 5000, 'Product B':
8000, 'Product C': 3000, 'Product D': 6000})
Write Python code to do the following:
a) Find the name of the product with the maximum sales.
b) Determine the total quantity sold for all products taken together.
c) Calculate the percentage contribution of each product to the total sales.
d) Find the products whose sales are less than the average sales.
e) Update the sales figure for the product with the lowest sales to 9999.
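Parts a) to e) could be sketched as follows, assuming the Series is built with pd.Series. The variable names other than sales are illustrative.

```python
import pandas as pd

sales = pd.Series({"Product A": 5000, "Product B": 8000,
                   "Product C": 3000, "Product D": 6000})

# a) Label of the largest value.
top_product = sales.idxmax()

# b) Total across all products.
total_sales = sales.sum()

# c) Percentage contribution of each product to the total.
pct_contribution = sales / total_sales * 100

# d) Products below the average (the mean here is 5500).
below_average = sales[sales < sales.mean()]

# e) Overwrite the smallest value in place.
sales.loc[sales.idxmin()] = 9999

print(top_product, total_sales)
print(pct_contribution)
print(below_average)
print(sales)
```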
6. Consider the following DataFrame Income_Data:
Income_Data
Write Python code for the following:
i. Use an appropriate plot to visualize the distribution of age in
Income_Data. Give an appropriate title to the chart, the x-axis and the
y-axis.
ii. Find the minimum income for each level of education in Income_Data.
iii. Determine the Education_Level with the highest average income.
iv. Find the average age of individuals having Income more than 60000.
v. For each Education_Level, find the total number of individuals
studying at that level.
7. Assume that the following data about rubies is saved in an Excel file
([Link]).
Cut_Type   Cost    X     Y
Ideal      53940   2.2   1.1
Premium    38450   2.9   1.4
Ideal      64730   2.3   1.2
Good       8493    1.8   0.9
Premium    29480   2.8   1.3
Good       9838    1.7   0.8
Write Python statements to do the following (Mention the libraries used
explicitly):
i. Read data from the given Excel file [Link] into a DataFrame df.
ii. Find the total Cost of rubies.
iii. Display the unique values of column Cut_Type.
iv. Find the statistical summary of all numeric columns in the DataFrame df.
v. Arrange details of rubies by Cost in descending order.
vi. Rename the column X as Length and Y as Width.
vii. Create a heatmap of the correlations between the numeric features of df.
Give plot title as “Correlation Matrix”. Save the plotted figure to
a file named “[Link]”.
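Parts ii to vi can be sketched with pandas alone. Because the file name is not recoverable here, the sketch types the table above directly into a DataFrame as a stand-in for the pd.read_excel call in part i. The heatmap in part vii is not drawn, and only the correlation matrix it would display is computed.

```python
import pandas as pd

# Stand-in for part i: the table from the question entered by hand.
df = pd.DataFrame({
    "Cut_Type": ["Ideal", "Premium", "Ideal", "Good", "Premium", "Good"],
    "Cost": [53940, 38450, 64730, 8493, 29480, 9838],
    "X": [2.2, 2.9, 2.3, 1.8, 2.8, 1.7],
    "Y": [1.1, 1.4, 1.2, 0.9, 1.3, 0.8],
})

total_cost = df["Cost"].sum()                       # ii. total Cost
unique_cuts = df["Cut_Type"].unique()               # iii. distinct cut types
summary = df.describe()                             # iv. numeric summary
by_cost = df.sort_values("Cost", ascending=False)   # v. descending by Cost
df = df.rename(columns={"X": "Length", "Y": "Width"})  # vi. rename columns

# Input to part vii: correlations between the numeric columns only.
corr_matrix = df.select_dtypes("number").corr()

print(total_cost)
print(unique_cuts)
print(summary)
print(by_cost)
print(corr_matrix)
```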
8. Given a comma-separated file [Link] consisting of the
following details of automobiles:
autodf
Cyl: cylinders, HP: horse power
Write Python statement(s) to do the following (Make use of appropriate
libraries):
i. Read from the given CSV file [Link] and store this data in
a DataFrame autodf.
ii. Find the number of missing values in each column.
iii. Replace missing values in the HP column with its mean value.
iv. Print Year_of_Make and Model_Name of cars from origin “USA”.
v. Calculate the average Miles for each year.
vi. Plot a pie chart on Country. Give title of the plot as “Pie Chart -
Country”.
vii. Choose a suitable plot to compare frequency of distinct values of the
cylinders. Give appropriate labels to the axes and add a title to the chart.