Additional programs
Note: Practice these questions.
Q1. Write a Python program to:
1. Create a NumPy array with the product sales below and reshape it to a 2x4 matrix.
2. Calculate the sum and mean of sales.
3. Convert the data into a Pandas DataFrame with columns "Product" and "Sales" and
export it to a CSV file.
4. Load the CSV file and print the DataFrame.
Dataset:
Product Sales
A 120
B 200
C 150
D 180
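One possible approach to Q1 is sketched below. The dataset has only four sales values, which cannot fill a 2x4 matrix, so the sketch assumes a 2x2 reshape. The file name sales.csv and the temporary directory are illustrative choices, not part of the question.

```python
import os
import tempfile

import numpy as np
import pandas as pd

products = ["A", "B", "C", "D"]
sales = np.array([120, 200, 150, 180])

# Assumption: four values cannot fill a 2x4 matrix, so reshape to 2x2 instead.
sales_matrix = sales.reshape(2, 2)

# Sum and mean of all sales values.
total_sales = sales.sum()
mean_sales = sales.mean()

# Build the DataFrame with the two requested columns.
df = pd.DataFrame({"Product": products, "Sales": sales})

# Export to CSV (hypothetical file name in a temporary directory), then reload.
csv_path = os.path.join(tempfile.mkdtemp(), "sales.csv")
df.to_csv(csv_path, index=False)
loaded = pd.read_csv(csv_path)

print(sales_matrix)
print("Sum:", total_sales, "Mean:", mean_sales)
print(loaded)
```

Passing index=False keeps the row index out of the file, so the reloaded DataFrame has the same two columns as the original.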
Q2. Given the data of student scores, write a Python program that:
1. Detects missing values and fills them with the average score of the available data in each
subject.
2. Creates a dictionary with students as keys and their average scores as values.
3. Converts the dictionary values to a NumPy array, calculates mean and standard deviation,
and plots a bar chart of scores.
Dataset:
Student Math Science English
Alex 85 78 92
Ben 74 - 81
Cara 90 87 -
Dana 88 75 83
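A minimal sketch for Q2, assuming the "-" entries in the dataset mark missing scores and that the standard deviation is the population form (NumPy's default, ddof=0). The variable names are illustrative.

```python
import numpy as np
import pandas as pd

# "-" entries in the dataset are entered as NaN (assumed to mean "missing").
df = pd.DataFrame(
    {
        "Math": [85, 74, 90, 88],
        "Science": [78, np.nan, 87, 75],
        "English": [92, 81, np.nan, 83],
    },
    index=["Alex", "Ben", "Cara", "Dana"],
)

# Step 1: count missing values, then fill each gap with its subject's mean.
missing_counts = df.isna().sum()
filled = df.fillna(df.mean())

# Step 2: dictionary of student -> average score across subjects.
averages = filled.mean(axis=1).to_dict()

# Step 3: NumPy array of the averages, with mean and standard deviation.
scores = np.array(list(averages.values()))
mean_score = scores.mean()
std_score = scores.std()  # population standard deviation (ddof=0)

print(missing_counts)
print(averages)
print("Mean:", mean_score, "Std:", std_score)
```

With matplotlib installed, the bar chart could then be drawn with plt.bar(list(averages), scores) followed by plt.show().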
Q3. Given a dataset on car prices and mileages, write a Python program that:
1. Creates a Pandas DataFrame from the data below.
2. Finds the sum, mean, and standard deviation of the "Price" column.
3. Normalizes the "Mileage" and "Price" columns to a 0-1 range and adds them as new
columns.
Dataset:
Car Model Price Mileage (in 1000s)
Sedan 20000 35
SUV 30000 45
Truck 25000 60
Coupe 27000 40
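The steps in Q3 could be sketched as follows, using min-max scaling, (x - min) / (max - min), for the 0-1 normalization. The shortened column name "Mileage" and the "_norm" suffix are assumptions made for this example.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Car Model": ["Sedan", "SUV", "Truck", "Coupe"],
        "Price": [20000, 30000, 25000, 27000],
        "Mileage": [35, 45, 60, 40],  # in 1000s
    }
)

# Sum, mean, and sample standard deviation (pandas default, ddof=1) of Price.
price_stats = df["Price"].agg(["sum", "mean", "std"])

# Min-max normalization: the smallest value maps to 0, the largest to 1.
for col in ["Price", "Mileage"]:
    col_min, col_max = df[col].min(), df[col].max()
    df[col + "_norm"] = (df[col] - col_min) / (col_max - col_min)

print(price_stats)
print(df)
```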
Q4. Data Cleaning, Standardization, and DataFrame Manipulation with Plotting
Using the dataset of test scores, write a Python program that:
1. Identifies and replaces missing values with the median score for each subject.
2. Calculates the total and average scores for each student.
3. Standardizes the scores in each subject (mean=0, std=1) and stores them in a new
DataFrame.
4. Plots a line graph showing standardized scores across subjects for each student.
Dataset:
Student Math Science English
Tom 78 85 -
Lucy 92 88 80
Max 85 - 75
Zoe 70 78 82
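A sketch of steps 1 to 3 of Q4, assuming "-" marks a missing score and that standardization uses the population standard deviation (ddof=0), so each standardized column has mean 0 and standard deviation exactly 1. Totals and averages are computed after the gaps are filled, which is one reasonable reading of the task.

```python
import numpy as np
import pandas as pd

# "-" entries in the dataset are entered as NaN (assumed to mean "missing").
scores = pd.DataFrame(
    {
        "Math": [78, 92, 85, 70],
        "Science": [85, 88, np.nan, 78],
        "English": [np.nan, 80, 75, 82],
    },
    index=["Tom", "Lucy", "Max", "Zoe"],
)

# Step 1: replace each missing value with the median of its subject.
filled = scores.fillna(scores.median())

# Step 2: total and average score per student.
summary = pd.DataFrame(
    {"Total": filled.sum(axis=1), "Average": filled.mean(axis=1)}
)

# Step 3: z-scores per subject, stored in a new DataFrame.
standardized = (filled - filled.mean()) / filled.std(ddof=0)

print(filled)
print(summary)
print(standardized)
```

For step 4, with matplotlib installed, standardized.T.plot(marker="o") draws one line per student with the subjects on the x-axis.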
Q5. Given the data on house listings, write a Python program that:
1. Creates a Pandas DataFrame from the data below.
2. Calculates the sum, mean, and standard deviation of the "Price" and "Area" columns.
3. Standardizes the "Price" and "Area" columns (mean=0, std=1) and adds them as new
columns.
Dataset:
House Type Price (in 1000s) Area (sqft)
Apartment 150 900
Villa 300 2200
Townhouse 200 1500
Cottage 180 1200
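Q5 could be approached as in the sketch below. It uses the sample standard deviation (pandas default, ddof=1) for both the summary and the standardization. That choice, the shortened column names, and the "_std" suffix are assumptions.

```python
import pandas as pd

houses = pd.DataFrame(
    {
        "House Type": ["Apartment", "Villa", "Townhouse", "Cottage"],
        "Price": [150, 300, 200, 180],  # in 1000s
        "Area": [900, 2200, 1500, 1200],  # sqft
    }
)

# Sum, mean, and standard deviation for both numeric columns at once.
stats = houses[["Price", "Area"]].agg(["sum", "mean", "std"])

# Standardize: subtract the column mean, divide by the column std (ddof=1).
for col in ["Price", "Area"]:
    houses[col + "_std"] = (houses[col] - houses[col].mean()) / houses[col].std()

print(stats)
print(houses)
```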
Additional Programs
Dataset: Employee Performance and Salary Dataset
Employee_ID Age Department Years_of_Experience Salary (in ₹) Performance_Score Promotions_Last_5_Years Education_Level
1 25 HR 2 45000 75 0 Bachelors
2 28 Finance 5 65000 82 1 Masters
3 35 IT 10 90000 88 1 PhD
4 40 HR 15 75000 70 0 Masters
5 29 Marketing 6 72000 85 1 Bachelors
6 50 IT 25 120000 92 1 PhD
7 32 Finance 8 84000 78 1 Masters
8 42 Marketing 18 97000 90 0 PhD
9 23 HR 1 40000 65 0 Bachelors
10 37 IT 12 88000 83 1 Masters
Question 1: Basic Data Preprocessing
Using the given Employee Performance and Salary Dataset, write a Python program to
implement the following basic data preprocessing steps:
1. Check for and handle any missing values in the dataset.
2. Perform Outlier Detection on the following numeric columns:
Age
Salary
Performance_Score
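A minimal sketch for Question 1. The question does not name an outlier method, so this example assumes the common 1.5 x IQR rule. Filling numeric gaps with the column median is likewise an assumed strategy.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Employee_ID": list(range(1, 11)),
        "Age": [25, 28, 35, 40, 29, 50, 32, 42, 23, 37],
        "Department": ["HR", "Finance", "IT", "HR", "Marketing",
                       "IT", "Finance", "Marketing", "HR", "IT"],
        "Years_of_Experience": [2, 5, 10, 15, 6, 25, 8, 18, 1, 12],
        "Salary": [45000, 65000, 90000, 75000, 72000,
                   120000, 84000, 97000, 40000, 88000],
        "Performance_Score": [75, 82, 88, 70, 85, 92, 78, 90, 65, 83],
        "Promotions_Last_5_Years": [0, 1, 1, 0, 1, 1, 1, 0, 0, 1],
        "Education_Level": ["Bachelors", "Masters", "PhD", "Masters", "Bachelors",
                            "PhD", "Masters", "PhD", "Bachelors", "Masters"],
    }
)

# Step 1: count missing values per column, then fill numeric gaps
# with the column median (assumed strategy).
missing = df.isna().sum()
numeric_cols = ["Age", "Salary", "Performance_Score"]
df[numeric_cols] = df[numeric_cols].fillna(df[numeric_cols].median())

# Step 2: flag values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR] as outliers.
outliers = {}
for col in numeric_cols:
    q1, q3 = df[col].quantile([0.25, 0.75])
    iqr = q3 - q1
    mask = (df[col] < q1 - 1.5 * iqr) | (df[col] > q3 + 1.5 * iqr)
    outliers[col] = df.loc[mask, "Employee_ID"].tolist()

print(missing)
print(outliers)
```

The outliers dictionary maps each column to the Employee_ID values flagged in it, so an empty list means the rule found nothing unusual in that column.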
Question 2: Skewness and Kurtosis with Box Plot
Using the given Employee Performance and Salary Dataset, write a Python program to
1. Calculate the Skewness and Kurtosis for the following columns:
Age
Salary
Performance_Score
2. Plot a Box Plot for the Salary column to analyze its distribution.
3. Interpret the Skewness and Kurtosis results and the Box Plot to describe the nature of
the Salary distribution.
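The statistics for Question 2 could be computed as sketched below, using only the three columns the question names. pandas reports bias-corrected skewness and excess kurtosis, for which a normal distribution scores 0. The quartile summary stands in for the values a box plot would display.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Age": [25, 28, 35, 40, 29, 50, 32, 42, 23, 37],
        "Salary": [45000, 65000, 90000, 75000, 72000,
                   120000, 84000, 97000, 40000, 88000],
        "Performance_Score": [75, 82, 88, 70, 85, 92, 78, 90, 65, 83],
    }
)

# Skewness > 0 suggests a longer right tail, < 0 a longer left tail.
# Excess kurtosis > 0 suggests heavier tails than a normal distribution.
shape_stats = pd.DataFrame({"skewness": df.skew(), "kurtosis": df.kurt()})

# Minimum, quartiles, and maximum of Salary: the numbers a box plot draws.
salary_summary = df["Salary"].describe()

print(shape_stats)
print(salary_summary)
```

With matplotlib installed, df.boxplot(column="Salary") draws the box plot itself.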
Question 3: Feature Selection using ANOVA
Task:
Using the given Employee Performance and Salary Dataset, write a Python program to perform
feature selection using the ANOVA F-test.
1. Use the column Performance_Score as the target variable.
2. Identify the significance of the independent features (Age, Years_of_Experience, Salary,
etc.) in predicting Performance_Score.
3. Display the F-values and p-values for each feature.
4. Interpret the results to identify which features are statistically significant.
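Because Performance_Score is continuous, one common reading of Question 3 is the univariate regression F-test, where each feature's F-value follows from its correlation r with the target: F = r^2 / (1 - r^2) * (n - 2). The sketch below computes only the F-values with pandas and is restricted to the three features the question names. It is an illustration under that assumption, not the only valid ANOVA setup.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "Age": [25, 28, 35, 40, 29, 50, 32, 42, 23, 37],
        "Years_of_Experience": [2, 5, 10, 15, 6, 25, 8, 18, 1, 12],
        "Salary": [45000, 65000, 90000, 75000, 72000,
                   120000, 84000, 97000, 40000, 88000],
        "Performance_Score": [75, 82, 88, 70, 85, 92, 78, 90, 65, 83],
    }
)

target = df["Performance_Score"]
features = df.drop(columns="Performance_Score")
n = len(df)

# Pearson correlation of each feature with the target.
r = features.corrwith(target)

# F-statistic of a one-feature linear regression, with (1, n - 2) degrees of freedom.
f_values = (r ** 2 / (1 - r ** 2)) * (n - 2)

# Larger F means stronger evidence of a linear relationship with the target.
ranked = f_values.sort_values(ascending=False)
print(ranked)
```

If SciPy is available, the matching p-values can be obtained with scipy.stats.f.sf(f_values, 1, n - 2). scikit-learn's f_regression should return the same F-values along with p-values. Features with p-values below a chosen threshold such as 0.05 are usually treated as statistically significant.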
Question 4: Heatmap for Correlation
Using the given Employee Performance and Salary Dataset, write a Python program to:
1. Calculate the correlation matrix for all numeric columns.
2. Generate a Heatmap using Seaborn to visualize the correlation between the numeric
features.
3. Identify and print:
The pair of features with the highest positive correlation.
The pair of features with the highest negative correlation.
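A sketch for steps 1 and 3 of Question 4. Employee_ID is left out on the assumption that an identifier is not a meaningful feature. If no pair is negatively correlated, the "lowest" pair reported here is simply the weakest one rather than a truly negative one.

```python
from itertools import combinations

import pandas as pd

df = pd.DataFrame(
    {
        "Age": [25, 28, 35, 40, 29, 50, 32, 42, 23, 37],
        "Years_of_Experience": [2, 5, 10, 15, 6, 25, 8, 18, 1, 12],
        "Salary": [45000, 65000, 90000, 75000, 72000,
                   120000, 84000, 97000, 40000, 88000],
        "Performance_Score": [75, 82, 88, 70, 85, 92, 78, 90, 65, 83],
        "Promotions_Last_5_Years": [0, 1, 1, 0, 1, 1, 1, 0, 0, 1],
    }
)

# Step 1: Pearson correlation matrix of the numeric columns.
corr = df.corr()

# Step 3: look at each unordered pair once, skipping the diagonal.
pairs = {(a, b): corr.loc[a, b] for a, b in combinations(corr.columns, 2)}
highest_pair = max(pairs, key=pairs.get)
lowest_pair = min(pairs, key=pairs.get)

print(corr.round(2))
print("Highest correlation:", highest_pair, round(pairs[highest_pair], 3))
print("Lowest correlation:", lowest_pair, round(pairs[lowest_pair], 3))
```

For step 2, with seaborn and matplotlib installed, sns.heatmap(corr, annot=True, cmap="coolwarm") renders the matrix as a heatmap.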
Question 5: Regression and Classification Models
Task:
Using the given Employee Performance and Salary Dataset, write Python programs for the
following tasks:
a) Regression Task:
Build a Linear Regression model to predict Salary using the following input features:
Age, Years_of_Experience, and Performance_Score.
b) Classification Task:
Develop a Logistic Regression model to classify whether an employee has been
promoted in the last 5 years (Promotions_Last_5_Years column) based on all other
features.
Evaluate the model using accuracy and Confusion Matrix.
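For the regression task in part (a), the sketch below fits an ordinary least-squares model with NumPy's np.linalg.lstsq. This is a stand-in for a library model such as scikit-learn's LinearRegression, chosen to keep the example dependency-light. It fits on all ten rows without a train/test split, which is an assumption made for brevity, and it does not cover the classification task in part (b).

```python
import numpy as np

age = np.array([25, 28, 35, 40, 29, 50, 32, 42, 23, 37], dtype=float)
experience = np.array([2, 5, 10, 15, 6, 25, 8, 18, 1, 12], dtype=float)
performance = np.array([75, 82, 88, 70, 85, 92, 78, 90, 65, 83], dtype=float)
salary = np.array([45000, 65000, 90000, 75000, 72000,
                   120000, 84000, 97000, 40000, 88000], dtype=float)

# Design matrix: a column of ones for the intercept, then the three features.
X = np.column_stack([np.ones(len(age)), age, experience, performance])

# Ordinary least squares: coefficients that minimize the squared error.
coef, *_ = np.linalg.lstsq(X, salary, rcond=None)
predicted = X @ coef

# R^2: share of the variance in Salary explained by the fitted model.
ss_res = np.sum((salary - predicted) ** 2)
ss_tot = np.sum((salary - salary.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

print("Intercept:", coef[0])
print("Coefficients (Age, Years_of_Experience, Performance_Score):", coef[1:])
print("R^2:", r_squared)
```

With only ten rows, and with Age and Years_of_Experience almost perfectly correlated, the individual coefficients are unstable and should be interpreted cautiously.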