Dsa - DK Question Paper

The document outlines an assessment for a Data Science and Analytics course at Anna University, focusing on various analytical techniques and machine learning applications. It includes questions on statistical testing, data modeling, and machine learning concepts, along with practical scenarios for students to analyze. The assessment is divided into three parts, covering theoretical and practical aspects of data analytics and machine learning.

Uploaded by

mytreyan197


Roll No.

ANNA UNIVERSITY (UNIVERSITY DEPARTMENTS)
Department of Information Technology
Semester VI (Regulation 2019)
IT5602 Data Science and Analytics
Assessment 1

Time: 1.5 hrs Max. Marks: 50

Date: 26th February 2025

CO 1 Identify the real world business problems and model with analytical solutions.
CO 2 Solve analytical problem with relevant mathematics background knowledge.
CO 3 Convert any real world decision making problem to hypothesis and apply suitable statistical testing.
CO 4 Write and demonstrate simple applications involving analytics using Hadoop and MapReduce.
CO 5 Use open source frameworks for modeling and storing data.
CO 6 Perform data analytics and visualization using Python.

BL - Bloom's Taxonomy Levels
(L1 - Remembering, L2 - Understanding, L3 - Applying, L4 - Analyzing, L5 - Evaluating, L6 - Creating)
PART- A (5 x 2 = 10 Marks)
(Answer all Questions)
1 How does machine learning contribute to data science? [2 marks, CO 1, BL 3]
2 With the rapid increase in data generation, how can organizations ensure data quality and accuracy? [2 marks, CO 3, BL 4]
3 Why do machine learning models often assume data follows a normal distribution? [2 marks, CO 3, BL 3]
4 How does a skewed distribution impact machine learning models? [2 marks, CO 2, BL 4]
5 Why is univariate analysis important before building a machine learning model? [2 marks, CO 2, BL 2]
PART- B (2 x 13 = 26 Marks)
6 (a) (i) A machine learning engineer is developing a model to predict customer satisfaction (Z) based on: [07 marks, CO 3]
Response time (X) (how quickly customer service replies, in minutes)
Resolution time (Y) (how long it takes to resolve the issue, in hours)
They calculate the correlation matrix:
$$R = \begin{bmatrix} 1 & 0.78 & -0.85 \\ 0.78 & 1 & -0.92 \\ -0.85 & -0.92 & 1 \end{bmatrix}$$
where:
r(X,Z) = -0.85 (Response time vs. Satisfaction)
r(Y,Z) = -0.92 (Resolution time vs. Satisfaction)
r(X,Y) = 0.78 (Response time vs. Resolution time)
(A) Which factor affects customer satisfaction more?
(B) How should the company improve satisfaction?

6 (a) (ii) A self-driving car uses radar sensors to detect pedestrians. The prior probability of a pedestrian being present at a given moment is 0.2. [06 marks, CO 3, BL 5]
The sensor has the following characteristics:
True Positive Rate: The sensor correctly detects a pedestrian 90% of the time (P(D+ | P) = 0.9).
False Positive Rate: The sensor incorrectly detects a pedestrian 10% of the time when none is present (P(D+ | ¬P) = 0.1).
If the sensor detects a pedestrian, what is the probability that a pedestrian is actually present?
Comment on the result.
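The posterior asked for in 6 (a) (ii) follows from Bayes' theorem, P(P | D+) = P(D+ | P) P(P) / [P(D+ | P) P(P) + P(D+ | ¬P) P(¬P)]. The snippet below is a minimal sketch of that arithmetic using the three numbers given in the question; the function name `posterior` is only an illustrative choice.

```python
def posterior(prior, true_positive_rate, false_positive_rate):
    """Bayes' theorem: P(pedestrian | detection)."""
    # Total probability of a detection, with or without a pedestrian present.
    evidence = true_positive_rate * prior + false_positive_rate * (1.0 - prior)
    return true_positive_rate * prior / evidence

# Values from the question: prior 0.2, P(D+ | P) = 0.9, P(D+ | not P) = 0.1
p = posterior(0.2, 0.9, 0.1)
print(round(p, 4))  # 0.18 / 0.26, i.e. 0.6923
```

Even with a 90% true positive rate, the posterior stays near 69% because pedestrians are absent 80% of the time, so false alarms make up a sizeable share of all detections.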
OR
6 (b) Using the below data, apply NBC to identify the species of an entity with the following attributes. [13 marks, CO 3, BL 5]
X = {Color = Green, Legs = 2, Height = Tall, Smelly = No}

Sl. No.  Color  Legs  Height  Smelly  Species
1        White  3     Short   Yes     MA
2        Green  2     Tall    No      M
3        Green  3     Short   Yes     M
4        White  3     Short   Yes     I
5        Green  2     Short   No
6        White  2     Tall    No
7        White  2     Tall    No
8        White  2     Short   Yes     H

7 (a) A dataset has three features: X1, X2, X3 with the following covariance matrix: [13 marks, CO 3, BL 3]

$$C = \begin{bmatrix} 2 & 1 & 0 \\ 1 & 2 & 1 \\ 0 & 1 & 2 \end{bmatrix}$$

(i) Compute the eigenvalues and eigenvectors.
(ii) Identify the top 2 principal components.
(iii) Reduce the dataset from 3D to 2D using PCA.
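The PCA steps in 7 (a) can be checked with a short script. This is a sketch under the assumption that the covariance matrix reads C = [[2, 1, 0], [1, 2, 1], [0, 1, 2]]; it verifies hand-derived eigenpairs rather than calling a numerical solver, and the sample point at the end is hypothetical, since the question supplies no data rows.

```python
import math

# Covariance matrix from question 7(a), assumed to read as below.
C = [[2.0, 1.0, 0.0],
     [1.0, 2.0, 1.0],
     [0.0, 1.0, 2.0]]

def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

r2 = math.sqrt(2.0)
# Unit eigenpairs from det(C - lambda*I) = (2 - lambda)((2 - lambda)^2 - 2) = 0,
# listed from largest to smallest eigenvalue.
eigenpairs = [
    (2.0 + r2, [0.5, r2 / 2.0, 0.5]),
    (2.0, [1.0 / r2, 0.0, -1.0 / r2]),
    (2.0 - r2, [0.5, -r2 / 2.0, 0.5]),
]

# Check C v = lambda v for every pair.
for lam, v in eigenpairs:
    assert all(abs(a - lam * b) < 1e-12 for a, b in zip(matvec(C, v), v))

# The top 2 principal components are the eigenvectors with the largest eigenvalues.
top2 = [v for _, v in eigenpairs[:2]]
explained = sum(lam for lam, _ in eigenpairs[:2]) / sum(lam for lam, _ in eigenpairs)

def project(x):
    """Map a mean-centered 3D sample onto the two leading components."""
    return [dot(x, v) for v in top2]

sample = [1.0, 0.0, -1.0]  # hypothetical centered sample, not from the question
z = project(sample)
print(round(explained, 4), [round(c, 4) for c in z])
```

Under this assumption the first two components carry (4 + √2)/6 of the total variance, which is why dropping the third loses comparatively little.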
OR
7 (b) A binary classification dataset has two classes: [13 marks, CO 3, BL 3]
Class 1: X1 = {(1,2), (2,3), (3,3)}
Class 2: X2 = {(6,5), (7,8), (8,8)}
(i) Compute the class means m1 and m2.
(ii) Compute the scatter matrices: within-class scatter S_W and between-class scatter S_B.
(iii) Compute the LDA projection vector.
(iv) Project the dataset onto the LDA axis.
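The four LDA steps in 7 (b) can be traced in plain Python. The sketch below assumes the two-class Fisher formulation, with S_B taken as the unweighted outer product of the mean difference (some textbooks weight it by class size) and the projection vector taken as w = S_W^{-1}(m2 - m1); the variable names are illustrative.

```python
class1 = [(1, 2), (2, 3), (3, 3)]
class2 = [(6, 5), (7, 8), (8, 8)]

def mean(points):
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def scatter(points, m):
    """Sum of outer products (x - m)(x - m)^T for one class."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for x, y in points:
        d = (x - m[0], y - m[1])
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s

# (i) class means
m1, m2 = mean(class1), mean(class2)

# (ii) within-class and between-class scatter
S1, S2 = scatter(class1, m1), scatter(class2, m2)
S_W = [[S1[i][j] + S2[i][j] for j in range(2)] for i in range(2)]
diff = (m2[0] - m1[0], m2[1] - m1[1])
S_B = [[diff[i] * diff[j] for j in range(2)] for i in range(2)]

# (iii) projection vector w = S_W^{-1} (m2 - m1), using the 2x2 inverse formula
det = S_W[0][0] * S_W[1][1] - S_W[0][1] * S_W[1][0]
S_W_inv = [[S_W[1][1] / det, -S_W[0][1] / det],
           [-S_W[1][0] / det, S_W[0][0] / det]]
w = (S_W_inv[0][0] * diff[0] + S_W_inv[0][1] * diff[1],
     S_W_inv[1][0] * diff[0] + S_W_inv[1][1] * diff[1])

# (iv) project every point onto the LDA axis
def project(points):
    return [w[0] * x + w[1] * y for x, y in points]

proj1, proj2 = project(class1), project(class2)
print(w, proj1, proj2)
```

With these conventions w comes out proportional to (1.5, -0.25), and every Class 1 projection lies below every Class 2 projection, so a single threshold on the axis separates the classes.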
PART- C (1 x 14 = 14 Marks)
(Q.No.8 is compulsory)
8 A car dealership wants to build a regression model to predict the selling price of used cars based on historical data. They have collected the following features: [14 marks, CO 1,2,]
✓ Age of the car (years)
✓ Mileage (miles driven)
✓ Engine size (liters)
✓ Brand popularity score (0-10 scale)
The dataset has 5000 records, and after training a Multiple Linear Regression model, the dealership finds the R² score is 0.55, which is lower than expected.
You are hired as a machine learning expert to analyze and improve the model.

After training, the dealership gets the following regression equation:
Price = 30,000 - 2,500 × Age - 0.05 × Mileage + 4,000 × Engine Size + 1,500 × Brand Popularity
(a) Interpret the regression coefficients. What do they mean in real-world terms?
(b) The model has an R² score of 0.55. What does this mean? Is it good enough?
(c) What additional features might improve the model's accuracy? Why?
(d) The dealership now adds more features, but the R² score on training data rises to 0.90, while the test R² remains at 0.55. What problem is occurring, and how can it be fixed?
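For part (a), one way to read the coefficients is to evaluate the regression equation directly and change one input at a time. The sketch below encodes the equation from the question; the example car (5 years old, 60,000 miles, 2.0 liters, brand score 7) is hypothetical and chosen only for illustration.

```python
def predicted_price(age_years, mileage_miles, engine_liters, brand_score):
    """Regression equation from question 8, with coefficients as given."""
    return (30_000
            - 2_500 * age_years
            - 0.05 * mileage_miles
            + 4_000 * engine_liters
            + 1_500 * brand_score)

# Hypothetical car, not part of the question.
base = predicted_price(5, 60_000, 2.0, 7)

# Each coefficient is the change in predicted price for a one-unit change
# in that feature while the other features are held fixed.
one_year_older = predicted_price(6, 60_000, 2.0, 7) - base
thousand_more_miles = predicted_price(5, 61_000, 2.0, 7) - base

print(round(base), round(one_year_older), round(thousand_more_miles))
```

Read this way, each extra year lowers the predicted price by 2,500 and each extra 1,000 miles lowers it by 50, holding the other features constant. The intercept of 30,000 is the prediction when every feature is zero, which is an extrapolation rather than a realistic car.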
Roll No.
Department of Information Technology
Semester VI (Regulation 2019)
IT5602 Data Science and Analytics
Assessment 1

Time: 1.5 hrs Max. Marks: 50

Date: 16th April 2025
Part-A (5x2 = 10)
1. What happens to bias and variance when you use a very complex model (e.g., a deep neural network) on a small dataset?
2. What are the risks of removing all outliers from a dataset?
3. Why does HDFS use large block sizes (e.g., 128MB) instead of small ones like traditional file systems?
4. What are the risks of using a flexible schema in MongoDB?
5. Why is HiveQL not suitable for real-time analytics?
Part-B (13x2 = 26)
6.(a) Consider a scenario where you have a 2x2 Gridworld with 4 states: S1, S2, S3, S4.
You start in any state, and at each step, you can move Up, Down, Left, or Right, but if the move would take you off the grid, you stay in the same state.
All transitions are deterministic.
The reward for all transitions is -1, and the discount factor γ = 0.9.
You are given a uniform random policy: each action has equal probability (0.25).
Using the Bellman Expectation Equation:
$$V(s) = \sum_{a} \pi(a \mid s) \sum_{s'} P(s' \mid s, a) \left[ R(s, a, s') + \gamma V(s') \right]$$

Compute the value function for each state using iterative policy evaluation for 3 iterations, initialized at V(s) = 0 for all states.
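Iterative policy evaluation for this Gridworld can be sketched in a few lines. The grid layout (S1 and S2 on the top row, S3 and S4 on the bottom row) is an assumption, since the question does not state it; each sweep applies the Bellman expectation update synchronously, using only the previous iteration's values.

```python
GAMMA = 0.9
REWARD = -1.0
ACTION_PROB = 0.25  # uniform random policy over four actions

# Assumed layout (not given in the question): S1 S2 / S3 S4.
POS = {"S1": (0, 0), "S2": (0, 1), "S3": (1, 0), "S4": (1, 1)}
STATE_AT = {pos: state for state, pos in POS.items()}
ACTIONS = {"Up": (-1, 0), "Down": (1, 0), "Left": (0, -1), "Right": (0, 1)}

def step(state, action):
    """Deterministic transition; moves that leave the grid keep the state."""
    row, col = POS[state]
    d_row, d_col = ACTIONS[action]
    return STATE_AT.get((row + d_row, col + d_col), state)

def evaluate(iterations):
    """Run synchronous iterative policy evaluation starting from V(s) = 0."""
    V = {s: 0.0 for s in POS}
    for _ in range(iterations):
        # The comprehension reads the old V throughout, then rebinds it.
        V = {s: sum(ACTION_PROB * (REWARD + GAMMA * V[step(s, a)])
                    for a in ACTIONS)
             for s in POS}
    return V

for k in (1, 2, 3):
    print(k, {s: round(v, 4) for s, v in evaluate(k).items()})
```

Because every transition earns -1 and all states start at the same value, the four states stay equal after each sweep: -1 after the first iteration, -1.9 after the second, and -2.71 after the third.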
OR
6.(b) Consider the following Scenario:
You have a dataset of customer profiles including age (numerical), gender (categorical), and browser used (categorical). You try to use K-Means, but the results make little sense.
I. Explain why K-Means is not suitable for this type of data.
II. Suggest appropriate alternatives.
III. How can you preprocess this data to make it more suitable for K-Means (if necessary)?
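One possible preprocessing route for part III is to one-hot encode the categorical columns and rescale age to [0, 1] so that no single column dominates the Euclidean distance. The sketch below illustrates that idea only; the customer rows and helper names are hypothetical, and algorithms built for mixed data, such as k-prototypes, may still suit the problem better.

```python
def one_hot(value, categories):
    """Encode one categorical value as a 0/1 indicator vector."""
    return [1.0 if value == c else 0.0 for c in categories]

def min_max(values):
    """Rescale numeric values to the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in values]

# Hypothetical customer rows: (age, gender, browser)
customers = [
    (23, "F", "Chrome"),
    (45, "M", "Firefox"),
    (31, "F", "Safari"),
    (52, "M", "Chrome"),
]

genders = sorted({g for _, g, _ in customers})
browsers = sorted({b for _, _, b in customers})
scaled_ages = min_max([age for age, _, _ in customers])

# Final feature vector: [scaled age] + gender indicators + browser indicators
features = [
    [age] + one_hot(g, genders) + one_hot(b, browsers)
    for age, (_, g, b) in zip(scaled_ages, customers)
]

for row in features:
    print(row)
```

Note that distances between one-hot vectors treat every pair of distinct categories as equally far apart, which is an assumption about the data rather than a property of it.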
7.(a) In a company, a manager suggests using unsupervised clustering to build a customer churn prediction model, because
the dataset has no labels.
Critically evaluate this suggestion. What are the risks of using clustering for a classification problem? Propose a better
alternative if labels are unavailable.
OR
7.(b) Consider a scenario where you're building a recommendation engine and need to evaluate different algorithms. A
colleague suggests using a simple train/test split (80/20 hold-out) instead of K-Fold for faster experimentation.
Evaluate the pros and cons of using hold-out validation versus K-Fold in this scenario. When is each approach preferable?
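The difference in cost between the two strategies is easier to see when the index splits are written out. The sketch below is a simplified illustration with no shuffling or stratification; the function names and the sample count of 10 are hypothetical.

```python
def holdout_split(n_samples, test_fraction=0.2):
    """Single split: the last test_fraction of the indices form the test set."""
    cut = n_samples - int(round(n_samples * test_fraction))
    indices = list(range(n_samples))
    return indices[:cut], indices[cut:]

def k_fold_splits(n_samples, k=5):
    """K splits: each sample lands in the test set exactly once."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    splits, start = [], 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        splits.append((train, test))
        start += size
    return splits

train, test = holdout_split(10)
print("hold-out:", train, test)

splits = k_fold_splits(10, k=5)
for fold, (tr, te) in enumerate(splits, start=1):
    print("fold", fold, tr, te)
```

Hold-out trains one model and scores it on a single 20% slice, so it is fast but the estimate depends on which slice was drawn. K-Fold trains k models and averages k scores, which costs roughly k times as much but uses every sample for testing once.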
Part-C (1x14 = 14)
8. You are a data engineer at a large e-commerce company. Your team is planning to store and process petabytes of user clickstream data. The data will be used for analytics, recommendation engines, and fraud detection. The CTO suggests using Hadoop Distributed File System (HDFS) to store this data because of its scalability and fault tolerance.
However, your team is concerned about the following:
✓ The average file size is only 1MB, but there are millions of files generated daily.
✓ You need fast access to small files for real-time analytics.
✓ Storage nodes (DataNodes) are expected to fail occasionally due to hardware constraints.
✓ The team is considering whether to increase or reduce the default block size (128MB).
✓ There's also a plan to store machine learning models and image data on the same cluster.
Critically evaluate the suitability of HDFS for this workload.
Identify and explain at least three challenges this scenario poses for HDFS, and propose practical solutions or workarounds for each.
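The small-files concern can be made concrete with a rough estimate of NameNode memory. The sketch below rests on two assumptions that are not in the question: a commonly cited rule of thumb of about 150 bytes of NameNode heap per file or block object, and a hypothetical rate of 5 million files per day standing in for "millions". It compares storing each 1MB file separately with packing files into containers sized to one 128MB block, as a SequenceFile or Hadoop Archive might.

```python
BYTES_PER_NAMENODE_OBJECT = 150   # rule-of-thumb assumption, not an exact figure
FILES_PER_DAY = 5_000_000         # hypothetical stand-in for "millions of files"
FILE_SIZE_MB = 1                  # average file size from the question
BLOCK_SIZE_MB = 128               # default block size from the question

def namenode_bytes(num_files, blocks_per_file=1):
    """Approximate heap used: one object per file plus one per block."""
    return num_files * (1 + blocks_per_file) * BYTES_PER_NAMENODE_OBJECT

# Case 1: every 1MB file is stored as its own HDFS file with a single block.
raw = namenode_bytes(FILES_PER_DAY)

# Case 2: small files are packed into containers that each fill one block.
files_per_container = BLOCK_SIZE_MB // FILE_SIZE_MB
containers = -(-FILES_PER_DAY // files_per_container)  # ceiling division
packed = namenode_bytes(containers)

print(f"unpacked: {raw / 1e9:.2f} GB of NameNode heap per day")
print(f"packed:   {packed / 1e6:.2f} MB of NameNode heap per day")
```

Under these assumptions the metadata load drops by roughly two orders of magnitude when files are packed. The estimate ignores container index overhead and does nothing for the real-time access requirement, which still points to a separate low-latency store.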
