Data Analytics All Paper Solution
Paper 1
Q1) Attempt all of the following for 1 mark:
a) Define Data Analytics.
Data Analytics is the science of extracting meaningful, valuable
information from raw data to aid decision-making and identify
patterns.
b) Define Tokenization.
Tokenization is the process of splitting text into smaller units,
such as words or phrases, for analysis in text processing.
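As a rough illustration of the definition above, the sketch below splits a sentence into word tokens. The function name `tokenize` and the regular expression are assumptions chosen for this example; real tokenizers handle punctuation, contractions, and other languages with more care.

```python
import re

def tokenize(text):
    """Split text into lowercase word tokens (a simple regex-based sketch)."""
    # Keep runs of letters, digits, and apostrophes; drop everything else.
    return re.findall(r"[a-z0-9']+", text.lower())

tokens = tokenize("Data analytics turns raw data into insight.")
print(tokens)
```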
c) Define Machine Learning.
Machine Learning is a subset of AI where systems improve their
performance by learning from data without being explicitly
programmed.
d) What is clustering?
Clustering groups data into clusters based on similarity, with
each cluster containing data points similar to each other.
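One common way to form such clusters is k-means, which is only one of several clustering algorithms. The sketch below is a minimal one-dimensional version; the function name `kmeans_1d`, the sample points, and the starting centers are all hypothetical values picked for illustration.

```python
def kmeans_1d(points, centers, iterations=10):
    """Minimal 1-D k-means sketch: assign each point to the nearest center,
    then move each center to the mean of its assigned points."""
    for _ in range(iterations):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its previous center.
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters

# Hypothetical data: two visibly separate groups of values.
centers, clusters = kmeans_1d([1.0, 1.5, 2.0, 9.0, 10.0, 11.0], centers=[0.0, 5.0])
print(centers)
print(clusters)
```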
e) What is a frequent itemset?
A frequent itemset is a set of items that appear together in at
least a minimum number of transactions (the minimum support
threshold), typically identified in market basket analysis.
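To make the idea concrete, the sketch below counts how often each pair of items appears together and keeps the pairs that meet a minimum count. The baskets, the function name `frequent_itemsets`, and the threshold of 2 are made-up values for illustration; this is a brute-force count, not the Apriori algorithm.

```python
from collections import Counter
from itertools import combinations

def frequent_itemsets(transactions, min_support, size=2):
    """Count itemsets of the given size and keep those that occur in at
    least min_support transactions."""
    counts = Counter()
    for transaction in transactions:
        # sorted(set(...)) removes duplicates and gives a stable item order.
        for itemset in combinations(sorted(set(transaction)), size):
            counts[itemset] += 1
    return {itemset: n for itemset, n in counts.items() if n >= min_support}

# Hypothetical market baskets.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "butter"],
]
pairs = frequent_itemsets(baskets, min_support=2)
print(pairs)
```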
f) What is data characterization?
Data characterization summarizes the general features or
properties of a dataset to provide insights into its content.
g) What is an outlier?
An outlier is a data point that significantly deviates from the
rest of the data, indicating a potential error or anomaly.
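One widely used way to flag such points is the interquartile range (IQR) rule, sketched below. The 1.5 multiplier is the conventional choice, and the function name `iqr_outliers` and the sample values are assumptions made for this example.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR below Q1 or above Q3."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical measurements with one obviously extreme value.
print(iqr_outliers([10, 12, 11, 13, 12, 11, 95]))
```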
h) What is Bag of Words?
The Bag of Words (BoW) model represents text by counting the
frequency of each word, ignoring grammar and word order.
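A minimal sketch of this representation, assuming whitespace-separated words and a hypothetical helper named `bag_of_words`:

```python
from collections import Counter

def bag_of_words(text):
    """Map each lowercase word to the number of times it occurs."""
    return Counter(text.lower().split())

bow = bag_of_words("the cat sat on the mat")
print(bow["the"], bow["cat"])

# Word order does not matter: both sentences give the same bag.
print(bag_of_words("cat the sat") == bag_of_words("the cat sat"))
```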
i) What is Text Analytics?
Text Analytics analyzes textual data to extract meaningful
patterns, insights, and trends.
j) Define Trend Analytics.
Trend Analytics identifies patterns or trends in data over time to
provide actionable insights.
Paper 1, Q3
a) What is prediction? Explain any one regression model
in detail.
Prediction involves forecasting future values or outcomes
based on historical data using models or statistical methods. It
is widely used in fields like business forecasting, healthcare,
and finance. Predictions are typically categorized into
classification (categorical output) or regression (continuous
output).
Linear Regression Model:
Linear regression is a supervised learning algorithm used for
predictive analysis. It predicts a dependent variable (Y) based
on the relationship with an independent variable (X) using the
equation:
Y = mX + c
Where:
m: Slope of the line (rate of change).
c: Y-intercept (value of Y when X = 0).
Example: Predicting house prices based on size.
Steps in linear regression:
1. Collect historical data (e.g., house sizes and prices).
2. Plot data points on a graph.
3. Determine the best-fitting line that minimizes the sum of
squared errors between actual and predicted values.
Advantages:
Simple to implement and interpret.
Useful for small datasets.
Limitations:
Assumes linear relationships, which may not hold for
complex data.
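The fitting step described above can be sketched with the standard least-squares formulas for the slope m and the intercept c. The house sizes and prices below are made-up numbers chosen so that price is exactly 3 times size, and `fit_line` is a hypothetical helper name.

```python
def fit_line(xs, ys):
    """Least-squares fit of Y = mX + c for one independent variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of X and Y divided by the variance of X.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    c = mean_y - m * mean_x
    return m, c

# Hypothetical data: house sizes and prices (price = 3 * size).
sizes = [50, 80, 100, 120]
prices = [150, 240, 300, 360]

m, c = fit_line(sizes, prices)
predicted = m * 90 + c  # predicted price for a house of size 90
print(m, c, predicted)
```

Because the sample data lie exactly on a line, the fit recovers a slope of 3 and an intercept of 0; with real data the line would only approximate the points.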
e) What is an outlier?
An outlier is a data point significantly different from others in a
dataset, potentially indicating variability or measurement error.
i) Define classification.
Classification is a supervised learning technique where input
data is categorized into predefined classes based on its
features.
j) Define Recall.
Recall measures the proportion of actual positives correctly
identified by the model.
\text{Recall} = \frac{\text{True Positives (TP)}}{\text{True Positives (TP)} + \text{False Negatives (FN)}}
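As a small worked example of the formula, the sketch below counts true positives and false negatives from two label lists. The labels are invented for illustration, `recall` is a hypothetical helper name, and returning 0.0 when there are no actual positives is a convention assumed here.

```python
def recall(y_true, y_pred, positive=1):
    """Recall = TP / (TP + FN) for the given positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    # Assumed convention: recall is 0.0 when there are no actual positives.
    return tp / (tp + fn) if (tp + fn) else 0.0

# Hypothetical labels: three actual positives, two of them found by the model.
y_true = [1, 1, 1, 0, 0]
y_pred = [1, 0, 1, 0, 1]
score = recall(y_true, y_pred)
print(score)
```

Here TP = 2 and FN = 1, so recall is 2 / 3. The false positive in the last position does not affect recall.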