Predictive Numericals 20 Questions

Uploaded by

fwtngwf47h

1. Given the following datasets:

(a) Calculate the mean, median, and mode (if it exists) for the following data.

Data: 78, 85, 92, 67, 70, 88, 92, 81, 85, 79

(b) Determine the missing value x for the following data, given that the mean is 27.

Data: 23, 27, 31, 22, 30, 28, x, 26, 29

(c) Calculate all three measures of central tendency, and identify which measure(s) is/are
affected by the outlier (250).

Data: 42, 45, 47, 44, 43, 46, 44, 250
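
One way to check the answers to Question 1 is sketched below using Python's standard `statistics` module. The variable names are illustrative only. For part (b), the sketch relies on the fact that a mean of 27 over 9 values fixes the total at 27 × 9.

```python
from statistics import mean, median, multimode

# (a) Mean, median, and mode(s)
data_a = [78, 85, 92, 67, 70, 88, 92, 81, 85, 79]
mean_a = mean(data_a)        # arithmetic mean
median_a = median(data_a)    # middle of the sorted values
modes_a = multimode(data_a)  # every value tied for the highest count

# (b) A mean of 27 over 9 values fixes the total at 27 * 9,
# so x is that total minus the sum of the 8 known values.
known_b = [23, 27, 31, 22, 30, 28, 26, 29]
x = 27 * (len(known_b) + 1) - sum(known_b)

# (c) Compare the three measures with and without the outlier 250.
data_c = [42, 45, 47, 44, 43, 46, 44, 250]
without_outlier = [v for v in data_c if v != 250]
for label, values in (("with 250", data_c), ("without 250", without_outlier)):
    print(label, mean(values), median(values), multimode(values))
```

Comparing the two printed rows shows which measures move when the outlier is removed.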

2. The number of books read by students in a semester is as follows:

Number of Books    Frequency
0-2                4
3-5                8
6-8                6
9-11               2

• Estimate the mean number of books read using the midpoint method.
• Determine the modal class.
• Find the median class.
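
The three parts of Question 2 can be sketched as follows. The midpoint method treats every observation as if it sat at the centre of its class. The tuple layout `(lower, upper, frequency)` is an assumption made for this example.

```python
# (lower, upper, frequency) for each class, copied from the table
classes = [(0, 2, 4), (3, 5, 8), (6, 8, 6), (9, 11, 2)]

n = sum(f for _, _, f in classes)

# Midpoint method: weight each class midpoint by its frequency.
estimated_mean = sum((lo + hi) / 2 * f for lo, hi, f in classes) / n

# Modal class: the class with the largest frequency.
modal_class = max(classes, key=lambda c: c[2])[:2]

# Median class: the first class whose cumulative frequency reaches n / 2.
cumulative = 0
median_class = None
for lo, hi, f in classes:
    cumulative += f
    if cumulative >= n / 2:
        median_class = (lo, hi)
        break

print(estimated_mean, modal_class, median_class)
```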

3. A company conducted a survey of annual incomes (in thousands) of employees in two departments:

Department A: 35, 40, 38, 50, 100, 120

Department B: 35, 38, 39, 40, 41, 42

• Calculate the mean, median and mode for both departments.


• Which department has more income disparity? Justify your answer using the calculated
measures.

4. The following are the weekly working hours of employees in a company:

Data: 32, 35, 40, 38, 42, 45, 39, 44, 46, 41, 48, 50, 36, 43, 49

• Calculate the first quartile Q1, the second quartile Q2 (median), and the third quartile Q3.
• Find the five-number summary and construct a box plot based on it.
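
A minimal sketch of the quartile calculation for Question 4 follows. Quartile conventions differ between textbooks. This example assumes the default "exclusive" method of `statistics.quantiles`, which interpolates at position p × (number of values + 1) in the sorted data, so a course that uses another convention may get slightly different values.

```python
from statistics import quantiles

hours = [32, 35, 40, 38, 42, 45, 39, 44, 46, 41, 48, 50, 36, 43, 49]

# quantiles(..., n=4) returns the three cut points Q1, Q2, Q3.
q1, q2, q3 = quantiles(hours, n=4)

# The five-number summary is what a box plot is drawn from:
# minimum, Q1, median, Q3, maximum.
five_number_summary = (min(hours), q1, q2, q3, max(hours))
print(five_number_summary)
```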

5. Two sets of exam scores are given:

Class A: 55, 60, 65, 70, 75, 80, 85, 90

Class B: 45, 50, 55, 60, 75, 80, 85, 95

• Calculate the quartiles for both classes.


• Which class has a higher interquartile range (IQR)?

6. The following are the yearly expenses (in thousands) of a group of individuals:

Data: 150, 160, 165, 170, 175, 180, 185, 200, 250, 500

• Identify any outliers in the data using the IQR method.
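
The IQR method in Question 6 flags any value below Q1 − 1.5 × IQR or above Q3 + 1.5 × IQR. The sketch below assumes the default quartile convention of `statistics.quantiles`. The fence positions depend on that choice.

```python
from statistics import quantiles

expenses = [150, 160, 165, 170, 175, 180, 185, 200, 250, 500]

q1, _, q3 = quantiles(expenses, n=4)
iqr = q3 - q1

# Fences at 1.5 * IQR beyond the quartiles
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

outliers = [v for v in expenses if v < lower_fence or v > upper_fence]
print(q1, q3, iqr, lower_fence, upper_fence, outliers)
```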


7. The following table shows the ages of employees in a company:

Age (years)    Frequency
20-25          5
26-30          7
31-35          12
36-40          8
41-45          5

• Estimate the first quartile Q1, the median (second quartile) Q2, and the third quartile Q3.

8. Two employees track their monthly sales over the past year:

Employee A: 12, 15, 14, 16, 18, 17, 19, 20, 22, 24, 21, 25

Employee B: 10, 30, 20, 40, 50, 25, 35, 45, 55, 30, 25, 60

• Calculate the variance for both employees’ monthly sales.


• Which employee shows more variability in their sales?
Justify your answer with a proper statistical measure.
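
For Question 8, a sketch along these lines computes the variance and, as one possible "proper statistical measure", the coefficient of variation (standard deviation divided by mean). The question does not say whether the sample or population variance is wanted. This example assumes the sample version, which divides by n − 1.

```python
from statistics import mean, stdev, variance

employee_a = [12, 15, 14, 16, 18, 17, 19, 20, 22, 24, 21, 25]
employee_b = [10, 30, 20, 40, 50, 25, 35, 45, 55, 30, 25, 60]

def summarize(sales):
    """Return (sample variance, coefficient of variation) for a list of sales."""
    return variance(sales), stdev(sales) / mean(sales)

var_a, cv_a = summarize(employee_a)
var_b, cv_b = summarize(employee_b)

print(f"A: variance={var_a:.2f}, CV={cv_a:.3f}")
print(f"B: variance={var_b:.2f}, CV={cv_b:.3f}")
```

The coefficient of variation is unit-free, so it stays comparable even though the two employees have different mean sales.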

9. Given the following dataset of the sizes (in square feet) and corresponding prices (in thousands)
of 8 houses:

Size (sq ft)    Price (in thousands)
1500            300
1800            360
2100            420
2400            480
2600            520
3000            600
3200            640
3500            700

• Calculate the variance and standard deviation of the house sizes.


• Calculate the population variance and population standard deviation of the house prices.
• Calculate the sample variance and sample standard deviation of the house prices, assuming
this is a sample of a larger population.
• If every house price is increased by 100 thousand, calculate the new variance and standard
deviation. Comment on the effect of adding a constant to all data points on the variance and
standard deviation.
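
The price parts of Question 9 can be checked with the sketch below. `pvariance` and `pstdev` divide by n (population), while `variance` and `stdev` divide by n − 1 (sample). The variable names are illustrative.

```python
from statistics import pstdev, pvariance, stdev, variance

prices = [300, 360, 420, 480, 520, 600, 640, 700]  # in thousands
shifted = [p + 100 for p in prices]                # every price raised by 100

pop_var, pop_sd = pvariance(prices), pstdev(prices)     # divide by n
sample_var, sample_sd = variance(prices), stdev(prices) # divide by n - 1
shifted_var, shifted_sd = pvariance(shifted), pstdev(shifted)

print(pop_var, pop_sd)
print(sample_var, sample_sd)
print(shifted_var, shifted_sd)
```

Adding a constant moves every value and the mean by the same amount, so the deviations from the mean are unchanged.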

10. The following table shows the distribution of marks obtained by students in a test:

Marks (Range)    Frequency
0-20             3
21-40            7
41-60            12
61-80            6
81-100           2

• Estimate the variance and standard deviation of the marks.

11. The following are the exam scores of 20 students:

Data: 45, 50, 55, 60, 62, 63, 65, 68, 70, 72, 74, 75, 78, 80, 82, 83, 85, 88, 90, 95

• Construct a histogram for the data using class intervals of width 10.
• Describe the shape of the distribution (e.g., symmetric, skewed).
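
A quick text histogram for Question 11 can be produced as sketched below. The question fixes only the class width, so the bin edges 40-49, 50-59, ..., 90-99 are an assumption of this example.

```python
from collections import Counter

scores = [45, 50, 55, 60, 62, 63, 65, 68, 70, 72,
          74, 75, 78, 80, 82, 83, 85, 88, 90, 95]

# Map each score to the lower edge of its width-10 bin (e.g. 62 -> 60).
bin_counts = Counter((s // 10) * 10 for s in scores)

for start in sorted(bin_counts):
    print(f"{start}-{start + 9}: {'#' * bin_counts[start]}")
```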

12. Given the following data set of incomes (in thousands):

Data: 22, 25, 28, 30, 35, 40, 42, 45, 48, 50

• Calculate the quartiles (Q1, Q2, and Q3).


• Construct a quantile plot using the calculated quantiles.

13. The following are the heights (in cm) of 10 individuals:

Data: 150, 152, 154, 156, 158, 160, 162, 164, 166, 168

• Generate a Q-Q plot to check if the data follows a normal distribution.


• Interpret the Q-Q plot and discuss whether the data appears to be normally distributed.
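
The points of a normal Q-Q plot for Question 13 can be generated without a plotting library, as in the sketch below. It assumes the plotting positions (i − 0.5)/n, which is one common convention among several, and takes the theoretical quantiles from a normal distribution fitted with the sample mean and sample standard deviation.

```python
from statistics import NormalDist, mean, stdev

heights = [150, 152, 154, 156, 158, 160, 162, 164, 166, 168]
n = len(heights)
ordered = sorted(heights)

# Plotting positions (i - 0.5) / n for i = 1..n (an assumed convention)
probabilities = [(i - 0.5) / n for i in range(1, n + 1)]

# Theoretical quantiles of a normal distribution fitted to the sample
fitted = NormalDist(mean(heights), stdev(heights))
theoretical = [fitted.inv_cdf(p) for p in probabilities]

# Each (theoretical, observed) pair is one point on the Q-Q plot.
qq_points = list(zip(theoretical, ordered))
for t, o in qq_points:
    print(f"{t:7.2f}  {o}")
```

If the points lie close to the line y = x, the data are consistent with a normal distribution. Systematic curvature at the ends suggests tails that are lighter or heavier than normal.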

14. The following table provides data on hours studied and exam scores for 8 students:

Hours Studied    Exam Scores
5                50
6                55
7                60
8                65
9                72
10               74
11               80
12               85

• Create a scatter plot for the data.


• Comment on the relationship between hours studied and exam scores. Is there a positive
or negative correlation?

15. The following table provides data on the size of houses (in square feet), the number of bedrooms,
and the corresponding house prices (in thousands) for 6 houses:

Size (sq ft)    Number of Bedrooms    Price (in thousands)
1500            3                     300
1800            4                     360
2400            4                     480
3000            5                     600
3500            5                     700
4000            6                     800

Perform the following analyses:

• Calculate the correlation coefficient between the size of houses and their prices. Interpret
the result. Does it indicate positive, negative, or no correlation?
• Calculate the covariance between the size of houses and their prices.
• Calculate the covariance matrix for the variables: Size, Number of Bedrooms, and Price.
Interpret the signs of the covariances.
• Apply standardization to the house prices.
• Apply normalization (Min-Max scaling) to the house prices.
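
The analyses in Question 15 can be sketched with the standard library alone. The helper functions `covariance` and `correlation` are written out here for illustration. The example assumes the sample covariance (dividing by n − 1) and, for standardization, the population standard deviation. A course may use other conventions.

```python
from math import sqrt
from statistics import mean, pstdev

sizes = [1500, 1800, 2400, 3000, 3500, 4000]
bedrooms = [3, 4, 4, 5, 5, 6]
prices = [300, 360, 480, 600, 700, 800]

def covariance(xs, ys):
    """Sample covariance of two equal-length lists (divides by n - 1)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def correlation(xs, ys):
    """Pearson correlation coefficient."""
    return covariance(xs, ys) / sqrt(covariance(xs, xs) * covariance(ys, ys))

r_size_price = correlation(sizes, prices)

# Covariance matrix as a nested dict: cov_matrix["size"]["price"], etc.
variables = {"size": sizes, "bedrooms": bedrooms, "price": prices}
cov_matrix = {a: {b: covariance(xa, xb) for b, xb in variables.items()}
              for a, xa in variables.items()}

# Standardization (z-scores): subtract the mean, divide by the std. deviation.
mu, sigma = mean(prices), pstdev(prices)
z_scores = [(p - mu) / sigma for p in prices]

# Min-max scaling: map the smallest price to 0 and the largest to 1.
lo, hi = min(prices), max(prices)
min_max = [(p - lo) / (hi - lo) for p in prices]

print(r_size_price, cov_matrix["size"]["price"], min_max)
```

In this dataset every price equals exactly 0.2 times the size, which is why the correlation comes out at 1.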

16. Consider the following two data points representing the ratings of two users on seven different
movies:

User A: (5, 4, 3, 2, 1, 3, 5)

User B: (1, 2, 3, 4, 5, 2, 4)

• Compute the Manhattan distance, Euclidean distance and Minkowski distance (with h =
3) between the two users’ ratings across all seven movies.
• Discuss how these distance metrics reflect the similarity or dissimilarity between User A
and User B.
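
All three distances in Question 16 are special cases of the Minkowski distance of order h, with h = 1 giving Manhattan and h = 2 giving Euclidean. The helper name `minkowski` below is illustrative.

```python
def minkowski(u, v, h):
    """Minkowski distance of order h between equal-length vectors u and v."""
    return sum(abs(a - b) ** h for a, b in zip(u, v)) ** (1 / h)

user_a = (5, 4, 3, 2, 1, 3, 5)
user_b = (1, 2, 3, 4, 5, 2, 4)

manhattan = minkowski(user_a, user_b, 1)    # sum of absolute differences
euclidean = minkowski(user_a, user_b, 2)    # square root of summed squares
minkowski_3 = minkowski(user_a, user_b, 3)  # cube root of summed cubes

print(manhattan, euclidean, minkowski_3)
```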

17. Consider the following two data points representing house prices (in thousands), house sizes (in
square feet), number of bedrooms, and number of bathrooms:

House A: (250, 1800, 3, 2)

House B: (300, 2100, 4, 3)

• Calculate the Euclidean distance between the two houses without any feature scaling.
• Discuss the importance of feature scaling when calculating distance measures in machine
learning and re-calculate the Euclidean distance after scaling the features using min-max
normalization.
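
The effect of scaling in Question 17 can be seen in the sketch below. The question gives only two houses, so this example assumes that each feature's minimum and maximum come from those two houses alone. With a larger dataset the scaled values, and therefore the distance, would differ.

```python
from math import dist

house_a = (250, 1800, 3, 2)
house_b = (300, 2100, 4, 3)

# Unscaled: the size difference (300 sq ft) dominates the result.
raw_distance = dist(house_a, house_b)

# Assumption: per-feature min and max are taken from these two houses only.
columns = list(zip(house_a, house_b))

def scale(house):
    """Min-max scale each feature of a house to the range [0, 1]."""
    return [(v - min(c)) / (max(c) - min(c)) for v, c in zip(house, columns)]

scaled_distance = dist(scale(house_a), scale(house_b))

print(raw_distance, scaled_distance)
```

After scaling, each of the four features contributes equally to the distance instead of the largest-magnitude feature dominating.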

18. You are given the following data points representing two documents in a text classification task,
with each value representing the frequency of a certain term across six terms:

Document 1: (4, 2, 0, 3, 6, 1)

Document 2: (3, 1, 2, 4, 5, 0)

• Calculate the cosine similarity between the two documents across all six terms.
• Compute the Euclidean distance and discuss how it differs from cosine similarity in
interpreting document similarity.
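
For Question 18, cosine similarity is the dot product of the two term-frequency vectors divided by the product of their lengths. A minimal sketch, with illustrative variable names:

```python
from math import dist, sqrt

doc1 = (4, 2, 0, 3, 6, 1)
doc2 = (3, 1, 2, 4, 5, 0)

dot = sum(a * b for a, b in zip(doc1, doc2))
norm1 = sqrt(sum(a * a for a in doc1))
norm2 = sqrt(sum(b * b for b in doc2))

cosine_similarity = dot / (norm1 * norm2)  # depends only on direction
euclidean = dist(doc1, doc2)               # depends on magnitude as well

print(cosine_similarity, euclidean)
```

Cosine similarity ignores vector length, so a document and a copy of it with every term count doubled would score 1, while their Euclidean distance would be greater than zero.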

19. Consider the following binary vectors representing the presence (1) or absence (0) of certain
features for three users in a machine learning dataset:

User A: (1, 0, 1, 0, 1, 0, 1)

User B: (0, 1, 1, 0, 1, 1, 0)

User C: (1, 1, 0, 1, 0, 1, 0)

• Compute the Hamming distance between the following pairs:


– User A and User B
– User A and User C
– User B and User C
• Calculate the Jaccard’s coefficient between the following pairs:
– User A and User B
– User A and User C
– User B and User C
• Compare the results obtained from Hamming distance and Jaccard’s coefficient for the
three pairs of users.
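
The pairwise comparisons in Question 19 can be sketched as below. Hamming distance counts the positions where two vectors differ. Jaccard's coefficient divides the number of positions where both are 1 by the number where at least one is 1, so shared absences (0, 0) are ignored.

```python
from itertools import combinations

users = {
    "A": (1, 0, 1, 0, 1, 0, 1),
    "B": (0, 1, 1, 0, 1, 1, 0),
    "C": (1, 1, 0, 1, 0, 1, 0),
}

def hamming(u, v):
    """Number of positions at which u and v differ."""
    return sum(a != b for a, b in zip(u, v))

def jaccard(u, v):
    """Shared 1s divided by positions where either vector has a 1."""
    both = sum(a == 1 and b == 1 for a, b in zip(u, v))
    either = sum(a == 1 or b == 1 for a, b in zip(u, v))
    return both / either

results = {
    (p, q): (hamming(users[p], users[q]), jaccard(users[p], users[q]))
    for p, q in combinations(users, 2)
}

for pair, (h, j) in results.items():
    print(pair, h, round(j, 3))
```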

20. The following table provides data on the observed frequency of customers’ preference for three
types of products (A, B, and C) based on their income levels (Low, Medium, and High):

Income Level    Product A    Product B    Product C    Total
Low             20           30           50           100
Medium          40           40           20           100
High            40           20           40           100
Total           100          90           110          300

• Formulate the null hypothesis H0 and alternative hypothesis H1 for testing the independence
between Income Level and Product Preference.
• Calculate the expected frequencies for each cell under the assumption of independence.
• Perform the Chi-square (χ²) test by calculating the χ² statistic:
$$\chi^2 = \sum \frac{(O - E)^2}{E}$$
where O is the observed frequency and E is the expected frequency.
• Given a significance level of α = 0.05 and the appropriate degrees of freedom, compare the
calculated χ² statistic with the critical value from the χ² distribution table.
• Interpret the result and conclude whether there is a significant association between Income
Level and Product Preference.
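
The steps of Question 20 can be sketched as follows. Under independence, the expected count in each cell is (row total × column total) / grand total, and the degrees of freedom are (rows − 1) × (columns − 1). The critical value 9.488 is the standard table entry for α = 0.05 with 4 degrees of freedom. It is hard-coded here rather than computed.

```python
# Observed counts: rows are Low, Medium, High; columns are Products A, B, C.
observed = [
    [20, 30, 50],
    [40, 40, 20],
    [40, 20, 40],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected count under independence: row total * column total / grand total
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Chi-square statistic: sum of (O - E)^2 / E over every cell
chi2 = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

dof = (len(row_totals) - 1) * (len(col_totals) - 1)
critical_value = 9.488  # chi-square table, alpha = 0.05, 4 degrees of freedom
reject_h0 = chi2 > critical_value

print(expected)
print(chi2, dof, reject_h0)
```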
