
Sample_Exam_ML4DT-revised

The document is a sample exam focused on machine learning techniques for data types, covering Natural Language Processing (NLP), Computer Vision, and Time Series (TS). It includes multiple-choice questions related to document-term matrices, logistic regression, neural networks, and time series forecasting methods. Each section tests understanding of key concepts and calculations in machine learning applications.

Uploaded by

김건우
Copyright
© All Rights Reserved

Machine Learning for Data Types

Sample Exam
Part 1 – NLP (18 points)
Consider the dataset below, made of 6 documents.

The first 4 documents are labelled 0 (negative) and the last 2 are labelled
1 (positive).
Assume that we have preprocessed the corpus by lowercasing the documents,
removing the punctuation, and stemming all the words.
Question 1 (3 points)
What are the dimensions (shape) of the Document-Term Matrix generated from
the pre-processed dataset above?
a. (17,6)
b. (6,17)
c. (6,1)
d. (1,6)
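As a minimal sketch of how a Document-Term Matrix is laid out, the snippet below builds one for a hypothetical three-document toy corpus (not the exam dataset). It assumes the common convention of one row per document and one column per vocabulary term; the names `docs`, `vocab` and `dtm` are illustrative only.

```python
from collections import Counter

# Hypothetical, already-preprocessed toy corpus (NOT the exam dataset).
docs = ["new lm need data", "lm need gpu", "data need clean"]

tokenized = [d.split() for d in docs]

# The vocabulary is the sorted set of distinct tokens across all documents.
vocab = sorted({tok for doc in tokenized for tok in doc})

# One row per document, one column per vocabulary term (raw counts).
dtm = [[Counter(doc)[term] for term in vocab] for doc in tokenized]

shape = (len(dtm), len(vocab))
print(vocab)
print(shape)
```

Under this convention the shape is (number of documents, number of distinct terms).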
Question 2 (3 points)
What would be the weight of the token ‘lm’ in document 0 if we apply TF-IDF?
Use the natural log.
a. 2
b. 1
c. 0.69
d. 0.333
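The calculation can be sketched as follows, assuming the unsmoothed textbook variant: raw term count multiplied by ln(N / df). Other variants (for example with smoothing or length normalisation) give different numbers. The function name `tf_idf` and the input values are hypothetical and are not taken from the exam dataset.

```python
import math


def tf_idf(term_count, n_docs, doc_freq):
    """Raw term count times ln(N / df), with no smoothing or normalisation."""
    return term_count * math.log(n_docs / doc_freq)


# Hypothetical values: a term that occurs twice in a document
# and appears in 2 of the 10 documents in a corpus.
weight = tf_idf(term_count=2, n_docs=10, doc_freq=2)
print(round(weight, 3))
```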
Question 3 (3 points)
Assume that we work with a logistic regression model to learn a classifier for
the data above. What would be the shape of the weight vector W to be learned
during training?
a. (6,1)
b. (17,6)
c. (6,17)
d. (17,1)
e. (1,17)
f. (1,6)
Question 4 (3 points)
What is the total number of parameters learnable as part of this model?
a. 6
b. 7
c. 17
d. 18
Question 5 (3 points)
Assume that we initialize all trainable parameters of the logistic regressor
(weight vector and bias) to 1. What value would the model predict for the first
document, ‘new lm mani featur need’?
a. 6
b. 1
c. 0.95
d. 0.85
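A minimal sketch of the forward pass of a logistic regressor on this document, assuming the features are raw term counts from the Document-Term Matrix (an assumption; TF-IDF features would give a different value). Terms absent from the document have a count of 0 and contribute nothing to the sum, so only the tokens present are listed.

```python
import math


def sigmoid(z):
    """Logistic function mapping a real-valued score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


doc = "new lm mani featur need".split()

# Raw count of each token in the document (assumed feature encoding).
counts = {}
for tok in doc:
    counts[tok] = counts.get(tok, 0) + 1

# All weights and the bias are initialised to 1, as stated in the question.
weights = {tok: 1.0 for tok in counts}
bias = 1.0

z = sum(weights[tok] * c for tok, c in counts.items()) + bias
p = sigmoid(z)
print(z, round(p, 4))
```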
Question 6 (3 points)
Assume that we learn the following weights for some of the features in our
dataset:
[0.02, 3.04, 1.5, -3.7, -5.2, ..]
Which of these weights is most likely to be assigned to the feature ‘mani’?
a. 0.02
b. 3.04
c. 1.5
d. -3.7
Part 2 – Computer Vision (26 points)
Consider a dataset of 2000 flower images, with the dimensions (6,6,3).
The images are labelled as either ‘lavender’ or ‘lotus’, with an even
distribution across the two classes.
Assume that we wish to train a fully connected deep neural network (FC DNN) on
this dataset to learn a model that can classify new images.
Question 1 (3 points)
How many input features should your FC DNN expect?
a. 2000
b. 36
c. 108
d. 216000
Question 2 (3 points)
Assume we use an FC DNN with two hidden layers, hl1 and hl2, with 16 and 8
neurons, respectively.
What are the shapes of the weight matrices W[1], W[2] and W[3]?
a. W[1]=(16,108) W[2] =(8,16) and W[3]=(1,8)
b. W[1]=(16,108) W[2] =(16,8) and W[3]=(8,1)
c. W[1]=(108,1) W[2] =(16,1) and W[3]=(1,1)
d. None of the above
Question 3 (3 points)
What is the shape of the bias vectors b[1], b[2] and b[3]?
a. b[1]=(108,1) b[2] =(8,1) and b[3]=(1,1)
b. b[1]=(108,1) b[2] =(8,1) and b[3]=(1,1)
c. b[1]=(16,1) b[2] =(8,1) and b[3]=(1,1)
d. b[1]=(1,16) b[2] =(1,8) and b[3]=(1,1)
e. None of the above
Question 4 (3 points)
What is the total number of parameters to be learned when training this model?
a. 1889
b. 1864
c. 2000
d. 216000
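The bookkeeping behind the shape and parameter-count questions can be sketched as below. It assumes the (n_out, n_in) convention for weight matrices, column-vector biases, and a single output neuron for the binary task; the single output neuron is an assumption, and the helper names are hypothetical.

```python
def dense_shapes(layer_sizes):
    """Return (weight shape, bias shape) per layer, using (n_out, n_in)."""
    return [((n_out, n_in), (n_out, 1))
            for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]


def count_params(layer_sizes):
    """Total weights plus biases of a fully connected network."""
    return sum(n_out * n_in + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


# A (6, 6, 3) image flattens to 6 * 6 * 3 input features.
n_inputs = 6 * 6 * 3

# Input, hl1 (16 neurons), hl2 (8 neurons), one output neuron (assumed).
layers = [n_inputs, 16, 8, 1]

shapes = dense_shapes(layers)
total = count_params(layers)
print(n_inputs, shapes, total)
```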
Question 5 (3 points)
Assume we shift from a dense neural network (FC DNN) to a Convolutional
Neural Network (CNN).
We first add a convolution layer, in which we use 8 kernels, of the shape (3,3).
We use no padding, and both the horizontal and vertical strides are 1.
How many feature maps will this layer output?
a. 1
b. 2000
c. 8
d. 9
Question 6 (3 points)
What is the shape of any feature map produced by the layer above?
a. (4,4)
b. (3,3)
c. (6,6)
d. The output feature maps are not all the same size
Question 7 (3 points)
Assume that after applying the convolutional layer defined above, we flatten
its output and pass it to a DNN.
How many input features should our DNN expect?
a. 216
b. 128
c. 9
d. 72
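The size arithmetic for the convolution layer can be sketched with the standard output-size formula, floor((n + 2p - k) / s) + 1, applied to each spatial dimension. The helper name `conv_output_size` is hypothetical. The sketch assumes each kernel spans all input channels and therefore yields one 2-D feature map.

```python
def conv_output_size(n, k, padding=0, stride=1):
    """Spatial output size of a convolution along one dimension."""
    return (n + 2 * padding - k) // stride + 1


n_kernels = 8  # one feature map per kernel (assumed)
side = conv_output_size(n=6, k=3, padding=0, stride=1)

feature_map_shape = (side, side)
flattened = n_kernels * side * side
print(feature_map_shape, flattened)
```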

Question 8 (5 points)

Assume that we have a dataset of two images, A and B, with labels 1 and 0,
respectively.
A black pixel has the value 0 and a white pixel has the value 1. Notice that
both pixels in the lower row of B have the value 0.
Assume that we also have the following neural network architecture:

All weights and biases are initialized with the value 0.5. What will the network
predict in the first iteration?
The flattening follows a row-major approach rather than a column-major approach
(this simply means we’re stacking rows on top of each other when flattening).
For your convenience, the equations of the forward pass are available below:

a. 0.78 for A and 0.78 for B
b. 0.78 for A and 0 for B
c. 0 for A and 0.78 for B
d. Both A and B will have a prediction of 0
Part 3 – TS (23 points)
Exercise 1 (5 points)
Are the following statements true or false? Explain your answer.
a. Good forecast methods should have normally distributed residuals.
• True
• False
b. A model with small residuals will give good forecasts.
• True
• False
c. The best measure of forecast accuracy is MAPE.
• True
• False
d. If your model doesn’t forecast well, you should make it more
complicated.
• True
• False
e. Always choose the model with the best forecast accuracy as measured
on the test set.
• True
• False
Exercise 2 (3 points)
Which of the following is not a component of time series?
a. Seasonal variation
b. Trend
c. Variance
d. Cyclical variation
Exercise 3 (3 points)
The number of cars sold by a car dealer during the first 6 months of 2022 was
as follows:
Month:      January  February  March  April  May  June
Cars sold:  18       16        28     51     47   55
What is the first 3-month moving average?
a. 20.67
b. 51.00
c. 42.00
d. 31.67
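A minimal sketch of a simple (unweighted) moving average over the sales figures above. The helper name `moving_average` is hypothetical. The first value averages January to March, and the window then slides forward one month at a time.

```python
sales = [18, 16, 28, 51, 47, 55]  # January to June 2022


def moving_average(values, window):
    """Simple moving averages over every full window of the given length."""
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


ma3 = [round(v, 2) for v in moving_average(sales, 3)]
print(ma3)
```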
Exercise 4 (3 points)
Data recording events over a period of time is called a:
a. Time Series
b. Random Sample
c. Moving Average
d. Frequency Distribution
Exercise 5 (3 points)
Find the order of the ARIMA process
$X_t = 0.4X_{t-1} - 0.2X_{t-2} + 0.15X_{t-3} + Z_t + 0.5Z_{t-1} - 0.3Z_{t-2}$
a. ARIMA(2, 1, 3)
b. ARIMA(2, 0, 3)
c. ARIMA(3, 1, 2)
d. ARIMA(3, 0, 2)
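One way to read off the order is sketched below. The number of lagged X terms gives p and the number of lagged Z terms gives q. The autoregressive polynomial is then evaluated at B = 1 to check for a unit root, and a non-zero value means no differencing factor (1 - B) is present, so d = 0. The variable names are hypothetical, and the sketch only handles the case without a unit root.

```python
# Coefficients on X_{t-1}, X_{t-2}, X_{t-3} and on Z_{t-1}, Z_{t-2}.
ar = [0.4, -0.2, 0.15]
ma = [0.5, -0.3]

# AR polynomial phi(B) = 1 - 0.4B + 0.2B^2 - 0.15B^3, evaluated at B = 1.
phi_at_one = 1 - sum(ar)

# A value of zero would signal a (1 - B) factor, i.e. differencing (d > 0).
has_unit_root = abs(phi_at_one) < 1e-9

if not has_unit_root:
    order = (len(ar), 0, len(ma))
    print(order)
```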
Exercise 6 – Open Question (3 points)
Show that a 3×5 MA is equivalent to a 7-term weighted moving average with
weights of 0.067, 0.133, 0.200, 0.200, 0.200, 0.133, and 0.067.
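A numerical check of this claim can be sketched as follows, reading a 3×5 MA as a 3-term moving average applied to a 5-term moving average. The weights of the combined filter are then the convolution of the two sets of equal weights. The helper name `convolve` is hypothetical, and the check complements an algebraic argument rather than replacing it.

```python
def convolve(a, b):
    """Discrete convolution of two weight sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


ma5 = [1 / 5] * 5  # weights of the 5-term moving average
ma3 = [1 / 3] * 3  # weights of the 3-term moving average

# Weights of the combined 7-term filter: 1/15, 2/15, 3/15, 3/15, 3/15, 2/15, 1/15.
weights = [round(w, 3) for w in convolve(ma5, ma3)]
print(weights)
```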
Exercise 7 – Open Question (3 points)
Figures 1 and 2 show the result of decomposing the number of persons in the
civilian labour force in Australia each month from February 1978 to August
1995.
Figure 1. Decomposition of the number of persons in the civilian labour force in
Australia each month from February 1978 to August 1995.
Figure 2. Seasonal component from the decomposition shown in the previous
figure.
a. Write about 3–5 sentences describing the results of the decomposition. Pay
particular attention to the scales of the graphs in making your interpretation.
b. Is the recession of 1991/1992 visible in the estimated components?
