Chartered Data Scientists Practice Paper: Section 1: Probability Theory, Statistics and Linear Algebra
The questions below give a sense of the exam that you can expect on the day of your CDS registration.
Please use this practice paper as a yardstick, but expect the difficulty level to go up on the exam
day.
2. The chance that doctor A will diagnose a disease X correctly is 60 per cent. The chance that a
patient will die by his treatment after a correct diagnosis is 40 per cent, and the chance of death
by the wrong diagnosis is 70 per cent. A patient of doctor A, who had disease X, died. What is
the probability that this disease was diagnosed correctly?
i. 6/25 ii. 7/25 iii. 6/7 iv. 6/13
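Question 2 is an application of Bayes' theorem. The following is a minimal sketch of the calculation using exact fractions; the variable names are illustrative and not part of the question.

```python
from fractions import Fraction

# Probabilities stated in the question
p_correct = Fraction(60, 100)              # doctor A diagnoses disease X correctly
p_death_given_correct = Fraction(40, 100)  # death after a correct diagnosis
p_death_given_wrong = Fraction(70, 100)    # death after a wrong diagnosis

# Law of total probability: overall chance that the patient dies
p_death = (p_correct * p_death_given_correct
           + (1 - p_correct) * p_death_given_wrong)

# Bayes' theorem: P(correct diagnosis | patient died)
posterior = p_correct * p_death_given_correct / p_death
print(posterior)  # 6/13
```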
3. If $X_1, X_2, \ldots, X_n$ be a sequence of mutually independent random variables where $X_i$ can take only
4. Which of the following is the correct formula for calculating the mean of a discrete series by
direct method?
i. $\bar{X} = \sum X / \sum f$ ii. $\bar{X} = \sum_i f_i X_i / \sum_i f_i$
iii. $\bar{X} = A + \sum_i f_i dx_i / \sum f_i$ iv. None of these
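The direct method divides the frequency-weighted sum of the values by the total frequency. A small sketch follows; the series used is hypothetical and only serves to illustrate the arithmetic.

```python
from fractions import Fraction

def direct_mean(values, freqs):
    """Mean of a discrete series by the direct method: sum(f_i * x_i) / sum(f_i)."""
    total_fx = sum(f * x for x, f in zip(values, freqs))
    return Fraction(total_fx, sum(freqs))

# Hypothetical series: each value paired with its frequency
values = [10, 20, 30]
freqs = [2, 3, 5]

mean = direct_mean(values, freqs)
print(float(mean))  # 23.0
```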
i. N/2 ii. N-1 iii. N iv. 2N
6. Let A be the 2 × 2 matrix with elements $a_{11} = a_{12} = a_{21} = +1$ and $a_{22} = -1$. Then the eigenvalues of the
matrix $A^{19}$ are:
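Assuming question 6 asks for the eigenvalues of the 19th power of A, a short check is possible without any library. The sketch below multiplies the matrix out with integers; the helper `matmul2` is illustrative.

```python
def matmul2(p, q):
    """Multiply two 2x2 integer matrices given as nested lists."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[1, 1], [1, -1]]

# Raise A to the 19th power by repeated multiplication
power = A
for _ in range(18):
    power = matmul2(power, A)

# A times A equals 2I, so A**19 = 2**9 * A = 512 * A
print(power)  # [[512, 512], [512, -512]]

# A has trace 0 and determinant -2, so its eigenvalues are +sqrt(2) and -sqrt(2);
# the eigenvalues of A**19 are therefore +512*sqrt(2) and -512*sqrt(2).
```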
7. In a class, there are 15 boys and 10 girls. Three students are selected at random. The
probability that 1 girl and 2 boys are selected, is:
i. 21/46 ii. 25/117 iii. 1/50 iv. 3/25
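Question 7 is a counting problem: choose 1 of the 10 girls and 2 of the 15 boys, out of all ways to choose 3 of 25 students. A minimal sketch with the standard library:

```python
from fractions import Fraction
from math import comb

boys, girls, picks = 15, 10, 3

# Ways to pick exactly 1 girl and 2 boys
favourable = comb(girls, 1) * comb(boys, 2)
# Ways to pick any 3 students from the class
total = comb(boys + girls, picks)

probability = Fraction(favourable, total)
print(probability)  # 21/46
```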
8. Three unbiased coins are tossed. What is the probability of getting at most two heads?
i. 3/4 ii. 1/4 iii. 3/8 iv. 7/8
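The three-coin question can be checked by enumerating all eight equally likely outcomes. This is a brute-force sketch rather than the shortest route (which is 1 minus the probability of three heads).

```python
from fractions import Fraction
from itertools import product

# All outcomes of tossing three unbiased coins
outcomes = list(product("HT", repeat=3))

# Keep the outcomes with at most two heads
at_most_two = [o for o in outcomes if o.count("H") <= 2]

probability = Fraction(len(at_most_two), len(outcomes))
print(probability)  # 7/8
```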
10. If X and Y are two random variables such that their expectations exist and P(X ≤ Y) = 1, which
of the following is correct?
i. E(X) ≤ E(Y)
ii. E(X) ≥ E(Y)
iii. E(X) = E(Y)
iv. None of the above
11. If the rank of a 5 x 6 matrix Q is 4, then which one of the following is true?
i. Q will have four linearly independent rows and four linearly independent columns
ii. Q will have four linearly independent rows and five linearly independent columns
iii. $QQ^T$ will be invertible
13. In a lottery, there are 10 prizes and 25 blanks. A lottery is drawn at random. What is the
probability of getting a prize?
i. 1/10 ii. 2/5 iii. 2/7 iv. 5/7
16. t-distribution is used to test:
i. The validity of a postulated value of the population mean
ii. To test the significance of sample correlation coefficient
iii. To test the equality of two population means
iv. All of the above
18. Two eigenvalues of a 3 x 3 real matrix P are (2 + √-1) and 3. What is the determinant of P?
i. 0 ii. 1 iii. 15 iv. -1
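For question 18, the complex eigenvalues of a real matrix come in conjugate pairs, so the third eigenvalue must be 2 − √−1, and the determinant is the product of all three. A small sketch of that product:

```python
# Eigenvalues of P: the given pair member, its complex conjugate, and 3
eigenvalues = [complex(2, 1), complex(2, -1), 3]

# The determinant equals the product of the eigenvalues
determinant = 1
for value in eigenvalues:
    determinant *= value

print(determinant)  # (15+0j)
```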
Answers of Section 1:
1. i 2. iv 3. iii 4. ii 5. iii 6. iv 7. i 8. iv 9. ii 10. i 11. i 12. i
13. iii 14. ii 15. iii 16. iv 17. i 18. iii
Section 2: Data Engineering and Database
1. Consider a relational table with a single record for each registered student with the following
attributes.
a. Registration_Number: Unique registration number for each registered student
b. UID: Unique Identity number, unique at the national level for each citizen
c. BankAccount_Number: Unique account number at the bank. A student can have multiple
accounts or joint accounts. This attribute stores the primary account number
d. Name: Name of the Student
e. Hostel_Room: Room number of the hostel
ii. The required data are located in at least one nonlocal site and the distributed DBMS routes
request as necessary.
iii. The required data are at one local site and the distributed DBMS passes the request to only
the local DBMS.
iv. The required data are located in at least one nonlocal site and the distributed DBMS passes
the request to only the local DBMS.
11. Domain constraints, functional dependency and referential integrity are special forms of:
i. Foreign Key ii. Primary Key iii. Assertion iv. Referential Constraint
Answers of Section 2:
1. i 2. iii 3. ii 4. i 5. iv 6. ii 7. iii 8. iv 9. iii 10. ii 11. iii 12. i
Section 3: Exploratory Data Analysis
1. A person has been deputed to find the average income of the factory employees. To provide the
correct picture of average income, the person should find out
i. Geometric Mean ii. Weighted Mean
iii. Progressive Mean iv. Arithmetic Mean
3. The percentage of items in a frequency distribution lying between upper and lower quartiles is:
i. 80 per cent ii. 40 per cent iii. 50 per cent iv. 25 per cent
6. Proportions of males and females in India in different occupations in the year 2019 can most properly
be represented by:
i. Sliding bar diagram ii. Deviation bar diagram
iii. Sub-divided bar diagram iv. Multiple bar diagram
7. The data relating to the number of registered allopathic and homoeopathic doctors in six different
states can be most appropriately represented by the diagram:
i. Line graph ii. Histogram iii. Pie-chart iv. Double bar diagram
8. When for some countries, the magnitudes are small and for others, the magnitudes are very large, to
represent the data, it is preferred to construct:
i. Deviation bar diagram ii. Duo-directional bar diagram
iii. Broken bar diagram iv. Any of these
12. The immigration and outmigration of people in a number of countries and also the net migration can
be better displayed by:
i. Duo-directional column chart ii. Gross-deviation column chart
iii. Net deviation column chart iv. Range chart
Answers of Section 3:
1. ii 2. iii 3. iii 4. ii 5. ii 6. i 7. iv 8. iii 9. iii 10. ii 11. i 12. ii
Section 4: Supervised Learning and Unsupervised Learning
1. Which of the following statements is not correct about regularization?
i. Using too large a value of lambda can cause your hypothesis to underfit the data.
ii. Using too large a value of lambda can cause your hypothesis to overfit the data.
iii. Using a very large value of lambda cannot hurt the performance of your hypothesis.
iv. None of the above
2. How can you avoid the bad local optima issue while running a clustering algorithm?
i. Set the same seed value for each run
ii. Use multiple random initializations
iii. Both i and ii
iv. None of the above
3. In which of the following cases will K-means clustering fail to give good results?
A. Data points with outliers
B. Data points with different densities
C. Data points with nonconvex shapes
4. Which of the following is a reasonable way to select the number of principal components "k"?
i. Choose k to be the smallest value so that at least 99% of the variance is retained.
ii. Choose k to be 99% of m (k = 0.99*m, rounded to the nearest integer).
iii. Choose k to be the largest value so that 99% of the variance is retained.
iv. Use the elbow method
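The "smallest k that retains at least 99% of the variance" rule can be sketched as a cumulative sum over the per-component variances. The function name and the variance values below are hypothetical; in practice the variances would come from the eigenvalues of the covariance matrix.

```python
def smallest_k(variances, retained=0.99):
    """Return the smallest k whose leading components keep `retained` of the variance."""
    ordered = sorted(variances, reverse=True)
    total = sum(ordered)
    running = 0.0
    for k, variance in enumerate(ordered, start=1):
        running += variance
        if running / total >= retained:
            return k
    return len(ordered)

# Hypothetical per-component variances
variances = [50.0, 30.0, 15.0, 4.5, 0.4, 0.1]
k = smallest_k(variances)
print(k)  # 4
```

With these values the first three components keep 95% of the variance, so a fourth is needed to reach 99%.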
5. You run gradient descent for 15 iterations with ɑ=0.3 and compute J(Θ) after each iteration.
You find that the value of J(Θ) decreases quickly and then levels off. Based on this, which of the
following conclusions seems most plausible?
i. Rather than using the current value of ɑ, use a larger value of ɑ (say ɑ=1.0)
ii. Rather than using the current value of ɑ, use a smaller value of ɑ (say ɑ=0.1)
iii. ɑ=0.3 is an effective choice of learning rate
iv. None of the above
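The behaviour described in question 5 can be reproduced on a toy problem. The sketch below assumes a simple quadratic cost J(theta) = theta**2, which is not the cost from the question, and a hypothetical starting point of 5.0.

```python
def gradient_descent(alpha, iterations=15, theta=5.0):
    """Minimise the toy cost J(theta) = theta**2 and record J after each step."""
    costs = []
    for _ in range(iterations):
        gradient = 2 * theta       # dJ/dtheta for J = theta**2
        theta -= alpha * gradient  # gradient descent update
        costs.append(theta ** 2)
    return costs

costs = gradient_descent(alpha=0.3)
# J drops quickly in the first iterations and then levels off near zero
print(costs[0], costs[-1])
```

On this toy cost, alpha=0.3 shrinks J by a constant factor at every iteration, whereas alpha=1.0 makes theta flip sign each step so that J never decreases.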
6. Let a feature P take the values 1, 2, 3 and 4, representing platform numbers at a
railway station. Which of the following statements is true?
i. Feature P is an example of nominal variable
ii. Feature P is an example of ordinal variable
iii. Both i and ii.
iv. Neither i nor ii.
A. By running the algorithm for different centroid initialization
B. By adjusting the number of iterations
C. By finding out the optimal number of clusters
i. All of these ii. Only A and B
iii. Only B and C iv. Only A and C
9. What is the consequence between a node and its predecessors while creating a Bayesian
network?
i. Functionally dependent
ii. Dependent
iii. Conditionally independent
iv. Both conditionally dependent and independent
10. Computational learning theory analyzes the sample complexity and computational
complexity of:
i. Unsupervised Learning ii. Inductive learning
iii. Forced based learning iv. Weak learning
14. While performing regression or classification, which of the following is the correct way to
preprocess the data?
i. Normalize the data → PCA → normalize PCA output → training
ii. Normalize the data → PCA → training
iii. PCA → normalize PCA output → training
iv. None of the above
iv. None of the above
17. The factors which affect the performance of a learner system do not include:
i. Representation scheme used
ii. Training scenario
iii. Type of feedback
iv. Good data structures
22. Which of the following is a widely used and effective model for machine learning based on
the concept of bagging?
i. Decision Tree ii. K-Nearest Neighbors iii. AdaBoost iv. Random Forest
23. To find the minimum or the maximum of a function, we set the gradient to zero because:
i. The value of the gradient at extrema of a function is always zero
ii. Depends on the type of problem
iii. Both i and ii
iv. None of the above
Answers of Section 4:
1. iv 2. ii 3. iv 4. i 5. iii 6. ii 7. ii 8. i 9. iii 10. ii 11. iv 12. iv
13. iii 14. ii 15. iv 16. iii 17. iv 18. iv 19. i 20. i 21. ii 22. iv 23. i 24. iii
Section 5: Neural Networks and Deep Learning
4. Which of the following statements is true when you use 1 x 1 convolution in a Convolutional
Neural Network?
i. It can help in dimensionality reduction
ii. It can be used for feature pooling
iii. It suffers less overfitting due to small kernel size
iv. All of the above.
6. Suppose we wish to predict the probabilities of n classes $(p_1, p_2, \ldots, p_k)$ such that the sum of p over
all n equals 1. Which of the following activation functions should be used at the output layer of
the neural network?
i. Softmax ii. ReLU iii. Sigmoid iv. Tanh
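Softmax is the activation that turns arbitrary output scores into positive values summing to 1. A minimal pure-Python sketch follows; the input scores are hypothetical.

```python
import math

def softmax(scores):
    """Map raw scores to probabilities that are positive and sum to 1."""
    shift = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])  # hypothetical output-layer scores
print(round(sum(probs), 6))  # 1.0
```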
7. Ability to learn how to do tasks based on the data given for training or initial experience is called:
i. Self-organization ii. Adaptive learning
iii. Auto Associativity iv. None of these
8. What is the name of the neural network model given in the following diagram?
i. The Rosenblatt Perceptron Model ii. McCulloch-Pitts Model
iii. Widrow’s Adaline Model iv. Autoassociative Network
9. What is the fundamental difference between the Adaline and the perceptron model?
i. Weights are compared with output
ii. Sensory units result is compared with output
iii. Analog activation value is compared with output
iv. All of the above
10. Which of the following elements is not available in the architecture of a Gated Recurrent Unit (GRU), a
variant of LSTM Recurrent Neural Network?
i. Cell ii. Input Gate iii. Output Gate iv. Forget Gate
13. What are the issues where biological neural networks prove to be superior to artificial neural
networks?
i. Robustness and fault tolerance ii. Flexibility
iii. Collective computation iv. All of these
14. The activation function of a neuron in the neural network can be:
i. Linear ii. Non-linear iii. Can be linear or non-linear iv. None of these
i. Adaptive Least Infinite Network ii. Adaptive Linear Network
iii. Automatic Linear Element iv. Adaptive Linear Element
Answers of Section 5:
1. i 2. iv 3. i 4. iv 5. iii 6. i 7. ii 8. ii 9. iii 10. iii 11. iii 12. ii
13. iv 14. iii 15. iv 16. iii 17. ii 18. i
Section 6: Natural Language Processing
1. Which of the following techniques can be used for the purpose of keyword normalization where a
keyword is converted into its base form?
A. Lemmatization
B. Stemming
C. Levenshtein
i. Only A and B ii. Only B and C iii. Only A and C iv. All of these
3. In the Latent Dirichlet Allocation model for text classification purposes, what do the alpha and beta
hyperparameters represent?
i. Alpha is the number of topics within documents and beta is the number of terms within topics
ii. Alpha is the density of terms generated within topics and beta is the density of topics
generated within terms
iii. Alpha is the number of terms within documents and beta is the number of terms within topics
iv. Alpha is the density of topics generated within documents and beta is the density of terms
generated within topics
6. Which of the following models can be used for the purpose of document similarity?
i. Training a word2vec model on the corpus that learns the context present in the document
ii. Training a bag of words model that learns the occurrence of words in document
iii. Creating a document-term matrix and using cosine similarity for each document
iv. All of the above
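Option iii of question 6 (a document-term matrix with cosine similarity) can be sketched with term counts alone. The function below is a simplified illustration that tokenises on whitespace; the sample documents are hypothetical.

```python
import math
from collections import Counter

def cosine_similarity(doc_a, doc_b):
    """Cosine similarity between two documents using raw term counts."""
    counts_a = Counter(doc_a.lower().split())
    counts_b = Counter(doc_b.lower().split())
    # Dot product over shared terms (missing terms count as zero)
    dot = sum(counts_a[term] * counts_b[term] for term in counts_a)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

similar = cosine_similarity("the cat sat on the mat", "the cat lay on the rug")
unrelated = cosine_similarity("the cat sat", "dogs bark loudly")
print(similar > unrelated)  # True
```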
7. What can be the role of natural language processing in collaborative filtering and content-
based filtering algorithms?
i. Feature Extraction from text
ii. Measuring Feature Similarity
iii. Engineering Features for vector space learning model
iv. All of these
8. What is the main difference between Conditional Random Field (CRF) and Hidden Markov Model
(HMM)?
i. CRF is Generative whereas HMM is Discriminative model
ii. CRF is Discriminative whereas HMM is Generative model
iii. Both CRF and HMM are Generative model
iv. Both CRF and HMM are Discriminative model
10. In which of the following areas can natural language processing be applied?
i. Automatic text summarization
ii. Automatic question-answering system
iii. Information retrieval
iv. All of the above
11. In linguistic morphology, which of the following is the process for reducing inflected words to their
root form?
i. Stemming ii. Rooting iii. Text-proofing iv. Both i and ii.
12. Which of the following techniques is not part of the flexible text matching?
i. Soundex ii. Metaphone iii. Edit Distance iv. Keyword Hashing
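Edit distance, one of the flexible matching techniques listed in question 12, counts the minimum number of single-character insertions, deletions and substitutions needed to turn one string into another. A compact dynamic-programming sketch of the Levenshtein variant:

```python
def edit_distance(source, target):
    """Levenshtein distance between two strings, computed row by row."""
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        current = [i]
        for j, t_char in enumerate(target, start=1):
            cost = 0 if s_char == t_char else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]

print(edit_distance("kitten", "sitting"))  # 3
```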
Answers of Section 6:
1. i 2. iv 3. iv 4. iii 5. i 6. iv 7. iv 8. ii 9. iv 10. iv 11. i 12. iv
Section 7: Computer Vision
1. Consider an input signal x = (2, 80, 6, 3) filtered with a window of size three (one entry
immediately preceding and one following each entry), with edge values repeated at the boundaries.
What will be the output signal y if we are using a median filter?
i. y = (2, 6, 80 ,3)
ii. y = (3, 6, 6, 3)
iii. y = (2, 6, 6, 3)
iv. y = (2, 6, 6, 2)
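A window-3 median filter can be sketched in a few lines. The version below assumes that boundary windows are filled by repeating the edge value, which is one common convention and an assumption here; it is applied to the sample signal (2, 80, 6, 3).

```python
from statistics import median

def median_filter(signal, window=3):
    """Median filter with edge values repeated to fill boundary windows."""
    half = window // 2
    padded = [signal[0]] * half + list(signal) + [signal[-1]] * half
    return [median(padded[i:i + window]) for i in range(len(signal))]

print(median_filter([2, 80, 6, 3]))  # [2, 6, 6, 3]
```

The outlier 80 is replaced by the median of its neighbourhood, which is why median filters are effective against impulse noise.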
2. Which of the following steps may be used to avoid boundary issues in image processing?
A. Avoid processing the boundaries, with or without cropping the signal or image boundary
afterwards
B. Fetching entries from other places in the signal. With images, for example, entries from
the far horizontal or vertical boundary might be selected
C. Shrinking the window near the boundaries, so that every window is full
i. All of the above ii. Only A and B
iii. Only B and C iv. Only A and C
4. Taking absolute values to avoid negative values in the Laplacian image doubles the:
i. Thickness of line ii. Thinness of line
iii. Thickness of edge iv. Thinness of edge
5. If pixels in the image are very different in colour or intensity from their surrounding pixels; the
defining characteristic is that the value of a noisy pixel bears no relation to the colour of
surrounding pixels. What is the type of noise present in the image?
i. Gaussian noise ii. Salt and pepper noise
iii. Shot noise iv. Quantization noise
6. In _____________, each pixel in the image will be changed from its original value by a (usually)
small amount. A histogram, a plot of the amount of distortion of a pixel value against the
frequency with which it occurs, shows a normal distribution of noise.
i. Gaussian noise ii. Salt and pepper noise
iii. Shot noise iv. Quantization noise
7. One method to remove noise is by convolving the original image with a mask that represents
a _____________ or smoothing operation.
i. Band-pass filter ii. High-pass filter
iii. Low-pass filter iv. Narrow-pass filter
8. The method to remove noise by evolving the image under a smoothing partial differential
equation similar to the heat equation is called:
i. Linear smoothening filtering
ii. Non-local means
iii. Anisotropic diffusion
iv. None of the above
10. A continuous image is digitised at ___________ points.
i. Random ii. Vertex iii. Contour iv. Sampling
11. What is the term for the transition between continuous values of the image function
and its digital equivalent?
i. Sampling ii. Quantization iii. Rasterization iv. None of these.
12. The dynamic range of the imaging system is a ratio where the upper limit is determined by:
i. Saturation ii. Noise iii. Brightness iv. Contrast
Answers of Section 7:
1. iii 2. i 3. iv 4. i 5. ii 6. i 7. iii 8. iii 9. i 10. iv 11. ii 12. i
Section 8: Deployment and Model Management
1. Which of the following may be reasons why a model that needs thousands of features to attain
an accuracy of more than 90% on evaluation might not be good enough for deployment?
A. Portability
B. Scalability
C. Operationalization
i. All of the above ii. Only A and B
iii. Only B and C iv. Only A and C
2. Which of the following are ways of training machine learning models in production?
A. One-off
B. Batch
C. Real-Time/Online
i. All of the above ii. Only A and B
iii. Only B and C iv. Only A and C
4. What is the application of gain and lift charts in machine learning model evaluation?
i. It measures the performance of classification models
ii. It measures the performance of local resources
iii. It checks the rank ordering of the probabilities
iv. None of these
5. Which of the following machine learning architectures will be employed when training and
persisting are done offline while prediction is done in real-time?
i. Train by batch, predict by batch, serve through a shared database
ii. Train by batch, predict on the fly, serve via REST API
iii. Train, predict by streaming
iv. Train by batch, predict on mobile
7. Which of the following layers in a machine learning architecture transforms features into
predictions?
i. Scoring layer ii. Evaluation layer
iii. Feature layer iv. Data Layer
ii. When models are constantly iterated on and subtly changed, tracking config updates whilst
maintaining config clarity and flexibility becomes an additional burden
iii. If we have an input feature which we change, then the importance, weights or use of the
remaining features may all change as well
iv. Machine learning systems require cooperation between multiple teams, which can result in
no single team or person understanding how the overall system works, teams blaming each
other for failures, and general inefficiencies
9. The process in which we integrate a machine learning model into an existing production
environment to make practical business decisions based on data is called:
i. Model verification ii. Model evaluation
iii. Model deployment iv. Model scraping
10. The machine learning model deployment is the _______ stage of machine learning life cycle.
i. First ii. Last iii. Second last iv. None of these
Answers of Section 8:
1. i 2. i 3. ii 4. iii 5. ii 6. ii 7. i 8. iii 9. iii 10. ii 11. i 12. ii
Section 9: Python and R
1. What is the output of the following Python code?
print([i.lower() for i in "HELLO"])
i. ['h', 'e', 'l', 'l', 'o']
ii. 'hello'
iii. ['hello']
iv. hello
2. Let a list in Python L = [1, 2, 2, 3]. What will be the output of print(L*2)?
i. [1, 2, 2, 3, 1, 2, 2, 3] ii. [2, 4, 4, 6]
iii. [2, 4, 4, 6, 2, 4, 4, 6] iv. [1, 4, 4, 9]
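For question 2, the `*` operator on a Python list repeats the list rather than multiplying its elements. A short sketch contrasting the two behaviours; the comprehension is shown only for comparison.

```python
L = [1, 2, 2, 3]

repeated = L * 2                    # repetition: the list is concatenated with itself
doubled = [item * 2 for item in L]  # element-wise doubling needs a comprehension

print(repeated)  # [1, 2, 2, 3, 1, 2, 2, 3]
print(doubled)   # [2, 4, 4, 6]
```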
4. How many times will 'Welcome to Python!' be printed if the following Python code is executed?
a=0
while a<10:
    print('Welcome to Python!')
    pass
i. 9 ii. 10 iii. 11 iv. An infinite number of times
9. You can check to see whether an R object is NULL with the _________ function.
i. is.null() ii. is.nullobj() iii. null() iv. as.nullobj()
10. What is the class defined in the following R code?
y <- c(FALSE, 2)
i. Character ii. Numeric iii. Logical iv. Integer
13. Given a function that does not return any value, what value is returned by default when it is
executed in the shell?
i. int ii. bool iii. void iv. None
14. In R programming, which of the following functions is used to create matrices by row
binding?
i. rjoin() ii. rbinding() iii. rowbind() iv. rbind()
15. In R programming, which function is used to test whether objects are NA (it returns a logical
value)?
i. is.na() ii. is.nan() iii. as.na() iv. as.nan()
Answers of Section 9:
1. i 2. i 3. ii 4. iv 5. iii 6. ii 7. ii 8. iv 9. i 10. ii 11. iii 12. iv
13. iv 14. iv 15. i
Section 10: Business and Data Science
1. In a data science project, who are the key stakeholders in the business understanding phase?
A. Business end-users
B. Data analysts
C. Business analysts
i. All of the above ii. Only A and B
iii. Only B and C iv. Only A and C
2. “Miscommunication between data scientists and the data engineer, leading to poor
identification of necessary and available data sources”. This issue refers to which of the
following phases of a data science project?
i. Business understanding
ii. Data understanding
iii. Data preparation
iv. Model deployment
6. Which of the following is the process of basing an organization’s actions and decisions on
actual measured results of performance?
i. Institutional performance management ii. Gap analysis
iii. Slice and Dice iv. None of these
7. Which of the following correctly specifies the outcome of engagement of stakeholders in the
data understanding phase?
i. How important it is to have a clean database for a correct analysis
ii. Illustrate the possibilities with well-designed examples and set realistic expectations
iii. Picture the benefits of the data project
iv. None of these
9. Which type of analytics gains insights from historical data with reporting, scorecards,
clustering etc.?
i. Decisive ii. Descriptive iii. Predictive iv. Prescriptive
10. What is to be used when faced with the decision of how to arrange furniture in a room?
i. Mathematical model ii. Mental model
iii. Physical model iv. Visual model
11. The method that takes data from a given data set and rather than looking at all users as one
unit, it breaks them into related groups for analysis, is known as:
i. Behavioural analytics ii. Cohort Analysis
iii. Collection analytics iv. Contextual data modelling
13. Which of the following analytics recommend decisions using optimization, simulation etc.?
i. Decisive ii. Descriptive iii. Predictive iv. Prescriptive
14. Which of the following analytics supports human decisions with visual analytics that the user
models to reflect reasoning?
i. Decisive ii. Descriptive iii. Predictive iv. Prescriptive
16. Which type of analytics uses statistical and machine learning techniques?
i. Decisive ii. Descriptive iii. Predictive iv. Prescriptive
17. Data which is taken from the publication, “Agricultural Situation in India” will be considered
as:
i. Primary data ii. Secondary data
iii. Primary and secondary data iv. Neither primary nor secondary data
18. Which of the following examples does not constitute an infinite population?
i. A population consisting of odd numbers
ii. The population of weights of newly born babies
iii. The population of heights of 15-year old children
iv. The population of heads and tails in tossing a coin successively