
Untitled Document (1)

The document provides answers and explanations for a series of questions across various topics including Advanced Machine Learning, Complex Statistics & Probability, Big Data & Distributed Systems, and more. Each question is followed by a correct answer and a brief explanation of the concept involved. The content serves as a comprehensive guide for understanding key concepts in data science and related fields.


Here are the answers and detailed explanations for each question:

---

Advanced Machine Learning

1. A) All neurons are active


Explanation: During training with dropout, some neurons are randomly "dropped" to prevent
overfitting. At test time, all neurons are active, and the weights are scaled accordingly.
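The train/test asymmetry can be sketched with "inverted dropout" in NumPy (the rate, input values, and seed here are arbitrary illustrations):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.ones(10)          # activations from some layer
p_drop = 0.5             # dropout probability

# Training: randomly zero activations and scale the survivors by
# 1/(1 - p) so the expected activation matches test time.
mask = rng.random(x.shape) >= p_drop
train_out = x * mask / (1 - p_drop)

# Test time: all neurons are active, and no extra scaling is needed.
test_out = x
```

Because the scaling happens during training, inference is just a plain forward pass.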

2. D) Loss function
Explanation: Batch normalization normalizes the inputs of layers (input or hidden) to stabilize
learning. The loss function is not normalized.

3. B) Mode collapse
Explanation: In GANs, mode collapse happens when the generator produces limited varieties of
outputs, failing to capture the diversity of real data.

4. B) Minimum loss reduction for split


Explanation: The gamma parameter in XGBoost controls the minimum loss reduction required to
make a further partition on a leaf node. It helps control model complexity.

5. B) To speed up convergence
Explanation: Teacher forcing feeds the ground-truth previous output into the model at each
training step instead of the model's own prediction, which stabilizes training and speeds up convergence.

---

Complex Statistics & Probability

6. B) P(parameters|data)
Explanation: The posterior distribution is the probability of the parameters given the observed
data, central to Bayesian inference.
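A tiny worked example of computing P(parameters|data) with Bayes' rule (the two coin-bias hypotheses and the data are made up for illustration):

```python
# Two hypotheses for a coin's heads probability, with a uniform prior.
prior = {0.5: 0.5, 0.8: 0.5}

# Observed data: 3 heads in 3 flips, so P(data | theta) = theta**3.
likelihood = {theta: theta ** 3 for theta in prior}

# P(data): total probability of the data under the prior.
evidence = sum(prior[t] * likelihood[t] for t in prior)

# Posterior: P(theta | data) = P(data | theta) * P(theta) / P(data).
posterior = {t: prior[t] * likelihood[t] / evidence for t in prior}
```

Three heads in a row shift belief toward the biased coin, exactly as the posterior formula dictates.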

7. B) Decreases statistical power


Explanation: Bonferroni correction is conservative. While it reduces Type I errors, it increases
Type II errors, thus lowering statistical power.
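The correction itself is one division; the p-values below are hypothetical, chosen to show how a result that would pass at the raw threshold fails the stricter one:

```python
m = 20                        # number of hypotheses tested
alpha = 0.05                  # desired family-wise error rate
alpha_per_test = alpha / m    # Bonferroni-adjusted threshold per test

p_values = [0.001, 0.004, 0.03, 0.2]
rejected = [p for p in p_values if p < alpha_per_test]
# 0.004 and 0.03 would be "significant" at 0.05 but survive no longer —
# that lost sensitivity is the reduction in statistical power.
```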

8. B) Normality of residuals
Explanation: A QQ plot compares the quantiles of the sample against those of a theoretical
distribution (typically the normal); points falling near the diagonal suggest the residuals are
normally distributed.

9. A) To generate candidate samples


Explanation: In MCMC, the proposal distribution suggests the next point to sample, which is
then accepted or rejected based on a probability.
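A minimal Metropolis-Hastings sketch makes the proposal's role concrete; here the target is an unnormalized standard normal and the proposal is a random walk (both are illustrative choices):

```python
import math
import random

random.seed(0)

def target(x):
    # Unnormalized target density: standard normal.
    return math.exp(-x * x / 2)

x, samples = 0.0, []
for _ in range(5000):
    candidate = x + random.gauss(0, 1)      # proposal generates the candidate
    accept_prob = min(1.0, target(candidate) / target(x))
    if random.random() < accept_prob:       # accept or reject the candidate
        x = candidate
    samples.append(x)

mean = sum(samples) / len(samples)          # should be near 0
```

The proposal only suggests moves; the accept/reject step is what makes the chain converge to the target.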

10. B) Test if a sample follows a specified distribution


Explanation: The KS test compares the empirical distribution function of the sample with the
expected cumulative distribution function.
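The KS statistic itself is easy to compute by hand; this sketch tests a small made-up sample against a standard normal CDF (the `ks_statistic` helper is illustrative, not a library function):

```python
import math

def normal_cdf(x):
    # CDF of the standard normal via the error function.
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

def ks_statistic(sample, cdf):
    """D = sup over x of |ECDF(x) - CDF(x)|, evaluated at the sample points."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        # The ECDF jumps at each point; check both sides of the step.
        d = max(d, abs((i + 1) / n - cdf(x)), abs(i / n - cdf(x)))
    return d

sample = [-1.2, -0.4, 0.1, 0.3, 0.8, 1.5]
d = ks_statistic(sample, normal_cdf)
```

In practice `scipy.stats.kstest` computes both the statistic and its p-value.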

---

Big Data & Distributed Systems

11. A) To share large read-only variables across nodes


Explanation: Spark broadcast variables efficiently distribute large data like lookup tables to all
worker nodes.

12. A) In-Sync Replicas


Explanation: ISR in Kafka refers to replicas that are fully caught up with the leader and ready to
take over in case of failure.

13. A) Systems can only guarantee 2 of 3: Consistency, Availability, Partition tolerance


Explanation: The CAP theorem states a distributed system can at most guarantee two of the
three properties.

14. B) Allocates cluster resources


Explanation: Hadoop YARN’s ResourceManager manages resource allocation among
applications in the cluster.
15. A) Better compression and query performance
Explanation: Columnar storage allows efficient queries on selected columns and better
compression compared to row-based formats.

---

Advanced Algorithms & Optimization

16. C) O(n³)
Explanation: The Hungarian algorithm for solving assignment problems has a cubic time
complexity, suitable for small to moderate datasets.

17. B) Model becomes more sensitive to individual points


Explanation: A high gamma makes the RBF kernel more focused on individual data points,
potentially leading to overfitting.
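The effect of gamma is visible directly in the kernel formula, K(x, z) = exp(-gamma * ||x - z||²); the points and gamma values below are arbitrary:

```python
import math

def rbf(x, z, gamma):
    # RBF kernel similarity between two 1-D points.
    return math.exp(-gamma * (x - z) ** 2)

# Low gamma: distant points still look fairly similar.
low = rbf(0.0, 2.0, gamma=0.1)     # roughly 0.67
# High gamma: similarity decays sharply, so each training point only
# influences its immediate neighborhood — the overfitting risk.
high = rbf(0.0, 2.0, gamma=10.0)   # essentially zero
```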

18. B) Discounted future rewards


Explanation: The Bellman equation expresses the value of a policy as the expected sum of
current and future (discounted) rewards.
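The discounted return G_t = r_t + gamma * G_(t+1) can be evaluated with one backward pass over a trajectory (the rewards and discount factor here are made up):

```python
gamma = 0.9                       # discount factor
rewards = [1.0, 1.0, 1.0, 10.0]   # rewards along one trajectory

# Work backwards: each step's value is its reward plus the
# discounted value of everything that follows.
g = 0.0
for r in reversed(rewards):
    g = r + gamma * g
# g now equals r_0 + gamma*r_1 + gamma^2*r_2 + gamma^3*r_3
```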

19. B) Self-attention mechanisms


Explanation: Transformers use self-attention to capture dependencies without recurrence,
making them faster and better at long-range modeling.
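A stripped-down sketch of scaled dot-product self-attention in NumPy; it omits the learned Q/K/V projections (using the raw inputs for all three), so it shows only the attention mechanics, not a full Transformer layer:

```python
import numpy as np

def self_attention(X):
    """Scaled dot-product self-attention with identity Q = K = V projections."""
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)    # pairwise similarity between all tokens
    # Softmax over keys (numerically stabilized): attention weights per token.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ X               # each token is a mix of all tokens

X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # three toy token vectors
out = self_attention(X)
```

Every output token attends to every input token in one step, which is why no recurrence is needed to capture long-range dependencies.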

20. A) Identifying confounders


Explanation: The backdoor criterion is used in causal inference to find variables that block
spurious paths and help estimate causal effects correctly.

---

Machine Learning

21. B) Random classifier


Explanation: An ROC AUC of 0.5 means the model is no better than random guessing in
distinguishing between classes.

22. A) Select optimal k


Explanation: The elbow method helps choose the number of clusters by finding the "elbow"
point where adding more clusters doesn't improve much.

23. B) F1-score
Explanation: The F1-score balances precision and recall, making it suitable for imbalanced
datasets where accuracy can be misleading.
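The F1 computation from hypothetical confusion-matrix counts:

```python
tp, fp, fn = 8, 2, 4   # hypothetical true positives, false positives, false negatives

precision = tp / (tp + fp)   # of predicted positives, how many were right
recall = tp / (tp + fn)      # of actual positives, how many were found
f1 = 2 * precision * recall / (precision + recall)   # harmonic mean
```

The harmonic mean drags F1 toward the weaker of the two, so a model cannot score well by sacrificing recall for precision or vice versa.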

24. B) Dimensionality reduction for visualization


Explanation: t-SNE is a nonlinear technique for reducing high-dimensional data into 2 or 3
dimensions to visualize patterns.

25. B) Word importance in a corpus


Explanation: TF-IDF assigns importance to words based on their frequency in a document
relative to the corpus.
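A from-scratch TF-IDF for a toy three-document corpus (real libraries such as scikit-learn add smoothing; this `tf_idf` helper is the bare textbook formula):

```python
import math

docs = [
    "the cat sat".split(),
    "the dog ran".split(),
    "the cat ran".split(),
]

def tf_idf(term, doc, docs):
    tf = doc.count(term) / len(doc)    # term frequency within the document
    df = sum(term in d for d in docs)  # number of documents containing it
    idf = math.log(len(docs) / df)     # inverse document frequency
    return tf * idf

# "the" appears in every document, so its idf — and tf-idf — is zero;
# "cat" is rarer across the corpus, so it scores higher.
score_the = tf_idf("the", docs[0], docs)
score_cat = tf_idf("cat", docs[0], docs)
```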

---

Data Engineering

26. B) Stores quantitative metrics


Explanation: A fact table contains the measurable metrics (facts) and foreign keys to dimension
tables in a star schema.

27. B) Store raw data in native format


Explanation: A data lake allows storing structured, semi-structured, and unstructured data in its
original form.

28. B) HAVING
Explanation: HAVING is used to filter grouped data after aggregation, unlike WHERE, which
filters rows before aggregation.
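The distinction runs end to end in a few lines of SQL against an in-memory SQLite table (the `sales` table and its rows are invented for the demo):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 100), ("east", 250), ("west", 50), ("west", 30)],
)

# WHERE would filter individual rows before grouping; HAVING filters
# the aggregated groups — here, only regions whose total exceeds 100.
rows = conn.execute(
    "SELECT region, SUM(amount) AS total "
    "FROM sales GROUP BY region HAVING total > 100"
).fetchall()
```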
29. A) Data modeling approaches
Explanation: Schema-on-read means data is interpreted at query time, while schema-on-write
enforces structure when data is ingested.

30. A) Directed Acyclic Graph (workflow)


Explanation: In Airflow, a DAG defines the structure and dependencies of tasks to be executed
in a scheduled workflow.

---

Statistics & Probability

31. C) 95%
Explanation: In a normal distribution, about 95% of the data falls within ±2 standard deviations
from the mean.
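The exact figure follows from the normal CDF: P(|X - mu| <= k*sigma) = erf(k / sqrt(2)), which for k = 2 gives about 95.45%:

```python
import math

# Probability mass of a normal distribution within 2 standard deviations.
within_2sd = math.erf(2 / math.sqrt(2))   # approximately 0.9545
```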

32. B) Probability of observed result given null is true


Explanation: A p-value is the likelihood of seeing the observed data (or more extreme)
assuming the null hypothesis is true.

33. A) Sample means converge to population mean as n increases


Explanation: The Law of Large Numbers ensures that as sample size grows, the average of the
sample gets closer to the population mean.
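A quick simulation with fair die rolls (population mean 3.5); the seed and sample sizes are arbitrary:

```python
import random

random.seed(42)

def sample_mean(n):
    # Average of n fair die rolls.
    return sum(random.randint(1, 6) for _ in range(n)) / n

small = sample_mean(10)        # can land far from 3.5
large = sample_mean(100_000)   # reliably lands very close to 3.5
```

The large-sample average clusters tightly around 3.5, exactly as the Law of Large Numbers promises.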

34. B) ANOVA
Explanation: ANOVA (Analysis of Variance) is used to test differences between three or more
group means.

35. B) Constant variance of residuals


Explanation: Heteroscedasticity violates the assumption of homoscedasticity, where residuals
should have constant variance.
---

Programming & Tools

36. B) Creates getter method


Explanation: The @property decorator in Python turns a method into a read-only attribute getter.
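A minimal example (the `Circle` class is invented for illustration):

```python
class Circle:
    def __init__(self, radius):
        self.radius = radius

    @property
    def area(self):
        # Computed on access, but read like a plain attribute.
        return 3.141592653589793 * self.radius ** 2

c = Circle(2)
value = c.area   # no parentheses — the decorated method acts as a getter
```

Assigning to `c.area` would raise an `AttributeError` unless a matching setter is defined.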

37. B) O(log n)
Explanation: A balanced Binary Search Tree supports search, insertion, and deletion in
logarithmic time, because each comparison halves the remaining search space.
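Python's standard library has no built-in BST, but `bisect` gives the same O(log n) lookup bound via binary search on a sorted list, which illustrates the halving idea:

```python
import bisect

sorted_vals = [1, 3, 5, 7, 9, 11]

# Binary search: O(log n) per lookup, the same bound a balanced BST
# gives for its queries.
idx = bisect.bisect_left(sorted_vals, 7)
found = idx < len(sorted_vals) and sorted_vals[idx] == 7
```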

38. B) git add .


Explanation: This command stages all modified and new files for commit in Git.

39. A) COPY is faster but ADD handles URLs


Explanation: ADD supports more features like extracting tar files or downloading from URLs, but
COPY is preferred for simplicity and performance.

40. B) Runs regardless of exceptions


Explanation: The finally block in Python executes no matter what, ensuring cleanup or closing
resources.
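A small demonstration that `finally` runs even when the `except` block returns (the `read_config` function and its failure are contrived for the demo):

```python
log = []

def read_config(path):
    try:
        raise FileNotFoundError(path)   # simulate a failing operation
    except FileNotFoundError:
        log.append("handled")
        return None
    finally:
        log.append("cleanup")   # runs regardless — even after the return above

read_config("missing.cfg")
```

The `finally` block fires after the `return` statement but before control actually leaves the function, which is why it is the right place for cleanup.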

---

Basic Concepts

41. B) Understand data patterns


Explanation: Exploratory Data Analysis (EDA) involves summarizing the data's main
characteristics, often visually.

42. C) Correct positive predictions


Explanation: True positives are cases where the model correctly predicts the positive class.

43. A) Labels vs no labels


Explanation: Supervised learning uses labeled data, while unsupervised learning deals with
unlabeled data to find patterns.
44. C) RMSE
Explanation: Root Mean Squared Error measures the average magnitude of a regression model's
prediction errors, expressed in the same units as the target variable.
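The computation in full, on made-up predictions:

```python
import math

y_true = [3.0, 5.0, 2.5, 7.0]   # hypothetical targets
y_pred = [2.5, 5.0, 4.0, 8.0]   # hypothetical model outputs

# Square the errors, average them, take the root — so the result is in
# the target's units and large errors are penalized more heavily.
mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
rmse = math.sqrt(mse)
```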

45. B) Creating informative input variables


Explanation: Feature engineering transforms raw data into features that better represent the
underlying problem.

---

Tools & Libraries

46. B) Data manipulation


Explanation: Pandas is used for data cleaning, transformation, and analysis in tabular form.

47. A) df.head()
Explanation: This command shows the first five rows of a DataFrame by default; an optional
argument changes the count.

48. B) Correlation matrix visualization


Explanation: sns.heatmap() displays values in a matrix as color-encoded grid cells, often used
with correlation matrices.

49. B) Figure and axes objects


Explanation: plt.subplots() returns a figure and a set of subplots (axes), useful for custom
layouts.

50. B) TensorFlow
Explanation: TensorFlow is a popular deep learning library for building and training neural
networks.

---
Simple Statistics

51. B) -1 to 1
Explanation: The correlation coefficient ranges from -1 (perfect negative) to 1 (perfect positive),
with 0 indicating no correlation.

52. B) 10
Explanation: Mean = (5 + 10 + 15) / 3 = 10

53. B) 4
Explanation: Sorted list = [1, 3, 5, 7]; Median = (3 + 5)/2 = 4
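Both of the arithmetic answers above can be checked with Python's standard `statistics` module:

```python
from statistics import mean, median

result_mean = mean([5, 10, 15])      # (5 + 10 + 15) / 3
# With an even count, the median averages the two middle values.
result_median = median([7, 1, 5, 3])  # sorted: [1, 3, 5, 7] -> (3 + 5) / 2
```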

54. C) Mean
Explanation: The mean is heavily influenced by extreme values, unlike median or mode.

55. B) Data spread


Explanation: Standard deviation quantifies the amount of variation or dispersion in a dataset.

---
