Fundamentals of Data Science and Analytics - AD3491 - Important Questions 2 Marks with Answer - Unit 4 - Analysis of Variance

The document outlines the curriculum for a B.Tech in Computer Science and Business Systems, detailing various subjects across eight semesters. It includes a question bank for the course 'Fundamentals of Data Science and Analytics,' covering topics such as T-Test, F-Test, ANOVA, and statistical testing methods. The document serves as a resource for students and faculty in understanding course content and assessment methods.

www.BrainKart.com

RAMCO INSTITUTE OF TECHNOLOGY


DEPARTMENT OF COMPUTER SCIENCE AND BUSINESS SYSTEMS
Academic Year: 2023-24 (Odd Semester)
QUESTION BANK
Degree, Semester & Branch: III Semester B.Tech. CSBS
Course Code & Name: AD3491-Fundamentals of Data Science and Analytics
Name of the Faculty member: Dr. M. GomathyNayagam

UNIT IV- ANALYSIS OF VARIANCE


Part A

1. Define t-test.
A t-test is a statistical method for comparing the means of two groups drawn from
normally distributed populations.
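As an illustration of the definition above, the following sketch computes a pooled-variance two-sample t statistic using only the Python standard library. The function name `pooled_t` and the sample values are hypothetical, and equal population variances are assumed.

```python
from math import sqrt
from statistics import mean, variance

def pooled_t(a, b):
    """Return (t, df) for a two-sample t-test, assuming equal variances."""
    n1, n2 = len(a), len(b)
    # Pooled estimate of the common variance (sample variances divide by n - 1).
    sp2 = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    # Estimated standard error of the difference between the two sample means.
    se = sqrt(sp2 * (1 / n1 + 1 / n2))
    return (mean(a) - mean(b)) / se, n1 + n2 - 2

group_a = [12, 14, 11, 15, 13]  # hypothetical scores
group_b = [10, 9, 12, 8, 11]    # hypothetical scores
t, df = pooled_t(group_a, group_b)
print(f"t = {t:.3f}, df = {df}")
```

The resulting t is then compared with the critical value of the t distribution for the given degrees of freedom and level of significance.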
2. Define F-Test?
An F-test is any statistical test in which the test statistic has an F-distribution under the
null hypothesis. It is most often used when comparing statistical models that have been
fitted to a data set, in order to identify the model that best fits the population from which
the data were sampled.
3. What is analysis of variance?
Analysis of variance is a collection of statistical models and their associated estimation
procedures used to analyze the differences among means. ANOVA was developed by the
statistician Ronald Fisher.
4. Define effect size estimation.
Effect size estimates provide important information about the impact of a treatment on the
outcome of interest or on the association between variables. They also provide
a common metric to compare the direction and strength of the relationship between
variables across studies.
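The source does not name a specific effect size measure. As one common example, the sketch below computes Cohen's d, the difference between two group means expressed in pooled standard deviation units. The function name `cohens_d` and the data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def cohens_d(treatment, control):
    """Standardized mean difference using the pooled standard deviation."""
    n1, n2 = len(treatment), len(control)
    pooled_var = ((n1 - 1) * variance(treatment)
                  + (n2 - 1) * variance(control)) / (n1 + n2 - 2)
    return (mean(treatment) - mean(control)) / sqrt(pooled_var)

treated = [12, 14, 11, 15, 13]  # hypothetical outcome scores
untreated = [10, 9, 12, 8, 11]  # hypothetical outcome scores
d = cohens_d(treated, untreated)
print(f"Cohen's d = {d:.2f}")
```

Because d is unit-free, values from studies that use different measurement scales can be compared directly.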
5. What is meant by multiple comparisons, multiplicity, or multiple testing?
The multiple comparisons, multiplicity or multiple testing problem occurs when one
considers a set of statistical inferences simultaneously or infers a subset of parameters
selected based on the observed values.
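A short sketch of why simultaneous inferences are a problem: if m independent tests are each run at level alpha, the chance of at least one false positive grows quickly. The independence assumption and the Bonferroni adjustment shown here are illustrative additions, not part of the source answer.

```python
def familywise_error_rate(alpha, m):
    """P(at least one type I error) across m independent tests at level alpha."""
    return 1 - (1 - alpha) ** m

def bonferroni_alpha(alpha, m):
    """Per-test level that keeps the family-wise rate at or below alpha."""
    return alpha / m

fwer_10 = familywise_error_rate(0.05, 10)
per_test = bonferroni_alpha(0.05, 10)
print(f"FWER for 10 tests at .05: {fwer_10:.3f}")
print(f"Bonferroni per-test level: {per_test:.3f}")
```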
6. Define ANOVA.
Analysis of variance (ANOVA) is an analysis tool used in statistics that splits an observed
aggregate variability found inside a data set into two parts: systematic factors and random
factors. The systematic factors have a statistical influence on the given data set, while the
random factors do not. Analysts use the ANOVA test to determine the influence that
independent variables have on the dependent variable in a regression study.
7. Write the formula for calculating F-score value.
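For a one-way ANOVA, the F ratio is commonly written as the between-group mean square divided by the within-group mean square. The symbols $k$ (number of groups) and $N$ (total number of observations) are notation assumed here, not taken from the source.

```latex
F = \frac{MS_{\text{between}}}{MS_{\text{within}}}
  = \frac{SS_{\text{between}} / (k - 1)}{SS_{\text{within}} / (N - k)}
```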

8. Compare one-way vs two-way ANOVA.


There are two main types of analysis of variance: one-way (or unidirectional) and two-way
(bidirectional). One-way or two-way refers to the number of independent variables
in your analysis of variance test. A one-way ANOVA evaluates the impact of a sole factor
on a sole response variable. It determines whether the observed differences between the
means of independent (unrelated) groups are explainable by chance alone, or whether
there are any statistically significant differences between groups.

A two-way ANOVA is an extension of the one-way ANOVA. With a one-way, you have
one independent variable affecting a dependent variable. With a two-way ANOVA, there
are two independent variables. For example, a two-way ANOVA allows a company to compare
worker productivity based on two independent variables, such as department and gender.
It is utilized to observe the interaction between the two factors. It tests the effect of two
factors at the same time.

A three-way ANOVA, also known as three-factor ANOVA, is a statistical means of
determining the effect of three factors on an outcome.
9. What do you mean by two-factor factorial design?
A two-factor factorial design is an experimental design in which data is collected for all
possible combinations of the levels of the two factors of interest. If equal sample sizes are
taken for each of the possible factor combinations then the design is a balanced two-factor
factorial design.
10. Define statistical test in F-test
An F-test is any statistical test in which the test statistic has an F-distribution under the
null hypothesis. It is most often used when comparing statistical models that have been
fitted to a data set, in order to identify the model that best fits the population from which
the data were sampled.
11. What is the two-way analysis of variance?
The two-way analysis of variance is an extension of the one-way ANOVA that examines
the influence of two different categorical independent variables on one continuous
dependent variable.
12. What are the types of ANOVA?
There are two main types of ANOVA: one-way (or unidirectional) and two-way. There
are also variations of ANOVA. For example, MANOVA (multivariate ANOVA) differs from
ANOVA as the former tests for multiple dependent variables simultaneously while the
latter assesses only one dependent variable at a time.
13. Define chi-square test.
The Chi-Square test is a statistical procedure used by researchers to examine the
differences between categorical variables in the same population. For example, imagine
that a research group is interested in whether or not education level and marital status are
related for all people in the U.S.
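The computation behind a chi-square test of independence can be sketched as follows. The function name `chi_square_independence` and the 2 × 2 counts are hypothetical; they stand in for a cross-classification such as education level by marital status.

```python
def chi_square_independence(table):
    """Return (chi2, df) for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were unrelated.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

# Hypothetical counts: rows = education level, columns = marital status.
observed_counts = [[30, 20],
                   [20, 30]]
chi2, df = chi_square_independence(observed_counts)
print(f"chi-square = {chi2:.2f}, df = {df}")
```

With 1 degree of freedom, the critical value at the .05 level is 3.84, so a statistic larger than that would lead to rejecting the hypothesis that the two variables are unrelated.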
14. What Does the Analysis of Variance Reveal?
The ANOVA test is the initial step in analyzing factors that affect a given data set. Once
the test is finished, an analyst performs additional testing on the systematic factors that
measurably contribute to the data set's variability. The analyst utilizes the ANOVA test
results in an F-test to generate additional data that aligns with the proposed regression
models.

The ANOVA test allows a comparison of more than two groups at the same time to
determine whether a relationship exists between them. The result of the ANOVA formula,
the F statistic (also called the F-ratio), allows for the analysis of multiple groups of data
to determine the variability between samples and within samples.

If no real difference exists between the tested groups, which is called the null hypothesis,
the result of the ANOVA's F-ratio statistic will be close to 1. The distribution of all
possible values of the F statistic is the F-distribution. This is actually a group of
distribution functions, with two characteristic numbers, called the numerator degrees of
freedom and the denominator degrees of freedom.
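The F-ratio described above can be computed directly from the between-sample and within-sample variability. This is a minimal sketch using the standard library; the function name `one_way_anova_f` and the three small samples are hypothetical.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean([x for g in groups for x in g])
    # Variability of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variability of the observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n_total - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within

samples = [[1, 2, 3], [2, 3, 4], [6, 7, 8]]  # hypothetical scores
f_ratio, df_b, df_w = one_way_anova_f(samples)
print(f"F({df_b}, {df_w}) = {f_ratio:.2f}")
```

Here `df_b` and `df_w` are the numerator and denominator degrees of freedom that select the relevant F-distribution.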
15. How to Use ANOVA?


A researcher might, for example, test students from multiple colleges to see if students
from one of the colleges consistently outperform students from the other colleges. In a
business application, an R&D researcher might test two different processes of creating a
product to see if one process is better than the other in terms of cost efficiency.

The type of ANOVA test used depends on a number of factors. It is applied when the data
are experimental. ANOVA can also be computed by hand when there is no access to
statistical software; it is simple to use and best suited for small samples. With many
experimental designs, the sample sizes have to be the same for the various factor-level
combinations.

ANOVA is helpful for testing three or more groups. It is similar to running multiple
two-sample t-tests; however, it results in fewer type I errors and is appropriate for a
range of issues. ANOVA assesses group differences by comparing the means of each group,
and it involves partitioning the variance into diverse sources. It is employed with
subjects, test groups, between groups and within groups.
16. What is the Analysis of Variance in Other Applications?
In addition to its applications in the finance industry, ANOVA is used in a wide
variety of contexts to test hypotheses: in reviewing clinical trial data (for example,
to compare the effects of different treatment protocols on patient outcomes), in
social science research (for instance, to assess the effects of gender and class on specified
variables), in software engineering (for instance, to evaluate database management
systems), in manufacturing (to assess product and process quality metrics), and in
industrial design, among other fields.
17. What is a Test?
In technical analysis and trading, a test is when a stock’s price approaches an established
support or resistance level set by the market. If the stock stays within the support and
resistance levels, the test passes. However, if the stock price reaches new lows and/or new
highs, the test fails. In other words, for technical analysis, price levels are tested to see if
patterns or signals are accurate.

A test may also refer to one or more statistical techniques used to evaluate differences or
similarities between estimated values from models or variables found in data. Examples
include the t-test and z-test.
18. Define Range-Bound Market Test.
When a stock is range-bound, price frequently tests the trading range’s upper and lower
boundaries. If traders are using a strategy that buys support and sells resistance, they
should wait for several tests of these boundaries to confirm price respects them before
entering a trade.

Once in a position, traders should place a stop-loss order in case the next test of support
or resistance fails.
19. What is the Trending Market Test?
In an up-trending market, previous resistance becomes support, while in a down-trending
market, past support becomes resistance. Once price breaks out to a new high or low, it
often retraces to test these levels before resuming in the direction of the trend. Momentum
traders can use the test of a previous swing high or swing low to enter a position at a more
favorable price than if they would have chased the initial breakout. A stop-loss order
should be placed directly below the test area to close the trade if the trend unexpectedly
reverses.

20. Define Statistical Tests.


Inferential statistics uses the properties of data to test hypotheses and draw conclusions.
Hypothesis testing allows one to test an idea using a data sample with regard to a
population parameter. The methodology employed by the analyst depends on the nature
of the data used and the reason for the analysis. In particular, one seeks to reject the null
hypothesis, or the notion that one or more random variables have no effect on another. If
this can be rejected, the variables are likely to be associated with one another.
21. What is Alpha Risk?
Alpha risk is the risk that in a statistical test a null hypothesis will be rejected when it is
actually true. This is also known as a type I error, or a false positive. The term "risk" refers
to the chance or likelihood of making an incorrect decision. The amount of alpha risk is set
by the significance level chosen for the test (for example, .05); for a fixed significance
level, a larger sample mainly lowers beta risk rather than alpha risk. Alpha risk can be
contrasted with beta risk, or the risk of committing a type II error (i.e., a false negative).
22. What is Range-Bound Trading?
Range-bound trading is a trading strategy that seeks to identify and capitalize on
securities, like stocks, trading in price channels. After finding major support and resistance
levels and connecting them with horizontal trend lines, a trader can buy a security at the
lower trend line support (bottom of the channel) and sell it at the upper trend line resistance
(top of the channel).
23. What is a One-Tailed Test?
A one-tailed test is a statistical test in which the critical area of a distribution is one-sided
so that it is either greater than or less than a certain value, but not both. If the sample being
tested falls into the one-sided critical area, the alternative hypothesis will be accepted
instead of the null hypothesis.
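To make the one-sided critical area concrete, the sketch below compares the upper-tail cutoff of a one-tailed z test with the cutoff of a two-tailed test at the same level of significance. The use of a z test and the helper name `rejects_upper_tail` are assumptions made for illustration.

```python
from statistics import NormalDist

alpha = 0.05
standard_normal = NormalDist()  # mean 0, standard deviation 1

# One-tailed (upper) test: all of alpha sits in one tail.
one_tailed_cutoff = standard_normal.inv_cdf(1 - alpha)
# Two-tailed test: alpha is split evenly between both tails.
two_tailed_cutoff = standard_normal.inv_cdf(1 - alpha / 2)

def rejects_upper_tail(z_observed, level=alpha):
    """True if z_observed falls in the upper one-sided critical area."""
    return z_observed > NormalDist().inv_cdf(1 - level)

print(f"one-tailed cutoff: {one_tailed_cutoff:.3f}")
print(f"two-tailed cutoff: {two_tailed_cutoff:.3f}")
```

An observed z of 1.80, for example, exceeds the one-tailed cutoff but not the two-tailed one, which is why the direction of the test must be chosen before looking at the data.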
24. Give the four Possible Outcomes of the Vitamin C Experiment and also do hypothesis
testing
If vitamin C has an effect on IQ scores, it makes sense to estimate, with a 95 percent
confidence interval, that the interval between 102 and 112 describes the possible size of
that effect, namely, an increase (above 100) of between 2 and 12 IQ points.
25. Distinguish between dependent variables and explanatory variables.
| Dependent variables | Explanatory/Independent variables |
|---|---|
| A dependent variable is a variable whose value depends on another variable. | An independent variable is a variable whose value does not depend on another variable; it is set by the researcher. |
| The dependent variable is the presumed effect. | The independent variable is the presumed cause. |
| If the dependent variable changes, the independent variable is not affected. | Any change in the independent variable also affects the dependent variable. |
| Dependent variables are often referred to as the predicted variable. | Independent variables are the predictors or regressors. |
| Dependent variables are obtained from longitudinal research or by solving complex mathematical equations. | Independent variables are easily obtainable and do not need any complex mathematical procedures or observations. |
| Dependent variables cannot be manipulated by the researcher or any other external factor. | Independent variables can be manipulated by the researcher, so researcher bias may affect the results of the research. |
| Dependent variables are positioned vertically on the graph. | Independent variables are positioned horizontally on the graph. |

26. What is the significance of p-value in hypothesis? (APRIL/MAY 2023)


The p-value is a number, calculated from a statistical test, that describes how likely you
are to have found a set of observations at least as extreme as yours if the null hypothesis
were true. P-values are used in hypothesis testing to help decide whether to reject the
null hypothesis.
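As a hedged illustration of how a p-value feeds the decision, the sketch below converts an observed z statistic into a two-sided p-value and compares it with a .05 level of significance. The z test, the function name `two_sided_p_from_z`, and the observed value 2.5 are assumptions for the example.

```python
from statistics import NormalDist

def two_sided_p_from_z(z_observed):
    """Probability of a z at least this extreme (either tail) if H0 is true."""
    return 2 * (1 - NormalDist().cdf(abs(z_observed)))

p = two_sided_p_from_z(2.5)   # hypothetical observed z
reject_null = p < 0.05        # decision at the .05 level of significance
print(f"p = {p:.4f}, reject H0: {reject_null}")
```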
27. Comparison between t-test and ANOVA. (APRIL/MAY 2023)
The t-test is a method that determines whether the means of two populations are
statistically different from each other, whereas ANOVA determines whether the means of
three or more populations are statistically different from each other.
28. Compare the various test statistics, such as the z-score, t-statistic, F-statistic, and
chi-squared, with their associated tests.

Part B
1. A library system lends books for periods of 21 days. This policy is being
reevaluated in view of a possible new loan period that could be either longer or shorter
than 21 days. To aid in making this decision, book-lending records were consulted
to determine the loan period actually used by the patrons. A random sample of 8
records revealed the following loan periods in days: 21, 15, 12, 24, 20, 21, 13 and 16. Test
the null hypothesis with a t-test, using the .05 level of significance. (APRIL/MAY
2023)
2. A consumers’ group randomly samples 10 “one-pound” packages of ground wheat sold
by a supermarket. Calculate the mean and the estimated standard error of the mean
for this sample, given the following weights in ounces: 16, 15, 14, 15, 14, 15, 16, 14, 14, 14
3. Illustrate in detail about one factor ANOVA with example. (APRIL/MAY 2023)
4. A random sample of 90 college students indicates whether they most desire love,
wealth, power, health, fame, or family happiness.
i. Using the .05 level of significance and the following results, test the null hypothesis
that, in the underlying population, the various desires are equally popular.
ii. Specify the approximate p-value for this test result. (APRIL/MAY 2023)

5. Estimate the calculations for the t test for gas mileage investigation. Showcase
the hypothesis analysis, t ratio calculation with three panels along with confidence
interval
6. Estimate the calculations for the t test using two independent samples for EPO
experiment. Showcase the hypothesis analysis, sampling distribution, t ratio
calculation with three panels, p value estimation along with confidence interval
7. State the use of counterbalancing and explain the EPO experiment with repeated
measures. Give the detailed table of summary of t tests for population MEANS
for one sample, two independent samples and two related samples
8. Suggest the hypothesis test summary for t test for a population correlation
coefficient for the case study on Greeting Card Exchange
9. Suggest the hypothesis test summary using One-Factor F Test for Sleep
Deprivation Experiment and also the variance estimates, mean squares, sum of
squares with degree of freedom
10. Blood pressure readings of 8 patients before and after are recorded:
Before: 180, 200, 230, 240, 170, 190, 200 and 165
After: 140, 145, 150, 155, 120, 130, 140 and 130.
Find whether there is any significant difference between the BP readings before and
after by applying a two-sample t-test.
11. The marks of students are 10.5, 9, 7, 12, 8.5, 7.5, 6.5, 8, 11 and 9.5. The mean population
score is 12 and the standard deviation is 1.80. Does the mean value for the students
differ significantly from the mean population value?
12. Estimate the calculations for the t test for gas mileage investigation. Showcase
the hypothesis analysis, t ratio calculation with three panels along with confidence
interval.
13. Odds ratios can be calculated for larger cross-classification tables, and one way of
doing this is by reconfiguring into a smaller 2 × 2 table. The 2 × 3 table for the lost
letter study could be reconfigured into a 2 × 2 table if, for example, the investigator
is primarily interested in comparing return rates of lost letters only for campus and
off-campus locations (both suburbia and downtown), that is
(i) Given $\chi^2(1, n = 200) = 7.42$, $p < .01$, $\phi_c^2 = .037$ for these data, calculate and
interpret the odds ratio for a returned letter from campus.
(ii) Calculate and interpret the odds ratio for a returned letter from off-campus.

14. Estimate the calculation of Sum of Squares (Two-Factor ANOVA) with an example
15. Explain in detail about the chi-square test with an example.
