
Stat Assignment

The document provides information about Habtamu Mulugeta, a student at Haramaya University in Ethiopia. Specifically, it mentions that Habtamu Mulugeta is a student in the School of Water Resources and Environmental Engineering at Haramaya Institute of Technology. Their student ID number is sgs/0909/12. The document also indicates that it is a submission to Dr. Shimelis that was submitted on December 29, 2020.

Uploaded by

habtamu

HARAMAYA UNIVERSITY

HARAMAYA INSTITUTE OF TECHNOLOGY

SCHOOL OF WATER RESOURCES AND ENVIRONMENTAL ENGINEERING

PROGRAM – REGULAR

DEPARTMENT – ENGINEERING HYDROLOGY

NAME: HABTAMU MULUGETA

ID No.: sgs/0909/12

Submitted to: Dr. Shimelis

Submission date: Dec 29, 2020

1. What makes nonparametric tests different from parametric tests?

Parametric tests

• Assume an underlying statistical distribution in the data. Several conditions of validity must therefore be met for the result of a parametric test to be reliable. For example, Student's t-test for two independent samples is reliable only if each sample follows a normal distribution and the sample variances are homogeneous.

Nonparametric tests

• Do not rely on any particular distribution, so they can be applied even when the parametric conditions of validity are not met. Many parametric tests have nonparametric equivalents; for example, the Mann-Whitney U test is the counterpart of the independent-samples t-test.

2. Define population and sample in inferential statistics. What is the difference between them?

A population is the entire group that you want to draw conclusions about; it contains all possible values.

A sample is the specific group that you actually collect data from. It is a subset of the population, so its size is smaller than the total size of the population.

3. Suppose the median of a right-skewed distribution is 15. Can the mean be greater than or less than 20?

It can be either. In a right-skewed distribution the mean is typically greater than the median, so the mean is expected to exceed 15, but it may be less than, equal to, or greater than 20 depending on how long the right tail is.

Generally, if the distribution of data is skewed to the left, the mean is less than the median, which is often less than the mode. If the distribution of data is skewed to the right, the mode is often less than the median, which is less than the mean.
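As a quick illustration of this question, the sketch below uses three small hypothetical samples (invented for this example, not taken from the assignment). Each is right-skewed with a median of 15, yet their means fall below, at, and above 20.

```python
from statistics import mean, median

# Hypothetical right-skewed samples; every one has a median of 15.
samples = {
    "mean below 20": [12, 14, 15, 16, 23],
    "mean equal to 20": [10, 12, 14, 15, 15, 16, 18, 60],
    "mean above 20": [12, 14, 15, 16, 93],
}

# Collect (median, mean) for each sample so they can be compared.
summary = {label: (median(data), mean(data)) for label, data in samples.items()}

for label, (med, avg) in summary.items():
    print(f"{label}: median = {med}, mean = {avg}")
```

In every case the mean sits above the median, which is the only thing the right skew suggests; the value 20 plays no special role.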

4. In what situations do we use nonparametric tests?

Nonparametric tests are used when the data are not normally distributed. The key is therefore to check whether the data follow a normal distribution, for example by inspecting the distribution of the data.

In statistics, nonparametric tests are methods of analysis that do not require the data to follow a particular distribution (in particular, they remain valid when the data are not normally distributed). For this reason, they are sometimes referred to as distribution-free tests. Nonparametric tests serve as an alternative to parametric tests such as the t-test or ANOVA, which can be employed only if the underlying data satisfy certain criteria and assumptions.

The main reasons to apply a nonparametric test include the following:

• The underlying data do not meet the assumptions about the population

Generally, the application of parametric tests requires various assumptions to be satisfied, for example that the data follow a normal distribution and that the population variance is homogeneous. However, some data samples may show skewed distributions.

Skewness makes parametric tests less powerful because the mean is no longer the best measure of central tendency: it is strongly affected by extreme values. Nonparametric tests, by contrast, work well with skewed distributions and with distributions that are better represented by the median.

• The sample size is too small

The sample size is an important consideration in selecting the appropriate statistical method. If the sample size is reasonably large, the applicable parametric test can be used. However, if the sample size is too small, it may not be possible to validate the distribution of the data. In that case, a nonparametric test is the only suitable option.

• The analyzed data are ordinal or nominal

Unlike parametric tests, which work only with continuous data, nonparametric tests can be applied to other data types such as ordinal or nominal data. For such variables, nonparametric tests are the only appropriate solution.

5. What are degrees of freedom (df) in statistics?

Degrees of freedom is a combination of how much data you have and how many parameters you
need to estimate. It indicates how much independent information goes into a parameter estimate.
Degrees of freedom are the number of independent values that a statistical analysis can estimate.

In statistics, the number of degrees of freedom is the number of values in the final calculation of
a statistic that are free to vary.

The number of independent ways by which a dynamic system can move, without violating any
constraint imposed on it, is called number of degrees of freedom. In other words, the number of
degrees of freedom can be defined as the minimum number of independent coordinates that can
specify the position of the system completely.

Estimates of statistical parameters can be based upon different amounts of information or data. The number of independent pieces of information that go into the estimate of a parameter are called the degrees of freedom. In general, the degrees of freedom of an estimate of a parameter are equal to the number of independent scores that go into the estimate minus the number of parameters used as intermediate steps in the estimation of the parameter itself (most of the time the sample variance has N − 1 degrees of freedom, since it is computed from N random scores minus the only 1 parameter estimated as intermediate step, which is the sample mean).[2]

Mathematically, degrees of freedom is the number of dimensions of the domain of a random vector, or essentially the number of "free" components (how many components need to be known before the vector is fully determined).

The term is most often used in the context of linear models (linear regression, analysis of
variance), where certain random vectors are constrained to lie in linear subspaces, and the
number of degrees of freedom is the dimension of the subspace. The degrees of freedom are also
commonly associated with the squared lengths (or "sum of squares" of the coordinates) of such
vectors, and the parameters of chi-squared and other distributions that arise in associated
statistical testing problems.

While introductory textbooks may introduce degrees of freedom as distribution parameters or through hypothesis testing, it is the underlying geometry that defines degrees of freedom, and is critical to a proper understanding of the concept.
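The N − 1 degrees of freedom of the sample variance mentioned above can be checked numerically. The sketch below uses a small hypothetical sample (invented for illustration): once the mean is fixed, the deviations must sum to zero, so the last deviation is fully determined by the other N − 1.

```python
from statistics import mean, variance

scores = [4.0, 7.0, 9.0, 12.0]          # hypothetical sample, N = 4
n = len(scores)
m = mean(scores)

# Deviations from the sample mean always sum to zero.
deviations = [x - m for x in scores]

# Because of that constraint, the last deviation is not free:
# it equals minus the sum of the first N - 1 deviations.
last_from_others = -sum(deviations[:-1])

# Sample variance divides the sum of squares by N - 1, the degrees of freedom.
sample_var = sum(d * d for d in deviations) / (n - 1)

print(deviations, last_from_others, sample_var, variance(scores))
```

Here `statistics.variance` is used only as a cross-check, since it also divides by N − 1.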

6. What is an outlier? How can outliers be determined in a dataset?


Outliers are data points that are far from other data points. In other words, they’re unusual values
in a dataset. Outliers are problematic for many statistical analyses because they can cause tests to
either miss significant findings or distort real results.

Unfortunately, there are no strict statistical rules for definitively identifying outliers. Finding
outliers depends on subject-area knowledge and an understanding of the data collection process.
While there is no solid mathematical definition, there are guidelines and statistical tests you can
use to find outlier candidates.

Ways to find outliers in your data:

• Sorting your datasheet to find outliers
• Graphing your data to identify outliers
• Using z-scores to detect outliers
• Using the interquartile range to create outlier fences
• Finding outliers with hypothesis tests
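Two of the approaches listed above, z-scores and interquartile-range fences, can be sketched in a few lines. The data set, the z-score cutoff of 2.5, the 1.5 × IQR multiplier, and the "inclusive" quartile convention are all assumptions chosen for illustration; other cutoffs and quartile definitions are in common use and may flag different points.

```python
from statistics import mean, stdev, quantiles

def zscore_outliers(data, threshold=2.5):
    """Return values whose |z-score| exceeds the (assumed) threshold."""
    m, s = mean(data), stdev(data)
    return [x for x in data if abs((x - m) / s) > threshold]

def iqr_outliers(data, k=1.5):
    """Return values outside the fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < lower or x > upper]

# Hypothetical measurements with one suspicious value.
data = [10, 12, 12, 13, 12, 11, 14, 13, 15, 102]

print(zscore_outliers(data))
print(iqr_outliers(data))
```

Either function only nominates outlier candidates; whether a flagged value is an error or a genuine extreme still depends on knowledge of how the data were collected.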

7. What is the impact of outliers in statistics?

An outlier is an unusually large or small observation. Outliers can have a disproportionate effect on statistical results, such as the mean, which can result in misleading interpretations.

8. What is the meaning of the five-number summary in statistics?

The five-number summary is a set of descriptive statistics that provides information about a dataset. It consists of the five most important sample percentiles:

1. The sample minimum (smallest observation)

2. The lower quartile or first quartile

3. The median (the middle value)

4. The upper quartile or third quartile

5. The sample maximum (largest observation)
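As an illustration, the sketch below computes the five-number summary for the annual peak discharge values given in question 11. The "inclusive" quartile method of Python's `statistics.quantiles` is an assumption; other quartile conventions give slightly different Q1 and Q3.

```python
from statistics import quantiles

# Annual peak discharge (cumec), 2000-2015, from question 11.
discharge = [4630, 2662, 1913, 3655, 3670, 4005, 4621, 1557,
             2405, 1625, 6216, 2602, 2157, 3120, 6403, 2934]

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum) of the data."""
    q1, med, q3 = quantiles(data, n=4, method="inclusive")
    return min(data), q1, med, q3, max(data)

summary = five_number_summary(discharge)
print(summary)
```

These five values are exactly what a box plot draws: the box spans Q1 to Q3, the line inside it marks the median, and the whiskers extend toward the minimum and maximum.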

9. What are the advantages of using box plots?

• They graphically display a variable's location and spread at a glance.
• They provide some indication of the data's symmetry and skewness.
• Unlike many other methods of data display, box plots show outliers.
• By placing a box plot for each categorical variable side by side on the same graph, one can quickly compare data sets.

10. List the most commonly used nonparametric tests and describe when to use them.

The most commonly used nonparametric tests are:

Mann-Whitney U Test: The Mann-Whitney U Test is a nonparametric version of the independent-samples t-test. It is used to compare two independent samples that contain ordinal data.

Wilcoxon Signed Rank Test: The Wilcoxon Signed Rank Test is a nonparametric counterpart of the paired-samples t-test. It is used to compare two dependent samples with ordinal data.

Kruskal-Wallis Test: The Kruskal-Wallis Test is a nonparametric alternative to the one-way ANOVA. It is used to compare more than two independent groups with ordinal data.

11. Consider the following sample data for annual peak discharge (cumec) at a gauging
station A. Evaluate the mean, variance, coefficient of skewness, and coefficient of
kurtosis for the given sample data. Also, comment regarding the coefficient of
skewness and coefficient of kurtosis.

Solution

Here x is the annual peak discharge (cumec) and xm is its mean.

Year   x (cumec)   x - xm      (x - xm)^2    (x - xm)^3       (x - xm)^4
2000   4630        1244.063    1547691.5     1925424961.58    2.39535E+12
2001   2662        -723.938    524085.504    -379405149.5     2.74666E+11
2002   1913        -1472.94    2169544.88    -3195604010      4.70692E+12
2003   3655        269.0625    72394.6289    19478679.84      5240982294
2004   3670        284.0625    80691.5039    22921430.33      6511118803
2005   4005        619.0625    383238.379    237248508.9      1.46872E+11
2006   4621        1235.063    1525379.38    1883938869       2.32678E+12
2007   1557        -1828.94    3345012.38    -6117818578      1.11891E+13
2008   2405        -980.938    962238.379    -943895709.8     9.25903E+11
2009   1625        -1760.94    3100900.88    -5460492641      9.61559E+12
2010   6216        2830.063    8009253.75    22666688702      6.41481E+13
2011   2602        -783.938    614558.004    -481775065.2     3.77682E+11
2012   2157        -1228.94    1510287.38    -1856048796      2.28097E+12
2013   3120        -265.938    70722.7539    -18807832.37     5001707920
2014   6403        3017.063    9102666.13    27463312628      8.28585E+13
2015   2934        -451.938    204247.504    -92307106.3      41717042852
Sum    54175       0           33222912.9    35672858891.18   1.81305E+14

Mean (xm) = 54175 / 16 = 3385.9375 cumec
Variance = 33222912.9 / 16 = 2076432.06 cumec^2
Standard deviation = 1440.98302 cumec
Coefficient of skewness = 0.74514595
Coefficient of kurtosis (excess) = -0.37182478

Since the coefficient of skewness is greater than zero, the distribution is positively skewed (it has a longer right tail). The data have no mode because no value is repeated.

Since the coefficient of kurtosis (excess kurtosis) is less than zero, the distribution is platykurtic, that is, flatter than a normal distribution.
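The moment calculations in this solution can be reproduced with a short script. This is a sketch that assumes population moments (division by N rather than N − 1) and reports kurtosis as excess kurtosis (the fourth standardized moment minus 3), which is the convention the tabulated results are consistent with.

```python
from math import sqrt

# Annual peak discharge (cumec), 2000-2015.
discharge = [4630, 2662, 1913, 3655, 3670, 4005, 4621, 1557,
             2405, 1625, 6216, 2602, 2157, 3120, 6403, 2934]

n = len(discharge)
mean = sum(discharge) / n

# Central moments about the mean, each divided by N (population form).
m2 = sum((x - mean) ** 2 for x in discharge) / n
m3 = sum((x - mean) ** 3 for x in discharge) / n
m4 = sum((x - mean) ** 4 for x in discharge) / n

variance = m2
std_dev = sqrt(m2)
skewness = m3 / m2 ** 1.5          # coefficient of skewness
excess_kurtosis = m4 / m2 ** 2 - 3  # coefficient of kurtosis minus 3

print(mean, variance, std_dev, skewness, excess_kurtosis)
```

A positive `skewness` indicates a longer right tail, and a negative `excess_kurtosis` indicates a distribution flatter than the normal.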

12. The following stream flow measurements are taken from three different outlets. Test
whether the difference between the means of both the outlets is significant using α =
0.01. Analyze the data with the Kruskal-Wallis Test.

Pooled data from the three outlets, ordered from smallest to largest, with ranks (tied values share the average rank):

Value   Rank
65      1
69      2
72      3
74      4
75      5.5
75      5.5
76      7
78      8.5
78      8.5
79      10
80      12
80      12
80      12
81      14
86      15

Sum of ranks: Outlet 1 = 52.5, Outlet 2 = 36.5, Outlet 3 = 31

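N = 15 (total number of observations), with n_i = 5 observations per outlet and R_i the sum of ranks for outlet i.

$$H = \frac{12}{N(N+1)} \sum_{i=1}^{3} \frac{R_i^2}{n_i} - 3(N+1)$$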

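H = (12 / (15 × 16)) × (52.5^2/5 + 36.5^2/5 + 31^2/5) − 3(15 + 1)

H = 2.495

Degrees of freedom = number of groups − 1 = 3 − 1 = 2

H critical (chi-square distribution, df = 2, α = 0.01) = 9.21

H calculated (2.495) < H critical (9.21)

Fail to reject the null hypothesis.

Result: there is no significant difference among the three outlets at α = 0.01.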
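The ranking and the H statistic can be checked with a short script. This is a minimal sketch: `average_ranks` and `kruskal_wallis_h` are helper names invented here, the H formula is the uncorrected one (no tie correction), and the rank sums per outlet are taken as given in the solution.

```python
def average_ranks(values):
    """Rank values from smallest to largest; tied values share the average rank."""
    first_pos, count = {}, {}
    for pos, v in enumerate(sorted(values), start=1):
        first_pos.setdefault(v, pos)        # first position held by this value
        count[v] = count.get(v, 0) + 1      # number of times the value occurs
    return [first_pos[v] + (count[v] - 1) / 2 for v in values]

def kruskal_wallis_h(rank_sums, sizes):
    """Kruskal-Wallis H from per-group rank sums and group sizes (no tie correction)."""
    n_total = sum(sizes)
    spread = sum(r * r / n for r, n in zip(rank_sums, sizes))
    return 12 / (n_total * (n_total + 1)) * spread - 3 * (n_total + 1)

# Pooled, ordered stream flow values from the three outlets.
pooled = [65, 69, 72, 74, 75, 75, 76, 78, 78, 79, 80, 80, 80, 81, 86]
ranks = average_ranks(pooled)

# Rank sums per outlet as stated in the solution, five observations each.
h_stat = kruskal_wallis_h([52.5, 36.5, 31], [5, 5, 5])
h_critical = 9.21  # chi-square critical value, df = 2, alpha = 0.01

print(ranks)
print(h_stat, h_stat < h_critical)
```

A useful sanity check is that the ranks always sum to N(N + 1)/2, which is 120 for N = 15 and matches 52.5 + 36.5 + 31.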

13. It is found from the long-term historical data that the mean wind speed of a region is
51.35 km/h and standard deviation is 11 km/h. It is required to test whether the mean
has increased or not. To test this, a sample of 80 stations in that region is tested and it is
found that the mean wind speed is 54.47 km/h. (a) Can we support the claim at a 0.01
level of significance? (b) What is the p-value of the test?

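Given

µ = 51.35 km/h

σ = 11 km/h

n = 80

Sample mean = 54.47 km/h

Solution

Step 1: State the hypotheses and identify the claim.

H0: µ ≤ 51.35 and H1: µ > 51.35 (claim)

Step 2: Find the critical value. Since α = 0.01 and the test is right-tailed, the critical value is z = 2.326 (2.576 would apply to a two-tailed test).

Step 3: Compute the test value.

z = (54.47 − 51.35) / (11 / √80)

z = 2.537

Step 4: Make the decision.

2.537 > 2.326

Reject the null hypothesis, because the calculated z is greater than the critical z.

Step 5: Summarize the results.

(a) At the 0.01 level of significance there is enough evidence to support the claim that the mean wind speed has increased.

(b) The p-value is P(Z > 2.537) ≈ 0.0056, which is less than α = 0.01 and agrees with the decision to reject the null hypothesis.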
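The test statistic, the one-sided critical value, and the p-value asked for in part (b) can be computed with Python's standard library. This is a sketch of a right-tailed one-sample z-test under the assumption that the population standard deviation (11 km/h) is known; the variable names are chosen here for illustration.

```python
from math import sqrt
from statistics import NormalDist

mu0 = 51.35     # long-term mean wind speed (km/h)
sigma = 11.0    # population standard deviation (km/h)
n = 80          # number of stations sampled
xbar = 54.47    # sample mean wind speed (km/h)
alpha = 0.01    # significance level

std_normal = NormalDist()  # standard normal distribution, mean 0 and sd 1

# Test statistic for a one-sample z-test.
z = (xbar - mu0) / (sigma / sqrt(n))

# Right-tailed test: all of alpha sits in the upper tail.
z_critical = std_normal.inv_cdf(1 - alpha)

# p-value is the upper-tail probability beyond the observed z.
p_value = 1 - std_normal.cdf(z)

reject_h0 = z > z_critical
print(round(z, 3), round(z_critical, 3), round(p_value, 4), reject_h0)
```

Comparing `p_value` with `alpha` and comparing `z` with `z_critical` are equivalent decision rules, so the two checks should always agree.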


14. For a large catchment, the precipitation and runoff are being recorded monthly. The
records for 2 years are tabulated in the following table. The variables are assumed to be
linearly related. Workout a relationship between the monthly precipitation and runoff
for the location and use the relationship to estimate the expected amount of runoff
generated when monthly precipitation is 14 cm.

Solution

Here X is the monthly precipitation and Y is the monthly runoff.

X      Y     x - xmean   y - ymean   (x - xmean)^2   (y - ymean)^2   (x - xmean)(y - ymean)
6.9    2.4   -0.5375     -0.2625     0.28890625      0.06890625      0.14109375
6.4    1.1   -1.0375     -1.5625     1.07640625      2.44140625      1.62109375
6.5    1.7   -0.9375     -0.9625     0.87890625      0.92640625      0.90234375
5.1    0.5   -2.3375     -2.1625     5.46390625      4.67640625      5.05484375
7.1    1.8   -0.3375     -0.8625     0.11390625      0.74390625      0.29109375
7.1    2     -0.3375     -0.6625     0.11390625      0.43890625      0.22359375
10.2   4.2   2.7625      1.5375      7.63140625      2.36390625      4.24734375
9.9    3     2.4625      0.3375      6.06390625      0.11390625      0.83109375
8.4    4.7   0.9625      2.0375      0.92640625      4.15140625      1.96109375
5.8    3.4   -1.6375     0.7375      2.68140625      0.54390625      -1.20765625
10.1   4.4   2.6625      1.7375      7.08890625      3.01890625      4.62609375
9.3    2.8   1.8625      0.1375      3.46890625      0.01890625      0.25609375
5.5    1.4   -1.9375     -1.2625     3.75390625      1.59390625      2.44609375
11.4   6.3   3.9625      3.6375      15.70140625     13.23140625     14.41359375
10.8   4.4   3.3625      1.7375      11.30640625     3.01890625      5.84234375
7.5    1.8   0.0625      -0.8625     0.00390625      0.74390625      -0.05390625
8.2    4.2   0.7625      1.5375      0.58140625      2.36390625      1.17234375
7.9    2.9   0.4625      0.2375      0.21390625      0.05640625      0.10984375
4.1    0     -3.3375     -2.6625     11.13890625     7.08890625      8.88609375
5      1.3   -2.4375     -1.3625     5.94140625      1.85640625      3.32109375
6.7    1.3   -0.7375     -1.3625     0.54390625      1.85640625      1.00484375
4.3    0     -3.1375     -2.6625     9.84390625      7.08890625      8.35359375
10.4   5.9   2.9625      3.2375      8.77640625      10.48140625     9.59109375
3.9    2.4   -3.5375     -0.2625     12.51390625     0.06890625      0.92859375

Xmean = 7.4375, Ymean = 2.6625
Sum of (x - xmean)^2 = SSx = 116.11625
Sum of (y - ymean)^2 = SSy = 68.95625
Sum of (x - xmean)(y - ymean) = SP = 74.96375

Use the least-squares regression method, with slope M and intercept C.

M = SP / SSx

SP = 74.96375

SSx = 116.11625

M = 74.96375 / 116.11625

M = 0.6456

C = Ymean − M × Xmean

= 2.6625 − 0.6456 × 7.4375

C = −2.139

Y = MX + C = 0.6456X − 2.139

For a monthly precipitation of 14 cm, the runoff from the catchment is estimated from the above formula:

Y = 0.6456 × 14 − 2.139

Y ≈ 6.90, which is the estimated amount of runoff generated from the catchment.
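The regression can also be recomputed directly from the raw precipitation and runoff columns, which is a useful check on a hand-built deviation table. The sketch below applies the same formulas, slope = SP / SSx and intercept = Ymean − slope × Xmean; the variable names are chosen here for illustration.

```python
# Monthly precipitation X and runoff Y for the 24 recorded months.
precip = [6.9, 6.4, 6.5, 5.1, 7.1, 7.1, 10.2, 9.9, 8.4, 5.8, 10.1, 9.3,
          5.5, 11.4, 10.8, 7.5, 8.2, 7.9, 4.1, 5.0, 6.7, 4.3, 10.4, 3.9]
runoff = [2.4, 1.1, 1.7, 0.5, 1.8, 2.0, 4.2, 3.0, 4.7, 3.4, 4.4, 2.8,
          1.4, 6.3, 4.4, 1.8, 4.2, 2.9, 0.0, 1.3, 1.3, 0.0, 5.9, 2.4]

n = len(precip)
x_mean = sum(precip) / n
y_mean = sum(runoff) / n

# Sum of squares of X deviations and sum of cross-products.
ss_x = sum((x - x_mean) ** 2 for x in precip)
sp = sum((x - x_mean) * (y - y_mean) for x, y in zip(precip, runoff))

slope = sp / ss_x                      # M in Y = MX + C
intercept = y_mean - slope * x_mean    # C in Y = MX + C

def predict_runoff(p):
    """Estimate runoff for a monthly precipitation p using the fitted line."""
    return slope * p + intercept

estimate_at_14 = predict_runoff(14)
print(round(slope, 4), round(intercept, 3), round(estimate_at_14, 2))
```

Note that 14 cm lies above the largest recorded precipitation (11.4 cm), so the estimate is an extrapolation beyond the observed data.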
