Stat Assignment
ENGINEERING
PROGRAM – REGULAR
Parametric
Non-parametric tests
Non-parametric tests do not rely on any assumed distribution. They can thus be applied even if the parametric conditions of
validity are not met. Parametric tests often have nonparametric equivalents. You will find
different parametric tests with their equivalents when they exist in this grid.
2. Define population and sample in inferential statistics. What is the difference between
them?
A population is the entire group that you want to draw conclusions about. It is all possible
values.
A sample is the specific group that you will collect data from. It is a subset of the
population, so its size is always smaller than the total size of the population.
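The distinction can be sketched in a few lines of Python. The population below is simulated and purely hypothetical (it is not data from this assignment); the point is only that the sample is a subset and that its mean is an estimate of the population mean.

```python
import random
import statistics

# Hypothetical population: 1,000 simulated measurements (an assumption for illustration).
rng = random.Random(42)
population = [rng.gauss(50, 10) for _ in range(1000)]

# A sample is a subset of the population, drawn here without replacement.
sample = rng.sample(population, 30)

population_mean = statistics.mean(population)  # parameter, usually unknown in practice
sample_mean = statistics.mean(sample)          # statistic used to estimate the parameter

print(f"population size = {len(population)}, mean = {population_mean:.2f}")
print(f"sample size     = {len(sample)}, mean = {sample_mean:.2f}")
```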
3. Let us say the median of a right-skewed distribution is 15. Can the mean be greater than or
less than 20?
In statistics, nonparametric tests are methods of statistical analysis that do not require the
data to follow a particular distribution (for example, they can be used when the data are not
normally distributed). For this reason, they are sometimes referred to as distribution-free tests.
Nonparametric tests serve as an alternative to parametric tests such as the t-test or ANOVA, which
can be employed only if the underlying data satisfy certain criteria and assumptions.
The main reasons to apply the non-parametric test include the following:
The underlying data do not meet the assumptions about the population
Generally, the application of parametric tests requires various assumptions to be satisfied, for
example that the data follow a normal distribution and that the population variance is homogeneous.
However, some data samples may show skewed distributions.
Skewness makes parametric tests less powerful: the mean is strongly affected by extreme values,
so it is no longer the best measure of central tendency. Nonparametric tests, by contrast, work
well with skewed distributions and with distributions that are better represented by the median.
Unlike parametric tests that can work only with continuous data, nonparametric tests can be
applied to other data types such as ordinal or nominal data. For such types of variables, the
nonparametric tests are the only appropriate solution.
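The remark about extreme values can be illustrated with a small sketch. The numbers are hypothetical and chosen only to show how one large value moves the mean while the median barely changes.

```python
import statistics

# Hypothetical right-skewed sample: most values are small, one is extreme.
values = [2, 3, 3, 4, 5, 6, 40]
without_extreme = values[:-1]

mean_value = statistics.mean(values)        # pulled upward by the extreme value 40
median_value = statistics.median(values)    # depends only on the middle position

print("with extreme value:    mean =", mean_value, " median =", median_value)
print("without extreme value: mean =", round(statistics.mean(without_extreme), 2),
      " median =", statistics.median(without_extreme))
```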
Degrees of freedom is a combination of how much data you have and how many parameters you
need to estimate. It indicates how much independent information goes into a parameter estimate.
Degrees of freedom are the number of independent values that a statistical analysis can estimate.
In statistics, the number of degrees of freedom is the number of values in the final calculation of
a statistic that are free to vary.
The number of independent ways in which a dynamic system can move, without violating any
constraint imposed on it, is called the number of degrees of freedom. In other words, the number
of degrees of freedom can be defined as the minimum number of independent coordinates that can
specify the position of the system completely.
The term is most often used in the context of linear models (linear regression, analysis of
variance), where certain random vectors are constrained to lie in linear subspaces, and the
number of degrees of freedom is the dimension of the subspace. The degrees of freedom are also
commonly associated with the squared lengths (or "sum of squares" of the coordinates) of such
vectors, and the parameters of chi-squared and other distributions that arise in associated
statistical testing problems.
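A minimal sketch of the "values free to vary" idea, using a hypothetical four-value sample: once the mean is fixed, the deviations must sum to zero, so only n - 1 of them are free. This is why the sample variance divides by n - 1.

```python
import statistics

sample = [4.0, 7.0, 6.0, 3.0]   # hypothetical values, not from the assignment
n = len(sample)
mean = statistics.mean(sample)

deviations = [x - mean for x in sample]

# The deviations sum to zero, so the last one is fixed by the first n - 1.
last_from_others = -sum(deviations[:-1])

sum_of_squares = sum(d * d for d in deviations)
sample_variance = sum_of_squares / (n - 1)   # n - 1 degrees of freedom
population_variance = sum_of_squares / n     # divisor n, no correction

print("deviations:", deviations)
print("last deviation recovered from the others:", last_from_others)
print("sample variance:", sample_variance, " population variance:", population_variance)
```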
Unfortunately, there are no strict statistical rules for definitively identifying outliers. Finding
outliers depends on subject-area knowledge and an understanding of the data collection process.
While there is no solid mathematical definition, there are guidelines and statistical tests you can
use to find outlier candidates.
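One widely used guideline is the 1.5 x IQR rule (Tukey's fences). The text above does not name a specific rule, so this choice is an assumption, and the function name `iqr_outlier_candidates` is illustrative. The sketch only flags candidates; whether a flagged value is really an outlier still depends on subject-area knowledge.

```python
import statistics

def iqr_outlier_candidates(data, k=1.5):
    """Return values lying more than k * IQR outside the quartiles."""
    # "inclusive" treats the data as spanning the full range of interest;
    # other quartile conventions can give slightly different fences.
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < lower or x > upper]

# Hypothetical data with one obviously extreme value.
print(iqr_outlier_candidates([1, 2, 3, 4, 5, 100]))
```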
Wilcoxon Signed Rank Test: The Wilcoxon Signed Rank Test is a nonparametric
counterpart of the paired samples t-test. The test compares two dependent samples with
ordinal data.
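The signed-rank statistic can be sketched as follows. The paired values are hypothetical, zero differences are dropped (a common convention, assumed here), and tied absolute differences share the average rank. The sketch stops at the rank sums W+ and W-; looking up a critical value or p-value is left out.

```python
def average_ranks(values):
    """Rank values from smallest to largest; tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def wilcoxon_signed_rank(before, after):
    """Return (W+, W-), the rank sums of positive and negative differences."""
    diffs = [a - b for b, a in zip(before, after) if a != b]  # drop zero differences
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for d, r in zip(diffs, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return w_plus, w_minus

# Hypothetical paired measurements on the same five subjects.
before = [10, 12, 9, 14, 11]
after = [12, 11, 13, 18, 11]
w_plus, w_minus = wilcoxon_signed_rank(before, after)
print("W+ =", w_plus, " W- =", w_minus)
```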
11. Consider the following sample data for annual peak discharge (cumec) at a gauging
station A. Evaluate the mean, variance, coefficient of skewness, and coefficient of
kurtosis for the given sample data. Also, comment regarding the coefficient of
skewness and coefficient of kurtosis.
Solution
Annual peak discharge x (cumec); xm is the sample mean.

Year   x      x-xm        (x-xm)^2      (x-xm)^3         (x-xm)^4
2000   4630   1244.063    1547691.5     1925424961.58    2.39535E+12
2001   2662   -723.938    524085.504    -379405149.5     2.74666E+11
2002   1913   -1472.94    2169544.88    -3195604010      4.70692E+12
2003   3655   269.0625    72394.6289    19478679.84      5240982294
2004   3670   284.0625    80691.5039    22921430.33      6511118803
2005   4005   619.0625    383238.379    237248508.9      1.46872E+11
2006   4621   1235.063    1525379.38    1883938869       2.32678E+12
2007   1557   -1828.94    3345012.38    -6117818578      1.11891E+13
2008   2405   -980.938    962238.379    -943895709.8     9.25903E+11
2009   1625   -1760.94    3100900.88    -5460492641      9.61559E+12
2010   6216   2830.063    8009253.75    22666688702      6.41481E+13
2011   2602   -783.938    614558.004    -481775065.2     3.77682E+11
2012   2157   -1228.94    1510287.38    -1856048796      2.28097E+12
2013   3120   -265.938    70722.7539    -18807832.37     5001707920
2014   6403   3017.063    9102666.13    27463312628      8.28585E+13
2015   2934   -451.938    204247.504    -92307106.3      41717042852
Sum           0           33222912.9    35672858891.18   1.81305E+14

Mean      = 3385.9375
Variance  = 2076432.06
St. dev.  = 1440.98302
Skewness  = 0.74514595
Kurtosis  = -0.37182478
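The coefficient of skewness is positive (0.745), so the distribution is positively skewed: the
longer tail is on the side of the high discharges. The coefficient of kurtosis is negative
(-0.372), so the distribution is flatter than a normal distribution (platykurtic). No value is
repeated in the sample, so there is no mode.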
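The tabulated statistics can be cross-checked with a short script. It assumes population-style moments (divisor n rather than n - 1) and reports kurtosis as excess kurtosis (minus 3); both are assumptions about how the tabulated values were obtained, chosen because they are consistent with the variance and the sums of powers in the table.

```python
# Annual peak discharge (cumec), 2000-2015, from the table.
discharge = [4630, 2662, 1913, 3655, 3670, 4005, 4621, 1557,
             2405, 1625, 6216, 2602, 2157, 3120, 6403, 2934]

n = len(discharge)
mean = sum(discharge) / n

def central_moment(k):
    """k-th central moment with divisor n (population convention, assumed)."""
    return sum((x - mean) ** k for x in discharge) / n

variance = central_moment(2)
std_dev = variance ** 0.5
skewness = central_moment(3) / std_dev ** 3
excess_kurtosis = central_moment(4) / variance ** 2 - 3   # 0 for a normal distribution

print(f"mean = {mean}, variance = {variance:.2f}, st. dev. = {std_dev:.3f}")
print(f"skewness = {skewness:.4f}, excess kurtosis = {excess_kurtosis:.4f}")
```

With the sample convention (divisor n - 1, plus the usual bias corrections for skewness and kurtosis) the numbers would differ somewhat, so the convention should be stated alongside the results.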
12. The following stream flow measurements are taken from three different outlets. Test
whether the difference between the means of the three outlets is significant using α =
0.01. Analyze the data with the Kruskal-Wallis Test.
Ranked outlet data
Ordered data from smallest to largest, with ranks (tied values share the average rank)

Value(s)        Rank(s)
65              1
69              2
72              3
74              4
75, 75          5.5, 5.5
76              7
78, 78          8.5, 8.5
79              10
80, 80, 80      12, 12, 12
81              14
86              15

N = 15
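The average-rank convention for ties used in the ranking can be checked with a small helper. The function name `average_ranks` is illustrative; the input is the ordered data listed above.

```python
def average_ranks(values):
    """Rank values from smallest to largest; tied values share the average rank."""
    sorted_vals = sorted(values)
    rank_of = {}
    for v in set(sorted_vals):
        first = sorted_vals.index(v) + 1      # 1-based position of the first occurrence
        count = sorted_vals.count(v)          # number of tied observations
        rank_of[v] = first + (count - 1) / 2  # mean of positions first .. first+count-1
    return [rank_of[v] for v in values]

ordered = [65, 69, 72, 74, 75, 75, 76, 78, 78, 79, 80, 80, 80, 81, 86]
ranks = average_ranks(ordered)
for value, rank in zip(ordered, ranks):
    print(value, rank)
```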
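H = \frac{12}{N(N+1)} \sum_{i=1}^{k} \frac{R_i^2}{n_i} - 3(N+1)
where R_i is the sum of the ranks in outlet i, n_i is the number of observations in outlet i,
k = 3 is the number of outlets, and N = 15.
H = 2.495
Degrees of freedom = k - 1 = 3 - 1 = 2
H critical (chi-square, α = 0.01, 2 degrees of freedom) = 9.21
Since H = 2.495 < 9.21, do not reject the null hypothesis: the difference among the three
outlets is not significant at α = 0.01.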
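The H statistic can be sketched in Python as below. The per-outlet measurements are not reproduced in this solution, so the three groups in the example are hypothetical and the printed H is not the 2.495 obtained above. No tie correction is applied, and the critical value 9.21 is the one quoted in the solution.

```python
def kruskal_wallis_h(groups):
    """Kruskal-Wallis H (without tie correction) for a list of samples."""
    pooled = sorted(v for g in groups for v in g)
    n_total = len(pooled)

    def rank(v):
        # Average rank of v in the pooled sample (ties share the mean position).
        first = pooled.index(v) + 1
        return first + (pooled.count(v) - 1) / 2

    # Sum over groups of (rank sum)^2 / group size.
    rank_term = sum(sum(rank(v) for v in g) ** 2 / len(g) for g in groups)
    return 12 / (n_total * (n_total + 1)) * rank_term - 3 * (n_total + 1)

# Hypothetical groups, for illustration only.
groups = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
h = kruskal_wallis_h(groups)
h_critical = 9.21   # chi-square, alpha = 0.01, df = 2 (value quoted in the solution)
print("H =", round(h, 3), "reject H0:", h > h_critical)
```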
13. It is found from the long-term historical data that the mean wind speed of a region is
51.35 km/h and standard deviation is 11 km/h. It is required to test whether the mean
has increased or not. To test this, a sample of 80 stations in that region is tested and it is
found that the mean wind speed is 54.47 km/h. (a) Can we support the claim at a 0.01
level of significance? (b) What is the p-value of the test?
Given
µ = 51.35 km/h
σ = 11 km/h
n = 80
Sample mean = 54.47 km/h
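Solution
Step 1: State the hypotheses. H0: µ = 51.35 km/h; H1: µ > 51.35 km/h (the claim that the mean
has increased).
Step 2: State the critical value. Since α = 0.01 and the test is a right-tailed test, the
critical value is z = 2.326.
Step 3: Compute the test statistic.
Z = [54.47 - 51.35] / [11 / 80^(1/2)]
Z = 2.537
2.537 > 2.326
(a) Reject the null hypothesis because the calculated z is greater than the critical z. At the
0.01 level of significance, the data support the claim that the mean wind speed has increased.
(b) p-value = P(Z > 2.537) ≈ 0.0056, which is less than α = 0.01.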
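The test statistic, the one-sided critical value, and the p-value can be recomputed with the standard library's `statistics.NormalDist`. This is a sketch of a right-tailed z-test with known population standard deviation, using the values given in the question.

```python
from statistics import NormalDist

mu0 = 51.35      # historical mean wind speed (km/h)
sigma = 11.0     # population standard deviation (km/h)
n = 80           # number of stations sampled
x_bar = 54.47    # sample mean (km/h)
alpha = 0.01

standard_normal = NormalDist()               # mean 0, standard deviation 1
z = (x_bar - mu0) / (sigma / n ** 0.5)       # test statistic
z_critical = standard_normal.inv_cdf(1 - alpha)   # right-tailed critical value
p_value = 1 - standard_normal.cdf(z)         # P(Z > z) under H0
reject_h0 = z > z_critical

print(f"z = {z:.3f}, z critical = {z_critical:.3f}, p-value = {p_value:.4f}")
print("reject H0:", reject_h0)
```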
Solution

X      Y     x-xmean   y-ymean   (x-xmean)^2    (y-ymean)^2    (x-xmean)(y-ymean)
6.9    2.4   -0.5375   -0.2625   0.28890625     0.06890625     0.14109375
6.4    1.1   -1.0375   -1.5625   1.07640625     2.44140625     1.62109375
6.5    1.7   -0.9375   -0.9625   0.87890625     0.92640625     0.90234375
5.1    0.5   -2.3375   -2.1625   5.46390625     4.67640625     5.05484375
7.1    1.8   -0.3375   -0.8625   0.11390625     0.74390625     0.29109375
7.1    2     -0.3375   -0.6625   0.11390625     0.43890625     0.22359375
10.2   4.2   2.7625    1.5375    7.63140625     2.36390625     4.24734375
9.9    3     2.4625    0.3375    6.06390625     0.11390625     0.83109375
8.4    4.7   0.9625    2.0375    0.92640625     4.15140625     1.96109375
5.8    3.4   -1.6375   0.7375    2.68140625     0.54390625     -1.20765625
10.1   4.4   2.6625    1.7375    7.08890625     3.01890625     4.62609375
9.3    2.8   1.8625    0.1375    3.46890625     0.01890625     0.25609375
5.5    1.4   -1.9375   -1.2625   3.75390625     1.59390625     2.44609375
11.4   6.3   3.9625    3.6375    15.70140625    13.23140625    14.41359375
10.8   4.4   3.3625    1.7375    11.30640625    3.01890625     5.84234375
7.5    1.8   0.0625    -0.8625   0.00390625     0.74390625     -0.05390625
8.2    4.2   0.7625    1.5375    0.58140625     2.36390625     1.17234375
7.9    2.9   0.4625    0.2375    0.21390625     0.05640625     0.10984375
4.1    0     -3.3375   -2.6625   11.13890625    7.08890625     8.88609375
5      1.3   -2.4375   -1.3625   5.94140625     1.85640625     3.32109375
6.7    1.3   -0.7375   -1.3625   0.54390625     1.85640625     1.00484375
4.3    0     -3.1375   -2.6625   9.84390625     7.08890625     8.35359375
10.4   5.9   2.9625    3.2375    8.77640625     10.48140625    9.59109375
3.9    2.4   -3.5375   -0.2625   12.51390625    0.06890625     0.92859375

Mean of X = 7.4375
Mean of Y = 2.6625
Sum of (x-xmean)^2 = SSx = 116.11625
Sum of (y-ymean)^2 = SSy = 68.95625
Sum of (x-xmean)(y-ymean) = SP = 74.96375

Use the regression method
M = SP/SSx
SP = 74.96375
SSx = 116.11625
M = 74.96375/116.11625
M = 0.6456
C = Ymean - M*Xmean
= 2.6625 - 0.6456*7.4375
C = -2.139
Y = M*X + C
At a precipitation of 14 cm, the amount of runoff from the catchment is estimated by using the
above formula.
Now,
Y = 0.6456*14 - 2.139
Y = 6.90
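As a check on the hand tabulation, the sums SSx and SP, the slope M, the intercept C, and the estimate at X = 14 can be recomputed directly from the X and Y columns. This is a sketch of ordinary least-squares for a straight line; the variable names are illustrative.

```python
# X (precipitation) and Y (runoff) columns from the table.
x = [6.9, 6.4, 6.5, 5.1, 7.1, 7.1, 10.2, 9.9, 8.4, 5.8, 10.1, 9.3,
     5.5, 11.4, 10.8, 7.5, 8.2, 7.9, 4.1, 5.0, 6.7, 4.3, 10.4, 3.9]
y = [2.4, 1.1, 1.7, 0.5, 1.8, 2.0, 4.2, 3.0, 4.7, 3.4, 4.4, 2.8,
     1.4, 6.3, 4.4, 1.8, 4.2, 2.9, 0.0, 1.3, 1.3, 0.0, 5.9, 2.4]

n = len(x)
x_mean = sum(x) / n
y_mean = sum(y) / n

ss_x = sum((xi - x_mean) ** 2 for xi in x)                       # SSx
sp = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))  # SP

slope = sp / ss_x                       # M = SP / SSx
intercept = y_mean - slope * x_mean     # C = Ymean - M * Xmean
runoff_at_14 = slope * 14 + intercept   # Y = M * X + C at X = 14

print(f"SSx = {ss_x:.5f}, SP = {sp:.5f}")
print(f"M = {slope:.4f}, C = {intercept:.4f}, Y(14) = {runoff_at_14:.2f}")
```

Note that X = 14 lies above the largest observed precipitation (11.4), so the estimate is an extrapolation of the fitted line.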