
Bcs 040

Uploaded by

lenovo mi
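BCS-040 Solved Assignment 2022-2023

Course Code | BCS-040
Course Title | Statistical Techniques
Assignment Number | BCA(4)040/Assignment/2022-23
Maximum Marks | 100
Weightage | 25%
Last Date of Submission | 31st October, 2022 (for July session); 15th April, 2023 (for January session)

Note: This assignment has 15 questions carrying 80 marks (Q1 to Q14 carry 5 marks each; Q15 carries 10 marks). Answer all the questions. The remaining 20 marks are for the viva voce. You may use illustrations and diagrams to enhance explanations. Please go through the guidelines regarding assignments given in the Programme Guide for the format of presentation.

Q1. In a study on the per capita income for a particular year in a city, the following weekly observations were made. (5)

Per Capita Income (Rs.) (1K = 1000) | 14K-15K | 15K-16K | 16K-17K | 17K-18K | 18K-19K | 19K-20K
Number of Weeks | 5 | 10 | 20 | 9 | 6 | 2

Draw a histogram and a frequency polygon on the same scale.

Solution:

Per Capita Income (Rs.) | Mid Value (x) | Frequency (f)
13K-14K | 13.5K | 0
14K-15K | 14.5K | 5
15K-16K | 15.5K | 10
16K-17K | 16.5K | 20
17K-18K | 17.5K | 9
18K-19K | 18.5K | 6
19K-20K | 19.5K | 2
20K-21K | 20.5K | 0

The classes 13K-14K and 20K-21K, with zero frequency, are added so that the frequency polygon closes on the horizontal axis.

Histogram and Frequency Polygon:

[Graph: histogram with one bar per class interval, bar height equal to the frequency, and a frequency polygon joining the mid-points of the bar tops, starting at 13.5K and ending at 20.5K with frequency 0.]

Q2. Do you find any correlation between ages and playing habits of the students, whose distribution according to age groups is given in the following table? (5)

Age Group (Years) | 15-16 | 16-17 | 17-18 | 18-19 | 19-20 | 20-21
Number of Students | 200 | 270 | 340 | 360 | 400 | 300
Number of Regular Players | 150 | 152 | 170 | 180 | 180 | 120

Solution:

Calculation of Percentage of Regular Players:

Age Group | Mid Value (x) | Number of Students | Number of Regular Players | Percentage of Regular Players (y)
15-16 | 15.5 | 200 | 150 | (150/200) × 100 = 75
16-17 | 16.5 | 270 | 152 | (152/270) × 100 = 56.3
17-18 | 17.5 | 340 | 170 | (170/340) × 100 = 50
18-19 | 18.5 | 360 | 180 | (180/360) × 100 = 50
19-20 | 19.5 | 400 | 180 | (180/400) × 100 = 45
20-21 | 20.5 | 300 | 120 | (120/300) × 100 = 40

$$\bar{x} = \frac{\sum x}{n} = \frac{15.5 + 16.5 + 17.5 + 18.5 + 19.5 + 20.5}{6} = \frac{108}{6} = 18$$

$$\bar{y} = \frac{\sum y}{n} = \frac{75 + 56.3 + 50 + 50 + 45 + 40}{6} = \frac{316.3}{6} = 52.7167$$

Since $\bar{x} = 18$, use the assumed mean $A = 18$. Since $\bar{y} = 52.7167$ is not an integer, use the assumed mean $B = 53$.

Class | Mid Value (x) | y | $d_x = x - 18$ | $d_y = y - 53$ | $d_x^2$ | $d_y^2$ | $d_x d_y$
15-16 | 15.5 | 75 | -2.5 | 22 | 6.25 | 484 | -55
16-17 | 16.5 | 56.3 | -1.5 | 3.3 | 2.25 | 10.89 | -4.95
17-18 | 17.5 | 50 | -0.5 | -3 | 0.25 | 9 | 1.5
18-19 | 18.5 | 50 | 0.5 | -3 | 0.25 | 9 | -1.5
19-20 | 19.5 | 45 | 1.5 | -8 | 2.25 | 64 | -12
20-21 | 20.5 | 40 | 2.5 | -13 | 6.25 | 169 | -32.5
Total | 108 | 316.3 | $\sum d_x = 0$ | $\sum d_y = -1.7$ | $\sum d_x^2 = 17.5$ | $\sum d_y^2 = 745.89$ | $\sum d_x d_y = -104.45$

Correlation coefficient r:

$$r = \frac{n\sum d_x d_y - \sum d_x \sum d_y}{\sqrt{\left[n\sum d_x^2 - \left(\sum d_x\right)^2\right]\left[n\sum d_y^2 - \left(\sum d_y\right)^2\right]}}$$

$$r = \frac{6(-104.45) - (0)(-1.7)}{\sqrt{\left[6(17.5) - 0^2\right]\left[6(745.89) - (-1.7)^2\right]}} = \frac{-626.7}{\sqrt{105 \times 4472.45}} = \frac{-626.7}{685.28} \approx -0.9145$$

The value of r is close to -1, so there is a strong negative correlation between age and playing habit: the percentage of regular players falls as age increases.

[The statement of Q3 and the first part of its solution are missing from the source. The surviving part of the solution follows; it uses $\bar{X} = 36$, $\bar{Y} = 85$, $b_{xy} = \pm 0.9075$ and $b_{yx} = \pm 0.48$, where X denotes marks in A and Y denotes marks in B.]

(i) Regression equation of X on Y:

$$X - 36 = \pm(0.9075Y - 77.1375)$$
$$\Rightarrow X = 36 \pm (0.9075Y - 77.1375)$$

i.e. $X = 0.9075Y - 41.1375$ or $X = -0.9075Y + 113.1375$ (Answer)

Regression equation of Y on X:

$$Y - \bar{Y} = b_{yx}(X - \bar{X})$$
$$\Rightarrow Y - 85 = \pm 0.48(X - 36)$$
$$\Rightarrow Y - 85 = \pm(0.48X - 17.28)$$
$$\Rightarrow Y = 85 \pm (0.48X - 17.28)$$

i.e. $Y = 0.48X + 67.72$ or $Y = -0.48X + 102.28$ (Answer)

(ii) Calculation of expected marks in A corresponding to 75 marks obtained in B:

$X = 0.9075(75) - 41.1375$ or $X = -0.9075(75) + 113.1375$
$\Rightarrow X = 68.0625 - 41.1375$ or $X = -68.0625 + 113.1375$
$\Rightarrow X = 26.925$ or $X = 45.075$ (Answer)

Q4. Calculate 2-sigma and 3-sigma upper and lower control limits for means of samples of size 4 and prepare a control chart for a drilling machine, which bores holes with a mean diameter of 0.5230 cm and a standard deviation of 0.0032 cm. (5)

Solution:

$\mu$ (or $\bar{\bar{X}}$) $= 0.5230$, $\sigma = 0.0032$, $n = 4$

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{0.0032}{\sqrt{4}} = \frac{0.0032}{2} = 0.0016$$

2-Sigma (Inner Control) Limits:

$UCL_{\bar{x}} = \mu + 2\sigma_{\bar{x}} = 0.5230 + 2 \times 0.0016 = 0.5262$ cm
$LCL_{\bar{x}} = \mu - 2\sigma_{\bar{x}} = 0.5230 - 2 \times 0.0016 = 0.5198$ cm

3-Sigma (Outer Control) Limits:

$UCL_{\bar{x}} = \mu + 3\sigma_{\bar{x}} = 0.5230 + 3 \times 0.0016 = 0.5278$ cm
$LCL_{\bar{x}} = \mu - 3\sigma_{\bar{x}} = 0.5230 - 3 \times 0.0016 = 0.5182$ cm

Based on the above calculation, the control chart for n = 4 is the following:

[Fig.: Control Chart. Horizontal axis: sample number. Horizontal lines at 0.5278 (outer UCL, 3-sigma), 0.5262 (inner UCL, 2-sigma), 0.5230 (central line), 0.5198 (inner LCL, 2-sigma) and 0.5182 (outer LCL, 3-sigma).]

Q5. Construct 5-yearly moving averages from the following data.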
(5)

Year | 2000 | 2001 | 2002 | 2003 | 2004 | 2005 | 2006 | 2007 | 2008 | 2009 | 2010 | 2011 | 2012
Sale | 105 | 107 | 109 | 112 | 114 | 116 | 118 | 121 | 123 | 124 | 125 | 127 | 129

Solution:

Each 5-yearly total is the sum of the sales for the year itself, the two years before it and the two years after it; the moving average is that total divided by 5. No value can be computed for the first two and the last two years.

Year | Sale (y) | 5-yearly total | 5-yearly moving average
2000 | 105 | - | -
2001 | 107 | - | -
2002 | 109 | 547 | 109.4
2003 | 112 | 558 | 111.6
2004 | 114 | 569 | 113.8
2005 | 116 | 581 | 116.2
2006 | 118 | 592 | 118.4
2007 | 121 | 602 | 120.4
2008 | 123 | 611 | 122.2
2009 | 124 | 620 | 124.0
2010 | 125 | 628 | 125.6
2011 | 127 | - | -
2012 | 129 | - | -

Q6. In 120 throws of a single dice, the following distribution of faces was observed. (5)

Faces | 1 | 2 | 3 | 4 | 5 | 6 | Total
$F_o$ | 30 | 25 | 18 | 10 | 22 | 15 | 120

From the given data, verify whether the hypothesis "dice is biased" is acceptable or not.

Solution:

On the basis of the principle of equal probability, $p = \frac{1}{6}$, so the theoretical frequency for each face is $Np = 120 \times \frac{1}{6} = 20$. Thus we have:

Faces | 1 | 2 | 3 | 4 | 5 | 6 | Total
$F_o$ | 30 | 25 | 18 | 10 | 22 | 15 | 120
$F_e$ | 20 | 20 | 20 | 20 | 20 | 20 | 120

Since no theoretical frequency is less than 10, the chi-square test can be applied directly:

$$\chi^2 = \sum \frac{(F_o - F_e)^2}{F_e} = \frac{(30-20)^2 + (25-20)^2 + (18-20)^2 + (10-20)^2 + (22-20)^2 + (15-20)^2}{20}$$

$$= \frac{100 + 25 + 4 + 100 + 4 + 25}{20} = \frac{258}{20} = 12.9$$

Degrees of freedom, $\nu = 6 - 1 = 5$

$\chi^2_{0.05}(5) = 11.070$ (from table)

Since $\chi^2 = 12.9 > \chi^2_{0.05}(5) = 11.070$, the hypothesis of equal probability is rejected. The hypothesis that the dice is biased is therefore acceptable.

Q7. A company wants to estimate how its monthly costs are related to its monthly output rate. The data for a sample of nine months is tabulated below:

Output (Tons) | 1 | 2 | 4 | 8 | 6 | 5 | 8 | 9 | 7
Cost (Lakhs) | 2 | 3 | 4 | 7 | 6 | 5 | 8 | 8 | 6

Using the data given above, perform the following tasks:
(a) Calculate the best linear regression, where the monthly output is the dependent variable and monthly cost is the independent variable.
(b) Use the regression line to predict the company's monthly cost, if they decide to produce 4 tons per month.
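Solution:

(a) Let
y: monthly cost (in Lakhs)
x: monthly output (in Tons)

Part (b) asks for the cost at a given output, so cost (y) is regressed on output (x).

Mean of monthly output:

$$\bar{x} = \frac{1 + 2 + 4 + 8 + 6 + 5 + 8 + 9 + 7}{9} = \frac{50}{9}$$

Mean of monthly cost:

$$\bar{y} = \frac{2 + 3 + 4 + 7 + 6 + 5 + 8 + 8 + 6}{9} = \frac{49}{9}$$

$x_i$ | $x_i^2$ | $y_i$ | $y_i^2$ | $x_i y_i$
1 | 1 | 2 | 4 | 2
2 | 4 | 3 | 9 | 6
4 | 16 | 4 | 16 | 16
8 | 64 | 7 | 49 | 56
6 | 36 | 6 | 36 | 36
5 | 25 | 5 | 25 | 25
8 | 64 | 8 | 64 | 64
9 | 81 | 8 | 64 | 72
7 | 49 | 6 | 36 | 42
$\sum x_i = 50$ | $\sum x_i^2 = 340$ | $\sum y_i = 49$ | $\sum y_i^2 = 303$ | $\sum x_i y_i = 319$

Calculation of $S_{xx}$ (n = 9, the total number of observations):

$$S_{xx} = \sum x_i^2 - n\bar{x}^2 = 340 - 9 \times \frac{50}{9} \times \frac{50}{9} = \frac{560}{9}$$

Calculation of $S_{xy}$ (n = 9):

$$S_{xy} = \sum x_i y_i - n\bar{x}\bar{y} = 319 - 9 \times \frac{50}{9} \times \frac{49}{9} = \frac{421}{9}$$

Now,

$$b = \frac{S_{xy}}{S_{xx}} = \frac{421}{560} \approx 0.7518$$

Correspondingly,

$$a = \bar{y} - b\bar{x} = \frac{49}{9} - \frac{421}{560} \times \frac{50}{9} \approx 1.2679$$

The best linear regression line is

$$y = a + bx \Rightarrow y = 1.2679 + 0.7518x \quad \text{(Answer)}$$

(b) If the firm decides to produce 4 tons per month (x = 4), then the predicted cost is

$$y = 1.2679 + 0.7518 \times 4 = 1.2679 + 3.0072 \approx 4.275 \text{ lakhs} \quad \text{(Answer)}$$

Q8. The probability that at least one of two independent events occurs is 0.5. The probability that the first event occurs but not the second is 3/25. Also, the probability that the second event occurs but not the first is 8/25. Find the probability that none of the two events occurs. (5)

Solution:

Given,

$$P(A \cup B) = 0.5, \quad P(A \cap B^c) = \frac{3}{25} \quad \text{and} \quad P(A^c \cap B) = \frac{8}{25}$$

We have $P(A \cup B) = P(A) + P(B) - P(A \cap B)$.

Since A and B are independent events, $P(A \cap B) = P(A) \cdot P(B)$, so

$$P(A \cup B) = P(A) + P(B) - P(A) \cdot P(B)$$
$$\Rightarrow P(A \cup B) = P(A)\{1 - P(B)\} + P(B)$$
$$\Rightarrow P(A \cup B) = P(A \cap B^c) + P(B)$$
$$\Rightarrow 0.5 = \frac{3}{25} + P(B)$$
$$\Rightarrow P(B) = 0.5 - \frac{3}{25} = 0.38$$

Again, grouping the terms the other way,

$$P(A \cup B) = P(A) + P(B) - P(A) \cdot P(B)$$
$$\Rightarrow P(A \cup B) = P(B)\{1 - P(A)\} + P(A)$$
$$\Rightarrow P(A \cup B) = P(A^c \cap B) + P(A)$$
$$\Rightarrow 0.5 = \frac{8}{25} + P(A)$$
$$\Rightarrow P(A) = 0.5 - \frac{8}{25} = 0.18$$

The event that none of the two events occurs is the complement of the event that at least one occurs, so

$$P(A^c \cap B^c) = 1 - P(A \cup B) = 1 - 0.5 = 0.5$$

Q9. Marks of six students are tabulated below: (5)

Name | Raj | Anil | Amit | Om | Rita | Renu
Marks | 54 | 50 | 52 | 48 | 50 | 52

From the population tabulated above, you are supposed to choose a sample of size two.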
(a) Determine how many samples of size two are possible.
(b) Construct the sampling distribution of means by taking samples of size 2 and organize the data.

Solution:

(a) The number of samples of size 2 is given by

$$^{n}C_{2} = {}^{6}C_{2} = \frac{6 \times 5}{2 \times 1} = 15 \quad \text{(Answer)}$$

where n = 6 is the number of persons.

(b) The samples of size 2 and their means $\bar{x}_i$ are given in the following table:

Sample | Mean ($\bar{x}_i$)
54, 50 | 52
54, 52 | 53
54, 48 | 51
54, 50 | 52
54, 52 | 53
50, 52 | 51
50, 48 | 49
50, 50 | 50
50, 52 | 51
52, 48 | 50
52, 50 | 51
52, 52 | 52
48, 50 | 49
48, 52 | 50
50, 52 | 51

Organizing the 15 sample means gives the sampling distribution of the mean:

Sample mean | 49 | 50 | 51 | 52 | 53 | Total
Frequency | 2 | 3 | 5 | 3 | 2 | 15

Q10. Two new types of petrol, called premium and super, are introduced in the market, and their manufacturers claim that they give extra mileage. The following data were obtained on extra mileage, which is defined as actual mileage minus 10. (5)

Ordinary Petrol | 1 | 2 | 2 | 1
Premium Petrol | 2 | 2 | 1 | 3
Super Petrol | 4 | 1 | 2 | 3

(i) Using ANOVA, test whether premium or super gives an extra mileage.
(ii) What is your estimate for the error variance? Assuming that the error variance is known and is equal to 1, obtain the 95% confidence interval for the mean extra mileage of super.

Solution:

Solving using the one-way ANOVA method:

Observation | Ordinary Petrol (A) | Premium Petrol (B) | Super Petrol (C)
1 | 1 | 2 | 4
2 | 2 | 2 | 1
3 | 2 | 1 | 2
4 | 1 | 3 | 3
Sum | $\sum A = 6$ | $\sum B = 8$ | $\sum C = 10$

Observation | $A^2$ | $B^2$ | $C^2$
1 | 1 | 4 | 16
2 | 4 | 4 | 1
3 | 4 | 1 | 4
4 | 1 | 9 | 9
Sum | $\sum A^2 = 10$ | $\sum B^2 = 18$ | $\sum C^2 = 30$

Data Table:

Group | A | B | C
N | $n_1 = 4$ | $n_2 = 4$ | $n_3 = 4$
$\sum x_i^2$ | 10 | 18 | 30
$T_i = \sum x_i$ | 6 | 8 | 10
Mean $\bar{x}_i$ | 1.5 | 2 | 2.5
Std. Dev. $S_i$ | 0.5774 | 0.8165 | 1.2910

Let k = the number of different samples = 3 and
$N = n_1 + n_2 + n_3 = 4 + 4 + 4 = 12$

Grand total:

$$\sum T = T_1 + T_2 + T_3 = 6 + 8 + 10 = 24 \quad \ldots (i)$$

$$\frac{(\sum T)^2}{N} = \frac{24^2}{12} = 48 \quad \ldots (ii)$$

$$\sum \frac{T_i^2}{n_i} = \frac{6^2}{4} + \frac{8^2}{4} + \frac{10^2}{4} = 50 \quad \ldots (iii)$$

$$\sum x^2 = \sum x_1^2 + \sum x_2^2 + \sum x_3^2 = 10 + 18 + 30 = 58 \quad \ldots (iv)$$

Grand mean: $\bar{x} = 24/12 = 2$

Calculation of degrees of freedom:

Degrees of freedom between samples: $df_{between} = k - 1 = 3 - 1 = 2$ (k = number of columns)
Degrees of freedom within samples: $df_{within} = N - k = 12 - 3 = 9$
Total degrees of freedom: $df_{total} = df_{between} + df_{within} = 2 + 9 = 11$

ANOVA:

Step 1: Sum of squares between samples (SSB)

$$SSB = \sum \frac{T_i^2}{n_i} - \frac{(\sum T)^2}{N} = \text{Eq. (iii)} - \text{Eq. (ii)} = 50 - 48 = 2$$

or, equivalently,

$$SSB = \sum n_i(\bar{x}_i - \bar{x})^2 = 4(1.5 - 2)^2 + 4(2 - 2)^2 + 4(2.5 - 2)^2 = 2$$

Step 2: Sum of squares within samples (SSW)

$$SSW = \sum x^2 - \sum \frac{T_i^2}{n_i} = \text{Eq. (iv)} - \text{Eq. (iii)} = 58 - 50 = 8$$

Step 3: Total sum of squares (SST)

$$SST = SSB + SSW = 2 + 8 = 10$$

Step 4: Mean square between samples

$$MSB = \frac{SSB}{df_{between}} = \frac{2}{2} = 1$$

Step 5: Mean square within samples

$$MSW = \frac{SSW}{df_{within}} = \frac{8}{9} = 0.8889$$

Step 6: Test statistic F for the one-way ANOVA test

$$F = \frac{MSB}{MSW} = \frac{1}{0.8889} = 1.125$$

ANOVA Table:

Source of Variation | SS | df | MS | F | P-value | F crit
Between Groups (Treatment) | 2 | 2 | 1 | 1.125 | 0.366357 | 4.256495
Within Groups (Error) | 8 | 9 | 0.888889 | | |
Total | 10 | 11 | | | |

$H_0$: There is no significant difference between the extra mileage of the samples.
$H_1$: There is a significant difference between the extra mileage of the samples.

F(2, 9) at the 0.05 level of significance = 4.2565

(i) As the calculated F = 1.125 < 4.2565, $H_0$ is accepted. Hence there is no significant difference between the extra mileage of ordinary, premium and super petrol, so the data do not show that premium or super gives extra mileage.

(ii) The estimate of the error variance is MSW = 0.888889.

If the error variance is known and equal to 1, then $\sigma = 1$ and the 95% confidence interval for the mean extra mileage of super (n = 4, $\bar{x}_C = 2.5$) is

$$\bar{x}_C \pm 1.96\frac{\sigma}{\sqrt{n}} = 2.5 \pm 1.96 \times \frac{1}{2} = 2.5 \pm 0.98, \quad \text{i.e. } (1.52,\ 3.48)$$

Q11. Two floppies are selected at random without replacement from a box containing 7 good and 3 defective floppies. Let A be the event that the first floppy drawn is defective, and let B be the event that the second floppy drawn is defective.
(i) Find the conditional probabilities $P(B|A)$ and $P(B|A^c)$.
(ii) Show that $P(B) = P(B|A) \cdot P(A) + P(B|A^c) \cdot P(A^c) = P(A)$.
Solution:

Total number of floppies = 7 + 3 = 10

The probability that the first floppy drawn is defective:

$$P(A) = \frac{3}{10}$$

The probability that the first floppy drawn is good:

$$P(A^c) = 1 - \frac{3}{10} = \frac{7}{10}$$

(i) The probability that the second floppy drawn is defective if the first floppy drawn is defective (without replacement):

$$P(B|A) = \frac{P(B \cap A)}{P(A)} = \frac{2}{9} \quad \text{(Answer)}$$

because one defective floppy has been removed, leaving 9 floppies of which 2 are defective.

The probability that the second floppy drawn is defective if the first floppy drawn is good (without replacement):

$$P(B|A^c) = \frac{P(B \cap A^c)}{P(A^c)} = \frac{3}{9} \quad \text{(Answer)}$$

because one good floppy has been removed, leaving 9 floppies of which 3 are defective.

(ii) We know that

$$P(B \cap A) = P(A) \cdot P(B|A) \quad \text{and} \quad P(B \cap A^c) = P(A^c) \cdot P(B|A^c)$$

Now,

$$P(B|A) \cdot P(A) + P(B|A^c) \cdot P(A^c) = P(B \cap A) + P(B \cap A^c)$$
$$= P(B \cap A) + [P(B) - P(B \cap A)] \quad [\text{since } P(B \cap A^c) = P(B) - P(B \cap A)]$$
$$= P(B)$$

Hence, $P(B) = P(B|A) \cdot P(A) + P(B|A^c) \cdot P(A^c)$.

Now, on substituting the known probabilities,

$$P(B) = P(B|A) \cdot P(A) + P(B|A^c) \cdot P(A^c) = \frac{2}{9} \times \frac{3}{10} + \frac{3}{9} \times \frac{7}{10} = \frac{6 + 21}{90} = \frac{27}{90} = \frac{3}{10} = P(A)$$

Hence, $P(B) = P(B|A) \cdot P(A) + P(B|A^c) \cdot P(A^c) = P(A)$ (Proved)

Q12. A drilling machine bores holes with a mean diameter of 0.5230 cm and a standard deviation of 0.0032 cm. Calculate 2-sigma and 3-sigma upper and lower control limits for means of samples of size 4 and prepare a control chart.

Solution:

$\mu$ (or $\bar{\bar{X}}$) $= 0.5230$, $\sigma = 0.0032$, $n = 4$

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{0.0032}{\sqrt{4}} = 0.0016$$

2-Sigma (Inner Control) Limits:

$UCL_{\bar{x}} = \mu + 2\sigma_{\bar{x}} = 0.5230 + 2 \times 0.0016 = 0.5262$ cm
$LCL_{\bar{x}} = \mu - 2\sigma_{\bar{x}} = 0.5230 - 2 \times 0.0016 = 0.5198$ cm

3-Sigma (Outer Control) Limits:

$UCL_{\bar{x}} = \mu + 3\sigma_{\bar{x}} = 0.5230 + 3 \times 0.0016 = 0.5278$ cm
$LCL_{\bar{x}} = \mu - 3\sigma_{\bar{x}} = 0.5230 - 3 \times 0.0016 = 0.5182$ cm

Based on the above calculation, the control chart is the following:

[Fig.: Control Chart. Horizontal axis: sample number. Horizontal lines at 0.5278 (outer UCL, 3-sigma), 0.5262 (inner UCL, 2-sigma), 0.5230 (central line), 0.5198 (inner LCL, 2-sigma) and 0.5182 (outer LCL, 3-sigma).]

Q13. What are control charts? Briefly discuss the utility of control charts.
(5)

Solution:

A control chart, also called a statistical process control (SPC) chart, is one of several graphical tools typically used in quality control analysis to understand how a process changes over time.

The main elements of a control chart are:

• A time series graph that plots data points collected at specific time intervals.
• A horizontal central line that makes variations and trends easy to see.
• Horizontal lines showing the upper and lower control limits, placed at equal distances above and below the central line. These upper and lower thresholds are calculated from the data on the time-series graph over a specified period of time.

[Figure: example control chart, with sample number on the horizontal axis and a legend.]

Utilities of Control Charts

1. Employee Retention Rates
Finding, hiring, and training new employees is an expensive and time-consuming process for a company. Therefore, it is to a company's advantage to retain good employees as long as possible. A control chart can be constructed that compares a company's actual employee turnover rate to its desired rate. If the chart reveals an excessively high turnover rate, the company can investigate further to find the cause(s) and then make changes designed to reduce the rate.

2. Returns on Investments
Quality control can also be applied to examine returns on your investments, checking the extent to which individual investments in your portfolio outperform or underperform your expected investment returns. Wide variations in investment results, either up or down, may indicate that your current investment portfolio carries a higher degree of risk than you are comfortable with. Outperforming and underperforming investments can also be examined for common characteristics that may help you identify future investments that offer a higher probability of obtaining maximum profits.

3. E-commerce Websites
Control charts can be used to monitor the processes and functionality of an e-commerce website. For example, anyone engaged in such a business would do well to monitor the number of instances where a glitch in the website's operation causes a customer to abandon a purchase. By monitoring the quality of the website's operational performance, any problems that arise can be addressed quickly, before they lead to a substantial decline in revenues.

Q14. Compare the following:
a) Cluster sampling, Stratified sampling and Systematic sampling
b) Parametric and Non-Parametric Tests

Solution:

(a) Comparison among Cluster sampling, Stratified sampling and Systematic sampling:

Cluster sampling also involves dividing the population into subgroups, but each subgroup should have characteristics similar to those of the whole population. Instead of sampling individuals from each subgroup, you randomly select entire subgroups. If it is practically possible, you might include every individual from each sampled cluster. If the clusters themselves are large, you can also sample individuals from within each cluster using another sampling technique. This is called multistage sampling. This method is good for dealing with large and dispersed populations, but there is more risk of error in the sample, as there could be substantial differences between clusters. It is difficult to guarantee that the sampled clusters are really representative of the whole population.

Stratified sampling involves dividing the population into subpopulations that may differ in important ways. It allows you to draw more precise conclusions by ensuring that every subgroup is properly represented in the sample. To use this sampling method, you divide the population into subgroups (called strata) based on the relevant characteristic (e.g., gender, age range, income bracket, job role) and then sample from each stratum.
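The difference between the two schemes described so far can be made concrete with a small sketch. The population below is entirely hypothetical (60 people, each tagged with an age bracket used as the stratum and a city block used as the cluster), and the function names are illustrative only. Stratified sampling takes a few units from every stratum, whereas cluster sampling keeps every unit from a few randomly chosen clusters.

```python
import random
from collections import defaultdict


def group_by(population, key):
    """Group units into lists keyed by key(unit)."""
    groups = defaultdict(list)
    for unit in population:
        groups[key(unit)].append(unit)
    return dict(groups)


def stratified_sample(population, key, fraction, rng):
    """Draw the same fraction of units at random from every stratum."""
    sample = []
    for members in group_by(population, key).values():
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample


def cluster_sample(population, key, n_clusters, rng):
    """Pick whole clusters at random and keep every unit in them."""
    clusters = group_by(population, key)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for c in chosen for unit in clusters[c]]


# Hypothetical population (an assumption for illustration, not data from the assignment)
rng = random.Random(0)
population = [
    {"id": i, "bracket": ("young", "middle", "senior")[i % 3], "block": i // 10}
    for i in range(60)
]

strat = stratified_sample(population, lambda u: u["bracket"], 0.2, rng)
clust = cluster_sample(population, lambda u: u["block"], 2, rng)
print(len(strat), "units from", len({u["bracket"] for u in strat}), "strata")
print(len(clust), "units from", len({u["block"] for u in clust}), "clusters")
```

Every stratum appears in the stratified sample, while the cluster sample covers only the chosen blocks, which is why cluster sampling carries more risk when clusters differ from one another.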
Systematic sampling is similar to simple random sampling, but it is usually slightly easier to conduct. Every member of the population is listed with a number, but instead of randomly generating numbers, individuals are chosen at regular intervals. If we use this technique, it is important to make sure that there is no hidden pattern in the list that might skew the sample.

(b) Comparison between Parametric and Non-Parametric Tests:

Properties | Parametric | Non-parametric
Assumptions | Yes | No
Central tendency value | Mean value | Median value
Correlation | Pearson | Spearman
Probabilistic distribution | Normal | Arbitrary
Population knowledge | Requires | Does not require
Used for | Interval data | Nominal data
Applicability | Variables | Attributes & Variables
Examples | z-test, t-test, etc. | Kruskal-Wallis, Mann-Whitney

Q15. Explain the following with the help of an example each: (10)
a) Goodness of fit test
b) Test of Independence
c) Criteria for a good estimator
d) Chi-Square Test

Solution:

(a) Goodness of fit test:

In statistical hypothesis testing, the Chi-Square Goodness-of-Fit test determines whether a variable is likely to come from a given distribution or not. We must have a set of data values and an idea of the distribution of this data. We can use this test when we have value counts for categorical variables. The test provides a way of deciding whether the data values are a "good enough" fit for our assumed distribution, that is, whether the sample is representative of the entire population.

For example, suppose we have bags of balls with five different colours in each bag. The given condition is that each bag should contain an equal number of balls of each colour. The hypothesis we would like to test is that the proportions of the five colours of balls in each bag are equal.

(b) Test of Independence:

The Chi-Square Test of Independence is an inferential statistical test which examines whether two variables are likely to be related to each other or not.
This test is used when we have counts of values for two nominal or categorical variables, and it is considered a non-parametric test. A relatively large sample size and independence of observations are the required criteria for conducting this test.

For example, in a movie theatre, suppose we make a list of movie genres. Let us consider this as the first variable. The second variable is whether or not the people who came to watch those genres of movies bought snacks at the theatre. Here the null hypothesis is that the genre of the film and whether people bought snacks are unrelated. If this is true, the movie genre does not affect snack sales.

(c) Criteria for a good estimator:

An estimator is any quantity calculated from the sample data which is used to give information about an unknown quantity in the population. For example, the sample mean $\bar{x}$ is an estimator of the population mean $\mu$.

Criteria for a Good Estimator

1. Unbiasedness: An estimator is said to be an unbiased estimator of a given parameter when its expected value can be shown to be equal to the parameter being estimated. For example, the sample mean is an unbiased estimator of the population mean because $E(\bar{x}) = \mu$. Unbiasedness is a good quality for an estimator since, in such a case, a weighted average of several estimates provides a better estimate than any one of those estimates. For example, if your estimates of the population mean $\mu$ are, say, 10 and 11.2 from two independent samples of sizes 20 and 30 respectively, then a better estimate of the population mean $\mu$ based on both samples is [20(10) + 30(11.2)]/(20 + 30) = 536/50 = 10.72. Therefore, unbiasedness allows us to upgrade our estimates.

2. Consistency: An estimator is said to be "consistent" if increasing the sample size produces an estimate with a smaller standard error, so that the estimate gets closer to the parameter as the sample size grows.

3. Efficiency: An efficient estimator is one which has the smallest standard error among all unbiased estimators. Also, the "best" estimator is the one which is the closest to the population parameter being estimated.

(d) Chi-Square Test:

The Chi-Square test is a statistical procedure for determining the difference between observed and expected data. It can also be used to test whether two categorical variables in our data are associated, that is, whether an apparent relationship between them is real or due to chance.

The chi-square formula:

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

Where:
$\chi^2$ is the chi-square test statistic
$\sum$ is the summation operator (it means "take the sum of")
$O$ is the observed frequency
$E$ is the expected frequency

The larger the difference between the observations and the expectations ($O - E$ in the equation), the bigger the chi-square will be. To decide whether the difference is big enough to be statistically significant, you compare the chi-square value to a critical value.

Example: Handedness and nationality

Contingency table of the handedness of a sample of Indians and Americans:

 | Right-handed | Left-handed
Indian | 334 | 18
American | 250 | 23

A chi-square test (a test of independence) can test whether these observed frequencies are significantly different from the frequencies expected if handedness is unrelated to nationality.
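As a rough illustration of how the test of independence could be carried out on the handedness table above, here is a minimal Python sketch. It assumes the usual expected-frequency rule E = (row total × column total) / grand total, applies no continuity correction, and uses the tabulated 5% critical value for 1 degree of freedom (3.841). The function and variable names are illustrative only.

```python
def chi_square_independence(table):
    """Return (chi2, dof, expected) for an r x c table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # Expected count under independence: row total * column total / grand total
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, dof, expected


# Observed counts from the handedness example: rows = Indian, American;
# columns = right-handed, left-handed
observed = [[334, 18], [250, 23]]
CRITICAL_5PCT_DF1 = 3.841  # tabulated chi-square critical value, alpha = 0.05, df = 1

chi2, dof, expected = chi_square_independence(observed)
print(f"chi-square = {chi2:.3f}, df = {dof}")
if chi2 > CRITICAL_5PCT_DF1:
    print("Reject H0: handedness appears related to nationality")
else:
    print("Fail to reject H0: no significant association found")
```

Under these assumptions the statistic comes out at about 2.75, which is below 3.841, so this sample would not show a significant association between handedness and nationality at the 5% level.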
