Name:                    Roll number:
Q.1 [10 marks] A one-dimensional dataset of 1 lakh points has the mean as 100 and variance
as 9. Chebyshev's inequality states that at least ______ points are contained in this
interval [82, 118].
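A minimal sketch of how the bound in Q.1 can be computed, assuming the interval is read as [82, 118] (symmetric about the mean). The function name `chebyshev_min_fraction` is hypothetical. Chebyshev's inequality guarantees that at least 1 - 1/k^2 of the points lie within k standard deviations of the mean.

```python
from fractions import Fraction

def chebyshev_min_fraction(mean, variance, low, high):
    """Lower bound on the fraction of points in [low, high] (hypothetical helper).

    Assumes the interval is symmetric about the mean, i.e. mean +/- k*sd.
    """
    half_width = Fraction(high - mean)
    assert Fraction(mean - low) == half_width, "interval must be symmetric"
    k_squared = half_width ** 2 / Fraction(variance)  # k^2 = (half width / sd)^2
    return 1 - 1 / k_squared

n_points = 100_000  # 1 lakh
min_fraction = chebyshev_min_fraction(mean=100, variance=9, low=82, high=118)
min_points = float(min_fraction * n_points)
print(min_fraction, min_points)
```

Here k = 18 / 3 = 6, so the bound is 35/36 of the data, roughly 97,222 points.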
Q.2 [10 marks] We fit a linear regression model on some dataset. One of the independent
variables is Temperature, measured in Kelvin (K). The coefficient of this variable comes out
to be 2000. If the temperature had been measured in Fahrenheit (°F), the coefficient of this
variable would have come out to be ______.
[Hint: °F = 9/5 (K - 273) + 32]
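A short sketch of the unit-change argument in Q.2, assuming the hint reads °F = 9/5 (K - 273) + 32. If the predictor is re-expressed as new = slope * old + offset, the fitted coefficient is divided by the slope, and the offset is absorbed by the intercept. The helper name `rescale_coefficient` is hypothetical.

```python
from fractions import Fraction

def rescale_coefficient(coef_old, slope_new_per_old):
    """Coefficient after re-expressing the predictor as new = slope * old + offset.

    Since old = (new - offset) / slope, each unit of `new` is worth
    1/slope units of `old`, so the coefficient shrinks by that factor.
    """
    return coef_old / slope_new_per_old

coef_kelvin = Fraction(2000)
# One kelvin corresponds to 9/5 degrees Fahrenheit (slope taken from the hint).
coef_fahrenheit = rescale_coefficient(coef_kelvin, Fraction(9, 5))
print(coef_fahrenheit, float(coef_fahrenheit))
```

The result is 2000 * 5/9, about 1111.11 per °F.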
Q.3 [10 marks] A salesperson has visited 1000 customers to sell a book in a city. He has the
data of several attributes of these customers (income, education level, interest in reading, etc.).
Finally, he has also maintained the record of who purchased the book and who did not. He uses a
logistic regression model to build a model for this data. Assume that the cost of the book is
Rs. 100, selling price of the book is Rs. 3000, average cost of visiting a customer is Rs. 200.
The salesperson gets a new list of 10000 customers and the attributes used in building the
model. For a customer, the probability of purchasing the book comes out as "p". The
salesperson decides to visit a customer only if the expected value of visiting the customer is
more than Rs. 800. For what values of "p" should the salesperson visit a customer?
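One possible way to set up Q.3, as a sketch under stated assumptions: the book cost is read as Rs. 100, the cost of the book is incurred only when a sale happens, and the visit cost is always incurred. The expected value of a visit is then p * (selling price - book cost) - visit cost. The function name `min_purchase_probability` is hypothetical.

```python
from fractions import Fraction

def min_purchase_probability(selling_price, book_cost, visit_cost, target):
    """Smallest p at which the expected value of a visit equals `target`.

    Expected value = p * (selling_price - book_cost) - visit_cost.
    Assumes the book cost is paid only if the customer buys.
    """
    margin = selling_price - book_cost
    return Fraction(target + visit_cost, margin)

p_min = min_purchase_probability(
    selling_price=3000, book_cost=100, visit_cost=200, target=800
)
print(p_min, float(p_min))
```

Under these assumptions the salesperson visits when p > 10/29, roughly 0.345.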
Q.4 [10 marks] In the last offering, the IBM-322 course started with 200 students. 160 students
gave the ETE, others have dropped the course. If we were to give an estimate of the proportion
of students who drop IBM-322 course, and the class can be considered as a representative
sample for the same, what is the unbiased estimate for the proportion of students who drop
IBM-322 course? Construct a 95% confidence interval for the proportion of students who drop
IBM-322 course.
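A minimal sketch for Q.4, assuming the usual normal-approximation (Wald) interval with z = 1.96 is what is intended. The sample proportion 40/200 is the unbiased estimate. The helper name `proportion_ci` is hypothetical.

```python
import math

def proportion_ci(successes, n, z=1.96):
    """Sample proportion and its normal-approximation (Wald) interval."""
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)  # estimated standard error
    return p_hat, p_hat - z * se, p_hat + z * se

dropped = 200 - 160  # students who did not give the ETE
p_hat, ci_low, ci_high = proportion_ci(dropped, 200)
print(round(p_hat, 4), round(ci_low, 4), round(ci_high, 4))
```

This gives an estimate of 0.20 with an interval of roughly (0.145, 0.255).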
Q.5 [5 marks] Let W be a random variable that follows chi-square distribution with degrees
of freedom = 15. What is the expectation of W? Derive the result.
Q.6 [5 marks] True/False (no explanation required): A box plot is always a better way to
describe the data as compared to a histogram.
Q.7 [10 marks] Consider a 3-sided dice with letters A, B, C written on the 3 faces. We are
interested in estimating the probability of these letters appearing when the dice is rolled. The
dice is rolled 10 times. The observed outcome is A, A, B, A, B, A, d, A, A, B
What is the maximum likelihood estimate of probability of these letters appearing? You may
use the notations pA, pB and pC for these probabilities.
Briefly show all the steps; just writing the final answer is not sufficient.
[Don't worry about the geometrical shape of a 3-sided dice, assume such a dice exists]
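A sketch of the maximum likelihood calculation for Q.7. The likelihood is pA^nA * pB^nB * pC^nC subject to pA + pB + pC = 1, and maximising it (for example with a Lagrange multiplier) gives each probability as count / number of rolls. The seventh outcome is not legible in the source; the example below ASSUMES it is C, and the function name `categorical_mle` is hypothetical.

```python
from collections import Counter
from fractions import Fraction

def categorical_mle(outcomes, categories):
    """MLE of category probabilities: observed count divided by sample size."""
    counts = Counter(outcomes)
    n = len(outcomes)
    return {c: Fraction(counts[c], n) for c in categories}

# ASSUMPTION: the illegible 7th roll is taken to be "C".
rolls = list("AABABACAAB")
estimates = categorical_mle(rolls, "ABC")
for letter, p in estimates.items():
    print(letter, p)
```

If the unclear roll were a different letter, only the counts passed in would change; the count/n rule stays the same.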
Q.8 [5+10 marks] There is a six sided fair dice (with numbers 1 to 6 written on the six faces). Let
an experiment be rolling the dice 100 times.
Denote by "X", a random variable which counts the number of times an even number appears
in these 100 attempts.
Denote by "Y", a random variable which counts the number of times a number less than or equal
to two comes in these 100 attempts.
Let Z = X + 3Y
Find E(Z) and Variance(Z)
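A sketch of the Q.8 calculation by enumerating the six faces, assuming Y counts rolls showing a number less than or equal to two. X and Y are counts over the same 100 independent rolls, so Var(Z) = Var(X) + 9 Var(Y) + 6 Cov(X, Y), with the covariance built up from a single roll.

```python
from fractions import Fraction

faces = range(1, 7)
p_face = Fraction(1, 6)  # fair dice
n_rolls = 100

def is_even(face):
    return face % 2 == 0

def at_most_two(face):
    return face <= 2

# Per-roll probabilities of each event and of both happening together.
p_x = sum(p_face for f in faces if is_even(f))
p_y = sum(p_face for f in faces if at_most_two(f))
p_xy = sum(p_face for f in faces if is_even(f) and at_most_two(f))

e_x, e_y = n_rolls * p_x, n_rolls * p_y
var_x = n_rolls * p_x * (1 - p_x)
var_y = n_rolls * p_y * (1 - p_y)
cov_xy = n_rolls * (p_xy - p_x * p_y)  # rolls are independent of each other

e_z = e_x + 3 * e_y
var_z = var_x + 9 * var_y + 2 * 3 * cov_xy
print(e_z, var_z, cov_xy)
```

The per-roll covariance is 1/6 - (1/2)(1/3) = 0, so the cross term vanishes here.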
Q.9 [10 + 10 marks] SAC wants to organize an event to raise awareness about mental health.
Fortunately, SAC has been able to obtain some funding to provide unlimited Kaju-Katli (Indian
sweet) to the students who come to the event. SAC now needs to get an idea about how many
Kaju-Katli will be consumed during the event.
Assume there are 10000 students who have the possibility of coming. From past experience,
we know that if we randomly pick a student, there is a 5% chance that the student will come
to the event. So, the number of students who come to the event can be modelled by a Binomial
distribution.
Also, model the number of Kaju-Katli a student will consume by a Poisson distribution with
mean = 3.
Let T denote the random variable which represents the total number of Kaju-Katlis that will be
consumed during the event.
Find the EXPECTATION and VARIANCE of random variable T.
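A sketch for Q.9 using the standard random-sum identities E(T) = E(N) E(X) and Var(T) = E(N) Var(X) + E(X)^2 Var(N). It assumes the attendance chance is 5% and that each student's consumption is independent of the others and of how many students attend. The helper name `random_sum_moments` is hypothetical.

```python
from fractions import Fraction

def random_sum_moments(e_n, var_n, e_x, var_x):
    """Mean and variance of T = X_1 + ... + X_N for i.i.d. X_i independent of N."""
    e_t = e_n * e_x
    var_t = e_n * var_x + e_x ** 2 * var_n
    return e_t, var_t

n_students = 10000
p_attend = Fraction(5, 100)

# Attendance N ~ Binomial(10000, 0.05)
e_n = n_students * p_attend
var_n = n_students * p_attend * (1 - p_attend)

# Consumption per student X ~ Poisson(3): mean and variance are both 3
e_t, var_t = random_sum_moments(e_n, var_n, e_x=3, var_x=3)
print(e_t, var_t)
```

With E(N) = 500 and Var(N) = 475 this gives E(T) = 1500 and Var(T) = 1500 + 4275 = 5775.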
Q.10 [15 marks] Consider a scenario where it is thought that a variable Y is dependent on
a variable X.
Usually, the hypothesised linear regression model is -
E(Y) = alpha + beta*X
Mr. Kamal Mohan from IIT Delhi believes that life is not constant and wants to propose a new
relationship between Y and X without the constant term and with the X^2 term. Mr. Mohan also
has strange reasons to believe that the coefficient of X^2 would be half of coefficient of X.
The hypothesised relationship as per Mr. Mohan is
E(Y) = beta*X + (beta/2)*X^2
Using the same criteria of minimizing the sum of square of residuals, what is the best estimate
of beta? [Assume that there are "n" points in the data, use the notations for them as xi and yi]
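A sketch of the least-squares solution for Q.10. Writing w_i = x_i + x_i^2 / 2, the model is E(Y) = beta * w, and setting the derivative of sum (y_i - beta * w_i)^2 to zero gives beta = sum(w_i * y_i) / sum(w_i^2). The function name `fit_beta` and the sample data are hypothetical, chosen only to check the formula.

```python
def fit_beta(xs, ys):
    """Least-squares beta for E(Y) = beta*X + (beta/2)*X^2 (no intercept)."""
    ws = [x + x * x / 2 for x in xs]  # combined regressor w = x + x^2/2
    numerator = sum(w * y for w, y in zip(ws, ys))
    denominator = sum(w * w for w in ws)
    return numerator / denominator

# Hypothetical noise-free data generated with beta = 2, used as a sanity check.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0 * (x + x * x / 2) for x in xs]
beta_hat = fit_beta(xs, ys)
print(beta_hat)
```

On noise-free data the estimate recovers the generating beta, which is a quick check that the formula is set up correctly.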
Q.11 [10+5+5 marks] Let there be random variables X and Y distributed independently and
having the following distribution:
X ~ Normal(mean = 20, variance = 25)
Y ~ Normal(mean = 100, variance = 36)
Let U and V be random variables which are given as follows:
U = Y + 2X
V = X, with probability 0.6
    Y, with probability 0.4
In simple terms, V will be X with probability 0.6 and Y with probability 0.4.
Find the expectation of the random variables U and V. Also, find the standard deviation of the
random variable U.
Plot a rough sketch of the PDF (probability density function) of V. Only the shape needs to be
correct. Don't worry about the exact values on the X-axis and Y-axis. [Simulation may help here]
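A sketch for Q.11, assuming X has mean 20 and variance 25, Y has mean 100 and variance 36, and the mixing weights are 0.6 and 0.4. U is a linear combination of independent normals, while V is a mixture, so its PDF is a weighted sum of the two normal densities rather than a single bell curve.

```python
import math
from statistics import NormalDist

X = NormalDist(mu=20, sigma=5)    # variance 25
Y = NormalDist(mu=100, sigma=6)   # variance 36
w_x, w_y = 0.6, 0.4               # mixture weights for V

# U = Y + 2X with X, Y independent
e_u = Y.mean + 2 * X.mean
sd_u = math.sqrt(Y.variance + 4 * X.variance)

# V is X with probability 0.6, else Y
e_v = w_x * X.mean + w_y * Y.mean

def pdf_v(v):
    """Density of the mixture V: weighted sum of the two normal densities."""
    return w_x * X.pdf(v) + w_y * Y.pdf(v)

print(e_u, round(sd_u, 3), e_v)
# Coarse text sketch of the shape: two separate humps near 20 and 100.
for v in range(0, 121, 10):
    print(f"{v:4d} " + "#" * round(pdf_v(v) * 1000))
```

The text plot shows a taller hump around 20 and a shorter one around 100, with almost no density in between.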
Q.12 [5+5 marks] A regression model was built to understand if corruption level (CL) can be
explained using Per Capita Income (PCI) and Literacy Rate (LR) of a country. The model was
interestingly a very good fit to the data. The model was -
E(CL) = 100 - 0.0005 * PCI - 1.2 * LR
You can assume that there are no multicollinearity issues.
a.) Give the interpretation of the coefficient of PCI.
b.) [...] to reduce the corruption level [...] by increasing the
Literacy Rate. Can the model provide an exact answer to by how much LR needs to increase to
achieve the target (in the expected sense)? If yes, what is the answer? If no, why not?
Q.13 [10 marks] Assume that the population is normally distributed. We draw a sample of 6
observations. They turn out to be - {2, 4, 6, 8, 10, 12}. Construct a symmetric 99% confidence
interval for the population mean.
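A sketch for Q.13, assuming the six observations are 2, 4, 6, 8, 10, 12 and that a t-interval is intended because the population variance is unknown. The critical value 4.032 (t distribution, 5 degrees of freedom, 0.5% in each tail) is hard-coded from a standard t-table rather than computed.

```python
import math
from statistics import mean, stdev

# ASSUMPTION: two-sided 99% critical value for 5 degrees of freedom, from a t-table.
T_CRIT_99_DF5 = 4.032

sample = [2, 4, 6, 8, 10, 12]
x_bar = mean(sample)
s = stdev(sample)                        # sample standard deviation (n - 1 divisor)
margin = T_CRIT_99_DF5 * s / math.sqrt(len(sample))
ci = (x_bar - margin, x_bar + margin)
print(x_bar, round(s, 4), tuple(round(b, 3) for b in ci))
```

With a sample mean of 7 and sample variance of 14, the interval is roughly 7 plus or minus 6.16.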
Q.14 [10 marks] Professor Sumit Nagar believes that very few students (less than 20%) think
that going to the gym is a waste of time [Put this in the alternative hypothesis].
To test this, Professor Nagar decided to collect data. He gets the data of 500 students with 90
expressing that going to gym is a waste of time.
What is the p-value associated with this hypothesis test?
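A sketch for Q.14, assuming a one-sided z-test for a proportion with H0: p = 0.20 against H1: p < 0.20, and the standard error computed under H0. The function name `lower_tail_p_value` is hypothetical.

```python
import math
from statistics import NormalDist

def lower_tail_p_value(successes, n, p0):
    """z statistic and p-value for H0: p = p0 versus H1: p < p0."""
    p_hat = successes / n
    se = math.sqrt(p0 * (1 - p0) / n)   # standard error under the null
    z = (p_hat - p0) / se
    return z, NormalDist().cdf(z)       # lower-tail area

z_stat, p_value = lower_tail_p_value(successes=90, n=500, p0=0.20)
print(round(z_stat, 3), round(p_value, 4))
```

The sample proportion is 0.18, giving z of about -1.118 and a p-value of about 0.13.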