Capital University of Science and Technology
Department of Computer Science
MTCS2063 – Probability and Statistics
ASSIGNMENT NO. 1
Semester: Fall 2024
Instructor: Ms. Mahnoor Ali
Assigned Date: 04/10/2024 Due Date: 11/10/2024
Name: Section: 4
Part 1:
Question: 1
State whether each of the following statements is true or false.
(i) A population is a collection of all objects of interest.
(ii) A sample is any specific collection of all objects of interest.
(iii) A measurement is a number or attribute computed for each member of a population or
sample.
(iv) A sample that consists of an entire population is called a census.
(v) A parameter is a number computed from sample data.
(vi) Sample data is the collection of measurements from a sample.
(vii) Descriptive statistics is the branch of statistics that involves drawing conclusions
about a population based on sample data.
(viii) A statistic is a number computed from sample data.
(ix) Qualitative data are measurements that involve non-numerical characteristics.
(x) The colors of a sample of 100 cars are an example of quantitative data.
Question: 2 Choose the correct option.
The data about the weights of plants:
(i) Continuous data
(ii) Discrete data
(iii) Quantitative data
The procedure of inferring about the population characteristic using the sample is
called________.
(i) Descriptive statistics
(ii) Inferential statistics
(iii) Science
(iv) Statistic
A variable that takes numerical values is called a ______ variable.
(i) Qualitative
(ii) None of these
(iii) Primary
(iv) Quantitative
What is the primary purpose of descriptive statistics?
(i) To make predictions about a population
(ii) To summarize and describe the dataset
(iii) To conduct hypothesis testing
(iv) To analyze the relationship among variables
Quantitative data can be classified into which two categories?
(i) Nominal and ordinal
(ii) Discrete and continuous
(iii) Categorical and interval
(iv) None
An example of discrete data is:
(i) Amount of milk in a jug
(ii) Temperature in Celsius
(iii) Number of trees in a university
(iv) The height of a tree
Question 3: What are the uses of statistics? Write down any five applications of statistics in the
field of Computer Science.
Question 4: Differentiate between primary and secondary data. Also write about the sources of
primary and secondary data.
Part 2:
Question: 1
The following data set represents the number of bugs found in 20 different software modules:
0, 0, 1, 2, 1, 1, 1, 3, 3, 3, 4, 4, 4, 4, 0, 0, 0, 5, 5, 2. Construct a categorical frequency
distribution and obtain the percentage for each frequency.
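One way to tabulate this kind of data is sketched below in Python. It is only an illustration: the helper name frequency_table is hypothetical, and each distinct bug count is assumed to be its own category.

```python
from collections import Counter

# Bug counts per module, copied from the question.
bugs = [0, 0, 1, 2, 1, 1, 1, 3, 3, 3, 4, 4, 4, 4, 0, 0, 0, 5, 5, 2]


def frequency_table(values):
    """Return (category, frequency, percentage) rows sorted by category.

    Hypothetical helper: every distinct value is treated as one category.
    """
    counts = Counter(values)
    total = len(values)
    return [(cat, n, 100 * n / total) for cat, n in sorted(counts.items())]


table = frequency_table(bugs)

print(f"{'Bugs':>5} {'Frequency':>9} {'Percent':>8}")
for category, freq, pct in table:
    print(f"{category:>5} {freq:>9} {pct:>7.1f}%")
```

The percentages should add up to 100, which is a quick check on the tally.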
Question: 2
In a survey conducted among 25 software engineers to identify their preferred programming
languages, the following results were obtained: Python, Python, Java, Java, Python, C++,
C++, Java, Java, Ruby, Ruby, Ruby, Go, Java, Go, Java, Python, C++, C++, C++, Go, Go,
Java, C++, Python.
Construct the frequency distribution and present the data graphically.
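The tally for survey responses like these can be sketched as follows. The sketch assumes that spelling variants differing only in capitalization refer to the same language, so it lower-cases every response before counting. The helper name tally and the text bar chart are illustrative choices, not a required presentation.

```python
from collections import Counter

# Survey responses, copied from the question.
responses = [
    "python", "python", "java", "java", "python", "C++", "C++", "java",
    "java", "Ruby", "Ruby", "Ruby", "Go", "java", "Go", "java", "python",
    "C++", "C++", "C++", "Go", "Go", "Java", "C++", "python",
]


def tally(items):
    """Count responses case-insensitively, most frequent first.

    Assumption: differences in capitalization are not meaningful.
    """
    counts = Counter(item.strip().lower() for item in items)
    return counts.most_common()


language_counts = tally(responses)

# A plain-text bar chart: one '#' per respondent.
for language, n in language_counts:
    print(f"{language:<8}{'#' * n} ({n})")
```

A bar chart or pie chart drawn by hand or with a plotting library would use the same counts.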
Question: 3
The number of stories in each of a sample of the world’s 30 tallest buildings follows. Construct
a grouped frequency distribution and a cumulative frequency distribution with 7 classes.
88 88 110 88 80 69 102 78 70 55 79 85 80 100 60 90 77 55 75 55 54 60 75 64 105 56 71 70
65 72.
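The construction can be sketched in Python as below. The class width rule used here (the smallest whole number strictly greater than range divided by the number of classes, with the first class starting at the minimum value) is one common textbook convention and is an assumption of this sketch. Other starting points or widths are equally acceptable. The helper name grouped_frequency is hypothetical.

```python
from itertools import accumulate

# Number of stories, copied from the question.
stories = [
    88, 88, 110, 88, 80, 69, 102, 78, 70, 55, 79, 85, 80, 100, 60,
    90, 77, 55, 75, 55, 54, 60, 75, 64, 105, 56, 71, 70, 65, 72,
]


def grouped_frequency(values, num_classes):
    """Return (lower_limit, upper_limit, frequency) for each class.

    Assumed convention: integer data, first class starts at the minimum,
    width is the smallest integer greater than range / num_classes.
    """
    low, high = min(values), max(values)
    width = (high - low) // num_classes + 1
    rows = []
    for i in range(num_classes):
        lower = low + i * width
        upper = lower + width - 1
        freq = sum(lower <= v <= upper for v in values)
        rows.append((lower, upper, freq))
    return rows


rows = grouped_frequency(stories, 7)
cumulative = list(accumulate(freq for _, _, freq in rows))

for (lower, upper, freq), cum in zip(rows, cumulative):
    print(f"{lower:>3}-{upper:<3} {freq:>3} {cum:>3}")
```

The last cumulative frequency should equal the number of observations (30), which confirms that every value fell into exactly one class.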
Question: 4
The following data set presents the number of features developed across 44 software projects.
57 61 57 57 58 57 61 54 68 51 49 64 50 48 65 52 56 46 54 49 51 47 55 55 54 42 51 56 55 51 54
51 60 62 43 55 56 61 52 69 64 46 54 47.
Construct the frequency distribution (use your own judgment as to the number of classes and class
size), and answer the following questions.
(i) Were the data obtained from a population or a sample? Explain your answer.
(ii) Are there any peaks in the distribution?
(iii) Write a brief summary of the nature of the data as shown in the frequency distribution.
(iv) Identify any possible outliers.
Question: 5
The following data set presents the completion times (in hours) for software development
tasks. Construct a frequency distribution for the data set using a class width of 10.
24 29 30 56 58 29 34 45 22 39 25 41 33 47 50 26 28 31 55 60 32 48 27 49 52 36 37 54 23
44 40 53 38 42 46 20 29 30 30 31 19 18 19 19 25 25 55 60 60 32 49 48 52 55 55 14 13 12
14 13.
Also summarize the results.
Question: 6
The data show the number of railroad crossing accidents for the 50 states of the United States for
a specific year. Construct a histogram and frequency polygon for the data. Comment on the
skewness of the distribution.
Class limits    Frequency
1-43            24
44-86           17
87-129          3
130-172         4
173-215         1
216-258         0
259-301         0
302-344         1
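The values needed to draw the two graphs can be prepared as sketched below. The sketch assumes the usual convention for integer class limits: histogram bars span the class boundaries (limits widened by 0.5 on each side) and frequency polygon points sit at the class midpoints. Comparing the grouped mean with the midpoint of the modal class is only a rough indicator of skewness, not a formal measure. The helper name plot_points is hypothetical.

```python
# (lower limit, upper limit, frequency), copied from the table.
classes = [
    (1, 43, 24), (44, 86, 17), (87, 129, 3), (130, 172, 4),
    (173, 215, 1), (216, 258, 0), (259, 301, 0), (302, 344, 1),
]


def plot_points(class_rows):
    """Return boundaries, midpoint and frequency for each class.

    Assumption: integer limits, so boundaries are limits -/+ 0.5.
    """
    rows = []
    for lower, upper, freq in class_rows:
        rows.append({
            "boundaries": (lower - 0.5, upper + 0.5),  # histogram bar edges
            "midpoint": (lower + upper) / 2,           # polygon x-coordinate
            "frequency": freq,
        })
    return rows


points = plot_points(classes)
total = sum(p["frequency"] for p in points)

# Rough skewness hint: grouped mean versus midpoint of the modal class.
grouped_mean = sum(p["midpoint"] * p["frequency"] for p in points) / total
modal_midpoint = max(points, key=lambda p: p["frequency"])["midpoint"]

for p in points:
    print(p["boundaries"], p["midpoint"], p["frequency"])
print("grouped mean:", grouped_mean, "modal class midpoint:", modal_midpoint)
```

A grouped mean well above the modal class midpoint suggests a long right tail, which the histogram should confirm visually.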
Question: 7
Consider a frequency distribution of execution times (in milliseconds) for a set of sorting algorithms
applied to a data set of 1000 elements. Construct a histogram, frequency polygon, and ogive for the
data. Comment on the shape of the distribution.
Execution time  Frequency
0-10            15
11-20           25
21-30           20
31-40           10
41-50           5
51-60           3
61-70           2
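The points of the ogive can be computed as sketched below. The sketch assumes that the class limits are used exactly as given, that each cumulative frequency is plotted against the upper class boundary (upper limit plus 0.5), and that the curve starts at the lower boundary of the first class with a cumulative frequency of 0. The helper name ogive_points is hypothetical.

```python
from itertools import accumulate

# (lower limit, upper limit, frequency), copied from the table.
classes = [
    (0, 10, 15), (11, 20, 25), (21, 30, 20), (31, 40, 10),
    (41, 50, 5), (51, 60, 3), (61, 70, 2),
]


def ogive_points(class_rows):
    """Return (upper boundary, cumulative frequency) pairs for an ogive.

    Assumption: integer limits, so boundaries are limits -/+ 0.5.
    """
    upper_boundaries = [upper + 0.5 for _, upper, _ in class_rows]
    cumulative = list(accumulate(freq for _, _, freq in class_rows))
    start = (class_rows[0][0] - 0.5, 0)  # curve begins at zero
    return [start] + list(zip(upper_boundaries, cumulative))


points = ogive_points(classes)
for boundary, cum_freq in points:
    print(f"{boundary:>5} {cum_freq:>3}")
```

The final cumulative frequency equals the total number of observations in the table, so the ogive ends at that height.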
Question: 8
Assume you are a realtor in Bradenton, Florida. You have recently obtained a listing of the selling
prices of the homes that have sold in that area in the last 6 months. You wish to organize
those data so you will be able to provide potential buyers with useful information. Use the
following data to create a histogram, frequency polygon, and ogive.
142,000 73,800 123,000 179,000 127,000 135,000 99,600 119,500 91,000 112,000 159,400
114,000 231,000 205,300 119,600 189,500 205,000 147,000 144,400 93,000 177,600
162,000 67,900 110,000 321,550 163,000 123,000 83,400 89,000 156,300 156,300 87,900
96,000 187,000 77,000 93,000 104,500 104,000 88,400 81,000 96,000 132,300 99,500
108,650 133,900 180,000 131,000 80,000 166,000.
Answer the following questions:
(i) What questions could be answered more easily by looking at the histogram rather
than the listing of home prices?
(ii) What different questions could be answered more easily by looking at the frequency
polygon rather than the listing of home prices?
(iii) What different questions could be answered more easily by looking at the ogive rather
than the listing of home prices?
(iv) Are there any extremely large or extremely small data values compared to the other
data values?
(v) Which graph displays these extremes the best?
(vi) Is the distribution skewed?
Good luck