
Probability and Statistics Introduction To Statistics

The document outlines a module on Probability and Statistics, detailing its intended learning outcomes, assessment criteria, and historical context. It covers the significance of statistics in IT, including applications in algorithms, databases, networking, and artificial intelligence. Additionally, it defines key concepts such as populations, samples, parameters, and types of data, while distinguishing between descriptive and inferential statistics.

Probability and Statistics

Introduction to Statistics

Thilini Kulaweera
MSc (Colombo) | BSc (Kelaniya)
Intended Learning Outcomes
By the end of this module, you will be able to:
LO1: Describe the use of statistics in the domain of IT.
LO2: Understand statistical theories.
LO3: Understand the concept and principles of probability.
LO4: Analyze scenarios and problems that arise in new situations.
LO5: Demonstrate an understanding of descriptive statistics through the practical application of quantitative reasoning and data visualization.
Module Details
Module name: Probability and Statistics
Module code: IT1103
Credit points: 3
Method of delivery:
● Lectures (2 hours/week)
● Tutorials (1 hour/week)
● Labs (2 hours/week)
Enrollment key: PS03@
Assessment criteria
● Midterm – 20%
● Assignment – 30% (data analysis report with a presentation)
● Final exam – 50% (3-hour essay-type questions)
History of Statistics

17th and 18th century
The ideas and methods of statistics developed gradually as society grew interested in collecting and using data for a variety of applications. This was driven by the desire of rulers to count the number of inhabitants or to measure the value of taxable land in their domains.

19th century
The importance of careful measurements of weights, distances and other physical quantities grew, and statistical methods were used to analyze scientific measurements.

20th century
The agricultural, life and behavioural sciences also began to rely on data to answer fundamental questions. Effective methods for dealing with these types of questions developed slowly and with debate.

Many statisticians remain active in developing new methods, theories and applications of statistics. The availability of modern computers has been the major factor in the development of statistics into data science.
Statistics Definition
Statistics is the science of collecting, organizing and analyzing data, and finally drawing conclusions in the best possible way.
The main functions in Statistics
★ Formulate real problems in statistical terms.
★ Give advice on efficient data collection.
★ Analyze data efficiently and extract the maximum amount of information.
★ Interpret and report the results.


Probability and Statistics in the Domain of IT
Probability and statistics are now used in practically all areas of computer science.

[Figure: A pictorial view of our road, starting in the plains of Mathematics and winding up the hills of Probability to the heights of Statistics. From there, we will look onto and take short explorations of the nearby mountains, most often the AI mountain.]
Algorithms and data structures – randomized algorithms, and probabilistic arguments in the analysis of deterministic algorithms. Examples: randomized sorting, some polynomial-time primality testing algorithms, and randomized rounding in integer programming.
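As an illustration of a randomized algorithm, the sketch below is a randomized quicksort in Python, one common form of the randomized sort mentioned above. The function name is our own; the key idea is that the pivot is chosen uniformly at random, so the expected running time is O(n log n) for every input rather than only for "typical" inputs.

```python
import random


def randomized_quicksort(items):
    """Return a sorted copy of items using quicksort with a random pivot."""
    if len(items) <= 1:
        return list(items)
    pivot = random.choice(items)  # pivot chosen uniformly at random
    less = [x for x in items if x < pivot]
    equal = [x for x in items if x == pivot]
    greater = [x for x in items if x > pivot]
    # Recursively sort the two sides; elements equal to the pivot are already placed.
    return randomized_quicksort(less) + equal + randomized_quicksort(greater)


print(randomized_quicksort([5, 3, 8, 1, 9, 2]))  # [1, 2, 3, 5, 8, 9]
```

The output is always the same sorted list; only the sequence of pivots (and hence the running time) is random.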

Compilers – some modern compilers (for example, just-in-time compilers) optimize code at run time, based on data collected about the running time of different sections of the code.

Cryptography

Databases – to maximize speed of access, databases are indexed and structured taking into account the most frequent queries and their respective probabilities.

Networking and communications – from the point of view of the user, computer networks behave nondeterministically. The probabilistic analysis of computer networks is still in its early stages.

Circuit design – both testing the functionality of a circuit and testing that a given chip is working correctly involve probabilistic techniques.

Computer engineering – cache hits and misses, bus accesses, jumps in the code and interrupts are all modeled as random events from the point of view of the system designer.

Artificial intelligence – probabilistic methods are present and play a central role in most areas of AI. Here are just a few examples: machine learning, machine vision, robotics, probabilistic reasoning, planning, natural language understanding and information retrieval.

Computer graphics – machine learning techniques and their underlying statistical framework are starting to be used in graphics. Also, making rendered scenes look "natural" is often done by injecting a certain amount of randomness (for example, rendering of clouds, smoke, fields with grass and flowers, and tree foliage).
Statistical methods are used to answer the following questions:
● What kind of data, and how much, need to be collected?
● How should we organize and summarize the data?
● How can we analyze the data and draw conclusions from it?
● How can we assess the strength of the conclusions and evaluate their uncertainty?

Statistics provides methods for:
● Design – planning and carrying out research studies.
● Description – summarizing and exploring data.
● Inference – making predictions and generalizing about the phenomena represented by the data.
Population | Sample
A population (universe) is the collection of all items or things under
consideration.
A sample is a portion of the population selected for analysis.
Parameter | Statistic
A parameter is a summary measure that describes a characteristic of
the population.
A statistic is a summary measure computed from a sample to
describe a characteristic of the population.
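To make the population/sample and parameter/statistic distinction concrete, here is a small Python sketch. The population is simulated (hypothetical weights drawn from a normal distribution with mean 150 and standard deviation 20, an assumption made purely for illustration); the parameter is the mean of the whole population, and the statistic is the same measure computed from a simple random sample.

```python
import random
import statistics

random.seed(1)  # fixed seed so the illustration is reproducible

# Hypothetical population: weights (in pounds) of all 10,000 items under consideration.
population = [random.gauss(150, 20) for _ in range(10_000)]

# Parameter: a summary measure describing the whole population.
mu = statistics.mean(population)

# Sample: a portion of the population selected for analysis (simple random sample).
sample = random.sample(population, 100)

# Statistic: the same summary measure computed from the sample only.
x_bar = statistics.mean(sample)

print(f"population mean (parameter): {mu:.2f}")
print(f"sample mean (statistic):     {x_bar:.2f}")
```

In practice the population is usually not available, so the parameter is unknown and the statistic is used in its place.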
Descriptive Statistics: collecting, summarizing and describing data.
Inferential Statistics: drawing conclusions and/or making decisions concerning a population based only on sample data.
Descriptive Statistics

❖ Collect data
❖ Present data
❖ Characterize data
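A minimal sketch of characterizing a data set numerically, using Python's standard `statistics` module. The data (minutes ten students spent on a lab exercise) are hypothetical.

```python
import statistics

# Hypothetical data: minutes 10 students spent on a lab exercise.
data = [42, 35, 50, 47, 38, 42, 55, 40, 42, 49]

summary = {
    "n": len(data),
    "mean": statistics.mean(data),
    "median": statistics.median(data),
    "mode": statistics.mode(data),
    "standard deviation": statistics.stdev(data),  # sample standard deviation
    "range": max(data) - min(data),
}

for name, value in summary.items():
    print(f"{name}: {round(value, 2)}")
```

Here the mean is 44 minutes, the median and mode are both 42 minutes, and the range is 20 minutes.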
Inferential Statistics
❖ Estimation
Estimate population parameters using sample statistics.
Ex: Estimate the population mean weight using sample mean weight.
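The weight example can be sketched in Python as follows. The 12 weights are hypothetical. The sample mean serves as the point estimate of the population mean, and the interval uses a normal approximation (z about 1.96) as a simplifying assumption; for a sample this small, a t critical value would normally be preferred.

```python
import math
import statistics

# Hypothetical sample of 12 weights in pounds.
sample = [118, 125, 130, 122, 119, 127, 135, 121, 124, 128, 131, 126]

n = len(sample)
x_bar = statistics.mean(sample)   # point estimate of the population mean
s = statistics.stdev(sample)      # sample standard deviation
se = s / math.sqrt(n)             # standard error of the sample mean

# Normal approximation (an assumption): z for a two-sided 95% interval, about 1.96.
z = statistics.NormalDist().inv_cdf(0.975)
lower, upper = x_bar - z * se, x_bar + z * se

print(f"point estimate: {x_bar:.2f} lb")
print(f"approximate 95% interval: ({lower:.2f}, {upper:.2f}) lb")
```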

❖ Hypothesis Testing
A method that uses data from a sample to test a claim about a parameter of the population.
Ex: Test the claim that the population mean weight is greater than 120 pounds.
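One way to carry out this test is a one-sided, one-sample t-test of H0: μ ≤ 120 against H1: μ > 120. The sketch below assumes hypothetical data, a roughly normal population and a 5% significance level; the critical value 1.796 is the standard t-table value for 11 degrees of freedom at that level.

```python
import math
import statistics

# Hypothetical sample of 12 weights in pounds.
sample = [118, 125, 130, 122, 119, 127, 135, 121, 124, 128, 131, 126]
mu_0 = 120  # population mean under H0 (H0: mu <= 120, H1: mu > 120)

n = len(sample)
x_bar = statistics.mean(sample)
s = statistics.stdev(sample)
t_stat = (x_bar - mu_0) / (s / math.sqrt(n))  # one-sample t statistic

# Critical value for a one-sided test at alpha = 0.05 with n - 1 = 11
# degrees of freedom, read from a standard t table.
t_crit = 1.796

reject_h0 = t_stat > t_crit
print(f"t = {t_stat:.3f}, critical value = {t_crit}, reject H0: {reject_h0}")
```

Rejecting H0 here means the sample provides evidence for the claim that the population mean weight exceeds 120 pounds. It does not prove the claim.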
How to collect data?
Data sources:
● Primary data
   ● Observation
   ● Survey
   ● Experiments
● Secondary data
   ● Print or electronic
Types of data
Data
● Categorical data
   Examples: marital status, gender, eye color
● Numerical data
   ● Discrete
      Examples: number of children, number of defects per hour
   ● Continuous
      Examples: age, height, weight
Levels of measurement and measurement scales (from highest to lowest)
● Ratio data (highest level; strongest form of measurement): differences between measurements are meaningful, and a true zero exists.
● Interval data: differences between measurements are meaningful, but no true zero exists.
● Ordinal data: ordered categories (ranking, ordering or scaling).
● Nominal data (lowest level; weakest form of measurement): categories with no ordering or scaling.
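The difference between interval and ratio data can be shown with a short arithmetic sketch. Temperature in Celsius (interval: 0 °C does not mean "no heat") and weight (ratio: 0 kg means no weight) are our own illustrative choices and are not taken from the slides.

```python
# Interval scale: Celsius temperature has no true zero.
c_low, c_high = 10.0, 20.0
diff_c = c_high - c_low          # differences are meaningful: 10 degrees
naive_ratio = c_high / c_low     # 2.0, but 20 °C is NOT "twice as hot" as 10 °C

# Converting to Kelvin, which has a true zero, gives the physically meaningful ratio.
k_low, k_high = c_low + 273.15, c_high + 273.15
kelvin_ratio = round(k_high / k_low, 3)

# Ratio scale: weight has a true zero, so 60 kg really is twice 30 kg.
weight_ratio = 60 / 30

print("difference (Celsius):", diff_c)
print("naive Celsius ratio:", naive_ratio)
print("Kelvin ratio:", kelvin_ratio)
print("weight ratio:", weight_ratio)
```

On an interval scale, differences can be compared but ratios cannot. On a ratio scale, both are meaningful.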
Introduction to Variables
An individual is the object described by a set of data. Individuals may be people, animals or things.
Variables are characteristics of items or individuals and are what we analyze when we use statistical methods. A variable can take different values for different individuals.
Variable types
● Categorical (e.g. "Are you employed?" YES / NO)
● Numerical
   ● Discrete
   ● Continuous
Categorical Variable
● Categorical variables, also known as qualitative variables, have values that can only be placed into categories, such as ‘yes’ and ‘no’.
● They can also have more than two possible responses. Example: "Indicate the day on which the purchase was made."
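A categorical variable is usually summarized by counting how often each category occurs. The sketch below builds a frequency table for the purchase-day example with Python's `collections.Counter`. The ten responses are hypothetical.

```python
from collections import Counter

# Hypothetical responses to "Indicate the day on which the purchase was made."
purchase_days = ["Mon", "Sat", "Sat", "Fri", "Sun", "Sat", "Mon", "Fri", "Sat", "Wed"]

counts = Counter(purchase_days)  # frequency of each category
n = len(purchase_days)

# Frequency and relative frequency (percentage) of each category, most common first.
for day, freq in counts.most_common():
    print(f"{day}: {freq} ({freq / n:.0%})")
```

Only counting (and, for ordered categories, ranking) makes sense here. An average of "days" has no meaning.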
Numerical Variable
● Numerical variables, also known as quantitative variables, have values that represent quantities. Numerical variables are further subdivided into discrete and continuous variables.
● Continuous variables produce numerical responses that arise from a measuring process. The time you wait at the bus halt for a bus is an example of a continuous numerical variable, because the response can take any value within a continuum, or interval, depending on the precision of the measuring instrument.
● For example, your waiting time could be recorded as 1 minute, 1.1 minutes or 1.11 minutes, depending on the precision of the measuring device you use.
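The bus-waiting example can be sketched in Python. A single hypothetical waiting time is recorded at three different precisions. The "true" value of 1.1137 minutes is an assumption for illustration.

```python
# Hypothetical "true" waiting time in minutes; any value in an interval is possible.
true_wait = 1.1137

# The recorded value depends on the precision of the measuring instrument.
recorded = [round(true_wait, decimals) for decimals in (0, 1, 2)]

for decimals, value in zip((0, 1, 2), recorded):
    print(f"recorded to {decimals} decimal place(s): {value} minutes")
```

The three recorded values are 1.0, 1.1 and 1.11 minutes, matching the example above.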
Numerical Variable – Discrete
● Discrete variables have numerical values that arise from a counting process.
● "How many brothers do you have?" is an example of a discrete numerical variable, because the response is one of a finite number of integers. You may have zero, one, two or more brothers.
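A minimal sketch of summarizing a discrete variable in Python. The answers from 12 students are hypothetical. Each response is a whole number, but a summary such as the mean need not be.

```python
import statistics
from collections import Counter

# Hypothetical answers to "How many brothers do you have?" from 12 students.
brothers = [0, 1, 2, 1, 0, 3, 1, 2, 0, 1, 1, 2]

# A discrete variable takes only countable, whole-number values.
assert all(isinstance(x, int) and x >= 0 for x in brothers)

distribution = Counter(brothers)  # frequency distribution of the counts
for value in sorted(distribution):
    print(f"{value} brother(s): {distribution[value]} student(s)")

mean_brothers = statistics.mean(brothers)  # 14 / 12, not a whole number
print("mean number of brothers:", round(mean_brothers, 2))
```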


Thank you
