
Mathematical Statistics Functions in Python
The statistics module in Python's standard library provides functions for calculating statistical measures of numeric data, including values of the Fraction and Decimal types.
The following import statement is needed to use the functions described in this article.
>>> from statistics import *
The following functions calculate measures of central tendency of sample data.
mean() − This function calculates the arithmetic mean of data supplied as a sequence or iterator.
>>> from statistics import mean
>>> numbers = [12,34,21,7,56]
>>> mean(numbers)
26
The sample data may contain Decimal or Fraction objects.
>>> from decimal import Decimal
>>> numbers = [12,34,21,Decimal('7'),56]
>>> mean(numbers)
Decimal('26')
>>> from fractions import Fraction
>>> numbers = [12,20.55,Fraction(4,5),21,56]
>>> mean(numbers)
22.07
harmonic_mean() − The harmonic mean is the reciprocal of the arithmetic mean of the reciprocals of the elements in the sample data.
Sample = [1,2,3,4,5]
Sum of reciprocals = 1/1 + 1/2 + 1/3 + 1/4 + 1/5 = 2.28333333333
Mean of reciprocals = 2.28333333333/5 = 0.45666666666666667
Harmonic mean = 1 / 0.45666666666666667 = 2.18978102189781
>>> harmonic_mean([1,2,3,4,5])
2.18978102189781
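The hand calculation above can be reproduced step by step in code. This is only a minimal sketch; the variable names (sample, reciprocals, manual_result, library_result) are chosen for illustration and do not come from the module.

```python
from statistics import harmonic_mean, mean

sample = [1, 2, 3, 4, 5]

# Step 1: take the reciprocal of every element.
reciprocals = [1 / x for x in sample]

# Step 2: arithmetic mean of the reciprocals.
mean_of_reciprocals = mean(reciprocals)

# Step 3: the harmonic mean is the reciprocal of that mean.
manual_result = 1 / mean_of_reciprocals

# The library function should agree up to floating-point rounding.
library_result = harmonic_mean(sample)
print(manual_result, library_result)
```

The two printed values may differ in the last few digits because the manual version rounds at every step.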
median() − The median is the middle value of the sample data. The data is automatically sorted in ascending order to find the median. If the count of elements is odd, the median is the middle value. If the count is even, the median is the mean of the two middle values.
>>> median([2,5,4,8,6])
5
>>> median([11,33,66,55,88,22])
44.0
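The odd and even cases can be sketched as a small function. The helper name manual_median is hypothetical and is only meant to mirror the description; it is not the module's own implementation.

```python
from statistics import median

def manual_median(data):
    """Hypothetical helper: median of a non-empty sequence of numbers."""
    ordered = sorted(data)          # arrange in ascending order
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:
        # Odd count: the single middle value.
        return ordered[middle]
    # Even count: mean of the two middle values.
    return (ordered[middle - 1] + ordered[middle]) / 2

odd_sample = [2, 5, 4, 8, 6]
even_sample = [11, 33, 66, 55, 88, 22]

print(manual_median(odd_sample), median(odd_sample))
print(manual_median(even_sample), median(even_sample))
```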
mode() − This function returns the most common value in the sample. This function can be applied to numeric or non-numeric data.
>>> mode((4,7,8,4,9,7,12,4,8))
4
>>> mode(['cc','aa','dd','cc','ff','cc'])
'cc'
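One way to picture what mode() computes is a frequency count. The sketch below uses collections.Counter for that purpose; the helper name manual_mode is hypothetical, and the code is an illustration for samples with a single most common value rather than a statement of how the module works internally.

```python
from collections import Counter
from statistics import mode

def manual_mode(data):
    """Hypothetical helper: most frequent value in a non-empty sample."""
    counts = Counter(data)                  # value -> number of occurrences
    value, _count = counts.most_common(1)[0]
    return value

numbers = (4, 7, 8, 4, 9, 7, 12, 4, 8)
labels = ['cc', 'aa', 'dd', 'cc', 'ff', 'cc']

print(manual_mode(numbers), mode(numbers))
print(manual_mode(labels), mode(labels))
```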
The following functions measure the dispersion of the elements in the sample around the central value.
variance() − This function returns the sample variance, which reflects the variability or dispersion of the data. A large variance means the data is widely scattered. A small variance indicates that the data is closely clustered around the mean.
The procedure to find the variance is as follows:
- Find the arithmetic mean of all elements in the sample.
- Find the square of the difference between the mean and each element, and add the squares.
- Divide the sum by n-1, where n is the sample size, to get the variance.
Mathematically, the above procedure is represented by the following formula −
$$s^2 = \frac{1}{n-1}\sum\limits_{i=1}^{n}(x_{i}-\overline{x})^2$$
The variance() function performs this computation for you.
>>> num = [4, 9, 2, 11, 5, 22, 90, 32, 56, 70]
>>> variance(num)
981.2111111111111
stdev() − This function returns the standard deviation of data in the sample. Standard deviation is the square root of the variance.
>>> num = [4, 9, 2, 11, 5, 22, 90, 32, 56, 70]
>>> stdev(num)
31.324289474960338
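The relationship between the two functions can be checked with a short sketch. The variable names root_of_variance and library_stdev are chosen here for illustration.

```python
import math
from statistics import stdev, variance

num = [4, 9, 2, 11, 5, 22, 90, 32, 56, 70]

# Standard deviation computed as the square root of the sample variance.
root_of_variance = math.sqrt(variance(num))

# Standard deviation computed directly by the library.
library_stdev = stdev(num)

# The two values should match up to floating-point rounding.
print(math.isclose(root_of_variance, library_stdev))
```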