
Absolute and Relative Frequency in Pandas
In statistics, the term "frequency" indicates the number of occurrences of a value in a given data sample. Pandas, a library for data analysis, has several built-in methods to calculate frequency from a given sample.
Absolute Frequency − This is the plain frequency: the number of times a data element occurs in the sample. In the example below, we count the number of times each city name appears in a given set of values and report that count as the frequency.
Approach 1 − We use the pandas method named .value_counts.
Example
import pandas as pd

# Sample data: a list of city names
data = ["Chandigarh","Hyderabad","Pune","Pune","Chandigarh","Pune"]

# Build a Series and count the occurrences of each value
df = pd.Series(data).value_counts()
print(df)
Output
Running the above code gives us the following result −
Pune          3
Chandigarh    2
Hyderabad     1
dtype: int64
Approach 2 − We use the pandas function pd.crosstab().
Example
import pandas as pd

data = ["Chandigarh","Hyderabad","Pune","Pune","Chandigarh","Pune"]
df = pd.DataFrame(data,columns=["City"])
tab_result = pd.crosstab(index=df["City"],columns=["count"])
print(tab_result)
Output
Running the above code gives us the following result −
col_0       count
City
Chandigarh      2
Hyderabad       1
Pune            3
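The two approaches report the same counts in a different order: .value_counts() sorts by count, while pd.crosstab sorts by the index label. A minimal sketch of a consistency check, assuming the same sample data as above (the variable names counts, tab and same are illustrative only):

```python
import pandas as pd

data = ["Chandigarh", "Hyderabad", "Pune", "Pune", "Chandigarh", "Pune"]
df = pd.DataFrame(data, columns=["City"])

# Approach 1: counts sorted by frequency (highest first)
counts = df["City"].value_counts()

# Approach 2: counts sorted alphabetically by city name
tab = pd.crosstab(index=df["City"], columns=["count"])

# Sort the value_counts result by label so both are in the same order,
# then compare the underlying values element by element
same = (counts.sort_index().to_numpy() == tab["count"].to_numpy()).all()
print(same)
```

If the check prints True, either approach can be used, and the choice comes down to whether a Series or a table-shaped DataFrame is more convenient downstream.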
Relative Frequency − This is the ratio of a value's frequency to the total number of observations in the data sample. The result is a floating point value between 0 and 1, which can also be expressed as a percentage. To find it, we first calculate the frequency as shown in the first approach and then divide it by the total number of observations, which we get using the len() function.
Example
import pandas as pd

# Sample data: a list of city names
data = ["Chandigarh","Hyderabad","Pune","Pune","Chandigarh","Pune"]

# Count the occurrences of each value
df = pd.Series(data).value_counts()

# Divide each count by the total number of observations
print(df/len(data))
Output
Running the above code gives us the following result −
Pune          0.500000
Chandigarh    0.333333
Hyderabad     0.166667
dtype: float64
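As an alternative to dividing by len(), both .value_counts() and pd.crosstab accept a normalize argument that returns proportions directly. The sketch below assumes the same sample data; the names rel_freq, percent and tab_rel are illustrative and not part of the original example.

```python
import pandas as pd

data = ["Chandigarh", "Hyderabad", "Pune", "Pune", "Chandigarh", "Pune"]

# normalize=True returns proportions instead of raw counts
rel_freq = pd.Series(data).value_counts(normalize=True)
print(rel_freq)

# Multiply by 100 to express the same values as percentages
percent = (rel_freq * 100).round(2)
print(percent)

# The crosstab approach supports the same idea
df = pd.DataFrame(data, columns=["City"])
tab_rel = pd.crosstab(index=df["City"], columns=["count"], normalize=True)
print(tab_rel)
```

The proportions sum to 1, which is a quick sanity check that every observation was counted.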