Basics of Statistics
Career: Data Science
Data science takes a business problem through three stages:
1. Data collection
2. Analysis
3. Decision
======================================================================
Example: Airtel
Database: 20C, My office, 1000 GB
Big data stack: Hadoop, Hive, Scala, Spark
Data Science/Analytics: data collected --> Analysis --> Decision
Analysis (skills used):
1. Statistics
2. Mathematics
3. Machine learning --> model development or algorithms (includes data preprocessing)
4. Python --> language (advantage --> packages, libraries, modules)
5. Natural Language Processing
6. Deep Learning
7. Reinforcement Learning --> A.I.
GeekLurn_7.30 AM Page 1
Tableau --> tool for business analysts (dashboards and data visualization)
Example: for 1000 customers, a classification model labels each person as likely to leave the network (Yes) or not (No), which tells us how many people may leave.
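A minimal sketch of the Yes/No classification idea above, assuming a model has already produced a churn probability for each customer. The customer IDs, the scores and the 0.5 threshold are all hypothetical; in practice a trained classifier (for example, logistic regression) would supply the scores.

```python
# Hypothetical churn scores (probability of leaving the network); not real Airtel data.
churn_scores = {"cust_1": 0.82, "cust_2": 0.15, "cust_3": 0.64, "cust_4": 0.30, "cust_5": 0.51}

THRESHOLD = 0.5  # assumed cut-off: a score at or above it means "Yes, may leave"


def classify(score: float, threshold: float = THRESHOLD) -> str:
    """Return 'Yes' if the customer is predicted to leave, otherwise 'No'."""
    return "Yes" if score >= threshold else "No"


labels = {cust: classify(score) for cust, score in churn_scores.items()}
leaving = sum(1 for label in labels.values() if label == "Yes")

print(labels)
print(f"{leaving} of {len(labels)} customers may leave the network")
```

With 1000 customers the same count answers "how many may leave"; moving the threshold trades missed leavers against false alarms.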
==================================================================================
Y = C + M(X) --> 70k
Y = 20 + 10(3) = 50 --> 50k
10,000
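A small sketch of the straight-line model Y = C + M(X) with the numbers from the notes (C = 20, M = 10, X = 3, read in thousands). Treating the 70k as the observed value is an assumption made here to show the prediction error; the function name `predict` is illustrative.

```python
def predict(c: float, m: float, x: float) -> float:
    """Straight-line model Y = C + M(X): C is the intercept, M is the slope."""
    return c + m * x


# Values from the notes, in thousands ("k").
y_pred = predict(20, 10, 3)
print(y_pred)  # 50

# Assumption: the 70k next to the equation is the actual (observed) value.
y_actual = 70
error = y_actual - y_pred
print(error)  # 20
```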
=================================================================================
1. Structured data ---> stored in rows and columns (e.g., CSV files); used in banking, hospitals, insurance, payrolls, retail
2. Unstructured data --> no rows or columns; e.g., Google search queries, WhatsApp texts, Facebook comments
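A short sketch of the difference, using Python's standard `csv` module. The bank-style CSV and the comment text are made up for illustration: structured rows can be queried by column name, while free text has no columns and can only be split into words without further processing.

```python
import csv
import io

# Structured data: a small, made-up CSV of bank accounts (rows and columns).
structured = io.StringIO(
    "account_id,branch,balance\n"
    "101,Hyderabad,25000\n"
    "102,Chennai,40000\n"
)
rows = list(csv.DictReader(structured))
# Every row has the same named columns, so a column can be aggregated directly.
total_balance = sum(int(row["balance"]) for row in rows)
print(total_balance)  # 65000

# Unstructured data: free text such as a chat message or a social-media comment.
comment = "Network was down all evening, thinking of switching!"
# There are no columns; without further processing we can only split it into words.
words = comment.split()
print(len(words))  # 8
```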
EDA (Exploratory Data Analysis):
Assumptions: each model follows specific assumptions.
X --> independent variable(s)
Y --> dependent variable, e.g. Y = C + M(X)
Data transformations:
Standardization
Normalization
Scaling
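A minimal sketch of two common transformations, written with the standard `statistics` module. Standardization is the z-score (x - mean) / std; "normalization" is taken here as min-max scaling into [0, 1], which is one common convention (the terms, and "scaling", are used loosely in practice). The data values are arbitrary.

```python
import statistics


def standardize(values):
    """Z-score: (x - mean) / std, so the result has mean 0 and std 1."""
    mean = statistics.mean(values)
    # Population std (divide by N), the same choice scikit-learn's StandardScaler makes.
    std = statistics.pstdev(values)
    return [(x - mean) / std for x in values]


def min_max_scale(values, new_min=0.0, new_max=1.0):
    """Rescale values linearly so the smallest maps to new_min and the largest to new_max."""
    lo, hi = min(values), max(values)
    return [new_min + (x - lo) * (new_max - new_min) / (hi - lo) for x in values]


data = [2, 4, 4, 4, 5, 5, 7, 9]
print(standardize(data))    # mean 5, std 2 -> [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
print(min_max_scale(data))  # 2 -> 0.0, 9 -> 1.0
```

Both functions assume the values are not all identical; otherwise the std (or max - min) is zero and the division fails.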
===================================================================
South --> 160
===================================================================
MSE = 10,000
Average = 1 lakh
1 lakh (range 90k to 1,10k) --> 1,50k
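A minimal sketch of mean squared error (MSE), the average of the squared differences between actual and predicted values. The numbers are hypothetical, in thousands: a model that always predicts the average of 1 lakh (100k), checked against values of 90k, 110k, 100k and 150k. The function name is illustrative.

```python
def mean_squared_error(actual, predicted):
    """MSE = average of (actual - predicted)^2 over all observations."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical values in thousands ("k"); the model predicts 100k (1 lakh) every time.
actual = [90, 110, 100, 150]
predicted = [100, 100, 100, 100]

mse = mean_squared_error(actual, predicted)
rmse = mse ** 0.5  # back in the original units (thousands)
print(mse)   # 675.0
print(rmse)
```

Because errors are squared, the single 150k value contributes 2500 of the 2700 total; without it the MSE would be 200 / 3, about 66.7. Note that MSE is in squared units (here, thousands squared).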
===================================================================
Task: Calculate the average Indian height in 2020.
When we do not know the exact total size of the lot (here, every Indian in 2020), we call it the population. We measure a sample X drawn from it.

Sample                                    Population
Statistic: computed from the variable     Parameter: a constant
Mean: X bar (sample average)              Mean: mu (population average)
Standard deviation: s                     Standard deviation: sigma
Finite data                               Infinite or unknown-size data
We can calculate it on the data           Not always possible to calculate
Whatever we calculate comes under         Applies additional theory to the results
"Descriptive Statistics"                  of D.S. (Descriptive Statistics), giving
                                          "Inferential Statistics"