
Basics of Statistics

The document discusses data science concepts like data collection, analysis, and decision making. It also discusses different data science tools and techniques used in analysis like Hadoop, Hive, Scala, Spark, statistics, machine learning algorithms, and Python. The document provides steps involved in a typical data science project life cycle including data collection, exploratory data analysis, data visualization, modeling, and deployment.

Introduction

07 December 2020 07:27

Career

Data Science

1. Data collection
2. Analysis
3. Decision

Business problem

======================================================================

Airtel
Data based

20C My office

1000 GB

Big data engineers --> Hadoop, Hive, Scala, Spark

Data Science/Analytics --> Data collected --> Analysis --> Decision

Analysis

1. Statistics
2. Mathematics
3. Machine learning --> Model Development or Algorithm
Data preprocessing
4. Python--> language (advantage --> Packages, libraries, modules)
5. Natural Language Processing
6. Deep Learning
7. Reinforcement Learning --> A.I

GeekLurn_7.30 AM Page 1
Tableau --> business analyst

--> Deployment App, web page

1000 people --> classify them and tell us how many may leave the network (Yes) and how many will not (No)

What could be the reasons for leaving the network?

500 --> Yes
500 --> No

==================================================================================

Y = Mx + C ----> Straight line equation

Example 1: X = 20, M = 10, C = 5
Find Y? ------> Y = 10 (20) + 5 = 205

Example 2: Salary -------> Performance (Experience, Designation, Technical skills)
Salary = Bias + 10 (Experience), with Bias = 20
Y = C + M(X) --> 20 + 10X is my mathematical model equation
Experience = 5 ---> 20 + 10 (5) === 70k
Experience = 3 ---> 20 + 10 (3) === 50k
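
The worked numbers above can be checked with a few lines of Python. This is only a minimal sketch: the function name `predict` is an assumption made for illustration and does not come from the notes.

```python
def predict(x, m, c):
    """Evaluate the straight-line equation Y = M*X + C."""
    return m * x + c

# First example from the notes: X = 20, M = 10, C = 5
y = predict(20, m=10, c=5)

# Salary model from the notes: Y = 20 + 10X (values in thousands)
salary_5 = predict(5, m=10, c=20)
salary_3 = predict(3, m=10, c=20)

print(y, salary_5, salary_3)
```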

Y = B0 + B1X1 --> model --> Y = B0 + B1X1 + B2X2 + B3X3

Y --> Salary ---> Target variable / Dependent variable / Output variable


X --> Experience --> Independent variables
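
With several independent variables the same equation becomes a sum of coefficient times feature. The sketch below shows that calculation for one record. The helper name `predict_multi` and all coefficient and feature values are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def predict_multi(b0, coefficients, features):
    """Evaluate Y = B0 + B1*x1 + B2*x2 + ... for one record."""
    if len(coefficients) != len(features):
        raise ValueError("need exactly one coefficient per feature")
    return b0 + sum(b * x for b, x in zip(coefficients, features))

# Hypothetical values, for illustration only
b0 = 20                     # intercept (bias)
coefficients = [10, 5, 2]   # B1, B2, B3
features = [3, 2, 4]        # x1, x2, x3 for one record

salary = predict_multi(b0, coefficients, features)
print(salary)
```

With a single feature this reduces to the straight-line equation: `predict_multi(20, [10], [5])` evaluates 20 + 10 (5).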

Machine learning ---> Algorithm

10,000
=================================================================================

Steps involved in your project life cycle

Data comes in two types of formats

1. Structured data ---> rows and columns (e.g. CSV files): banking, hospitals, insurance, payrolls, retail
2. Unstructured data --> no rows, no columns, ex: Google search engine, WhatsApp texts, Facebook comments
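
A small sketch of the difference, using only the Python standard library. The CSV text and the comment below are made-up values for illustration, not data from the notes.

```python
import csv
import io

# Structured data: every record has the same named columns (made-up values)
structured_text = "name,experience,salary\nA,3,50\nB,5,70\n"
records = list(csv.DictReader(io.StringIO(structured_text)))

# Unstructured data: free text with no rows or columns (made-up comment)
comment = "Network keeps dropping, thinking of switching providers"
words = comment.lower().split()

print(records[1]["salary"])   # a column can be looked up by name
print(len(words))             # free text has to be broken up first
```

Structured records can be queried by column name straight away, while free text usually needs preprocessing before any analysis.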

EDA:

Not all columns may be important

Assumptions: Follow specific assumptions.

Data visualization: Graphs, Plots, Charts ---> Business Analyst

Rows and columns --> data

Records and Variables --> Data

Y <--- Target variable/output variable/Dependent variables --> whichever I need to predict

X <-----Independent variables.

Y=X+C

Data transformations:

Standardization
Normalization
Scaling
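
The notes only name these transformations. As a hedged sketch, the code below takes standardization to mean the z-score and normalization to mean min-max scaling, which are common readings of the terms but are assumptions here. The function names and the sample values are also illustrative.

```python
from statistics import mean, pstdev

def standardize(values):
    """Z-score: subtract the mean, then divide by the standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    return [(v - mu) / sigma for v in values]

def normalize(values):
    """Min-max scaling: map the values onto the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

heights = [160, 170, 180]  # illustrative values
print(standardize(heights))
print(normalize(heights))
```

Both functions divide by a spread measure, so a column where every value is identical would raise a `ZeroDivisionError` and needs separate handling.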

==========================

Y = mx + C ---> Y = B0 + B1X1 + B2X2 + B3X3 + ... + BnXn --> Linear Regression
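
In linear regression the coefficients are learned from data rather than chosen by hand. The sketch below uses the ordinary least squares formulas for the single-variable case; the notes do not say which fitting method is used, so this is an assumption. The function name `fit_line` and the data points are illustrative.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for Y = B0 + B1*X with a single feature."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx             # slope
    b0 = y_bar - b1 * x_bar    # intercept
    return b0, b1

# Illustrative points that lie exactly on Y = 20 + 10X
experience = [1, 2, 3, 4, 5]
salary = [30, 40, 50, 60, 70]

b0, b1 = fit_line(experience, salary)
print(b0, b1)
```

Because the points lie exactly on a line, the fit recovers the intercept 20 and slope 10. Real data is noisy, so the fitted line would only approximate the points.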

Logistic Regression, Support vector machine, KNN, Naïve Bayes,

===================================================================

North --> 180
South --> 160
Average Indian height --> 170

===================================================================

MSE = 10,000

Avg = 1 lakh
1lakh (90k to 1,10 k) --> 1,50k
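
The notes quote an MSE figure without showing how it is computed. A minimal sketch of the standard definition (the mean of the squared errors) follows. The function name and the actual and predicted values are made up for illustration.

```python
def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)

# Made-up values: every prediction is off by exactly 100
actual = [1000, 2000, 3000]
predicted = [1100, 1900, 3100]

mse = mean_squared_error(actual, predicted)
print(mse)
```

Since the errors are squared, MSE is in squared units. Its square root (RMSE) is in the same units as the target variable.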

===================================================================

Task: Calculate the Average Indian height in 2020

When we do not know exactly what the total lot size is, we call it the Population

Average: Sum of all values / Total sample size
X bar = (X1 + X2 + ... + Xn) / n
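
The average formula is short enough to write directly. The sketch below applies it to the two regional figures quoted earlier in the notes (North 180, South 160). Treating them as two equally weighted values is an assumption made for illustration.

```python
def average(values):
    """Sum of all values divided by the number of values."""
    return sum(values) / len(values)

# Regional figures from the notes: North 180, South 160
heights = [180, 160]
print(average(heights))
```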

Sample | Population
Statistic: computed from the variable | Parameter: the statistic becomes a parameter, which is a constant
X bar = Average | Mean = mu
Standard Deviation is 's' | Standard Deviation is 'Sigma'
Finite data | Infinite data
We can calculate on the data | Calculation is not always possible
Whatever we calculate comes under "Descriptive Statistics" | Taking the output of D.S and applying some additional theory gives us "Inferential Statistics"

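The difference between 's' and 'Sigma' shows up in the formulas: the sample standard deviation divides by n - 1, while the population formula divides by n. A minimal sketch using Python's standard `statistics` module follows. The sample values are made up for illustration.

```python
from statistics import mean, pstdev, stdev

# Illustrative sample of heights (made-up values)
sample = [160, 165, 170, 175, 180]

x_bar = mean(sample)      # sample mean, used as an estimate of mu
s = stdev(sample)         # sample standard deviation, divides by n - 1
sigma = pstdev(sample)    # population formula, divides by n

print(x_bar, s, sigma)
```

On the same data `s` is always slightly larger than the population formula's result, and the gap shrinks as the sample grows.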