Data Science Essentials for Beginners

1. The document outlines learning objectives for understanding data science, including describing its importance, defining what it is, discussing who does related jobs, and where/when it is applied.
2. It provides a video overview of data science from Dr. DJ Patil, the first chief data scientist of the United States.
3. The document discusses how data science differs from but includes statistics and analysis, with the goal of discovering actionable knowledge for decisions and predictions rather than just explanations.

Uploaded by

wei ng
Copyright © All Rights Reserved

ABOUT DATA SCIENCE

Learning Objectives

1. Describe why data science is important nowadays.
2. Paraphrase what data science is.
3. Discuss who is involved in doing data science jobs.
4. Identify where data science is used.
5. Determine when data science is applied.
6. Explain how data science works.

VIDEO
• "Data Science: Where are We Going?" - Dr. DJ Patil, the first U.S. Chief Data Scientist (12:59 minutes)
https://www.youtube.com/watch?v=3_1reLdh5xw

Is Data Science New?

“For a long time I have thought I was a statistician, interested in inferences from the particular to the general. But as I have watched mathematical statistics evolve, I have had cause to wonder and to doubt…I have come to feel that my central interest is in data analysis, which I take to include, among other things: procedures for analyzing data, techniques for interpreting the results of such procedures, ways of planning the gathering of data to make its analysis easier, more precise or more accurate, and all the machinery and results of (mathematical) statistics which apply to analyzing data.”
Tukey, “The Future of Data Analysis”, 1962

• 1997: C.F. Jeff Wu gave an inaugural lecture titled simply “Statistics = Data Science?”
• 2008: The term “data scientist” is often attributed to Jeff Hammerbacher (now founder and Chief Scientist of Cloudera) and DJ Patil.

Is Data Science the same as Statistics or Analysis?

Analyzing data is something people have been doing with statistics and related methods for a while.

Explaining Versus Predicting

Data analysis has been generally used as a way of explaining some phenomenon by extracting interesting patterns from individual data sets with well-formulated queries.

Data science aims to discover and extract actionable knowledge from the data, that is, knowledge that can be used to make decisions and predictions, not just to explain what’s going on.

Statistics is part of Data Science.
Analysis is part of Data Science.
What do we do with the data?

We typically collect data to answer one of 2 questions:
❑ What is the world like?
❑ What is the world going to be like?

The Groceries Dataset

The Groceries Data Set contains a collection of receipts, with each line representing one receipt and the items purchased. Each line is called a transaction and each column in a row represents an item.

Data + Science
• DATA - Facts and statistics collected together for reference or analysis. Watch the videos on:
“What is Data?” https://www.youtube.com/watch?v=EMHP-q4GEDc
“Data – Information – Knowledge” https://www.youtube.com/watch?v=QsP5WGv0aQc

• SCIENCE - A systematic study through observation and experiment.
What is Science? https://www.youtube.com/watch?v=hDQ8ggroeE4
How does do science? │ Figuring out what's true https://www.youtube.com/watch?v=3MRHcYtZjFY

• DATA SCIENCE - The scientific exploration of data to extract meaning or insight, and the construction of software to utilize such insight in a business context.

DATA
Transform data into valuable insights
Transform data into data products
Transform data into interesting stories

WHY Data Science?

Latest Trend – Google Trends (keyword search)
Data science has emerged as one of the hottest new professions and academic disciplines. Demand is racing ahead of supply.

NEED: the future is increasingly complex and difficult to predict.
NEED: we don’t have enough qualified experts, and experts often get it wrong.
RAW MATERIALS: we are collecting huge amounts of data at an increasing rate.
ENABLER: new hardware and software tools are emerging.
THEREFORE: Data science is inevitable! We don’t have a choice.

Data Science and Its Relationship to Big Data and Data-Driven Decision Making
• The diagram shows data science supporting data-driven decision making, but also overlapping with it.
• This highlights the fact that, increasingly, business decisions are being made automatically by computer systems.
• Data engineering and processing are critical to support data-science activities, but they are more general and are useful for much more.
• Data-processing technologies are important for many business tasks that do not involve extracting knowledge or data-driven decision making, such as efficient transaction processing, modern web system processing, online advertising campaign management, and others.
• Big data - datasets that are too large for traditional data-processing systems and that therefore require new technologies.

Figure: Data science in the context of closely related processes in the organization
Source: http://online.liebertpub.com/doi/pdfplus/10.1089/big.2013.1508
WHAT is Data Science?

The discipline of drawing useful conclusions from data using computation.

The Science of Data Science:
• Analyze and understand data that’s available.
• Find and acquire what more is needed.
• Discovering what we don’t know from data.
• Obtaining predictive, actionable insight from data.
• Creating data products that have business impact now.
• Communicating relevant business stories from data.
• Building confidence in decisions that drive business value.

Code of conduct for Data Science - http://www.datascienceassn.org/code-of-conduct.html
A Very Short History Of Data Science - http://www.forbes.com/sites/gilpress/2013/05/28/a-very-short-history-of-data-science/#2fe9538a69fd

"We have lots of data – now what?"
(How can we unlock valuable insight from our data?)

Core Aspects of Effective Data Analysis

In order of difficulty: Descriptive → Exploratory → Inferential → Predictive → Causal → Mechanistic

➢ Descriptive analysis - describe a set of data and interpret what you see (census, Google Ngram).
➢ Exploratory analysis - discover connections (correlation does not imply causation).
➢ Inferential analysis - use conclusions drawn from a smaller sample to say something about the broader group.
➢ Predictive analysis - use data on one object to predict values for another (if X predicts Y, it does not follow that X causes Y).
➢ Causal analysis - find how changing one variable affects another, using randomized studies; relies on strong assumptions; the gold standard for statistical analysis.
➢ Mechanistic analysis - understand the exact changes in variables that lead to changes in other variables, modeled by empirical equations (engineering/physics).

Source: Jeffrey Leek https://github.com/jtleek/dataanalysis/blob/master/week1/007typesOfQuestions/index.md
Dataset Explorer https://rpubs.com/Salimah/143370 https://salimahm.shinyapps.io/DatasetExplorer/
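To make the first four types of analysis more concrete, the sketch below runs a descriptive, an exploratory, an inferential and a predictive step on one tiny data set. Everything in it is an assumption for illustration: the ad-spend and sales numbers are made up, `pearson` and `fit_line` are hypothetical helper names, and the 95% interval uses a normal approximation that is only rough for so few points. Causal and mechanistic analysis are left out because they need a study design or a physical model, not just a calculation.

```python
import math
import statistics

# Hypothetical toy data (not from the slides): weekly ad spend and sales.
ad_spend = [10.0, 12.0, 15.0, 18.0, 20.0, 24.0]
sales = [25.0, 27.0, 33.0, 38.0, 41.0, 50.0]

# Descriptive: summarize the data we have.
mean_sales = statistics.mean(sales)
sd_sales = statistics.stdev(sales)


# Exploratory: look for a connection (a correlation, not a cause).
def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


r = pearson(ad_spend, sales)

# Inferential: rough 95% interval for the mean of the wider population,
# using the normal approximation (z = 1.96); a t-interval would be wider.
half_width = 1.96 * sd_sales / math.sqrt(len(sales))
interval = (mean_sales - half_width, mean_sales + half_width)


# Predictive: fit a least-squares line and predict sales for a new spend.
def fit_line(xs, ys):
    """Return (slope, intercept) of the ordinary least-squares line."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


slope, intercept = fit_line(ad_spend, sales)
predicted = slope * 22.0 + intercept

print(f"mean={mean_sales:.2f} sd={sd_sales:.2f} r={r:.3f}")
print(f"95% interval for the mean: {interval[0]:.1f} to {interval[1]:.1f}")
print(f"predicted sales at spend 22: {predicted:.1f}")
```

A high correlation and a good prediction here still say nothing about whether ad spend causes sales, which is the distinction the list above draws between predictive and causal analysis.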

Name of Data Analysis by Data Type


✓ Biostatistics for medical data.
✓ Data Science for data from web analytics.
✓ Machine Learning for data in computer science/computer vision.
✓ Natural Language Processing for data from texts.
✓ Signal Processing for data from electrical signals.
✓ Business Analytics for data on customers.
✓ Econometrics for economic data.

Source: Jeff Leek’s Data Analysis Coursera Class
https://www.kdnuggets.com/2017/07/4-types-data-analytics.html

WHO is Involved?

Becoming a Data Scientist
Curriculum via Metromap

The overall plan progresses through the following areas:
1. Fundamentals
2. Statistics
3. Programming
4. Machine Learning
5. Text Mining / Natural Language Processing
6. Data Visualization
7. Big Data
8. Data Ingestion
9. Data Munging
10. Toolbox

Source: Swami Chandrasekaran
http://nirvacana.com/thoughts/becoming-a-data-scientist/
WHERE is Data Science Used?

Data Science in Fashion https://www.kdnuggets.com/2018/03/data-science-fashion.html

WHEN is Data Science Applied?

Some examples of the outcome of data science, i.e. the data products:
✓ Friend Recommendations on Facebook
✓ Music Recommendations on Spotify
✓ Product Recommendations on Amazon
✓ Dynamic Learning and Customized Assessments at Knewton Academy
✓ Trading Algorithms, Models and Credit Ratings in Finance
✓ New government policies based on data
✓ Predicting Flu Trends in Health
✓ Targeted Advertising

SOURCE: http://www.analyticsvidhya.com/blog/2015/09/applications-data-science/

HOW Data Science Works?

Data science is a multi-step process and each step in this process requires a diverse set of skills and technologies.

Data Science Workflow
Data Science Methodology
• Problem Formulation – First, identify the problem to be solved. This step is easily overlooked. However, many dollars and hours have been spent solving the wrong problems.
• Obtain The Data – Next, collect new data and/or gather the data that already exists. In almost all cases, this data will need to be transformed and cleansed. It is important to note that this stage does not always involve big data or a data lake.
• Analysis – This is the part of the process where insight is to be extracted from the data. Commonly, this step will involve creating and optimizing statistical/machine learning models for prediction, but that is not always necessary. Sometimes, the analysis only contains graphs, charts, and basic descriptions of the data.
• Data Product – The end goal of data science is a data product. The insight from the Analysis phase needs to be conveyed to an end user. The data product might be as simple as a slideshow; more commonly it is a website dashboard, a message, an alert, or a recommendation.

Foundational Methodology for Data Science
• A methodology is a general strategy that guides the processes and activities within a given domain.
• Methodology does not depend on particular technologies or tools, nor is it a set of techniques or recipes.
• Rather, a methodology provides the data scientist with a framework for how to proceed with whatever methods, processes and heuristics will be used to obtain answers or results.
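The four methodology steps (Problem Formulation, Obtain The Data, Analysis, Data Product) can be walked through in a few lines of Python. This is only a sketch under stated assumptions: the store records, the threshold of 100, and the names `clean`, `average_by_store`, `Alert` and `build_alerts` are all hypothetical, and the analysis is deliberately a basic description, not a model.

```python
from dataclasses import dataclass
from statistics import mean

# 1. Problem formulation: state the question before touching any data.
QUESTION = "Which stores have unusually low average daily sales?"

# 2. Obtain the data: hypothetical raw records that still need cleansing.
RAW_RECORDS = [
    {"store": "A", "sales": "120"},
    {"store": "A", "sales": "130"},
    {"store": "B", "sales": "40"},
    {"store": "B", "sales": ""},     # missing value, dropped during cleaning
    {"store": "C", "sales": "110"},
    {"store": "b", "sales": "50"},   # inconsistent casing, normalized
]


def clean(records):
    """Drop rows with missing sales and normalize store names."""
    return [
        {"store": r["store"].upper(), "sales": float(r["sales"])}
        for r in records
        if r["sales"].strip()
    ]


# 3. Analysis: a basic description of the data, no model involved.
def average_by_store(rows):
    """Return the mean sales value per store."""
    grouped = {}
    for row in rows:
        grouped.setdefault(row["store"], []).append(row["sales"])
    return {store: mean(values) for store, values in grouped.items()}


# 4. Data product: convey the insight to an end user as an alert.
@dataclass
class Alert:
    store: str
    message: str


def build_alerts(averages, threshold):
    """Create one alert per store whose average is below the threshold."""
    return [
        Alert(store, f"Store {store} averages {avg:.1f} per day, below {threshold:.0f}")
        for store, avg in sorted(averages.items())
        if avg < threshold
    ]


averages = average_by_store(clean(RAW_RECORDS))
alerts = build_alerts(averages, threshold=100.0)
print(QUESTION)
for alert in alerts:
    print(alert.message)
```

The steps are kept as separate functions so that any one of them can be swapped (a database query in place of the list, a model in place of the average) without changing the overall framework, which is the point of a methodology that does not depend on particular tools.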

Don’t ignore domain knowledge.
Do consult a subject matter expert.

Don’t start with the data!
Do start with a good question.

Don’t brag about the size of your data.
Do collect relevant data.

Don’t publish a table of numbers.
Do create informative charts.

Don’t use just your own data.
Do enhance your analysis with open data.

Don’t always build your own tools.
Do use lots of open source tools.

Don’t think one person can do it all.
Do build a well-rounded team.

Don’t only use one tool.
Do use the best tool for the job.

Don’t keep all your findings to yourself.
Do share your analysis and results with the world!

http://101.datascience.community/2016/04/25/dos-and-donts-of-data-science/

The Importance of Visualization
Infographics - a great way to convey information about data.

Resources
• The Open-Source Data Science Masters - http://datasciencemasters.org/
• Data Science Certificate - https://www.coursera.org/specializations/jhu-data-science
• Data Science Courses – bigdatauniversity.com
• Data Science Association – www.datascienceassn.org

JOURNAL TITLE
EPJ Data Science
CODATA Data Science Journal
Journal of Data Science Online
Big Data Journal
JDS (focuses heavily on the applications of data)
GigaScience (focuses on life and biomedical science)

Conclusion
• Data science is a growth area. (The number of data scientists has doubled over the last 4 years.)
• The future belongs to the companies and people that turn data into products. (The Information Technology and Services industry employs the largest number of data scientists.)
• Top skills (The top five skills listed by data scientists are: Data Analysis, R, Python, Data Mining, and Machine Learning.)
• Education level (Over 79% of data scientists that list their education have earned a graduate degree, and 38% have earned a PhD.)

Source: https://rjmetrics.com/resources/reports/the-state-of-data-science/
Next-Word Predictor App
• The goal of this project is to create a product that predicts the next (English-language) word the user is likely to enter, based on a prediction algorithm.
• This Shiny app is built entirely in R, and the predictions are made on the dataset provided by Coursera.
• The dataset is from a corpus called HC Corpora (www.corpora.heliohost.org).
• Data comes from 3 text files: en_US.blogs.txt, en_US.news.txt, and en_US.twitter.txt
• The next-word predictor can be accessed at:
https://salimah.shinyapps.io/FinalProject/
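The slides do not say which prediction algorithm the app uses, and the app itself is written in R. As a rough illustration of the general idea only, the Python sketch below predicts the next word from bigram counts (how often each word follows another). The mini-corpus and the function names `tokenize`, `train_bigrams` and `predict_next` are assumptions made for this example, not part of the app.

```python
import re
from collections import Counter, defaultdict

# Hypothetical mini-corpus standing in for the blog, news and Twitter files.
CORPUS = [
    "data science is fun",
    "data science is useful",
    "data science is fun to learn",
    "data products are useful",
]


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train_bigrams(sentences):
    """Count, for every word, which words follow it and how often."""
    followers = defaultdict(Counter)
    for sentence in sentences:
        words = tokenize(sentence)
        for current, following in zip(words, words[1:]):
            followers[current][following] += 1
    return followers


def predict_next(followers, phrase, k=1):
    """Return up to k of the most frequent followers of the phrase's last word."""
    words = tokenize(phrase)
    if not words or words[-1] not in followers:
        return []  # unseen word: this simple model has no suggestion
    return [word for word, _ in followers[words[-1]].most_common(k)]


model = train_bigrams(CORPUS)
print(predict_next(model, "I think data science is"))
```

A bigram model only looks at the last word typed; predictors of this kind often use longer word sequences and fall back to shorter ones when a sequence was never seen, which this sketch does not attempt.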

37 38

You might also like