Data Science Essentials for Beginners
VIDEO
• "Data Science: Where are We Going?" - Dr. DJ Patil (12:59 minutes)
https://www.youtube.com/watch?v=3_1reLdh5xw
Groceries Data Set contains a collection of receipts, with each line representing one receipt and the x items purchased. Each line is called a transaction, and each column in a row represents an item.
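The transaction layout described above can be sketched in a few lines of Python. This is only an illustration: the sample receipts, the comma separator, and the helper name `parse_transactions` are assumptions, not taken from the actual Groceries file.

```python
from collections import Counter

# Hypothetical sample in the layout described above: one receipt per line,
# one item per column. The comma separator is an assumption.
raw = """citrus fruit,margarine,ready soups
tropical fruit,yogurt,coffee
whole milk
yogurt,whole milk,coffee"""


def parse_transactions(text, sep=","):
    """Turn each non-empty line into a list of item names (one transaction)."""
    return [
        [item.strip() for item in line.split(sep) if item.strip()]
        for line in text.splitlines()
        if line.strip()
    ]


transactions = parse_transactions(raw)

# Count how often each item appears across all transactions.
item_counts = Counter(item for t in transactions for item in t)

print(len(transactions), "transactions")
print(item_counts.most_common(3))
```

Keeping each transaction as a list of items preserves the "one row per receipt" structure, which is the usual starting point for counting item frequencies or looking for items that are bought together.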
Data + Science
• DATA - Facts and statistics collected together for reference or analysis. Watch the videos:
“What is Data?” https://www.youtube.com/watch?v=EMHP-q4GEDc
“Data – Information – Knowledge” https://www.youtube.com/watch?v=QsP5WGv0aQc
WHY Data Science?
Latest Trend – Google Trends
Emerged as hottest new professions and academic disciplines.
Demand is racing ahead of supply.
Keyword search.

Data Science and Its Relationship to Big Data and Data-Driven Decision Making
• The diagram shows data science supporting data-driven decision making, but also overlapping with it.
• This highlights the fact that, increasingly, business decisions are being made automatically by computer systems.
• Data engineering and processing are critical to support data-science activities, but they are more general and are useful for much more.
• Data-processing technologies are important for many business tasks that do not involve extracting knowledge or data-driven decision making, such as efficient transaction processing, modern web system processing, online advertising campaign management, and others.
• Big data - datasets that are too large for traditional data-processing systems and that therefore require new technologies.
The Science of Data Science:
Analyze and understand data that’s available.
Find and acquire what more is needed.
Discovering what we don’t know from data.
Obtaining predictive, actionable insight from data.
Creating data products that have business impact now.
Communicating relevant business stories from data.
Building confidence in decisions that drive business value.
Code of conduct for Data Science - http://www.datascienceassn.org/code-of-conduct.html
A Very Short History Of Data Science - http://www.forbes.com/sites/gilpress/2013/05/28/a-very-short-history-of-data-science/#2fe9538a69fd

➢ Descriptive analysis - describe a set of data and interpret what you see (census, Google Ngram).
➢ Exploratory analysis - discover connections (correlation does not imply causation).
➢ Inferential analysis - use conclusions from data on a smaller group to say something about the broader population.
➢ Predictive analysis - use data on one object to predict values for another (if X predicts Y, it does not mean that X causes Y).
➢ Causal analysis - how changing one variable affects another, using randomized studies; strong assumptions; gold standard for statistical analysis.
➢ Mechanistic analysis - understand the exact changes in variables that lead to changes in other variables, modeled by empirical equations (engineering/physics).
Source: Jeffrey Leek https://github.com/jtleek/dataanalysis/blob/master/week1/007typesOfQuestions/index.md
Dataset Explorer https://rpubs.com/Salimah/143370 https://salimahm.shinyapps.io/DatasetExplorer/
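As a rough illustration of the first two analysis types listed above, the sketch below summarizes one variable (descriptive) and measures the linear association between two variables (exploratory). The data, the variable names, and the helpers `describe` and `pearson` are hypothetical and use only the Python standard library.

```python
from statistics import mean, median, stdev

# Hypothetical sample: hours studied and exam score for six students.
hours = [1, 2, 3, 4, 5, 6]
scores = [52, 55, 61, 70, 74, 80]


def describe(values):
    """Descriptive analysis: summarize a single variable."""
    return {"mean": mean(values), "median": median(values), "stdev": stdev(values)}


def pearson(xs, ys):
    """Exploratory analysis: Pearson correlation between two variables."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


summary = describe(scores)
r = pearson(hours, scores)

print(summary)
print(f"correlation = {r:.3f}")
```

A correlation close to 1 here says only that the two variables move together in this sample. As the list notes, it does not show that one causes the other; that question belongs to causal analysis.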
WHERE is Data Science Used?
http://101.datascience.community/2016/04/25/dos-and-donts-of-data-science/
The Importance of Visualization
Don’t think one person can do it all.
Do build a well-rounded team.
Don’t only use one tool.
Do use the best tool for the job.
Resources
• The Open-Source Data Science Masters - http://datasciencemasters.org/
• Data Science Certificate - https://www.coursera.org/specializations/jhu-data-science
• Data Science Courses – bigdatauniversity.com
• Data Science Association – www.datascienceassn.org
JOURNAL TITLE
EPJ Data Science
CODATA Data Science Journal
Journal of Data Science Online
Big Data Journal
JDS (focuses heavily on the applications of data.)

Conclusion
• Data science is a growth area. (The number of data scientists has doubled over the last 4 years.)
• The future belongs to the companies and people that turn data into products. (The Information Technology and Services industry employs the largest number of data scientists.)
• Top skills (The top five skills listed by data scientists are: Data Analysis, R, Python, Data Mining, and Machine Learning.)
• Education level (Over 79% of data scientists that list their education have earned a graduate degree, and 38% have earned a PhD.)