Data Science Generating Value From Data Course Slides Red
Data Science:
Getting Value out of Data
Dr. Ilkay Altintas and Dr. Leo Porter
Twitter: #UCSDpython4DS
By the end of this video, you will be able to:
Python for Data Science
Big Data
Insight
Data Science
Action
Insight Data Product
Analysis
Data Insight
Question
Book Recommendations
Customer
Book reviews
Find Potential Audience for a Book
Model of customer’s book preferences
Who is likely to like this book?
New book information
Market a New Book
Insight Action
Actionable Information
Prediction
Action
Prediction
Smart Manufacturing
Personalized Precision Medicine
Smart Grid and Energy Management
How Much Data Is Big Data?
Image Source: http://www.marketwatch.com/story/one-chart-shows-everything-that-happens-on-the-internet-in-just-one-minute-2016-04-26
Every minute…
200,000 photos
1.8 Million likes
• 30 TB of data annually
• MODIS: modis.gsfc.nasa.gov
• 219 TB of data annually
• Precision Medicine
• 4 EB (1 EB = 10^18 bytes) of data in 2016
(www.fastcompany.com)
• LIGO, Deep Space Network, Protein Data
Bank, …
100 MB ~= a couple of encyclopedia volumes
A DVD ~= 5 GB
1 TB ~= 300 hours of good-quality video
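The size comparisons above can be checked with a little arithmetic. This is a minimal sketch, assuming decimal (SI) units where 1 GB = 10^9 bytes; the constant and function names are illustrative, and the slide figures themselves are rough approximations.

```python
# Decimal (SI) byte units. This is an assumption: the slides do not say
# whether they mean decimal or binary units.
MB = 10**6
GB = 10**9
TB = 10**12
EB = 10**18

def dvds_per_terabyte(dvd_size_bytes=5 * GB):
    """How many DVDs of the given size fit into 1 TB."""
    return TB // dvd_size_bytes

def gb_per_video_hour(hours_per_tb=300):
    """Implied size of one hour of video if 1 TB holds `hours_per_tb` hours."""
    return TB / hours_per_tb / GB

print(dvds_per_terabyte())            # 200
print(round(gb_per_video_hour(), 2))  # 3.33
print(4 * EB // TB)                   # 4000000 (4 EB expressed in TB)
```

Under these assumptions, "good quality video" works out to roughly 3.3 GB per hour, which is consistent with the DVD comparison.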
– John Naisbitt
Source: Megatrends, 1982
• Programming in Python
• Statistics
• Machine Learning
• Scalable Big Data Analysis
Data Science
The sum is bigger than the parts!
unicorns?
Data science is a team sport!
Data scientists…
http://www.kdnuggets.com/2017/01/most-popular-language-machine-learning-data-science.html
Why Python for Data Science?
•Jupyter notebooks
•NumPy
•Pandas
•Matplotlib
•Scikit-Learn
•BeautifulSoup
Case Study:
Soccer Data Analysis
Dr. Ilkay Altintas and Dr. Leo Porter
Twitter: #UCSDpython4DS
By the end of this video, you will be able to:
• Talk about the “Big Picture” of data science through a soccer case
study
• Generate statistics about a soccer data set
• Summarize how data cleaning and correlations were applied to an
existing dataset
• Recite the data visualization techniques employed in this study
• Explain how clustering similar groups and plotting these clusters
helped the case study
• Recall what was used to draw conclusions based on the data analysis
Week 1 Case Study: Soccer Data Analysis
Insight
Action
Data
Ask yourself:
“What insights do I expect to get?”
ACTIONS
• Coach can design programs that improve these
areas in teams
Basic Steps in a Data Science Project
• Databases
• Relational
• Non-relational (NoSQL)
• Text files
• CSV files
• Text files
• Live feeds
• Sensors
• Online Platforms
• Twitter
• Live feeds of weather observations
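As one illustration of the file-based sources listed above, here is a minimal sketch of ingesting CSV text with Python's standard `csv` module. The column names and values are made up for the example (they are not taken from the course's soccer dataset), and an in-memory string stands in for a real file.

```python
import csv
import io

# Hypothetical CSV content; an in-memory stand-in for a file on disk.
RAW_CSV = """player_id,overall_rating,sprint_speed
1,67,64
2,74,78
3,81,
"""

def load_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))

rows = load_rows(RAW_CSV)
print(len(rows))                      # 3
print(rows[0]["overall_rating"])      # 67 (still a string at this stage)
print(repr(rows[2]["sprint_speed"]))  # '' (a missing value)
```

Values arrive as strings and missing entries as empty strings, which is why a preparation step normally follows ingestion.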
Data Ingestion to Analytics Platform
Data Preparation: Explore using Statistics
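Exploring a dataset with statistics typically starts with summary measures and correlations between columns. The following is a minimal sketch using only the standard library; the two attribute lists are invented for illustration, and the Pearson correlation is written out by hand rather than taken from the course notebook.

```python
import math
import statistics

# Hypothetical player attributes (invented values).
overall_rating = [60, 65, 70, 75, 80]
reactions = [55, 63, 66, 74, 82]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

print(statistics.mean(overall_rating))    # 70
print(statistics.median(overall_rating))  # 70
print(round(pearson(overall_rating, reactions), 3))
```

A coefficient close to 1 suggests the two attributes rise together, which is the kind of signal used to shortlist features for later analysis.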
Data Cleaning
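One common cleaning step is to drop records with missing values before analysis. This is a minimal sketch, assuming rows are dictionaries of strings (as a CSV reader would produce); the field names are hypothetical, and whether to drop or fill in missing values depends on the dataset.

```python
# Hypothetical raw records with string values, as read from a CSV file.
raw_rows = [
    {"player_id": "1", "overall_rating": "67", "sprint_speed": "64"},
    {"player_id": "2", "overall_rating": "", "sprint_speed": "78"},
    {"player_id": "3", "overall_rating": "81", "sprint_speed": "70"},
]

NUMERIC_FIELDS = ("overall_rating", "sprint_speed")

def clean(rows, numeric_fields=NUMERIC_FIELDS):
    """Drop rows with any empty numeric field and convert the rest to float."""
    cleaned = []
    for row in rows:
        if any(row[field].strip() == "" for field in numeric_fields):
            continue  # skip incomplete records
        converted = dict(row)
        for field in numeric_fields:
            converted[field] = float(row[field])
        cleaned.append(converted)
    return cleaned

cleaned_rows = clean(raw_rows)
print(len(raw_rows), "->", len(cleaned_rows))  # 3 -> 2
```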
• Supervised Learning
• Unsupervised Learning
• Semi-supervised Learning
http://scikit-learn.org
Soccer Data Analysis: Feature Selection
http://scikit-learn.org/stable/modules/clustering.html
K-Means clustering in Python
from sklearn.cluster import KMeans
Y = KMeans(n_clusters=3, random_state=random_state).fit_predict(X)
…
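The call above leaves `X` and `random_state` undefined. A minimal, self-contained sketch of the same scikit-learn call on synthetic data follows; the three synthetic blobs, the seed value, and the scaling step with `StandardScaler` are assumptions for illustration, not the course's soccer data or pipeline.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

random_state = 42  # arbitrary seed, chosen only for reproducibility
rng = np.random.default_rng(random_state)

# Synthetic data: three well-separated 2-D blobs of 50 points each.
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers])

# Scale features so that no attribute dominates the distance computation.
X_scaled = StandardScaler().fit_transform(X)

# n_init is set explicitly because its default differs across versions.
model = KMeans(n_clusters=3, n_init=10, random_state=random_state)
Y = model.fit_predict(X_scaled)

print(Y.shape)         # (150,)
print(np.bincount(Y))  # number of points assigned to each cluster
```

`fit_predict` returns one integer label per row of the input, and the fitted centers are available afterwards as `model.cluster_centers_`.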
How to choose the right algorithm?
[Plot: Player Attributes vs. Attribute Value (Scaled), by Cluster #]
Summary
ACQUIRE
PREPARE
ANALYZE
REPORT
ACT
INSIGHTS
• Better understanding and insights on
  • player strengths
  • enhancing performance
  • critical attributes for a player’s performance
ACTIONS
• Coach can design programs that improve these
areas in teams