CDSS Syllabus
Learning Outcomes
Upon completion of this course, you will be able to:
• Identify the appropriate model for different data types.
• Differentiate the key data ETL processes, from cleaning and processing to visualization.
• Create your own data processing and analysis workflow.
• Implement algorithms to extract information from a dataset.
• Define and explain the key concepts and models relevant to data science.
• Apply best practices in data science and become familiar with standard tools.
Course Outline
Day 1

Introduction to Data Science
Life of a data scientist
• What is a Data Scientist?
• Data Scientist Roles
• What does a Data Scientist Look Like?
• T-Shaped Skillset
• Data Scientist Roadmap
• Data Scientist Education Framework
• Thinking like a Data Scientist
• Knowns and Unknowns
• Demand and Opportunity
• Labor Market
• Applications of Data Science
• Data Science Principles
• Data-Driven Organization
• Developing Data Products
• Knowledge Check

Data Science Workflow
Data Gathering
• Obtain data from online repositories
• Import data from local file formats (JSON, XML)
• Import data using Web API
• Scrape website for data
• Knowledge check

Day 2

Data Science Prerequisites

Beginning Databases
Structured Query Language (SQL)
• Performing CRUD (Create, Retrieve, Update, Delete)
• Designing a real-world database
• Normalizing a table
• Knowledge Check
• Lab Activity

Introduction to Python
• Basics of Python language
• Functions and packages
• Python lists
• Functional programming in Python
• NumPy and SciPy
• IPython
• Knowledge check
• Lab Activity
• Lab: Exploring data using Python
Day 3

Data Preparation and Cleansing
• Extract, Transform and Load (ETL)
  - Pentaho, Talend, etc.
• Data Cleansing with OpenRefine
• Aggregation, Filtering, Sorting, Joining
• Knowledge Check
• Lab Activity

Introduction to R
• Packages for data import, wrangling, and visualization
• Conditionals and Control Flow
• Loops and Functions
• Knowledge check
• Lab activity
• Lab: Exploring data using R

Exploratory Data Analysis (Descriptive)
• What is EDA?
• Goals of EDA
• The role of graphics
• Handling outliers
• Dimension reduction

Data Quality
• Raw vs Tidy Data
• Key Features of Data Quality
• Maintenance of Data Quality
• Data Profiling
• Data Completeness and Consistency

Day 4

Data Visualization
• Choosing the right visualization
• Plotting data using Python libraries
• Plotting data using R
• Using Jupyter Notebook to validate scripts
• Knowledge check
• Lab activity

Data Analysis Presentation
• Using Markdown language
• Convert your data into slides
• Data presentation techniques
• The pitfall of data analysis
• Knowledge check
• Lab activity
• Group presentation

Day 5

Big Data Landscape
• What is small data?
• What is big data?
• Big data analytics vs Data Science
• Key elements in Big Data (3Vs)
• Extracting values from big data
• Challenges in Big Data

Big Data Tools and Applications
• Introducing Hadoop Ecosystem
• Cloudera vs Hortonworks
• Real world big data applications
• Knowledge check
• Group discussion
• Lab: Mini Project