Data Science Bootcamp Curriculum 2

This document outlines the 12-week curriculum for the NYC Data Science Academy bootcamp. The curriculum covers topics in data science tools and techniques using R and Python, including data analytics, machine learning, data visualization, and web scraping. In the first six weeks students learn data science fundamentals in R, including data manipulation, visualization, and machine learning algorithms. They also begin learning Python for data analytics and manipulation tasks.


NYC Data Science Academy

12-Week Data Science Bootcamp Curriculum

Week 1
Data Science Toolkit - Linux, Git, Bash, and SQL
Data Science with R - Data Analytics Part I
Linux system
o Introduce Linux environment
o Learn Linux commands
o I/O redirection and pipes
o Introduce server-side Linux usage
Git
o Introduce modern source code management
o Learn common git operations
o Set up GitHub and a personal portfolio page
Other server related topics
o Text editors and IDEs
o ssh: how to communicate with a remote server
o Linux environment variables
SQL
o Introduction to relational databases
o Introduction to structured query language
o SQL major commands and examples
Programming foundation in R I
o Syntax
o Data object: Vectors, Matrices, Data Frames, and Lists
o Common functions
o RStudio environment and package management
o Local data input/output
o Introduction to R data visualization
Programming foundation in R II
o Data sorting and merging
o String manipulation
o Dates and times
o Connecting to an external database
Data manipulation with dplyr
o Tables in R
o Join
o Subset
o Advanced manipulations with dplyr

Updated January 12, 2016

Week 2
Data Science with R - Data Analytics Part II
Data Visualization with "ggplot2"
o Histogram
o Point graphics
o Columnar graphics
o Line charts
o Pie charts
o Box plots
o Scatter plots
o Visualizing multivariate data
o Matrix-based visualizations
o Maps
Introduction to Shiny
o Shiny introduction
o Design the User-interface
o Control widgets
o Build reactive output
o Use data table in Shiny Apps
o Use R scripts, data and packages
o UI and server for the App
o Make Shiny perform quickly
o Matrix-based visualizations
o Use reactive expressions
o Share and deploy Shiny apps
Lab: Moneyball
Project 1 Due: Exploratory Data Visualization

Week 3
Data Science with R - Machine Learning Part I

Foundations of Statistics
o Descriptive Statistics
    - Measures of Centrality
    - Measures of Variability
    - Frequency, Proportion & Contingency Tables
    - Correlation
o Hypothesis Testing
    - One Sample t-test
    - Two Sample t-test
    - F-test
    - One-way ANOVA
    - χ² Test of Independence
o Introduction to Machine Learning
    - Supervised Learning
        - Regression
        - Classification
    - Unsupervised Learning
        - Clustering
        - Dimension Reduction
Missingness & Imputation
o Types of Missingness
    - MCAR
    - MAR
    - MNAR
o Basic Methods of Imputation
    - Mean Value Imputation
    - Simple Random Imputation
    - Regression Prediction
o K-Nearest Neighbors
    - Voronoi Tessellations
    - KNN for Classification
    - KNN for Regression
    - Distance Measures
Linear Regression I
o Simple Linear Regression
    - From a Mathematical Standpoint
    - Accuracy of the Coefficient Estimates
    - Performing Hypothesis Tests
    - Constructing Confidence Intervals
o Assumptions & Diagnostics
o Transformations
    - Power Transformation
    - Box-Cox Transformation
o The Coefficient of Determination R²
Linear Regression II
o Multiple Linear Regression
    - From a Mathematical Standpoint
o Assumptions & Diagnostics
o Potential Problems
o Research Questions
o Variable Selection
o Factors
o Interactions
o Higher-Order Terms

Week 4
Data Science with R - Machine Learning Part II
Lab: Building Bridges
Generalized Linear Models
o Logistic Regression
The Curse of Dimensionality
o Ridge Regression
o Lasso Regression
o Cross-Validation
o Bias/Variance Tradeoff
o Density
o Principal Component Analysis
Guest Lecture
Project 2 Due: R Shiny Interactive Applications

Week 5
Data Science with Python - Data Analytics Part I
Data Science with R - Machine Learning (Continued)
Python Programming Language I
o Overview of syntax
o Built-in functions
o Data structures
o Standard libraries
o Object oriented programming
Python Programming Language II
o List comprehension
o Data copy
o Introduction to algorithm concepts
String Processing / Regular Expressions
o Regular expressions
o Web scraping: Ajax, XPath, Beautiful Soup
o Accessing APIs
Time Series Analysis
o Smoothing
o Seasonal Decomposition
o ARIMA

Week 6
Data Science with Python - Data Analytics Part II
Data Science with R - Machine Learning (Continued)
NumPy / SciPy
Matplotlib / Data Structures and Visualization in Pandas / Seaborn
Data Manipulation in Pandas
Lab: Oil Boilers
Cluster Analysis
o K-Means Clustering
o Agglomerative Clustering
o Hierarchical Clustering
Project 3 Due: Python Web Scraping

Week 7
Data Science with R - Machine Learning (Continued)
Data Science with Python - Data Analytics (Continued)
Classification
o Feature Selection
o Decision Trees
o Pruning
o Purity
o Entropy
o Gini
o Random Forests
o Bagging
o Boosting
o Support Vector Machines
o Neural Networks
Lab: Simple Linear Regression from Scratch

Week 8
Data Science with R - Machine Learning (Continued)
Introduction to Natural Language Processing
Case Study: Spam Detection
Association Rules
o Market Basket Analysis
Naive Bayes Analysis
Introduction to Natural Language Processing Part I
Introduction to Natural Language Processing Part II
Guest Lecture


Week 9
Data Science with Python - Machine Learning
Machine Learning Recap / Linear Regression
Naive Bayes Classifiers / KNN / Logistic Regression / LDA
Cross-validation / Bootstrap / Feature Selection / Regularization / Model Selection
SVM / Decision Trees / Random Forest
Principal Components Analysis / K-Means / Hierarchical Clustering
Project 4 Due: Machine Learning Project (this can be a Kaggle competition, a hiring partner project, or a non-profit project from our partners)

Week 10
Big Data
Machine Learning Review
Parallel Processing: Parallel Computing in Python / Parallel Computing in R
Introduction to Hadoop:
o Hadoop Ecosystem
o Hadoop Data Flow
o Introduction to the origin and functions of Hadoop
o The principal operations of the Hadoop Distributed File System (HDFS)
Python for MapReduce:
o The principles and working mechanisms of MapReduce
o MapReduce Programming
o MapReduce with Streaming
Advanced Hadoop Applications: Hive
Spark
Machine Learning Theory Interview Questions Review Session

Week 11
Big Data (Continued)
Python - Computer Science
Spark: MLlib
Introduction to Algorithms / Data Structures
Big-O notation
Sorting and Searching

Week 12
Capstone Project Presentations
From the beginning of Bootcamp, you will work on hands-on projects. Now your Capstone
Project lets you create your own data product that showcases your interests and talents.
Students are free to use anything covered in class on this project.

