Welcome!
Machine Learning Decal
Hosted by Machine Learning at Berkeley
Overview
Agenda
Who are we?
What is Machine Learning?
Class Logistics
General Overview and Context
Machine Learning Pipeline
Python/Numpy
Scikit-Learn
Questions
Who are we?
Machine Learning @ Berkeley Education Team
What is Machine Learning?
Age Old Question
Can AI compose music?
Can AI paint a canvas?
Dancing!
FAKE NEWS!
Pose tracking!
Superhuman reasoning
Self-driving Cars
Our view of intelligence
What intelligence is actually like
Class Logistics
Goal
Goals:
• Understand the major concepts in machine learning
• Understand the tradeoffs between different approaches (what
do I use when and why?)
• Gain familiarity with code to solve ML problems
How we accomplish this:
• Lecture (2 hrs/week) - theoretical intro to various techniques,
along with demos
• Homeworks (3-6 hrs/week) - practice implementing different
models on different datasets
Enrollment
• Some of you here are in the class, others on the waitlist
• You must show up to the first lecture if you wish to be enrolled
• We will give out course enrollment codes tonight - first to all
enrolled students who showed up to this class, then waitlist
students who showed up
Logistical
Join Piazza: [Link]/berkeley/fall2018/cs198082
All communication will be through Piazza.
Join Gradescope:
Clone the GitHub repository:
[Link]
• All homeworks, lecture slides, and lecture recordings will be
posted here
• Also check out the calendar which has all important dates
(homework deadlines, office hours, etc.) for the class
Attendance
• Attendance is mandatory and will be taken at every lecture
• You may miss up to 3 classes [not including first class]
• After your 3rd missed day of class, you will automatically be
assigned a no pass
Grades
• 70% average on all homework assignments
• Submit final project
• Automatic "no pass" for insufficient attendance
Assignments
• 7 Homework Assignments
• Work in groups of up to 3
• 1-2 week timeline
• Final Project/Hackathon
• You can either submit a final project on your own time or attend
and submit a project through the decal hackathon at the end
of the semester
• All the instructors will be there to answer questions and guide
you through projects
Late Assignments
• Each assignment will have a 1-week late turn-in period
• Small penalty for turning in late
• No submissions allowed after the late turn-in period is over!
Homework 1 is out NOW, due next Tuesday (in one week)!
Office Hours
Date/Time/Location of office hours for the next two weeks will be
posted on Piazza and on the calendar
Got Questions?
During lecture:
• Raise your hand!
• Ask on the lecture questions thread on Piazza!
Outside lecture / about anything else:
• Please don’t email
• PIAZZA IS YOUR FRIEND!!!
Course Content
Syllabus
General Overview and Context
3 Different Classes of Machine Learning Problems
Supervised Learning (Classification) Ex: ImageNet (10M+ images),
1000 classes
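To make "classification" concrete, here is a minimal sketch of a nearest-centroid classifier on a tiny made-up dataset. The points, the labels "cat" and "dog", and the function names are all hypothetical and chosen only for illustration; this is far simpler than anything used on ImageNet.

```python
def fit_centroids(points, labels):
    """Average the feature vectors of each class to get one centroid per class."""
    sums, counts = {}, {}
    for point, label in zip(points, labels):
        acc = sums.setdefault(label, [0.0] * len(point))
        for i, value in enumerate(point):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids, point):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    def sq_dist(center):
        return sum((a - b) ** 2 for a, b in zip(center, point))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


# Hypothetical 2-D training data: inputs paired with discrete class labels.
points = [(1.0, 1.0), (1.5, 2.0), (5.0, 5.0), (6.0, 5.5)]
labels = ["cat", "cat", "dog", "dog"]

centroids = fit_centroids(points, labels)
guess = predict(centroids, (5.5, 5.0))
```

The key property of classification is visible in the last line: the output is one of a fixed set of labels, not a number.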
Supervised Learning (Regression) Ex: Imaging Genetics and
Genome-Wide Association Studies
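In contrast to classification, regression predicts a continuous number. A minimal sketch, assuming the simplest possible case of fitting a straight line to one input feature with ordinary least squares; the data points and the name fit_line are hypothetical and unrelated to the genetics example.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one feature)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data generated from y = 2x + 1 with no noise.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

slope, intercept = fit_line(xs, ys)
prediction = slope * 4.0 + intercept  # predict a real-valued output for a new input
```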
Unsupervised Learning (Clustering)
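Clustering groups data without using any labels. One common approach is k-means; the sketch below is a 1-D version under simplifying assumptions (the caller supplies the starting centers, and the loop runs a fixed number of iterations). The values and the name k_means_1d are hypothetical.

```python
def k_means_1d(values, centers, n_iters=10):
    """Alternate between assigning points to the nearest center and re-averaging."""
    centers = list(centers)
    clusters = []
    for _ in range(n_iters):
        # Assignment step: each value joins the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        centers = [
            sum(c) / len(c) if c else centers[i] for i, c in enumerate(clusters)
        ]
    return centers, clusters


# Hypothetical unlabeled data with two visible groups.
values = [1.0, 1.2, 0.8, 9.0, 9.5, 10.0]
centers, clusters = k_means_1d(values, centers=[0.0, 5.0])
```

Note that no labels appear anywhere in the input: the grouping comes from the data alone.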
Reinforcement Learning
Machine Learning Pipeline
Typical Pipeline
• Acquire the data
• Prepare/Visualize Data
• Choose a Model
• Train a Model on the Training Set
• Evaluate Model Performance
• Tune Model Hyperparameters
• Prediction!
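The pipeline steps above can be sketched end to end on a toy problem. This is only an illustration under stated assumptions: the data is synthetic (1-D points labeled 1 when x > 0.5), the "model" is a single threshold, and every function name is hypothetical. Hyperparameter tuning is left out here because this model has nothing to tune.

```python
import random


def acquire_data(n_samples, seed=0):
    """Acquire: synthetic (x, label) pairs; label is 1 when x > 0.5."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        x = rng.random()
        samples.append((x, int(x > 0.5)))
    return samples


def split_train_test(samples, train_fraction=0.8):
    """Prepare: hold out the last part of the data for evaluation."""
    cut = int(len(samples) * train_fraction)
    return samples[:cut], samples[cut:]


def train_threshold(train):
    """Choose and train a model: threshold halfway between the class means."""
    zeros = [x for x, y in train if y == 0]
    ones = [x for x, y in train if y == 1]
    return (sum(zeros) / len(zeros) + sum(ones) / len(ones)) / 2


def predict(threshold, x):
    """Predict: label 1 if x lies above the learned threshold."""
    return int(x > threshold)


def accuracy(threshold, samples):
    """Evaluate: fraction of samples labeled correctly."""
    correct = sum(predict(threshold, x) == y for x, y in samples)
    return correct / len(samples)


samples = acquire_data(200)
train, test = split_train_test(samples)
threshold = train_threshold(train)
test_accuracy = accuracy(threshold, test)
prediction = predict(threshold, 0.9)
```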
STEP ONE: Acquire the Data
STEP TWO: Prepare and Visualize Data
Importance of Feature Selection and Data Preprocessing
in ML problems
• Data Preprocessing and Feature Selection must be performed
before proceeding to actual data mining and ML analyses.
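One common preprocessing step is standardization: rescaling a feature so it has mean 0 and standard deviation 1, which keeps features measured in large units from dominating those measured in small ones. A minimal sketch, assuming a single numeric column; the heights_cm values and the standardize helper are made up for illustration.

```python
import statistics


def standardize(column):
    """Rescale a list of numbers to mean 0 and (population) standard deviation 1."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)
    if std == 0:
        # A constant column carries no information; map it to all zeros.
        return [0.0 for _ in column]
    return [(value - mean) / std for value in column]


# Hypothetical raw feature values.
heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]
scaled = standardize(heights_cm)
```

In practice the mean and standard deviation are usually computed on the training set only and then reused to transform validation and test data, so that no information leaks from held-out data into training.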
STEPS THREE and FOUR: Choose A Model (3)
and Train It! (4)
Dealing with Datasets in Machine Learning
STEP FIVE: Evaluate the Model's Performance in the
Validation Step (K-fold Cross-Validation)
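In K-fold cross-validation the training data is cut into K folds; each fold serves once as the validation set while the model trains on the other K-1 folds, and the K scores are averaged. The sketch below only generates the index splits. It assumes the data is already shuffled, and k_fold_indices is a hypothetical helper name, not a library function.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of the k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, validation
        start += size


folds = list(k_fold_indices(10, 3))
validation_sizes = [len(validation) for _, validation in folds]
```

Every sample lands in exactly one validation fold, so each data point is used for validation once and for training K-1 times.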
STEP SIX: Tune Model Hyperparameters Via
K-Fold Cross-Validation
• What happens if our cross-validation accuracy is rather low,
and we would like to improve it?
• Choose different hyperparameters for our model!
• While training finds values of the model's parameters that
minimize some type of loss function, the ML practitioner gets
to choose values for the hyperparameters.
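A minimal sketch of tuning one hyperparameter by cross-validation, assuming a 1-D k-nearest-neighbors classifier where the number of neighbors k is the hyperparameter. The synthetic data (with about 10% of labels flipped), the candidate values, and all function names are assumptions made for illustration.

```python
import random


def knn_predict(train, x, k):
    """Majority vote among the k training points closest to x."""
    neighbors = sorted(train, key=lambda pair: abs(pair[0] - x))[:k]
    votes = sum(label for _, label in neighbors)
    return int(votes * 2 > k)


def cv_accuracy(data, k, n_folds=5):
    """Average validation accuracy of k-NN over n_folds contiguous folds."""
    fold_size = len(data) // n_folds
    scores = []
    for fold in range(n_folds):
        lo, hi = fold * fold_size, (fold + 1) * fold_size
        validation = data[lo:hi]
        train = data[:lo] + data[hi:]
        correct = sum(knn_predict(train, x, k) == y for x, y in validation)
        scores.append(correct / len(validation))
    return sum(scores) / n_folds


# Synthetic data: label is 1 when x > 0.5, with roughly 10% of labels flipped.
rng = random.Random(0)
data = []
for _ in range(100):
    x = rng.random()
    y = int(x > 0.5)
    if rng.random() < 0.1:
        y = 1 - y
    data.append((x, y))

# Try each candidate value (odd, to avoid tied votes) and keep the best one.
candidate_ks = [1, 5, 15]
results = {k: cv_accuracy(data, k) for k in candidate_ks}
best_k = max(results, key=results.get)
```

The model never sets k itself; we pick it from outside by comparing cross-validation scores, which is what makes it a hyperparameter rather than a parameter.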
STEP SEVEN: Characterize Your Model's Performance
on a Test Dataset and Predict using Your Model!
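Once the model and hyperparameters are fixed, performance is reported on test data the model has never seen. A minimal sketch of three common metrics for a binary classifier; the y_true and y_pred lists are hypothetical stand-ins for real test labels and model predictions.

```python
def binary_metrics(y_true, y_pred):
    """Return accuracy, precision, and recall for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)  # true positives
    tn = sum(t == 0 and p == 0 for t, p in pairs)  # true negatives
    fp = sum(t == 0 and p == 1 for t, p in pairs)  # false positives
    fn = sum(t == 1 and p == 0 for t, p in pairs)  # false negatives
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# Hypothetical test-set labels and the model's predictions for them.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

accuracy, precision, recall = binary_metrics(y_true, y_pred)
```

The test set should be used only for this final report; if it also guides model or hyperparameter choices, the reported numbers become optimistic.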
Python/Numpy
DEMO
Scikit-Learn
DEMO
Questions
Questions?