AWS - Machine Learning Essentials - Lynn Langit

About using cloud services: 1) Dedicated accounts: Best to use dedicated AWS accounts for testing.
At a minimum, use a unique login for testing. 2) Free tier: Not all services are included. 3) Turn it
off: Use the billing dashboard.

What is Machine Learning? An application of artificial intelligence (AI) that provides the ability to automatically learn and improve from experience without being explicitly programmed, using computer programs that can access data and use it to learn.

Machine learning is part of analytics:

More and more companies are starting to use machine learning as part of their solutions. Don't buy into the hype: machine learning should not be your first go-to for analytics solutions. It is a set of tools for a specific set of problems.

Data science draws on domain expertise, data analysis, data processing, programming, machine learning, and statistics and mathematics.

Types of Analytics

Business analytics: yes/no answers; reports; SQL query languages.
Predictive analytics: likelihood answers; predictions; machine learning languages, often with labelled data sets.

When is machine learning needed? 1) Start with the business problem. 2) Many people think they need to use ML, but actually they don't. 3) Aim to produce an MVR (minimum viable report). 4) Use the simplest possible technology. 5) Use clean, understood data.

Ideal Usage Patterns: 1) Enable an application to flag suspicious transactions: build an ML model that predicts whether a new transaction is legitimate or fraudulent. 2) Forecast product demand: input historical order information to predict future order quantities. 3) Personalize application content: predict which items a user will be most interested in and surface these items in real time. 4) Predict user activity: analyze user behaviour to customize your website and provide a better user experience. 5) Listen to social media: ingest and analyze social media feeds that potentially impact business decisions.

Which algorithms to use? Types:

Supervised (labelled data): classification predicts membership in a class or group, such as logistic regression; regression predicts a value, such as linear regression. Answers questions like: with what likelihood (percentage) does an item belong to an existing group, or what value should be predicted? Requires an algorithm plus trained (labelled) data.

Unsupervised: clustering groups items together, such as K-means. Answers questions like: does this new thing belong to a group? Which group? With what likelihood? Requires an algorithm only; no training data is needed.

[Link] is a good site to get more details on algorithms, including an algorithm cheat sheet.


Not all problems are solvable with Machine Learning.

What is Deep Learning?

Artificial Intelligence Solutions for every developer

AI Services: Amazon Rekognition (image recognition), Amazon Polly (text to speech), Amazon Lex (voice and text chatbots).
AI Platforms: Amazon ML, AWS EMR, Spark & SparkML.
AI Frameworks: AWS Deep Learning AMI with Apache MXNet, Caffe, Torch, Theano, CNTK, Keras, and TensorFlow.
AI Infrastructure: AWS EC2 P2 and G2 GPUs, EC2 CPUs with enhanced networking, AWS Lambda, AWS IoT and Greengrass.

Which Analytics to Use: 1) Stream: milliseconds/seconds -> fraud alerts. 2) Interactive: seconds -> self-service dashboards. 3) Batch: minutes/hours -> daily/weekly/monthly reports.

Batch -> Amazon EMR (MapReduce, Hive, Pig, Spark). Interactive -> Amazon Redshift, Athena, EMR (Presto, Spark). Stream -> Amazon EMR (Spark Streaming), Kinesis Analytics, KCL, Lambda. Predictive -> Amazon AI (Lex, Polly, ML, Rekognition), EMR (Spark ML).

C++ and Python are the most widely supported languages for machine learning; Python is preferred. TensorFlow, being a Google project, also supports Go. MXNet supports Scala, but not Java.

What is an Amazon ML API? 1) Easy to use machine learning. 2) Algorithms plus AWS’s labelled data
models. 3) You supply input data and retrieve predictions.

Amazon ML API Example: 1) Input: social media, posts, emails, web pages, documents, and phone transcriptions. 2) Comprehend, an AWS ML API: automatically extracts key phrases, entities, sentiment, language, and topics. 3) Output: extracted data and topics with confidence scores.

There are not enough people available to build ML models from scratch, so AWS Rekognition, Lex, Comprehend, Translate, and Polly help a lot.

They're broadly categorized in the Amazon ecosystem in three categories: vision, conversation, and language. In vision, the service is called Rekognition, and it recognizes both image- and video-based content. Note that it categorizes using deep learning, so it's an implementation of deep learning, or neural nets, that is much simpler to use than a custom model (which we will also cover later in this course) but is a good entry point to see what's possible. In the area of conversational chatbots, we have Amazon Lex, which is designed to build chatbots that engage customers; this is also the technology behind the very popular Amazon Alexa device. And we have a set of language services: Amazon Comprehend (insights and relationships in text), Translate (fluent translation of text), Transcribe (automatic speech recognition), and Polly (natural-sounding text to speech).

Working with AWS AI: 1) Test out in the AWS Console. 2) Script with the aws-cli. 3) Incorporate into applications using the AWS SDK. 4) Process data stored in AWS S3 – requires an IAM user.

Comprehend helps us analyse entities, sentiment (the course demo uses text about the weather), key phrases, and languages. It also helps with topic modelling.


Predict using AWS Comprehend for NLP:

Input (support communications are generated across email, chat, social, and phone calls) -> S3 (text data and call transcriptions are stored in an AWS S3 data lake) -> Comprehend (processes the text to extract key phrases, entities, and sentiment for further analysis) -> Redshift (extracted data is analysed to identify which actions lead to the most positive customer outcomes) -> Output (insights are applied to customer service training to improve the rate of positive outcomes in less time).

Amazon Comprehend for NLP: 1) Natural Language Processing (NLP) – entities, key phrases,
language, sentiment. 2) Does “topic modelling”. 3) English or Spanish.
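As a rough sketch of calling Comprehend from Python with the AWS SDK (boto3) — the region, sample text, and IAM permissions here are assumptions, and the text could just as well come from the S3 data lake above:

    import boto3

    # Comprehend client; the region is an assumption, use your own
    comprehend = boto3.client("comprehend", region_name="us-east-1")

    text = "The weather in Seattle is wonderful today."  # made-up sample text

    # Detect dominant language, sentiment, entities, and key phrases
    language = comprehend.detect_dominant_language(Text=text)
    sentiment = comprehend.detect_sentiment(Text=text, LanguageCode="en")
    entities = comprehend.detect_entities(Text=text, LanguageCode="en")
    phrases = comprehend.detect_key_phrases(Text=text, LanguageCode="en")

    # Each response carries confidence scores alongside the extracted data
    print(sentiment["Sentiment"], sentiment["SentimentScore"])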

Amazon Polly for Text to Speech: 1) Supports many languages. 2) Supports many voices. 3) Can
customize with a lexicon. 4) Works with Speech Synthesis Markup Language (SSML)
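A minimal boto3 sketch of Polly text-to-speech using an SSML-marked-up string (the voice, region, and output file name are illustrative choices, not requirements):

    import boto3

    polly = boto3.client("polly", region_name="us-east-1")  # region is an assumption

    # SSML lets you control pauses, pronunciation, and emphasis
    ssml = "<speak>Hello! <break time='300ms'/> This is Polly speaking.</speak>"

    response = polly.synthesize_speech(
        Text=ssml,
        TextType="ssml",
        VoiceId="Joanna",      # one of many available voices
        OutputFormat="mp3",
    )

    # AudioStream is a streaming body; write it out as an .mp3 file
    with open("speech.mp3", "wb") as f:
        f.write(response["AudioStream"].read())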

Polly Use Case: Talking IoT -> IoT sensors -> AWS IoT + Lambda (AWS IoT receives a signal and calls a Lambda function to get a pre-signed URL to Polly with the text of the notification, "Sean's room is getting cold") -> IoT loudspeaker (receives the pre-signed URL and uses it to request speech audio from Polly) -> Amazon Polly (receives the request and streams speech audio back to be played on the loudspeaker).
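One way the Lambda function in this flow might produce the pre-signed URL is boto3's generic generate_presigned_url helper; this is a sketch, with the expiry time and region as assumptions:

    import boto3

    polly = boto3.client("polly", region_name="us-east-1")

    # A short-lived URL the IoT loudspeaker can call directly to fetch the audio
    url = polly.generate_presigned_url(
        "synthesize_speech",
        Params={
            "Text": "Sean's room is getting cold",
            "VoiceId": "Joanna",
            "OutputFormat": "mp3",
        },
        ExpiresIn=60,  # seconds; an assumption
    )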

Amazon Lex for Conversation: 1) Used by Amazon Alexa. 2) Used to create conversational interfaces
(Chat bots). 3) Uses ML for both voice and text.

Amazon Rekognition for images: 1) Can detect objects and scenes. 2) Can be used for "image moderation". 3) Can detect facial information and compare faces, including celebrity images. 4) Can detect text in images.
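A hedged boto3 sketch of those four image capabilities against a photo in S3 (the bucket and object names are hypothetical):

    import boto3

    rekognition = boto3.client("rekognition", region_name="us-east-1")
    image = {"S3Object": {"Bucket": "my-bucket", "Name": "photo.jpg"}}  # hypothetical

    labels = rekognition.detect_labels(Image=image, MaxLabels=10, MinConfidence=75)
    moderation = rekognition.detect_moderation_labels(Image=image)      # image moderation
    faces = rekognition.detect_faces(Image=image, Attributes=["ALL"])   # facial details
    text = rekognition.detect_text(Image=image)                         # text in images

    for label in labels["Labels"]:
        print(label["Name"], label["Confidence"])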

Amazon Rekognition for Video: 1) Can detect objects and activities. 2) Can be used with "moderation labels".

Rekognition video use case: Celebrities -> 1) Input (users upload new video footage via a web application or mobile app). 2) AWS S3 (video content is stored securely in AWS S3). 3) Lambda (a function is triggered to initiate processing by Rekognition when a new video is uploaded). 4) Rekognition Video (analyzes the footage and returns the names, timestamps, and IMDb links for all detected celebrities). 5) Output (metadata is indexed in Elasticsearch and output is persisted in DynamoDB for use in programming or news).
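The Rekognition Video piece of that pipeline could look roughly like this with boto3 (bucket, key, and the polling loop are illustrative; a production Lambda would use the SNS completion notification instead of polling):

    import time
    import boto3

    rekognition = boto3.client("rekognition", region_name="us-east-1")

    # Start an asynchronous celebrity-recognition job on a video in S3
    job = rekognition.start_celebrity_recognition(
        Video={"S3Object": {"Bucket": "my-bucket", "Name": "footage.mp4"}}  # hypothetical
    )

    # Poll until the job finishes
    while True:
        result = rekognition.get_celebrity_recognition(JobId=job["JobId"])
        if result["JobStatus"] != "IN_PROGRESS":
            break
        time.sleep(5)

    # Names, timestamps, and reference URLs for each detected celebrity
    for hit in result.get("Celebrities", []):
        print(hit["Timestamp"], hit["Celebrity"]["Name"], hit["Celebrity"].get("Urls"))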

Amazon Transcribe: 1) Automatic speech recognition with text output. 2) Can be used to timestamp audio files, such as .mp3 and .wav. 3) In private beta.
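Transcribe has since become generally available; a minimal boto3 sketch of an asynchronous transcription job (the job name and media location are hypothetical):

    import boto3

    transcribe = boto3.client("transcribe", region_name="us-east-1")

    transcribe.start_transcription_job(
        TranscriptionJobName="call-0001",                   # hypothetical job name
        Media={"MediaFileUri": "s3://my-bucket/call.mp3"},  # hypothetical location
        MediaFormat="mp3",
        LanguageCode="en-US",
    )

    # When complete, the job result includes a URI for the timestamped transcript
    job = transcribe.get_transcription_job(TranscriptionJobName="call-0001")
    print(job["TranscriptionJob"]["TranscriptionJobStatus"])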

AWS Translate: 1) Natural language translation. 2) In private beta. 3) Languages: Arabic, Chinese, French, German, Portuguese, Spanish. 4) Designed to integrate with other ML APIs – Polly (speech output), S3 (document repository translation), Comprehend (extracts entities).
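A one-call boto3 sketch of Translate, assuming an English-to-Spanish pair from the supported list above:

    import boto3

    translate = boto3.client("translate", region_name="us-east-1")

    result = translate.translate_text(
        Text="Machine learning is part of analytics.",
        SourceLanguageCode="en",
        TargetLanguageCode="es",
    )
    print(result["TranslatedText"])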

Expanding platform service options: 1) Serverless or containers: supply your own training data and AWS picks the model – Amazon ML. 2) Supply data and pick (or create) the model – SageMaker or ECS. 3) Managed virtual servers: managed Spark/SparkML – EMR; AWS Batch – managed EC2 spot (fleets); vendor solutions – Spark on Databricks.

Understanding AWS ML: 1) Includes two prebuilt machine learning models. 2) Suggests which
algorithm to use based on your input data. 3) Is integrated with S3.
Amazon ML makes it easy for developers of all skill levels to use machine learning technology. It is a
managed service for building ML models and generating predictions that enable the development of
robust, scalable smart applications.

The normal setup contains Input data -> Schema -> Target -> Review. An example uses banking data: whether the person has taken a loan in the past, their credit history, and whether to grant the next loan.

Machine Learning model workflow: 1) Generate example data (fetch, clean, prepare). 2) Train the model (train model, evaluate model). 3) Deploy the model (deploy to production; monitor, collect data, evaluate).

Scalable ML: Amazon SageMaker: 1) Build (connect to other AWS services and transform data in SageMaker notebooks). 2) Train (use SageMaker's algorithms and frameworks, or bring your own, for distributed training). 3) Tune (SageMaker automatically tunes your model by adjusting multiple combinations of algorithm parameters). 4) Deploy (once training is completed, models can be deployed to SageMaker endpoints for real-time predictions).
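A minimal sketch of that build/train/deploy loop with the SageMaker Python SDK, version 2 naming (the IAM role, S3 paths, and the choice of the built-in XGBoost container are assumptions for illustration):

    import sagemaker
    from sagemaker.estimator import Estimator
    from sagemaker.inputs import TrainingInput

    session = sagemaker.Session()
    role = "arn:aws:iam::123456789012:role/SageMakerRole"  # hypothetical IAM role

    # Built-in algorithm container; here XGBoost (the version string is an assumption)
    image_uri = sagemaker.image_uris.retrieve("xgboost", session.boto_region_name, version="1.5-1")

    estimator = Estimator(
        image_uri=image_uri,
        role=role,
        instance_count=1,
        instance_type="ml.m5.xlarge",
        output_path="s3://my-bucket/output",  # hypothetical
        sagemaker_session=session,
    )
    estimator.set_hyperparameters(objective="binary:logistic", num_round=100)

    # Train: creates a SageMaker training job from data in S3
    train = TrainingInput("s3://my-bucket/train.csv", content_type="text/csv")  # hypothetical
    estimator.fit({"train": train})

    # Deploy: creates a model and a real-time endpoint for predictions
    predictor = estimator.deploy(initial_instance_count=1, instance_type="ml.m5.large")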

SageMaker Concepts:

Function -> SageMaker object:

Build/experiment (user interface) -> Jupyter Notebook
Train model -> SageMaker Job
Tune/host model -> SageMaker Model
Deploy model -> SageMaker Endpoint

Jupyter Notebooks: 1) Project Jupyter – an open standard. 2) Documents data science experiments. 3) Text, code, and visualizations.

SageMaker with Jupyter Notebooks: 1) Launches an ML compute instance and network interface. 2) Installs Anaconda packages and libraries. 3) Attaches a 5 GB ML storage volume. 4) Copies example Jupyter notebooks.

Notebook settings: instance name, type, IAM role, optional VPC, encryption key. Various libraries and programs are available for immediate execution on data sets, either in production or from standard websites, to test results. The available kernels include Sparkmagic (PySpark, PySpark3, Spark, SparkR), conda MXNet (Python 2.7/3.6), Python 2, Python 3, and TensorFlow (Python 2.7/3.x). There are also various notebooks with built-in algorithms, supervised and unsupervised (for example K-means, decision trees, random forests, Bayes' theorem, clustering, classification, and so on). This helps you try a few algorithms against your data sets before deciding which one is best for your use.

We can load these libraries and work interactively, line by line, to test output, and later save all of this work. This test-and-try approach helps in writing robust programs.

Amazon SageMaker in a nutshell: 1) Notebook instance (explore AWS data in your notebooks and use algorithms to create models via training jobs -> create notebook instance). 2) Jobs (track training jobs at your desk or remotely; leverage high-performance AWS algorithms -> view jobs). 3) Models (create models for hosting from job outputs, or import externally trained models into AWS SageMaker -> view models). 4) Endpoints (deploy endpoints for developers to use in production; A/B test model variants via the endpoint).

Testing various data sets against algorithms with SageMaker ultimately gives us an idea of whether an algorithm is right for the data sets. Scikit-learn is a good site for experimenting with various algorithms for classification, regression, and clustering ([Link]/stable).
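For example, a few lines of scikit-learn are enough to try a clustering algorithm on a data set before committing to it (the data here is synthetic, generated for illustration):

    from sklearn.cluster import KMeans
    from sklearn.datasets import make_blobs

    # A small synthetic data set with three natural groups
    X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

    # Fit K-means and inspect the cluster assignments
    model = KMeans(n_clusters=3, n_init=10, random_state=42)
    labels = model.fit_predict(X)

    print(model.cluster_centers_)
    print(labels[:10])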

Algorithm types supported by SageMaker: 1) Built-in: ten common ML algorithms, optimized to run in the SageMaker environment. 2) Apache Spark: a library for Spark and Spark ML is included in SageMaker. 3) Deep learning frameworks: Python libraries for TensorFlow and MXNet are included in SageMaker. 4) Custom: using Docker images, you can create, package, and use your own algorithms.

Selecting Algorithms for Models: 1) Answers fit into discrete categories – Y/N or C1/C2… 2) Answers that are quantitative – which one, or how much? 3) Insight with no labelled data – any data attribute groupings? 4) Specialty algorithms – image, video, or text data.

Algorithms for Discrete Categories

Algorithm Function
K-Means Unsupervised clustering
Linear learner Supervised classification
XGBoost Supervised classification using gradient-boosted decision trees.

Algorithms for quantitative values

Algorithm Function
Linear learner Supervised regression
XGBoost Supervised regression using gradient-boosted decision trees

Algorithms for text processing.

Algorithm Function
Neural topic model (NTM) Unsupervised classification of topics, for example in large documents
Latent Dirichlet Allocation (LDA) Unsupervised clustering – used for topic modelling in text
Sequence-to-sequence (seq2seq) Supervised token1-to-token2 modelling, for example machine translation
BlazingText (Word2Vec) Learns vector representations of word collections

Specialty SageMaker Algorithms

Algorithm Function
Image classification Supervised classifier for images
DeepAR forecasting Supervised forecasting of scalar time series via training over related sets of time series
Principal component analysis (PCA) Unsupervised dimensionality reduction
Factorization machines Supervised classification or regression for high-dimensional sparse data sets, for example item recommendation
Custom (use your own algorithm) Your own, or popular deep learning frameworks – MXNet and TensorFlow
Advanced use of SageMaker: 1) Deploying multiple variants of a model to the same endpoint for A/B testing. 2) Updating the endpoint by providing an updated endpoint configuration to change the ML compute instance type or to distribute traffic among model variants. 3) Logging endpoint access with CloudTrail. 4) Using only the components you need.
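A hedged sketch of items 1 and 2 using the boto3 SageMaker client, assuming two already-created models and hypothetical names throughout:

    import boto3

    sm = boto3.client("sagemaker", region_name="us-east-1")

    # Two variants of a model behind one endpoint, splitting traffic 50/50 for A/B testing
    sm.create_endpoint_config(
        EndpointConfigName="my-endpoint-config-v2",
        ProductionVariants=[
            {"VariantName": "model-a", "ModelName": "my-model-a",
             "InitialInstanceCount": 1, "InstanceType": "ml.m5.large",
             "InitialVariantWeight": 0.5},
            {"VariantName": "model-b", "ModelName": "my-model-b",
             "InitialInstanceCount": 1, "InstanceType": "ml.m5.large",
             "InitialVariantWeight": 0.5},
        ],
    )

    # Point the live endpoint at the new configuration (new instance type or traffic split)
    sm.update_endpoint(EndpointName="my-endpoint", EndpointConfigName="my-endpoint-config-v2")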

Read more, or watch videos on YouTube, about Jupyter notebooks.

Researchers worldwide use notebooks (Jupyter) as a service for a few hours, days, or weeks on the data they have. Once they get results, they simply destroy the instances. It is a very cost-effective way of working.

A/B testing (also known as bucket tests or split-run testing) is a randomized experiment with two
variants, A and B. It includes application of statistical hypothesis testing or "two-sample hypothesis
testing" as used in the field of statistics. A/B testing is a way to compare two versions of a single
variable, typically by testing a subject's response to variant A against variant B, and determining
which of the two variants is more effective.

Should I use a server/virtual machine for ML?

SaaS – Databricks on AWS: notebooks, Spark, and libraries.
PaaS – Amazon EMR: Spark and libraries; can work with SageMaker notebooks.
IaaS – EC2: ML AMI, EC2 optimized by AWS for deep learning; you can manually install and configure all language runtimes and ML libraries.

In general, when you work in SageMaker, you write a script in the top dialog box, download training and test data in the next box, upload the data, implement the training function, run the training script on SageMaker, and clean up the work area after the script runs.

Deep learning techniques are facilitated through MXNet and the Gluon interface, using Jupyter notebooks. Gluon supports many classification, clustering, and time-series algorithms for testing against data sets.
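As a small taste of the Gluon interface (the layer sizes and dummy batch are made up for illustration):

    import mxnet as mx
    from mxnet.gluon import nn

    # A small feed-forward classifier defined with Gluon's imperative API
    net = nn.Sequential()
    net.add(nn.Dense(64, activation="relu"),
            nn.Dense(10))  # e.g., ten output classes
    net.initialize(mx.init.Xavier())

    # Forward pass on a dummy batch of 4 examples with 20 features each
    x = mx.nd.random.uniform(shape=(4, 20))
    print(net(x).shape)  # (4, 10)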

Using Databricks with AWS: Databricks offers Spark as a service (Apache Spark). The reason is that most of the committers to the open-source Apache Spark libraries actually work at Databricks, which is a commercial company, and they provide optimized server clusters as a service as their commercial offering. So you can select Databricks on either AWS or, more recently, Azure.

1) Databricks sample notebooks. 2) Range of operations (including ML). 3) Includes many types of ML. 4) Free Databricks Community Edition. It provides a range of algorithm scripts for testing data sets.

[Link]/

Set up the AWS Deep Learning AMIs: 1) Conda AMI: preinstalled pip packages for deep learning. 2) Base AMI: clean base image. 3) AMI with source code: packages and source code.

Amazon EMR for ML: 1) Managed Hadoop server clusters. 2) Can install many libraries easily. 3) Can run bootstrap tasks to further customize. 4) Can utilize instance fleets to optimize scaling.
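A boto3 sketch of launching such a cluster with a bootstrap script (the names, sizes, release label, and S3 path are all hypothetical; the core group uses spot instances, in line with the cost advice later in these notes):

    import boto3

    emr = boto3.client("emr", region_name="us-east-1")

    response = emr.run_job_flow(
        Name="ml-cluster",
        ReleaseLabel="emr-5.30.0",                 # assumption; pick a current release
        Applications=[{"Name": "Spark"}],
        Instances={
            "InstanceGroups": [
                {"InstanceRole": "MASTER", "InstanceType": "m5.xlarge", "InstanceCount": 1},
                {"InstanceRole": "CORE", "InstanceType": "m5.xlarge",
                 "InstanceCount": 2, "Market": "SPOT"},  # spot instances to cut cost
            ],
            "KeepJobFlowAliveWhenNoSteps": True,
        },
        # Bootstrap action to install extra ML libraries on every node
        BootstrapActions=[{
            "Name": "install-ml-libraries",
            "ScriptBootstrapAction": {"Path": "s3://my-bucket/bootstrap.sh"},
        }],
        JobFlowRole="EMR_EC2_DefaultRole",
        ServiceRole="EMR_DefaultRole",
    )
    print(response["JobFlowId"])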

I've purposely focused very heavily on Jupyter notebooks, as opposed to using the traditional terminal and just running batch jobs or scripts, because I really do think that they reflect the future for machine learning workloads. They work well with high-volume processing loads through EMR, and AWS EMR provides a good wizard, integrated with Jupyter notebooks, for learning high-volume processing with various ML algorithms.
ML APIs for conversational interfaces: 1) Select a feature -> send a request to Lex to start a conversation -> Lex uses Lambda functions to get information from a service provider or S3 storage -> update the UI or system features, and output information through Polly.
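The Lex leg of that flow, sketched with the (V1) runtime client in boto3 — the bot name, alias, and utterance are hypothetical:

    import boto3

    lex = boto3.client("lex-runtime", region_name="us-east-1")

    response = lex.post_text(
        botName="OrderFlowers",   # hypothetical bot
        botAlias="prod",
        userId="user-123",
        inputText="I would like to order flowers",
    )

    # Lex returns the bot's reply plus the dialog state (e.g., ElicitSlot, Fulfilled)
    print(response["message"], response["dialogState"])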

Best Practices for ML Algorithms: 1) Use the right algorithm for the job. 2) Take advantage of AWS algorithm optimization. 3) Consider using ML APIs first, such as Lex or Polly. 4) Consider using SageMaker or Amazon ML second. 5) Use deep learning or custom algorithms only when the business case is justified – test with the Amazon ML AMIs. Use the simplest possible thing required; do not complicate matters with fancy algorithms.

Machine Learning Architectures: in addition to 1-5 above, use ML virtual servers (Databricks, EMR) for huge, complex workloads to take advantage of GPUs.

Make use of spot instances as much as possible while working with EMR; your billing can be reduced by 60-90%. Using reserved instances should be your last option, given the massive infrastructure needed and the skills/expertise required to manage it.
