AWS AI-ML VIRTUAL INTERNSHIP
Internship report submitted in partial fulfilment of the requirements
for the award of the degree of
Bachelor of Technology
in
Mechanical Engineering
Submitted by
BANDARU SYMAN PAUL
(20131A0313)
Under the guidance of
Mr Y. Rajesh
Assistant Professor
Department of Mechanical Engineering
GAYATRI VIDYA PARISHAD COLLEGE OF ENGINEERING
(AUTONOMOUS)
Affiliated to J.N.T.U.K., Kakinada
VISAKHAPATNAM-530048
July, 2023
CERTIFICATE
This is to certify that the internship titled, “AWS AI-ML VIRTUAL INTERNSHIP” is
a bonafide record of the work done by
BANDARU SYMAN PAUL
(20131A0313)
in partial fulfilment of the requirements for the award of the degree of
Bachelor of Technology in Mechanical Engineering of the Gayatri Vidya Parishad
College of Engineering (Autonomous) affiliated to Jawaharlal Nehru Technological
University, Kakinada and approved by AICTE during the year 2020-2024.
Supervisor
Mr Y. Rajesh
Assistant Professor
Department of Mechanical Engineering

Head of the Department
Dr. B. GOVINDA RAO
Professor
Department of Mechanical Engineering
VIRTUAL INTERNSHIP COMPLETION CERTIFICATE
ACKNOWLEDGEMENT
I would like to express my deepest sense of gratitude to our esteemed institute Gayatri
Vidya Parishad College of Engineering (Autonomous), which has provided us an
opportunity to fulfil our cherished desire.
It is with a sense of great respect and gratitude that I express my sincere thanks to
Mr Y. Rajesh, Assistant Professor, Department of Mechanical Engineering, Gayatri
Vidya Parishad College of Engineering (A), for his inspiring guidance, supervision, and
encouragement towards the successful completion of my internship.
I thank our Course Coordinator Dr. Ch. Sita Kumari, Associate Professor,
Department of Computer Science and Engineering, Gayatri Vidya Parishad College
of Engineering (A), for her guidance and encouragement towards the successful
completion of my internship.
I take this opportunity to thank Dr. B. Govinda Rao, Professor and Head of the
Department of Mechanical Engineering, Gayatri Vidya Parishad College of
Engineering (A), for permitting me to pursue the internship.
I express my sincere thanks to Dr. A. Bala Koteswara Rao, Principal of Gayatri
Vidya Parishad College of Engineering (A), for granting permission and providing
all the necessary resources for completing this internship.
I wish to express my appreciation and heartfelt thanks to my parents, who supported
me in pursuing my goals, and to my friends, who helped and inspired me at all hours
towards the successful completion of the internship.
Last but not least, I would like to convey special thanks to all those who helped,
either directly or indirectly, in the completion of this internship.
BANDARU SYMAN PAUL
ROLL NO:20131A0313
ABSTRACT
This is a two-phase course comprising CLOUD FOUNDATIONS and MACHINE
LEARNING.
AWS Academy Cloud Foundations is intended for students who seek an overall
understanding of cloud computing concepts, independent of specific technical roles. It
provides a detailed overview of cloud concepts, AWS core services, security,
architecture, pricing, and support. Machine learning is the use and development of
computer systems that can learn and adapt without following explicit instructions, by
using algorithms and statistical models to analyse and draw inferences from patterns in
data.
In this course, we learn how to describe machine learning (ML), including how to
recognize that machine learning and deep learning are part of artificial intelligence,
and we cover artificial intelligence and machine learning terminology. Through this,
we can identify how machine learning can be used to solve a business problem. We
also learn how to describe the machine learning process in detail, list the tools
available to data scientists, and identify when to use machine learning instead of
traditional software development methods. Implementing a machine learning pipeline
includes learning how to formulate a problem from a business request, obtain and
secure data for machine learning, use Amazon SageMaker to build a Jupyter notebook,
outline the process for evaluating data, and explain why data must be pre-processed.
Open-source tools are used to examine and pre-process data, and Amazon SageMaker
is used to train and host a machine learning model.
The course also covers the use of cross-validation to test the performance of a machine
learning model, the use of a hosted model for inference, and the creation of an Amazon
SageMaker hyperparameter tuning job to optimize a model's effectiveness. Finally, we
learn how to use managed Amazon ML services to solve specific machine learning
problems in forecasting, computer vision, and natural language processing.
CONTENTS

COURSE: AWS CLOUD FOUNDATIONS

Module 1: Cloud Concepts Overview
• Introduction to cloud computing
• Advantages of cloud computing
• Introduction to Amazon Web Services (AWS)
• AWS Cloud Adoption Framework

Module 2: Cloud Economics and Billing
• Fundamentals of pricing
• Total Cost of Ownership
• AWS Organizations
• AWS Billing and Cost Management
• Technical Support Demo

Module 3: AWS Global Infrastructure Overview
• AWS Global Infrastructure
• AWS service overview

Module 4: AWS Cloud Security
• AWS shared responsibility model
• AWS Identity and Access Management (IAM)
• Securing a new AWS account
• Securing accounts
• Securing data on AWS
• Working to ensure compliance

Module 5: Networking and Content Delivery
• Networking basics
• Amazon Virtual Private Cloud (Amazon VPC)
• VPC networking
• VPC security
• Amazon Route 53
• Amazon CloudFront

Module 6: Compute
• Compute services overview
• Amazon EC2
• Amazon EC2 cost optimization
• Container services
• Introduction to AWS Lambda
• Introduction to AWS Elastic Beanstalk

Module 7: Storage
• Amazon Elastic Block Store (Amazon EBS)
• Amazon Simple Storage Service (Amazon S3)
• Amazon Elastic File System (Amazon EFS)
• Amazon Simple Storage Service Glacier

Module 8: Databases
• Amazon Relational Database Service (Amazon RDS)
• Amazon DynamoDB
• Amazon Redshift
• Amazon Aurora

Module 9: Cloud Architecture
• AWS Well-Architected Framework
• Reliability and high availability
• AWS Trusted Advisor

Module 10: Auto Scaling and Monitoring
• Elastic Load Balancing
• Amazon CloudWatch
• Amazon EC2 Auto Scaling
COURSE: MACHINE LEARNING FOUNDATIONS

Module 1: Introducing Machine Learning
• What is machine learning?
• Business problems solved with machine learning
• Machine learning process
• Machine learning tools overview
• Machine learning challenges

Module 2: Implementing a Machine Learning Pipeline with Amazon SageMaker
• Formulating machine learning problems
• Collecting and securing data
• Evaluating your data
• Feature engineering
• Training
• Hosting and using the model
• Evaluating the accuracy of the model
• Hyperparameter and model tuning

Module 3: Introducing Forecasting
• Forecasting overview
• Processing time series data
• Using Amazon Forecast

Module 4: Introducing Computer Vision (CV)
• Introduction to computer vision
• Image and video analysis
• Preparing custom datasets for computer vision

Module 5: Introducing Natural Language Processing
• Overview of natural language processing
• Natural language processing managed services

Lab Modules

Case Study on ML Architecture

Conclusion and References
COURSE: AWS CLOUD FOUNDATIONS
MODULE: 1 CLOUD CONCEPTS OVERVIEW
1. Introduction to Cloud Computing:
Cloud computing is the on-demand delivery of compute power, database,
storage, applications, and other IT resources via the internet with pay-as-you-go
pricing. These resources run on server computers that are in large data centres
in different locations around the world.
CLOUD SERVICES: There are three main cloud service models.
Infrastructure as a service (IaaS): IaaS is also known as Hardware as a
Service (HaaS). It is a computing infrastructure managed over the internet. The
main advantage of using IaaS is that it helps users to avoid the cost and
complexity of purchasing and managing the physical servers.
Platform as a service (PaaS): PaaS cloud computing platform is created for the
programmer to develop, test, run, and manage the applications.
Software as a service (SaaS): SaaS is also known as "on-demand software”. It
is a software in which the applications are hosted by a cloud service provider.
Users can access these applications with the help of internet connection and web
browser.
2. Advantages of Cloud Computing:
1) Back-up and restore data
2) Improved collaboration
3) Excellent accessibility
4) Services in the pay-per-use model
3. Introduction to Amazon Web Services:
Amazon Web Services (AWS) is a secure cloud platform that offers a broad set
of global cloud-based products. Because these products are delivered over the
internet, you have on-demand access to the compute, storage, network, database,
and other IT resources that you might need for your projects—and the tools to
manage them.
4. AWS Cloud Adoption Framework:
The AWS Cloud Adoption Framework (AWS CAF) provides guidance
and best practices to help organizations identify gaps in skills and processes. It
also helps organizations build a comprehensive approach to cloud computing—
both across the organization and throughout the IT lifecycle—to accelerate
successful cloud adoption.
At the highest level, the AWS CAF organizes guidance into six areas of focus,
called perspectives. Perspectives span people, processes, and technology. Each
perspective consists of a set of capabilities, which cover distinct responsibilities
that are owned or managed by functionally related stakeholders.
MODULE: 2 CLOUD ECONOMICS AND BILLING
1. Fundamentals of Pricing:
There are three fundamental drivers of cost with AWS: compute, storage,
and outbound data transfer. These characteristics vary somewhat, depending on
the AWS product and pricing model you choose.
There is no charge (with some exceptions) for:
• Inbound data transfer.
• Data transfer between services within the same AWS Region.
Other key pricing characteristics:
• Pay for what you use.
• Start and stop anytime.
• No long-term contracts are required.
• Some services are free, but the other AWS services that they provision might not be free.
2. Total Cost of Ownership:
Total Cost of Ownership (TCO) is the financial estimate to help identify direct
and indirect costs of a system.
• To compare the costs of running an entire infrastructure environment or
specific workload on-premises versus on AWS
• To budget and build the business case for moving to the cloud
3. AWS Organizations:
AWS Organizations is a free account management service that enables you to
consolidate multiple AWS accounts into an organization that you create and
centrally manage.
AWS Organizations includes consolidated billing and account management
capabilities that help you to better meet the budgetary, security, and compliance
needs of your business. The main benefits of AWS Organizations are:
• Centrally managed access policies across multiple AWS accounts.
• Controlled access to AWS services.
• Automated AWS account creation and management.
• Consolidated billing across multiple AWS accounts.
4. AWS Billing and Cost Management:
AWS Billing and Cost Management is the service that you use to pay your AWS
bill, monitor your usage, and budget your costs. Billing and Cost Management
enables you to forecast and obtain a better idea of what your costs and usage
might be in the future so that you can plan.
You can set a custom time period and determine whether you would like to
view your data at a monthly or daily level of granularity.
MODULE: 3 GLOBAL INFRASTRUCTURE OVERVIEW
1. AWS Global Infrastructure:
The AWS Global Infrastructure is designed and built to deliver a flexible,
reliable, scalable, and secure cloud computing environment with high-quality
global network performance.
AWS Global Infrastructure Map: https://aws.amazon.com/about-aws/global-infrastructure/#AWS_Global_Infrastructure_Map – Choose a circle on the map to view summary information about the Region represented by the circle.
Regions and Availability Zones: https://aws.amazon.com/about-aws/global-infrastructure/regions_az/ – Choose a tab to view a map of the selected geography and a list of Regions, Edge locations, Local Zones, and Regional Caches.
2. AWS Service Overview:
MODULE: 4 CLOUD SECURITY
1. AWS Shared Responsibility Model:
AWS responsibility: security of the cloud. AWS responsibilities include:
• Physical security of data centres, i.e., controlled, need-based access.
• Hardware and software infrastructure, i.e., storage decommissioning, host operating system (OS) access logging, and auditing.
• Network infrastructure, i.e., intrusion detection.
• Virtualization infrastructure, i.e., instance isolation.
Customer responsibility: security in the cloud. Customer responsibilities include:
• Amazon Elastic Compute Cloud (Amazon EC2) instance operating system, including patching and maintenance.
• Applications: passwords, role-based access, etc.
• Security group configuration: OS or host-based firewalls, including intrusion detection or prevention systems.
2. AWS Identity and Access Management (IAM): IAM is a no-cost AWS
account feature.
Use IAM to manage access to AWS resources:
• A resource is an entity in an AWS account that you can work with.
• Example resources: an Amazon EC2 instance or an Amazon S3 bucket.
• Example: control who can terminate Amazon EC2 instances.
Define fine-grained access rights:
• Who can access the resource.
• Which resources can be accessed, and what the user can do to the resource.
• How resources can be accessed.
A minimal policy sketch follows this list.
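To make the idea of fine-grained access rights concrete, the sketch below creates a customer managed policy with boto3. The account ID, policy name, and allowed actions are hypothetical; a real policy would be scoped to your own resources.

import json
import boto3

iam = boto3.client("iam")

# Hypothetical policy: allow stopping/terminating EC2 instances in one account only.
policy_document = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["ec2:StopInstances", "ec2:TerminateInstances"],
            "Resource": "arn:aws:ec2:*:111122223333:instance/*",
        }
    ],
}

response = iam.create_policy(
    PolicyName="ExampleEc2StopTerminatePolicy",  # hypothetical policy name
    PolicyDocument=json.dumps(policy_document),
)
print(response["Policy"]["Arn"])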
3. Securing a New AWS Account:
AWS account root user access versus IAM access
Best practice: Do not use the AWS account root user except when necessary.
• Access to the account root user requires logging in with the email address
(and password) that you used to create the account.
Example actions that can only be done with the account root user:
• Update the account root user password
• Change the AWS Support plan
• Restore an IAM user's permissions
• Change account settings (for example, contact information, allowed
Regions).
4. Securing Accounts:
Security features of AWS Organizations:
• Group AWS accounts into organizational units (OUs) and attach different
access policies to each OU.
• Integration and support for IAM: Permissions to a user are the intersection
of what is allowed by AWS Organizations and what is granted by IAM in
that account.
• Use service control policies to establish control over the AWS services and
API actions that each AWS account can access.
5. Securing Data on AWS:
Encryption encodes data with a secret key, which makes it unreadable.
• Only those who have the secret key can decode the data.
• AWS Key Management Service (AWS KMS) can manage your secret keys.
AWS supports encryption of data at rest:
• Data at rest = data stored physically (on disk or on tape).
• You can encrypt data stored in any service that is supported by AWS KMS, including Amazon S3, Amazon EBS, Amazon Elastic File System (Amazon EFS), and Amazon RDS managed databases.
MODULE: 5 NETWORKING AND CONTENT DELIVERY
1: Networking Basics:
Computer Network:
An interconnection of multiple devices, also known as hosts, that are connected
using multiple paths for the purpose of sending/receiving data or media.
Computer networks can also include multiple devices/mediums which help in
the communication between two different devices; these are known as Network
devices and include things such as routers, switches, hubs, and bridges.
2. Amazon Virtual Private Cloud (VPC):
Enables you to provision a logically isolated section of the AWS Cloud where
you can launch AWS resources in a virtual network that you define. Gives you
control over your virtual networking resources, including:
• Selection of IP address range
• Creation of subnets
• Configuration of route tables and network gateways
• Enables you to customize the network configuration for your VPC
• Enables you to use multiple layers of security
3. VPC Networking:
There are several VPC networking options, which include:
• Internet gateway: Connects your VPC to the internet
• NAT gateway: Enables instances in a private subnet to connect to the
internet
• VPC endpoint: Connects your VPC to supported AWS services
• VPC peering: Connects your VPC to other VPCs
• VPC sharing: Allows multiple AWS accounts to create their application
resources into shared, centrally managed Amazon VPCs
• AWS Site-to-Site VPN: Connects your VPC to remote networks
• AWS Direct Connect: Connects your VPC to a remote network by using a
dedicated network connection
• AWS Transit Gateway: A hub-and-spoke connection alternative to VPC
peering.
4. VPC Security:
• Build security into your VPC architecture:
• Isolate subnets if possible.
• Choose the appropriate gateway device or VPN connection for your
needs.
• Use firewalls.
• Security groups and network ACLs are firewall options that you can use to
secure your VPC.
5. Amazon Route 53:
• Is a highly available and scalable Domain Name System (DNS) web
service.
• Is used to route end users to internet applications by translating names
(like www.example.com) into numeric IP addresses (like 192.0.2.1) that
computers use to connect to each other.
• Is fully compliant with IPv4 and IPv6.
6. Amazon CloudFront:
• Fast, global, and secure CDN service.
• Global network of edge locations and regional edge caches.
• Self-service model.
• Pay-as-you-go pricing.
MODULE: 6 COMPUTE
1. Compute Services Overview:
2. Amazon EC2:
Amazon Elastic Compute Cloud (Amazon EC2):
• Provides virtual machines—referred to as EC2 instances—in the cloud.
• Gives you full control over the guest operating system (Windows or
Linux) on each instance.
• You can launch instances of any size into an Availability Zone anywhere
in the world.
• Launch instances from Amazon Machine Images (AMIs).
• Launch instances with a few clicks or a line of code, and they are ready in
minutes.
• You can control traffic to and from instances.
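As a minimal illustration of launching and controlling an instance programmatically, the sketch below uses boto3; the AMI ID and tag values are hypothetical placeholders.

import boto3

ec2 = boto3.client("ec2")

# Launch one t2.micro instance from a hypothetical AMI.
response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",   # hypothetical AMI ID
    InstanceType="t2.micro",
    MinCount=1,
    MaxCount=1,
    TagSpecifications=[{
        "ResourceType": "instance",
        "Tags": [{"Key": "Name", "Value": "cloud-foundations-demo"}],
    }],
)
instance_id = response["Instances"][0]["InstanceId"]
print("Launched", instance_id)

# Stop the instance when you are done to avoid further compute charges.
ec2.stop_instances(InstanceIds=[instance_id])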
3. Amazon EC2 Cost Optimization:
4. Container Services: Containers are a method of operating system
virtualization. Benefits include:
• Repeatable.
• Self-contained environments.
• Software runs the same in different environments (developer's laptop, test, production).
• Faster to launch and stop or terminate than virtual machines.
5. Introduction to AWS Lambda: AWS Lambda is a serverless compute service.
• It supports multiple programming languages.
• Completely automated administration.
• Built-in fault tolerance.
• It supports the orchestration of multiple functions.
• Pay-per-use pricing.
A minimal Python handler sketch follows.
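The sketch below is a minimal Python Lambda handler of the kind referred to above; the event fields are hypothetical and depend on how the function is invoked.

# lambda_function.py -- deployed as the handler "lambda_function.lambda_handler"
import json

def lambda_handler(event, context):
    # 'event' carries the invocation payload; 'context' carries runtime metadata.
    name = event.get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }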
MODULE: 7 STORAGE
1. Amazon Elastic Block Store (Amazon EBS):
Amazon EBS enables you to create individual storage volumes and attach them
to an Amazon EC2 instance:
• Amazon EBS offers block-level storage.
• Volumes are automatically replicated within its Availability Zone.
• It can be backed up automatically to Amazon S3 through snapshots.
Uses include –
• Boot volumes and storage for Amazon Elastic Compute Cloud (Amazon
EC2) instances.
• Data storage with a file system.
• Database hosts.
• Enterprise applications.
2. Amazon Simple Storage Service (Amazon S3):
• Backup and storage –Provide data backup and storage services for others
• Application hosting –Provide services that deploy, install, and manage
web applications
• Media hosting –Build a redundant, scalable, and highly available
infrastructure that hosts video, photo, or music uploads and downloads
• Software delivery –Host your software applications that customers can
download
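A minimal boto3 sketch of the backup-and-storage use case is shown below; the bucket name and object keys are hypothetical.

import boto3

s3 = boto3.client("s3")
bucket = "my-example-bucket"  # hypothetical bucket name (must be globally unique)

# Upload a local file as an object, then download it again.
s3.upload_file("report.pdf", bucket, "backups/report.pdf")
s3.download_file(bucket, "backups/report.pdf", "report-copy.pdf")

# List the objects stored under the 'backups/' prefix.
for obj in s3.list_objects_v2(Bucket=bucket, Prefix="backups/").get("Contents", []):
    print(obj["Key"], obj["Size"])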
3. Amazon Elastic File System (EFS):
File storage in the AWS Cloud:
• Works well for big data and analytics, media processing workflows,
content management, web serving, and home directories.
• Petabyte-scale, low-latency file system.
• Shared storage.
• Elastic capacity.
• Supports Network File System (NFS) versions 4.0 and 4.1 (NFSv4).
• Compatible with all Linux-based AMIs for Amazon EC2.
4. Amazon Simple Storage Service Glacier:
• Amazon S3 Glacier is a data archiving service that is designed for
security, durability, and an extremely low cost.
• Amazon S3 Glacier is designed to provide 11 9s of durability for objects.
• It supports the encryption of data in transit and at rest through Secure
Sockets Layer (SSL) or Transport Layer Security (TLS).
• The Vault Lock feature enforces compliance through a policy.
• Extremely low-cost design works well for long-term archiving.
• Provides three options for access to archives—expedited, standard, and
bulk—retrieval times range from a few minutes to several hours.
MODULE: 8 DATABASES
1. Amazon Relational Database Service:
Amazon RDS is a web service that makes it easy to set up, operate, and scale a
relational database in the cloud. It provides cost-efficient and resizable capacity
while managing time-consuming database administration tasks so you can focus
on your applications and your business. Amazon RDS is scalable for compute
and storage, and automated redundancy and backup is available. Supported
database engines include Amazon Aurora, PostgreSQL, MySQL, MariaDB,
Oracle, and Microsoft SQL Server.
2. Amazon DynamoDB:
Fast and flexible NoSQL database service for any scale.
• Virtually unlimited storage.
• Items can have differing attributes.
• Low-latency queries.
• Scalable read/write throughput.
The core DynamoDB components are tables, items, and attributes.
• A table is a collection of data.
• Items are a group of attributes that is uniquely identifiable among all the
other items.
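The sketch below illustrates tables, items, and attributes with boto3; the table name, key, and item contents are hypothetical, and the table is assumed to exist already.

import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("Books")  # hypothetical table with partition key 'book_id'

# Each item is a collection of attributes; items in the same table may have differing attributes.
table.put_item(Item={"book_id": "B001", "title": "Cloud Foundations", "pages": 250})
table.put_item(Item={"book_id": "B002", "title": "ML Foundations", "format": "ebook"})

# Retrieve a single item by its key with a low-latency lookup.
response = table.get_item(Key={"book_id": "B001"})
print(response.get("Item"))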
3. Amazon Redshift use cases:
Enterprise data warehouse (EDW)
• Migrate at a pace that customers are comfortable with
• Experiment without large upfront cost or commitment
• Respond faster to business needs
Big data
• Low price point for small customers
• Managed service for ease of deployment and maintenance
• Focus more on data and less on database management
Software as a service (SaaS)
• Scale the data warehouse capacity as demand grows
• Add analytic functionality to applications
4. Amazon Aurora:
• Enterprise-class relational database.
• Compatible with MySQL or PostgreSQL.
• Automate time-consuming tasks (such as provisioning, patching, backup,
recovery, failure detection, and repair).
MODULE:9. CLOUD ARCHITECTURE
1. AWS Well-Architected Framework:
A guide for designing infrastructures that are:
✓ Secure
✓ High performing
✓ Resilient
✓ Efficient
The framework provides:
• A consistent approach to evaluating and implementing cloud architectures.
• A way to provide best practices that were developed through lessons learned by reviewing customer architectures.
2. Reliability and Availability:
3. AWS Trusted Advisor:
Cost Optimization–AWS Trusted Advisor looks at your resource use and
makes recommendations to help you optimize cost by eliminating unused and
idle resources, or by making commitments to reserved capacity.
Performance–Improve the performance of your service by checking your
service limits, ensuring you take advantage of provisioned throughput, and
monitoring for overutilized instances.
Security–Improve the security of your application by closing gaps, enabling various
AWS security features, and examining your permissions.
Fault Tolerance–Increase the availability and redundancy of your AWS
application by taking advantage of automatic scaling, health checks, multi-AZ
deployments, and backup capabilities.
Service Limits–AWS Trusted Advisor checks for service usage that is more than
80 percent of the service limit. Values are based on a snapshot, so your current
usage might differ. Limit and usage data can take up to 24 hours to reflect any
changes.
MODULE: 10 AUTO SCALING AND MONITORING
1. Elastic Load Balancing:
2. Amazon CloudWatch:
• Amazon CloudWatch helps you monitor your AWS resources—and the
applications that you run on AWS—in real time.
CloudWatch enables you to –
• Collect and track standard and custom metrics.
• Set alarms to automatically send notifications to SNS topics or perform
Amazon EC2 Auto Scaling or Amazon EC2 actions.
• Define rules that match changes in your AWS environment and route these
events to targets for processing.
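As a small sketch of the alarm idea described above, the code below creates a CPU alarm with boto3; the instance ID, threshold, and SNS topic ARN are hypothetical.

import boto3

cloudwatch = boto3.client("cloudwatch")

# Alarm when the average CPU of a hypothetical instance exceeds 80% for two 5-minute periods.
cloudwatch.put_metric_alarm(
    AlarmName="HighCpuAlarm",
    Namespace="AWS/EC2",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "InstanceId", "Value": "i-0123456789abcdef0"}],  # hypothetical instance
    Statistic="Average",
    Period=300,
    EvaluationPeriods=2,
    Threshold=80.0,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:111122223333:ops-alerts"],  # hypothetical SNS topic
)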
3. Amazon EC2 Auto Scaling:
• Helps you maintain application availability.
• Enables you to automatically add or remove EC2 instances according to
conditions that you define.
• Detects impaired EC2 instances and unhealthy applications and replaces
the instances without your intervention.
• Provides several scaling options: manual, scheduled, dynamic or on-demand, and predictive.
COURSE: MACHINE LEARNING FOUNDATIONS
MODULE 1: INTRODUCING MACHINE LEARNING
1. What is Machine Learning?
Machine learning is the scientific study of algorithms and statistical models to
perform a task by using inference instead of instructions.
• Artificial intelligence is the broad field of building machines to perform
human tasks.
• Machine learning is a subset of AI. It focuses on using data to train ML
models so the models can make predictions.
• Deep learning is a technique that was inspired by human biology. It uses
layers of neurons to build networks that solve problems.
• Advancements in technology, cloud computing, and algorithm
development have led to a rise in machine learning capabilities and
applications.
2. Business Problems Solved with Machine Learning
Machine learning is used throughout a person's digital life. Here are some examples:
• Spam –Your spam filter is the result of an ML program that was trained
with examples of spam and regular email messages.
• Recommendations –Based on books that you read or products that you
buy, ML programs predict other books or products that you might want.
Again, the ML program was trained with data from other readers’ habits
and purchases.
• Credit card fraud –Similarly, the ML program was trained on examples of
transactions that turned out to be fraudulent, along with transactions that
were legitimate.
Machine learning problems can be grouped into –
• Supervised learning: You have training data for which you know the
answer.
• Unsupervised learning: You have data, but you are looking for insights
within the data.
• Reinforcement learning: The model learns in a way that is based on
experience and feedback.
• Most business problems are supervised learning.
3. Machine Learning Process
The machine learning pipeline process can guide you through the process of
training and evaluating a model.
The iterative process can be broken into three broad steps –
• Data processing
• Model training
• Model evaluation
ML PIPELINE:
4. Machine Learning Tools Overview
• Jupyter Notebook is an open-source web application that enables you to
create and share documents that contain live code, equations,
visualizations, and narrative text.
• Jupyter Lab is a web-based interactive development environment for
Jupyter notebooks, code, and data. Jupyter Lab is flexible.
• pandas is an open-source Python library. It's used for data handling and
analysis. It represents data in a table that is similar to a spreadsheet. This
table is known as a pandas DataFrame.
• Matplotlib is a library for creating scientific static, animated, and
interactive visualizations in Python. You use it to generate plots of your
data later in this course.
• Seaborn is another data visualization library for Python. It’s built on
matplotlib, and it provides a high-level interface for drawing informative
statistical graphics.
• NumPy is one of the fundamental scientific computing packages in
Python. It contains functions for N-dimensional array objects and useful
math functions such as linear algebra, Fourier transform, and random
number capabilities.
• scikit-learn is an open-source machine learning library that supports
supervised and unsupervised learning. It also provides various tools for
model fitting, data pre-processing, model selection and evaluation, and
many other utilities.
5. Machine Learning Challenges
MODULE 2: IMPLEMENTING A MACHINE LEARNING PIPELINE
WITH AMAZON SAGEMAKER
1. Formulating Machine Learning Problems
Business problems must be converted into an ML problem. Questions to ask
include –
• Have we asked why enough times to get a solid business problem
statement and know why it is important?
• Can you measure the outcome or impact if your solution is implemented?
Most business problems fall into one of two categories –
• Classification (binary or multi): Does the target belong to a class?
• Regression: Can you predict a numerical value?
2. Collecting and Securing Data
• Private data is data that you (or your customers) have in various existing
systems. Everything from log files to customer invoice databases can be
useful, depending on the problem that you want to solve. In some cases,
data is found in many different systems.
• Commercial data is data that a commercial entity collected and made
available. Companies such as Reuters, Change Healthcare, Dun &
Bradstreet, and Foursquare maintain databases that you can subscribe to.
These databases include curated news stories, anonymized healthcare
transactions, global business records, and location data. Supplementing
your own data with commercial data can provide useful insights that you
would not have otherwise.
• Open-source data comprises many different open-source datasets that
range from scientific information to movie reviews. These datasets are
usually available for use in research or for teaching purposes. You can find
open-source datasets hosted by AWS, Kaggle, and the UC Irvine Machine
Learning Repository.
Securing Data
3. Evaluating Data
• Descriptive statistics can be organized into different categories. Overall
statistics include the number of rows (instances) and the number of
columns (features or attributes) in your dataset. This information, which
relates to the dimensions of your data, is important. For example, it can
indicate that you have too many features, which can lead to high
dimensionality and poor model performance.
• Attribute statistics are another type of descriptive statistic, specifically
for numeric attributes. They give a better sense of the shape of your
attributes, including properties like the mean, standard deviation, variance,
minimum value, and maximum value.
• Multivariate statistics look at relationships between more than one
variable, such as correlations and relationships between your attributes.
4. Feature Engineering
Feature selection is about selecting the features that are most relevant and
discarding the rest. Feature selection is applied to prevent either redundancy or
irrelevance in the existing features, or to get a limited number of features to
prevent overfitting.
Feature extraction is about building up valuable information from raw data by
reformatting, combining, and transforming primary features into new ones. This
transformation continues until it yields a new set of data that can be consumed by
the model to achieve the goals.
Outliers
During feature engineering, you can handle outliers with several different
approaches. They include, but are not limited to:
• Deleting the outlier: This approach might be a good choice if your outlier
is based on an artificial error. Artificial error means that the outlier isn’t
natural and was introduced because of some failure—perhaps incorrectly
entered data.
• Transforming the outlier: You can transform the outlier by taking the
natural log of a value, which in turn reduces the variation that the extreme
outlier value causes. Therefore, it reduces the outlier’s influence on the
overall dataset.
• Imputing a new value for the outlier: You can use the mean of the
feature, for instance, and impute that value to replace the outlier value.
Again, this would be a good approach if an artificial error caused the
outlier.
Feature Selection: Filter Methods
Filter methods use a proxy measure instead of the actual model’s performance.
Filter methods are fast to compute, and they still capture the usefulness of the
feature set. Common measures include:
• Pearson’s correlation coefficient –Measures the statistical relationship or
association between two continuous variables.
• Linear discriminant analysis (LDA) –Is used to find a linear
combination of features that separates two or more classes.
• Analysis of variance (ANOVA) –Is used to analyse the differences
among group means in a sample.
• Chi-square–Is a single number that tells how much difference exists
between your observed counts and the expected counts, if no relationship
exists in the population.
Feature Selection: Wrapper Methods
• Forward selection starts with no features and adds them until the best
model is found.
• Backward selection starts with all features, drops them one at a time, and
selects the best model.
Feature Selection: Embedded Methods
Embedded methods combine the qualities of filter and wrapper methods. They
are implemented from algorithms that have their own built-in feature selection
methods.
Some of the most popular examples of these methods are LASSO and RIDGE
regression, which have built-in penalization functions to reduce overfitting.
5. Training
The holdout technique and k-fold cross-validation are the most commonly used
methods for splitting data into a training set and a test set, as sketched below.
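The sketch below contrasts the two approaches with scikit-learn on a built-in dataset; the model and dataset are stand-ins chosen only for illustration.

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split, cross_val_score

X, y = load_breast_cancer(return_X_y=True)

# Holdout: keep 20% of the data aside as a test set.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = LogisticRegression(max_iter=5000)
model.fit(X_train, y_train)
print("Holdout accuracy:", model.score(X_test, y_test))

# k-fold cross-validation: rotate the held-out fold k times and average the scores.
scores = cross_val_score(LogisticRegression(max_iter=5000), X_train, y_train, cv=5)
print("5-fold CV accuracy:", np.mean(scores))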
XGBOOST ALGORITHM
XGBoost is a popular and efficient open-source implementation of the gradient
boosted trees algorithm. Gradient boosting is a supervised learning algorithm
that attempts to accurately predict a target variable. It attains its prediction by
combining an ensemble of estimates from a set of simpler, weaker models.
XGBoost has done well in machine learning competitions. It robustly handles
various data types, relationships, and distributions, and it offers many
hyperparameters that can be tweaked and tuned for improved fit. This flexibility
makes XGBoost a solid choice for problems in regression, classification (binary
and multiclass), and ranking.
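A hedged sketch of training the built-in XGBoost algorithm with the SageMaker Python SDK (v2) is shown below; the role ARN, bucket, hyperparameters, and data locations are hypothetical, and the exact container version may differ in your Region.

import sagemaker
from sagemaker import image_uris
from sagemaker.estimator import Estimator
from sagemaker.inputs import TrainingInput

session = sagemaker.Session()
role = "arn:aws:iam::111122223333:role/SageMakerExecutionRole"  # hypothetical role ARN
bucket = "my-ml-bucket"  # hypothetical bucket

# Resolve the AWS-managed XGBoost container image for the current Region.
container = image_uris.retrieve("xgboost", session.boto_region_name, version="1.5-1")

xgb = Estimator(
    image_uri=container,
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path=f"s3://{bucket}/xgboost/output",
    sagemaker_session=session,
)
xgb.set_hyperparameters(objective="binary:logistic", num_round=100, max_depth=5, eta=0.2)

# Train on CSV data already uploaded to S3 (first column = label, no header).
xgb.fit({
    "train": TrainingInput(f"s3://{bucket}/xgboost/train.csv", content_type="text/csv"),
    "validation": TrainingInput(f"s3://{bucket}/xgboost/validation.csv", content_type="text/csv"),
})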
LINEAR LEARNER
The Amazon SageMaker linear learner algorithm provides a solution for both
classification and regression problems.
With the Amazon SageMaker algorithm, you can simultaneously explore
different training objectives and choose the best solution from your validation
set. You can also explore many models and choose the best one for your needs.
The Amazon SageMaker linear learner algorithm compares favourably with
methods that provide a solution for only continuous objectives.
It provides a significant increase in speed over naive hyperparameter
optimization techniques.
6. Hosting and Using the Model
• You can deploy your trained model by using Amazon SageMaker to
handle API calls from applications, or to perform predictions by using a
batch transformation.
• The goal of your model is to generate predictions to answer the business
problem. Be sure that your model can generate good results before you
deploy it to production.
• Use single-model endpoints for simple use cases, and use multi-model
endpoint support to save resources when you have multiple models to
deploy. A deployment sketch follows.
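Continuing the hypothetical estimator from the training sketch above, the code below deploys a single-model real-time endpoint, sends one CSV record for inference, and then deletes the endpoint; the payload values are made up.

from sagemaker.serializers import CSVSerializer

# Deploy the trained estimator (named 'xgb' in the earlier sketch) to a real-time endpoint.
predictor = xgb.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.large",
    serializer=CSVSerializer(),
)

# Send one unlabeled record (CSV, same feature order as training) for inference.
result = predictor.predict("34.2,1,0,105.5,0,1")
print(result)

# Delete the endpoint when finished so you stop paying for it.
predictor.delete_endpoint()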
7. Evaluating the Accuracy of the Model
The evaluation concepts covered are the confusion matrix and its terminology,
comparison of models, sensitivity, specificity, deciding which model is better,
and other classification metrics. A worked sketch follows.
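The sketch below computes these metrics from a confusion matrix with scikit-learn; the labels and predictions are small made-up vectors used only to show the calculations.

from sklearn.metrics import confusion_matrix, roc_auc_score

# Hypothetical true labels and predictions from one candidate model.
y_true    = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
y_model_a = [1, 0, 1, 0, 0, 0, 1, 1, 1, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_model_a).ravel()

sensitivity = tp / (tp + fn)   # true positive rate (recall)
specificity = tn / (tn + fp)   # true negative rate
precision   = tp / (tp + fp)

print(f"Sensitivity: {sensitivity:.2f}")
print(f"Specificity: {specificity:.2f}")
print(f"Precision:   {precision:.2f}")
print("AUC:", roc_auc_score(y_true, y_model_a))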
8. Hyperparameter and Model Tuning
ABOUT HYPERPARAMETER:
HYPERPARAMETER TUNING:
• Tuning hyperparameters can be labour-intensive. Traditionally, this
kind of tuning was done manually.
• Someone—who had domain experience that was related to that
hyperparameter and use case—would manually select the
hyperparameters, according to their intuition and experience.
• Then, they would train the model and score it on the validation data.
This process would be repeated until satisfactory results were
achieved.
• This process is not always the most thorough and efficient way of
tuning your hyperparameters.
MODULE 3: INTRODUCING FORECASTING
1. OVERVIEW OF FORECASTING
Forecasting is an important area of machine learning because many opportunities
for predicting future outcomes are based on historical data. It is based on time
series data.
Time series data falls into two broad categories. The first type is univariate,
which means that it has only one variable. The second type is multivariate.
In addition to these two categories, most time series datasets also follow one
of the following patterns:
• Trend –A pattern that shows the values as they increase, decrease, or
stay the same over time.
• Seasonal –A repeating pattern that is based on the seasons in a year.
• Cyclical –Some other form of a repeating pattern.
• Irregular –Changes in the data over time that appear to be random or
that have no discernible pattern.
2. PROCESSING TIME SERIES DATA
• Time Series Data Handling
• Time Series Data Handling: Smoothing of Data
Smoothing your data can help you deal with outliers and other anomalies.
You might consider smoothing for the following reasons.
• Data preparation –Removing error values and outliers.
• Visualization –Reducing noise in a plot.
Some Time Series Data Functions Using Python:
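A minimal pandas sketch of the kinds of functions referred to above is shown below (smoothing, downsampling, and gap filling); the daily sales series is synthetic.

import pandas as pd

# Hypothetical daily sales series with a datetime index.
dates = pd.date_range("2023-01-01", periods=90, freq="D")
sales = pd.Series(range(90), index=dates, name="units_sold")

# Smoothing: a 7-day rolling mean reduces noise and dampens outliers.
smoothed = sales.rolling(window=7, min_periods=1).mean()

# Downsampling: aggregate the daily series to weekly totals.
weekly = sales.resample("W").sum()

# Gap filling: forward-fill missing observations after reindexing to a full calendar.
filled = sales.reindex(pd.date_range(sales.index.min(), sales.index.max(), freq="D")).ffill()

print(smoothed.head())
print(weekly.head())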
Time Series Data Algorithms:
• Autoregressive Integrated Moving Average (ARIMA): This
algorithm removes autocorrelations, which might influence the
pattern of observations.
• DeepAR+: A supervised learning algorithm for forecasting one-
dimensional time series. It uses a recurrent neural network to train a
model over multiple time series.
• Exponential Smoothing (ETS): This algorithm is useful for datasets
with seasonality. It uses a weighted average for all observations. The
weights are decreased over time.
• Non-Parametric Time Series (NPTS): Predictions are based on sampling
from past observations. Specialized versions are available for seasonal and
climatological datasets.
• Prophet: A Bayesian time series model. It’s useful for datasets that
span a long time period, have missing data, or have large outliers.
3. Using Amazon Forecast
The forecasting workflow in Amazon Forecast follows these steps:
Import your data –You must import as much data as you have—both
historical data and related data. You should do some basic evaluation and
feature engineering before you use the data to train a model.
Train a predictor –To train a predictor, you must choose an algorithm. If
you are not sure which algorithm is best for your data, you can let Amazon
Forecast choose by selecting Auto ML as your algorithm. You also must
select a domain for your data, but if you’re not sure which domain fits best,
you can select a custom domain. Domains have specific types of data that
they require. For more information, see Predefined Dataset Domains and
Dataset Types in the Amazon Forecast documentation.
Generate forecasts –As soon as you have a trained model, you can use the
model to make a forecast by using an input dataset group. After you
generate a forecast, you can query the forecast, or you can export it to an
Amazon Simple Storage Service (Amazon S3) bucket. You also have the
option to encrypt the data in the forecast before you export it.
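The boto3 sketch below mirrors these three steps for a hypothetical dataset group; the ARNs and item ID are placeholders, and in practice each create call is asynchronous, so you would wait for the resource to become ACTIVE before moving on.

import boto3

forecast = boto3.client("forecast")
forecastquery = boto3.client("forecastquery")

# Train a predictor with AutoML over a dataset group that was already imported.
create_predictor_response = forecast.create_predictor(
    PredictorName="demand_predictor",
    ForecastHorizon=30,
    PerformAutoML=True,
    InputDataConfig={"DatasetGroupArn": "arn:aws:forecast:us-east-1:111122223333:dataset-group/demo"},
    FeaturizationConfig={"ForecastFrequency": "D"},
)
predictor_arn = create_predictor_response["PredictorArn"]

# Generate a forecast from the trained predictor, then query it for one item.
create_forecast_response = forecast.create_forecast(
    ForecastName="demand_forecast",
    PredictorArn=predictor_arn,
)
forecast_arn = create_forecast_response["ForecastArn"]

result = forecastquery.query_forecast(
    ForecastArn=forecast_arn,
    Filters={"item_id": "SKU-001"},  # hypothetical item identifier
)
print(result["Forecast"]["Predictions"])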
MODULE 4: INTRODUCING COMPUTER VISION
1. Computer Vision enables machines to identify people, places, and things
in images with accuracy at or above human levels, with greater speed and
efficiency. Often built with deep learning models, computer vision
automates the extraction, analysis, classification, and understanding of
useful information from a single image or a sequence of images. The image
data can take many forms, such as single images, video sequences, views
from multiple cameras, or three-dimensional data.
Applications of Computer Vision:
Public safety and home security
Computer vision with image and facial recognition can help to quickly
identify unlawful entries or persons of interest. This process can result in
safer communities and a more effective way of deterring crimes.
Authentication and enhanced computer-human interaction
Enhanced human-computer interaction can improve customer satisfaction.
Examples include products that are based on customer sentiment analysis in
retail outlets or faster banking services with quick authentication that is
based on customer identity and preferences.
Content management and analysis
Millions of images are added every day to media and social channels. The
use of computer vision technologies—such as metadata extraction and
image classification—can improve efficiency and revenue opportunities.
Autonomous driving
By using computer-vision technologies, auto manufacturers can provide
improved and safer self-driving car navigation, which can help realize
autonomous driving and make it a reliable transportation option.
Medical imaging
Medical image analysis with computer vision can improve the accuracy and
speed of a patient's medical diagnosis, which can result in better treatment
outcomes and life expectancy.
Manufacturing process control
Well-trained computer vision that is incorporated into robotics can improve
quality assurance and operational efficiencies in manufacturing applications.
This process can result in more reliable and cost-effective products.
Computer vision problems:
Problem 01: Recognizing food and stating whether it is breakfast, lunch, or dinner.
Because the computer vision model classified the objects as milk, peaches, ice
cream, salad, nuggets, and a bread roll, the meal is identified as breakfast.
Problem 02: Video Analysis
2. Image and Video Analysis
Amazon Rekognition is a computer vision service based on deep learning.
You can use it to add image and video analysis to your applications.
Amazon Rekognition enables you to perform the following types of analysis:
Searchable image and video libraries – Amazon Rekognition makes images and stored videos searchable so that you can discover the objects and scenes that appear in them.
Face-based user verification – Amazon Rekognition enables your applications to confirm user identities by comparing their live image with a reference image.
Sentiment and demographic analysis – Amazon Rekognition interprets emotional expressions, such as happy, sad, or surprised. It can also interpret demographic information from facial images, such as gender.
Unsafe content detection – Amazon Rekognition can detect inappropriate content in images and in stored videos.
Text detection – Amazon Rekognition Text in Image enables you to recognize and extract text content from images.
A short boto3 sketch follows.
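A minimal boto3 sketch of label and text detection is shown below; the bucket and object names are hypothetical.

import boto3

rekognition = boto3.client("rekognition")

# Detect labels (objects and scenes) in an image stored in a hypothetical S3 bucket.
response = rekognition.detect_labels(
    Image={"S3Object": {"Bucket": "my-example-bucket", "Name": "photos/breakfast.jpg"}},
    MaxLabels=10,
    MinConfidence=80,
)
for label in response["Labels"]:
    print(f"{label['Name']}: {label['Confidence']:.1f}%")

# Detect text in the same image.
text_response = rekognition.detect_text(
    Image={"S3Object": {"Bucket": "my-example-bucket", "Name": "photos/breakfast.jpg"}}
)
for detection in text_response["TextDetections"]:
    print(detection["DetectedText"])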
CASE 01: Searchable Image Library
CASE 02: Image Moderation
CASE 03: Sentiment Analysis
3. Preparing Custom Datasets for Computer Vision
There are six steps involved in preparing a custom dataset:
STEP 01: Collect Images
STEP 02: Create Training Dataset
STEP 03: Create Test Dataset
STEP 04: Train the Model
STEP 05: Evaluate
STEP 06: Use Model
MODULE 5: INTRODUCING NATURAL LANGUAGE PROCESSING
1. Overview of Natural Language Processing
NLP develops computational algorithms to automatically analyze and represent
human language. By evaluating the structure of language, machine learning
systems can process large sets of words, phrases, and sentences.
Some challenges of NLP
Discovering the structure of the text – One of the first tasks of any NLP application is to break the text into meaningful units, such as words, phrases, and sentences.
Labelling data – After the system converts the text to data, the next challenge is to apply labels that represent the various parts of speech. Every language requires a different labelling scheme to match the language's grammar.
Representing context – Because word meaning depends on context, any NLP system needs a way to represent context. This is a big challenge because of the large number of contexts.
Applying grammar – Dealing with the variation in how humans use language is a major challenge for NLP systems.
NLP FLOW CHART:
2. Natural Language Processing Managed Services
Each managed service in this module is summarized by its use cases (a boto3 sketch using two such services follows the lists):
USES:
• Medical transcription
• Subtitles in streaming content and in offline content
USES:
• Navigation systems
• Animation productions
USES:
• International websites
• Software localisation
USES:
• Document analysis
• Fraud detection
USES:
• Interactive assistants
• Database queries
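As one concrete, hedged example of such managed services, the sketch below calls Amazon Comprehend for sentiment and entity detection and Amazon Translate for translation; the input text is made up, and other services cover the remaining use cases listed above.

import boto3

comprehend = boto3.client("comprehend")
translate = boto3.client("translate")

text = "The internship labs were clear and the forecasting module was excellent."

# Sentiment and entity detection with Amazon Comprehend.
sentiment = comprehend.detect_sentiment(Text=text, LanguageCode="en")
entities = comprehend.detect_entities(Text=text, LanguageCode="en")
print(sentiment["Sentiment"], sentiment["SentimentScore"])
print([e["Text"] for e in entities["Entities"]])

# Machine translation with Amazon Translate (English to French).
translated = translate.translate_text(
    Text=text, SourceLanguageCode="en", TargetLanguageCode="fr"
)
print(translated["TranslatedText"])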
LAB MODULES
1. LAB: Implementing a Machine Learning pipeline with Amazon
SageMaker.
• Amazon SageMaker, Creating and Importing Data.
1. Launch an Amazon SageMaker notebook instance.
2. Launch a Jupyter notebook.
3. Run code in a notebook.
4. Download data from an external source.
5. Upload and download a Jupyter notebook to your local machine.
• Exploring Data
1. From the uploaded data, use pandas functions such as df.dtypes to describe the data type of each variable used.
2. The describe() function is used to find statistical insights such as the mean, standard deviation, min, max, count, and quartiles.
3. dataframe.plot() is used for visualization.
• Encoding Categorical Data
1. Step 1: Use df.info() to get the dtypes and df.columns to get the column names.
2. Step 2: For encoding ordinal features, first inspect the categories with df["column name"].value_counts(), then apply a mapper using the replace method, i.e. df["new col"] = df["col1"].replace(mapper).
3. Step 3: Use get_dummies() to add binary features for the required columns of the data frame.
4. Step 4: Use info() again to confirm the encoded data.
A minimal pandas sketch of these steps follows.
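The sketch below walks through these encoding steps on a tiny hypothetical frame; the column names and mapper values are placeholders.

import pandas as pd

# Hypothetical frame with one ordinal and one nominal categorical column.
df = pd.DataFrame({
    "size": ["small", "large", "medium", "small", "large"],
    "colour": ["red", "blue", "red", "green", "blue"],
})

print(df.info())
print(df["size"].value_counts())

# Ordinal feature: map category labels to ordered integers with replace().
size_mapper = {"small": 0, "medium": 1, "large": 2}
df["size_encoded"] = df["size"].replace(size_mapper)

# Nominal feature: expand into binary indicator columns with get_dummies().
df = pd.get_dummies(df, columns=["colour"])
print(df.head())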
• Training a Model
1. Step 1: Import the data that is required for training.
2. Step 2: Import boto3 and sagemaker, use "from sagemaker.image_uris import retrieve" to resolve the algorithm container, apply format changes to the imported data, and then explore the data.
• Deploying a Model
1. Import the necessary libraries and perform predictions on the test rows using the predictor's predict() function.
2. To delete the endpoint of the predictor, use the delete_endpoint() function.
3. Now perform a batch transform using the boto3 library and supply the key-value pairs as a dictionary.
4. Convert the values to binary features by applying a binary_convert function (.apply(binary_convert)) to the transformed data.
2. LAB: Creating a Forecast with Amazon Forecast.
• Importing Python packages in a Jupyter notebook
1. Import boto3 (the AWS SDK for Python) and import warnings, then call warnings.filterwarnings('ignore').
2. Import pandas for data frames and matplotlib for visualization and plotting functions.
3. Import helper modules such as time, sys, os, io, and json.
• Read file formats such as .csv/.xlsx and convert into a time series
1. Use pd.read_excel('file.xlsx'), then use df.dropna() to remove missing values from the dataset, or use the XGBoost algorithm to deal with missing values.
2. Now convert the column that contains dates into a time series using pd.to_datetime().
• Cleaning and reducing the size of the data
1. In this task we need to keep only the data that is unique. If the relevant column is x, calling x.unique() shows the distinct values, and the redundant data can then be dropped.
• Examining the required code and removing anomalies
1. Using data.requiredcode.describe(), we can quickly verify the dataset.
2. Use describe() and plot() to check changes in the metrics.
• Splitting the data
1. Split the data into two or more samples that contain columns that are correlated.
2. Each part of the split should be assigned to a separate variable.
• Downsampling and forecasting
1. Using the resample function from pandas, we can compute the cumulative summation.
2. Using the groupby() function, we create the predictor with create_predictor and the forecast with create_forecast:
predictor_arn = create_predictor_response['PredictorArn']
create_forecast_response = forecast.create_forecast(ForecastName=forecast_Name, PredictorArn=predictor_arn)
• Forecast completion
1. Create the forecast and forecast-query clients by applying .client() with the service name to a boto3 session variable.
2. Using the forecast-query client, call query_forecast(ForecastArn=forecast_arn) and plot the results using the stock code.
3. LAB: Facial Recognition
• Importing required libraries
1. Import the necessary libraries, such as "from skimage import io", "from skimage.transform import rescale", and "from matplotlib import pyplot as plt".
2. Then "import boto3", "import numpy as np", and "from PIL import Image, ImageDraw, ImageColor, ImageOps".
• Creating a collection
1. client = boto3.client('rekognition')
2. collection_id = 'collection'
3. response = client.create_collection(CollectionId=collection_id)
• Uploading an image to search
1. Use io.imread('image file') to load the image and io.imshow(image) to display it.
2. Rescale the image size using image = rescale(image, 0.50, mode='constant').
• Adding image to the collection
1. Using stock code add the image data to the collection.
2. Now the objects are created.
• Viewing the bounding box of the detected face
1. Set a variable img = Image.open(filename).
2. Get the dimensions with imgwidth, imgheight = img.size.
3. Set a variable draw = ImageDraw.Draw(img).
4. Use a loop to draw the bounding box from the detected face's top, left, width, and height values scaled by imgwidth and imgheight.
5. Plot the image using plt.imshow(img).
• Listing and finding the faces in the collection
1. Use the stock code with the client service name to list the faces.
2. Set the target image for the images in the collection, excluding the search image, then set the threshold value and the number of faces to search at once.
3. Draw a box around the discovered face in the collection using the same stock code for the bounding box.
4. To reset and delete the collection data from the client, call delete_collection inside exception-handling functions (try and except) so that the status code is displayed clearly.
4. LAB: Natural Language Processing
• Importing necessary libraries
1. Import pandas as pd, import numpy as np, import re (regular expressions), and import nltk (the Natural Language Toolkit).
2. Now download stopwords using nltk.download('stopwords').
• Sample corpus of text documents
1. Set a variable corpus and add a list of sentences. Now set up the labels that are required.
2. Convert the list of documents into a data frame using pandas; the data frame is accessed using keys such as "document" and "category".
• Simple Text Pre-processing
1. Now we need to remove the stop words from the data, so we use nltk.WordPunctTokenizer() as wpt and nltk.corpus.stopwords.words('english') as the stop word list.
2. Now create a function normalize_document; in it, use re.sub() to apply a regular expression to the doc and strip() to remove white space.
3. Now tokenize the entire document using wpt.tokenize(doc), apply a filter to remove the stop words, and join the filtered tokens with ' '.join().
• Normalization
1. First, apply np.vectorize to the normalize_document function and assign the result to normalize_corpus.
2. Now pass the corpus variable as the parameter to normalize_corpus(corpus) and assign the result to the norm_corpus variable.
• Using the Bag-of-Words Model
1. Feature extraction is the core of the bag-of-words model; import CountVectorizer from scikit-learn.
2. Set a variable cv and assign CountVectorizer with min_df=0 and max_df=1. Using fit_transform(), convert norm_corpus into a matrix.
3. Extract the labels using cv.get_feature_names() and print the data frame of the extracted features.
A minimal end-to-end sketch of this lab follows.
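The sketch below strings the lab steps together on a tiny made-up corpus; the variable names follow the lab description, and get_feature_names_out() is used for newer scikit-learn versions (older versions use get_feature_names()).

import nltk
import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer

nltk.download("stopwords")
stop_words = nltk.corpus.stopwords.words("english")
wpt = nltk.WordPunctTokenizer()

corpus = [
    "The sky is blue and beautiful.",
    "Love this blue and beautiful sky!",
    "The quick brown fox jumps over the lazy dog.",
]

def normalize_document(doc):
    # Lowercase, tokenize, drop punctuation and stop words, then rejoin the tokens.
    doc = doc.lower().strip()
    tokens = wpt.tokenize(doc)
    filtered = [t for t in tokens if t.isalpha() and t not in stop_words]
    return " ".join(filtered)

norm_corpus = np.vectorize(normalize_document)(corpus)

# Bag of words: count each remaining token across the corpus.
cv = CountVectorizer(min_df=0.0, max_df=1.0)
matrix = cv.fit_transform(norm_corpus).toarray()
print(pd.DataFrame(matrix, columns=cv.get_feature_names_out()))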
CASE STUDY
UNLOCKING CLINICAL DATA FROM NARRATIVE REPORTS
Objective:
To evaluate the automated detection of clinical conditions described in
narrative reports.
Design:
Automated methods and human experts detected the presence or absence of
six clinical conditions in 200 admission chest radiograph reports.
Study Subjects:
A computerized, general-purpose natural language processor; 6
internists; 6 radiologists; 6 lay persons; and 3 other computer methods.
Main Outcome Measures:
Intersubject disagreement was quantified by “distance” (the average number
of clinical conditions per report on which two subjects disagreed) and by
sensitivity and specificity with respect to the physicians.
Results:
Using a majority vote, physicians detected 101 conditions in the 200 reports
(0.51 per report); the most common condition was acute bacterial
pneumonia (prevalence, 0.14), and the least common was chronic
obstructive pulmonary disease (prevalence, 0.03). Pairs of physicians
disagreed on the presence of at least 1 condition for an average of 20% of
reports. The average intersubject distance among physicians was 0.24 (95%
CI, 0.19 to 0.29) out of a maximum possible distance of 6. No physician had
a significantly greater distance than the average. The average distance of the
natural language processor from the physicians was 0.26 (CI, 0.21 to 0.32;
not significantly greater than the average among physicians). Lay persons
and alternative computer methods had significantly greater distance from
the physicians (all >0.5). The natural language processor had a sensitivity of
81% (CI, 73% to 87%) and a specificity of 98% (CI, 97% to 99%);
physicians had an average sensitivity of 85% and an average specificity of
98%.
Conclusions:
Physicians disagreed on the interpretation of narrative reports, but this was
not caused by outlier physicians or a consistent difference in the way
internists and radiologists read reports. The natural language processor was
not distinguishable from the physicians and was superior to all other
comparison subjects. Although the domain of this study was restricted (six
clinical conditions in chest radiographs), natural language processing seems
to have the
potential to extract clinical information from narrative reports in a manner
that will support automated decision-support and clinical research.
CONCLUSION
These modules described how model explainability relates to AI/ML solutions,
giving customers insight into explainability requirements when initiating AI/ML
use cases. Using AWS, four pillars were presented to assess model explainability
options, to bridge knowledge gaps and requirements for simple to complex
algorithms. To help convey how these model explainability options relate to
real-world scenarios, examples from a range of industries were demonstrated.
It is recommended that AI/ML owners or business leaders follow these steps when
initiating a new AI/ML solution:
• Collect business requirements to identify the level of explainability
required for your business to accept the solution.
• Based on business requirements, implement an assessment for model
explainability.
• Work with an AI/ML technician to communicate the model explainability
assessment and find the optimal AI/ML solution to meet your business
objectives.
• After the solution is completed, revisit the model explainability
assessment to verify that business requirements are continuously met.
By taking these steps, we will mitigate regulation risks and ensure trust in our
model. With this trust, when the time comes to push your AI/ML solution into an
AWS production environment, we will be ready to create business value for our
use case.
REFERENCES
1. Machine learning on AWS – https://aws.amazon.com/machinelearning/?nc2=h_ql_sosml
2. Amazon AWS EC2 – https://aws.amazon.com/ec2/
3. Amazon AWS S3 – https://aws.amazon.com/s3/
4. Amazon AWS SageMaker – https://aws.amazon.com/sagemaker/
5. GitHub: scikit-learn machine learning library – https://github.com/scikit-learn/scikit-learn.git
6. AWS Forecast – https://aws.amazon.com/forecast
7. Case study on ML architecture (Uber) – https://pantelis.github.io/cs634/docs/common/lectures/uber-ml-arch-case-study/
8. AWS Global Infrastructure Map – https://aws.amazon.com/about-aws/global-infrastructure/#AWS_Global_Infrastructure_Map
9. Regions and Availability Zones – https://aws.amazon.com/about-aws/global-infrastructure/regions_az/
10. Clinical narrative reports – https://www.acpjournals.org/doi/abs/10.7326/0003-4819-122-9199505010-00007