MIG and Load Balancing

Managed Instance Groups (MIGs) allow for the automatic scaling and management of similar virtual machines, with health checks ensuring the recreation of unhealthy instances. Load balancing can be external or internal, utilizing various types such as HTTP(S), SSL proxy, and TCP proxy, to efficiently distribute traffic. Key components include instance templates, target proxies, URL maps, and backend services that work together to optimize application performance and reliability.

Uploaded by flaviano teodoro
© All Rights Reserved

Managed Instance Groups and Load Balancing

Overview
- Managed instance groups are pools of similar machines which can be scaled automatically
- Load balancing can be external or internal, global or regional
- Basic components of HTTP(S) load balancing: target proxy, URL map, backend service and backends
- Use cases and architecture diagrams for all the load balancing types: HTTP(S), SSL proxy, TCP proxy, network and internal load balancing
Managed Instance Groups
Instance Groups
A group of machines which can be created and managed together to avoid
individually controlling each instance in the project
2 Kinds of Instance Groups
- Managed
- Unmanaged
Managed Instance Group
- Uses an instance template to create a group of identical instances
- Changes to the instance group change all instances in the group
Instance Template
Defines the machine type, image, zone and other properties of an instance. A way to save the instance configuration and use it later to create new instances or groups of instances.
- Global resource, not bound to a zone or a region
- Can reference zonal resources such as a persistent disk; in such cases it can be used only within that zone
Managed Instance Group
- Can automatically scale the number of instances in the group
- Works with load balancing to distribute traffic across instances
- If an instance stops, crashes or is deleted, the group automatically recreates the instance with the same template
- Can identify and recreate unhealthy instances in a group (autohealing)
2 Types of Managed Instance Groups
- Zonal
- Regional
Zonal vs. Regional MIG
- Prefer regional instance groups to zonal, so application load can be spread across multiple zones
- This protects against failures within a single zone
- Choose zonal if you want lower latency and to avoid cross-zone communication
Health Checks and Autohealing
- A MIG applies health checks to monitor the instances in the group
- If a service has failed on an instance, that instance is recreated (autohealing)
- Similar to the health checks used in load balancing, but the objective is different:
  - LB health checks are used to determine where to send traffic
  - MIG health checks are used to recreate instances
- Typically you configure health checks for both the LB and the MIG
Health Checks and Autohealing
- The new instance is recreated based on the template that was originally used to create it (which might be different from the default instance template)
- Disk data might be lost unless explicitly snapshotted
Configuring Health Checks
- Check Interval: the time to wait between attempts to check instance health
- Timeout: how long to wait for a response before declaring a check attempt failed
- Healthy Threshold: how many consecutive "healthy" responses indicate that the VM is healthy
- Unhealthy Threshold: how many consecutive "failed" responses indicate that the VM is unhealthy
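The interplay of the two thresholds can be sketched as a small state machine. This is an illustrative sketch, not the GCP implementation; the class and parameter names are invented for the example.

```python
class HealthChecker:
    """Turns a stream of probe results into a health state using
    consecutive-response thresholds, as described above."""

    def __init__(self, healthy_threshold=2, unhealthy_threshold=3):
        self.healthy_threshold = healthy_threshold      # consecutive successes to mark healthy
        self.unhealthy_threshold = unhealthy_threshold  # consecutive failures to mark unhealthy
        self.healthy = True   # assume the instance starts out healthy
        self._successes = 0
        self._failures = 0

    def record_probe(self, success: bool) -> bool:
        """Feed one probe result; return the current health state."""
        if success:
            self._successes += 1
            self._failures = 0
            if self._successes >= self.healthy_threshold:
                self.healthy = True
        else:
            self._failures += 1
            self._successes = 0
            if self._failures >= self.unhealthy_threshold:
                self.healthy = False  # a MIG would now recreate this instance
        return self.healthy

hc = HealthChecker(healthy_threshold=2, unhealthy_threshold=3)
for probe in [True, False, False, False]:   # 3 consecutive failures
    state = hc.record_probe(probe)
print(state)  # False: the instance is now considered unhealthy
```

Note that a single success after failures does not immediately restore the healthy state; the healthy threshold must be met again, which damps flapping.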
Unmanaged Instance Groups
- Groups of dissimilar instances that you can add to and remove from the group
- Do not offer autoscaling, rolling updates or instance templates
- Not recommended; used only when you need to apply load balancing to pre-existing configurations
Load Balancing
- Load balancing and autoscaling for groups of instances
- Scale your application to support heavy traffic
- Detect and remove unhealthy VMs; healthy VMs are automatically re-added
- Route traffic to the closest VM
- Fully managed service, redundant and highly available
Load Balancing Types
- External
  - Global: HTTP(S), SSL Proxy, TCP Proxy
  - Regional: Network
- Internal
  - Regional
OSI Network Stack
- Application Layer: HTTP/HTTPS
- Presentation Layer
- Session Layer: SSL Proxy
- Transport Layer: TCP Proxy
- Network Layer: Network
- Data Link Layer
- Physical Layer
Rule of thumb: use a load balancer at the highest layer possible.
Health Checks
- HTTP, HTTPS health checks: highest-fidelity checks because they verify that the web server is up and serving traffic, not just that the instance is healthy
- SSL health checks: configure SSL health checks if your traffic is not HTTPS but is encrypted via SSL (TLS)
- TCP health checks: for all TCP traffic that is not HTTP(S) or SSL (TLS), you can configure a TCP health check
HTTP(S) load balancing operates at the application layer, making it the "smartest" of the load balancing options.
HTTP/HTTPS Load Balancing
- A global, external load balancing service offered on GCP
- Distributes HTTP(S) traffic among groups of instances based on proximity to the user, the requested URL, or both
- Traffic from the internet is sent to a global forwarding rule; this rule determines which proxy the traffic should be directed to
- The global forwarding rule directs incoming requests to a target HTTP proxy
- The target HTTP proxy checks each request against a URL map to determine the appropriate backend service for the request
- The backend service directs each request to an appropriate backend based on serving capacity, zone, and instance health of its attached backends
- The health of each backend instance is verified using either an HTTP health check or an HTTPS health check; if HTTPS, the health check request is encrypted
- Actual request distribution can happen based on CPU utilization or requests per instance
- The managed instance groups making up the backend can be configured to scale as the traffic scales (based on utilization or requests per second)
- HTTPS load balancing requires the target proxy to have a signed certificate to terminate the SSL connection
- Firewall rules must be created to allow requests from the load balancer and health checker to reach the instances
- Session affinity: all requests from the same client go to the same server, based on either client IP or cookie
Global Forwarding Rules
- Route traffic by IP address, port and protocol to a load balancing proxy
- Can only be used with global load balancing: HTTP(S), SSL Proxy and TCP Proxy
- Regional forwarding rules can be used with regional load balancing and with individual instances
Target Proxy
- Referenced by one or more global forwarding rules
- Routes incoming requests against a URL map to determine where they should be sent
- Specific to a protocol (HTTP, HTTPS, SSL and TCP)
- Must have an SSL certificate if it terminates HTTPS connections (limit of 10 SSL certificates)
- Can connect to backend services via HTTP or HTTPS
URL Map
- Used to direct traffic to different instances based on the incoming URL
- http://www.example.com/audio -> backend service 1
- http://www.example.com/video -> backend service 2
URL Map
- With no rules configured, all traffic is sent to the same group of instances; only the /* path matcher is created automatically and directs all traffic to the same backend service
- Host rules match hostnames, e.g. example.com, customer.com
- Path rules match paths, e.g. /video, /video/hd, /video/sd
- The default path matcher /* is created automatically; traffic which does not match any other path rule is sent to this default service
URL Map With Host Rules
- example.com requests will be sent to one set of backends
- Requests for all other hosts will go to the default backend

URL Map With Path Rules
- Path rules for /video, with more specific rules for /video/sd and /video/hd
- No host name is specified in this example
- The default backend service is used when no path rule matches
- Paths other than /video/hd and /video/sd fall through to the /video rule or the default
- Separate backends serve the paths which match /video/hd and /video/sd
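The path-matching behavior above can be sketched in a few lines. This is a simplified sketch with hypothetical backend names; real GCP path matchers have richer semantics (explicit /* wildcards, host rules, redirects).

```python
# Hypothetical backend service names for the /video example above.
PATH_RULES = {
    "/video": "video-backend",
    "/video/hd": "video-hd-backend",
    "/video/sd": "video-sd-backend",
}
DEFAULT_BACKEND = "web-backend"   # stands in for the automatic /* matcher

def route(path: str) -> str:
    """Pick a backend: the longest matching path prefix wins,
    so /video/hd/clip1 prefers the /video/hd rule over /video."""
    best = ""
    for prefix in PATH_RULES:
        if (path == prefix or path.startswith(prefix + "/")) and len(prefix) > len(best):
            best = prefix
    return PATH_RULES.get(best, DEFAULT_BACKEND)

print(route("/video/hd/clip1"))  # video-hd-backend
print(route("/video/other"))     # video-backend
print(route("/audio/song1"))     # web-backend (default /*)
```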


Backend Service
- Centralized service for managing backends
- Backends contain instance groups which handle user requests
- Knows which instances it can use and how much traffic they can handle
- Monitors the health of backends and does not send traffic to unhealthy instances

Backend Service Components
- Health Check: polls instances to determine which ones can receive requests
- Backends: instance groups of VMs which can be automatically scaled
- Session Affinity: attempts to send requests from the same client to the same VM
- Timeout: how long the backend service will wait for a backend to respond
Health Checks
- HTTP(S), SSL and TCP health checks are available
- HTTP(S): verifies that the instance is healthy and the web server is serving traffic
- TCP, SSL: used when the service expects a TCP or SSL connection, i.e. not HTTP(S)
- GCP creates redundant copies of the health checker automatically, so health checks might happen more frequently than you expect
Session Affinity
- Client IP: hashes the IP address to send requests from the same IP to the same VM
  - Requests from different users might look like they come from the same IP
  - Users who move between networks might lose affinity
- Cookie: issues a cookie named GCLB on the first request; subsequent requests from clients with the cookie are sent to the same instance
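Client-IP affinity amounts to hashing the address onto a fixed backend list. A minimal sketch, with invented instance names; a stable hash (not Python's randomized `hash()`) keeps the mapping consistent across processes:

```python
import hashlib

BACKENDS = ["vm-1", "vm-2", "vm-3"]   # hypothetical instance names

def pick_backend(client_ip: str) -> str:
    """Map a client IP deterministically onto one backend VM."""
    digest = hashlib.sha256(client_ip.encode()).digest()
    return BACKENDS[int.from_bytes(digest[:4], "big") % len(BACKENDS)]

# Repeated requests from one IP always hit the same VM...
assert pick_backend("203.0.113.7") == pick_backend("203.0.113.7")
# ...which is also why many users behind one NAT share a backend,
# and why a user whose IP changes loses affinity.
```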
Backend
- Instance Group: can be a managed or unmanaged instance group
- Balancing Mode: determines when the backend is at full usage, based on CPU utilization or requests per second
- Capacity Setting: a percentage of the balancing mode which determines the capacity of the backend
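The capacity setting simply derates the balancing-mode target. A sketch of the arithmetic, with illustrative names rather than the GCP API field names:

```python
def effective_capacity(max_rate_per_instance: float, instances: int,
                       capacity_scaler: float) -> float:
    """RPS the backend is considered able to absorb:
    the balancing-mode maximum scaled by the capacity setting."""
    return max_rate_per_instance * instances * capacity_scaler

# 4 instances rated at 100 RPS each, temporarily derated to 50%
# (e.g. to drain traffic before maintenance):
print(effective_capacity(100, 4, 0.5))  # 200.0
```

Setting the scaler to 0 would take the backend out of rotation entirely while leaving its configuration in place.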
Backend Buckets
- Allow you to use Cloud Storage buckets with HTTP(S) load balancing
- Traffic is directed to the bucket instead of a backend
- Useful for load balancing requests for static content
- For example, a path of /static can be sent to the storage bucket while all other paths go to the instances
Load Distribution
- Uses CPU utilization of the backend or requests per second as the balancing mode
- Maximum values can be specified for both
- Short bursts of traffic above the limit can occur
- Incoming requests are first sent to the region closest to the user, if that region has capacity
- Traffic is distributed amongst instances in a zone based on capacity
- Round-robin distribution across instances in a zone
- Round robin can be overridden by session affinity
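The last two bullets can be sketched as follows: round robin picks the next instance in turn, unless the client is already pinned by session affinity. The class and names are invented for the sketch.

```python
from itertools import cycle

class ZoneBalancer:
    """Round-robin within a zone, with optional session affinity."""

    def __init__(self, instances):
        self._rr = cycle(instances)    # round-robin iterator over the zone's VMs
        self._affinity = {}            # client -> pinned instance

    def pick(self, client_id, sticky=False):
        if client_id in self._affinity:     # affinity overrides round robin
            return self._affinity[client_id]
        instance = next(self._rr)
        if sticky:
            self._affinity[client_id] = instance
        return instance

zb = ZoneBalancer(["vm-a", "vm-b"])
print(zb.pick("alice", sticky=True))  # vm-a
print(zb.pick("bob"))                 # vm-b (round robin advances)
print(zb.pick("alice"))               # vm-a again (affinity wins)
```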
Firewall Rules
- Allow traffic from 130.211.0.0/22 and 35.191.0.0/16 to reach your instances
- These are the IP ranges that the load balancer and the health checker use to connect to backends
- Allow traffic on the port that the global forwarding rule has been configured to use
HTTP(S) load balancing use cases: cross-regional load balancing and content-based load balancing.

SSL operates at the session layer of the OSI stack.
SSL Proxy Load Balancing
- Remember the OSI network stack: physical, data link, network, transport, session, presentation, application
- The usual combination is TCP/IP: network = IP, transport = TCP, application = HTTP
- For secure traffic, add session layer = SSL (Secure Sockets Layer) and application layer = HTTPS
- Use only for non-HTTP(S) SSL traffic; for HTTP(S), just use HTTP(S) load balancing
- SSL connections are terminated at the global layer, then proxied to the closest available instance group
SSL Proxy Load Balancing
- Users have a secure connection to the SSL proxy
- The load balancer makes fresh connections to the backends; this connection can be SSL or non-SSL
- The SSL connections are terminated at the global layer and then proxied to the closest available instance group
TCP Proxy Load Balancing
- Performs load balancing based on the transport layer (TCP)
- Allows you to use a single IP address for all users around the world
- Automatically routes traffic to the instances that are closest to the user
- Advantages of transport layer load balancing:
  - more intelligent routing is possible than with network layer load balancing
  - better security: TCP vulnerabilities can be patched at the load balancer
TCP Proxy Load Balancing
- TCP traffic from users goes to the TCP proxy load balancer
- The proxy makes new connections to the backends; these can be TCP connections or even SSL connections
- The TCP connections are terminated at the global layer and then proxied to the closest available instance group
Network Load Balancing
- Based on incoming IP protocol data, such as address, port, and protocol type
- Pass-through, regional load balancer; does not proxy connections from clients
- Use it to load balance UDP traffic, and TCP and SSL traffic on ports that are not supported by the SSL proxy and TCP proxy load balancers
Load Balancing Algorithm
- Picks an instance based on a hash of the source IP and port, the destination IP and port, and the protocol
- This means that incoming TCP connections are spread across instances, and each new connection may go to a different instance
- Regardless of the session affinity setting, all packets for a connection are directed to the chosen instance until the connection is closed, and have no impact on load balancing decisions for new incoming connections
- This can result in imbalance between backends if long-lived TCP connections are in use
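The 5-tuple hash above can be sketched as follows. Every packet of one connection carries the same 5-tuple, so it keeps hitting the same instance, while a new connection (new source port) may land elsewhere. Instance names are hypothetical:

```python
import hashlib

INSTANCES = ["inst-0", "inst-1", "inst-2"]  # hypothetical target pool

def pick(src_ip, src_port, dst_ip, dst_port, proto):
    """Hash the connection 5-tuple onto one instance in the pool."""
    key = f"{src_ip}|{src_port}|{dst_ip}|{dst_port}|{proto}".encode()
    digest = hashlib.sha256(key).digest()
    return INSTANCES[int.from_bytes(digest[:4], "big") % len(INSTANCES)]

conn = ("198.51.100.9", 40001, "203.0.113.1", 80, "tcp")
# Every packet of one connection maps identically:
assert pick(*conn) == pick(*conn)
# A new connection from the same client differs only by source port,
# and may therefore be sent to a different instance.
print(pick("198.51.100.9", 40002, "203.0.113.1", 80, "tcp"))
```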
Target Pools
- Network load balancing forwards traffic to target pools: groups of instances which receive incoming traffic from forwarding rules
- Can only be used with forwarding rules for TCP and UDP traffic
- Can have backup pools which will receive requests if the first pool is unhealthy
- failoverRatio is a configured threshold on the ratio of healthy instances to total instances in a pool
- If the primary target pool's healthy ratio falls below the failoverRatio, traffic is sent to the backup pool
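The failover decision reduces to one comparison. A minimal sketch, with invented names:

```python
def serving_pool(primary_health, backup_pool, failover_ratio):
    """Decide which pool serves traffic.

    primary_health: list of booleans, one per instance in the primary pool.
    Falls back to backup_pool when the healthy share drops below the ratio."""
    healthy_share = sum(primary_health) / len(primary_health)
    return "primary" if healthy_share >= failover_ratio else backup_pool

# 1 of 4 instances healthy, threshold 0.5 -> fail over:
print(serving_pool([True, False, False, False], "backup", 0.5))  # backup
# 2 of 4 healthy meets the 0.5 threshold -> stay on primary:
print(serving_pool([True, True, False, False], "backup", 0.5))   # primary
```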
Health Checks
- Configured to check instance health in target pools
- Network load balancing uses legacy health checks for determining instance health

Firewall Rules
- HTTP health check probes are sent from the IP ranges 209.85.152.0/22, 209.85.204.0/22, and 35.191.0.0/16
- The load balancer uses the same ranges to connect to the instances
- Firewall rules should be configured to allow traffic from these IP ranges

External Load Balancing
A Google Cloud load balancer with a public address sits between external users and your instances.

Internal Load Balancing
A load balancer with a private IP address distributes traffic between subnets of a VPC within your project.
Internal Load Balancing
- Private load balancing IP address that only your VPC instances can access
- VPC traffic stays internal: less latency, more security
- No public IP address needed
- Useful to balance requests from your frontend instances to your backend instances
Internal Load Balancing example:
- A single subnet in the region us-central
- All instances belong to the same VPC and region, but can be in different subnets
- Two backend instance groups across two zones
- The load balancing IP is from the same VPC network
- The request gets forwarded to one of the two instance groups within the subnet
Load Balancing Algorithm
- The backend instance for a client is selected using a hashing algorithm that takes instance health into consideration
- Uses a 5-tuple hash with five parameters: client source IP, client port, destination IP (the load balancing IP), destination port, and protocol (either TCP or UDP)
- Introduce session affinity by hashing on only some of the 5 parameters:
  - 3-tuple hash: client IP, destination IP, protocol
  - 2-tuple hash: client IP, destination IP
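Dropping fields from the hash key is exactly what creates affinity: with a 3-tuple, the ports no longer influence the result, so every connection from one client IP to the load-balanced IP agrees on a backend. A sketch with hypothetical backend names:

```python
import hashlib

BACKENDS = ["be-0", "be-1", "be-2", "be-3"]  # hypothetical backends

def pick(fields):
    """Hash an arbitrary tuple of fields onto one backend."""
    digest = hashlib.sha256("|".join(map(str, fields)).encode()).digest()
    return BACKENDS[int.from_bytes(digest[:4], "big") % len(BACKENDS)]

def three_tuple(src_ip, dst_ip, proto):
    # Ports are deliberately excluded from the key -> session affinity.
    return pick((src_ip, dst_ip, proto))

# Two connections from the same client on different source ports still
# agree under the 3-tuple hash, because ports are not part of the key:
a = three_tuple("10.0.0.5", "10.0.0.100", "tcp")
b = three_tuple("10.0.0.5", "10.0.0.100", "tcp")
assert a == b
```

A 2-tuple hash goes one step further and ignores the protocol too, so TCP and UDP flows from the same client also share a backend.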
Health Checks
- HTTP, HTTPS health checks: these provide the highest fidelity; they verify that the web server is up and serving traffic, not just that the instance is healthy
- SSL (TLS) health checks: configure SSL health checks if your traffic is not HTTPS but is encrypted via SSL (TLS)
- TCP health checks: for all TCP traffic that is not HTTP(S) or SSL (TLS), you can configure a TCP health check
High Availability
- Managed service: no additional configuration needed to ensure high availability
- Can configure multiple instance groups in different zones to guard against failures in a single zone
- With multiple instance groups, all instances are treated as if they are in a single pool, and the load balancer distributes traffic amongst them using the load balancing algorithm
Traditional (Proxy) Internal Load Balancing
- Configure an internal IP on a load balancing device or instance(s); your client instance connects to this IP
- Traffic coming to the IP is terminated at the load balancer
- The load balancer selects a backend and establishes a new connection to it
- In effect, there are two connections: Client <-> Load Balancer and Load Balancer <-> Backend
GCP Internal Load Balancing
- Not proxied; differs from the traditional model
- Lightweight load balancing built on top of the Andromeda network virtualization stack
- Provides software-defined load balancing that delivers traffic directly from the client instance to a backend instance
Use Case: 3-tier Web App
- An external HTTP(S) load balancer manages client traffic to the frontend instance groups
- Frontend instances are connected to the backend instances using an internal load balancer
Autoscaling
- Managed instance groups automatically add or remove instances based on increases and decreases in load
- Helps your applications gracefully handle increases in traffic
- Reduces cost when load is lower
- You define the autoscaling policy; the autoscaler takes care of the rest
Autoscaling is a feature of managed instance groups:
- Unmanaged instance groups are not supported
- For GKE groups, autoscaling works differently and is called Cluster Autoscaling
Autoscaling settings:
- Autoscaling Policy
- Target Utilization Level

Autoscaling Policy (choose one):
- Average CPU utilization
- Stackdriver monitoring metrics
- HTTP(S) load balancing server capacity (utilization or RPS)
- Pub/Sub queueing workload (alpha)
Target Utilization Level
- The level at which you want to maintain your VMs
- Interpreted differently based on the autoscaling policy that you've chosen
Average CPU Utilization
- A target utilization level of 0.75 maintains average CPU utilization at 75% across all instances
- If utilization exceeds the target, more instances will be added
- If utilization reaches 100% during times of heavy usage, the autoscaler might increase the number of instances by 50% or 4 instances, whichever is larger
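The surge rule in the last bullet can be sketched as arithmetic. This follows the slide's description (grow by 50% or 4 instances, whichever is larger, when utilization saturates); the function name is invented for the sketch.

```python
import math

def surge_size(current_instances: int) -> int:
    """New group size after a saturation event:
    add max(50% of current size, 4 instances)."""
    growth = max(math.ceil(current_instances * 0.5), 4)
    return current_instances + growth

print(surge_size(10))  # 15 (50% of 10 = 5, which beats the 4-instance floor)
print(surge_size(4))   # 8  (50% of 4 = 2, so the 4-instance floor applies)
```

The floor matters for small groups: a 2-instance group growing by only 1 would recover far too slowly under heavy load.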
Stackdriver Monitoring Metrics
- Can configure the autoscaler to use standard or custom metrics
- Not all standard metrics are valid utilization metrics that the autoscaler can use:
  - the metric must contain data for a VM instance
  - the metric must define how busy the resource is, and the metric value must increase or decrease proportionally to the number of instances in the group
HTTP(S) Load Balancing Server Capacity
- Only works with:
  - CPU utilization
  - maximum requests per second/instance
- These are the only settings that can be controlled by adding and removing instances
Autoscaling does not work with maximum requests per group: this setting is independent of the number of instances in a group.
Autoscaler with Multiple Policies
- The autoscaler scales based on the policy which recommends the largest number of VMs in the group
- This ensures that you always have enough machines to handle your workload
- Can handle a maximum of 5 policies at a time

Example policy targets:
- cpuUtilization: 0.8
- loadBalancingUtilization: 0.6
- customMetricUtilization for metric1: 1000
- customMetricUtilization for metric2: 2000

Example observed utilization:
- cpuUtilization: 0.5
- loadBalancingUtilization: 0.4
- customMetricUtilization for metric1: 1100
- customMetricUtilization for metric2: 2700

Machines recommended per policy:
- cpuUtilization: 7
- loadBalancingUtilization: 7
- customMetricUtilization for metric1: 11
- customMetricUtilization for metric2: 14

The autoscaler follows the largest recommendation: 14 machines.
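The example numbers can be reproduced with a sketch of the recommendation formula, ceil(instances * observed / target) per policy, then the maximum across policies. The slide does not state the current group size; 10 running instances is an assumption of this sketch, chosen because it reproduces the example exactly.

```python
import math

CURRENT_INSTANCES = 10   # assumed; implicit in the slide's numbers

policies = [  # (name, target, observed utilization)
    ("cpuUtilization",            0.8,  0.5),
    ("loadBalancingUtilization",  0.6,  0.4),
    ("customMetricUtilization1", 1000, 1100),
    ("customMetricUtilization2", 2000, 2700),
]

# Each policy independently recommends a group size that would bring
# its metric back to target, assuming load stays constant:
recommendations = {
    name: math.ceil(CURRENT_INSTANCES * observed / target)
    for name, target, observed in policies
}
print(recommendations)                # 7, 7, 11 and 14 respectively
print(max(recommendations.values()))  # 14: the size the autoscaler uses
```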
