Managed Instance Groups and Load Balancing

Overview
- Managed instance groups are pools of similar machines that can be scaled automatically
- Load balancing can be external or internal, and global or regional
- Basic components of HTTP(S) load balancing: target proxy, URL map, backend service and backends
- Use cases and architecture diagrams for all the load balancing types: HTTP(S), SSL proxy, TCP proxy, network and internal load balancing
Managed Instance Groups

Instance Groups
A group of machines which can be created and managed together, to avoid controlling each instance in the project individually.

2 Kinds of Instance Groups: Managed and Unmanaged
Managed Instance Group
- Uses an instance template to create a group of identical instances
- Changes to the instance group change all instances in the group
Instance Template
Defines the machine type, image, zone and other properties of an instance: a way to save the instance configuration and use it later to create new instances or groups of instances.
- A global resource, not bound to a zone or a region
- Can reference zonal resources such as a persistent disk
- In such cases it can be used only within that zone
Managed Instance Group
- Can automatically scale the number of instances in the group
- Works with load balancing to distribute traffic across instances
- If an instance stops, crashes or is deleted, the group automatically recreates the instance with the same template
- Can identify and recreate unhealthy instances in a group (autohealing)
2 Types of Managed Instance Groups: Zonal and Regional
Zonal vs. Regional MIG
- Prefer regional instance groups to zonal, so application load can be spread across multiple zones
- This protects against failures within a single zone
- Choose zonal if you want lower latency and want to avoid cross-zone communication
Health Checks and Autohealing
- A MIG applies health checks to monitor the instances in the group
- If a service has failed on an instance, that instance is recreated (autohealing)
- Similar to health checks used in load balancing, but the objective is different:
  - LB health checks are used to determine where to send traffic
  - MIG health checks are used to recreate instances
- Typically you configure health checks for both LB and MIGs
- The new instance is recreated based on the template that was used to originally create it (which might be different from the default instance template)
- Disk data might be lost unless explicitly snapshotted
Configuring Health Checks
- Check Interval: The time to wait between attempts to check instance health
- Timeout: The length of time to wait for a response before declaring the check attempt failed
- Healthy Threshold: How many consecutive "healthy" responses indicate that the VM is healthy
- Unhealthy Threshold: How many consecutive "failed" responses indicate that the VM is unhealthy
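
To see how these four settings interact, here is a minimal Python sketch; the state machine, default values and names are illustrative assumptions, not GCP's implementation:

```python
# Hypothetical health-check state machine: consecutive successes or
# failures (per the thresholds above) flip an instance's state.
from dataclasses import dataclass

@dataclass
class HealthCheckConfig:
    check_interval_sec: float = 5.0  # time to wait between probe attempts
    timeout_sec: float = 5.0         # max wait for a response per probe
    healthy_threshold: int = 2       # consecutive successes -> HEALTHY
    unhealthy_threshold: int = 2     # consecutive failures -> UNHEALTHY

class InstanceHealth:
    def __init__(self, config: HealthCheckConfig):
        self.config = config
        self.state = "HEALTHY"
        self.successes = 0
        self.failures = 0

    def record_probe(self, responded_in_time: bool) -> str:
        """Update state after one probe; the caller treats a response
        slower than timeout_sec as a failed probe."""
        if responded_in_time:
            self.successes += 1
            self.failures = 0
            if self.successes >= self.config.healthy_threshold:
                self.state = "HEALTHY"
        else:
            self.failures += 1
            self.successes = 0
            if self.failures >= self.config.unhealthy_threshold:
                self.state = "UNHEALTHY"  # a MIG would now recreate the VM
        return self.state

hc = InstanceHealth(HealthCheckConfig())
print(hc.record_probe(False))  # HEALTHY: 1 failure, threshold not yet hit
print(hc.record_probe(False))  # UNHEALTHY: autohealing would recreate it
```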
Unmanaged Instance Groups
- Groups of dissimilar instances that you can add to and remove from the group
- Do not offer autoscaling, rolling updates or instance templates
- Not recommended; used only when you need to apply load balancing to pre-existing configurations
Load Balancing

[Diagram: users reach backend instances through the Google Cloud Load Balancer]

- Load balancing and autoscaling for groups of instances
- Scale your application to support heavy traffic
- Detect and remove unhealthy VMs; healthy VMs are automatically re-added
- Route traffic to the closest VM
- Fully managed service, redundant and highly available
Types of Load Balancing

[Diagram: load balancer taxonomy]
- External
  - Global: HTTP/HTTPS, SSL Proxy, TCP Proxy
  - Regional: Network
- Internal
  - Regional
OSI Network Stack

[Diagram: OSI layers mapped to load balancer types]
- Application Layer: HTTP/HTTPS load balancing
- Presentation Layer
- Session Layer: SSL Proxy load balancing
- Transport Layer: TCP Proxy load balancing
- Network Layer: Network load balancing
- Data Link Layer
- Physical Layer

Rule of thumb: load balance in the highest layer possible.
Health Checks
- HTTP, HTTPS health checks: Highest-fidelity checks, because they verify that the web server is up and serving traffic, not just that the instance is healthy
- SSL health checks: Configure SSL health checks if your traffic is not HTTPS but is encrypted via SSL (TLS)
- TCP health checks: For all TCP traffic that is not HTTP(S) or SSL (TLS), you can configure a TCP health check
In the OSI stack, HTTP(S) load balancing operates at the application layer, the highest layer: it is the "smartest" of the load balancing options.
HTTP/HTTPS Load Balancing
A global, external load balancing service offered on GCP.

Distributes HTTP(S) traffic among groups of instances based on:
- proximity to the user
- the requested URL
- or both

How a request flows:
- Traffic from the internet is sent to a global forwarding rule; this rule determines which proxy the traffic should be directed to
- The global forwarding rule directs incoming requests to a target HTTP proxy
- The target HTTP proxy checks each request against a URL map to determine the appropriate backend service for the request
- The backend service directs each request to an appropriate backend based on serving capacity, zone, and instance health of its attached backends
- The health of each backend instance is verified using either an HTTP health check or an HTTPS health check; if HTTPS, the request is encrypted
- Actual request distribution can happen based on CPU utilization or requests per instance
- The managed instance groups making up the backend can be configured to scale as the traffic scales (based on utilization or requests per second)
- HTTPS load balancing requires the target proxy to have a signed certificate to terminate the SSL connection
- You must create firewall rules to allow requests from the load balancer and health checker to get through to the instances
- Session affinity sends all requests from the same client to the same server, based on either client IP or a cookie
Global Forwarding Rules
- Route traffic by IP address, port and protocol to a load balancing proxy
- Can only be used with global load balancing: HTTP(S), SSL Proxy and TCP Proxy
- Regional forwarding rules can be used with regional load balancing and individual instances
Target Proxy
- Referenced by one or more global forwarding rules
- Routes incoming requests to a URL map to determine where they should be sent
- Specific to a protocol (HTTP, HTTPS, SSL or TCP)
- Should have an SSL certificate if it terminates HTTPS connections (limit of 10 SSL certificates)
- Can connect to backend services via HTTP or HTTPS
URL Map
- Used to direct traffic to different instances based on the incoming URL:
  - http://www.example.com/audio -> backend service 1
  - http://www.example.com/video -> backend service 2
- With no rules configured, all traffic is sent to the same group of instances: only the /* path matcher is created automatically, and it directs all traffic to the same backend service
- Host rules match hosts, e.g. example.com, customer.com
- Path rules match paths, e.g. /video, /video/hd, /video/sd
- A default path matcher /* is created automatically; traffic which does not match other path rules is sent to this default service
URL Map With Host Rule
- example.com requests will be sent to one set of backends
- Requests for all other hosts will go to the default backend

URL Map With Path Rules
- Path rules for video, with more specific rules for /video/sd and /video/hd
- No host name is given, so the rules apply to any host
- Requests on paths other than /video/hd and /video/sd go to the default backend service, since no path rule matches
- Dedicated backends serve the paths which match /video/hd and /video/sd
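
As a worked illustration of host rules, path rules and the /* default described above, here is a minimal Python sketch; the backend names and the longest-prefix matching strategy are assumptions for illustration:

```python
# Hypothetical URL-map resolution: host rules first, then path rules,
# then the automatically created /* default path matcher.
from urllib.parse import urlparse

URL_MAP = {
    "example.com": {
        "/video/hd": "video-hd-backend",
        "/video/sd": "video-sd-backend",
        "/video":    "video-backend",
        "/*":        "default-backend",
    },
    "*": {  # all other hosts
        "/*": "default-backend",
    },
}

def resolve_backend(url: str) -> str:
    parts = urlparse(url)
    path_rules = URL_MAP.get(parts.hostname, URL_MAP["*"])
    # Longest-prefix match, so /video/hd wins over /video.
    for prefix in sorted(path_rules, key=len, reverse=True):
        if prefix != "/*" and parts.path.startswith(prefix):
            return path_rules[prefix]
    return path_rules["/*"]  # default path matcher

print(resolve_backend("http://example.com/video/hd/movie"))  # video-hd-backend
print(resolve_backend("http://example.com/audio/track"))    # default-backend
print(resolve_backend("http://customer.com/video"))          # default-backend
```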
Backend Service
- Centralized service for managing backends
- Backends contain instance groups which handle user requests
- Knows which instances it can use and how much traffic they can handle
- Monitors the health of backends and does not send traffic to unhealthy instances
Backend Service Components
- Health Check: Polls instances to determine which ones can receive requests
- Backends: Instance groups of VMs which can be automatically scaled
- Session Affinity: Attempts to send requests from the same client to the same VM
- Timeout: Time the backend service will wait for a backend to respond
Health Checks
- HTTP(S), SSL and TCP health checks are available
- HTTP(S): Verifies that the instance is healthy and the web server is serving traffic
- TCP, SSL: Used when the service expects a TCP or SSL connection, i.e. not HTTP(S)
- GCP creates redundant copies of the health checker automatically, so health checks might happen more frequently than you expect
Session Affinity
- Client IP: Hashes the IP address to send requests from the same IP to the same VM
  - Requests from different users might look like they come from the same IP
  - Users who move between networks might lose affinity
- Cookie: Issues a cookie named GCLB on the first request
  - Subsequent requests from clients with the cookie are sent to the same instance
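
A minimal sketch contrasting the two affinity modes; the backend names, hash function and cookie handling are illustrative assumptions:

```python
# Hypothetical client-IP affinity vs. GCLB cookie affinity.
import hashlib

BACKENDS = ["vm-a", "vm-b", "vm-c"]

def pick_by_client_ip(client_ip: str) -> str:
    """Client-IP affinity: a stable hash of the IP picks the VM.
    Everyone behind one NAT hashes to the same VM, and a client
    that changes networks changes VM."""
    digest = hashlib.sha256(client_ip.encode()).digest()
    return BACKENDS[int.from_bytes(digest[:4], "big") % len(BACKENDS)]

def pick_by_cookie(cookies: dict) -> tuple[str, dict]:
    """Cookie affinity: the first response sets a GCLB cookie naming
    the chosen VM; later requests with the cookie stick to it."""
    if "GCLB" in cookies:
        return cookies["GCLB"], cookies
    chosen = BACKENDS[0]  # first request: any healthy VM would do
    return chosen, {**cookies, "GCLB": chosen}

print(pick_by_client_ip("203.0.113.7"))   # always the same VM for this IP
vm, cookies = pick_by_cookie({})          # first request sets the cookie
print(pick_by_cookie(cookies)[0] == vm)   # True: affinity via the cookie
```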
Backends
- Instance Group: Can be a managed or unmanaged instance group
- Balancing Mode: Determines when the backend is at full usage (CPU utilization or requests per second)
- Capacity Setting: A % of the balancing mode which determines the capacity of the backend
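
As a rough illustration of how the capacity setting scales down the balancing mode's maximum, here is a sketch assuming the requests-per-second mode; the numbers and function are hypothetical:

```python
# Hypothetical effective capacity: balancing-mode max * capacity setting.
def effective_capacity(max_rate_per_instance: float,
                       instances: int,
                       capacity_setting: float) -> float:
    """capacity_setting is a fraction, e.g. 0.8 means the backend is
    treated as 'full' at 80% of its balancing-mode maximum."""
    return max_rate_per_instance * instances * capacity_setting

# 10 instances rated at 100 RPS each, capped at 80% capacity:
print(effective_capacity(100, 10, 0.8))  # 800.0 RPS before 'full'
```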
Backend Buckets
- Allow you to use Cloud Storage buckets with HTTP(S) load balancing
- Traffic is directed to the bucket instead of a backend
- Useful in load balancing requests to static content
- Example: a path of /static can be sent to the storage bucket, and all other paths go to the instances
Load Distribution
- Uses CPU utilization of the backend or requests per second as the balancing mode
- Maximum values can be specified for both
- Short bursts of traffic above the limit can occur
- Incoming requests are first sent to the region closest to the user, if that region has capacity
- Traffic is then distributed amongst the region's zones based on capacity
- Round robin distribution is used across instances within a zone
- Round robin can be overridden by session affinity
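
A minimal sketch of this three-stage distribution (closest region with capacity, then zone by capacity, then round robin); the topology, capacities and selection details are made-up illustration data:

```python
# Hypothetical region -> zone -> round-robin request routing.
from itertools import cycle

REGIONS = [
    {"name": "us-central1", "distance": 10, "capacity_left": 0, "zones": {}},
    {"name": "us-east1", "distance": 25, "capacity_left": 40,
     "zones": {"us-east1-b": {"capacity_left": 30,
                              "rr": cycle(["vm-1", "vm-2"])},
               "us-east1-c": {"capacity_left": 10,
                              "rr": cycle(["vm-3"])}}},
]

def route_request() -> str:
    # 1. Closest region that still has capacity.
    region = min((r for r in REGIONS if r["capacity_left"] > 0),
                 key=lambda r: r["distance"])
    # 2. Zone with the most remaining capacity.
    zone = max(region["zones"].values(), key=lambda z: z["capacity_left"])
    # 3. Round robin across the zone's instances.
    return next(zone["rr"])

print(route_request())  # vm-1 (us-central1 is full, so us-east1-b is used)
print(route_request())  # vm-2 (round robin within the zone)
```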
Firewall Rules
- Allow traffic from 130.211.0.0/22 and 35.191.0.0/16 to reach your instances
- These are the IP ranges that the load balancer and the health checker use to connect to backends
- Allow traffic on the port that the global forwarding rule has been configured to use
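
A small, runnable check of whether a source address falls inside those documented ranges, using Python's standard ipaddress module:

```python
# Check whether traffic comes from the load balancer / health checker
# ranges named above (the ranges your firewall rule must allow).
import ipaddress

LB_RANGES = [ipaddress.ip_network("130.211.0.0/22"),
             ipaddress.ip_network("35.191.0.0/16")]

def is_lb_traffic(source_ip: str) -> bool:
    addr = ipaddress.ip_address(source_ip)
    return any(addr in net for net in LB_RANGES)

print(is_lb_traffic("130.211.1.17"))  # True: allow through the firewall
print(is_lb_traffic("198.51.100.9"))  # False: not load balancer traffic
```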
[Diagrams: cross-regional load balancing and content-based load balancing with HTTP(S) load balancing]
External Internal
Regional
Regional
Global
Network
TCP
HTTP/ SSL
Proxy
HTTPS Proxy
Load Balancing
External Internal
Regional
Regional
Global
Network
TCP
HTTP/ SSL
Proxy
HTTPS Proxy
In the OSI stack, SSL operates in the session layer.
SSL Proxy Load Balancing
- Remember the OSI network layer stack: physical, data link, network, transport, session, presentation, application
- The usual combination is TCP/IP: network = IP, transport = TCP, application = HTTP
- For secure traffic, add session layer = SSL (Secure Sockets Layer) and application layer = HTTPS
- Use only for non-HTTP(S) SSL traffic; for HTTP(S), just use HTTP(S) load balancing
- SSL connections are terminated at the global layer, then proxied to the closest available instance group
How it works:
- Users have a secure connection to the SSL proxy load balancer
- The proxy makes fresh connections to the backends; these connections can be SSL or non-SSL
- The SSL connections are terminated at the global layer and then proxied to the closest available instance group
In the OSI stack, TCP proxy load balancing operates at the transport layer.
TCP Proxy Load Balancing
- Performs load balancing at the transport layer (TCP)
- Allows you to use a single IP address for all users around the world
- Automatically routes traffic to the instances that are closest to the user
- Advantages of transport layer load balancing:
  - more intelligent routing is possible than with network layer load balancing
  - better security: TCP vulnerabilities can be patched at the load balancer
How it works:
- TCP traffic from users goes to the TCP proxy load balancer
- The proxy makes new connections to the backends; these can be TCP connections or even SSL connections
- The TCP connections are terminated at the global layer and then proxied to the closest available instance group
In the OSI stack, network load balancing operates at the network layer.
Network Load Balancing
- Based on incoming IP protocol data, such as address, port, and protocol type
- A pass-through, regional load balancer: it does not proxy connections from clients
- Use it to load balance UDP traffic, and TCP and SSL traffic
- Load balances traffic on ports that are not supported by the SSL proxy and TCP proxy load balancers
Load Balancing Algorithm
- Picks an instance based on a hash of:
  - the source IP and port
  - the destination IP and port
  - the protocol
- This means that incoming TCP connections are spread across instances, and each new connection may go to a different instance
- Regardless of the session affinity setting, all packets for a connection are directed to the chosen instance until the connection is closed, and they have no impact on load balancing decisions for new incoming connections
- This can result in imbalance between backends if long-lived TCP connections are in use
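
A minimal sketch of this 5-tuple hashing; the concrete hash function and backend names are illustrative assumptions:

```python
# Hypothetical 5-tuple hash: the same connection always maps to the
# same instance, but each new connection may land somewhere else.
import hashlib

BACKENDS = ["vm-a", "vm-b", "vm-c"]

def pick_backend(src_ip, src_port, dst_ip, dst_port, protocol) -> str:
    key = f"{src_ip}:{src_port}-{dst_ip}:{dst_port}-{protocol}".encode()
    digest = hashlib.sha256(key).digest()
    return BACKENDS[int.from_bytes(digest[:4], "big") % len(BACKENDS)]

# Same 5-tuple -> same instance for the life of the connection...
print(pick_backend("198.51.100.4", 50001, "203.0.113.10", 443, "tcp"))
# ...but a new connection (new source port) may pick a different one.
print(pick_backend("198.51.100.4", 50002, "203.0.113.10", 443, "tcp"))
```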
Target Pools
- Network load balancing forwards traffic to target pools
- A target pool is a group of instances which receive incoming traffic from forwarding rules
- Can only be used with forwarding rules for TCP and UDP traffic
- Can have backup pools which will receive requests if the first pool is unhealthy
- failoverRatio sets the minimum ratio of healthy instances in a pool
- If the primary target pool's ratio of healthy instances falls below the failoverRatio, traffic is sent to the backup pool
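
A minimal sketch of the failover decision; the pool contents and the ratio are illustrative assumptions:

```python
# Hypothetical target-pool failover based on failoverRatio.
def healthy_ratio(pool: list[bool]) -> float:
    """Each pool is a list of per-instance health flags."""
    return sum(pool) / len(pool)

def choose_pool(primary: list[bool], backup: list[bool],
                failover_ratio: float) -> str:
    if healthy_ratio(primary) >= failover_ratio:
        return "primary"
    return "backup"  # too few healthy instances in the primary pool

primary = [True, False, False, False]  # 25% healthy
backup = [True, True, True]
print(choose_pool(primary, backup, failover_ratio=0.5))  # backup
```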
Health Checks
- Configured to check instance health in target pools
- Network load balancing uses legacy health checks to determine instance health

Firewall Rules
- HTTP health check probes are sent from the IP ranges 209.85.152.0/22, 209.85.204.0/22, and 35.191.0.0/16
- The load balancer uses the same ranges to connect to the instances
- Firewall rules should be configured to allow traffic from these IP ranges
External Load Balancing
[Diagram: users outside GCP reach backends through the Google Cloud Load Balancer]

Internal Load Balancing
[Diagram: a project with VPC #1 containing Subnet 1 and Subnet 2; the load balancer uses private IP addresses]
Internal Load Balancing
- Private load balancing IP address that only your VPC instances can access
- VPC traffic stays internal: less latency, more security
- No public IP address is needed
- Useful to balance requests from your frontend instances to your backend instances
Example setup:
- A single subnet in the region us-central
- All instances belong to the same VPC and region, but can be in different subnets
- 2 backend instance groups across two zones
- The load balancing IP is from the same VPC network
- The request gets forwarded to one of the two instance groups within the subnet
Load Balancing Algorithm
- The backend instance for a client is selected using a hashing algorithm that takes instance health into consideration
- By default it uses a 5-tuple hash, with five parameters:
  - client source IP
  - client port
  - destination IP (the load balancing IP)
  - destination port
  - protocol (either TCP or UDP)
- Introduce session affinity by hashing on only some of the 5 parameters:
  - 3-tuple hash (client IP, destination IP, protocol)
  - 2-tuple hash (client IP, destination IP)
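
A minimal sketch of 5-, 3- and 2-tuple hashing; hashing on fewer fields gives stickier affinity. The hash choice and backend names are illustrative assumptions:

```python
# Hypothetical tuple hashing for internal load balancing.
import hashlib

BACKENDS = ["backend-1", "backend-2", "backend-3"]

def pick(fields: tuple) -> str:
    digest = hashlib.sha256(repr(fields).encode()).digest()
    return BACKENDS[int.from_bytes(digest[:4], "big") % len(BACKENDS)]

conn = dict(src_ip="10.0.1.5", src_port=40001,
            dst_ip="10.0.2.100", dst_port=80, proto="tcp")

five = pick((conn["src_ip"], conn["src_port"], conn["dst_ip"],
             conn["dst_port"], conn["proto"]))  # per-connection spread
three = pick((conn["src_ip"], conn["dst_ip"], conn["proto"]))
two = pick((conn["src_ip"], conn["dst_ip"]))    # strongest affinity

# With 3- or 2-tuple hashing, a client keeps hitting the same backend
# even when its source port changes between connections.
print(five, three, two)
```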
Health Checks
- HTTP, HTTPS health checks: These provide the highest fidelity; they verify that the web server is up and serving traffic, not just that the instance is healthy
- SSL (TLS) health checks: Configure SSL health checks if your traffic is not HTTPS but is encrypted via SSL (TLS)
- TCP health checks: For all TCP traffic that is not HTTP(S) or SSL (TLS), you can configure a TCP health check
High Availability
- Managed service: no additional configuration is needed to ensure high availability
- Can configure multiple instance groups in different zones to guard against failures in a single zone
- With multiple instance groups, all instances are treated as if they are in a single pool, and the load balancer distributes traffic amongst them using the load balancing algorithm
Traditional (Proxy) Internal Load Balancing
- Configure an internal IP on a load balancing device or instance(s); your client instances connect to this IP
- Traffic coming to the IP is terminated at the load balancer
- The load balancer selects a backend and establishes a new connection to it
- In effect, there are two connections: Client <-> Load Balancer and Load Balancer <-> Backend
GCP Internal Load Balancing
- Not proxied; differs from the traditional model
- Lightweight load balancing built on top of the Andromeda network virtualization stack
- Provides software-defined load balancing that delivers traffic directly from the client instance to a backend instance
Use Case: 3-tier Web App
- An external HTTP(S) load balancer manages client traffic to the frontend instance groups
- Frontend instances are connected to the backend instances using an internal load balancer
Autoscaling
- Managed instance groups automatically add or remove instances based on increases and decreases in load
- Helps your applications gracefully handle increases in traffic
- Reduces cost when load is lower
- Define the autoscaling policy and the autoscaler takes care of the rest
- Autoscaling is a feature of managed instance groups; unmanaged instance groups are not supported
- For GKE groups, autoscaling is different: it is called Cluster Autoscaling
Autoscaling requires two settings:
- Autoscaling Policy
- Target Utilization Level

Autoscaling Policy: choose one of
- Average CPU utilization
- Stackdriver monitoring metrics
- HTTP(S) load balancing server capacity (utilization or RPS)
- Pub/Sub queueing workload (alpha)

Target Utilization Level
- The level at which you want to maintain your VMs
- Interpreted differently based on the autoscaling policy that you've chosen
Average CPU Utilization
- A target utilization level of 0.75 maintains average CPU utilization at 75% across all instances
- If utilization exceeds the target, more instances will be added
- If utilization reaches 100% during times of heavy usage, the autoscaler might increase the number of instances by 50% or by 4 instances, whichever is larger
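
A minimal sketch of this sizing rule; the formula (scale so that average utilization returns to the target) is a common reading of the behavior above, not GCP's exact algorithm:

```python
# Hypothetical CPU-based sizing: grow until the average returns to the
# target, with a larger jump when utilization is pegged at 100%.
import math

def recommended_size(current: int, avg_utilization: float,
                     target: float = 0.75) -> int:
    size = math.ceil(current * avg_utilization / target)
    if avg_utilization >= 1.0:
        # At 100% the autoscaler may jump by 50% or 4 instances,
        # whichever is larger.
        size = max(size, current + max(current // 2, 4))
    return size

print(recommended_size(10, 0.9))  # 12: grow until the 75% average again
print(recommended_size(10, 1.0))  # 15: the 50% jump beats +4 instances
```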
Stackdriver Monitoring Metrics
- Can configure the autoscaler to use standard or custom metrics
- Not all standard metrics are valid utilization metrics that the autoscaler can use:
  - the metric must contain data for a VM instance
  - the metric must define how busy the resource is, i.e. the metric value increases or decreases proportionally to the number of instances in the group
HTTP(S) Load Balancing Server Capacity
- Only works with:
  - CPU utilization
  - maximum requests per second/instance
- These are the only settings that can be controlled by adding and removing instances
- Autoscaling does not work with maximum requests per group: that setting is independent of the number of instances in a group
Autoscaler with Multiple Policies
- The autoscaler will scale based on the policy which provides the largest number of VMs in the group
- This ensures that you always have enough machines to handle your workload
- Can handle a maximum of 5 policies at a time
Example (with 10 instances currently in the group):

Policy                               Target   Observed   Machines recommended
cpuUtilization                       0.8      0.5        7
loadBalancingUtilization             0.6      0.4        7
customMetricUtilization (metric1)    1000     1100       11
customMetricUtilization (metric2)    2000     2700       14

The autoscaler acts on the largest recommendation: 14 machines.
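
A minimal sketch reproducing the table: compute each policy's recommendation and take the maximum. The group size of 10 running instances is an assumption, chosen because it reproduces the recommended numbers above:

```python
# Hypothetical multi-policy autoscaling: each policy recommends a size
# proportional to observed/target utilization; the max wins.
import math

def recommend(current: int, observed: float, target: float) -> int:
    return math.ceil(current * observed / target)

current = 10
policies = [
    ("cpuUtilization",           0.5,  0.8),
    ("loadBalancingUtilization", 0.4,  0.6),
    ("customMetricUtilization1", 1100, 1000),
    ("customMetricUtilization2", 2700, 2000),
]

recommendations = {name: recommend(current, observed, target)
                   for name, observed, target in policies}
print(recommendations)                # 7, 7, 11 and 14 respectively
print(max(recommendations.values()))  # 14: the autoscaler's choice
```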