Container Networking
From Docker to Kubernetes
Michael Hausenblas
The O’Reilly logo is a registered trademark of O’Reilly Media, Inc. Container Networking, the cover
image, and related trade dress are trademarks of O’Reilly Media, Inc.
While the publisher and the author have used good faith efforts to ensure that the information and
instructions contained in this work are accurate, the publisher and the author disclaim all responsi‐
bility for errors or omissions, including without limitation responsibility for damages resulting from
the use of or reliance on this work. Use of the information and instructions contained in this work is
at your own risk. If any code samples or other technology this work contains or describes is subject
to open source licenses or the intellectual property rights of others, it is your responsibility to ensure
that your use thereof complies with such licenses and/or rights.
This work is part of a collaboration between O’Reilly and NGINX. See our statement of editorial
independence.
978-1-492-03681-4
Table of Contents
Preface. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . vii
1. Motivation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
Introducing Pets Versus Cattle 1
Go Cattle! 2
The Container Networking Stack 3
Do I Need to Go “All In”? 4
2. Introduction to Container Networking. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
3. Multi-Host Networking. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
Multi-Host Container Networking 101 13
Options for Multi-Host Container Networking 13
Docker Networking 15
Administrative Considerations 16
Wrapping It Up 16
4. Orchestration. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
What Does a Scheduler Actually Do? 19
Docker 20
Apache Mesos 21
HashiCorp Nomad 23
Community Matters 25
Wrapping It Up 25
5. Service Discovery. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
The Challenge 27
Technologies 28
Load Balancing 32
Wrapping It Up 34
6. The Container Network Interface. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
7. Kubernetes Networking. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
A Gentle Kubernetes Introduction 43
Kubernetes Networking Overview 45
Intra-Pod Networking 46
Inter-Pod Networking 47
Service Discovery in Kubernetes 50
Ingress and Egress 53
Advanced Kubernetes Networking Topics 55
Wrapping It Up 57
A. References. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
Preface
When you start building your first containerized application, you’re excited
about the capabilities and opportunities you encounter: it runs the same in dev
and in prod, it’s straightforward to put together a container image using Docker,
and the distribution is taken care of by a container registry.
So, you’re satisfied with how quickly you were able to containerize an existing,
say, Python app, and now you want to connect it to another container that has a
database, such as PostgreSQL. Also, you don’t want to have to manually launch
the containers and implement your own system that takes care of checking if the
containers are still running and, if not, relaunching them.
At this juncture, you might realize there’s a challenge you’re running into: con‐
tainer networking. Unfortunately, there are still a lot of moving parts in this
domain and there are currently few best practice resources available in a central
place. Fortunately, there are tons of articles, repos, and recipes available on the
wider internet and with this book you have a handy way to get access to many of
them in a simple and comprehensive format.
Here are a few takeaways to keep in mind as you read:
• Without a proper understanding of the networking aspect of (Docker) con‐
tainers and a sound strategy in place, you will have more than one bad day
when adopting containers.
• Service discovery and container orchestration are two sides of the same coin.
• The space of container networking and service discovery is still relatively
young: you will likely find yourself starting out with one set of technologies
and then changing gears and trying something else. Don’t worry, you’re in
good company.
This book is for you if one or more of the following applies:
• You are a software developer who drank the (Docker) container Kool-Aid.
• You work in network operations and want to brace yourself for the upcom‐
ing onslaught of your enthusiastic developer colleagues.
• You are an aspiring Site Reliability Engineer (SRE) who wants to get into the
container business.
• You are an (enterprise) software architect who is in the process of migrating
existing workloads to a containerized setup.
Last but not least, distributed application developers and backend engineers
should also be able to extract some value out of it.
Note that this is not a hands-on book. Besides some single-host Docker network‐
ing stuff in Chapter 2 and some of the material about Kubernetes in Chapter 7, I
don’t show a lot of commands or source code; consider this book more like a
guide, a heavily annotated bookmark collection. You will also want to use it to
make informed decisions when planning and implementing containerized appli‐
cations.
About Me
I work at Red Hat in the OpenShift team, where I help devops to get the most out
of the software. I spend my time mainly upstream—that is, in the Kubernetes
community, for example in the Autoscaling, Cluster Lifecycle, and Apps Special
Interest Groups (SIGs).
Before joining Red Hat in the beginning of 2017 I spent some two years at Meso‐
sphere, where I also did containers, in the context of (surprise!) Mesos. I also
have a data engineering background, having worked as Chief Data Engineer at
MapR Inc. prior to Mesosphere, mainly on distributed query engines and data‐
stores as well as building data pipelines.
Last but not least, I’m a pragmatist and tried my best throughout the book to
make sure to be unbiased toward the technologies discussed here.
Acknowledgments
A big thank you to the O’Reilly team, especially Virginia Wilson. Thanks for your
guidance and feedback on the first iteration of the book (back then called Docker
Networking and Service Discovery), which came out in 2015, and for putting up
with me again.
A big thank you to Nic (Sheriff) Jackson of HashiCorp for your time around
Nomad. You rock, dude!
Thanks a million Bryan Boreham of Weaveworks! You provided super-valuable
feedback and I appreciate your suggestions concerning the flow as well as your
diligence, paying attention to details and calling me out when I drifted off and/or
made mistakes. Bryan, who’s a container networking expert and CNI 7th dan, is
the main reason this book in its final version turned out to be a pretty good read
(I think).
Last but certainly not least, my deepest gratitude to my awesome and supportive
family: our two girls Saphira (aka The Real Unicorn—love you hun :) and Ranya
(whose talents range from Scratch programming to Irish Rugby), our son Iannis
(sigh, told you so, you ain’t gonna win the rowing championship with a broken
hand, but you’re still dope), and my wicked smart and fun wife Anneliese (did I
empty the washing machine? Not sure!).
CHAPTER 1
Motivation
In this chapter I’ll introduce you to the pets versus cattle approach concerning
compute infrastructure as well as what container networking entails. It sets the
scene, and if you’re familiar with the basics you may want to skip this chapter.
• With the pets approach to infrastructure, you treat the machines as individu‐
als. You give each (virtual) machine a name, and applications are statically
allocated to machines. For example, db-prod-2 is one of the production
servers for a database. The apps are manually deployed, and when a machine
gets ill you nurse it back to health and manually redeploy the app it ran onto
another machine. This approach is generally considered to be the dominant
paradigm of a previous (non–cloud native) era.
• With the cattle approach to infrastructure, your machines are anonymous;
they are all identical (modulo hardware upgrades), they have numbers rather
than names, and apps are automatically deployed onto any and each of the
machines. When one of the machines gets ill, you don’t worry about it
immediately; you replace it—or parts of it, such as a faulty hard disk drive—
when you want and not when things break.
1 In all fairness, Randy Bias did attribute the origins to Bill Baker of Microsoft.
While the original meme was focused on virtual machines, we apply the cattle
approach to infrastructure.
Go Cattle!
The beautiful thing about applying the cattle approach to infrastructure is that it
allows you to scale out on commodity hardware.2
It gives you elasticity with the implication of hybrid cloud capabilities. This is a
fancy way of saying that you can have parts of your deployments on premises and
burst into the public cloud—using services provided by the likes of Amazon,
Microsoft, and Google, or the infrastructure-as-a-service (IaaS) offerings of dif‐
ferent providers like VMware—if and when you need to.
Most importantly, from an operator’s point of view, the cattle approach allows
you to get a decent night’s sleep, as you’re no longer paged at 3 a.m. just to replace
a broken hard disk drive or to relaunch a hanging app on a different server, as
you would have done with your pets.
However, the cattle approach poses some challenges that generally fall into one of
the following two categories:
Social challenges
I dare say most of the challenges are of a social nature: How do I convince
my manager? How do I get buy-in from my CTO? Will my colleagues oppose
this new way of doing things? Does this mean we will need fewer people to
manage our infrastructure?
I won’t pretend to offer ready-made solutions for these issues; instead, go
buy a copy of The Phoenix Project by Gene Kim, Kevin Behr, and George
Spafford (O’Reilly), which should help you find answers.
Technical challenges
This category includes issues dealing with things like base provisioning of
the machines—e.g., using Ansible to install Kubernetes components, how to
set up the communication links between the containers and to the outside
world, and most importantly, how to ensure the containers are automatically
deployed and are discoverable.
Now that you know about pets versus cattle, you are ready to have a look at the
overall container networking stack.
2 Typically even heterogeneous hardware. For example, see slide 7 of Thorvald Natvig’s talk “Challenging
Fundamental Assumptions of Datacenters: Decoupling Infrastructure from Hardware” from Velocity
2015.
The Container Networking Stack
The overall stack we're dealing with here comprises the following:
The low-level networking layer
This includes networking gear, iptables, routing, IPVLAN, and Linux
namespaces. You usually don’t need to know the details of this layer unless
you’re on the networking team, but you should at least be aware of it. Note
that the technologies here have existed and been used for a decade or more.
The container networking layer
This layer provides some abstractions, such as the single-host bridge net‐
working mode and the multi-host, IP-per-container solution. I cover this
layer in Chapters 2 and 3.
The container orchestration layer
Here, we’re marrying the container scheduler’s decisions on where to place a
container with the primitives provided by lower layers. In Chapter 4 we look
at container orchestration systems in general, and in Chapter 5 we focus on
the service discovery aspect, including load balancing. Chapter 6 deals with
the container networking standard, CNI, and finally in Chapter 7 we look at
Kubernetes networking.
If you are on the network operations team, you’re probably good to go for the
next chapter. However, if you’re an architect or developer and your networking
knowledge might be a bit rusty, I suggest brushing up by studying the Linux Net‐
work Administrators Guide before advancing.
Note that the stage of adoption doesn't necessarily correspond to the size of the deploy‐
ment. For example, Gutefrage.de only has six bare-metal servers under manage‐
ment but uses Apache Mesos to manage them, and you can run a Kubernetes
cluster easily on a Raspberry Pi.
One last remark before we move on: by now, you might have realized that we are
dealing with distributed systems in general here. Given that we will usually want
to deploy containers into a network of computers, may I suggest reading up on
the fallacies of distributed computing, in case you are not already familiar with
this topic?
And now let’s move on to the deep end of container networking.
CHAPTER 2
Introduction to Container Networking
Figure 2-1. Simplified Docker architecture for a single host
The relationship between a host and containers is 1:N. This means that one host
typically has several containers running on it. For example, Facebook reports that
—depending on how beefy the machine is—it sees on average some 10 to 40 con‐
tainers running per host.
No matter if you have a single-host deployment or use a cluster of machines, you
will likely have to deal with networking:
• For single-host deployments, you almost always have the need to connect to
other containers on the same host; for example, an application server like
WildFly might need to connect to a database.
• In multi-host deployments, you need to consider two aspects: how containers
are communicating within a host and how the communication paths look
between different hosts. Both performance considerations and security
aspects will likely influence your design decisions. Multi-host deployments
usually become necessary either when the capacity of a single host is insuffi‐
cient, for resilience reasons, or when one wants to employ distributed sys‐
tems such as Apache Spark or Apache Kafka.
Simply put, Docker networking is the native container SDN solution you have at
your disposal when working with Docker.
Because bridge mode is the Docker default, you could have used docker
run -d -P nginx:1.9.1 in the previous command instead. If you do
not use the -P argument, which publishes all exposed ports of the con‐
tainer, or -p <host_port>:<container_port>, which publishes a spe‐
cific port, the IP packets will not be routable to the container outside of
the host.
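For instance, to publish the container's web port on a fixed host port rather than a random one, you might do something like the following sketch (the host port is an assumption):
$ docker run -d -p 8080:80 nginx:1.9.1
$ docker port $(docker ps -ql)   # should show 80/tcp -> 0.0.0.0:8080
$ curl -s localhost:8080 | head -4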
No Networking
This mode puts the container inside its own network namespace but doesn’t con‐
figure it. Effectively, this turns off networking and is useful for two cases: for con‐
tainers that don’t need a network, such as batch jobs writing to a disk volume, or
if you want to set up your own custom networking (see Chapter 3 for a number
of options that leverage this). Here’s an example:
$ docker run -d -P --net=none nginx:1.9.1
$ docker ps
CONTAINER ID IMAGE COMMAND CREATED
STATUS PORTS NAMES
d8c26d68037c nginx:1.9.1 nginx -g 2 minutes ago
Up 2 minutes grave_perlman
$ docker inspect d8c26d68037c | grep IPAddress
"IPAddress": "",
"SecondaryIPAddresses": null,
As this example shows, there is no network configured, precisely as we would
have expected.
You can read more about networking and learn about configuration options via
the Docker docs.
Administrative Considerations
We will now briefly discuss other aspects you should be aware of from an admin‐
istrative point of view. Most of these issues are equally relevant for multi-host
deployments.
Wrapping It Up
In this chapter, we had a look at the four basic single-host networking modes and
related admin issues. Now that you have a basic understanding of the single-host
case, let’s have a look at a likely more interesting case: multi-host container net‐
working.
1 New Relic, for example, found the overall uptime of the majority of the containers in one particular setup to be in the low minutes; see also the update here.
CHAPTER 3
Multi-Host Networking
• Flannel by CoreOS (see “flannel” on page 14)
• Weave Net by Weaveworks (see “Weave Net” on page 14)
• Metaswitch’s Project Calico (see “Project Calico” on page 14)
• Open vSwitch from the OpenStack project (see “Open vSwitch” on page 15)
• OpenVPN (see “OpenVPN” on page 15)
flannel
CoreOS’s flannel is a virtual network that assigns a subnet to each host for use
with container runtimes. Each container—or pod, in the case of Kubernetes—has
a unique, routable IP inside the cluster. flannel supports a range of backends,
such as VXLAN, AWS VPC, and the default layer 2 UDP network. The advantage
of flannel is that it reduces the complexity of doing port mapping. For example,
Red Hat’s Project Atomic uses flannel.
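For a taste of how this looks in practice: flannel reads its configuration from etcd, so a sketch of seeding it with a VXLAN backend might look like this (the subnet and etcd endpoint are assumptions):
$ etcdctl set /coreos.com/network/config \
    '{"Network": "10.1.0.0/16", "Backend": {"Type": "vxlan"}}'
$ flanneld --etcd-endpoints=http://127.0.0.1:2379 &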
Weave Net
Weaveworks’s Weave Net creates a virtual network that connects Docker contain‐
ers deployed across multiple hosts. Applications use the network just as if the
containers were all plugged into the same network switch, with no need to con‐
figure port mappings and links. Services provided by application containers on
the Weave network can be made accessible to the outside world, regardless of
where those containers are running.
Similarly, existing internal systems can be exposed to application containers irre‐
spective of their location. Weave can traverse firewalls and operate in partially
connected networks. Traffic can be encrypted, allowing hosts to be connected
across an untrusted network. You can learn more about Weave’s discovery fea‐
tures in the blog post “Automating Weave Deployment on Docker Hosts with
Weave Discovery” by Alvaro Saurin.
If you want to give Weave a try, check out its Katacoda scenarios.
Project Calico
Metaswitch’s Project Calico uses standard IP routing—to be precise, the venera‐
ble Border Gateway Protocol (BGP), as defined in RFC 1105—and networking
tools to provide a layer 3 solution. In contrast, most other networking solutions
build an overlay network by encapsulating layer 2 traffic into a higher layer.
The primary operating mode requires no encapsulation and is designed for data‐
centers where the organization has control over the physical network fabric.
Open vSwitch
Open vSwitch is a multilayer virtual switch designed to enable network automa‐
tion through programmatic extension while supporting standard management
interfaces and protocols, such as NetFlow, IPFIX, LACP, and 802.1ag. In addi‐
tion, it is designed to support distribution across multiple physical servers. It is used in Red Hat’s Kubernetes distro OpenShift, is the default switch in XenServer, and supports Xen, KVM, Proxmox VE, and VirtualBox. It has also been integrated into many private cloud
systems, such as OpenStack and oVirt.
OpenVPN
OpenVPN, another OSS project that has a commercial offering, allows you to
create virtual private networks (VPNs) using TLS. These VPNs can also be used
to securely connect containers to each other over the public internet. If you want
to try out a Docker-based setup, I suggest taking a look at DigitalOcean’s “How to
Run OpenVPN in a Docker Container on Ubuntu 14.04” walk-through tutorial.
Docker Networking
Docker 1.9 introduced a new docker network command. With this, containers
can dynamically connect to other networks, with each network potentially
backed by a different network driver.
In March 2015, Docker Inc. acquired the SDN startup SocketPlane and rebran‐
ded its product as the Overlay Driver. Since Docker 1.9, this is the default for
multi-host networking. The Overlay Driver extends the normal bridge mode
with peer-to-peer communication and uses a pluggable key-value store backend
to distribute cluster state, supporting Consul, etcd, and ZooKeeper.
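For example, on engines configured with such a key-value store, creating and using an overlay network might look like this sketch (the network name and subnet are assumptions):
$ docker network create --driver overlay --subnet 10.0.9.0/24 my-multi-host-net
$ docker run -d --net=my-multi-host-net --name web nginx:1.9.1
$ docker network inspect my-multi-host-net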
To learn more, I suggest checking out the Docker blog posts and documentation on multi-host overlay networking.
Administrative Considerations
In the last section of this chapter we will discuss some administrative aspects you
should be aware of:
IPVLAN
Linux kernel version 3.19 introduced an IP-per-container feature. This
assigns each container on a host a unique and routable IP address. Effec‐
tively, IPVLAN takes a single network interface and creates multiple virtual
network interfaces with different MAC addresses assigned to them.
This feature, which was contributed by Mahesh Bandewar of Google, is con‐
ceptually similar to the macvlan driver but is more flexible because it’s oper‐
ating both on L2 and L3. If your Linux distro already has a kernel > 3.19,
you’re in luck. Otherwise, you cannot yet benefit from this feature.
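To give you an idea, creating an IPVLAN interface by hand with iproute2 and moving it into a namespace looks something like this sketch (the interface name and addresses are assumptions):
$ ip link add link eth0 name ipvl0 type ipvlan mode l2
$ ip netns add ns1
$ ip link set dev ipvl0 netns ns1
$ ip netns exec ns1 ip addr add 192.168.1.10/24 dev ipvl0
$ ip netns exec ns1 ip link set dev ipvl0 up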
IP address management (IPAM)
One of the key challenges of multi-host networking is the allocation of IP
addresses to containers in a cluster. There are two strategies one can pursue:
either find a way to realize it in your existing (corporate) network or spawn
an orthogonal, practically hidden networking layer (that is, an overlay net‐
work). Note that with IPv6 this situation is relaxed, since it should be a lot
easier to find a free address space.
Orchestration tool compatibility
Many of the multi-host networking solutions discussed in this chapter are
effectively coprocesses wrapping the Docker API and configuring the net‐
work for you. This means that before you select one, you should make sure
to check for any compatibility issues with the container orchestration tool
you’re using. You’ll find more on this topic in Chapter 4.
IPv4 versus IPv6
To date, most Docker deployments use the standard IPv4, but IPv6 is wit‐
nessing some uptake. Docker has supported IPv6 since v1.5, released in Feb‐
ruary 2015; however, the IPv6 support in Kubernetes is not yet complete.
The ever-growing address shortage in IPv4-land might encourage more IPv6
deployments down the line, also getting rid of network address translation
(NAT), but it is unclear when exactly the tipping point will be reached.
Wrapping It Up
In this chapter, we reviewed multi-host networking options and touched on
admin issues such as IPAM and orchestration. At this juncture you should have a
good understanding of the low-level single-host and multi-host networking
options and their challenges. Let’s now move on to container orchestration, look‐
ing at how it depends on networking and how it interacts with it.
With the cattle approach to managing infrastructure, you don’t manually allocate
certain machines for running an application. Instead, you leave it up to an
orchestrator to manage the life cycle of your containers. In Figure 4-1, you can
see that container orchestration includes a range of functions, including but not
limited to health checks, scheduling, and service discovery.
Figure 4-1. Orchestration and its constituents
Sometimes considered part of orchestration but outside the scope of this book is
the topic of base provisioning—that is, installing or upgrading the local operating
system on a node or setting up the container runtime there.
Service discovery (covered in greater detail in Chapter 5) and scheduling are
really two sides of the same coin. The scheduler decides where in a cluster a con‐
tainer is placed and supplies other parts with an up-to-date mapping in the form
containers -> locations. This mapping can then be represented in various
ways, be it in a distributed key-value store such as etcd, via DNS, or through
environment variables.
In this chapter we will discuss networking and service discovery from the point
of view of the following container orchestration solutions: Docker Swarm and
swarm mode, Apache Mesos, and HashiCorp Nomad. These three are (along
with Kubernetes, which we will cover in detail in Chapter 7) alternatives your
organization may already be using, and hence, for the sake of completeness, it’s
worth exploring them here. To make it clear, though, as of early 2018 the indus‐
try has standardized on Kubernetes as the portable way of doing container
orchestration.
Before we dive into container orchestration systems, though, let’s step back and
review what the scheduler—which is the core component of orchestration—
actually does in the context of containerized workloads.
If you want to learn more about scheduling in distributed systems I suggest you
check out the excellent resource “Cluster Management at Google” by John
Wilkes.
Docker
Docker at the time of writing uses the so-called swarm mode in a distributed set‐
ting, whereas prior to Docker 1.12 the standalone Docker Swarm model was
used. We will discuss both here.
Swarm Mode
Since Docker 1.12, swarm mode has been integrated with Docker Engine. The
orchestration features embedded in Docker Engine are built using SwarmKit.
A swarm in Docker consists of multiple hosts running in swarm mode and acting
as managers and workers—hosts can be managers, workers, or perform both
roles at once. A task is a running container that is part of a swarm service and
managed by a swarm manager, as opposed to a standalone container. A service in
the context of Docker swarm mode is a definition of the tasks to execute on the
manager or worker nodes. Docker works to maintain that desired state; for
example, if a worker node becomes unavailable, Docker schedules the tasks onto
another host.
Docker running in swarm mode doesn’t prevent you from running standalone
containers on any of the hosts participating in the swarm. The essential differ‐
ence between standalone containers and swarm services is that only swarm man‐
agers can manage a swarm, while standalone containers can be started on any
host.
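For a quick impression, here’s a sketch of bootstrapping a swarm and launching a replicated service (the advertise address is an assumption, and the join token is printed by swarm init):
$ docker swarm init --advertise-addr 192.168.99.100        # on the manager
$ docker swarm join --token <token> 192.168.99.100:2377    # on each worker
$ docker service create --name web --replicas 3 -p 80:80 nginx
$ docker service ls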
To learn more about Docker’s swarm mode, check out the official “Getting
Started with Swarm Mode” tutorial or check out the Katacoda “Docker Orches‐
tration – Getting Started with Swarm Mode” scenario.
Docker Swarm
Docker historically had a native clustering tool called Docker Swarm. Docker
Swarm builds upon the Docker API1 and works as follows: there’s one Swarm manager that’s responsible for scheduling, and on each host an agent runs that takes care of the local resource management (Figure 4-3).
1 Essentially, this means that you can simply keep using docker run commands and the deployment of your containers in a cluster happens automagically.
Figure 4-3. Docker Swarm architecture, based on the T-Labs presentation “Swarm –
A Docker Clustering System”
Docker Swarm supports different backends: etcd, Consul, and ZooKeeper. You
can also use a static file to capture your cluster state with Swarm, and recently a
DNS-based service discovery tool for Swarm called wagl has been introduced.
Apache Mesos
Apache Mesos (Figure 4-4) is a general-purpose cluster resource manager that
abstracts the resources of a cluster (CPU, RAM, etc.) in such a way that the clus‐
ter appears like one giant computer to the developer. In a sense, Mesos acts like
the kernel of a distributed operating system. It is hence never used on its own,
but always together with so-called frameworks such as Marathon (for long-
running stuff like a web server) or Chronos (for batch jobs), or big data and fast
data frameworks like Apache Spark or Apache Cassandra.
Figure 4-4. Apache Mesos architecture at a glance
Mesos supports both containerized workloads (that is, running Docker contain‐
ers) and plain executables (for example, bash scripts or Linux ELF format binar‐
ies for both stateless and stateful services).
In the following discussion, I’m assuming you’re familiar with Mesos and its ter‐
minology. If you’re new to Mesos, I suggest checking out David Greenberg’s won‐
derful book Building Applications on Mesos (O’Reilly), a gentle introduction to
this topic that’s particularly useful for distributed application developers.
The networking characteristics and capabilities mainly depend on the Mesos
containerizer used:
• For the Mesos containerizer there are a few prerequisites, such as having a
Linux Kernel version > 3.16 and libnl installed. You can then build a Mesos
agent with network isolator support enabled. At launch, you would use
something like the following:
$ mesos-slave --containerizer=mesos \
    --isolation=network/port_mapping \
    --resources='ports:[31000-32000];ephemeral_ports:[33000-35000]'
This would configure the Mesos agent to use nonephemeral ports in the
range from 31,000 to 32,000 and ephemeral ports in the range from 33,000 to
35,000. All containers share the host’s IP, and the port ranges are spread over
the containers (with a 1:1 mapping between destination port and container
ID). With the network isolator, you also can define performance limitations
such as bandwidth, and it enables you to perform per-container monitoring
of the network traffic. See Jie Yu’s MesosCon 2015 talk “Per Container Net‐
work Monitoring and Isolation in Mesos” for more details on this topic.
• For the Docker containerizer, see Chapter 2.
Note that Mesos has supported IP-per-container since version 0.23. If you want to
learn more about Mesos networking check out Christos Kozyrakis and Spike
Curtis’s “Mesos Networking” talk from MesosCon 2015.
While Mesos is not opinionated about service discovery, there is a Mesos-specific
solution that is often used in practice: Mesos-DNS (see “Pure-Play DNS-Based
Solutions” on page 31). There are also a multitude of emerging solutions, such as
traefik (see “Wrapping It Up” on page 34) that are integrated with Mesos and
gaining traction.
If you want to try out Mesos online for free you can use the Katacoda “Deploying
Containers to DC/OS” scenario.
HashiCorp Nomad
Nomad is a cluster scheduler by HashiCorp, the makers of Vagrant. It was intro‐
duced in September 2015 and primarily aims at simplicity. The main idea is that
Nomad is easy to install and use. Its scheduler design is reportedly inspired by
Google’s Omega, borrowing concepts such as having a global state of the cluster
as well as employing an optimistic, concurrent scheduler.
Nomad has an agent-based architecture with a single binary that can take on dif‐
ferent roles, supporting rolling upgrades as well as draining nodes for re-
balancing. Nomad makes use of both a consensus protocol (strongly consistent)
for all state replication and scheduling and a gossip protocol used to manage the
addresses of servers for automatic clustering and multiregion federation. In
Figure 4-5, you can see Nomad’s architecture:
• Servers are responsible for accepting jobs from users, managing clients, and
computing task placements.
• Clients (one per VM instance) are responsible for interacting with the tasks
or applications contained within a job. They work in a pull-based manner;
that is, they register with the server and then they poll it periodically to
watch for pending work.
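If you want to kick the tires locally, a minimal test drive might look like the following sketch (the -dev agent runs server and client in one process; nomad init writes a skeleton example.nomad job file):
$ nomad agent -dev &
$ nomad init
$ nomad run example.nomad
$ nomad status example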
Community Matters
An important aspect you’ll want to consider when selecting an orchestration sys‐
tem is the community behind and around it.2 Here are a few indicators and met‐
rics you can use: the project’s governance model, the number and diversity of contributors, the release cadence, and how active the issue tracker, mailing lists, and forums are.
Wrapping It Up
As of early 2018, Kubernetes (discussed in Chapter 7) can be considered the de
facto container orchestration standard. All major providers, including Docker
and DC/OS (Mesos), support Kubernetes.
Next, we’ll move on to service discovery, a vital part of container orchestration.
2 Now, you might argue that this is not specific to the container orchestration domain but a general OSS
issue, and you’d be right. Still, I believe it is important enough to mention it, as many people are new to
this area and can benefit from these insights.
CHAPTER 5
Service Discovery
One challenge arising from adopting the cattle approach to managing infrastruc‐
ture is service discovery. If you subscribe to the cattle approach, you treat all of
your machines equally and you do not manually allocate certain machines for
certain applications; instead, you leave it up to a piece of software (the scheduler)
to manage the life cycle of the containers.
The question then is, how do you determine which host your container ended up
being scheduled on so that you can connect to it? This is called service discovery,
and we touched on it already in Chapter 4.
The Challenge
Service discovery has been around for a while—essentially, as long as distributed
systems and services have existed. In the context of containers, the challenge
boils down to reliably maintaining a mapping between a running container and
its location. By location, I mean its IP address and the port on which it is reacha‐
ble. This mapping has to be done in a timely manner and accurately across
relaunches of the container throughout the cluster. Two distinct operations must
be supported by a container service discovery solution:
Registration
Establishes the container -> location mapping. Because only the con‐
tainer scheduler knows where containers “live,” we can consider it to be the
absolute source of truth concerning a container’s location.
Lookup
Enables other services or applications to look up the mapping we stored dur‐
ing registration. Interesting properties include the freshness of the informa‐
tion and the latency of a query (average, p50, p90, etc.).
In this chapter you’ll learn about service discovery options and how and where to
use them.
If you want to learn more about the requirements and fundamental challenges in
this space, read Jeff Lindsay’s “Understanding Modern Service Discovery with
Docker” and check out what Simon Eskildsen of Shopify shared on this topic at a
recent DockerCon.
Technologies
This section briefly introduces a variety of service discovery technologies, listing
pros and cons and pointing to further discussions on the web. For a more in-
depth treatment, check out Adrian Mouat’s excellent book Using Docker
(O’Reilly).
ZooKeeper
Apache ZooKeeper is an ASF top-level project and a JVM-based, centralized tool
for configuration management,1 providing comparable functionality to what
Google’s Chubby brings to the table. ZooKeeper (ZK) organizes its payload data
somewhat like a filesystem, in a hierarchy of so-called znodes. In a cluster, a
leader is elected and clients can connect to any of the servers to retrieve data. You
want 2n+1 nodes in a ZK cluster; the configurations most often found in the wild use three, five, or seven nodes.
1 ZooKeeper was originally developed at Yahoo! in order to get its ever-growing zoo of software tools,
including Hadoop, under control.
etcd
Written in the Go language, etcd is a product of the CoreOS team.2 It is a light‐
weight, distributed key-value store that uses the Raft algorithm for consensus (a
leader–follower model, with leader election) and employs a replicated log across
2 Did you know that etcd comes from /etc distributed? What a name!
the cluster to distribute the writes a leader receives to its followers. In a sense,
etcd is conceptually quite similar to ZK. While the payload can be arbitrary, etcd’s
HTTP API is JSON-based,3 and as with ZK, you can watch for changes in the val‐
ues etcd makes available to the cluster. A very useful feature of etcd is that of
TTLs on keys, which is a great building block for service discovery. In the same
manner as ZK, you want 2n+1 nodes in an etcd cluster, for the same reasons.
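As an illustration, registering a container’s location under a key with a 30-second TTL via the v2 HTTP API might look like this sketch (the endpoint and payload are assumptions; the registering side refreshes the key before the TTL expires, so a dead container’s entry simply ages out):
$ curl -s -X PUT http://127.0.0.1:2379/v2/keys/services/webserver \
    -d value='{"ip": "10.0.0.5", "port": 8080}' -d ttl=30
$ curl -s http://127.0.0.1:2379/v2/keys/services/webserver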
The security model etcd provides allows on-the-wire encryption through
TLS/SSL as well as client certificate authentication, both between clients and the
cluster and between the etcd nodes.
In Figure 5-2, you can see that the etcd service discovery setup is quite similar to
the ZK setup. The main difference is the usage of confd, which configures
HAProxy, rather than having you write your own script.
Consul
Consul, a HashiCorp product also written in the Go language, exposes function‐
ality for service registration, discovery, and health checking in an opinionated
way. Services can be queried using the HTTP API or through DNS. Consul sup‐
ports multi-datacenter deployments.
One of Consul’s features is a distributed key-value store, akin to etcd. It also uses
the Raft consensus algorithm (and again the same observations concerning 2n+1
nodes as with ZK and etcd apply), but the deployment is different: Consul has the concept of agents, which can be run in either server mode (holding the replicated state and answering queries) or client mode (forwarding requests to the servers).
3 That is, in contrast to ZK, all you need to interact with etcd is curl or the like.
Want to learn more about using Consul for service discovery? Check out these
two great blog posts: “Consul Service Discovery with Docker” by Jeff Lindsay and
“Docker DNS & Service Discovery with Consul and Registrator” by Joseph
Miller.
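To get a feel for it, here’s a sketch of registering a service with a local dev agent and discovering it via the DNS interface (the service name, port, and health endpoint are assumptions):
$ cat > web.json <<EOF
{"service": {"name": "web", "port": 8080,
 "check": {"http": "http://localhost:8080/health", "interval": "10s"}}}
EOF
$ consul agent -dev -config-file=web.json &
$ dig @127.0.0.1 -p 8600 web.service.consul SRV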
Pure-Play DNS-Based Solutions
With a pure-play DNS-based solution, a client simply issues DNS queries to discover the services. Thus, functionality-wise this approach is quite similar to Consul, without the health checks.
WeaveDNS
WeaveDNS was introduced in Weave 0.9 as a simple solution for service dis‐
covery on the Weave network, allowing containers to find other containers’
IP addresses by their hostnames. In Weave 1.1, a so-called Gossip DNS pro‐
tocol was introduced, making lookups faster through a cache as well as
including timeout functionality. In the new implementation, registrations are
broadcast to all participating instances, which subsequently hold all entries
in memory and handle lookups locally.
Load Balancing
An orthogonal but related topic to that of service discovery is load balancing.
Load balancing enables you to spread the load—that is, the inbound service requests—across a number of containers.
The following list outlines some popular load balancing options for container‐
ized setups, in alphabetical order:
Bamboo
A daemon that automatically configures HAProxy instances, deployed on
Apache Mesos and Marathon. See the p24e guide “Service Discovery with
Marathon, Bamboo and HAProxy” for a concrete recipe.
Envoy
A high-performance distributed proxy written in C++, originally built at
Lyft. Envoy was designed to be used for single services and applications, and
to provide a communication bus and data plane for service meshes. It’s the
default data plane in Istio.
HAProxy
A stable, mature, and battle-proven (if not very feature-rich) workhorse.
Often used in conjunction with NGINX, HAProxy is reliable and integra‐
tions with pretty much everything under the sun exist.
kube-proxy
Runs on each node of a Kubernetes cluster and keeps the service IP mappings up to date. It sup‐
ports simple TCP/UDP forwarding and round-robin load balancing. Note
that it’s only for cluster-internal load balancing and also serves as a service
discovery support component.
MetalLB
A load-balancer implementation for bare-metal Kubernetes clusters,
addressing the fact that Kubernetes does not offer a default load-balancer implementation for such clusters. In other words, without MetalLB you need to be in a public cloud environment to benefit from LoadBalancer-type services. Note that you may need one or more
routers capable of speaking BGP in order for MetalLB to work.
NGINX
The leading solution in this space. With NGINX you get support for round-
robin, least-connected, and ip-hash strategies, as well as on-the-fly con‐
figuration, monitoring, and many other vital features; see the example configuration after this list.
servicerouter.py
A simple script that gets app configurations from Marathon and updates
HAProxy; see also the p24e guide “Service Discovery with Marathon, Bam‐
boo and HAProxy”.
traefik
The rising star in this category. Emile Vauge (traefik’s lead developer) must
be doing something right. I like it a lot, because it’s like HAProxy but comes
with a bunch of backends, such as Marathon and Consul, out of the box.
Vamp-router
Inspired by Bamboo and Consul–HAProxy, Magnetic.io wrote Vamp-router,
which supports updates of the config through a REST API or ZooKeeper,
routes and filters for canary releasing and A/B testing, and ACLs, as well as
providing statistics.
Vulcand
A reverse proxy for HTTP API management and microservices, inspired by
Hystrix.
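To make the NGINX entry concrete, here’s a minimal sketch of an upstream configuration balancing across two container endpoints (the addresses are assumptions; round-robin is the default if no strategy directive is given):
upstream backend {
    least_conn;                  # or ip_hash; omit for round-robin
    server 10.0.0.10:8080;
    server 10.0.0.11:8080;
}
server {
    listen 80;
    location / {
        proxy_pass http://backend;
    }
}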
If you want to learn more about load balancing, check out Kevin Reedy’s talk
from nginx.conf 2014 on load balancing with NGINX and Consul.
Wrapping It Up
To close out this chapter, I’ve put together a table that provides an overview of
the service discovery solutions we’ve discussed. I explicitly do not aim at declar‐
ing a winner, because I believe the best choice depends on your use case and
requirements. So, take the following table as a quick orientation and summary
but not as a shootout (also, note that in the context of Kubernetes you don’t need
to choose one—it’s built into the system).
In this chapter you learned about service discovery and how to tackle it, as well
as about load balancing options. We will now switch gears and move on to
Kubernetes, the de facto container orchestration standard that comes with built-
in service discovery (so you don’t need to worry about the topics discussed in this
chapter) and has its own very interesting approach to container networking
across machines.
CHAPTER 6
The Container Network Interface
The CNI specification is lightweight; it only deals with the network connectivity
of containers, as well as the garbage collection of resources once containers are
deleted.
We will focus on CNI in this book since it’s the de facto standard for container
orchestrators, adopted by all major systems such as Kubernetes, Mesos, and
Cloud Foundry. If you’re exclusively using Docker Swarm you’ll need to use
Docker’s libnetwork and might want to read the helpful article by Lee Calcote
titled “The Container Networking Landscape: CNI from CoreOS and CNM from
Docker”, which contrasts CNI with the Docker model and provides you with
some guidance.
History
CNI was pioneered by CoreOS in the context of the container runtime rkt, to
define a common interface between the network plug-ins and container runtimes
and orchestrators. Docker initially planned to support it but then came up with
the Docker-proprietary libnetwork approach to container networking.
CNI and the libnetwork plug-in interface were developed in parallel from April
to June 2015, and after some discussion the Kubernetes community decided not
to adopt libnetwork but rather to use CNI. Nowadays pretty much every con‐
tainer orchestrator with the exception of Docker Swarm uses CNI; all runtimes
support it and there’s a long list of supported plug-ins, as discussed in “Container
Runtimes and Plug-ins” on page 40.
In May 2017, the Cloud Native Computing Foundation (CNCF) made CNI a full-
blown top-level project.
In order for CNI to add a container to a network, the container runtime must
first create a new network namespace for the container and then invoke one or
more of the defined plug-ins. The network configuration is in JSON format and
includes mandatory fields such as name and type as well as plug-in type–specific
fields. The actual command (for example, ADD) is passed in as an environment
variable aptly named CNI_COMMAND.
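To put this together, here’s a sketch of a bridge plug-in configuration and a manual ADD invocation against a scratch namespace (the paths, names, and subnet are assumptions):
$ cat > /tmp/mynet.conf <<EOF
{
    "cniVersion": "0.3.1",
    "name": "mynet",
    "type": "bridge",
    "bridge": "cni0",
    "isGateway": true,
    "ipam": {
        "type": "host-local",
        "subnet": "10.22.0.0/16"
    }
}
EOF
$ ip netns add testns
$ CNI_COMMAND=ADD CNI_CONTAINERID=test CNI_NETNS=/var/run/netns/testns \
  CNI_IFNAME=eth0 CNI_PATH=/opt/cni/bin \
  /opt/cni/bin/bridge < /tmp/mynet.conf
On success, the plug-in prints a JSON result with the IP address it allocated for the new interface.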
Wrapping It Up
With this we conclude the CNI chapter and move on to Kubernetes and its net‐
working approach. CNI plays a central role in Kubernetes (networking), and you
might want to check the docs there as well.
CHAPTER 7
Kubernetes Networking
This chapter will first quickly bring you up to speed concerning Kubernetes, then
introduce you to the networking concepts on a high level. Then we’ll jump into
the deep end, looking at how container networking is realized in Kubernetes,
what traffic types exist and how you can make services talk to each other within
the cluster, as well as how you can get traffic into your cluster and to a specific
service.
I’d argue that there are at least two significant points in time concerning
the birth of Kubernetes. The first was on June 7, 2014, with Joe Beda’s
initial commit on GitHub that marked the beginning of the open sourc‐
ing of the project. The second was almost a year later, on July 20, 2015,
when Google launched Kubernetes 1.0 and announced the formation of
a dedicated entity to host and govern Kubernetes, the Cloud Native
Computing Foundation (CNCF). As someone who was at the launch
event (and party), I can tell you, that’s certainly one way to celebrate the
birth of a project.
Figure 7-1. An overview of the Kubernetes architecture
Kubernetes imposes a small set of fundamental requirements on any networking implementation: all pods can communicate with all other pods without NAT, all nodes can communicate with all pods (and vice versa) without NAT, and the IP that a pod sees itself as is the same IP that others see it as.
How you meet these requirements is up to you. This means you have a lot of free‐
dom to realize networking with and for Kubernetes. It also means, however, that
Kubernetes on its own will only provide so much; for example, it supports CNI
(Chapter 6) but it doesn’t come with a default SDN solution. In the networking
area, Kubernetes is at the same time strangely opinionated (see the preceding
requirements) and not at all (no batteries included).
From a network traffic perspective we differentiate between three types in Kuber‐
netes, as depicted in Figure 7-2:
Intra-pod networking
All containers within a pod share a network namespace and see each other
on localhost. Read “Intra-Pod Networking” on page 46 for details.
Inter-pod networking
Two types of east–west traffic are supported: pods can directly communicate
with other pods or, preferably, pods can leverage services to communicate
with other pods. Read “Inter-Pod Networking” on page 47 for details.
Ingress and egress
Ingress refers to routing traffic from external users or apps to pods, and
egress refers to calling external APIs from pods. Read “Ingress and Egress”
on page 53 for details.
Intra-Pod Networking
Within a pod there exists a so-called infrastructure container. This is the first con‐
tainer that the kubelet launches, and it acquires the pod’s IP and sets up the net‐
work namespace. All the other containers in the pod then join the infra
container’s network and IPC namespace. The infra container has network bridge
mode enabled (see “Bridge Mode Networking” on page 7) and all the other con‐
tainers in the pod join this namespace via container mode (covered in “Container
Mode Networking” on page 9). The initial process that runs in the infra container
does effectively nothing, as its sole purpose is to act as the home for the pod’s network namespaces.
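You can observe the shared namespace directly. A sketch, assuming a pod named mypod with two containers app and sidecar whose images ship the respective tools:
$ kubectl exec mypod -c app -- ip -4 addr show eth0        # shows the pod's IP
$ kubectl exec mypod -c sidecar -- ip -4 addr show eth0    # same interface, same IP
$ kubectl exec mypod -c sidecar -- wget -qO- localhost:80  # reaches the app container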
Inter-Pod Networking
In Kubernetes, each pod has a routable IP, allowing pods to communicate across
cluster nodes without NAT and no need to manage port allocations. Because
every pod gets a real (that is, not machine-local) IP address, pods can communi‐
cate without proxies or translations (such as NAT). The pod can use well-known
ports and can avoid the use of higher-level service discovery mechanisms such as
those we discussed in Chapter 5.
We distinguish between two types of inter-pod communication, sometimes also
called East-West traffic:
• Pods can directly communicate with other pods; in this case the caller pod
needs to find out the IP address of the callee and risks repeating this opera‐
tion since pods come and go (cattle behaviour).
• Preferably, pods use services to communicate with other pods. In this case,
the service provides a stable (virtual) IP address that can be discovered, for
example, via DNS.
When a container looks up the address of its network interface, it sees the same IP that any peer container would see its traffic coming from; each pod has its own IP address that other pods can find and use.
same both inside and outside the pods, Kubernetes creates a flat address space
across the cluster. For more details on this topic see also the article “Understand‐
ing Kubernetes Networking: Pods” by Mark Betz.
Let’s now focus on the service, as depicted in Figure 7-3.
A service provides a stable virtual IP (VIP) address for a set of pods. While pods
may come and go, services allow clients to reliably discover and connect to the
containers running in the pods by using the VIP. The “virtual” in VIP means it’s
not an actual IP address connected to a network interface; its purpose is purely to
act as the stable front to forward traffic to one or more pods, with IP addresses
that may come and go.
As you can see in Figure 7-3, the service with the VIP 10.104.58.143 routes the
traffic to one of the pods 172.17.0.3 or 172.17.0.4. Note here the different sub‐
nets for the service and the pods; see “Network Ranges” for further details on the reason behind that. Now, you might be wondering how this actually works. Let’s have a look.
You specify the set of pods you want a service to target via a label selector, for
example, for spec.selector.app=someapp Kubernetes would create a service
that targets all pods with a label app=someapp. Note that if such a selector exists,
then for each of the targeted pods a sub-resource of type Endpoint will be cre‐
ated, and if no selector exists then no endpoints are created (see the output of the kubectl describe command below). Such
endpoints are also not created in the case of so-called headless services, which
allow you to exercise great control over how the IP management and service dis‐
covery takes place.
Keeping the mapping between the VIP and the pods up-to-date is the job of
kube-proxy (see also the docs on kube-proxy), a process that runs on every node
on the cluster.
This kube-proxy process queries the API server to learn about new services in
the cluster and updates the node’s iptables rules accordingly, to provide the nec‐
essary routing information. To learn more about how exactly services work, check out
Kubernetes Services By Example.
Let’s see how this works in practice: assuming there’s an existing deployment
called nginx (for example, execute kubectl run webserver --image nginx)
you can automatically create a service like so:
$ kubectl expose deployment/webserver --port 80
service "webserver" exposed
You can then inspect the service’s VIP and endpoints:
$ kubectl describe service webserver
Name:              webserver
Selector:          run=webserver
Type:              ClusterIP
IP:                10.104.58.143
Port:              <unset>  80/TCP
TargetPort:        80/TCP
Endpoints:         172.17.0.3:8080,172.17.0.4:8080
Session Affinity:  None
Events:            <none>
After executing the above kubectl expose command, you will see the service
appear:
$ kubectl get service -l run=webserver
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
webserver ClusterIP 10.104.58.143 <none> 80/TCP 1m
Above, note two things: the service has got itself a cluster-internal IP (CLUSTER-
IP column) and the EXTERNAL-IP column tells you that this service is only avail‐
able from within the cluster, that is, no traffic from outside of the cluster can
reach this service (yet)—see “Ingress and Egress” on page 53 to learn how to
change this situation.
In Figure 7-4 you can see the representation of the service in the Kubernetes
dashboard.
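To see the environment-variable flavor of discovery in action, you can launch a throwaway jump pod and inspect its environment. A sketch of what this looks like for the webserver service above (the pod name and image are assumptions, and the exact set of variables may vary):
$ kubectl run -i -t --rm jump --restart=Never --image=alpine -- sh
/ # env | grep WEBSERVER
WEBSERVER_SERVICE_HOST=10.104.58.143
WEBSERVER_SERVICE_PORT=80
WEBSERVER_PORT_80_TCP_ADDR=10.104.58.143
WEBSERVER_PORT_80_TCP_PORT=80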
Above, you can see the service discovery in action: the environment variables WEBSERVER_XXX give you the IP address and port you can use to connect to the ser‐
vice. For example, while still in the jump pod, you could execute curl
10.104.58.143 and you should see the NGINX welcome page.
While convenient, note that discovery via environment variables has a funda‐
mental drawback: any service that you want to discover must be created before
the pod from which you want to discover it as otherwise the environment vari‐
ables will not be populated by Kubernetes. Luckily there exists a better way: DNS.
Pods in the same namespace can reach the service by its shortname webserver,
whereas pods in other namespaces must qualify the name as webserver.default.
Note that the result of these lookups is the service’s cluster IP. Further,
Kubernetes supports DNS service (SRV) records for named ports. So if our web
server service had a port named, say, http with the protocol type TCP, you
could issue a DNS SRV query for _http._tcp.webserver from the same name‐
space to discover the port number for http. Note also that the virtual IP for a
service is stable, so the DNS result does not have to be requeried.
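A quick way to verify the DNS-based discovery is to issue a lookup from the jump pod. A sketch, assuming the webserver service from above (the DNS server address will differ per cluster):
/ # nslookup webserver
Server:    10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local

Name:      webserver
Address 1: 10.104.58.143 webserver.default.svc.cluster.local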
Ingress
Up to now we have discussed how to access a pod or service from within the
cluster. Accessing a pod from outside the cluster is a bit more challenging. Kuber‐
netes aims to provide highly available, high-performance load balancing for serv‐
ices.
Initially, the only available options for North-South traffic in Kubernetes were
NodePort, LoadBalancer, and ExternalName, which are still available to you. For
layer 7 traffic (i.e., HTTP) a more portable option is available, however: intro‐
duced in Kubernetes 1.2 as a beta feature, you can use Ingress to route traffic
from the external world to a service in our cluster.
Ingress in Kubernetes works as shown in Figure 7-5: conceptually, it is split up
into two main pieces, an Ingress resource, which defines the routing to the back‐
ing services, and the Ingress controller, which listens to the /ingresses endpoint
of the API server, learning about services being created or removed. On service
status changes, the Ingress controller configures the routes so that external traffic
lands at a specific (cluster-internal) service.
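For illustration, here’s a sketch of what such an Ingress resource can look like, wired up to the webserver service from the previous sections (the resource name is an assumption; the API group shown is the one current as of the Kubernetes 1.9/1.10 era):
$ cat <<EOF | kubectl apply -f -
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: webserver-ingress
spec:
  rules:
  - http:
      paths:
      - path: /web
        backend:
          serviceName: webserver
          servicePort: 80
EOF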
Now NGINX is available via the IP address 192.168.99.100 (in this case my
Minikube IP) and the manifest file defines that it should be exposed via the
path /web.
Note that Ingress controllers can technically be any system capable of reverse
proxying, but NGINX is most commonly used. Further, Ingress can also be
implemented by a cloud-provided load balancer, such as Amazon’s ALB.
For more details on Ingress, read the excellent article “Understanding Kubernetes
Networking: Ingress” by Mark Betz and make sure to check out the results of the
survey the Kubernetes SIG Network carried out on this topic.
Egress
While in the case of Ingress we’re interested in routing traffic from outside the
cluster to a service, in the case of Egress we are dealing with the opposite: how
does an app in a pod call out to (cluster-)external APIs?
One may want to control which pods are allowed to have a communication path
to outside services and on top of that impose other policies. Note that by default
all containers in a pod can perform Egress. These policies can be enforced using
network policies as described in “Network Policies” on page 55 or by deploying a
service mesh as in “Service Meshes” on page 56.
Network Policies
Network policies in Kubernetes are a feature that allow you to specify how
groups of pods are allowed to communicate with each other. From Kubernetes 1.7 onward, network policies are generally available; note that they are enforced by the network plug-in in use, so your cluster needs one that supports them.
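For example, a minimal sketch of a policy that denies all ingress traffic to the pods in a namespace, which you could then open up again with more specific policies:
$ cat <<EOF | kubectl apply -f -
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
spec:
  podSelector: {}
  policyTypes:
  - Ingress
EOF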
Service Meshes
Going forward, you can make use of service meshes such as Istio, discussed in the following. The idea of a service mesh is that rather than putting the burden of
networking communication and control onto you, the developer, you outsource
these nonfunctional things to the mesh. So you benefit from traffic control,
observability, security, etc. without any changes to your source code. Sound fan‐
tastic? It is, believe you me.
Istio
Istio is a modern and popular service mesh, available for Kubernetes but not
exclusively so. It’s using Envoy as the default data plane and mainly focusing
on the control-plane aspects. It supports monitoring (Prometheus), tracing
(Zipkin/Jaeger), circuit breakers, routing, load balancing, fault injection,
retries, timeouts, mirroring, access control, and rate limiting out of the box,
to name a few features. Istio takes the battle-tested Envoy proxy (cf. “Load
Balancing” on page 32) and packages it up as a sidecar container in your pod.
Learn more about Istio via Christian Posta’s wonderful resource: Deep Dive
Envoy and Istio Workshop.
Wrapping It Up
In this chapter we’ve covered the Kubernetes approach to container networking
and showed how to use it in various setups. With this we conclude the book;
thanks for reading and if you have feedback, please do reach out via Twitter.
APPENDIX A
References
Reading stuff is fine, and here I’ve put together a collection of links that contain
either background information on topics covered in this book or advanced mate‐
rial, such as deep dives or teardowns. However, for a more practical approach I
suggest you check out Katacoda, a free online learning environment that contains
100+ scenarios from Docker to Kubernetes (see for example the screenshot in
Figure A-1).
You can use Katacoda in any browser; sessions are typically terminated after one
hour.
Container Networking References
Networking 101
• “Network Protocols” from the Programmer’s Compendium
• “Demystifying Container Networking” by Michele Bertasi
• “An Empirical Study of Load Balancing Algorithms” by Khalid Lafi
Docker
• Docker networking overview
• “Concerning Containers’ Connections: On Docker Networking” by Federico
Kereki
• “Unifying Docker Container and VM Networking” by Filip Verloy
• “The Tale of Two Container Networking Standards: CNM v. CNI” by Har‐
meet Sahni
Kubernetes
• Cluster Networking
• Provide Load-Balanced Access to an Application in a Cluster
• Create an External Load Balancer
• Kubernetes DNS example
• Kubernetes issue 44063: Implement IPVS-based in-cluster service load bal‐
ancing
• “Data and analysis of the Kubernetes Ingress survey 2018” by the Kubernetes
SIG Network
About the Author
Michael Hausenblas is a developer advocate for Go, Kubernetes, and OpenShift
at Red Hat, where he helps appops to build and operate distributed services. His
background is in large-scale data processing and container orchestration and he’s
experienced in advocacy and standardization at the W3C and IETF. Before Red
Hat, Michael worked at Mesosphere and MapR and in two research institutions
in Ireland and Austria. He contributes to open source software (mainly using
Go), speaks at conferences and user groups, blogs, and hangs out on Twitter too
much.