Just Eat’s SRE Story
DevOps at Scale
Bennie Johnston
Head of Site Reliability Engineering
Rich Haigh
Director of Technology
Our vision
Creating the world’s greatest food community
What makes us special?
500+ people in Tech across the UK, Ukraine, Australia and Canada
22.8m active customers
30+ teams
450+ services
2,700+ orders/min
1,500+ AWS instances in production
1.6M+ metrics/min
1.5TB+ logs/day
500+ releases/week
45% revenue growth (FY17)
FTSE 100, >£5bn market cap
What is SRE at Just Eat?
Site Reliability Engineering operates on 5 key principles:
1 - Relentlessly protect site availability
2 - Enable change to be delivered fast, but with quality
3 - Optimise the use of our infrastructure and resources
4 - Innovate to stay ahead
5 - Foster the right culture at Just Eat
We believe that Dev teams own their product - full stop!
How do we structure it?
Our customers are 30+ Dev Teams in multiple countries (these numbers vary)
Central Reliability Engineering department
- 24/7 Service Operations Centre (SOC)
- Development team
- Hosting/Platform
- Delivery Automation (CI/CD)
- Observability
- Service Management
Daily production standups
Weekly risk meeting
Monthly Engineering all-hands
First-class citizen in various architecture/project groups
What tools/processes do we own?
The Central vs. Distributed debate
At one extreme, SRE owns all tools and processes
+ economies of scale
+ faster decisions
- limits innovation
- slows down development teams
At the other extreme, Dev teams own all tools and processes
+ maximum flexibility for development teams
- tooling sprawl
- wasted time reinventing the wheel
- support problems
Our solution
+ central support for a range of tooling
+ ability for dev teams to interact via an open-source approach
+ freedom for dev teams to deviate
+ survival-of-the-fittest approach
Lessons learnt as we’ve grown?
What didn’t work?
How do we deal with scale?
Example: internal tool we own
Bennie
Example: external tool we own
A formula for managing chaos?
if ( ReliabilityScore() < DesiredReliability() )
{
    LetUsHelpYou();
}
else
{
    LetUsHighlightYou();
    Freedom++;
}
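The rule on this slide can be sketched as runnable code. This is only an illustration: the `Team` fields, the `review` function, the percentage-style scores and the team names are all assumptions made for the example. The slides do not say how the reliability score or the desired reliability are actually calculated.

```python
from dataclasses import dataclass


@dataclass
class Team:
    name: str
    reliability_score: float    # hypothetical measured score (e.g. availability %)
    desired_reliability: float  # hypothetical target agreed for the team
    freedom: int = 0            # hypothetical counter mirroring "Freedom++"


def review(team: Team) -> str:
    """Apply the slide's rule to one team and return the action taken."""
    if team.reliability_score < team.desired_reliability:
        # Below target: SRE steps in to help (LetUsHelpYou).
        return "help"
    # At or above target: highlight the team and extend its freedom
    # (LetUsHighlightYou, Freedom++).
    team.freedom += 1
    return "highlight"


if __name__ == "__main__":
    # Made-up teams and numbers, purely for illustration.
    teams = [Team("checkout", 99.5, 99.9), Team("search", 99.95, 99.9)]
    for team in teams:
        print(team.name, review(team), team.freedom)
```

Keeping the comparison per team reflects the idea that Dev teams own their product: a team that meets its target earns more freedom, and a team that misses it gets help.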
What’s next? The FUTURE!
Automation of observability. A step change from simple time-series metrics.
The dream of incident resolution automation. The robots talking to the robots.
Questions?
Want to contact us?
richard.haigh@just-eat.com
bennie.johnston@just-eat.com
Want to read more about us?
Our tech blog: https://tech.just-eat.com
Want to work for us? ;)
Our Careers site: https://careers.just-eat.com

Just Eat: DevOps at Scale at AppD Global Tour London
