Attila Szegedi, Software Engineer
@asz




                                1
Everything I ever
learned about JVM
performance tuning
     @twitter

        2
More than I ever
wanted to learn about
JVM performance
       tuning
      @twitter
        3
• Memory tuning
• CPU usage tuning
• Lock contention tuning
• I/O tuning


                   4
Twitter’s biggest enemy




           5
Twitter’s biggest enemy
                         Latency




CC licensed image from http://www.flickr.com/photos/dunechaser/213255210/
           5
Latency contributors


• By far the biggest contributor is the garbage collector
• Others are, in no particular order:
   • in-process locking and thread scheduling,
   • I/O,
   • application algorithmic inefficiencies.
                             6
Areas of performance
           tuning

• Memory tuning
• Lock contention tuning
• CPU usage tuning
• I/O tuning
                           7
Areas of memory
      performance tuning


• Memory footprint tuning
• Allocation rate tuning
• Garbage collection tuning

                              8
Memory footprint tuning


• So you got an OutOfMemoryError…
   • Maybe you just have too much data!
   • Maybe your data representation is fat!
   • You can also have a genuine memory leak…
                          9
Too much data

• Run with -verbosegc
• Observe numbers in “Full GC” messages:
  [Full GC $before->$after($total), $time secs]

• Can you give the JVM more memory?
• Do you need all that data in memory? Consider
  using:
   • an LRU cache, or…
   • soft references*
                        10
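One way to bound the data kept in memory is an LRU cache. The sketch below is a minimal illustration built on the JDK's LinkedHashMap in access-order mode; the class name LruCacheSketch and the capacity of 2 are made up for the example, and the class is not thread-safe.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal LRU cache sketch on top of LinkedHashMap's access-order mode.
// The class name and the capacity used in main are illustrative assumptions.
class LruCacheSketch<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    LruCacheSketch(int maxEntries) {
        // accessOrder = true: iteration runs from least to most recently accessed
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after each insertion; returning true evicts the least recently used entry.
        return size() > maxEntries;
    }

    public static void main(String[] args) {
        LruCacheSketch<String, Integer> cache = new LruCacheSketch<>(2);
        cache.put("a", 1);
        cache.put("b", 2);
        cache.get("a");     // touch "a" so that "b" becomes the eldest entry
        cache.put("c", 3);  // exceeds the capacity, so "b" is evicted
        System.out.println(cache.keySet()); // prints [a, c]
    }
}
```

For concurrent use the map would need external synchronization, for example via Collections.synchronizedMap.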
Fat data

• Can be a problem when you want to do wacky
  things, like
   • load the full Twitter social graph in a single
     JVM
   • load all user metadata in a single JVM
• Slimming internal data representation works at
  these economies of scale
                       11
Fat data: object header


 • JVM object header is normally two machine
     words.
 • That’s 16 bytes, or 128 bits on a 64-bit JVM!
 • new java.lang.Object() takes 16 bytes.
 •   new byte[0]   takes 24 bytes.


                         12
Fat data: padding

              class A {
                  byte x;
              }
              class B extends A {
                  byte y;
              }



•   new A()   takes 24 bytes.

•   new B()   takes 32 bytes.
                        13
Fat data: no inline structs

         class C {
           Object obj = new Object();
         }




  • new C() takes 40 bytes.
  • similarly, no inline array elements.
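The 24, 32 and 40 byte figures can be reproduced with simple arithmetic. The sketch below is a back-of-the-envelope model, not a measurement: it assumes a 64-bit JVM with uncompressed pointers, 8-byte alignment, and superclass fields padded to 8 bytes before subclass fields are laid out. All names in it are hypothetical.

```java
// Rough size model for a 64-bit JVM without compressed pointers, using only
// the numbers quoted on the slides. Real layouts depend on JVM version and flags.
class ObjectSizeSketch {
    static final int OBJECT_HEADER = 16; // two 8-byte machine words
    static final int ARRAY_HEADER = 24;  // size quoted on the slide for new byte[0]
    static final int REFERENCE = 8;      // uncompressed pointer
    static final int ALIGNMENT = 8;      // sizes are rounded up to a multiple of 8

    static int align(int bytes) {
        return (bytes + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
    }

    // class A { byte x; }: header plus one byte, rounded up
    static int sizeOfA() {
        return align(OBJECT_HEADER + 1);
    }

    // class B extends A { byte y; }: A's part is padded before B's field is added
    static int sizeOfB() {
        return align(sizeOfA() + 1);
    }

    // class C { Object obj = new Object(); }: C itself plus the separately allocated Object
    static int sizeOfC() {
        return align(OBJECT_HEADER + REFERENCE) + align(OBJECT_HEADER);
    }

    public static void main(String[] args) {
        System.out.println("new A()     ~ " + sizeOfA() + " bytes");
        System.out.println("new B()     ~ " + sizeOfB() + " bytes");
        System.out.println("new C()     ~ " + sizeOfC() + " bytes");
        System.out.println("new byte[0] ~ " + align(ARRAY_HEADER) + " bytes");
    }
}
```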
                         14
Slimming taken to
      extreme
• A research project had to load the full follower
  graph in memory
• Each vertex’s edges ended up being represented
  as int arrays
• If it grows further, we can consider
  variable-length differential encoding in a byte array
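To make "variable-length differential encoding" concrete, here is a minimal sketch. It assumes the edge list is a sorted array of non-negative ints and stores the gaps between neighbours as base-128 varints. The class and method names are hypothetical; this is not the research project's actual code.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

class DeltaVarintSketch {
    // Encodes a sorted array of non-negative ints as deltas between neighbours.
    // Each delta is written with 7 payload bits per byte; the high bit is set
    // on every byte except the last one of a delta.
    static byte[] encode(int[] sortedIds) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int previous = 0;
        for (int id : sortedIds) {
            int delta = id - previous; // non-negative because the input is sorted
            previous = id;
            while ((delta & ~0x7F) != 0) {
                out.write((delta & 0x7F) | 0x80);
                delta >>>= 7;
            }
            out.write(delta);
        }
        return out.toByteArray();
    }

    // Decodes 'count' ids by summing up the varint-encoded deltas.
    static int[] decode(byte[] bytes, int count) {
        int[] ids = new int[count];
        int pos = 0;
        int previous = 0;
        for (int i = 0; i < count; i++) {
            int delta = 0;
            int shift = 0;
            int b;
            do {
                b = bytes[pos++];
                delta |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            previous += delta;
            ids[i] = previous;
        }
        return ids;
    }

    public static void main(String[] args) {
        int[] followers = {3, 10, 11, 300, 70000}; // made-up vertex ids
        byte[] packed = encode(followers);
        System.out.println(packed.length + " bytes instead of " + 4 * followers.length);
        System.out.println(Arrays.toString(decode(packed, followers.length)));
    }
}
```

The saving depends on how small the gaps are; dense adjacency lists compress well, sparse ones less so.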



                      15
Compressed object
     pointers


• Pointers become 4 bytes long
• Usable below 32 GB of max heap size
• Automatically used below 30 GB of max heap

                    16
Compressed object
           pointers
                    Uncompressed   Compressed   32-bit

  Pointer                 8             4          4

  Object header          16            12*         8

  Array header           24            16         12

  Superclass pad          8             4          4

All sizes are in bytes.
* Object can have 4 bytes of fields and still only take up 16 bytes
                                17
Avoid instances of
      primitive wrappers

• Hard-won experience with Scala 2.7.7:
 • a Seq[Int] stores java.lang.Integer
 • an Array[Int] stores int
 • the first needs (24 + 32 * length) bytes
 • the second needs (24 + 4 * length) bytes
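The same contrast exists in plain Java between a List<Integer> and an int[]. The sketch below plugs a length into the two formulas from the slide. The formulas are the slide's estimates for a 64-bit JVM, not measurements, and reading the 32 as "8-byte reference plus 24-byte Integer object" is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

class BoxingFootprintSketch {
    // Slide estimate for a boxed sequence: presumably an 8-byte reference
    // plus a 24-byte java.lang.Integer per element (an assumption).
    static long boxedBytes(int length) {
        return 24L + 32L * length;
    }

    // Slide estimate for a primitive array: 4 bytes per int after the array header.
    static long primitiveBytes(int length) {
        return 24L + 4L * length;
    }

    public static void main(String[] args) {
        int length = 1_000_000;
        List<Integer> boxed = new ArrayList<>(length); // each element is a separate Integer object
        int[] primitive = new int[length];             // elements are stored inline in the array
        for (int i = 0; i < length; i++) {
            boxed.add(i);
            primitive[i] = i;
        }
        System.out.println("boxed estimate:     " + boxedBytes(boxed.size()) + " bytes");
        System.out.println("primitive estimate: " + primitiveBytes(primitive.length) + " bytes");
    }
}
```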
                        18
Avoid instances of
     primitive wrappers

• This was fixed in Scala 2.8, but it shows that:
 • you often don’t know the performance
   characteristics of your libraries,
 • and won’t ever know them until you run your
   application under a profiler.


                          19
Map footprints


•   Guava MapMaker.makeMap() takes 2272 bytes!

•   MapMaker.concurrencyLevel(1).makeMap()
    takes 352 bytes!
• ConcurrentMap with level 1 makes sense
    sometimes (e.g. when you don’t want a
    ConcurrentModificationException)
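The ConcurrentModificationException point can be shown with JDK classes alone. This sketch swaps Guava's MapMaker for java.util.concurrent.ConcurrentHashMap, whose three-argument constructor also takes a concurrency level; in recent JDKs that argument is only a sizing hint, so the footprint numbers above do not carry over. The class and method names are hypothetical.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class ConcurrencyLevelSketch {
    // Fills the map, then removes every key while iterating over it.
    // Returns true if the iteration fails with ConcurrentModificationException.
    static boolean failsWhenModifiedDuringIteration(Map<Integer, String> map) {
        for (int i = 0; i < 3; i++) {
            map.put(i, "v" + i);
        }
        try {
            for (Integer key : map.keySet()) {
                map.remove(key);
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        // JDK analogue of a concurrency level of 1 (third constructor argument).
        Map<Integer, String> concurrent = new ConcurrentHashMap<>(16, 0.75f, 1);
        System.out.println("HashMap fails:           "
                + failsWhenModifiedDuringIteration(new HashMap<>()));
        System.out.println("ConcurrentHashMap fails: "
                + failsWhenModifiedDuringIteration(concurrent));
    }
}
```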

                       20
Thrift can be heavy



• Thrift-generated classes are used to encapsulate a
  wire transfer format.
• Using them as your domain objects: almost never
  a good idea.



                         21
Thrift can be heavy


• Every Thrift class with a primitive field has a
  java.util.BitSet __isset_bit_vector field.

• It adds between 52 and 72 bytes of overhead per
  object.



                            22
Thrift can be heavy




         23
Thrift can be heavy


• Thrift does not support 32-bit floats.
• Coupling domain model with transport:
 • resistance to changing the domain model
• You also miss opportunities for interning and N-to-1
  normalization.

                          24
class Location {
   public String city;
   public String region;
   public String countryCode;
   public int metro;
   public List<String> placeIds;
   public double lat;
   public double lon;
   public double confidence;
}




                        25
class SharedLocation {
   public String city;
   public String region;
   public String countryCode;
   public int metro;
   public List<String> placeIds;
}

class UniqueLocation {
   private SharedLocation sharedLocation;
   public double lat;
   public double lon;
   public double confidence;
}
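The split only pays off if equal shared parts are actually stored once. A minimal sketch of such N-to-1 normalization follows. It uses a trimmed-down, hypothetical SharedLocationSketch with two fields and a HashMap-based interner; none of these names come from the slides.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical, trimmed-down stand-in for the shared part of a location.
final class SharedLocationSketch {
    final String city;
    final String countryCode;

    SharedLocationSketch(String city, String countryCode) {
        this.city = city;
        this.countryCode = countryCode;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof SharedLocationSketch)) {
            return false;
        }
        SharedLocationSketch other = (SharedLocationSketch) o;
        return Objects.equals(city, other.city)
                && Objects.equals(countryCode, other.countryCode);
    }

    @Override
    public int hashCode() {
        return Objects.hash(city, countryCode);
    }
}

// Keeps one canonical instance per distinct value (not thread-safe).
final class LocationInterner {
    private final Map<SharedLocationSketch, SharedLocationSketch> pool = new HashMap<>();

    // Returns the canonical instance equal to the candidate, registering the
    // candidate if it is the first of its kind.
    SharedLocationSketch intern(SharedLocationSketch candidate) {
        SharedLocationSketch existing = pool.putIfAbsent(candidate, candidate);
        return existing != null ? existing : candidate;
    }

    int size() {
        return pool.size();
    }
}

class InterningSketch {
    public static void main(String[] args) {
        LocationInterner interner = new LocationInterner();
        SharedLocationSketch first = interner.intern(new SharedLocationSketch("Budapest", "HU"));
        SharedLocationSketch second = interner.intern(new SharedLocationSketch("Budapest", "HU"));
        System.out.println("same instance: " + (first == second));
        System.out.println("pool size: " + interner.size());
    }
}
```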




                        26
Careful with thread locals

• Thread locals stick around.
• Particularly problematic in thread pools with m⨯n
  resource association.
 • 200 pooled threads using 50 connections: you end
   up with 10 000 connection buffers.
• Consider using synchronized objects, or
• just create new objects all the time.
                          27
Part II:
fighting latency



       28
Performance tradeoff

               Memory




                 Time



  Convenient, but oversimplified view.
                  29
Performance triangle

          Memory footprint




 Throughput                  Latency




                 30
Performance triangle

                     Compactness




       Throughput                   Responsiveness
                     C ⨯ T ⨯ R = a
• Tuning: vary C, T, R for fixed a
• Optimization: increase a
                            31
Performance triangle

• Compactness: inverse of memory footprint
• Responsiveness: longest pause the application will
  experience
• Throughput: amount of useful application CPU work
  over time
• Can trade one for the other, within limits.
• If you have spare CPU, can be pure win.
                           32
Responsiveness vs.
   throughput




        33
Biggest threat to
responsiveness in the JVM
  is the garbage collector


            34
Memory pools

Eden       Survivor             Old



                         Code
       Permanent
                        cache




  This is entirely HotSpot specific!
                   35
How does young gen
           work?
          Eden          S1        S2      Old

• All new allocation happens in eden.
 • It only costs a pointer bump.
• When eden fills up, stop-the-world copy-collection
  into the survivor space.
 • Dead objects cost zero to collect.
• After several collections, survivors get tenured into
  the old generation.
                             36
Ideal young gen operation


• Big enough to hold more than one set of all
  concurrent request-response cycle objects.
• Each survivor space big enough to hold active
  request objects + tenuring ones.
• Tenuring threshold such that long-lived objects
  tenure fast.

                           37
Old generation collectors

• Throughput collectors
 •   -XX:+UseSerialGC

 •   -XX:+UseParallelGC

 •   -XX:+UseParallelOldGC

• Low-pause collectors
 •   -XX:+UseConcMarkSweepGC

 •   -XX:+UseG1GC   (can’t discuss it here)

                             38
Adaptive sizing policy

• Throughput collectors can automatically tune
  themselves:
 •   -XX:+UseAdaptiveSizePolicy

 •   -XX:MaxGCPauseMillis=…      (e.g. 100)
 •   -XX:GCTimeRatio=…   (e.g. 19)
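As a worked example of what the ratio means: HotSpot documents GCTimeRatio=N as a goal of spending at most 1/(1+N) of total time in garbage collection, so the example value of 19 asks for roughly 5%.

```latex
\[
\text{GC time fraction} \;=\; \frac{1}{1 + \texttt{GCTimeRatio}}
\qquad\Longrightarrow\qquad
\frac{1}{1 + 19} \;=\; 0.05 \;=\; 5\%
\]
```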



                            39
Adaptive sizing policy at
         work




            40
Choose a collector


• Bulk service: throughput collector, no adaptive sizing
  policy.
• Everything else: try a throughput collector with
  adaptive sizing policy. If that doesn’t work, use
  concurrent mark-and-sweep (CMS).



                           41
Always start with tuning
 the young generation
• Enable -XX:+PrintGCDetails, -XX:+PrintHeapAtGC,
  and -XX:+PrintTenuringDistribution.

• Watch survivor sizes! You’ll need to determine
  “desired survivor size”.
• There’s no such thing as a “desired eden size”, mind
  you. The bigger, the better, with some
  responsiveness caveats.
• Watch the tenuring threshold; you might need to tune it
  to tenure long-lived objects faster.
                          42
-XX:+PrintHeapAtGC


Heap after GC invocations=7000 (full 87):
  par new generation    total 4608000K, used 398455K
   eden space 4096000K,    0% used
   from space 512000K, 77% used
   to   space 512000K,    0% used
  concurrent mark-sweep generation total 3072000K, used 1565157K
  concurrent-mark-sweep perm gen total 53256K, used 31889K
}




                                43
-XX:+PrintTenuringDistribution

Desired   survivor size   262144000 bytes, new threshold 4 (max 4)
- age     1: 137474336    bytes, 137474336 total
- age     2:   37725496   bytes, 175199832 total
- age     3:   23551752   bytes, 198751584 total
- age     4:   14772272   bytes, 213523856 total



 • Things of interest:
  • Number of ages
  • Size distribution in ages
    • You want strongly declining.
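"Strongly declining" can be quantified by dividing each age's bytes by the previous age's. The sketch below does that for the sample output above. Treating neighbouring ages in one snapshot as the same cohort assumes a steady allocation rate, and the class name is hypothetical.

```java
class TenuringDistributionSketch {
    // Bytes per age (ages 1 to 4), copied from the sample output above.
    static final long[] BYTES_BY_AGE = {137474336L, 37725496L, 23551752L, 14772272L};
    static final long DESIRED_SURVIVOR_SIZE = 262144000L;

    // Approximate fraction of bytes at the given age (1-based) that survive
    // one more collection, assuming a steady allocation rate.
    static double survivalRatio(int age) {
        return (double) BYTES_BY_AGE[age] / BYTES_BY_AGE[age - 1];
    }

    static long totalBytes() {
        long total = 0;
        for (long bytes : BYTES_BY_AGE) {
            total += bytes;
        }
        return total;
    }

    public static void main(String[] args) {
        for (int age = 1; age < BYTES_BY_AGE.length; age++) {
            System.out.printf("age %d -> %d: %.2f survive%n", age, age + 1, survivalRatio(age));
        }
        System.out.println("total " + totalBytes() + " of desired " + DESIRED_SURVIVOR_SIZE);
    }
}
```

In this sample roughly a quarter of the age-1 bytes reach age 2, while the later ratios flatten out, which suggests the remaining objects are long-lived.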
                                   44
Tuning the CMS

• Give your app as much memory as possible.
 • CMS is speculative. More memory, less punitive
   miscalculations.
• Try using CMS without tuning. Use -verbosegc and
  -XX:+PrintGCDetails.

 • Didn’t get any “Full GC” messages? You’re done!
• Otherwise, tune the young generation first.
                         45
Tuning the old generation


 • Goals:
  • Keep the fragmentation low.
  • Avoid full GC stops.
 • Fortunately, the two goals are not conflicting.

                          46
Tuning the old generation


 • Find the minimum and maximum working set size
   (observe “Full GC” numbers under stable state and
   under load).
 • Overprovision the numbers by 25-33%.
  • This gives CMS a cushion to concurrently clean
    memory as it’s used.

                           47
Tuning the old generation


 • Set -XX:CMSInitiatingOccupancyFraction to a value
   between 80 and 75, respectively.
  • corresponds to the overprovisioned heap ratio.
 • You can lower the initiating occupancy fraction to 0 if
   you have CPU to spare.
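The 80 and 75 figures follow from the 25-33% overprovisioning. A short derivation, writing W for the working set size and p for the overprovisioning factor (both symbols are introduced here for illustration):

```latex
\[
H = (1 + p)\,W
\quad\Longrightarrow\quad
\frac{W}{H} = \frac{1}{1 + p},
\qquad
\frac{1}{1.25} = 0.80,
\qquad
\frac{1}{1.33} \approx 0.75
\]
```

So with 25% headroom the working set fills 80% of the old generation, and with 33% headroom about 75%.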


                         48
Responsiveness still not
     good enough?
• Too many live objects during young gen GC:
 • Reduce NewSize, reduce survivor spaces, reduce
   tenuring threshold.
• Too many threads:
 • Find the minimal concurrency level, or
 • split the service into several JVMs.
                          49
Responsiveness still not
    good enough?
• Does the CMS abortable preclean phase, well,
  abort?
 • It is sensitive to number of objects in the new
   generation, so
   • go for smaller new generation
   • try to reduce the amount of short-lived garbage
     your app creates.

                         50
Part III:
let’s take a break from GC



            51
Thread coordination
        optimization

• You don’t have to always go for synchronized.
• Synchronization is a read barrier on entry; write
  barrier on exit.
• Sometimes you only need a half-barrier, e.g. in a
  producer-observer pattern.
• Volatiles can be used as half-barriers.
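A minimal sketch of the producer-observer idea with a volatile flag follows. It assumes a single producer that publishes once; the class and field names are hypothetical.

```java
class ProducerObserverSketch {
    private int[] payload;          // plain field, written only by the producer
    private volatile boolean ready; // the volatile flag acts as the half-barrier

    void produce(int[] data) {
        payload = data; // ordinary write...
        ready = true;   // ...published by the volatile write that follows it
    }

    int[] observe() {
        // Volatile read: if it sees true, the payload written before the flag
        // is visible as well. Null means nothing has been published yet.
        return ready ? payload : null;
    }

    public static void main(String[] args) throws InterruptedException {
        ProducerObserverSketch channel = new ProducerObserverSketch();
        Thread producer = new Thread(() -> channel.produce(new int[] {1, 2, 3}));
        producer.start();

        int[] seen;
        while ((seen = channel.observe()) == null) {
            Thread.yield(); // spin until the producer has published
        }
        producer.join();
        System.out.println("observed " + seen.length + " values");
    }
}
```

No lock is taken on either side; the guarantee comes from the Java memory model's happens-before rule for volatile writes and reads.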
                            52
Thread coordination
        optimization

• For atomic update of a single value, you only need
  Atomic{Integer|Long}.compareAndSet().

• You can use AtomicReference.compareAndSet() for
  atomic update of composite values represented by
  immutable objects.
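A sketch of the AtomicReference approach: the composite value is an immutable object, and each update builds a new one and retries until compareAndSet succeeds. The (min, max) pair and all class names are made up for the illustration.

```java
import java.util.concurrent.atomic.AtomicReference;

// Immutable composite value: a hypothetical (min, max) pair.
final class RangeSketch {
    final int min;
    final int max;

    RangeSketch(int min, int max) {
        this.min = min;
        this.max = max;
    }

    // Returns a new range widened to include the value; this object is never modified.
    RangeSketch include(int value) {
        return new RangeSketch(Math.min(min, value), Math.max(max, value));
    }
}

class AtomicRangeSketch {
    // Starts as an "empty" range so that the first recorded value sets both bounds.
    private final AtomicReference<RangeSketch> ref =
            new AtomicReference<>(new RangeSketch(Integer.MAX_VALUE, Integer.MIN_VALUE));

    void record(int value) {
        while (true) {
            RangeSketch current = ref.get();
            RangeSketch updated = current.include(value);
            if (ref.compareAndSet(current, updated)) {
                return; // both fields changed atomically, as one reference swap
            }
            // Another thread won the race; loop and retry against the fresh value.
        }
    }

    RangeSketch snapshot() {
        return ref.get();
    }

    public static void main(String[] args) {
        AtomicRangeSketch range = new AtomicRangeSketch();
        range.record(5);
        range.record(-3);
        range.record(10);
        RangeSketch r = range.snapshot();
        System.out.println("min=" + r.min + " max=" + r.max); // prints min=-3 max=10
    }
}
```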


                          53
Fight CMS fragmentation
   with slab allocators

• CMS doesn’t compact, so it’s prone to fragmentation,
  which will lead to a stop-the-world pause.
• Apache Cassandra uses a slab allocator internally.


                           54
Cassandra slab allocator


 • 2MB slab sizes
 • copy byte[] into them using compare-and-set
 • GC before: 30-60 seconds every hour
 • GC after: 5 seconds once in 3 days and 10 hours
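To illustrate the technique, here is a minimal slab sketch that reserves space with compare-and-set and then copies the bytes in. It is not Cassandra's actual implementation; the class name, the -1 "slab full" convention and the on-heap byte[] backing are assumptions made for the example.

```java
import java.util.concurrent.atomic.AtomicInteger;

class SlabSketch {
    static final int SLAB_SIZE = 2 * 1024 * 1024; // 2 MB, as on the slide

    private final byte[] slab = new byte[SLAB_SIZE];
    private final AtomicInteger nextFree = new AtomicInteger(0);

    // Copies data into the slab and returns its offset, or -1 if the slab is
    // full (the caller would then move on to a fresh slab).
    int allocate(byte[] data) {
        while (true) {
            int offset = nextFree.get();
            if (data.length > SLAB_SIZE - offset) {
                return -1;
            }
            // Reserve the region first; only the thread that wins the CAS copies into it.
            if (nextFree.compareAndSet(offset, offset + data.length)) {
                System.arraycopy(data, 0, slab, offset, data.length);
                return offset;
            }
        }
    }

    byte byteAt(int offset) {
        return slab[offset];
    }

    public static void main(String[] args) {
        SlabSketch slab = new SlabSketch();
        int first = slab.allocate(new byte[] {1, 2, 3});
        int second = slab.allocate(new byte[] {4, 5});
        System.out.println("offsets: " + first + ", " + second);
    }
}
```

Because many small values live inside one large, long-lived array, the old generation sees a few big objects instead of many small ones, which is what keeps fragmentation down.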

                       55
Slab allocator constraints
• Works for limited usage:
 • Buffers are written to linearly, flushed to disk and
   recycled when they fill up.
 • The objects need to be converted to binary
   representation anyway.
• If you need random freeing and compaction, you’re
  heading in the wrong direction.
• If you find yourself writing a full memory manager
  on top of byte buffers, stop!
                          56
Soft references revisited

• Soft reference clearing is based on the amount of
  free memory available when GC encounters the
  reference.
• By definition, throughput collectors always clear
  them.
• Can use them with CMS, but they increase memory
  pressure and make the behavior less predictable.
• Need two GC cycles to get rid of referenced objects.
                           57
More than I ever
wanted to learn about
JVM performance
       tuning
      @twitter
   Questions?
        58


Everything I Ever Learned About JVM Performance Tuning @Twitter

  • 1. Attila Szegedi, Software Engineer @asz 1
  • 2. Everything I ever learned about JVM performance tuning @twitter 2
  • 3. Everything More than I ever wanted to learned about JVM performance tuning @twitter 3
  • 4. • Memory tuning • CPU usage tuning • Lock contention tuning • I/O tuning 4
  • 8. Twitter’s biggest enemy Latency CC licensed image from http://www.flickr.com/photos/dunechaser/ 213255210/ 5
  • 9. Latency contributors • By far the biggest contributor is garbage collector • others are, in no particular order: • in-process locking and thread scheduling, • I/O, • application algorithmic inefficiencies. 6
  • 10. Areas of performance tuning • Memory tuning • Lock contention tuning • CPU usage tuning • I/O tuning 7
  • 11. Areas of memory performance tuning • Memory footprint tuning • Allocation rate tuning • Garbage collection tuning 8
  • 12. Memory footprint tuning • So you got an OutOfMemoryError… • Maybe you just have too much data! • Maybe your data representation is fat! • You can also have a genuine memory leak… 9
  • 13. Too much data • Run with -verbosegc • Observe numbers in “Full GC” messages secs] [Full GC $before->$after($total), $time • Can you give the JVM more memory? • Do you need all that data in memory? Consider using: • a LRU cache, or… • soft references* 10
  • 14. Fat data • Can be a problem when you want to do wacky things, like • load the full Twitter social graph in a single JVM • load all user metadata in a single JVM • Slimming internal data representation works at these economies of scale 11
  • 15. Fat data: object header • JVM object header is normally two machine words. • That’s 16 bytes, or 128 bits on a 64-bit JVM! • new java.lang.Object() takes 16 bytes. • new byte[0] takes 24 bytes. 12
  • 16. Fat data: padding class A { byte x; } class B extends A { byte y; } • new A() takes 24 bytes. • new B() takes 32 bytes. 13
  • 17. Fat data: no inline structs class C { Object obj = new Object(); } • new C() takes 40 bytes. • similarly, no inline array elements. 14
  • 18. Slimming taken to extreme • A research project had to load the full follower graph in memory • Each vertex’s edges ended up being represented as int arrays • If it grows further, we can consider variable- length differential encoding in a byte array 15
  • 19. Compressed object pointers • Pointers become 4 bytes long • Usable below 32 GB of max heap size • Automatically used below 30 GB of max heap 16
  • 20. Compressed object pointers Uncompressed Compressed 32-bit Pointer 8 4 4 Object header 16 12* 8 Array header 24 16 12 Superclass pad 8 4 4 * Object can have 4 bytes of fields and still only take up 16 bytes 17
  • 21. Avoid instances of primitive wrappers • Hard won experience with Scala 2.7.7: • a Seq[Int] stores java.lang.Integer • an Array[Int] stores int • first needs (24 + 32 * length) bytes • second needs (24 + 4 * length) bytes 18
  • 22. Avoid instances of primitive wrappers • This was fixed in Scala 2.8, but it shows that: • you often don’t know the performance characteristics of your libraries, • and won’t ever know them until you run your application under a profiler. 19
  • 23. Map footprints • Guava MapMaker.makeMap() takes 2272 bytes! • MapMaker.concurrencyLevel(1).makeMap() takes 352 bytes! • ConcurrentMap with level 1 makes sense sometimes (i.e. you don’t want a ConcurrentModificationException) 20
  • 24. Thrift can be heavy • Thrift generated classes are used to encapsulate a wire tranfer format. • Using them as your domain objects: almost never a good idea. 21
  • 25. Thrift can be heavy • Every Thrift class with a primitive field has a java.util.BitSet __isset_bit_vector field. • It adds between 52 and 72 bytes of overhead per object. 22
  • 26. Thrift can be heavy 23
  • 27. Thrift can be heavy • Thrift does not support 32-bit floats. • Coupling domain model with transport: • resistance to change domain model • You also miss oportunities for interning and N-to-1 normalization. 24
  • 28. class Location { public String city; public String region; public String countryCode; public int metro; public List<String> placeIds; public double lat; public double lon; public double confidence; 25
  • 29. class SharedLocation { public String city; public String region; public String countryCode; public int metro; public List<String> placeIds; class UniqueLocation { private SharedLocation sharedLocation; public double lat; public double lon; public double confidence; 26
  • 30. Careful with thread locals • Thread locals stick around. • Particularly problematic in thread pools with m⨯n resource association. • 200 pooled threads using 50 connections: you end up with 10 000 connection buffers. • Consider using synchronized objects, or • just create new objects all the time. 27
  • 32. Performance tradeoff Memory Time Convenient, but oversimplified view. 29
  • 33. Performance triangle Memory footprint Throughput Latency 30
  • 34. Performance triangle Compactness Throughput Responsiveness C ⨯T ⨯ R = a • Tuning: vary C, T, R for fixed a • Optimization: increase a 31
  • 35. Performance triangle • Compactness: inverse of memory footprint • Responsiveness: longest pause the application will experience • Throughput: amount of useful application CPU work over time • Can trade one for the other, within limits. • If you have spare CPU, can be pure win. 32
  • 36. Responsiveness vs. throughput 33
  • 37. Biggest threat to responsiveness in the JVM is the garbage collector 34
  • 38. Memory pools Eden Survivor Old Code Permanent cache This is entirely HotSpot specific! 35
  • 39. How does young gen work? Eden S1 S2 Old • All new allocation happens in eden. • It only costs a pointer bump. • When eden fills up, stop-the-world copy-collection into the survivor space. • Dead objects cost zero to collect. • Aftr several collections, survivors get tenured into old generation. 36
  • 40. Ideal young gen operation • Big enough to hold more than one set of all concurrent request-response cycle objects. • Each survivor space big enough to hold active request objects + tenuring ones. • Tenuring threshold such that long-lived objects tenure fast. 37
  • 41. Old generation collectors • Throughput collectors • -XX:+UseSerialGC • -XX:+UseParallelGC • -XX:+UseParallelOldGC • Low-pause collectors • -XX:+UseConcMarkSweepGC • -XX:+UseG1GC (can’t discuss it here) 38
  • 42. Adaptive sizing policy • Throughput collectors can automatically tune themselves: • -XX:+UseAdaptiveSizePolicy • -XX:MaxGCPauseMillis=… (i.e. 100) • -XX:GCTimeRatio=… (i.e. 19) 39
  • 44. Choose a collector • Bulk service: throughput collector, no adaptive sizing policy. • Everything else: try throughput collector with adaptive sizing policy. If it didn’t work, use concurrent mark-and-sweep (CMS). 41
  • 45. Always start with tuning the young generation • Enable -XX:+PrintGCDetails, -XX:+PrintHeapAtGC, and -XX:+PrintTenuringDistribution. • Watch survivor sizes! You’ll need to determine “desired survivor size”. • There’s no such thing as a “desired eden size”, mind you. The bigger, the better, with some responsiveness caveats. • Watch the tenuring threshold; might need to tune it to tenure long lived objects faster. 42
  • 46. -XX:+PrintHeapAtGC Heap after GC invocations=7000 (full 87): par new generation total 4608000K, used 398455K eden space 4096000K, 0% used from space 512000K, 77% used to space 512000K, 0% used concurrent mark-sweep generation total 3072000K, used 1565157K concurrent-mark-sweep perm gen total 53256K, used 31889K } 43
  • 47. -XX:+PrintTenuringDistribution Desired survivor size 262144000 bytes, new threshold 4 (max 4) - age 1: 137474336 bytes, 137474336 total - age 2: 37725496 bytes, 175199832 total - age 3: 23551752 bytes, 198751584 total - age 4: 14772272 bytes, 213523856 total • Things of interest: • Number of ages • Size distribution in ages • You want strongly declining. 44
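Checking for a "strongly declining" distribution can be automated over a captured log. The sketch below parses the age lines shown above and tests whether each age holds at most a given fraction of the previous one. The class name and the ratio threshold are assumptions chosen for illustration; the talk does not give a numeric cutoff.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical parser for -XX:+PrintTenuringDistribution age lines.
final class TenuringDistribution {
    // Matches lines such as "- age 1: 137474336 bytes, 137474336 total".
    private static final Pattern AGE_LINE =
            Pattern.compile("- age\\s+(\\d+):\\s+(\\d+) bytes,\\s+(\\d+) total");

    // Returns the bytes recorded for each age, in order of appearance.
    static List<Long> bytesPerAge(String log) {
        List<Long> sizes = new ArrayList<>();
        Matcher m = AGE_LINE.matcher(log);
        while (m.find()) {
            sizes.add(Long.parseLong(m.group(2)));
        }
        return sizes;
    }

    // True if every age holds at most maxRatio times the bytes of the previous age.
    static boolean stronglyDeclining(List<Long> sizes, double maxRatio) {
        for (int i = 1; i < sizes.size(); i++) {
            if (sizes.get(i) > sizes.get(i - 1) * maxRatio) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String log = "- age 1: 137474336 bytes, 137474336 total "
                + "- age 2: 37725496 bytes, 175199832 total "
                + "- age 3: 23551752 bytes, 198751584 total "
                + "- age 4: 14772272 bytes, 213523856 total";
        List<Long> sizes = bytesPerAge(log);
        System.out.println(sizes + " declining(0.7)=" + stronglyDeclining(sizes, 0.7));
    }
}
```

For the sample above, age 2 holds roughly 27% of age 1, while ages 3 and 4 each hold a little over 60% of the age before them, so the drop is steep at first and then flattens.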
  • 48. Tuning the CMS • Give your app as much memory as possible. • CMS is speculative. More memory, less punitive miscalculations. • Try using CMS without tuning. Use -verbosegc and -XX:+PrintGCDetails. • Didn’t get any “Full GC” messages? You’re done! • Otherwise, tune the young generation first. 45
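Besides reading the GC log, collection counts and accumulated collection times can be sampled from inside the process through the standard `java.lang.management` beans. This is a complementary sketch, not something the talk prescribes; the class name `GcSummary` is made up, and the bean names and what each bean counts depend on the JVM and the collector in use, so the log remains the authority on "Full GC" events.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper that summarizes the JVM's garbage collector beans.
final class GcSummary {
    static String summary() {
        StringBuilder sb = new StringBuilder();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            sb.append(gc.getName()).append(": ")
              .append(gc.getCollectionCount()).append(" collections, ")
              .append(gc.getCollectionTime()).append(" ms total")
              .append(System.lineSeparator());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(summary());
    }
}
```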
  • 49. Tuning the old generation • Goals: • Keep the fragmentation low. • Avoid full GC stops. • Fortunately, the two goals are not conflicting. 46
  • 50. Tuning the old generation • Find the minimum and maximum working set size (observe “Full GC” numbers under stable state and under load). • Overprovision the numbers by 25-33%. • This gives CMS a cushion to concurrently clean memory as it’s used. 47
  • 51. Tuning the old generation • Set -XX:CMSInitiatingOccupancyFraction to 80 or 75, respectively (for 25% or 33% overprovisioning). • This corresponds to the overprovisioned heap ratio. • You can lower the initiating occupancy fraction to 0 if you have CPU to spare. 48
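The relation between the overprovisioning percentage and the initiating occupancy fraction is simple arithmetic: if the old generation is sized at the working set times (1 + overprovision), the working set fills 100 / (1 + overprovision) percent of it. The sketch below works through that under stated assumptions; the class and method names are hypothetical and the 2 GB working set is an invented input.

```java
// Hypothetical sizing helper for the CMS old generation.
final class CmsSizing {
    // Old generation size for a measured working set and an overprovision ratio (0.25 = 25%).
    static long oldGenBytes(long workingSetBytes, double overprovision) {
        return (long) Math.ceil(workingSetBytes * (1.0 + overprovision));
    }

    // Occupancy percentage at which the working set alone fills the old generation.
    static int initiatingOccupancyPercent(double overprovision) {
        return (int) Math.floor(100.0 / (1.0 + overprovision));
    }

    public static void main(String[] args) {
        long workingSet = 2L * 1024 * 1024 * 1024; // assumed 2 GB working set
        System.out.println(oldGenBytes(workingSet, 0.25) + " bytes, start CMS at "
                + initiatingOccupancyPercent(0.25) + "%");
        System.out.println(oldGenBytes(workingSet, 0.33) + " bytes, start CMS at "
                + initiatingOccupancyPercent(0.33) + "%");
    }
}
```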
  • 52. Responsiveness still not good enough? • Too many live objects during young gen GC: • Reduce NewSize, reduce survivor spaces, reduce tenuring threshold. • Too many threads: • Find the minimal concurrency level, or • split the service into several JVMs. 49
  • 53. Responsiveness still not good enough? • Does the CMS abortable preclean phase, well, abort? • It is sensitive to number of objects in the new generation, so • go for smaller new generation • try to reduce the amount of short-lived garbage your app creates. 50
  • 54. Part III: let’s take a break from GC 51
  • 55. Thread coordination optimization • You don’t have to always go for synchronized. • Synchronization acts as a read barrier on entry and a write barrier on exit. • Sometimes you only need a half-barrier, e.g. in a producer-observer pattern. • Volatiles can be used as half-barriers. 52
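A producer-observer arrangement with a volatile field might look like the sketch below. It assumes a single producer thread and any number of observers; the class name `LatestSnapshot` is invented for illustration. The volatile write publishes everything the producer wrote into the array beforehand, and the volatile read makes it visible to the observer, with no lock on either side.

```java
// Hypothetical single-producer, many-observer holder using a volatile field.
final class LatestSnapshot {
    // The volatile write in publish() happens-before any read that sees the new reference.
    private volatile int[] latest;

    // Called by the single producer. The array must not be modified after publishing.
    void publish(int[] snapshot) {
        latest = snapshot;
    }

    // Called by observers. Returns null until something has been published.
    int[] observe() {
        return latest;
    }

    public static void main(String[] args) {
        LatestSnapshot holder = new LatestSnapshot();
        holder.publish(new int[] {1, 2, 3});
        System.out.println(holder.observe().length);
    }
}
```

This only works because there is one writer and each published array is treated as immutable. With several writers that need read-modify-write semantics, a compare-and-set loop or a lock is needed instead.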
  • 56. Thread coordination optimization • For atomic update of a single value, you only need Atomic{Integer|Long}.compareAndSet(). • You can use AtomicReference.compareAndSet() for atomic update of composite values represented by immutable objects. 53
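The composite-value case can be sketched as a retry loop over an immutable snapshot object. The `RequestStats` and `Snapshot` names and the two fields are assumptions made up for this example; `AtomicReference.compareAndSet()` is the standard library call the slide refers to.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stats holder: two related values are updated atomically as one immutable object.
final class RequestStats {
    // Immutable composite value; a new instance is created for every update.
    private static final class Snapshot {
        final long count;
        final long totalMillis;

        Snapshot(long count, long totalMillis) {
            this.count = count;
            this.totalMillis = totalMillis;
        }
    }

    private final AtomicReference<Snapshot> state =
            new AtomicReference<>(new Snapshot(0, 0));

    // Retries until no other thread has replaced the snapshot in between.
    void record(long millis) {
        Snapshot current;
        Snapshot next;
        do {
            current = state.get();
            next = new Snapshot(current.count + 1, current.totalMillis + millis);
        } while (!state.compareAndSet(current, next));
    }

    long count() {
        return state.get().count;
    }

    long totalMillis() {
        return state.get().totalMillis;
    }

    public static void main(String[] args) {
        RequestStats stats = new RequestStats();
        stats.record(10);
        stats.record(30);
        System.out.println(stats.count() + " requests, " + stats.totalMillis() + " ms");
    }
}
```

The cost of this approach is one small allocation per update, which is the trade against holding a lock.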
  • 57. Fight CMS fragmentation with slab allocators • CMS doesn’t compact, so it’s prone to fragmentation, which will lead to a stop-the-world pause. • Apache Cassandra uses a slab allocator internally. 54
  • 58. Cassandra slab allocator • 2MB slab sizes • copy byte[] into them using compare-and-set • GC before: 30-60 seconds every hour • GC after: 5 seconds once in 3 days and 10 hours 55
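The idea of copying byte arrays into a slab with compare-and-set can be sketched as a bump-pointer allocator. This is a simplified illustration under assumptions, not Cassandra's actual implementation: the `Slab` class, its method names, and the "return -1 when full" convention are all invented here.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical slab: one large array that many threads append into without locking.
final class Slab {
    private final byte[] data;
    // Offset of the first free byte; advanced with compare-and-set.
    private final AtomicInteger nextFree = new AtomicInteger();

    Slab(int sizeBytes) {
        data = new byte[sizeBytes];
    }

    // Copies src into the slab and returns its offset, or -1 if it does not fit.
    int allocate(byte[] src) {
        while (true) {
            int start = nextFree.get();
            if (src.length > data.length - start) {
                return -1; // slab is full; the caller would move on to a fresh slab
            }
            // Reserve the region first; only the winning thread copies into it.
            if (nextFree.compareAndSet(start, start + src.length)) {
                System.arraycopy(src, 0, data, start, src.length);
                return start;
            }
        }
    }

    byte byteAt(int offset) {
        return data[offset];
    }

    public static void main(String[] args) {
        Slab slab = new Slab(2 * 1024 * 1024); // 2MB, as on the slide
        System.out.println(slab.allocate(new byte[] {1, 2, 3}));
    }
}
```

Because the garbage collector sees one long-lived 2MB array instead of many small byte arrays of varying lifetimes, the old generation fragments far less.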
  • 59. Slab allocator constraints • Works for limited usage: • Buffers are written to linearly, flushed to disk and recycled when they fill up. • The objects need to be converted to binary representation anyway. • If you need random freeing and compaction, you’re heading down the wrong direction. • If you find yourself writing a full memory manager on top of byte buffers, stop! 56
  • 60. Soft references revisited • Soft reference clearing is based on the amount of free memory available when GC encounters the reference. • By definition, throughput collectors always clear them. • Can use them with CMS, but they increase memory pressure and make the behavior less predictable. • Need two GC cycles to get rid of referenced objects. 57
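For reference, a soft-reference cache typically has the shape sketched below. The `SoftCache` class and its loader function are hypothetical; `java.lang.ref.SoftReference` is the standard class. The caveats above apply in full: when entries get cleared is up to the collector, so every read has to be prepared to reload.

```java
import java.lang.ref.SoftReference;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical cache whose values may be cleared by the GC under memory pressure.
final class SoftCache<K, V> {
    private final ConcurrentHashMap<K, SoftReference<V>> map = new ConcurrentHashMap<>();
    private final Function<K, V> loader;

    SoftCache(Function<K, V> loader) {
        this.loader = loader;
    }

    // Returns the cached value, reloading it if it was never loaded or has been cleared.
    // Under concurrent access the loader may run more than once for the same key.
    V get(K key) {
        SoftReference<V> ref = map.get(key);
        V value = (ref == null) ? null : ref.get();
        if (value == null) {
            value = loader.apply(key);
            map.put(key, new SoftReference<>(value));
        }
        return value;
    }

    public static void main(String[] args) {
        SoftCache<String, Integer> lengths = new SoftCache<>(String::length);
        System.out.println(lengths.get("twitter"));
    }
}
```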
  • 61. More than I ever wanted to learn about JVM performance tuning @twitter Questions? 58
  • 62. Attila Szegedi, Software Engineer @asz 59
