Forgive me for I have allocated
Tomasz Kowalczewski
Log4j2 Logger Quiz
How many objects does this loop create
when debug logging is disabled?
while (true) {
    LOGGER.debug("{} {}", "a", "b");
}

Log4j2 Logger
while (true) {
    LOGGER.debug("{} {}", "a", "b");
}

@Override
public void debug(String format, Object arg1, Object arg2) {
    logger.logIfEnabled(FQCN, Level.DEBUG, null, format, arg1, arg2);
}

// Object... params: the caller wraps arg1 and arg2 in a new Object[] on every call
void logIfEnabled(String fqcn, Level level,
        Marker marker, String message, Object... params);
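The quiz hinges on the `Object... params` parameter: the compiler builds a fresh `Object[]` at the call site before the callee has any chance to check the level. A minimal sketch of that effect, using a hypothetical `debugVarargs` stand-in rather than the real Log4j2 classes (the JIT may remove such an array after inlining, but that is not guaranteed):

```java
final class VarargsAllocationDemo {
    // Hypothetical stand-in for a disabled logger: it only remembers the array it was handed.
    static Object[] lastParams;

    static void debugVarargs(String format, Object... params) {
        lastParams = params; // a real disabled logger would return without touching params
    }

    // Calls the varargs method twice with identical arguments and compares the arrays.
    static boolean freshArrayPerCall() {
        debugVarargs("{} {}", "a", "b");
        Object[] first = lastParams;
        debugVarargs("{} {}", "a", "b");
        Object[] second = lastParams;
        return first != second;
    }

    public static void main(String[] args) {
        System.out.println("fresh array per call: " + freshArrayPerCall());
    }
}
```

Each call receives its own array, so a loop like the one in the quiz produces one `Object[]` per iteration unless the compiler can prove the array never escapes.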
String::contains Quiz
String containsTest = "contains or not?";
StringBuilder builder = new StringBuilder("Test string that is long");

while (true) {
    if (containsTest.contains(builder)) {
        System.out.println("Contains! (??)");
    }
}
String::contains Quiz
public boolean contains(CharSequence s) {
    // s.toString() copies a StringBuilder into a new String on every call
    return indexOf(s.toString()) > -1;
}
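One way around the `toString()` copy is to search through the `CharSequence` interface directly. The following is a minimal sketch with a naive scan; the class and method names are made up for illustration, and it is not how the JDK implements `contains`:

```java
final class CharSequenceSearch {
    /** Returns true if needle occurs in haystack, reading both only through charAt(). */
    static boolean contains(CharSequence haystack, CharSequence needle) {
        int n = haystack.length();
        int m = needle.length();
        for (int start = 0; start <= n - m; start++) {
            int i = 0;
            while (i < m && haystack.charAt(start + i) == needle.charAt(i)) {
                i++;
            }
            if (i == m) {
                return true; // every character of needle matched at this offset
            }
        }
        return false;
    }

    public static void main(String[] args) {
        StringBuilder builder = new StringBuilder("or not");
        System.out.println(contains("contains or not?", builder));
    }
}
```

The scan is quadratic in the worst case, which is usually acceptable for short needles; the point is only that no intermediate String is created.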
Sins of the young… GC
Stop the world
Time proportional to number of surviving objects
and old gen size (card scanning!)
Card table
Old gen: X MB
Card table (X*2000 entries)
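The "X*2000" figure can be reproduced with simple arithmetic. The sketch below assumes 512-byte cards with one table entry per card, which is the usual HotSpot default; the constant and class names are illustrative only:

```java
final class CardTableSize {
    // Assumption: one card covers 512 bytes of old generation.
    static final int ASSUMED_CARD_SIZE_BYTES = 512;

    /** Number of card table entries needed to cover an old generation of the given size. */
    static long cardCount(long oldGenBytes) {
        return (oldGenBytes + ASSUMED_CARD_SIZE_BYTES - 1) / ASSUMED_CARD_SIZE_BYTES;
    }

    public static void main(String[] args) {
        long oneMb = 1L << 20;
        System.out.println(cardCount(oneMb));        // 2048 entries per MB
        System.out.println(cardCount(4096 * oneMb)); // 8388608 entries for a 4 GB old gen
    }
}
```

Under that assumption a 4 GB old generation means roughly eight million entries to scan on every young collection, regardless of how little survives.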
Sins of the young… GC
Trashing processor caches, TLB, page table, NUMA
Intel cache hierarchies
L1 & L2 caches are core local
L3 is shared among all cores in a socket and contains all data in L1 & L2
Putting data into the local cache of one core may evict data from the local caches of another core
Sins of the young… GC
No algorithm is lock free (let alone wait free) if it allocates objects on its critical path
Amdahl’s Law
…the effort expended on achieving high parallel
processing rates is wasted unless it is accompanied by
achievements in sequential processing rates of very
nearly the same magnitude.
[Chart: Amdahl's Law speedup curves for serial fractions of 1%, 2% and 3%, on up to 1024 processors. The serial part is time spent in GC.]
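The numbers behind such a chart follow from the usual form of Amdahl's Law, speedup = 1 / (s + (1 - s) / n), where s is the serial fraction and n the number of processors. A small sketch, treating GC time as the serial fraction:

```java
final class AmdahlsLaw {
    /** Theoretical speedup for a given serial fraction (0..1) and processor count. */
    static double speedup(double serialFraction, int processors) {
        return 1.0 / (serialFraction + (1.0 - serialFraction) / processors);
    }

    public static void main(String[] args) {
        double[] serialFractions = {0.01, 0.02, 0.03};
        for (double s : serialFractions) {
            System.out.println(s + " serial -> speedup on 1024 processors: " + speedup(s, 1024));
        }
    }
}
```

With 1024 processors the speedup comes out at roughly 91x, 48x and 32x for 1%, 2% and 3% serial time, so even a small stop-the-world share caps scalability far below the core count.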
Escape analysis
After escape analysis, the server compiler eliminates
scalar replaceable object allocations and associated
locks from generated code. The server compiler also
eliminates locks for all non-globally escaping objects. It
does not replace a heap allocation with a stack
allocation for non-globally escaping objects.
https://docs.oracle.com/javase/7/docs/technotes/guides/vm/performance-enhancements-7.html#escapeAnalysis
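To make the distinction in the quote concrete, here is a hedged sketch of a non-escaping versus a globally escaping allocation. Whether the JIT actually removes the first allocation depends on inlining and compiler heuristics, so treat the comments as what is permitted rather than what is guaranteed; all names are illustrative:

```java
final class EscapeAnalysisDemo {
    static final class Point {
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    // The Point never leaves the loop body, so it is a candidate for scalar replacement.
    static long sumNonEscaping(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            Point p = new Point(i, i + 1);
            sum += p.x + p.y;
        }
        return sum;
    }

    static Point leaked;

    // Publishing the Point through a static field makes it globally escaping,
    // so each iteration needs a real heap allocation.
    static long sumEscaping(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            Point p = new Point(i, i + 1);
            leaked = p;
            sum += p.x + p.y;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumNonEscaping(1000) + " " + sumEscaping(1000));
    }
}
```

Both methods return the same result; only an allocation profiler or GC log will show whether the first one was optimised.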
Profiling
YourKit et al.
Java Mission Control
Beware of false positives
http://psy-lob-saw.blogspot.de/2014/12/the-escape-of-arraylistiterator.html
Basic techniques
Use ThreadLocal objects
Created on first use
Confined to single thread
Less effective if threads are short-lived or there are thousands of them
JDK already uses this pattern
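The pattern above can be sketched with a per-thread scratch `StringBuilder`. The class name and the initial capacity of 256 are arbitrary choices for illustration:

```java
final class ReusableBuilder {
    // Created lazily on first use by each thread and then reused by that thread only.
    private static final ThreadLocal<StringBuilder> BUILDER =
            ThreadLocal.withInitial(() -> new StringBuilder(256));

    /** Returns this thread's builder, emptied and ready for reuse. */
    static StringBuilder get() {
        StringBuilder sb = BUILDER.get();
        sb.setLength(0);
        return sb;
    }

    /** Checks that a second thread receives its own instance. */
    static boolean differsAcrossThreads() {
        StringBuilder mine = get();
        StringBuilder[] other = new StringBuilder[1];
        Thread t = new Thread(() -> other[0] = get());
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return other[0] != null && other[0] != mine;
    }

    public static void main(String[] args) {
        System.out.println(get() == get());         // same thread, same instance
        System.out.println(differsAcrossThreads()); // other thread, other instance
    }
}
```

The builder must not be handed to code that keeps a reference beyond the current call, since the next `get()` on the same thread clears it.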
JDK Use of Thread Locals
• StringCoding:

private final static ThreadLocal<SoftReference<StringDecoder>> decoder =
        new ThreadLocal<>();

private final static ThreadLocal<SoftReference<StringEncoder>> encoder =
        new ThreadLocal<>();

• ThreadLocalCoders.encoderFor("UTF-8")
• ThreadLocalRandom
Strings
In Java 8
No easy way to get the bytes of a String without allocating a new array
No easy way to encode/decode Strings without allocation
Proposed for Java 9 (thanks to Richard Warburton)
Creating new Strings from a ByteBuffer and Charset
getBytes with an externally provided byte array or ByteBuffer and Charset
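With the existing NIO classes it is possible to get close to this by reusing a `CharsetEncoder` and writing into a caller-supplied `ByteBuffer`. The following is a sketch under stated assumptions: `ReusableUtf8Encoder` and its `maxChars` limit are hypothetical, the instance is not thread safe, and the encoder itself may still allocate internally in some cases:

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

final class ReusableUtf8Encoder {
    private final CharsetEncoder encoder = StandardCharsets.UTF_8.newEncoder();
    private final CharBuffer chars; // scratch buffer reused across calls

    ReusableUtf8Encoder(int maxChars) {
        this.chars = CharBuffer.allocate(maxChars);
    }

    /** Encodes text into out and returns the number of bytes written. */
    int encode(CharSequence text, ByteBuffer out) {
        chars.clear();
        for (int i = 0; i < text.length(); i++) {
            chars.put(text.charAt(i)); // throws BufferOverflowException if text exceeds maxChars
        }
        chars.flip();
        int start = out.position();
        encoder.reset();
        CoderResult result = encoder.encode(chars, out, true);
        if (result.isUnderflow()) {
            result = encoder.flush(out);
        }
        if (!result.isUnderflow()) {
            throw new IllegalArgumentException("cannot encode into the supplied buffer: " + result);
        }
        return out.position() - start;
    }

    public static void main(String[] args) {
        ReusableUtf8Encoder enc = new ReusableUtf8Encoder(64);
        ByteBuffer out = ByteBuffer.allocate(128);
        System.out.println(enc.encode("h\u00e9llo", out)); // 6 bytes: the accented letter takes two
    }
}
```

Combined with a ThreadLocal holding the encoder, this avoids the `byte[]` that `String.getBytes` returns on every call.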
ConcurrentHashMap::size
Java 6
    int[] mc = new int[segments.length];
    // Try a few times to get accurate count.
    // On failure due to
    // continuous async changes in table,
    // resort to locking.
    for (int k = 0; k < RETRIES_BEFORE_LOCK; ++k) {
        for (int i = 0; i < segments.length; ++i) {
            sum += segments[i].count;
            mcsum += mc[i] = segments[i].modCount;
        }
        ...
• Allocate array
• Record each segment's modCount in it
• Lock everything if modCount changes
Java 8
    CounterCell[] as = counterCells; CounterCell a;
    long sum = baseCount;
    if (as != null) {
        for (int i = 0; i < as.length; ++i) {
            if ((a = as[i]) != null)
                sum += a.value;
        }
    }
    return sum;
• Sum per-segment size counters updated by other operations
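The Java 8 approach, where writers maintain striped counters and `size()` merely sums them, can be sketched as follows. This is a simplified illustration and not the JDK code: it uses a fixed `AtomicLongArray`, picks the stripe from the thread id, and ignores cache-line padding:

```java
import java.util.concurrent.atomic.AtomicLongArray;

final class StripedCounter {
    private final AtomicLongArray cells;

    StripedCounter(int stripes) {
        this.cells = new AtomicLongArray(stripes);
    }

    /** Called by writers; threads mostly hit different cells, which reduces contention. */
    void add(long delta) {
        int index = (int) (Thread.currentThread().getId() % cells.length());
        cells.addAndGet(index, delta);
    }

    /** Allocation-free read: just sums whatever the cells hold right now. */
    long sum() {
        long sum = 0;
        for (int i = 0; i < cells.length(); i++) {
            sum += cells.get(i);
        }
        return sum;
    }

    /** Runs the given number of threads, each adding 1 repeatedly, and returns the total. */
    static long countFromThreads(int threads, int incrementsPerThread) {
        StripedCounter counter = new StripedCounter(8);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < incrementsPerThread; i++) {
                    counter.add(1);
                }
            });
            workers[t].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter.sum();
    }

    public static void main(String[] args) {
        System.out.println(countFromThreads(4, 10_000)); // 40000
    }
}
```

While writers are active the sum is only a moment-in-time estimate, which is the same trade-off the Java 8 `size()` makes in exchange for never allocating or locking.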
Splitter (Guava)
Alternative to
public String[] split(String regex)
Splitter (Guava)
public Iterable<String> split(final CharSequence sequence)
Does not force caller to use immutable String
Returns minimal interface that does the job
Does not commit to creating a new collection object (List<String> etc.); will return the next token when asked for
The bad: will return a new String for each token
Compare to String[] String.split(String)
API is the key
String[] String.split(String)
vs
Iterable<String> Splitter.split(final CharSequence sequence)
byte[] String.getBytes()
vs.
String.getBytes(byte[] copyTo, int offset);
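Taking the comparison one step further, an API can avoid even the per-token String by reporting token boundaries to a callback. The sketch below is hypothetical (`OffsetSplitter` and `TokenConsumer` are not Guava or JDK types) and handles a single-character separator only:

```java
final class OffsetSplitter {
    /** Receives each token as a [start, end) range of the original sequence. */
    interface TokenConsumer {
        void accept(CharSequence source, int start, int end);
    }

    static void split(CharSequence source, char separator, TokenConsumer consumer) {
        int tokenStart = 0;
        for (int i = 0; i < source.length(); i++) {
            if (source.charAt(i) == separator) {
                consumer.accept(source, tokenStart, i);
                tokenStart = i + 1;
            }
        }
        consumer.accept(source, tokenStart, source.length()); // trailing token, possibly empty
    }

    public static void main(String[] args) {
        StringBuilder input = new StringBuilder("a,bb,,ccc");
        int[] stats = new int[2]; // [0] = token count, [1] = total token length
        split(input, ',', (src, start, end) -> {
            stats[0]++;
            stats[1] += end - start;
        });
        System.out.println(stats[0] + " tokens, " + stats[1] + " chars"); // 4 tokens, 6 chars
    }
}
```

The capturing lambda in `main` is itself an allocation; on a hot path the consumer would be created once and reused. Callers that do want Strings can still call `source.subSequence(start, end).toString()` inside the callback, so the allocating variant remains available.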
Conclusion
If not for GC pauses we would not care at all about allocation. We would look at it ONLY when there are performance hotspots in code related to object construction and initialisation.
API is the key; a well-designed one gives freedom:
for users to choose a version that allocates or one that reuses user-supplied objects
for implementers to optimise over time without changing the API
Bonus: collections options
https://github.com/OpenHFT/Koloboke
Fast, space efficient
Uses one array to lay out keys and values
Provides hash sets, hash maps
With primitive specialisation
“Project was started as a Trove fork, but has nothing in common with Trove for already very long time”
Cliff Click's High Scale Lib
http://sourceforge.net/projects/high-scale-lib/
Old stuff but might still scale better
