Dynamic Language VMs
Ruby 1.9
Lourens Naude, WildfireApp.com
Background Independent Contractor Ruby / C / integrations
Well versed full stack
Architecture WildfireApp.com Social Marketing platform
Large whitelabel clients
Bursty traffic – Lady Gaga, EA, Gatorade etc.
 
RUBY VM INTERNALS ?
A GOOD CRAFTSMAN KNOWS HIS TOOLS
A BAD CRAFTSMAN BLAMES HIS TOOLS
Typical public facing apps
Interaction patterns
Request / response
Time
Event driven
Overheads
Data transfer (I/O)
Serialization / coercion (CPU)
VM – allocation, symbol tables etc. (CPU + mem)
Business requirements (CPU)
Ruby daemon - strace
Process 5856 detached
% time   calls  syscall
------  ------  -------------
 89.69    5092  recvfrom
  5.35    5093  sendto
  2.49   26300  stat
  2.05   11004  clock_gettime
Ruby daemon - ltrace
% time   calls  function
------  ------  --------
 95.78  635173  memcpy
  1.38   25862  malloc
  0.79   14984  free
  0.60   11403  strcmp
System Resources
Data latency
CPU cache
Memory – local
Disk – local
Memory + disk – remote
Record retrieval with ORM
Fetch results (local/remote memory + disk)
Serialization + conversion (CPU)
Object instantiation (CPU + memory)
Optional memcached (local or remote memory)
RUBY ?
Conversion – rows to hash
  # Assumes a loaded Rails environment with a users table
  require 'benchmark'
  Benchmark.bm do |b|
    b.report do
      1000.times { ActiveRecord::Base.connection.select_rows "SELECT * FROM users" }
    end
  end
        user     system      total        real
    0.300000   0.040000   0.340000 (  0.505095)
Conversion – rows to objects
  # Assumes a loaded Rails environment with a users table
  require 'benchmark'
  Benchmark.bm do |b|
    b.report do
      1000.times { ActiveRecord::Base.connection.select_all "SELECT * FROM users" }
    end
  end
        user     system      total        real
    0.510000   0.050000   0.560000 (  0.719201)
Instantiation
  require 'benchmark'
  Benchmark.bm do |b|
    b.report do
      100_000.times { 'string'.dup }
    end
  end
        user     system      total        real
    0.040000   0.000000   0.040000 (  0.043791)
Serialization – load + dump
  require 'benchmark'
  Benchmark.bm do |b|
    b.report do
      100_000.times { Marshal.load(Marshal.dump('ruby string')) }
    end
  end
        user     system      total        real
    1.660000   0.010000   1.670000 (  1.699882)
Roadmap
VM Architecture
Symbol table
Opcodes / instructions
Dispatch
Optimizations
Ruby language
Object model
Garbage Collection
Contexts and control flow
Concurrency
VM ARCHITECTURE
 
Changes
Ruby 1.8 artifacts
Parser && AST nodes
Object model
Garbage Collection
No immediate performance gains for String manipulation etc.
Codegen phase
Better optimization hooks
Faster runtime
AST AND CODEGEN
 
Abstract Syntax Tree (AST)
Structure
Grammar representation
Annotations attach semantics to nodes
Possible to refactor the tree – more nodes, less complexity
Example nodes
Literals, values and assignments
Method calls, arguments and return values
Jumps – if, else, iterators
Unconditional jumps – exceptions, retry etc.
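A rough way to look at a syntax tree from Ruby itself is the Ripper library bundled with Ruby 1.9. This is only a sketch: Ripper's S-expression output is a convenience view of the parse, not the VM's internal node structure, and the one-line program and its variable names are made up for illustration.

```ruby
require 'ripper'

# Hypothetical one-line program. `price` is never defined, which is fine
# because the source is only parsed, not executed.
source = "total = price * 2"

# Ripper.sexp returns nested arrays: [node_type, children...]
tree = Ripper.sexp(source)

# The root node wraps the list of top-level statements.
root_type, statements = tree
assignment = statements.first

p root_type         # type of the root node
p assignment.first  # type of the first statement (an assignment node)
p assignment        # the full subtree: target, operator, operands
```

The assignment subtree contains the literal, the method call on `price` and the target variable, which maps onto the node categories listed above.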
Code generation
How it works
Converts the AST to compiled code segments
Reduces a tree to a linear and ordered instruction set
Fast execution – no tree walking + native code
Workflow
Preprocessing – AST refactoring (not in YARV)
Codegen, nodes -> instruction sequences
Postprocessing – replace with optimal instruction sequences (peephole optimization)
Pre and postprocessing phases may be multiple passes
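The result of the codegen workflow above can be inspected on MRI 1.9 and later through RubyVM::InstructionSequence. A minimal sketch, assuming MRI (the class does not exist on other Ruby implementations); the exact listing differs between Ruby versions, so no output is shown here.

```ruby
# MRI-specific: compile a source string into an instruction sequence
# without running it.
source = "x = 1 + 2"
iseq = RubyVM::InstructionSequence.compile(source)

# disasm returns a human-readable listing of the linear instruction
# sequence produced from the tree.
listing = iseq.disasm
puts listing

# The compiled sequence can then be executed by the VM.
result = iseq.eval
p result
```

Each line of the listing is one instruction with its operands, which is the "linear and ordered instruction set" the tree was reduced to.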
LOOKUPS
 
Symbol / Hash tables
How it works
Constant time access to int/char indexed values
Table defaults: 11 bins, 5 entries per bin
Bins++, sequential lookup inside bins
Lookup of methods, variables, encodings etc.
Symbol
Entity with both a String and Number representation
!(String || Symbol), points to a table entry
Developer identifies by name, VM by int
Immutable for performance – watch out for memory
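The table behaviour described above can be sketched as a toy chained hash table in Ruby. This is an illustration under stated assumptions, not the interpreter's C implementation: the class name ToyTable, the growth rule (roughly double the bins once the average bin holds 5 entries) and the use of Object#hash are all choices made for this example.

```ruby
# Toy chained hash table: 11 bins to start, grows when the average
# number of entries per bin reaches 5 (both numbers from the slide).
class ToyTable
  INITIAL_BINS = 11
  MAX_DENSITY  = 5

  attr_reader :bins, :size

  def initialize
    @bins = Array.new(INITIAL_BINS) { [] }
    @size = 0
  end

  def []=(key, value)
    pair = bin_for(key).find { |k, _| k.eql?(key) }
    if pair
      pair[1] = value                       # overwrite existing entry
    else
      grow if @size >= @bins.length * MAX_DENSITY
      bin_for(key) << [key, value]
      @size += 1
    end
    value
  end

  def [](key)
    # Sequential lookup inside the selected bin.
    pair = bin_for(key).find { |k, _| k.eql?(key) }
    pair && pair[1]
  end

  private

  def bin_for(key)
    @bins[key.hash % @bins.length]
  end

  # "Bins++": allocate more bins and redistribute every entry.
  def grow
    old = @bins
    @bins = Array.new(old.length * 2 + 1) { [] }
    old.each { |bin| bin.each { |k, v| bin_for(k) << [k, v] } }
  end
end

table = ToyTable.new
100.times { |i| table[i] = i * 2 }

# Symbols: one table entry per name, so equal names are the same object,
# while two equal strings are distinct objects.
same_symbol = :user.equal?("user".to_sym)
same_string = String.new("user").equal?(String.new("user"))
```

Lookup stays close to constant time because growing the table keeps the per-bin scan short.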
VM INSTRUCTIONS
VM instructions / opcodes
Stateless functions
80+ currently
Generated from definitions at interpreter compile time (building 1.9 requires an existing ruby)
Instruction / opcode / operands notation
Categories and examples
variable: get or set local variable
class / module: definition
method / iterator: invoke method, call block
Optimization: redefines common +, <<, * contracts
Managing opcode sequences
Stack Machine
2 instruction types: push && pop
Move / copy values, top of stack -> elsewhere
SP: top of stack pointer, BP: bottom of stack pointer
Example
%w(a b c)
Put strings “a”, “b” and “c” on the stack
Fetch top 3 stack elements
Create an array from them
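The three steps above can be sketched as a tiny stack machine in Ruby. The opcode names putstring and newarray are borrowed for illustration, and the run_stack_machine helper is hypothetical; the real compiler may emit a different sequence for %w(a b c) depending on the Ruby version.

```ruby
# Minimal stack machine: each instruction is [opcode, operand].
def run_stack_machine(instructions)
  stack = []
  instructions.each do |opcode, operand|
    case opcode
    when :putstring
      stack.push(operand.dup)           # push a fresh string
    when :newarray
      stack.push(stack.pop(operand))    # pop N values, push them as one array
    else
      raise ArgumentError, "unknown opcode: #{opcode}"
    end
  end
  stack.pop                             # top of stack is the result
end

ARRAY_PROGRAM = [
  [:putstring, "a"],
  [:putstring, "b"],
  [:putstring, "c"],
  [:newarray, 3]
]

array_result = run_stack_machine(ARRAY_PROGRAM)
p array_result
```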
Instruction sequence
Opcode collection
Instruction dispatch can be a bottleneck
Optimizing simple instructions is very important
Likely a small subset of the typical web app's hot path
Dispatch techniques
Direct Threaded Dispatch : fastest jump to next opcode / instruction
Switch Dispatch : slower, but portable
DISPATCH AND CACHE
Dispatch techniques
Direct Threaded Dispatch
Represents an instruction by the address of the routine that implements it
Forth, Python 3
Not portable: GCC first class labels
Switch Dispatch
CPU branch mispredictions, depending on pipeline length
Up to 50% slower than Threaded dispatch
Portable
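The difference between the two techniques can be approximated in Ruby, with the caveat that this is an analogy only: real threaded dispatch relies on jumping to label addresses in C, which Ruby cannot express. Here a case statement stands in for switch dispatch, and replacing each opcode with the callable that implements it stands in for "an instruction is the address of its routine". All names (HANDLERS, run_switch, translate, run_threaded) are hypothetical.

```ruby
# One routine per opcode; each takes the stack and an optional operand.
HANDLERS = {
  push: ->(stack, operand)  { stack.push(operand) },
  add:  ->(stack, _operand) { stack.push(stack.pop + stack.pop) },
  mul:  ->(stack, _operand) { stack.push(stack.pop * stack.pop) }
}.freeze

# Switch-style dispatch: every step re-decodes the opcode in a central loop.
def run_switch(program)
  stack = []
  program.each do |opcode, operand|
    case opcode
    when :push then stack.push(operand)
    when :add  then stack.push(stack.pop + stack.pop)
    when :mul  then stack.push(stack.pop * stack.pop)
    else raise ArgumentError, "unknown opcode: #{opcode}"
    end
  end
  stack.pop
end

# Threaded-style analogy: translate opcodes to their routines once, up front.
def translate(program)
  program.map { |opcode, operand| [HANDLERS.fetch(opcode), operand] }
end

# Execution then calls each routine directly, with no decoding step.
def run_threaded(threaded)
  stack = []
  threaded.each { |handler, operand| handler.call(stack, operand) }
  stack.pop
end

ARITH_PROGRAM = [[:push, 2], [:push, 3], [:add], [:push, 4], [:mul]]

switch_result   = run_switch(ARITH_PROGRAM)
threaded_result = run_threaded(translate(ARITH_PROGRAM))
```

Both runs compute (2 + 3) * 4. The sketch shows the structural difference only and says nothing about relative speed in Ruby.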
VM Caches
Versioning
State counter scopes caches to the current VM state
Lazy invalidation – just bump the version
Expires on
constant definition
constant removal
method definition
method removal
method cache changes (covered later)
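The versioning idea above can be sketched in a few lines of Ruby. This is a simplified model under assumptions, not the VM's actual data structure: the class name VersionedCache and its methods are invented for this example, and a single counter stands in for the VM state.

```ruby
# Cache whose entries are tagged with the state version they were stored at.
class VersionedCache
  def initialize
    @vm_state = 0   # global state counter
    @entries  = {}  # key => [version, value]
  end

  # Lazy invalidation: nothing is deleted, the counter is just bumped.
  # In the VM this would happen on constant or method (re)definition/removal.
  def bump!
    @vm_state += 1
  end

  # Returns the cached value if it was stored under the current version,
  # otherwise recomputes it with the block and stores it again.
  def fetch(key)
    version, value = @entries[key]
    return value if version == @vm_state

    value = yield
    @entries[key] = [@vm_state, value]
    value
  end
end

cache = VersionedCache.new
lookups = 0

first  = cache.fetch(:to_s) { lookups += 1; :method_v1 }
second = cache.fetch(:to_s) { lookups += 1; :method_v1 }  # served from cache

cache.bump!                                               # e.g. a method was redefined
third  = cache.fetch(:to_s) { lookups += 1; :method_v2 }  # stale entry, recomputed
```

Stale entries are only noticed when they are next read, which keeps invalidation itself cheap.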
OPTIMIZATIONS
Optimization Limitations
Static Analysis
Examine source code without execution

RailswayCon 2010 - Dynamic Language VMs