refining var-handles in Valhalla

John Rose, 3-2025

(Revised 2025-04-04.)

It may be time for another round of “find the primitive” regarding var-handles and Unsafe.

The extremely tricky JDK code for var-handles performs the same access to (heap) variables as getfield (and putfield, getstatic, putstatic, aaload, aastore, iaload, iastore, and all the rest). Var-handles also provide certain other atomic operations, notably CAS. As a key design problem, the way a var-handle accesses its variable must be completely coherent with analogous operations in the interpreter, in the JIT (all JITs), and JNI. There can be only one set of rules per variable, and all access methods to that variable must follow the same identical set of rules. These rules arise from the “layout” of the variable, as assigned by the VM when the variable is created. The VM must communicate all relevant layout conditions to all of its components that access variables, including the extremely tricky JDK code just mentioned.

One old design is Unsafe::getValue and friends. It takes an “unsafe address” as a pair of object reference and long offset, just like simpler operations such as getByte. But it also takes an additional metadata argument which is intended to convey the layout of the variable being read.

In this old design, that extra argument is a Class mirror. The obvious flaw here is that this information, while necessary, is not sufficient to inform the get-method (or any other method) how to decode whatever is stored at the address. For example, a struct final value might be fully flattened, while a non-final value (either volatile or not) might be represented with an indirection, a managed pointer. These are two different layouts with incompatible rules for access. The variable’s layout matters crucially, and needs to be given to the getValue method; it cannot be inferred from context.

Before Valhalla, the VM used a fixed, small number of data layouts to format variables in the Java heap. Each of the original layouts has its own bespoke access methods in Unsafe. There were a few extra contextual conditions to keep track of as well. For example, a variable marked final must not be written by a var-handle, and a variable marked volatile must be properly fenced on reads and writes. But these conditions are simple, few in number, and easy to determine.
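As a reminder of how simple those pre-Valhalla conditions look from the user's side, here is a small sketch using only the public java.lang.invoke.VarHandle API. The class and field names (PreValhallaModes, Cell, counter, id) are made up for illustration; nothing here touches Unsafe or any Valhalla layout.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

class PreValhallaModes {
    // Illustrative holder class; the names are assumptions, not JDK code.
    static class Cell {
        volatile int counter;   // reads and writes must be fenced
        final int id = 42;      // must never be written through a var-handle
    }

    // Look up a var-handle for an int field of Cell.
    static VarHandle handle(String name) {
        try {
            return MethodHandles.lookup().findVarHandle(Cell.class, name, int.class);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    // CAS on the volatile field, one of the "other atomic operations".
    static boolean casCounter(Cell c, int expected, int updated) {
        return handle("counter").compareAndSet(c, expected, updated);
    }

    // A var-handle on a final field does not support the plain SET mode.
    static boolean finalIsWritable() {
        return handle("id").isAccessModeSupported(VarHandle.AccessMode.SET);
    }

    public static void main(String[] args) {
        Cell c = new Cell();
        System.out.println("CAS 0 -> 7 succeeded: " + casCounter(c, 0, 7));
        System.out.println("final field writable: " + finalIsWritable());
    }
}
```

The two conditions (final, volatile) are visible in the source code and checked once when the var-handle is created, which is what makes the old regime easy.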

Today, there are several options for how to format a value instance variable, which can be summarized as “full flat”, “half flat”, and “reference to buffer”. Some of these layouts may come with a side order of nullity, in the form of a “null flag” (currently a separate byte, where zero means “no instance, just null”).

It is as if there are many quasi-modifier bits that a var-handle must take into account: nullable or null-restricted, reference or flattened, atomic or loose. Some of those modifiers are easy to find since they (like volatile) are user-visible. Others are invisible to the user and must be secretly shared between the VM and the var-handle code.

VM engineers use a formal terminology in the source code, for example in the file layoutKind.hpp. Here is a summary of those concepts, as found in the LayoutKind enumeration:

REFERENCE - the variable is a managed pointer (32 or 64 bits, nullable)
NON_ATOMIC_FLAT - the variable has multiple subfields, often larger than 64 bits total
ATOMIC_FLAT - the variable packs in 64 bits, and may have (small) subfields
NULLABLE_ATOMIC_FLAT - same as ATOMIC_FLAT, but with an extra byte null flag
NULLABLE_NON_ATOMIC_FLAT - same as NON_ATOMIC_FLAT but again with a null flag
BUFFERED - special case used only for read-only buffered values (copies one of the others)

Only the NULLABLE_*_FLAT ones are both flat and nullable. Most value classes are limited to (at most) “half flat” (ATOMIC) layouts, since most value variables require fully consistent (“atomic”) access. Such “half flat” variables can be either nullable or not. If a value is too large to fit in 64 bits (including any required null flag), then the VM uses REFERENCE as a fallback, and the reference points to a value in BUFFERED format. The VM only uses NON_ATOMIC_FLAT with specialized classes which have explicitly opted into “full flatness”. Interestingly, a volatile variable of a type normally assigned NON_ATOMIC_FLAT will use ATOMIC_FLAT, if possible, else REFERENCE, just for that volatile variable. Thus, “full flat” classes (capable of NON_ATOMIC_FLAT) can fall back to “half flat” or even to non-flat (REFERENCE) layouts. The “full flat” layouts can also be used with so-called “strict final” fields, since they cannot suffer from race conditions. The “full flat” but nullable layout NULLABLE_NON_ATOMIC_FLAT is unsafe except for strict finals.
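The selection rules in the preceding paragraph can be modeled as a small decision function. This is only a toy restatement of the prose, not the VM's actual algorithm: the method name choose and its five inputs are assumptions made for illustration, and the real VM consults much more state (alignment, platform atomics, flags, and so on).

```java
class LayoutChooser {
    // Mirrors the names summarized above from the LayoutKind enumeration.
    enum LayoutKind {
        REFERENCE, NON_ATOMIC_FLAT, ATOMIC_FLAT,
        NULLABLE_ATOMIC_FLAT, NULLABLE_NON_ATOMIC_FLAT, BUFFERED
    }

    // Hypothetical inputs:
    //   payloadBits - size of the value's fields, without any null flag
    //   nullable    - the variable may hold null
    //   looseOptIn  - the class explicitly opted into "full flatness"
    //   isVolatile  - the variable is declared volatile
    //   strictFinal - the variable is a "strict final" and cannot race
    static LayoutKind choose(int payloadBits, boolean nullable,
                             boolean looseOptIn, boolean isVolatile,
                             boolean strictFinal) {
        // "Full flat" is allowed when races are impossible, or when the
        // class opted in and this particular variable is not volatile.
        boolean fullFlatOk = strictFinal || (looseOptIn && !isVolatile);
        if (fullFlatOk) {
            if (!nullable) return LayoutKind.NON_ATOMIC_FLAT;
            if (strictFinal) return LayoutKind.NULLABLE_NON_ATOMIC_FLAT;
            // nullable and loose but not strict final: unsafe, fall through
        }
        // "Half flat": payload plus a one-byte null flag must fit in 64 bits.
        int bits = payloadBits + (nullable ? 8 : 0);
        if (bits <= 64) {
            return nullable ? LayoutKind.NULLABLE_ATOMIC_FLAT
                            : LayoutKind.ATOMIC_FLAT;
        }
        // Fallback: a managed pointer to a BUFFERED copy of the value.
        return LayoutKind.REFERENCE;
    }

    public static void main(String[] args) {
        System.out.println(choose(128, false, true, false, false));
        System.out.println(choose(128, false, true, true, false));
    }
}
```

Note that BUFFERED is never chosen for a variable in this model; it only describes what a REFERENCE points to.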

Going back to the Unsafe API, if the layout of a variable is a simple indirect reference to the buffered value (REFERENCE), then getReference is fully adequate. And if the value class only ever has one layout assigned to it, then maybe getValue could be trusted to do the job, since it could guess the layout. But there’s a problem: we don’t know if the variable has a null flag. So in the end we need some sort of mode descriptor (at least a boolean noting the null flag, but probably more, in order to distinguish “full flat” from “half flat” as well).

So getValue, in its current form, must be abandoned as inadequate, or else enhanced to take different or more descriptor arguments, rather than just a class mirror.

In addition, even if getValue were enhanced, it might be better to refactor it into smaller primitives, which distinguish separable concerns. To that end, let’s consider defining one operation copyConsistentValue to atomically move one atomic unit of memory, stored in the heap, to another. That new operation will also take enhanced layout descriptor arguments; if we don’t like it better than getValue, then we should go back and consider adding the extra argument or two to getValue (and its brothers).

Given an atomic-capable copy primitive, we will also need more operations to decompose a copied word into the various value fields that it encodes, and also check a null flag (if any). And for writes, we must define an inverse operation to recompose the fields and flag into a word to store into the heap, to be followed by another call to copyConsistentValue.
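To make “decompose” and “recompose” concrete, here is a minimal sketch for a nullable value with two short fields. The bit assignment (x in bits 0..15, y in bits 16..31, a one-byte null flag in bits 32..39) is purely an assumption for illustration; the VM picks the real encoding, and the names PackedPoint, recompose, and decompose are invented here.

```java
class PackedPoint {
    // Stand-in for a small value class with two fields.
    record Point(short x, short y) {}

    // Recompose: fields plus null flag into one 64-bit word.
    // A zero flag byte means "no instance, just null".
    static long recompose(Point p) {
        if (p == null) return 0L;
        return (p.x() & 0xFFFFL)
             | ((p.y() & 0xFFFFL) << 16)
             | (1L << 32);                  // null flag byte set to 1
    }

    // Decompose: check the null flag first, then peel off the fields.
    static Point decompose(long word) {
        if (((word >>> 32) & 0xFF) == 0) return null;
        return new Point((short) word, (short) (word >>> 16));
    }

    public static void main(String[] args) {
        long w = recompose(new Point((short) -1, (short) 2));
        System.out.println(Long.toHexString(w) + " -> " + decompose(w));
        System.out.println(decompose(recompose(null)));
    }
}
```

Both methods operate on a plain local long, which plays the role of the already-copied word; neither touches shared memory.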

But perhaps surprisingly, once copyConsistentValue is defined properly, and given widely-used VM support for flattened arrays (aaload, aastore, and factories), we can make use of the same old Unsafe peek/poke (get/set) methods previously used to access regular object fields in the heap. The thing that is different here is that copyConsistentValue provides an abstraction boundary, protecting the var-handle logic from the VM protocol for atomic reads and writes, and also from some of the details of the VM’s choices of layout.

In the end, if getValue is retained (with improved arguments) it can be built on top of those primitives.

Such a refactoring of primitives seems attractive for a few reasons. First, all of the parts of value processing which decompose (or recompose) a value into parts can be expressed as ordinary reads and writes against a thread-private buffer, holding just that value. A length-one array seems perfect for a private buffer, which is the venue for such peeking and poking. Such an array must not be shared with other threads, which is a reasonably simple condition to ensure.

Second, no new primitives are needed for the actual tasks of peeking (and poking) value components from a thread-private buffer; it is a straightforward application of getByte, getReference, and all the other unsafe getters which know about fixed layouts. Note also that read-only operations are easy and natural to perform against a heap-resident buffer, if it is immutable. Note also that value fields which are themselves sub-values are handled recursively by peeking and poking at known offsets, if they are flattened inside the containing value.

Third, the hard details of atomically reading and writing a packed (“half flat”) value, encoded in a heap word, are separated from the other hard details of how that object’s components are laid out. What’s hard about atomic reads and writes? Well, for starters, you can’t always do it; you have to know when it is an option. (We need a new query for this, isConsistentValue.) Second, there might be special modes for such a variable beyond read and write: acquire, release, CAS, etc. (We need a second new query for this, isPortablyAtomic or the like. We don’t want to leak VM implementation details uncontrollably into the VH API, so some consistent variables will only secretly be atomic-capable.)

Fourth, there are very tricky details of talking to the GC, when loading or storing encoded values, if and when they contain managed references. The GC may need to be notified of loads and/or stores of managed references, even if they are bundled tightly inside a composite value layout. Consider possible support for a pair of managed references, or a reference and a small primitive, stored in one atomic unit (feasible today with 32-bit oops on 64-bit machines). To do this we need a “hook” to notify the GC when oop-stuff happens. Making an explicit copyConsistentValue call tells us where to put such GC hooks.

Therefore, one way (maybe new) to stack the necessary parts is to define two new queries and one new memory operation to copy atomic values into and out of the heap:

isFlat: is this value variable flattened in any way, or is that just a plain old reference we see stored there?
isConsistent: is this flat value variable (field/array) atomically tight (all reads and writes are consistent), or loose (weakly consistent, like a ComplexDouble with NON_ATOMIC_FLAT layout)? (Note: The term Consistent could be renamed Atomic, if we don’t mind the connotations.)
isPortableAtomic: given a tight value variable, does it (portably) support CAS and other VH atomics? (There are rules to be defined, not described here. Do remember that any “tight” variable is secretly atomic-capable, since plain read and write are already atomic. I settled on Consistent in order to save Atomic for variables that actually support the extended atomic API in var-handles. It can be very confusing to mix up the two concepts.)
copyConsistentValue (and other atomic ops): Given an atomically “tight” value variable in the heap, copy it to another variable of the same kind. This provides an alternative to getReference and its brothers, to load or store (even CAS) from or to that variable, respecting the “protocol” the VM has assigned to the variable. A 1-array (of the same layout as the “tight” variable) can serve as a private buffer on either side of the copy.

These primitives must be parameterized by a constant “metadata” item that could simply be a field or array-kind descriptor, or perhaps is derived from that descriptor.

In practice, there are likely to be two or more metadata arguments, one being a class mirror, and another being an opaque pointer or 64-bit integer cookie. If the VM has stashed more metadata about the variable in question, it is likely to be a table connected to the class metadata (accessible via the mirror), and possibly indexed by a bitfield in the cookie. (The VM obviously hands out the cookies somehow, through yet another query API connected with fields and array layouts.) The Valhalla VM prototype has a side-array, for each class containing flattened fields, of metadata useful to these operations, such as, “Does this field have a null flag and if so where is it?”
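One way to picture the cookie-plus-side-table arrangement is sketched here. Everything about the encoding is hypothetical: the 16-bit index field, the two flag bits, and the names LayoutCookie and FieldInfo are invented for illustration and are not the prototype's actual format.

```java
class LayoutCookie {
    // One row of the per-class side array: "Does this field have a
    // null flag and if so where is it?"
    record FieldInfo(boolean hasNullFlag, int nullFlagOffset) {}

    // Hypothetical bit assignment inside the 64-bit cookie.
    static final long INDEX_MASK     = 0xFFFFL;   // bits 0..15: side-table index
    static final long FLAT_BIT       = 1L << 16;  // answers isFlat
    static final long CONSISTENT_BIT = 1L << 17;  // answers isConsistent

    static long encode(int tableIndex, boolean flat, boolean consistent) {
        if (tableIndex < 0 || tableIndex > INDEX_MASK)
            throw new IllegalArgumentException("index out of range");
        return tableIndex
             | (flat ? FLAT_BIT : 0L)
             | (consistent ? CONSISTENT_BIT : 0L);
    }

    static int tableIndex(long cookie)       { return (int) (cookie & INDEX_MASK); }
    static boolean isFlat(long cookie)       { return (cookie & FLAT_BIT) != 0; }
    static boolean isConsistent(long cookie) { return (cookie & CONSISTENT_BIT) != 0; }

    // The side table would hang off the class metadata, reached via the mirror.
    static FieldInfo lookup(FieldInfo[] sideTable, long cookie) {
        return sideTable[tableIndex(cookie)];
    }

    public static void main(String[] args) {
        FieldInfo[] table = { new FieldInfo(false, -1), new FieldInfo(true, 12) };
        long cookie = encode(1, true, true);
        System.out.println(lookup(table, cookie) + " flat=" + isFlat(cookie));
    }
}
```

The cheap queries are answered from the cookie bits alone, and only the rarer questions (such as the null flag offset) need the table lookup.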

Given those primitives we know when we are permitted to build out CAS, acquire, release, etc: isConsistent and isPortableAtomic are both true. This will be true, at a minimum, for volatile fields; there are a few other cases which can also be portably defined. Those other cases must include migrated wrappers like Integer.
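For a feel of what “building out CAS” on a tight variable means, here is a toy sketch in which a java.util.concurrent.atomic.AtomicLong stands in for the heap word. The layout (x in the low 32 bits, y in the high 32 bits) and the names PackedCas, pack, and setX are assumptions for illustration only.

```java
import java.util.concurrent.atomic.AtomicLong;

class PackedCas {
    // Hypothetical layout: x in the low 32 bits, y in the high 32 bits.
    static long pack(int x, int y) { return (x & 0xFFFF_FFFFL) | ((long) y << 32); }
    static int x(long word) { return (int) word; }
    static int y(long word) { return (int) (word >>> 32); }

    // Replace x while preserving y. The whole value is read, recomposed,
    // and published with one CAS; on interference the loop retries.
    static void setX(AtomicLong variable, int newX) {
        long old;
        do {
            old = variable.get();
        } while (!variable.compareAndSet(old, pack(newX, y(old))));
    }

    public static void main(String[] args) {
        AtomicLong variable = new AtomicLong(pack(1, -2));
        setX(variable, 5);
        System.out.println(x(variable.get()) + ", " + y(variable.get()));
    }
}
```

The CAS compares the entire encoded word, so it can only be offered portably where the encoding of a given value is itself well defined, which is part of why the rules need care.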

Digression: The hard part about atomics is making portable rules for which classes support them. It must be easy to use. It must not vary across platforms or VM optimizations. Personally, I would prefer rules which include all value classes which (recursively through null-restricted sub-value fields) contain at most one field, since such classes, if flattened, will never overflow 64 bits. A simpler, initially serviceable definition, covering the wrappers, would just say, “one field of primitive type, and you can be a portable atomic”. More subtly, “N fields, all of primitive types, totalling no more than 64 bits”. But it’s tricky… Now back to the problem of var-handles.
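The “N primitive fields, at most 64 bits” candidate rule is easy to state as a reflective check. This is a rough sketch, not a proposed specification: the class name, the per-type bit widths (in particular counting boolean as 8 bits), and the decision to look only at a class's own declared instance fields are all assumptions.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Map;

class PortableAtomicRule {
    // Assumed storage widths; boolean is counted as a full byte.
    static final Map<Class<?>, Integer> BITS = Map.of(
        boolean.class, 8, byte.class, 8, short.class, 16, char.class, 16,
        int.class, 32, float.class, 32, long.class, 64, double.class, 64);

    // Sample classes for the demonstration below.
    record IntPair(int a, int b) {}
    record LongAndInt(long a, int b) {}
    record Named(String name) {}

    // Total bits of the declared instance fields, or -1 if any is non-primitive.
    static int primitiveBits(Class<?> c) {
        int total = 0;
        for (Field f : c.getDeclaredFields()) {
            if (Modifier.isStatic(f.getModifiers())) continue;
            Integer bits = BITS.get(f.getType());
            if (bits == null) return -1;
            total += bits;
        }
        return total;
    }

    // "N fields, all of primitive types, totalling no more than 64 bits."
    static boolean fitsPortableAtomic(Class<?> c) {
        int bits = primitiveBits(c);
        return bits >= 0 && bits <= 64;
    }

    public static void main(String[] args) {
        System.out.println("IntPair: " + fitsPortableAtomic(IntPair.class));
        System.out.println("LongAndInt: " + fitsPortableAtomic(LongAndInt.class));
        System.out.println("Integer: " + fitsPortableAtomic(Integer.class));
    }
}
```

Because the check depends only on declared field types, it gives the same answer on every platform, which is the property the digression asks for. It does not address nullable variables, where a null flag would also have to fit.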

The primitives proposed above can help us spin up a read or write method for an arbitrary value, like getValue claims to do. In pseudocode it could work like this:

performGetValue(UnsafeAddress r; constexpr VariableMetaData vmetadata) {
  if (!vmetadata.isFlat()) {
     return getReference(r)
  } else if (!vmetadata.isConsistent()) {
    // make a loose group of individual reads
    // Note: variable consistency depends on many factors!
    return getLooseValue(r, vmetadata)
  } else {
    Object[] a = getConsistentValue(r, vmetadata);
    // use an unshared array to convey one hardware memory word
    // works like: a := new vmetadata[]{ *r }
    return a[0]  // use JVM native aaload
    // or maybe load it as a loose value from unshared memory:
    //return getLooseValue(&a[0], vmetadata)
  }
}
  where
getConsistentValue(UnsafeAddress src; constexpr VariableMetaData vmetadata) {
  Object[] a = vmetadata.newFlatArray(1)
  UnsafeAddress dest = &a[0]
  copyConsistentValue(src, dest, vmetadata)
  return a
}
  where
getLooseValue(UnsafeAddress r; constexpr VariableMetaData vmetadata) {
  if (vmetadata.hasNullFlag() && getByte(r + vmetadata.nullFlagOffset()) == 0)
    return null
  // use an unshared array to assemble the value from components
  Object[] a = vmetadata.newFlatArray(1);
  for (var c in vmetadata.nonStaticFieldMetadata()) {
    // recursively get and store r.c, depending on c.vmetadata
    var x = performSubfieldGet(r + c.offset, c.vmetadata)
    performSet(&a[0] + c.offset, c.vmetadata, x)
  }
  return a[0]  // use aaload bytecode to load the value out of the private flat buffer
}
  where
performSubfieldGet(UnsafeAddress r; constexpr VariableMetaData vmetadata) {
  if (vmetadata == byte.vmetadata)  return getByte(r)
  …
  if (vmetadata.isFlat())  return getValue(r, vmetadata)
  return getReference(r)
}

For setting, similar actions are performed in reverse:

performSetValue(UnsafeAddress r, Object x; constexpr VariableMetaData vmetadata) {
  if (!vmetadata.isFlat()) {
     setReference(r, x)
  } else if (!vmetadata.isConsistent()) {
    // make a loose group of individual writes
    performSetLooseValue(r, x, vmetadata)
  } else {
    setConsistentValue(r, x, vmetadata);
  }

}
  where
setConsistentValue(UnsafeAddress dest, Object x; constexpr VariableMetaData vmetadata) {
  Object[] a = vmetadata.newFlatArray(1)
  a[0] = x  // use aastore bytecode to store x into atomic format
  UnsafeAddress src = &a[0]
  copyConsistentValue(src, dest, vmetadata)
}
  where …

The point here is to express all the field-wise (and null-flag) logic separately from the low-level hardware memory read (or write). The good old Unsafe peek/poke routines do their stuff on a private memory buffer, while the new routine (copyConsistentValue) performs only the single-word memory transfer, into or out of a new 1-array buffer, which is private. The var-handle logic can use the private buffer to peek and poke individual value fields, safe from interference from other threads.
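The same division of labor can be imitated with public APIs, as a toy model only. Here a long[] stands in for the shared heap, an array-element VarHandle in opaque mode stands in for the single-word transfer of copyConsistentValue, and a small little-endian ByteBuffer stands in for the private 1-array buffer. The names ToyConsistentCopy, load, store, and the field offsets are all assumptions; the real primitive would be a VM intrinsic with GC hooks, which this sketch does not model.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class ToyConsistentCopy {
    // Atomic whole-word access to elements of the stand-in "heap".
    static final VarHandle WORD = MethodHandles.arrayElementVarHandle(long[].class);

    // Hypothetical field offsets inside the 8-byte value.
    static final int X_OFFSET = 0, Y_OFFSET = 4;

    // A fresh buffer that only the current thread can see.
    static ByteBuffer newPrivateBuffer() {
        return ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN);
    }

    // Heap to private buffer: exactly one atomic word read.
    static ByteBuffer load(long[] heap, int slot) {
        ByteBuffer buf = newPrivateBuffer();
        buf.putLong(0, (long) WORD.getOpaque(heap, slot));
        return buf;
    }

    // Private buffer to heap: exactly one atomic word write.
    static void store(ByteBuffer buf, long[] heap, int slot) {
        WORD.setOpaque(heap, slot, buf.getLong(0));
    }

    // Field-wise logic: ordinary peeks on the private buffer.
    static int getX(long[] heap, int slot) { return load(heap, slot).getInt(X_OFFSET); }
    static int getY(long[] heap, int slot) { return load(heap, slot).getInt(Y_OFFSET); }

    // Field-wise logic: ordinary pokes, then one word transfer.
    static void setPoint(long[] heap, int slot, int x, int y) {
        ByteBuffer buf = newPrivateBuffer();
        buf.putInt(X_OFFSET, x).putInt(Y_OFFSET, y);
        store(buf, heap, slot);
    }

    public static void main(String[] args) {
        long[] heap = new long[2];
        setPoint(heap, 1, 3, -4);
        System.out.println(getX(heap, 1) + ", " + getY(heap, 1));
    }
}
```

Only load and store ever touch the shared array, and each does so with a single word access; every other line works on memory no other thread can reach.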

The complexity of the “protocol” for reads and writes is divided between copyConsistentValue and the preexisting peek and poke methods. The new copier intrinsic method knows how to transfer a single heap word safely and atomically between two heap variables of the same layout. And the var-handle code uses plain array instructions to “talk to” its own side of the data connection.

The copier method can assume it is always given addresses of arrays, since both arguments are under the control of the var-handle logic. Of course, for that reason (and for others), the method also needs to be locked away somewhere safe, for experts only.