
The Data primitive (not to be confused with the DATA constant) was added to the language in Ruby 3.2.0 and is a minimal, immutable, and non-enumerable value-only class. The Data class is not a Struct but is Struct-like in nature with a limited Object API. There are several unique aspects to this primitive worth exploring.
Overview
For starters, I’d recommend reading my Struct article before proceeding because knowing how structs work will be helpful when comparing and contrasting with data objects since they are similar. That said, here’s how to construct, initialize, and interact with a data object:
Point = Data.define :x, :y
point = Point.new 1, 2
point.x # 1
point.y # 2
point.to_h # {:x=>1, :y=>2}
point.to_s # #<data Point x=1, y=2>
point.with y: 5 # #<data Point x=1, y=5>
point.members # [:x, :y]
point.inspect # #<data Point x=1, y=2>
point.frozen? # true
As you can see, the Object API is quite small and definitely smaller than that of a Struct. Additionally, a data object doesn’t include Enumerable, so it can’t be iterated over or have attributes accessed via #[] like a Struct can. Despite the limited feature set, Data objects have the following advantages:
- Great for concurrency when used with Ractors, Fibers, Threads, etc.
- Great for pattern matching (more on this later).
- Bypasses the baggage of the keyword_init: true flag found when using a Struct, which makes working with positional or keyword arguments more interchangeable.
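To make the concurrency point a little more concrete, here’s a minimal sketch that uses Ractor.shareable? to check whether data instances could be shared between Ractors. The Point and Label names are only for illustration. Note that a data object is only shareable when every member value is itself frozen or otherwise shareable.

```ruby
Point = Data.define :x, :y
Label = Data.define :text

# Integers are always shareable, so a frozen data object holding them is too.
numbers = Ractor.shareable?(Point[1, 2])              # true

# A mutable string member prevents sharing even though the data object is frozen.
mutable = Ractor.shareable?(Label["demo".dup])        # false

# Freezing the member value first makes the whole object shareable.
frozen = Ractor.shareable?(Label["demo".dup.freeze])  # true
```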
There is one major disadvantage: when you override the #initialize method, it only accepts keyword parameters. It’s an odd and surprising design choice but will be explained later.
Construction
Construction of a data object is like a Struct but is, sadly, an inconsistent departure.
Define
You must use .define to define your data object. Example:
Point = Data.define :x, :y
This is definitely inconsistent with how you’d use .new to construct a Struct. I’ll admit that I like the use of .define more than .new and wish Struct were updated to use .define as well so we had more consistency between these two value objects.
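For a side-by-side view of the inconsistency, here’s a small sketch. The StructPoint and DataPoint names are made up for this comparison:

```ruby
StructPoint = Struct.new :x, :y  # Struct builds the class via .new.
DataPoint = Data.define :x, :y   # Data builds the class via .define.

# Data itself doesn't respond to .new, so there is no Struct-style construction.
data_has_new = Data.respond_to? :new  # false

# Once defined, both classes are instantiated the same way.
struct_point = StructPoint.new 1, 2
data_point = DataPoint.new 1, 2

struct_point.to_h == data_point.to_h  # true
```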
Subclass
Much like a Struct, you can subclass:
class Inspectable < Data
  def inspect = to_h.inspect
end
Point = Inspectable.define :x, :y
As mentioned with subclassing a Struct, you’re much better off composing your objects rather than using complicated inheritance structures. I don’t recommend this approach.
Initialization
Initialization is similar to a Struct except arguments are strictly enforced.
New
As hinted at earlier, you can initialize a data instance via the .new class method using either positional or keyword arguments:
# Positional
point = Point.new 1, 2
# Keyword
point = Point.new x: 1, y: 2
All arguments are required. Otherwise, you’ll get an ArgumentError (you can avoid this stipulation by defining defaults which will be explained later):
point = Point.new 1 # missing keyword: :y (ArgumentError)
point = Point.new x: 1 # missing keyword: :y (ArgumentError)
Mixing and matching of arguments isn’t allowed either:
point = Point.new 1, y: 2
# wrong number of arguments (given 2, expected 0) (ArgumentError)
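If you want to see the strictness in one place, the following sketch collects the error class for each kind of invalid call. The error_for helper is hypothetical and exists only to capture the exception:

```ruby
Point = Data.define :x, :y

# Hypothetical helper: answers the error class raised by the block or nil if none.
def error_for
  yield
  nil
rescue ArgumentError => error
  error.class
end

missing = error_for { Point.new 1 }                  # ArgumentError
surplus = error_for { Point.new 1, 2, 3 }            # ArgumentError
mixed = error_for { Point.new 1, y: 2 }              # ArgumentError
unknown = error_for { Point.new x: 1, y: 2, z: 3 }   # ArgumentError
valid = error_for { Point.new 1, 2 }                 # nil
```

Every invalid combination fails with the same error class, so a single rescue of ArgumentError is enough when you need to guard against bad input.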
Anonymous
You can anonymously create a new data instance via single line construction and initialization. Example:
# Positional
point = Data.define(:x, :y).new 1, 2
# Keyword
point = Data.define(:x, :y).new x: 1, y: 2
The problem with anonymous data objects is that they are only useful within the scope in which they are defined, as temporary and short-lived objects. Worse, you must redefine them each time you want to use them. For anything more permanent, you’ll need to define a constant for improved reuse. That said, anonymous data objects can be handy for one-off situations like scripts, specs, or code spikes.
Brackets
There is a shorter way to initialize a data object and that’s via square brackets:
# Positional
point = Point[1, 2]
# Keyword
point = Point[x: 1, y: 2]
This is my favorite form of initialization for two important reasons:
- Brackets require three fewer characters to type.
- Brackets signify, more clearly, that you are working with a struct/data object versus a class, which improves readability.
Immutability
By default, Data is immutable but only in a shallow way. This means only the top-level attributes of your data object are frozen, not anything that is deeply nested. Example:
Basket = Data.define :label, :items
basket = Basket[label: "Demo", items: [:apple, :orange]]
basket.frozen? # true
basket.label = "Test" # NoMethodError (data objects define no setters)
basket.items.push :pear # [:apple, :orange, :pear]
As you can see, the data is frozen but it’s a shallow freeze. This is why we were able to mutate the items array. To deep freeze data, you can use Ractor. Example:
Basket = Data.define :label, :items
basket = Ractor.make_shareable Basket[label: "Demo", items: [:apple, :orange]]
basket.frozen? # true
basket.label = "Test" # NoMethodError (data objects define no setters)
basket.items.push :pear # can't modify frozen Array (FrozenError)
Notice that using Ractor.make_shareable applies a deep freeze to all attributes of your data object. Using Ractor like this isn’t its primary purpose but can be handy. The other way to tackle this is to iterate over all attributes in your data object, at initialization, and freeze each member. That’s definitely more cumbersome than using Ractor because you’d also have to traverse deeply nested objects and freeze each one.
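For comparison, here’s a rough sketch of the manual approach. The DeepFreezer module is a hypothetical helper (not part of Ruby) which only knows how to walk arrays and hashes, so treat it as an illustration of the extra work involved rather than a complete solution:

```ruby
# Hypothetical helper: recursively freezes arrays, hashes, and their contents.
module DeepFreezer
  def self.call(object)
    case object
    when Array then object.each { |item| call item }
    when Hash then object.each { |key, value| [key, value].each { |item| call item } }
    end

    object.freeze
  end
end

Basket = Data.define :label, :items do
  # Freeze each member, including nested values, before handing off to super.
  def initialize(label:, items:)
    super(label: DeepFreezer.call(label), items: DeepFreezer.call(items))
  end
end

basket = Basket[label: "Demo".dup, items: [[:apple], [:orange]]]

basket.items.frozen?        # true
basket.items.first.frozen?  # true
```

Keep in mind that this freezes the caller’s objects in place, so any code still holding a reference to the original array can no longer mutate it either.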
Defaults
You can provide defaults by defining an #initialize method. Example:
Point = Data.define :x, :y do
  def initialize x: 1, y: 2
    super
  end
end
point = Point.new
point.x # 1
point.y # 2
Continuing with the above example, this also means you now have the flexibility to use partial arguments when defaults are defined:
Point.new 0 # #<data Point x=0, y=2>
Point.new x: 0 # #<data Point x=0, y=2>
One stipulation when defining defaults is that all keywords must be present. For instance, the following is syntactically correct but unusable and will raise an error:
Point = Data.define :x, :y do
  def initialize x: 1
    super
  end
end
Point.new 0, 1 # unknown keyword: :y (ArgumentError)
Point.new x: 0, y: 1 # unknown keyword: :y (ArgumentError)
That said, you can fix the above by supplying all keyword arguments with only the defaults you need. Here’s a modification to the above code which allows you to supply a default value for only one of the parameters:
Point = Data.define :x, :y do
  def initialize x: 1, y:
    super
  end
end
Point.new 0, 1 # #<data Point x=0, y=1>
Point.new x: 0, y: 1 # #<data Point x=0, y=1>
Again, the only difference between this code snippet and the earlier code snippet is that all keyword arguments are defined even if only a subset of defaults are supplied. You can also use keyword argument forwarding:
Point = Data.define :x, :y do
  def initialize(**) = super
end
Point[1, 2] # #<data Point x=1, y=2>
Once incoming arguments are passed to super, further modification of attributes is impossible because they are immediately frozen and inaccessible. With a Struct you could use #[] to access member values but there is no such method for a Data object.
Values
Unlike a Struct, values must be messaged directly which means you can’t access them via #[] or assign new values. For example, this won’t work:
point[:x] # NoMethodError
point.x = 5 # NoMethodError
point.each { |value| puts value } # NoMethodError
All you can do is ask for a value (as shown earlier):
point.x # 1
point.y # 2
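When you do need to iterate or look up a value by key, one workaround is to convert to a hash first. The following is only a sketch of that idea using #to_h and #members:

```ruby
Point = Data.define :x, :y
point = Point[1, 2]

point.respond_to? :[]    # false
point.respond_to? :each  # false
point.respond_to? :x=    # false

# Convert to a hash to iterate or to look up a value by key.
attributes = point.to_h
attributes.each { |key, value| puts "#{key}: #{value}" }
attributes[:x]  # 1

# Alternatively, message each member by name.
values = point.members.map { |name| point.public_send name }  # [1, 2]
```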
With
Unique to Data objects, you can make a shallow copy of your instance (i.e. the instance’s values are copied but not the objects referenced by them). This makes for a fast way to build altered versions of your data. Consider the following:
Point = Data.define :x, :y
point = Point[1, 2]
point.with x: 2, y: 3 # #<data Point x=2, y=3>
point.with x: 0 # #<data Point x=0, y=2>
point.with y: 0 # #<data Point x=1, y=0>
point.with bogus: "✓" # unknown keyword: :bogus (ArgumentError)
Notice that you can quickly build a new version of your original data object with the same attributes. You can also mix and match all or some of your attributes. However, attempting to reference an attribute that doesn’t exist will result in an ArgumentError.
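Because the copy is shallow, any member you don’t replace is shared between the original and the copy. Here’s a small sketch of what that means in practice, using a Basket with a mutable array:

```ruby
Basket = Data.define :label, :items

original = Basket[label: "Demo", items: [:apple]]
copy = original.with label: "Copy"

original.label                    # "Demo" (the original is untouched)
copy.label                        # "Copy"
copy.equal? original              # false (a new instance is returned)
copy.items.equal? original.items  # true (both reference the same array)

# Mutating the shared array through the copy is visible via the original.
copy.items.push :pear
original.items                    # [:apple, :pear]
```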
Equality
Structs and data objects share the same superpower in that they are both value objects by default. Even better, data objects take this one step further since all values are immutable by default. The following illustrates this more clearly:
a = Point[x: 1, y: 2]
b = Point[x: 1, y: 2]
a == b # true
a === b # true
a.eql? b # true
a.equal? b # false
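Value equality also carries over to hashing, which means equal data objects behave as the same hash key. The sketch below shows this along with two edge cases worth knowing: instances of different classes are never equal, and #== versus #eql? can disagree when member values are of different numeric types. The Other class is hypothetical:

```ruby
Point = Data.define :x, :y
Other = Data.define :x, :y

a = Point[x: 1, y: 2]
b = Point[x: 1, y: 2]

a.hash == b.hash  # true

lookup = {a => "found"}
lookup[b]         # "found"
[a, b].uniq.size  # 1

# Same members and values but a different class.
a == Other[x: 1, y: 2]  # false

# Members are compared with #== or #eql? respectively.
Point[1, 2] == Point[1.0, 2.0]      # true
Point[1, 2].eql?(Point[1.0, 2.0])   # false
```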
Pattern Matching
Pattern matching is supported by default and is identical in behavior to Struct pattern matching. One difference worth pointing out is that you can define data without any arguments which isn’t possible with a struct. Example:
module Monads
  Just = Data.define :content
  None = Data.define
end
This is handy when pattern matching:
case Monads::Just[content: "demo"]
in Monads::Just then "Something"
in Monads::None then "Nothing"
end
# "Something"
case Monads::None.new
in Monads::Just then "Something"
in Monads::None then "Nothing"
end
# "Nothing"
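Beyond matching on the class alone, you can also destructure member values with hash patterns. Here’s a minimal sketch where the describe method is a hypothetical helper for illustration:

```ruby
Point = Data.define :x, :y

# Hypothetical helper which describes a point by matching on member values.
def describe(point)
  case point
  in {x: 0, y: 0} then "origin"
  in Point[x:, y: 0] then "on the x-axis at #{x}"
  in {x:, y:} then "at (#{x}, #{y})"
  end
end

describe Point[0, 0]  # "origin"
describe Point[3, 0]  # "on the x-axis at 3"
describe Point[1, 2]  # "at (1, 2)"
```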
Refinements
You can refine your Data objects by using the Refinements gem, especially if you’d like to obtain the differences between two instances. Example:
#! /usr/bin/env ruby
# frozen_string_literal: true
# Save as `demo`, then `chmod 755 demo`, and run as `./demo`.
require "bundler/inline"
gemfile true do
  source "https://rubygems.org"
  gem "refinements"
end
using Refinements::Data
Point = Data.define :x, :y
point_a = Point[x: 1, y: 2]
point_b = Point[x: 1, y: 3]
point_a.diff point_b
# {y: [2, 3]}
If you were to run the above script, you’d see the same output as shown in the code comments. The above is only a small taste of how you can refine your data objects. Feel free to check out the Refinements gem for details or even add it to your own projects.
Wholeable
In situations where a Data or Struct isn’t enough, you can use the Wholeable gem to turn a Class into a whole value object. Example:
class Person
  include Wholeable[:name, :email]
  def initialize name:, email:
    @name = name
    @email = email
  end
end
jill = Person[name: "Jill Smith", email: "[email protected]"]
# #<Person @name="Jill Smith", @email="[email protected]">
jill.name # "Jill Smith"
jill.email # "[email protected]"
You only need to include Wholeable with the attributes that make up your whole value object with required and/or optional keys as desired.
Avoidances
As emphasized with the Struct class, avoid anonymous inheritance and use of constants within your data objects. In other words, don’t do the following:
class Point < Data.define(:x, :y)
end
Point.ancestors
# [Point, #<Class:0x000000010da8de88>, Data, Object, Kernel, BasicObject]
Anonymous superclasses (i.e. #<Class:0x000000010da8de88>) are wasteful and inefficient, performance-wise. Definitely refer to the original Struct article to learn more about why this is a bad practice.
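One alternative, assuming all you need is to add behavior, is to pass a block to .define so no anonymous superclass is introduced. The sum method here is only an example:

```ruby
# Pass a block to .define instead of inheriting from an anonymous class.
Point = Data.define :x, :y do
  def sum = x + y
end

Point.superclass  # Data
Point[1, 2].sum   # 3
```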
Benchmarks
In terms of performance, data objects can be faster than structs but it depends on whether you are using positional or keyword arguments. Consider the following YJIT-enabled benchmark:
#! /usr/bin/env ruby
# frozen_string_literal: true
# Save as `benchmark`, then `chmod 755 benchmark`, and run as `./benchmark`.
require "bundler/inline"
gemfile true do
  source "https://rubygems.org"
  gem "benchmark-ips"
  gem "debug"
end
Warning[:performance] = false
MAX = 1_000_000
StructDemo = Struct.new :to, :from
DataDemo = Data.define :to, :from
Benchmark.ips do |benchmark|
  benchmark.config time: 5, warmup: 2
  benchmark.report "Data" do
    MAX.times { DataDemo[to: "Mork", from: "Mindy"] }
  end
  benchmark.report "Struct" do
    MAX.times { StructDemo[to: "Mork", from: "Mindy"] }
  end
  benchmark.compare!
end
If you save the above script and run locally, you’ll get the following results:
ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +YJIT +PRISM [arm64-darwin24.2.0]
Warming up --------------------------------------
                Data     1.000 i/100ms
              Struct     1.000 i/100ms
Calculating -------------------------------------
                Data      6.321 (± 0.0%) i/s  (158.21 ms/i) -     32.000 in   5.066667s
              Struct      6.143 (± 0.0%) i/s  (162.80 ms/i) -     31.000 in   5.055805s
Comparison:
                Data:        6.3 i/s
              Struct:        6.1 i/s - 1.03x slower
💡 If you’d like more benchmarks, check out my Struct article and/or Benchmarks project for further details.
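The benchmark above only exercises keyword arguments. If you’d like a quick, dependency-free way to compare positional and keyword initialization on your own machine, here’s a rough sketch. The measure helper and the iteration count are my own assumptions, and wall clock timing like this is far less rigorous than benchmark-ips, so no results are shown:

```ruby
StructDemo = Struct.new :to, :from
DataDemo = Data.define :to, :from

# Hypothetical helper: answers elapsed seconds for the given block.
def measure
  start = Process.clock_gettime Process::CLOCK_MONOTONIC
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

iterations = 100_000

timings = {
  "Data (positional)" => measure { iterations.times { DataDemo["Mork", "Mindy"] } },
  "Data (keyword)" => measure { iterations.times { DataDemo[to: "Mork", from: "Mindy"] } },
  "Struct (positional)" => measure { iterations.times { StructDemo["Mork", "Mindy"] } },
  "Struct (keyword)" => measure { iterations.times { StructDemo[to: "Mork", from: "Mindy"] } }
}

timings.each { |label, seconds| puts format("%-20s %.4fs", label, seconds) }
```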
Conclusion
Data objects are minimalistic and powerful in that they are only meant to hold a collection of data values which can’t be mutated. I’ve wanted an immutable value object in Ruby for some time but I’m also concerned about the design and precedent this new Data object has introduced into the language. That said, if you find yourself reaching for a struct, consider using a data object instead, especially if you only need the immutable encapsulation of raw values.