[ planet-factor ]

John Benediktsson: Game of Life

One of my first and most memorable graphical programs was an implementation of John Conway’s Game of Life – at the time, written as a Java applet. I’ve revisited it periodically in different programming languages, including several years ago when I started to implement the Game of Life in Factor – something I’ve always wanted to write about.

The Game of Life is a two-dimensional grid of square cells with fairly simple logic. Each cell can be either live or dead. Each cell interacts with its eight neighboring cells with the following rules determining the next state of the game board:

  • Any live cell with fewer than two live neighbours dies, as if by underpopulation.
  • Any live cell with two or three live neighbours lives on to the next generation.
  • Any live cell with more than three live neighbours dies, as if by overpopulation.
  • Any dead cell with exactly three live neighbours becomes a live cell, as if by reproduction.
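
To make the rules concrete, the entire transition for a single cell can be captured in one small word. This is only an illustrative sketch (the word name is hypothetical and not part of the code we build below), taking a cell's current state and its live-neighbor count and returning its next state:

: alive-next? ( alive? neighbors -- ? )
    swap [ 2 3 between? ] [ 3 = ] if ;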

You can run this in any release since Factor 0.98:

IN: scratchpad "game-of-life" run

And it will look something like this:

Let’s go ahead and build it!

Game Logic

We will model our two-dimensional game board as an array of arrays. And in particular, since each cell has only two states, we will use bit-arrays to reduce the memory requirements by efficiently storing the state, one bit for each cell.

: <grid> ( rows cols -- grid )
    '[ _ <bit-array> ] replicate ;

: grid-dim ( grid -- rows cols )
    [ length ] [ first length ] bi ;
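
As a quick check in the listener (assuming the words above are loaded), the dimensions round-trip as expected:

IN: scratchpad 5 8 <grid> grid-dim 2array .
{ 5 8 }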

Making a random grid, which is useful in testing:

: random-grid ( rows cols -- grid )
    '[ _ { t f } ?{ } randoms-as ] replicate ;

And a word we can use for debugging, to print a grid out:

: grid. ( grid -- )
    [ [ CHAR: # CHAR: . ? ] "" map-as print ] each ;

Some implementations choose to make the game boards infinite, but we are instead going to build a wraparound game board. This allows, for example, a glider shape to fly off the bottom right and then re-appear on the top left of the board, which is a lot more fun to watch.

A useful word calculates the adjacent indices for a cell, wrapping around at a maximum value – the number of rows or columns:

:: adjacent-indices ( n max -- n-1 n n+1 )
    n [ max ] when-zero 1 -
    n
    n 1 + dup max = [ drop 0 ] when ;

Test it out, showing how it might work in a hypothetical 10 x 10 grid:

! in the middle
IN: scratchpad 3 10 adjacent-indices 3array .
{ 2 3 4 }

! at the start, wrapped around
IN: scratchpad 0 10 adjacent-indices 3array .
{ 9 0 1 }

! at the end, wrapped around
IN: scratchpad 9 10 adjacent-indices 3array .
{ 8 9 0 }

The main game logic requires counting neighbors for each cell. Since each cell can have 8 neighbors, we can store this count in a half-byte – a nibble – which can hold the values [0..15]. In the batteries-included standard library, we have a nibble-arrays vocabulary that makes this easy.
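
For example, assuming the nibble-arrays vocabulary is loaded, a freshly made nibble-array starts out filled with zeros:

IN: scratchpad 4 <nibble-array> .
N{ 0 0 0 0 }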

The simplest implementation would just iterate across the game board, and for each cell that is live, increment the count for the neighboring indices around it:

:: count-neighbors ( grid -- counts )
    grid grid-dim :> ( rows cols )
    rows [ cols <nibble-array> ] replicate :> neighbors
    grid [| row j |
        j rows adjacent-indices
        [ neighbors nth ] tri@ :> ( above same below )

        row [| cell i |
            cell [
                i cols adjacent-indices
                [ [ above [ 1 + ] change-nth ] tri@ ]
                [ nip [ same [ 1 + ] change-nth ] bi@ ]
                [ [ below [ 1 + ] change-nth ] tri@ ]
                3tri
            ] when
        ] each-index
    ] each-index neighbors ;

The last piece of game logic we need adjusts the grid cells according to the rules – transitioning some cells from live to dead, and others from dead to live, based on their current state and neighbor counts.

:: next-step ( grid -- )
    grid count-neighbors :> neighbors
    grid [| row j |
        j neighbors nth :> neighbor-row
        row [| cell i |
            i neighbor-row nth
            cell [
                2 3 between? i row set-nth
            ] [
                3 = [ t i row set-nth ] when
            ] if
        ] each-index
    ] each-index ;

Before we move on to creating a graphical user interface for the game, let’s try it out in the Factor listener:

! Create a random 10x10 grid
IN: scratchpad 10 10 random-grid

! Print it out
IN: scratchpad dup grid.
#..#..#.##
##....####
..###.####
.##...#..#
.##....###
..###..#.#
...###.#..
.###....##
#...###.##
.##..#.#..

! Compute the neighbors for each cell
IN: scratchpad dup count-neighbors .
{
    N{ 5 5 4 1 2 3 4 6 5 5 }
    N{ 5 3 4 4 3 4 4 7 7 7 }
    N{ 6 5 4 3 1 4 4 6 6 5 }
    N{ 5 4 5 5 2 3 3 6 7 4 }
    N{ 5 4 5 5 2 2 3 3 5 3 }
    N{ 3 3 4 5 4 3 4 3 6 2 }
    N{ 3 3 6 6 5 2 3 2 5 3 }
    N{ 4 2 3 4 6 5 4 4 4 4 }
    N{ 4 5 5 4 3 3 3 4 4 4 }
    N{ 5 3 2 3 4 4 5 4 5 6 }
}

! Compute the next generation
IN: scratchpad dup next-step

! Print it out
IN: scratchpad dup grid.
.....#....
.#..#.....
...#......
.....##...
......##.#
##...#.#.#
##...###.#
.##.......
....###...
.###......

It works!

Game Interface

In Factor, one of the ways we can build user interfaces is using gadgets and OpenGL rendering instructions. We start by modeling our game as a gadget with a grid object, a size that specifies the rendered pixels-per-cell, and a timer to control the speed of repainting new generations.

TUPLE: grid-gadget < gadget grid size timer ;

Our default gadget will have cells that are 20 pixels square, and repaint 10 times per second:

: <grid-gadget> ( grid -- gadget )
    grid-gadget new
        swap >>grid
        20 >>size
        dup '[ _ dup grid>> next-step relayout-1 ]
        f 1/10 seconds <timer> >>timer ;

Gadgets are grafted onto the render hierarchy, and later ungrafted when they are removed. We handle that state change by stopping the timer before delegating to the parent to clean up further:

M: grid-gadget ungraft*
    [ timer>> stop-timer ] [ call-next-method ] bi ;

The default dimension for our gadget is the grid dimension times the pixel size:

M: grid-gadget pref-dim*
    [ grid>> grid-dim swap ] [ size>> '[ _ * ] bi@ 2array ] bi ;

If the grid size changes – for example, by using the mouse scroll wheel to zoom in or out – we can create and store a new grid, keeping the cells that are visible in the same state they were in:

:: update-grid ( gadget -- )
    gadget dim>> first2 :> ( w h )
    gadget size>> :> size
    h w [ size /i ] bi@ :> ( new-rows new-cols )
    gadget grid>> :> grid
    grid grid-dim :> ( rows cols )
    rows new-rows = not cols new-cols = not or [
        new-rows new-cols <grid> :> new-grid
        rows new-rows min [| j |
            cols new-cols min [| i |
                i j grid nth nth
                i j new-grid nth set-nth
            ] each-integer
        ] each-integer
        new-grid gadget grid<<
    ] when ;

We can draw the cells that are live as black squares:

:: draw-cells ( gadget -- )
    COLOR: black gl-color
    gadget size>> :> size
    gadget grid>> [| row j |
        row [| cell i |
            cell [
                i j [ size * ] bi@ 2array { size size } gl-fill-rect
            ] when
        ] each-index
    ] each-index ;

And then draw the gray lines that define the grid of cells:

:: draw-lines ( gadget -- )
    gadget size>> :> size
    gadget grid>> grid-dim :> ( rows cols )
    COLOR: gray gl-color
    cols rows [ size * ] bi@ :> ( w h )
    rows 1 + [| j |
        j size * :> y
        { 0 y } { w y } gl-line
    ] each-integer
    cols 1 + [| i |
        i size * :> x
        { x 0 } { x h } gl-line
    ] each-integer ;

Putting this together, we draw our gadget by updating the grid, drawing the cells, and drawing the lines:

M: grid-gadget draw-gadget*
    [ update-grid ] [ draw-cells ] [ draw-lines ] tri ;

And, with the “visual REPL”, you can directly render the grid gadget, to see it work:

We now need to build the interactive parts. Let’s start by handling a click to toggle the state of a cell, storing which state it was toggled to in the last-click variable:

SYMBOL: last-click

:: on-click ( gadget -- )
    gadget grid>> :> grid
    gadget size>> :> size
    grid grid-dim :> ( rows cols )
    gadget hand-rel first2 [ size /i ] bi@ :> ( i j )
    i 0 cols 1 - between?
    j 0 rows 1 - between? and [
        i j grid nth
        [ not dup last-click set ] change-nth
    ] when gadget relayout-1 ;

That allows us to build a drag feature: as we drag, we continue to set cells to live or dead according to what the first click did:

:: on-drag ( gadget -- )
    gadget grid>> :> grid
    gadget size>> :> size
    grid grid-dim :> ( rows cols )
    gadget hand-rel first2 [ size /i ] bi@ :> ( i j )
    i 0 cols 1 - between?
    j 0 rows 1 - between? and [
        last-click get i j
        grid nth set-nth
        gadget relayout-1
    ] when ;

We implement a scrolling feature to adjust the size of the rendered cells:

: on-scroll ( gadget -- )
    [
        scroll-direction get second {
            { [ dup 0 > ] [ -2 ] }
            { [ dup 0 < ] [ 2 ] }
            [ 0 ]
        } cond nip + 4 30 clamp
    ] change-size relayout-1 ;

And we store these as "gestures" that are supported by the gadget:

grid-gadget "gestures" [
    {
        { T{ button-down { # 1 } } [ on-click ] }
        { T{ drag { # 1 } } [ on-drag ] }
        { mouse-scroll [ on-scroll ] }
    } assoc-union
] change-word-prop

The last bit we need is to make the toolbar, which has a few commands we can run:

:: com-play ( gadget -- )
    gadget timer>> restart-timer ;

:: com-stop ( gadget -- )
    gadget timer>> stop-timer ;

:: com-clear ( gadget -- )
    gadget dup grid>> [ clear-bits ] each relayout-1 ;

:: com-random ( gadget -- )
    gadget dup grid>> [ [ drop { t f } random ] map! drop ] each relayout-1 ;

:: com-glider ( gadget -- )
    gadget dup grid>> :> grid
    { { 2 1 } { 3 2 } { 1 3 } { 2 3 } { 3 3 } }
    [ grid nth t -rot set-nth ] assoc-each relayout-1 ;

:: com-step ( gadget -- )
    gadget dup grid>> next-step relayout-1 ;

And then store these as the "toolbar" command map:

grid-gadget "toolbar" f {
    { T{ key-down { sym "1" } } com-play }
    { T{ key-down { sym "2" } } com-stop }
    { T{ key-down { sym "3" } } com-clear }
    { T{ key-down { sym "4" } } com-random }
    { T{ key-down { sym "5" } } com-glider }
    { T{ key-down { sym "6" } } com-step }
} define-command-map

And finally, we can wrap the grid gadget with something that makes a toolbar, and creates a main window when launched:

TUPLE: life-gadget < track ;

: <life-gadget> ( -- gadget )
    vertical life-gadget new-track
    20 20 <grid> <grid-gadget>
    [ <toolbar> format-toolbar f track-add ]
    [ 1 track-add ] bi ;

M: life-gadget focusable-child* children>> second ;

MAIN-WINDOW: life-window
    { { title "Game of Life" } }
    <life-gadget> >>gadgets ;

As with anything, there are probably things we could continue to improve in our UI framework, but one of the biggest missing pieces is examples of working code, which is largely what motivated writing about this today.

Check it out!

And maybe think about how you might adjust it to use an infinite game board, increase performance when computing the next generation, improve the OpenGL rendering logic, persist the game board between launches, or communicate the age of each cell through the color it is rendered with.

Tue, 27 May 2025 15:00:00

John Benediktsson: Sorting IPv6

Recently, Chris Siebenmann was lamenting the lack of a good command line way to sort IPv6 addresses. This followed a post of his a few years ago about how sort -V can easily sort IPv4 addresses. Since I had some fun talking about sorting Roman numerals recently – and we have an extensive standard library – I thought I’d talk about how you might solve this problem with Factor.

As a reminder, IPv6 uses a 128-bit address space with more theoretical addresses than the older – but still quite commonly used – IPv4 32-bit address space.

The internal network address of your computer is sometimes referred to as localhost or a loopback address, and is represented as 127.0.0.1 in IPv4, or ::1 in IPv6. We have an ip-parser vocabulary with words for parsing and manipulating IP addresses as well as IP network strings written in CIDR notation. We can use these words to show how these addresses translate to their underlying numeric representations:

IN: scratchpad "127.0.0.1" parse-ipv4 .
B{ 127 0 0 1 }

IN: scratchpad "::1" parse-ipv6 .
{ 0 0 0 0 0 0 0 1 }

And, we could use that to sort a list of addresses pretty easily:

IN: scratchpad {
                   "127.0.0.1"
                   "1.1.1.1"
                   "8.8.8.8"
                   "192.168.10.40"
               } [ parse-ipv4 ] sort-by .
{
    "1.1.1.1"
    "8.8.8.8"
    "127.0.0.1"
    "192.168.10.40"
}

IN: scratchpad {
                   "2620:0:1cfe:face:b00c::3"
                   "2001:4860:4860::8844"
                   "2620:0:ccc::2"
                   "::1"
                   "2001:4860:4860::8888"
               } [ parse-ipv6 ] sort-by .
{
    "::1"
    "2001:4860:4860::8844"
    "2001:4860:4860::8888"
    "2620:0:ccc::2"
    "2620:0:1cfe:face:b00c::3"
}

And so, now that some great feedback encouraged us to enable auto-use? during command-line eval, we can run this easily as a one-line script:

# make a file full of unsorted IPv6 addresses
$ cat <<EOF > ips.txt
2620:0:1cfe:face:b00c::3
2001:4860:4860::8844
2620:0:ccc::2
::1
2001:4860:4860::8888
EOF

# show that you can parse the file as strings
$ cat ips.txt | ./factor -e="read-lines ."
{
    "2620:0:1cfe:face:b00c::3"
    "2001:4860:4860::8844"
    "2620:0:ccc::2"
    "::1"
    "2001:4860:4860::8888"
}

# sort and print the sorted output
$ cat ips.txt | ./factor -e="read-lines [ parse-ipv6 ] sort-by [ print ] each"
::1
2001:4860:4860::8844
2001:4860:4860::8888
2620:0:ccc::2
2620:0:1cfe:face:b00c::3

Pretty cool!

Fri, 23 May 2025 15:00:00

John Benediktsson: Faster Leap Year?

Last week, Falk Hüffner wrote about making a leap year check in three instructions:

With the following code, we can check whether a year 0 ≤ y ≤ 102499 is a leap year with only about 3 CPU instructions:

bool is_leap_year_fast(uint32_t y) {
    return ((y * 1073750999) & 3221352463) <= 126976;
}

How does this work? The answer is surprisingly complex. This article explains it, mostly to have some fun with bit-twiddling; at the end, I’ll briefly discuss the practical use.

This is how a leap year check is typically implemented:

bool is_leap_year(uint32_t y) {
    if ((y % 4) != 0) return false;
    if ((y % 100) != 0) return true;
    if ((y % 400) == 0) return true;
    return false;
}

It would be fun to see how that works in Factor and compare the relative performance between a simple version and the new super-fast-highly-optimized 3 instruction version. To do that, we can use the benchmark word to record execution time by calling it repeatedly and returning an average time-per-call in nanoseconds:

TYPED: average-benchmark ( n: fixnum quot -- nanos-per-call )
    over [ '[ _ _ times ] benchmark ] dip /f ; inline

Note: We are forcing the iteration loop above to be fixnum to reduce its overhead, and, due to the design of the benchmark words below, we are going to have code blocks with predictable inputs. Testing your program with random inputs is also important to see the impact of CPU optimizations such as caching and branch prediction, or across multiple CPU architectures. Performance is also impacted by use of code generation features such as inline and compiler steps such as dead-code elimination. Benchmarking is hard.

Simple implementation

The simple – and typical – implementation can be easily written as:

: leap-year? ( year -- ? )
    dup 100 divisor? 400 4 ? divisor? ;

And in fact, that’s how it is implemented in the standard library.
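
As a quick sanity check in the listener, note that 1900 and 2025 are not leap years, while 2000 and 2024 are:

IN: scratchpad { 1900 2000 2024 2025 } [ leap-year? ] map .
{ f t t f }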

We can write a quick benchmarking word. This ensures we are using the optimizing compiler and also asserts that the result of the word is as expected:

: bench-leap-year ( n year ? -- nanos )
     '[ _ leap-year? _ assert= ] average-benchmark ;

And then call it one hundred million times, to see how long it takes each call on average:

IN: scratchpad 100,000,000 2028 t bench-leap-year .
10.53904317

Just under 11 nanoseconds, including the loop and the assert…

Fast implementation

The fast implementation suggested by Falk can be written directly as:

: fast-leap-year? ( year -- ? )
    1073750999 * 3221352463 bitand 126976 <= ;
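
Before benchmarking, it is worth convincing ourselves that the bit-twiddling version agrees with the simple one across the formula’s stated valid range of 0 ≤ y ≤ 102499. A quick, if unscientific, check in the listener:

IN: scratchpad 100,000 <iota> [ [ leap-year? ] [ fast-leap-year? ] bi = ] all? .
t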

And then write a benchmarking word:

: bench-fast-leap-year ( n year ? -- nanos )
     '[ _ fast-leap-year? _ assert= ] average-benchmark ;

And see how long it takes:

IN: scratchpad 100,000,000 2028 t bench-fast-leap-year .
4.74783302

Just under 5 nanoseconds…

Faster implementation

Well, generally Factor supports arbitrarily large integers by allowing word-sized fixnum values to implicitly promote to bignum on overflow. And, as they say, you can write the C programming language in any language.

A faster implementation might check that the input is a fixnum and then use fixnum math words that do not check for overflow:

TYPED: faster-leap-year? ( year: fixnum -- ? )
    1073750999 fixnum*fast 3221352463 fixnum-bitand 126976 fixnum<= ;

And write a benchmark word:

: bench-faster-leap-year ( n year ? -- nanos )
     '[ _ faster-leap-year? _ assert= ] average-benchmark ;

It’s a bit faster:

IN: scratchpad 100,000,000 2028 t bench-faster-leap-year .
3.24267145

Just under 4 nanoseconds…

Fastest implementation

But, to make sure that we take advantage of the least amount of instructions possible, we can make it slightly-less-safe by declaring the input to be a fixnum to avoid the run-time type checks. This could cause issues if it is called with other types on the stack.

: fastest-leap-year? ( year -- ? )
    { fixnum } declare
    1073750999 fixnum*fast 3221352463 fixnum-bitand 126976 fixnum<= ;

And write a benchmark word:

: bench-fastest-leap-year ( n year ? -- nanos )
     '[ _ fastest-leap-year? _ assert= ] average-benchmark ;

And then you can see it gets quite fast indeed:

IN: scratchpad 100,000,000 2028 t bench-fastest-leap-year .
2.82150476

Just under 3 nanoseconds!

But, is it also just 3 instructions?

IN: scratchpad \ fastest-leap-year? disassemble
000075f0afa19490: 89056a5bd1fe          mov [rip-0x12ea496], eax
000075f0afa19496: 498b06                mov rax, [r14]
000075f0afa19499: 48c1f804              sar rax, 0x4
000075f0afa1949d: 4869c0d7230040        imul rax, rax, 0x400023d7
000075f0afa194a4: bb0ff001c0            mov ebx, 0xc001f00f
000075f0afa194a9: 4821d8                and rax, rbx
000075f0afa194ac: 4881f800f00100        cmp rax, 0x1f000
000075f0afa194b3: b801000000            mov eax, 0x1
000075f0afa194b8: 48bb5c0e388cf0750000  mov rbx, 0x75f08c380e5c
000075f0afa194c2: 480f4ec3              cmovle rax, rbx
000075f0afa194c6: 498906                mov [r14], rax
000075f0afa194c9: 8905315bd1fe          mov [rip-0x12ea4cf], eax
000075f0afa194cf: c3                    ret

Pretty close!

There is an extra instruction near the beginning to untag our fixnum input. Due to the convention around handling booleans in Factor, there are a couple of extra instructions at the end for converting the result into a return value of either t or f.

And it could get even faster if either the assert= was removed, or the code was made inline so the function prologue and epilogue could be elided into the outer scope.

So much fun.

Wed, 21 May 2025 15:00:00

John Benediktsson: Raylib

Raylib is a very neat C library that has become popular as a “simple and easy-to-use library to enjoy videogames programming”. Originally released in 2014, it has seen lots of updates, with the latest version 5.5 representing 11 years of updates since the original version 1.0 release.

You can sense the love this library has received from the extensive feature list:

  • NO external dependencies, all required libraries included with raylib
  • Multiplatform: Windows, Linux, MacOS, RPI, Android, HTML5… and more!
  • Written in plain C code (C99) using PascalCase/camelCase notation
  • Hardware accelerated with OpenGL (1.1, 2.1, 3.3, 4.3 or ES 2.0)
  • Unique OpenGL abstraction layer: rlgl
  • Powerful Fonts module (SpriteFonts, BMfonts, TTF, SDF)
  • Multiple texture formats support, including compressed formats (DXT, ETC, ASTC)
  • Full 3d support for 3d Shapes, Models, Billboards, Heightmaps and more!
  • Flexible Materials system, supporting classic maps and PBR maps
  • Animated 3d models supported (skeletal bones animation)
  • Shaders support, including Model shaders and Postprocessing shaders
  • Powerful math module for Vector, Matrix and Quaternion operations: raymath
  • Audio loading and playing with streaming support (WAV, OGG, MP3, FLAC, XM, MOD)
  • VR stereo rendering support with configurable HMD device parameters
  • Huge examples collection with +120 code examples!
  • Bindings to +60 programming languages!
  • Free and open source. Check [LICENSE].

In 2019, we first included support for version 2.5 of raylib. Over the years, we have updated our bindings to include new functions and features from newer versions. Our two most recent releases, Factor 0.99 and Factor 0.100, included support for version 4.5. And in the current development cycle, we have updated to version 5.0 and then again to version 5.5.

It is possible to maintain support for multiple versions, but we have chosen for now to just target the most recent stable release as best we can. Generally, the raylib library has been quite stable, with the occasional deprecation or breaking change which seem to be easy to adjust for.

As a simple demonstration, we can start with the basic window example – an extremely simple program in the spirit of achieving a black triangle moment – to make sure everything works. We can directly translate this example into Factor using our Raylib bindings – which import the C functions using our convention for kebab-case word names.

USE: raylib

: basic-window ( -- )

    800 450 "raylib [core] example - basic window" init-window

    60 set-target-fps

    [ window-should-close ] [

        begin-drawing

        RAYWHITE clear-background

        "Congrats! You created your first window!"
        190 200 20 LIGHTGRAY draw-text

        end-drawing

    ] until close-window ;

And, if you run the example – you’ll end up with this:

Note: if you’re wondering why the background isn’t white, it’s because RAYWHITE is a special version of not-quite-white that matches the Raylib logo.

Of course, your examples can become more complex. For example, Joseph Oziel released a fun game called BitGuessr written in Factor using Raylib. And, given the extensive feature list, your simple program written quickly for a game jam might end up including great audio, images, keyboard, mouse, gamepad, 3d rendering, and other features relatively easily!

I’d love to see more demos written in Factor and Raylib. Happy coding!

Tue, 20 May 2025 15:00:00

John Benediktsson: Shuffle Syntax

Some might describe shuffle words as one of the fundamental building blocks of Factor. Others might describe them as a code smell and seek to use dataflow combinators or other higher-level words to reduce code complexity.

Whatever your opinion is, they are useful concepts in a concatenative language. Besides the basic shuffle words – like dup, swap, and rot – we have had the shuffle vocabulary, which provides some “additional shuffle words”, for a while now, as well as a syntax word that can perform arbitrary shuffles:

IN: scratchpad USE: shuffle

IN: scratchpad { 1 2 3 4 5 } [
                   shuffle( a b c d e -- d e c a b )
               ] with-datastack .
{ 4 5 3 1 2 }

This would be quite useful, except that it has had a fundamental issue – the way it is implemented uses a macro to curry the stack arguments into an array, and then pulls them back out of the array onto the stack in the requested order.

For example, we can look at a simple swap and a complex shuffle:

IN: scratchpad [ shuffle( x y -- y x ) ] optimized.
[
    2 0 <array> dup >R >R R> 3 set-slot
    R> >R R> dup >R >R R> 2 set-slot
    R> >R R> dup >R >R R> >R R> 3 slot R> >R R> >R R> 2 slot
]

IN: scratchpad [ shuffle( a b c d e -- b a d c e ) ] optimized.
[
    5 0 <array> dup >R >R R> 6 set-slot
    R> >R R> dup >R >R R> 5 set-slot
    R> >R R> dup >R >R R> 4 set-slot
    R> >R R> dup >R >R R> 3 set-slot
    R> >R R> dup >R >R R> 2 set-slot
    R> >R R> dup >R >R R> >R R> 3 slot
    R> dup >R >R R> >R R> 2 slot R> dup >R >R R> >R R> 5 slot
    R> dup >R >R R> >R R> 4 slot R> >R R> >R R> 6 slot
]

And not only is this less than efficient, it also turns literal arguments that were on the stack into run-time arguments, potentially causing a Cannot apply 'call' to a run-time computed value error if one of the shuffled arguments is a quotation the caller hopes to call.

This bug was described on our issue tracker and I spent some time recently looking into it.

It turns out that we can use the stack checker to indicate that a shuffle is taking place, and use some "special" machinery to allow the optimizing compiler to generate efficient and correct code for these arbitrary shuffles.

After applying a small fix, we can see that the earlier examples are now quite simple:

IN: scratchpad [ shuffle( x y -- y x ) ] optimized.
[ swap ]

IN: scratchpad [ shuffle( a b c d e -- b a d c e ) ] optimized.
[
    ( 10791205 10791206 10791207 10791208 10791209 -- 10791206 10791205 10791208 10791207 10791209 )
]

This is available in the latest development version.

Sun, 18 May 2025 15:00:00

John Benediktsson: Even More Brainf*ck

I was distracted a little by some recent explorations building a Brainfuck interpreter in Factor and had a couple of follow-ups to add to the conversation.

First, I realized my initial quick-and-dirty Brainfuck interpreter didn’t support nested loops. Specifically, the logic for beginning or ending a loop just searched for the nearest [ or ] character without considering nesting. This was fixed today so that will no longer be an issue.

Second, despite the Brainfuck compiler implicitly making an AST (abstract syntax tree) for Brainfuck by virtue of generating a quotation, I thought it would be more fun to build and generate one intentionally.

We can model the Brainfuck commands as operations using the following tuples and singletons:

TUPLE: ptr n ;
TUPLE: mem n ;
SINGLETONS: output input debug ;
TUPLE: loop ops ;

Next, we can build a parser using EBNF to convert the textual commands into our Brainfuck AST:

EBNF: ast-brainfuck [=[

inc-ptr  = (">")+     => [[ length ptr boa ]]
dec-ptr  = ("<")+     => [[ length neg ptr boa ]]
inc-mem  = ("+")+     => [[ length mem boa ]]
dec-mem  = ("-")+     => [[ length neg mem boa ]]
output   = "."        => [[ output ]]
input    = ","        => [[ input ]]
debug    = "#"        => [[ debug ]]
space    = [ \t\n\r]+ => [[ f ]]
unknown  = (.)        => [[ "Invalid input" throw ]]

ops   = inc-ptr|dec-ptr|inc-mem|dec-mem|output|input|debug|space
loop  = "[" {loop|ops}+ "]" => [[ second sift loop boa ]]

code  = (loop|ops|unknown)* => [[ sift ]]

]=]

This is interesting, because now we can more easily analyze a piece of Brainfuck code, such as the Hello, World example that I have been frequently using:

IN: scratchpad "
               ++++++++++[>+++++++>++++++++++>+++>+<<<<-]
               >++.>+.+++++++..+++.>++.<<+++++++++++++++
               .>.+++.------.--------.>+.>.
               " ast-brainfuck .
V{
    T{ mem { n 10 } }
    T{ loop
        { ops
            V{
                T{ ptr { n 1 } }
                T{ mem { n 7 } }
                T{ ptr { n 1 } }
                T{ mem { n 10 } }
                T{ ptr { n 1 } }
                T{ mem { n 3 } }
                T{ ptr { n 1 } }
                T{ mem { n 1 } }
                T{ ptr { n -4 } }
                T{ mem { n -1 } }
            }
        }
    }
    T{ ptr { n 1 } }
    T{ mem { n 2 } }
    output
    T{ ptr { n 1 } }
    T{ mem { n 1 } }
    output
    T{ mem { n 7 } }
    output
    output
    T{ mem { n 3 } }
    output
    T{ ptr { n 1 } }
    T{ mem { n 2 } }
    output
    T{ ptr { n -2 } }
    T{ mem { n 15 } }
    output
    T{ ptr { n 1 } }
    output
    T{ mem { n 3 } }
    output
    T{ mem { n -6 } }
    output
    T{ mem { n -8 } }
    output
    T{ ptr { n 1 } }
    T{ mem { n 1 } }
    output
    T{ ptr { n 1 } }
    output
}

And then we can implement those operations against a brainfuck state object, by deferring to words from our current implementation:

GENERIC: op ( brainfuck op -- brainfuck )
M: ptr op n>> (>) ;
M: mem op n>> (+) ;
M: output op drop (.) ;
M: input op drop (,) ;
M: debug op drop (#) ;
M: loop op [ get-memory zero? ] swap ops>> '[ _ [ op ] each ] until ;

And now this Brainfuck AST represents a hybrid execution model somewhere between the compiled and interpreted versions:

: hybrid-brainfuck ( code -- )
    [ <brainfuck> ] dip ast-brainfuck [ op ] each drop ;

And see that it works:

IN: scratchpad "
               ++++++++++[>+++++++>++++++++++>+++>+<<<<-]
               >++.>+.+++++++..+++.>++.<<+++++++++++++++
               .>.+++.------.--------.>+.>.
               " hybrid-brainfuck
Hello World!

We also gain some potential for building code optimization techniques that operate on an AST as a step before actual compilation or execution – for example, coalescing adjacent increment and decrement operations or some other more complex analysis.

That, however, is likely to remain an exercise for the reader!
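
If you want a head start on that exercise, here is a rough, untested sketch of what such a coalescing pass over the AST might look like. It merges adjacent ptr operations and adjacent mem operations, recurses into loops, and uses only the tuples defined above (the word names are made up for illustration):

GENERIC: coalesce-op ( acc op -- acc )

! Fold a sequence of operations into a new vector, merging as we go.
: coalesce-ops ( ops -- ops' )
    V{ } clone [ coalesce-op ] reduce ;

! Most operations are appended unchanged.
M: object coalesce-op suffix! ;

! Sum adjacent pointer moves into a single ptr operation.
M: ptr coalesce-op
    over ?last ptr?
    [ over pop n>> over n>> + nip ptr boa ] when suffix! ;

! Sum adjacent memory increments into a single mem operation.
M: mem coalesce-op
    over ?last mem?
    [ over pop n>> over n>> + nip mem boa ] when suffix! ;

! Coalesce the body of a loop recursively.
M: loop coalesce-op
    ops>> coalesce-ops loop boa suffix! ;

Wiring it in would then just be a matter of applying coalesce-ops to the output of ast-brainfuck before the [ op ] each loop in hybrid-brainfuck.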

Sat, 17 May 2025 15:00:00

John Benediktsson: More Brainf*ck

Almost 16 years ago, I wrote about implementing the Brainfuck programming language in Factor. It is a curious programming language, sometimes considered one of the most famous esoteric programming languages.

In any event – and encouraged by a question I was asked recently – I spent some time thinking about the current process of “compiling” the Brainfuck into quotations versus how an interpreter might work instead.

As a quick reminder, our current implementation expands a program written in Brainfuck into an equivalent form in Factor, and then allows it to be run:

IN: scratchpad USE: brainfuck

IN: scratchpad "
               ++++++++++[>+++++++>++++++++++>+++>+<<<<-]
               >++.>+.+++++++..+++.>++.<<+++++++++++++++
               .>.+++.------.--------.>+.>.
               " run-brainfuck
Hello World!


IN: scratchpad [
                   "
                   ++++++++++[>+++++++>++++++++++>+++>+<<<<-]
                   >++.>+.+++++++..+++.>++.<<+++++++++++++++
                   .>.+++.------.--------.>+.>.
                   " run-brainfuck
               ] expand-macros .
[
    <brainfuck> 10 (+)
    [ get-memory zero? ] [
        1 (>) 7 (+) 1 (>) 10 (+) 1 (>) 3 (+) 1 (>) 1 (+) 4 (<)
        1 (-)
    ] until 1 (>) 2 (+) (.) 1 (>) 1 (+) (.) 7 (+) (.) (.) 3 (+)
    (.) 1 (>) 2 (+) (.) 2 (<) 15 (+) (.) 1 (>) (.) 3 (+)
    (.) 6 (-) (.) 8 (-) (.) 1 (>) 1 (+) (.) 1 (>) (.) drop flush
]

Notice the coalescing that it performs to collapse multiple identical operators into a single action.

But, this got me curious about a couple of things:

  1. How could we cleanly write a Factor word that is implemented in Brainfuck?
  2. How could we write a Factor interpreter for Brainfuck, and what benefits would it have?

Building a syntax word

Thankfully, the answer to the first question is simple using parsing words.

SYNTAX: BRAINFUCK:
    scan-new-word ";" parse-tokens concat
    '[ _ run-brainfuck ] ( -- ) define-declared ;

Now, we can define a Hello, World, complete with inline comments:

BRAINFUCK: hello
    +++++ +++               ! Set Cell #0 to 8
    [
        >++++               ! Add 4 to Cell #1; this will always set Cell #1 to 4
        [                   ! as the cell will be cleared by the loop
            >++             ! Add 4*2 to Cell #2
            >+++            ! Add 4*3 to Cell #3
            >+++            ! Add 4*3 to Cell #4
            >+              ! Add 4 to Cell #5
            <<<<-           ! Decrement the loop counter in Cell #1
        ]                   ! Loop till Cell #1 is zero
        >+                  ! Add 1 to Cell #2
        >+                  ! Add 1 to Cell #3
        >-                  ! Subtract 1 from Cell #4
        >>+                 ! Add 1 to Cell #6
        [<]                 ! Move back to the first zero cell you find; this will
                            ! be Cell #1 which was cleared by the previous loop
        <-                  ! Decrement the loop Counter in Cell #0
    ]                       ! Loop till Cell #0 is zero

    ! The result of this is:
    ! Cell No :   0   1   2   3   4   5   6
    ! Contents:   0   0  72 104  88  32   8
    ! Pointer :   ^

    >>.                     ! Cell #2 has value 72 which is 'H'
    >---.                   ! Subtract 3 from Cell #3 to get 101 which is 'e'
    +++++ ++..+++.          ! Likewise for 'llo' from Cell #3
    >>.                     ! Cell #5 is 32 for the space
    <-.                     ! Subtract 1 from Cell #4 for 87 to give a 'W'
    <.                      ! Cell #3 was set to 'o' from the end of 'Hello'
    +++.----- -.----- ---.  ! Cell #3 for 'rl' and 'd'
    >>+.                    ! Add 1 to Cell #5 gives us an exclamation point
    >++.                    ! And finally a newline from Cell #6
    ;

And, it works!

IN: scratchpad hello
Hello World!

Note: we are using ! as a comment character which is the convention in Factor. Some Brainfuck implementations use that character to indicate embedded program inputs.

That’s pretty cool, and a neat example of using the lexer, the parser, and macros.

Building an interpreter

The answer to the second question might be more complex and nuanced, but thankfully we can re-use some of the current implementation to make a quick-and-dirty interpreter:

: end-loop ( str i -- str j/f )
    CHAR: ] swap pick index-from dup [ 1 + ] when ;

: start-loop ( str i -- str j/f )
    1 - CHAR: [ swap pick last-index-from dup [ 1 + ] when ;

: interpret-brainfuck-from ( str i brainfuck -- str next/f brainfuck )
    2over swap ?nth [ 1 + ] 2dip {
        { CHAR: > [ 1 (>) ] }
        { CHAR: < [ 1 (<) ] }
        { CHAR: + [ 1 (+) ] }
        { CHAR: - [ 1 (-) ] }
        { CHAR: . [ (.) ] }
        { CHAR: , [ (,) ] }
        { CHAR: # [ (#) ] }
        { CHAR: [ [ get-memory zero? [ [ end-loop ] dip ] when ] }
        { CHAR: ] [ get-memory zero? [ [ start-loop ] dip ] unless ] }
        { f [ [ drop f ] dip ] }
        [ blank? [ "Invalid input" throw ] unless ]
    } case ;

: interpret-brainfuck ( str -- )
    0 <brainfuck> [ interpret-brainfuck-from over ] loop 3drop ;

And give it a try:

IN: scratchpad "
               ++++++++++[>+++++++>++++++++++>+++>+<<<<-]
               >++.>+.+++++++..+++.>++.<<+++++++++++++++
               .>.+++.------.--------.>+.>.
               " interpret-brainfuck
Hello World!

It works! But now I’m curious about relative performance. Let’s build a silly benchmark equivalent to cat to redirect the contents of an input-stream to an output-stream. We can compare our compiled run-brainfuck macro to a version that uses the interpreter we made above, and then to a native version implemented using stream-copy*.

: cat1 ( -- ) ",[.,]" run-brainfuck ;

: cat2 ( -- ) ",[.,]" interpret-brainfuck ;

: cat3 ( -- ) input-stream get output-stream get stream-copy* ;

First, we should make sure they all work:

IN: scratchpad "Compiled" [ cat1 ] with-string-reader
Compiled

IN: scratchpad "Interpreted" [ cat2 ] with-string-reader
Interpreted

IN: scratchpad "Factor" [ cat3 ] with-string-reader
Factor

Okay, so it seems to work!

For quick performance testing, let’s compare them outputting to a null stream:

IN: scratchpad [
                   1,000,000 CHAR: a <string> [
                       [ cat1 ] with-null-writer
                   ] with-string-reader
               ] time
Running time: 0.059820291 seconds

IN: scratchpad [
                   1,000,000 CHAR: a <string> [
                       [ cat2 ] with-null-writer
                   ] with-string-reader
               ] time
Running time: 0.13840325 seconds

IN: scratchpad [
                   1,000,000 CHAR: a <string> [
                       [ cat3 ] with-null-writer
                   ] with-string-reader
               ] time
Running time: 0.015008417 seconds

The compiled one is a bit more than 2x faster than the interpreted version, but both are slower than the native version.

Let’s try comparing our “Hello, World” example – where the operator coalescing that the compiled version does might help:

: hello1 ( -- )
   "
   ++++++++++[>+++++++>++++++++++>+++>+<<<<-]
   >++.>+.+++++++..+++.>++.<<+++++++++++++++
   .>.+++.------.--------.>+.>.
   " run-brainfuck ;

: hello2 ( -- )
   "
   ++++++++++[>+++++++>++++++++++>+++>+<<<<-]
   >++.>+.+++++++..+++.>++.<<+++++++++++++++
   .>.+++.------.--------.>+.>.
   " interpret-brainfuck ;

We can see that the compiled one is now more like 7x faster:

IN: scratchpad [ [ 10,000 [ hello1 ] times ] with-null-writer ] time
Running time: 0.018075292 seconds

IN: scratchpad [ [ 10,000 [ hello2 ] times ] with-null-writer ] time
Running time: 0.133718416 seconds

Obviously, this is somewhat comparing apples and oranges because it’s ignoring compilation time in the comparison, and I didn’t spend any time on optimizing the interpreted version – for example, stripping blanks or validating inputs before doing the interpreter loop – but it’s a useful starting point for understanding tradeoffs.

How long does it currently take to compile?

IN: scratchpad [
                    [
                       gensym [
                           "
                           ++++++++++[>+++++++>++++++++++>+++>+<<<<-]
                           >++.>+.+++++++..+++.>++.<<+++++++++++++++
                           .>.+++.------.--------.>+.>.
                           " run-brainfuck
                       ] ( -- ) define-declared
                   ] with-compilation-unit
               ] time
Running time: 0.02294075 seconds

…a bit more time than it takes to call it 10,000 times. Interesting.

Fri, 16 May 2025 15:00:00

John Benediktsson: Roman Sort

Factor has included support for Roman numerals since at least 2008. Sometimes recent events – such as the election of Pope Leo XIV or Super Bowl LIX – revive interest in how they work and how to work with them computationally.

There was a blog post a few days ago about sorting Roman numerals which pointed out that sorting them alphabetically worked pretty well. Given that we have a pretty good roman vocabulary, I thought we could explore different methods of sorting a sequence of strings representing Roman numbers.

Let’s first try the mostly-but-not-entirely-correct method suggested in the blog post:

IN: scratchpad 10 [1..b) [ >ROMAN ] map

IN: scratchpad sort .
{ "I" "II" "III" "IV" "IX" "V" "VI" "VII" "VIII" }

Well, that’s almost correct, but the number IX – the number 9 – sorts in the middle, rather than at the end. Could we fix this by using sort-by to convert the string to a number before calling compare to produce a sorted output?

IN: scratchpad 10 [1..b) [ >ROMAN ] map

IN: scratchpad [ roman> ] sort-by .
{ "I" "II" "III" "IV" "V" "VI" "VII" "VIII" "IX" }

That fixes it, but how often do we end up calling the conversion function?

IN: scratchpad 10 [1..b) [ >ROMAN ] map

IN: scratchpad SYMBOL: calls

IN: scratchpad [ calls inc roman> ] sort-by .
{ "I" "II" "III" "IV" "V" "VI" "VII" "VIII" "IX" }

IN: scratchpad calls get .
40

Wow, 40 times – that seems like a lot!

Perhaps we could try the Schwartzian transform – also known as decorate-sort-undecorate – at the expense of using intermediate storage by saving the keys for each element, then sorting, and then returning only the values:

IN: scratchpad 10 [1..b) [ >ROMAN ] map

IN: scratchpad [ [ roman> ] keep ] map>alist sort-keys values .
{ "I" "II" "III" "IV" "V" "VI" "VII" "VIII" "IX" }

That seems like a lot of code to do a simple thing. Instead, we can use the map-sort abstraction which implements the same method:

IN: scratchpad 10 [1..b) [ >ROMAN ] map

IN: scratchpad [ roman> ] map-sort .
{ "I" "II" "III" "IV" "V" "VI" "VII" "VIII" "IX" }

Does this make much of a difference? Let’s compare each method:

: roman-sort1 ( seq -- sorted-seq ) [ roman> ] sort-by ;

: roman-sort2 ( seq -- sorted-seq ) [ roman> ] map-sort ;

We can time sorting a list of 100,000 random Roman numbers under 1,000:

IN: scratchpad 100,000 1,000 [1..b] randoms [ >ROMAN ] map

IN: scratchpad [ [ roman-sort1 ] time ]
               [ [ roman-sort2 ] time ] bi
Running time: 4.164076625 seconds
Running time: 0.154227583 seconds

You can see that the decorate-sort-undecorate pattern is quite a bit faster in this case. This is not always true, but generally depends on how much work the key function is doing.

Mon, 12 May 2025 15:00:00

planet-factor is an Atom/RSS aggregator that collects the contents of Factor-related blogs. It is inspired by Planet Lisp.