MODELING WORKFLOWS AS PIPELINES
THE WORKFLOW WE'RE MODELING
workflow "Place Order" =
  input: UnvalidatedOrder
  output (on success):
    OrderAcknowledgmentSent
    AND OrderPlaced (to send to shipping)
    AND BillableOrderPlaced (to send to billing)
  output (on error):
    ValidationError
  // step 1
  do ValidateOrder
  If order is invalid then:
    return with ValidationError
  // step 2
  do PriceOrder
  // step 3
  do AcknowledgeOrder
  // step 4
  create and return the events
• Note that if the order doesn't validate, we report that & we're done.
• But if it does validate, then we need to produce several things (including some state changes):
o OrderPlaced event to send to shipping
o BillableOrderPlaced event to send to billing
o Acknowledgment to customer
BREAKING IT DOWN
• Clearly we have a series of substeps—validate order, price order, and so on.
• We'll take the obvious approach: Each step can be modeled as a short pipe, with each pipe doing one
step of the overall process. The smaller pipes are then combined into a long pipe.
• We'll design each step to be stateless and free from side effects, which means they can be considered,
tested, and understood independently.
• Once they're designed, we just need to implement and assemble them.
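A minimal sketch of "small pipes combined into a long pipe", written in Racket like the rest of the slides' code. It assumes each step is a pure one-argument function; the struct names and the toy pricing rule are hypothetical placeholders, and real validation checks are omitted. Racket's `compose` applies its arguments right to left, so the first step is listed last.

```racket
#lang racket
;; Hypothetical, simplified state types for this sketch.
(struct unvalidated-order (quantity unit-price) #:transparent)
(struct validated-order (quantity unit-price) #:transparent)
(struct priced-order (quantity unit-price amount-to-bill) #:transparent)

;; Each step is a pure function: one input, one output, no side effects.
;; (Stand-in: real validation checks are omitted here.)
(define (validate-order o)
  (validated-order (unvalidated-order-quantity o)
                   (unvalidated-order-unit-price o)))

(define (price-order o)
  (let ([q (validated-order-quantity o)]
        [p (validated-order-unit-price o)])
    (priced-order q p (* q p))))

;; Combine the small pipes into a longer one.
;; compose applies right to left: validate-order runs first, then price-order.
(define validate-then-price (compose price-order validate-order))

(validate-then-price (unvalidated-order 3 10)) ; a priced-order with amount-to-bill 30
```

Because each step is stateless, each one can be tested on its own before being plugged into the longer pipe.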
WORKFLOW INPUTS
• The basic input is an UnvalidatedOrder.
• Another input to the workflow is the PlaceOrder command itself. It should contain everything needed to
complete the process.
o In most applications, we'd also want to track the user ID, a timestamp, and various other metadata, for logging
and auditing later.
o In keeping with our usual practice, we should make it a type:
type PlaceOrder =
OrderForm: UnvalidatedOrder
Timestamp: DateTime
User: UserID
// and so on....
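One way to express the PlaceOrder type in Racket is a struct whose fields mirror the type above. This is a sketch: the field contents (a symbol standing in for an UnvalidatedOrder, a string user ID) are placeholders, and the "and so on" metadata is left out.

```racket
#lang racket
;; Sketch of the PlaceOrder command as a struct; field names mirror the type above.
(struct place-order (order-form   ; an UnvalidatedOrder (placeholder value below)
                     timestamp    ; when the command was created, in seconds
                     user-id)     ; who issued the command
  #:transparent)

(define cmd (place-order 'some-unvalidated-order (current-seconds) "user-42"))
(place-order-user-id cmd) ; => "user-42"
```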
STEPPING UP IN ABSTRACTION
• Of course, this is only one workflow, and we're going to have several.
• So we should use abstraction to make a Command type:
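(define current-user-id (make-parameter "unknown-user")) ; hypothetical stand-in for the system's user ID
(define (make-command something)
  (define (private-command something)
    (let ([creation-time (current-seconds)]   ; timestamp, in seconds
          [ID (current-user-id)])              ; user issuing the command
      (list something creation-time ID)))
  ;; return an action that builds the command when it is called
  (λ () (private-command something)))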
• The make-command function returns a function (an action) that creates a new Command: the payload plus everything else
needed for that command, including metadata.
• Then we can define
type PlaceOrder = Command<UnvalidatedOrder>
MULTIPLE COMMANDS IN ONE TYPE
• In some cases, all the commands for a single bounded context will be sent in one channel, for example a
message queue, which might contain a mix of PlaceOrder, ChangeOrder, and CancelOrder actions.
• In an OO model, we'd make an abstract parent class and derive the child types.
• In a functional model, we define a choice type, tag the options, and use separate functions by type—the
'master' function takes the ProcessOrder action and then, based on the tag, passes it along to the
appropriate function.
• Each type would have a command associated with it.
• Then we just need to add a 'dispatch' or 'routing' stage when the input arrives.
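A sketch of that routing stage in Racket. Racket has no static choice types, so here each struct acts as a tag and `match` picks the case; the three command structs and their handler functions are hypothetical stand-ins that just report what they would do.

```racket
#lang racket
;; Hypothetical command types for the order-taking context.
(struct place-order (order-form) #:transparent)
(struct change-order (order-id changes) #:transparent)
(struct cancel-order (order-id) #:transparent)

;; One handler per command type (stand-ins that describe what they'd do).
(define (handle-place p) (list 'placing (place-order-order-form p)))
(define (handle-change c) (list 'changing (change-order-order-id c)))
(define (handle-cancel c) (list 'cancelling (cancel-order-order-id c)))

;; The routing stage: check which case arrived and pass it along.
(define (dispatch command)
  (match command
    [(place-order _) (handle-place command)]
    [(change-order _ _) (handle-change command)]
    [(cancel-order _) (handle-cancel command)]
    [_ (error 'dispatch "unknown command: ~e" command)]))

(dispatch (cancel-order 17)) ; => '(cancelling 17)
```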
MODELING AN ORDER AS A SET OF STATES
• It's clear that an order isn't just a static document; it progresses through a set of states:
o Unprocessed order form --> Either an Unvalidated Order or Unvalidated Quote
o Unvalidated Order --> Either a Validated Order or an Invalid Order
o Validated Order --> Priced Order
• How to model this?
• One approach would be to use a record type with various boolean flags: isValidated, isPriced,
amountToBill, etc.
• But this approach has problems
PROBLEMS WITH A SINGLE-RECORD APPROACH
• The system clearly has state, indicated by the various flags, but the states are implicit and would require
lots of conditional code in order to be handled.
• Additional code is needed to maintain consistency. What happens if an order is flagged as Unvalidated, but
the boolean indicating it's been priced (implying that it's valid) is also set?
• Some states have data that isn't needed in other states, and putting them all in one record complicates the
design. For example, AmountToBill is only needed in the Priced state, but because it doesn't exist in other
states, we have to mark that field as optional.
• It's not clear which fields go with which flags. AmountToBill should be set when Priced is set, but the
design doesn't enforce it. We have to rely on documentation and programmer discipline to keep things
consistent.
o Design rule: Invalid states should be impossible to represent. Validity and consistency should be unavoidably
enforced at each step.
SEPARATE STATES, SEPARATE TYPES
• So again, we create a new type for each state, to eliminate implicit state and conditional fields.
• Types follow directly from the definitions.
o ValidatedOrder = OrderID AND ValidatedCustomerInfo AND ValidatedShippingAddress AND
ValidatedBillingAddress AND list of ValidatedOrderLine
o PricedOrder = (the same fields as ValidatedOrder, OrderID through ValidatedBillingAddress) AND list of PricedOrderLine AND
AmountToBill
• Then we define a top-level type, Order, which is one of the subtypes.
• This represents the object at any point in its life cycle, and can be sent to storage or sent to other
contexts.
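A sketch of these types in Racket: one struct per state, each holding only the data that state needs, plus an `order?` predicate standing in for the top-level Order choice. Field contents are simplified placeholders, and `order-id` is a hypothetical helper showing how code that accepts any Order checks which state it is in.

```racket
#lang racket
;; One struct per state, with only the data that state needs.
(struct validated-order (order-id customer-info shipping-address
                         billing-address lines) #:transparent)
(struct priced-order (order-id customer-info shipping-address
                      billing-address lines amount-to-bill) #:transparent)

;; The top-level Order is "one of" the state types.
(define (order? x) (or (validated-order? x) (priced-order? x)))

;; Code that handles any Order checks which state it is in.
(define (order-id o)
  (match o
    [(validated-order id _ _ _ _) id]
    [(priced-order id _ _ _ _ _) id]))

(define v (validated-order 1 "Ann" "1 Main St" "1 Main St" '()))
(order-id v) ; => 1
```

Note that there is no AmountToBill field to leave empty in the validated state; it only exists where it means something.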
ADDING NEW STATE TYPES AS REQUIREMENTS CHANGE
• New states can be added without breaking existing code.
• If we need to add support for refunds, we can add a new RefundedOrder state with any information
needed just for that state.
• Other states are defined independently, so code using them won't be affected by the change.
STATE MACHINES AS A BASIC MODELING DEVICE
• We can view each of the various stages as states in a state machine, with defined transitions between them:
o An email address can transition from an UnverifiedEmailAddress to a VerifiedEmailAddress when the user clicks
a link in an email we send.
o When the user enters a new email address on a web form, it transitions to an UnverifiedEmailAddress
o A shopping cart has states Empty, Active, and Paid, where you transition from Empty to Active by adding an
item to the cart (or transitioning back to Empty by removing the last item), and to the Paid state by paying.
o A package delivery might have multiple states: Undelivered, OutForDelivery, and Delivered; the transition from
Undelivered to OutForDelivery happens when the package is put on the truck. If the "loaded on the truck but
the truck's still parked at the warehouse" state needs to be tracked, a Loaded state can be added.
ADVANTAGES OF STATE MACHINES
• Each state can have different allowable behavior
o Only an Active cart can be paid for, and a Paid cart can't be added to.
o We must not send password-reset emails to an UnverifiedEmailAddress.
• By using distinct types for each state, we can encode that directly in the function signature, allowing the compiler to be sure
we're following that business rule at compile time.
• All the states are explicitly documented
o For example, an empty cart behaves differently from an active cart; without explicit states, that difference may not be documented anywhere in the code
• It's a design tool that forces you to think about every possibility that can occur
o What should happen if we try to verify an already-verified email?
o What should happen if we try to remove an item from an empty cart?
o What should happen if we try to deliver a package that's already in the "Delivered" state?
o And so on. This can clarify the domain logic
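A sketch of the email-address rule in Racket, with one struct per state. Racket is dynamically typed, so this uses a contract (`define/contract`) that rejects the wrong state at run time; in a statically typed language the same signature would let the compiler reject it instead. The `verify` and `send-password-reset` functions are hypothetical stand-ins.

```racket
#lang racket
;; Distinct types for the two email-address states.
(struct unverified-email (address) #:transparent)
(struct verified-email (address) #:transparent)

;; Transition: the user clicks the link in the verification email (simulated here).
(define (verify e) (verified-email (unverified-email-address e)))

;; Business rule in the signature: password resets go only to verified addresses.
;; Racket checks this contract at run time.
(define/contract (send-password-reset e)
  (-> verified-email? string?)
  (format "reset link sent to ~a" (verified-email-address e)))

(define new-email (unverified-email "ann@example.com"))
(send-password-reset (verify new-email))
;; (send-password-reset new-email) would raise a contract violation
```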
IMPLEMENTING STATE MACHINES
• As we've seen, we don’t want to model the state machine with a single monolithic record and a
selection of boolean flags, enums, or conditional logic.
• We make each state have its own type, which stores the data relevant to that state (if any).
• The entire set of states can then be represented by a choice type (a tagged item from the collection).
• type Item = <whatever....>
type ActiveCartData = UnpaidItems: list of Item
type PaidCartData = PaidItems: list of Item; Payment: float (or perhaps a TransactionID)
type ShoppingCart = (list tag cart-data), where tag is one of 'empty-cart (no data), 'active-cart (ActiveCartData),
or 'paid-cart (PaidCartData), and cart-data is the data for that state
• The command handler is then a function that can accept a ShoppingCart of any type, check the tag, and
either handle it directly or (better) dispatch it to the appropriate function.
HANDLING STATE
• So, for example, here's the logic for adding an item to a cart:
• (define (check-tag cart) (first cart))   ; a cart is (list tag cart-data)
(define (add-item cart item)
  (let ([tag (check-tag cart)])
    (cond
      [(equal? tag 'empty-cart) (list 'active-cart (list item))]
      [(equal? tag 'active-cart) (list 'active-cart (append (second cart) (list item)))]
      [(equal? tag 'paid-cart) cart])))
• If cart is empty, return new cart with item on list of unpaid-items
• If cart is active, return new cart with item added to list of unpaid-items
• If cart is paid, return cart unchanged (can't add to a paid cart)
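The other cart transitions follow the same pattern. This sketch uses the same `(list tag cart-data)` representation as add-item, and assumes an empty cart is stored as `(list 'empty-cart '())`. The `remove-item` and `pay` functions are hypothetical; returning the cart unchanged for impossible transitions is one design choice among several (raising an error is another).

```racket
#lang racket
;; A cart is (list tag cart-data):
;;   'empty-cart  -> no data (stored as '())
;;   'active-cart -> list of unpaid items
;;   'paid-cart   -> (list paid-items payment)
(define (check-tag cart) (first cart))

(define (remove-item cart item)
  (cond
    [(equal? (check-tag cart) 'active-cart)
     (let ([remaining (remove item (second cart))])
       ;; removing the last item moves the cart back to the Empty state
       (if (empty? remaining)
           (list 'empty-cart '())
           (list 'active-cart remaining)))]
    ;; nothing to remove from an empty cart, and a paid cart can't change
    [else cart]))

(define (pay cart payment)
  (cond
    ;; only an active cart can be paid for
    [(equal? (check-tag cart) 'active-cart)
     (list 'paid-cart (list (second cart) payment))]
    [else cart]))
```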
MODELING EACH STEP IN THE WORKFLOW WITH TYPES
• The Validation Step
o Validating an order takes in an UnvalidatedOrder and returns either a ValidatedOrder or ValidationError. It has 2 dependencies:
CheckProductCodeExists, CheckAddressExists
o We'll assume we've defined the input and output types like we did the others above.
o We've been talking about modeling processes as functions; that's what we'll do with the dependencies here. They're both functions
that take in something and return something—a boolean for CheckProductCodeExists, and either a CheckedAddress or
AddressValidationError for CheckAddressExists.
o We may also want to distinguish between a checked address at this stage and our Address domain object; so we'll say that a
CheckedAddress is just a wrapped version of an UnvalidatedAddress
• We can now define the ValidateOrder step as a function with a primary input (the UnvalidatedOrder), two dependencies
(the CheckProductCodeExists and CheckAddressExists services), and an output (either a ValidatedOrder or an error).
Because one of the dependencies returns a Result, so should this function.
• (define (validate-order init-order-form check-code-fnc check-addr-fnc)
; returns Result, valid-order or validation-error
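A sketch of how validate-order could be filled in, assuming a Result is represented by two structs, `ok` and `err`. The order shapes are simplified and hypothetical; the two dependencies arrive as plain function parameters, exactly as in the signature above, and the stand-in implementations at the bottom exist only to try it out.

```racket
#lang racket
;; Result as two structs.
(struct ok (value) #:transparent)
(struct err (error) #:transparent)

;; Simplified, hypothetical order shapes.
(struct unvalidated-order (order-id address product-codes) #:transparent)
(struct validated-order (order-id address product-codes) #:transparent)

;; check-code-fnc : product-code -> boolean
;; check-addr-fnc : address -> (ok checked-address) or (err address-error)
(define (validate-order init-order-form check-code-fnc check-addr-fnc)
  (define codes (unvalidated-order-product-codes init-order-form))
  (cond
    [(not (andmap check-code-fnc codes)) (err 'invalid-product-code)]
    [else
     (match (check-addr-fnc (unvalidated-order-address init-order-form))
       [(ok checked)
        (ok (validated-order (unvalidated-order-order-id init-order-form) checked codes))]
       [(err e) (err e)])]))

;; Stand-in dependencies (hypothetical).
(define known-codes '("W1234" "G123"))
(define (check-code c) (and (member c known-codes) #t))
(define (check-addr a) (if (string=? a "") (err 'address-not-found) (ok a)))

(validate-order (unvalidated-order 1 "1 Main St" '("W1234")) check-code check-addr)
```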
MODELING STEPS WITH TYPES
• The Pricing Step
o Input is a ValidatedOrder, output is a PricedOrder, and one dependency: GetProductPrice
o Note that we're not passing a heavyweight IProductCatalog interface to it, just a function that represents
exactly what we need from the product catalog at this stage
That is, GetProductPrice acts as an abstraction—it hides any data this workflow doesn't need to know about, exposing
only the functionality needed and no more
o This function always succeeds (remember, we checked the product codes to validate the order), so there's no
need to return a Result; we can just fill in the prices.
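A sketch of the pricing step, assuming simplified, hypothetical order and line structs. Only `get-product-price` is passed in, not a whole catalog, and since the product codes were already validated the function returns a PricedOrder directly rather than a Result.

```racket
#lang racket
;; Hypothetical, simplified shapes.
(struct order-line (product-code quantity) #:transparent)
(struct priced-order-line (product-code quantity line-price) #:transparent)
(struct validated-order (order-id lines) #:transparent)
(struct priced-order (order-id lines amount-to-bill) #:transparent)

;; get-product-price : product-code -> price
(define (price-order order get-product-price)
  (define priced-lines
    (for/list ([line (validated-order-lines order)])
      (let* ([code (order-line-product-code line)]
             [qty (order-line-quantity line)])
        (priced-order-line code qty (* qty (get-product-price code))))))
  (priced-order (validated-order-order-id order)
                priced-lines
                (apply + (map priced-order-line-line-price priced-lines))))

;; Stand-in catalog lookup (hypothetical).
(define prices (hash "W1234" 10 "G123" 4))
(define (get-price code) (hash-ref prices code))

(price-order (validated-order 1 (list (order-line "W1234" 2) (order-line "G123" 3)))
             get-price) ; amount-to-bill is 2*10 + 3*4 = 32
```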
MODELING STEPS WITH TYPES
• The AcknowledgeOrder step creates an acknowledgement letter and sends it to the customer.
o To model the acknowledgement letter, for now we'll just say it contains an HTML string that we're going to send in
an email.
o So an OrderAcknowledgement consists of an EmailAddress, and a Letter (an HTML string, which is just a string)
What about the letter contents? Chances are it's produced from a template, based on the customer information and
order details.
Rather than embedding that in the workflow, we'll make it someone else's problem, by assuming a service function that
will generate the content for us:
(define (create-order-acknowledge-letter priced-order) ; returns html-string
Likewise, do we interact with an API to send the email? Add it to an email queue? Let's just assume a function that takes
an OrderAcknowledgement as input and sends it for us:
(define (send-order-acknowledge order-acknowledgement)
; no return value. Sends message as side effect, handles any errors around that
MODELING STEPS WITH TYPES
• But hold on; that's not quite right
• We want to return an OrderAcknowledgmentSent event from the overall order-placing workflow if it
was sent, but with this design we can't tell if it was sent or not. An obvious choice is for the function to
return a bool, reporting whether or not the email was actually sent.
• But bools aren't a good choice in design, because they're so uninformative. We can at least use a more
informative label and use Sent or NotSent as the return type
• Hm, should we have the service optionally return the OrderAcknowledgmentSent event itself? But
that would create a coupling between our domain and the service, via the event type.
• There's no obviously correct answer here, so for now we'll stick with the Sent / NotSent approach
MODELING STEPS WITH TYPES
• So what should the output of this step be? Just the "sent" event, if created. We'll define that as an
OrderID and an EmailAddress.
• So, putting it all together:
(define (acknowledge-order priced-order create-acknowledge-letter send-acknowledge-letter)
; returns Maybe order-acknowledgement-sent
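A sketch of acknowledge-order under stated assumptions: the order shape is hypothetical and simplified, the sender returns the symbol `'sent` or `'not-sent` (the Sent / NotSent choice above), and `#f` stands in for Maybe's "nothing". Both services are passed in as functions, and the stand-ins at the bottom are placeholders.

```racket
#lang racket
;; Hypothetical, simplified shapes.
(struct priced-order (order-id email amount-to-bill) #:transparent)
(struct order-acknowledgement (email letter) #:transparent)
(struct order-acknowledgement-sent (order-id email) #:transparent)

;; create-acknowledge-letter : priced-order -> html string
;; send-acknowledge-letter   : order-acknowledgement -> 'sent or 'not-sent
;; Returns an order-acknowledgement-sent event, or #f if nothing was sent.
(define (acknowledge-order order create-acknowledge-letter send-acknowledge-letter)
  (define letter (create-acknowledge-letter order))
  (define ack (order-acknowledgement (priced-order-email order) letter))
  (match (send-acknowledge-letter ack)
    ['sent (order-acknowledgement-sent (priced-order-order-id order)
                                       (priced-order-email order))]
    ['not-sent #f]))

;; Stand-in services (hypothetical).
(define (make-letter o) (format "<p>Thanks for order ~a</p>" (priced-order-order-id o)))
(define (fake-send ack) 'sent)

(acknowledge-order (priced-order 7 "ann@example.com" 32) make-letter fake-send)
```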
CREATING THE EVENTS TO RETURN
• We still need to create the OrderPlaced event (for shipping) and BillableOrderPlaced event (for billing).
• The OrderPlaced event can just be an alias for PricedOrder, and the BillableOrderPlaced event is just a subset of
the PricedOrder: the OrderID, BillingAddress, and AmountToBill.
• We could create a special type to hold them, but it's likely we'll be adding new events to this workflow
over time, and designing a special record type makes the design harder to change.
• Instead, we can just say that the workflow returns a list of events, where an event can be any one of
OrderPlaced, BillableOrderPlaced, or OrderAcknowledgementSent.
• If we need to add another event type, we can add it to the workflow without breaking anything.
• And if we discover the same events appear in multiple workflows, we can go up a level and create a
more general OrderTakingDomainEvent as a choice of all the events in the domain
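A sketch of building that event list in Racket, where each event case is its own struct. As an assumption of this sketch, the optional acknowledgment event (or `#f` if none was sent) is passed in alongside the PricedOrder so it can join the same list; the order shape is simplified and hypothetical.

```racket
#lang racket
;; Hypothetical, simplified order shape.
(struct priced-order (order-id billing-address amount-to-bill) #:transparent)

;; Event cases: each struct type acts as the tag.
(struct order-placed (order) #:transparent)                                       ; for shipping
(struct billable-order-placed (order-id billing-address amount-to-bill) #:transparent) ; for billing
(struct order-acknowledgement-sent (order-id email) #:transparent)

;; ack-event is an order-acknowledgement-sent, or #f if nothing was sent.
(define (create-events order ack-event)
  (define billable
    (billable-order-placed (priced-order-order-id order)
                           (priced-order-billing-address order)
                           (priced-order-amount-to-bill order)))
  (append (list (order-placed order) billable)
          (if ack-event (list ack-event) '())))
```

Adding a new event type later means adding a struct and appending it to the list; callers that iterate over the list keep working.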
DOCUMENTING EFFECTS
• What effects could these functions have? Do they do I/O?
• The validation step has 2 dependencies: CheckProductCodeExists and CheckAddressExists
o CheckProductCodeExists is a ProductCode --> bool function. Could it return an error, and is it a remote call?
Let's assume not. We probably have a copy of the product catalog available that we can access quickly.
o On the other hand, CheckAddressExists calls a remote service outside the domain, so it should
have the Async effect as well as the Result effect. And if one of the functions we call can be delayed, then so can
we; like Result, Async is contagious for any code containing it.
In fact, we often define an AsyncResult for just this case—a composition of Async and Result
o With that, we can change the signature of CheckAddressExists to AsyncResult.
o And that means that ValidateOrder also returns an AsyncResult
• So it's clear that CheckAddressExists is doing I/O and it might fail.
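A deliberately simple sketch of AsyncResult and why it is contagious. As an assumption, Async is modeled here as a thunk (a computation that has not run yet) which yields a Result when called; this captures "deferred and may fail" but not real concurrency. `check-address-exists` and `validate-address` are hypothetical.

```racket
#lang racket
(struct ok (value) #:transparent)
(struct err (error) #:transparent)

;; An AsyncResult is modeled as a thunk that, when run, produces a Result.
(define (async-result-return v) (λ () (ok v)))

;; Chain a step onto an AsyncResult. The output is again an AsyncResult,
;; which is why anything that calls an async function becomes async too.
(define (async-result-bind ar f)
  (λ ()
    (match (ar)
      [(ok v) ((f v))]
      [(err e) (err e)])))

;; Hypothetical remote address check: deferred, and it may fail.
(define (check-address-exists address)
  (λ ()
    (if (string=? address "")
        (err 'address-not-found)
        (ok (list 'checked-address address)))))

;; A caller of check-address-exists must return an AsyncResult as well.
(define (validate-address address)
  (async-result-bind (check-address-exists address)
                     (λ (checked) (async-result-return (second checked)))))

((validate-address "1 Main St")) ; running the thunk yields an ok result
```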
EFFECTS IN THE PRICING STEP
• The only dependency is GetProductPrice. We again assume that the catalog is local (in memory, or at
worst on disk) so there's no Async effect. Nor can accessing it return an error as far as we can tell. So no
effects there.
• The PriceOrder step itself might return an error. If an item has been mispriced, the overall AmountToBill
might be very large, or might be negative. We should catch that when it happens.
o Yes, it's an edge case, but real-world embarrassments have been caused by errors like this. And the occasional
lawsuit...
• So, if it might fail, we should return a Result, which means we also need to define an error type to go
with it.
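A sketch of a pricing check that returns a Result with its own error type. The `pricing-error` struct, its reason symbols, and the upper limit of 10000 are assumptions for illustration; a real limit would come from the business.

```racket
#lang racket
(struct ok (value) #:transparent)
(struct err (error) #:transparent)

;; Hypothetical error type for the pricing step.
(struct pricing-error (reason amount) #:transparent)
(define max-amount-to-bill 10000) ; assumption: choose a limit that suits the business

;; Check the computed total before accepting it, instead of letting
;; a mispriced order through.
(define (check-amount-to-bill amount)
  (cond
    [(negative? amount) (err (pricing-error 'negative-total amount))]
    [(> amount max-amount-to-bill) (err (pricing-error 'total-too-large amount))]
    [else (ok amount)]))
```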
EFFECTS IN THE ACKNOWLEDGE STEP
• The AcknowledgeOrder step has 2 dependencies: CreateOrderAcknowledgmentLetter and
SendOrderAcknowledgment
• Can the CreateOrderAcknowledgmentLetter function return an error? Probably not. We'll further assume
it's local and uses a cached template. So there are no effects that need to be documented in the type
signature.
• On the other hand, we know SendOrderAcknowledgment will be doing I/O, so it needs an Async effect.
• What about errors? In this case, we ignore errors and continue processing the order; we can deal with the
lack of an acknowledgment email later.
• So the revised SendOrderAcknowledgment will be Async but not AsyncResult.
• Of course, that effect ripples up to the parent function as well, so AcknowledgeOrder also has Async output.
COMPOSING THE WORKFLOW FROM THE STEPS
• From there we have everything we need... but we note that most of these functions are world-crossing:
o ValidateOrder takes in an UnvalidatedOrder and returns an AsyncResult<ValidatedOrder, ErrorList>
o PriceOrder takes in a ValidatedOrder and returns a Result<PricedOrder, PricingError>
o AcknowledgeOrder takes in a PricedOrder, and returns an Async<Maybe OrderAcknowledgementSent>
o CreateEvents takes in a PricedOrder and returns a List(PlaceOrderEvent)
• We've seen this before... the built-in chain function will get a workout, or we can write our own.
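A sketch of writing our own chaining helpers for the Result world. `bind-result` feeds an ok value into a step that itself returns a Result, and `map-result` lifts an ordinary function; both names are our own, not a built-in, and the step functions are hypothetical toys. Async is left out to keep the sketch short.

```racket
#lang racket
(struct ok (value) #:transparent)
(struct err (error) #:transparent)

;; bind: pass an ok value to a Result-returning step; errors skip ahead.
(define (bind-result result f)
  (match result
    [(ok v) (f v)]
    [(err e) (err e)]))

;; map: apply an ordinary (non-Result) function to an ok value.
(define (map-result result f)
  (bind-result result (λ (v) (ok (f v)))))

;; Hypothetical steps with different output "worlds".
(define (validate-order n) (if (positive? n) (ok n) (err 'validation-error)))
(define (price-order n) (if (< n 100) (ok (* n 10)) (err 'pricing-error)))
(define (create-events amount) (list (list 'order-placed amount)))

(define (place-order n)
  (map-result (bind-result (validate-order n) price-order) create-events))

(place-order 3)
```

The first failing step decides the result; later steps never run on bad data.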
ARE DEPENDENCIES PART OF THE DESIGN?
• We've been treating calls to other contexts as dependencies to be documented. We added extra parameters for them.
• There's also the argument that how a process performs its job should be hidden. Do we really care what other systems
it has to call?
• There's never a single right answer when it comes to design—it's a classic "wicked" problem—but there are some
general guidelines
o For functions exposed in a public API, hide dependency information from callers
o For functions used internally, be explicit about dependencies
• The dependencies for the top-level PlaceOrder workflow should not be exposed, because the caller doesn't need to
know about them. The signature should just show inputs and outputs.
• But for each internal step, the dependencies should be made explicit. This documents what each step actually needs. If
the dependencies for a step change, we can alter the definition for that step, which in turn will force us to change the
implementation.
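A sketch of both guidelines at once using a closure. The internal step takes its dependency as an explicit parameter; the public workflow is built once with that dependency baked in, so callers see only input and output. The hash-based order, `catalog-price`, and `make-place-order` are hypothetical.

```racket
#lang racket
;; Internal step: the dependency is an explicit parameter.
(define (price-order order get-product-price)
  (* (hash-ref order 'quantity)
     (get-product-price (hash-ref order 'product-code))))

;; Hypothetical dependency implementation.
(define (catalog-price code) (if (equal? code "W1234") 10 4))

;; Public workflow: dependencies are supplied once, then hidden from callers.
(define (make-place-order get-product-price)
  (λ (order) (price-order order get-product-price)))

(define place-order (make-place-order catalog-price))
(place-order (hash 'product-code "W1234" 'quantity 3)) ; => 30
```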
LONG RUNNING WORKFLOWS
• We're expecting that even though there are calls to remote systems, the pipeline will complete in a short time, on the
order of seconds. What if they take longer to complete?
o What if the validation was done by a person rather than a machine? Or pricing was done by a different department?
o Cf. Loading reel-to-reel tapes in Ye Olden Dayes of Computing...
o First, we save the state into storage before calling a remote service, then we wait for a message telling us the service has finished,
then reload the state from storage and continue with the next step.
• This is much heavier than asynchronous calls, because we have to persist the state between steps.
• Rather than 1 workflow, you can think of this as several workflows.
• This is where a state machine model is useful. We recover a state from storage, the mini-workflow transforms it to a
new state, we store the new state and call the next remote service.
• These long-running workflows (sometimes called Sagas) are typical where humans are involved, but can be used
anywhere you want to break the workflow up into decoupled stand-alone pieces connected by events.
o Microservices!
• This example was simple. As the number of states grows, we may want to create a ProcessManager component that
dispatches and receives messages and triggers the appropriate workflow.
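A sketch of the save, wait, reload, continue loop. As assumptions, a mutable hash stands in for persistent storage, states and messages are plain symbols, and the remote-service call is only noted in a comment. Each incoming message triggers one mini-workflow that moves the stored state forward.

```racket
#lang racket
;; A mutable hash stands in for a database (assumption).
(define storage (make-hash))

;; Each mini-workflow: (current state, incoming message) -> next state.
(define (next-state state message)
  (match (list state message)
    [(list 'awaiting-validation 'validation-done) 'awaiting-pricing]
    [(list 'awaiting-pricing 'pricing-done) 'priced]
    [_ state])) ; unexpected message: leave the state unchanged

;; Process-manager style handler: reload the state, transform it, save it.
;; A real system would also call the next remote service here.
(define (handle-message order-id message)
  (define state (hash-ref storage order-id 'awaiting-validation))
  (define new-state (next-state state message))
  (hash-set! storage order-id new-state)
  new-state)

(handle-message 1 'validation-done) ; => 'awaiting-pricing
(handle-message 1 'pricing-done)    ; => 'priced
```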
WRAPPING UP
• We've modeled a workflow using types
• We started by documenting the workflow's inputs and modeling commands.
• We can use state machines to model documents or entities with a life cycle
• We modeled each substep with types to represent input, output, dependencies, and effects.
• We created lots of types along the way. Was all that really needed? Was that too many types?
o Remember, we're trying to create executable documents—code that communicates the domain.
o If we didn't create these types, we'd still have to document the difference between a ValidatedOrder and a
PricedOrder; why not let the code do it?
• There's always a balance. If this approach is overkill for what you're doing, reduce it to match your
needs. Do what serves the domain and is most helpful for the task at hand.