Catamorph your DSL (Deep Dive)

Share on:

This post is a second of the series of post dedicated to notion of Catamorphisms and its application to build Domain Specific Languages (DSLs).

Our last post introduced an Arithmetic DSL that allowed to build expression composed of operations like addition and multiplication integer constants and integers variables. On top of this DSL, we built several interpreter functions:

  • prn to pretty print our Arithmetic expressions
  • eval to compute the value of an expression, given the value of all variables
  • optimize to simplify and optimize our expression
  • partial to partially evaluate our expression, given the value of some variables
  • dependencies to retrieve not yet resolved variables from our expression

At the end of the last post, we noticed that eval could be written in terms of partial and optimize. The impossibility to compose these interpreters efficiently showed us the limits of our model. Our design that coupled the traversal and the transformation on the Abstract Syntax Tree (AST) was simply not good enough.

The post of today will introduce the notion of Catamorphisms as a nice way to:

  • Decouple the traversal from the actions to perform
  • Get efficient composition of our interpreters back

Open type recursion

One of the first step to do in order to get to Catamorphisms is to deconstruct the recursion. Practically, it means getting rid of the recursive nature of the data structure by introducing a template.

type Id = String
data OpType = Add | Mul deriving (Show, Eq, Ord)

data Expr
  = Cst Int
  | Var Id
  | Op OpType [Expr]
  deriving (Show, Eq, Ord)

Becomes the following, where recursion on Expr is replaced by the templated type r:

type Id = String
data OpType = Add | Mul deriving (Show, Eq, Ord)

data ExprR r
  = Cst Int
  | Var Id
  | Op OpType [r]
  deriving (Show, Eq, Ord)

This looks innocent, but this is a very powerful trick. We already covered a similar trick for functions in my very first blog post when we were concerned about dynamic programming.

Back then, open recursion allowed us to plug memoization to our recurrence formula. Deconstructing our type recurrence relation will allow us to separate the recurrence from the action to perform during it.

If you read this previous post, you know the drill. Our first step is to get back our naive recurrence by computing the fix point of our recurrence. For types, this is a bit more complex (as we cannot create infinite types) and is achieved using the following trick:

newtype Fix f = Fix { unFix :: f (Fix f) }

-- Analog to Expr = ExprR Expr
type Expr = Fix ExprR

Fortunately, there is a Haskell package on hackage that help you automating this, in the data-fix package.

Building an expression

To construct an Expr instance, we need to adapt our “smart constructors” (Haskell’s name for factory functions) introduced in our previous post:

cst = Fix . Cst
var = Fix . Var
add = Fix . Op Add
mul = Fix . Op Mul

Having introduced this layer of abstraction allows us to keep the exact same syntax as before to construct our expressions:

> let e = add [ cst(1)
              , cst(2)
              , mul [cst(0), var("x"), var("y")]
              , mul [cst(1), var("y"), cst(2)]
              , add [cst(0), var("x") ]]

Although our expression type got more complex by the introduction of a template, we can still instantiate it as easily as we used to, as we do not need to know the exact constructors needed at construction.

Getting to catamorphisms

Let us start by demystifying the big word. A Catamorphism is a just a name for a recursion scheme. A kind of design pattern to decouple the traversal of a recursive data structure from the work to do inside the traversal.

Our cataphormism implementation will make use of the open recursion we introduced in ExprR to implement a kind of post-order depth first search traversal.

Functor instance

Let us start by the first building block of the Catamorphism, making ExprR a Functor instance. It allows to apply a function on the type parameter r for the ExprR r. The Functor basically allows us to transform the type that represents the recurrence (for example to remove the recurrence).

DerivingFunctor could have generated the Functor instance for us as follows, but doing manually is really instructive:

instance Functor ExprR where
  fmap _ (Cst c) = Cst c
  fmap _ (Var v) = Var v
  fmap f (Op opType xs) = Op opType (map f xs)

Now, if we consider a function with a prototype ExprR a -> a, we see that providing it to fmap will get us a function ExprR (ExprR a) -> ExprR a. Somehow, using fmap is like navigating between levels of recursion.

But we are not done yet. There is an unbound number of levels of recursions to deal with. Moreover, fmap does not quite fit our recursion: we have to take into account the Fix constructor wrapping each ExprR.

FMAP at each level

So we need to build a function that will apply our transformation through fmap, at each level of recursion. This function is the catamorphism function below which morphs a function ExprF a -> a to a function Expr -> a:

cataExpr :: (ExprR a -> a) -> Expr -> a
cataExpr algebra =
  algebra
  . fmap (cataExpr algebra)
  . unFix

In simpler terms, it provides a way to lift a local transformation (that applies to one stage of the Abstract Syntax Tree) into a global transformation on the whole expression.

It proceeds in three steps:

  • unFix allows to extract the ExprF from the Fix constructor
  • fmap (cataExpr algebra) recurs on the sub-trees: each sub expression becomes a a
  • The resulting ExprR a is given to the function algebra to finish the job

Generalization to all Functors

We can generalize this cataExpr function to work on any fixed point of an open recursive data structure that is an instance of Functor. It means we can replace our ExprR by a Functor instance, as follows:

cata :: Functor f => (f a -> a) -> Fix f -> a
cata algebra =
  algebra
  . fmap (cata algebra)
  . unFix

This is a bit abstract, so do not worry if you do not quite get this one. Just remember that applied to our ExprR, this function will resolve to the same cataExpr function we saw earlier.

Revisiting pretty printing

To best understand how catamorphisms work, examples are key. Let us start by reworking our first interpreter, the pretty printer, in terms of cata.

One cool thing about using this recursion scheme is that we can reason locally: we only need to care about what happens at one given stage (one node of our AST). Namely, we know that pretty printing a:

  • Constant is calling show on the numeric value
  • Variable is returning the name of variable
  • Addition is concatenating the sub-expression textual representation with +
  • Multiplication is concatenating the sub-expression textual representation with *

This translates almost directly into the following code. We can observe that algebra function is only concerned about implementing the local transformations listed above. The cata function takes entirely care of the recursion:

prn :: Expr -> String
prn = cata algebra where
  algebra (Cst n) = show n
  algebra (Var x) = x
  algebra (Op Add xs) = "(+ " ++ unwords xs ++ ")"
  algebra (Op Mul xs) = "(* " ++ unwords xs ++ ")"

To convince ourselves it works, we can test our function:

> let e = add [ cst(1)
              , cst(2)
              , mul [cst(0), var("x"), var("y")]
              , mul [cst(1), var("y"), cst(2)]
              , add [cst(0), var("x") ]]

> prn e
"(+ 1 2 (* 0 x y) (* 1 y 2) (+ 0 x))"

Evaluation and dependencies

Let us continue our exploration of catamorphisms by rewriting our two next most straightforwards evaluators, eval and dependencies.

Our eval function is based on an algebra function that only has to deal with integers:

  • The evaluation of constant and variables is just as before
  • Addition is a simple call to sum on the sub-expression results
  • Multiplication is a call to product on the sub-expression results
eval :: Env -> Expr -> Int
eval env = cata algebra where
  algebra (Cst n) = n
  algebra (Var x) = env Map.! x
  algebra (Op Add xs) = sum xs
  algebra (Op Mul xs) = product xs

The dependencies function is based on an algebra function that is only concerned in joining sets of variables identifiers:

  • Variables terms have only one dependency: the variable itself
  • Operations have for dependencies the total dependencies of their sub-expressions
dependencies :: Expr -> Set.Set Id
dependencies = cata algebra where
  algebra (Cst _) = Set.empty
  algebra (Var x) = Set.singleton x
  algebra (Op _ xs) = Set.unions xs

In both cases, and because cata takes care of the recursion, the implementation almost reflects the specification textually!

Optimized combination of optimizations

Let us now move on the fun part, decomposing our optimization functions bit by bit, before recombining them with great efficiency. We will adapt the optimize function we introduced in our previous post, but with two big differences:

We extracted the algebra from the recursion: to actually run the optimizations, these functions should be given as argument to cata.

Optimizations for Add and Mul are separated: this is to reflect possible real situations in which two teams (and two sets of functions) are specialized and focused on dealing one complex optimization each.

optimizeAdd :: ExprR Expr -> Expr
optimizeAdd op@(Op Add _) = optimizeOp op 0 (+)
optimizeAdd e = Fix e

optimizeMul :: ExprR Expr -> Expr
optimizeMul op@(Op Mul xs)
  | not (null (dropWhile (/= cst 0) xs)) = cst 0
  | otherwise = optimizeOp op 1 (*)
optimizeMul e = Fix e

optimizeOp :: ExprR Expr -> Int -> (Int -> Int -> Int) -> Expr
optimizeOp (Op optType xs) neutral combine =
  let (constants, vars) = partition isCst xs
      constantsVal = map (\(Fix (Cst x)) -> x) constants
      sumCst = foldl' combine neutral constantsVal
  in case vars of
      []  -> cst sumCst
      [y] | sumCst == neutral -> y
      ys  | sumCst == neutral -> Fix (Op optType ys)
      ys  -> Fix (Op optType (cst sumCst : ys))

We already know how to run these optimizations inefficiently in two tree traversals. Thanks to our previous labour though, we can do much better. Because the algebra are not coupled to the traversal, we combine two algebra into a single algebra before running one single traversal.

We introduce below the means to compose our algebras:

  • comp combines two algebra by function composition with unFix
  • compAll combines several algebra by reducing with comp over them
type Algebra f = f (Fix f) -> Fix f

comp :: Algebra f -> Algebra f -> Algebra f
comp f g = f . unFix . g

compAll :: Foldable t => t (Algebra f) -> Algebra f
compAll fs = foldr1 comp fs

Using these composition functions, we can write an efficient optimize function:

optimize :: Expr -> Expr
optimize = cata (optimizeMul `comp` optimizeAdd)

We did manage to combine several actions in a single traversal. This is a quite important feature which might remind you of our traversal combinators for simple ranges in this post.

This ability to compose easily will allow us to build modular yet efficient software as we will see in the next section.

Partial + Optimize = Eval

Let us take back from where we failed in the last post. We will now build an efficient evaluation function from three different building blocks:

  • A partial evaluation algebra to replace the known variables by constants
  • Our addition optimization and multiplication optimization algebra
  • Our dependencies interpreter that identifies remaining unknowns

Our partial evaluation algebra function will be named replaceKnownVars. It replaces variables bound in its environment argument by their value:

replaceKnownVars :: Env -> ExprR Expr -> Expr
replaceKnownVars env = go where
  go e@(Var v) =
    case Map.lookup v env of
      Just val -> cst val
      Nothing -> Fix e
  go e = Fix e

To finish up the work, we just need to assemble the pieces together:

  • partial combines the replacement of known variables with the optimization steps
  • eval runs a partial evaluation, checks whether the expression was reduced to a constant and reports an error with remaining dependencies if it was not
partial :: Env -> Expr -> Expr
partial env = cata (compAll [optimizeMul, optimizeAdd, replaceKnownVars env])

eval :: Env -> Expr -> Int
eval env expr =
  case partial env expr of
    (Fix (Cst n)) -> n
    e -> error $ "Missing vars: " ++ show (dependencies e)

That’s it! Three steps in one traversal in case of success, and a second traversal in case of error, all obtained from modular and independently testable pieces of software.

Note: you can notice there another advantage of working with catamorphisms: pattern matching does not have to be concerned with every possible cases. Since the recursion is handled outside, algebra that transform the AST can make use of the default case.

Conclusion and what’s next

This concludes our introduction of catamorphisms in Haskell.

We went through the mechanisms needed to introduce this recursion scheme, and demonstrated how it could be applied to combine several tree recursions into a single traversal.

We discovered how using this technique might improve the modularity and testability of our code, with minimum expenses in terms of efficiency.

The next posts will focus on discussing the limits of this technics, some of its trade-offs and show how it translates in some other languages.

Updated:

 

<
Previous Post
Catamorph your DSL (Introduction)
>
Next Post
Catamorph your DSL (Trade-offs)

Comments