Zooming in on DAML’s performance
tl;dr If you care about performance, use DAML’s builtin syntax for accessing record fields.
I guess it is no secret that I’m not the biggest fan of lenses, particularly not in DAML. I wouldn’t go as far as saying that lenses, prisms, and the other optics make my eyes bleed but there are definitely better ways to handle records in most cases. One case where the builtin syntax of DAML has a clear advantage over lenses is record access, getting the value of a field from a record:
It doesn’t get much clearer or much shorter. No matter what lens library you use, your equivalent code will look something like
get (lens @"field1" . lens @"field2") record
get (_field1 . _field2) record
if you’re willing to define lenses like
_field2 for each record field. Either way, the standard DAML way is hard to beat. And if you need to pass a field accessor around as a function, the
(.field1) syntax has you covered as well.
Clarity is often in the eye of the beholder and short code is not per se good code. The only thing that seems to matter universally is performance. Well, let’s have a look at the performance of both styles then. If you want to play along with the code in this blog post, you can find a continuous version of it in a GitHub Gist.
van Laarhoven lenses
Before we delve into a series of benchmarks, let’s quickly recap van Laarhoven lenses. The type of a lens for accessing a field of type
a in a record of type
type Lens s a = forall f. Functor f => (a -> f a) -> (s -> f s)
If it wasn’t for everything related to the functor
f in there, the type would be the rather simple
(a -> a) -> (s -> s). This looks like the type of a higher-order function that turns a function for changing the value of the field into a function for changing the whole record. That sounds pretty useful in its own right and is also something that could easily be used to implement a setter for the field: just use
\_ -> x as the function for changing the field in order to set its value to
x. Alas, it seems totally unclear how we could use such a higher-order function to produce a getter for the field. That’s where
f comes into play. But before we go into the details of how to get a getter, let’s implement a few lenses first.
Given a record type
data Record = Record with field1: Int; ...
a lens for
field1 can be defined by
_field1: Lens Record Int
_field1 f r = fmap (\x -> r with field1 = x) (f r.field1)
In fact, this is the only way we can define a function of this type without using any “magic constants”. More generally, a lens can be made out of a getter and a setter for a field in a very generic fashion:
makeLens: (s -> a) -> (s -> a -> s) -> Lens s a
makeLens getter setter f r = setter r <$> f (getter r)
Again, this is the only way a
Lens s a can be obtained from
setter without explictly using bottom values, such as
HasField class, we can even produce a lens that works for any record field:
lens: forall x s a. HasField x s a => Lens s a
lens = makeLens (getField @x) (flip (setField @x))
The lens to access
field1 is now written as
Remark. In my opinion, the most fascinating fact about van Laarhoven lenses is that you can compose them using the regular function composition operator
(.) and the field names appear in the same order as when you use the buildin syntax for accessing record fields, as indicated by the example in the introduction.
Implementing the getter
That’s all very nice, but how do we actually use a lens to access a field of a record? As mentioned above, that’s where the
f comes into play. What we are looking for is a function
get: Lens s a -> s -> a
get l r = ???
Recall that the type
Lens s a has the shape
forall f. Functor f => .... This means the implementation of
get can choose an arbitrary functor
f and an arbitrary function of type
(a -> f a) and pass them as arguments to
l. The functor that will solve our problem is the so-called “const functor” defined by
data Const b a = Const with unConst: b
instance Functor (Const r) where
fmap _ (Const x) = Const x
The key property of
Const is that
fmap does pretty much nothing with it (except for changing its type). We can use this to finally implement
get as follows:
get: Lens s a -> s -> a
get l r = (l Const r).unConst
With this function, we can get the value of
field1 from an arbitrary record
r by calling
get (lens @"field1") r
For the sake of completeness, let’s quickly define a setter as well. As insinuated above, we don’t really need the
f for that purpose. That’s exactly what the
Identity functor is for:
data Identity a = Identity with unIdentity: a
instance Functor Identity where
fmap f (Identity x) = Identity (f x)
set: Lens s a -> a -> s -> s
set l x r = (l (\_ -> Identity x) r).unIdentity
How to micro-benchmark DAML
Micro-benchmarking DAML code is unfortunately still a bit of an art form. We will use the scenario-based benchmarking approach described in a readme in the
daml repository. To this end, we need to write a scenario that runs
get (lens @"field") r for some record
r. The benchmark runner will then tell us how long the scenario runs on average.
In order to write such a scenario, we need to take quite a few things into consideration. Let’s have a look at the code first and explain the details afterward:
records = map Record [1..100_000] -- (A)
benchLens = scenario do
_ <- pure () -- (B)
let step acc r =
acc + get (lens @"field1") r -- (C)
let _ = foldl step 0 records -- (D)
The explanations for the marked lines are as follows:
- (D) Running a scenario has some constant overhead. Buy running the getter 100,000 times, we make this overhead per individual run of the getter negligible. However, folding over a list has some overhead too, including some overhead for each step of the fold. In order to account for this overhead, we use a technique that could be called “differential benchmarking”: We run a slightly modified version of the benchmark above, where line (C) is replaced by
acc + rand line (A) by
records = [1..100_00]. The difference between both benchmarks will tell us how long it takes to execute line (C) 100,000 times.
- (A) In order for the differential benchmarking technique to work, we need to compute the value of
recordsoutside of the actual measurements since allocating a list of 100,000 records takes significantly longer than allocating a list of 100,000 numbers. To this end, we move the definition of
recordsto the top-level. The DAML interpreter computes top-level values the first time their value is requested and then caches this value for future requests. The benchmark runner fills these caches by executing the scenario once before measuring.
- (B) Due to the aforementioned caching of top-level values and some quirks around the semantics of
donotation, we need to put our benchmark after at least one
<-binding. Otherwise, the result of the benchmark would be cached and we would only measure the time for accessing the cache.
- (C) We put the code we want to benchmark into a non-trivial context to reflect its expected usage. If we dropped the
acc +, then
get (lens @"field1") rwould be in tail position of the
stepfunction and hence not cause a stack allocation. However, in most use cases the
getfunction will be part of a more complex expression and its result will be pushed onto the stack. Thus, it seems fair to benchmark the cost of running the
getplus the pushing onto the stack. The additional cost of the addition is removed by the differential benchmarking technique.
Recall that the objective of this blog post is to compare lenses to the builtin syntax for accessing record fields in terms of their runtime performance. To this end, we run three benchmarks:
benchLensas defined above,
benchNoop, the variant of
benchLensdescribed under (D) above,
benchBuiltin, a variant of
benchLenswhere line (C) is replaced by
acc + r.field1.
T(x) denotes the time it takes to run a benchmark
x, then we can compute the time a single
get (field @"field1") r takes by
(T(benchLens) - T(benchNoop)) / 100_000
Similarly, the time a single
r.x takes is determined by
(T(benchBuiltin) - T(benchNoop)) / 100_000
Running these benchmarks on my laptop produced the following numbers:
||188.7 ms||1776 ns|
||15.7 ms||46 ns|
Benchmarks of the polymorphic
lens and the builtin syntax.
Wow! That means a single record field access using the builtin syntax takes 46 ns whereas doing the same with
lens takes 1776 ns, which is roughly 38 × 46 ns. That is more than 1.5 orders of magnitude slower!
Why are lenses so slow as getters?
This is almost a death sentence for lenses as getters. But where are these huge differences coming from? If we look through the definitions of
get, we find that there are quite a few function calls going on and that the two typeclasses
HasField are involved in this as well. Calling more functions is obviously slower. Typeclasses do have a significant runtime overhead in DAML since instances are passed around as dictionary records at runtime and calling a method selects the right field from this dictionary.
If we don’t want to abandon the idea of van Laarhoven lenses, we cannot get rid of the
Functor typeclass. But what about
HasField? If we want to be able to construct lenses in a way that is polymorphic in the field name, there’s no way around
HasField. However, if we were willing to write plenty of boilerplate like the monomorphic
_field1 lens above, we could do away with
HasField. Benchmarking this approach yields:
||92.5 ms||814 ns|
Benchmark of the monomorphic
Accessing fields with monomorphic lenses is twice as fast as with their polymorphic counterparts but still more than an order of magnitude slower than using the builtin syntax. This implies that no matter how much better we make the implementation of
lens, even if we used compile time specialization for it, we wouln’t get better than a 17x slowdown compared to the builtin syntax.
Temporary stop-gap measures
If a codebase is ubiquitously using lenses as getters, then rewriting it to use the builtin syntax instead will take time. It might make sense to replace some very commonly used lenses with monomorphic implementations. Although, in a codebase defining hundreds of record types, each of them with a few fields, there is most likely no small group of lenses whose monomorphization makes a difference.
Fortunately, there’s one significant win we can achive without changing too much code at all. The current implementation of
lens is pretty far away from the implementation of
_field1. If we move
lens closer to
_field1, we arrive at
fastLens: forall x r a. HasField x r a => Lens r a
fastLens f r = fmap (\x -> setField @x x r) (f (getField @x r))
Benchmarking this implementation gives us
||128.9 ms||1178 ns|
Benchmark of the polymorphic
These numbers are still not great, but they are at least a 1.5x speedup compared to the implementation of
Chains of record accesses
So far, our benchmarks were only concerned with accessing one field in one record. A pattern that occurs quite frequently in practice are nested records and chains of record accesses, as in
With the builtin syntax, every record access you attach to the chain is as expensive as the first record access. Benchmarks confirm this linear progression. However, we could easily make every record access after the first one in a chain significantly faster in the DAML interpreter.
There’s a similar linear progression when using
fastLens. Unfortunately, we have no chance of optimizing chains of record accesses in any way since they are completely intransparent to the compiler and the interpreter.
I think the numbers say everything there’s to say:
|Method||Time (in ns)||Slowdown vs builtin|
Summary of the benchmarks.
In view of these numbers, I would recommend to everybody who cares about performance to use DAML’s builtin syntax for accessing record fields!
Only focussing on getters might be modestly controversial since lenses also serve a purpose as setters. I expect the differences in performance between DAML’s builtin syntax for updating record fields
r with field1 = newValue
set and a lens to be in the same ballpark as for setters when updating a single field in a single record. When updating multiple fields in the same record, the DAML interpreter already performs some optimizations to avoid allocating intermediate records. Such optimizations are impossible with lenses.
However, when it comes to updating fields in nested records, DAML’s builtin syntax is not particularly helpful:
r with field1 = r.field1 with field2 = newValue
It gets even worse when you want to update the value of a field depending on its old value using a function
f. In many lens libraries this function is called
over and can be used like
over (lens @"field1" . lens @"field2") f r
Expressing the same with DAML’s builtin syntax feels rather clumsy
r with field1 = r.field1 with field2 = f r.field1.field2
If we ever want to make lenses significantly less appealing in DAML than they are today, we need to innovate and make the builtin syntax competitive when it comes to nested record updates. Who would still want to use lenses if you could simply write
r.field1.field2 ~= f
DAML has also a new learn section where you can begin to code online: