Commit ef2bca8

author doyougnu committed:
opt: add bodigrims comments to lambda lift
1 parent efa233b commit ef2bca8

File tree

2 files changed: +111 -51 lines changed

src/Optimizations/GHC_opt/lambda_lifting.rst

Lines changed: 110 additions & 51 deletions
@@ -11,8 +11,8 @@ Lambda Lifting :cite:p:`lambdaLifting` is a classic rewriting technique that
 avoids excess closure allocations and removes free variables from a function. It
 avoids closure allocation by moving local functions out of an enclosing function
 to the :term:`top-level`. It then removes free variables by adding parameters to
-the lifted function to capture free variables. This chapter describes the lambda
-lifting transformation, describes how GHC implements the transformation and
+the lifted function that captures the free variables. This chapter describes the
+lambda lifting transformation, how GHC implements the transformation, and
 provides guidance for when to implement the transformation manually.

 A Working Example
@@ -52,19 +52,69 @@ to the top level producing the final program:
 f a 0 = a
 f a n = f (g_lifted a (n `mod` 2)) (n - 1)

-This new program will be much faster because ``f`` becomes essentially
-non-allocating. Before the lambda lifting transformation ``f`` had to allocate a
-closure for ``g`` in order to pass ``a`` to ``g``. After the lambda lifting on
+Before the lambda lifting transformation ``f`` had to allocate a closure for
+``g`` in order to allow ``g`` to reference ``a``. After the lambda lifting on
 ``g`` this is no longer the case; |glift| is a top level function so ``f`` can
 simply reference it; no closures needed!

+This new program *could* be much faster than the original; whether it is
+depends on the usage patterns the program will experience. To understand the
+distribution of patterns, inspect the function's behavior with respect to its
+inputs. The original program allocates one expensive closure for ``g`` per call
+of ``f``. So when ``n`` is large, there will be few calls of ``f`` relative to
+``g``; in fact, for each call of ``f`` we should expect exactly ``n `mod` 2``
+calls of ``g``. In this scenario, the original program is faster because it
+allocates some closures in the outer loop (``f``, the outer loop, allocates a
+closure for ``g``, the inner loop, which includes a reference to ``a``) and in
+turn saves allocations in the inner loop (``g``) because ``a`` can simply be
+referenced in ``g``. Since the inner loop is called much more often than the
+outer loop, this pattern saves allocations.
+
+In contrast, the lifted version must allocate an additional argument for ``a``
+*for each* call of ``g_lifted``. So when ``n`` is large and we have many more
+calls to ``g_lifted`` relative to ``f``, the extra argument required to pass
+``a`` adds up to more allocations than the original version would make.
+
+However, the situation reverses when there are *many* calls to ``f a n`` with a
+small ``n``. In this scenario, the closure allocation that the original makes
+in the outer loop does not pay off, because the inner loop is relatively
+short-lived since ``n`` is small. For the same reason, the lambda lifted
+version is now fruitful: because ``n`` is small, the extra parameter that
+|glift| must allocate stays cheap. Thus the lifted version is faster because it
+avoids the closure allocation in the now frequently called outer loop.
+
+Now ``f`` is an obviously contrived example, so one may ask how frequently the
+many-calls with low ``n`` scenario will occur in practice. The simplest example
+is very familiar:
+
+.. code-block:: haskell
+
+   -- | map with no lambda lifting
+   map f = go
+     where
+       go []     = []
+       go (x:xs) = f x : go xs
+
+vs. the lifted version:
+
+.. code-block:: haskell
+
+   -- | map lambda lifted
+   map f []     = []
+   map f (x:xs) = f x : map f xs
+
+The first form is beneficial when there are a few calls on long lists, by the
+same reasoning as above; only now the list determines the number of calls
+instead of ``n``, and ``f`` is free rather than ``a``. Similarly, the second
+form is beneficial when there are many calls of ``map`` on short lists.
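The two formulations of ``map`` agree on every input; only their allocation behavior differs. A minimal sketch checking this (the names ``mapUnlifted`` and ``mapLifted`` are illustrative, not from the text):

```haskell
-- Both versions of map compute the same results; only allocation differs.
mapUnlifted :: (a -> b) -> [a] -> [b]
mapUnlifted f = go
  where
    -- 'go' closes over 'f': one closure per call of mapUnlifted
    go []     = []
    go (x:xs) = f x : go xs

mapLifted :: (a -> b) -> [a] -> [b]
-- 'f' is passed explicitly on every recursive call
mapLifted _ []     = []
mapLifted f (x:xs) = f x : mapLifted f xs

main :: IO ()
main = print (mapUnlifted (+1) [1,2,3 :: Int] == mapLifted (+1) [1,2,3])
```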
 .. note::

    The fundamental tradeoff is decreased heap allocation for an increase in
-   function parameters at each call site. This means that lambda lifting trades
-   heap for stack and is not always a performance win. See :ref:`When to
-   Manually Apply Lambda Lifting <when>` for guidance on recognizing when your
-   program may benefit.
+   function parameters at each call site. This means that whether lambda lifting
+   is a performance win or not depends on the usage pattern of the function, as
+   we have demonstrated. See :ref:`When to Manually Apply Lambda Lifting <when>`
+   for guidance on recognizing when your program may benefit.


 How Lambda Lifting Works in GHC
@@ -75,9 +125,10 @@ default method GHC uses for handling local functions and free variables.
 Instead, GHC uses an alternative strategy called :term:`Closure Conversion`,
 which creates more uniformity at the cost of extra heap allocation.

-Automated lambda lifting in GHC occurs *late* in the compiler pipeline at STG,
-right before code generation. GHC lambda lifts at STG instead of Core because
-lambda lifting interferes with other optimizations.
+Automated lambda lifting in GHC is called *late lambda lifting* because it
+occurs late in the compiler pipeline, at STG, right before code generation.
+GHC lambda lifts at STG instead of Core because lambda lifting interferes with
+other optimizations.

 Lambda lifting in GHC is also *Selective*. GHC uses a cost model that calculates
 hypothetical heap allocations a function will induce. GHC lists heuristics for
@@ -112,10 +163,9 @@ Observing the Effect of Lambda Lifting
 --------------------------------------

 You may directly observe the effect of late lambda lifting by comparing Core to
-STG when late lambda lifting is enabled. You can also directly disable or enable
-late lambda lifting with the flags ``-f-stg-lift-lams`` and
-``-fno-stg-lift-lams``. In general, lambda lifting performs the following
-syntactic changes:
+STG when late lambda lifting is enabled. You can also disable or enable late
+lambda lifting with the flags ``-fstg-lift-lams`` and ``-fno-stg-lift-lams``.
+In general, lambda lifting performs the following syntactic changes:

 #. It eliminates a let binding.
 #. It creates a new :term:`top-level` binding.
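One way to observe this in practice is to dump Core and STG with and without the flag and compare the output. A hedged sketch of such a session (``Main.hs`` is a placeholder module; ``-ddump-simpl``, ``-ddump-stg-final``, ``-ddump-to-file``, and ``-fforce-recomp`` are standard GHC dump flags):

```shell
# Dump Core and final STG with late lambda lifting enabled
ghc -O2 -fstg-lift-lams -ddump-simpl -ddump-stg-final -ddump-to-file Main.hs

# Rebuild with late lambda lifting disabled, then diff the .dump-stg-final files
ghc -O2 -fno-stg-lift-lams -ddump-stg-final -ddump-to-file -fforce-recomp Main.hs
```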
@@ -131,22 +181,25 @@ When to Manually Lambda Lift
 ----------------------------

 GHC does a good job finding beneficial instances of lambda lifting. However, you
-might want to manually lambda lift to save compile time, or to increase
-the performance of your without relying on GHC's optimizer.
+might want to manually lambda lift to save compile time, or to increase the
+performance of your program without relying on GHC's optimizer.

-There are three considerations you should have when deciding when to manually
-lambda lift:
+When deciding when to manually lambda lift, consider the following:

-1. Are the functions that would be lifted in hot loops.
+1. What is the expected usage pattern of the functions.
 2. How many more parameters would be passed to these functions.
-3. Would this transformation sacrifice readability and maintainability.

 Let's take these in order: (1) lambda lifting trades heap (the let bindings that
-it removes), for stack (the increased function parameters). Thus it is not
-always a performance win and in some cases can be a performance loss. The losses
-occur when existing closures grow as a result of the lambda lift. This extra
-allocation slows the program down and increases pressure on the garbage
-collector. Consider this example from :cite:t:`selectiveLambdaLifting`:
+it removes), for stack (the increased function parameters). Thus whether or not
+it is a performance win depends on the usage patterns of the enclosing function
+and the to-be-lifted function. As demonstrated in the motivating example,
+performance can degrade when the extra parameters, in combination with the
+usage pattern of the function, result in more total allocation during the
+lifetime of the program. Performance may also degrade if the existing closures
+grow as a result of the lambda lift. Both kinds of extra allocation slow the
+program down and increase pressure on the garbage collector. So it is important
+to learn to read the program from the perspective of memory. Consider this
+example from :cite:t:`selectiveLambdaLifting`:

 .. code-block:: haskell

@@ -183,43 +236,49 @@ before the lift will save one slot of memory. With ``f_lifted`` we additionally
 save two slots of memory because ``x`` and ``y`` are now parameters. Thus
 ``f_lifted`` does not need to allocate a closure with :term:`Closure
 Conversion`. ``g``'s allocations do not change since ``f_lifted`` can be
-directly referenced just as before and because ``x`` is still free in ``g``.
-Thus ``g``'s closure will contain ``x`` and ``f_lifted`` will be inlined, same
-as ``f`` in the unlifted version. ``h``'s allocations grow by one slot since
-``y`` *is now also* free in ``h``, just as ``x`` was. So it would seem that in
-total lambda lifting ``f`` saves one slot of memory because two slots were lost
-in ``f`` and one was gained in ``h``. However, ``g`` is a :term:`multi-shot
-lambda`, thus ``h`` will be allocated *for each* call of ``g``, whereas ``f``
-and ``g`` are only allocated once. Therefore the lift is a net loss.
-
-This example illustrates how tricky good lifts can be and especially for hot
-loops. In general, you should try to train your eye to determine when to
-manually lift. Try to roughly determine allocations by counting the ``let``
-expressions, the number of free variables, and the likely number of times a
-function is called and allocated.
+directly referenced just as before and because ``x`` is still free in ``g``. So
+``g``'s closure will contain ``x`` and ``f_lifted`` will be inlined, same as
+``f`` in the unlifted version. ``h``'s allocations grow by one slot since ``y``
+*is now also* free in ``h``, just as ``x`` was. So it would seem that in total
+lambda lifting ``f`` saves one slot of memory because two slots were lost in
+``f`` and one was gained in ``h``. However, ``g`` is a :term:`multi-shot
+lambda`, which means ``h`` will be allocated *for each* call of ``g``, whereas
+``f`` and ``g`` are only allocated once. Therefore, the lift is a net loss.
+
+This example illustrates how tricky good lifts can be. To estimate allocations,
+count the ``let`` expressions, the number of free variables, and the number of
+times the outer function and inner functions are expected to be called.

 .. note::

    Recall, due to closure conversion GHC allocates one slot of memory for each
    free variable. Local functions are allocated *once per call* of the enclosing
    function. Top level functions are always only allocated once.

-The next determining factor is counting the number of new parameters that will
-be passed to the lifted function. Should this number become greater than the
-number of available argument registers on the target platform then you'll incur
-slow downs in the STG machine. These slowdowns result from more work the STG
-machine will need to do. It will need to generate code that pops arguments from
-the stack instead of just applying the function to arguments that are already
-loaded into registers. In a hot loop this extra manipulation can have a large
-impact.
+(2) The next determining factor is counting the number of new parameters that
+will be passed to the lifted function. Should this number become greater than
+the number of available argument registers on the target platform, you'll incur
+slowdowns in the STG machine. These slowdowns result from more work the STG
+machine will need to do; it will need to generate code that pops arguments from
+the stack instead of just applying the function to arguments that are already
+loaded into registers. In a hot loop this extra manipulation can have a large
+impact.
+
+In general the heuristic is: if there are few calls to the outer loop and many
+calls to the inner loop, then do not lambda lift. However, if there are many
+calls to the outer loop and few calls to the inner loop, then lambda lifting
+will be beneficial.
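The heuristic can be made concrete with the chapter's working example. A minimal sketch, assuming a body for ``g`` (the original text only shows its call site, so ``a + m`` is a stand-in):

```haskell
-- Unlifted variant: 'g' closes over 'a', so each call of fUnlifted
-- allocates a closure for 'g'.
fUnlifted :: Int -> Int -> Int
fUnlifted a 0 = a
fUnlifted a n = fUnlifted (g (n `mod` 2)) (n - 1)
  where
    g m = a + m  -- assumed body; 'a' is free here

-- Lifted variant: gLifted is top level and 'a' is an explicit parameter,
-- passed anew on every call.
gLifted :: Int -> Int -> Int
gLifted a m = a + m  -- same assumed body

fLifted :: Int -> Int -> Int
fLifted a 0 = a
fLifted a n = fLifted (gLifted a (n `mod` 2)) (n - 1)

main :: IO ()
main = print (fUnlifted 0 10 == fLifted 0 10)
```

Few calls with large ``n`` favor the unlifted form; many calls with small ``n`` favor the lifted form. Either way the two variants compute the same result.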

 Summary
 -------

 #. Lambda lifting is a classic optimization technique for compiling local
    functions and removing free variables.
-#. Lambda lifting trades heap for stack and is therefore effective for tight,
-   closed, hot loops where fetching from the heap would be slow.
+#. Lambda lifting trades heap for stack. To determine whether a manual lambda
+   lift would be beneficial, determine the usage pattern of the enclosing and
+   local functions, determine whether closures would grow in the lifted
+   version, and ensure that the extra parameters in the lifted version would
+   not exceed the number of argument registers on the platform the program
+   targets.
 #. GHC automatically performs lambda lifting, but does so only selectively. This
    transformation is late in the compilation pipeline at STG and right before
    code generation. GHC's lambda lifting transformation can be toggled via the

src/contributors.rst

Lines changed: 1 addition & 0 deletions
@@ -18,3 +18,4 @@ Reviewers:
 - Jeffrey Young `@doyougnu <https://github.com/doyougnu>`_
 - Sylvain Henry `@hsyl20 <https://hsyl20.fr/home/>`_
 - Frank Staals `@noinia <https://fstaals.net/>`_
+- Andrew Lelechenko `@bodigrim <https://github.com/Bodigrim>`_
