Commit 1de2d3d

author
doyougnu
committed
working: more lambda lifting
1 parent 8547c4e commit 1de2d3d

1 file changed

+103
-9
lines changed

src/Optimizations/GHC_opt/lambda_lifting.rst

Lines changed: 103 additions & 9 deletions
@@ -1,5 +1,6 @@
11
.. _Lambda Lifting Chapter:
22

3+
..
34
Local Variables
45
.. |glift| replace:: ``g_lifted``
56

@@ -16,7 +17,7 @@ contains closures, rather it only references global names.
1617
A Working Example
1718
-----------------
1819

19-
Consider the following program [#]_:
20+
Consider the following program [#f1]_:
2021

2122
.. code-block:: haskell
2223
@@ -59,9 +60,10 @@ simply reference it; no closures needed!
5960
.. note::
6061

6162
The fundamental tradeoff is decreased heap allocation for an increase in
62-
function parameters at each call site. This means that lambda lifting is not
63-
always a performance win. See `When to Manually Apply Lambda Lifting`_ for
64-
guidance on recognizing when your program may benefit.
63+
function parameters at each call site. This means that lambda lifting trades
64+
heap for stack and is not always a performance win. See `When to Manually
65+
Apply Lambda Lifting`_ for guidance on recognizing when your program may
66+
benefit.
6567

6668

6769
How Lambda Lifting Works in GHC
@@ -125,11 +127,103 @@ syntactic changes:
125127
#. All non-top-level variables (i.e., free variables) in the let's body become
126128
occurrences of parameters.
127129

128-
When to Manually Apply Lambda Lifting
129-
-------------------------------------
130+
When to Manually Lambda Lift
131+
----------------------------
130132

131-
tomorrow: update glossary, start here
133+
GHC does a good job finding beneficial instances of lambda lifting. However, you
134+
might want to manually lambda lift to save compile time, or to increase
135+
the performance of your program without relying on GHC's optimizer.
132136

137+
Consider three questions when deciding whether to manually
138+
lambda lift:
139+
140+
1. Are the functions that would be lifted in hot loops?
141+
2. How many more parameters would be passed to these functions?
142+
3. Would this transformation sacrifice readability and maintainability?
143+
144+
Let's take these in order: (1) lambda lifting trades heap (the let bindings that
145+
it removes) for stack (the increased function parameters). Thus it is not
146+
always a performance win and in some cases can be a performance loss. The losses
147+
occur when existing closures grow as a result of the lambda lift. This extra
148+
allocation slows the program down and increases pressure on the garbage
149+
collector. Consider this example from :cite:t:`selectiveLambdaLifting`:
150+
151+
.. code-block:: haskell
152+
153+
-- unlifted.
154+
155+
-- 'f' increases heap because it must have a closure that includes the 'x'
156+
-- and 'y' free variables
157+
158+
-- 'g' increases heap because of the let and must have 'f' and 'x' in its
159+
-- closure (not assuming other optimizations such as constant propagation)
160+
161+
-- 'h' increases heap because 'f' is free in 'h'
162+
163+
let f a b = a + x + b + y
164+
g d = let h e = f e e
165+
in h x
166+
in g 1 + g 2 + g 3
167+
168+
Let's say we lift ``f``. Now we have:
169+
170+
171+
.. code-block:: haskell
172+
173+
-- lifted f
174+
175+
f_lifted x y a b = a + x + b + y
176+
177+
let g d = let h e = f_lifted x y e e
178+
in h x
179+
in g 1 + g 2 + g 3
180+
181+
``f_lifted`` is now a top level function, thus any closure that contained ``f``
182+
before the lift will save one slot of memory. With ``f_lifted`` we additionally
183+
save two slots of memory because ``x`` and ``y`` are now parameters. Thus
184+
``f_lifted`` does not need to allocate a closure with :term:`Closure
185+
Conversion`. ``g``'s allocations do not change since ``f_lifted`` can be
186+
directly referenced just as before and because ``x`` is still free in ``g``.
187+
Thus ``g``'s closure will contain ``x`` and ``f_lifted`` will be inlined, same
188+
as ``f`` in the unlifted version. ``h``'s allocations grow by one slot since
189+
``y`` *is now also* free in ``h``, just as ``x`` was. So it would seem that in
190+
total lambda lifting ``f`` saves one slot of memory because two slots were lost
191+
in ``f`` and one was gained in ``h``. However, ``g`` is a :term:`multi-shot`
192+
lambda, thus ``h`` will be allocated *for each* call of ``g``, whereas ``f`` and
193+
``g`` are only allocated once. Therefore the lift is a net loss.
194+
195+
This example illustrates how tricky good lifts can be, especially for hot
196+
loops. In general, you should try to train your eye to determine when to
197+
manually lift. Try to roughly determine allocations by counting the ``let``
198+
expressions, the number of free variables, and the likely number of times a
199+
function is called and allocated.
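
The counting heuristic above can be sketched as a small model. This is only an illustration, not how GHC accounts for memory: it assumes one slot per free variable, ignores closure headers, and treats top-level names as free of charge. The ``Closure`` type and the free-variable lists are hypothetical, chosen to mirror the example above with ``g`` called three times.

.. code-block:: haskell

   module Main where

   -- | A hypothetical model of a local closure: the variables it captures
   -- and how many times it is allocated while the program runs.
   data Closure = Closure
     { closureName :: String
     , freeVars    :: [String]
     , allocations :: Int
     }

   -- | Slots used by one closure: one slot per free variable, per allocation.
   slots :: Closure -> Int
   slots c = length (freeVars c) * allocations c

   -- | Total slots used by a group of closures.
   totalSlots :: [Closure] -> Int
   totalSlots = sum . map slots

   -- | Before the lift: 'h' captures 'f' and is allocated once per call of 'g'.
   unlifted :: [Closure]
   unlifted =
     [ Closure "f" ["x", "y"] 1
     , Closure "g" ["f", "x"] 1
     , Closure "h" ["f"]      3
     ]

   -- | After the lift: 'f_lifted' is top level and needs no closure, but
   -- 'h' now captures both 'x' and 'y'.
   lifted :: [Closure]
   lifted =
     [ Closure "g" ["x", "y"] 1
     , Closure "h" ["x", "y"] 3
     ]

   main :: IO ()
   main = do
     putStrLn ("unlifted slots: " ++ show (totalSlots unlifted))  -- 7
     putStrLn ("lifted slots:   " ++ show (totalSlots lifted))    -- 8

Under these assumptions the lift removes two slots for ``f`` but adds one slot to each of the three allocations of ``h``, so the total rises from 7 to 8.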
200+
201+
.. note::
202+
203+
Recall, due to closure conversion GHC allocates one slot of memory for each
204+
free variable. Local functions are allocated *once per call* of the enclosing
205+
function. Top-level functions are allocated only once.
206+
207+
The next determining factor is counting the number of new parameters that will
208+
be passed to the lifted function. Should this number become greater than the number
209+
of available argument registers on the target platform then you'll incur
210+
slowdowns in the STG machine......
211+
212+
tomorrow: update glossary, genapply and calling conventions. start here
213+
214+
Summary
215+
-------
216+
217+
#. Lambda lifting is a classic optimization technique for compiling local
218+
functions and removing free variables.
219+
#. Lambda lifting trades heap for stack and is therefore effective for tight,
220+
closed, hot loops where fetching from the heap would be slow.
221+
#. GHC automatically performs lambda lifting, but does so only selectively. This
222+
transformation runs late in the compilation pipeline, at STG, right before
223+
code generation. GHC's lambda lifting transformation can be toggled via the
224+
``-fstg-lift-lams`` and ``-fno-stg-lift-lams`` flags.
225+
#. To tell if your program has undergone lifting you can compare the Core with
226+
the STG. Or, you may compare STG with and without lifting explicitly enabled.
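
One way to try the comparison from the last item is to compile the same module twice and diff the STG dumps. The module below is a minimal sketch that wraps the earlier example in a hypothetical ``example`` function. It assumes a recent GHC where ``-ddump-stg-final`` and ``-ddump-to-file`` are available. Because the optimizer may inline these small functions, the two dumps are not guaranteed to differ.

.. code-block:: haskell

   -- Compile once as written, then again with -fno-stg-lift-lams in place
   -- of -fstg-lift-lams, and compare the two STG dump files.
   {-# OPTIONS_GHC -O -fstg-lift-lams -ddump-stg-final -ddump-to-file #-}
   module Main where

   -- | The running example, with 'x' and 'y' supplied as arguments so
   -- that they are free in the local functions.
   example :: Int -> Int -> Int
   example x y =
     let f a b = a + x + b + y
         g d = let h e = f e e
               in h x
     in g 1 + g 2 + g 3

   main :: IO ()
   main = print (example 1 2)  -- prints 15

If lifting happened, a local function such as ``f`` should show up in the lifted dump as a top-level binding that takes its former free variables as extra arguments.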
133227

134228
Testing Exec
135229

@@ -175,5 +269,5 @@ and we can also run from cabal target!!
175269
:args: bench lethargy:tooManyClosures
176270

177271

178-
.. [#] This program and example comes from Sebastian Graf and Simon Peyton Jones
179-
:cite:p:`selectiveLambdaLifting`; thank you for your labor!:
272+
.. [#f1] This program and example come from :cite:t:`selectiveLambdaLifting`;
273+
thank you for your labor!
