@@ -11,8 +11,8 @@ Lambda Lifting :cite:p:`lambdaLifting` is a classic rewriting technique that
avoids excess closure allocations and removes free variables from a function. It
avoids closure allocation by moving local functions out of an enclosing function
to the :term:`top-level`. It then removes free variables by adding parameters to
- the lifted function to capture free variables. This chapter describes the lambda
- lifting transformation, describes how GHC implements the transformation and
+ the lifted function that captures the free variables. This chapter describes the
+ lambda lifting transformation, how GHC implements the transformation, and
provides guidance for when to implement the transformation manually.

A Working Example
@@ -52,19 +52,69 @@ to the top level producing the final program:
f a 0 = a
f a n = f (g_lifted a (n `mod` 2)) (n - 1)

- This new program will be much faster because ``f`` becomes essentially
- non-allocating. Before the lambda lifting transformation ``f`` had to allocate a
- closure for ``g`` in order to pass ``a`` to ``g``. After the lambda lifting on
+ Before the lambda lifting transformation ``f`` had to allocate a closure for
+ ``g`` in order to allow ``g`` to reference ``a``. After the lambda lifting on
``g`` this is no longer the case; |glift| is a top-level function so ``f`` can
simply reference it; no closures needed!

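+ For reference, before the lift ``g`` was a local definition of ``f`` that
+ captured ``a``. Here is a sketch of that shape (the exact body of ``g`` is the
+ one given earlier in the chapter; this sketch only assumes that ``g``
+ references ``a``):
+
+ .. code-block:: haskell
+
+    f a 0 = a
+    f a n = f (g (n `mod` 2)) (n - 1)
+      where
+        -- 'a' is free in 'g', so closure conversion allocates a closure
+        -- holding 'a' for 'g' each time the body of 'f' is entered
+        g 0 = a
+        g m = m + g (m - 1)
+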
+ This new program *could* be much faster than the original; it depends on the
+ usage patterns the program will experience. To understand the distribution of
+ patterns, inspect the function's behavior with respect to its inputs. The
+ original program allocates one expensive closure for ``g`` per call of ``f``. So
+ when ``n`` is large, there will be few calls of ``f`` relative to ``g``; in fact,
+ for each call of ``f`` we should expect exactly ``n `mod` 2`` calls of ``g``. In
+ this scenario, the original program is faster because it allocates some closures
+ in the outer loop (``f``, the outer loop, allocates a closure for ``g``, the
+ inner loop, and that closure includes a reference to ``a``) and in turn saves
+ allocations in the inner loop (``g``) because ``a`` can simply be referenced in
+ ``g``. Since the inner loop is called much more than the outer loop, this
+ pattern saves allocations.
+
+ In contrast, the lifted version must allocate an additional argument for ``a``
+ *for each* call of ``g_lifted``. So when ``n`` is large and there are many more
+ calls to ``g_lifted`` relative to ``f``, the extra argument required to pass
+ ``a`` adds up to more allocations than the original version would make.
+
+ However, the situation reverses when there are *many* calls to ``f a n`` with a
+ small ``n``. In this scenario, the closure allocation that the original makes in
+ the outer loop does not pay off, because the inner loop is short lived since
+ ``n`` is small. For the same reason, the lambda lifted version is now fruitful:
+ because ``n`` is small, the extra parameter that |glift| must allocate stays
+ cheap. Thus the lifted version is faster by avoiding the closure allocation in
+ the now frequently called outer loop.
+
+ Now ``f`` is an obviously contrived example, so one may ask how frequently the
+ scenario of many calls with a low ``n`` will occur in practice. The simplest
+ example is very familiar:
+
+ .. code-block:: haskell
+
+    -- | map with no lambda lifting
+    map f = go
+      where
+        go []     = []
+        go (x:xs) = f x : go xs
+
+ vs. the lifted version:
+
+ .. code-block:: haskell
+
+    -- | map lambda lifted
+    map f []     = []
+    map f (x:xs) = f x : map f xs
+
+ The first form is beneficial when there are a few calls on long lists, by the
+ same reasoning as above; only now the list determines the number of calls
+ instead of ``n``, and ``f`` is free rather than ``a``. Similarly, the second
+ form is beneficial when there are many calls of ``map`` on short lists.
+
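+ If you want to see the difference for yourself, one rough approach is to run
+ both shapes under the RTS statistics flag (``+RTS -s``) on the pattern in
+ question. The sketch below is illustrative only: the module, the names
+ ``mapLocal``, ``mapLifted``, and ``shortLists``, and the workload are invented
+ here, and an optimising compile may transform either version before it runs:
+
+ .. code-block:: haskell
+
+    module Main (main) where
+
+    -- local worker: a closure for 'go' capturing 'f' is allocated per call
+    mapLocal :: (a -> b) -> [a] -> [b]
+    mapLocal f = go
+      where
+        go []     = []
+        go (x:xs) = f x : go xs
+
+    -- lifted worker: 'f' is instead passed on every recursive call
+    mapLifted :: (a -> b) -> [a] -> [b]
+    mapLifted _ []     = []
+    mapLifted f (x:xs) = f x : mapLifted f xs
+
+    main :: IO ()
+    main = do
+      -- many calls on short lists: the pattern that favours the lifted form
+      let shortLists = replicate 100000 [1 .. 3 :: Int]
+      print (sum (concatMap (mapLifted (+ 1)) shortLists))
+      print (sum (concatMap (mapLocal  (+ 1)) shortLists))
+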
.. note::

The fundamental tradeoff is decreased heap allocation for an increase in
- function parameters at each call site. This means that lambda lifting trades
- heap for stack and is not always a performance win. See :ref:`When to
- Manually Apply Lambda Lifting <when>` for guidance on recognizing when your
- program may benefit.
+ function parameters at each call site. This means that whether lambda lifting
+ is a performance win or not depends on the usage pattern of the function, as
+ we have demonstrated. See :ref:`When to Manually Apply Lambda Lifting <when>`
+ for guidance on recognizing when your program may benefit.

How Lambda Lifting Works in GHC
@@ -75,9 +125,10 @@ default method GHC uses for handling local functions and free variables.
Instead, GHC uses an alternative strategy called :term:`Closure Conversion`,
which creates more uniformity at the cost of extra heap allocation.

- Automated lambda lifting in GHC occurs *late* in the compiler pipeline at STG,
- right before code generation. GHC lambda lifts at STG instead of Core because
- lambda lifting interferes with other optimizations.
+ Automated lambda lifting in GHC is called *late lambda lifting* because it
+ occurs late in the compiler pipeline, at STG, right before code generation. GHC
+ lambda lifts at STG instead of Core because lambda lifting interferes with other
+ optimizations.

Lambda lifting in GHC is also *Selective*. GHC uses a cost model that calculates
hypothetical heap allocations a function will induce. GHC lists heuristics for
@@ -112,10 +163,9 @@ Observing the Effect of Lambda Lifting
--------------------------------------

You may directly observe the effect of late lambda lifting by comparing Core to
- STG when late lambda lifting is enabled. You can also directly disable or enable
- late lambda lifting with the flags ``-f-stg-lift-lams`` and
- ``-fno-stg-lift-lams``. In general, lambda lifting performs the following
- syntactic changes:
+ STG when late lambda lifting is enabled. You can also disable or enable late
+ lambda lifting with the flags ``-fstg-lift-lams`` and ``-fno-stg-lift-lams``.
+ In general, lambda lifting performs the following syntactic changes:

#. It eliminates a let binding.
#. It creates a new :term:`top-level` binding.
@@ -131,22 +181,25 @@ When to Manually Lambda Lift
----------------------------

GHC does a good job finding beneficial instances of lambda lifting. However, you
- might want to manually lambda lift to save compile time, or to increase
- the performance of your without relying on GHC's optimizer.
+ might want to manually lambda lift to save compile time, or to increase the
+ performance of your program without relying on GHC's optimizer.

- There are three considerations you should have when deciding when to manually
- lambda lift:
+ When deciding whether to manually lambda lift, consider the following:

- 1. Are the functions that would be lifted in hot loops.
+ 1. What is the expected usage pattern of the functions.
2. How many more parameters would be passed to these functions.
- 3. Would this transformation sacrifice readability and maintainability.

Let's take these in order: (1) lambda lifting trades heap (the let bindings that
- it removes), for stack (the increased function parameters). Thus it is not
- always a performance win and in some cases can be a performance loss. The losses
- occur when existing closures grow as a result of the lambda lift. This extra
- allocation slows the program down and increases pressure on the garbage
- collector. Consider this example from :cite:t:`selectiveLambdaLifting`:
+ it removes), for stack (the increased function parameters). Thus whether or not
+ it is a performance win depends on the usage patterns of the enclosing function
+ and the to-be-lifted function. As demonstrated in the motivating example,
+ performance can degrade when the extra parameters, in combination with the usage
+ pattern of the function, result in more total allocation over the lifetime of
+ the program. Performance may also degrade if the existing closures grow as a
+ result of the lambda lift. Both kinds of extra allocation slow the program down
+ and increase pressure on the garbage collector. So it is important to learn to
+ read the program from the perspective of memory. Consider this example from
+ :cite:t:`selectiveLambdaLifting`:

.. code-block:: haskell

@@ -183,43 +236,49 @@ before the lift will save one slot of memory. With ``f_lifted`` we additionally
save two slots of memory because ``x`` and ``y`` are now parameters. Thus
``f_lifted`` does not need to allocate a closure with :term:`Closure
Conversion`. ``g``'s allocations do not change since ``f_lifted`` can be
- directly referenced just as before and because ``x`` is still free in ``g``.
- Thus ``g``'s closure will contain ``x`` and ``f_lifted`` will be inlined, same
- as ``f`` in the unlifted version. ``h``'s allocations grow by one slot since
- ``y`` *is now also* free in ``h``, just as ``x`` was. So it would seem that in
- total lambda lifting ``f`` saves one slot of memory because two slots were lost
- in ``f`` and one was gained in ``h``. However, ``g`` is a :term:`multi-shot
- lambda`, thus ``h`` will be allocated *for each* call of ``g``, whereas ``f``
- and ``g`` are only allocated once. Therefore the lift is a net loss.
-
- This example illustrates how tricky good lifts can be and especially for hot
- loops. In general, you should try to train your eye to determine when to
- manually lift. Try to roughly determine allocations by counting the ``let``
- expressions, the number of free variables, and the likely number of times a
- function is called and allocated.
+ directly referenced just as before and because ``x`` is still free in ``g``. So
+ ``g``'s closure will contain ``x`` and ``f_lifted`` will be inlined, same as
+ ``f`` in the unlifted version. ``h``'s allocations grow by one slot since ``y``
+ *is now also* free in ``h``, just as ``x`` was. So it would seem that lambda
+ lifting ``f`` saves one slot of memory in total, because two slots were saved
+ in ``f`` and one was added in ``h``. However, ``g`` is a :term:`multi-shot
+ lambda`, which means ``h`` will be allocated *for each* call of ``g``, whereas
+ ``f`` and ``g`` are only allocated once. Therefore, the lift is a net loss.
+
+ This example illustrates how tricky good lifts can be. To estimate allocations,
+ count the ``let`` expressions, the number of free variables, and the number of
+ times the outer and inner functions are expected to be called.

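+ As a small, self-contained illustration of this kind of counting (the names
+ below are invented for this sketch and are not the example from the paper):
+
+ .. code-block:: haskell
+
+    -- 'inner' has two free variables, 'a' and 'b', so closure conversion
+    -- allocates a two-slot closure for 'inner' on every call of 'outer'.
+    outer :: Int -> Int -> Int
+    outer a b = inner 10 + inner 20
+      where
+        inner n = a + b + n
+
+    -- After lifting, 'innerLifted' is a top-level function with no free
+    -- variables, so no closure is allocated; instead every call site passes
+    -- 'a' and 'b' as two extra arguments.
+    innerLifted :: Int -> Int -> Int -> Int
+    innerLifted a b n = a + b + n
+
+    outerLifted :: Int -> Int -> Int
+    outerLifted a b = innerLifted a b 10 + innerLifted a b 20
+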
.. note::

Recall, due to closure conversion GHC allocates one slot of memory for each
free variable. Local functions are allocated *once per call* of the enclosing
function. Top-level functions are always only allocated once.

- The next determining factor is counting the number of new parameters that will
- be passed to the lifted function. Should this number become greater than the
- number of available argument registers on the target platform then you'll incur
- slow downs in the STG machine. These slowdowns result from more work the STG
- machine will need to do. It will need to generate code that pops arguments from
- the stack instead of just applying the function to arguments that are already
- loaded into registers. In a hot loop this extra manipulation can have a large
- impact.
+ (2) The next determining factor is the number of new parameters that will be
+ passed to the lifted function. Should this number become greater than the number
+ of available argument registers on the target platform, you'll incur slowdowns
+ in the STG machine. These slowdowns result from the extra work the STG machine
+ will need to do; it will need to generate code that pops arguments from the
+ stack instead of just applying the function to arguments that are already loaded
+ into registers. In a hot loop this extra manipulation can have a large impact.
+
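+ For instance, in the hypothetical shape below (invented for this sketch, not
+ taken from the chapter), lifting the local worker turns its two free variables
+ into two extra arguments on every recursive call; a helper with many more free
+ variables could see its lifted argument list outgrow the available argument
+ registers:
+
+ .. code-block:: haskell
+
+    -- local worker: 'lo' and 'hi' are free variables captured in a closure
+    clamp :: Int -> Int -> [Int] -> [Int]
+    clamp lo hi = go
+      where
+        go []     = []
+        go (x:xs) = max lo (min hi x) : go xs
+
+    -- lifted worker: 'lo' and 'hi' become parameters passed on every call
+    clampLifted :: Int -> Int -> [Int] -> [Int]
+    clampLifted _  _  []     = []
+    clampLifted lo hi (x:xs) = max lo (min hi x) : clampLifted lo hi xs
+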
+ In general the heuristic is: if there are few calls to the outer loop and many
+ calls to the inner loop, then do not lambda lift. However, if there are many
+ calls to the outer loop and few calls to the inner loop, then lambda lifting is
+ likely to be beneficial.

Summary
-------

#. Lambda lifting is a classic optimization technique for compiling local
functions and removing free variables.
- #. Lambda lifting trades heap for stack and is therefore effective for tight,
- closed, hot loops where fetching from the heap would be slow.
+ #. Lambda lifting trades heap for stack. To determine whether a manual lambda
+ lift would be beneficial, work out the usage pattern of the enclosing and local
+ functions, check whether closures would grow in the lifted version, and ensure
+ that the extra parameters in the lifted version would not exceed the number
+ of argument registers on the platform the program targets.
#. GHC automatically performs lambda lifting, but does so only selectively. This
transformation is late in the compilation pipeline at STG and right before
code generation. GHC's lambda lifting transformation can be toggled via the