Commit 7198050

Docs for specific coalescing. (#5886)
Add documentation describing the problem and algorithm for coalescing the LLVM functions generated from Carbon generic functions into fewer LLVM functions, where the LLVM types permit it.
1 parent 48e7589 commit 7198050

2 files changed: +295 −0 lines changed

Lines changed: 292 additions & 0 deletions

# Coalescing generic functions emitted when lowering to LLVM IR

<!--
Part of the Carbon Language project, under the Apache License v2.0 with LLVM
Exceptions. See /LICENSE for license information.
SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
-->

<!-- toc -->

## Table of contents

- [Overview](#overview)
- [Design details](#design-details)
    - [SemIR representation and why to coalesce during lowering](#semir-representation-and-why-to-coalesce-during-lowering)
    - [Recursion and strongly connected components (SCCs)](#recursion-and-strongly-connected-components-sccs)
    - [Function fingerprints](#function-fingerprints)
    - [Canonical specific to use](#canonical-specific-to-use)
- [Algorithm details](#algorithm-details)
- [Alternatives considered](#alternatives-considered)
    - [Coalescing in the front-end vs back-end?](#coalescing-in-the-front-end-vs-back-end)
    - [When to do coalescing in the front-end?](#when-to-do-coalescing-in-the-front-end)
    - [Compile-time trade-offs](#compile-time-trade-offs)
    - [Coalescing duplicate non-specific functions](#coalescing-duplicate-non-specific-functions)
- [Opportunities for further improvement](#opportunities-for-further-improvement)

<!-- tocstop -->

## Overview

When lowering Carbon generics to LLVM, we may emit duplicate LLVM IR
functions. This document describes the algorithm implemented in
[lowering](lower.md) for determining when generated specifics that are
different at the Carbon language level can be coalesced into a single function
when lowering Carbon’s intermediate representation (_SemIR_) to
[LLVM IR](https://llvm.org/docs/LangRef.html).

The overall goal of this optimization is to avoid generating duplicate LLVM IR
code where this is easy to determine from the front-end. Such an optimization
needs to be done after specialization, but there is some flexibility in when to
do it: before lowering, through analysis of SemIR, or during or after lowering.

The goal of this doc is to describe the algorithm implemented in
[specifics_coalescer](/toolchain/lower/specific_coalescer.h): putting it into
context, stating the overall goal, and describing the challenges and where
there is still room for improvement in subsequent iterations.

Determining the impact on compile time is beyond the scope of this document,
but it is an important problem to follow up on.

## Design details

To determine whether two specific functions are equivalent, so that a single
one of them can be used in place of the other, the following need to be
considered as part of the algorithm and its implementation.

### SemIR representation and why to coalesce during lowering

In SemIR, a specific function is defined by a unique tuple:
`(function_id, specific_id)`. There is a single in-memory representation of a
generic function’s body (not one for each specific), where the instructions
that differ between specifics can be determined, on demand, from a given
`specific_id`. Hence, determining whether two specifics are equivalent requires
analyzing whether these specific-dependent instructions are equivalent at the
LLVM IR level. This can only be determined after the eval phase is complete,
using information on how Carbon types map to `llvm::Type`s.

The algorithm described below coalesces specifics during lowering. Also see
[alternatives considered](#alternatives-considered).

### Recursion and strongly connected components (SCCs)

Comparing whether two different specific functions contain (access, invoke,
etc.) the same specific-dependent instruction is not straightforward when
recursion is involved. The simplest example is when A and B are each
self-recursive functions that are otherwise equivalent. The check "are A and B
equivalent" needs to start by assuming they are equivalent, so that when a
self-recursive call is found in each, that call is still considered equivalent.
In practice this requires comparing `specific_id`s, which in SemIR are
distinct.

In the general case, this analysis needs to examine the call graph of all
functions and build strongly connected components (SCCs). The call graph could
be created either before lowering or while lowering. The current implementation
does the latter, and in a post-processing phase we can conclude equivalence and
simplify the emitted LLVM IR by deleting unnecessary parts.

A non-viable option is building the call graph based on the information "what
are all the call sites of this specific function", because this information is
not available until the function bodies of all specific functions have been
processed. This is a consequence of an optimization: the definition of a
specific isn’t emitted until a use of it is found. Building that information up
front would duplicate all the lowering logic, minus the LLVM IR creation.

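To make the SCC requirement concrete, here is a small sketch (hypothetical
Python, not the toolchain's code; the `calls` map is an illustrative stand-in
for the call graph recorded while lowering) that groups functions into SCCs
with Tarjan's algorithm. Recursive or mutually recursive specifics land in
components that must be compared under an assumption of equivalence, rather
than by comparing their distinct `specific_id`s directly:

```python
# Hypothetical sketch: group specifics into SCCs of their call graph.
# Names like `calls` are illustrative; they are not Carbon toolchain APIs.

def tarjan_sccs(calls: dict[str, list[str]]) -> list[set[str]]:
    """Return the strongly connected components of a call graph."""
    index: dict[str, int] = {}
    lowlink: dict[str, int] = {}
    on_stack: set[str] = set()
    stack: list[str] = []
    sccs: list[set[str]] = []
    counter = 0

    def visit(node: str) -> None:
        nonlocal counter
        index[node] = lowlink[node] = counter
        counter += 1
        stack.append(node)
        on_stack.add(node)
        for callee in calls.get(node, []):
            if callee not in index:
                visit(callee)
                lowlink[node] = min(lowlink[node], lowlink[callee])
            elif callee in on_stack:
                lowlink[node] = min(lowlink[node], index[callee])
        if lowlink[node] == index[node]:
            # Everything above `node` on the stack is one SCC.
            scc: set[str] = set()
            while True:
                member = stack.pop()
                on_stack.discard(member)
                scc.add(member)
                if member == node:
                    break
            sccs.append(scc)

    for node in calls:
        if node not in index:
            visit(node)
    return sccs

# Two self-recursive specifics A and B each form their own SCC, so a
# pairwise comparison must assume A ~ B before checking their bodies;
# C and D are mutually recursive and form a single SCC.
graph = {"A": ["A"], "B": ["B"], "C": ["D"], "D": ["C"]}
print(sorted(sorted(s) for s in tarjan_sccs(graph)))
# → [['A'], ['B'], ['C', 'D']]
```

In the actual implementation this information is accumulated while lowering
rather than computed as a separate pass over a prebuilt graph.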
### Function fingerprints

Even when limiting the comparison of specific functions to those defined from
the same generic, a comparison algorithm would still have quadratic complexity
in the number of specifics for that generic.

We define two fingerprints for each specific:

1. `specific_fingerprint`: Includes all specific-dependent information.
2. `common_fingerprint`: Includes the same information except for
   `specific_id`s, as `specific_id`s can only be determined to be equivalent
   after building an equivalence SCC.

Two specific functions are equivalent if their `specific_fingerprint`s are
equal, and are not equivalent if their `common_fingerprint`s differ. If the
`common_fingerprint`s are equal but the `specific_fingerprint`s are not, the
two functions may still be equivalent.

Ideally, the `specific_fingerprint` can be used as a unique hash to first
coalesce all specific functions with the same fingerprint, with no additional
checks. Then, all remaining functions may use the `common_fingerprint` as
another unique hash to group the remaining candidates for coalescing. Only
those with the same `common_fingerprint` are then processed in a quadratic pass
that walks all call instructions and compares whether the `specific_id`
information is equivalent. These optimizations are not currently implemented.

Note that this does not
[coalesce non-specifics](#coalescing-duplicate-non-specific-functions).

### Canonical specific to use

To determine the canonical specific to use, we use a
[disjoint set](https://en.wikipedia.org/wiki/Disjoint-set_data_structure).

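A minimal disjoint-set sketch (hypothetical Python) of how canonicalization can
work: whenever two specifics are proven equivalent they are unioned, and each
set's representative serves as the canonical specific whose definition is kept:

```python
class DisjointSet:
    """Union-find with path compression; the root is the canonical element."""

    def __init__(self) -> None:
        self.parent: dict[int, int] = {}
        self.size: dict[int, int] = {}

    def find(self, x: int) -> int:
        self.parent.setdefault(x, x)
        self.size.setdefault(x, 1)
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[x] != root:   # path compression
            self.parent[x], x = root, self.parent[x]
        return root

    def union(self, a: int, b: int) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra            # ra stays canonical
        self.size[ra] += self.size[rb]

# Toy specific_ids: 3 and 7 proven equivalent, then 7 and 9; all three
# now share one canonical representative.
ds = DisjointSet()
ds.union(3, 7)
ds.union(7, 9)
print(ds.find(9) == ds.find(3))  # → True
```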
## Algorithm details

Below is pseudocode for the existing algorithm in
`toolchain/lower/specific_coalescer.*`.

The implementation can be found in
[specifics_coalescer.h](/toolchain/lower/specific_coalescer.h) and
[specifics_coalescer.cpp](/toolchain/lower/specific_coalescer.cpp).

At the top level, the current algorithm first generates all function
definitions, and once this is complete, it performs the logic to coalesce
specifics and delete the redundant LLVM function definitions.

```none
LowerToLLVM () {
  for all non_generic_functions
    CreateLLVMFunctionDefinition (function, no_specific_id);
  PerformCoalescingPostProcessing ();
}
```

The lowering starts with all non-generic functions. While lowering these, when
calls to specifics are encountered, it also generates definitions for those
specific functions.

For each lowered specific function definition, we create the
`SpecificFunctionFingerprint`, which includes the
[two fingerprints](#function-fingerprints) and a list of calls to other
specific functions.

```none
CreateLLVMFunctionDefinition (function, specific_id) {
  For each SemIR instruction in the function:
    Step 1: Emit LLVM IR for the instruction
    Step 2: If the instruction is specific-dependent, hash it and add it to the
            function's `common_fingerprint`
    Step 3: If the SemIR instruction is a call to a specific:
      a) Create a definition for this specific_id if it doesn't exist:
         CreateLLVMFunctionDefinition (function, specific_id);
      b) Hash the specific_id into the current function's `specific_fingerprint`
      c) Add the non-hashed specific_id to the list of calls performed
}
```

The logic that performs the actual coalescing analyzes all specifics. For each
pair of specifics, it first checks whether the LLVM function types match (using
a third hash-like fingerprint, `function_type_fingerprint`, as a storage
optimization), then whether the pair is equivalent based on the
`SpecificFunctionFingerprint`. For each pair of equivalent functions found (in
a call graph SCC), one function is marked non-canonical: its uses are replaced
with the canonical one and its definition is ultimately deleted.

```none
PerformCoalescingPostProcessing () {
  for each two specifics of the same generic {
    if function_type_fingerprints differ {
      track as non-equivalent
      continue
    }

    add the two specifics to the assumed equivalent specifics list
    if (CheckIfEquivalent(two specifics, assumed equivalent specifics list)) {
      for each two equivalent specifics found {
        find the canonical specific & mark the duplicates for replacement/deletion
      }
    }
  }
  replace all duplicate specifics with the respective canonical specifics
  and delete all replaced LLVM function definitions.
}
```

The equivalence check for specifics based on the constructed
`SpecificFunctionFingerprint` can make an early non-equivalence determination
based on the `common_fingerprint`s, and an early equivalence determination
based on the `specific_fingerprint`s. Otherwise, it uses the call list and
recurses to make the determination for all functions in the SCC of the call
graph (in practice the implementation uses a worklist to avoid the recursion).

```none
CheckIfEquivalent(two specifics, &assumed equivalent specifics) -> bool {
  if common_fingerprints are non-equal {
    track as non-equivalent specifics
    return false
  }
  if specific_fingerprints are equal {
    track as equivalent specifics
    return true
  }
  if already tracked as equivalent or assumed equivalent specifics {
    return true
  }

  for each of the calls in each of the specifics {
    if the functions called are the same or already equivalent or assumed equivalent specifics {
      continue
    }
    if the functions called are already non-equivalent specifics {
      return false
    }
    add <pair of calls> to assumed equivalent specifics
    if !CheckIfEquivalent(specifics in <pair of calls>, assumed equivalent specifics) {
      return false
    }
  }
  return true
}
```

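The check above can also be sketched as runnable code (hypothetical Python;
`bodies` and `common` are toy stand-ins for the call lists and
`common_fingerprint`s, and the `specific_fingerprint` early-equivalence test is
omitted for brevity). The key move is adding the pair to the assumed-equivalent
set before recursing, which is what makes self-recursive specifics compare as
equivalent:

```python
# Hypothetical sketch of CheckIfEquivalent: two self-recursive specifics
# are proven equivalent by assuming equivalence of their recursive calls.
# `bodies` maps a specific to the specifics it calls; `common` maps it to
# its common_fingerprint. Both are toy stand-ins, not toolchain APIs.

def check_equivalent(a, b, bodies, common, assumed=None) -> bool:
    if assumed is None:
        assumed = set()
    if common[a] != common[b]:          # early non-equivalence
        return False
    if a == b or (a, b) in assumed:
        return True
    if len(bodies[a]) != len(bodies[b]):
        return False
    assumed.add((a, b))
    assumed.add((b, a))                 # assume first, then verify the calls
    return all(
        check_equivalent(ca, cb, bodies, common, assumed)
        for ca, cb in zip(bodies[a], bodies[b])
    )

# A and B have the same structure: each calls itself; C differs.
bodies = {"A": ["A"], "B": ["B"], "C": ["A"]}
common = {"A": "h1", "B": "h1", "C": "h2"}
print(check_equivalent("A", "B", bodies, common))  # → True
print(check_equivalent("A", "C", bodies, common))  # → False
```

As the document notes, the real implementation uses a worklist rather than
recursion; the assumption-then-verification structure is the same.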
## Alternatives considered

### Coalescing in the front-end vs back-end?

An alternative considered was not doing any coalescing in the front-end,
relying instead on LLVM for the analysis and optimization. The current choice
was made based on the expectation that such an
[LLVM pass](https://llvm.org/docs/MergeFunctions.html) would be more costly in
terms of compile time. The relative cost has not yet been evaluated.

### When to do coalescing in the front-end?

The analysis and coalescing could be done prior to lowering, after
specialization. The advantage of that choice would be avoiding lowering
duplicate LLVM functions only to remove the duplicates afterwards. The
disadvantage would be duplicating much of the lowering logic, which is
currently needed to make the equivalence determination.

### Compile-time trade-offs

Not doing any coalescing is also expected to increase back-end codegen time by
more than the cost of performing the analysis and deduplication. This can be
evaluated in practice, and the feature disabled if it is found to be too
costly.

### Coalescing duplicate non-specific functions

We could coalesce duplicate functions in non-specific cases, similar to lld's
[Identical Code Folding](https://lld.llvm.org/NewLLD.html#glossary) or LLVM's
[MergeFunctions pass](https://llvm.org/docs/MergeFunctions.html). This would
require fingerprinting all instructions in all functions, whereas specific
coalescing can focus on cases that only Carbon's front-end knows about. Carbon
would also be restricted to coalescing functions within a single compilation
unit, and would need to replace function definitions that allow external calls
with a placeholder that calls the coalesced definition. We don't expect
sufficient advantages over the existing support.

## Opportunities for further improvement

The currently implemented algorithm can be improved with at least the
following:

- The `specific_fingerprint` can be used to bucket specifics that can be
  coalesced right away.
- The remaining ones can be pre-bucketed such that only the specifics with the
  same `common_fingerprint` have their lists of calls further compared (linear
  in the number of specific calls inside the functions) to determine SCCs that
  may be equivalent.

This should reduce the complexity from the current O(N^2), with N the number of
specifics for a generic, to O(M^2), with M the number of specifics for a
generic that have different `specific_fingerprint`s but equal
`common_fingerprint`s (the expectation is that M << N).

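The proposed two-level bucketing can be sketched as follows (hypothetical
Python; the fingerprint values are toy stand-ins for the real hashes). Equal
`specific_fingerprint`s coalesce immediately; only representatives sharing a
`common_fingerprint` remain as candidates for the quadratic call-list
comparison:

```python
from collections import defaultdict

# Toy input: specific id -> (common_fingerprint, specific_fingerprint).
specifics = {
    1: ("c1", "s1"),
    2: ("c1", "s1"),   # identical to 1: coalesce immediately
    3: ("c1", "s3"),   # same common as 1: candidate, needs call-list check
    4: ("c2", "s4"),   # different common: cannot be equivalent to 1-3
}

# Level 1: equal specific_fingerprint -> coalesce with no further checks.
by_specific = defaultdict(list)
for sid, (_, sf) in specifics.items():
    by_specific[sf].append(sid)
coalesced_now = [ids for ids in by_specific.values() if len(ids) > 1]

# Level 2: among remaining representatives, only same-common buckets
# need the quadratic comparison of their call lists.
by_common = defaultdict(list)
for ids in by_specific.values():
    rep = ids[0]
    by_common[specifics[rep][0]].append(rep)
candidates = [ids for ids in by_common.values() if len(ids) > 1]

print(coalesced_now)  # → [[1, 2]]
print(candidates)     # → [[1, 3]]
```

Only the `candidates` buckets would be handed to the pairwise
`CheckIfEquivalent`-style comparison, which is where the expected M << N
reduction comes from.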
An additional potential improvement is defining the function fingerprints in a
translation-unit-independent manner, so they can be used in the mangled name
and the same function name emitted. This does not currently happen, as the two
fingerprints use internal SemIR identifiers (`function_id` and `specific_id`
respectively).

toolchain/docs/lower.md

Lines changed: 3 additions & 0 deletions

```diff
@@ -136,6 +136,9 @@ function should follow the same pattern, adding a getter on `FunctionContext`
 that adds the information to the fingerprint, and returns a `*InFile` wrapper
 struct if the result contains any `TypeId`s.
 
+Additional details can be found in:
+[Coalescing generic functions emitted when lowering to LLVM IR](coalesce_generic_lowering.md).
+
 ## Mangling
 
 Part of lowering is choosing deterministically unique identifiers for each
```