# Coalescing generic functions emitted when lowering to LLVM IR

<!--
Part of the Carbon Language project, under the Apache License v2.0 with LLVM
Exceptions. See /LICENSE for license information.
SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
-->

<!-- toc -->

## Table of contents

- [Overview](#overview)
- [Design details](#design-details)
  - [SemIR representation and why to coalesce during lowering](#semir-representation-and-why-to-coalesce-during-lowering)
  - [Recursion and strongly connected components (SCCs)](#recursion-and-strongly-connected-components-sccs)
  - [Function fingerprints](#function-fingerprints)
  - [Canonical specific to use](#canonical-specific-to-use)
- [Algorithm details](#algorithm-details)
- [Alternatives considered](#alternatives-considered)
  - [Coalescing in the front-end vs back-end?](#coalescing-in-the-front-end-vs-back-end)
  - [When to do coalescing in the front-end?](#when-to-do-coalescing-in-the-front-end)
  - [Compile-time trade-offs](#compile-time-trade-offs)
  - [Coalescing duplicate non-specific functions](#coalescing-duplicate-non-specific-functions)
- [Opportunities for further improvement](#opportunities-for-further-improvement)

<!-- tocstop -->

## Overview

When lowering Carbon generics to LLVM, we may emit duplicate LLVM IR functions.
This document describes the algorithm implemented in [lowering](lower.md) for
determining when generated specifics, while distinct at the Carbon language
level, can be coalesced into a single function when lowering Carbon's
intermediate representation (_SemIR_) to
[LLVM IR](https://llvm.org/docs/LangRef.html).

The overall goal of this optimization is to avoid generating duplicate LLVM IR
code wherever the front-end can cheaply detect the duplication. Such an
optimization needs to happen after specialization, but there is some
flexibility in when to do it: before lowering, through analysis of SemIR, or
during/after lowering.

The goal of this doc is to describe the algorithm implemented in
[specific_coalescer](/toolchain/lower/specific_coalescer.h): to put it into
context, describe the overall goal and the challenges, and note where there is
still room for improvement in subsequent iterations.

Determining the impact on compile time is beyond the scope of this document,
but it is an important problem to follow up on.

## Design details

To determine whether two specific functions are equivalent, such that a single
one can be used in place of both, the algorithm and its implementation need to
account for the following.

### SemIR representation and why to coalesce during lowering

In SemIR, a specific function is defined by a unique tuple:
`(function_id, specific_id)`. There is a single in-memory representation of a
generic function's body (not one per specific), where the instructions that
differ between specifics can be determined, on demand, from a given
`specific_id`. Hence, determining whether two specifics are equivalent requires
analyzing whether these specific-dependent instructions are equivalent at the
LLVM IR level. This can only be determined after the eval phase is complete,
using information on how Carbon types map to `llvm::Type`s.

The algorithm described below coalesces specifics during lowering. Also see
[alternatives considered](#alternatives-considered).

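As an illustration of the tuple above, a `(function_id, specific_id)` pair can
serve as a hash-map key for tracking which specifics have already been lowered.
This is a minimal C++ sketch with hypothetical type and field names, not the
toolchain's actual data structures:

```cpp
#include <cstdint>
#include <functional>
#include <unordered_map>

// Hypothetical stand-ins for SemIR's identifiers; the real toolchain defines
// its own strongly-typed id classes.
struct SpecificFunctionKey {
  int32_t function_id;
  int32_t specific_id;
  friend auto operator==(const SpecificFunctionKey& a,
                         const SpecificFunctionKey& b) -> bool {
    return a.function_id == b.function_id && a.specific_id == b.specific_id;
  }
};

struct SpecificFunctionKeyHash {
  auto operator()(const SpecificFunctionKey& k) const -> size_t {
    // Simple hash combine for illustration; the real implementation may
    // hash differently.
    return std::hash<int64_t>{}((int64_t{k.function_id} << 32) ^
                                static_cast<uint32_t>(k.specific_id));
  }
};

// Map from a specific function to its emitted definition. An opaque int
// handle stands in for an llvm::Function* here.
using EmittedDefinitions =
    std::unordered_map<SpecificFunctionKey, int, SpecificFunctionKeyHash>;
```

Two specifics of the same generic share a `function_id` but have distinct
`specific_id`s, so they occupy distinct map entries until coalescing decides
otherwise.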
### Recursion and strongly connected components (SCCs)

Comparing whether two specific functions contain (access, invoke, etc.) the
same specific-dependent instructions is not straightforward when recursion is
involved. The simplest example is when A and B are each self-recursive
functions that are otherwise equivalent. The check "are A and B equivalent?"
needs to start by assuming that they are, so that when a self-recursive call is
found in each, that call is still considered equivalent. In practice this
requires comparing `specific_id`s, which in SemIR are distinct.

In the general case, this requires analyzing the call graph of all functions
and building strongly connected components (SCCs). The call graph could either
be created before lowering or built while lowering. The current implementation
does the latter; in a post-processing phase it concludes equivalence and
simplifies the emitted LLVM IR by deleting the unnecessary parts.

A non-viable option is building the call graph from the information "what are
all the call sites of myself, where I am a specific function", because this
information is not available until the function bodies of all specific
functions have been processed. This is an optimization done so that the
definition of a specific isn't emitted until a use of it is found. Building
that information up front would duplicate all of the lowering logic, minus the
LLVM IR creation.

### Function fingerprints

Even when limiting the comparison of specific functions to those defined from
the same generic, a comparison algorithm would still end up with quadratic
complexity in the number of specifics for that generic.

We define two fingerprints for each specific:

1. `specific_fingerprint`: Includes all specific-dependent information.
2. `common_fingerprint`: Includes the same information except for
   `specific_id`s, as `specific_id`s can only be determined to be equivalent
   after building an equivalence SCC.

Two specific functions are equivalent if their `specific_fingerprint`s are
equal, and are not equivalent if their `common_fingerprint`s differ. If the
`common_fingerprint`s are equal but the `specific_fingerprint`s are not, the
two functions may still be equivalent.

Ideally, the `specific_fingerprint` can be used as a hash key to first coalesce
all specific functions with the same fingerprint, with no additional checks.
Then, the remaining functions can use the `common_fingerprint` as another hash
key to group the remaining candidates for coalescing, so that only those with
the same `common_fingerprint` are processed in a quadratic pass that walks all
call instructions and compares whether the `specific_id` information is
equivalent. These optimizations are not currently implemented.

Note that this does not
[coalesce non-specifics](#coalescing-duplicate-non-specific-functions).

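The fingerprint rules above amount to a three-way decision: the fingerprints
alone can prove equivalence or non-equivalence, and the remaining case requires
the deeper call-list comparison. An illustrative C++ sketch, with hypothetical
names:

```cpp
#include <cstdint>
#include <optional>

// Hypothetical fingerprint pair for a specific function; the real
// SpecificFunctionFingerprint also records the list of specific calls.
struct Fingerprints {
  uint64_t common_fingerprint;
  uint64_t specific_fingerprint;
};

// Returns true/false when equivalence can be decided from the fingerprints
// alone, and nullopt when a deeper comparison of the call lists is needed.
auto QuickEquivalenceCheck(const Fingerprints& a, const Fingerprints& b)
    -> std::optional<bool> {
  if (a.common_fingerprint != b.common_fingerprint) {
    // Differing common fingerprints prove non-equivalence.
    return false;
  }
  if (a.specific_fingerprint == b.specific_fingerprint) {
    // Equal specific fingerprints prove equivalence.
    return true;
  }
  // Equal common fingerprints but differing specific fingerprints: the
  // functions may still be equivalent if their callees turn out to be.
  return std::nullopt;
}
```

Only pairs falling into the `nullopt` case need the SCC-based analysis
described above.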
### Canonical specific to use

To determine the canonical specific to use, we use a
[disjoint set](https://en.wikipedia.org/wiki/Disjoint-set_data_structure).

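A minimal sketch of such a disjoint-set (union-find) structure, with specifics
modeled as dense indices rather than the toolchain's actual `specific_id`s:

```cpp
#include <numeric>
#include <vector>

// Minimal union-find sketch for picking a canonical specific. Each specific
// is identified here by a dense index.
class CanonicalSpecifics {
 public:
  explicit CanonicalSpecifics(int num_specifics) : parent_(num_specifics) {
    // Initially every specific is its own canonical representative.
    std::iota(parent_.begin(), parent_.end(), 0);
  }

  // Returns the canonical representative, with path halving for efficiency.
  auto Find(int s) -> int {
    while (parent_[s] != s) {
      parent_[s] = parent_[parent_[s]];
      s = parent_[s];
    }
    return s;
  }

  // Merges the equivalence classes of two specifics; afterwards both map to
  // one canonical specific, whose definition is the one that is kept.
  auto MarkEquivalent(int a, int b) -> void { parent_[Find(a)] = Find(b); }

 private:
  std::vector<int> parent_;
};
```

When equivalences are discovered pairwise and transitively (as in an SCC), the
disjoint set keeps `Find` cheap and guarantees one canonical representative
per equivalence class.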
## Algorithm details

Below is pseudocode for the existing algorithm in
`toolchain/lower/specific_coalescer.*`.

The implementation can be found in
[specific_coalescer.h](/toolchain/lower/specific_coalescer.h) and
[specific_coalescer.cpp](/toolchain/lower/specific_coalescer.cpp).

At the top level, the current algorithm first generates all function
definitions; once this is complete, it performs the logic to coalesce specifics
and deletes the redundant LLVM function definitions.

```none
LowerToLLVM() {
  for all non_generic_functions
    CreateLLVMFunctionDefinition(function, no_specific_id);
  PerformCoalescingPostProcessing();
}
```

Lowering starts with all non-generic functions. While lowering these, when
calls to specifics are encountered, definitions for those specific functions
are generated as well.

For each lowered specific function definition, we create a
`SpecificFunctionFingerprint`, which includes the
[two fingerprints](#function-fingerprints) and a list of calls to other
specific functions.

```none
CreateLLVMFunctionDefinition(function, specific_id) {
  For each SemIR instruction in the function:
    Step 1: Emit LLVM IR for the instruction
    Step 2: If the instruction is specific-dependent, hash it and add it to
            the function's `common_fingerprint`
    Step 3: If the SemIR instruction is a call to a specific:
      a) Create a definition for this specific_id if it doesn't exist:
         CreateLLVMFunctionDefinition(function, specific_id);
      b) Hash the specific_id into the current function's `specific_fingerprint`
      c) Add the non-hashed specific_id to the list of calls performed
}
```

The logic that performs the actual coalescing analyzes all specifics. For each
pair of specifics, it first checks whether the LLVM function types match (using
a third hash-like fingerprint, `function_type_fingerprint`, as a storage
optimization), then whether the pair is equivalent based on the
`SpecificFunctionFingerprint`. For each pair of equivalent functions found (in
a call graph SCC), one function is marked non-canonical: its uses are replaced
with the canonical one, and its definition is ultimately deleted.

```none
PerformCoalescingPostProcessing() {
  for each two specifics of the same generic {
    if function_type_fingerprints differ {
      track as non-equivalent
      continue
    }

    add the two specifics to assumed equivalent specifics list
    if (CheckIfEquivalent(two specifics, assumed equivalent specifics list)) {
      for each two equivalent specifics found {
        find the canonical specific & mark the duplicates for replacement/deletion
      }
    }
  }
  replace all duplicate specifics with the respective canonical specifics
  and delete all replaced LLVM function definitions
}
```

The equivalence check for specifics based on the constructed
`SpecificFunctionFingerprint` can make an early non-equivalence determination
based on the `common_fingerprint`s, and an early equivalence determination
based on the `specific_fingerprint`s. Otherwise, it uses the call list and
recurses to make the determination for all functions in the call graph SCC (in
practice the implementation uses a worklist to avoid recursion).

```none
CheckIfEquivalent(two specifics, &assumed equivalent specifics) -> bool {
  if common_fingerprints are non-equal {
    track as non-equivalent specifics
    return false
  }
  if specific_fingerprints are equal {
    track as equivalent specifics
    return true
  }
  if already tracked as equivalent or assumed equivalent specifics {
    return true
  }

  for each of the calls in each of the specifics {
    if the functions called are the same or already equivalent or assumed equivalent specifics {
      continue
    }
    if the functions called are already non-equivalent specifics {
      return false
    }
    add <pair of calls> to assumed equivalent specifics
    if !CheckIfEquivalent(specifics in <pair of calls>, assumed equivalent specifics) {
      return false
    }
  }
  return true
}
```

## Alternatives considered

### Coalescing in the front-end vs back-end?

An alternative considered was not doing any coalescing in the front-end and
instead relying on LLVM to perform the analysis and optimization. The current
choice was made based on the expectation that such an
[LLVM pass](https://llvm.org/docs/MergeFunctions.html) would be more costly in
terms of compile time. The relative cost has not yet been evaluated.

### When to do coalescing in the front-end?

The analysis and coalescing could be done prior to lowering, after
specialization. The advantage of that choice would be avoiding lowering
duplicate LLVM functions only to remove the duplicates later. The disadvantage
would be duplicating much of the lowering logic, which is currently necessary
to make the equivalence determination.

### Compile-time trade-offs

Not doing any coalescing is also expected to increase back-end codegen time by
more than the cost of performing the analysis and deduplication. This can be
evaluated in practice, and the feature disabled if found to be too costly.

### Coalescing duplicate non-specific functions

We could coalesce duplicate functions in non-specific cases, similar to lld's
[Identical Code Folding](https://lld.llvm.org/NewLLD.html#glossary) or LLVM's
[MergeFunctions pass](https://llvm.org/docs/MergeFunctions.html). This would
require fingerprinting all instructions in all functions, whereas specific
coalescing can focus on cases that only Carbon's front-end knows about. Carbon
would also be restricted to coalescing functions within a single compilation
unit, and would need to replace function definitions that allow external calls
with a placeholder that calls the coalesced definition. We don't expect
sufficient advantages over the existing support.

## Opportunities for further improvement

The currently implemented algorithm can be improved in at least the following
ways:

- The `specific_fingerprint` can be used to bucket specifics that can be
  coalesced right away.
- The remaining ones can be pre-bucketed such that only the specifics with the
  same `common_fingerprint` have their lists of calls further compared (linear
  in the number of specific calls inside the functions) to determine SCCs that
  may be equivalent.

This should reduce the complexity from the current O(N^2), where N is the
number of specifics for a generic, to O(M^2), where M is the number of
specifics for a generic that have different `specific_fingerprint`s but equal
`common_fingerprint`s (the expectation is that M << N).

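The pre-bucketing step above can be sketched as grouping specifics by
`common_fingerprint`, so that the quadratic comparison only runs within each
bucket. A hypothetical C++ sketch (fingerprints modeled as plain integers):

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Illustrative record for one specific: an id plus its two fingerprints.
struct SpecificInfo {
  int id;
  uint64_t common_fingerprint;
  uint64_t specific_fingerprint;
};

// Groups specifics by common_fingerprint. Each returned bucket holds the
// candidates that still need pairwise call-list comparison; buckets of size 1
// need no further work, so the quadratic cost is bounded per bucket.
auto BucketByCommonFingerprint(const std::vector<SpecificInfo>& specifics)
    -> std::map<uint64_t, std::vector<int>> {
  std::map<uint64_t, std::vector<int>> buckets;
  for (const auto& s : specifics) {
    buckets[s.common_fingerprint].push_back(s.id);
  }
  return buckets;
}
```

A preceding pass keyed on `specific_fingerprint` would coalesce exact matches
outright before this grouping runs.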
An additional potential improvement is defining the function fingerprints in a
translation-unit-independent manner, so that a fingerprint can be used in the
mangled name and the same function name emitted across translation units. This
does not currently occur, as the two fingerprints use internal SemIR
identifiers (`function_id` and `specific_id` respectively).