# scip-clang design notes

- [Architecture](#architecture)
  - [Handling slow, hung or crashed workers](#handling-slow-hung-or-crashed-workers)
  - [Disk I/O](#disk-io)
  - [Bazel and distributed builds](#bazel-and-distributed-builds)
- [Reducing work across headers](#reducing-work-across-headers)
  - [Checking well-behavedness of headers](#checking-well-behavedness-of-headers)
- [Indexing templates](#indexing-templates)
- [Mapping C++ to SCIP](#mapping-c-to-scip)
  - [Symbol names for macros](#symbol-names-for-macros)
  - [Symbol names for declarations](#symbol-names-for-declarations)
  - [Symbol names for enum cases](#symbol-names-for-enum-cases)
  - [Method disambiguator](#method-disambiguator)
  - [Forward declarations](#forward-declarations)

## Architecture

When working on a compilation database (a `compile_commands.json` file),

[…]

entities for equality (or content-hash the AST,
which would also be error-prone).
</details>

## Indexing templates

Recommended reading: [cppreference page on dependent names](https://en.cppreference.com/w/cpp/language/dependent_name)

In the context of templates, C++ has two kinds of names:

- Dependent names, where the result of name lookup may
  depend on the substitutions for template parameters.
- Non-dependent names, which are looked up and bound where the template
  is defined. The result of name lookup is not allowed to depend on substitutions,
  even if considering substitutions would lead to a better match.

Consequently, it is not always possible to resolve a name correctly
without performing template substitution.

For example, inside a class template, prefixing a method call with `this->`
makes the name dependent.
So in the presence of templates,
omitting `this->` for method calls is not always permitted,
and adding `this->` can make "obviously wrong" code compile.
Here's a code example:

```cpp
template <typename T>
struct Q0 {
  void f0() {}
};

template <typename T>
struct Q1 : Q0<T> {
  void f1() {
    // f0 is dependent here, since `this` has type Q1<T>*
    this->f0(); // OK
    // f0 is non-dependent here due to the absence of an explicit `this`,
    // and unqualified lookup does not search the dependent base Q0<T>
    f0(); // error: use of undeclared identifier 'f0'
    // OK as long as Q1<T>::f1 is never instantiated:
    // lookup is deferred until instantiation, so no error is reported
    this->non_existent();
  }
};
```

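Both lookup rules can be observed in a small, self-contained sketch. The names `g`, `Base`, `Derived`, `callNonDependent` and `callDependent` are hypothetical. With a conforming compiler such as recent Clang or GCC, the non-dependent call `g(1)` stays bound to the overload visible at the template definition, even though a better match is declared later. The dependent call `this->f0()` is only resolved once `T` is known.

```cpp
#include <cassert>
#include <string>

// Hypothetical overload, visible at the point where the template is defined.
std::string g(double) { return "g(double)"; }

template <typename T>
struct Base {
  std::string f0() { return "Base::f0"; }
};

template <typename T>
struct Derived : Base<T> {
  // Non-dependent: `g(1)` is bound here, to the only `g` declared so far.
  std::string callNonDependent() { return g(1); }
  // Dependent: `f0` is looked up in Base<T> only once T is substituted.
  std::string callDependent() { return this->f0(); }
};

// A better match for `g(1)`, declared after the template definition.
// Two-phase lookup means `Derived<T>::callNonDependent` never sees it.
std::string g(int) { return "g(int)"; }
```

`Derived<int>{}.callNonDependent()` returns `"g(double)"`, whereas a plain `g(1)` written after both overloads picks `g(int)`. An indexer resolving names without substitution gets the first call right for free. The second call is only correct if it can see through the dependent base.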
From an indexer's point of view, there are roughly three options
for indexing templates:

1. Pedantic:
   - Traverse uninstantiated template bodies once,
     and collect information about non-dependent names.
   - For every template instantiation, traverse the template body
     once, and collect information about dependent names.
     The results can be de-duplicated on the fly.
2. Generalizing:
   - Traverse uninstantiated template bodies once,
     and collect information about non-dependent names.
   - Pick a single instantiation (e.g. at random). Traverse the template body
     for this instantiation, and collect information about dependent names.
3. Optimistic:
   - Traverse uninstantiated template bodies once,
     and collect information about both non-dependent and dependent names.
     For dependent names, rely on some form of approximate name lookup.

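The trade-off between the three options can be made concrete with a toy model. The sketch below is hypothetical: `NameRef`, `Instantiation`, `IndexResult`, `approxLookup` and the strategy functions are illustrative names, not scip-clang code, and a real indexer walks Clang ASTs rather than flat lists. It counts how often each strategy walks a template body and which (name, target) pairs it records.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy model: a template body is a flat list of name references.
struct NameRef {
  std::string name;
  bool dependent;
  std::string target;  // known up front only for non-dependent names
};

// One instantiation maps each dependent name to the declaration it resolves to.
using Instantiation = std::map<std::string, std::string>;

struct IndexResult {
  std::set<std::pair<std::string, std::string>> refs;  // (name, target), de-duplicated
  int traversals = 0;  // how many times the template body was walked
};

// Records non-dependent names from the uninstantiated body.
void indexUninstantiated(const std::vector<NameRef> &body, IndexResult &out) {
  ++out.traversals;
  for (const auto &ref : body)
    if (!ref.dependent)
      out.refs.insert({ref.name, ref.target});
}

// Records dependent names as resolved by one concrete instantiation.
void indexInstantiation(const std::vector<NameRef> &body,
                        const Instantiation &inst, IndexResult &out) {
  ++out.traversals;
  for (const auto &ref : body)
    if (ref.dependent)
      out.refs.insert({ref.name, inst.at(ref.name)});
}

IndexResult pedantic(const std::vector<NameRef> &body,
                     const std::vector<Instantiation> &insts) {
  IndexResult out;
  indexUninstantiated(body, out);
  for (const auto &inst : insts)
    indexInstantiation(body, inst, out);  // std::set de-duplicates on the fly
  return out;
}

IndexResult generalizing(const std::vector<NameRef> &body,
                         const std::vector<Instantiation> &insts) {
  IndexResult out;
  indexUninstantiated(body, out);
  if (!insts.empty())  // first instantiation stands in for a random pick
    indexInstantiation(body, insts.front(), out);
  return out;
}

// Hypothetical stand-in for approximate lookup of a dependent name.
std::string approxLookup(const std::string &name) { return "<approx>::" + name; }

IndexResult optimistic(const std::vector<NameRef> &body) {
  IndexResult out;
  ++out.traversals;  // a single walk over the uninstantiated body
  for (const auto &ref : body)
    out.refs.insert({ref.name, ref.dependent ? approxLookup(ref.name) : ref.target});
  return out;
}
```

Consider a body that references one dependent name `size` and one non-dependent name `helper`, with two instantiations. Pedantic walks the body three times and records both concrete targets of `size`. Generalizing walks it twice and records one of them. Optimistic walks it once and records a single approximate target. Picking the first instantiation instead of a random one only keeps the sketch deterministic.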
We go with the Optimistic approach in scip-clang for the following reasons:

- Performance: It is the only approach
  compatible with the optimization of indexing a header only once (per transcript).
  With the other approaches, if a header defining a template is included in a TU
  that does not instantiate the template,
  then indexing the header in that TU will not fully index the body of the template.
  Going for the Pedantic approach would also likely lead to a large amount
  of redundant work across TUs due to repeated traversals of the same or
  similar template instantiations. The extra information would further
  increase the time spent on index merging.
- Good enough: Based on experience, most dependent names in practice
  behave like non-dependent names anyway.
  This is reflected in clangd's index also using imprecise name lookup
  for dependent names:

  ```cpp
  /// Performs an imprecise lookup of a dependent name in this class.
  ///
  /// This function does not follow strict semantic rules and should be used
  /// only when lookup rules can be relaxed, e.g. indexing.
  std::vector<const NamedDecl *>
  lookupDependentName(DeclarationName Name,
                      llvm::function_ref<bool(const NamedDecl *ND)> Filter);
  ```

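The quoted declaration is Clang's own API, and the sketch below does not reproduce it. As a hypothetical illustration of what an imprecise lookup of a dependent name can mean, it models a class template as a name, its member names, and its bases as written in the template. `ClassInfo` and `approxLookupMember` are made-up names. The lookup searches the class and its bases breadth-first without substituting template arguments. Unlike real C++ member lookup, it collects every match instead of stopping at the first class that declares the name.

```cpp
#include <cassert>
#include <deque>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified view of a class template's primary definition.
struct ClassInfo {
  std::string name;
  std::vector<std::string> members;
  std::vector<const ClassInfo *> bases;  // bases as written, e.g. Q0<T>
};

// Approximate lookup of a dependent member name such as `this->f0()`:
// search the class and all of its bases breadth-first, ignoring which
// specialization a concrete instantiation would pick.
// Returns "Class::member" for every match found.
std::vector<std::string> approxLookupMember(const ClassInfo &cls,
                                            const std::string &member) {
  std::vector<std::string> results;
  std::set<const ClassInfo *> visited;
  std::deque<const ClassInfo *> worklist{&cls};
  while (!worklist.empty()) {
    const ClassInfo *current = worklist.front();
    worklist.pop_front();
    if (!visited.insert(current).second)
      continue;  // already searched, e.g. via diamond inheritance
    for (const auto &m : current->members)
      if (m == member)
        results.push_back(current->name + "::" + m);
    for (const ClassInfo *base : current->bases)
      worklist.push_back(base);
  }
  return results;
}
```

If `Q1` and `Q0` from the earlier example are modeled this way, looking up `f0` from `Q1` yields `Q0::f0`. Looking up `non_existent` yields nothing, and an indexer would simply emit no reference for it.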
## Mapping C++ to SCIP

(FQN = Fully Qualified Name)