- [Symbol names for enum cases](#symbol-names-for-enum-cases)
- [Method disambiguator](#method-disambiguator)
- [Forward declarations](#forward-declarations)
- [Cross-repo navigation](#cross-repo-navigation)
  - [Namespace declarations in a cross-repo setting](#namespace-declarations-in-a-cross-repo-setting)
  - [Forward declarations in a cross-repo setting](#forward-declarations-in-a-cross-repo-setting)

## Architecture

When working on a compilation database (a `compile_commands.json` file),
@@ -661,3 +665,78 @@ For simplicity, an initial implementation
can put fake `SymbolInformation` values
in the `external_symbols` list
with an empty package name and version.

## Cross-repo navigation

For cross-repo navigation, the "only" thing the indexer needs to do
is to consistently assign package IDs (name + version pairs)
to all non-local symbols. Naturally, each symbol's package ID
should be that of the package its declaration "belongs" to.

This runs into problems with two kinds of declarations: namespaces
and forward declarations.

### Namespace declarations in a cross-repo setting

C++ namespaces can cut across packages.
For example, types commonly customize hashing
by adding template specializations (such as `std::hash<T>`)
inside the `std` namespace, so `namespace std` is reopened
in many packages besides the standard library itself.
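As a concrete illustration, the following minimal C++ sketch reopens `namespace std` from a hypothetical user package (`mylib`, with a hypothetical `Point` type and `countDistinct` helper) to specialize `std::hash`. When the indexer processes this file, the `namespace std` declaration it encounters lives in `mylib`'s sources, not in the standard library's.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <unordered_set>

// Hypothetical type from a user package (not part of the standard library).
namespace mylib {
struct Point {
  int x;
  int y;
  bool operator==(const Point &other) const {
    return x == other.x && y == other.y;
  }
};
} // namespace mylib

// Reopening `std` from the user package: this `namespace std` declaration
// lives in mylib's source, even though `std` "belongs" to the standard library.
namespace std {
template <> struct hash<mylib::Point> {
  size_t operator()(const mylib::Point &p) const noexcept {
    // Combine the member hashes; the exact mixing is illustrative only.
    size_t hx = hash<int>{}(p.x);
    size_t hy = hash<int>{}(p.y);
    return hx ^ (hy << 1);
  }
};
} // namespace std

// Standard containers pick up the specialization automatically.
inline std::size_t countDistinct(const mylib::Point *points, std::size_t n) {
  assert(points != nullptr || n == 0);
  std::unordered_set<mylib::Point> seen(points, points + n);
  return seen.size();
}
```

Every package that does this contributes `std` declarations outside the standard library package, which is why the "owning" package of `std` is ambiguous.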

There are four main options for handling namespaces.

1. Attempt to figure out the "original" package for a namespace
   and use that.
2. Use whichever package the immediate namespace declaration
   was found in.
3. Drop package information altogether (use a blank name and version).
4. Require a "build seed" that is used to set the name and version.
   This build seed would need to be set consistently across different
   cross-repo indexing processes.

Option 1 is not practically feasible: there is no central source of information
about which namespaces are "owned" by which packages,
so it's unclear how it would be implemented in the general case.

Option 2 is implementable. The main downside is that namespace references
cutting across packages will not be cross-linked, because the symbol
names will differ (for example, `std` reopened in a user package
gets a different symbol than `std` in the standard library).

Options 3 and 4 both allow cross-linking of namespace references across
packages. Option 3 risks false positives in situations where a namespace
with the same name is used in entirely unrelated packages
(which may not even be built together).
Option 4 avoids that, but it requires re-indexing the standard library
for each different seed, and it requires plumbing state across indexing jobs
(since the seed cannot necessarily be inferred from the build configuration).

So we go with option 3 as the least bad option.
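The trade-off between options 2, 3, and 4 shows up directly in the symbol strings. The sketch below builds a simplified SCIP-style namespace symbol under each policy. The `cxx` scheme, the `.` package manager, the `PackageId` and `NamespacePolicy` types, and `namespaceSymbol` are all assumptions for illustration, not the indexer's actual API, and only a single named namespace component is handled.

```cpp
#include <cassert>
#include <string>

// Hypothetical package identity (name + version pair).
struct PackageId {
  std::string name;
  std::string version;
};

// Which namespace-handling option to apply.
enum class NamespacePolicy {
  DeclaringPackage, // Option 2: package the declaration was found in.
  BlankPackage,     // Option 3: drop package information entirely.
  BuildSeed,        // Option 4: seed shared across indexing jobs.
};

// In a SCIP-style symbol, an empty field is written as ".".
inline std::string fieldOrDot(const std::string &s) {
  return s.empty() ? "." : s;
}

// Builds a simplified SCIP-like symbol for a single namespace component:
// "cxx . <package-name> <version> <namespace>/".
inline std::string namespaceSymbol(const std::string &ns,
                                   const PackageId &declaringPackage,
                                   const PackageId &buildSeed,
                                   NamespacePolicy policy) {
  assert(!ns.empty()); // Anonymous namespaces are out of scope here.
  PackageId pkg;
  switch (policy) {
  case NamespacePolicy::DeclaringPackage:
    pkg = declaringPackage;
    break;
  case NamespacePolicy::BlankPackage:
    pkg = PackageId{"", ""};
    break;
  case NamespacePolicy::BuildSeed:
    pkg = buildSeed;
    break;
  }
  return "cxx . " + fieldOrDot(pkg.name) + " " + fieldOrDot(pkg.version) +
         " " + ns + "/";
}
```

With option 2, `std` declared in a hypothetical `mylib 1.2.3` package becomes `cxx . mylib 1.2.3 std/`, which never matches the standard library's `std`. With option 3, every package yields `cxx . . . std/`, so references link up across all packages, including potentially unrelated ones.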

### Forward declarations in a cross-repo setting

The information about what symbol a forward declaration refers to
is not reliably available in a worker;
it is only available at the time of index merging.

Since the symbol name needs to take the package name into account,
we need to perform a lookup during index merging based on the symbol name
without the package ID prefix.

There are two options for doing this:

1. Additionally serialize symbol names without the package ID
   in index shards, and use those as map keys.
2. Partition the symbol name reliably during index merging,
   and use the suffix as the lookup key.

Option 1 would increase the overhead of (de)serializing index shards,
and would require changing the Protobuf format
(to add an extra field tracking the prefix-free symbol name).

So we go with option 2 instead. Specifically, the descriptor
part of the symbol name would start with a `$` (alternatively,
the `$` could be appended to the end of the version; either works).
We would forbid the use of `$` in package names and versions,
allowing us to quickly split a symbol name at the first `$`.
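Here is a minimal C++17 sketch of option 2, assuming a simplified SCIP-like symbol layout `cxx . <name> <version> $<descriptors>`. The helpers `makeSymbol` and `splitSymbol` and the `DefinitionIndex` class are hypothetical. During index merging, definitions are registered under their prefix-free suffix, and a forward declaration is resolved by looking up its own suffix. When several packages define the same suffix, this sketch simply keeps the first one registered.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>
#include <unordered_map>
#include <utility>

// Builds "cxx . <name> <version> $<descriptors>" (assumed layout).
inline std::string makeSymbol(const std::string &name, const std::string &version,
                              const std::string &descriptors) {
  // Enforce the rule from the text: no '$' in package names or versions.
  assert(name.find('$') == std::string::npos);
  assert(version.find('$') == std::string::npos);
  // SCIP-style convention: an empty field is written as ".".
  auto field = [](const std::string &s) { return s.empty() ? std::string(".") : s; };
  return "cxx . " + field(name) + " " + field(version) + " $" + descriptors;
}

// Splits a symbol at the first '$' into (package-qualified prefix,
// prefix-free suffix). Returns std::nullopt if there is no marker.
inline std::optional<std::pair<std::string_view, std::string_view>>
splitSymbol(std::string_view symbol) {
  std::size_t pos = symbol.find('$');
  if (pos == std::string_view::npos)
    return std::nullopt;
  return std::make_pair(symbol.substr(0, pos), symbol.substr(pos + 1));
}

// Hypothetical merge-time table: prefix-free suffix -> full definition symbol.
class DefinitionIndex {
public:
  void addDefinition(const std::string &symbol) {
    if (auto parts = splitSymbol(symbol))
      bySuffix.emplace(std::string(parts->second), symbol); // keeps first
  }

  // Resolves a forward declaration's symbol to the definition's symbol
  // (possibly from another package); falls back to the input if unknown.
  std::string resolveForwardDecl(const std::string &fwdSymbol) const {
    if (auto parts = splitSymbol(fwdSymbol)) {
      auto it = bySuffix.find(std::string(parts->second));
      if (it != bySuffix.end())
        return it->second;
    }
    return fwdSymbol;
  }

private:
  std::unordered_map<std::string, std::string> bySuffix;
};
```

For instance, a forward declaration seen in a hypothetical package `pkg-a` as `cxx . pkg-a 1.0 $mylib/Widget#` resolves to the definition `cxx . pkg-b 2.0 $mylib/Widget#` from `pkg-b`, because both share the suffix `mylib/Widget#`.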

The extra `$` could be stripped when serializing the final index.
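Stripping the marker could be as simple as erasing the first `$`. Any later `$` must belong to the descriptors, since package names and versions cannot contain one. A minimal sketch, using a hypothetical helper `stripDollarMarker` and assuming the same layout (`$` directly after the space that follows the version):

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: removes the '$' marker separating the
// package-qualified prefix from the descriptors, producing the symbol as
// it would be written to the final index.
// Only the first '$' is removed; later ones belong to the descriptors.
inline std::string stripDollarMarker(std::string symbol) {
  std::string::size_type pos = symbol.find('$');
  if (pos == std::string::npos)
    return symbol; // e.g. local symbols carry no marker
  // Under the assumed layout, the marker directly follows the space
  // after the version.
  assert(pos > 0 && symbol[pos - 1] == ' ');
  symbol.erase(pos, 1);
  return symbol;
}
```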