Commit 5c41a0e

docs: Describe implementation of cross-repo nav (#343)
1 parent 460cad6 commit 5c41a0e

1 file changed, +79 -0 lines changed

docs/Design.md

Lines changed: 79 additions & 0 deletions
@@ -13,6 +13,10 @@
- [Symbol names for enum cases](#symbol-names-for-enum-cases)
- [Method disambiguator](#method-disambiguator)
- [Forward declarations](#forward-declarations)
- [Cross-repo navigation](#cross-repo-navigation)
- [Namespace declarations in a cross-repo setting](#namespace-declarations-in-a-cross-repo-setting)
- [Forward declarations in a cross-repo setting](#forward-declarations-in-a-cross-repo-setting)

## Architecture

When working on a compilation database (a `compile_commands.json` file),
@@ -661,3 +665,78 @@ For simplicity, an initial implementation
can put fake `SymbolInformation` values
in the `external_symbols` list
with an empty package name and version.

## Cross-repo navigation

For cross-repo navigation, the "only" thing the indexer needs to do
is to consistently assign package IDs (name + version pairs)
to all non-local symbols. Ostensibly, a symbol's package ID
should be that of the package its declaration "belongs" to.

This runs into problems with two kinds of declarations: namespaces
and forward declarations.

### Namespace declarations in a cross-repo setting

C++ namespaces can cut across packages.
For example, types commonly customize hashing
by adding a `std::hash` template specialization inside the `std` namespace.
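
As a concrete illustration of a namespace cutting across packages, the sketch below shows a type from a hypothetical package reopening `namespace std` to specialize `std::hash`. The package name `geo`, the `Point` type, and the `countDistinct` helper are assumptions made up for this example.

```cpp
#include <cstddef>
#include <functional>
#include <unordered_set>

// Hypothetical third-party package "geo"; the name and type are illustrative.
namespace geo {
struct Point {
  int x;
  int y;
  bool operator==(const Point &other) const {
    return x == other.x && y == other.y;
  }
};
} // namespace geo

// This `namespace std` declaration appears in the geo package's sources,
// even though the namespace itself originates in the standard library.
namespace std {
template <> struct hash<geo::Point> {
  size_t operator()(const geo::Point &p) const noexcept {
    // Simple hash combination; sufficient for a demonstration.
    return hash<int>{}(p.x) ^ (hash<int>{}(p.y) << 1);
  }
};
} // namespace std

// Counts distinct points; std::unordered_set picks up the specialization above.
inline std::size_t countDistinct(const geo::Point *points, std::size_t n) {
  std::unordered_set<geo::Point> seen(points, points + n);
  return seen.size();
}
```

Here the indexer sees a `std` namespace declaration inside the `geo` package, which is exactly the situation where "the package the declaration belongs to" becomes ambiguous.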

There are four main options for handling namespaces:

1. Attempt to figure out the "original" package for a namespace
   and use that.
2. Use whatever package the immediate namespace declaration
   was found in.
3. Drop package information altogether (use a blank name and version).
4. Require a "build seed" that is used to set the name and version.
   This build seed would need to be set consistently across different
   cross-repo indexing processes.

Option 1 is not feasible in practice: there is no central record
of which namespaces are "owned" by which packages,
so it is unclear how it would be implemented in the general case.

Option 2 is implementable. The main downside is that namespace references
cutting across packages will not be cross-linked, because the symbol
names will differ.

Options 3 and 4 allow cross-linking of namespace references across
packages. Option 3 creates the risk of false positives if
a namespace with the same name
is used in entirely unrelated packages (which may not even be built together).
Option 4 avoids that, but it requires
re-indexing the standard library for different seeds,
and plumbing state across indexing jobs
(since the seed cannot necessarily be inferred from the build configuration).

So we go with option 3 as the least bad option.
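
The difference between options 2 and 3 can be sketched as follows. This is a simplified model, not the indexer's actual code: `PackageId`, `DeclKind`, `packageForDecl`, and `formatSymbol` are hypothetical names, and the space-separated symbol layout is an assumption chosen only to make the comparison visible.

```cpp
#include <string>

// Hypothetical, simplified model of a package ID (name + version pair).
struct PackageId {
  std::string name;
  std::string version;
};

enum class DeclKind { Namespace, Other };

// Option 3: namespace symbols get a blank package ID, regardless of
// which package the namespace declaration was found in.
// All other declarations keep the package they were found in.
inline PackageId packageForDecl(DeclKind kind, const PackageId &enclosing) {
  if (kind == DeclKind::Namespace) {
    return PackageId{"", ""};
  }
  return enclosing;
}

// Illustrative formatting only: "<name> <version> <descriptor>".
inline std::string formatSymbol(const PackageId &pkg,
                                const std::string &descriptor) {
  return pkg.name + " " + pkg.version + " " + descriptor;
}
```

Under this model, a `std` namespace declaration seen in two different packages produces the same symbol name, so references are cross-linked. With option 2, `packageForDecl` would return `enclosing` unconditionally and the two names would differ.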

### Forward declarations in a cross-repo setting

The information about what symbol a forward declaration refers to
is not reliably available in a worker;
it is only available at the time of index merging.

Since the symbol name needs to take the package name into account,
we need to perform a lookup in a worker based on the symbol name
without the package ID prefix.

There are two options for doing this:

1. Additionally serialize symbol names without the package ID
   in index shards, and use those as map keys.
2. Partition the symbol name reliably during index merging,
   and use the suffix as the lookup key.

Option 1 would increase the overhead of (de)serializing indexes,
and would require changing the Protobuf format
(adding an extra field that tracks the prefix-free symbol name).

So we go with option 2 instead. Specifically, the descriptor
part of the symbol name would start with a `$` (or the `$` could be
appended to the version; either is OK).
We would forbid the use of `$` in package names and versions,
allowing us to quickly split a symbol name.

The extra `$` could be stripped when serializing the final index.
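
A minimal sketch of the `$`-based split, assuming the marker sits at the start of the descriptor. The helper names (`splitSymbolName`, `isValidPackageField`, `stripMarker`) and the `SplitSymbol` struct are hypothetical and only illustrate the idea.

```cpp
#include <string>

// Assumed marker separating the package ID prefix from the descriptor.
constexpr char kDescriptorMarker = '$';

struct SplitSymbol {
  bool found;                // false if the marker is absent
  std::string packagePrefix; // everything before the first marker
  std::string descriptor;    // everything after it; usable as a lookup key
};

// Package names and versions must not contain the marker,
// otherwise the split below would be ambiguous.
inline bool isValidPackageField(const std::string &field) {
  return field.find(kDescriptorMarker) == std::string::npos;
}

// Splits at the first marker. Because the marker is forbidden in package
// names and versions, the first occurrence always starts the descriptor.
inline SplitSymbol splitSymbolName(const std::string &symbol) {
  const std::string::size_type pos = symbol.find(kDescriptorMarker);
  if (pos == std::string::npos) {
    return SplitSymbol{false, "", ""};
  }
  return SplitSymbol{true, symbol.substr(0, pos), symbol.substr(pos + 1)};
}

// Removes the marker, as would be done when serializing the final index.
inline std::string stripMarker(const std::string &symbol) {
  const SplitSymbol parts = splitSymbolName(symbol);
  return parts.found ? parts.packagePrefix + parts.descriptor : symbol;
}
```

The split is a single linear scan with no extra serialized field, which is the main advantage over option 1.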
