Commit a5298e5

[Index] Convert hashing to HashBuilder with BLAKE3
Hashes produced by `Hashing.h` are non-deterministic between runs. Update the index hashing to use xxhash for the unit hash and BLAKE3 for the record hash. Ideally we'd use xxhash for the record hash as well, but there's no easy `HashBuilder` option for it today.

This also removes the hash caching logic from the record hasher. With BLAKE3, caching turned out to be slower than just hashing everything, and removing it greatly simplifies the implementation.

Numbers for indexing `Foundation` and `Cocoa` textual includes on an M2 Pro, over 10 runs with 3 warmup runs, are as follows.

Build with full re-index (i.e. index removed between each run):
```
Current:       688ms +- 8ms
BLAKE3:        691ms +- 4ms
BLAKE3 cached: 711ms +- 8ms
No-op hash:    620ms +- 4ms
```

Same, but with an existing index (which hashes but then does not write any output):
```
Current:       396ms +- 4ms
BLAKE3:        394ms +- 4ms
BLAKE3 cached: 419ms +- 3ms
No-op hash:    382ms +- 5ms
```

The no-op hash is a little misleading in the full re-index case, since it writes out fewer records. The existing-index case is more interesting: it shows that hashing is only a small part of the entire build and index (394ms with BLAKE3 against 382ms with no hashing at all).

Also worth noting is that there was some fairly significant run-to-run variance of around 30ms, but the above was a generally typical pattern (i.e. current about the same as BLAKE3, which is faster than BLAKE3 cached, and no-op is the fastest).

The main takeaway is that this isn't a noticeable performance regression.
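To illustrate what a run-to-run stable hash requires, here is a minimal sketch, not the code from this commit. It uses 64-bit FNV-1a purely as a stand-in, since neither xxhash nor BLAKE3 is available in the standard library; `StableHasher`, `hashSymbol`, and the `USR`/`Roles` inputs are hypothetical names chosen for the example. The point is that the result depends only on the bytes fed in, in a fixed order, and never on a per-process seed or on pointer values.

```cpp
#include <cstdint>
#include <string_view>

// Hypothetical stand-in for a deterministic hasher. This is 64-bit FNV-1a,
// NOT xxhash or BLAKE3, and is not the HashBuilder API from LLVM.
class StableHasher {
  uint64_t State = 14695981039346656037ULL; // FNV-1a 64-bit offset basis

  void mixByte(uint8_t Byte) {
    State ^= Byte;
    State *= 1099511628211ULL; // FNV-1a 64-bit prime
  }

public:
  // Hash the raw characters of a string.
  void addBytes(std::string_view Bytes) {
    for (unsigned char C : Bytes)
      mixByte(C);
  }

  // Hash an integer byte by byte, least significant byte first, so the
  // result does not depend on the host's endianness.
  void addInt(uint64_t Value) {
    for (int I = 0; I < 8; ++I)
      mixByte(static_cast<uint8_t>((Value >> (8 * I)) & 0xFF));
  }

  uint64_t result() const { return State; }
};

// Hypothetical example: hash a symbol by its USR string and a role bitmask.
// Only stable values are hashed; no addresses or process-specific state.
uint64_t hashSymbol(std::string_view USR, uint64_t Roles) {
  StableHasher H;
  H.addBytes(USR);
  H.addInt(Roles);
  return H.result();
}
```

Calling `hashSymbol("c:@F@main", 1)` in two separate processes yields the same value, which is the property the index needs in order to recognize a record that has already been written.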
1 parent 417d34b commit a5298e5

8 files changed, +339 -395 lines changed

clang/include/clang/Index/IndexRecordWriter.h

Lines changed: 1 addition & 1 deletion
@@ -75,7 +75,7 @@ class IndexRecordWriter {
   /// \returns Success if we should continue writing this record, AlreadyExists
   /// if the record file has already been written, or Failure if there was an
   /// error, in which case \p Error will be set.
-  Result beginRecord(StringRef Filename, llvm::hash_code RecordHash,
+  Result beginRecord(StringRef Filename, uint64_t RecordHash,
                      std::string &Error, std::string *RecordFile = nullptr);

   /// Finish writing the record file.

clang/lib/Index/ClangIndexRecordWriter.cpp

Lines changed: 4 additions & 3 deletions
@@ -63,8 +63,7 @@ StringRef ClangIndexRecordWriter::getUSRNonCached(const IdentifierInfo *Name,

 ClangIndexRecordWriter::ClangIndexRecordWriter(ASTContext &Ctx,
                                                RecordingOptions Opts)
-    : Impl(Opts.DataDirPath), Ctx(Ctx), RecordOpts(std::move(Opts)),
-      Hasher(Ctx) {
+    : Impl(Opts.DataDirPath), Ctx(Ctx), RecordOpts(std::move(Opts)) {
   if (Opts.RecordSymbolCodeGenName)
     ASTNameGen.reset(new ASTNameGenerator(Ctx));
 }
@@ -76,7 +75,9 @@ bool ClangIndexRecordWriter::writeRecord(StringRef Filename,
                                          std::string &Error,
                                          std::string *OutRecordFile) {

-  auto RecordHash = Hasher.hashRecord(IdxRecord);
+  std::array<uint8_t, 8> RecordHashArr = index::hashRecord(IdxRecord, Ctx);
+  uint64_t RecordHash = 0;
+  std::memcpy(&RecordHash, RecordHashArr.data(), RecordHashArr.size());

   switch (Impl.beginRecord(Filename, RecordHash, Error, OutRecordFile)) {
   case IndexRecordWriter::Result::Success:
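The hunk above converts the 8-byte array returned by `index::hashRecord` into a `uint64_t` with `std::memcpy`. The following standalone sketch isolates that conversion. `packHash` and `packHashLittleEndian` are hypothetical helper names that do not appear in the commit. The `memcpy` form yields a value that depends on the host's byte order, while the shift-based variant fixes the order explicitly; whether that difference matters for the index is not stated in the commit.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper mirroring the conversion in writeRecord: copy the
// 8 hash bytes into a uint64_t. memcpy avoids the undefined behaviour of
// casting a uint8_t* to uint64_t*, but the numeric result depends on the
// host's endianness.
uint64_t packHash(const std::array<uint8_t, 8> &Bytes) {
  uint64_t Value = 0;
  static_assert(sizeof(Value) == 8, "array and integer sizes must match");
  std::memcpy(&Value, Bytes.data(), Bytes.size());
  return Value;
}

// Hypothetical alternative (not what the commit does): treat Bytes[0] as the
// least significant byte, giving the same value on every host.
uint64_t packHashLittleEndian(const std::array<uint8_t, 8> &Bytes) {
  uint64_t Value = 0;
  for (std::size_t I = 0; I < Bytes.size(); ++I)
    Value |= static_cast<uint64_t>(Bytes[I]) << (8 * I);
  return Value;
}
```

On a little-endian host the two helpers return the same value; on a big-endian host `packHash` returns the byte-swapped value.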

clang/lib/Index/ClangIndexRecordWriter.h

Lines changed: 0 additions & 1 deletion
@@ -33,7 +33,6 @@ class ClangIndexRecordWriter {
   std::unique_ptr<ASTNameGenerator> ASTNameGen;
   llvm::BumpPtrAllocator Allocator;
   llvm::DenseMap<const void *, StringRef> USRByDecl;
-  IndexRecordHasher Hasher;

 public:
   ClangIndexRecordWriter(ASTContext &Ctx, RecordingOptions Opts);
