Commit afadd4e

Merged PR 713379: Add doc section for augmented weak fp
Add doc section for augmented weak fp
1 parent b8824c0 commit afadd4e

1 file changed

+24
-5
lines changed

Documentation/Wiki/Advanced-Features/Two-Phase-Cache-Lookup.md

Lines changed: 24 additions & 5 deletions
@@ -356,8 +356,27 @@ In production, the primary risk can be pathset lookup. It is possible for a weak
356356
tend to read many of the same files at runtime, it can result in pathset explosion: the number of
357357
potentially matching pathsets grows to a point that it degrades performance.
358358

359-
The primary fix for this issue (beyond tuning some pathset-related parameters in BuildXL's
360-
configuration) is to make the weak fingerprints stronger, by adding more statically known inputs.
361-
This results in fewer pathsets per weak fingerprint. BuildXL has also implemented an 'augmented
362-
weak fingerprint' mechanism which associates common pathsets with weak fingerprints, for optimized
363-
lookup.
359+
## Weak fingerprint augmentation
360+
361+
BuildXL implements a heuristic to deal with the case where a weak fingerprint is *too* weak and produces a large number
362+
of unique pathsets under it. In some particular scenarios where the weak fingerprint is known to be
363+
not strong enough, this behavior is enabled by default. This includes the whole family of JavaScript resolvers, as well as the
364+
CMake/Ninja and MSBuild resolvers. For DScript, the behavior can be enabled at the pip level by setting
365+
366+
```enforceWeakFingerprintAugmentation: true```
367+
368+
as part of the transformer execute arguments.
369+
370+
A high-level description of the heuristic follows:
371+
372+
* Suppose BuildXL performs a cache lookup with a given weak fingerprint `fp`, the lookup turns out to be a miss, and the pip is executed. If the lookup involved more than `n` downloaded unique path sets, BuildXL does not push the result of the execution to the cache under `fp`; it stores it under an augmented weak fingerprint `aug_fp` instead.
373+
* How is this augmented weak fingerprint computed? BuildXL tries to extract the paths that are common across the downloaded path sets and builds a new *augmented* path set from them. It then computes a strong fingerprint from that augmented path set and uses the result as the new augmented weak fingerprint `aug_fp`. The result of executing the pip is stored under the augmented fingerprint, not the original one, so candidates that would have accumulated under the original weak fingerprint are branched off under the augmented one.
374+
* When an augmented path set is produced, BuildXL also pushes a special entry to the cache that represents the `fp -> augmented path set` mapping. The next time a cache lookup against `fp` happens, the special `fp -> augmented path set` entry is retrieved as well. That makes BuildXL query the cache again, this time for all candidates with weak fingerprint `aug_fp = strong_fingerprint(augmented path set)`.
375+
376+
Observe that since the augmented weak fingerprint is *locally computed* from an augmented path set, the result will depend on the local state of the disk, and therefore only the candidates pushed under this particular weak fingerprint will be retrieved. This effectively reduces the number of candidates that are retrieved overall compared with not using the augmented fingerprint heuristic.
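
The lookup flow described above can be sketched with a toy in-memory cache. This is only an illustrative model under stated assumptions: `ToyCache`, `augmentedFingerprint`, and the string-based "fingerprint" are hypothetical stand-ins, not BuildXL APIs, and the real strong fingerprint computation is more involved. The sketch shows why two machines with different local content for the common paths end up querying different `aug_fp` buckets.

```typescript
// Illustrative model only: none of these names are BuildXL APIs.
type PathSet = string[];
type DiskState = Map<string, string>; // path -> content hash of the local file

// Stand-in for strong_fingerprint(augmented path set): mixes the weak
// fingerprint with the local content hash of every path in the set.
function augmentedFingerprint(fp: string, augmented: PathSet, disk: DiskState): string {
  const parts = [...augmented]
    .sort()
    .map((path) => `${path}=${disk.get(path) ?? "<absent>"}`);
  return `${fp}|${parts.join(";")}`;
}

class ToyCache {
  private entries = new Map<string, string[]>(); // weak fingerprint -> stored results
  private markers = new Map<string, PathSet>();  // fp -> augmented path set (special entry)

  publish(weakFp: string, result: string): void {
    const bucket = this.entries.get(weakFp) ?? [];
    bucket.push(result);
    this.entries.set(weakFp, bucket);
  }

  publishMarker(fp: string, augmented: PathSet): void {
    this.markers.set(fp, augmented);
  }

  // Query by fp first; if a marker exists, query again by the locally computed aug_fp.
  candidates(fp: string, disk: DiskState): string[] {
    const direct = this.entries.get(fp) ?? [];
    const marker = this.markers.get(fp);
    if (marker === undefined) {
      return [...direct];
    }
    const augFp = augmentedFingerprint(fp, marker, disk);
    return [...direct, ...(this.entries.get(augFp) ?? [])];
  }
}

const cache = new ToyCache();
const commonPathSet: PathSet = ["src/config.h"];
const diskA: DiskState = new Map([["src/config.h", "hashA"]]);
const diskB: DiskState = new Map([["src/config.h", "hashB"]]);

// Two builds with different local content publish under different aug_fp values.
cache.publishMarker("fp", commonPathSet);
cache.publish(augmentedFingerprint("fp", commonPathSet, diskA), "result built on A");
cache.publish(augmentedFingerprint("fp", commonPathSet, diskB), "result built on B");

const seenOnA = cache.candidates("fp", diskA);
const seenOnB = cache.candidates("fp", diskB);
```

In this model each machine retrieves only the candidate whose augmented fingerprint matches its own disk state, instead of every candidate ever stored under `fp`.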
377+
378+
Two knobs that affect this heuristic can be configured:
379+
* `/pathSetThreshold` : The maximum number of visited path sets allowed before switching to an 'augmented' weak fingerprint computed from common dynamically accessed paths ('`n`' in the above description). Default is `5`.
380+
* `/augmentingPathSetCommonalityFactor`: Used to compute the number of times an entry must appear among paths in the observed path sets in order to be included in the common path set. Value must be in the range (0, 1]. Default is `0.4`.
381+
382+
This means that, by default, after downloading `5` different path sets, an augmented one will be computed. If a path is present in at least `2` (`5 * 0.4`) of the `5` different path sets, it will be included in the augmented one.
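
As a worked example of the default arithmetic, the following sketch selects the common paths from five path sets. It is a minimal illustration, assuming that a path qualifies when its occurrence count is at least `pathSets.length * commonalityFactor`; the function name `augmentedPathSet` and the sample paths are hypothetical, and the exact rounding BuildXL applies is not specified here.

```typescript
// Hypothetical helper: returns the paths that appear in enough path sets.
function augmentedPathSet(pathSets: string[][], commonalityFactor: number): string[] {
  // Assumption: the minimum occurrence count is the raw product, with no rounding.
  const minOccurrences = pathSets.length * commonalityFactor;

  // Count in how many path sets each path appears (once per path set).
  const counts = new Map<string, number>();
  pathSets.forEach((pathSet) => {
    new Set(pathSet).forEach((path) => {
      counts.set(path, (counts.get(path) ?? 0) + 1);
    });
  });

  const result: string[] = [];
  counts.forEach((count, path) => {
    if (count >= minOccurrences) {
      result.push(path);
    }
  });
  return result.sort();
}

// Five downloaded path sets: a.h appears in 4 of them, b.h in 2, the rest in 1.
const downloaded: string[][] = [
  ["a.h", "b.h"],
  ["a.h", "c.h"],
  ["a.h", "d.h"],
  ["b.h", "e.h"],
  ["a.h", "f.h"],
];

// With the default factor 0.4, a path needs at least 2 occurrences.
const augmented = augmentedPathSet(downloaded, 0.4);
```

Here `augmented` contains `a.h` and `b.h`. A higher factor shrinks the augmented path set toward paths shared by every path set, while a lower factor admits more paths.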
