
Commit 07dd0cc

Copilot and ericstj authored
Improve source indexer deduplication to prefer real implementation over *.notsupported.cs files (#192)
* Initial plan
* Initial analysis and build environment setup
* Implement improved deduplication logic to prefer real source files over *.notsupported.cs
* Refine deduplication logic with improved error handling and cleaner output
* Implement project properties plumbing and improved deduplication logic
* Address PR review feedback: Optimize BinLogReader performance and reorder scoring priorities
* Address PR review feedback: Revert project file changes and optimize BinLogReader
* Revert "Initial analysis and build environment setup"

  This reverts commit 41af551.
* Fix project ranking heuristic
* Address feedback
* Handle property reassignment during target execution
* Make sure Linux is preferred over Windows
* Removed workaround now that fix is in
* Add documentation and unit tests for source selection algorithm
* Change test methodology to use comparative assertions instead of exact score values
* Add Clone method to CompilerInvocation and refactor tests to use it
* Build and test both SLNs in PR validation

---------

Co-authored-by: copilot-swe-agent[bot]
Co-authored-by: ericstj
Co-authored-by: Eric StJohn
1 parent 8908699 commit 07dd0cc


11 files changed (+753 -165 lines)


.github/workflows/dotnet.yml

Lines changed: 25 additions & 21 deletions
```diff
@@ -1,21 +1,25 @@
-name: .NET
-
-on:
-  push:
-    branches: [ main ]
-  pull_request:
-    branches: [ main ]
-
-jobs:
-  build:
-    runs-on: windows-latest
-    steps:
-    - uses: actions/checkout@v4
-    - name: Setup .NET
-      uses: actions/setup-dotnet@v4
-      with:
-        global-json-file: global.json
-    - name: Restore source-indexer.sln
-      run: dotnet restore src/source-indexer.sln
-    - name: Build source-indexer.sln
-      run: dotnet build --no-restore src/source-indexer.sln
+name: .NET
+
+on:
+  push:
+    branches: [ main ]
+  pull_request:
+    branches: [ main ]
+
+jobs:
+  build:
+    runs-on: windows-latest
+    steps:
+    - uses: actions/checkout@v4
+    - name: Setup .NET
+      uses: actions/setup-dotnet@v4
+      with:
+        global-json-file: global.json
+    - name: Restore source-indexer.sln
+      run: dotnet restore src/source-indexer.sln
+    - name: Restore SourceBrowser.sln
+      run: dotnet restore src/SourceBrowser/SourceBrowser.sln
+    - name: Build & test source-indexer.sln
+      run: dotnet test --no-restore src/source-indexer.sln
+    - name: Build & test SourceBrowser.sln
+      run: dotnet test --no-restore src/SourceBrowser/SourceBrowser.sln
```

README.md

Lines changed: 3 additions & 0 deletions
```diff
@@ -1,6 +1,9 @@
 # source-indexer
 This repo contains the code for building http://source.dot.net
 
+## Documentation
+- [Source Selection Algorithm](docs/source-selection-algorithm.md) - How the indexer chooses the best implementation when multiple builds exist for the same assembly
+
 ## Build Status
 [![Build Status](https://dev.azure.com/dnceng/internal/_apis/build/status/dotnet-source-indexer/dotnet-source-indexer%20CI?branchName=main)](https://dev.azure.com/dnceng/internal/_build/latest?definitionId=612&branchName=main)
 
```
docs/source-selection-algorithm.md

Lines changed: 75 additions & 0 deletions
# Source Selection Algorithm

When the source indexer processes multiple builds of the same assembly (e.g., generic builds, platform-specific builds, or builds with different target frameworks), it uses a scoring algorithm to select the "best" implementation to include in the final source index.

## Overview

The deduplication process groups all compiler invocations by `AssemblyName` and then calculates a score for each build. In each group, the build with the highest score is selected and included in the generated solution file.
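The grouping step amounts to a "keep the maximum per key" pass. Below is a minimal Python sketch. It assumes each invocation exposes its assembly name and that a scoring function is supplied separately. `Invocation`, `select_best`, and the tie-breaking rule are hypothetical and are not taken from `BinLogToSln`.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable


@dataclass
class Invocation:
    # Hypothetical stand-in for one compiler invocation read from a binlog.
    assembly_name: str
    project: str


def select_best(invocations: Iterable[Invocation],
                score: Callable[[Invocation], int]) -> Dict[str, Invocation]:
    """Group invocations by assembly name and keep the highest-scoring one per group."""
    best: Dict[str, Invocation] = {}
    for inv in invocations:
        current = best.get(inv.assembly_name)
        # Strict '>' keeps the first invocation seen when scores tie
        # (an assumption; the real tie-breaking rule may differ).
        if current is None or score(inv) > score(current):
            best[inv.assembly_name] = inv
    return best


# Usage with made-up scores; the first three are the totals from the Example Scoring table.
builds = [
    Invocation("System.Net.NameResolution", "generic"),
    Invocation("System.Net.NameResolution", "linux"),
    Invocation("System.Net.NameResolution", "windows"),
    Invocation("System.Text.Json", "default"),
]
scores = {"generic": -1955, "linux": 8727, "windows": 8598, "default": 100}
chosen = select_best(builds, lambda inv: scores[inv.project])
print({name: inv.project for name, inv in chosen.items()})
# {'System.Net.NameResolution': 'linux', 'System.Text.Json': 'default'}
```

A single pass with a dictionary avoids materializing the groups. This matters little here, but it keeps the selection linear in the number of invocations.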

## Scoring Priorities

The scoring algorithm evaluates builds using the following criteria, ordered by priority from highest to lowest. Except for `UseForSourceIndex`, which assigns the maximum score outright, the criteria are summed into a single total (see Example Scoring). The order therefore reflects each criterion's weight rather than a strict precedence:

### 1. UseForSourceIndex Property (Highest Priority)
- **Score**: `int.MaxValue` (2,147,483,647)
- **Description**: When a project explicitly sets the `UseForSourceIndex` property to `true`, it receives the maximum possible score, so it is always selected regardless of other factors.
- **Use Case**: Provides an escape hatch for projects that should definitely be included in the source index.

### 2. Platform Support Status (Second Priority)
- **Score**: `-10,000` penalty for platform-not-supported assemblies
- **Description**: If a project has the `IsPlatformNotSupportedAssembly` property set to `true`, it receives a heavy penalty.
- **Use Case**: Ensures that stub implementations, which mostly throw `PlatformNotSupportedException`, lose to real implementations.

### 3. Target Framework Version (Third Priority)
- **Score**: `Major * 1000 + Minor * 100`
- **Description**: Newer framework versions receive higher scores. For example:
  - .NET 8.0 = 8,000 + 0 = 8,000 points
  - .NET 6.0 = 6,000 + 0 = 6,000 points
  - .NET Framework 4.8 = 4,000 + 800 = 4,800 points
- **Use Case**: Prefers more recent implementations, which are likely to contain the latest features and bug fixes.
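Here is a minimal Python sketch of the version term. It assumes the target framework is available as a short-form TFM string. The regex-based `parse_tfm_version` and `framework_score` helpers are hypothetical and far simpler than real TFM parsing (for example, NuGet's framework parser). They cover only `netX.Y`, `netcoreappX.Y`, and compact .NET Framework TFMs such as `net48`.

```python
import re
from typing import Optional, Tuple


def parse_tfm_version(tfm: str) -> Optional[Tuple[int, int]]:
    """Return (major, minor) for TFMs like 'net8.0', 'net8.0-linux', 'netcoreapp3.1' or 'net48'.

    Deliberately simplified: netstandard and other TFM families return None.
    """
    base = tfm.split("-", 1)[0].lower()  # drop any platform suffix such as '-linux'
    dotted = re.fullmatch(r"net(?:coreapp)?(\d+)\.(\d+)", base)
    if dotted:  # .NET Core / .NET 5+ style, e.g. net8.0
        return int(dotted.group(1)), int(dotted.group(2))
    compact = re.fullmatch(r"net(\d)(\d)(\d?)", base)
    if compact:  # .NET Framework style, e.g. net48 or net472
        return int(compact.group(1)), int(compact.group(2))
    return None


def framework_score(tfm: str) -> int:
    """Major * 1000 + Minor * 100; unrecognized TFMs contribute nothing (an assumption)."""
    version = parse_tfm_version(tfm)
    if version is None:
        return 0
    major, minor = version
    return major * 1000 + minor * 100


for tfm in ("net8.0", "net6.0", "net48", "net8.0-linux", "netstandard2.0"):
    print(tfm, framework_score(tfm))
# net8.0 8000
# net6.0 6000
# net48 4800
# net8.0-linux 8000
# netstandard2.0 0
```

The platform suffix is ignored here on purpose, because the documentation scores it separately under Platform Specificity.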

### 4. Platform Specificity (Fourth Priority)
- **Score**: `+500` for platform-specific frameworks
- **Additional**: `+100` bonus for Linux platforms, `+50` bonus for Unix platforms
- **Description**: Platform-specific builds (e.g., `net8.0-linux`, `net8.0-windows`) receive bonuses over generic builds.
- **Use Case**: Platform-specific implementations often contain more complete functionality than generic implementations.
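The platform bonus can be sketched by reading the suffix after the first `-` in the TFM. This minimal Python version makes two assumptions: the platform name is compared case-insensitively, and any trailing OS version (as in `net8.0-windows10.0.19041`) is stripped. `platform_score` is a hypothetical helper, not the indexer's code.

```python
def platform_score(tfm: str) -> int:
    """Bonus for platform-specific TFMs (hypothetical helper).

    Assumes the platform is the text after the first '-', with any OS version
    stripped, e.g. 'net8.0-windows10.0.19041' -> 'windows'.
    """
    _, sep, platform = tfm.lower().partition("-")
    if not sep or not platform:
        return 0  # generic build such as 'net8.0'
    platform = platform.rstrip("0123456789.")
    score = 500  # any platform-specific build
    if platform == "linux":
        score += 100
    elif platform == "unix":
        score += 50
    return score


for tfm in ("net8.0", "net8.0-linux", "net8.0-unix", "net8.0-windows10.0.19041"):
    print(tfm, platform_score(tfm))
# net8.0 0
# net8.0-linux 600
# net8.0-unix 550
# net8.0-windows10.0.19041 500
```

With these weights, a Linux build beats an otherwise identical Windows build by 100 points, and a Unix build beats it by 50.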

### 5. Source File Count (Lowest Priority)
- **Score**: `+1` per source file
- **Description**: Builds with more source files receive higher scores.
- **Use Case**: Acts as a tiebreaker when other factors are equal, on the assumption that more source files indicate a more complete implementation.
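Putting the five criteria together gives a short scoring function. This Python sketch assumes the properties have already been read and parsed into a hypothetical `BuildInfo` record. It is not the C# `CalculateInvocationScore` method, only the rules above written out, and it reproduces the totals in the Example Scoring table.

```python
from dataclasses import dataclass
from typing import Optional

INT_MAX = 2**31 - 1  # value of C#'s int.MaxValue


@dataclass
class BuildInfo:
    # Hypothetical, already-parsed view of one build's project properties.
    use_for_source_index: bool
    is_platform_not_supported: bool
    framework_major: int
    framework_minor: int
    platform: Optional[str]  # e.g. "linux" or "windows"; None for a generic build
    source_file_count: int


def calculate_score(b: BuildInfo) -> int:
    # 1. An explicit opt-in wins outright.
    if b.use_for_source_index:
        return INT_MAX
    score = 0
    # 2. Heavy penalty for PlatformNotSupportedException stubs.
    if b.is_platform_not_supported:
        score -= 10_000
    # 3. Newer target frameworks score higher.
    score += b.framework_major * 1000 + b.framework_minor * 100
    # 4. Platform-specific builds get a bonus; Linux and Unix get a little more.
    if b.platform:
        score += 500
        if b.platform == "linux":
            score += 100
        elif b.platform == "unix":
            score += 50
    # 5. Tiebreaker: one point per source file.
    score += b.source_file_count
    return score


# The four hypothetical builds from the Example Scoring table.
generic = BuildInfo(False, True, 8, 0, None, 45)
linux = BuildInfo(False, False, 8, 0, "linux", 127)
windows = BuildInfo(False, False, 8, 0, "windows", 98)
override = BuildInfo(True, False, 6, 0, None, 23)
print([calculate_score(b) for b in (generic, linux, windows, override)])
# [-1955, 8727, 8598, 2147483647]
```

Returning early for `UseForSourceIndex` avoids adding anything to `int.MaxValue`, which in C# would overflow.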

## Example Scoring

Consider these hypothetical builds for `System.Net.NameResolution`:

| Build | UseForSourceIndex | IsPlatformNotSupported | Framework | Platform | Source Files | Total Score |
|-------|-------------------|------------------------|-----------|----------|--------------|-------------|
| Generic Build | false | true | net8.0 | none | 45 | -1,955 |
| Linux Build | false | false | net8.0-linux | linux | 127 | 8,727 |
| Windows Build | false | false | net8.0-windows | windows | 98 | 8,598 |
| Override Build | true | false | net6.0 | none | 23 | 2,147,483,647 |

In this example:
- The **Override Build** would be selected because it sets `UseForSourceIndex=true`.
- Without the override, the **Linux Build** would be selected because it has the highest score.
- The **Generic Build** receives a large penalty for being platform-not-supported.

## Implementation Details

The scoring logic is implemented in the `CalculateInvocationScore` method in `BinLogToSln/Program.cs`. The method:

1. Reads project properties from the binlog file
2. Applies scoring rules in priority order
3. Handles parsing errors gracefully
4. Returns a base score of 1 for builds that fail scoring, so they are not excluded entirely
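Steps 3 and 4 amount to wrapping the scoring rules in an exception handler. This minimal Python sketch assumes the properties arrive as a string-to-string mapping. `score_with_fallback`, `naive_version_scorer`, and the choice of `KeyError`/`ValueError` as "parsing errors" are all hypothetical.

```python
from typing import Callable, Mapping

BASE_SCORE = 1  # fallback score when a build cannot be scored


def score_with_fallback(properties: Mapping[str, str],
                        scorer: Callable[[Mapping[str, str]], int]) -> int:
    """Run a scorer over raw property strings; on a parsing failure return BASE_SCORE."""
    try:
        return scorer(properties)
    except (KeyError, ValueError):
        # Missing or malformed properties: keep the build as a candidate
        # rather than dropping it.
        return BASE_SCORE


def naive_version_scorer(props: Mapping[str, str]) -> int:
    # Deliberately naive: only understands dotted TFMs such as "net8.0".
    tfm = props["TargetFramework"]  # KeyError if missing
    major, minor = tfm[len("net"):].split("-")[0].split(".")  # ValueError if malformed
    return int(major) * 1000 + int(minor) * 100  # ValueError if not numeric


print(score_with_fallback({"TargetFramework": "net8.0"}, naive_version_scorer))  # 8000
print(score_with_fallback({"TargetFramework": "netX.Y"}, naive_version_scorer))  # 1
print(score_with_fallback({}, naive_version_scorer))  # 1
```

A fallback of 1 means an unscorable build still loses to any normally scored build with a positive total. However, it outranks a penalized stub such as the Generic Build in the example (-1,955).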

## Configuration

The algorithm can be influenced through MSBuild project properties:

- **UseForSourceIndex**: Set to `true` to force selection of this build
- **IsPlatformNotSupportedAssembly**: Set to `true` to indicate this is a stub implementation
- **TargetFramework**: Automatically detected from the project file

These properties are captured from the binlog during the build analysis phase.

src/SourceBrowser/SourceBrowser.sln

Lines changed: 11 additions & 5 deletions
```diff
@@ -19,7 +19,9 @@ Project("{9A19103F-16F7-4668-BE54-9A1E7A4F7556}") = "SourceIndexServer.Tests", "
 EndProject
 Project("{9A19103F-16F7-4668-BE54-9A1E7A4F7556}") = "BinLogParser", "src\BinLogParser\BinLogParser.csproj", "{4EF5052C-7D88-49C6-B940-5190CECD070D}"
 EndProject
-Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "BinLogToSln", "src\BinLogToSln\BinLogToSln.csproj", "{E73D1784-F1BC-4F01-B68E-03623CFBFB8E}"
+Project("{FAE04EC0-301F-11D3-BF4B-00C04F79EFBC}") = "BinLogToSln", "src\BinLogToSln\BinLogToSln.csproj", "{E73D1784-F1BC-4F01-B68E-03623CFBFB8E}"
+EndProject
+Project("{9A19103F-16F7-4668-BE54-9A1E7A4F7556}") = "BinLogToSln.Tests", "src\BinLogToSln.Tests\BinLogToSln.Tests.csproj", "{A6F4AA1E-2B2A-4E48-9C3E-4A1B2D3C5E7F}"
 EndProject
 Project("{2150E333-8FDC-42A3-9474-1A3956D46DE8}") = "Solution Items", "Solution Items", "{C0B9CC1C-1EF1-4086-9532-E8679CBA4E62}"
 	ProjectSection(SolutionItems) = preProject
@@ -65,10 +67,14 @@ Global
 		{4EF5052C-7D88-49C6-B940-5190CECD070D}.Debug|Any CPU.Build.0 = Debug|Any CPU
 		{4EF5052C-7D88-49C6-B940-5190CECD070D}.Release|Any CPU.ActiveCfg = Release|Any CPU
 		{4EF5052C-7D88-49C6-B940-5190CECD070D}.Release|Any CPU.Build.0 = Release|Any CPU
-		{E73D1784-F1BC-4F01-B68E-03623CFBFB8E}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
-		{E73D1784-F1BC-4F01-B68E-03623CFBFB8E}.Debug|Any CPU.Build.0 = Debug|Any CPU
-		{E73D1784-F1BC-4F01-B68E-03623CFBFB8E}.Release|Any CPU.ActiveCfg = Release|Any CPU
-		{E73D1784-F1BC-4F01-B68E-03623CFBFB8E}.Release|Any CPU.Build.0 = Release|Any CPU
+		{E73D1784-F1BC-4F01-B68E-03623CFBFB8E}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
+		{E73D1784-F1BC-4F01-B68E-03623CFBFB8E}.Debug|Any CPU.Build.0 = Debug|Any CPU
+		{E73D1784-F1BC-4F01-B68E-03623CFBFB8E}.Release|Any CPU.ActiveCfg = Release|Any CPU
+		{E73D1784-F1BC-4F01-B68E-03623CFBFB8E}.Release|Any CPU.Build.0 = Release|Any CPU
+		{A6F4AA1E-2B2A-4E48-9C3E-4A1B2D3C5E7F}.Debug|Any CPU.ActiveCfg = Debug|Any CPU
+		{A6F4AA1E-2B2A-4E48-9C3E-4A1B2D3C5E7F}.Debug|Any CPU.Build.0 = Debug|Any CPU
+		{A6F4AA1E-2B2A-4E48-9C3E-4A1B2D3C5E7F}.Release|Any CPU.ActiveCfg = Release|Any CPU
+		{A6F4AA1E-2B2A-4E48-9C3E-4A1B2D3C5E7F}.Release|Any CPU.Build.0 = Release|Any CPU
 	EndGlobalSection
 	GlobalSection(SolutionProperties) = preSolution
 		HideSolutionNode = FALSE
```
