Move the semanticdb generation logic to `compile` and make sure it's shared where possible #5841

arturaz · 2025-09-11T10:35:20Z

This PR optimizes and reworks the compilation pipeline with regard to SemanticDB generation.

Pre-PR situation

There were separate compile and semanticDbDataDetailed tasks.
compile performed the compilation without the SemanticDB plugin.
semanticDbDataDetailed performed the compilation with the SemanticDB plugin, but did not reuse compile's output.

This meant that the compilation was performed twice, wasting CPU cycles and worsening the developer experience, whenever either of the following happened:

The Mill CLI invoked compile and then semanticDbData.
Mill BSP invoked compile and then any other task that required semanticdb, or vice versa.

Post PR situation

Mill now decides automatically whether compile produces semanticdb data. semanticdb is produced if either:

compile was directly invoked by a task that needs semanticdb.
there is at least one BSP client that requires semanticdb to be produced.
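
The rule above can be sketched in a few lines. This is only an illustration of the described behaviour, not Mill's code: `SemanticDbDecision`, `CompileRequest` and `shouldProduceSemanticDb` are hypothetical names.

```scala
// Hypothetical model of the decision described above; not Mill's actual API.
object SemanticDbDecision {
  final case class CompileRequest(
      requestedBySemanticDbTask: Boolean, // compile was invoked by a task that needs semanticdb
      bspClientsNeedingSemanticDb: Int    // number of BSP clients that require semanticdb
  )

  // semanticdb is produced if either condition from the description holds.
  def shouldProduceSemanticDb(req: CompileRequest): Boolean =
    req.requestedBySemanticDbTask || req.bspClientsNeedingSemanticDb > 0
}
```

With this model, a plain `compile` from the CLI with no BSP client attached stays a semanticdb-free compilation, while the same invocation with one interested BSP client compiles with semanticdb upfront.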

Implementation details

Introduction of `MILL_BSP_OUTPUT_DIR`

Previously, you could use the MILL_OUTPUT_DIR environment variable to set the output directory of both regular and BSP Mill to a given folder.

Because regular Mill now needs to know the location of the BSP folder, having a single variable is problematic:

You run MILL_OUTPUT_DIR=out_bsp ./mill --bsp ...
You then want to run regular Mill and tell it the changed path of the BSP Mill folder.
But MILL_OUTPUT_DIR=out_bsp ./mill ... changes regular Mill's own out folder instead.

Thus MILL_BSP_OUTPUT_DIR is introduced, which allows you to run:

MILL_BSP_OUTPUT_DIR=out_bsp ./mill --bsp ...
MILL_BSP_OUTPUT_DIR=out_bsp ./mill ... # this still uses the regular out/ folder, but knows where bsp mill out/ folder is located
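
A minimal sketch of how the two variables could be resolved independently. `OutputDirs`, `Resolved` and `resolve` are hypothetical names, the default folders are passed in rather than assumed, and the sketch does not model any fallback from one variable to the other, since the description does not specify one.

```scala
// Hypothetical helper; only the two environment variable names come from the PR description.
object OutputDirs {
  final case class Resolved(regularOut: String, bspOut: String)

  // Each mode reads its own variable, so setting one never moves the other folder.
  def resolve(env: Map[String, String], defaultOut: String, defaultBspOut: String): Resolved =
    Resolved(
      regularOut = env.getOrElse("MILL_OUTPUT_DIR", defaultOut),
      bspOut = env.getOrElse("MILL_BSP_OUTPUT_DIR", defaultBspOut)
    )
}
```

Calling `OutputDirs.resolve(sys.env, "out", someBspDefault)` in both regular and BSP mode would give both processes the same view of where the BSP folder lives.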

`BuildCtx.bspSemanticDbSessionsFolder`

A folder in the filesystem where Mill BSP sessions that require semanticdb store an indicator file (name = process PID, contents are irrelevant) to tell the main Mill daemon and other BSP sessions that at least one Mill session will need semanticdb.

The reasoning is that if at least one of Mill's clients requests semanticdb, there is no point in running a regular compile without semanticdb: eventually we will have to rerun it with semanticdb, so we should compile with semanticdb upfront to avoid paying the price of compiling twice (without semanticdb and then with it).
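
The indicator-file mechanism can be sketched with plain `java.nio.file` calls. This is an illustration under assumptions, not the PR's implementation: `BspSemanticDbSessions` and its methods are hypothetical names, and the sketch does not deal with stale files left behind by a crashed session.

```scala
import java.nio.file.{Files, Path}

// Hypothetical sketch of PID-named indicator files; not Mill's actual code.
object BspSemanticDbSessions {
  private def markerFor(sessionsFolder: Path): Path =
    sessionsFolder.resolve(ProcessHandle.current().pid().toString)

  /** Marks the current process as a session that needs semanticdb. */
  def register(sessionsFolder: Path): Path = {
    Files.createDirectories(sessionsFolder)
    // Contents are irrelevant; only the existence of the file matters.
    Files.write(markerFor(sessionsFolder), Array.emptyByteArray)
  }

  /** Removes the current process's indicator file, if present. */
  def unregister(sessionsFolder: Path): Unit = {
    Files.deleteIfExists(markerFor(sessionsFolder))
    ()
  }

  /** True if at least one session has left an indicator file. */
  def anySessionNeedsSemanticDb(sessionsFolder: Path): Boolean =
    Files.isDirectory(sessionsFolder) && {
      val entries = Files.list(sessionsFolder)
      try entries.findAny().isPresent
      finally entries.close()
    }
}
```

A compile could then consult `anySessionNeedsSemanticDb` before deciding whether to enable the semanticdb plugin.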

`CompilationResult.semanticDbFiles`

Because we can't change the return type of compile without breaking binary compatibility, a semanticDbFiles field was added to CompilationResult, and compile fills it in if the compilation happened with semanticdb enabled.

Removal of `CompileFor` and related tasks

These are not needed anymore with the single compile task.

Removal of separate `SemanticDbJavaModule.semanticDbDataDetailed` task

Its functionality was merged into compileInternal, which takes a compileSemanticDb parameter.

There was also a lot of code duplication between the compile and semanticDbDataDetailed tasks, for both Java and Scala modules.

Replace the implicit `Task.dest` usage in `ZincWorker` with explicit `compileTo` argument

This makes it clearer what the parameter is used for and allows the same value to be reused in the SemanticDbJavaModule.enhanceCompilationResultWithSemanticDb invocation.

Misc changes

Moved JavaModule#resolveRelativeToOut instance method to UnresolvedPath.resolveRelativeToOut.
Improved Server to provide better debugging output if the server cannot be launched.
testScala212Version was updated to 2.12.20 because the new semanticdb plugin is not provided for the ancient 2.12.6 version that was used before.

Fixes: #5744

lefou · 2025-10-26T17:10:48Z

TBH, the current state of this PR looks way too complicated and makes too many assumptions for my taste. But I admit, the current codebase (in this context) is already far from simple.

I'd like to split the problem and provide clear, separate solutions.

Issue 1: Bad compilation performance

We currently compile too much, since the semanticDbData task duplicates compilation work already done in the compile task.
By always compiling with the semanticDB generator enabled, we could optimize the compilation for the BSP use case and would also ensure synced results, but we would potentially leak unwanted semanticDB data downstream.

Issue 2: Decide when we need semanticDB data

Explicit: the user enabled semanticDB in the module via scalacOptions - the compile result will contain the semanticDB data files and we should not apply any extra processing.
Implicit: the user uses Metals as the IDE - the compile result should not contain any semanticDB data files, as these are considered unwanted results (e.g. they should not appear downstream on classpaths or in jars).
No: we don't need semanticDB data at all - we should not generate it.

Proposal for Issue 1:

Disclaimer: the proposed task names are not final but chosen to make the concept clear.

Create a new persistent compileWithMaybeSemanticDb task which does the actual compilation and includes semanticDB data if we, for whatever reason, need it.
Let the compile task use the result of compileWithMaybeSemanticDb but filter out semanticDB data unless it was explicitly requested by the module configuration, e.g. via scalacOptions.
Let the semanticDbData task use the result of compileWithMaybeSemanticDb and filter out any non-semanticDB data.
We keep the current concept of well-separated tasks with well-defined results. All downstream users, esp. the BSP client or the mill-scalafix plugin, stay as they are but perform better.
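
The proposed split can be illustrated with a toy model in which a compilation output is just a list of file names. Everything here is a sketch under assumptions: `SplitCompileSketch`, `Output` and the function signatures are hypothetical and only mirror the task names from the proposal. They are plain functions, not Mill tasks.

```scala
// Toy model of the proposal; plain functions standing in for Mill tasks.
object SplitCompileSketch {
  // Hypothetical stand-in for a compilation output: a list of relative file names.
  final case class Output(files: List[String])

  private def isSemanticDb(file: String): Boolean = file.endsWith(".semanticdb")

  // The single real compilation: emits semanticdb files only when they are needed.
  def compileWithMaybeSemanticDb(
      classFiles: List[String],
      semanticDbFiles: List[String],
      needSemanticDb: Boolean
  ): Output =
    Output(classFiles ++ (if (needSemanticDb) semanticDbFiles else Nil))

  // compile: drops semanticdb files unless the module explicitly requested them.
  def compile(full: Output, explicitlyRequested: Boolean): Output =
    if (explicitlyRequested) full else Output(full.files.filterNot(isSemanticDb))

  // semanticDbData: keeps only the semanticdb files.
  def semanticDbData(full: Output): Output =
    Output(full.files.filter(isSemanticDb))
}
```

Both downstream views are derived from one compilation result, so the sources are compiled once and the two results cannot drift apart.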

Ideas for Issue 2:

Maybe too simple, but we could always generate semanticDB data in compileWithMaybeSemanticDb and simply not use it downstream if nobody is interested in it. This has an overhead of up to 20 percent when nobody needs it. (Very unlikely, but it may also conflict with other compiler plugins and fail a compilation that would otherwise succeed.)
Smart decision about whether semanticDB data is needed: either use of the semanticDbData task in the project or any BSP use should permanently enable it. This must be a bullet-proof design, well documented, and users need a way to disable it (opt-out or opt-in).

To detect BSP, we should just use the fact that a Mill-generated .bsp/mill-bsp.json file is present, since this won't require any extra bookkeeping. Users who don't want to use BSP can also safely remove that file. We could also write an extra file next to it, so we can check its age, for example.
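
The file-presence check suggested here is small enough to sketch directly. `BspDetection` and `bspConfigured` are hypothetical names, and the sketch only tests for the file's existence. It does not verify that Mill generated it.

```scala
import java.nio.file.{Files, Path}

// Hypothetical helper: treats the presence of .bsp/mill-bsp.json as "BSP is in use".
object BspDetection {
  def bspConfigured(workspace: Path): Boolean =
    Files.isRegularFile(workspace.resolve(".bsp").resolve("mill-bsp.json"))
}
```

Removing the file makes the check return false again, which matches the opt-out described above.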

Once the semanticDbData task is used/planned, we may record that fact under the namespace of a dedicated module-specific persistent task semanticDbDataGenerationWanted. This has the issue that the semanticDbData task is a downstream dependency of the compileWithMaybeSemanticDb task, so we currently can't know whether the initial value is correct (in practice we would always have to guess "enabled"). It would be really cool to have some way to detect early what the user is going to run, e.g. if we could expose the current execution plan via the TaskCtx. That way we could have an early persistent task semanticDbDataGenerationWanted that decides based on its persistent state and on whether the semanticDbData task is requested. Then we could conservatively default to disabled, unless there is more evidence.

arturaz added 7 commits August 29, 2025 13:46

WIP: I don't think this approach will work

1b91e3e

WIP: this compiles but doesn't work

50d1428

WIP: yeah, I'm stuck

1f02621

WIP: force to always use semantic db

b77f709

Merge remote-tracking branch 'upstream/main' into fix/semanticdb-gene…

c59e092

…ration-on-compile # Conflicts: # libs/javalib/src/mill/javalib/SemanticDbJavaModule.scala # libs/javalib/src/mill/javalib/UnresolvedPath.scala

WIP: force to always use semantic db

1ac358d

WIP: works?

31e0309

arturaz marked this pull request as draft September 11, 2025 10:35

autofix-ci bot and others added 7 commits September 11, 2025 10:50

[autofix.ci] apply automated fixes

8ed1ef3

Fix BuildCtx.withFilesystemCheckerDisabled

f60ae90

Merge branch 'fix/semanticdb-generation-on-compile' of https://github…

559b18f

….com/arturaz/mill into fix/semanticdb-generation-on-compile

Merge remote-tracking branch 'upstream/main' into fix/semanticdb-gene…

017c331

…ration-on-compile # Conflicts: # libs/daemon/server/src/mill/server/Server.scala # libs/javalib/src/mill/javalib/SemanticDbJavaModule.scala

Binary compatibility and code review fixes.

f7e6b00

[autofix.ci] apply automated fixes

317f5fc

Docs.

93777e2

arturaz marked this pull request as ready for review October 3, 2025 10:59

arturaz and others added 14 commits October 3, 2025 14:48

Undeprecate public JVM worker API

73a724e

docs

9d66aa2

test fixes

ee7f50e

test fixes

e24a0b2

[autofix.ci] apply automated fixes

de7fa17

Refactor to dynamically construct a different task graph based on a b…

08e2819

…spClientsNeedSemanticDb runtime value

Merge remote-tracking branch 'refs/remotes/origin/fix/semanticdb-gene…

f4043b0

…ration-on-compile' into fix/semanticdb-generation-on-compile

Fix semantic db compilation for KotlinModule

e8becf3

Fix semantic db compilation for MillBuildRootModule

16ff91d

Fix HelloJavaTests

e6445cb

[autofix.ci] apply automated fixes

16c0ca6

Fix KotlinModule compilation

5d66ac4

[autofix.ci] apply automated fixes

4ea7fc3

arturaz and others added 2 commits October 26, 2025 12:42

Fix KotlinModule compilation: zinc was receiving kotlin files

ababf65

[autofix.ci] apply automated fixes

d55ac0f
