Skip to content

Conversation

cklin
Copy link
Contributor

@cklin cklin commented Sep 23, 2025

This PR updates the init action to compute a preliminary overlay database mode before the CodeQL CLI becomes available. The preliminary overlay database mode will be used in a future PR.

Risk assessment

For internal use only. Please select the risk level of this change:

  • Low risk: Changes are fully under feature flags, or have been fully tested and validated in pre-production environments and are highly observable, or are documentation or test only.

Merge / deployment checklist

  • Confirm this change is backwards compatible with existing workflows.
  • Consider adding a changelog entry for this change.
  • Confirm the readme and docs have been updated if necessary.

@cklin cklin marked this pull request as ready for review September 23, 2025 20:32
@cklin cklin requested a review from a team as a code owner September 23, 2025 20:32
@Copilot Copilot AI review requested due to automatic review settings September 23, 2025 20:32
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR updates the init action to compute a preliminary overlay database mode before the CodeQL CLI becomes available. This computation will be used in a future PR.

Key changes:

  • Creates a new JSON file for overlay language aliases mapping
  • Extracts the inputs object creation to happen earlier in the workflow
  • Adds a new getPreliminaryOverlayDatabaseMode function to determine overlay database mode without CodeQL CLI
  • Updates isOverlayAnalysisFeatureEnabled to work without CodeQL dependency

Reviewed Changes

Copilot reviewed 16 out of 16 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
src/overlay-language-aliases.json New mapping file for language aliases used in overlay analysis
src/init.ts Updates initConfig to accept CodeQL parameter
src/init-action.ts Restructures input creation and adds preliminary overlay mode computation
src/feature-flags.ts Removes CodeQL dependency for overlay analysis feature minimum version
src/config-utils.ts Adds functions for computing overlay mode and loading config without CodeQL
src/config-utils.test.ts Updates tests to accommodate new function signatures
lib/* Generated JavaScript code reflecting TypeScript changes

@cklin cklin force-pushed the cklin/preliminary-overlay-mode branch from 0e88ef0 to 0d55271 Compare September 25, 2025 17:52
getOverlayDatabaseMode() already performs the same version check, so we
can remove minimumVersion from Feature.OverlayAnalysis. Doing so allows
the action to perform feature checks without CodeQL CLI.
This commit changes isOverlayAnalysisFeatureEnabled() so that it uses
the overlay-language-aliases.json file to resolve language aliases
instead of relying on the CodeQL CLI.
This commit makes getOverlayDatabaseMode() accept undefined as arguments
for codeql and languages.
This commit extracts into amendInputConfigFile() the code that processes
configInput, and moves the call from initConfig() into init-action.ts.
@cklin cklin force-pushed the cklin/preliminary-overlay-mode branch from 0d55271 to c4d96be Compare September 26, 2025 22:15
@cklin cklin requested a review from mbg September 29, 2025 14:39
Copy link
Member

@mbg mbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for having a go at this as an alternative to #3116. As discussed elsewhere, I think that having just the hard-coded alias mappings (in overlay-language-aliases.json here) is a better approach than having all of the CLI output and maintaining that with an extra workflow.

I have had a look over the details of the implementation here (but I have not yet looked at #3158). For the changes here, I have some fairly significant concerns about the duplication of the logic producing the UserConfig (i.e. the database configuration) in both getPreliminaryOverlayDatabaseMode and initConfig -- see my comments for details there.

This is also a fairly big change generally and I would class this as high risk given the concerns I discuss in the comments and that I don't have much confidence that we have enough existing overall test coverage to understand what this may or may not break.

I would propose one of the following:

  • Depending on the extent to which the UserConfig for getOverlayDatabaseMode must align with the one computed in initConfig, perhaps getPreliminaryOverlayDatabaseMode could be modified so that what it does has no effect on initConfig and initConfig continues to work largely as is.
  • Placing all of the changes behind a FF, but this is likely just going to make the logic even more complex and could introduce more issues than it might prevent if the above isn't possible; or
  • Making smaller changes incrementally, perhaps starting by refactoring some existing functions to set things up for the changes here in ways that don't change any behaviour.

logger: Logger,
): Promise<string[]> {
// Obtain languages without filtering them.
const { rawLanguages } = await getRawLanguages(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getRawLanguages makes an API call as part of calling getRawLanguagesInRepo (if there's no languages input). Since getRawLanguages is currently only called at most once, the result of that API call is not cached. With this case, it could get called more than once. It might make sense to cache the response since it shouldn't change between calls.

* CodeQL CLI. It is intended to be used for overlay analysis preparations
* before the CodeQL CLI is available.
*/
async function getUnverifiedLanguagesForOverlay(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: It would be nice if this could live in overlay-database-utils.ts since it is specific to overlay databases. I imagine it's here instead because of the call to getRawLanguages. Do you think you could make rawLanguages a parameter of getUnverifiedLanguagesForOverlay and move it to overlay-database-utils.ts?

);
const languageAliases = overlayLanguageAliases as Record<string, string>;

const languagesSet: string[] = [];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: languagesSet suggests that this is a Set object, but it is actually an array. Also the name is misleading, because I don't think that getRawLanguages attempts any de-duplication, so in theory we could have multiple languages multiples times here if the languages input contains them more than once.

* This function should be called only once on any specific `InitConfigInputs`
* object. Otherwise it could emit a false warning.
*/
export function amendInputConfigFile(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This name is misleading because it suggests that we are somehow changing an existing configuration file, while we are actually just determining the one we wish to use (and writing the one provided as input to disk, if needed).

Comment on lines +576 to +577
* This function should be called only once on any specific `InitConfigInputs`
* object. Otherwise it could emit a false warning.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to enforce this, e.g. with some state capturing whether this has been called already for the given inputs or in the types.

`Both a config file and config input were provided. Ignoring config file.`,
);
}
inputs.configFile = userConfigFromActionPath(inputs.tempDir);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: I don't love that this relies on mutating inputs and that the return type of this function is void. Perhaps you could refactor this function so that it returns the appropriate value for inputs.configFile (i.e. either the initial inputs.configFile or the result of userConfigFromActionPath(inputs.tempDir)) and then update inputs.configFile to the result at the call site.

* @returns An object containing the overlay database mode and whether the
* action should perform overlay-base database caching.
*/
export async function getPreliminaryOverlayDatabaseMode(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am concerned about the amount of logic duplication that is taking place here (particularly the involvement of calculateAugmentation and generateCodeScanningConfig). I understand that these are required for getOverlayDatabaseMode, but we risk turning this into a big source of errors if there is inconsistency in what this function determines vs what initActionState determines.

For example, this function does not account for analysis-kinds being only code-quality, in which case all query customisation should be disabled. Notably, that has different semantics for repositoryProperties (where additional configuration is an info-level log message) and for inputs/config file values (where it is a ConfigurationError).

I assume that it is important that the computedConfig here mirrors the one in initConfig as accurately as possible, but correct me if I am wrong and this particular aspect doesn't matter.

If that does matter and viewing this issue in isolation, we would ideally have one function which figures computedConfig out given the necessary arguments for calculateAugmentation and generateCodeScanningConfig taking analysis-kinds into account, and then use that in both this function and initConfig.

Comment on lines -291 to +330
const qualityQueriesInput = getOptionalInput("quality-queries");

if (qualityQueriesInput !== undefined) {
if (inputs.qualityQueriesInput !== undefined) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor: is there any particular reason for removing this constant, since it's still used in two places as before?

Comment on lines +934 to +941
const userConfig = await loadUserConfig(
inputs.configFile,
inputs.workspacePath,
inputs.apiDetails,
tempDir,
logger,
);
const config = await initActionState(inputs, userConfig, codeql);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The interaction between this and what happens in getPreliminaryOverlayDatabaseMode is not obvious and possibly problematic. Specifically, getPreliminaryOverlayDatabaseMode suggests that it is related to only overlay databases, but it actually writes the configuration file that loadUserConfig ends up loading here. That configuration file already contains a CLI configuration that is a combination of a config-file or config input, with other Action inputs such as queries and packs as well as repository property values.

initActionState then performs essentially duplicate work calling calculateAugmentation (although I understand that it takes the languages as input and therefore could lead to different results than for the first call to it) which then gets fed to a second call to generateCodeScanningConfig. This could lead to some odd behaviours where assumptions that we make in generateCodeScanningConfig are invalidated because userConfig already contains merged results of various inputs from the first call to generateCodeScanningConfig that took place in getPreliminaryOverlayDatabaseMode. So we might end up with duplicate entires in the final configuration file, duplicate or confusing log messages, etc.

Comment on lines +737 to +738
languages: Language[] | undefined,
languagesInput: string | undefined,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am worried that having both (possibly disagreeing) values here could be a source of confusion or errors.

@cklin
Copy link
Contributor Author

cklin commented Oct 1, 2025

Hi @mbg,

Thank you for your detailed comments. Before we go into the details, can you confirm that I understand your main concerns correctly?

  1. You are concerned that this change is high-risk because it is a fairly big change, not under feature flags, and there is insufficient test coverage.
  2. You are concerned that there is too much code duplication between getPreliminaryOverlayDatabaseMode() and the rest of the init action, and that the behavior of getPreliminaryOverlayDatabaseMode() is inconsistent with the main init action logic.
  3. You are concerned that there is insufficient logical separation between getPreliminaryOverlayDatabaseMode() and the rest of the init action (for example, with initActionState()), and that the hidden interactions will cause maintenance burden down the road.

Do I understand your main concerns correctly? Am I missing anything?

Copy link
Contributor

@henrymercer henrymercer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we simplify our decision about whether to match the CLI version to the overlay version, and therefore reduce the amount of complexity here? I think it would be OK if we matched the CLI version to the overlay version, even if once we obtained the CLI we decided later we aren't actually going to run overlay. The main thing we need to ensure is that the decision is stable and doesn't flip on and off.

In particular, it would simplify things greatly if we didn't need to compute the full code scanning configuration, and instead made a decision based on the user's requested languages and build modes, and the feature flags.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants