-
Notifications
You must be signed in to change notification settings - Fork 402
Compute preliminary overlay database mode #3141
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR updates the init
action to compute a preliminary overlay database mode before the CodeQL CLI becomes available. This computation will be used in a future PR.
Key changes:
- Creates a new JSON file for overlay language aliases mapping
- Extracts the inputs object creation to happen earlier in the workflow
- Adds a new
getPreliminaryOverlayDatabaseMode
function to determine overlay database mode without CodeQL CLI - Updates
isOverlayAnalysisFeatureEnabled
to work without CodeQL dependency
Reviewed Changes
Copilot reviewed 16 out of 16 changed files in this pull request and generated 3 comments.
Show a summary per file
File | Description |
---|---|
src/overlay-language-aliases.json | New mapping file for language aliases used in overlay analysis |
src/init.ts | Updates initConfig to accept CodeQL parameter |
src/init-action.ts | Restructures input creation and adds preliminary overlay mode computation |
src/feature-flags.ts | Removes CodeQL dependency for overlay analysis feature minimum version |
src/config-utils.ts | Adds functions for computing overlay mode and loading config without CodeQL |
src/config-utils.test.ts | Updates tests to accommodate new function signatures |
lib/* | Generated JavaScript code reflecting TypeScript changes |
0e88ef0
to
0d55271
Compare
getOverlayDatabaseMode() already performs the same version check, so we can remove minimumVersion from Feature.OverlayAnalysis. Doing so allows the action to perform feature checks without CodeQL CLI.
This commit changes isOverlayAnalysisFeatureEnabled() so that it uses the overlay-language-aliases.json file to resolve language aliases instead of relying on the CodeQL CLI.
This commit makes getOverlayDatabaseMode() accept undefined as arguments for codeql and languages.
This commit extracts into amendInputConfigFile() the code that processes configInput, and moves the call from initConfig() into init-action.ts.
0d55271
to
c4d96be
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for having a go at this as an alternative to #3116. As discussed elsewhere, I think that having just the hard-coded alias mappings (in overlay-language-aliases.json
here) is a better approach than having all of the CLI output and maintaining that with an extra workflow.
I have had a look over the details of the implementation here (but I have not yet looked at #3158). For the changes here, I have some fairly significant concerns about the duplication of the logic producing the UserConfig
(i.e. the database configuration) in both getPreliminaryOverlayDatabaseMode
and initConfig
-- see my comments for details there.
This is also a fairly big change generally and I would class this as high risk given the concerns I discuss in the comments and that I don't have much confidence that we have enough existing overall test coverage to understand what this may or may not break.
I would propose one of the following:
- Depending on the extent to which the
UserConfig
forgetOverlayDatabaseMode
must align with the one computed ininitConfig
, perhapsgetPreliminaryOverlayDatabaseMode
could be modified so that what it does has no effect oninitConfig
andinitConfig
continues to work largely as is. - Placing all of the changes behind a FF, but this is likely just going to make the logic even more complex and could introduce more issues than it might prevent if the above isn't possible; or
- Making smaller changes incrementally, perhaps starting by refactoring some existing functions to set things up for the changes here in ways that don't change any behaviour.
logger: Logger, | ||
): Promise<string[]> { | ||
// Obtain languages without filtering them. | ||
const { rawLanguages } = await getRawLanguages( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
getRawLanguages
makes an API call as part of calling getRawLanguagesInRepo
(if there's no languages
input). Since getRawLanguages
is currently only called at most once, the result of that API call is not cached. With this case, it could get called more than once. It might make sense to cache the response since it shouldn't change between calls.
* CodeQL CLI. It is intended to be used for overlay analysis preparations | ||
* before the CodeQL CLI is available. | ||
*/ | ||
async function getUnverifiedLanguagesForOverlay( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: It would be nice if this could live in overlay-database-utils.ts
since it is specific to overlay databases. I imagine it's here instead because of the call to getRawLanguages
. Do you think you could make rawLanguages
a parameter of getUnverifiedLanguagesForOverlay
and move it to overlay-database-utils.ts
?
); | ||
const languageAliases = overlayLanguageAliases as Record<string, string>; | ||
|
||
const languagesSet: string[] = []; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: languagesSet
suggests that this is a Set
object, but it is actually an array. Also the name is misleading, because I don't think that getRawLanguages
attempts any de-duplication, so in theory we could have multiple languages multiples times here if the languages
input contains them more than once.
* This function should be called only once on any specific `InitConfigInputs` | ||
* object. Otherwise it could emit a false warning. | ||
*/ | ||
export function amendInputConfigFile( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This name is misleading because it suggests that we are somehow changing an existing configuration file, while we are actually just determining the one we wish to use (and writing the one provided as input to disk, if needed).
* This function should be called only once on any specific `InitConfigInputs` | ||
* object. Otherwise it could emit a false warning. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to enforce this, e.g. with some state capturing whether this has been called already for the given inputs
or in the types.
`Both a config file and config input were provided. Ignoring config file.`, | ||
); | ||
} | ||
inputs.configFile = userConfigFromActionPath(inputs.tempDir); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: I don't love that this relies on mutating inputs
and that the return type of this function is void
. Perhaps you could refactor this function so that it returns the appropriate value for inputs.configFile
(i.e. either the initial inputs.configFile
or the result of userConfigFromActionPath(inputs.tempDir)
) and then update inputs.configFile
to the result at the call site.
* @returns An object containing the overlay database mode and whether the | ||
* action should perform overlay-base database caching. | ||
*/ | ||
export async function getPreliminaryOverlayDatabaseMode( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am concerned about the amount of logic duplication that is taking place here (particularly the involvement of calculateAugmentation
and generateCodeScanningConfig
). I understand that these are required for getOverlayDatabaseMode
, but we risk turning this into a big source of errors if there is inconsistency in what this function determines vs what initActionState
determines.
For example, this function does not account for analysis-kinds
being only code-quality
, in which case all query customisation should be disabled. Notably, that has different semantics for repositoryProperties
(where additional configuration is an info
-level log message) and for inputs/config file values (where it is a ConfigurationError
).
I assume that it is important that the computedConfig
here mirrors the one in initConfig
as accurately as possible, but correct me if I am wrong and this particular aspect doesn't matter.
If that does matter and viewing this issue in isolation, we would ideally have one function which figures computedConfig
out given the necessary arguments for calculateAugmentation
and generateCodeScanningConfig
taking analysis-kinds
into account, and then use that in both this function and initConfig
.
const qualityQueriesInput = getOptionalInput("quality-queries"); | ||
|
||
if (qualityQueriesInput !== undefined) { | ||
if (inputs.qualityQueriesInput !== undefined) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor: is there any particular reason for removing this constant, since it's still used in two places as before?
const userConfig = await loadUserConfig( | ||
inputs.configFile, | ||
inputs.workspacePath, | ||
inputs.apiDetails, | ||
tempDir, | ||
logger, | ||
); | ||
const config = await initActionState(inputs, userConfig, codeql); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The interaction between this and what happens in getPreliminaryOverlayDatabaseMode
is not obvious and possibly problematic. Specifically, getPreliminaryOverlayDatabaseMode
suggests that it is related to only overlay databases, but it actually writes the configuration file that loadUserConfig
ends up loading here. That configuration file already contains a CLI configuration that is a combination of a config-file
or config
input, with other Action inputs such as queries
and packs
as well as repository property values.
initActionState
then performs essentially duplicate work calling calculateAugmentation
(although I understand that it takes the languages
as input and therefore could lead to different results than for the first call to it) which then gets fed to a second call to generateCodeScanningConfig
. This could lead to some odd behaviours where assumptions that we make in generateCodeScanningConfig
are invalidated because userConfig
already contains merged results of various inputs from the first call to generateCodeScanningConfig
that took place in getPreliminaryOverlayDatabaseMode
. So we might end up with duplicate entires in the final configuration file, duplicate or confusing log messages, etc.
languages: Language[] | undefined, | ||
languagesInput: string | undefined, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am worried that having both (possibly disagreeing) values here could be a source of confusion or errors.
Hi @mbg, Thank you for your detailed comments. Before we go into the details, can you confirm that I understand your main concerns correctly?
Do I understand your main concerns correctly? Am I missing anything? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we simplify our decision about whether to match the CLI version to the overlay version, and therefore reduce the amount of complexity here? I think it would be OK if we matched the CLI version to the overlay version, even if once we obtained the CLI we decided later we aren't actually going to run overlay. The main thing we need to ensure is that the decision is stable and doesn't flip on and off.
In particular, it would simplify things greatly if we didn't need to compute the full code scanning configuration, and instead made a decision based on the user's requested languages and build modes, and the feature flags.
This PR updates the
init
action to compute a preliminary overlay database mode before the CodeQL CLI becomes available. The preliminary overlay database mode will be used in a future PR.Risk assessment
For internal use only. Please select the risk level of this change:
Merge / deployment checklist