Generate CodeQL Models-as-Data (MaD) models (sources, sinks, and summaries) from existing CodeQL databases and export them in multiple formats suitable for:
- Data extensions (YAML) for CodeQL packs
- Customization libraries (.qll)
- Bundled packs containing generated customizations
- Raw JSON for further processing
- Automated download of CodeQL databases via the Code Scanning API (when a token is provided)
- Multiple export formats: json, extensions, customizations, bundle
- GitHub Action + GH CLI extension + direct CLI usage
- Automatic language detection from database metadata (fallback to manual selection)
- Caching support (skip with --disable-cache)
- Supports (current): java, csharp
Currently limited to the languages enforced in the code (CODEQL_LANGUAGES):
- Java
- C#
Requests / PRs to add more languages are welcome once the upstream model generator queries support them.
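Language detection with a manual override, restricted to the supported languages, can be sketched as follows. This is an illustration rather than the tool's own code: the function name `detect_language` and the tuple `SUPPORTED_LANGUAGES` are hypothetical, and the sketch assumes the database directory contains a `codeql-database.yml` file with a top-level `primaryLanguage` entry.

```python
from pathlib import Path
from typing import Optional

# Hypothetical stand-in for the tool's CODEQL_LANGUAGES setting.
SUPPORTED_LANGUAGES = ("java", "csharp")


def detect_language(database: Path, manual: Optional[str] = None) -> str:
    """Return the database language, preferring an explicit manual choice."""
    language = manual
    metadata = database / "codeql-database.yml"
    if language is None and metadata.is_file():
        # Assumption: the metadata holds a line such as `primaryLanguage: java`.
        for line in metadata.read_text(encoding="utf-8").splitlines():
            key, _, value = line.partition(":")
            if key.strip() == "primaryLanguage":
                language = value.strip().strip("\"'")
                break
    if language is None:
        raise ValueError(f"no language detected for {database}; pass one explicitly")
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language: {language}")
    return language
```

A plain line scan is used here only to keep the sketch free of third-party YAML dependencies.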
- name: Generate CodeQL Summaries
  uses: advanced-security/codeql-summarize@<version>
  with:
    projects: ./projects.json
    token: ${{ secrets.CODEQL_SUMMARY_GENERATOR_TOKEN }}
    format: extensions
    output: ./generated
gh extension install advanced-security/gh-codeql-summarize
gh codeql-summarize --help
Example:
gh codeql-summarize \
  --format bundle \
  --input examples/projects.json \
  --output ./examples
git clone https://github.com/advanced-security/codeql-summarize.git
cd codeql-summarize
pipenv install --dev # or pip install -e . if a setup is added later
pipenv run python -m codeqlsummarize --help
Minimal invocation (using a local database + explicit language):
python -m codeqlsummarize \
  -db /path/to/codeql-db \
  -l java \
  -f json \
  -o ./out
Input | Description | Default |
---|---|---|
project | Single repository (owner/name) to summarize | (none) |
projects | Path to a JSON file mapping language to list of repositories | ./projects.json |
language | Comma-separated language list (overrides auto-detect) | (auto) |
format | Export format: json, extensions, customizations, bundle | extensions |
output | Output directory (or file for certain formats) | ./ |
repository | GitHub repository context (fallback for project) | ${{ github.repository }} |
token | GitHub token used to download databases | ${{ github.token }} |
Notes:
- To download CodeQL databases the token must have appropriate permissions (typically security_events:read / repo depending on visibility). A fine‑grained PAT with Code scanning read access is recommended.
- If a database cannot be downloaded it will be skipped.
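For orientation, the request behind a database download can be sketched with the standard library. This is not taken from the tool's source: the helper name `build_database_request` is hypothetical, and the sketch assumes the public GitHub REST endpoint for fetching a repository's CodeQL database, which returns the archive when `application/zip` is requested.

```python
import urllib.request

API_ROOT = "https://api.github.com"


def build_database_request(repository: str, language: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a request for a repository's CodeQL database archive."""
    url = f"{API_ROOT}/repos/{repository}/code-scanning/codeql/databases/{language}"
    headers = {
        # Requesting application/zip asks for the archive rather than JSON metadata.
        "Accept": "application/zip",
        "Authorization": f"Bearer {token}",
    }
    return urllib.request.Request(url, headers=headers)
```

Passing the result to `urllib.request.urlopen` would perform the download; an HTTP error at that point (for example, missing permissions) is the kind of failure that leads to a repository being skipped.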
Example (examples/projects.json):
{
  "java": ["ESAPI/esapi-java-legacy"]
}
Structure: <language> → array of <owner>/<repo> strings.
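The structure above is simple enough to check before a run. The following is a minimal sketch of such a check, assuming nothing beyond the language-to-repositories shape described here; `load_projects` is a hypothetical helper, not part of the tool.

```python
import json
from pathlib import Path
from typing import Dict, List


def load_projects(path: Path) -> Dict[str, List[str]]:
    """Load a projects file and check it maps languages to owner/repo strings."""
    data = json.loads(path.read_text(encoding="utf-8"))
    if not isinstance(data, dict):
        raise ValueError("projects file must be a JSON object keyed by language")
    for language, repositories in data.items():
        if not isinstance(repositories, list):
            raise ValueError(f"{language}: expected a list of repositories")
        for repository in repositories:
            # A valid entry has exactly one slash with text on both sides.
            parts = repository.split("/") if isinstance(repository, str) else []
            if len(parts) != 2 or not all(parts):
                raise ValueError(f"{language}: {repository!r} is not in owner/repo form")
    return data
```

Failing early on a malformed entry avoids spending time on downloads that could never succeed.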
Format | Description | Output Shape |
---|---|---|
json | Raw rows per model type | One JSON file per database / summary (future enhancement) |
extensions | Data extensions YAML under a CodeQL pack structure | Writes .yml under generated/ inside the detected pack |
customizations | Single .qll customization library aggregating models | Requires -o <file>.qll |
bundle | Initializes / updates a CodeQL pack containing generated customizations | Creates / updates pack in output dir |
bundle will (if necessary) create a pack (e.g. java-summarize/) and generate per‑repository .qll files plus a Customizations.qll aggregator.
Variable | Purpose |
---|---|
GITHUB_TOKEN | Default token for API calls (Actions) |
GITHUB_REPOSITORY | Default repo context (owner/name) |
RUNNER_TEMP | Temp directory root (Actions) |
DEBUG | If set (non-empty) enables debug logging |
The tool skips repositories whose databases cannot be fetched or located, logging warnings rather than stopping the entire run.
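The skip-and-warn behaviour can be pictured with a short sketch. The names `collect_databases` and `fetch` are hypothetical, and the tool's actual control flow may differ; the point is only that one failed repository does not abort the rest.

```python
import logging
from typing import Callable, Iterable, List, Optional

logger = logging.getLogger("codeqlsummarize.sketch")


def collect_databases(
    repositories: Iterable[str],
    fetch: Callable[[str], Optional[str]],
) -> List[str]:
    """Return database paths for the repositories that could be fetched."""
    databases = []
    for repository in repositories:
        try:
            path = fetch(repository)
        except OSError as error:  # covers network and filesystem failures
            logger.warning("Skipping %s: %s", repository, error)
            continue
        if path is None:
            logger.warning("Skipping %s: no database found", repository)
            continue
        databases.append(path)
    return databases
```

Here `fetch` stands in for whatever downloads or locates a database and returns its path, or `None` when nothing is available.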
- Maintain a projects.json file listing target repositories per language.
- Schedule a workflow (e.g. nightly) to regenerate models.
- Commit or publish the generated Data Extensions / Pack as needed.
- Consume generated models in downstream CodeQL analysis.
Run tests:
pipenv run python -m unittest -v
Lint / format:
pipenv run black .
See CONTRIBUTING.md. Please open an issue before large changes.
See SECURITY.md.
See SUPPORT.md. For general questions open a GitHub issue.
- Limited language set (Java, C#)
- No handling of throttling for parallel downloads yet
- No direct GitHub language detection fallback implemented
- JSON exporter minimal (subject to enhancement)
Licensed under the MIT License – see LICENSE.
- @GeekMasher – Author
- @zbazztian – Major contributor