Commit 3219e33

Merge branch 'main' into 541bdependabot/github_actions/production-dependencies-1cc54e3a68
2 parents 42b03da + 6fc479b commit 3219e33

2 files changed: 185 additions, 24 deletions

Lines changed: 23 additions & 0 deletions
@@ -0,0 +1,23 @@
1+
---
2+
applyTo: "**"
3+
---
4+
5+
This is a Python project for generating CodeQL Summaries from CodeQL databases using Models as Data (MaD).
6+
7+
## Guidelines
8+
9+
- Please ensure you have Python 3.10 or higher installed.
10+
- Use `pipenv` for managing dependencies and virtual environments.
11+
- Use `unittest` for testing.
12+
- Follow PEP 8 style guidelines for Python code.
13+
- Use meaningful commit messages and follow semantic versioning for releases.
14+
- Document your code and provide examples where necessary.
15+
- Add type hints for better code clarity and maintainability.
16+
17+
## Testing
18+
19+
Testing uses `unittest` and can be run with:
20+
21+
```bash
22+
pipenv run test
23+
```
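
A minimal sketch of what a `unittest` test with type hints might look like under these guidelines. The `split_languages` helper is hypothetical and exists only so the test has something to exercise; it is not taken from the project.

```python
import unittest
from typing import List


def split_languages(value: str) -> List[str]:
    """Hypothetical helper: split a comma-separated language list."""
    return [part.strip().lower() for part in value.split(",") if part.strip()]


class SplitLanguagesTest(unittest.TestCase):
    def test_splits_and_normalizes(self) -> None:
        self.assertEqual(split_languages("Java, csharp"), ["java", "csharp"])

    def test_ignores_empty_entries(self) -> None:
        self.assertEqual(split_languages("java,,"), ["java"])
```

Saved as a `test_*.py` module, a case like this is picked up by standard `unittest` discovery.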

README.md

Lines changed: 162 additions & 24 deletions
@@ -1,53 +1,191 @@
1-
# CodeQL Summarize
1+
<!-- markdownlint-disable -->
2+
<div align="center">
23

3-
This is the GitHub CodeQL Summarize project and Actions which allows users to generate Models as Data (MaD) from CodeQL databases.
4+
<h1>CodeQL Summarize</h1>
45

5-
## Run
6+
:warning: <strong>Early project – not an official GitHub / CodeQL product</strong> :warning:
67

7-
### Actions
8+
[![GitHub](https://img.shields.io/badge/github-%23121011.svg?style=for-the-badge&logo=github&logoColor=white)](https://github.com/advanced-security/codeql-summarize)
9+
[![GitHub Actions](https://img.shields.io/github/actions/workflow/status/advanced-security/codeql-summarize/publish.yml?style=for-the-badge)](https://github.com/advanced-security/codeql-summarize/actions/workflows/publish.yml?query=branch%3Amain)
10+
[![GitHub Issues](https://img.shields.io/github/issues/advanced-security/codeql-summarize?style=for-the-badge)](https://github.com/advanced-security/codeql-summarize/issues)
11+
[![GitHub Stars](https://img.shields.io/github/stars/advanced-security/codeql-summarize?style=for-the-badge)](https://github.com/advanced-security/codeql-summarize)
12+
[![License](https://img.shields.io/github/license/advanced-security/codeql-summarize?style=for-the-badge)](./LICENSE)
813

9-
The main use case for `codeqlsummarize` is to run it as an Action so the purposes of automating this process.
14+
</div>
15+
<!-- markdownlint-restore -->
16+
17+
Generate CodeQL Models-as-Data (MaD) definitions (sources, sinks, and summaries) from existing CodeQL databases and export them in formats suitable for:
18+
19+
- Data extensions (YAML) for CodeQL packs
20+
- Customization libraries (`.qll`)
21+
- Bundled packs containing generated customizations
22+
- Raw JSON for further processing
23+
24+
## Key Features
25+
26+
- Automated download of CodeQL databases via the Code Scanning API (when a token is provided)
27+
- Multiple export formats: `json`, `extensions`, `customizations`, `bundle`
28+
- GitHub Action + GH CLI extension + direct CLI usage
29+
- Automatic language detection from database metadata (fallback to manual selection)
30+
- Caching support (skip with `--disable-cache`)
31+
- Currently supported languages: `java`, `csharp`
32+
33+
## Supported Languages
34+
35+
Currently limited to the languages enforced in the code (`CODEQL_LANGUAGES`):
36+
37+
- Java
38+
- C#
39+
40+
> Requests / PRs to add more languages are welcome once the upstream model generator queries support them.
41+
42+
## Quick Start
43+
44+
### 1. As a GitHub Action (recommended for automation)
1045

1146
```yml
1247
- name: Generate CodeQL Summaries
13-
  uses: advanced-security/codeql-summarize@v1
48+
  uses: advanced-security/codeql-summarize@v0.2.0
1449
  with:
15-
    # This file defines the projects you want to make sure to get the latest and greatest
16-
    # summaries from.
1750
    projects: ./projects.json
18-
    # Token needs access to download the CodeQL databases you want to create summaries for
1951
    token: ${{ secrets.CODEQL_SUMMARY_GENERATOR_TOKEN }}
52+
    format: extensions
53+
    output: ./generated
2054
```
2155
22-
### GH CLI
23-
24-
You can install this tool as part of the GitHub CLI using the following commands:
56+
### 2. GitHub CLI Extension
2557
2658
```bash
27-
gh extensions install advanced-security/gh-codeql-summarize
59+
gh extension install advanced-security/gh-codeql-summarize
2860
gh codeql-summarize --help
2961
```
3062

31-
### Manual Command Line
63+
Example:
3264

3365
```bash
34-
git clone https://github.com/advanced-security/gh-codeql-summarize.git && cd gh-codeql-summarize
35-
python3 -m codeqlsummarize --help
66+
gh codeql-summarize \
67+
--format bundle \
68+
--input examples/projects.json \
69+
--output ./examples
3670
```
3771

38-
## License
72+
### 3. Manual / Local CLI
73+
74+
```bash
75+
git clone https://github.com/advanced-security/codeql-summarize.git
76+
cd codeql-summarize
77+
pipenv install --dev # or pip install -e . if a setup is added later
78+
pipenv run python -m codeqlsummarize --help
79+
```
80+
81+
Minimal invocation (using a local database + explicit language):
82+
83+
```bash
84+
python -m codeqlsummarize \
85+
-db /path/to/codeql-db \
86+
-l java \
87+
-f json \
88+
-o ./out
89+
```
90+
91+
## Action Inputs
92+
93+
| Input | Description | Default |
94+
| ------------ | --------------------------------------------------------------- | -------------------------- |
95+
| `project` | Single repository (owner/name) to summarize | (none) |
96+
| `projects` | Path to a JSON file mapping language to list of repositories | `./projects.json` |
97+
| `language` | Comma-separated language list (overrides auto-detect) | (auto) |
98+
| `format` | Export format: `json`, `extensions`, `customizations`, `bundle` | `extensions` |
99+
| `output` | Output directory (or file for certain formats) | `./` |
100+
| `repository` | GitHub repository context (fallback for `project`) | `${{ github.repository }}` |
101+
| `token` | GitHub token used to download databases | `${{ github.token }}` |
102+
103+
Notes:
104+
105+
- To download CodeQL databases the token must have appropriate permissions (typically `security_events:read` / `repo` depending on visibility). A fine‑grained PAT with Code scanning read access is recommended.
106+
- If a database cannot be downloaded it will be skipped.
107+
108+
## Project File Schema (`projects.json`)
109+
110+
Example (`examples/projects.json`):
111+
112+
```json
113+
{
114+
"java": ["ESAPI/esapi-java-legacy"]
115+
}
116+
```
117+
118+
Structure: `<language>` → array of `<owner>/<repo>` strings.
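
The structure described above can be checked before a run. The following is a minimal sketch, not the project's own loader: `parse_projects` and `SUPPORTED_LANGUAGES` are hypothetical names, and the language set simply mirrors the two languages this README lists.

```python
import json
from typing import Dict, List

# Assumption: mirrors the languages the README lists as supported.
SUPPORTED_LANGUAGES = {"java", "csharp"}


def parse_projects(text: str) -> Dict[str, List[str]]:
    """Parse a projects.json document and validate its shape (hypothetical helper)."""
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("top level must map each language to a list of repositories")
    for language, repos in data.items():
        if language not in SUPPORTED_LANGUAGES:
            raise ValueError(f"unsupported language: {language}")
        if not isinstance(repos, list):
            raise ValueError(f"{language}: expected a list of repositories")
        for repo in repos:
            # Each entry must be exactly "<owner>/<repo>".
            owner, sep, name = repo.partition("/") if isinstance(repo, str) else ("", "", "")
            if not (owner and sep and name) or "/" in name:
                raise ValueError(f"{language}: expected '<owner>/<repo>', got {repo!r}")
    return data


projects = parse_projects('{"java": ["ESAPI/esapi-java-legacy"]}')
```

Failing early on a malformed entry avoids a run that silently skips every repository.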
119+
120+
## Export Formats
121+
122+
| Format | Description | Output Shape |
123+
| ---------------- | ----------------------------------------------------------------------- | --------------------------------------------------------- |
124+
| `json` | Raw rows per model type | One JSON file per database / summary (future enhancement) |
125+
| `extensions` | Data extensions YAML under a CodeQL pack structure | Writes `.yml` under `generated/` inside the detected pack |
126+
| `customizations` | Single `.qll` customization library aggregating models | Requires `-o <file>.qll` |
127+
| `bundle` | Initializes / updates a CodeQL pack containing generated customizations | Creates / updates pack in output dir |
128+
129+
`bundle` will (if necessary) create a pack (e.g. `java-summarize/`) and generate per‑repository `.qll` files plus a `Customizations.qll` aggregator.
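
The output expectations in the table can be validated up front. This is a sketch under stated assumptions: `check_output` is a hypothetical helper, and it enforces only the one requirement the table states explicitly (a `.qll` file for `customizations`).

```python
from pathlib import Path

# The four format names listed in the table above.
FORMATS = ("json", "extensions", "customizations", "bundle")


def check_output(fmt: str, output: str) -> Path:
    """Validate a format/output pair before exporting (hypothetical helper)."""
    if fmt not in FORMATS:
        raise ValueError(f"unknown format {fmt!r}; expected one of {', '.join(FORMATS)}")
    path = Path(output)
    # Only `customizations` is documented as needing a file rather than a directory.
    if fmt == "customizations" and path.suffix != ".qll":
        raise ValueError("customizations expects an output file ending in .qll")
    return path
```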
130+
131+
## Environment Variables
132+
133+
| Variable | Purpose |
134+
| ------------------- | ---------------------------------------- |
135+
| `GITHUB_TOKEN` | Default token for API calls (Actions) |
136+
| `GITHUB_REPOSITORY` | Default repo context (owner/name) |
137+
| `RUNNER_TEMP` | Temp directory root (Actions) |
138+
| `DEBUG` | If set (non-empty) enables debug logging |
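
A minimal sketch of how these variables could be resolved into settings. `Settings` and `load_settings` are hypothetical names, and the fallback to the system temp directory when `RUNNER_TEMP` is unset is an assumption, not documented behavior.

```python
import os
import tempfile
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class Settings:
    token: Optional[str]
    repository: Optional[str]
    temp_dir: str
    debug: bool


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Resolve settings from the variables in the table (hypothetical helper)."""
    return Settings(
        token=env.get("GITHUB_TOKEN"),
        repository=env.get("GITHUB_REPOSITORY"),
        # Assumption: outside Actions, fall back to the system temp directory.
        temp_dir=env.get("RUNNER_TEMP") or tempfile.gettempdir(),
        # Any non-empty value enables debug logging, as the table states.
        debug=bool(env.get("DEBUG")),
    )
```

Passing the environment as a mapping keeps the function easy to exercise in tests without modifying `os.environ`.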
139+
140+
## Exit / Error Behavior
141+
142+
The tool skips repositories whose databases cannot be fetched or located, logging warnings rather than stopping the entire run.
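
The skip-and-warn behavior can be sketched as a loop that logs and continues. This is an illustration only: `collect_databases` and the `fetch` callback are hypothetical, and treating `OSError` or a `None` result as "not available" is an assumption about how failures surface.

```python
import logging
from typing import Callable, Iterable, List, Optional

logger = logging.getLogger("codeqlsummarize.sketch")


def collect_databases(
    repositories: Iterable[str],
    fetch: Callable[[str], Optional[str]],
) -> List[str]:
    """Return database paths for the repositories that could be fetched."""
    found: List[str] = []
    for repo in repositories:
        try:
            path = fetch(repo)
        except OSError as err:
            # A failed download is logged and skipped, not fatal.
            logger.warning("skipping %s: %s", repo, err)
            continue
        if path is None:
            logger.warning("skipping %s: no database found", repo)
            continue
        found.append(path)
    return found
```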
143+
144+
## Typical Workflow (Action + Extensions Format)
145+
146+
1. Maintain a `projects.json` file listing target repositories per language.
147+
2. Schedule a workflow (e.g. nightly) to regenerate models.
148+
3. Commit or publish the generated Data Extensions / Pack as needed.
149+
4. Consume generated models in downstream CodeQL analysis.
150+
151+
## Development
152+
153+
Run tests:
39154

40-
This project is licensed under the terms of the MIT open source license. Please refer to [MIT](./LICENSE.txt) for the full terms.
155+
```bash
156+
pipenv run python -m unittest -v
157+
```
158+
159+
Lint / format:
160+
161+
```bash
162+
pipenv run black .
163+
```
164+
165+
## Contributing
41166

42-
## Maintainers
167+
See [CONTRIBUTING.md](./CONTRIBUTING.md). Please open an issue before large changes.
43168

44-
[CODEOWNERS](./.github/CODEOWNERS) file.
169+
## Security / Reporting Issues
170+
171+
See [SECURITY.md](./SECURITY.md).
45172

46173
## Support
47174

48-
Please create issues for any feature requests, bugs, or documentation problems.
175+
See [SUPPORT.md](./SUPPORT.md). For general questions open a GitHub issue.
176+
177+
## Limitations / Roadmap
178+
179+
- Limited language set (Java, C#)
180+
- No throttling handling for parallel downloads yet
181+
- No fallback to GitHub's own language detection is implemented
182+
- The JSON exporter is minimal (subject to enhancement)
183+
184+
## License
185+
186+
Licensed under the MIT License – see [LICENSE](./LICENSE).
49187

50-
## Acknowledgement
188+
## Acknowledgements
51189

52-
- @GeekMasher - Author
53-
- @zbazztian - Major contributor
190+
- @GeekMasher Author
191+
- @zbazztian Major contributor
