
Commit c40273f

Add RsMetacheck and the proper implementation of the bot (#3)
* add gitignore
* feat: add RsMetacheck as dependency and pin python version on 3.12 (3.13 incompatible with some dependency)
* update metacheck version
* feat: create main cli with metacheck wrapper
* feat: add the actual bot implementation to create issue
* add example pitfalls to run the bot
* small update using utils function
* feat: add cli as entrypoint
* fix: small changes in the report content
* doc: updated README and QUICKSTART
* feat: remove gitlab loading
* feat: updated bot constant message
1 parent 9c3bad2 commit c40273f

22 files changed, +2618 -278 lines changed

.gitignore

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
__pycache__/

.env

.python-version

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-3.13
+3.11

QUICKSTART.md

Lines changed: 111 additions & 0 deletions
@@ -0,0 +1,111 @@
# Quickstart

Self-contained steps to install, configure, and run `sw-metadata-bot`.

## What the bot does
- Reads pitfalls JSON-LD files produced by RSMetaCheck
- Generates an issue body (pitfalls, warnings, suggestions)
- Creates one issue per repository on GitHub or GitLab (cloud or self-hosted)
- Supports dry-run mode so you can review before posting

## Prerequisites
- Python 3.11 or 3.12
- GitHub or GitLab personal access token with permission to create issues
- RSMetaCheck analysis output (pitfalls `*.jsonld` files)
- Optional: `uv` (recommended) https://docs.astral.sh/uv

## Install
With `uv` (recommended):
```bash
uv add git+https://github.com/codemetasoft/sw-metadata-bot.git
```

With `pip`:
```bash
pip install git+https://github.com/codemetasoft/sw-metadata-bot.git
```

## Configure authentication
Export your tokens (only set what you need):
```bash
export GITHUB_API_TOKEN=ghp_xxxxxxxxxxxx # GitHub
export GITLAB_API_TOKEN=glpat_xxxxxxxxxxxx # GitLab (cloud or self-hosted)
```

A convenient one-liner to load a `.env` file:
```bash
set -a; source .env; set +a
```
Example `.env`:
```
GITHUB_API_TOKEN=ghp_xxxxxxxxxxxx
GITLAB_API_TOKEN=glpat_xxxxxxxxxxxx
```
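If you prefer to stay in Python rather than `source` the `.env` file in the shell, a minimal sketch like the one below can read the same `KEY=VALUE` format and copy the two documented token variables into the process environment. The helper names `load_env_file` and `export_tokens` are hypothetical and not part of `sw-metadata-bot`. The parser deliberately ignores quoting and variable expansion, so a dedicated dotenv library is the better choice for anything more complex.

```python
import os
from pathlib import Path

# Token variables documented above; set only the ones your target platform needs.
TOKEN_VARS = ("GITHUB_API_TOKEN", "GITLAB_API_TOKEN")


def load_env_file(path: Path) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and '#' comments.

    Deliberately minimal: no quoting, no escapes, no variable expansion.
    """
    values: dict[str, str] = {}
    for raw_line in path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def export_tokens(values: dict[str, str]) -> list[str]:
    """Copy the known, non-empty token variables into os.environ; return their names."""
    exported = []
    for name in TOKEN_VARS:
        if values.get(name):
            os.environ[name] = values[name]
            exported.append(name)
    return exported


if __name__ == "__main__":
    env_path = Path(".env")
    if env_path.exists():
        found = export_tokens(load_env_file(env_path))
        print("Exported:", ", ".join(found) or "none")
    else:
        print("No .env file found")
```

A Python process cannot change its parent shell's environment, so the exported tokens are visible only to this process and to child processes it starts (for example, a `subprocess` call to `uv run sw-metadata-bot ...`). For interactive use, the shell one-liner is the more direct option.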

## Produce analysis data (if you don't have it yet)
Use the bundled metacheck wrapper to create pitfall outputs:
```bash
uv run sw-metadata-bot metacheck \
  --input https://github.com/owner/repo \
  --pitfalls-output pitfalls_outputs \
  --analysis-output analysis_results.json
```
This produces `pitfalls_outputs/*.jsonld`, which the bot consumes.
You can also provide, as input, a JSON file listing multiple repositories you want to analyse (see `assets/example_list_repo.json`).
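A repository list with the same shape as `assets/example_list_repo.json` (a single `"repositories"` array of URLs) can be generated by a short script. The sketch below assumes that shape is all the `metacheck` command expects. `write_repo_list` is a hypothetical helper, not part of the package.

```python
import json
from pathlib import Path
from urllib.parse import urlparse


def write_repo_list(urls: list[str], path: Path) -> list[str]:
    """Write repository URLs under a "repositories" key and return them.

    Rejects anything that is not an http(s) URL with a host, strips
    surrounding whitespace and trailing slashes, and drops duplicates
    while keeping the original order.
    """
    cleaned: list[str] = []
    for url in urls:
        candidate = url.strip().rstrip("/")
        parsed = urlparse(candidate)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"Not a repository URL: {url!r}")
        if candidate not in cleaned:
            cleaned.append(candidate)
    payload = {"repositories": cleaned}
    path.write_text(json.dumps(payload, indent=2) + "\n", encoding="utf-8")
    return cleaned
```

For example, `write_repo_list(["https://github.com/owner/repo", "https://gitlab.com/group/project"], Path("my_repos.json"))` writes a two-entry file that can then be supplied as input instead of a single repository URL.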

## Create issues
Always start with a dry run:
```bash
uv run sw-metadata-bot create-issues \
  --pitfalls-output-dir ./pitfalls_outputs \
  --issues-dir ./issues_out \
  --dry-run
```

Post real issues (remove `--dry-run`):
```bash
uv run sw-metadata-bot create-issues \
  --pitfalls-output-dir ./pitfalls_outputs \
  --issues-dir ./issues_out
```

Key options:
- `--pitfalls-output-dir`: Directory containing `*.jsonld` analysis files
- `--issues-dir`: Where to store generated issue bodies and reports
- `--dry-run`: Generate content without posting
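When scripting the bot, one way to make "dry run first" the default is to build the `create-issues` invocation in a single place. The sketch below uses only the flags documented above. The function names are hypothetical, and `run_create_issues` assumes `uv` and `sw-metadata-bot` are installed in the current environment.

```python
import subprocess
from pathlib import Path


def create_issues_command(pitfalls_dir: Path, issues_dir: Path, dry_run: bool = True) -> list[str]:
    """Assemble the documented `create-issues` invocation as an argv list.

    Dry run is the default, so posting real issues has to be requested explicitly.
    """
    cmd = [
        "uv", "run", "sw-metadata-bot", "create-issues",
        "--pitfalls-output-dir", str(pitfalls_dir),
        "--issues-dir", str(issues_dir),
    ]
    if dry_run:
        cmd.append("--dry-run")
    return cmd


def run_create_issues(pitfalls_dir: Path, issues_dir: Path, dry_run: bool = True) -> int:
    """Run the command and return its exit code (requires uv and the bot to be installed)."""
    completed = subprocess.run(create_issues_command(pitfalls_dir, issues_dir, dry_run), check=False)
    return completed.returncode
```

With this design, passing `dry_run=False` is a deliberate, visible change in the calling code rather than a flag that is easy to forget to add.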

## Minimal examples (Python)
Detect the platform and create an issue (dry run):
```python
from pathlib import Path
from sw_metadata_bot import pitfalls, github_api, create_issues

# Load pitfalls data
data = pitfalls.load_pitfalls(Path("pitfalls_outputs/repo.jsonld"))
repo_url = pitfalls.get_repository_url(data)

# Detect platform
platform_type = create_issues.detect_platform(repo_url)
print(f"Platform: {platform_type}")

# Generate issue content
report = pitfalls.format_report(repo_url, data)
body = pitfalls.create_issue_body(report)

# Create issue (dry-run mode); this uses the GitHub client, so repo_url should point to GitHub
github = github_api.GitHubAPI(dry_run=True)
issue_url = github.create_issue(repo_url, "Metadata Quality Report", body)
print(f"Issue URL: {issue_url}")
```

## Troubleshooting
- **Auth failed / 401**: Check that `GITHUB_API_TOKEN` / `GITLAB_API_TOKEN` are exported and valid.
- **403 / 404 on issue creation**: You need write/triage permissions on the repository. Test with repositories you own first.
- **Platform not supported**: The repository must be hosted on GitHub or GitLab (self-hosted GitLab is auto-detected).
- **No pitfalls found**: Ensure `--pitfalls-output-dir` points to the directory containing the metacheck JSON-LD outputs.
- **Review before posting**: Always run with `--dry-run` first and inspect the files in `--issues-dir`.

## Supported platforms
- GitHub.com
- GitLab.com
- Self-hosted GitLab instances

README.md

Lines changed: 60 additions & 27 deletions
@@ -1,47 +1,80 @@
 # sw-metadata-bot
 
-A repository to keep the code of the RSMetaCheck bot for pushing issues with existing repository metadata
+An automated bot that analyzes repository metadata quality and creates issues with improvement suggestions.
 
-## Description
+Part of the [CodeMetaSoft](https://w3id.org/codemetasoft) project to improve research software metadata quality.
 
-### Goal of the bot
+---
 
-Contact maintainers and developpers about our pitfalls and warnings analysis about metadata.
-We want to start a discussion about the actual state of the quality of the given repository (possibly against others or standards).
-Ideally, we would like to point out what should be modified to fix the actual pitfalls and warnings detected.
+## 📋 What This Bot Does
 
-### Current approach
+If you received an issue from this bot, it means your repository's metadata was automatically analyzed and some improvements were detected.
 
-Based on the RS Metacheck package analysis, this bot creates an issue in the repository host provider to present:
+The issue contains:
+- **Pitfalls**: Critical metadata issues that should be fixed
+- **Warnings**: Metadata improvements that are recommended
+- **Suggestions**: Specific recommendations on how to fix each issue
 
-- the detected pitfalls and warnings
-- suggestions to fix these pitfalls and warnings.
+### Example Issues You Might See
 
-### Current features
+- Missing or incomplete `LICENSE` file
+- No `CITATION.cff` file (for software citation)
+- Incomplete or missing `README` sections
+- Missing repository metadata (topics, description, etc.)
+- Outdated dependencies
+- Missing software documentation
 
-The bot is able to create issues on the repository hosted on gitub.com
-In the future, we will add gitlab.com and self-hosted gitlab instances support.
+---
 
-### What is out of the scope of this project
+## 💬 How to Respond
 
-This repository is not actually doing the analysis of the metadata quality. It is using the analysis provided by the RsMetacheck package.
+### If You Agree with the Suggestions
 
-## Install (temporary)
+Fix the identified issues and **close the issue** with a comment explaining what you fixed. Your improvements help your software become more discoverable and citable!
 
-(will be publish on PyPy when released)
+### If You Disagree or Have Questions
 
-Use uv to install the package directly from github repo.
+Feel free to **comment on the issue**. We're happy to discuss the suggestions and help clarify what's needed.
 
-```bash
-uv add git+https://github.com/codemetasoft/sw-metadata-bot.git
-```
+### If You're Not Interested
 
-Or from local path if cloned already,
+Simply comment **"unsubscribe"** on the issue and we'll remove your repository from future analysis.
 
-```bash
-uv add --editable <path>/sw-metadata-bot
-```
+---
 
-## Usage
+## 🔍 What Analysis Is Used
 
-To be completed.
+This bot uses [RSMetaCheck](https://github.com/SoftwareUnderstanding/RsMetaCheck), which analyzes:
+- Software metadata completeness
+- Citation and documentation quality
+- Repository structure and best practices
+
+The bot **does not**:
+- Modify your code or files
+- Make pull requests
+- Have access to your repository secrets
+
+---
+
+## 📚 Learn More
+
+- [CodeMetaSoft Project](https://w3id.org/codemetasoft) - About the initiative
+- [RSMetaCheck](https://github.com/SoftwareUnderstanding/RsMetaCheck) - The analysis tool
+- [Citation File Format](https://citation-file-format.github.io/) - How to add CITATION.cff
+
+---
+
+## 🛠️ For Maintainers Running This Bot
+
+See [QUICKSTART.md](QUICKSTART.md) for setup and usage instructions.
+
+Supported platforms:
+- ✅ GitHub.com
+- ✅ GitLab.com
+- ✅ Self-hosted GitLab instances
+
+---
+
+## 📝 License
+
+See [LICENSE](LICENSE) file.

assets/example_list_repo.json

Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
{
  "repositories": [
    "https://gitlab.com/example/example_repo_1",
    "https://gitlab.com/example/example_repo_2",
    "https://github.com/example/example_repo_3"
  ]
}

Lines changed: 99 additions & 0 deletions
@@ -0,0 +1,99 @@
{
  "@context": "https://w3id.org/example/metacheck/0.1.0/",
  "@type": "SoftwareQualityAssessment",
  "name": "Quality Assessment for cds-astro/cds-moc-rust",
  "description": "MOC library in Rust; used in MOCPy, a CLI, a WASM lib, ...",
  "creator": {
    "@type": "schema:Person",
    "name": "Anonymous",
    "email": "example@email.com"
  },
  "dateCreated": "2025-12-17T15:42:57Z",
  "license": [
    "@id: https://opensource.org/license/mit"
  ],
  "assessedSoftware": {
    "@type": "schema:SoftwareApplication",
    "name": "cds-astro/cds-moc-rust",
    "softwareVersion": "v0.11.0",
    "url": "https://github.com/cds-astro/cds-moc-rust"
  },
  "checks": [
    {
      "@type": "CheckResult",
      "assessesIndicator": {
        "@id": "https://w3id.org/example/metacheck/i/indicators/metadatafile"
      },
      "checkingSoftware": {
        "@type": "schema:SoftwareApplication",
        "name": "metacheck",
        "@id": "https://w3id.org/example/metacheck/tools/",
        "softwareVersion": "0.1.0"
      },
      "process": "The metadata file (codemeta or other) has a version which does not correspond to the version used in the latest release",
      "status": {
        "@id": "schema:CompletedActionStatus"
      },
      "checkId": "P001",
      "evidence": "P001 detected: codemeta.json version '0.6.0' does not match release version '0.11.0'",
      "suggestion": "Ensure the version in your metadata matches the latest official release. Keeping these synchronized avoids confusion for users and improves reproducibility."
    },
    {
      "@type": "CheckResult",
      "assessesIndicator": {
        "@id": "https://w3id.org/example/metacheck/i/indicators/license"
      },
      "checkingSoftware": {
        "@type": "schema:SoftwareApplication",
        "name": "metacheck",
        "@id": "https://w3id.org/example/metacheck/tools/",
        "softwareVersion": "0.1.0"
      },
      "process": "codemeta.json version does not match the package's",
      "status": {
        "@id": "schema:CompletedActionStatus"
      },
      "checkId": "P017",
      "evidence": "P017 detected: LICENSE file only contains copyright information without actual license terms",
      "suggestion": "You need to synchronize all version references across metadata and build configuration files."
    },
    {
      "@type": "CheckResult",
      "assessesIndicator": {
        "@id": "https://w3id.org/example/metacheck/i/indicators/metadatafile"
      },
      "checkingSoftware": {
        "@type": "schema:SoftwareApplication",
        "name": "metacheck",
        "@id": "https://w3id.org/example/metacheck/tools/",
        "softwareVersion": "0.1.0"
      },
      "process": "codemeta.json dateModified is outdated compared to the actual repository last update date",
      "status": {
        "@id": "schema:CompletedActionStatus"
      },
      "checkId": "W002",
      "evidence": "W002 detected: Issue detected in repo_list_output_1.json",
      "suggestion": "You need to align the version in your metadata file with your latest release tag. Automating this synchronization as part of your release process is highly recommended."
    },
    {
      "@type": "CheckResult",
      "assessesIndicator": {
        "@id": "https://w3id.org/example/metacheck/i/indicators/codemeta"
      },
      "checkingSoftware": {
        "@type": "schema:SoftwareApplication",
        "name": "metacheck",
        "@id": "https://w3id.org/example/metacheck/tools/",
        "softwareVersion": "0.1.0"
      },
      "process": "Programming languages in codemeta.json do not have versions",
      "status": {
        "@id": "schema:CompletedActionStatus"
      },
      "checkId": "W004",
      "evidence": "W004 detected: dateModified in codemeta.json is outdated compared to actual repository last update",
      "suggestion": "Include version numbers for each programming language used. Defining these helps ensure reproducibility and compatibility across systems."
    }
  ]
}
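To inspect a pitfalls file like the one above before running the bot, you can read it as plain JSON. The sketch below groups check results by the first letter of `checkId`. It assumes, based only on the IDs in this example, that `P` marks a pitfall and `W` a warning. Any other ID is collected under `other`. The helper names are hypothetical, and no JSON-LD processing (such as context expansion) is performed.

```python
import json
import sys
from collections import defaultdict
from pathlib import Path

# Assumed convention, inferred from the example IDs (P001, P017, W002, W004).
KIND_BY_PREFIX = {"P": "pitfalls", "W": "warnings"}


def load_assessment(path: Path) -> dict:
    """Read one pitfalls JSON-LD file as plain JSON (no JSON-LD expansion)."""
    return json.loads(path.read_text(encoding="utf-8"))


def summarize_checks(assessment: dict) -> dict[str, list[str]]:
    """Group check IDs into pitfalls, warnings, and other by their first letter."""
    summary: defaultdict[str, list[str]] = defaultdict(list)
    for check in assessment.get("checks", []):
        check_id = check.get("checkId", "")
        summary[KIND_BY_PREFIX.get(check_id[:1], "other")].append(check_id)
    return dict(summary)


if __name__ == "__main__":
    # Usage: python summarize.py pitfalls_outputs/*.jsonld
    for name in sys.argv[1:]:
        data = load_assessment(Path(name))
        url = data.get("assessedSoftware", {}).get("url", "<unknown>")
        print(url, summarize_checks(data))
```

For the example file above, `summarize_checks` returns `{"pitfalls": ["P001", "P017"], "warnings": ["W002", "W004"]}`.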
