Commit e976c27

feat: add --file-availability option for disk available filtering (#163)
2 parents 1428b5d + e8ffe5d commit e976c27

27 files changed, 759 additions and 433 deletions

.gitignore

Lines changed: 3 additions & 0 deletions
@@ -106,3 +106,6 @@ venv.bak/
 
 # _cache folder with archived websites
 cache
+
+# Claude Code
+.claude

.release-please-manifest.json

Lines changed: 1 addition & 1 deletion
@@ -1,3 +1,3 @@
 {
-  ".": "1.0.1"
+  ".": "1.0.2"
 }

CHANGELOG.md

Lines changed: 37 additions & 0 deletions
@@ -2,6 +2,43 @@
 
 # Changelog
 
+## [1.0.2](https://github.com/cernopendata/cernopendata-client/compare/1.0.1...1.0.2) (2025-12-17)
+
+
+### Bug fixes
+
+* **download:** use record IDs for all local data paths ([#167](https://github.com/cernopendata/cernopendata-client/issues/167)) ([9b97f7c](https://github.com/cernopendata/cernopendata-client/commit/9b97f7caebf2c049f2734ec1af399c6aeb027341)), closes [#166](https://github.com/cernopendata/cernopendata-client/issues/166)
+* **verifier:** zero-pad Adler32 checksums to 8 hex characters ([#169](https://github.com/cernopendata/cernopendata-client/issues/169)) ([b6daa50](https://github.com/cernopendata/cernopendata-client/commit/b6daa508c3d190585f84853687ff0cfb8acbe792))
+
+
+### Code refactoring
+
+* **searcher:** remove Python 2 compatibility code ([#170](https://github.com/cernopendata/cernopendata-client/issues/170)) ([cc3bb8b](https://github.com/cernopendata/cernopendata-client/commit/cc3bb8b80b6cf383168e0edf93ae17fafe65ab5c))
+* **validator:** remove Python 2 compatibility code ([#170](https://github.com/cernopendata/cernopendata-client/issues/170)) ([b03de4f](https://github.com/cernopendata/cernopendata-client/commit/b03de4fb7519864a1e0f1f1380467007c5a73b17))
+
+
+### Test suite
+
+* **conftest:** add fixture for automatic directory cleanup ([#168](https://github.com/cernopendata/cernopendata-client/issues/168)) ([b847121](https://github.com/cernopendata/cernopendata-client/commit/b84712178cd143d86d66642b09e45d1ffdd9f6b0))
+* **conftest:** add shared CLI runner fixture ([#168](https://github.com/cernopendata/cernopendata-client/issues/168)) ([b387fde](https://github.com/cernopendata/cernopendata-client/commit/b387fde1569c7e81c825fe3b8601a4cbefe51fbd))
+* **downloader:** add unit tests for file filtering functions ([#170](https://github.com/cernopendata/cernopendata-client/issues/170)) ([a93d2be](https://github.com/cernopendata/cernopendata-client/commit/a93d2be9528016868a06cf0622c13f83493a834d))
+* **get-metadata:** add test for filter without output-value ([#170](https://github.com/cernopendata/cernopendata-client/issues/170)) ([7c65a79](https://github.com/cernopendata/cernopendata-client/commit/7c65a79de09b675847067db7c6f036cf0f62588c))
+* **global:** add [@pytest](https://github.com/pytest).mark.local marker for local-only tests ([#170](https://github.com/cernopendata/cernopendata-client/issues/170)) ([59fb1da](https://github.com/cernopendata/cernopendata-client/commit/59fb1da2a4ffd6160ad7cd8ce291d88c4774e807))
+* **list-directory:** add test for empty directory ([#170](https://github.com/cernopendata/cernopendata-client/issues/170)) ([22e75e5](https://github.com/cernopendata/cernopendata-client/commit/22e75e56cab423905d701a4de05295fcfd319d3d))
+* **metadater:** add tests for filter edge cases ([#170](https://github.com/cernopendata/cernopendata-client/issues/170)) ([b1855b9](https://github.com/cernopendata/cernopendata-client/commit/b1855b9e12b1c8365cdcf1c3f6e73227dedbfba6))
+* **validator:** correct typos in test assertions ([#170](https://github.com/cernopendata/cernopendata-client/issues/170)) ([07b7b00](https://github.com/cernopendata/cernopendata-client/commit/07b7b00376921288fbb3aed66fac9ce04866858c))
+
+
+### Continuous integration
+
+* **commitlint:** fix local running of commit linter on macOS ([#168](https://github.com/cernopendata/cernopendata-client/issues/168)) ([6f411b1](https://github.com/cernopendata/cernopendata-client/commit/6f411b1b4ed859b18a54365d43a0f748dbfa7c9c))
+
+
+### Documentation
+
+* **claude:** add initial Claude Code configuration ([#164](https://github.com/cernopendata/cernopendata-client/issues/164)) ([7d14b38](https://github.com/cernopendata/cernopendata-client/commit/7d14b3845b859af2f2282f565f1c040898adc7f6))
+* **claude:** expand file change instructions for all file types ([#168](https://github.com/cernopendata/cernopendata-client/issues/168)) ([5652939](https://github.com/cernopendata/cernopendata-client/commit/5652939b50c1b0dd60471961e80476dcad64ab0a))
+
 ## [1.0.1](https://github.com/cernopendata/cernopendata-client/compare/1.0.0...1.0.1) (2025-11-10)
 
 
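The verifier entry in the changelog above mentions zero-padding Adler32 checksums to 8 hex characters. The verifier code itself is not part of this diff, so the following is only a minimal sketch of the idea; the function name `adler32_hex` is hypothetical and not taken from the repository. Without padding, a checksum whose leading bytes are zero is rendered with fewer than 8 characters and will not match a reference value written in full.

```python
import zlib


def adler32_hex(data: bytes) -> str:
    """Return the Adler-32 checksum of ``data`` as 8 lowercase hex characters.

    Hypothetical helper for illustration; not the project's verifier code.
    """
    # Mask to an unsigned 32-bit value, then left-pad with zeros to width 8.
    checksum = zlib.adler32(data) & 0xFFFFFFFF
    return format(checksum, "08x")


if __name__ == "__main__":
    # Unpadded, this checksum would print as "620062" (6 characters).
    print(adler32_hex(b"a"))
```

Comparing checksums as fixed-width strings keeps the comparison independent of how many leading zeros the value happens to have.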

CLAUDE.md

Lines changed: 147 additions & 0 deletions
@@ -0,0 +1,147 @@
+# CLAUDE.md
+
+This file provides guidance for Claude Code when working with this repository.
+
+## Project Overview
+
+`cernopendata-client` is a command-line tool to download files from the CERN
+Open Data portal. It enables querying datasets and downloading/verifying
+individual data files.
+
+## Development Setup
+
+```bash
+# Using mise (recommended) - installs Python versions 3.8-3.14
+mise install
+
+# Create virtual environment and install in editable mode
+python3 -m venv env
+source env/bin/activate
+pip install -e '.[tests]'
+```
+
+## Testing and Code Quality
+
+The project uses `run-tests.sh` for all quality checks. Run all checks with:
+
+```bash
+./run-tests.sh
+```
+
+### Individual Check Commands
+
+| Command                               | Description                            |
+| ------------------------------------- | -------------------------------------- |
+| `./run-tests.sh --python-tests`       | Run pytest test suite (all tests)      |
+| `./run-tests.sh --python-tests-local` | Run pytest test suite (local only)     |
+| `./run-tests.sh --format-black`       | Check Python formatting (black)        |
+| `./run-tests.sh --lint-flake8`        | Lint Python code (flake8)              |
+| `./run-tests.sh --lint-pydocstyle`    | Check Python docstrings                |
+| `./run-tests.sh --format-prettier`    | Check Markdown/YAML formatting         |
+| `./run-tests.sh --format-shfmt`       | Check shell script formatting          |
+| `./run-tests.sh --lint-shellcheck`    | Lint shell scripts                     |
+| `./run-tests.sh --lint-markdownlint`  | Lint Markdown files                    |
+| `./run-tests.sh --lint-yamllint`      | Lint YAML files                        |
+| `./run-tests.sh --lint-jsonlint`      | Lint JSON files                        |
+| `./run-tests.sh --lint-commitlint`    | Check commit message format            |
+| `./run-tests.sh --lint-manifest`      | Check Python manifest (check-manifest) |
+| `./run-tests.sh --docs-sphinx`        | Build and test Sphinx documentation    |
+| `./run-tests.sh --docker-build`       | Build Docker image                     |
+| `./run-tests.sh --docker-tests`       | Run tests in Docker container          |
+| `./run-tests.sh --lint-hadolint`      | Lint Dockerfile                        |
+
+### Using tox for Multi-Python Testing
+
+```bash
+tox           # Run tests across all Python versions (3.8-3.14)
+tox -e py312  # Run tests for specific Python version
+```
+
+### Local vs Remote Tests
+
+Tests that run locally without network access are marked with
+`@pytest.mark.local`. Use pytest markers or run-tests.sh to run subsets:
+
+```bash
+# Run only local tests (no network access)
+./run-tests.sh --python-tests-local
+pytest -m local
+
+# Run only remote tests (connect to CERN Open Data portal)
+pytest -m "not local"
+
+# Run all tests
+./run-tests.sh --python-tests
+pytest
+```
+
+## Project Structure
+
+- `cernopendata_client/` - Main Python package
+  - `cli.py` - Command-line interface (Click-based)
+  - `downloader.py` - File download functionality
+  - `searcher.py` - Dataset search functionality
+  - `verifier.py` - File verification
+  - `validator.py` - Input validation
+  - `metadater.py` - Metadata handling
+  - `config.py` - Configuration
+  - `printer.py` - Output formatting
+  - `utils.py` - Utility functions
+  - `walker.py` - Directory traversal
+- `tests/` - Test suite
+- `docs/` - Sphinx documentation
+
+## Code Style
+
+- Python formatting: black
+- Python linting: flake8
+- Docstrings: pydocstyle (ignores D413, D301)
+- Commit messages: conventional commits (commitlint)
+- Shell scripts: shellcheck + shfmt
+- Markdown/YAML/JSON: prettier + markdownlint + yamllint + jsonlint
+
+## When Making Changes
+
+When modifying files in this repository:
+
+1. **Update copyright years**: If a file has a copyright header, add the current
+   year to the list if not already present (e.g.
+   `Copyright (C) 2019, 2020, 2021, 2023, 2025 CERN.`)
+
+2. **Python files** (`.py`): Run `./run-tests.sh --format-black` to check
+   formatting, and `./run-tests.sh --lint-flake8` and
+   `./run-tests.sh --lint-pydocstyle` to check linting
+
+3. **Shell scripts** (`.sh`): Run `./run-tests.sh --format-shfmt` and
+   `./run-tests.sh --lint-shellcheck` to check formatting and linting
+
+4. **Markdown files** (`.md`): Run `./run-tests.sh --format-prettier` and
+   `./run-tests.sh --lint-markdownlint` to check formatting and linting
+
+5. **YAML files** (`.yml`, `.yaml`): Run `./run-tests.sh --format-prettier` and
+   `./run-tests.sh --lint-yamllint` to check formatting and linting
+
+6. **JSON files** (`.json`): Run `./run-tests.sh --format-prettier` and
+   `./run-tests.sh --lint-jsonlint` to check formatting and linting
+
+7. **Dockerfile**: Run `./run-tests.sh --lint-hadolint` to check linting, and
+   `./run-tests.sh --docker-build` to verify the Docker image builds correctly
+
+8. **Documentation** (`docs/`): Run `./run-tests.sh --docs-sphinx` to verify
+   Sphinx documentation builds correctly
+
+9. **Python package structure** (`setup.py`, `MANIFEST.in`): Run
+   `./run-tests.sh --lint-manifest` to verify the package manifest is correct
+
+10. **Commit messages**: Run `./run-tests.sh --lint-commitlint` to verify commit
+    messages follow conventional commits format
+
+11. **Tests** (`tests/`): Run `./run-tests.sh --python-tests-local` for local
+    tests without network access, or `./run-tests.sh --python-tests` for all
+    tests including those that connect to the CERN Open Data portal
+
+## Key Notes
+
+- License: GPLv3
+- Python support: 3.8 - 3.14
+- Main branch: `master`
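The copyright-year rule in CLAUDE.md ("add the current year to the list if not already present") can be illustrated with a short sketch. The helper `add_copyright_year` below is hypothetical and not part of the repository; it assumes headers of the exact form `Copyright (C) <years> CERN.` and simply appends the year at the end of the list.

```python
import re

# Matches headers such as "# Copyright (C) 2019, 2020 CERN."
# (assumed header format, based on the headers visible in this commit).
_COPYRIGHT = re.compile(r"(Copyright \(C\) )(\d{4}(?:, \d{4})*)( CERN\.)")


def add_copyright_year(line: str, year: int) -> str:
    """Append ``year`` to the year list in a copyright line if it is missing."""
    match = _COPYRIGHT.search(line)
    if match is None:
        return line  # not a copyright header; leave untouched
    years = match.group(2).split(", ")
    if str(year) in years:
        return line  # year already listed; nothing to do
    years.append(str(year))
    return line[: match.start(2)] + ", ".join(years) + line[match.end(2) :]


if __name__ == "__main__":
    print(add_copyright_year("# Copyright (C) 2019, 2020, 2021, 2023 CERN.", 2025))
```

Applying the helper twice with the same year leaves the line unchanged, which matches the "if not already present" wording.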

Dockerfile

Lines changed: 2 additions & 2 deletions
@@ -1,6 +1,6 @@
 # This file is part of cernopendata-client.
 #
-# Copyright (C) 2020, 2022, 2023, 2024 CERN.
+# Copyright (C) 2020, 2022, 2023, 2024, 2025 CERN.
 #
 # cernopendata-client is free software; you can redistribute it and/or modify
 # it under the terms of the GPLv3 license; see LICENSE file for more details.
@@ -77,5 +77,5 @@ LABEL org.opencontainers.image.title="cernopendata-client"
 LABEL org.opencontainers.image.url="https://github.com/cernopendata/cernopendata-client"
 LABEL org.opencontainers.image.vendor="cernopendata"
 # x-release-please-start-version
-LABEL org.opencontainers.image.version="1.0.1"
+LABEL org.opencontainers.image.version="1.0.2"
 # x-release-please-end

cernopendata_client/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@
 #
 # This file is part of cernopendata-client.
 #
-# Copyright (C) 2019 CERN.
+# Copyright (C) 2019, 2020, 2025 CERN.
 #
 # cernopendata-client is free software; you can redistribute it and/or modify
 # it under the terms of the GPLv3 license; see LICENSE file for more details.

cernopendata_client/cli.py

Lines changed: 29 additions & 22 deletions
@@ -1,7 +1,7 @@
 # -*- coding: utf-8 -*-
 # This file is part of cernopendata-client.
 #
-# Copyright (C) 2019, 2020, 2021, 2023 CERN.
+# Copyright (C) 2019, 2020, 2021, 2023, 2025 CERN.
 #
 # cernopendata-client is free software; you can redistribute it and/or modify
 # it under the terms of the GPLv3 license; see LICENSE file for more details.
@@ -70,9 +70,9 @@ def version():
 
 
 @cernopendata_client.command()
-@click.option("--recid", type=click.INT, help="Record ID")
-@click.option("--doi", help="Digital Object Identifier")
-@click.option("--title", help="Title of the record")
+@click.option("--recid", type=click.INT, help="Record ID (exact match)")
+@click.option("--doi", help="Digital Object Identifier (exact match)")
+@click.option("--title", help="Record title (exact match, no wildcards)")
 @click.option(
     "--output-value",
     is_flag=False,
@@ -147,9 +147,9 @@ def get_metadata(server, recid, doi, title, output_value, filters):
 
 
 @cernopendata_client.command()
-@click.option("--recid", type=click.INT, help="Record ID")
-@click.option("--doi", help="Digital Object Identifier")
-@click.option("--title", help="Record title")
+@click.option("--recid", type=click.INT, help="Record ID (exact match)")
+@click.option("--doi", help="Digital Object Identifier (exact match)")
+@click.option("--title", help="Record title (exact match, no wildcards)")
 @click.option(
     "--protocol",
     default="http",
@@ -230,9 +230,9 @@ def _validate_and_load(server, recid, doi, title, retry_limit, retry_sleep):
 
 
 @cernopendata_client.command()
-@click.option("--recid", type=click.INT, help="Record ID")
-@click.option("--doi", help="Digital Object Identifier")
-@click.option("--title", help="Record title")
+@click.option("--recid", type=click.INT, help="Record ID (exact match)")
+@click.option("--doi", help="Digital Object Identifier (exact match)")
+@click.option("--title", help="Record title (exact match, no wildcards)")
 @click.option(
     "--protocol",
     default="http",
@@ -347,6 +347,7 @@ def download_files(
     record_json = _validate_and_load(
         server, recid, doi, title, retry_limit, retry_sleep
     )
+    record_recid = record_json["metadata"]["recid"]
     file_locations_info = get_files_list(server, record_json, protocol, expand)
     if expand:
         if not file_availability and any(f[3] != "online" for f in file_locations_info):
@@ -402,7 +403,7 @@ def download_files(
         sys.exit(0)
 
     total_files = len(download_file_locations)
-    path = str(recid)
+    path = record_recid
     if not os.path.isdir(path):
         try:
             os.mkdir(path)
@@ -439,11 +440,11 @@ def download_files(
         if verify:
             file_info_remote = get_file_info_remote(
                 server,
-                recid,
+                record_recid,
                 protocol=protocol,
                 filtered_files=[file_location],
             )
-            file_info_local = get_file_info_local(recid)
+            file_info_local = get_file_info_local(record_recid)
             verify_file_info(file_info_local, file_info_remote)
     display_message(
         msg_type="info",
@@ -452,19 +453,21 @@ def download_files(
 
 
 @cernopendata_client.command()
-@click.option("--recid", type=click.INT, help="Record ID")
+@click.option("--recid", type=click.INT, help="Record ID (exact match)")
+@click.option("--doi", help="Digital Object Identifier (exact match)")
+@click.option("--title", help="Record title (exact match, no wildcards)")
 @click.option(
     "--server",
     default=SERVER_HTTP_URI,
     type=click.STRING,
     help="Which CERN Open Data server to query? [default={}]".format(SERVER_HTTP_URI),
 )
-def verify_files(server, recid):
+def verify_files(server, recid, doi, title):
     """Verify downloaded data file integrity.
 
-    Select a CERN Open Data bibliographic record by a record ID and
-    verify integrity of downloaded data files belonging to this
-    record.
+    Select a CERN Open Data bibliographic record by a record ID, a
+    DOI, or a title and verify integrity of downloaded data files
+    belonging to this record.
 
     Examples: \n
     \t $ cernopendata-client verify-files --recid 5500
@@ -474,24 +477,28 @@ def verify_files(server, recid):
     if recid is not None:
         validate_recid(recid)
 
+    # Get record metadata and resolve recid from DOI/title if needed
+    record_json = get_record_as_json(server, recid, doi, title)
+    record_recid = record_json["metadata"]["recid"]
+
     # Get remote file information
-    file_info_remote = get_file_info_remote(server, recid)
+    file_info_remote = get_file_info_remote(server, record_recid)
 
     # Get local file information
-    file_info_local = get_file_info_local(recid)
+    file_info_local = get_file_info_local(record_recid)
     if not file_info_local:
         display_message(
             msg_type="error",
             msg="No local files found for record {}. Perhaps run `download-files` first? Exiting.".format(
-                recid
+                record_recid
             ),
         )
         sys.exit(1)
 
     # Verify number of files
     display_message(
         msg_type="info",
-        msg="Verifying number of files for record {}... ".format(record_recid),
+        msg="Verifying number of files for record {}... ".format(record_recid),
     )
     display_message(
         msg_type="note",
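The cli.py changes above replace the `--recid` argument with the record ID read from the fetched record metadata when naming the local download directory. A plausible reading, not stated in the diff itself, is that a record selected with `--doi` or `--title` leaves `recid` as `None`, so `str(recid)` would name the directory `"None"`. The sketch below illustrates the difference; `local_data_path` and the sample record are hypothetical, and only the `["metadata"]["recid"]` lookup comes from the diff.

```python
def local_data_path(record_json: dict) -> str:
    """Return the local directory name for a record.

    Hypothetical helper: it reads the ID from the resolved record metadata
    instead of relying on the optional --recid command-line argument.
    """
    # str() guards against the ID being stored as an integer (an assumption;
    # the diff uses the metadata value directly).
    return str(record_json["metadata"]["recid"])


if __name__ == "__main__":
    # Illustrative record, as if it had been looked up by DOI or title.
    record = {"metadata": {"recid": "5500"}}
    cli_recid = None  # --recid was not given on the command line

    print(str(cli_recid))           # old behaviour: directory named "None"
    print(local_data_path(record))  # new behaviour: directory named "5500"
```

The same resolved ID is then passed to the local and remote file-info lookups, so download and verification agree on the directory regardless of how the record was selected.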

cernopendata_client/metadater.py

Lines changed: 1 addition & 1 deletion
@@ -1,7 +1,7 @@
 # -*- coding: utf-8 -*-
 # This file is part of cernopendata-client.
 #
-# Copyright (C) 2020 CERN.
+# Copyright (C) 2023, 2025 CERN.
 #
 # cernopendata-client is free software; you can redistribute it and/or modify
 # it under the terms of the GPLv3 license; see LICENSE file for more details.
