
Commit 4bb5f6f

yarikoptic and claude committed
docs: comprehensive CLAUDE.md with architecture, conventions, and checklist
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
1 parent 8b01529 commit 4bb5f6f


2 files changed (140 additions, 0 deletions)


.gitignore

Lines changed: 2 additions & 0 deletions
@@ -14,3 +14,5 @@ sandbox/
 venv/
 venvs/
 dandischema/_version.py
+uv.lock
+.cache/

CLAUDE.md

Lines changed: 138 additions & 0 deletions
@@ -0,0 +1,138 @@
# CLAUDE.md

This file provides guidance to Claude Code when working with code in this
repository.

## Project Overview

dandischema defines the Pydantic v2 metadata models for the DANDI
neurophysiology data archive. It is used by both the dandi-cli client and the
dandi-archive server. Key concerns: model definitions, JSON Schema generation,
metadata validation, schema migration between versions, and asset metadata
aggregation.

## Build/Test Commands

```bash
tox -e py3                # Run full test suite (preferred)
pytest dandischema/       # Run tests directly in active venv
pytest dandischema/tests/test_metadata.py -v -k "test_name"  # Single test
tox -e lint               # codespell + flake8
tox -e typing             # mypy (strict, with pydantic plugin)
```

- `filterwarnings = error` is active: new warnings will fail tests.
- Coverage is collected by default (`--cov=dandischema`).
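
The effect of `filterwarnings = error` can be illustrated with the standard library alone: under an `"error"` filter, `warnings.warn` raises the warning as an exception instead of printing it. In this sketch, `legacy_helper` is a hypothetical code path. pytest installs an equivalent filter from its configuration, so you do not set it up yourself in tests.

```python
import warnings


def legacy_helper() -> int:
    # Hypothetical code path that still emits a deprecation warning.
    warnings.warn("legacy_helper is deprecated", DeprecationWarning, stacklevel=2)
    return 1


def fails_under_error_filter() -> bool:
    """Return True if legacy_helper raises once warnings are turned into errors."""
    with warnings.catch_warnings():
        warnings.simplefilter("error")  # mirrors `filterwarnings = error`
        try:
            legacy_helper()
        except DeprecationWarning:
            return True
    return False
```

With the filter active, the warning surfaces as an exception. This is why a newly introduced deprecation fails the suite rather than just adding noise to the output.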

## Code Style

- **Formatter**: Black (no explicit line-length override → default 88)
- **Import sorting**: isort with `profile = "black"`, `force_sort_within_sections`,
  `reverse_relative`
- **Linting**: flake8 (max-line-length=100, ignores E203/W503)
- **Type checking**: mypy in strict mode (`no_implicit_optional`, `warn_return_any`,
  `warn_unreachable`), with the pydantic plugin enabled
- **Pre-commit hooks**: trailing-whitespace, end-of-file-fixer, check-yaml,
  check-added-large-files, black, isort, codespell, flake8
- Put imports at the top of the file; avoid function-level imports unless there is a
  concrete reason (circular deps, heavy transitive imports).

## Architecture

### Key Modules

| File | Role |
|------|------|
| `models.py` | All Pydantic models (~2000 lines). Class hierarchy rooted at `DandiBaseModel`. |
| `metadata.py` | `validate()`, `migrate()`, `aggregate_assets_summary()`. |
| `consts.py` | `DANDI_SCHEMA_VERSION`, `ALLOWED_INPUT_SCHEMAS`, `ALLOWED_TARGET_SCHEMAS`. |
| `conf.py` | Instance configuration via env vars (`DANDI_INSTANCE_NAME`, etc.). |
| `types.py` | Custom Pydantic types (`ByteSizeJsonSchema`). |
| `utils.py` | JSON schema helpers, `version2tuple()`, `name2title()`. |
| `exceptions.py` | `ValidationError`, `JsonschemaValidationError`, `PydanticValidationError`. |
| `digests/` | `DandiETag` multipart-upload checksum calculation. |
| `datacite/` | DataCite DOI metadata conversion. |
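
The following is a minimal sketch of the multipart-upload checksum idea behind `DandiETag`. It assumes `DandiETag` follows the S3 multipart ETag convention: the MD5 of the concatenated per-part MD5 digests, suffixed with `-<number of parts>`.

The fixed `part_size` argument is a simplification, because `DandiETag` chooses part sizes itself. The names `split_parts` and `multipart_etag` are illustrative and are not the `digests/` API.

```python
import hashlib
from collections.abc import Iterable


def split_parts(data: bytes, part_size: int) -> list[bytes]:
    """Cut `data` into consecutive parts of `part_size` bytes (the last may be shorter)."""
    return [data[i : i + part_size] for i in range(0, len(data), part_size)]


def multipart_etag(parts: Iterable[bytes]) -> str:
    """MD5 over the concatenated per-part MD5 digests, plus '-<part count>'."""
    digests = [hashlib.md5(part).digest() for part in parts]
    combined = hashlib.md5(b"".join(digests)).hexdigest()
    return f"{combined}-{len(digests)}"
```

For example, `multipart_etag(split_parts(b"x" * 25, 10))` hashes three parts of 10, 10, and 5 bytes, so the result is a 32-hex-digit digest ending in `-3`. Because the checksum depends on the part boundaries, client and server must agree on how a file is split.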

### Model Hierarchy (simplified)

```
DandiBaseModel
├── PropertyValue            # recursive (self-referencing)
├── BaseType
│   ├── StandardsType        # name, identifier, version, extensions (recursive)
│   ├── ApproachType, AssayType, SampleType, Anatomy, ...
│   └── MeasurementTechniqueType
├── Person, Organization     # Contributor subclasses
├── BioSample                # recursive (wasDerivedFrom)
├── AssetsSummary            # aggregated stats
└── CommonModel
    ├── Dandiset → PublishedDandiset
    └── BareAsset → Asset → PublishedAsset
```

Several models are **self-referencing** (PropertyValue, BioSample,
StandardsType). These require `model_rebuild()` after the class definition.

### Data Flow: Asset Metadata Aggregation

1. dandi-cli calls `asset.get_metadata()` → populates `BareAsset`, including the
   per-asset `dataStandard` list
2. Asset metadata is serialized via `model_dump(mode="json", exclude_none=True)`
3. The server calls `aggregate_assets_summary(assets)` → `_add_asset_to_stats()`
   per asset → `AssetsSummary`
4. `_add_asset_to_stats()` collects: numberOfBytes, numberOfFiles, approach,
   measurementTechnique, variableMeasured, species, subjects, dataStandard
5. `dataStandard` has deprecated path/encoding heuristic fallbacks for old
   clients (remove after 2026-12-01)
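
The aggregation steps above can be sketched with the standard library, operating on the JSON dicts produced in step 2. This is a simplified illustration, not dandischema's actual `_add_asset_to_stats()`. The following are assumptions:

- The function names `add_asset_to_stats` and `aggregate`.
- Reading sizes from a `contentSize` key.
- Using a path-suffix-only `.nwb` fallback.
- Reducing the NWB standard to its identifier.

The real code also tracks the other fields listed in step 4 and produces an `AssetsSummary`.

```python
from typing import Any

# Hypothetical stand-in for `nwb_standard`, reduced to its identifier only.
NWB_STANDARD: dict[str, Any] = {"identifier": "RRID:SCR_015242"}


def add_asset_to_stats(asset: dict[str, Any], stats: dict[str, Any]) -> None:
    """Fold one serialized asset into running stats (illustrative subset of fields)."""
    stats["numberOfBytes"] = stats.get("numberOfBytes", 0) + asset.get("contentSize", 0)
    stats["numberOfFiles"] = stats.get("numberOfFiles", 0) + 1
    standards = asset.get("dataStandard")
    if not standards and asset.get("path", "").endswith(".nwb"):
        # Heuristic fallback for old clients that do not send dataStandard.
        standards = [NWB_STANDARD]
    seen = stats.setdefault("dataStandard", [])
    for std in standards or []:
        # Deduplicate standards by identifier across assets.
        if all(s["identifier"] != std["identifier"] for s in seen):
            seen.append(std)


def aggregate(assets: list[dict[str, Any]]) -> dict[str, Any]:
    stats: dict[str, Any] = {}
    for asset in assets:
        add_asset_to_stats(asset, stats)
    return stats
```

Consider two `.nwb` assets of 100 and 50 bytes, one of which lacks `dataStandard`. Aggregating them yields 150 bytes and 2 files, with a single NWB entry in `dataStandard`, because the fallback and the explicit entry share an identifier.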

### Pre-instantiated Standard Constants

```python
nwb_standard       # RRID:SCR_015242
bids_standard      # RRID:SCR_016124
ome_ngff_standard  # DOI:10.25504/FAIRsharing.9af712
hed_standard       # RRID:SCR_014074
```

These are plain dicts (produced by `model_dump(mode="json", exclude_none=True)`)
used by both dandischema (heuristic fallbacks) and dandi-cli (per-asset population).

### Vendorization

The schema supports deployment for different DANDI instances. Environment
variables (`DANDI_INSTANCE_NAME`, `DANDI_INSTANCE_IDENTIFIER`,
`DANDI_DOI_PREFIX`, etc.) must be set **before** importing
`dandischema.models`, because they are read at import time to adjust identifier
patterns, DOI prefixes, license enums, and URL patterns. CI tests multiple
vendored configurations.

## Schema Change Checklist

When adding or removing fields from any model (BareAsset, Dandiset,
AssetsSummary, etc.):

1. **Update `_FIELDS_INTRODUCED` in `metadata.py:migrate()`** if adding a new
   **top-level field to Dandiset metadata**. `migrate()` only processes
   Dandiset-level dicts (not Asset metadata). Fields on BareAsset or nested
   inside existing structures (e.g. new fields on StandardsType) do not need
   entries here.

2. **Update `consts.py`** if bumping `DANDI_SCHEMA_VERSION` or adding to
   `ALLOWED_INPUT_SCHEMAS`.

3. **Add tests** covering migration/aggregation with the new field.

4. **Coordinate with dandi-cli**: new fields that dandi-cli populates need
   backward-compat guards there (check `"field" in Model.model_fields`) until
   the minimum dandischema dependency is bumped.
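
The guard in item 4 can be sketched as dandi-cli-side code might write it. The classes `FakeBareAsset` and `FakeNewerBareAsset` are hypothetical stand-ins. They mimic only Pydantic v2's class-level `model_fields` mapping, so the sketch runs without Pydantic. The helper `build_asset_kwargs` and the field name `newField` are likewise illustrative.

```python
from typing import Any


class FakeBareAsset:
    """Hypothetical stand-in: Pydantic v2 models expose `model_fields` as a
    class-level mapping from field name to field info."""

    model_fields: dict[str, Any] = {"path": None, "contentSize": None}


class FakeNewerBareAsset(FakeBareAsset):
    # Simulates a newer dandischema release that defines the new field.
    model_fields = {**FakeBareAsset.model_fields, "newField": None}


def build_asset_kwargs(model_cls: Any, path: str, new_value: Any) -> dict[str, Any]:
    kwargs: dict[str, Any] = {"path": path}
    # Only pass the new field when the installed dandischema defines it;
    # older releases do not know about it.
    if "newField" in model_cls.model_fields:
        kwargs["newField"] = new_value
    return kwargs
```

Against `FakeBareAsset`, the helper returns only `{"path": ...}`. Against `FakeNewerBareAsset`, it also includes `newField`. This is why the guard can be dropped once the minimum dandischema dependency guarantees the field.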

## Testing Notes

- Tests use `filterwarnings = error`: any new deprecation warning will fail the run.
- The `clear_dandischema_modules_and_set_env_vars` fixture (conftest.py)
  supports testing vendored configurations by clearing cached modules and
  setting env vars.
- Network-dependent tests are skipped when `DANDI_TESTS_NONETWORK` is set.
- DataCite tests require `DATACITE_DEV_LOGIN` / `DATACITE_DEV_PASSWORD`.
- `test_models.py:test_duplicate_classes` checks for duplicate field qnames
  across models; allowed duplicates are listed explicitly.
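
The module-clearing approach described for vendored configurations can be sketched with the standard library. The sketch first updates the environment, then evicts the package and its submodules from `sys.modules`, and finally imports again so module-level configuration is re-read. `reimport_with_env` is a hypothetical helper, not the conftest.py fixture itself. Unlike a pytest fixture, it does not restore the previous environment or modules afterwards.

```python
import importlib
import os
import sys
from types import ModuleType
from typing import Optional


def reimport_with_env(
    env: dict[str, str], package: str, module: Optional[str] = None
) -> ModuleType:
    """Set env vars, drop cached modules under `package`, then import `module` afresh."""
    os.environ.update(env)
    for name in list(sys.modules):
        if name == package or name.startswith(package + "."):
            del sys.modules[name]
    # Re-import; module-level code (e.g. reading env vars) runs again.
    return importlib.import_module(module or package)
```

For a vendored run, the call would look like `reimport_with_env({"DANDI_INSTANCE_NAME": "<name>"}, "dandischema", "dandischema.models")`, with `<name>` left as a placeholder. Evicting the whole `dandischema` package, not just `dandischema.models`, matters because configuration modules such as `conf.py` would otherwise stay cached.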
