
Commit c4e4e8f

docs: write up current structure for defining codelists for the specification
1 parent 2f0ec89 commit c4e4e8f


1 file changed

+91
-0
lines changed


docs/codelists.md

Lines changed: 91 additions & 0 deletions

# Codelists

A codelist is a controlled list of valid values for a particular field.

Each codelist definition file describes what the list is for, where the source data comes from, and the structure of the codes.

Defining codelists in this way helps standardise terminology, improve validation, and make it easier to integrate systems.

## Why

We get several benefits from defining codelists, including:

* **Consistency and reuse**
  The same values are used everywhere, avoiding subtle variations.
* **Easier validation**
  Fields referencing a codelist can be checked automatically against a known set of codes.
* **Easier to maintain**
  There is one place to update the list and its metadata, rather than chasing copies across multiple specifications.
* **Clear provenance**
  Source and licensing information are explicit, so consumers know where the data came from and the terms on which they can reuse it.
* **Declarative, not procedural**
  Codelists are defined as structured data, so they are format-neutral and can be processed by different tools and languages.

## Decisions

**Each codelist has a single canonical definition**
The definition lives in this shared planning application specification repository.

**The data can be defined in the repo or elsewhere**
Some codelists are specific to this specification, so the CSV (or other format) file containing the codes will be kept in this repository. Other codelists have wider applicability, so their data will be held elsewhere, where it is available for wider use.

**Attributes of codelist definitions**

* `codelist`: short, stable identifier (lowercase kebab-case)
* `name`: singular display name
* `plural`: plural display name
* `description`: purpose and scope
* `organisation`: identifier for the owning organisation
* `licence`: licence for reuse (e.g. `ogl3`)
* `source`: URL of the authoritative source data (CSV or API)
* `fields`: list of the columns in the codelist data file, each given as a `field` entry
* `key-field`: column containing the unique identifier for each code
* `entry-date`: date this codelist definition was first added
* `end-date`: date this codelist definition was withdrawn (if applicable)
* `notes`: any extra context or implementation guidance
* `github-discussion`: link to, or ID of, the relevant GitHub discussion thread
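
The attribute list maps naturally onto a typed record. The Python sketch below is one possible way to load a definition, assuming its front matter has already been parsed into a dictionary (for example by a YAML library). `CodelistDefinition`, `from_mapping` and the kebab-case pattern are hypothetical names chosen for illustration, not part of the specification. The sketch also turns the `fields` entries, written as `- field: <column>`, into plain column names.

```python
import re
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Assumed reading of "lowercase kebab-case": lowercase letters/digits joined by hyphens.
KEBAB_CASE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def _as_date(value) -> Optional[date]:
    """Accept a date, an ISO 'YYYY-MM-DD' string, or an empty value."""
    if value is None or value == "":
        return None
    if isinstance(value, date):
        return value
    return date.fromisoformat(str(value))


@dataclass
class CodelistDefinition:
    """Metadata for one codelist; the codes themselves live in the source data."""

    codelist: str
    name: str
    plural: str
    description: str
    organisation: str
    licence: str
    source: str
    fields: list[str]
    key_field: str
    entry_date: Optional[date]
    end_date: Optional[date]
    notes: str
    github_discussion: str

    @classmethod
    def from_mapping(cls, data: dict) -> "CodelistDefinition":
        """Build from parsed front matter, mapping kebab-case keys to attributes."""
        codelist = data.get("codelist") or ""
        if not KEBAB_CASE.fullmatch(codelist):
            raise ValueError(f"codelist must be lowercase kebab-case: {codelist!r}")
        # `fields` is written as a list of {"field": <column>} entries.
        fields = [f["field"] if isinstance(f, dict) else f
                  for f in data.get("fields") or []]
        return cls(
            codelist=codelist,
            name=data.get("name") or "",
            plural=data.get("plural") or "",
            description=(data.get("description") or "").strip(),
            organisation=data.get("organisation") or "",
            licence=data.get("licence") or "",
            source=data.get("source") or "",
            fields=fields,
            key_field=data.get("key-field") or "",
            entry_date=_as_date(data.get("entry-date")),
            end_date=_as_date(data.get("end-date")),
            notes=data.get("notes") or "",
            github_discussion=str(data.get("github-discussion") or ""),
        )


definition = CodelistDefinition.from_mapping({
    "codelist": "development-phase",
    "name": "Development phase",
    "plural": "Development phases",
    "fields": [{"field": "reference"}, {"field": "name"}, {"field": "description"}],
    "key-field": "reference",
    "entry-date": "2025-08-13",
    "github-discussion": 194,
})
print(definition.fields)  # ['reference', 'name', 'description']
```

Keeping the hyphenated YAML keys (such as `key-field`) inside a single mapping function lets the rest of the code use ordinary Python identifiers.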

**The codelist definition is metadata only**
It describes the list and its columns but does not include the rows (the codes) themselves.

**Fields in a codelist CSV should match the `fields` attribute**
This allows automated validation to check that the source file has the expected structure.
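
A minimal, standard-library Python sketch of that structural check follows. It assumes the CSV text has already been fetched from `source`. It also treats "match" as meaning the same set of column names in any order. `compare_columns` is a hypothetical helper, not existing tooling.

```python
import csv
import io


def compare_columns(csv_text: str, expected_fields: list[str]) -> dict[str, list[str]]:
    """Compare a codelist CSV's header row with the names in its `fields` attribute.

    Column order is ignored; the result lists declared fields missing from
    the file and columns in the file that the definition does not declare.
    """
    header = next(csv.reader(io.StringIO(csv_text)), [])
    return {
        "missing": [name for name in expected_fields if name not in header],
        "unexpected": [column for column in header if column not in expected_fields],
    }


# Hypothetical file that declares `notes` instead of `description`.
sample = "reference,name,notes\n"
print(compare_columns(sample, ["reference", "name", "description"]))
# {'missing': ['description'], 'unexpected': ['notes']}
```

The text does not settle whether extra columns should fail validation or only missing ones should, so the sketch reports both and leaves that decision to the caller.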

## Still to decide

* Should codelist definitions include version information beyond `entry-date` and `end-date`?
* Should we require a `status` attribute (e.g. active, deprecated, experimental)?
* Do the individual fields listed in `fields` need their own definitions?

## Example

Codelist definition:
```yaml
---
codelist: development-phase
name: Development phase
plural: Development phases
description: |
  The development phase codelist defines the stages or phases that an oil and gas extraction project may progress through, such as exploratory and production phases. This helps standardise the terminology used to describe the status of projects.
organisation: government-organisation:D1342
licence: ogl3
entry-date: 2025-08-13
end-date:
fields:
  - field: reference
  - field: name
  - field: description
key-field: reference
source:
notes:
github-discussion: 194
---
```

### Validation rules for codelist definitions

* `codelist`, `name`, `description`, `fields`, `source` and `key-field` must be present
* every field in `fields` must appear as a column in the source data
* values in the `key-field` column must be unique within the source data
* if `end-date` is present, it must be on or after `entry-date`
* `source` must be a valid URL
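
The rules above can be sketched as a single Python function that takes a parsed definition and the text of its source CSV. This is an illustration under stated assumptions, not existing tooling. `validate_definition` is a hypothetical name, and the definition is assumed to be already parsed from YAML into a dictionary. The source data is passed in as text rather than fetched. "Valid URL" is approximated as an absolute `http` or `https` URL with a host.

```python
import csv
import io
from datetime import date
from urllib.parse import urlparse

# Attributes the rules say must be present.
REQUIRED = ["codelist", "name", "description", "fields", "source", "key-field"]


def _as_date(value) -> date:
    """Accept a date object or an ISO 'YYYY-MM-DD' string."""
    return value if isinstance(value, date) else date.fromisoformat(str(value))


def validate_definition(definition: dict, csv_text: str) -> list[str]:
    """Return a list of rule violations; an empty list means the checks passed."""
    errors = []

    # Required attributes must be present and non-empty.
    for key in REQUIRED:
        if not definition.get(key):
            errors.append(f"missing required attribute: {key}")

    # `fields` is written as a list of {"field": <column>} entries.
    fields = [f["field"] if isinstance(f, dict) else f
              for f in definition.get("fields") or []]
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = reader.fieldnames or []
    rows = list(reader)

    # Every declared field must appear as a column in the source data.
    for name in fields:
        if name not in columns:
            errors.append(f"field not in source data: {name}")

    # Values in the key-field column must be unique.
    key = definition.get("key-field")
    if key in columns:
        seen = set()
        for row in rows:
            if row[key] in seen:
                errors.append(f"duplicate {key}: {row[key]}")
            seen.add(row[key])

    # If end-date is present, it must be on or after entry-date.
    entry, end = definition.get("entry-date"), definition.get("end-date")
    if entry and end:
        try:
            entry, end = _as_date(entry), _as_date(end)
            if end < entry:
                errors.append(f"end-date {end} is before entry-date {entry}")
        except ValueError:
            errors.append("entry-date and end-date must be ISO dates")

    # "Valid URL" approximated as an absolute http(s) URL with a host.
    source = definition.get("source")
    if source:
        parsed = urlparse(str(source))
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            errors.append("source must be a valid http(s) URL")

    return errors


example = {
    "codelist": "development-phase",
    "name": "Development phase",
    "description": "Stages an oil and gas extraction project may progress through.",
    "fields": [{"field": "reference"}, {"field": "name"}, {"field": "description"}],
    "key-field": "reference",
    "source": "https://example.com/development-phase.csv",  # hypothetical URL
    "entry-date": "2025-08-13",
    "end-date": "2025-01-01",  # deliberately earlier than entry-date
}
data = "reference,name\ncode-1,First\ncode-1,Duplicate\n"  # hypothetical rows
for problem in validate_definition(example, data):
    print(problem)
# field not in source data: description
# duplicate reference: code-1
# end-date 2025-01-01 is before entry-date 2025-08-13
```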
