Commit a720afb

feat: add pathogen.json example of schema usage
1 parent e335316 commit a720afb

18 files changed, +145 -383 lines changed

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -1,5 +1,6 @@
 *.bak
 *.log
+*.pyc
 /build/
 /data_dev*/
 /data_local*

packages/nextclade-schemas/README.md

Lines changed: 15 additions & 5 deletions
@@ -41,6 +41,8 @@ Both use-cases can be handy in downstream applications, because, as stated above

 ## Example: Python

+This is a simple example which demonstrates how to use Nextclade JSON schemas to generate Python dataclasses and how to use them to read Nextclade JSON files in a type-safe way. Note that Python is duck-typed, so it's probably not the best choice if you want to build something type-safe. We demonstrate with Python here only because of its omnipresence in bioinformatics.
+
 First you need to obtain JSON schema definitions. Depending on your needs and tools you use you could:

 - run `nextclade schemas write -o nextclade-schemas/`
@@ -56,17 +58,25 @@ cd nextclade-schemas/

 pip3 install dacite datamodel-code-generator pydantic

+mkdir -p examples/python/lib/
+
 # Generate Python classes
-datamodel-codegen --input-file-type "jsonschema" --output-model-type "dataclasses.dataclass" --enum-field-as-literal=all --input "output-json.schema.yaml" --output "examples/python/nextclade_output_json.py"
+datamodel-codegen --input-file-type "jsonschema" --output-model-type "dataclasses.dataclass" --enum-field-as-literal=all --input "output-json.schema.yaml" --output "examples/python/lib/nextclade_output_json.py"
+
+datamodel-codegen --input-file-type "jsonschema" --output-model-type "dataclasses.dataclass" --enum-field-as-literal=all --input "input-pathogen-json.schema.yaml" --output "examples/python/lib/nextclade_input_pathogen_json.py"

 cd examples python/

-# Run the example which is using the generated Python classes
-# See packages/nextclade-schemas/examples/python/example.py
-python3 examples/python/example.py path/to/your/nextclade.json
+# Run the output JSON example which is using the generated Python classes.
+# See packages/nextclade-schemas/examples/python/example_output_json.py
+python3 examples/python/example_output_json.py $path_to_your_output_nextclade_json
+
+# Run the input pathogen JSON example which is using the generated Python classes.
+# See packages/nextclade-schemas/examples/python/example_pathogen_json.py
+python3 examples/python/example_pathogen_json.py $path_to_your_pathogen_json
 ```

-In this example the generated file `nextclade_output_json.py` will contain the Python dataclasses derived from `output-json.schema.yaml`. The example program in `examples/python/example.py` reads the Nextclade output JSON file (produced separately with `nextclade run --output-json ...`) and casts the resulting dict to the generated dataclasses (recursively) types using `dacite` library. You can then access to the data in a convenient and type-safe manner, and most text editors should also provide code completions. This is especially true for type-safe, compiled languages.
+In this example the generated file `nextclade_output_json.py` will contain the Python dataclasses derived from `output-json.schema.yaml`. The example program in `examples/python/example.py` reads the Nextclade output JSON file (produced separately with `nextclade run --output-json ...`) and casts the resulting dict (recursively) to the generated dataclass types using the `dacite` library. You can then access the data in a convenient and type-safe manner, and most text editors should also provide code completions. This approach with dataclasses can be somewhat slow for big inputs, because it requires converting Python dicts into dataclasses. You might find another solution which fits your use-case better - there are many tools which can understand JSON schema.

 ## Other languages and tools
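
Reviewer note: the README paragraph in this hunk describes casting a parsed dict, recursively, into the generated dataclasses. The sketch below imitates that idea using only the standard library, so it runs without installing anything. It is a simplified stand-in and not how `dacite` is implemented: it handles only nested dataclasses and `Optional` fields, and it performs no type validation. The `Clade` and `Result` classes and the `to_dataclass` helper are hypothetical names invented for this illustration; they are not part of the generated Nextclade classes.

```python
from dataclasses import dataclass, fields, is_dataclass
from typing import Any, Optional, Union, get_args, get_origin, get_type_hints


@dataclass
class Clade:  # hypothetical nested record
    name: str


@dataclass
class Result:  # hypothetical top-level record
    seq_name: str
    clade: Optional[Clade] = None


def to_dataclass(cls: type, data: dict) -> Any:
    """Recursively build `cls` from a plain dict (simplified, no validation)."""
    hints = get_type_hints(cls)
    kwargs = {}
    for field in fields(cls):
        if field.name not in data:
            continue  # fall back to the dataclass default
        value = data[field.name]
        target = hints[field.name]
        # Unwrap Optional[X] to X when a value is actually present.
        if get_origin(target) is Union and value is not None:
            target = next(a for a in get_args(target) if a is not type(None))
        # Recurse into nested dataclasses.
        if is_dataclass(target) and isinstance(value, dict):
            value = to_dataclass(target, value)
        kwargs[field.name] = value
    return cls(**kwargs)


result = to_dataclass(Result, {"seq_name": "sample1", "clade": {"name": "21J"}})
print(result.clade.name)
```

After conversion, `result.clade` is a `Clade` instance rather than a dict, which is what allows attribute access and editor completions. The per-field work done here is also the kind of overhead the README warns about for large inputs.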

packages/nextclade-schemas/examples/python/example.py renamed to packages/nextclade-schemas/examples/python/example_output_json.py

Lines changed: 2 additions & 4 deletions
@@ -1,14 +1,12 @@
 """
-Example demonstrating how to read JSON using Python classes generated from JSON schema.
+Example demonstrating how to read Nextclade output JSON using Python classes generated from JSON schema.
 See README.md in the parent directory for instructions.
-```
-
 """

 import sys
 import json
 from dacite import from_dict
-from nextclade_output_json import ResultsJson
+from lib.nextclade_output_json import ResultsJson


 def read_nextclade_output_json(filepath: str | None = None) -> ResultsJson:

packages/nextclade-schemas/examples/python/example_pathogen_json.py

Lines changed: 52 additions & 0 deletions
@@ -0,0 +1,52 @@
+"""
+Example demonstrating how to read Nextclade pathogen JSON using Python classes generated from JSON schema.
+See README.md in the parent directory for instructions.
+"""
+
+import sys
+import json
+from pathlib import Path
+from dacite import from_dict
+from lib.nextclade_input_pathogen_json import PathogenJson
+
+
+def read_nextclade_pathogen_json(filepath: str | None = None) -> PathogenJson:
+    source = sys.stdin if filepath is None else open(filepath)
+    with source as f:
+        json_data = json.load(f)
+    return from_dict(PathogenJson, json_data)
+
+
+def dict_get(d: dict, key: str, default=None):
+    return d[key] if key in d else default
+
+
+if __name__ == "__main__":
+    filepath = sys.argv[1] if len(sys.argv) > 1 else None
+
+    data = read_nextclade_pathogen_json(filepath)
+
+    name = dict_get(data.attributes or {}, "name", "unknown")
+    ref_name = dict_get(data.attributes or {}, "reference name", "unknown")
+    ref_accession = dict_get(data.attributes or {}, "reference accession", "unknown")
+
+    print(f"Dataset name: {name}")
+
+    if data.shortcuts:
+        print(f"aka: {', '.join(data.shortcuts)}")
+
+    print(f"Reference name: {ref_name}")
+    print(f"Reference accession: {ref_accession}")
+
+    if data.files:
+        print("\nDataset files:")
+        dataset_dir = Path(filepath).parent if filepath else Path.cwd()
+        for file_type, filename in vars(data.files).items():
+            if filename:
+                absolute_path = dataset_dir / filename
+                print(f" {file_type}: {absolute_path}")
+
+    if data.alignmentParams:
+        print("\nAlignment parameters:")
+        params_dict = {k: v for k, v in vars(data.alignmentParams).items() if v is not None}
+        print(json.dumps(params_dict, indent=2, default=str))
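
Reviewer note: the last lines of this example rely on `vars()` to turn a dataclass instance into a dict and then drop unset (`None`) fields before printing. A minimal standalone sketch of that pattern follows. The `Params` class and its field names are hypothetical stand-ins, not the generated alignment-parameters class, and `set_fields` is a helper name invented here.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class Params:  # hypothetical stand-in for a generated dataclass
    gap_open: Optional[int] = None
    gap_extend: Optional[int] = None
    preset: Optional[str] = None


def set_fields(obj) -> dict:
    """Return only the attributes of `obj` whose value is not None."""
    return {k: v for k, v in vars(obj).items() if v is not None}


params = Params(gap_open=6, preset="default")
# default=str is a fallback for values json cannot serialize directly.
print(json.dumps(set_fields(params), indent=2, default=str))
```

Note that `vars()` is shallow: a nested dataclass value stays an object, which is presumably why the example passes `default=str` to `json.dumps`.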

packages/nextclade-schemas/examples/python/lib/.gitkeep

Whitespace-only changes.

packages/nextclade-schemas/input-pathogen-json.schema.json

Lines changed: 8 additions & 100 deletions
@@ -155,9 +155,7 @@
 },
 "attributes": {
 "type": "object",
-"additionalProperties": {
-"$ref": "#/definitions/AnyType"
-}
+"additionalProperties": true
 },
 "shortcuts": {
 "type": "array",
@@ -284,47 +282,6 @@
 }
 },
 "definitions": {
-"AnyType": {
-"description": "Any type that can be represented in JSON",
-"anyOf": [
-{
-"title": "string",
-"type": "string"
-},
-{
-"title": "int",
-"type": "integer",
-"format": "int"
-},
-{
-"title": "float",
-"type": "number",
-"format": "double"
-},
-{
-"title": "bool",
-"type": "boolean"
-},
-{
-"title": "array",
-"type": "array",
-"items": {
-"$ref": "#/definitions/AnyType"
-}
-},
-{
-"title": "object",
-"type": "object",
-"additionalProperties": {
-"$ref": "#/definitions/AnyType"
-}
-},
-{
-"title": "null",
-"type": "null"
-}
-]
-},
 "DatasetMeta": {
 "type": "object",
 "properties": {
@@ -1083,70 +1040,22 @@
 "minimum": 0.0
 },
 "maxIndel": {
-"description": "REMOVED",
-"anyOf": [
-{
-"$ref": "#/definitions/AnyType"
-},
-{
-"type": "null"
-}
-]
+"description": "REMOVED"
 },
 "seedLength": {
-"description": "REMOVED",
-"anyOf": [
-{
-"$ref": "#/definitions/AnyType"
-},
-{
-"type": "null"
-}
-]
+"description": "REMOVED"
 },
 "mismatchesAllowed": {
-"description": "REMOVED",
-"anyOf": [
-{
-"$ref": "#/definitions/AnyType"
-},
-{
-"type": "null"
-}
-]
+"description": "REMOVED"
 },
 "minSeeds": {
-"description": "REMOVED",
-"anyOf": [
-{
-"$ref": "#/definitions/AnyType"
-},
-{
-"type": "null"
-}
-]
+"description": "REMOVED"
 },
 "minMatchRate": {
-"description": "REMOVED",
-"anyOf": [
-{
-"$ref": "#/definitions/AnyType"
-},
-{
-"type": "null"
-}
-]
+"description": "REMOVED"
 },
 "seedSpacing": {
-"description": "REMOVED",
-"anyOf": [
-{
-"$ref": "#/definitions/AnyType"
-},
-{
-"type": "null"
-}
-]
+"description": "REMOVED"
 }
 }
 },
@@ -1347,8 +1256,7 @@
 "type": "number",
 "format": "double"
 }
-},
-true
+}
 ]
 },
 "AaMotifsDesc": {

packages/nextclade-schemas/input-pathogen-json.schema.yaml

Lines changed: 1 addition & 44 deletions
@@ -103,8 +103,7 @@ properties:
 type: string
 attributes:
 type: object
-additionalProperties:
-$ref: '#/definitions/AnyType'
+additionalProperties: true
 shortcuts:
 type: array
 items:
@@ -168,29 +167,6 @@ properties:
 - $ref: '#/definitions/DatasetCompatibility'
 - type: 'null'
 definitions:
-AnyType:
-description: Any type that can be represented in JSON
-anyOf:
-- title: string
-type: string
-- title: int
-type: integer
-format: int
-- title: float
-type: number
-format: double
-- title: bool
-type: boolean
-- title: array
-type: array
-items:
-$ref: '#/definitions/AnyType'
-- title: object
-type: object
-additionalProperties:
-$ref: '#/definitions/AnyType'
-- title: 'null'
-type: 'null'
 DatasetMeta:
 type: object
 properties:
@@ -728,34 +704,16 @@ definitions:
 minimum: 0.0
 maxIndel:
 description: REMOVED
-anyOf:
-- $ref: '#/definitions/AnyType'
-- type: 'null'
 seedLength:
 description: REMOVED
-anyOf:
-- $ref: '#/definitions/AnyType'
-- type: 'null'
 mismatchesAllowed:
 description: REMOVED
-anyOf:
-- $ref: '#/definitions/AnyType'
-- type: 'null'
 minSeeds:
 description: REMOVED
-anyOf:
-- $ref: '#/definitions/AnyType'
-- type: 'null'
 minMatchRate:
 description: REMOVED
-anyOf:
-- $ref: '#/definitions/AnyType'
-- type: 'null'
 seedSpacing:
 description: REMOVED
-anyOf:
-- $ref: '#/definitions/AnyType'
-- type: 'null'
 AlignmentPreset:
 type: string
 enum:
@@ -891,7 +849,6 @@ definitions:
 additionalProperties:
 type: number
 format: double
-- true
 AaMotifsDesc:
 description: Describes motifs in amino acid sequences, such as glycosylation sites, disulfide bonds, etc.
 examples:
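
Reviewer note: both schema files replace the recursive `AnyType` definition with `additionalProperties: true` (and, for the `REMOVED` fields, with no type constraint at all). The removed union listed string, integer, number, boolean, array, object and null, so it appears to have accepted every JSON value already. The sketch below walks that union branch by branch to illustrate the point. It is a hand-written check built on that reading of the removed definition, not a JSON Schema validator, and `matches_any_type` and the sample `attributes` object are hypothetical.

```python
import json


def matches_any_type(value) -> bool:
    """Check `value` against the removed `AnyType` union, branch by branch."""
    if value is None or isinstance(value, (str, bool, int, float)):
        return True  # null, string, bool, int and float branches
    if isinstance(value, list):
        return all(matches_any_type(item) for item in value)  # array branch
    if isinstance(value, dict):
        return all(matches_any_type(v) for v in value.values())  # object branch
    return False  # not a JSON value (e.g. a set or a custom object)


# Hypothetical `attributes` object with mixed value types.
attributes = json.loads('{"name": "x", "tags": [1, 2.5, null], "meta": {"ok": true}}')
print(all(matches_any_type(v) for v in attributes.values()))
```

Under that reading the change should not alter which documents validate. It mainly simplifies what code generators have to emit for these fields.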
