Commit b0040eb

Taxonomy on startup (#561)
* Add convenience script for build command. Should maybe merge later
* Avoid starting taxonomy on startup
  If we do want to roll out taxonomy updates automatically, we probably want to regularly spin up the taxonomy container to allow updates while the REST API is live.
* Extract method so module can also be used on its own without cli
* Import specified taxonomy on startup
* Add documentation on taxonomies and its service
* Predefine the import folder of taxonomies
1 parent 52a541a commit b0040eb

9 files changed: +74 -8 lines changed

data/taxonomies/.gitkeep

Whitespace-only changes.

docker-compose.dev.yaml

Lines changed: 1 addition & 0 deletions
@@ -3,6 +3,7 @@ services:
     stdin_open: true
     volumes:
       - ./src:/app:ro
+      - ${DATA_PATH}/taxonomies:/data/taxonomies
     command: python main.py
 
   fill-db-with-examples:

docker-compose.yaml

Lines changed: 2 additions & 2 deletions
@@ -237,9 +237,9 @@ services:
         condition: service_completed_successfully
 
   taxonomy:
+    profiles: ["taxonomy"]
     container_name: taxonomy
     image: aiod_metadata_catalogue
     volumes:
-      - ./data/taxonomies.json:/data/taxonomies.json
-      - ./src:/app
+      - ${DATA_PATH}/taxonomies/taxonomies.json:/data/taxonomies.json
     command: python taxonomies/synchronize_taxonomy.py

Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
+# Taxonomies
+
+AI-on-Demand uses [taxonomies](https://en.wikipedia.org/wiki/Taxonomy) to standardize terms across assets, for example for [licenses](https://aiod-dev.i3a.es/docs#/Taxonomies/license_licenses_get), [business sectors](https://aiod-dev.i3a.es/docs#/Taxonomies/industrial_sector_industrial_sectors_get), or [news categories](https://aiod-dev.i3a.es/docs#/Taxonomies/news_category_news_categorys_get).
+These taxonomies are defined by the [conceptual model](https://github.com/aiondemand/metadata-schema).
+Each term in a taxonomy has a specific definition and may have subterms. For example, the business sector `construction` has the subsectors `infrastructure` and `buildings`, each of which may have subterms of its own.
+
+## Importing the Taxonomy
+The JSON file produced by the export script in the metadata repository can be used as input for the `taxonomy` service of this project's `docker compose` configuration.
+Put the JSON file at `${DATA_PATH}/taxonomies/taxonomies.json`, where `DATA_PATH` is an environment variable (typically set from the `.env` file).
+Then invoke docker compose with the "taxonomy" profile. The script will then:
+
+ - invalidate previously known terms that are no longer part of the taxonomy: assets that already use them will keep them, but the terms can no longer be added to new items.
+ - update existing terms, e.g., with new definitions
+ - add new terms to the taxonomy
+
+## Development Taxonomy
+When using the development configuration, you can put a file in the `${DATA_PATH}/taxonomies` directory, which is mounted as `/data/taxonomies`, and point to it from the API configuration file (`config.override.toml`). For example, with `DATA_PATH=./data` and `./data/taxonomies/example_taxonomies.json` present:
+
+```toml
+[dev]
+taxonomy="/data/taxonomies/example_taxonomies.json"
+```
+
+This setting is not intended for production use, since it could accidentally overwrite the taxonomies.
+In production, use the `taxonomy` service discussed above.
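
The three effects that the documentation above lists (invalidate, update, add) can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: `StoredTerm`, its `is_valid` flag, and `synchronize_flat` are hypothetical names that do not appear in this commit, subterms are ignored, and the real script works against the database rather than a dictionary.

```python
from dataclasses import dataclass


@dataclass
class StoredTerm:
    """Hypothetical stand-in for one taxonomy term as stored in the database."""

    name: str
    definition: str
    is_valid: bool = True  # assumed flag; the real column is not shown in this commit


def synchronize_flat(stored: dict[str, StoredTerm], incoming: dict[str, str]) -> None:
    """Update `stored` in place so it reflects `incoming` (term name -> definition)."""
    for name, term in stored.items():
        if name not in incoming:
            # Invalidate instead of deleting, so assets that use the term keep it.
            term.is_valid = False
    for name, definition in incoming.items():
        if name in stored:
            # Update an existing term, e.g. with a new definition.
            stored[name].definition = definition
            # Assumption: a term that reappears in the taxonomy becomes valid again.
            stored[name].is_valid = True
        else:
            # Add a term that was not known before.
            stored[name] = StoredTerm(name, definition)


stored = {
    "construction": StoredTerm("construction", "old definition"),
    "obsolete": StoredTerm("obsolete", "no longer in the taxonomy"),
}
synchronize_flat(stored, {"construction": "new definition", "energy": "a new term"})
```

After the call, `obsolete` is still present but flagged invalid, `construction` carries the new definition, and `energy` has been added.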

mkdocs.yaml

Lines changed: 1 addition & 0 deletions
@@ -56,6 +56,7 @@ nav:
       - 'Relationships': developer/schema/relationships.md
       - 'Objects': developer/schema/objects.md
       - 'Schema Migration': developer/schema/migration.md
+      - 'Taxonomies': developer/schema/taxonomies.md
     - 'Elastic Search': developer/elastic_search.md
     - 'Scripts': developer/scripts.md
     - 'Release Workflow': developer/releases.md

scripts/build.sh

Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
+#!/bin/bash
+
+profiles=""
+for arg in "$@"; do
+  profiles+="--profile $arg "
+done
+
+NC='\033[0m' # No Color
+CYAN='\033[1;36m'
+GREEN='\033[0;32m'
+
+source .env
+[ ! -f override.env ] && touch override.env
+source override.env
+
+if [[ "${USE_LOCAL_DEV}" == "true" ]]; then
+  compose_with_dev="-f docker-compose.dev.yaml"
+  profiles+="--profile nginx"
+  echo -e "Launching ${CYAN}with${NC} local changes."
+else
+  compose_with_dev=""
+  echo -e "Launching ${GREEN}without${NC} local changes."
+fi
+
+command="docker compose --env-file=.env --env-file=override.env -f docker-compose.yaml ${compose_with_dev} ${profiles} build"
+echo "${command}"
+eval "${command}"
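
To make the assembled command easier to follow, here is a rough Python rendering of the same logic. It is a sketch, not part of the commit: `build_command` is a hypothetical helper, it takes the `USE_LOCAL_DEV` decision as a plain boolean instead of sourcing `.env` and `override.env`, and it only builds the argument list without running it.

```python
def build_command(profiles: list[str], use_local_dev: bool) -> list[str]:
    """Return the `docker compose ... build` argument list the script would run."""
    command = [
        "docker", "compose",
        "--env-file=.env", "--env-file=override.env",
        "-f", "docker-compose.yaml",
    ]
    if use_local_dev:
        # Mirrors the USE_LOCAL_DEV branch: extra compose file plus the nginx profile.
        command += ["-f", "docker-compose.dev.yaml"]
        profiles = [*profiles, "nginx"]
    for profile in profiles:
        command += ["--profile", profile]
    command.append("build")
    return command


print(" ".join(build_command(["taxonomy"], use_local_dev=False)))
```

Building a list rather than a string would also allow passing the result to `subprocess.run` without `eval`-style word splitting, should the script ever be ported.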

src/config.default.toml

Lines changed: 1 addition & 0 deletions
@@ -29,6 +29,7 @@ request_timeout = 10 # seconds
 log_level = "INFO" # Python log levels: https://docs.python.org/3/library/logging.html#logging-levels
 disable_reviews = false # set to true to automatically publish submissions
 url_prefix = ""
+taxonomy = "" # Set to a JSON file containing a taxonomy to put into the database on startup
 
 # Authentication and authorization
 [keycloak]

src/main.py

Lines changed: 7 additions & 0 deletions
@@ -8,6 +8,7 @@
 import argparse
 from datetime import datetime, timezone
 import logging
+from pathlib import Path
 
 import pkg_resources
 import uvicorn
@@ -28,6 +29,7 @@
 from database.model.platform.platform_names import PlatformName
 from database.session import EngineSingleton, DbSession
 from database.setup import create_database, database_exists
+from taxonomies.synchronize_taxonomy import synchronize_taxonomy_from_file
 from triggers import disable_review_process, enable_review_process
 from error_handling import http_exception_handler
 from database.model.agent.agent import Agent
@@ -105,6 +107,11 @@ def create_app() -> FastAPI:
     drop_database = build_database_setting == "drop-then-build"
     build_database(drop_database=drop_database)
 
+    if taxonomy_path := DEV_CONFIG.get("taxonomy"):
+        if not (taxonomy_file := Path(taxonomy_path)).is_file():
+            raise ValueError(f"dev.taxonomy must be a path to a file, but is {taxonomy_path!r}.")
+        synchronize_taxonomy_from_file(taxonomy_file)
+
     pyproject_toml = pkg_resources.get_distribution("aiod_metadata_catalogue")
     app = build_app(url_prefix=DEV_CONFIG.get("url_prefix", ""), version=pyproject_toml.version)
     return app
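
The startup check added to `create_app` can be tried in isolation. The sketch below copies the two walrus-operator conditions from the hunk above into a hypothetical helper, `resolve_taxonomy_file`, and takes the configuration as a plain mapping instead of the project's `DEV_CONFIG`; the actual import call is left out.

```python
import tempfile
from pathlib import Path
from typing import Mapping, Optional


def resolve_taxonomy_file(dev_config: Mapping[str, str]) -> Optional[Path]:
    """Return the configured taxonomy file, or None if the setting is empty or absent."""
    # An empty string (the default in config.default.toml) is falsy, so nothing happens.
    if taxonomy_path := dev_config.get("taxonomy"):
        if not (taxonomy_file := Path(taxonomy_path)).is_file():
            raise ValueError(f"dev.taxonomy must be a path to a file, but is {taxonomy_path!r}.")
        return taxonomy_file
    return None


with tempfile.TemporaryDirectory() as tmp:
    example = Path(tmp) / "example_taxonomies.json"
    example.write_text("{}")
    print(resolve_taxonomy_file({"taxonomy": str(example)}))  # the resolved path
    print(resolve_taxonomy_file({"taxonomy": ""}))  # None: nothing to import
```

Raising on a non-file path makes a typo in `config.override.toml` fail at startup rather than silently skipping the import.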

src/taxonomies/synchronize_taxonomy.py

Lines changed: 10 additions & 6 deletions
@@ -30,7 +30,7 @@ def parse_args():
 class Term(NamedTuple):
     name: str
     definition: str
-    children: list[Self] # type: ignore[valid-type]
+    children: list[Self] # type: ignore[valid-type]
 
 
 type_by_name: dict[str, type] = {
@@ -116,17 +116,21 @@ def synchronize_term(term: Taxonomy):
         synchronize_term(term)
 
 
-def main():
-    logging.basicConfig(level=logging.INFO)
-    logging.info("Starting synchronization script.")
-    args = parse_args()
-    taxonomies = load_taxonomies_from_json(args.definitions_file)
+def synchronize_taxonomy_from_file(file: Path) -> None:
+    taxonomies = load_taxonomies_from_json(file)
     with DbSession(autoflush=False) as session:
         for type_, definitions in taxonomies:
             synchronize(type_, definitions, session)
         logging.info("Committing changes to database.")
         session.commit()
 
 
+def main():
+    logging.basicConfig(level=logging.INFO)
+    logging.info("Starting synchronization script.")
+    args = parse_args()
+    synchronize_taxonomy_from_file(args.definitions_file)
+
+
 if __name__ == "__main__":
     main()
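
The `Term` tuple in this module is recursive: each term carries its own list of child terms. As a rough illustration of how such a structure might be filled from JSON, the sketch below parses one nested term. The JSON layout (`name`, `definition`, `children` keys) and the `parse_term` helper are assumptions made for this example; the commit does not show `load_taxonomies_from_json` or the real file format.

```python
import json
from typing import Any, NamedTuple


class Term(NamedTuple):
    name: str
    definition: str
    children: list["Term"]


def parse_term(raw: dict[str, Any]) -> Term:
    """Recursively convert one JSON object (assumed layout) into a Term."""
    return Term(
        name=raw["name"],
        definition=raw.get("definition", ""),
        children=[parse_term(child) for child in raw.get("children", [])],
    )


# Hypothetical input, modelled on the `construction` example from the documentation.
document = """
{
  "name": "construction",
  "definition": "placeholder definition",
  "children": [
    {"name": "infrastructure", "definition": "placeholder definition"},
    {"name": "buildings", "definition": "placeholder definition"}
  ]
}
"""
construction = parse_term(json.loads(document))
print([child.name for child in construction.children])
```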
