Commit 6bb92be

feat: Add MedCAT OPCS-4 service (#18)
Introduce a MedCAT OPCS-4 service to handle OPCS annotation codes, corresponding to interventions and procedures. The new service mirrors the structure and behaviour of the ICD-10 equivalent, exposing only the OPCS-4 labels among the annotations generated by the underlying MedCAT model, relying on the existence of a 'cui2opcs4' mapping in the latter's concept database.

This commit introduces the new model type and service, adjusts the existing tests accordingly, extends the Docker Compose stack with a 'medcat-opcs4' Docker service, and updates the Grafana/Prometheus configuration to take it into account.

Signed-off-by: Phoevos Kalemkeris <[email protected]>
1 parent 4481a90 commit 6bb92be

25 files changed: +3371 -15 lines changed

.github/workflows/api-docs.yaml

Lines changed: 3 additions & 1 deletion
@@ -33,6 +33,7 @@ jobs:
         run: |
           python app/cli/cli.py export-model-apis --model-type medcat_snomed --add-training-apis --no-exclude-unsupervised-training --no-exclude-metacat-training --add-evaluation-apis --add-previews-apis
           python app/cli/cli.py export-model-apis --model-type medcat_icd10 --add-training-apis --no-exclude-unsupervised-training --no-exclude-metacat-training --add-evaluation-apis --add-previews-apis
+          python app/cli/cli.py export-model-apis --model-type medcat_opcs4 --add-training-apis --no-exclude-unsupervised-training --no-exclude-metacat-training --add-evaluation-apis --add-previews-apis
           python app/cli/cli.py export-model-apis --model-type medcat_umls --add-training-apis --no-exclude-unsupervised-training --no-exclude-metacat-training --add-evaluation-apis --add-previews-apis
           python app/cli/cli.py export-model-apis --model-type anoncat --add-training-apis --add-evaluation-apis --add-previews-apis --exclude-metacat-training --exclude-unsupervised-training
           python app/cli/cli.py export-model-apis --model-type transformers_deid --add-training-apis --add-evaluation-apis --add-previews-apis --exclude-metacat-training --exclude-unsupervised-training
@@ -43,6 +44,7 @@ jobs:
           git checkout gh-pages
           mv ./medcat_snomed_model_apis.json ./docs/medcat_snomed_model_apis.json
           mv ./medcat_icd10_model_apis.json ./docs/medcat_icd10_model_apis.json
+          mv ./medcat_opcs4_model_apis.json ./docs/medcat_opcs4_model_apis.json
           mv ./medcat_umls_model_apis.json ./docs/medcat_umls_model_apis.json
           mv ./anoncat_model_apis.json ./docs/anoncat_model_apis.json
           mv ./transformers_deid_model_apis.json ./docs/transformers_deid_model_apis.json
@@ -51,7 +53,7 @@ jobs:
           mv ./cogstack_model_serve_apis.json ./docs/cogstack_model_serve_apis.json
           git config --global user.name "cogstack-model-serve"
           git config --global user.email "[email protected]"
-          git add ./docs/medcat_snomed_model_apis.json ./docs/medcat_icd10_model_apis.json ./docs/medcat_umls_model_apis.json ./docs/anoncat_model_apis.json ./docs/transformers_deid_model_apis.json ./docs/huggingface_ner_model_apis.json ./docs/huggingface_llm_model_apis.json ./docs/cogstack_model_serve_apis.json
+          git add ./docs/medcat_snomed_model_apis.json ./docs/medcat_icd10_model_apis.json ./docs/medcat_opcs4_model_apis.json ./docs/medcat_umls_model_apis.json ./docs/anoncat_model_apis.json ./docs/transformers_deid_model_apis.json ./docs/huggingface_ner_model_apis.json ./docs/huggingface_llm_model_apis.json ./docs/cogstack_model_serve_apis.json
           if [[ `git status --porcelain --untracked-files=no` ]]; then
             git commit -m "update api docs"
           else

README.md

Lines changed: 2 additions & 0 deletions
@@ -18,6 +18,7 @@ Currently, CMS offers both HTTP endpoints for running NLP-related jobs and a com
 [OpenAPI Docs](https://cogstack.github.io/CogStack-ModelServe/):
 - [SNOMED MedCAT Model](https://cogstack.github.io/CogStack-ModelServe/docs/medcat_snomed_model_apis.html)
 - [ICD-10 MedCAT Model](https://cogstack.github.io/CogStack-ModelServe/docs/medcat_icd10_model_apis.html)
+- [OPCS-4 MedCAT Model](https://cogstack.github.io/CogStack-ModelServe/docs/medcat_opcs4_model_apis.html)
 - [UMLS MedCAT Model](https://cogstack.github.io/CogStack-ModelServe/docs/medcat_umls_model_apis.html)
 - [De-ID MedCAT Model (AnonCAT)](https://cogstack.github.io/CogStack-ModelServe/docs/anoncat_model_apis.html)
 - [HuggingFace NER Model](https://cogstack.github.io/CogStack-ModelServe/docs/huggingface_ner_model_apis.html)
@@ -59,6 +60,7 @@ The following table summarises the servable model types with their respective ou
 |:---------------------:|:---------------:|:---------------------------------:|
 | medcat_snomed         | medcat-snomed   | labelled with SNOMED concepts     |
 | medcat_icd10          | medcat-icd10    | labelled with ICD-10 concepts     |
+| medcat_opcs4          | medcat-opcs4    | labelled with OPCS-4 concepts     |
 | medcat_umls           | medcat-umls     | labelled with UMLS concepts       |
 | medcat_deid (anoncat) | medcat-deid     | labelled with latest PII concepts |
 | huggingface_ner       | huggingface_ner | customer managed labels           |

app/api/routers/unsupervised_training.py

Lines changed: 1 addition & 1 deletion
@@ -162,7 +162,7 @@ async def train_unsupervised_with_hf_dataset(
     if hf_dataset_repo_id is None and hf_dataset_package is None:
         raise ClientException("Either 'hf_dataset_repo_id' or 'hf_dataset_package' must be provided")
 
-    if model_service.info().model_type not in [ModelType.HUGGINGFACE_NER, ModelType.MEDCAT_SNOMED, ModelType.MEDCAT_ICD10, ModelType.MEDCAT_UMLS]:
+    if model_service.info().model_type not in [ModelType.HUGGINGFACE_NER, ModelType.MEDCAT_SNOMED, ModelType.MEDCAT_ICD10, ModelType.MEDCAT_OPCS4, ModelType.MEDCAT_UMLS]:
         raise ConfigurationException(f"Currently this endpoint is not available for models of type: {model_service.info().model_type.value}")
 
     data_dir = tempfile.TemporaryDirectory()

app/cli/README.md

Lines changed: 4 additions & 4 deletions
@@ -37,7 +37,7 @@ $ cms serve [OPTIONS]
 
 **Options**:
 
-* `--model-type [medcat_snomed|medcat_umls|medcat_icd10|medcat_deid|anoncat|transformers_deid|huggingface_ner]`: The type of the model to serve [required]
+* `--model-type [medcat_snomed|medcat_umls|medcat_icd10|medcat_opcs4|medcat_deid|anoncat|transformers_deid|huggingface_ner]`: The type of the model to serve [required]
 * `--model-path TEXT`: The file path to the model package
 * `--mlflow-model-uri models:/MODEL_NAME/ENV`: The URI of the MLflow model to serve
 * `--host TEXT`: The hostname of the server [default: 127.0.0.1]
@@ -60,7 +60,7 @@ $ cms train [OPTIONS]
 
 **Options**:
 
-* `--model-type [medcat_snomed|medcat_umls|medcat_icd10|medcat_deid|anoncat|transformers_deid|huggingface_ner]`: The type of the model to train [required]
+* `--model-type [medcat_snomed|medcat_umls|medcat_icd10|medcat_opcs4|medcat_deid|anoncat|transformers_deid|huggingface_ner]`: The type of the model to train [required]
 * `--base-model-path TEXT`: The file path to the base model package to be trained on
 * `--mlflow-model-uri models:/MODEL_NAME/ENV`: The URI of the MLflow model to train
 * `--training-type [supervised|unsupervised|meta_supervised]`: The type of training [required]
@@ -86,7 +86,7 @@ $ cms register [OPTIONS]
 
 **Options**:
 
-* `--model-type [medcat_snomed|medcat_umls|medcat_icd10|medcat_deid|anoncat|transformers_deid|huggingface_ner]`: The type of the model to register [required]
+* `--model-type [medcat_snomed|medcat_umls|medcat_icd10|medcat_opcs4|medcat_deid|anoncat|transformers_deid|huggingface_ner]`: The type of the model to register [required]
 * `--model-path TEXT`: The file path to the model package [required]
 * `--model-name TEXT`: The string representation of the registered model [required]
 * `--training-type [supervised|unsupervised|meta_supervised]`: The type of training the model went through
@@ -108,7 +108,7 @@ $ cms export-model-apis [OPTIONS]
 
 **Options**:
 
-* `--model-type [medcat_snomed|medcat_umls|medcat_icd10|medcat_deid|anoncat|transformers_deid|huggingface_ner]`: The type of the model to serve [required]
+* `--model-type [medcat_snomed|medcat_umls|medcat_icd10|medcat_opcs4|medcat_deid|anoncat|transformers_deid|huggingface_ner]`: The type of the model to serve [required]
 * `--add-training-apis / --no-add-training-apis`: Add training APIs to the doc [default: no-add-training-apis]
 * `--add-evaluation-apis / --no-add-evaluation-apis`: Add evaluation APIs to the doc [default: no-add-evaluation-apis]
 * `--add-previews-apis / --no-add-previews-apis`: Add preview APIs to the doc [default: no-add-previews-apis]

app/cli/cli.py

Lines changed: 4 additions & 4 deletions
@@ -65,7 +65,7 @@ def serve_model(
     port: str = typer.Option("8000", help="The port of the server"),
     model_name: Optional[str] = typer.Option(None, help="The string representation of the model name"),
     streamable: bool = typer.Option(False, help="Serve the streamable endpoints only"),
-    device: Device = typer.Option(Device.DEFAULT, help="The device to serve the model on"),
+    device: Device = typer.Option(Device.DEFAULT.value, help="The device to serve the model on"),
     llm_engine: Optional[LlmEngine] = typer.Option(LlmEngine.CMS.value, help="The engine to use for text generation"),
     debug: Optional[bool] = typer.Option(None, help="Run in the debug mode"),
 ) -> None:
@@ -90,7 +90,7 @@ def serve_model(
     model_name = model_name or "CMS model"
     logger = _get_logger(debug, model_type, model_name)
     config = get_settings()
-    config.DEVICE = device.value
+    config.DEVICE = device
     if model_type in [
         ModelType.HUGGINGFACE_NER,
         ModelType.MEDCAT_DEID,
@@ -186,7 +186,7 @@ def train_model(
     hyperparameters: str = typer.Option("{}", help="The overriding hyperparameters serialised as JSON string"),
     description: Optional[str] = typer.Option(None, help="The description of the training or change logs"),
     model_name: Optional[str] = typer.Option(None, help="The string representation of the model name"),
-    device: Device = typer.Option(Device.DEFAULT, help="The device to train the model on"),
+    device: Device = typer.Option(Device.DEFAULT.value, help="The device to train the model on"),
     debug: Optional[bool] = typer.Option(None, help="Run in the debug mode"),
 ) -> None:
     """
@@ -212,7 +212,7 @@ def train_model(
     logger = _get_logger(debug, model_type, model_name)
 
     config = get_settings()
-    config.DEVICE = device.value
+    config.DEVICE = device
 
     model_service_dep = ModelServiceDep(model_type, config)
     cms_globals.model_service_dep = model_service_dep
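
Review note: the `cli.py` hunks pass `Device.DEFAULT.value` as the option default and assign the parsed `device` straight to `config.DEVICE`. The sketch below illustrates why that can behave the same as assigning `device.value`, assuming `Device` is a `str`-based enum like `ModelType`. The `Device` definition is not part of this diff, so the members shown here are hypothetical.

```python
from enum import Enum


class Device(str, Enum):
    # Hypothetical members; the real enum is defined elsewhere in the code base.
    DEFAULT = "default"
    CPU = "cpu"
    GPU = "cuda"


# A str-mixin enum member compares equal to its raw value.
assert Device.CPU == "cpu"

# A member can be rebuilt from its value, which is roughly what a CLI parser
# does when it turns the "--device" string back into the enum.
parsed = Device(Device.DEFAULT.value)

# Analogous to `config.DEVICE = device`: the member itself is stored.
device_setting = parsed
```

One caveat with this pattern: `str(Device.CPU)` yields `'Device.CPU'` rather than `'cpu'`, so any code that stringifies the setting explicitly may still want `.value`.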

app/domain.py

Lines changed: 1 addition & 0 deletions
@@ -10,6 +10,7 @@ class ModelType(str, Enum):
     MEDCAT_SNOMED = "medcat_snomed"
     MEDCAT_UMLS = "medcat_umls"
     MEDCAT_ICD10 = "medcat_icd10"
+    MEDCAT_OPCS4 = "medcat_opcs4"
     MEDCAT_DEID = "medcat_deid"
     ANONCAT = "anoncat"
     TRANSFORMERS_DEID = "transformers_deid"

app/model_services/medcat_model.py

Lines changed: 2 additions & 2 deletions
@@ -165,7 +165,7 @@ def annotate(self, text: str) -> List[Annotation]:
 
         doc = self.model.get_entities(
             text,
-            addl_info=["cui2icd10", "cui2ontologies", "cui2snomed", "cui2athena_ids"],
+            addl_info=["cui2icd10", "cui2opcs4", "cui2ontologies", "cui2snomed", "cui2athena_ids"],
         )
         return [load_pydantic_object_from_dict(Annotation, record) for record in self.get_records_from_doc(doc)]
 
@@ -186,7 +186,7 @@ def batch_annotate(self, texts: List[str]) -> List[List[Annotation]]:
             self._data_iterator(texts),
             batch_size_chars=batch_size_chars,
             nproc=max(int(cpu_count() / 2), 1),
-            addl_info=["cui2icd10", "cui2ontologies", "cui2snomed", "cui2athena_ids"],
+            addl_info=["cui2icd10", "cui2opcs4", "cui2ontologies", "cui2snomed", "cui2athena_ids"],
         )
         docs = dict(sorted(docs.items(), key=lambda x: x[0]))
         annotations_list = []
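
Review note: the new `"cui2opcs4"` entry in `addl_info` is what lets the OPCS-4 service find an `opcs4` field on each entity. The sketch below shows the naming relationship under one assumption: that the entity key is the mapping name with its `cui2` prefix removed. That assumption is consistent with `OPCS4_KEY = "opcs4"` and the `athena_ids` column used by the new service, but the helper `output_key` is hypothetical and is not MedCAT's own code.

```python
ADDL_INFO = ["cui2icd10", "cui2opcs4", "cui2ontologies", "cui2snomed", "cui2athena_ids"]


def output_key(addl_name: str) -> str:
    """Hypothetical helper: drop a leading 'cui2' to get the entity field name."""
    prefix = "cui2"
    if addl_name.startswith(prefix):
        return addl_name[len(prefix):]
    return addl_name


# Field names an entity record would carry under the assumption above.
entity_keys = [output_key(name) for name in ADDL_INFO]
```

If the concept database has no `cui2opcs4` mapping, the `opcs4` field would presumably be empty for every entity, and the OPCS-4 service would return no annotations.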
Lines changed: 121 additions & 0 deletions
@@ -0,0 +1,121 @@
+import logging
+import pandas as pd
+from typing import Dict, Optional, final, List
+
+from app import __version__ as app_version
+from app.model_services.medcat_model import MedCATModel
+from app.config import Settings
+from app.domain import ModelCard, ModelType
+
+logger = logging.getLogger("cms")
+
+
+@final
+class MedCATModelOpcs4(MedCATModel):
+    """A model service for MedCAT OPCS-4 models."""
+
+    OPCS4_KEY = "opcs4"
+
+    def __init__(
+        self,
+        config: Settings,
+        model_parent_dir: Optional[str] = None,
+        enable_trainer: Optional[bool] = None,
+        model_name: Optional[str] = None,
+        base_model_file: Optional[str] = None,
+    ) -> None:
+        """
+        Initialises the MedCAT OPCS-4 model service with specified configurations.
+
+        Args:
+            config (Settings): The configuration for the model service.
+            model_parent_dir (Optional[str]): The directory where the model package is stored. Defaults to None.
+            enable_trainer (Optional[bool]): The flag to enable or disable trainers. Defaults to None.
+            model_name (Optional[str]): The name of the model. Defaults to None.
+            base_model_file (Optional[str]): The model package file name. Defaults to None.
+        """
+        super().__init__(
+            config,
+            model_parent_dir=model_parent_dir,
+            enable_trainer=enable_trainer,
+            model_name=model_name,
+            base_model_file=base_model_file,
+        )
+        self.model_name = model_name or "OPCS-4 MedCAT model"
+
+    @property
+    def api_version(self) -> str:
+        """Getter for the API version of the model service."""
+
+        # APP version is used although each model service could have its own API versioning
+        return app_version
+
+    def info(self) -> ModelCard:
+        """
+        Retrieves information about the MedCAT OPCS-4 model.
+
+        Returns:
+            ModelCard: A card containing information about the MedCAT OPCS-4 model.
+        """
+
+        return ModelCard(
+            model_description=self.model_name,
+            model_type=ModelType.MEDCAT_OPCS4,
+            api_version=self.api_version,
+            model_card=self.model.get_model_card(as_dict=True),
+        )
+
+    def get_records_from_doc(self, doc: Dict) -> List[Dict]:
+        """
+        Extracts and formats entity records from a document dictionary.
+
+        Args:
+            doc (Dict): The document dictionary containing extracted named entities.
+
+        Returns:
+            List[Dict]: A list of formatted entity records.
+        """
+
+        df = pd.DataFrame(doc["entities"].values())
+
+        if df.empty:
+            df = pd.DataFrame(columns=["label_name", "label_id", "start", "end", "accuracy"])
+        else:
+            new_rows = []
+            for _, row in df.iterrows():
+                if self.OPCS4_KEY not in row or not row[self.OPCS4_KEY]:
+                    logger.debug("No mapped OPCS-4 code associated with the entity: %s", row)
+                else:
+                    for opcs4 in row[self.OPCS4_KEY]:
+                        output_row = row.copy()
+                        if isinstance(opcs4, str):
+                            output_row[self.OPCS4_KEY] = opcs4
+                        elif isinstance(opcs4, dict):
+                            output_row[self.OPCS4_KEY] = opcs4.get("code")
+                            output_row["pretty_name"] = opcs4.get("name")
+                        elif isinstance(opcs4, list) and opcs4:
+                            output_row[self.OPCS4_KEY] = opcs4[-1]
+                        else:
+                            logger.error("Unknown format for the OPCS-4 code(s): %s", opcs4)
+                        if "athena_ids" in output_row and output_row["athena_ids"]:
+                            output_row["athena_ids"] = [
+                                athena_id["code"] for athena_id in output_row["athena_ids"]
+                            ]
+                        new_rows.append(output_row)
+            if new_rows:
+                df = pd.DataFrame(new_rows)
+                df.rename(
+                    columns={
+                        "pretty_name": "label_name",
+                        self.OPCS4_KEY: "label_id",
+                        "types": "categories",
+                        "acc": "accuracy",
+                        "athena_ids": "athena_ids",
+                    },
+                    inplace=True,
+                )
+                df = self._retrieve_meta_annotations(df)
+            else:
+                df = pd.DataFrame(columns=["label_name", "label_id", "start", "end", "accuracy"])
+        records = df.to_dict("records")
+        return records
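
Review note: `get_records_from_doc` emits one record per mapped OPCS-4 code and drops entities that have no mapping. The sketch below restates that expansion in plain Python without pandas, so the behaviour is easier to follow. The function name `expand_by_opcs4`, the sample document, and the codes `A00.1` and `A00.2` are illustrative placeholders, not real model output. Unlike the service, which logs an error and keeps the row when a code has an unknown format, this sketch skips such rows.

```python
from typing import Any, Dict, List

OPCS4_KEY = "opcs4"


def expand_by_opcs4(entities: Dict[int, Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return one record per mapped OPCS-4 code; entities without codes are dropped."""
    records: List[Dict[str, Any]] = []
    for entity in entities.values():
        for code in entity.get(OPCS4_KEY) or []:
            record = dict(entity)  # copy so each code gets its own record
            if isinstance(code, str):
                record[OPCS4_KEY] = code
            elif isinstance(code, dict):
                record[OPCS4_KEY] = code.get("code")
                record["pretty_name"] = code.get("name")
            elif isinstance(code, list) and code:
                record[OPCS4_KEY] = code[-1]
            else:
                continue  # unknown format: skipped in this sketch
            records.append(record)
    return records


# Placeholder document shaped like the `doc` argument of the service method.
doc = {
    "entities": {
        0: {
            "pretty_name": "Example procedure",
            "start": 0,
            "end": 17,
            "acc": 0.9,
            "opcs4": ["A00.1", {"code": "A00.2", "name": "Example procedure B"}],
        },
        1: {"pretty_name": "Unmapped entity", "start": 20, "end": 35, "acc": 0.8, "opcs4": []},
    }
}
records = expand_by_opcs4(doc["entities"])
```

Here the first entity yields two records, one per code, and the second yields none because its `opcs4` list is empty.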

app/registry.py

Lines changed: 2 additions & 0 deletions
@@ -3,6 +3,7 @@
 from app.model_services.medcat_model_snomed import MedCATModelSnomed
 from app.model_services.medcat_model_umls import MedCATModelUmls
 from app.model_services.medcat_model_icd10 import MedCATModelIcd10
+from app.model_services.medcat_model_opcs4 import MedCATModelOpcs4
 from app.model_services.medcat_model_deid import MedCATModelDeIdentification
 from app.model_services.huggingface_ner_model import HuggingFaceNerModel
 from app.model_services.huggingface_llm_model import HuggingFaceLlmModel
@@ -11,6 +12,7 @@
     ModelType.MEDCAT_SNOMED: MedCATModelSnomed,
     ModelType.MEDCAT_UMLS: MedCATModelUmls,
     ModelType.MEDCAT_ICD10: MedCATModelIcd10,
+    ModelType.MEDCAT_OPCS4: MedCATModelOpcs4,
     ModelType.MEDCAT_DEID: MedCATModelDeIdentification,
     ModelType.ANONCAT: MedCATModelDeIdentification,
     ModelType.TRANSFORMERS_DEID: TransformersModelDeIdentification,
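
Review note: the registry maps each `ModelType` member to its service class, so a `--model-type medcat_opcs4` value can be resolved to `MedCATModelOpcs4`. A minimal sketch of that lookup follows. The dictionary name `SERVICE_REGISTRY`, the helper `resolve_service`, and the empty stand-in classes are hypothetical, since the hunk does not show the name of the real mapping.

```python
from enum import Enum
from typing import Dict, Type


class ModelType(str, Enum):
    # Only two of the members from app/domain.py are reproduced here.
    MEDCAT_ICD10 = "medcat_icd10"
    MEDCAT_OPCS4 = "medcat_opcs4"


class MedCATModelIcd10:
    """Stand-in for the real ICD-10 service class."""


class MedCATModelOpcs4:
    """Stand-in for the real OPCS-4 service class."""


# Hypothetical registry name; the real mapping lives in app/registry.py.
SERVICE_REGISTRY: Dict[ModelType, Type] = {
    ModelType.MEDCAT_ICD10: MedCATModelIcd10,
    ModelType.MEDCAT_OPCS4: MedCATModelOpcs4,
}


def resolve_service(model_type_value: str) -> Type:
    """Convert the CLI string into the enum, then look up its service class."""
    return SERVICE_REGISTRY[ModelType(model_type_value)]
```

An unknown string raises `ValueError` at the `ModelType(...)` conversion, before the dictionary is consulted.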

app/utils.py

Lines changed: 1 addition & 0 deletions
@@ -63,6 +63,7 @@ def get_code_base_uri(model_name: str) -> Optional[str]:
     code_base_uris = {
         CodeType.SNOMED.value: "http://snomed.info/id",
         CodeType.ICD10.value: "https://icdcodelookup.com/icd-10/codes",
+        CodeType.OPCS4.value: "https://nhsengland.kahootz.com/t_c_home/view?objectID=14270896",
         CodeType.UMLS.value: "https://uts.nlm.nih.gov/uts/umls/concept",
     }
     for code_name, base_uri in code_base_uris.items():
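
Review note: the hunk is cut off at the `for` loop, so the matching rule is not visible here. The sketch below shows one plausible way such a lookup could work, assuming the code-type values are strings like `"OPCS-4"` and that the match is a case-insensitive substring test against the model name. Both are assumptions, and `lookup_code_base_uri` is a hypothetical name rather than the project's function.

```python
from typing import Optional

# Assumed CodeType values; the URIs are the ones listed in the diff.
CODE_BASE_URIS = {
    "SNOMED": "http://snomed.info/id",
    "ICD-10": "https://icdcodelookup.com/icd-10/codes",
    "OPCS-4": "https://nhsengland.kahootz.com/t_c_home/view?objectID=14270896",
    "UMLS": "https://uts.nlm.nih.gov/uts/umls/concept",
}


def lookup_code_base_uri(model_name: str) -> Optional[str]:
    """Return the base URI of the first code type named in the model name."""
    lowered = model_name.lower()
    for code_name, base_uri in CODE_BASE_URIS.items():
        if code_name.lower() in lowered:
            return base_uri
    return None  # no known code type mentioned in the name
```

With this rule, the default name set by the new service, `"OPCS-4 MedCAT model"`, would resolve to the OPCS-4 URI, while a custom name that omits the code type would resolve to `None`.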
