Commit de5ae86

alhendrickson authored and mart-r committed
CU-8699mrvup docs: update urls throughout to point to new cogstack-nlp repo (#71)
1 parent 19b696b commit de5ae86

18 files changed: +2441 -267 lines changed

README.md

Lines changed: 1 addition & 1 deletion
@@ -2,7 +2,7 @@
22

33
[![Build Status](https://github.com/CogStack/cogstack-nlp/actions/workflows/medcat-v2_main.yml/badge.svg?branch=main)](https://github.com/CogStack/cogstack-nlp/actions/workflows/medcat-v2_main.yml/badge.svg?branch=main)
44
[![Documentation Status](https://readthedocs.org/projects/cogstack-nlp/badge/?version=latest)](https://readthedocs.org/projects/cogstack-nlp/badge/?version=latest)
5-
[![Latest release](https://img.shields.io/github/v/release/CogStack/MedCAT2)](https://github.com/CogStack/MedCAT2/releases/latest)
5+
[![Latest release](https://img.shields.io/github/v/release/CogStack/cogstack-nlp?filter=medcat/*)](https://github.com/CogStack/cogstack-nlp/releases/latest)
66
<!-- [![pypi Version](https://img.shields.io/pypi/v/medcat.svg?style=flat-square&logo=pypi&logoColor=white)](https://pypi.org/project/medcat/) -->
77

88
CogStack Natural Language Processing is for analysing clinical data using AI to draw insights from text or documents in Electronic Health Records.

anoncat-demo-app/README.md

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,6 @@
11
# Deidentify app
22

3-
Demo for AnonCAT. It uses [MedCAT](https://github.com/CogStack/MedCAT), an advanced natural language processing tool, to identify and classify sensitive information, such as names, addresses, and medical terms.
3+
Demo for AnonCAT. It uses [MedCAT](https://github.com/CogStack/cogstack-nlp/tree/main/medcat-v1), an advanced natural language processing tool, to identify and classify sensitive information, such as names, addresses, and medical terms.
44

55
## Example
66

@@ -22,7 +22,7 @@ MODEL_NAME = '<NAME OF MODEL HERE.zip>'
2222

2323
### Build your own model
2424

25-
To build your own models please follow the tutorials outlined in [MedCATtutorials](https://github.com/CogStack/MedCATtutorials)
25+
To build your own models please follow the tutorials outlined in [MedCATtutorials](https://github.com/CogStack/cogstack-nlp/tree/main/medcat-v1-tutorials)
2626

2727
*__Note:__ This is currently under development*
2828

anoncat-demo-app/app/frontend/src/App.vue

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -21,7 +21,7 @@
2121
<br>
2222
<p>Please DO NOT test with any real sensitive PHI data.</p>
2323
<br>
24-
<p>Local validation and fine-tuning available via <a href="https://github.com/CogStack/MedCATtrainer">MedCATtrainer</a>.
24+
<p>Local validation and fine-tuning available via <a href="https://github.com/CogStack/cogstack-nlp/tree/main/medcat-trainer">MedCATtrainer</a>.
2525
Email us, <a href="mailto:[email protected]">[email protected]</a>, to discuss model access, model performance, and your use case.
2626
</p>
2727
<br>

medcat-trainer/README.md

Lines changed: 18 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,18 @@
1+
# Medical <img src="https://github.com/CogStack/cogstack-nlp/blob/main/media/cat-logo.png?raw=true" width=45>oncept Annotation Tool Trainer
2+
3+
[![Build Status](https://github.com/CogStack/cogstack-nlp/actions/workflows/medcat-trainer_qa.yml/badge.svg?branch=main)](https://github.com/CogStack/cogstack-nlp/actions/workflows/medcat-trainer_qa.yml?query=branch%3Amain)
4+
[![Build Status](https://github.com/CogStack/cogstack-nlp/actions/workflows/medcat-trainer_release.yml/badge.svg)](https://github.com/CogStack/cogstack-nlp/actions/workflows/medcat-trainer_release.yml)
5+
[![Documentation Status](https://readthedocs.org/projects/cogstack-nlp-medcat-trainer/badge/?version=latest)](https://readthedocs.org/projects/cogstack-nlp-medcat-trainer/badge/?version=latest)
6+
[![Latest release](https://img.shields.io/github/v/release/CogStack/cogstack-nlp?filter=medcat-trainer/*)](https://github.com/CogStack/cogstack-nlp/releases/latest)
7+
8+
MedCATTrainer is an interface for building, improving and customising a given Named Entity Recognition
9+
and Linking (NER+L) model (MedCAT) for biomedical domain text.
10+
11+
MedCATTrainer was presented at EMNLP/IJCNLP 2019 :tada:
12+
[here](https://www.aclweb.org/anthology/D19-3024.pdf)
13+
14+
# Documentation and Discussion
15+
16+
Official docs available [here](https://docs.cogstack.org/projects/medcat-trainer)
17+
18+
If you have any questions, why not reach out to the community on the [discourse forum here](https://discourse.cogstack.org/)?
Lines changed: 74 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,74 @@
1+
# Installation
2+
MedCATtrainer is a docker-compose packaged Django application.
3+
4+
## Download from Dockerhub
5+
Clone the repo and run the default docker-compose file with the default environment variables:
6+
```shell
7+
$ git clone https://github.com/CogStack/cogstack-nlp
8+
$ cd cogstack-nlp/medcat-trainer
9+
$ docker-compose up
10+
```
11+
12+
This will use the pre-built docker images available on DockerHub. If your internal firewall does not permit access to DockerHub, you can build directly from source.
13+
14+
To check the logs of the running MedCATtrainer containers:
15+
```bash
16+
$ docker logs <containerid> | grep "\[medcattrainer\]"
17+
$ docker logs <containerid> | grep "\[bg-process\]"
18+
$ docker logs <containerid> | grep "\[db-backup\]"
19+
```
20+
21+
## MedCAT v0.x models
22+
If you have MedCAT v0.x models and want to use the trainer, please use the following docker-compose file.
23+
This references the latest built image for the trainer that is still compatible with [MedCAT v0.x](https://pypi.org/project/medcat/0.4.0.6/) and under.
24+
```shell
25+
$ docker-compose -f docker-compose-mc0x.yml up
26+
```
27+
28+
## Build images from source
29+
The above commands run the latest release of MedCATtrainer. If you'd prefer to build the Docker images from source, use:
30+
```shell
31+
$ docker-compose -f docker-compose-dev.yml up
32+
```
33+
34+
To change environment variables, such as the exposed host ports and the language of the spaCy model, use:
35+
```shell
36+
$ cp .env-example .env
37+
# Set local configuration in .env
38+
```
39+
40+
## Troubleshooting
41+
If the build fails with error code 137, the virtual machine running the docker
42+
daemon does not have enough memory. Increase the memory allocated to containers in the docker daemon
43+
settings CLI or the associated docker GUI.
44+
45+
On MAC: https://docs.docker.com/docker-for-mac/#memory
46+
47+
On Windows: https://docs.docker.com/docker-for-windows/#resources
48+
49+
### (Optional) SMTP Setup
50+
51+
Password resets and other email services require the email environment variables to be set.
52+
53+
Personal email accounts can be set up by users to do this, or you can contact someone in CogStack for a cogstack no email credentials.
54+
55+
The environment variables required are listed in [Environment Variables](#(optional)-environment-variables).
56+
57+
Environment variables are located in `envs/env` or `envs/env-prod`. Once those are set, change `VITE_APP_EMAIL` to 1 in `webapp/frontend/.env`.
58+
59+
### (Optional) Environment Variables
60+
Environment variables are used to configure the app:
61+
62+
|Parameter|Description|
63+
|---------|-----------|
64+
|MEDCAT_CONFIG_FILE|MedCAT config file as described [here](https://github.com/CogStack/cogstack-nlp/blob/main/medcat-v2/medcat/config/config.py)|
65+
|BEHIND_RP|If you're running MedCATtrainer behind a reverse proxy, use 1, otherwise this defaults to 0 i.e. False|
66+
|MCTRAINER_PORT|The port to run the trainer app on|
67+
|EMAIL_USER|Email address which will be used to send users emails regarding password resets|
68+
|EMAIL_PASS|The password or authentication key which will be used with the email address|
69+
|EMAIL_HOST|The hostname of the SMTP server which will be used to send email (default: mail.cogstack.org)|
70+
|EMAIL_PORT|The port that the SMTP server is listening to, common numbers are 25, 465, 587 (default: 465)|
71+
72+
Set these and re-run the docker-compose file.
73+
74+
You'll need to `docker stop` the running containers if you have already run the install.
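
As an illustration of the table above, a minimal `envs/env` fragment might look like the sketch below. The variable names, the default host `mail.cogstack.org` and the default port 465 come from the table. The trainer port, the email address and the password are placeholder assumptions, not values shipped with the project.

```shell
# Hypothetical values for illustration only; adjust to your deployment.

# Assumed host port for the trainer app.
MCTRAINER_PORT=8001

# Placeholder sender address and credential for password-reset emails.
EMAIL_USER=trainer@example.org
EMAIL_PASS=change-me

# Defaults as documented in the table above.
EMAIL_HOST=mail.cogstack.org
EMAIL_PORT=465
```

Comments are kept on their own lines because some env-file parsers treat trailing text on a `KEY=value` line as part of the value.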

medcat-trainer/docs/maintenance.md

Lines changed: 61 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,61 @@
1+
# Maintenance
2+
3+
MedCATtrainer is actively maintained. To ensure you receive the latest
4+
security patches for the software and its dependencies, you should regularly
5+
upgrade to the latest release.
6+
7+
The latest stable releases update the `docker-compose.yml` and `docker-compose-prod.yml` files.
8+
9+
To update these docker compose files, either copy them directly from the [repo](https://github.com/CogStack/cogstack-nlp/tree/main/medcat-trainer)
10+
or update the cloned files via:
11+
12+
```shell
13+
$ cd cogstack-nlp/medcat-trainer
14+
$ git pull
15+
$ docker-compose up
16+
# alternatively for prod releases use:
17+
$ docker-compose -f docker-compose-prod.yml up
18+
```
19+
20+
MedCATtrainer follows [Semver](https://semver.org/), so patch and minor releases should always be backwards compatible,
21+
whereas major releases, e.g. v1.x vs v2.x, signify breaking changes.
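
To make the Semver rule concrete, the sketch below flags an upgrade as potentially breaking when the major version changes. The helper names `major_of` and `is_breaking_upgrade` and the tag `v1.9.0` are hypothetical and not part of MedCATtrainer. The sketch only assumes tags of the form `vMAJOR.MINOR.PATCH`.

```shell
#!/bin/sh
# Extract the major version from a tag such as "v2.11.2" or "2.11.2".
major_of() {
    version="${1#v}"                 # drop an optional leading "v"
    printf '%s\n' "${version%%.*}"   # keep everything before the first dot
}

# Succeeds (exit status 0) when moving from $1 to $2 changes the major version.
is_breaking_upgrade() {
    [ "$(major_of "$1")" != "$(major_of "$2")" ]
}

# Example tags; v1.9.0 is a made-up starting point.
if is_breaking_upgrade "v1.9.0" "v2.11.2"; then
    echo "major version change: read the release notes and back up first"
fi
```

A check like this could run before `git pull` in an upgrade script, so that major upgrades are never applied unattended.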
22+
23+
Necessary Django DB migrations will be applied automatically between releases, which should largely be invisible to an end admin
24+
or annotation user. Nevertheless, migrating ORM / DB models and then rolling back a release can cause issues if values are defaulted
25+
or removed from a later version.
26+
27+
## Backup and Restore
28+
29+
### Backup
30+
Before updating to a new release, a backup will be created in the `DB_BACKUP_DIR`, as configured in `envs/env`.
31+
A further crontab runs the same backup script at 10pm every night. This does not cause any downtime and will look like
32+
this in the logs:
33+
```shell
34+
medcattrainer-medcattrainer-db-backup-1 | Found backup dir location: /home/api/db-backup and DB_PATH: /home/api/db/db.sqlite3
35+
medcattrainer-medcattrainer-db-backup-1 | Backed up existing DB to /home/api/db-backup/db-backup-2023-09-26__23-26-01.sqlite3
36+
medcattrainer-medcattrainer-db-backup-1 | To restore this backup use $ ./restore.sh /home/api/db-backup/db-backup-2023-09-26__23-26-01.sqlite3
37+
```
38+
39+
A backup is also performed automatically each time the service starts and whenever migrations are performed, in case a new release
40+
introduces a breaking change and corrupts a DB.
41+
42+
### Restore
43+
If a DB is corrupted or needs to be restored from an existing backup, use the following commands while the service is running:
44+
45+
```shell
46+
$ docker ps
47+
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
48+
a2489b0c681b cogstacksystems/medcat-trainer-nginx:v2.11.2 "/docker-entrypoint.…" 4 days ago Up 4 days 80/tcp, 0.0.0.0:8001->8000/tcp, :::8001->8000/tcp medcattrainer-nginx-1
49+
20fed153d798 solr:8 "docker-entrypoint.s…" 4 days ago Up 4 days 0.0.0.0:8983->8983/tcp, :::8983->8983/tcp mct_solr
50+
2b250a0975fe cogstacksystems/medcat-trainer:v2.11.2 "/home/run.sh" 4 days ago Up 4 days medcattrainer-medcattrainer-1
51+
$ docker exec -it 2b250a0975fe bash
52+
root@2b250a0975fe:/home/api# cd ..
53+
$ restore_db.sh db-backup-2023-09-25__23-21-39.sqlite3 # run the restore_db.sh script
54+
Found backup dir location: /home/api/db-backup, found db path: home/api/db/db.sqlite3
55+
DB file to restore: db-backup-2023-09-25__23-21-39.sqlite3
56+
Found db-backup-2023-09-25__23-21-39.sqlite3 - y to confirm backup: y # you'll need to confirm this is the correct file to restore.
57+
Restored db-backup-2023-09-25__23-21-39.sqlite3 to /home/db/db.sqlite3
58+
```
59+
60+
The `restore_db.sh` script will automatically restore the latest db file, if no file is specified.
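
One way to find the latest backup by hand is sketched below. This is not the implementation of `restore_db.sh`. The function `latest_backup` is a hypothetical helper that relies only on the timestamped file names shown in the logs above (`db-backup-YYYY-MM-DD__HH-MM-SS.sqlite3`), which sort chronologically.

```shell
#!/bin/sh
# Print the newest backup file in the directory given as $1.
# Returns a non-zero status if the directory holds no backups.
latest_backup() {
    backup_dir="$1"
    latest=""
    # The glob expands in sorted order, so the last match is the newest.
    for f in "$backup_dir"/db-backup-*.sqlite3; do
        [ -e "$f" ] && latest="$f"
    done
    [ -n "$latest" ] && printf '%s\n' "$latest"
}
```

Inside the trainer container, `latest_backup /home/api/db-backup` would print the most recent file in the backup directory reported by the logs, which can then be passed to `restore_db.sh` explicitly.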
61+
