Commit 2957800

Migrate main branch from MedCATtutorials to cogstack-nlp
2 parents e55bf6c + 5858ab0 commit 2957800

45 files changed, +215103 -0 lines changed

6 KB
Binary file not shown.
Lines changed: 42 additions & 0 deletions
@@ -0,0 +1,42 @@
name: build

on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]

jobs:
  main:

    runs-on: ubuntu-24.04
    strategy:
      matrix:
        part: [
          introductory/Part_2_*.ipynb,
          introductory/Part_3_*.ipynb,
          introductory/Part_4_*.ipynb,
          introductory/Part_5_*.ipynb,
          introductory/Part_1_*.ipynb,
          introductory/Part_6_*.ipynb,
          specialised/Comparing_Models_with_RegressionSuite.ipynb # this should work still
          # specialised/*.ipynb # To make it run, the SnomedCT file needs to be mocked
        ]

    steps:
      - uses: actions/checkout@v4
      - name: Setup Python
        uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - name: Install dependencies
        run: |
          pip install -U pip
          pip install -r requirements-dev.txt
      - name: Install IPython kernel
        run: |
          python -m ipykernel install --name smoketests --user
      - name: Smoke test tutorial
        run: |
          pytest --collect-only --nbmake ./notebooks/${{ matrix.part }}
          pytest --nbmake -n=auto --nbmake-kernel=smoketests --nbmake-timeout=1800 ./notebooks/${{ matrix.part }}
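
Each `matrix.part` entry in the workflow above is passed unquoted to `pytest` inside a `run:` step, so the shell expands it as a glob relative to `./notebooks/` and each job only sees the notebooks matching its pattern. The following is a minimal sketch of that expansion, not part of the commit: the scratch directory, the notebook file names, and the `expand_part` helper are all hypothetical and exist only for illustration.

```bash
#!/usr/bin/env bash
set -eu

# Hypothetical notebook names in a scratch directory, for illustration only.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
mkdir -p "$workdir/notebooks/introductory"
touch "$workdir/notebooks/introductory/Part_3_1_Example.ipynb" \
      "$workdir/notebooks/introductory/Part_3_2_Example.ipynb" \
      "$workdir/notebooks/introductory/Part_4_1_Example.ipynb"

# List the notebooks that one matrix entry would select.
# The pattern is deliberately left unquoted so the shell expands it,
# as happens in the workflow's `run:` step.
expand_part() {
  local part=$1
  ( cd "$workdir/notebooks" && ls -1 $part )
}

expand_part 'introductory/Part_3_*.ipynb'
```

A pattern that matches nothing makes `ls` fail, which is a cheap way to spot a matrix entry that no longer corresponds to any notebook.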

medcat-v1-tutorials/.gitignore

Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
#Directories to be ignored fully
/books/
/articles/
/other/
/output/
/graphics/
tmp/
*_tmp/
.ipynb_checkpoints/

# Keep folders with this
!.keep

#tmp and similar files
.nfs*
*.pyc
*.out
*.swp
*.swn
tmp_*
t_*
tmp_*
*_tmp
*.swo
*.lyx.emergency
*.lyx#
*~
*.log
*hidden*

medcat-v1-tutorials/LICENSE

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2022 MedCAT

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

medcat-v1-tutorials/README.md

Lines changed: 46 additions & 0 deletions
@@ -0,0 +1,46 @@
# MedCAT Tutorials

[![Build Status](https://github.com/CogStack/MedCATtutorials/actions/workflows/main.yml/badge.svg?branch=main)](https://github.com/CogStack/MedCATtutorials/actions/workflows/main.yml?query=branch%3Amain)

## Introductory tutorials

These tutorials walk you through each stage of a basic MedCAT project. The blog posts tell the story and explain why the steps we have chosen are necessary, while the Jupyter notebooks give you hands-on experience building and training your own MedCAT models for information extraction tasks.

| Part | Title | Google Colab | Blog Post |
| ---- |-----------------------------------------------------------------------------|------------------------------------------------------------------------------------|-----------|
| 1 | Introduction | - | [TDS](https://medium.com/@w_is_h/medcat-introduction-analyzing-electronic-health-records-e1c420afa13a) |
| 1.1 | [\[OPTIONAL\] Logging With MedCAT](https://htmlpreview.github.io/?https://github.com/CogStack/MedCATtutorials/blob/main/notebooks/introductory/Part_1_1_OPTIONAL_Logging_With_MedCAT.html) | [![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/CogStack/MedCATtutorials/blob/main/notebooks/introductory/Part_1_1_OPTIONAL_Logging_With_MedCAT.ipynb) | - |
| 2 | [Data set Preparation and Basic Statistics](https://htmlpreview.github.io/?https://github.com/CogStack/MedCATtutorials/blob/main/notebooks/introductory/Part_2_Dataset_Analysis_and_Preparation.html) | [![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/CogStack/MedCATtutorials/blob/main/notebooks/introductory/Part_2_Dataset_Analysis_and_Preparation.ipynb) | [TDS](https://medium.com/towards-data-science/medcat-dataset-analysis-and-preparation-be8bc910bd6d) |
| 3.1 | [Building a new Concept Database (CDB) and Vocabulary (Vocab)](https://htmlpreview.github.io/?https://github.com/CogStack/MedCATtutorials/blob/main/notebooks/introductory/Part_3_1_Building_a_Concept_Database_and_Vocabulary.html) | [![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/CogStack/MedCATtutorials/blob/main/notebooks/introductory/Part_3_1_Building_a_Concept_Database_and_Vocabulary.ipynb) | [TDS](https://medium.com/towards-data-science/medcat-extracting-diseases-from-electronic-health-records-f53c45b3d1c1) |
| 3.2 | [Unsupervised training and NER+L](https://htmlpreview.github.io/?https://github.com/CogStack/MedCATtutorials/blob/main/notebooks/introductory/Part_3_2_Extracting_Diseases_from_Electronic_Health_Records.html) | [![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/CogStack/MedCATtutorials/blob/main/notebooks/introductory/Part_3_2_Extracting_Diseases_from_Electronic_Health_Records.ipynb) | [TDS](https://medium.com/towards-data-science/medcat-extracting-diseases-from-electronic-health-records-f53c45b3d1c1) |
| 3.3 | [Technical model optimisations](https://htmlpreview.github.io/?https://github.com/CogStack/MedCATtutorials/blob/main/notebooks/introductory/Part_3_3_Model_technical_optimisations.html) | [![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/CogStack/MedCATtutorials/blob/main/notebooks/introductory/Part_3_3_Model_technical_optimisations.ipynb) | - |
| 4.1 | [Creating a tokenizer model (huggingface) and embeddings for MetaAnnotations](https://htmlpreview.github.io/?https://github.com/CogStack/MedCATtutorials/blob/main/notebooks/introductory/Part_4_1_ByteLevelBPETokenizer_and_Embeddings.html) | [![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/CogStack/MedCATtutorials/blob/main/notebooks/introductory/Part_4_1_ByteLevelBPETokenizer_and_Embeddings.ipynb) | - |
| 4.2 | [Supervised training and fine-tuning + Meta-annotations](https://htmlpreview.github.io/?https://github.com/CogStack/MedCATtutorials/blob/main/notebooks/introductory/Part_4_2_Supervised_Training_and_Meta_annotations.html) | [![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/CogStack/MedCATtutorials/blob/main/notebooks/introductory/Part_4_2_Supervised_Training_and_Meta_annotations.ipynb) | - |
| 4.3 | [Annotating documents with the full MedCAT pipeline with MetaAnnotations](https://htmlpreview.github.io/?https://github.com/CogStack/MedCATtutorials/blob/main/notebooks/introductory/Part_4_3_Annotating_documents_with_the_full_MedCAT_pipeline_with_MetaAnnotations.html) | [![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/CogStack/MedCATtutorials/blob/main/notebooks/introductory/Part_4_3_Annotating_documents_with_the_full_MedCAT_pipeline_with_MetaAnnotations.ipynb) | - |
| 5 | [Analysing the results](https://htmlpreview.github.io/?https://github.com/CogStack/MedCATtutorials/blob/main/notebooks/introductory/Part_5_Prevalence_of_Physical_and_Mental_Diseases.html) | [![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/CogStack/MedCATtutorials/blob/main/notebooks/introductory/Part_5_Prevalence_of_Physical_and_Mental_Diseases.ipynb) | [TDS](https://medium.com/@w_is_h/prevalence-of-physical-and-mental-diseases-450c0f4f5851) |
| 6.1 | [Supervised training Relation-annotations](https://htmlpreview.github.io/?https://github.com/CogStack/MedCATtutorials/blob/rel_cat_tutorials/notebooks/introductory/Part_6_1_Supervised_Training_Relation_Extraction.html) | - | - |
| 6.2 | [Inferring relationships from annotations](https://htmlpreview.github.io/?https://github.com/CogStack/MedCATtutorials/blob/rel_cat_tutorials/notebooks/introductory/Part_6_2_Infering_relations_from_annotations_with_Relation_toolkit.html) | - | - |

## Specialised tutorials

These tutorials expand upon specific aspects of the topics covered across the introductory tutorials. If there is anything in particular you would like us to cover in the future, let us know!

| Part | Title | Google Colab |
| ---- |-------------------------------------------------------------------|----------------------------------------------------------------------------------------------|
| - |[Working with SNOMED CT and building a custom Concept Database (CDB)](https://htmlpreview.github.io/?https://github.com/CogStack/MedCATtutorials/blob/main/notebooks/specialised/Preprocessing_SNOMED_CT.html)| [![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/CogStack/MedCATtutorials/blob/main/notebooks/specialised/Preprocessing_SNOMED_CT.ipynb)|
| - |[Comparing models using regression test tooling](https://htmlpreview.github.io/?https://github.com/CogStack/MedCATtutorials/blob/main/notebooks/specialised/Comparing_Models_with_RegressionSuite.html)| [![Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/CogStack/MedCATtutorials/blob/main/notebooks/specialised/Comparing_Models_with_RegressionSuite.ipynb)|


## Development/Editing

Make sure [jupyter](https://docs.jupyter.org/en/latest/install.html) and [jq](https://stedolan.github.io/jq/download/) are installed and available on your path. Do not modify the companion HTML versions directly; instead, install the following pre-commit hook, which regenerates them whenever you commit changes to `.ipynb` files:
```
git config --local core.hooksPath git-config/hooks
```

To inspect changes during code review, visit [Colab](https://colab.research.google.com/github/CogStack/MedCATtutorials/blob) and select the target branch and tutorial. Once it is open, click `File | Revision history` and select the start and end revisions you are interested in.


## Known issues
* If you hit a `ContextualVersionConflict` on Google Colab, restart the runtime and run the cell again.
* The pre-commit hook requires `nbconvert<6` and `jinja2<=3.0`.
Lines changed: 28 additions & 0 deletions
@@ -0,0 +1,28 @@
#!/bin/bash

set -eu

# Both tools must be on the path before the hook can proceed.
if ! command -v jupyter &> /dev/null
then
    echo "ERROR: Cannot find 'jupyter' on your path so the HTML version won't be created/updated automatically"
    echo "ERROR: Install 'jupyter' from https://docs.jupyter.org/en/latest/install.html to fix this"
    exit 1
elif ! command -v jq &> /dev/null
then
    echo "ERROR: Cannot find 'jq' on your path so the HTML version won't be created/updated automatically"
    echo "ERROR: Install 'jq' from https://stedolan.github.io/jq/download/ to fix this"
    exit 1
else
    # Staged notebooks, excluding deletions. The trailing `echo ""` keeps the
    # substitution's exit status at 0 when grep finds no match.
    notebook_paths=$(git diff --cached --name-only --diff-filter=d | grep '\.ipynb$'; echo "")

    if [ -n "$notebook_paths" ]
    then
        # Note: this loop splits on whitespace, so paths containing spaces are not supported.
        for path in $notebook_paths
        do
            # Render the HTML first, then strip outputs from the notebook itself.
            jupyter nbconvert --to html "$path"
            jupyter nbconvert --clear-output --inplace "$path"
            git add "${path%.ipynb}.html"
            git add "$path"
        done
    fi
fi
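
The hook's selection and path-mapping steps can be tried in isolation, without a repository or `jupyter`. The sketch below is an illustration under stated assumptions rather than part of the commit: `staged_notebooks` and `html_path_for` are hypothetical helpers, and the candidate paths are fed in on standard input instead of coming from `git diff --cached`.

```bash
#!/usr/bin/env bash
set -eu

# Keep only paths ending in .ipynb (hypothetical helper; the hook reads its
# input from `git diff --cached --name-only --diff-filter=d`).
staged_notebooks() {
  grep '\.ipynb$' || true   # no match is not an error under `set -e`
}

# Map a notebook path to the companion HTML path that the hook stages.
html_path_for() {
  printf '%s\n' "${1%.ipynb}.html"
}

# Made-up paths, for illustration only.
printf '%s\n' README.md notebooks/introductory/Part_2_Example.ipynb docs/ipynb_notes.txt \
  | staged_notebooks \
  | while IFS= read -r path; do
      echo "$path -> $(html_path_for "$path")"
    done
```

The anchored pattern `\.ipynb$` skips a file such as `docs/ipynb_notes.txt`, which an unanchored `.ipynb` pattern would match because `.` stands for any character in a regular expression.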
