Commit 693b130

Merge pull request #3 from sdsc-ordes/feat/presidio-config
feat: build images and manifests
2 parents 89d4c46 + 32def17 commit 693b130

29 files changed, with 680 additions and 236 deletions
Lines changed: 95 additions & 0 deletions
@@ -0,0 +1,95 @@
+name: Presidio Docker Build
+
+on:
+  push:
+    branches: [ main ]
+  workflow_dispatch:
+
+env:
+  REGISTRY_NAME: ghcr.io # SDSC ADD-ON
+  USERNAME: ${{ github.repository_owner }}
+  TAG: gha${{ github.run_number }}
+
+jobs:
+  build-platform-images:
+    name: Build ${{ matrix.image }} (${{ matrix.platform }})
+    runs-on: ${{ matrix.runner }}
+    strategy:
+      matrix:
+        include:
+          - image: presidio-anonymizer
+            platform: linux/amd64
+            runner: ubuntu-latest
+          - image: presidio-analyzer
+            platform: linux/amd64
+            runner: ubuntu-latest
+          # Note: do we want this part of presidio ? Maybe future feature ?
+          # - image: presidio-image-redactor
+          #   platform: linux/amd64
+          #   runner: ubuntu-latest
+    steps:
+      # SDSC ADD-ON
+      - name: Get latest Presidio release tag
+        id: presidio_release
+        run: |
+          tag=$(curl -s https://api.github.com/repos/microsoft/presidio/releases/latest | jq -r .tag_name)
+          echo "tag=$tag" >> "$GITHUB_OUTPUT"
+
+      # SDSC ADD-ON
+      - name: Checkout Presidio (latest release)
+        uses: actions/checkout@v5
+        with:
+          repository: microsoft/presidio
+          ref: ${{ steps.presidio_release.outputs.tag }}
+
+      - name: Set up Docker Buildx
+        uses: docker/setup-buildx-action@v3
+
+      # SDSC ADD-ON
+      # https://github.com/docker/login-action
+      - name: Log in to the Container registry
+        uses: docker/login-action@v3.0.0
+        with:
+          registry: ${{ env.REGISTRY_NAME }}
+          username: ${{ github.actor }}
+          password: ${{ secrets.GITHUB_TOKEN }}
+
+      - name: Build and Push ${{ matrix.image }} for ${{ matrix.platform }}
+        run: |
+          # Create platform-specific tag
+          PLATFORM_TAG=$(echo "${{ matrix.platform }}" | sed 's/\//-/g')
+          docker buildx build \
+            --platform ${{ matrix.platform }} \
+            --push \
+            --tag ${{ env.REGISTRY_NAME }}/${{ env.USERNAME }}/${{ matrix.image }}:${{ env.TAG }}-${PLATFORM_TAG} \
+            --cache-from type=registry,ref=${{ env.REGISTRY_NAME }}/${{ env.USERNAME }}/${{ matrix.image }}:latest \
+            --cache-to type=inline \
+            ./${{ matrix.image }}
+
+  create-manifests:
+    name: Create Multi-Platform Manifests
+    runs-on: ubuntu-latest
+    needs: build-platform-images
+    steps:
+      # SDSC ADD-ON
+      # https://github.com/docker/login-action
+      - name: Log in to the Container registry
+        uses: docker/login-action@v3.0.0
+        with:
+          registry: ${{ env.REGISTRY_NAME }}
+          username: ${{ github.actor }}
+          password: ${{ secrets.GITHUB_TOKEN }}
+
+      - name: Set up Docker Buildx
+        uses: docker/setup-buildx-action@v3
+
+      - name: Create all multi-platform manifests
+        run: |
+          IMAGES=("presidio-anonymizer" "presidio-analyzer")
+
+          for image in "${IMAGES[@]}"; do
+            echo "Creating manifest for $image"
+            docker buildx imagetools create \
+              --tag ${{ env.REGISTRY_NAME }}/${{ env.USERNAME }}/${image}:${{ env.TAG }} \
+              ${{ env.REGISTRY_NAME }}/${{ env.USERNAME }}/${image}:${{ env.TAG }}-linux-amd64
+          done
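
The build job and the manifest job have to agree on image tag naming, since the manifest step looks up the `-linux-amd64` tag that the build step pushed. The following is a minimal Python sketch of that naming scheme, not part of the workflow itself. The owner `example-org` and run number `42` are placeholders chosen for illustration.

```python
def platform_tag(platform: str) -> str:
    """Replace every slash with a dash, as the workflow's sed call does."""
    return platform.replace("/", "-")


def platform_image_ref(registry: str, owner: str, image: str, tag: str, platform: str) -> str:
    """Reference pushed by the build job for one image and one platform."""
    return f"{registry}/{owner}/{image}:{tag}-{platform_tag(platform)}"


def manifest_image_ref(registry: str, owner: str, image: str, tag: str) -> str:
    """Reference created by the manifest job (no platform suffix)."""
    return f"{registry}/{owner}/{image}:{tag}"


# Placeholder values: "example-org" and run number 42 are illustrative only.
source = platform_image_ref("ghcr.io", "example-org", "presidio-analyzer", "gha42", "linux/amd64")
target = manifest_image_ref("ghcr.io", "example-org", "presidio-analyzer", "gha42")
print(f"{target} <- {source}")
```

A manifest can only be created for an image whose platform-specific reference was actually pushed, so the image list in the manifest job has to match the build matrix.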

.gitignore

Lines changed: 6 additions & 0 deletions
@@ -1,4 +1,10 @@
 .direnv/
+
+# third party manifests
+external/helm/*
+external/ytt/*
+external/.vendir*
+
 # Byte-compiled / optimized / DLL files
 __pycache__/
 *.py[codz]

docs/presidio-poc.md

Lines changed: 22 additions & 15 deletions
@@ -8,14 +8,17 @@
 - can be deployed as an API server using a compose stack

 ## API usage
+
 2-steps:
+
 - analyze: NER from raw text using models
 - anonymize: config (rule) based processing of pre-detected PII

 ### analyze
+
 - Minimal requirements: text + language. By default, all recognizers for that language are enabled.
 ```sh
-$ curl http://localhost:5002/analyze -s --header "Content-Type: application/json" --request POST --data '{"text": "John Smith drivers license is AC432223","language": "en"}' | jq
+$ curl http://localhost:5002/analyze -s --header "Content-Type: application/json" --request POST --data '{"text": "John Smith drivers license is AC432223","language": "en"}' | jq
 [
   {
     "analysis_explanation": null,
@@ -33,19 +36,22 @@
   }
 ]
 ```
-- analysis can be controlled by setting detection score, selecting entities, adding context words and adding a correlation id(?)
+- analysis can be controlled by setting detection score, selecting entities, adding context words and adding a correlation id(?)
 - ad-hoc pattern (regex) recognizers can be provided as json objects
 - a correlation-id (hash) can be given to append to logs for easier grouping of analyses in logs / traces.

 ### anonymize
+
 - By default, the anonymization replaces all detected identifiers by their type (e.g. <PERSON>) in the input text.
 - An anonymizer dictionary can be provided to associate specific anonymization procedures with specific entity types.
 - Two inputs must be given to the endpoint:
   - the raw text
   - the response from the analyze step (detected entities and their positions)
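
The two-step flow described above can be sketched in Python with only the standard library. This is an illustration under stated assumptions, not the project's client code. The analyzer URL comes from the curl example. The anonymizer URL (port 5001) is an assumption about the compose stack. The optional field names (`score_threshold`, `entities`, `context`, `correlation_id`, `ad_hoc_recognizers`, `anonymizers`) follow the Presidio REST API as I understand it and should be checked against the deployed version.

```python
import json
import urllib.request
from typing import Any, Dict, List, Optional

ANALYZER_URL = "http://localhost:5002/analyze"      # taken from the curl example
ANONYMIZER_URL = "http://localhost:5001/anonymize"  # assumption: default compose port


def build_analyze_payload(
    text: str,
    language: str = "en",
    score_threshold: Optional[float] = None,
    entities: Optional[List[str]] = None,
    context: Optional[List[str]] = None,
    correlation_id: Optional[str] = None,
    ad_hoc_recognizers: Optional[List[Dict[str, Any]]] = None,
) -> Dict[str, Any]:
    """Build the analyze request body; only text and language are required."""
    payload: Dict[str, Any] = {"text": text, "language": language}
    optional = {
        "score_threshold": score_threshold,
        "entities": entities,
        "context": context,
        "correlation_id": correlation_id,
        "ad_hoc_recognizers": ad_hoc_recognizers,
    }
    # Leave out unset options so the server applies its own defaults.
    payload.update({key: value for key, value in optional.items() if value is not None})
    return payload


def build_anonymize_payload(
    text: str,
    analyzer_results: List[Dict[str, Any]],
    anonymizers: Optional[Dict[str, Dict[str, Any]]] = None,
) -> Dict[str, Any]:
    """Build the anonymize request body from the raw text and the analyze response."""
    payload: Dict[str, Any] = {"text": text, "analyzer_results": analyzer_results}
    if anonymizers is not None:
        payload["anonymizers"] = anonymizers
    return payload


def post_json(url: str, payload: Dict[str, Any]) -> Any:
    """POST a JSON body and decode the JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def analyze_then_anonymize(text: str, language: str = "en") -> Any:
    """Run both steps against a running stack (needs the services to be up)."""
    results = post_json(ANALYZER_URL, build_analyze_payload(text, language))
    return post_json(ANONYMIZER_URL, build_anonymize_payload(text, results))
```

With the compose stack running, `analyze_then_anonymize("John Smith drivers license is AC432223")` would send the same text as the curl example through both endpoints. Passing an `anonymizers` dictionary such as `{"PERSON": {"type": "replace", "new_value": "<NAME>"}}` to `build_anonymize_payload` is one way to override the default replacement for a single entity type.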

 ### artificial sample
+
 Input:
+
 ```
 Prof. Gérard Waeber, Chef de service
 Tél: +41 21 314 68 85 / Fax: +41 21 314 08 95
@@ -77,8 +83,10 @@ jfldéijf
 Dr Médecin 00 Formateur
 Chef de clinique
 ```
+
 - ## initial tests
-Works with example artificial lettre de sortie (discharge letter).
+Works with example artificial lettre de sortie (discharge letter).
+
 ```python
 import json
 import requests
@@ -129,7 +137,9 @@ print(
 ## limitations

 ### potential improvements
+
 Model configuration
+
 ```yaml
 # config.yaml
 nlp_engine_name: spacy
@@ -157,30 +167,28 @@ ner_model_configuration:
 ```

 Recognizer configuration
+
 ```yaml
 # recognizers.yaml
 recognizers:
-  -
-    name: "Swiss Zip code Recognizer"
+  - name: "Swiss Zip code Recognizer"
     supported_languages:
       - language: fr
         context: [adresse, postal]
       - language: de
-        context: [ort,]
+        context: [ort]
       - language: it
         context: [...]

     patterns:
-      -
-        name: "zip code (weak)"
-        regex: "(\\b\\d{5}(?:\\-\\d{4})?\\b)"
-        score: 0.01
+      - name: "zip code (weak)"
+        regex: "(\\b\\d{5}(?:\\-\\d{4})?\\b)"
+        score: 0.01
     context:
-      - zip
-      - code
+      - zip
+      - code
     supported_entity: "ZIP"
-  -
-    name: "Titles recognizer"
+  - name: "Titles recognizer"
     supported_language: "en"
     supported_entity: "TITLE"
     deny_list:
@@ -190,5 +198,4 @@ recognizers:
       - Miss
       - Dr.
       - Prof.
-
 ```
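
A quick way to sanity-check the `zip code (weak)` pattern before wiring it into a recognizer is to run it with Python's `re` module. The sketch below copies the pattern from `recognizers.yaml` above, with the YAML escapes resolved. It suggests the pattern targets five-digit (US style) codes, whereas Swiss postal codes have four digits. The `SWISS_ZIP` pattern and both sample strings are hypothetical additions for illustration and are not part of the source configuration.

```python
import re
from typing import List, Pattern

# Pattern from recognizers.yaml, with the YAML double-quote escapes resolved.
ZIP_WEAK = re.compile(r"(\b\d{5}(?:\-\d{4})?\b)")

# Hypothetical alternative for four-digit Swiss postal codes (assumption,
# not part of the source configuration).
SWISS_ZIP = re.compile(r"\b[1-9]\d{3}\b")


def find_codes(pattern: Pattern[str], text: str) -> List[str]:
    """Return every full match of the pattern in the text."""
    return [match.group(0) for match in pattern.finditer(text)]


# Made-up sample strings, for illustration only.
us_text = "Mail it to 90210 or 12345-6789."
ch_text = "Adresse postale: 1011 Lausanne"

print(find_codes(ZIP_WEAK, us_text))   # five-digit codes are found
print(find_codes(ZIP_WEAK, ch_text))   # the four-digit code is missed
print(find_codes(SWISS_ZIP, ch_text))  # the hypothetical pattern finds it
```

A four-digit pattern on its own would also match years and fragments of phone numbers, which is presumably why the configuration pairs a very low score with context words.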

docs/services.md

Lines changed: 88 additions & 0 deletions
@@ -0,0 +1,88 @@
+# Services management
+
+The deployment defines multiple services (or applications), each being a
+collection of Kubernetes manifests located in `src/<service>/`.
+
+## Structure
+
+- `external/`: third party resources
+- `src/`: deployable manifests
+- secrets are encrypted with sops+age and persisted in `src/secrets/`
+
+Each service is structured as follows (supported tools are `ytt` and `helm`):
+
+```text
+├── external
+│   └── <tool>
+│       └── <service>/...             # <- third party templates
+└── src
+    └── <service>
+        ├── additional-manifest.yaml  # <- custom manifests for this deployment
+        ├── kustomization.yaml        # <- kustomization file to select resources
+        └── <tool>
+            ├── out/...               # <- rendered manifests
+            └── values.yaml           # <- values used for templating
+```
+
+## Templating
+
+[ytt](https://carvel.dev/ytt) is the preferred rendering engine, but helm is
+also supported as many upstream templates are distributed with
+[helm](https://helm.sh).
+
+When running `just render`, we attempt to render each service with ytt and then
+with helm and save the rendered manifests in the repository.
+
+## Deployment
+
+When deploying with `just deploy`, deployment is done with kustomize
+(`kubectl apply -k`). This means that `src` and each of its subdirectories contain
+a `kustomization.yaml` file which determines what manifests are included in the
+deployment.
+
+For example, running `just deploy src/` will recursively parse
+`src/kustomization.yaml` and the `kustomization.yaml` from each resource
+declared in that file. This makes it easy to exclude services or manifests by
+commenting them out of `kustomization.yaml`.
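
The recursive selection described above can be modelled in a few lines of Python. This is a simplified sketch of the idea, not how kustomize is implemented. The dictionary stands in for the `resources:` lists of the `kustomization.yaml` files, and the entry `ytt/out/deployment.yaml` is a hypothetical file name used for illustration.

```python
import posixpath
from typing import Dict, List


def collect_manifests(directory: str, kustomizations: Dict[str, List[str]]) -> List[str]:
    """Follow `resources` entries recursively and return the selected manifest paths."""
    manifests: List[str] = []
    for entry in kustomizations[directory]:
        path = posixpath.normpath(posixpath.join(directory, entry))
        if path in kustomizations:
            # The entry is a directory with its own kustomization file: recurse.
            manifests.extend(collect_manifests(path, kustomizations))
        else:
            # The entry is a plain manifest file.
            manifests.append(path)
    return manifests


# In-memory stand-in for the kustomization files (file names are hypothetical).
tree = {
    "src": ["./presidio"],
    "src/presidio": ["additional-manifest.yaml", "ytt/out/deployment.yaml"],
}
print(collect_manifests("src", tree))

# Commenting a service out of src/kustomization.yaml corresponds to dropping its entry.
without_presidio = dict(tree, src=[])
print(collect_manifests("src", without_presidio))
```

In this model, removing one entry from a parent list excludes the whole subtree below it, which mirrors the effect of commenting a service out.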
+
+## Updating a service
+
+Here is the typical workflow to re-deploy a service that has been updated
+upstream.
+
+1. Update the external manifest templates. This will update the `vendir` lock
+   file and fetch the latest templates into `external/<tool>/<service>`.
+
+   ```bash
+   just external::refresh
+   ```
+
+2. Render the manifests with the new templates.
+
+   ```bash
+   just render
+   ```
+
+   > [!NOTE]
+   > This may fail if the new templates broke compatibility with existing values,
+   > in which case you will need to update your values in
+   > `src/<service>/<tool>/values.yaml`. Also watch out in case the upstream added
+   > new template files, as you may need to include them in the service
+   > `kustomization.yaml`.
+
+3. Deploy the updated manifests.
+
+   ```bash
+   just deploy src/<service>
+   ```
+
+> [!IMPORTANT]
+> In some cases, you may want to manually delete resources related to the
+> service. You can achieve that with `just delete src/<service>` or use
+> `kubectl delete` to delete specific resources.
+
+## Adding custom manifests
+
+Custom manifests (e.g. additional volumes) can be added inside `src/<service>/`,
+but they need to be added as a resource in the `kustomization.yaml` file in the same
+directory.

external/vendir.lock.yml

Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
+apiVersion: vendir.k14s.io/v1alpha1
+directories:
+- contents:
+  - git:
+      commitTitle: Add label to external PRs (#1707)...
+      sha: af1c524460ad62e17313520a3cbb618b062b75cb
+      tags:
+      - 2.2.360
+    path: .
+  path: ytt/presidio
+kind: LockConfig

external/vendir.yaml

Lines changed: 20 additions & 0 deletions
@@ -0,0 +1,20 @@
+apiVersion: vendir.k14s.io/v1alpha1
+kind: Config
+directories:
+  - path: ytt/presidio
+    contents:
+      - path: .
+        git:
+          url: https://github.com/microsoft/presidio
+          ref: refs/tags/2.2.360
+        newRootPath: docs/samples/deployments/k8s/charts/presidio
+  # - path: helm/presidio
+  #   contents:
+  #     - path: .
+  #       helmChart:
+  #         name: presidio
+  #         version: 2.2.360
+  #       git:
+  #         url: https://github.com/microsoft/presidio
+  #         ref: refs/tags/2.2.360
+  #         subPath: docs/samples/deployments/k8s/charts/presidio

justfile

Lines changed: 8 additions & 1 deletion
@@ -33,11 +33,18 @@ render-ytt dir="src":
     fd '^ytt$' {{dir}} \
       -x sh -c 'ytt -f {}/values.yaml -f external/ytt/$(basename {//}) --output-files {}/out'

+# Render when the code was pulled in via ytt but is a helm template
+[private]
+render-ytt-extract-helm-template dir="src":
+    # render helm templates fetched under external/ytt with our values into src/<service>/helm/out
+    fd '^helm$' {{dir}} \
+      -x sh -c 'helm template $(basename {//}) external/ytt/$(basename {//}) -f {}/values.yaml --output-dir {}/out'
+
 # Render manifests
 render dir="src":
     just fetch && \
-    just render-helm {{dir}} && \
     just render-ytt {{dir}} && \
+    just render-ytt-extract-helm-template {{dir}} && \
     just format

 # Apply manifests in dir to the cluster.

src/kustomization.yaml

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
+apiVersion: kustomize.config.k8s.io/v1beta1
+kind: Kustomization
+resources:
+  - ./presidio
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+supported_languages:
+  - en
+default_score_threshold: 0
