
Commit `0955ffc` (parent: `1543d1c`)

> docs(k8s): document k8s deployement with links to repo and argocd
1 file changed: `submission-snapshot/README.md` (+18 −17 lines)
````diff
@@ -16,23 +16,19 @@ the JSONs from S3 instead of hooking on a live API.
 The loaded data is organised like so in the S3 bucket:
 ```bash
 <DESTINATION__FILESYSTEM__BUCKET_URL>
-├── category               # "category" resource from /category endpoint
-│   └── 2026-02-16
-│       └── 1771268036.7864842.3722039a90.jsonl  # JSONL data from /category for 2026-02-16
-├── category_data          # "category_data" resource from /data/category/{categoryId}
-│   └── 2026-02-16
-│       └── 1771268036.7864842.4a41d98fad.jsonl  # JSONL data from /data/category/{categoryId} for 2026-02-16
-├── _dlt_loads             # One file per pipeline run (load), describes the load
-│   └── submission_source__1771268036.7864842.jsonl
-├── _dlt_pipeline_state    # Pipeline state files
-│   └── submission-snapshot__1771267844.1206408__998e553c0cea456594bce118ab30fc8850159efc09fbfb1e5179df2b13293c46.jsonl
-├── _dlt_version           # Dataset schema versioning
-│   └── submission_source__1771267974.1898882__998e553c0cea456594bce118ab30fc8850159efc09fbfb1e5179df2b13293c46.jsonl
+├── category
+│   └── 2026-03-03-data.jsonl  # JSONL data from /category for 2026-03-03
+├── category_data
+│   └── 2026-03-03-data.jsonl  # JSONL data from /data/category/{categoryId} for 2026-03-03
+├── _dlt_loads             # Pipeline run metadata files
+├── _dlt_pipeline_state    # Pipeline state files
+├── _dlt_version           # Dataset schema versioning
 └── init
 ```
 
 > [!NOTE]
-> We include the `Category.id` and `Category.studyId` values from the `/category` endpoint in the
+> We include the `Category.id` and `Category.studyId` values from the `/category` endpoint in the `category_data` items,
+> so that downstream ingestions can take the full JSONL file and load each item in the appropriate dataset.
 
 ### Getting a Submission API OIDC bearer token
 
````
```diff
@@ -67,7 +63,7 @@ base_url = "<BASE SUBMISSION API URL>"
 [destination.filesystem]
 # s3 bucket, use 'file://<ABSOLUTE PATH>' to use the local filesystem
 bucket_url = "s3://<BUCKET NAME>"  # replace with bucket name/path
-layout = "{table_name}/{YYYY}-{MM}-{DD}/{load_id}.{file_id}.{ext}"
+layout = "{table_name}/{YYYY}-{MM}-{DD}-data.{ext}"
 
 [destination.filesystem.credentials]
 # doesn't matter if using a local filesystem
```
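The `layout` change above swaps the per-load file names for a single stable `{YYYY}-{MM}-{DD}-data.jsonl` file per table per day, which is what produces the simplified bucket tree. A minimal sketch of how those placeholders expand into an object key (a simplified illustration of the template, not dlt's actual rendering code):

```python
from datetime import datetime, timezone


def render_layout(layout: str, table_name: str, when: datetime, ext: str = "jsonl") -> str:
    """Expand the filesystem-destination layout placeholders used above.

    Simplified illustration: only the placeholders from this config are handled.
    """
    return (
        layout.replace("{table_name}", table_name)
        .replace("{YYYY}", f"{when.year:04d}")
        .replace("{MM}", f"{when.month:02d}")
        .replace("{DD}", f"{when.day:02d}")
        .replace("{ext}", ext)
    )


key = render_layout(
    "{table_name}/{YYYY}-{MM}-{DD}-data.{ext}",
    "category",
    datetime(2026, 3, 3, tzinfo=timezone.utc),
)
# key == "category/2026-03-03-data.jsonl"
```

Because the key no longer contains `{load_id}` or `{file_id}`, each day's run overwrites the same object, so consumers can fetch a predictable path per table and date.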
```diff
@@ -215,7 +211,12 @@ spec:
 
 ### Ingest snapshots into Bento with Bento-ETL
 
-TODO: Document how to do this.
+The `submission-snapshot` workflow is deployed in PCGL's `dev` cluster at the moment.
 
-~~Need to implement S3 source first.~~
-Bento-ETL S3 source has been implemented, ready to integrate.
+Kustomization base [link](https://github.com/Pan-Canadian-Genome-Library/deployment/blob/main/base/research-portal/submission-snapshots-cronjob/kustomization.yaml).
+
+| Environment | Repo location | ArgoCD Application |
+| ----------- | ------------- | ------------------ |
+| `dev`       | [Kustomization Link](https://github.com/Pan-Canadian-Genome-Library/deployment/blob/main/dev/research/submission-snapshots/kustomization.yaml) | [App link](https://argocd.ingress.dev.k8s.pcgl.dev-sd4h.ca/applications/argocd/submission-snapshots?view=tree&resource=) |
+| `staging`   | n/a | |
+| `prod`      | n/a | |
```
