Commit 576d8bd

feat(datasource): decode ingestion steps (#1208)
This PR includes the deCODE proteomics ingestion steps, along with a new `MolecularComplex` dataset built from Complex Portal reference files.
1 parent d93d4ab commit 576d8bd

75 files changed (+9311 / -52 lines)

.vscode/settings.json

Lines changed: 1 addition & 0 deletions
@@ -39,6 +39,7 @@
 "colocalisation",
 "contig",
 "diffpval",
+"Ensembl",
 "eqtl",
 "finngen",
 "GCST",
Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+---
+title: Molecular Complex
+---
+
+::: gentropy.dataset.molecular_complex.MolecularComplex

docs/python_api/datasources/_datasources.md

Lines changed: 6 additions & 2 deletions
@@ -14,6 +14,11 @@ This section contains information about the data source harmonisation tools avai
 
 1. [GTEx (eQTL catalogue)](eqtl_catalogue/_eqtl_catalogue.md)
 2. [UKB PPP (EUR)](ukb_ppp_eur/_ukb_ppp_eur.md)
+3. [deCODE proteomics](deCODE/_decode.md)
+
+## Protein complexes
+
+1. [Complex Portal](complex_portal/_complex_portal.md)
 
 ## Interaction / Interval-based Experiments
 
@@ -39,5 +44,4 @@ This section contains information about the data source harmonisation tools avai
 
 ## Biological samples
 
-1. [Uberon](biosample_ontologies/_uberon.md)
-2. [Cell Ontology](biosample_ontologies/_cell_ontology.md)
+1. [Uberon and Cell Ontology](biosample_ontologies/_biosample_ontologies.md)
Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
+---
+title: Complex Portal
+---
+
+[Complex Portal](https://www.ebi.ac.uk/complexportal/) is a manually curated resource of macromolecular complexes maintained by EMBL-EBI. It provides two complementary datasets:
+
+- **Experimental** – complexes with direct experimental evidence.
+- **Predicted** – computationally predicted complexes.
+
+Both files are distributed in the **ComplexTAB** flat-file format and are filtered to human complexes (NCBI taxonomy ID 9606) during ingestion.
+
+The resulting `MolecularComplex` dataset is used downstream in the deCODE proteomics pipeline to annotate multi-protein SomaScan aptamers with a `molecularComplexId`.
+
+::: gentropy.datasource.complex_portal.ComplexTab
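The Complex Portal ingestion described in this file (keep human rows of a ComplexTAB file, then use the complexes to annotate multi-protein aptamers) can be sketched with the Python standard library. This is a minimal illustration under stated assumptions, not gentropy's `ComplexTab` implementation: the ComplexTAB column names below are assumptions to check against the real file header, the toy accessions are made up, and `complex_for_targets` (an exact-set match between an aptamer's target proteins and a complex's components) is a hypothetical rule, not the pipeline's documented `molecularComplexId` logic.

```python
from __future__ import annotations

import csv
import io
from typing import Iterable, TextIO

# ASSUMPTION: ComplexTAB header names; verify against the release you download.
COMPLEX_ID_COL = "#Complex ac"
TAXON_COL = "Taxonomy identifier"
COMPONENTS_COL = "Identifiers (and stoichiometry) of molecules in complex"
HUMAN_TAXON_ID = "9606"  # NCBI taxonomy ID for Homo sapiens


def parse_components(field: str) -> frozenset[str]:
    """Turn 'PROT_A(1)|PROT_B(2)' into {'PROT_A', 'PROT_B'}, dropping stoichiometry."""
    return frozenset(
        part.split("(", 1)[0].strip() for part in field.split("|") if part.strip()
    )


def load_human_complexes(handle: TextIO) -> dict[str, frozenset[str]]:
    """Map complex accession -> component identifiers, keeping human rows only."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {
        row[COMPLEX_ID_COL]: parse_components(row[COMPONENTS_COL])
        for row in reader
        if row[TAXON_COL].strip() == HUMAN_TAXON_ID
    }


def complex_for_targets(
    targets: Iterable[str], complexes: dict[str, frozenset[str]]
) -> str | None:
    """Hypothetical rule: a multi-protein aptamer maps to the complex whose
    components exactly equal its target set; otherwise it gets no annotation."""
    wanted = frozenset(targets)
    if len(wanted) < 2:  # single-protein aptamers are left unannotated
        return None
    for complex_id, components in complexes.items():
        if components == wanted:
            return complex_id
    return None


# Tiny in-memory stand-in for a ComplexTAB file (made-up accessions).
sample = (
    "#Complex ac\tRecommended name\tTaxonomy identifier\t"
    "Identifiers (and stoichiometry) of molecules in complex\n"
    "CPX-TOY1\tToy human dimer\t9606\tPROT_A(1)|PROT_B(1)\n"
    "CPX-TOY2\tToy mouse dimer\t10090\tPROT_C(1)|PROT_D(1)\n"
)
complexes = load_human_complexes(io.StringIO(sample))
print(sorted(complexes))  # ['CPX-TOY1']
print(complex_for_targets(["PROT_B", "PROT_A"], complexes))  # CPX-TOY1
```

Exact-set matching keeps the sketch simple; a production pipeline may need to decide how to treat partial overlaps or aptamers whose targets match more than one complex.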
Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+nav:
+- _decode.md
+- manifest.md
+- aptamer_metadata.md
+- study_index.md
+- summary_stats.md
Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
+---
+title: deCODE proteomics
+---
+
+[deCODE proteomics](https://www.nature.com/articles/s41586-023-06563-x) is a large-scale proteomics dataset generated by deCODE genetics, a biopharmaceutical company based in Iceland. The dataset includes measurements of protein levels in blood samples from thousands of individuals (up to ~36,000 Icelandic participants), using the SomaScan aptamer-based platform.
+
+Two sub-datasets are provided:
+
+- **RAW** (`deCODE-proteomics-raw`): non-SMP-normalised SomaScan measurements.
+- **SMP** (`deCODE-proteomics-smp`): SMP-normalised SomaScan measurements.
+
+For a full description of the dataset and methods, refer to [Eldjarn et al., 2023](https://www.nature.com/articles/s41586-023-06563-x).
+
+::: gentropy.datasource.decode.deCODEDataSource
+::: gentropy.datasource.decode.deCODEPublicationMetadata
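The deCODE page distinguishes the two sub-datasets by the identifiers `deCODE-proteomics-raw` and `deCODE-proteomics-smp`. The Python sketch below shows one way to keep that mapping in a single place, assuming the identifiers are used as plain string keys. `DecodeSubDataset` and `is_smp_normalised` are hypothetical names for illustration only; they are not part of the documented `deCODEDataSource` API.

```python
from enum import Enum


class DecodeSubDataset(Enum):
    """Hypothetical helper enumerating the two deCODE proteomics sub-datasets."""

    RAW = "deCODE-proteomics-raw"  # non-SMP-normalised SomaScan measurements
    SMP = "deCODE-proteomics-smp"  # SMP-normalised SomaScan measurements

    @property
    def is_smp_normalised(self) -> bool:
        """True only for the SMP-normalised sub-dataset."""
        return self is DecodeSubDataset.SMP


# Lookup by value; an unknown identifier raises ValueError instead of passing silently.
subset = DecodeSubDataset("deCODE-proteomics-smp")
print(subset.name, subset.is_smp_normalised)  # SMP True
```

Using an `Enum` means a misspelled identifier fails at lookup time rather than producing an empty selection further down the pipeline.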
Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+---
+title: deCODE Aptamer Metadata
+---
+
+::: gentropy.datasource.decode.aptamer_metadata.AptamerMetadata
Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
+---
+title: deCODE manifest
+---
+
+::: gentropy.datasource.decode.manifest.deCODEManifest
Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+---
+title: deCODE Study Index
+---
+
+::: gentropy.datasource.decode.study_index.deCODEStudyIdParts
+::: gentropy.datasource.decode.study_index.deCODEStudyIndex
Lines changed: 6 additions & 0 deletions
@@ -0,0 +1,6 @@
+---
+title: deCODE Summary Statistics
+---
+
+::: gentropy.datasource.decode.summary_statistics.deCODEHarmonisationConfig
+::: gentropy.datasource.decode.summary_statistics.deCODESummaryStatistics
