bento-mdb-external

Repo for managing imports of external terminologies into MDB.

1. Uberon Ontology Integration Pipeline

Overview

This project integrates Uberon anatomical ontology terms and their synonyms into the Metamodel Database (MDB) graph.

Uberon is a multi-species anatomy ontology maintained by the Open Biological and Biomedical Ontologies (OBO) Foundry and widely used in biomedical datasets for standardized anatomical terminology.

Steps involved in this workflow are:

Extract relevant anatomical terms from Uberon
Convert ontology data into a format compatible with the MDB data model for ingestion
Insert Uberon terms into the graph as Term nodes
Insert synonym terms as additional Term nodes
Create Concept nodes that link primary terms with their synonyms
Tag mappings to indicate their source ontology
Eventually connect these concepts to Value Set nodes so they can map to GC terms in the broader MDB ecosystem
Capture all executed Cypher queries so the ingestion process can be replayed on another MDB environment

This repository contains the full pipeline used to ingest Uberon anatomy terms into MDB.

Workflow Overview

The ingestion pipeline follows these stages:

Uberon OWL Ontology -> SPARQL Extraction -> CSV Ontology Dataset -> Flattening + Normalization -> JSON Flat Term Objects ->
Python Ingestion Scripts -> Neo4j MDB Graph -> Term Nodes, Concept Nodes, Tag Nodes -> Further Integration: Value Sets -> GC Term

Step 1 — Selecting the Uberon Dataset

The first step was identifying the appropriate Uberon dataset to meet MDB user requirements.

Uberon provides ontology releases in several formats:

OWL
OBO
RDF
JSON

For this pipeline we selected the latest OWL release of the Uberon human-view subset because:

It contains the full ontology structure for human anatomy, which covers the terms most regularly referenced in other data models
It preserves synonym relationships
It allows flexible querying via SPARQL

Source: https://uberon.github.io/downloads.html#subsets

The OWL file contains:

anatomical entities
definitions
cross references
exact synonyms
related synonyms
ontology relationships

However, the OWL format is not directly usable for ingestion into MDB, so the ontology had to be queried with SPARQL and the results extracted and transformed.

Step 2 — Extracting Data with SPARQL

To extract only the relevant ontology fields, SPARQL queries were written against the OWL file.

The queries retrieved the following fields, and the results were stored in a CSV file:

Field	Description
id	Uberon identifier
label	primary term
definition	textual definition
exact synonyms	curated exact synonym list
related synonyms	additional synonyms
xrefs	external references
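
A minimal sketch of how this extraction could be run from Python, assuming the third-party `rdflib` package. The query text, the `|` separator, and the names `QUERY`, `write_rows`, `iri_to_curie`, and `extract` are illustrative assumptions; the repository's actual query lives in `scripts/uberon_query.sparql` and may differ.

```python
import csv

# Hypothetical query. The annotation properties are the standard OBO ones:
# IAO:0000115 for definitions, oboInOwl for synonyms and cross references.
QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX obo: <http://purl.obolibrary.org/obo/>
PREFIX oboInOwl: <http://www.geneontology.org/formats/oboInOwl#>

SELECT ?id ?label ?definition
       (GROUP_CONCAT(DISTINCT ?exact; separator="|") AS ?exact_synonyms)
       (GROUP_CONCAT(DISTINCT ?related; separator="|") AS ?related_synonyms)
       (GROUP_CONCAT(DISTINCT ?xref; separator="|") AS ?xrefs)
WHERE {
  ?id a owl:Class ;
      rdfs:label ?label .
  FILTER(STRSTARTS(STR(?id), "http://purl.obolibrary.org/obo/UBERON_"))
  OPTIONAL { ?id obo:IAO_0000115 ?definition }
  OPTIONAL { ?id oboInOwl:hasExactSynonym ?exact }
  OPTIONAL { ?id oboInOwl:hasRelatedSynonym ?related }
  OPTIONAL { ?id oboInOwl:hasDbXref ?xref }
}
GROUP BY ?id ?label ?definition
"""

FIELDS = ["id", "label", "definition", "exact_synonyms", "related_synonyms", "xrefs"]


def iri_to_curie(iri):
    """Turn .../obo/UBERON_0002107 into UBERON:0002107."""
    return iri.rsplit("/", 1)[-1].replace("_", ":", 1)


def write_rows(rows, out_path):
    """Write an iterable of 6-value rows to CSV with a header line."""
    with open(out_path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(FIELDS)
        for row in rows:
            values = ["" if v is None else str(v) for v in row]
            values[0] = iri_to_curie(values[0])
            writer.writerow(values)


def extract(owl_path, out_path):
    """Run QUERY against an OWL file and store the result as CSV."""
    from rdflib import Graph  # imported lazily; requires `pip install rdflib`

    graph = Graph()
    graph.parse(owl_path, format="xml")  # assumes the OWL file is RDF/XML
    write_rows(graph.query(QUERY), out_path)
```

Keeping the multi-valued fields as delimited strings keeps the CSV at one row per Uberon class, which leaves the splitting to the flattening step.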

Step 3 — Dataset Flattening

The CSV dataset contained nested synonym structures that are not compatible with the MDB format.

MDB requires:

one Term node per textual value
synonyms represented as separate terms
synonym relationships represented through Concept nodes

The dataset was flattened into JSON records containing:

flat main Uberon terms
flat synonym terms
synonym mappings

Example of a mapping JSON record:

{
  "primary_value": "liver",
  "origin_id": "UBERON:0002107",
  "origin_name": "UBERON",
  "synonyms": [
    { "value": "hepatic organ" },
    { "value": "hepatic tissue" }
  ]
}
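
A minimal sketch of how one CSV row could be flattened into the mapping record shown above. It assumes synonyms arrive as `|`-separated strings in `exact_synonyms` and `related_synonyms` columns; the column names, the separator, and the function name `build_mapping` are hypothetical, and the repository's `build_syn_mappings.py` may work differently.

```python
import json


def build_mapping(row, sep="|"):
    """Flatten one CSV row (a dict) into a synonym-mapping record."""
    primary = row["label"].strip()
    seen = {primary.lower()}  # skip synonyms that only repeat the primary term
    synonyms = []
    for column in ("exact_synonyms", "related_synonyms"):
        for value in (row.get(column) or "").split(sep):
            value = value.strip()
            if value and value.lower() not in seen:
                seen.add(value.lower())
                synonyms.append({"value": value})
    return {
        "primary_value": primary,
        "origin_id": row["id"],
        "origin_name": "UBERON",
        "synonyms": synonyms,
    }


# Sample row invented for illustration, reusing the values from the record above.
row = {
    "id": "UBERON:0002107",
    "label": "liver",
    "exact_synonyms": "hepatic organ",
    "related_synonyms": "hepatic tissue|Liver",
}
print(json.dumps(build_mapping(row), indent=2))
```

The case-insensitive `seen` set is one possible way to avoid emitting a synonym that would collapse onto the primary term; whether the real pipeline deduplicates this way is not stated.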

Step 4 - Inserting Terms

Primary Uberon terms were inserted into Neo4j as Term nodes using a Python ingestion script.

Term properties include:

Property	Description
value	term text
handle	normalized identifier
origin_name	ontology source
origin_id	ontology ID
origin_definition	ontology definition
external_references	external cross-references (xrefs) from the ontology
nanoid	unique identifier
origin_version	version number
_commit	ingestion commit
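
A minimal sketch of how a term with these properties could be merged into Neo4j, assuming the official `neo4j` Python driver. The lowercase `term` label, the handle normalization rule, the 6-character nanoid, the merge key, and all function names are assumptions made for illustration, not the behavior of `insert_uberon_terms.py`.

```python
import re
import secrets
import string

NANOID_ALPHABET = string.ascii_letters + string.digits

# MERGE on origin + value keeps re-runs idempotent; ON CREATE sets the rest once.
# `_commit` is backtick-quoted so the leading underscore cannot trip the parser.
MERGE_TERM = """
MERGE (t:term {origin_name: $origin_name, origin_id: $origin_id, value: $value})
ON CREATE SET t.nanoid = $nanoid,
              t.handle = $handle,
              t.origin_definition = $origin_definition,
              t.external_references = $external_references,
              t.origin_version = $origin_version,
              t.`_commit` = $commit
"""


def make_handle(value):
    """Assumed normalization: lowercase, runs of non-alphanumerics become '_'."""
    return re.sub(r"[^a-z0-9]+", "_", value.lower()).strip("_")


def make_nanoid(size=6):
    """Random alphanumeric identifier; the length is an assumption."""
    return "".join(secrets.choice(NANOID_ALPHABET) for _ in range(size))


def term_params(row, origin_version, commit):
    """Build the Cypher parameters for one CSV row (a dict)."""
    return {
        "value": row["label"],
        "handle": make_handle(row["label"]),
        "origin_name": "UBERON",
        "origin_id": row["id"],
        "origin_definition": row.get("definition") or None,
        "external_references": row.get("xrefs") or None,
        "origin_version": origin_version,
        "commit": commit,
        "nanoid": make_nanoid(),
    }


def insert_terms(uri, auth, rows, origin_version, commit):
    """Merge every row as a term node; `auth` is a (user, password) tuple."""
    from neo4j import GraphDatabase  # requires `pip install neo4j`

    with GraphDatabase.driver(uri, auth=auth) as driver:
        with driver.session() as session:
            for row in rows:
                session.run(MERGE_TERM, term_params(row, origin_version, commit))
```

Passing values as parameters instead of formatting them into the query string avoids quoting problems with term text that contains apostrophes or quotes.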

Step 5 — Inserting Synonym Terms

Synonyms were inserted as additional Term nodes.

This ensures:

synonym reuse across ontologies
normalized graph structure
easier semantic linking

Step 6 — Linking Terms with Concepts

Synonyms are linked to their primary terms through Concept nodes.

Structure:

(term) → represents → (concept) ← represents ← (term)
(concept) → has_tag → (tag)

Concept nodes group synonymous terms while allowing additional metadata tags to be attached. Each concept created from Uberon mappings is associated with a Tag node indicating the mapping source.

Example:

mapping_src = "uberon"

This allows filtering or auditing ontology mappings by source.
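
A minimal sketch of the Cypher this structure implies, with one parameter set per primary/synonym pair. The lowercase labels, the `key`/`value` properties on the tag node, the way synonym terms are matched, and the names `LINK_SYNONYM` and `link_params` are assumptions; `insert_uberon_relationships.py` may build the graph differently.

```python
# Hypothetical statement: reuse (or create) the concept the primary term
# represents, attach the synonym to it, then tag the concept with its source.
LINK_SYNONYM = """
MATCH (p:term {origin_name: $origin_name, origin_id: $origin_id, value: $primary_value})
MATCH (s:term {origin_name: $origin_name, value: $synonym_value})
MERGE (p)-[:represents]->(c:concept)
MERGE (s)-[:represents]->(c)
MERGE (g:tag {key: 'mapping_src', value: $mapping_src})
MERGE (c)-[:has_tag]->(g)
"""


def link_params(mapping, mapping_src="uberon"):
    """Yield one parameter dict per (primary, synonym) pair in a mapping record."""
    for synonym in mapping["synonyms"]:
        yield {
            "origin_name": mapping["origin_name"],
            "origin_id": mapping["origin_id"],
            "primary_value": mapping["primary_value"],
            "synonym_value": synonym["value"],
            "mapping_src": mapping_src,
        }


mapping = {
    "primary_value": "liver",
    "origin_id": "UBERON:0002107",
    "origin_name": "UBERON",
    "synonyms": [{"value": "hepatic organ"}, {"value": "hepatic tissue"}],
}
for params in link_params(mapping):
    print(params["primary_value"], "<->", params["synonym_value"])
```

Each dict can be passed together with `LINK_SYNONYM` to a Neo4j session's `run` call. Note that `MERGE (p)-[:represents]->(c:concept)` reuses any concept the primary term already represents, so a term mapped earlier from another source would have its existing concept tagged as well; a stricter version would need to match on the tag too.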

Step 7. Value Set Integration

(WIP)

Repository Structure

Example project structure:

uberon_ingestion
├── data/
│   ├── human_view_clean.csv
│   ├── uberon_terms_main.yaml // converted from human_view_clean.csv using script "uberon_csv_to_yaml_json.py"
│   ├── uberon_terms.json // converted from uberon_terms_main.yaml using script "uberon_csv_to_yaml_json.py"
│   ├── uberon_synonyms new.yaml // converted from human_view_clean.csv using script "uberon_csv_to_yaml_json.py"
│   ├── uberon_synonyms.json // converted from uberon_synonyms new.yaml using script "uberon_csv_to_yaml_json.py"
│   └── uberon_synonym_mapping.json // mapping made to connect terms with synonyms, source: human_view_clean.csv, script: build_syn_mappings.py
├── scripts/
│   ├── build_syn_mappings.py // used to create the mapping file
│   ├── insert_uberon_relationships.py // used to merge relationships, create concept nodes and tag nodes
│   ├── insert_uberon_terms.py // used to merge term nodes and synonym nodes
│   ├── test_connection.py // used to ensure connection
│   ├── uberon_query.sparql // contains sparql query to extract data from .owl file
│   └── sanity_check.sparql // contains counting and tallying of data from .owl file
├── logs/
│   └── cypher_queries.log (WIP)
└── README.md

Directory Description

Directory	Purpose
`data/`	Stores ontology source files and intermediate datasets produced during extraction and transformation.
`scripts/`	Contains Python scripts and SPARQL queries used for ontology extraction, transformation, and ingestion.
`logs/`	Stores logged Cypher queries executed during ingestion so the process can be replayed on another MDB instance.
`README.md`	Documentation describing the pipeline, workflow, and project structure.
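
Since the query log is still marked WIP, the following is only a sketch of one way executed Cypher could be captured and replayed. The JSON-lines layout and the names `log_query`, `read_log`, and `replay` are assumptions, not the format of the repository's `cypher_queries.log`.

```python
import json


def log_query(log_path, query, params):
    """Append one executed statement and its parameters as a JSON line."""
    with open(log_path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps({"query": query, "params": params}) + "\n")


def read_log(log_path):
    """Yield (query, params) pairs in the order they were logged."""
    with open(log_path, encoding="utf-8") as fh:
        for line in fh:
            if line.strip():
                entry = json.loads(line)
                yield entry["query"], entry["params"]


def replay(log_path, run):
    """Send every logged statement through `run` and return how many were sent.

    `run` is any callable taking (query, params), for example the `run`
    method of a Neo4j session connected to the target MDB instance.
    """
    count = 0
    for query, params in read_log(log_path):
        run(query, params)
        count += 1
    return count
```

Logging parameters alongside the statement text keeps generated values such as nanoids identical when the log is replayed on another instance.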
