Improve speed of converting SQLite to FHIR #64

@joeflack4

Overview

I tried to convert HPO to FHIR using semsql as an intermediary. After about 40 minutes I gave up and switched to Obographs for speed. I think the conversion to a .db took about 10 minutes, and the rest of the time was OAK trying to load the DB. Normally semsql is much faster to load than rdflib, but not in this case. I checked and saw that my hpo.db was about 1 GB, roughly 10x larger than my hpo.owl. Looking at some of my other conversions, a 5-10x increase in file size appears to be typical.

If I'm correct that the issue is less about OAK performance and more about file size in general, is there anything we can do to reduce these file sizes? Or is the problem not the size itself but the structure, which takes OAK a long time to parse downstream? If this is more of an OAK issue (or both an OAK and a semsql issue), I can open a ticket over there.

Potential causes

One or more of the following may be what is taking so much time:
a. Semsql: File size
b. Semsql: Non-optimal structures for downstream parsing
c. OAK: Not parsing optimally
d. OAK: Spending time on things that may not be needed for my use case
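
To help separate cause (a) from cause (b), a rough diagnostic along these lines could show where the size of the .db comes from. This is only a sketch using Python's standard `sqlite3` module. The function name `summarize_db` is hypothetical, and nothing here is part of semsql or OAK. It reports the total database size, the row count of each table, and the names of any indexes, without assuming anything about the schema.

```python
import sqlite3
import sys


def summarize_db(path):
    """Return (size_bytes, [(table, row_count), ...], [index_name, ...]).

    Hypothetical helper for illustration; tables are sorted by row count,
    largest first.
    """
    conn = sqlite3.connect(path)
    try:
        # Database size as SQLite sees it: number of pages times page size.
        page_count = conn.execute("PRAGMA page_count").fetchone()[0]
        page_size = conn.execute("PRAGMA page_size").fetchone()[0]

        # User tables only; SQLite's internal tables start with "sqlite_".
        tables = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
            )
        ]
        counts = []
        for table in tables:
            n = conn.execute(f'SELECT COUNT(*) FROM "{table}"').fetchone()[0]
            counts.append((table, n))
        counts.sort(key=lambda item: item[1], reverse=True)

        # Indexes add to file size but can also speed up downstream queries.
        indexes = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'index' ORDER BY name"
            )
        ]
        return page_count * page_size, counts, indexes
    finally:
        conn.close()


if __name__ == "__main__" and len(sys.argv) > 1:
    # Example: python summarize_db.py hpo.db
    size, counts, indexes = summarize_db(sys.argv[1])
    print(f"size: {size / 1e6:.1f} MB, indexes: {len(indexes)}")
    for table, n in counts:
        print(f"{n:>12}  {table}")
```

If a few tables hold most of the rows, that points toward file size (a). If the rows are spread evenly but there are few or no indexes, that might point more toward structure (b). Either way it would not tell us anything about the OAK side, (c) and (d).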
