Add support for the Jelly binary RDF format #3349

@Ostrzyciel

Jelly is a binary RDF format designed for fast serialization and deserialization of RDF graphs and datasets, in a streaming manner, while maintaining a good compression ratio. It's currently implemented for Java and Python, with an experimental implementation for Rust.

The pyjelly package already supports RDFLib through the plugin system. You can install it with pip: pip install pyjelly[rdflib]

I've opened this issue, following a suggestion made by @nicholascar in RDFLib/pySHACL#303 to add Jelly support directly to RDFLib.

We've looked at this internally (together with @lapkinvladimir), and we think that's a great idea, but there are a few questions to iron out.

  • We think the best approach here would be to add a dependency on pyjelly[rdflib] in rdflib.
  • Currently RDFLib's plugin system does not work entirely seamlessly with Jelly: you have to import a specific package for format autodetection based on the file extension to work properly (see documentation). We can most likely fix that with an easy one-line change in rdflib while we are adding the new dependency.
  • pyjelly adds dedicated interfaces for serializing and parsing files in a streaming manner – this was already discussed in Streaming parsers #1560. You can stream quad-by-quad, or dataset-by-dataset – see the docs here: https://w3id.org/jelly/pyjelly/dev/getting-started/#parsing-a-stream-of-graphs
    • The way we see this, introducing streaming parser support for other RDF formats is a larger topic that should be tackled separately. We hope that our APIs can serve as inspiration (you can also copy them, it's Apache 2.0).
    • In the future, when (and if) RDFLib introduces streaming parser/serializer APIs, we would be very happy to deprecate and remove these APIs in pyjelly, and ask users to switch to the RDFLib APIs.
  • Currently the lowest supported version of Python in RDFLib is 3.9. The latest version of pyjelly (0.7.x) supports only Python 3.10 and up. pyjelly 0.6.x supported Python 3.9 as well, but we dropped support for it because Python 3.9 is reaching EOL and will therefore be unsafe to use.
    • We can either add a dependency on pyjelly 0.6.x, or wait for RDFLib to raise its minimum Python version to 3.10.
    • I should note here that pyjelly 0.7.x introduces large performance improvements, thanks to wheels pre-compiled with mypyc, so it would be great if we could use that.
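
To illustrate the autodetection point above, here is a minimal, self-contained sketch of how extension-based format guessing typically works and why an extra import can be needed. It is not RDFLib's or pyjelly's actual code: SUFFIX_FORMAT_MAP, guess_format and register_suffix are hypothetical names used only for this example, and the assumption is that a format stays undetectable until something adds its suffix to the lookup table.

```python
from pathlib import Path
from typing import Dict, Optional

# Hypothetical lookup table from file suffix to format name.
# A library would ship with its built-in formats pre-registered.
SUFFIX_FORMAT_MAP: Dict[str, str] = {
    "ttl": "turtle",
    "nt": "nt",
    "rdf": "xml",
}


def guess_format(path: str) -> Optional[str]:
    """Return the format registered for the file's extension, or None."""
    suffix = Path(path).suffix.lower().lstrip(".")
    return SUFFIX_FORMAT_MAP.get(suffix)


def register_suffix(suffix: str, format_name: str) -> None:
    """Add a suffix to the table.

    In this sketch, this stands in for what importing an integration
    package would do as a side effect (an assumption, not pyjelly's code).
    """
    SUFFIX_FORMAT_MAP[suffix.lower().lstrip(".")] = format_name
```

In this sketch, guess_format("data.jelly") returns None until register_suffix("jelly", "jelly") has run, so the caller has to pass the format explicitly. Shipping the suffix in the built-in table would be the kind of one-line change that removes the need for the extra import.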

@nicholascar what do you think about this plan?
