RDF Lists

There's an often-held belief within the software developer community that RDF is “not very good at representing lists” - and whilst this isn't necessarily true, it is easy to see why people might reach that conclusion if they come to RDF with an application-centric mindset. This article summarises the various ways that lists can be represented in RDF, examines the pros and cons of each approach, and contrasts them with what software engineers normally expect from list-like functionality.

What is a list?

Before making any statement about RDF's capabilities relating to lists, we first need to explore what we mean by a "list", because lists are not a formally defined construct. We will do this through the lens of computer programming. Whilst most people would agree that lists are “collections of items”, there is nuance within that definition, such as:

Is the order of the list important?
Can the list contain duplicate elements?
Is the list immutable?

Coming at these questions from an application-centric perspective, most garden-variety software engineers wouldn't see this as a problem, because these distinctions are typically worked out in the application layer.

Take, for example, the following built-in collection types in Python.

| Python type | Ordered? | Distinct items? | Mutable? |
|---|---|---|---|
| `list` | Yes | No | Yes |
| `tuple` | Yes | No | No |
| `range` | Yes | Yes | No |
| `set` | No | Yes | Yes |
| `frozenset` | No | Yes | No |

As an aside, the distinction here between sets and lists is important. Sets are mathematical constructs in which order and duplication are disregarded entirely - observe the behavior of the Python `set` object:

```python
>>> {1,2,3} == {1,3,2}
True
>>> {1,2,3} == {1,1,1,1,3,3,3,1,2,1,1,2,2,2,3,3}
True
```

The keen-eyed will note that there are gaps here - for example, there is no built-in type for representing “unordered lists” or “ordered sets” (Python developers can create these constructs themselves if they need them).

The absence of these particular use cases does not seem to have harmed Python's popularity - the designers of Python simply chose to omit them based on what they observed to be the most common requirements. Python's view of collections and sequences is specific to Python: you have no guarantee that lists will be treated the same way in another programming language (in fact, you will be hard pressed to find two programming languages that provide exactly the same interface over lists).

This brings us on to the subject of lists in data. Data is commonly used as a means of information exchange, and as such we expect it to be technology agnostic - this ensures that data providers don't force technology choices upon data consumers. Take JSON, the de facto standard for modern information exchange: a consumer should be able to work with a JSON file whether they're using Java, Rust, Python or any other programming language.

JSON is defined by ECMA-404 and supports exactly one type of list, called an array. Deserializers typically assume that an array's members are ordered, non-distinct and open for expansion.

```json
{
    "id": 1,
    "things": ["one", "two", "three"]
}
```

How, using this standard, do I communicate to a user that this array should only contain unique values? How can I communicate that this is “the entire list” and should never be changed? How can I communicate whether or not the order of items matters? The answer to all of these questions is simple: you can't. JSON itself provides no convention or mechanism for expressing this kind of information natively; consumers are just expected to “somehow know” and to enforce these constraints at the point of consumption. Data providers could communicate this information by non-native means of their own devising (such as documentation or some kind of invented convention), but unless that convention is widely adopted, this only papers over the shortcomings of JSON as a data exchange format: it simply isn't very expressive when it comes to lists (or, in fact, any form of metadata).
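To illustrate, here is a minimal Python sketch. A deserializer hands back an ordinary list, so any rule such as "items must be unique" has to live in consumer code. The `check_unique` helper is hypothetical and is not part of any standard.

```python
import json

# The same payload as the JSON example above.
payload = json.loads('{"id": 1, "things": ["one", "two", "three"]}')
things = payload["things"]

# The deserializer returns a plain list; nothing in the data says whether
# duplicates, reordering or additions are allowed.
print(type(things).__name__)  # list

def check_unique(items):
    """Hypothetical consumer-side rule that JSON itself cannot express."""
    return len(items) == len(set(items))

print(check_unique(things))          # True
print(check_unique(["one", "one"]))  # False
```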

Let's now shift to look at how lists and collections can be represented in RDF.

Lists in RDF

If we are just talking about the native capabilities of the RDF data model, there is no concrete data structure called a “list” - an RDF graph is, and will only ever be, a set of triples (Node -> Edge -> Node). That said, within that data structure we can represent a particular type of list straight out of the box:

```turtle
:bob :knows :tom.
:bob :knows :dick.
:bob :knows :harry.
```

The RDF standards provide no way of interpreting these three triples as meaning anything other than "three people that :bob :knows" - but that is, essentially, a list. It is not closed, is assumed to be mutable, is unordered and (because an RDF graph is a set of triples) contains no duplicates.
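To make those set semantics concrete, here is a minimal Python sketch. It models a graph as a plain `set` of `(subject, predicate, object)` tuples and uses prefixed names as strings. This is a deliberate simplification with no RDF library and no IRI expansion. `:sally` is a hypothetical extra resource.

```python
# Simplified model: an RDF graph as a set of (subject, predicate, object) tuples.
graph = {
    (":bob", ":knows", ":tom"),
    (":bob", ":knows", ":dick"),
    (":bob", ":knows", ":harry"),
}

# Asserting an existing triple again changes nothing: graphs are sets.
graph.add((":bob", ":knows", ":tom"))
print(len(graph))  # 3

# The "list" is just the objects of matching triples; no order is implied.
known = {o for (s, p, o) in graph if s == ":bob" and p == ":knows"}
print(sorted(known))  # [':dick', ':harry', ':tom']

# Open and mutable: nothing stops another (hypothetical) triple being added later.
graph.add((":bob", ":knows", ":sally"))
```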

So on the surface, RDF would seem to have the same limited scope for lists that JSON does... but RDF has a secret weapon: it is extensible via controlled vocabularies. These vocabularies allow standards to be unambiguously coupled to data - something that is not possible in most other data formats. Among these standards are several W3C Recommendations, including RDF Schema (RDFS), which is very closely tied to RDF and contains several terms relating to the creation of lists.

RDFS Collections and Containers

The RDFS standard defines two distinct kinds of list-like terminology: Containers and Collections. The distinction between the two is important and is best understood in terms of the graph topologies they produce, along with the pros and cons of each topology.

We'll consider each in turn, starting with Collections.

RDF Collections

RDF Collections follow a linked-list graph topology and define only one class, rdf:List. Each node in an rdf:List points to exactly two things:

The data this node holds, indicated by rdf:first (sometimes called the “head”).
The next node in the list, indicated by rdf:rest (sometimes called the “tail”).

The term rdf:nil represents an empty list; point the rdf:rest of the final node in your list at it to indicate that the list has ended. To represent, for example, the fact that Bob has three friends via an rdf:List, you would include the following triples:

```turtle
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>.
@prefix : <http://example.com/>.

:bob :hasFriends _:b1.
_:b1 rdf:first :tom; rdf:rest _:b2.
_:b2 rdf:first :dick; rdf:rest _:b3.
_:b3 rdf:first :harry; rdf:rest rdf:nil.
```

This produces a graph topology that looks something like this:

```mermaid
flowchart LR
    Bob[:bob] -->|:hasFriends| BNode1[node]
    BNode1 -->|rdf:first| Tom[:tom]
    BNode1 -->|rdf:rest| BNode2[node]
    BNode2 -->|rdf:first| Dick[:dick]
    BNode2 -->|rdf:rest| BNode3[node]
    BNode3 -->|rdf:first| Harry[:harry]
    BNode3 -->|rdf:rest| Nil[rdf:nil]
```
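Reading the list back means starting at the head and following rdf:rest until rdf:nil. Here is a minimal Python sketch of that walk. It assumes the simplified tuple-of-strings model of a graph, not an RDF library's API. The helper names `objects` and `read_list` are hypothetical.

```python
RDF_FIRST, RDF_REST, RDF_NIL = "rdf:first", "rdf:rest", "rdf:nil"

# The triples from the Turtle example above, as (s, p, o) tuples.
graph = {
    (":bob", ":hasFriends", "_:b1"),
    ("_:b1", RDF_FIRST, ":tom"),   ("_:b1", RDF_REST, "_:b2"),
    ("_:b2", RDF_FIRST, ":dick"),  ("_:b2", RDF_REST, "_:b3"),
    ("_:b3", RDF_FIRST, ":harry"), ("_:b3", RDF_REST, RDF_NIL),
}

def objects(graph, s, p):
    """All objects o such that (s, p, o) is in the graph."""
    return [o for (s2, p2, o) in graph if s2 == s and p2 == p]

def read_list(graph, head):
    """Follow rdf:rest pointers from head until rdf:nil, collecting rdf:first values."""
    items = []
    node = head
    while node != RDF_NIL:
        (first,) = objects(graph, node, RDF_FIRST)  # exactly one rdf:first per node
        (rest,) = objects(graph, node, RDF_REST)    # exactly one rdf:rest per node
        items.append(first)
        node = rest
    return items

(head,) = objects(graph, ":bob", ":hasFriends")
print(read_list(graph, head))  # [':tom', ':dick', ':harry']
```

The order comes entirely from the chain of rdf:rest links. The set of triples itself is still unordered.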

There is a shorthand within the Turtle syntax that you can use to create lists, and it's fairly subtle:

```turtle
PREFIX : <http://example.com/>
:bob :hasFriends (:tom :dick :harry).
```
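Here is a rough Python sketch of what a Turtle parser does with the `( ... )` shorthand: it mints one fresh blank node per item and links them with rdf:first/rdf:rest. The blank node labels (`_:b1`, `_:b2`, ...) are an assumption for illustration, because real parsers choose their own. `expand_list` and `new_bnode` are hypothetical helpers.

```python
import itertools

RDF_FIRST, RDF_REST, RDF_NIL = "rdf:first", "rdf:rest", "rdf:nil"
_bnode_ids = itertools.count(1)

def new_bnode():
    """Mint a fresh blank node label (a stand-in for a parser's own generator)."""
    return f"_:b{next(_bnode_ids)}"

def expand_list(subject, predicate, items):
    """Expand `subject predicate (item1 item2 ...)` into first/rest triples."""
    if not items:
        # An empty collection () is simply rdf:nil.
        return {(subject, predicate, RDF_NIL)}
    nodes = [new_bnode() for _ in items]
    triples = {(subject, predicate, nodes[0])}
    for i, (node, item) in enumerate(zip(nodes, items)):
        rest = nodes[i + 1] if i + 1 < len(nodes) else RDF_NIL
        triples.add((node, RDF_FIRST, item))
        triples.add((node, RDF_REST, rest))
    return triples

triples = expand_list(":bob", ":hasFriends", [":tom", ":dick", ":harry"])
print(len(triples))  # 7
```

One line of Turtle becomes seven triples: one link from `:bob` plus two triples per item.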

What often scares people is that they write out a simple-looking syntax like this and end up with the complicated graph topology seen above - but it is a topology that carries certain advantages, which we'll dig into now.

Firstly: let's unpack some of the facets of this particular structure.

It is closed: the list has a finite number of items and we have a way of determining which are the first and last items.
Its members do not need to be unique - we can add the same item to the list multiple times.
It is ordered - starting at the head of the list there is only one direction of travel (in hindsight, an ordered list probably wasn't the most appropriate choice for Bob's friends - not unless Bob is ranking them somehow!).

Being “closed” is particularly important for RDF reasoners, because most reasoners operate under the open world assumption: the absence of a statement does not mean that it is false, so a reasoner only draws conclusions that follow from the facts it actually has. Whenever a reasoner finds itself saying “I don't know”, it simply does nothing, and this applies when it is figuring out whether it has reached the end of a list. Terminating an rdf:List with rdf:nil gives the reasoner that certainty.
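Here is a minimal Python sketch of that "closed vs. don't know" distinction, using the simplified tuple model of a graph. The hypothetical `list_is_complete` helper never answers False, only True or "unknown" (`None`). This mirrors how an open-world reasoner treats missing information. It is an illustration, not how any particular reasoner is implemented.

```python
RDF_FIRST, RDF_REST, RDF_NIL = "rdf:first", "rdf:rest", "rdf:nil"

def list_is_complete(graph, head):
    """True if following rdf:rest from head reaches rdf:nil; None if the chain
    breaks off (under the open world assumption we simply don't know)."""
    node, seen = head, set()
    while node != RDF_NIL:
        if node in seen:
            return None  # cycle: malformed list, refuse to guess
        seen.add(node)
        rests = [o for (s, p, o) in graph if s == node and p == RDF_REST]
        if len(rests) != 1:
            return None  # missing (or ambiguous) rdf:rest: "I don't know"
        node = rests[0]
    return True

closed = {("_:a", RDF_FIRST, ":tom"), ("_:a", RDF_REST, RDF_NIL)}
partial = {("_:a", RDF_FIRST, ":tom"), ("_:a", RDF_REST, "_:b")}  # _:b is not described

print(list_is_complete(closed, "_:a"))   # True
print(list_is_complete(partial, "_:a"))  # None
```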

We also have certain other advantages when it comes to working with this data directly, and they're much the same advantages that linked lists carry:

Inserting or deleting items within the list is computationally cheap, because we only need to change the links to the nodes immediately adjacent to the one we're working with.
Our list is decentralized - it can exist across multiple locations without those locations having to worry about some kind of unifying “hub node” that ties all of its members together.
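Here is a small Python sketch of the cheap insert, again using the simplified tuple model of a graph. Splicing a new item in after `_:b1` rewrites only that node's rdf:rest triple and adds the new node's two triples. `insert_after`, `:sally` and `_:b4` are hypothetical names chosen for illustration.

```python
RDF_FIRST, RDF_REST, RDF_NIL = "rdf:first", "rdf:rest", "rdf:nil"

graph = {
    ("_:b1", RDF_FIRST, ":tom"),   ("_:b1", RDF_REST, "_:b2"),
    ("_:b2", RDF_FIRST, ":dick"),  ("_:b2", RDF_REST, "_:b3"),
    ("_:b3", RDF_FIRST, ":harry"), ("_:b3", RDF_REST, RDF_NIL),
}

def insert_after(graph, node, item, new_node):
    """Splice `item` into the list after `node`.

    Only `node`'s rdf:rest triple changes; nodes further down are untouched.
    """
    (old_rest,) = [o for (s, p, o) in graph if s == node and p == RDF_REST]
    graph.remove((node, RDF_REST, old_rest))
    graph.add((node, RDF_REST, new_node))
    graph.add((new_node, RDF_FIRST, item))
    graph.add((new_node, RDF_REST, old_rest))

insert_after(graph, "_:b1", ":sally", "_:b4")  # hypothetical new member and node label
```

The triples for `_:b2` and `_:b3` are untouched. The cost does not depend on how long the rest of the list is.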

It's not all good news though:

You cannot use this construct to represent an unordered list (the spec only lets each node have a single rdf:first/rdf:rest pair).
It is expensive to find out whether two or more items are members of the same list (you need to traverse the list to find out).
The “append” operation (adding an item to the end of the list) is also more expensive than it is with containers, because you must first traverse to the final node.
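Both costs come from the same place: the only way to learn about later members is to follow rdf:rest links one hop at a time. Here is a Python sketch under the same simplified tuple model. `walk`, `same_list` and `append` are hypothetical helpers, and `append` assumes a non-empty, well-formed list.

```python
RDF_FIRST, RDF_REST, RDF_NIL = "rdf:first", "rdf:rest", "rdf:nil"

graph = {
    ("_:b1", RDF_FIRST, ":tom"),   ("_:b1", RDF_REST, "_:b2"),
    ("_:b2", RDF_FIRST, ":dick"),  ("_:b2", RDF_REST, "_:b3"),
    ("_:b3", RDF_FIRST, ":harry"), ("_:b3", RDF_REST, RDF_NIL),
}

def walk(graph, head):
    """Yield (node, item) pairs from head to rdf:nil, one rdf:rest hop at a time.

    Assumes a well-formed list (every node has one rdf:first and one rdf:rest).
    """
    node = head
    while node != RDF_NIL:
        item = next(o for (s, p, o) in graph if s == node and p == RDF_FIRST)
        yield node, item
        node = next(o for (s, p, o) in graph if s == node and p == RDF_REST)

def same_list(graph, head, a, b):
    """Membership of two items can only be answered by walking the list."""
    items = {item for _, item in walk(graph, head)}
    return a in items and b in items

def append(graph, head, item, new_node):
    """Walk to the last node (non-empty list assumed), then relink it to new_node."""
    last = None
    for last, _ in walk(graph, head):
        pass
    graph.remove((last, RDF_REST, RDF_NIL))
    graph.add((last, RDF_REST, new_node))
    graph.add((new_node, RDF_FIRST, item))
    graph.add((new_node, RDF_REST, RDF_NIL))

print(same_list(graph, "_:b1", ":tom", ":harry"))  # True
append(graph, "_:b1", ":sally", "_:b4")  # hypothetical new member and node label
print([item for _, item in walk(graph, "_:b1")])  # [':tom', ':dick', ':harry', ':sally']
```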

We'll take a look at containers now.

RDF Containers

RDF Containers follow a hub-and-spoke topology - a central resource (the container) exists and every item is asserted to be an rdfs:member of that container. As you would expect, the pros and cons of this approach are closely related to those of hub-and-spoke topologies more generally.

```turtle
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX : <http://example.com/>

:bob :hasFriends :bobsFriends.

:bobsFriends a rdfs:Container;
  rdfs:member :tom, :dick, :harry.
```

Which would create a graph topology that looks something like this:

```mermaid
flowchart LR
    Bob[:bob] -->|:hasFriends| Container[:bobsFriends]
    Container -->|rdfs:member| Tom[:tom]
    Container -->|rdfs:member| Dick[:dick]
    Container -->|rdfs:member| Harry[:harry]
```

On the surface this gives us an unordered list with distinct members (because RDF graphs are sets, remember!). The list is also open, because nothing indicates where it begins or ends.

There are a few variants available within the RDFS Container ontology that expand on these capabilities.

Firstly, rdfs:member has a family of sub-properties, rdf:_1, rdf:_2, rdf:_3 and so on, each of which is an instance of the class rdfs:ContainerMembershipProperty. These can be used to indicate ordering within the container, like so:

```turtle
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
:bobsFriends a rdfs:Container;
  rdf:_1 :tom;
  rdf:_2 :dick;
  rdf:_3 :harry.
```
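Here is a minimal Python sketch of how an application might recover the order, under the simplified tuple model of a graph. It picks out predicates of the form `rdf:_n` and sorts them by n as integers, so that `rdf:_10` comes after `rdf:_9`. It also shows that the same member can now appear twice. `ordered_members` is a hypothetical helper, and the duplicate `rdf:_4 :tom` triple is illustrative.

```python
import re

graph = {
    (":bobsFriends", "rdf:type", "rdfs:Container"),
    (":bobsFriends", "rdf:_1", ":tom"),
    (":bobsFriends", "rdf:_2", ":dick"),
    (":bobsFriends", "rdf:_3", ":harry"),
    (":bobsFriends", "rdf:_4", ":tom"),  # the same member twice is now possible
}

MEMBERSHIP = re.compile(r"rdf:_([1-9][0-9]*)$")

def ordered_members(graph, container):
    """Collect rdf:_n members and sort them by n (numerically, not as strings)."""
    indexed = []
    for s, p, o in graph:
        m = MEMBERSHIP.match(p)
        if s == container and m:
            indexed.append((int(m.group(1)), o))
    return [o for _, o in sorted(indexed)]

print(ordered_members(graph, ":bobsFriends"))  # [':tom', ':dick', ':harry', ':tom']
```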

This changes the properties of the graph significantly: we now have an indication that the list is ordered, and we can add the same element to the list multiple times. We can further indicate how a container should be interpreted by using one of the three subclasses provided by the RDFS specification: rdf:Bag, rdf:Seq and rdf:Alt.

rdf:Seq - indicates that this container is intended to be ordered.
rdf:Bag - indicates that this container is intended to be unordered (even if you're using numbered membership properties).
rdf:Alt - indicates a set of alternatives, of which you typically only need one, with the first being the default (you might use this to list the alternative names for something in order of preference).

It is worth noting that the spec describes these three subclasses as informal indications of how applications should interpret a container - formally they all behave identically to rdfs:Container as detailed above.
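Since the interpretation is left to applications, here is one hypothetical way an application might read each subclass, sketched in Python. It assumes the members have already been sorted by their rdf:_n index. The function `interpret` and its rules are illustrative conventions, not behavior defined by the spec. A `collections.Counter` is used for rdf:Bag because it ignores order but keeps duplicates.

```python
from collections import Counter

def interpret(container_type, members):
    """Hypothetical application-side reading of a container's members.

    `members` is assumed to be already sorted by their rdf:_n index. Formally all
    three classes behave like rdfs:Container; these rules are only conventions.
    """
    if container_type == "rdf:Seq":
        return list(members)                    # order is meaningful
    if container_type == "rdf:Bag":
        return Counter(members)                 # order ignored, duplicates counted
    if container_type == "rdf:Alt":
        return members[0] if members else None  # the first alternative is the default
    raise ValueError(f"unsupported container type: {container_type}")

print(interpret("rdf:Alt", ["Robert", "Bob", "Rob"]))  # Robert
print(interpret("rdf:Bag", [":a", ":b", ":a"]) == interpret("rdf:Bag", [":b", ":a", ":a"]))  # True
```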

The pros and cons of Containers largely mirror those of Collections. You get a few key advantages...

It is computationally inexpensive to add new items - you just insert a new triple.
The representation is much simpler than an rdf:List collection.
It is much less expensive to check whether two items are members of the same container (you can query directly for the relationships between the “hub” and any “spokes” you are interested in).
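For contrast with the linked-list walk, here is a Python sketch of the same operations on an unordered container, using the simplified tuple model of a graph. Adding a member is one new triple. Checking whether two items share a container is two direct lookups with no traversal. `same_container` and `:sally` are hypothetical.

```python
RDFS_MEMBER = "rdfs:member"

graph = {
    (":bobsFriends", "rdf:type", "rdfs:Container"),
    (":bobsFriends", RDFS_MEMBER, ":tom"),
    (":bobsFriends", RDFS_MEMBER, ":dick"),
    (":bobsFriends", RDFS_MEMBER, ":harry"),
}

# Adding a member is a single new triple; no other triple changes.
graph.add((":bobsFriends", RDFS_MEMBER, ":sally"))

def same_container(graph, container, a, b):
    """Two direct triple lookups against the hub; no traversal needed."""
    return (container, RDFS_MEMBER, a) in graph and (container, RDFS_MEMBER, b) in graph

print(same_container(graph, ":bobsFriends", ":tom", ":sally"))  # True
```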

As with collections, these advantages do not come for free…

Being an open list, a container gives a reasoner very little to work with, because it can never know that it is looking at the complete list.
Inserting and deleting items is much more expensive in an ordered container than in an ordered collection - instead of updating a single link, we need to renumber the membership property of every item after the one we affected.
The model is centralized - all members hang off a central “container” resource that must exist somewhere.
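Here is a Python sketch of the renumbering cost, using the simplified tuple model of a graph. Inserting at position 2 of an `rdf:_n` container forces every later membership triple to be rewritten. `insert_at` and `:sally` are hypothetical names used for illustration.

```python
import re

MEMBERSHIP = re.compile(r"rdf:_([1-9][0-9]*)$")

graph = {
    (":bobsFriends", "rdf:_1", ":tom"),
    (":bobsFriends", "rdf:_2", ":dick"),
    (":bobsFriends", "rdf:_3", ":harry"),
}

def insert_at(graph, container, position, item):
    """Insert item at 1-based `position` in an ordered container.

    Every member at or after `position` must be renumbered; returns how many
    existing membership triples were rewritten.
    """
    to_shift = []
    for s, p, o in graph:
        m = MEMBERSHIP.match(p)
        if s == container and m and int(m.group(1)) >= position:
            to_shift.append((int(m.group(1)), o))
    # Remove first, then re-add, so shifted indices never collide.
    for n, o in to_shift:
        graph.remove((container, f"rdf:_{n}", o))
    for n, o in to_shift:
        graph.add((container, f"rdf:_{n + 1}", o))
    graph.add((container, f"rdf:_{position}", item))
    return len(to_shift)

print(insert_at(graph, ":bobsFriends", 2, ":sally"))  # 2
```

Here two triples are rewritten, one for every member after the insertion point. The equivalent splice in an rdf:List rewrites a single rdf:rest triple, however long the list is.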

In many ways, the pros and cons of containers vs collections are mirrored!

Conclusion

RDF and its associated ecosystem provide many ways to represent lists. Just as with the Python programming language, RDF doesn't cater for every possible list-related use case, but it does provide a way of expressing more kinds of list than most other data formats. The extensible nature of RDF also means you can accommodate new standards or requirements if you need to, and do so in an externalized, application-agnostic way via controlled vocabularies.

RDF lists may be more complicated, but this comes with the advantage of them also being more expressive.
