Join between non-RDF and RDF data on the subject position #12

@LorenzBuehmann

Currently, RDF data is parsed as URIs, and these are put into a DataFrame in shortened form (for example, http://example.org/a1 becomes example.org/a1).

Consider the N-Triples

<http://example.org/a1> <http://example.org/p1> <http://example.org/b1> .
<http://example.org/a1> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://example.org/A> .

and the mapping

<#AMapping>
	rml:logicalSource [
		rml:source "/tmp/datalake-test/a.nt";
		nosql:store nosql:rdf
	];
	rr:subjectMap [
		rr:template "{id}";
		rr:class ex:A
	];

	rr:predicateObjectMap [
		rr:predicate ex:p1;
		rr:objectMap [rml:reference "example.org/p1"]
	] .

the data will be converted to this DataFrame

root
 |-- id: string (nullable = true)
 |-- example.org/p1: string (nullable = true)

+--------------+--------------+
|id            |example.org/p1|
+--------------+--------------+
|example.org/a1|example.org/b1|
+--------------+--------------+

The problem is that data from any other source is handled as the plain values contained in the corresponding datasource, i.e. it is never handled internally as a URI, as one would expect from the RML mappings.

Consider the CSV file

nr,p2
b1,c1
b2,c2
b3,c3

and the mapping

<#BMapping>
	rml:logicalSource [
		rml:source "/tmp/datalake-test/b.csv";
		nosql:store nosql:csv
	];
	rr:subjectMap [
		rr:template "http://example.org/{nr}";
		rr:class ex:B
	];

	rr:predicateObjectMap [
		rr:predicate ex:p2;
		rr:objectMap [rml:reference "p2"]
	] .

the DataFrame will just be

root
 |-- nr: string (nullable = true)
 |-- p2: string (nullable = true)

+---+---+
|nr |p2 |
+---+---+
|b1 |c1 |
|b2 |c2 |
|b3 |c3 |
+---+---+
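
To make the expected behaviour concrete, here is a minimal sketch in plain Python (not the project's actual code) of what applying the rr:template from #BMapping to each CSV row would produce. The helper name expand_template is hypothetical, and the sketch skips the IRI-safe percent-encoding that R2RML requires for template values.

```python
import re

def expand_template(template, row):
    """Replace each {column} placeholder with the row's value for that column.

    Hypothetical helper for illustration only; it does not percent-encode values.
    """
    return re.sub(r"\{([^{}]+)\}", lambda m: str(row[m.group(1)]), template)

# Rows of b.csv as shown above.
csv_rows = [
    {"nr": "b1", "p2": "c1"},
    {"nr": "b2", "p2": "c2"},
    {"nr": "b3", "p2": "c3"},
]

# Subject IRIs that the mapping describes, but that the DataFrame never contains.
subjects = [expand_template("http://example.org/{nr}", row) for row in csv_rows]
print(subjects)
```

With the subject column expanded this way, the CSV side would carry full URIs instead of the bare values b1, b2, b3.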

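Clearly, any join between the two would fail and result in an empty DataFrame, because ?o is bound to example.org/b1 on the RDF side while the CSV side only offers b1: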

prefix ex: <http://example.org/>

select * where {
  ?s a ex:A ;
       ex:p1 ?o .
   ?o ex:p2 ?o1 .
}
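
The mismatch can be reproduced without Spark. The sketch below uses plain Python lists in place of DataFrames; inner_join and shorten are hypothetical helpers, and the shortening rule (dropping the http:// scheme) is an assumption inferred from the RDF DataFrame shown above. It first joins the raw values, then joins again after the CSV subject has been expanded with the mapping's template and shortened the same way as the RDF side.

```python
import re

def shorten(iri):
    """Drop the scheme, mimicking the shortened URIs in the RDF DataFrame (assumed rule)."""
    return re.sub(r"^https?://", "", iri)

def inner_join(left, right, left_key, right_key):
    """Naive nested-loop inner join on left[left_key] == right[right_key]."""
    return [{**l, **r} for l in left for r in right if l[left_key] == r[right_key]]

rdf_rows = [{"id": "example.org/a1", "example.org/p1": "example.org/b1"}]
csv_rows = [{"nr": "b1", "p2": "c1"}, {"nr": "b2", "p2": "c2"}, {"nr": "b3", "p2": "c3"}]

# Join on the values as they are stored today: "example.org/b1" never equals "b1".
naive = inner_join(rdf_rows, csv_rows, "example.org/p1", "nr")
print(naive)  # []

# Apply the subject template "http://example.org/{nr}" first, then shorten like the RDF side.
csv_as_uri = [{**row, "nr": shorten("http://example.org/" + row["nr"])} for row in csv_rows]
fixed = inner_join(rdf_rows, csv_as_uri, "example.org/p1", "nr")
print(fixed)
```

The second join returns the single expected row (a1, b1, c1), which suggests the fix is to bring both sides to the same URI representation before joining, whichever representation is chosen.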
