Description
Currently, RDF data is parsed as URIs and put into a DataFrame with (shortened) URIs.
Consider the following N-Triples:
<http://example.org/a1> <http://example.org/p1> <http://example.org/b1> .
<http://example.org/a1> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://example.org/A> .
and the mapping:
<#AMapping>
    rml:logicalSource [
        rml:source "/tmp/datalake-test/a.nt";
        nosql:store nosql:rdf
    ];
    rr:subjectMap [
        rr:template "{id}";
        rr:class ex:A
    ];
    rr:predicateObjectMap [
        rr:predicate ex:p1;
        rr:objectMap [rml:reference "example.org/p1"]
    ] .
The data is converted into the following DataFrame:
root
|-- id: string (nullable = true)
|-- example.org/p1: string (nullable = true)
+--------------+--------------+
|id |example.org/p1|
+--------------+--------------+
|example.org/a1|example.org/b1|
+--------------+--------------+
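To make the current behavior concrete, here is a minimal Python sketch of a flattening step that yields the DataFrame above. It assumes that the reader drops the URI scheme, turns each predicate into a column, and consumes rdf:type through rr:class instead of storing it; the helpers shorten and to_rows are hypothetical and are not the project's actual reader.

```python
import re

# Matches an N-Triples line whose subject, predicate and object are all IRIs.
TRIPLE = re.compile(r"<([^>]*)>\s+<([^>]*)>\s+<([^>]*)>\s*\.")
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def shorten(iri):
    # Hypothetical shortening rule: drop the scheme, as in the output above.
    return iri.split("://", 1)[-1]


def to_rows(ntriples):
    """Group triples by subject into one wide row per subject."""
    rows = {}
    for line in ntriples.strip().splitlines():
        m = TRIPLE.match(line.strip())
        if not m:
            continue  # literals, blank nodes, comments are out of scope here
        s, p, o = m.groups()
        if p == RDF_TYPE:
            continue  # the class is handled by rr:class, not stored as a column
        row = rows.setdefault(shorten(s), {"id": shorten(s)})
        row[shorten(p)] = shorten(o)
    return list(rows.values())


NT = """
<http://example.org/a1> <http://example.org/p1> <http://example.org/b1> .
<http://example.org/a1> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://example.org/A> .
"""

rows = to_rows(NT)
print(rows)  # [{'id': 'example.org/a1', 'example.org/p1': 'example.org/b1'}]
```

Note that the object value "example.org/b1" keeps only a shortened form of the IRI, which is what the join further down has to match against.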
The problem is that all other data is handled only by the plain values contained in the corresponding data source, i.e. it is never handled internally as a URI, as one would expect from the RML mappings.
Consider the CSV file:
nr,p2
b1,c1
b2,c2
b3,c3
and the mapping:
<#BMapping>
    rml:logicalSource [
        rml:source "/tmp/datalake-test/b.csv";
        nosql:store nosql:csv
    ];
    rr:subjectMap [
        rr:template "http://example.org/{nr}";
        rr:class ex:B
    ];
    rr:predicateObjectMap [
        rr:predicate ex:p2;
        rr:objectMap [rml:reference "p2"]
    ] .
The DataFrame is simply:
root
|-- nr: string (nullable = true)
|-- p2: string (nullable = true)
+---+---+
|nr |p2 |
+---+---+
|b1 |c1 |
|b2 |c2 |
|b3 |c3 |
+---+---+
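For comparison, the following Python sketch expands the subject map's rr:template over the same CSV rows, i.e. it computes the subject IRIs that the mapping describes but that never appear in the DataFrame. The helper expand_template is hypothetical, and it omits the IRI-safe percent-encoding that R2RML applies to values inserted into IRI templates (the values b1 to b3 would not be affected by it).

```python
import re

# Matches "{column}" placeholders in an rr:template string.
PLACEHOLDER = re.compile(r"\{([^}]+)\}")


def expand_template(template, row):
    """Replace each {column} placeholder with that column's value in the row.

    Simplification: no percent-encoding of the inserted values.
    """
    return PLACEHOLDER.sub(lambda m: row[m.group(1)], template)


# The CSV rows exactly as they end up in the DataFrame above.
csv_rows = [
    {"nr": "b1", "p2": "c1"},
    {"nr": "b2", "p2": "c2"},
    {"nr": "b3", "p2": "c3"},
]

# Subjects described by the subject map of <#BMapping>.
subjects = [expand_template("http://example.org/{nr}", r) for r in csv_rows]
print(subjects)
# ['http://example.org/b1', 'http://example.org/b2', 'http://example.org/b3']
```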
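Clearly, any join between the two sources then fails and yields an empty DataFrame, because the object of ex:p1 is stored as "example.org/b1" while the CSV subject column holds only "b1". For example, the following query returns no results:
prefix ex: <http://example.org/>
select * where {
  ?s a ex:A ;
     ex:p1 ?o .
  ?o ex:p2 ?o1 .
}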
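The empty result can be reproduced without Spark. The plain-Python sketch below stands in for the DataFrame join behind ?o ex:p2 ?o1 (the inner_join helper is hypothetical, not a Spark API) and contrasts joining the raw values with joining after a hypothetical normalization step that turns both sides into full IRIs, which is what one would expect from the RML mappings. Restoring the RDF side by re-adding "http://" is an assumption made only for this example.

```python
def inner_join(left, right, left_key, right_key):
    """Inner equi-join of two lists of row dicts on left[left_key] == right[right_key]."""
    index = {}
    for rrow in right:
        index.setdefault(rrow[right_key], []).append(rrow)
    return [{**lrow, **rrow} for lrow in left for rrow in index.get(lrow[left_key], [])]


# Rows as they appear in the two DataFrames above.
a_rows = [{"id": "example.org/a1", "example.org/p1": "example.org/b1"}]
b_rows = [
    {"nr": "b1", "p2": "c1"},
    {"nr": "b2", "p2": "c2"},
    {"nr": "b3", "p2": "c3"},
]

# Raw values: "example.org/b1" never equals "b1", so the join is empty.
raw = inner_join(a_rows, b_rows, "example.org/p1", "nr")
print(raw)  # []

# Hypothetical normalization: bring both join columns to full IRIs first.
# Assumption: the shortened RDF value can be restored by re-adding "http://".
a_iris = [{**r, "example.org/p1": "http://" + r["example.org/p1"]} for r in a_rows]
# Apply the rr:template "http://example.org/{nr}" of <#BMapping>.
b_iris = [{**r, "nr": "http://example.org/" + r["nr"]} for r in b_rows]
fixed = inner_join(a_iris, b_iris, "example.org/p1", "nr")
print(fixed)
```

Only the join columns are normalized here; a real fix would presumably apply the subject and object maps to every column that holds a URI.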