[Milestone Overview] [Previous Milestone: Tabular Data] [Next Milestone: Alignment]
We first ran the reasoning over the ontology from Ontotext Refine in GraphDB. However, we found that, contrary to the GraphDB statistics shown in the figure below, the number of statements in the Turtle file result-triples.ttl (before reasoning) and in the file statements-graphdb.ttl (after reasoning) was identical. We therefore decided to run the reasoning in Protégé as well.
File from Ontotext Refine, before reasoning: result-triples.ttl
File from GraphDB, after reasoning: statements-graphdb.ttl
GraphDB automatically runs reasoning directly after a graph (a file in .ttl format) is imported.
The figure shows the overview of our GraphDB repository, into which the Turtle file we created in Ontotext Refine was imported as a knowledge graph. It shows that the imported file contains 45,433 RDF statements in total. GraphDB's reasoning inferred an additional 26,406 statements.
We also checked GraphDB's reasoning with Rapper (Raptor RDF Syntax Library). Rapper is a command-line tool that can process RDF data in various formats (e.g. Turtle, N-Triples). We use it to count the statements in the two Turtle files: the file we created in Ontotext Refine and the file exported from GraphDB after reasoning.
File before reasoning (result-triples.ttl):
$ rapper -i turtle result-triples.ttl -q -o ntriples | sort | uniq | wc -l
> 45433
File after reasoning (statements-graphdb.ttl):
$ rapper -i turtle statements-graphdb.ttl -q -o ntriples | sort | uniq | wc -l
> 45433
The file result-triples.ttl contains 45,433 statements before reasoning. After reasoning, the count reported by Rapper is unchanged, which would suggest that GraphDB did not perform any reasoning, or that the exported file does not include the inferred statements. According to GraphDB, however, 26,406 statements were inferred, which indicates that reasoning did take place.
Because of this discrepancy between the results from GraphDB and Rapper, we decided to perform the reasoning with Protégé.
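The counting step in the Rapper pipeline can be reproduced in a few lines of Python. The following is only a minimal sketch: it assumes the input is N-Triples text (one statement per line, as produced by `rapper -o ntriples`), and the function name and the example.org IRIs are hypothetical. Unlike `sort | uniq | wc -l`, it also skips blank lines and ignores surrounding whitespace.

```python
def count_distinct_statements(ntriples_text):
    """Count distinct non-empty lines of an N-Triples dump."""
    # N-Triples puts one statement on each line, so the number of
    # distinct lines approximates the number of distinct statements.
    distinct_lines = {
        line.strip() for line in ntriples_text.splitlines() if line.strip()
    }
    return len(distinct_lines)


# Hypothetical sample data; the second line repeats the first.
sample = (
    "<http://example.org/a> <http://example.org/p> <http://example.org/b> .\n"
    "<http://example.org/a> <http://example.org/p> <http://example.org/b> .\n"
    "<http://example.org/b> <http://example.org/p> <http://example.org/c> .\n"
)

print(count_distinct_statements(sample))  # prints 2
```

Two files with the same count are not necessarily identical, so an equal count on its own only shows that no additional statements were exported.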
File from Ontotext Refine, before reasoning: result-triples.ttl
File from Protégé, after reasoning: statements-protege.ttl
- We imported the file result-triples.ttl into Protégé.
- We selected the "HermiT" reasoner and started the reasoning.
    ------------------------------- Running Reasoner -------------------------------
    Pre-computing inferences:
        - class hierarchy
        - object property hierarchy
        - data property hierarchy
        - class assertions
        - object property assertions
        - same individuals
    Ontologies processed in 894 ms by HermiT
- Check with Rapper
File before reasoning (result-triples.ttl):
$ rapper -i turtle result-triples.ttl -q -o ntriples | sort | uniq | wc -l
> 45433
File after reasoning (statements-protege.ttl):
$ rapper -i turtle statements-protege.ttl -q -o ntriples | sort | uniq | wc -l
> 50950
The number of statements rose from 45,433 to 50,950 after reasoning in Protégé. This confirms that Protégé performed the reasoning successfully and generated additional inferred statements.
Until now, GraphDB held the ontology result-triples.ttl that was exported from Ontotext Refine. That file did not contain any inferred statements. After reasoning in Protégé, we created the file statements-protege.ttl, which now contains all inferred statements. This file was imported into GraphDB and replaces the previous file. Consequently, the figure shown above no longer matches the current statistics.
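To inspect which statements a reasoner added, rather than only how many, the two N-Triples dumps can be compared as sets. The sketch below is an illustration under assumptions: the helper names and the example.org IRIs are hypothetical, and a purely line-based comparison can over-report differences when blank node labels change between serialisations.

```python
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"


def statement_set(ntriples_text):
    """Return the distinct non-empty lines of an N-Triples dump."""
    return {line.strip() for line in ntriples_text.splitlines() if line.strip()}


def added_statements(before_text, after_text):
    """Return statements that occur only in the second dump."""
    return statement_set(after_text) - statement_set(before_text)


# Hypothetical data: the second dump adds one inferred class assertion.
before = (
    "<http://example.org/m1> " + RDF_TYPE + " <http://example.org/MargheritaPizza> .\n"
)
after = before + (
    "<http://example.org/m1> " + RDF_TYPE + " <http://example.org/Pizza> .\n"
)

for statement in sorted(added_statements(before, after)):
    print(statement)
```

If the Protégé export is a superset of the original file, the counts reported above correspond to 50,950 - 45,433 = 5,517 added statements.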
To query all restaurants that sell pizzas without tomato, we created the following query:
PREFIX group2: <http://www.uni-jena.de/kg25/group2#>
SELECT ?establishmentName ?pizzaName ?address ?cityName ?stateName
WHERE
{
    ?establishment a group2:Establishment ;
        group2:offersMenuItem ?pizza ;
        group2:hasName ?establishmentName .
    ?pizza a group2:Pizza ;
        group2:hasName ?pizzaName .
    FILTER NOT EXISTS
    {
        ?pizza group2:hasIngredient ?ingredient .
        FILTER(LCASE(STR(?ingredient)) = "tomato")
    }
    FILTER NOT EXISTS
    {
        ?pizza group2:hasName ?name .
        FILTER(REGEX(LCASE(STR(?name)), "tomato"))
    }
    FILTER NOT EXISTS
    {
        ?pizza group2:hasDescription ?description .
        FILTER(REGEX(LCASE(STR(?description)), "tomato"))
    }
    OPTIONAL
    {
        ?establishment group2:hasStreetAddress ?address .
        ?establishment group2:locatedInCity ?city .
        ?city a group2:City ;
            group2:hasName ?cityName .
        ?establishment group2:locatedInState ?state .
        ?state a group2:State ;
            group2:hasName ?stateName .
    }
}
This query identifies pizzas that are explicitly offered by an establishment and do not contain the ingredient 'tomato'. To make the checks case-insensitive, all text comparisons use lowercase representations. A pizza is excluded from the results if the word "tomato" appears in its menu item name, its description, or its listed ingredients. We return the address, the city and the state as detailed information about the establishment. This part is wrapped in OPTIONAL because we want to list every establishment that offers such a pizza, even if it cannot be located. We also return the names of the establishments, pizzas, cities and states for better readability.
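The exclusion rule can be restated as a small Python sketch. It is illustrative only: the records and field names are hypothetical, and it assumes that ingredients are plain strings, whereas in the graph `STR(?ingredient)` would return the full IRI if the ingredient is a resource.

```python
def mentions_tomato(pizza):
    """Return True if 'tomato' occurs in the name, description or ingredients."""
    name = pizza.get("name", "").lower()
    description = pizza.get("description", "").lower()
    ingredients = [item.lower() for item in pizza.get("ingredients", [])]
    # Name and description use a substring match (like REGEX in the query),
    # ingredients use an exact match (like the '=' comparison).
    return (
        "tomato" in name
        or "tomato" in description
        or "tomato" in ingredients
    )


# Hypothetical menu items.
pizzas = [
    {"name": "Bianca", "description": "Ricotta and garlic", "ingredients": ["Ricotta"]},
    {"name": "Classic", "description": "With TOMATO sauce", "ingredients": ["Cheese"]},
    {"name": "Veggie", "ingredients": ["Tomato", "Pepper"]},
    {"name": "Sun-dried Tomato Special", "ingredients": []},
]

without_tomato = [p["name"] for p in pizzas if not mentions_tomato(p)]
print(without_tomato)  # prints ['Bianca']
```

A missing description or an empty ingredient list does not exclude a pizza, which matches the behaviour of the three separate FILTER NOT EXISTS blocks.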
The CSV file for this task can be found under: restaurant-pizzas-without-tomato.csv
We received feedback that we should rely solely on our class group2:Tomato for filtering pizzas without tomato. We therefore provide an alternative query that uses our ingredient class group2:Tomato:
PREFIX group2: <http://www.uni-jena.de/kg25/group2#>
SELECT ?establishmentName ?pizzaName ?address ?cityName ?stateName
WHERE
{
    ?establishment a group2:Establishment ;
        group2:offersMenuItem ?pizza ;
        group2:hasName ?establishmentName .
    ?pizza a group2:Pizza ;
        group2:hasName ?pizzaName .
    FILTER NOT EXISTS
    {
        # The ingredient is matched directly via group2:Tomato, so no string
        # filter on the unbound variable ?ingredient is needed here.
        ?pizza group2:hasIngredient group2:Tomato .
    }
    OPTIONAL
    {
        ?establishment group2:hasStreetAddress ?address .
        ?establishment group2:locatedInCity ?city .
        ?city a group2:City ;
            group2:hasName ?cityName .
        ?establishment group2:locatedInState ?state .
        ?state a group2:State ;
            group2:hasName ?stateName .
    }
}
The results of this query can be found in the file: restaurant-pizzas-without-tomato-v2.csv
To query the average price of a Margherita pizza, we created the following query:
PREFIX group2: <http://www.uni-jena.de/kg25/group2#>
SELECT (AVG(?price) AS ?averagePrice)
WHERE
{
    ?pizza a group2:MargheritaPizza ;
        group2:hasPrice ?price .
}
Result: 15.511943 (currency: USD)
Because we already clustered and mapped pizzas containing the phrase 'margherita' we can simply get all margherita pizzas by the class group2:MargheritaPizza. Also, we only have USD as currency, so we don't have to worry about different currencies.
PREFIX group2: <http://www.uni-jena.de/kg25/group2#>
SELECT (AVG(?price) AS ?averagePrice)
WHERE
{
    ?pizza a group2:Pizza ;
        group2:hasName ?name ;
        group2:hasPrice ?price .
    FILTER(CONTAINS(LCASE(?name), "margherita"))
}
As a cross-check, we wrote a second query over all instances of our pizza class. Each pizza must have a name and a price, and the name must contain the phrase 'margherita', again regardless of capitalization.
This query returned a price of 15.350257 (currency: USD), which differs slightly.
If we replace the filter line with:
FILTER(CONTAINS(?name, "Margherita Pizza"))
the price changes to 15.511943 (currency: USD), which matches the result of the first query.
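The effect of the two filters on the average can be sketched in Python. The names and prices below are hypothetical and only show the mechanism: a case-insensitive substring match selects more pizzas than a case-sensitive match on "Margherita Pizza", so the two averages can differ.

```python
from statistics import mean

# Hypothetical (name, price) pairs.
pizzas = [
    ("Margherita Pizza", 14.0),
    ("Pizza margherita", 12.0),
    ("MARGHERITA", 10.0),
    ("Pepperoni Pizza", 16.0),
]


def average_price(rows, matches):
    """Average the prices of all rows whose name satisfies `matches`."""
    return mean(price for name, price in rows if matches(name))


# Like FILTER(CONTAINS(LCASE(?name), "margherita"))
loose = average_price(pizzas, lambda name: "margherita" in name.lower())
# Like FILTER(CONTAINS(?name, "Margherita Pizza"))
strict = average_price(pizzas, lambda name: "Margherita Pizza" in name)

print(loose, strict)  # prints 12.0 14.0
```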
To query the number of restaurants by city, sorted by state and the number of restaurants, we created the following query:
PREFIX group2: <http://www.uni-jena.de/kg25/group2#>
SELECT ?stateName ?cityName (COUNT(DISTINCT ?establishment) AS ?numberOfEstablishments)
WHERE
{
    ?establishment a group2:Establishment ;
        group2:locatedInCity ?city .
    ?city a group2:City ;
        group2:locatedInState ?state ;
        group2:hasName ?cityName .
    ?state a group2:State ;
        group2:hasName ?stateName .
}
GROUP BY ?stateName ?cityName
ORDER BY ASC(?stateName)
First we check whether an establishment is located in a city. Then we look up the state in which that city is located. We group by both the state name and the city name because both appear unaggregated in the SELECT clause. Lastly, we sort in ascending order by the state name.
We count the establishments with DISTINCT so that an establishment that appears several times for the same city is not counted more than once.
Without DISTINCT, the count is 1008 instead of 985.
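The difference between a plain count and a distinct count per group can be sketched as follows. The rows are hypothetical solution rows of the form (establishment, city). The duplicate stands for an establishment that is matched twice, for example because a joined resource has two name literals, which is an assumption and not something the source confirms.

```python
from collections import defaultdict

# Hypothetical solution rows.
rows = [
    ("est1", "Springfield"),
    ("est1", "Springfield"),  # same establishment matched a second time
    ("est2", "Springfield"),
    ("est3", "Salem"),
]


def count_per_city(solution_rows, distinct):
    """Group rows by city and count establishments, optionally distinct."""
    groups = defaultdict(list)
    for establishment, city in solution_rows:
        groups[city].append(establishment)
    return {
        city: len(set(members)) if distinct else len(members)
        for city, members in groups.items()
    }


print(count_per_city(rows, distinct=False))  # {'Springfield': 3, 'Salem': 1}
print(count_per_city(rows, distinct=True))   # {'Springfield': 2, 'Salem': 1}
```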
To check the number, we use the following query:
PREFIX group2: <http://www.uni-jena.de/kg25/group2#>
SELECT (COUNT(DISTINCT ?establishment) AS ?numberOfEstablishments)
WHERE
{
    ?establishment a group2:Establishment ;
        group2:locatedInCity ?city .
}
To query the restaurants that are missing a postal code, we created the following query:
PREFIX group2: <http://www.uni-jena.de/kg25/group2#>
SELECT ?establishmentName
WHERE
{
    ?establishment a group2:Establishment ;
        group2:hasName ?establishmentName .
    ?establishment group2:locatedInCity ?city .
    FILTER NOT EXISTS
    {
        ?city group2:hasPostcode ?postalCode .
    }
}
We first select all instances of the Establishment class and retrieve their names.
Then we look up the city in which an establishment is located, as our hasPostcode property connects only to a city and not to an establishment directly.
Lastly, we check whether the city associated with an establishment is missing a postal code.
If this is the case, we return the names of these establishments.
We received feedback that we can encounter cases where a city has multiple postcodes. Then we cannot make a statement about the postcode of the establishment, because it is not directly connected to the postcode. Therefore we added the postcode property to the establishment class as well, so that we can query the postcode of an establishment directly.
With unionOf we can define that the property hasPostcode can be used for both cities and establishments.
group2:hasPostcode a owl:DatatypeProperty ;
    rdfs:domain [ a owl:Class ;
                  owl:unionOf ( group2:City group2:Establishment ) ] ;
    rdfs:range xsd:string ;
    rdfs:label "has postcode" ;
    rdfs:comment "A city has a postcode represented as a string." .
Unfortunately, Ontotext Refine does not support the owl:unionOf property, so we had to add the property manually in the Turtle file. First, we set the domain of the hasPostcode property in Ontotext Refine to owl:Thing, which allows mapping to any class. Then, we added the group2:hasPostcode property to the establishment class. Now we can export the RDF file from Ontotext Refine. The section regarding the hasPostcode property in the Turtle file looks like this:
group2:hasPostcode a owl:DatatypeProperty ;
    rdfs:label "has postcode" ;
    rdfs:comment "A city has a postcode represented as a string." ;
    rdfs:domain owl:Thing ;
    rdfs:range xsd:string .
To restrict the domain of hasPostcode to cities and establishments only, we manually added owl:unionOf to the hasPostcode property in the Turtle file result-triples.ttl as follows:
group2:hasPostcode a owl:DatatypeProperty ;
    rdfs:label "has postcode" ;
    rdfs:comment "A city has a postcode represented as a string." ;
    rdfs:domain [ owl:unionOf ( group2:City group2:Establishment ) ] ;
    rdfs:range xsd:string .
Unfortunately, GraphDB did not support the owl:unionOf property, so we had to remove it again.
We decided to keep the domain of the hasPostcode property as owl:Thing and to query the postcode of an establishment directly. The query for establishments without a postcode now looks like this:
PREFIX group2: <http://www.uni-jena.de/kg25/group2#>
SELECT ?establishmentName
WHERE
{
    ?establishment a group2:Establishment ;
        group2:hasName ?establishmentName .
    FILTER NOT EXISTS
    {
        ?establishment group2:hasPostcode ?postalCode .
    }
}
Although the domain of the hasPostcode property is now owl:Thing, this approach solves the problem that can occur when a city has multiple postcodes. In fact, the reworked query returns two more establishments (14 vs. 12) than the approach used before we integrated the feedback.
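The following Python sketch shows how the two checks can return different sets of establishments. The data is hypothetical, and the source does not confirm that this is the cause of the difference observed above. It only illustrates the mechanism: an establishment without its own postcode goes unnoticed by the city-level check whenever its city has at least one postcode.

```python
# Hypothetical data: city X has two postcodes, city Y has none.
city_postcodes = {"X": {"10001", "10002"}, "Y": set()}
establishments = [
    {"name": "A", "city": "X", "postcode": "10001"},
    {"name": "B", "city": "X", "postcode": None},  # city has postcodes, B has none
    {"name": "C", "city": "Y", "postcode": None},
]


def missing_via_city(items, postcodes_by_city):
    """City-level check: the establishment's city has no postcode at all."""
    return [e["name"] for e in items if not postcodes_by_city.get(e["city"])]


def missing_direct(items):
    """Direct check: the establishment itself has no postcode."""
    return [e["name"] for e in items if not e["postcode"]]


print(missing_via_city(establishments, city_postcodes))  # prints ['C']
print(missing_direct(establishments))                    # prints ['B', 'C']
```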
A third approach would be to create a new class group2:ThingWithPostcode that would be the superclass of both group2:City and group2:Establishment. This way, we could set the domain of the hasPostcode property to group2:ThingWithPostcode. However, we decided to keep the second approach, as it is simpler and for the purposes of this task sufficient.
To verify the correctness of our SPARQL queries, we tried to ensure that every resource is an instance of a class that we created. Where possible, as in Subtasks SPARQL.3 and SPARQL.4, we wrote additional, slightly altered queries and compared their results.
We wrote the documentation directly after each change to the queries, so that it captures all necessary information and the ideas behind them.
