Querying and Reasoning

[Milestone Overview] [Previous Milestone: Tabular Data] [Next Milestone: Alignment]

Subtask SPARQL.1

We first ran the reasoning over the ontology from Ontotext Refine in GraphDB. However, contrary to the GraphDB statistics shown in the figure below, we found that the number of statements in the Turtle file result-triples.ttl (before reasoning) and in statements-graphdb.ttl (after reasoning) was identical. We therefore decided to run the reasoning in Protege instead.

GraphDB Reasoning

File from Ontotext Refine, before reasoning: result-triples.ttl

File from GraphDB, after reasoning: statements-graphdb.ttl

GraphDB automatically performs reasoning immediately after a graph (a file in .ttl format) is imported.

The figure shows the overview of our GraphDB repository, into which the Turtle file created in Ontotext Refine was imported as a knowledge graph. According to this overview, the imported file contains 45,433 RDF statements in total, and GraphDB's reasoning inferred a further 26,406 statements.

We also checked GraphDB's reasoning with Rapper (Raptor RDF Syntax Library). Rapper is a command-line tool that can process RDF data in various formats (e.g. Turtle, N-Triples). We use it to count the statements in the two Turtle files: the one we created in Ontotext Refine and the one exported from GraphDB after reasoning.

File before reasoning (result-triples.ttl):

$ rapper -i turtle result-triples.ttl -q -o ntriples | sort | uniq | wc -l
> 45433

File after reasoning (statements-graphdb.ttl):

$ rapper -i turtle statements-graphdb.ttl -q -o ntriples | sort | uniq | wc -l
> 45433

The number of statements in result-triples.ttl before reasoning is 45,433. According to Rapper, this number is unchanged after reasoning, which would suggest that GraphDB did not perform any reasoning. According to GraphDB, however, 26,406 statements were inferred, which indicates that it did perform reasoning.

Because of this discrepancy between the GraphDB and Rapper results, we decided to run the reasoning in Protege.

Protege Reasoning

File from Ontotext Refine, before reasoning: result-triples.ttl

File from Protege, after reasoning: statements-protege.ttl

We imported the file result-triples.ttl into Protege.

We selected the "HermiT" reasoner and started the reasoning.

------------------------------- Running Reasoner ------------------------------- 
Pre-computing inferences: 
    - class hierarchy 
    - object property hierarchy 
    - data property hierarchy 
    - class assertions 
    - object property assertions 
    - same individuals 
Ontologies processed in 894 ms by HermiT

Verification with Rapper

File before reasoning (result-triples.ttl):

$ rapper -i turtle result-triples.ttl -q -o ntriples | sort | uniq | wc -l
> 45433

File after reasoning (statements-protege.ttl):

$ rapper -i turtle statements-protege.ttl -q -o ntriples | sort | uniq | wc -l
> 50950

After reasoning in Protege, the number of statements rose from 45,433 to 50,950 (5,517 additional statements). This confirms that Protege performed the reasoning successfully and generated additional inferred statements.
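The counts show how many statements were added, but not which ones. As a rough sketch, the two N-Triples dumps could be compared as sets. It assumes both files were first converted with `rapper -o ntriples` as above. The function names and the sample triples are hypothetical, and because blank-node labels are not stable across serialisations, triples involving blank nodes may show up as spurious differences.

```python
def load_triples(lines):
    """Return the set of distinct N-Triples statements, skipping blanks and comments."""
    return {
        line.strip()
        for line in lines
        if line.strip() and not line.lstrip().startswith("#")
    }


def inferred_triples(before_lines, after_lines):
    """Statements present after reasoning but absent before, in sorted order."""
    return sorted(load_triples(after_lines) - load_triples(before_lines))


# Hypothetical sample data standing in for the two rapper outputs.
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"
before = [
    f"<http://example.org/p1> {RDF_TYPE} <http://example.org/MargheritaPizza> .",
]
after = before + [
    f"<http://example.org/p1> {RDF_TYPE} <http://example.org/Pizza> .",
]

for triple in inferred_triples(before, after):
    print(triple)
```

With real data, the two lists would be replaced by open file handles on the converted files, since the functions only iterate over lines.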

Important Change

Until now, GraphDB held the ontology result-triples.ttl exported from Ontotext Refine, which contained no inferred statements. After reasoning in Protege, we created the file statements-protege.ttl, which now contains all inferred statements. This file was imported into GraphDB and replaces the previous one. As a result, the figure shown above no longer matches the current statistics.

Subtask SPARQL.2

To query all restaurants that sell pizzas without tomato, we created the following query:

PREFIX group2: <http://www.uni-jena.de/kg25/group2#>

SELECT ?establishmentName ?pizzaName ?address ?cityName ?stateName  
WHERE 
{ 
    ?establishment a group2:Establishment ;
                   group2:offersMenuItem ?pizza ;
                   group2:hasName ?establishmentName .
    ?pizza a group2:Pizza ;
           group2:hasName ?pizzaName .
    FILTER NOT EXISTS 
    {
        ?pizza group2:hasIngredient ?ingredient .
        FILTER(LCASE(STR(?ingredient)) = "tomato")
    }
    FILTER NOT EXISTS
    {
        ?pizza group2:hasName ?name .
        FILTER(REGEX(LCASE(STR(?name)), "tomato"))
    }
    FILTER NOT EXISTS
    {
        ?pizza group2:hasDescription ?description .
        FILTER(REGEX(LCASE(STR(?description)), "tomato"))
    }
    OPTIONAL 
    {
        ?establishment group2:hasStreetAddress ?address .
        ?establishment group2:locatedInCity ?city .
        ?city a group2:City ;
              group2:hasName ?cityName .
        ?establishment group2:locatedInState ?state .
        ?state a group2:State ;
               group2:hasName ?stateName .
    }
}

This query identifies pizzas that are explicitly offered by an establishment and do not contain tomato. All textual checks are performed on lowercased strings to make them case-insensitive. A pizza is excluded from the results if the word "tomato" appears in its menu item name, its description, or its listed ingredients. We return the address, city, and state to give detailed information about the establishment. This block is OPTIONAL because we want to list every establishment that offers such a pizza, even if it has no location data. We also return the names of the establishments, pizzas, cities, and states rather than their identifiers for better readability.
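The exclusion logic can be illustrated outside the triple store with a minimal sketch. The dictionaries below are hypothetical stand-ins for pizza resources, and ingredients are treated as plain strings. If ingredients are stored as IRIs in the graph, their string form would be the full IRI, so that part of the comparison is an assumption worth checking against the data.

```python
import re


def mentions_tomato(pizza):
    """True if the pizza would be excluded by any of the three checks."""
    # Ingredient check: exact match after lowercasing, as in the first filter.
    if any(ing.lower() == "tomato" for ing in pizza.get("ingredients", [])):
        return True
    # Name and description checks: substring match after lowercasing.
    for field in ("name", "description"):
        if re.search("tomato", pizza.get(field, "").lower()):
            return True
    return False


# Hypothetical sample records.
pizzas = [
    {"name": "Pizza Bianca", "description": "Ricotta and garlic", "ingredients": ["Ricotta"]},
    {"name": "Fresh TOMATO Pie", "description": "", "ingredients": []},
    {"name": "House Special", "description": "", "ingredients": ["Tomato", "Basil"]},
]

tomato_free = [p["name"] for p in pizzas if not mentions_tomato(p)]
print(tomato_free)  # ['Pizza Bianca']
```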

The csv file for this task can be found under: restaurant-pizzas-without-tomato.csv

Feedback Integration

We received feedback that we should solely rely on our class group2:Tomato for filtering pizzas without tomato. Therefore we hereby provide an alternative query that uses our ingredient class group2:Tomato:

PREFIX group2: <http://www.uni-jena.de/kg25/group2#>

SELECT ?establishmentName ?pizzaName ?address ?cityName ?stateName  
WHERE 
{ 
    ?establishment a group2:Establishment ;
                   group2:offersMenuItem ?pizza ;
                   group2:hasName ?establishmentName .
    ?pizza a group2:Pizza ;
           group2:hasName ?pizzaName .
    FILTER NOT EXISTS
    {
        ?pizza group2:hasIngredient group2:Tomato .
    }
    OPTIONAL 
    {
        ?establishment group2:hasStreetAddress ?address .
        ?establishment group2:locatedInCity ?city .
        ?city a group2:City ;
              group2:hasName ?cityName .
        ?establishment group2:locatedInState ?state .
        ?state a group2:State ;
               group2:hasName ?stateName .
    }
}

The results of this query can be found in the file: restaurant-pizzas-without-tomato-v2.csv

Subtask SPARQL.3

To query the average price of a Margherita pizza, we created the following query:

PREFIX group2: <http://www.uni-jena.de/kg25/group2#>

SELECT (AVG(?price) AS ?averagePrice)
WHERE
{ 
    ?pizza a group2:MargheritaPizza ;
           group2:hasPrice ?price .
}

Result: 15.511943 (currency: USD)

Because we already clustered and mapped pizzas whose names contain the phrase 'margherita', we can retrieve all Margherita pizzas through the class group2:MargheritaPizza. Also, USD is the only currency in our data, so we do not have to handle different currencies.

PREFIX group2: <http://www.uni-jena.de/kg25/group2#>

SELECT (AVG(?price) AS ?averagePrice)
WHERE
{ 
    ?pizza a group2:Pizza ;
           group2:hasName ?name ;
           group2:hasPrice ?price .
    FILTER(CONTAINS(LCASE(?name), "margherita"))
}

We also wrote a second query that selects all instances of our pizza class that have a name and a price, where the name contains the phrase 'margherita' regardless of capitalization.

This query returned a price of 15.350257 (currency: USD), which differs slightly from the first result.

If we replace the filter line with:

    FILTER(CONTAINS(?name, "Margherita Pizza"))

The price then changes to 15.511943 (currency: USD), which confirms the first query that we proposed.
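Why the two filters give different averages can be seen on a toy data set: the case-insensitive substring check matches more names than the exact phrase does. This is only a sketch. The names and prices below are hypothetical and are not taken from the knowledge graph.

```python
from statistics import mean

# Hypothetical (name, price) pairs; the real values live in the knowledge graph.
pizzas = [
    ("Margherita Pizza", 12.0),
    ("Margherita Pizza", 16.0),
    ("Pizza margherita special", 11.0),
    ("Pepperoni Pizza", 15.0),
]


def average_price(rows, matches):
    """Average price over the rows whose name satisfies the predicate."""
    return mean(price for name, price in rows if matches(name))


# Mirrors FILTER(CONTAINS(LCASE(?name), "margherita"))
broad = average_price(pizzas, lambda name: "margherita" in name.lower())
# Mirrors FILTER(CONTAINS(?name, "Margherita Pizza"))
narrow = average_price(pizzas, lambda name: "Margherita Pizza" in name)

print(broad, narrow)  # 13.0 14.0
```

In this toy data the broad filter also picks up the lowercase variant, which pulls the average down, while the narrow filter only sees the two exact matches.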

Subtask SPARQL.4

To query the number of restaurants by city, sorted by state and the number of restaurants, we created the following query:

PREFIX group2: <http://www.uni-jena.de/kg25/group2#>

SELECT ?stateName ?cityName (COUNT(DISTINCT ?establishment) AS ?numberOfEstablishments)
WHERE
{ 
    ?establishment a group2:Establishment ;
                   group2:locatedInCity ?city .
    ?city a group2:City ;
          group2:locatedInState ?state ;
          group2:hasName ?cityName .
    ?state a group2:State ;
           group2:hasName ?stateName .
} 
GROUP BY ?stateName ?cityName
ORDER BY ASC(?stateName)

First we check whether an establishment is located in a city. After that, we look up the state in which the city is located. We group by both the state name and the city name because both appear in the SELECT clause. Lastly, we sort in ascending order by the name of the state.

We count the establishments with DISTINCT so that an establishment that appears several times in a city is not counted more than once. Without DISTINCT we would get 1008 results instead of 985. To check the number, we use the following query:

PREFIX group2: <http://www.uni-jena.de/kg25/group2#>

SELECT (COUNT(DISTINCT ?establishment) AS ?numberOfEstablishments)
WHERE
{ 
    ?establishment a group2:Establishment ;
                   group2:locatedInCity ?city .
}
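The difference between counting rows and counting distinct establishments, together with the grouping and ordering used above, can be sketched on a few hypothetical query bindings. The tuples below are invented for illustration. In a real result, the same establishment can be bound more than once when one of the joined variables has several values.

```python
from collections import defaultdict

# Hypothetical query bindings: (state, city, establishment).
rows = [
    ("NY", "Albany", "est1"),
    ("NY", "Albany", "est1"),  # same establishment bound twice
    ("NY", "Albany", "est2"),
    ("CA", "Fresno", "est3"),
]

# COUNT(?establishment) counts every row; COUNT(DISTINCT ...) counts unique values.
total_rows = len(rows)
distinct_establishments = len({est for _, _, est in rows})

# GROUP BY ?stateName ?cityName with COUNT(DISTINCT ?establishment).
groups = defaultdict(set)
for state, city, est in rows:
    groups[(state, city)].add(est)

# ORDER BY ASC(?stateName)
report = sorted(
    ((state, city, len(ests)) for (state, city), ests in groups.items()),
    key=lambda r: r[0],
)

print(total_rows, distinct_establishments)  # 4 3
print(report)  # [('CA', 'Fresno', 1), ('NY', 'Albany', 2)]
```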

Subtask SPARQL.5

To query the restaurants that are missing a postal code, we created the following query:

PREFIX group2: <http://www.uni-jena.de/kg25/group2#>

SELECT ?establishmentName
WHERE
{ 
    ?establishment a group2:Establishment ;
                   group2:hasName ?establishmentName .
    ?establishment group2:locatedInCity ?city .
    FILTER NOT EXISTS 
    {
        ?city group2:hasPostcode ?postalCode .
    }
}

We first select all instances of the Establishment class and retrieve their names. Then we look up the city in which each establishment is located, because our hasPostcode property is attached to the city and not to the establishment directly. Lastly, we check whether the city associated with an establishment is missing a postal code. If so, we return the name of that establishment.

Feedback Integration

We received feedback that a city can have multiple postcodes. In that case we cannot say which postcode belongs to an establishment, because the establishment is not directly connected to a postcode. We therefore added the postcode property to the establishment class as well, so that the postcode of an establishment can be queried directly.

First Approach

With owl:unionOf we can declare that the property hasPostcode can be used for both cities and establishments.

group2:hasPostcode a owl:DatatypeProperty ;
    rdfs:domain [ a owl:Class ;
                  owl:unionOf ( group2:City group2:Establishment ) ] ;
    rdfs:range xsd:string ;
    rdfs:label "has postcode" ;
    rdfs:comment "A city has a postcode represented as a string." .

Unfortunately, Ontotext Refine does not support the owl:unionOf property, so we had to add the property manually in the Turtle file. First, we set the domain of the hasPostcode property in Ontotext Refine to owl:Thing, which allows mapping to any class. Then, we added the group2:hasPostcode property to the establishment class. Now we can export the RDF file from Ontotext Refine. The section regarding the hasPostcode property in the Turtle file looks like this:

group2:hasPostcode a owl:DatatypeProperty;
  rdfs:label "has postcode";
  rdfs:comment "A city has a postcode represented as a string.";
  rdfs:domain owl:Thing;
  rdfs:range xsd:string .

We tried to restrict the domain of the hasPostcode to only cities and establishments. Therefore we manually added the owl:unionOf property to the hasPostcode property in the turtle file result-triples.ttl as follows:

group2:hasPostcode a owl:DatatypeProperty ;
    rdfs:label "has postcode";
    rdfs:comment "A city has a postcode represented as a string.";
    rdfs:domain [ owl:unionOf ( group2:City group2:Establishment ) ] ;
    rdfs:range xsd:string .

Unfortunately, GraphDB did not support the owl:unionOf property, so we had to remove it again.

Second Approach

We decided to keep the domain of the hasPostcode property as owl:Thing and to query the postcode of an establishment directly. The query for establishments without a postcode now looks like this:

PREFIX group2: <http://www.uni-jena.de/kg25/group2#>

SELECT ?establishmentName
WHERE
{ 
    ?establishment a group2:Establishment ;
                   group2:hasName ?establishmentName .
    FILTER NOT EXISTS 
    {
        ?establishment group2:hasPostcode ?postalCode .
    }
}

Although the domain of the hasPostcode property is now owl:Thing, this approach solves the problem that arises when a city has multiple postcodes. In fact, the reworked query returns two more establishments (14 instead of 12) than the approach without the feedback.
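One way the two approaches can disagree is shown in the sketch below: an establishment without its own postcode goes unnoticed by the city-level check as long as its city has any postcode at all. The data and function names are hypothetical and only illustrate the logic of the two queries, not the actual reason for the difference in our results.

```python
# Hypothetical data: postcodes attached to cities and, after the change, to establishments.
city_postcodes = {"Albany": ["12203", "12205"], "Fresno": []}
establishments = [
    {"name": "Pizza A", "city": "Albany", "postcode": "12203"},
    {"name": "Pizza B", "city": "Albany", "postcode": None},
    {"name": "Pizza C", "city": "Fresno", "postcode": None},
]


def missing_via_city(ests, postcodes_by_city):
    """First query: an establishment is reported only if its city has no postcode."""
    return [e["name"] for e in ests if not postcodes_by_city.get(e["city"])]


def missing_direct(ests):
    """Reworked query: checks the postcode on the establishment itself."""
    return [e["name"] for e in ests if not e["postcode"]]


print(missing_via_city(establishments, city_postcodes))  # ['Pizza C']
print(missing_direct(establishments))  # ['Pizza B', 'Pizza C']
```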

Third Approach

A third approach would be to create a new class group2:ThingWithPostcode that would be the superclass of both group2:City and group2:Establishment. This way, we could set the domain of the hasPostcode property to group2:ThingWithPostcode. However, we decided to keep the second approach, as it is simpler and for the purposes of this task sufficient.

Subtask SPARQL.6

To verify the correctness of our SPARQL queries, we made sure that every resource we query is an instance of a class that we created. Where possible, we also wrote slightly altered queries, as in Subtask SPARQL.3 and SPARQL.4, and compared their results with the original ones.

We documented each query right after changing it, so that all necessary information and the ideas behind it were captured.
