Skip to content

Query Templates of the Benchmark

Olaf Hartig edited this page Aug 16, 2019 · 27 revisions

Table of Contents


QT1

Template file: ./artifacts/queryTemplates/main/QT1.txt

Description:
Queries of this template retrieve several attributes of every review about a product in a given offer.

While every offer is about a single product, from products to reviews we have a 1: N relationship where the average out-degree is 10 (i.e., on average every product has 10 reviews). Hence, the result tree of each query of this template has about 10 leaf nodes that are all different from one another.

By requesting several attributes of the leaf nodes of the result (i.e., the reviews) this query template covers chokepoint CP 1.1. Additionally, to a lesser degree, the template also covers CP 2.2 (because of the traversal from an offer to its product) and CP 2.1 (because of the traversal from the product to its reviews).

Choke points covered by this template:

Placeholder: $offerID
Placeholder type:: ID
Placeholder meaning:: the nr attribute of some offers
Possible values of the placeholder: All the offer IDs in the specific benchmark dataset can be used for this query template.
Number of possible instances: For a scale factor of N, we can generate 20*N instances of this query template, because per 1000 products, there are 20000 offers generated.

QT2.

Template file: ./artifacts/queryTemplates/main/QT2.txt

Description:
Queries of this template retrieve all the reviews about all the products of a given producer.

Hence, these queries traverse along a sequence of two 1: N relationships. The first one (from the given producer to its products) has an average out-degree of 50, and the second one (from the products to the reviews) has an average out-degree of 10. Consequently, the result tree of such a query has about 500 leaf nodes that are all different from one another.

Due to the traversal along a sequence of 1: N relationships, this template covers chokepoint CP 2.1.

Choke points covered by this template:

Placeholder: $producerID
Placeholder type: ID
Placeholder meaning: the nr attribute of some producer
Possible values for the placeholder: All the producer IDs in the specific benchmark dataset can be used for this query template.
Number of possible instances: For a scale factor of N, we can generate N/50 instances of this query template, because per 1000 products, there are 22 producers generated.

QT3

Template file: ./artifacts/queryTemplates/main/QT3.txt

Description:
Queries of this template retrieve several attributes of both the product type and its parent type for a product discussed by a given review.

These queries traverse along a sequence of three N:1 relationships, because every review is about one product, and every product has one product type which, in turn, has one parent type. Hence, given a review, there is only one corresponding parent type in the dataset. Consequently, the result tree of such a query has only a single leaf node.

Due to the traversal along a sequence of N:1 relationships, queries of this template cover chokepoint CP 2.2. By requesting several attributes of some internal nodes of the result tree (namely, the product type nodes) these queries cover chokepoints CP 2.3 and CP 1.1. Additionally, CP 1.1 is covered again by requesting multiple attributes of the leaf node (the parent type).

Choke points covered by this template:

Placeholder: $reviewID
Placeholder type: ID
Placeholder meaning: the nr attribute of some review
Possible values for the placeholder: All the review IDs in the specific benchmark dataset can be used for this query template.
Number of possible instances: For a scale factor of N, we can generate 10*N instances of this query template, because per 1000 products, there are 10,000 reviews generated.

QT4

Template file: ./artifacts/queryTemplates/main/QT4.txt

Description: Queries of this template retrieve details of the reviews about the products in a given offer, including the country code of the reviewer’s country.

While every offer is about a single product, from products to reviews we have a 1: N relationship where the average out-degree is 10 (i.e., on average every product has 10 reviews). Furthermore, every review has a single reviewer, and every reviewer has a single country. However, different reviews may have the same reviewer and different reviewers may have the same country. Hence, the result tree of each query of this template has about 10 leaf nodes where some of them may be duplicates of one another. Similarly, intermediate nodes

By requesting several attributes of the intermediate object (i.e., the product or reviews), this query template covers chokepoint CP 2.3. Additionally, the template also covers CP 2.2 (because of the traversal from an offer to its product, from a review to its reviewer and then to the reviewer’s country) and, to a lesser degree, CP 2.5 (because different reviews may have the same reviewer and different reviewers may have the same country).

Choke points covered by this template:

Placeholder: $offerID
Placeholder type: ID
Placeholder meaning: the nr attribute of some product
Possible values of the placeholder: All the offer IDs in the specific benchmark dataset can be used for this query template.
Number of possible instances: For a scale factor of N, we can generate 20*N instances of this query template, because per 1000 products, there are 20,000 offers generated.

QT5

Template file: ./artifacts/queryTemplates/main/QT5.txt

Description:
Queries of this template go from a given product to its reviews and back, repeatedly for four times, and request the review titles along the way and the product label in the end.

Hence, these queries alternate between traversing along a 1: N relationship and traversing back along the inverse of that relationship. The 1:N relationship (from a product to its reviews) has an average out-degree of 10 (i.e., on average every product has 10 reviews) and, in the opposite direction, every review is for a single product. Hence, each query of this template repeatedly visits the same product and the same ca. 10 reviews. Thus, the query result trees consist of ca. 10,000 leaf nodes that all represent the same product, and the trees contain many duplicate internal nodes (not only within each level of depth but also across levels).

While queries of this template may not seem to be queries that may be used in practice, they are a typical example of queries the traverse relationships in cycles, coming back to the same nodes multiple times. Hence, this query template covers chokepoint CP 2.4. Additionally, this query template covers chokepoint CP 2.3 because the queries request attributes of the internal nodes of the result tree (i.e., the review titles).

Choke points covered by this template:

Placeholder: $productID
Placeholder type: ID
Placeholder meaning: the nr attribute of some product
Possible values for the placeholder: All the product IDs in the specific benchmark dataset can be used for this query template.
Number of possible instances: For a scale factor of N, we can generate N instances of this query template, because the BSBM data generator is scaled by the number of products.

QT6

Template file: ./artifacts/queryTemplates/main/QT6.txt

Description:
Queries of this template retrieve the country of the producers that produce the products offered by a given vendor.

Queries of this template traverse along a sequence of one 1: N relationship and three N:1 relationships. That is, from a vendor to its offers we have a 1: N relationship, which has an average out-degree of 2000. Thereafter, every offer is about a single product, every product has a single producer, and every product is from a single country. Hence, the result trees of these queries have about 2000 leaf nodes. However, vendors may have multiple different offers about the same product, several of these products may be from the same producer, and different producers may be from the same country. Hence, there may be duplicates among the leaf nodes as well as among the internal nodes of the result trees.

Due to the latter property, this template presents a typical example of queries that do not traverse in cycles and yet visit nodes multiple times. Hence, the template covers chokepoint CP 2.5. Additionally, the template covers CP 2.2 because of the sequence of N:1 relationships that the queries traverse (i.e., from an offer to its product, from the product to its producer, and from the producer to its country).

Choke points covered by this template:

Placeholder: $vendorID
Placeholder type: ID
Placeholder meaning: the nr attribute of some vendor
Possible values for the placeholder: All the vendor IDs in the specific benchmark dataset can be used for this query template.
Number of possible instances: For a scale factor of N, we can generate N/100 instances of this query template, because per 1000 products, there are 12 vendors generated.

QT7

Template file: ./artifacts/queryTemplates/main/QT7.txt

Description:
Queries of this template retrieve data about 10 offers of a given vendor where these offers must be the first 10 at a given offset within the list of all offers of the given vendor.

From a vendor to its offers we have a 1: N relationship with an average out-degree of 2000 (i.e., on average, every vendor has 2000 offers). In addition, every offer is about a single product, but different offers may be about the same product. Hence, the result tree of each query of this template has about 10 leaf nodes that may be duplicates of one another.

The chokepoint covered primarily by this template is CP 3.2. Additionally, the template covers chokepoints CP 1.1 (because the queries request multiple attributes of the products), CP 2.3 (because the queries request an attribute of internal result tree nodes, i.e., the offers) and, to a lesser degree, CP 2.5 (because different offers may be about the same product; however, chances are small that some of the selected 10 offers are about the same product).

Choke points covered by this template:

Placeholders: $vendorID, $offset
Placeholder types: ID, Int
Placeholder meaning: $vendorID is the nr attribute of some vendor, and $offset specifies the offset
Possible values for the placeholders: All the vendor IDs in the specific benchmark dataset can be used for $vendorID; the value of $offset can be selected from the interval [1,200].
Number of possible instances: For a scale factor of N, we can generate about 20*N (i.e., (N/100)*2000) instances of this query template, because per 1000 products, there are 12 vendors generated. In addition, the number of possible values for $offset is about 2000 because, on average, every vendor has 2000 offers.

QT8

Template file: ./artifacts/queryTemplates/main/QT8.txt

Description:
Queries of this template retrieve data about a given number of offers that are the first in the list of all offers sorted by a given ordering criterion.

The result tree of each query of this template has 10 different leaf nodes.

The main purpose of this query template is to cover chokepoints CP 3.1 and CP 3.3. Additionally, it also covers chokepoint CP 1.1.

Choke points covered by this template:

Placeholders: $cnt, $attrOffer1, $attrOffer2
Placeholder types: Int, OffersSortingField, OffersSortingField
Placeholder meaning: $cnt is the number of offers to be retrieved, $attrOffer1 specifies the main sort order, and $attrOffer2 specifies the secondary sort order
Possible values for the placeholders: The value of $cnt can be selected from the interval [200, 300]; the value of $attrOffer1 can be any of the 10 values of OffersSortingField, and the same holds for $attrOffer2.
Number of possible instances: We can generate 10,000 (i.e.100*10*9) instances of this query template, because the number of value for $attr1 and for $attr2 is 10, respectively, and there are 100 possible values for $cnt.

QT9

Template file: ./artifacts/queryTemplates/main/QT9.txt

Description:
Queries of this template retrieve data about the reviews for the products in the first 50 offers by a given vendor. For each such product, the reviews have to be sorted on a given field. Hence, in contrast to the previous query template, which is about sorting a whole set of things (offers in this specific case), this query template requires sorting in separate subsets (namely, sets of reviews).

Queries of this template traverse along a sequence of a 1: N relationship (from a vendor to its offers), an N:1 relationships (the offers to their products). and another 1: N relationship (from products to their reviews). That is, from a vendor to its offers we have a 1: N relationship, which has an average out-degree of 2000.

From vendor to offers we have a 1: N relationship where the average out-degree is 2000 (i.e., on average every vendor has 2000 offers). In addition, every offer is about a single product, there would be 2000 product (with same products multiple times), and every product has 10 reviews on average. Hence, the result tree of each query of this template has about 20000 leaf nodes that some reviews may repeat multiple times.

By requesting several attributes of the leaf nodes of the result (i.e., the reviews) this query template covers chokepoint CP 1.1. Additionally, this query template covers chokepoint CP 3.1 (because of adding paging without offset on offers) and chokepoint CP 3.3 (because of adding ordering on the leaf nodes of the result (i.e., the reviews))

Choke points covered by this template:

Placeholders: $vendorID, $attrReview
Placeholder types: ID, ReviewFieldInput
Placeholder meaning: $vendorID is the nr attribute of some vendor, and, $attrReview specifies the field of reviews to be used for sorting
Possible values for the placeholders: All vendor IDs in the specific benchmark dataset can be used for $vendorID; the value for $attrReview can be any of the 9 values of ReviewField.
Number of possible instances: For a scale factor of N, we can generate (N/100)*9 instances of this query template, because per 1000 products, there are 12 vendors generated. And the number of possible values for $attrReview is 9.

QT10

Template file: ./artifacts/queryTemplates/main/QT10.txt

Description:
Queries of this template retrieve all reviews with text that contains the given string.

Review texts of different reviews may contain common substrings. Hence, the result of each query of this template consists of a number of different reviews, where this number depends on the selected search string. In general, this number may range from zero (none of the reviews contains the given search string) to the number of all reviews in the benchmark dataset (which is 10x the scale factor).

Primarily, this query template covers chokepoint CP 4.1. Additionally, by requesting several attributes of the leaf nodes of the resulting reviews, this query template also covers chokepoint CP 1.1.

Choke points covered by this template:

Placeholder: $textOfReviewKeyword
Placeholder type: String
Placeholder meaning: the target string which is contained in the text of reviews
Possible values for the placeholder: The value of $textOfReviewKeyword can be any word from dictionary 2 in BSBM (around 90,000 words totally)
Number of possible instances: We can generate around 90,000 instances of this query template, because the text of review is a string with 50-200 terms, which is generated from 90,000 words.

QT11

Template file: ./artifacts/queryTemplates/main/QT11.txt

Description:
Queries of this template search for all offers of a given vendor by using a search condition (instead of starting the traversal from the given vendor as done in Q6). Then, for each offer, the price and the offerWebpage is requested, as well as data about the offered product.

From a vendor to its offers we have a 1:N relationship, which has an average out-degree of 2000. Thereafter, every offer is about a single product. Hence, the result trees of these queries have about 2000 leaf nodes. However, vendors may have multiple different offers about the same product. Hence, there may be duplicates among the leaf nodes as well as among the internal nodes of the result trees.

Due to the latter property, this template is another example of queries that do not traverse in cycles and yet visit nodes multiple times. Hence, the template covers chokepoint CP 2.5. Besides this property, the primary focus of this query is chokepoint CP 4.4 (Subquery-based search). Additionally, by requesting several attributes of the requested objects, this template also covers chokepoint CP 1.1.

Choke points covered by this template:

Placeholder: $vendorID
Placeholder type: ID
Placeholder meaning: $vendorID is the nr attribute of some vendors
Possible values for the placeholder: All vendor IDs in the specific benchmark dataset can be used for this template.
Number of possible instances: For a scale factor of N, we can generate N/100 instances of this query template, because per 1000 products, there are 12 vendors generated.

QT12

Template file: ./artifacts/queryTemplates/main/QT12.txt

Description:
Queries of this template retrieve data about offers for products of a given producer. These offers are filtered based on a given vendor ID. The filter condition in this query template is the same as the search condition in the previous template.

These queries traverse along two 1:N relationships. The first one (from the given producer to its products) has an average out-degree of 50, and the second one (from the products to the offers) has an average out-degree of 20. Consequently, there are about 1000 offers totally for a given producer. By filtering the retrieved offers based on the subquery with the vendor ID, the number of leaf nodes of the result tree may thus range from 0 to about 1000.

Primarily, this query template covers chokepoint CP 4.3 (because it uses a subquery to filter the retrieved offers). Additionally, by requesting several attributes of the offers and of the offered products, this query template covers chokepoint CP 1.1. Moreover, by traversing along two 1:N relationships, this query template covers CP 2.1.

Choke points covered by this template:

Placeholders: $producer, $vendor
Placeholder types: ID, ID
Placeholder meaning: $producerID is the nr attribute of some producer, and $vendorID is the nr attribute of some vendor
Possible values for the placeholders: All producer IDs in the specific benchmark dataset can be used for $producerID, and all vendor IDs in the dataset can be used for $vendorID.
Number of possible instances: For a scale factor of N, we can generate (N/50)*(N/100) instances of this query template, because per 1000 products, there are 22 producers and 12 vendors generated.

QT13

Template file: ./artifacts/queryTemplates/main/QT13.txt

Description:
Queries of this template retrieve data about offers for products of a given producer. These offers are filtered based on a date associated with the vendors of the offers. By using a date range test, the filter condition in this query template is more complex than the one in the previous template.

The characteristics of query results for this template are the same as for the previous template.

Primarily, this query template covers chokepoints CP 4.3 (because it uses a subquery to filter the retrieved offers) and CP 4.2 (because the filter condition considers dates). Additionally, for the same reasons as for the previous template, this template also covers chokepoints CP 1.1 and CP 2.1.

Choke points covered by this template:

Placeholders: $producerID, $date
Placeholder types: ID, Date
Placeholder meaning: $producerID is the nr attributes of some producers, and $date specifies a date
Possible values for the placeholders: All producer IDs in the specific benchmark dataset can be used for $producerID; the value of $date can be any date from 2000-09-20 to 2006-12-23.
Number of possible instances: For a scale factor of N, we can generate (N/50)*2285 instances of this query template, because per 1000 products, there are 22 producers generated. In addition, the number of possible values for $date is 2285 (because the publishDate value of each vendor is a date in the range from 2000-09-20 to 2006-12-23).

QT14

Template file: ./artifacts/queryTemplates/main/QT14.txt

Description:
Queries of this template retrieve products that produced by a given producer and then retrieve offers that offer these products. Then the queries filter these offers by conjunctions of multiple conditions in a subquery. Lastly, queries retrieve price and offerWebpage of these offers and further retrieve product of these offers.

These queries traverse along two 1: N relationships. The first one (from the given producer to its products) has an average out-degree of 50, and the second one (from the products to the offers) has an average out-degree of 20. Consequently, there are about 1000 offers totally for a given producer. Because the retrieved offers are filtered by a subquery with conjunctions of multiple conditions, the number of leaf nodes of the result tree is between 1 to 1000, which may be the same from one another.

By requesting several attributes of the leaf nodes of the result (i.e., the reviews) this query template covers chokepoint CP 1.1. This query template covers CP 2.1 because of traversing along two 1: N relationships. Additionally, the template also covers CP 4.1, CP 4.2, CP 4.3 and CP 4.5 (because of using a conjunction of multiple conditions as a subquery to filter the retrieved result, and containing string matching and date range matching in the conditions).

Choke points covered by this template:

Placeholders: $producerID; $date; $commentOfVendorkeyword
Placeholder types: ID; Date; String
Placeholder meaning: $producerID is the nr attribute of some producer; $date specifies a date; $commentOfVendorKeyword is the target string which is contained in the comment of vendor
Possible values for the placeholders: All producer IDs in the specific benchmark dataset can be used for $producerID; the value of $date can be any date from 2000-09-20 to 2006-12-23; the value of $commentOfVendorKeyword can be any word from dictionary 2 of BSBM (which contains around 90,000 words).
Number of possible instances: For a scale factor of N, we can generate (N/50)*(2285)*90,000 instances of this query template, because per 1000 products, there are 12 vendors generated. In addition, the number of possible values for $date is 2285 (because the publishDate value of each vendor is a date in the range from 2000-09-20 to 2006-12-23), and the comment of vendor is a string with 20-50 terms, which is generated from 90,000 words.

QT15

Template file: ./artifacts/queryTemplates/main/QT15.txt

Description:
Queries of this template return the number of offers of a given vendor.

These queries traverse a 1: N relationship from vendor to its offers, which has an average out-degree of 2000. Therefore, the result tree of such a query has 1 leaf node, the value of which around 2000. This query template covers CP 5.2 (because of retrieve count of review scores).

Choke points covered by this template:

Placeholder: $vendorID
Placeholder type: ID
Placeholder meaning: $vendorID is the nr attribute of some vendors
Possible values for the placeholder: All vendor IDs can be used for this template.
Number of possible instances: For a scale factor of N, we can generate N/100 instances of this query template, because per 1000 products, there are 12 vendors generated.

QT16

Template file: ./artifacts/queryTemplates/main/QT16.txt

Description:
Queries of this template retrieve some aggregation values of price for all offers of a given vendor.

These queries traverse a 1: N relationship from vendor to its offers, which has an average out-degree of 2000. Therefore, the result tree of such a query has 1 leaf node that is calculated from around 2000 prices of vendors.

This query template covers CP 5.1 (because of retrieve count, sum, avg, max and min of review scores).

Choke points covered by this template:

Placeholder: $vendorID
Placeholder type: ID
Placeholder meaning: $vendorID is the nr attribute of some vendors
Possible values for the placeholder: All vendor IDs in the specific benchmark dataset can be used for this template.
Number of possible instances: For a scale factor of N, we can generate N/100 instances of this query template, because per 1000 products, there are 12 vendors generated.

Clone this wiki locally