Skip to content

Query Templates of the Benchmark

Sijin Cheng edited this page Jun 26, 2019 · 27 revisions

This is an uncompleted version... TODO:

Description of query templates

Q1. offer_product_review

Description:
Queries of this template retrieve several attributes of every review about a product in a given offer.

While every offer is about a single product, from products to reviews we have a 1: N relationship where the average out-degree is 10 (i.e., on average every product has 10 reviews). Hence, the result tree of each query of this template has about 10 leaf nodes that are all different from one another.

By requesting several attributes of the leaf nodes of the result (i.e., the reviews) this query template covers chokepoint CP 1.1. Additionally, to a lesser degree, the template also covers CP 2.2 (because of the traversal from an offer to its product) and CP 2.1 (because of the traversal from the product to its reviews).

Choke points:
CP 1.1: Multi-attribute retrieval
CP 2.1 Traversal of different 1:N relationship types (there is no traversal of multiple different 1:N) (CP 2.2: Efficient traversal of 1:1 relationship types)
Placeholder: $offerID
Placeholder type: ID
Placeholder meaning: the nr attribute of some offers
Possible values of the placeholder: All the possible offer IDs can be used for this query template.
Instances: For a scale factor of N, we can generate 20*N instances of this query template, because per 1000 products, there are 20000 offers generated.

Q2. producer_product_review

Description:
Queries of this template retrieve all the reviews about all the products of a given producer.

Hence, these queries traverse along a sequence of two 1: N relationships. The first one (from the given producer to its products) has an average out-degree of 50, and the second one (from the products to the reviews) has an average out-degree of 10. Consequently, the result tree of such a query has about 500 leaf nodes that are all different from one another.

Due to the traversal along a sequence of 1: N relationships, this template covers chokepoint CP 2.1.
Choke points:
CP 2.1 (Traversal of different 1: N relationship types)
Placeholder: $producerID
Placeholder type: ID
Placeholder meaning: the nr attribute of some producer
The possible value of the placeholder: All the possible producer IDs can be used for this query template.
Instances: For a scale factor of N, we can generate N/50 instances of this query template, because per 1000 products, there are 22 producers generated.

Q3. review_product_producttype_parenttype

Description:
Queries of this template retrieve several attributes of both the product type and its parent type for a product discussed by a given review.

These queries traverse along a sequence of three N:1 relationships, because every review is about one product, and every product has one product type which, in turn, has one parent type. Hence, given a review, there is only one corresponding parent type in the dataset. Consequently, the result tree of such a query has only a single leaf node.

Due to the traversal along a sequence of N:1 relationships, queries of this template cover chokepoint CP 2.2. By requesting several attributes of some internal nodes of the result tree (namely, the product type nodes) these queries cover chokepoints CP 2.3 and CP 1.1. Additionally, CP 1.1 is covered again by requesting multiple attributes of the leaf node (the parent type).
Choke points:
CP 1.1 Multi-attribute retrieval
CP 2.2 Traversal of different 1:1 relationship types
CP 2.3: Relationship traversal with and without retrieval of intermediate object data
Placeholder: $reviewID
Placeholder type: ID
Placeholder meaning: the nr attribute of some review
The possible value of the placeholder: All the possible review IDs can be used for this query template.
Instances: For a scale factor of N, we can generate 10*N instances of this query template, because per 1000 products, there are 10,000 reviews generated.

Q4. offer_product_review_person_country

Description: Queries of this template retrieve details of the reviews about the products in a given offer, including the country code of the reviewer’s country.

While every offer is about a single product, from products to reviews we have a 1: N relationship where the average out-degree is 10 (i.e., on average every product has 10 reviews). Furthermore, every review has a single reviewer, and every reviewer has a single country. However, different reviews may have the same reviewer and different reviewers may have the same country. Hence, the result tree of each query of this template has about 10 leaf nodes where some of them may be duplicates of one another. Similarly, intermediate nodes

By requesting several attributes of the intermediate object (i.e., the product or reviews), this query template covers chokepoint CP 2.3. Additionally, the template also covers CP 2.2 (because of the traversal from an offer to its product, from a review to its reviewer and then to the reviewer’s country) and, to a lesser degree, CP 2.5 (because different reviews may have the same reviewer and different reviewers may have the same country).

Choke points:
CP 2.2: Traversal of different 1:1 relationship types
CP 2.3: Relationship traversal with and without retrieval of intermediate object data
( CP 2.5: Acyclic relationship traversal that visits data objects repeatedly ) (only to obtain the country code in the end)
Placeholder: $offerID
Placeholder type: ID
Placeholder meaning: the nr attribute of some product
Possible values of the placeholder: All the possible offer IDs can be used for this query template.
Instances: For a scale factor of N, we can generate 20*N instances of this query template, because per 1000 products, there are 20,000 offers generated.

Q5. product_review_product4

Description:
Queries of this template go from a given product to its reviews and back, repeatedly for four times, and request the review titles along the way and the product label in the end.

Hence, these queries alternate between traversing along a 1: N relationship and traversing back along the inverse of that relationship. The 1:N relationship (from a product to its reviews) has an average out-degree of 10 (i.e., on average every product has 10 reviews) and, in the opposite direction, every review is for a single product. Hence, each query of this template repeatedly visits the same product and the same ca. 10 reviews. Thus, the query result trees consist of ca. 10,000 leaf nodes that all represent the same product, and the trees contain many duplicate internal nodes (not only within each level of depth but also across levels).

While queries of this template may not seem to be queries that may be used in practice, they are a typical example of queries the traverse relationships in cycles, coming back to the same nodes multiple times. Hence, this query template covers chokepoint CP 2.4. Additionally, this query template covers chokepoint CP 2.3 because the queries request attributes of the internal nodes of the result tree (i.e., the review titles).

Choke points:
( CP2.1: Traversal of different 1: N relationship types ) (only one 1: N, not multiple different ones)
( CP 2.2: Traversal of different 1:1 relationship types ) (only one N:1, not multiple different ones)
CP 2.3: Relationship traversal with and without retrieval of intermediate object data
CP 2.4: Traversal of relationships that form cycles
Placeholder: $productID
Placeholder type: ID
Placeholder meaning: the nr attribute of some product
The possible value of the placeholder: All the possible product IDs can be used for this query template.
Instances: For a scale factor of N, we can generate N instances of this query template, because the BSBM data generator is scaled by the number of products.

Q6. vendor_offer_product_producer_country

Description:
Queries of this template retrieve the country of the producers that produce the products offered by a given vendor.

Queries of this template traverse along a sequence of one 1: N relationship and three N:1 relationships. That is, from a vendor to its offers we have a 1: N relationship, which has an average out-degree of 2000. Thereafter, every offer is about a single product, every product has a single producer, and every product is from a single country. Hence, the result trees of these queries have about 2000 leaf nodes. However, vendors may have multiple different offers about the same product, several of these products may be from the same producer, and different producers may be from the same country. Hence, there may be duplicates among the leaf nodes as well as among the internal nodes of the result trees.

Due to the latter property, this template presents a typical example of queries that do not traverse in cycles and yet visit nodes multiple times. Hence, the template covers chokepoint CP 2.5. Additionally, the template covers CP 2.2 because of the sequence of N:1 relationships that the queries traverse (i.e., from an offer to its product, from the product to its producer, and from the producer to its country).

Choke points:
( CP 2.1: Traversal of different 1: N relationship types ) (only one 1: N, not multiple different ones)
CP 2.2: Traversal of different 1:1 relationship types
CP 2.5: Acyclic relationship traversal that visits data objects repeatedly
Placeholder: $vendorID
Placeholder type: ID
Placeholder meaning: the nr attribute of some vendor
The possible value of the placeholder: All the possible vendor_id could be used for this query template.
Instances: For a scale factor of N, we can generate N/100 instances of this query template, because per 1000 products, there are 12 vendors generated.
Template:

Q7. paging_with_offset_as_counter (orders of a vendor)

Description:
Queries of this template retrieve data about 10 offers of a given vendor where these offers must be the first 10 at a given offset within the list of all offers of the given vendor.

From a vendor to its offers we have a 1: N relationship with an average out-degree of 2000 (i.e., on average, every vendor has 2000 offers). In addition, every offer is about a single product, but different offers may be about the same product. Hence, the result tree of each query of this template has about 10 leaf nodes that may be duplicates of one another.

The chokepoint covered primarily by this template is CP 3.2. Additionally, the template covers chokepoints CP 1.1 (because the queries request multiple attributes of the products), CP 2.3 (because the queries request an attribute of internal result tree nodes, i.e., the offers) and, to a lesser degree, CP 2.5 (because different offers may be about the same product; however, chances are small that some of the selected 10 offers are about the same product).

Choke points:
CP 1.1: Multi-attribute retrieval
CP 2.3: Relationship traversal with and without retrieval of intermediate object data
( CP 2.5: Acyclic relationship traversal that visits data objects repeatedly ) (only to obtain the product label in the end; chances are little that we end up at the same product multiple times)
CP 3.2: Paging with offset
Placeholders: $vendorID, $offset
Placeholder type: ID, Int
Placeholder meaning: $vendorID is the nr attribute of some vendor, and $offset specifies the offset
The possible value of the placeholder: The vendor to be used for $id can be any vendor (chosen randomly); the value of $offset is selected by random from the interval [1,200].
Instances: For a scale factor of N, we can generate about 20*N (i.e., (N/100)*2000) instances of this query template, because per 1000 products, there are 12 vendors generated. In addition, the number of possible values for $offset is about 2000 because, on average, every vendor has 2000 offers.

Q8. sorting_with_paging (all orders)

Description:
Queries of this template retrieve data about a given number of offers that are the first in the list of all offers sorted by a given ordering criterion.

The result tree of each query of this template has 10 different leaf nodes.

The main purpose of this query template is to cover chokepoints CP 3.1 and CP 3.3. Additionally, it also covers chokepoint CP 1.1.

Choke points:
CP 1.1: Multi-attribute retrieval
CP 3.1: Paging without offset
CP 3.3: Ordering
Placeholders: $cnt, $attrOffer1, $attrOffer2
Placeholder type: Int, OffersSortingField, OffersSortingField
Placeholder meaning: $cnt is the number of offers to be retrieved, $attrOffer1 specifies the main sort order, and $attrOffer2 specifies the secondary sort order
The possible value of the placeholder: The value of $cnt is selected by random from the interval [200, 300]; the value of $attrOffer1 is selected by random from all 10 values of OffersSortingField, and the same holds for $attrOffer2.
Instances: We can generate 10,000 (i.e.100109) instances of this query template, because the number of value for $attr1 and for $attr2 is 10, respectively, and there are 100 possible values for $cnt.

Q9. paging_and_sorting (reviews)

Description:
Queries of this template retrieve data about the reviews for the products in the first 50 offers by a given vendor. For each such product, the reviews have to be sorted on a given field. Hence, in contrast to the previous query template, which is about sorting a whole set of things (offers in this specific case), this query template requires sorting in separate subsets (namely, sets of reviews).

Queries of this template traverse along a sequence of a 1: N relationship (from a vendor to its offers), an N:1 relationships (the offers to their products). and another 1: N relationship (from products to their reviews). That is, from a vendor to its offers we have a 1: N relationship, which has an average out-degree of 2000.

From vendor to offers we have a 1: N relationship where the average out-degree is 2000 (i.e., on average every vendor has 2000 offers). In addition, every offer is about a single product, there would be 2000 product (with same products multiple times), and every product has 10 reviews on average. Hence, the result tree of each query of this template has about 20000 leaf nodes that some reviews may repeat multiple times.

By requesting several attributes of the leaf nodes of the result (i.e., the reviews) this query template covers chokepoint CP 1.1. Additionally, this query template covers chokepoint CP 3.1 (because of adding paging without offset on offers) and chokepoint CP 3.3 (because of adding ordering on the leaf nodes of the result (i.e., the reviews))

Choke points:
CP 1.1: Multi-attribute retrieval
( CP2.1: Traversal of different 1: N relationship types ) (only one 1:N, not multiple different ones)
( CP 2.2: Traversal of different 1:1 relationship types ) (only one N:1, not multiple different ones)
CP 2.5: Acyclic relationship traversal that visits data objects repeatedly (!!!!!)
CP 3.1: Paging without offset
CP 3.3: Ordering
Placeholders: $vendorID, $attrReview
Placeholder type: ID, ReviewFieldInput
Placeholder meaning: $vendorID is the nr attribute of some vendor, and, $attrReview specifies the field of reviews to be used for sorting The possible value of the placeholder: The vendor to be used for $id can be any vendor (chosen randomly), for the dataset with scale factor N, there are about N/100 vendors in total; the value $attrReview is selected by random from all 9 values of ReviewField. Instances: For a scale factor of N, we can generate (N/100)*9 instances of this query template, because per 1000 products, there are 12 vendors generated. And the number of possible values for $attrReview is 9.

Q10. stringMatching

Description:
Queries of this template retrieve all reviews with text that contains the given string.

Review texts of different reviews may contain common substrings. Hence, the result of each query of this template consists of a number of different reviews, where this number depends on the selected search string. In general, this number may range from zero (none of the reviews contains the given search string) to the number of all reviews in the benchmark dataset (which is 10x the scale factor).

Primarily, this query template covers chokepoint CP 4.1. Additionally, by requesting several attributes of the leaf nodes of the resulting reviews, this query template also covers chokepoint CP 1.1.

Choke points:
CP 1.1 Multi-attribute retrieval
CP 4.1 String matching
Placeholder: <$textOfReviewKeyword>
Placeholder type: < String >
Placeholder meaning: < the target string which is contained in the text of reviews>
Possible value of the placeholder: < The value of $textOfReviewKeyword can be any word from dictionary 2 in BSBM (around 90,000 words totally)
Instances: We can generate around 90,000 instances of this query template, because the text of review is a string with 50-200 terms, which is generated from 90,000 words.

Q11. subquerySearch

Description:
Queries of this template search for all offers of a given vendor by using a search condition (instead of starting the traversal from the given vendor as done in Q6). Then, for each offer, the price and the offerWebpage is requested, as well as data about the offered product.

From a vendor to its offers we have a 1:N relationship, which has an average out-degree of 2000. Thereafter, every offer is about a single product. Hence, the result trees of these queries have about 2000 leaf nodes. However, vendors may have multiple different offers about the same product. Hence, there may be duplicates among the leaf nodes as well as among the internal nodes of the result trees.

Due to the latter property, this template is another example of queries that do not traverse in cycles and yet visit nodes multiple times. Hence, the template covers chokepoint CP 2.5. Besides this property, the primary focus of this query is chokepoint CP 4.4 (Subquery-based search). Additionally, by requesting several attributes of the requested objects, this template also covers chokepoint CP 1.1.

Choke points:
CP 1.1 Multi-attribute retrieval
CP 2.5 Acyclic relationship traversal that visits data objects repeatedly
CP 4.4 Subquery-based search
Placeholder: <$vendorID >
Placeholder type: < ID >
Placeholder meaning: <$vendorID is the nr attribute of some vendors>
Possible value of the placeholder: < The vendor to be used for $vendorID can be any vendor (chosen randomly)>
Instances: For a scale factor of N, we can generate N/100 instances of this query template, because per 1000 products, there are 12 vendors generated.

Q12. subqueryFilter1

Description:
Queries of this template retrieve data about offers for products of a given producer. These offers are filtered based on a given vendor ID. The filter condition in this query template is the same as the search condition in the previous template.

These queries traverse along two 1:N relationships. The first one (from the given producer to its products) has an average out-degree of 50, and the second one (from the products to the offers) has an average out-degree of 20. Consequently, there are about 1000 offers totally for a given producer. By filtering the retrieved offers based on the subquery with the vendor ID, the number of leaf nodes of the result tree may thus range from 0 to about 1000.

Primarily, this query template covers chokepoint CP 4.3 (because it uses a subquery to filter the retrieved offers). Additionally, by requesting several attributes of the offers and of the offered products, this query template covers chokepoint CP 1.1. Moreover, by traversing along two 1:N relationships, this query template covers CP 2.1.

Choke points:
CP 1.1 Multi-attribute retrieval
CP 2.1 Traversal of different 1:N relationship types
CP 4.3 Subquery-based filtering
Placeholder: <$producer, $vendor >
Placeholder type: < ID, ID >
Placeholder meaning: <$producerID is the nr attribute of some producer, and $vendorID is the nr attribute of some vendor>
The possible value of the placeholder: < The producer to be used for $producerID can be any producer(chosen randomly); and the vendor to be use for $vendorID can be any vendor (chosen randomly)>
Instances: For a scale factor of N, we can generate (N/50)*(N/100) instances of this query template, because per 1000 products, there are 22 producers and 12 vendors generated.

Q13. subqueryFilter2

Description:
Queries of this template retrieve data about offers for products of a given producer. These offers are filtered based on a date associated with the vendors of the offers. By using a date range test, the filter condition in this query template is more complex than the one in the previous template.

The characteristics of query results for this template are the same as for the previous template.

Primarily, this query template covers chokepoints CP 4.3 (because it uses a subquery to filter the retrieved offers) and CP 4.2 (because the filter condition considers dates). Additionally, for the same reasons as for the previous template, this template also covers chokepoints CP 1.1 and CP 2.1.

Choke points:
CP 1.1 Multi-attribute retrieval
CP 2.1 Traversal of different 1:N relationship types
CP 4.2 Date matching
CP 4.3 Subquery-based filtering
Placeholder: $producerID, $date
Placeholder type: ID, Date
Placeholder meaning: $producerID is the nr attributes of some producers, and $date specifies a date
The possible value of the placeholder: The producer to be used for $producerID can be any producer (for the dataset with scale factor N, there are about N/50 producers in total); the value of $date is selected by randomly from from 2000-09-20 to 2006-12-23
Instances: For a scale factor of N, we can generate (N/50)*2285 instances of this query template, because per 1000 products, there are 22 producers generated. In addition, the number of possible values for $date is 2285 (because the publishDate value of each vendor is a date in the range from 2000-09-20 to 2006-12-23).

Q14. multipleFilters

Description:
Queries of this template retrieve products that produced by a given producer and then retrieve offers that offer these products. Then the queries filter these offers by conjunctions of multiple conditions in a subquery. Lastly, queries retrieve price and offerWebpage of these offers and further retrieve product of these offers.

These queries traverse along two 1: N relationships. The first one (from the given producer to its products) has an average out-degree of 50, and the second one (from the products to the offers) has an average out-degree of 20. Consequently, there are about 1000 offers totally for a given producer. Because the retrieved offers are filtered by a subquery with conjunctions of multiple conditions, the number of leaf nodes of the result tree is between 1 to 1000, which may be the same from one another.

By requesting several attributes of the leaf nodes of the result (i.e., the reviews) this query template covers chokepoint CP 1.1. This query template covers CP 2.1 because of traversing along two 1: N relationships. Additionally, the template also covers CP 4.1, CP 4.2, CP 4.3 and CP 4.5 (because of using a conjunction of multiple conditions as a subquery to filter the retrieved result, and containing string matching and date range matching in the conditions).

Choke points:
CP 1.1 Multi-attribute retrieval
CP 2.1 Traversal of different 1: N relationship types
CP 4.1 String matching
CP 4.2 Date matching
CP 4.3 Subquery-based filtering
CP 4.5 Multiple filter conditions
Placeholder: $producerID; $date; $commentOfVendorkeyword
Placeholder type: ID; Date; String
Placeholder meaning: $producerID is the nr attribute of some producer; $date specifies a date; $commentOfVendorKeyword is the target string which is contained in the comment of vendor
The possible value of the placeholder: The producer to be used for $producerID can be any producer (for the dataset with scale factor N, there are about N/50 producers in total); the value of $date is selected by randomly from 2000-09-20 to 2006-12-23; the value of $commentOfVendorKeyword is any words from dictionary 2 (around 90,000 words).
Instances: For a scale factor of N, we can generate (N/50)*(2285)*90,000 instances of this query template, because per 1000 products, there are 12 vendors generated. In addition, the number of possible values for $date is 2285 (because the publishDate value of each vendor is a date in the range from 2000-09-20 to 2006-12-23), and the comment of vendor is a string with 20-50 terms, which is generated from 90,000 words.

Q15. count

Description:
Queries of this template return the number of offers of a given vendor.

These queries traverse a 1: N relationship from vendor to its offers, which has an average out-degree of 2000. Therefore, the result tree of such a query has 1 leaf node, the value of which around 2000. This query template covers CP 5.2 (because of retrieve count of review scores).

Choke points:
CP 5.2 Counting
Placeholder: $vendorID
Placeholder type: ID
Placeholder meaning: $vendorID is the nr attribute of some vendors
Possible value of the placeholder: The vendor to be used for $vendorID can be any vendor; for the dataset with scale factor N, there are about N/100 vendors in total.
Instances: For a scale factor of N, we can generate N/100 instances of this query template, because per 1000 products, there are 12 vendors generated.

Q16. aggregation

Description:
Queries of this template retrieve some aggregation values of price for all offers of a given vendor.

These queries traverse a 1: N relationship from vendor to its offers, which has an average out-degree of 2000. Therefore, the result tree of such a query has 1 leaf node that is calculated from around 2000 prices of vendors.

This query template covers CP 5.1 (because of retrieve count, sum, avg, max and min of review scores).

Choke points:
CP 5.1 Calculation-based aggregation
Placeholder: $vendorID
Placeholder type: ID
Placeholder meaning: $vendorID is the nr attribute of some vendors
Possible value of the placeholder: The vendor to be used for $vendorID can be any vendor; for the dataset with scale factor N, there are about N/100 vendors in total.
Instances: For a scale factor of N, we can generate N/100 instances of this query template, because per 1000 products, there are 12 vendors generated.

Clone this wiki locally