
Datasets

Olaf Hartig edited this page Aug 14, 2019 · 24 revisions

This document describes the data used by the Linköping GraphQL Benchmark (LinGBM).

LinGBM is based on a scalable synthetic dataset that can be generated at any desired size. Instead of designing a new dataset generator from scratch, LinGBM uses the dataset generator of the Berlin SPARQL Benchmark (BSBM). The data can be generated either as an SQL database or as an RDF graph.

In the remainder of this document we provide i) an Entity-Relationship diagram that models the scenario captured by the benchmark datasets, ii) the corresponding relational schema of the SQL-database version of the benchmark datasets, and iii) an overview of the average cardinalities of the relationships in the generated data. For more details regarding the datasets and the dataset generator we refer to the BSBM Dataset Specification.

Entity-Relationship Diagram

The benchmark datasets capture a fictitious e-commerce scenario with products that have a type, a producer, and a number of different features. Moreover, for each product, there are offers by different vendors and reviews by different persons. Overall, the captured scenario consists of nine types of entities and eight types of relationships between such entities.

The BSBM data generator can output the dataset as a MySQL dump. This dump follows the entity-relationship model described above and uses the relational schema given in the next section.

Relational Schema

Vendor (nr, label, comment, homepage, country)
Offer (nr, product, producer, vendor, price, validFrom, validTo, deliveryDays, offerWebpage)
Producer (nr, label, comment, homepage, country)
Product (nr, label, comment, producer, propertyNum1, propertyNum2, propertyNum3, propertyNum4, propertyNum5, propertyNum6, propertyTex1, propertyTex2, propertyTex3, propertyTex4, propertyTex5, propertyTex6)
Person (nr, name, mbox_sha1sum, country)
Review (nr, product, producer, person, reviewDate, title, text, language, rating1, rating2, rating3, rating4, publisher)
ProductFeature (nr, label, comment)
ProductType (nr, label, comment, parent)
ProductTypeProduct (product, productType)
ProductFeatureProduct (product, productFeature)
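
To illustrate how these relations fit together, the following minimal sketch loads an abbreviated version of four of the tables into an in-memory SQLite database and joins Offer with Vendor. The column types, the sample rows, and the helper names `build_demo_db` and `offers_for_product` are assumptions made for this example only; the actual MySQL dump produced by the generator defines its own types and contains all columns listed above.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with an abbreviated subset of the schema.

    Column types are assumed; several columns (e.g. propertyNum1..6) are omitted.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE Producer (nr INTEGER PRIMARY KEY, label TEXT, comment TEXT,
                               homepage TEXT, country TEXT);
        CREATE TABLE Vendor   (nr INTEGER PRIMARY KEY, label TEXT, comment TEXT,
                               homepage TEXT, country TEXT);
        CREATE TABLE Product  (nr INTEGER PRIMARY KEY, label TEXT, comment TEXT,
                               producer INTEGER REFERENCES Producer(nr));
        CREATE TABLE Offer    (nr INTEGER PRIMARY KEY,
                               product INTEGER REFERENCES Product(nr),
                               producer INTEGER REFERENCES Producer(nr),
                               vendor INTEGER REFERENCES Vendor(nr),
                               price REAL, deliveryDays INTEGER);
        """
    )
    # Hypothetical sample rows, not taken from a generated dataset.
    conn.execute("INSERT INTO Producer VALUES (1, 'Producer A', '', '', 'SE')")
    conn.execute("INSERT INTO Vendor VALUES (1, 'Vendor X', '', '', 'DE')")
    conn.execute("INSERT INTO Vendor VALUES (2, 'Vendor Y', '', '', 'US')")
    conn.execute("INSERT INTO Product VALUES (1, 'Product P', '', 1)")
    conn.execute("INSERT INTO Offer VALUES (1, 1, 1, 1, 12.0, 3)")
    conn.execute("INSERT INTO Offer VALUES (2, 1, 1, 2, 9.5, 5)")
    return conn


def offers_for_product(conn, product_nr):
    """Return (offer nr, vendor label, price) for one product, cheapest first."""
    return conn.execute(
        """
        SELECT o.nr, v.label, o.price
        FROM Offer AS o
        JOIN Vendor AS v ON o.vendor = v.nr
        WHERE o.product = ?
        ORDER BY o.price
        """,
        (product_nr,),
    ).fetchall()
```

Calling `offers_for_product(build_demo_db(), 1)` returns the two sample offers ordered by price. The foreign-key columns (`product`, `producer`, `vendor`) carry the `nr` of the referenced row, which is how the one-to-many relationships of the scenario are represented in the relational schema.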

Cardinalities of Relationships

Relationship | Cardinality | Note
Producer-Product | 1:N | One producer per product; 50 products per producer on average
Product-Review | 1:N | 10 reviews per product on average; one product per review, selected following a normal distribution
Product-Offer | 1:N | 20 offers per product on average; one product per offer, selected following a normal distribution
Person-Review | 1:N | One author per review; 20 reviews per person on average
RatingSite-Review | 1:N | Every review belongs to one rating site; 10,000 reviews per rating site on average
Vendor-Offer | 1:N | Every offer belongs to one vendor; 2,000 offers per vendor on average
Product-ProductType | N:1 | One ProductType per product (leaves only)
Product-ProductFeature | M:N | 10-20 ProductFeatures per product
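
As a rough illustration of what these averages imply, the following sketch estimates the number of entities of each type for a given number of products. It is back-of-the-envelope arithmetic over the averages listed above, not the generator's actual logic, which may round, enforce minimum counts, or otherwise differ. The function name `estimate_entity_counts` is hypothetical.

```python
def estimate_entity_counts(num_products):
    """Estimate entity counts from the average cardinalities listed above.

    This is plain arithmetic on the averages; the real generator may deviate.
    """
    reviews = num_products * 10   # 10 reviews per product on average
    offers = num_products * 20    # 20 offers per product on average
    return {
        "Product": num_products,
        "Producer": num_products / 50,    # 50 products per producer
        "Review": reviews,
        "Offer": offers,
        "Person": reviews / 20,           # 20 reviews per person
        "RatingSite": reviews / 10_000,   # 10,000 reviews per rating site
        "Vendor": offers / 2_000,         # 2,000 offers per vendor
    }
```

For example, under these averages a dataset with 10,000 products would contain roughly 200 producers, 100,000 reviews, 200,000 offers, 5,000 persons, 10 rating sites, and 100 vendors.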