These are the questions for the key/value and document stores.
Instructions
For the questions in this section, we will consider a document-oriented database containing Yelp data. Imagine there are 3 collections: businesses, users and reviews.
Please email your answers to jan.aerts@uhasselt.be. Your email should include an answer for each statement, like this (obviously a mock-up):
1.1: false
1.2: true
1.3: true
1.4: false
2.1: true
2.2: ...

Note: Explicitly state which statements are false and which are true. Do not just send a list of the true statements.
Dataset
A document from the businesses collection:
{
  "_key": "tnhfDv5Il8EaGSXZGiuQGg",
  "_id": "businesses/tnhfDv5Il8EaGSXZGiuQGg",

  // the business's name
  "name": "Garaje",

  // the city
  "city": "San Francisco",

  // 2-character state code
  "state": "CA",

  // star rating
  "stars": 4.5,

  // number of reviews
  "review_count": 1198,

  // object mapping business attributes to values; note: some attribute values might be objects
  "attributes": {
    "RestaurantsTakeOut": true,
    "BusinessParking": {
      "garage": false,
      "street": true,
      "lot": false
    }
  },

  // business category: Restaurant, Plumber, ...
  "category": "Restaurant"
}
A document from the reviews collection:
{
  "_key": "zdSx_SD6obEhz9VrW9uAWA",
  "_id": "reviews/zdSx_SD6obEhz9VrW9uAWA",

  // user ID; maps to the user in the users collection
  "user_id": "users/Ha3iJu77CxlrFm-vQRs_8g",

  // business ID; maps to the business in the businesses collection
  "business_id": "businesses/tnhfDv5Il8EaGSXZGiuQGg",

  // star rating
  "stars": 4,

  // date of the review
  "date": {
    "year": 2016,
    "month": 3,
    "day": 9
  },

  // number of useful votes received
  "useful": 15,

  // the review text
  "text": "Great place to hang out after work"
}
A document from the users collection:
{
  "_key": "Ha3iJu77CxlrFm-vQRs_8g",
  "_id": "users/Ha3iJu77CxlrFm-vQRs_8g",

  // the user's first name
  "name": "Sebastien",

  // the number of reviews they've written
  "review_count": 56,

  // when the user joined Yelp
  "yelping_since": {
    "year": 2011,
    "month": 1,
    "day": 1
  },

  // number of fans the user has
  "fans": 1032,

  // the years in which the user was elite
  "elite": [
    2012,
    2013
  ],

  // average star rating over all of the user's reviews
  "average_stars": 4.31
}
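The `_id` values are what tie the three collections together: in ArangoDB, a document's `_id` is its collection name, a slash, and its `_key`, and a review's `user_id` and `business_id` hold the `_id` of the referenced user and business. As a hedged sketch, assuming the three example documents above are loaded into collections named `businesses`, `reviews` and `users`, the built-in `DOCUMENT()` function can follow these references from the example review:

```aql
// Sketch: fetch the example review by its _id, then follow both references.
// Assumes the three example documents above exist in the database.
LET r = DOCUMENT("reviews/zdSx_SD6obEhz9VrW9uAWA")
RETURN {
  text: r.text,
  user: DOCUMENT(r.user_id).name,         // "Sebastien"
  business: DOCUMENT(r.business_id).name  // "Garaje"
}
```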
We will make the following assumptions:
- All documents are well-formed, and therefore all documents in a collection have the same schema. In other words, all keys are present in all documents (e.g. attributes is not missing from any of the businesses).
- There are users who have written no reviews, and there are businesses that have received no reviews.
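The first assumption can be checked directly in AQL. A minimal sketch, using the built-in `ATTRIBUTES()` function (second argument `true` drops system attributes such as `_key` and `_id`, third argument `true` sorts the attribute names): if every business has the same top-level keys, the query returns a single group whose count equals the number of businesses.

```aql
// Sketch: group businesses by their sorted list of top-level attribute names.
// Under the well-formedness assumption, exactly one group is returned.
FOR b IN businesses
  COLLECT keys = ATTRIBUTES(b, true, true) WITH COUNT INTO n
  RETURN { keys: keys, n: n }
```

Note that this only compares top-level keys; nested objects such as attributes.BusinessParking are not inspected.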
Question 1
Consider the following query:
FOR r IN reviews
COLLECT m=r.date.month AGGREGATE u=MAX(r.useful)
LIMIT 5
SORT u DESC
RETURN {m:m, u:u}
Which of the following statements are true? Attention: there might be none, there might be more than one.
Possible answer 1.1 - This query shows the 5 months with the highest number of useful votes their reviews received.
Possible answer 1.2 - This query shows the 5 most useful reviews.
Possible answer 1.3 - This query will return a value for each month of the year, even if there are no reviews in that month.
Possible answer 1.4 - This query shows 5 random months together with the highest number of useful votes a review in them received.
Question 2
Consider the following query:
FOR u IN users
FOR r IN reviews
FILTER r.user_id == u._id
FILTER r.stars < (u.average_stars/2)
RETURN {n:u.name,us:u.average_stars,s:r.stars}
Which of the following statements are true? Attention: there might be none, there might be more than one.
Possible answer 2.1 - This query will only return results for users who have written reviews.
Possible answer 2.2 - All users will appear in the results.
Possible answer 2.3 - This query returns a result for each review where the user gives less than half of their average number of stars.
Possible answer 2.4 - This query will return the same results if the first two lines were swapped (i.e. first FOR r IN reviews, then FOR u IN users).
Question 3
Consider the following query:
FOR u IN users
SORT u.fans DESC
LIMIT 1
RETURN {a:u.name, b:u.average_stars}
Which of the following statements are true? Attention: there might be none, there might be more than one.
Possible answer 3.1 - The result is not deterministic because there might be multiple users with an equal number of fans.
Possible answer 3.2 - This query returns the name and average stars given for the user with the fewest fans.
Possible answer 3.3 - This query returns the name and average stars given for the user with the most fans.
Possible answer 3.4 - The result is independent of the maximum number of stars a user gave in their reviews.
Question 4
Consider the following query:
FOR b IN businesses
FILTER b.state == "CA"
RETURN DISTINCT {
  name: b.name,
  stars: (
    FOR r IN reviews
    FILTER r.business_id == b._id
    FILTER r.date.year == 2016
    RETURN r.stars
  )
}
Which of the following statements are true? Attention: there might be none, there might be more than one.
Possible answer 4.1 - This returns the name of each business in California, plus an array of the stars they received in 2016. If a business didn’t have a review in 2016, that business is not included in the output.
Possible answer 4.2 - This returns the name of each business in California, plus an array of the stars they received in 2016. If a business didn’t have a review in 2016, an empty array is returned for the stars.
Possible answer 4.3 - The DISTINCT has no effect on the output and could have been removed.
Possible answer 4.4 - The SORT r.stars DESC has no effect on the output and could be removed.
Question 5
Which of the following queries returns the take-out restaurant with the highest number of reviews in 2018? The output should be a single object and look like this:
{
  "_key": "GBTPC53ZrG1ZBY3DT8Mbcw",
  "_id": "businesses/GBTPC53ZrG1ZBY3DT8Mbcw",
  "name": "Luke",
  "city": "New Orleans",
  "state": "LA",
  "stars": 4,
  "review_count": 4554,
  "attributes": {
    "RestaurantsReservations": "True",
    "RestaurantsTakeOut": "True"
  },
  "category": "Restaurant"
}
Attention: there might be none, there might be more than one.
Possible answer 5.1
FOR a IN businesses
FILTER a.attributes.RestaurantsTakeOut == "True" AND a.category == "Restaurant"
SORT a.review_count DESC
LIMIT 1
RETURN a
Possible answer 5.2
LET a = (
  FOR b IN reviews
  FILTER b.date.year == 2018
  COLLECT c = b.business_id WITH COUNT INTO cnt
  SORT cnt DESC
  LIMIT 1
  RETURN DOCUMENT(c)
)

FOR d IN a
FILTER d.attributes.RestaurantsTakeout == "True"
FILTER d.category == "Restaurant"
RETURN d
Possible answer 5.3
FOR r IN reviews
FOR b IN businesses
FILTER r.business_id == b._id
FILTER r.date.year == 2018
FILTER b.category == "Restaurant"
FILTER b.attributes.RestaurantsTakeOut == "True"
COLLECT c = r.business_id WITH COUNT INTO d
SORT d DESC
LIMIT 1
RETURN DOCUMENT(c)
Possible answer 5.4
FOR b IN businesses
FILTER b.attributes.RestaurantsTakeOut == "True"
FILTER b.category == "Restaurant"
SORT b.review_count DESC
LIMIT 1
RETURN b.name
Question 6
Which of the following queries results in a list of unique business categories? It would look like this:
["Restaurant","Plumber","Beauty & Spas","Gunsmith","Wedding Planner"]
Attention: there might be none, there might be more than one.
Possible answer 6.1
FOR b IN businesses
COLLECT c=b.category
RETURN c
Possible answer 6.2
FOR b IN businesses
RETURN DISTINCT b.category
Possible answer 6.3
LET categories = (
  FOR b IN businesses
  RETURN b.category
)
FOR c IN categories
RETURN DISTINCT c
Possible answer 6.4
FOR c IN (
  FOR b IN businesses
  RETURN b.category
)
RETURN DISTINCT c