Run a short set of SQL queries in Google BigQuery against the NYC Yellow Taxi public dataset to practice cloud data warehouse basics (public datasets, SQL, cost awareness).
- Sign in with a Google account and open BigQuery in the Google Cloud Console.
- Use BigQuery Sandbox (no billing) or a classroom project provided by the instructor.
- In the Explorer, locate the NYC Taxi public dataset (Yellow Trips). Choose a 2023 table if available. Tip: use the dataset browser and copy the fully qualified table name from the UI.
- Write each query in a separate cell/tab and label it with the exercise number in a comment.
- Prefer date filters to limit scanned bytes. Use preview to confirm schemas before running.
- Keep notes of any assumptions (e.g., which exact table you used).
Mandatory (1–6):
- Count the number of trips in January 2023
- Calculate the total revenue generated by taxi trips in 2023
- Find the most popular pickup location
- Analyze the number of trips per hour of the day
- Calculate the average trip distance
- Find the longest trip by distance
Optional (7–10): 7. Calculate the total number of passengers by payment type 8. Find the most common drop-off location for trips paid by credit card 9. Calculate the total number of trips that had more than 4 passengers 10. Subquery — Find the average fare for trips longer than the average trip distance
Hint: Use functions like EXTRACT, DATETIME/TIMESTAMP functions, GROUP BY, ORDER BY, and LIMIT. For revenue, sum fare components (e.g., fare_amount + tip_amount + tolls where applicable), based on the table schema you select.
- Submit your solutions via a Pull Request.
- Public datasets are free to query; scanning bytes count toward sandbox limits. Always filter by date when possible.
- Use the query validator to review “bytes processed” before running.
