Databases
Carry-over from lecture 6
Whenever documentation says pip install …, you probably want to add it to your requirements.txt.
Who has experience with SQL? What have you used it for?
SQL syntax is largely the same across SQL databases:
- SQLite
- DuckDB
- PostgreSQL
- MySQL
- BigQuery
- Oracle
- …
The variations are known as "dialects".
- Mac: Assuming you have Homebrew set up, use the "package manager" option.
- Windows: Not available for Chocolatey yet, so probably easiest to do the "direct download".
-
Download and unzip the access to electricity dataset as a CSV.
-
Query it.
SELECT * FROM read_csv('[path].csv');
SELECT * FROM read_csv(
'[path].csv',
nullstr=['', '..']
);CREATE TABLE electricity AS SELECT ...;What's a question we might want to ask?
Column names with spaces need double quotes. Alternatively, normalize_names.
SELECT
"Country name",
"2022 [YR2022]" - "1990 [YR1990]" AS diff
FROM electricity
ORDER BY diff DESC;- Tabular
- DuckDB:
read_csv() - Tables are like DataFrames
- Columns have types
- Column-based operations
SELECTis like boolean indexingGROUP BYis likegroupby()
pandas allows you to build up operations over multiple lines; harder to do that in SQL.
Lots of ways to connect to databases from Python, including:
- pandas
- SQLAlchemy
- Object Relational Mapper (ORM)
Allow you to use the same Python syntax across databases
How would you take data from an API and get it into a database?
- In the Google Cloud Console, make sure your Project is selected.
- Open BigQuery.
- Enable the API.
- Open a public dataset.
- Try
CategoryofEconomicsandPriceofFree
- Try
SELECT company_name, COUNT(*) AS num_complaints
FROM `bigquery-public-data.cfpb_complaints.complaint_database`
GROUP BY company_name
ORDER BY num_complaints DESC;Coming soon