In this project, I extracted data from CSV files, created a DuckDB table from each file, transformed the schemas, and loaded the results into Postgres.
```python
tables = ["students", "teachers", "classes", "courses", "enrollments", "grades"]

for table in tables:
    conn.execute(f"""
        CREATE OR REPLACE TABLE {table} AS
        SELECT *
        FROM read_csv_auto('data/raw_data/{table}.csv');
    """)
```

For the transformations, I modified the schema of each table. By default, DuckDB's CSV reader infers plain integer columns as BIGINT (8 bytes). For columns like age, id, and credits, this is unnecessary and wastes storage. For example:
```python
conn.execute("""
    CREATE OR REPLACE TABLE students_clean AS
    SELECT
        CAST(student_id AS INTEGER) AS student_id,
        CAST(age AS SMALLINT) AS age,
        * EXCLUDE (student_id, age)
    FROM students;
""")
```

Once all the tables were transformed, I loaded them into Postgres with one query per table:
```python
for table in tables:
    clean_table_name = f"{table}_clean"  # students_clean, teachers_clean, etc.
    conn.execute(f"""
        CREATE OR REPLACE TABLE pg_db.{table} AS
        SELECT *
        FROM {clean_table_name};
    """)
    print(f"[{table}] table loaded successfully into Postgres")

conn.close()
```
- Clone the repository:

  ```shell
  git clone https://github.com/fran-cornachione/DuckDB-ETL
  ```

- Install the requirements:

  ```shell
  pip install -r requirements.txt
  ```

- Replace this line with your Postgres credentials:

  ```python
  conn.execute("""
      ATTACH 'dbname=your_dbname user=your_user password=your_password host=your_host port=5432'
      AS pg_db (TYPE POSTGRES);
  """)
  ```

- In the `ETL.ipynb` file, click the "Run All" button.
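If you prefer not to hard-code the DSN, a small helper can assemble the ATTACH statement from the individual credentials. This is a hypothetical addition, not code from the repo; the function name and defaults are my own.

```python
# Hypothetical helper (not part of the repo) that assembles the ATTACH
# statement from individual Postgres credentials instead of a hard-coded DSN.
def attach_statement(dbname: str, user: str, password: str,
                     host: str, port: int = 5432, alias: str = "pg_db") -> str:
    dsn = f"dbname={dbname} user={user} password={password} host={host} port={port}"
    return f"ATTACH '{dsn}' AS {alias} (TYPE POSTGRES);"

stmt = attach_statement("school", "etl_user", "secret", "localhost")
print(stmt)
# ATTACH 'dbname=school user=etl_user password=secret host=localhost port=5432' AS pg_db (TYPE POSTGRES);
```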
