01-docker-terraform/2_docker_sql/data-loading-parquet.ipynb (10 changes: 4 additions, 6 deletions)
@@ -7,12 +7,12 @@
 "source": [
 "# Data loading \n",
 "\n",
-"Here we will be using the ```.paraquet``` file we downloaded and do the following:\n",
+"Here we will be using the ```.parquet``` file we downloaded and do the following:\n",
 " - Check metadata and table datatypes of the paraquet file/table\n",
-" - Convert the paraquet file to pandas dataframe and check the datatypes. Additionally check the data dictionary to make sure you have the right datatypes in pandas, as pandas will automatically create the table in our database.\n",
+" - Convert the parquet file to pandas dataframe and check the datatypes. Additionally check the data dictionary to make sure you have the right datatypes in pandas, as pandas will automatically create the table in our database.\n",
 " - Generate the DDL CREATE statement from pandas for a sanity check.\n",
 " - Create a connection to our database using SQLAlchemy\n",
-" - Convert our huge paraquet file into a iterable that has batches of 100,000 rows and load it into our database."
+" - Convert our huge parquet file into an iterable that has batches of 100,000 rows and load it into our database."
 ]
 },
 {
@@ -236,9 +236,7 @@
 "\ttotal_amount FLOAT(53), \n",
 "\tcongestion_surcharge FLOAT(53), \n",
 "\t\"Airport_fee\" FLOAT(53)\n",
-")\n",
-"\n",
-"\n"
+")\n"
 ]
 }
 ],