Skip to content

Latest commit

 

History

History
197 lines (134 loc) · 9.17 KB

File metadata and controls

197 lines (134 loc) · 9.17 KB

Folder Organization

The main working files are found in working directory. This is the current status of the folders

crypto-dashboard
|___ data (for data generated by notebooks)
|    |___ raw
|    |___ transform
|    |___ report
|
|___ notebooks (for experimentation)
|
|___ src
|    |___ defillama
|         |___ etl_defillama_data_bq.py (for ingestion to BQ)
|         |___ etl_defillama_data.py (for ingestion to cloud sql/postgres)
|         |___ sql_queries.py (for ingestion to cloud sql/postgres)
|                    
|    |___ global_crypto
          |___ etl_global_data_bq.py (for ingestion to BQ)
|         |___ etl_global_data.py (for ingestion to cloud sql/postgres)
|         |___ sql_queries.py (for ingestion to cloud sql/postgres)
|
|    |___ category_crypto
          |___ etl_category_data_bq.py (for ingestion to BQ)
|         |___ etl_category_data.py (for ingestion to cloud sql/postgres)
|         |___ sql_queries.py(for ingestion to cloud sql/postgres)
|
|    |___ db_credentials.py
|    |___ config.ini
|
|___ app.py
|___ docker-compose.yml (To test postgres/metabase/pgadmin locally)
|___ Dockerfile (for cloud)
|___ requirements.txt
|___ requirements_prod.txt

Dependency management

Before update requirements.txt:

  1. git pull
  2. Install all new dependencies: pip install -r requirements.txt
  3. Update your newly installed dependencies to the list: pip freeze > requirements.txt
  4. A lighter requirements file for deployment is requirements_prod.txt

How to run the script

  1. Check that you have the valid credentials. 3 files in question are .env, config.ini, and db_credentials.py

  2. Be in the same directory as app.py

  3. In your terminal/PowerShell, run docker compose up

  4. In another terminal in the same directory as app.py, run python3 app.py

Setting Up Cloud Run

Local Set up

DockerFile

Ensure that you copy the scripts into the Docker container with the same directory structure. This is to allow the codes to function because they calling scripts and functions across different folders. Achieve this by writing a DockerFile provided

Check that CMD exec gunicorn --bind :$PORT --workers 1 --threads 8 --timeout 0 app:app is correct. In this case, the first app in app:app is the NAME of the script where you intended function to run resides. The second app is the name of the flask application you called inside the app script.

If you named script called main.py for example, then it will be written as main:app

Also, check the codes for the DockerFile, ensure you have the relevant files available i.e requirements_prod. You can generate this file yourself by filtering your own requirements.txt to reduce the size of the Docker Image. Larger size means more money for GCP :(

db_credentials.py

This is the tricky park and there are 3 scenarios

1) Running locally with a local postgres

In this case, use the block that has config['xxx']. This pulls credentials from a local file. This connects the codes to a local postgres server that is started. This would require the use of an active docker compose started with docker compose up.

2) Running locally with Cloud SQL

In this case, use the function that has as credential_file is None (so as to avoid changes in the ETL scripts). In this case, type in the relevant details. The HOST should be the public IP address of the SQL DB. You can type the raw cmc api key

3) Running in cloud with Cloud SQL

The settings remains as above with 2 exceptions. At this point, your password should be linked to the secret manager with the relevent project id and secret id. Then the host should be the connection url to the database (for cloud SQL). If you are using Postgres on a VM, use a static IP address of the VM as the host.

The CMC API key would also be invoked via the secret manager

Note that in the event that connecting via Cloud SQL's IP, connection URL, and secret fails. It is likely that this is a permissions issue, or you have not set up the secret, or you are not logged into the gloud locally. Do it via gcloud init

Pushing to Cloud

Once you have prepared the files above, head over to your CLI (after logging into your gcloud account with gloud init)

gcloud builds submit --tag gcr.io/<name of project>/<name_of_image>
gcloud run deploy --image gcr.io/<name of project>/<name_of_image> --platform managed

Note that you can also find this name in your cloud run

The first line would build the image and name the image (aka tag the image) while the second line would push the image and creates the cloud run instance. If it fails to work then there is a chance where the repo in the Google account to hold the containers is not created. Proceed to create it.

You will meet some settings like naming the service, location, allow unauthorized invocations. This would be changed in GCP later.

Head on over to container and artifact registry to check if the images are sent there. Go on the Cloud Run to see if the push is successful.

Cloud Run

In Cloud Run, here is where we do some extra settings. Click on your Cloud Run instance and go to edit and deploy new revision

Container Tab

  1. Check that the container port is 8080 or one that matches your docker image (gunicorn) or the flask app.

  2. CPU is only allocated during request processing

  3. Memory 4GB, vCPU 2

  4. Request time out: 1000 seconds

Connetions

Cloud SQL connections -> Add Connection -> Choose the Cloud SQL db

If you are not using Cloud SQL and are using Postgres on a VM, then there is nothing much to do there except for controlling the CPU and Memory allocated

Done!

Cloud Scheduler

Scripts in cloud run are invoked by visiting the url that looks like this https://<container name>-hxz3jlrdla-as.a.run.app. Cloud Scheduler will visit this link at the relevent timing to trigger the script

  1. Head on over to Cloud Scheduler and schedule a job

  2. Name the schedule, choose the appropriate region, frequency, and timezone

  3. Target Type:HTTP, URL: copy and paste the link by your cloud run. It looks like https://<container name>-<some gibberish>-as.a.run.app. HTTP method: GET. Auth header -> Add OIDC token -> use the service account

  4. 5 Max Retry, max retry duration 0s, min back off: 65 seconds (for Coingecko API to reset after a min). Just leave max backoff and doublings alone.

  5. Create

Checks

Head over to cloud run (or pgadmin connected to the Postgres in the VM) and check the logs.

Clean Up if something goes wrong

  1. Delete containers in container registry
  2. Delete Cloud Scheduler Job
  3. Delete database related stuffs

More often than not, it is Coingecko banning your IP

Tricking Heroku and Metabase

Metabase would write metadata into the the DB, known as database_url by Heroku. The problem is that if you have your own DB, you wouldn't really want to underutilize your DB and get trapped with their sad 10K row DB limit.

Hence we have the change the database. However, Heroku would not allow it because it would always want to be working with one DB. When Heroku's locks onto one DB, you would not be able to edit its database link and therefore you need to trick it

Hence, first create the second DB heroku addons:attach heroku-postgresql -a <name of app> --as <name of db> If its me, I would use heroku addons:attach heroku-postgresql -a <name of app> --as HEROKU-DATABASE. I call this the decoy DB

Given there is two databases, we can delete the original DB created by Heroku when you first deploy Metabase called DATABASE or DATABASE_URL. When this occurs, Heroku will look for a new default database which is HEROKU-DATABASE the one that you just created.

heroku addons:detach DATABASE -a <name of app>

After this, you have freed up the DATABASE_URL key value pair. You are free to create a new one

heroku config:add DATABASE_URL= postgres://<USER>:<PASSWORD>@<PG_HOST OR PUBLIC IPV4 OF DB>:<A port that you open for postgres>/<DB_NAME> -a <name of app>

Note, do not change the DATABASE_URL name. This is because the Metabase container hardcoded to detect this variable name to pass in the postgres credentials

Heroku Memory problem

If your Metabase is shutting down on free tier like us, then add this in the env on Heroku:

KEY: JAVA_TOOL_OPTIONS
Values:
    -XX:+UseContainerSupport 
    -XX:+UseG1GC -verbose:gc
    -XX:+UnlockExperimentalVMOptions 
    -XX:+UseCGroupMemoryLimitForHeap

You are done!

Setting up VM + Postgres

  1. Go the Google's marketplace to deploy postgres on a VM Steps roughly follow this: https://joncloudgeek.com/blog/deploy-postgres-container-to-compute-engine/#migrate-data-to-new-db

    Note the version of postgres

    Expose the 5432 port

    Choose the relevant space and computing power needed

  2. Adjust your network settings to allow IP addresses

  3. Set the VM External IP address to static

  4. The external ID is the host ID of postgres as well

  5. Using the SSH and Terminal, use psql and set up your DB. Change the temp password assigned to you when you set up Postgres. Then, add the relavent user and passwords and also assigning them permissions to database that you created

  6. You may need to change the IP settings in your DB using this https://cloud.google.com/community/tutorials/setting-up-postgres

You are done!