|
| 1 | +# 📅 Data Modeling |
| 2 | + |
| 3 | +This repository contains the setup for the data modeling modules in Weeks 1 and 2. |
| 4 | + |
| 5 | +:wrench: **Tech Stack** |
| 6 | + |
| 7 | +- Git |
| 8 | +- Postgres |
| 9 | +- PSQL CLI |
| 10 | +- Database management environment (DataGrip, DBeaver, VS Code with extensions, etc.) |
| 11 | +- Docker, Docker Compose, and Docker Desktop |
| 12 | + |
| 13 | +:pencil: **TL;DR** |
| 14 | + |
| 15 | +1. [Clone the repository](https://github.com/DataExpert-io/data-engineer-handbook/edit/main/bootcamp/materials/1-dimensional-data-modeling/README.md). |
| 16 | +2. [Start Postgres instance](https://github.com/DataExpert-io/data-engineer-handbook/edit/main/bootcamp/materials/1-dimensional-data-modeling/README.md#2%EF%B8%8F%E2%83%A3run-postgres). |
| 17 | +3. [Connect to Postgres](https://github.com/DataExpert-io/data-engineer-handbook/edit/main/bootcamp/materials/1-dimensional-data-modeling/README.md#threeconnect-to-postgres-in-database-client) using your preferred database management tool. |
| 18 | + |
| 19 | +For details, see the step-by-step instructions below.
| 20 | + |
| 21 | +## 1️⃣ **Clone the repository** |
| 22 | + |
| 23 | +- Clone the repo using the SSH link. This will create a new folder in the current directory on your local machine. |
| 24 | + |
| 25 | + ```bash |
| 26 | + git clone git@github.com:DataExpert-io/data-engineer-handbook.git
| 27 | + ``` |
| 28 | + |
| 29 | + > ℹ️ To securely interact with GitHub repositories, it is recommended to use SSH keys. Follow the instructions provided **[here](https://docs.github.com/en/authentication/connecting-to-github-with-ssh/adding-a-new-ssh-key-to-your-github-account)** to set up SSH keys on GitHub. |
| 30 | + > |
| 31 | + |
| 32 | +- Navigate into the cloned repo using the command line: |
| 33 | + |
| 34 | + ```bash |
| 35 | + cd data-engineer-handbook/bootcamp/materials/1-dimensional-data-modeling |
| 36 | + ``` |
| 37 | + |
| 38 | +## 2️⃣ **Run Postgres** |
| 39 | + |
| 40 | +There are two methods to get Postgres running locally. |
| 41 | + |
| 42 | +### 💻 **Option 1: Run on local machine** |
| 43 | + |
| 44 | +1. Install Postgres |
| 45 | + - For Mac: Follow this **[tutorial](https://daily-dev-tips.com/posts/installing-postgresql-on-a-mac-with-homebrew/)** (Homebrew is really nice for installing on Mac) |
| 46 | + - For Windows: Follow this **[tutorial](https://www.sqlshack.com/how-to-install-postgresql-on-windows/)** |
| 47 | +2. Run this command after replacing **`<computer-username>`** with your computer's username: |
| 48 | + |
| 49 | + ```bash |
| 50 | + psql -U <computer-username> postgres < data.dump |
| 51 | + ``` |
| 52 | + |
| 53 | +3. Set up DataGrip, DBeaver, or your VS Code extension to point at your locally running Postgres instance. |
| 54 | +4. Have fun querying! |
| 55 | +
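One way to confirm that the dump from step 2 loaded is to list the tables with `psql`. This is a minimal sketch, assuming `psql` is on your `PATH`; `list_tables` is a hypothetical helper name, not something shipped with this repo.

```bash
# list_tables is a hypothetical helper: it lists the tables psql can see
# in the "postgres" database for the given user.
list_tables() {
  local user="$1"
  # \dt is the psql meta-command that lists tables on the search path.
  psql -U "$user" postgres -c '\dt'
}

# Usage (replace the placeholder with your own username):
#   list_tables <computer-username>
```

If the output lists no relations, the dump most likely did not load into this database.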
|
| 56 | +### 🐳 **Option 2: Run Postgres in Docker** |
| 57 | +
|
| 58 | +- Install Docker Desktop from **[here](https://www.docker.com/products/docker-desktop/)**. |
| 59 | +- Copy **`example.env`** to **`.env`**: |
| 60 | + |
| 61 | + ```bash |
| 62 | + cp example.env .env |
| 63 | + ``` |
| 64 | +
|
| 65 | +- Start the Docker Compose container: |
| 66 | + - If you're on Mac: |
| 67 | + |
| 68 | + ```bash |
| 69 | + make up |
| 70 | + ``` |
| 71 | + |
| 72 | + - If you're on Windows: |
| 73 | + |
| 74 | + ```bash |
| 75 | + docker compose up -d |
| 76 | + ``` |
| 77 | + |
| 78 | +- A folder named **`postgres-data`** will be created in the root of the repo. The data backing your Postgres instance will be saved here. |
| 79 | +- You can check that your Docker Compose stack is running by either: |
| 80 | + - Going into Docker Desktop: you should see an entry there with a drop-down for each of the containers running in your Docker Compose stack. |
| 81 | + - Running **`docker ps -a`** and looking for the containers with the name **`postgres`**. |
| 82 | +- When you're finished with your Postgres instance, you can stop the Docker Compose containers with: |
| 83 | + |
| 84 | + ```bash |
| 85 | + make down |
| 86 | + ``` |
| 87 | + |
| 88 | + Or if you're on Windows: |
| 89 | + |
| 90 | + ```bash |
| 91 | + docker compose down -v |
| 92 | + ``` |
| 93 | +
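The `docker ps` check described above can also be scripted. The sketch below assumes the container name contains `postgres`, as stated in this README; `postgres_is_up` is a hypothetical helper name.

```bash
# postgres_is_up is a hypothetical helper: it succeeds (exit status 0) when
# at least one running container has "postgres" in its name.
postgres_is_up() {
  local names
  # Without -a, docker ps lists running containers only.
  names="$(docker ps --filter 'name=postgres' --format '{{.Names}}')"
  [ -n "$names" ]
}

# Usage:
#   if postgres_is_up; then echo "Postgres container is running"; fi
```

Unlike `docker ps -a`, this ignores stopped containers, so it only succeeds while the stack is actually up.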
|
| 94 | +### :rotating_light: **Need help loading tables?** :rotating_light: |
| 95 | +
|
| 96 | +> If the data dump fails to load tables and you see the message `PostgreSQL Database directory appears to contain a database; Skipping initialization.`, follow the instructions in the **Tables not loading!?** section below.
| 97 | +> |
| 98 | +
|
| 99 | +## :three: **Connect to Postgres in Database Client** |
| 100 | +
|
| 101 | +- Some options for interacting with your Postgres instance: |
| 102 | + - DataGrip - JetBrains; 30-day free trial or paid version. |
| 103 | + - VS Code extension (there are a few of these).
| 104 | + - pgAdmin.
| 105 | + - Postbird. |
| 106 | +- Using your client of choice, follow the instructions to establish a new PostgreSQL connection. |
| 107 | + - The default username is **`postgres`** and corresponds to **`$POSTGRES_USER`** in your **`.env`**. |
| 108 | + - The default password is **`postgres`** and corresponds to **`$POSTGRES_PASSWORD`** in your **`.env`**. |
| 109 | + - The default database is **`postgres`** and corresponds to **`$POSTGRES_DB`** in your **`.env`**. |
| 110 | + - The default host is **`localhost`** or **`0.0.0.0`**. This is the address of your own machine, where Docker publishes the port of the container running the PostgreSQL instance.
| 111 | + - The default port for Postgres is **`5432`**. This corresponds to the **`$CONTAINER_PORT`** variable in the **`.env`** file.
| 112 | + |
| 113 | + → :bulb: You can edit these values by modifying the corresponding values in **`.env`**. |
| 114 | + |
| 115 | +- If the test connection is successful, click "Finish" or "Save" to save the connection. You should now be able to use the database client to manage your PostgreSQL database locally. |
| 116 | +
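The connection settings listed above can be combined into a single connection URI, which `psql` and many clients accept. This is a minimal sketch: `pg_uri` is a hypothetical helper name, the fallback values mirror the defaults in this README, and the host is assumed to be `localhost` unless you pass another one.

```bash
# pg_uri is a hypothetical helper: it prints a connection URI built from the
# variables described above, falling back to this README's defaults.
pg_uri() {
  local host="${1:-localhost}"   # assumption: Postgres is reachable on localhost
  local user="${POSTGRES_USER:-postgres}"
  local password="${POSTGRES_PASSWORD:-postgres}"
  local db="${POSTGRES_DB:-postgres}"
  local port="${CONTAINER_PORT:-5432}"
  printf 'postgresql://%s:%s@%s:%s/%s\n' "$user" "$password" "$host" "$port" "$db"
}

# Usage:
#   psql "$(pg_uri)" -c '\dt'
```

The sketch does no escaping, so a password containing characters such as `@` or `/` would need to be percent-encoded before it is placed in the URI.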
|
| 117 | +## **🚨 Tables not loading!? 🚨** |
| 118 | +- If you are on Windows and used **`docker compose up`**, table creation and data load will not take place when the container is created. Once the Docker container is up and you have verified that you can connect to the empty Postgres database with your client of choice, follow these steps:
| 119 | +1. In Docker Desktop, open a terminal in the `my-postgres-container` container.
| 120 | +2. Run: |
| 121 | + ```bash |
| 122 | + psql \ |
| 123 | + -v ON_ERROR_STOP=1 \ |
| 124 | + --username "$POSTGRES_USER" \
| 125 | + --dbname "$POSTGRES_DB" \
| 126 | + < /docker-entrypoint-initdb.d/data.dump
| 127 | + ``` |
| 128 | + - → This will run the file `data.dump` from inside your docker container. |
| 129 | +
|
| 130 | +- If the tables still aren't loaded with data, follow these steps using a manual installation of Postgres:
| 131 | + |
| 132 | +1. Find where your `psql` client is installed (something like `C:\Program Files\PostgreSQL\13\runpsql.bat`).
| 133 | +2. Make sure you're in the root of the repo, and launch `psql` by running that `.bat` script.
| 134 | +3. Enter your Postgres credentials (described in the "Connect to Postgres in Database Client" section).
| 135 | + - → If the above worked, you should now be inside a psql REPL (it looks like `postgres=#`).
| 136 | +4. Run: |
| 137 | + |
| 138 | + ```bash |
| 139 | + postgres=# \i data.dump
| 140 | + ``` |
| 141 | + |
| 142 | + - → This will run the file `data.dump` from inside your psql REPL. |
| 143 | +
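As an alternative to opening a terminal in Docker Desktop, the in-container load step from this section can be run from the host with `docker exec`. This is a sketch under stated assumptions: `load_dump_in_container` is a hypothetical helper name, the container is assumed to be called `my-postgres-container` as in this README, and `POSTGRES_USER` and `POSTGRES_DB` are assumed to be set inside the container.

```bash
# load_dump_in_container is a hypothetical helper: it runs the data.dump
# load command inside the Postgres container from the host shell.
load_dump_in_container() {
  local container="${1:-my-postgres-container}"
  # Single quotes keep $POSTGRES_USER and $POSTGRES_DB unexpanded on the host,
  # so they resolve to the values set inside the container.
  docker exec "$container" sh -c \
    'psql -v ON_ERROR_STOP=1 --username "$POSTGRES_USER" --dbname "$POSTGRES_DB" < /docker-entrypoint-initdb.d/data.dump'
}

# Usage:
#   load_dump_in_container
#   load_dump_in_container some-other-container-name
```

Running the whole command through `sh -c` also means the `<` redirection reads `data.dump` from the container's filesystem and not from the host's.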
|
| 144 | +--- |
| 145 | +
|
| 146 | +#### 💡 Additional Docker Make commands |
| 147 | +
|
| 148 | +- To restart the Postgres instance, you can run **`make restart`**. |
| 149 | +- To see logs from the Postgres container, run **`make logs`**. |
| 150 | +- To inspect the Postgres container, run **`make inspect`**. |
| 151 | +- To find the port Postgres is running on, run **`make ip`**. |