|
| 1 | +## Introduction to PostgreSQL |
| 2 | + |
| 3 | +PostgreSQL, often simply called Postgres, is a powerful, open-source object-relational database management system (ORDBMS) with a strong reputation for reliability, feature robustness, and performance. It originated in 1986 at the University of California, Berkeley, as part of the POSTGRES project, and has since evolved into one of the most advanced and widely used database systems, backed by a strong community. PostgreSQL runs on all major operating systems, including Linux, macOS, and Windows. |
| 4 | + |
| 5 | +## Key Features of PostgreSQL |
| 6 | + |
| 7 | +PostgreSQL offers a wide range of features that make it a popular choice for many applications: |
| 8 | + |
| 9 | +1. **Extensive data types**: PostgreSQL supports a large variety of built-in data types and allows users to define their own custom data types. It can handle complex data types such as arrays, JSON, and geometric types. |
| 10 | + |
| 11 | +2. **ACID compliance**: PostgreSQL adheres to the ACID principles (Atomicity, Consistency, Isolation, Durability), ensuring reliable and trustworthy transactions. [More details](databases_introduction.md#acid-compliance) |
| 12 | + |
| 13 | +3. **Concurrency control**: PostgreSQL uses multi-version concurrency control (MVCC), in which each transaction works against a consistent snapshot of the data. Readers therefore do not block writers and writers do not block readers, which allows many transactions to access the same data simultaneously. |
| 14 | + |
| 15 | +4. **Advanced querying capabilities**: PostgreSQL supports complex SQL queries, subqueries, common table expressions (CTEs), recursive queries, and window functions. It also allows users to define their own functions, triggers, and stored procedures in various programming languages. |
| 16 | + |
| 17 | +5. **Full-text search**: PostgreSQL provides powerful full-text search capabilities, including stemming, ranking, and phrase-searching support. Searches can be accelerated with GIN and GiST indexes. |
| 18 | + |
| 19 | +6. **Replication and high availability**: PostgreSQL supports streaming replication (asynchronous or synchronous) as well as logical replication, providing data redundancy, fault tolerance, and high availability. |
| 20 | + |
| 21 | +7. **Security and authentication**: PostgreSQL offers robust security features, including SSL encryption, username/password authentication, LDAP authentication, Kerberos authentication, role-based access control (RBAC), and row-level security (RLS). |
| 22 | + |
| 23 | +## Setting Up PostgreSQL |
| 24 | + |
| 25 | +To get PostgreSQL running on your local machine, you will need to have the following tools installed: |
| 26 | + |
| 27 | +1. **PostgreSQL Server**: You can follow the step-by-step instructions provided on the [official website](https://www.postgresql.org/download/). Once the installation is complete, you can run the server by opening the application. |
| 28 | + |
| 29 | +2. **PostgreSQL Query Tools**: Once the PostgreSQL server is installed, you can install tools to manage and interact with it. There are multiple choices, each with its own unique features, and all of them support the basic functionality. Popular options include [pgAdmin](https://www.pgadmin.org/) and [DBeaver](https://dbeaver.io/), or you can use a terminal tool like [psql](https://www.postgresql.org/docs/current/app-psql.html). |
| 30 | + |
| 31 | +!!! Hint |
| 32 | + **Installation on Mac** |
| 33 | + |
| 34 | + PostgreSQL can be installed on a Mac using Homebrew. Run the command `brew install postgresql`. For more details and options, see the [official website](https://www.postgresql.org/download/macosx/). |
| 35 | + |
| 36 | +## Learning the Basics |
| 37 | + |
| 38 | +Practice makes perfect, so let's learn PostgreSQL through sample code. Below are some snippets in increasing order of complexity, designed to help you understand various aspects of PostgreSQL. |
| 39 | + |
| 40 | +!!! Hint |
| 41 | + Before we begin, please note that to interact with the database you use PostgreSQL's dialect of SQL. If you are using a terminal, you can start the interactive `psql` shell by running `psql`. Once inside, you can connect to a database by running the following command: |
| 42 | + |
| 43 | + ```sql |
| 44 | + -- Connecting to a PostgreSQL database |
| 45 | + -- Use a client or terminal with appropriate access credentials |
| 46 | + \c my_database; |
| 47 | + ``` |
| 48 | + Alternatively, you can use a graphical tool like pgAdmin for a better user experience. |
| 49 | + |
| 50 | +**1. Creating a Database** |
| 51 | + |
| 52 | +```sql |
| 53 | +-- Creating a database. Replace `my_database` with your database name |
| 54 | +CREATE DATABASE my_database; |
| 55 | +``` |
| 56 | + |
| 57 | +**2. Creating a Table** |
| 58 | + |
| 59 | + |
| 60 | +```sql |
| 61 | +-- Creating a simple table. Replace `employees` with your table name |
| 62 | +CREATE TABLE employees ( |
| 63 | + id SERIAL PRIMARY KEY, |
| 64 | + name VARCHAR(50), |
| 65 | + position VARCHAR(50), |
| 66 | + departmentid INT, |
| 67 | + salary DECIMAL |
| 68 | +); |
| 69 | +``` |
| 70 | + |
| 71 | +!!! Hint |
| 72 | + [Here is a detailed list](https://www.postgresql.org/docs/current/datatype.html) of all supported data types in PostgreSQL. Note, you can also [create custom data types](https://www.postgresql.org/docs/current/sql-createtype.html). |
| 73 | + |
| 74 | +**3. Inserting Data** |
| 75 | + |
| 76 | +```sql |
| 77 | +-- Inserting data into the table |
| 78 | +INSERT INTO employees (name, position, salary) |
| 79 | +VALUES ('John Doe', 'Software Engineer', 70000); |
| 80 | +``` |
| 81 | + |
| 82 | +**4. Basic Data Retrieval** |
| 83 | + |
| 84 | +```sql |
| 85 | +-- Retrieving all data from a table |
| 86 | +SELECT * FROM employees; |
| 87 | + |
| 88 | +-- Limiting the number of rows returned |
| 89 | +SELECT * FROM employees LIMIT 10; |
| 90 | + |
| 91 | +-- Retrieving specific columns |
| 92 | +SELECT name, position FROM employees; |
| 93 | + |
| 94 | +-- Retrieving data in descending order |
| 95 | +SELECT * FROM employees ORDER BY salary DESC; |
| 96 | +``` |
| 97 | + |
| 98 | +**5. Data Retrieval with Conditions** |
| 99 | + |
| 100 | +```sql |
| 101 | +-- Retrieving specific data with a condition |
| 102 | +SELECT name, position FROM employees WHERE salary > 50000; |
| 103 | + |
| 104 | +-- Filtering on string columns |
| 105 | +SELECT * FROM employees WHERE name LIKE '%Doe%'; |
| 106 | + |
| 107 | +-- Filtering on datetime columns |
| 108 | +SELECT * FROM orders WHERE order_date BETWEEN '2022-01-01' AND '2022-02-01'; |
| 109 | + |
| 110 | +-- Filtering on datetime columns with an interval (same range as above) |
| 111 | +SELECT * FROM orders WHERE order_date BETWEEN '2022-01-01' AND '2022-01-01'::date + interval '1 month'; |
| 112 | + |
| 113 | +-- To filter based on multiple conditions and values |
| 114 | +SELECT * FROM employees WHERE name LIKE '%Doe%' AND salary > 50000 |
| 115 | + AND position IN ('Software Engineer', 'Data Scientist'); |
| 116 | +``` |
| 117 | + |
| 118 | +**6. Updating Data** |
| 119 | + |
| 120 | +```sql |
| 121 | +-- Updating data in the table |
| 122 | +UPDATE employees SET salary = 75000 WHERE name = 'John Doe'; |
| 123 | +``` |
| 124 | + |
| 125 | +**7. Deleting Data** |
| 126 | + |
| 127 | +```sql |
| 128 | +-- Deleting data from the table |
| 129 | +DELETE FROM employees WHERE id = 1; |
| 130 | + |
| 131 | +-- Deleting all data from the table |
| 132 | +DELETE FROM employees; |
| 133 | + |
| 134 | +-- Deleting the table |
| 135 | +DROP TABLE employees; |
| 136 | + |
| 137 | +-- Deleting multiple tables |
| 138 | +DROP TABLE employees, departments; |
| 139 | +``` |
| 140 | + |
| 141 | +**8. Joining Tables** |
| 142 | + |
| 143 | +```sql |
| 144 | +-- Creating another table |
| 145 | +CREATE TABLE departments ( |
| 146 | + id SERIAL PRIMARY KEY, |
| 147 | + name VARCHAR(50) |
| 148 | +); |
| 149 | + |
| 150 | +-- Inserting data into the new table |
| 151 | +INSERT INTO departments (name) VALUES ('Engineering'); |
| 152 | + |
| 153 | +-- Joining two tables |
| 154 | +SELECT employees.name, departments.name AS department_name |
| 155 | +FROM employees |
| 156 | +JOIN departments ON employees.departmentid = departments.id; |
| 157 | +``` |
| 158 | + |
| 159 | +**9. Using Aggregate Functions** |
| 160 | + |
| 161 | +```sql |
| 162 | +-- Using an aggregate function to get the average salary |
| 163 | +SELECT AVG(salary) FROM employees; |
| 164 | + |
| 165 | +-- Group by a column (ex: getting the average salary by department) |
| 166 | +SELECT departments.name AS department_name, AVG(employees.salary) AS avg_salary |
| 167 | +FROM employees |
| 168 | +JOIN departments ON employees.departmentid = departments.id |
| 169 | +GROUP BY departments.name; |
| 170 | +``` |
| 171 | + |
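To make the grouping step concrete, here is a minimal plain-Python sketch of what `GROUP BY` combined with `AVG` computes. The helper `average_by_group` and the sample rows are hypothetical and only illustrate the idea; PostgreSQL performs this work inside the database.

```python
from collections import defaultdict


def average_by_group(rows):
    """rows: iterable of (group, value) pairs -> {group: average value}."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for group, value in rows:
        totals[group] += value
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}


# hypothetical (department_name, salary) rows, as produced by the join
rows = [("Engineering", 70000), ("Engineering", 90000), ("Sales", 50000)]
print(average_by_group(rows))  # {'Engineering': 80000.0, 'Sales': 50000.0}
```

Each distinct department name becomes one output row, which is why every selected column must either appear in `GROUP BY` or be wrapped in an aggregate.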
| 172 | +**10. Complex Query with Subquery and Grouping** |
| 173 | + |
| 174 | +```sql |
| 175 | +-- Finding the highest salary in each department |
| 176 | +SELECT department_name, MAX(salary) AS max_salary |
| 177 | +FROM ( |
| 178 | + SELECT employees.name, employees.salary, departments.name AS department_name |
| 179 | + FROM employees |
| 180 | +    JOIN departments ON employees.departmentid = departments.id |
| 181 | +) AS department_salaries |
| 182 | +GROUP BY department_name; |
| 183 | +``` |
| 184 | + |
| 185 | +These examples cover a range of basic to more complex tasks you can perform with PostgreSQL, from establishing a connection to executing advanced queries. As you become more comfortable with these operations, you'll be able to tackle more complex scenarios and optimize your database interactions. |
| 186 | + |
| 187 | +## Python Sample Code |
| 188 | + |
| 189 | +There are multiple Python packages available for PostgreSQL, such as [psycopg2](https://pypi.org/project/psycopg2/) and [asyncpg](https://pypi.org/project/asyncpg/). For this section, we will use the [asyncpg](https://pypi.org/project/asyncpg/) package, which supports asynchronous programming. |
| 190 | + |
| 191 | +Sample code that connects to the PostgreSQL server and fetches a result is shown below. |
| 192 | + |
| 193 | +```python linenums="1" |
| 194 | +# import |
| 195 | +import asyncio |
| 196 | +import asyncpg |
| 197 | + |
| 198 | +# the main function that connects to the PostgreSQL server, |
| 199 | +# fetches the result and prints it |
| 200 | +async def run(): |
| 201 | +    # connect to the PostgreSQL server |
| 202 | +    conn = await asyncpg.connect(user='postgres', password='admin', |
| 203 | +                                 database='mydb', host='localhost') |
| 204 | +    # fetch the result (a list of asyncpg Record objects) |
| 205 | +    result = await conn.fetch( |
| 206 | +        'SELECT * FROM mytbl LIMIT 1' |
| 207 | +    ) |
| 208 | +    # print each record as a dictionary |
| 209 | +    print([dict(record) for record in result]) |
| 210 | +    # close the connection |
| 211 | +    await conn.close() |
| 212 | + |
| 213 | +if __name__ == '__main__': |
| 214 | +    # run the code; asyncio.run creates the event loop |
| 215 | +    # and closes it once run() has finished |
| 216 | +    asyncio.run(run()) |
| 217 | +``` |
| 218 | + |
| 219 | +Dynamic queries based on user input can be built safely by passing the values as separate arguments to `fetch` rather than formatting them into the SQL string. In the modified call below, the query has two placeholders, `$1` and `$2`, for `id` and `limit` respectively, and the corresponding values are passed after the query. The rest of the code remains the same. |
| 220 | + |
| 221 | +```python linenums="1" |
| 222 | +# fetch the result |
| 223 | +result = await conn.fetch( |
| 224 | + 'SELECT * FROM mytbl where id = $1 LIMIT $2', |
| 225 | + 123, 1 |
| 226 | +) |
| 227 | +``` |
| 228 | + |
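When the number of conditions varies, the numbered placeholders have to be generated along with the argument list. The sketch below shows one way to do that. The helper `build_select` is hypothetical (it is not part of asyncpg), and it assumes the table and column names come from trusted code, because identifiers cannot be passed as `$n` parameters.

```python
def build_select(table, filters, limit=None):
    """Return (query, args) with $n placeholders for every value.

    `table` and the keys of `filters` are interpolated as-is, so they must
    come from trusted code, never from user input.
    """
    args = []
    clauses = []
    for column, value in filters.items():
        args.append(value)
        # the placeholder number is the position of the value in args
        clauses.append(f"{column} = ${len(args)}")
    query = f"SELECT * FROM {table}"
    if clauses:
        query += " WHERE " + " AND ".join(clauses)
    if limit is not None:
        args.append(limit)
        query += f" LIMIT ${len(args)}"
    return query, args


query, args = build_select("mytbl", {"id": 123}, limit=1)
print(query)  # SELECT * FROM mytbl WHERE id = $1 LIMIT $2
print(args)   # [123, 1]
```

The result can then be passed on as `await conn.fetch(query, *args)`.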
| 229 | +You can use `conn.execute` to run a statement that does not return rows. Below is the modification needed. |
| 230 | + |
| 231 | +```python linenums="1" |
| 232 | +# insertion example (one row) |
| 233 | +result = await conn.execute( |
| 234 | +    'INSERT INTO mytbl (code, name) VALUES ($1, $2)', |
| 235 | +    123, 'mohit' |
| 236 | +) |
| 237 | +``` |
| 238 | + |
| 239 | +To run the same statement for multiple rows, use `conn.executemany` instead of `conn.execute` and pass a list of argument tuples, one per row. Below is the modification to the code shown above. |
| 240 | + |
| 241 | +```python linenums="1" |
| 242 | +# insertion example (multiple rows) |
| 243 | +await conn.executemany( |
| 244 | +    'INSERT INTO mytbl (code, name) VALUES ($1, $2)', |
| 245 | +    [(123, 'mohit'), (124, 'mayank')] |
| 246 | +) |
| 247 | +``` |
| 248 | + |
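Rows often arrive as dictionaries (for example from parsed JSON), while `executemany` expects one tuple per row, ordered like the placeholders. A small sketch of that conversion follows; the helper name `to_executemany_args` and the sample rows are hypothetical.

```python
def to_executemany_args(rows, columns):
    """Turn a list of dicts into a list of tuples ordered like `columns`."""
    return [tuple(row[col] for col in columns) for row in rows]


rows = [{"name": "mohit", "code": 123}, {"name": "mayank", "code": 124}]
# order the values to match the placeholders: $1 = code, $2 = name
args = to_executemany_args(rows, ("code", "name"))
print(args)  # [(123, 'mohit'), (124, 'mayank')]
```

The resulting list can be passed as the second argument of `conn.executemany`.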
| 249 | +You might want to create a generic function that executes a query and retries on failure. Here is how you can do it using the `tenacity` library. The code below makes up to 4 attempts (the initial call plus 3 retries), waiting with exponential backoff between them. |
| 250 | + |
| 251 | +```python linenums="1" |
| 252 | +# import |
| 253 | +import asyncio |
| 254 | +import asyncpg |
| 255 | +import functools |
| 256 | +from tenacity import TryAgain, retry, stop_after_attempt, wait_exponential |
| 257 | + |
| 258 | +# custom retry logging function, called after every failed attempt |
| 259 | +def custom_retry_log(retry_state, msg): |
| 260 | +    if retry_state.attempt_number != 1: |
| 261 | +        print(f"Retrying {retry_state.attempt_number - 1} for {msg}") |
| 262 | + |
| 263 | +# main function |
| 264 | +async def execute_fetch_script(script, values=(), msg=None, retry_on_failure=True): |
| 265 | +    # create connection |
| 266 | +    conn = await asyncpg.connect(user='postgres', password='admin', |
| 267 | +                                 database='mydb', host='localhost') |
| 268 | +    try: |
| 269 | +        # bind the message into the retry logging callback |
| 270 | +        log_callback = functools.partial(custom_retry_log, msg=msg) |
| 271 | + |
| 272 | +        # up to 4 attempts (1 initial + 3 retries) with exponential backoff |
| 273 | +        @retry(wait=wait_exponential(multiplier=2, min=2, max=16), |
| 274 | +               stop=stop_after_attempt(4), |
| 275 | +               after=log_callback, reraise=True) |
| 276 | +        async def retry_wrapper(): |
| 277 | +            try: |
| 278 | +                # execute the select SQL script |
| 279 | +                records = await conn.fetch(script, *values) |
| 280 | +                project_records = [dict(record) for record in records] |
| 281 | +                print(project_records)  # remove this |
| 282 | +                return project_records |
| 283 | +            except Exception as e: |
| 284 | +                if retry_on_failure: |
| 285 | +                    raise TryAgain(e) |
| 286 | +                else: |
| 287 | +                    print(f"Failure in {msg} - {e}") |
| 288 | +                    return None |
| 289 | + |
| 290 | +        # db call wrapper |
| 291 | +        return await retry_wrapper() |
| 292 | +    except Exception as e: |
| 293 | +        raise Exception(f"Failure in {msg} - {e}") from e |
| 294 | +    finally: |
| 295 | +        # close db connections |
| 296 | +        await conn.close() |
| 297 | + |
| 298 | + |
| 299 | +if __name__ == '__main__': |
| 300 | +    script = 'SELECT * FROM mytbl where projectid = $1 LIMIT $2' |
| 301 | +    values = (2, 1) |
| 302 | +    # asyncio.run creates the event loop and closes it when done |
| 303 | +    asyncio.run(execute_fetch_script(script, values, "Testing Run")) |
| 304 | +``` |
| 305 | + |
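To get a feel for how long such a retry loop can take, the sketch below computes an exponential backoff schedule in plain Python. It assumes the wait after the n-th failed attempt is `multiplier * 2 ** (n - 1)`, clamped between the minimum and maximum. That formula is an assumption for illustration, and the exact rule used by `tenacity` may differ between versions, so check its documentation. The function `backoff_schedule` is hypothetical.

```python
def backoff_schedule(attempts, multiplier=2, minimum=2, maximum=16):
    """Seconds to wait after each failed attempt except the last one.

    Assumed formula: multiplier * 2 ** (attempt - 1), clamped to
    the range [minimum, maximum].
    """
    waits = []
    for attempt in range(1, attempts):
        raw = multiplier * 2 ** (attempt - 1)
        waits.append(max(minimum, min(raw, maximum)))
    return waits


print(backoff_schedule(4))  # [2, 4, 8]
print(backoff_schedule(7))  # [2, 4, 8, 16, 16, 16]
```

Under these assumptions, four attempts spend at most 14 seconds waiting in total, and the cap of 16 seconds only comes into play from the fifth attempt onwards.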
| 306 | +Each of the examples above runs a single statement, which asyncpg executes in its own implicit transaction (auto-commit). To execute multiple queries in one transaction, you can do as shown below. |
| 307 | + |
| 308 | +```python linenums="1" |
| 309 | +# import |
| 310 | +import asyncio |
| 311 | +import asyncpg |
| 312 | + |
| 313 | +async def run_in_transaction(): |
| 314 | +    # create the connection |
| 315 | +    conn = await asyncpg.connect(user='postgres', password='admin', |
| 316 | +                                 database='mydb', host='localhost') |
| 317 | +    try: |
| 318 | +        # start the transaction; it commits when the block exits normally |
| 319 | +        # and rolls back automatically if an exception is raised inside it |
| 320 | +        async with conn.transaction(): |
| 321 | +            # Query 1 - execute the select SQL script |
| 322 | +            records = await conn.fetch( |
| 323 | +                'SELECT * FROM mytbl WHERE projectid = $1 LIMIT $2', 2, 1) |
| 324 | + |
| 325 | +            # Query 2 - update the table |
| 326 | +            await conn.execute( |
| 327 | +                'UPDATE mytbl SET name = $1 WHERE projectid = $2', 'mohit', 2) |
| 328 | +    except Exception as e: |
| 329 | +        # the transaction has already been rolled back at this point |
| 330 | +        print(f"Transaction failed and was rolled back - {e}") |
| 331 | +    finally: |
| 332 | +        # close db connections |
| 333 | +        await conn.close() |
| 334 | + |
| 335 | +if __name__ == '__main__': |
| 336 | +    asyncio.run(run_in_transaction()) |
| 337 | +``` |
| 338 | + |
| 339 | +## Snippets |
| 340 | + |
| 341 | +Real-world problems will require much more than what we covered in the sections above. Let's cover some important queries in this section. |
| 342 | + |
| 343 | +**Casting a column to a different data type** |
| 344 | + |
| 345 | +```sql |
| 346 | +-- Casting a column to a different data type |
| 347 | +SELECT CAST(salary AS VARCHAR) FROM employees; |
| 348 | +``` |
| 349 | + |
| 350 | +**Using JSONB column** |
| 351 | + |
| 352 | +```sql |
| 353 | +-- Extracting data from JSONB column |
| 354 | +-- Suppose data column contains {"name": "John", "address": {"city": "New York", "state": "NY"}} |
| 355 | +SELECT name, jsonb_extract_path(data, 'address', 'city') AS city FROM employees; |
| 356 | +``` |
| 357 | + |
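The path lookup can be pictured as walking a nested dictionary one key at a time. The plain-Python sketch below mirrors that idea on the sample document. The helper `extract_path` is hypothetical, handles only nested objects (not arrays), and returns `None` where the SQL function would return `NULL`.

```python
import json


def extract_path(document, *path):
    """Follow `path` through nested dicts; return None if a key is missing."""
    current = document
    for key in path:
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


data = json.loads(
    '{"name": "John", "address": {"city": "New York", "state": "NY"}}'
)
print(extract_path(data, "address", "city"))  # New York
print(extract_path(data, "address", "zip"))   # None
```

Note that `jsonb_extract_path` returns a `jsonb` value, so a string comes back as JSON (with quotes); `jsonb_extract_path_text` returns it as plain `text` instead.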
| 358 | +**Extracting components from a DateTime column** |
| 359 | + |
| 360 | +```sql |
| 361 | +-- Grouping by month: DATE_TRUNC rounds order_date (e.g. 2022-01-15) down to 2022-01-01 |
| 362 | +-- To get only the month number (1-12) instead, use EXTRACT(MONTH FROM order_date) |
| 363 | +SELECT DATE_TRUNC('month', order_date) AS month, COUNT(*) AS order_count |
| 364 | +FROM orders |
| 365 | +GROUP BY month |
| 366 | +ORDER BY month; |
| 367 | + |
| 368 | +-- Truncate to year, use: DATE_TRUNC('year', order_date) |
| 369 | +-- Truncate to quarter, use: DATE_TRUNC('quarter', order_date) |
| 370 | +-- Truncate to week, use: DATE_TRUNC('week', order_date) |
| 371 | +-- Truncate to day, use: DATE_TRUNC('day', order_date) |
| 372 | +-- Truncate to hour (timestamp columns), use: DATE_TRUNC('hour', order_date) |
| 373 | +-- Truncate to minute (timestamp columns), use: DATE_TRUNC('minute', order_date) |
| 374 | +-- Truncate to second (timestamp columns), use: DATE_TRUNC('second', order_date) |
| 375 | +``` |
| 376 | + |
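`DATE_TRUNC` rounds a value down to the start of a period, which is different from pulling out a single component such as the month number. The plain-Python sketch below illustrates the difference with the standard `datetime` module. The helpers `trunc_month` and `trunc_quarter` are hypothetical stand-ins, not PostgreSQL functions.

```python
from datetime import date


def trunc_month(d):
    """Round a date down to the first day of its month."""
    return d.replace(day=1)


def trunc_quarter(d):
    """Round a date down to the first day of its quarter."""
    first_month = 3 * ((d.month - 1) // 3) + 1
    return d.replace(month=first_month, day=1)


d = date(2022, 8, 17)
print(trunc_month(d))    # 2022-08-01
print(trunc_quarter(d))  # 2022-07-01
print(d.month)           # 8, the component EXTRACT(MONTH FROM ...) would give
```

Truncated values keep the year, so grouping by them separates August 2022 from August 2023, whereas grouping by the month number alone would merge the two.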
| 377 | +## Conclusion |
| 378 | + |
| 379 | +PostgreSQL's combination of features, performance, and reliability makes it a popular choice for a wide range of applications, from small projects to large-scale enterprise systems. Its open-source nature, strong community support, and continuous development ensure that PostgreSQL will remain a leading database management system for years to come. I hope this article helped you understand the basics of PostgreSQL and piqued your interest in learning more. |
| 380 | + |