pgkeen (/ˈpiː‿ˈdʒiː.ˈkiːn/
) extends the official postgres:16
docker image with essential extensions for AI/ML, vector operations, and data processing. It's built as a force multiplier for AI-function-calling tools like MCP, Dify, Flowise, etc, to enable a set of fundamental tools that can compose higher order functionality.
- Python 3.10.14 compiled from source
- Full carryover support for the official
postgres:16
image and its idiosyncrasies. - All extensions pre-installed and enabled on
postgres
database and user.
A simple set of tools to run administrative commands on the PostgreSQL instance after it's been initialized. It uses the psql
command-line client to execute SQL commands against the database for common tasks related to user and database management.
./pgkeen
pgkeen - (`/'pi:‿'dʒi:.'ki:n/`) quality-of-life tools.
Usage:
pgkeen [OPTIONS] COMMAND
pgkeen [COMMAND] --help | -h
pgkeen --version | -v
Commands:
db Database management commands
docker Docker management commands
user User management commands
Every 5 minutes, pgkeen
uses the pg_cron
and getenv
extensions to automatically synchronize PostgreSQL settings (GUCs) with environment variables.
Here's how it works:
- The
pg_settings
view is queried, and the list of settings is filtered to exclude the following...- Settings that are internal-use only (
vartype = 'internal'
) - Session-specific settings (
name LIKE 'local%'
or'session%'
) - Settings that require a full server restart (
context = 'postmaster'
)
- Settings that are internal-use only (
- For the remaining settings, a corresponding environment variable name is generated.
env_var = CONCAT('PG_', UPPER(REPLACE(name, '.', '__')))
- Then, the values of these environment variables are retrieved.
- If a setting's current value in the database differs from its corresponding environment variable, an
ALTER SYSTEM
command is generated for it. - Finally,
pg_reload_conf()
is called to atomically apply all staged changes to the live configuration.
PostgreSQL GUC | Environment Variable |
---|---|
work_mem |
PG_WORK_MEM |
log_min_duration_statement |
PG_LOG_MIN_DURATION_STATEMENT |
auto_explain.log_min_duration |
PG_AUTO_EXPLAIN__LOG_MIN_DURATION |
Tip
Reset a GUC back to its default value by setting its environment variable to an empty string (""
).
Note
These changes are persistent across process, container, and system restarts, as ALTER SYSTEM
writes them to the postgresql.auto.conf
file within your $PGDATA
volume.
Extension | Description |
---|---|
pgvector | Open-source vector similarity search with support for exact and approximate nearest neighbor search |
pg_embedding | Hierarchical Navigable Small World (HNSW) algorithm for high-performance vector similarity search |
postgresml | Machine learning extension that enables training and inference on text and tabular data using SQL |
pg_vectorize | Automates text-to-embeddings transformation and provides hooks into popular LLMs for AI workloads |
Extension | Description |
---|---|
pg_partman | Automated partition management for time-based and serial-based table partitioning |
pg_cron | Simple cron-based job scheduler that runs inside the database as an extension |
pgmq | Lightweight message queue built on PostgreSQL for reliable async message processing |
pgsql-http | HTTP client for PostgreSQL that allows making HTTP requests from SQL |
pg_net | Async networking interface for PostgreSQL that enables making HTTP requests and handling webhooks |
jsonschema | JSON Schema validation for PostgreSQL that validates JSON data against schemas |
pg_hashids | Generate short, unique, non-sequential ids from numbers using the Hashids algorithm |
envvar | Functions for reading environment variables from within PostgreSQL sessions |
All are pre-enabled on the postgres
database via CREATE EXTENSION
on image initialization.
Extension | Description |
---|---|
hstore | Key-value store data type for storing sets of key/value pairs within a single PostgreSQL value |
ltree | Hierarchical tree-like structures representation with operations for searching and manipulation |
citext | Case-insensitive character string type that behaves like text but ignores case in comparisons |
bloom | Index access method based on Bloom filters for equality queries on multiple columns |
intarray | Functions and operators for manipulating arrays of integers with GiST indexing support |
pg_trgm | Trigram matching for fast similarity searching and fuzzy string matching |
dict_int | Text search dictionary template for integers with customizable formatting |
fuzzystrmatch | Functions for determining similarities and distance between strings using various algorithms |
uuid-ossp | Functions for generating universally unique identifiers (UUIDs) using standard algorithms |
xml2 | XPath querying and XSLT processing functions for XML data manipulation |
autoinc | Functions for autoincrementing fields and automatic sequence management |
intagg | Integer aggregator and enumerator functions for working with integer collections |
plpython3u | Procedural language that allows writing PostgreSQL functions and procedures in Python |
Extension | Description |
---|---|
Apache AGE | A graph database extension that allows leveraging graph database functionality on top of PostgreSQL |
PostGIS | Spatial database extender that adds support for geographic objects and spatial queries |
# Run with default settings
docker run -d --name pgkeen -e POSTGRES_PASSWORD=$YOUR_PASSWORD -p 5432:5432 -v data:/var/lib/postgresql/data veloper/pgkeen:latest
POSTGRES_PASSWORD=$YOUR_PASSWORD POSTGRES_HOST_AUTH_METHOD=$YOUR_AUTH_METHOD docker-compose up
services:
db:
restart: always
image: pgkeen:latest
command: postgres -c config_file=/var/lib/postgresql/data/postgresql.conf
environment:
POSTGRES_DB: postgres
POSTGRES_USER: postgres
POSTGRES_PASSWORD: "${POSTGRES_PASSWORD:-}"
POSTGRES_HOST_AUTH_METHOD: "${POSTGRES_HOST_AUTH_METHOD:-trust}"
PGDATA: /var/lib/postgresql/data
# Declarative Examples ENV => GUCs
PG_WORK_MEM: 4MB
PG_LOG_MIN_DURATION_STATEMENT: 0
PG_AUTO_EXPLAIN__LOG_MIN_DURATION: 0
ports:
- "5432:5432"
networks:
shared_default:
aliases:
- postgres # alias so other services can connect using a more explicit name
volumes:
- ./data/pg:/var/lib/postgresql/data
networks:
shared_default:
driver: bridge
# Uncomment if you want to launch just this service, and allow other
# networks to access it (requires the external network to be created
# beforehand)
# external: true
All of the documentation for the official postgres:16
image still hold true so consult the PostgreSQL Docker Hub README for extensive details on the available configuration options.
Contributions are welcome! If you have ideas for improvements, bug fixes, or new features, please follow these steps:
- Fork this repository.
- Create a feature branch for your changes.
- Ensure your code follows project conventions and passes all tests.
- Submit a pull request with a clear description of your changes and reference any related issues.
- For bugs or feature requests, please open an issue first to discuss your ideas.
Always use the golden-rule of open source contributions:
"If I were the maintainer, how would I want to receive this pull request?"
Simple 3-BSD License. See the LICENSE file for details.