PostgresKeyValueStorage and NamespaceRoutedKeyValueStorage#2959

Merged
kmatasfp merged 13 commits into main from
2957-keyvaluestorage-for-postgres
Mar 13, 2026

Contributor

@kmatasfp kmatasfp commented Mar 10, 2026

Resolves: #2957

This PR adds 2 new KeyValueStorage implementations:

  1. PostgresKeyValueStorage

  2. NamespaceRoutedKeyValueStorage, which composes:

    • PostgresKeyValueStorage
    • RedisKeyValueStorage

    Routing rule:

    • KeyValueStorageNamespace::Worker { .. } -> Redis
    • all other namespaces -> Postgres
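
The routing rule above can be sketched roughly as follows. This is a minimal, synchronous illustration and not the code from this PR: the `Schedule` variant, the `agent_id` field, the `InMemoryStorage` helper and the simplified `KeyValueStorage` trait are all hypothetical stand-ins, and the `cache` / `persistent` field names are assumed from the configuration keys in the description.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

// Hypothetical, simplified namespace type; the real enum has its own variants and fields.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub enum KeyValueStorageNamespace {
    Worker { agent_id: String },
    Schedule,
}

// Hypothetical synchronous stand-in for the real KeyValueStorage trait.
pub trait KeyValueStorage: Send + Sync {
    fn set(&self, ns: &KeyValueStorageNamespace, key: &str, value: Vec<u8>);
    fn get(&self, ns: &KeyValueStorageNamespace, key: &str) -> Option<Vec<u8>>;
}

// In-memory backend used here in place of Redis and Postgres.
#[derive(Default)]
pub struct InMemoryStorage {
    entries: Mutex<HashMap<(KeyValueStorageNamespace, String), Vec<u8>>>,
}

impl KeyValueStorage for InMemoryStorage {
    fn set(&self, ns: &KeyValueStorageNamespace, key: &str, value: Vec<u8>) {
        let mut entries = self.entries.lock().unwrap();
        entries.insert((ns.clone(), key.to_string()), value);
    }

    fn get(&self, ns: &KeyValueStorageNamespace, key: &str) -> Option<Vec<u8>> {
        let entries = self.entries.lock().unwrap();
        entries.get(&(ns.clone(), key.to_string())).cloned()
    }
}

// Routes each call to one of two inner stores based on the namespace.
pub struct RoutedStorage {
    cache: Arc<dyn KeyValueStorage>,
    persistent: Arc<dyn KeyValueStorage>,
}

impl RoutedStorage {
    // Worker namespaces go to the cache store, everything else to the persistent store.
    fn route(&self, ns: &KeyValueStorageNamespace) -> &dyn KeyValueStorage {
        match ns {
            KeyValueStorageNamespace::Worker { .. } => &*self.cache,
            _ => &*self.persistent,
        }
    }
}

impl KeyValueStorage for RoutedStorage {
    fn set(&self, ns: &KeyValueStorageNamespace, key: &str, value: Vec<u8>) {
        self.route(ns).set(ns, key, value)
    }

    fn get(&self, ns: &KeyValueStorageNamespace, key: &str) -> Option<Vec<u8>> {
        self.route(ns).get(ns, key)
    }
}
```

Because the router itself implements the same trait, callers do not need to know that two backends are involved.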

The Postgres KeyValueStorage implementation for the worker executor can be enabled with:

GOLEM__KEY_VALUE_STORAGE__TYPE=Postgres
GOLEM__KEY_VALUE_STORAGE__CONFIG__HOST=your-postgres-endpoint
GOLEM__KEY_VALUE_STORAGE__CONFIG__PORT=5432
GOLEM__KEY_VALUE_STORAGE__CONFIG__DATABASE=your_kv_db
GOLEM__KEY_VALUE_STORAGE__CONFIG__USERNAME=postgres
GOLEM__KEY_VALUE_STORAGE__CONFIG__PASSWORD=***
GOLEM__KEY_VALUE_STORAGE__CONFIG__MAX_CONNECTIONS=20

NamespaceRoutedKeyValueStorage for the worker executor can be enabled with:

- GOLEM__KEY_VALUE_STORAGE__TYPE=NamespaceRouted
- GOLEM__KEY_VALUE_STORAGE__CONFIG__CACHE__TYPE=<InnerType>
- GOLEM__KEY_VALUE_STORAGE__CONFIG__PERSISTENT__TYPE=<InnerType>


// example: Redis as cache
GOLEM__KEY_VALUE_STORAGE__CONFIG__CACHE__TYPE=Redis
GOLEM__KEY_VALUE_STORAGE__CONFIG__CACHE__CONFIG__HOST=localhost
GOLEM__KEY_VALUE_STORAGE__CONFIG__CACHE__CONFIG__PORT=6379
GOLEM__KEY_VALUE_STORAGE__CONFIG__CACHE__CONFIG__DATABASE=0
GOLEM__KEY_VALUE_STORAGE__CONFIG__CACHE__CONFIG__POOL_SIZE=20
GOLEM__KEY_VALUE_STORAGE__CONFIG__CACHE__CONFIG__KEY_PREFIX=worker-executor
GOLEM__KEY_VALUE_STORAGE__CONFIG__CACHE__CONFIG__TLS=true

// Postgres as persistent storage
GOLEM__KEY_VALUE_STORAGE__CONFIG__PERSISTENT__TYPE=Postgres
GOLEM__KEY_VALUE_STORAGE__CONFIG__PERSISTENT__CONFIG__HOST=your-postgres-endpoint
GOLEM__KEY_VALUE_STORAGE__CONFIG__PERSISTENT__CONFIG__PORT=5432
GOLEM__KEY_VALUE_STORAGE__CONFIG__PERSISTENT__CONFIG__DATABASE=golem_kv
GOLEM__KEY_VALUE_STORAGE__CONFIG__PERSISTENT__CONFIG__USERNAME=postgres
GOLEM__KEY_VALUE_STORAGE__CONFIG__PERSISTENT__CONFIG__PASSWORD=***
GOLEM__KEY_VALUE_STORAGE__CONFIG__PERSISTENT__CONFIG__MAX_CONNECTIONS=20
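
The variable names above suggest that `__` separates nesting levels under a `GOLEM__` prefix. The helper below is a hypothetical sketch of that naming convention only; `env_to_config_path` is not part of the code base, and the lowercasing of segments is an assumption.

```rust
/// Splits a `GOLEM__`-prefixed variable name into lowercase config path segments.
/// Returns `None` when the name does not start with the prefix.
/// Hypothetical helper: it only illustrates the naming convention.
pub fn env_to_config_path(name: &str) -> Option<Vec<String>> {
    let rest = name.strip_prefix("GOLEM__")?;
    Some(rest.split("__").map(|segment| segment.to_lowercase()).collect())
}
```

Read this way, `GOLEM__KEY_VALUE_STORAGE__CONFIG__PERSISTENT__TYPE` addresses the `type` of the `persistent` store nested inside the `config` of `key_value_storage`.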

This PR also improves the multi queries in the Postgres implementation of IndexedStorage.

key TEXT NOT NULL,
value_hash BYTEA NOT NULL,
value BYTEA NOT NULL,
PRIMARY KEY (namespace, key, value_hash)
Contributor Author

@kmatasfp kmatasfp Mar 10, 2026

FYI: a PK on (namespace, key, value BYTEA) can become expensive, so we use a blake3 hash of the value here instead.

Contributor Author

@kmatasfp kmatasfp Mar 10, 2026

The assumption is that we store values larger than 32 bytes in the set; if not, the hash optimization does not make any sense.
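
A rough sketch of the idea, keying rows on a fixed-size digest instead of the raw value. To stay dependency-free it swaps in the standard library's `DefaultHasher` (8 bytes, not collision-resistant) where the PR uses blake3 (32 bytes). `SortedSetTable` and `value_digest` are hypothetical names, and the in-memory map only stands in for the Postgres table.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

// Stand-in for blake3: a fixed-size digest from the standard library.
// NOT collision-resistant; real code would use a cryptographic hash.
fn value_digest(value: &[u8]) -> [u8; 8] {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish().to_be_bytes()
}

// Mirrors PRIMARY KEY (namespace, key, value_hash): the key size no longer
// depends on how large the stored value is.
type RowKey = (String, String, [u8; 8]);

#[derive(Default)]
pub struct SortedSetTable {
    rows: BTreeMap<RowKey, Vec<u8>>,
}

impl SortedSetTable {
    // Inserts the value, or overwrites the existing row with the same digest.
    pub fn add(&mut self, namespace: &str, key: &str, value: &[u8]) {
        let row_key = (namespace.to_string(), key.to_string(), value_digest(value));
        self.rows.insert(row_key, value.to_vec());
    }

    pub fn row_count(&self) -> usize {
        self.rows.len()
    }
}
```

The trade-off matches the comment above: once values are no larger than the digest itself, the extra column and hashing step buy nothing.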

Contributor

We only use the "set" subset of the trait for tracking running agents per executor, and the value is always a serialized OwnedAgentId. We can change the trait API to use strings for this, if that helps

Contributor Author

Yeah, I think using strings as the value here would be more efficient. I take it that for sorted_set the value is equally small in reality, @vigoo?

Contributor Author

@kmatasfp kmatasfp Mar 11, 2026

And from my investigation we seem to write more than just an OwnedAgentId to the set; we also write this guy as a value:

pub struct AgentStatusRecord {

we do it here:

Contributor Author

@kmatasfp kmatasfp Mar 11, 2026

To answer my own question: sorted_set values seem to be all over the place, from small to largish. It's the serialized value of:

pub enum ScheduledAction {

Contributor

The AgentStatusRecord is a typical "KV cache" scenario. There the value is big indeed, the whole serialized record. But that's the "get/set" API.

What I was referring to is the "set-like" API, which is only used to store a set of currently running agents.

The sorted set "subset" of the kv store trait is only used for the scheduler.

(We could split the KV Store trait to 3 sub-traits actually, but let's not do it now)
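
For illustration only, the three "subsets" described above could look roughly like this if they were separate traits. None of these names or signatures come from the code base; they are hypothetical and synchronous to keep the sketch short.

```rust
use std::collections::{BTreeSet, HashMap};

// Hypothetical "get/set" subset (the KV cache use case, e.g. status records).
pub trait KeyValueOps {
    fn set(&mut self, key: &str, value: Vec<u8>);
    fn get(&self, key: &str) -> Option<Vec<u8>>;
}

// Hypothetical "set-like" subset (tracking currently running agents).
pub trait SetOps {
    fn add_to_set(&mut self, key: &str, member: &str);
    fn members_of_set(&self, key: &str) -> Vec<String>;
}

// Hypothetical sorted-set subset (used by the scheduler).
pub trait SortedSetOps {
    fn add_to_sorted_set(&mut self, key: &str, score: i64, value: Vec<u8>);
    fn range_by_score(&self, key: &str, min: i64, max: i64) -> Vec<(i64, Vec<u8>)>;
}

// One in-memory backend can implement all three sub-traits.
#[derive(Default)]
pub struct InMemoryStore {
    values: HashMap<String, Vec<u8>>,
    sets: HashMap<String, BTreeSet<String>>,
    sorted_sets: HashMap<String, BTreeSet<(i64, Vec<u8>)>>,
}

impl KeyValueOps for InMemoryStore {
    fn set(&mut self, key: &str, value: Vec<u8>) {
        self.values.insert(key.to_string(), value);
    }

    fn get(&self, key: &str) -> Option<Vec<u8>> {
        self.values.get(key).cloned()
    }
}

impl SetOps for InMemoryStore {
    fn add_to_set(&mut self, key: &str, member: &str) {
        self.sets.entry(key.to_string()).or_default().insert(member.to_string());
    }

    fn members_of_set(&self, key: &str) -> Vec<String> {
        self.sets
            .get(key)
            .map(|set| set.iter().cloned().collect())
            .unwrap_or_default()
    }
}

impl SortedSetOps for InMemoryStore {
    fn add_to_sorted_set(&mut self, key: &str, score: i64, value: Vec<u8>) {
        self.sorted_sets.entry(key.to_string()).or_default().insert((score, value));
    }

    // Returns entries with min <= score <= max, ordered by score.
    fn range_by_score(&self, key: &str, min: i64, max: i64) -> Vec<(i64, Vec<u8>)> {
        self.sorted_sets
            .get(key)
            .map(|set| {
                set.iter()
                    .filter(|entry| entry.0 >= min && entry.0 <= max)
                    .cloned()
                    .collect()
            })
            .unwrap_or_default()
    }
}
```

With such a split, a backend would only need the value-hash column for the sub-trait whose values can actually be large.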

Contributor Author

@kmatasfp kmatasfp Mar 11, 2026

yes you are right as always xD. Confused myself, early morning here xD

Contributor Author

OK, removed the value hash for set_storage; kept it for sorted_set_storage, as ScheduledAction::Invoke can be largish.

@kmatasfp kmatasfp requested a review from noise64 March 11, 2026 15:22
@kmatasfp kmatasfp changed the title from "KeyValueStorage implementation for Postgres" to "PostgresKeyValueStorage and RoutedKeyValueStorage" Mar 11, 2026
use std::sync::Arc;

#[derive(Debug, Clone)]
pub struct RoutedKeyValueStorage {
Contributor

I suggest some changes to make it more general / better named (otherwise good!):

  • Let's call this KeyValueStorageWithSeparateCache
  • instead of redis and postgres, the two fields should be persistent and cache or something like that (they are already generic dyn KeyValueStorage so they can be anything)

And then on the configuration side in golem_config, we should also use similar names and let the two inner KV stores be anything.

This way it's future-proof and we can reuse it when we switch backends more (or do other things like use an in-memory impl for local runs, etc.).

Contributor Author

@kmatasfp kmatasfp Mar 13, 2026

Renamed to NamespaceRoutedKeyValueStorage to make the intent clear: based on the namespace, you can use a different kind of KVS. See above; the inner types can now be any of the following:

- Redis
- Postgres
- Sqlite
- MultiSqlite
- InMemory

@kmatasfp kmatasfp changed the title from "PostgresKeyValueStorage and RoutedKeyValueStorage" to "PostgresKeyValueStorage and NamespaceRoutedKeyValueStorage" Mar 13, 2026
@kmatasfp kmatasfp merged commit e1de8b3 into main Mar 13, 2026
29 checks passed
@kmatasfp kmatasfp deleted the 2957-keyvaluestorage-for-postgres branch March 13, 2026 15:58
@github-actions github-actions bot locked and limited conversation to collaborators Mar 13, 2026
Development

Successfully merging this pull request may close these issues.

KeyValueStorage trait implementation for Postgres

2 participants