PostgresKeyValueStorage and NamespaceRoutedKeyValueStorage#2959
PostgresKeyValueStorage and NamespaceRoutedKeyValueStorage#2959
Conversation
| key TEXT NOT NULL, | ||
| value_hash BYTEA NOT NULL, | ||
| value BYTEA NOT NULL, | ||
| PRIMARY KEY (namespace, key, value_hash) |
There was a problem hiding this comment.
FYI PK on (namespace, key, value BYTEA) can become expensive, so we use blake3 hash of the value here instead
There was a problem hiding this comment.
Assumption is we store values larger than 32 bytes in the set, if not then hash optimization does not make any sense
There was a problem hiding this comment.
We only use the "set" subset of the trait for tracking running agents per executor, and the value is always a serialized OwnedAgentId. We can change the trait API to use strings for this, if that helps
There was a problem hiding this comment.
yeah i think using strings as a value here would be more efficient and I take for sorted_set the value is equally small in reality @vigoo ?
There was a problem hiding this comment.
and from my investigation we seem to write more than just a OwnedAgentId to the set, we write also this guy as a value
golem/golem-common/src/model/mod.rs
Line 677 in e00a1dc
we do it here:
There was a problem hiding this comment.
To answer my own question, sorted_set values seems be all over the place, from small to largish, its serialized value of:
golem/golem-common/src/model/mod.rs
Line 302 in e00a1dc
There was a problem hiding this comment.
The AgentStatusRecord is a typical "KV cache" scenario. There the value is big indeed, the whole serialized record. But that's the "get/set" API.
What I was referring to is the "set-like" API which is only used to to store a set of currently running agents.
The sorted set "subset" of the kv store trait is only used for the scheduler.
(We could split the KV Store trait to 3 sub-traits actually, but let's not do it now)
There was a problem hiding this comment.
yes you are right as always xD. Confused myself, early morning here xD
There was a problem hiding this comment.
Ok removed value hash for set_storage kept it for sorted_set_storage as ScheduledAction::Invoke can be largish
| use std::sync::Arc; | ||
|
|
||
| #[derive(Debug, Clone)] | ||
| pub struct RoutedKeyValueStorage { |
There was a problem hiding this comment.
I suggest some changes to make it more general / better named (otherwise good!):
- Let's call this
KeyValueStorageWithSeparateCache - instead of
redisandpostgres, the two fields should bepersistentandcacheor something like that (they are already genericdyn KeyValueStorageso they can be anything)
And then in the configuration side in golem_config, we should also use similar names and let the two inner KV stores be anything.
this way it's future proof and we can reuse it when we switch backends more (or do other things like use an in-memory impl for local runs, etc)
There was a problem hiding this comment.
Renamed to NamespaceRoutedKeyValueStorage, to make indent clear - based on namespace you can use different kind of KVS. See above, now the inner types can be whatever from this list:
- Redis
- Postgres
- Sqlite
- MultiSqlite
- InMemory
Resolves: #2957
This PR adds 2 new KeyValueStorage implementations:
PostgresKeyValueStorageRoutedKeyValueStoragethat composes:PostgresKeyValueStorageRedisKeyValueStorageRouting rule:
KeyValueStorageNamespace::Worker { .. }-> RedisPostgre KeyValueStorage implementation for worker executor can be enabled with:
NamespaceRoutedKeyValueStorage can be enabled for worker executor can be enabled with:
Also improves multi queries for Postgres implementation for IndexedStorage