-
Notifications
You must be signed in to change notification settings - Fork 301
Description
Query Caching in Persistent
Goal:
- Get query caching in
persistentoperations. - ... Such that the cache can be busted by writes
- ... On specific tables
- ... But not accounting for other threads or other servers doing this
Or, we want to have a similar feature to ActiveRecord caching.
Persistent Background
The StatementCache
persistent is, for the most part, implemented as a ReaderT backend, where backend in our case is SqlBackend.
A SqlBackend includes a connStmtMap - a StatementCache which closes over an IORef (Map Text Statement).
The Text contains the query that we are issuing.
That means we can probably hook into this in order to modify our own cache and check whether or not we want to proceed.
Let's trace this and see how get works with the statement cache.
Implementing get
The function get is defined on PersistStoreRead - the actual definition is reproduced here:
class
( Show (BackendKey backend)
, Read (BackendKey backend)
, Eq (BackendKey backend)
, Ord (BackendKey backend)
, PersistCore backend
, PersistField (BackendKey backend)
, A.ToJSON (BackendKey backend)
, A.FromJSON (BackendKey backend)
) =>
PersistStoreRead backend
where
get
:: forall record m
. (MonadIO m, PersistRecordBackend record backend)
=> Key record -> ReaderT backend m (Maybe record)Note that - unfortunately - we define this as polymorphic on the backend, not the m, meaning we need to make a newtype CachedSqlBackend = CachedSqlBackend SqlBackend to use override this method directly.
The implementation for SqlBackend is here:
instance PersistStoreRead SqlBackend where
get k = do
mEs <- getMany [k]
return $ Map.lookup k mEs
-- inspired by Database.Persist.Sql.Orphan.PersistQuery.selectSourceRes
getMany [] = return Map.empty
getMany ks@(k : _) = do
conn <- ask
let
t = entityDef . dummyFromKey $ k
let
cols = commaSeparated . Foldable.toList . keyAndEntityColumnNames t
let
wher = whereStmtForKeys conn ks
let
sql =
T.concat
[ "SELECT "
, cols conn
, " FROM "
, connEscapeTableName conn t
, " WHERE "
, wher
]
let
parse vals =
case parseEntityValues t vals of
Left s ->
liftIO $
throwIO $
PersistMarshalError ("getBy: " <> s)
Right row -> return row
withRawQuery sql (Foldable.foldMap keyToValues ks) $ do
es <- CL.mapM parse .| CL.consume
return $ Map.fromList $ fmap (\e -> (entityKey e, entityVal e)) esSo we end up calling withRawQuery, which calls rawQueryRes, which does two things of note:
getStmtConn conn sqlstmtQuery stmt vals
getStmtConn looks up the statement in the SqlBackend's StatementCache.
If the statement is present, it returns it.
Otherwise, it does connPrepare conn sql to create an initial Statement, and then adds some extra logic to prevent the Statement from being used after finalized.
Finally, we call hookGetStatement, a user-customizable hook of type SqlBackend -> Text -> Statement -> IO Statement.
So what is connPrepare doing?
Fort postgresql, we define it here, as prepare':
prepare' :: PG.Connection -> Text -> IO Statement
prepare' conn sql = do
let
query = PG.Query (T.encodeUtf8 sql)
return
Statement
{ stmtFinalize = return ()
, stmtReset = return ()
, stmtExecute = execute' conn query
, stmtQuery = withStmt' conn query
}So "finalize" and "reset" don't do anything.
This is OK because withStmt' doesn't actually create a prepared statement, it executes the query one-off.
Which, y'know, maybe isn't ideal, but hey, whatever.
Trouble
The Text that is stored includes ? placeholders, so we will need to keep track of the [PersistValue] that each Text is called with.
ie, get (OrganizationKey 1) and get (OrganizationKey 2) are going to have the same query text:
SELECT *
FROM organizations
WHERE id = ?This means we probably want a type like HashMap Text (HashMap [PersistValue] [[PersistValue]]) - ie, a mapping of Text queries, to a mapping of query arguments, to the results of that query.
Summary of Available Touchpoints
- We can make a
newtypeoverSqlBackendand define separate instances ofPersistStoreReadetc.- But this doesn't help us much in a
PersistentOperation m => ...world, because the backend is often tied to the record, and we'd have to changePersistentOperationSqlto be compatible withSqlBackendand not equal toSqlBackend. Maybe not a huge deal, but doesn't feel promising + lots of work.
- But this doesn't help us much in a
- We can modify a
SqlBackenddirectly to enhance it with this caching ability.- We have
connPreparewhich createsStatements, andhookGetStatementwhich is able to alter aStatementbefore returning it. Statementitself can be overriden at thestmtQuerypoint.
- We have
So,
-- called when we actually execute the statement
stmtQuery :: Statement -> forall m. MonadIO m => [PersistValue] -> Acquire (ConduitM () [PersistValue] m ())
-- called when we prepare the statement on first execution
connPrepare :: SqlBackend -> Text -> IO Statement
-- called when we retrieve the statement prior to execution
hookGetStatement :: SqlBackendHooks -> SqlBackend -> Text -> Statement -> IO StatementA Strategy for Reading Cache
I think we can have hookGetStatement install a cache lookup in stmtQuery.
We have the Text of the query that will be keyed on in hookGetStatement, and stmtQuery receives the [PersistValue] of the arguments to the query.
If we are closing over some Cache type like Map Text (Map [PersistValue] [[PersistValue]]), then we can do a lookup on that Map with the Text query, and then the stmtQuery can do a lookup on the Map with the [PersistValue] arguments.
If that exists, we return it, if not, we actually run the query.
This level of references and mutability means we would probably do best with an StmContainers.Map Text (StmContainers.Map [PersistValue] [[PersistValue]]).
hookGetStatement will look up the mutable Map, then provide the mutable Map to stmtQuery, which can look things up on actual execution.
Writing Cache
Now, actually populating the cache may be more difficult.
stmtQuery returns a Acquire (ConduitM () [PersistValue] m ()).
An Acquire is a type that can be used to provide somewhat manual free/release - for example, see rawQuery.
In rawQueryRes, we do mkAcquire (log >> getStmtConn conn sql) stmtReset.
So stmtReset :: Statement -> IO () is called, but without any further information.
If we want to use this as a point to write to our cache (feels reasonable), we'll need to smuggle that information in somehow.
Can we modify Statement and stmtQuery to save rows as they come through?
We can fmap over an Acquire, allowing us to write ConduitM () [PersistValue] m () -> ConduitT () [PersistValue] m ().
I think we can have something like,
modifyAcquireToSaveRows
:: ([PersistValue] -> [PersistValue] -> IO ())
-- ^ for @args@, save current @row@
-> ([PersistValue] -> IO ())
-- ^ trigger that cache is complete for these arguments
-> Statement
-> Statement
modifyAcquireToSaveRows saveRow signalArgsComplete statement =
statement
{ stmtQuery = \args -> do
src <- stmtQuery statement args
pure
( src
.| saveRowC args
)
}
where
saveRowC args = do
mrow <- await
case mrow of
Nothing -> do
-- query is over, signal we can save for this
signalArgsComplete args
Just row -> do
-- query is ongoing
saveRow args row
yield rowThen, to write cache,
modifyStatementToSaveCache
:: Chan [PersistValue]
-- ^ The accumulated channel of persist values
-> IO QueryFullyCompleted
-- ^ Action to return whether query completed
-> ([[PersistValue]] -> IO ())
-- ^ Action to write accumulated values to cache
-> Statement
-> Statement
modifyStatementToSaveCache values didQueryComplete writeRowsToCache statement =
statement
{ stmtReset = do
compleat <- didQueryComplete
case compleat of
QueryUnfinished -> do
-- don't want to save here, so flush cache
_ <- getChanContents values
pure ()
QueryFinished -> do
rows <- getChanContents values
writeRowsToCache rows
stmtReset statement
}This leaves a lot of implementation details unsaid.
cachedQueries implementation
I think, with the above, we can write a function like this:
cachedQueries :: (PersistentReadOperationSql m) => m a -> m a
cachedQueries action = do
stuff <- unsafeLiftPersistentReadOperation mkCachingImplements
localSqlBackend
(\sqlBackend -> addRequisiteHooks stuff sqlBackend)
actionThis will modify the local SqlBackend with caching inside of the block.
Cache Busting
We definitely want a way to say "don't use cache in this lexical scope."
cachedQueries do
get $ OrganizationKey 10 -- database fetch
get $ OrganizationKey 10 -- cache hit
noCaching do
get $ OrganizationKey 10 -- database fetchWith the above design, cachedQueries is closing over that local state, and it is not made available to the inner action.
So noCaching has no ability to read or modify this - the side channels have all been completely encapsulated.
cachedQueries could provide a handle of sorts, but this breaks the lexical nature of it.
We want to call arbitrary queries, and those arbitrary queries shouldn't have to know if they have a cache or not.
However... SqlBackend does have a connVault, a Vault which allows us to store arbitrary values.
We could store the caching stuff in the Vault, along with a flag indicating whether or not the cache should be used.
Hooks, etc could consult this entry in the Vault.
Hm.
Should noCaching force just a database lookup, or also an update to cache?
Consider:
cachedQueries do
get $ OrganizationKey 10 -- database fetch
-- somewhere else, someone modifies the organization in db
noCaching do
get $ OrganizationKey 10 -- database fetch
get $ OrganizationKey 10 -- cache hit -- but from where?I think noCaching should probably always hit the database, but also always update the cache.
Don't cache writes!!
We definitely don't want our mechanism here to cache write queries.
Many of them go through stmtExecute, but some will go through stmtQuery.
So we need to have checking that the query text is not an INSERT or UPDATE or DELETE.
Cache Busting on Writes
The above design has a mapping Query -> Arguments -> Results.
But this is no good. Consider:
cachedQueries do
get $ OrganizationKey 10 -- database fetch
update (OrganizationKey 10) [OrganizationName =. "blah"]
get $ OrganizationKey 10 -- cache hit, known stale valueThis will cause bugs.
So we want a way to know which tables have been modified by an update query, and then invalidate just those entries in the cache.
One option is to just invalidate the entire cache on every update/insert.
This works and is very easy to implement.
Another, more difficult, option is to only invalidate cached queries that read from tables that have been modified.
To do this, we'd need to parse the query text for tables, and then scan each cached query text for a mention of the table.
If that table is mentioned, we clear the cache for that query.
This may also be somewhat slow to operate.