Skip to content

Commit fdd5049

Browse files
authored
Properly cleanup forkers at the appropriate times (#1564)
# Description - The backing store of the V1 LedgerDB was only tracked in the resource registry if we were starting from Genesis. Now the backing store will be properly tracked in the resource registry even when we start from a snapshot. - Closing the LedgerDB will no longer release all the open forkers, but instead invalidate them by emptying the ldbForkers map, so that the only possible operation that could be performed is closing them in the LedgerDB clients, such as ChainSel or the forging loop. - Closing the forker is idempotent, and it was performed both when calling `forkerClose` as well as when the resource registry of the LedgerDB client was going out of scope. Now, `forkerClose` will release the resource from the registry so this won't run twice.
2 parents ed79e03 + e2d80d1 commit fdd5049

File tree

11 files changed

+249
-108
lines changed

11 files changed

+249
-108
lines changed

ouroboros-consensus-cardano/app/snapshot-converter.hs

Lines changed: 5 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -203,15 +203,18 @@ load config@Config{inpath = pathToDiskSnapshot -> Just (fs@(SomeHasFS hasFS), pa
203203
(V2.state h,) <$> Trans.lift (V2.readAll (V2.tables h))
204204
LMDB -> do
205205
checkSnapshotFileStructure LMDB path fs
206-
((dbch, bstore), _) <-
206+
((dbch, k, bstore), _) <-
207207
withExceptT SnapshotError $
208208
V1.loadSnapshot
209209
nullTracer
210210
(V1.LMDBBackingStoreArgs tempFP defaultLMDBLimits Dict.Dict)
211211
ccfg
212212
(V1.SnapshotsFS fs)
213+
rr
213214
ds
214-
(V1.current dbch,) <$> Trans.lift (V1.bsReadAll bstore (V1.changelogLastFlushedState dbch))
215+
values <- Trans.lift (V1.bsReadAll bstore (V1.changelogLastFlushedState dbch))
216+
_ <- Trans.lift $ RR.release k
217+
pure (V1.current dbch, values)
215218
load _ _ _ _ = error "Malformed input path!"
216219

217220
store ::
Lines changed: 24 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,24 @@
1+
<!--
2+
A new scriv changelog fragment.
3+
4+
Uncomment the section that is right (remove the HTML comment wrapper).
5+
-->
6+
7+
<!--
8+
### Patch
9+
10+
- A bullet item for the Patch category.
11+
12+
-->
13+
<!--
14+
### Non-Breaking
15+
16+
- A bullet item for the Non-Breaking category.
17+
18+
-->
19+
<!--
20+
### Breaking
21+
22+
- A bullet item for the Breaking category.
23+
24+
-->

ouroboros-consensus-diffusion/src/ouroboros-consensus-diffusion/Ouroboros/Consensus/Node.hs

Lines changed: 42 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -553,6 +553,48 @@ type NetworkAddr addr =
553553
-- network layer.
554554
--
555555
-- This function runs forever unless an exception is thrown.
556+
--
557+
-- This function spawns a resource registry (which we will refer to as
558+
-- __the consensus registry__) that will include the ChainDB as one of
559+
-- its resources. When the Consensus layer is shut down, the consensus
560+
-- resource registry will exit the scope of the 'withRegistry'
561+
-- function. This causes all resources allocated in the registry
562+
-- —including the ChainDB— to be closed.
563+
--
564+
-- During it's operation, different consensus threads will create
565+
-- resources associated with the ChainDB, eg Forkers in the LedgerDB,
566+
-- or Followers in the ChainDB. These resources are not created by the
567+
-- database themselves (LedgerDB, VolatileDB, and ImmutableDB). For
568+
-- example, chain selection opens a forker using the LedgerDB.
569+
-- Crucially, this means that clients creating these resources are
570+
-- instantiated after the ChainDB.
571+
--
572+
-- We rely on a specific sequence of events for this design to be correct:
573+
--
574+
-- - The ChainDB is only closed by exiting the scope of the consensus
575+
-- resource registry.
576+
--
577+
-- - If a client creates resources tied to any of the
578+
-- aforementioned databases and is forked into a separate thread,
579+
-- that thread is linked to the consensus registry. Because resources
580+
-- in a registry are deallocated in reverse order of allocation, any
581+
-- resources created by such threads will be deallocated before the
582+
-- ChainDB is closed, ensuring proper cleanup.
583+
--
584+
-- Currently, we have two distinct approaches to resource management
585+
-- and database closure:
586+
--
587+
-- - In the LedgerDB, closing the database does not close any resources
588+
-- created by its clients. We rely on the resource registry to deallocate
589+
-- these resources before the LedgerDB is closed. However, after closing
590+
-- the LedgerDB, the only permitted action on these resources is to free them.
591+
-- See 'ldbForkers'.
592+
--
593+
-- - In the ChainDB, closing the database also closes all followers and
594+
-- iterators.
595+
--
596+
-- TODO: Ideally, the ChainDB and LedgerDB should follow a consistent
597+
-- approach to resource deallocation.
556598
runWith ::
557599
forall m addrNTN addrNTC blk p2p.
558600
( RunNode blk
Lines changed: 32 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,32 @@
1+
<!--
2+
A new scriv changelog fragment.
3+
4+
Uncomment the section that is right (remove the HTML comment wrapper).
5+
-->
6+
7+
### Patch
8+
9+
- The backing store of the V1 LedgerDB was only tracked in the
10+
resource registry if we were starting from Genesis. Now the backing
11+
store will be properly tracked in the resource registry even when we
12+
start from a snapshot.
13+
14+
- Closing the LedgerDB will no longer release all the open forkers,
15+
but instead invalidate them by emptying the ldbForkers map, so that
16+
the only possible operation that could be performed is closing them
17+
in the LedgerDB clients, such as ChainSel or the forging loop.
18+
19+
- Closing the forker is idempotent, and it was performed both when
20+
calling `forkerClose` as well as when the resource registry of the
21+
LedgerDB client was going out of scope. Now, `forkerClose` will
22+
release the resource from the registry so this won't run twice.
23+
24+
<!--
25+
### Non-Breaking
26+
-->
27+
<!--
28+
### Breaking
29+
30+
- A bullet item for the Breaking category.
31+
32+
-->

ouroboros-consensus/src/ouroboros-consensus/Ouroboros/Consensus/Storage/ChainDB.hs

Lines changed: 15 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -38,11 +38,21 @@
3838
-- chain. 'Ouroboros.Consensus.Storage.ChainDB.API.ChainDB' defines the chain
3939
-- DB API.
4040
--
41-
-- NOTE: at the moment there is an inconsistency in the module structure for
42-
-- each of these components. In particular,
43-
-- "Ouroboros.Consensus.Storage.LedgerDB" contains the whole definition and API
44-
-- for the LedgerDB, but the other three databases are broken up into multiple
45-
-- smaller submodules. We aim to resolve this when UTxO-HD is merged.
41+
-- == Resource Management in the ChainDB
42+
--
43+
-- Clients of the ChainDB can allocate resources from the databases
44+
-- it contains (LedgerDB, VolatileDB, and ImmutableDB):
45+
--
46+
-- - The LedgerDB is used to create 'Forker's.
47+
--
48+
-- - The ChainDB is used to create 'Follower's (which in turn contain
49+
-- 'Iterator's).
50+
--
51+
-- These resources must eventually be freed.
52+
--
53+
-- Threads that make use of the ChainDB to allocate resources *MUST* be closed
54+
-- before the ChainDB is closed. See 'Ouroboros.Consensus.Node.runWith' for the
55+
-- approach we follow in consensus to ensure this principle.
4656
module Ouroboros.Consensus.Storage.ChainDB
4757
( module Ouroboros.Consensus.Storage.ChainDB.API
4858
, module Ouroboros.Consensus.Storage.ChainDB.Impl

ouroboros-consensus/src/ouroboros-consensus/Ouroboros/Consensus/Storage/LedgerDB/Forker.hs

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -628,4 +628,5 @@ data TraceForkerEvent
628628
| ForkerReadStatistics
629629
| ForkerPushStart
630630
| ForkerPushEnd
631+
| DanglingForkerClosed
631632
deriving (Show, Eq)

ouroboros-consensus/src/ouroboros-consensus/Ouroboros/Consensus/Storage/LedgerDB/V1.hs

Lines changed: 55 additions & 38 deletions
Original file line numberDiff line numberDiff line change
@@ -25,7 +25,6 @@ import Control.Monad.Except
2525
import Control.Monad.Trans (lift)
2626
import Control.ResourceRegistry
2727
import Control.Tracer
28-
import Data.Bifunctor (first)
2928
import qualified Data.Foldable as Foldable
3029
import Data.Functor.Contravariant ((>$<))
3130
import Data.Kind (Type)
@@ -83,24 +82,29 @@ mkInitDb ::
8382
Complete LedgerDbArgs m blk ->
8483
Complete V1.LedgerDbFlavorArgs m ->
8584
ResolveBlock m blk ->
86-
InitDB (DbChangelog' blk, BackingStore' m blk) m blk
85+
InitDB (DbChangelog' blk, ResourceKey m, BackingStore' m blk) m blk
8786
mkInitDb args bss getBlock =
8887
InitDB
8988
{ initFromGenesis = do
9089
st <- lgrGenesis
9190
let genesis = forgetLedgerTables st
9291
chlog = DbCh.empty genesis
93-
(_, backingStore) <-
92+
(bsKey, backingStore) <-
9493
allocate
9594
lgrRegistry
9695
(\_ -> newBackingStore bsTracer baArgs lgrHasFS' genesis (projectLedgerTables st))
9796
bsClose
98-
pure (chlog, backingStore)
97+
pure (chlog, bsKey, backingStore)
9998
, initFromSnapshot =
10099
runExceptT
101-
. loadSnapshot bsTracer baArgs (configCodec . getExtLedgerCfg . ledgerDbCfg $ lgrConfig) lgrHasFS'
102-
, closeDb = bsClose . snd
103-
, initReapplyBlock = \cfg blk (chlog, bstore) -> do
100+
. loadSnapshot
101+
bsTracer
102+
baArgs
103+
(configCodec . getExtLedgerCfg . ledgerDbCfg $ lgrConfig)
104+
lgrHasFS'
105+
lgrRegistry
106+
, closeDb = \(_, r, _) -> void $ release r
107+
, initReapplyBlock = \cfg blk (chlog, r, bstore) -> do
104108
!chlog' <- reapplyThenPush cfg blk (readKeySets bstore) chlog
105109
-- It's OK to flush without a lock here, since the `LedgerDB` has not
106110
-- finished initializing, only this thread has access to the backing
@@ -113,10 +117,10 @@ mkInitDb args bss getBlock =
113117
mapM_ (flushIntoBackingStore bstore) toFlush
114118
pure toKeep
115119
else pure chlog'
116-
pure (chlog'', bstore)
117-
, currentTip = ledgerState . current . fst
118-
, pruneDb = pure . first pruneToImmTipOnly
119-
, mkLedgerDb = \(db, lgrBackingStore) -> do
120+
pure (chlog'', r, bstore)
121+
, currentTip = \(ch, _, _) -> ledgerState . current $ ch
122+
, pruneDb = \(ch, r, bs) -> pure (pruneToImmTipOnly ch, r, bs)
123+
, mkLedgerDb = \(db, ldbBackingStoreKey, ldbBackingStore) -> do
120124
(varDB, prevApplied) <-
121125
(,) <$> newTVarIO db <*> newTVarIO Set.empty
122126
flushLock <- mkLedgerDBLock
@@ -125,7 +129,8 @@ mkInitDb args bss getBlock =
125129
let env =
126130
LedgerDBEnv
127131
{ ldbChangelog = varDB
128-
, ldbBackingStore = lgrBackingStore
132+
, ldbBackingStore = ldbBackingStore
133+
, ldbBackingStoreKey = ldbBackingStoreKey
129134
, ldbLock = flushLock
130135
, ldbPrevApplied = prevApplied
131136
, ldbForkers = forkers
@@ -329,13 +334,14 @@ implCloseDB (LDBHandle varState) = do
329334
-- Idempotent
330335
LedgerDBClosed -> return Nothing
331336
LedgerDBOpen env -> do
337+
-- By writing this tvar, we already make sure that no
338+
-- forkers can perform operations other than closing, as
339+
-- they rely on accessing the LedgerDB, which is now closed.
332340
writeTVar varState LedgerDBClosed
333341
return $ Just env
334342

335343
-- Only when the LedgerDB was open
336-
whenJust mbOpenEnv $ \env -> do
337-
closeAllForkers env
338-
bsClose (ldbBackingStore env)
344+
whenJust mbOpenEnv $ void . release . ldbBackingStoreKey
339345

340346
mkInternals ::
341347
( IOLike m
@@ -351,7 +357,7 @@ mkInternals h =
351357
, push = getEnv1 h implIntPush
352358
, reapplyThenPushNOW = getEnv1 h implIntReapplyThenPush
353359
, wipeLedgerDB = getEnv h $ void . destroySnapshots . snapshotsFs . ldbHasFS
354-
, closeLedgerDB = getEnv h $ bsClose . ldbBackingStore
360+
, closeLedgerDB = getEnv h $ void . release . ldbBackingStoreKey
355361
, truncateSnapshots = getEnv h $ void . implIntTruncateSnapshots . ldbHasFS
356362
, getNumLedgerTablesHandles = pure 0
357363
}
@@ -482,6 +488,10 @@ data LedgerDBEnv m l blk = LedgerDBEnv
482488
, ldbBackingStore :: !(LedgerBackingStore m l)
483489
-- ^ Handle to the ledger's backing store, containing the parts that grow too
484490
-- big for in-memory residency
491+
, ldbBackingStoreKey :: !(ResourceKey m)
492+
-- ^ When deallocating the backing store upon closing the LedgerDB
493+
-- (via the ChainDB shutting down), we will release the backing
494+
-- store with this action.
485495
, ldbLock :: !(LedgerDBLock m)
486496
-- ^ The flush lock to the 'BackingStore'. This lock is crucial when it
487497
-- comes to keeping the data in memory consistent with the data on-disk.
@@ -512,6 +522,18 @@ data LedgerDBEnv m l blk = LedgerDBEnv
512522
-- ^ Open forkers.
513523
--
514524
-- INVARIANT: a forker is open iff its 'ForkerKey' is in this 'Map.
525+
--
526+
-- The resources that could possibly be held by these forkers will
527+
-- be released by each one of the client's registries. This means
528+
-- that for example ChainSelection will, upon closing its registry,
529+
-- release its forker and any resources associated.
530+
--
531+
-- Upon closing the LedgerDB we will overwrite this variable such
532+
-- that existing forkers can only be closed, as closing doesn't
533+
-- involve accessing this map (other than possibly removing the
534+
-- forker from it if the map still exists).
535+
--
536+
-- As the LedgerDB should outlive any clients, this is fine.
515537
, ldbNextForkerKey :: !(StrictTVar m ForkerKey)
516538
, ldbSnapshotPolicy :: !SnapshotPolicy
517539
, ldbTracer :: !(Tracer m (TraceEvent blk))
@@ -691,21 +713,6 @@ newForkerByRollback ::
691713
newForkerByRollback h rr n = getEnv h $ \ldbEnv -> do
692714
withReadLock (ldbLock ldbEnv) (acquireAtTarget ldbEnv (Left n) >>= traverse (newForker h ldbEnv rr))
693715

694-
-- | Close all open block and header 'Forker's.
695-
closeAllForkers ::
696-
IOLike m =>
697-
LedgerDBEnv m l blk ->
698-
m ()
699-
closeAllForkers ldbEnv =
700-
do
701-
forkerEnvs <- atomically $ do
702-
forkerEnvs <- Map.elems <$> readTVar forkersVar
703-
writeTVar forkersVar Map.empty
704-
return forkerEnvs
705-
mapM_ closeForkerEnv forkerEnvs
706-
where
707-
forkersVar = ldbForkers ldbEnv
708-
709716
-- | Acquire both a value handle and a db changelog at the tip. Holds a read lock
710717
-- while doing so.
711718
acquireAtTarget ::
@@ -784,7 +791,7 @@ newForker h ldbEnv rr dblog = readLocked $ do
784791
-- read access we acquired above.
785792
modifyTVar (ldbForkers ldbEnv) $ Map.insert forkerKey forkerEnv
786793
traceWith (foeTracer forkerEnv) ForkerOpen
787-
pure $ mkForker h (ldbQueryBatchSize ldbEnv) forkerKey
794+
pure $ mkForker h (ldbQueryBatchSize ldbEnv) forkerKey forkerEnv
788795

789796
mkForker ::
790797
( IOLike m
@@ -795,10 +802,11 @@ mkForker ::
795802
LedgerDBHandle m l blk ->
796803
QueryBatchSize ->
797804
ForkerKey ->
805+
ForkerEnv m l blk ->
798806
Forker m l blk
799-
mkForker h qbs forkerKey =
807+
mkForker h qbs forkerKey forkerEnv =
800808
Forker
801-
{ forkerClose = implForkerClose h forkerKey
809+
{ forkerClose = implForkerClose h forkerKey forkerEnv
802810
, forkerReadTables = getForkerEnv1 h forkerKey implForkerReadTables
803811
, forkerRangeReadTables = getForkerEnv1 h forkerKey (implForkerRangeReadTables qbs)
804812
, forkerGetLedgerState = getForkerEnvSTM h forkerKey implForkerGetLedgerState
@@ -807,18 +815,27 @@ mkForker h qbs forkerKey =
807815
, forkerCommit = getForkerEnvSTM h forkerKey implForkerCommit
808816
}
809817

818+
-- | This function receives an environment instead of reading it from
819+
-- the DB such that we can close the forker even if the LedgerDB is
820+
-- closed. In fact this should never happen as clients of the LedgerDB
821+
-- (which are the ones opening forkers) should never outlive the
822+
-- LedgerDB.
810823
implForkerClose ::
811824
IOLike m =>
812825
LedgerDBHandle m l blk ->
813826
ForkerKey ->
827+
ForkerEnv m l blk ->
814828
m ()
815-
implForkerClose (LDBHandle varState) forkerKey = do
816-
envMay <-
829+
implForkerClose (LDBHandle varState) forkerKey env = do
830+
frk <-
817831
atomically $
818832
readTVar varState >>= \case
819833
LedgerDBClosed -> pure Nothing
820834
LedgerDBOpen ldbEnv -> do
821835
stateTVar
822836
(ldbForkers ldbEnv)
823-
(Map.updateLookupWithKey (\_ _ -> Nothing) forkerKey)
824-
whenJust envMay closeForkerEnv
837+
(\m -> Map.updateLookupWithKey (\_ _ -> Nothing) forkerKey m)
838+
case frk of
839+
Nothing -> pure ()
840+
Just e -> traceWith (foeTracer e) DanglingForkerClosed
841+
closeForkerEnv env

0 commit comments

Comments
 (0)