Commit 091c1eb

ffakenz and ch1bo authored
Event log rotation (#1997)
- Event log rotation enables hydra heads to resume faster by reducing replay time.
- Rotating an event log means replacing the current state file with a new one that starts from a checkpoint event, while moving the old file to the next state-logId.
- A Checkpoint event captures the latest HeadState, computed by aggregating all previous StateChanged events.
- File-based persistence rotates event log files using an increasing index named logId, based on the latest StateChanged event id + 1.
- Added a new run option to enable rotation after a given number of events: _"The number of Hydra events to trigger rotation (default: no rotation)"_
- On startup, depending on the rotation config used, the event log file might be rotated and the `logId` index will be incremented.
- Added new server output to allow 3rd party agents to detect the checkpoint and trigger any appropriate archival / backup / cleanup needed, without interrupting the hydra head.

---

* [x] CHANGELOG updated or not needed
* [x] Documentation updated or not needed
* [x] Haddocks updated or not needed
* [x] No new TODOs introduced or explained hereafter

---------

Co-authored-by: Sebastian Nagel <[email protected]>
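
The rotation rule described in the commit message can be sketched in a few lines of Haskell. This is a minimal model under stated assumptions, not the hydra-node implementation: the `Event`, `Payload`, `HeadState`, `applyEvent`, `replay` and `rotate` names are hypothetical, the head state is reduced to a counter, and giving the checkpoint event the same id as the new `logId` is an assumption made for illustration.

```haskell
module Main where

import Data.List (foldl')
import Numeric.Natural (Natural)

-- Hypothetical stand-in for the real head state: just a counter.
newtype HeadState = HeadState {txCount :: Int}
  deriving (Eq, Show)

-- Hypothetical event payloads.
data Payload
  = TxApplied -- an ordinary state change
  | Checkpoint HeadState -- aggregated state of all earlier events
  deriving (Eq, Show)

data Event = Event {eventId :: Natural, payload :: Payload}
  deriving (Eq, Show)

-- Fold one event into the state; a checkpoint replaces the state wholesale.
applyEvent :: HeadState -> Event -> HeadState
applyEvent st ev = case payload ev of
  TxApplied -> st{txCount = txCount st + 1}
  Checkpoint st' -> st'

-- Rebuild the state by replaying a whole event log.
replay :: [Event] -> HeadState
replay = foldl' applyEvent (HeadState 0)

-- Rotate a non-empty log. Returns the logId used to name the archived
-- file, the archived events, and the new log holding only a checkpoint.
-- Assumption: the checkpoint event reuses logId as its event id.
rotate :: [Event] -> Maybe (Natural, [Event], [Event])
rotate [] = Nothing
rotate evs =
  let logId = eventId (last evs) + 1
      checkpoint = Event logId (Checkpoint (replay evs))
   in Just (logId, evs, [checkpoint])

main :: IO ()
main = do
  let evs = [Event i TxApplied | i <- [0 .. 4]]
  case rotate evs of
    Nothing -> putStrLn "nothing to rotate"
    Just (logId, archived, fresh) -> do
      putStrLn $ "archived " <> show (length archived) <> " events as state-" <> show logId
      -- Replaying the rotated log must yield the same state as the full log.
      print (replay fresh == replay evs)
```

The point of the sketch is the invariant checked at the end: replaying the single checkpoint event gives the same state as replaying every archived event, which is why a restart after rotation has less work to do.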
1 parent fb0fabe commit 091c1eb

29 files changed, +806 -318 lines changed

CHANGELOG.md

Lines changed: 2 additions & 0 deletions
@@ -17,6 +17,8 @@ changes.
 - Remove runtime dependency to `etcd` by embedding and shipping it with `hydra-node`.
 - New option `--use-system-etcd` to prefer the system etcd instead of the embedded one.

+- Add file-based event log rotation support via optional `--persistence-rotate-after` command line option.
+
 - **BREAKING** Update scripts to plutus 1.45.0.0.

 - Hydra will now store etcd cluster information on the filesystem in directories content-addressed

docs/docs/dev/architecture/index.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -53,7 +53,7 @@ The `hydra-node` component exposes an [asynchronous API](https://hydra.family/he
5353

5454
### Persistence
5555

56-
All API server outputs and the `hydra-node` state are preserved on disk. The persistence layer is responsible for loading historical messages and the Hydra state from disk, as well as storing them. Currently, there hasn't been a need to increase the complexity of this layer or use a database.
56+
The `hydra-node` state is preserved on disk. The persistence layer is responsible for loading historical messages and Hydra state from disk, as well as storing them in so-called event log files. Depending on the rotation configuration used at startup, these event log files will be rotated to improve restart times. So far, there hasnt been a need to increase the complexity of this layer or to use a database.
5757

5858
### Logging
5959

docs/docs/known-issues.md

Lines changed: 2 additions & 2 deletions
@@ -47,9 +47,9 @@ here: [Etcd Configuration](configuration#networking-configuring-the-limits-of-et

 If the hydra-node has breaking changes in regards to reading the files it stores in the `persistence` folder, it used to be recommended to just delete the entire folder.

-Now, because of etcd, it is important to only delete the `hydra-node` specific files; not the files associated with `etcd`. In particular you may like to delete the following file:
+Now, because of etcd, it is important to only delete the `hydra-node` specific files; not the files associated with `etcd`. In particular you may like to delete the following files:

-- `persistence/state`
+- `persistence/state*`

 Note that, as with any adjustments of this kind, it is good practice to make a backup first!


hydra-cluster/src/Hydra/Cluster/Scenarios.hs

Lines changed: 4 additions & 4 deletions
@@ -152,7 +152,7 @@ import Network.HTTP.Req (
 import Network.HTTP.Simple (getResponseBody, httpJSON, setRequestBodyJSON)
 import Network.HTTP.Types (urlEncode)
 import System.FilePath ((</>))
-import System.Process (proc, readCreateProcessWithExitCode)
+import System.Process (callProcess)
 import Test.Hydra.Tx.Fixture (testNetworkId)
 import Test.Hydra.Tx.Gen (genKeyPair)
 import Test.QuickCheck (choose, elements, generate)
@@ -401,9 +401,9 @@ nodeReObservesOnChainTxs tracer workDir cardanoNode hydraScriptsTxId = do
           <&> modifyConfig (\config -> config{startChainFrom = Just tip})

       withTempDir "blank-state" $ \tmpDir -> do
-        void $ readCreateProcessWithExitCode (proc "cp" ["-r", workDir </> "state-2", tmpDir]) ""
-        void $ readCreateProcessWithExitCode (proc "rm" ["-rf", tmpDir </> "state-2" </> "state"]) ""
-        void $ readCreateProcessWithExitCode (proc "rm" ["-rf", tmpDir </> "state-2" </> "last-known-revision"]) ""
+        callProcess "cp" ["-r", workDir </> "state-2", tmpDir]
+        callProcess "rm" ["-rf", tmpDir </> "state-2" </> "state*"]
+        callProcess "rm" ["-rf", tmpDir </> "state-2" </> "last-known-revision"]
         withHydraNode hydraTracer bobChainConfigFromTip tmpDir 2 bobSk [aliceVk] [1] $ \n2 -> do
           -- Also expect to see past server outputs replayed
           headId2 <- waitMatch 5 n2 $ headIsInitializingWith (Set.fromList [alice, bob])
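
One detail worth checking in the hunk above: `callProcess` runs the command directly rather than through a shell, so the `state*` argument reaches `rm` as a literal file name and is not expanded as a glob. Because of `-f`, `rm` would then most likely exit successfully without removing `state` or any rotated `state-…` file. A possible alternative, sketched here with a hypothetical `removeStateFiles` helper (not part of the commit), is to match the prefix in Haskell using `System.Directory`:

```haskell
module Main where

import Control.Monad (forM_)
import Data.List (isPrefixOf)
import System.Directory (
  createDirectoryIfMissing,
  getTemporaryDirectory,
  listDirectory,
  removePathForcibly,
 )
import System.FilePath ((</>))

-- | Hypothetical helper: remove every entry in the directory whose name
-- starts with "state" and return the names that were removed.
removeStateFiles :: FilePath -> IO [FilePath]
removeStateFiles dir = do
  entries <- listDirectory dir
  let matches = filter ("state" `isPrefixOf`) entries
  forM_ matches $ \name -> removePathForcibly (dir </> name)
  pure matches

main :: IO ()
main = do
  -- Build a throwaway directory that mimics a persistence folder.
  tmp <- getTemporaryDirectory
  let dir = tmp </> "rotation-glob-sketch"
  createDirectoryIfMissing True dir
  mapM_ (\n -> writeFile (dir </> n) "") ["state", "state-42", "last-known-revision"]
  removed <- removeStateFiles dir
  remaining <- listDirectory dir
  print (removed, remaining)
  removePathForcibly dir
```

Running the pattern through `sh -c` would also work, but the pure Haskell version avoids quoting issues with paths that contain spaces.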

hydra-cluster/src/HydraNode.hs

Lines changed: 2 additions & 0 deletions
@@ -405,6 +405,7 @@ prepareHydraNode chainConfig workDir hydraNodeId hydraSKey hydraVKeys allNodeIds
           , hydraSigningKey
           , hydraVerificationKeys
           , persistenceDir = stateDir
+          , persistenceRotateAfter = Nothing
           , chainConfig
           , whichEtcd = EmbeddedEtcd
           , ledgerConfig =
@@ -531,6 +532,7 @@ withHydraNode tracer chainConfig workDir hydraNodeId hydraSKey hydraVKeys allNod
           , hydraSigningKey
           , hydraVerificationKeys
           , persistenceDir = stateDir
+          , persistenceRotateAfter = Nothing
           , chainConfig
           , whichEtcd = EmbeddedEtcd
           , ledgerConfig =

hydra-cluster/test/Test/EndToEndSpec.hs

Lines changed: 55 additions & 0 deletions
@@ -20,9 +20,11 @@ import CardanoNode (
   withCardanoNodeDevnet,
  )
 import Control.Lens ((^..), (^?))
+import Control.Monad (foldM_)
 import Data.Aeson (Result (..), Value (Null, Object, String), fromJSON, object, (.=))
 import Data.Aeson qualified as Aeson
 import Data.Aeson.Lens (AsJSON (_JSON), key, values, _JSON)
+import Data.Aeson.Types (parseMaybe)
 import Data.ByteString qualified as BS
 import Data.List qualified as List
 import Data.Map qualified as Map
@@ -87,6 +89,7 @@ import HydraNode (
   getSnapshotUTxO,
   input,
   output,
+  prepareHydraNode,
   requestCommitTx,
   send,
   waitFor,
@@ -95,6 +98,7 @@ import HydraNode (
   waitMatch,
   withHydraCluster,
   withHydraNode,
+  withPreparedHydraNode,
  )
 import System.Directory (removeDirectoryRecursive, removeFile)
 import System.FilePath ((</>))
@@ -158,6 +162,57 @@ spec = around (showLogsOnFailure "EndToEndSpec") $ do
           waitMatch 10 node $ \v -> do
             guard $ v ^? key "tag" == Just "SnapshotConfirmed"

+      it "rotates persistence on start up" $ \tracer -> do
+        withClusterTempDir $ \tmpDir -> do
+          (aliceCardanoVk, aliceCardanoSk) <- keysFor Alice
+          initialUTxO <- generate $ genUTxOFor aliceCardanoVk
+          Aeson.encodeFile (tmpDir </> "utxo.json") initialUTxO
+          let offlineConfig =
+                Offline
+                  OfflineChainConfig
+                    { offlineHeadSeed = "test"
+                    , initialUTxOFile = tmpDir </> "utxo.json"
+                    , ledgerGenesisFile = Nothing
+                    }
+          -- Start a hydra-node in offline mode and submit several self-txs
+          withHydraNode (contramap FromHydraNode tracer) offlineConfig tmpDir 1 aliceSk [] [] $ \node -> do
+            foldM_
+              ( \utxo i -> do
+                  let Just (aliceTxIn, aliceTxOut) = UTxO.find (isVkTxOut aliceCardanoVk) utxo
+                  let Right selfTx =
+                        mkSimpleTx
+                          (aliceTxIn, aliceTxOut)
+                          (mkVkAddress testNetworkId aliceCardanoVk, txOutValue aliceTxOut)
+                          aliceCardanoSk
+                  send node $ input "NewTx" ["transaction" .= selfTx]
+                  waitMatch 10 node $ \v -> do
+                    guard $ v ^? key "tag" == Just "SnapshotConfirmed"
+                    guard $ v ^? key "snapshot" . key "number" == Just (toJSON (i :: Integer))
+                    v ^? key "snapshot" . key "utxo" >>= parseMaybe parseJSON
+              )
+              initialUTxO
+              [1 .. (200 :: Integer)]
+
+          -- Measure restart time
+          t0 <- getCurrentTime
+          diff1 <- withHydraNode (contramap FromHydraNode tracer) offlineConfig tmpDir 1 aliceSk [] [] $ \_ -> do
+            t1 <- getCurrentTime
+            let diff = diffUTCTime t1 t0
+            pure diff
+
+          -- Measure restart after rotation
+          options <- prepareHydraNode offlineConfig tmpDir 1 aliceSk [] [] id
+          let options' = options{persistenceRotateAfter = Just 10}
+          t1 <- getCurrentTime
+          diff2 <- withPreparedHydraNode (contramap FromHydraNode tracer) tmpDir 1 options' $ \_ -> do
+            t2 <- getCurrentTime
+            let diff = diffUTCTime t2 t1
+            pure diff
+
+          unless (diff2 < diff1 * 0.9) $
+            failure $
+              "Expected to start up 10% quicker than original " <> show diff1 <> ", but it took " <> show diff2
+
       it "supports multi-party networked heads" $ \tracer -> do
         withClusterTempDir $ \tmpDir -> do
           (aliceCardanoVk, aliceCardanoSk) <- keysFor Alice
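
The new test compares two wall-clock measurements and requires the restart after rotation to take less than 90% of the original restart time. The pattern can be isolated into a small helper, sketched below; `timed` and `atLeastTenPercentFaster` are hypothetical names that do not appear in the commit, and `threadDelay` merely stands in for starting a node.

```haskell
module Main where

import Control.Concurrent (threadDelay)
import Data.Time.Clock (NominalDiffTime, diffUTCTime, getCurrentTime)

-- | Hypothetical helper: run an action and return its result together
-- with the elapsed wall-clock time.
timed :: IO a -> IO (a, NominalDiffTime)
timed action = do
  start <- getCurrentTime
  result <- action
  end <- getCurrentTime
  pure (result, diffUTCTime end start)

-- | Same threshold as the test: the second duration must be below 90%
-- of the first one.
atLeastTenPercentFaster :: NominalDiffTime -> NominalDiffTime -> Bool
atLeastTenPercentFaster before after = after < before * 0.9

main :: IO ()
main = do
  -- threadDelay takes microseconds; these stand in for two node start-ups.
  (_, slow) <- timed (threadDelay 200000)
  (_, fast) <- timed (threadDelay 50000)
  putStrLn $ "slow: " <> show slow <> ", fast: " <> show fast
  print (atLeastTenPercentFaster slow fast)
```

Wall-clock comparisons like this can be noisy on a loaded CI machine, so a fixed 10% margin is a trade-off between catching regressions and avoiding flaky failures.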
