Documentation
- new template and walkthrough
New Features
-
add cluster-membership ir node
-
introduce
sliced!syntax for processing with anonymous ticks -
replay failed simulation instances and show logs
Previously, a failed simulation would only show the backtrace, and would
have to be re-run withHYDRO_SIM_LOG=1to display the simulation
steps. This would result in lots of log pollution since logging would
also be enabled for passing instances.Now, we've made some tweaks to bolero so that it re-executes the test
after finding a failing input, passing anis_replayflag. We use this
to enable rich logging, so that the failure is displayed with all the
simulation steps that led to it. -
add APIs for unordered stream assertions and tests for
collect_quorum
AI Disclosure: all the tests were generated using Kiro (!)Currently, the core functionality test has to be split up into several
exhaustive units because otherwise the search space becomes too large
for CI (~400s). Eventually, keyed streams may help but splitting up the
test is a reasonable short-term fix. -
deterministic simulator for Hydro programs
This introduces a deterministic simulator for Hydro that can simulate
various asynchronous scenarios to identify bugs in code that use
non-deterministic operators, such asbatch. This PR focuses on just
the infrastructure and support for simulatingbatchon a
totally-ordered, exactly-once stream. Support for additional
non-deterministic operators will follow in separate PRs (currently, an
exception is thrown to prevent use of the simulator on programs that use
such operators).The simulator's job is to explore the space of potential asynchronous
executions. Because "top-level" operators guarantee "eventual
determinism" (per Flo), we do not need to simulate every possible
interleaving of message arrivals and processing. Instead, we only need
to simulate sources of non-determinism at the points in the program
where a user intentionally observes them (such asbatchor
assume_ordering).When compiling a Hydro program for the simulator, we emit several DFIR
programs. One of these is theasync_dfir, which contains all
asynchronously executed top-level operators in the Hydro program. Again,
thanks to Flo semantics, we do not need to simulate the behavior of
executing these operators on different prefixes of the input, since we
know that none of the downstream operators change their behavior based
on the provided prefix (this is somewhat more complicated for unbounded
singletons, whose intermediate states are affected by the set of
elements processed in each batch, but we will address this separately).Because each tick relies on a set of decisions being made to select
their inputs (batch,snapshot), we emit each tick's code into a
separate DFIR graph. The top-level simulator then schedules
(LaunchedSim::scheduler) the async graph and tick graphs by always
trying to make progress with the async graph first (so that we have the
full set of possible inputs at each batch boundary), and when the async
graph cannot make any further progress it selects one of the ticks,
makes a batching decision for each of its inputs
(autonomous_decision), and then executes the tick.The selection of which tick to run and which elements to release in each
batch are driven by a source of non-determinism, which is either:
a) libfuzzer (if using.sim().fuzzand running withcargo sim)
b) a RNG with 8192 iterations (if using.sim().fuzzand running with
cargo testand no reproducer is available)
c) a static input of decisions (if using.sim().fuzzand running with
cargo testand a reproducer is available)
d) an exhaustive, depth-first search algorithm (if using
.sim().exhaustiveand running withcargo test)Whenever a fuzzer finds a failure, it stores the sequence of decisions
that leads to the crash in a.binfile as a reproducer, so that we can
re-execute quickly in testing environments.Because Hydro uses a staged compilation model, our approach to compiling
and executing the Hydro program is also a bit unique. Because the fuzzer
needs to track branch coverage inside the Hydro program to be effective,
and because we need low-latency interactions between the user assertions
and the Hydro code, we cannot run the compiled program in a separate
process. Instead, we compile the Hydro code into a shared library and
dynamically load it into the host process (which has the testing code).
The shared library only provides the compiled DFIR graphs, so the
simulator scheduler runs in the host process to enable low-latency
switching between the DFIR and testing code.This PR includes a couple of toy examples testing the simulator's
functionality. Of particular interest is
sim_crash_with_fuzzed_batching, which takes forever with an exhaustive
search but quickly finds a failure case with a fuzzer. -
Generalize bench_client's workload generation
Allow custom functions for generating bench_client workloads (beyond
u32,u32). -
add keyed singletons and optionals to eliminate unsafety in membership tracking
Bug Fixes
- Staggered client
Client initially outputs 1 message per tick instead of all messages in 1
giant batch, so if downstream operators do not dynamically adjust the
batch size, they are not overwhelmed - Client aggregator waits for all clients before outputting
Outputs N/A as throughput and latency until all clients have responded
with positive throughput. - refactor github actions workflows, make stable the default toolchain
- remove strange use of batching in bench_client
Refactor
-
reduce syntactic overhead of connecting test inputs / outputs
Rather than having separatesource/sink/bytes/bincode
APIs, we use a singleconnectmethod that uses a trait to resolve the
appropriate connection result. -
reduce atomic pollution in quorum counting
Also fixes a sync bug in Compartmentalized Paxos. Due to batching
semantics, we could end up in a situation where responses have missing
metadata (for example if the batch refuses to release any elements).The simulator would hopefully have caught this, we can use this as an
example. -
remove uses of legacy
*_keyedAPIs
Also adds missing doctests to the aggregation APIs onKeyedStream.
Test
- add test for collecting unordered quorum
Documentation (BREAKING)
- add docs for
for_each/dest_sink, restrict to strict streams
Breaking Change:for_each/dest_sinknow require a totally-ordered,
retry-free stream since the downstream may not tolerate such
non-determinism. This helps highlight cases where the developer needs to
reason carefully about the safety of their side effects.
New Features (BREAKING)
-
add specialized
sim_input/sim_outputto reduce simulation boilerplate -
cluster member ids are now clone instead of copy
This is part one of a series of changes. The first part just changes
MemberIds from being Copy to Clone, so that they can later support more
use cases. -
restrict
KeyedSingletonto not allow asynchronous key removal
Previously,KeyedSingletonsupportedfilteron unbounded collections
andlatestto yield snapshots from a tick. Both of these APIs are
problematic because they let developers asynchronously remove elements
from the collection, which goes against the "Singleton" naming. We have
reasonable alternatives for all these use-sites, so remove for now to
minimize the semantic complexity (also for the simulator).Eventually, we will want to bring back
KeyedOptionalwhich permits
such asynchronous removal. -
add
OrderingandRetriestraits with basic axioms
Reduces the need to usenondet!inside broadcast APIs, and also
prepares us for graph viz metadata by requiring these traits to be
implemented for the type parameters inStream/KeyedStream. -
refine types when values of a
KeyedSingletonare immutable
There are many cases where once a value is released into a
KeyedSingleton, it will never be changes. In such cases, we can permit
APIs like.entrieseven if the set of entries is growing.We use this type to improve the API surface for looking up values for a
request. Now, aKeyedSingletonof requests can look up values from
anotherKeyedSingleton.
Bug Fixes (BREAKING)
-
fix cardinality for
Optional::or
UsingHydroNode::Chainis very dangerous for singletons / optionals,
because it can lead to cardinality > 1 within a single batch, which
breaks a fundamental invariant.This introduces a new
HydroNode::ChainFirstoperator that only emits
the first value from the chain. This is paired with a new DFIR operator
chain_firstfor this behavior.We also rewrite some
Singletonlogic to useOptionalunder the hood,
which reduces the places where we deal with chaining in the IR,
hopefully avoiding future incidents.
Refactor (BREAKING)
-
don't return results from
SimSender::send
Also makes theassert_*APIs onSimReceivermore general to support
asymmetricPartialEq -
allow
simfeature withoutdeployfeature
Also removes leftover prototype "properties" code. -
migrate Hydro IR to have Flo semantics at all levels
Previously, the Hydro -> Hydro IR layer did a bunch of work to translate
between Flo semantics and DFIR semantics, so that Hydro IR -> DFIR was
generally 1:1. In preparation for the simulator and DFIR's migration to
Flo semantics, we now use Flo semantics in the Hydro IR (top level
operators do not reset their state on batch boundaries), and shift the
Flo -> DFIR semantics translation to the IR -> DFIR layer.This also comes with a breaking API change to clean up naming and avoid
function overloading. In particular,batch/snapshotare renamed to
batch_atomicandsnapshot_atomicfor the case where the batched
result is supposed to be in-sync with the atomic execution context.There were a couple of bugs during the transition to the new IR
semantics that were not caught by existing unit tests, so I added
additional tests that cover each of them.Also makes some minor improvements / bug fixes the way:
- Restricts
KeyedSingleton::snapshotto unbounded-value only, since
bounded value keyed singletons do not replay their elements - Groups together definitions for
resolve_futuresand
resolve_futures_ordered - Removes various unnecessary
NoAtomictrait bounds
- Restricts
-
rename
continue_if*tofilter_if*and document
Eventually, we will want to change these APIs to take a "Condition" live
collection instead of an optional, but this at least improves the naming
for now. -
clean up API for cycles / forward references, document
Ongoing quest to reduce the public API surface. Moves theDeferTickIR
node to be at the root of the output side of the cycle, rather than
inserted at the sink. -
move
boundednessmodule underlive_collections
Type tags for boundedness are only used in the type parameters for live
collections. -
set up prelude, move collections under
live_collectionsmodule
Bigger refactor to clean up the top-level namespace forhydro_lang.
First, we move all the definitions for different live collections into
thelive_collectionsmodule.We also rename the
unsafetymodule tonondet.Then, instead of having a bunch of types exported directly from
hydro_lang, we instead create apreludemodule that exports them.
This reduces the pollution of the namespace when importing from
hydro_lang. -
reduce namespace pollution
-
rename
ClusterIdtoMemberId
Will allow us to use this for external clients as well. -
adjust cluster membership APIs to allow dynamic clusters
-
remove
KeyedOptional
Keyed optionals don't make sense to distinguish from keyed singleton,
since a null optional is equivalent to the key not being present at all.
So we can capture all cases with just a keyed singleton.Eventually... we may want to re-distinguish these if we have keys with a
null value. But right now we don't need that so avoid unnecessary code. -
nondet instead of unsafe, snapshot instead of latest_tick, batch instead of tick_batch
Pulled together into one big PR to avoid conflicts, this makes three
significant changes to the core Hydro APIs. No semantic changes, only
syntax and naming.- Instead of non-deterministic functions being marked with the
unsafe
keyword, which conflates non-determinism with memory unsafety, we now
pass around "non-determinism guards" that act as tokens forcing
developers to be aware of the non-determinism. With the VSCode Highlight
extension, we preserve highlighting of these instances. - We unify naming for "collection-like" types (Stream, KeyedStream) to
use.batchto batch elements - We unify naming for "floating-value-like" types (Singleton, Optional,
KeyedSingleton, KeyedOptional) to use.snapshotto grab an instance of
the value (or its contents)
- Instead of non-deterministic functions being marked with the
-
simplify networking APIs according to Process/Cluster types
Breaking change: when sending to a cluster, you must use
demux_bincode. The*_anonymousAPIs have been removed in favor of
the.values()API on keyed streams. This also eliminates the
send_bytesAPIs, in favor of the bidirectional external client APIs.This refactors the networking APIs to rely less on complex traits and
instead useProcessandClustertypes to determine the input /
output type restrictions. We also now emitKeyedStreamwhenever the
sender is aCluster.
Commit Statistics
- 38 commits contributed to the release.
- 36 commits were understood as conventional.
- 36 unique issues were worked on: #1970, #1975, #1983, #1984, #1990, #1995, #2011, #2016, #2028, #2032, #2033, #2035, #2060, #2067, #2073, #2075, #2099, #2104, #2108, #2111, #2135, #2136, #2140, #2158, #2172, #2173, #2181, #2209, #2219, #2227, #2243, #2256, #2265, #2272, #2293, #2304
Commit Details
view details
- #1970
- Generalize bench_client's workload generation (b412fa0)
- #1975
- Simplify networking APIs according to Process/Cluster types (1a344a9)
- #1983
- Add keyed singletons and optionals to eliminate unsafety in membership tracking (920d9a3)
- #1984
- Remove uses of legacy
*_keyedAPIs (2e2cd77)
- Remove uses of legacy
- #1990
- Nondet instead of unsafe, snapshot instead of latest_tick, batch instead of tick_batch (804f995)
- #1995
- Remove strange use of batching in bench_client (ab22c44)
- #2011
- Remove
KeyedOptional(381be86)
- Remove
- #2016
- Adjust cluster membership APIs to allow dynamic clusters (1c26bc7)
- #2028
- Refactor github actions workflows, make stable the default toolchain (c40876e)
- #2032
- Rename
ClusterIdtoMemberId(628c1c8)
- Rename
- #2033
- Refine types when values of a
KeyedSingletonare immutable (eadd7d0)
- Refine types when values of a
- #2035
- Client aggregator waits for all clients before outputting (f9e5955)
- #2060
- Reduce namespace pollution (4925e2c)
- #2067
- Set up prelude, move collections under
live_collectionsmodule (05145bf)
- Set up prelude, move collections under
- #2073
- Move
boundednessmodule underlive_collections(4bf1c05)
- Move
- #2075
- Clean up API for cycles / forward references, document (5f8a4da)
- #2099
- Add
OrderingandRetriestraits with basic axioms (e535156)
- Add
- #2104
- Add docs for
for_each/dest_sink, restrict to strict streams (1af3a66)
- Add docs for
- #2108
- Fix cardinality for
Optional::or(21ce30c)
- Fix cardinality for
- #2111
- Rename
continue_if*tofilter_if*and document (537309f)
- Rename
- #2135
- Staggered client (1c13977)
- #2136
- Migrate Hydro IR to have Flo semantics at all levels (a4d8af6)
- #2140
- Reduce atomic pollution in quorum counting (057192a)
- #2158
- Deterministic simulator for Hydro programs (224b2db)
- #2172
- Reduce syntactic overhead of connecting test inputs / outputs (0be5729)
- #2173
- Add APIs for unordered stream assertions and tests for
collect_quorum(a4bdf39)
- Add APIs for unordered stream assertions and tests for
- #2181
- Replay failed simulation instances and show logs (1fbd6e0)
- #2209
- Allow
simfeature withoutdeployfeature (fa4e9d9)
- Allow
- #2219
- Restrict
KeyedSingletonto not allow asynchronous key removal (a157970)
- Restrict
- #2227
- New template and walkthrough (3fd84aa)
- #2243
- Don't return results from
SimSender::send(b256cba)
- Don't return results from
- #2256
- Introduce
sliced!syntax for processing with anonymous ticks (194280b)
- Introduce
- #2265
- Cluster member ids are now clone instead of copy (5579acd)
- #2272
- Add cluster-membership ir node (98b899b)
- #2293
- Add test for collecting unordered quorum (1fc7515)
- #2304
- Add specialized
sim_input/sim_outputto reduce simulation boilerplate (6b1d66a)
- Add specialized
- Uncategorized
- Release sinktools v0.0.1, hydro_deploy_integration v0.15.0, lattices_macro v0.5.11, variadics_macro v0.6.2, lattices v0.6.2, multiplatform_test v0.6.0, dfir_rs v0.15.0, copy_span v0.1.0, hydro_deploy v0.15.0, hydro_lang v0.15.0, hydro_std v0.15.0 (ac88df1)
- Release hydro_build_utils v0.0.1, dfir_lang v0.15.0, dfir_macro v0.15.0, variadics v0.0.10, sinktools v0.0.1, hydro_deploy_integration v0.15.0, lattices_macro v0.5.11, variadics_macro v0.6.2, lattices v0.6.2, multiplatform_test v0.6.0, dfir_rs v0.15.0, copy_span v0.1.0, hydro_deploy v0.15.0, hydro_lang v0.15.0, hydro_std v0.15.0, safety bump 5 crates (092de25)