Release hydro_std v0.15.0 · hydro-project/hydro

Documentation

new template and walkthrough

New Features

add cluster-membership ir node
introduce sliced! syntax for processing with anonymous ticks
replay failed simulation instances and show logs
Previously, a failed simulation would only show the backtrace, and would
have to be re-run with HYDRO_SIM_LOG=1 to display the simulation
steps. This would result in lots of log pollution since logging would
also be enabled for passing instances.

Now, we've made some tweaks to bolero so that it re-executes the test
after finding a failing input, passing an is_replay flag. We use this
to enable rich logging, so that the failure is displayed with all the
simulation steps that led to it.
add APIs for unordered stream assertions and tests for collect_quorum
AI Disclosure: all the tests were generated using Kiro (!)

Currently, the core functionality test has to be split up into several
exhaustive units because otherwise the search space becomes too large
for CI (~400s). Eventually, keyed streams may help but splitting up the
test is a reasonable short-term fix.
deterministic simulator for Hydro programs
This introduces a deterministic simulator for Hydro that can simulate
various asynchronous scenarios to identify bugs in code that use
non-deterministic operators, such as batch. This PR focuses on just
the infrastructure and support for simulating batch on a
totally-ordered, exactly-once stream. Support for additional
non-deterministic operators will follow in separate PRs (currently, an
exception is thrown to prevent use of the simulator on programs that use
such operators).

The simulator's job is to explore the space of potential asynchronous
executions. Because "top-level" operators guarantee "eventual
determinism" (per Flo), we do not need to simulate every possible
interleaving of message arrivals and processing. Instead, we only need
to simulate sources of non-determinism at the points in the program
where a user intentionally observes them (such as batch or
assume_ordering).

When compiling a Hydro program for the simulator, we emit several DFIR
programs. One of these is the async_dfir, which contains all
asynchronously executed top-level operators in the Hydro program. Again,
thanks to Flo semantics, we do not need to simulate the behavior of
executing these operators on different prefixes of the input, since we
know that none of the downstream operators change their behavior based
on the provided prefix (this is somewhat more complicated for unbounded
singletons, whose intermediate states are affected by the set of
elements processed in each batch, but we will address this separately).

Because each tick relies on a set of decisions being made to select
their inputs (batch, snapshot), we emit each tick's code into a
separate DFIR graph. The top-level simulator then schedules
(LaunchedSim::scheduler) the async graph and tick graphs by always
trying to make progress with the async graph first (so that we have the
full set of possible inputs at each batch boundary), and when the async
graph cannot make any further progress it selects one of the ticks,
makes a batching decision for each of its inputs
(autonomous_decision), and then executes the tick.

The selection of which tick to run and which elements to release in each
batch are driven by a source of non-determinism, which is either:
a) libfuzzer (if using .sim().fuzz and running with cargo sim)
b) a RNG with 8192 iterations (if using .sim().fuzz and running with
cargo test and no reproducer is available)
c) a static input of decisions (if using .sim().fuzz and running with
cargo test and a reproducer is available)
d) an exhaustive, depth-first search algorithm (if using
.sim().exhaustive and running with cargo test)

Whenever a fuzzer finds a failure, it stores the sequence of decisions
that leads to the crash in a .bin file as a reproducer, so that we can
re-execute quickly in testing environments.

Because Hydro uses a staged compilation model, our approach to compiling
and executing the Hydro program is also a bit unique. Because the fuzzer
needs to track branch coverage inside the Hydro program to be effective,
and because we need low-latency interactions between the user assertions
and the Hydro code, we cannot run the compiled program in a separate
process. Instead, we compile the Hydro code into a shared library and
dynamically load it into the host process (which has the testing code).
The shared library only provides the compiled DFIR graphs, so the
simulator scheduler runs in the host process to enable low-latency
switching between the DFIR and testing code.

This PR includes a couple of toy examples testing the simulator's
functionality. Of particular interest is
sim_crash_with_fuzzed_batching, which takes forever with an exhaustive
search but quickly finds a failure case with a fuzzer.
Generalize bench_client's workload generation
Allow custom functions for generating bench_client workloads (beyond
u32,u32).
add keyed singletons and optionals to eliminate unsafety in membership tracking

Bug Fixes

Staggered client
Client initially outputs 1 message per tick instead of all messages in 1
giant batch, so if downstream operators do not dynamically adjust the
batch size, they are not overwhelmed
Client aggregator waits for all clients before outputting
Outputs N/A as throughput and latency until all clients have responded
with positive throughput.
refactor github actions workflows, make stable the default toolchain
remove strange use of batching in bench_client

Refactor

reduce syntactic overhead of connecting test inputs / outputs
Rather than having separate source / sink / bytes / bincode
APIs, we use a single connect method that uses a trait to resolve the
appropriate connection result.
reduce atomic pollution in quorum counting
Also fixes a sync bug in Compartmentalized Paxos. Due to batching
semantics, we could end up in a situation where responses have missing
metadata (for example if the batch refuses to release any elements).

The simulator would hopefully have caught this, we can use this as an
example.
remove uses of legacy *_keyed APIs
Also adds missing doctests to the aggregation APIs on KeyedStream.

Test

add test for collecting unordered quorum

Documentation (BREAKING)

add docs for for_each / dest_sink, restrict to strict streams
Breaking Change: for_each / dest_sink now require a totally-ordered,
retry-free stream since the downstream may not tolerate such
non-determinism. This helps highlight cases where the developer needs to
reason carefully about the safety of their side effects.

New Features (BREAKING)

add specialized sim_input / sim_output to reduce simulation boilerplate
cluster member ids are now clone instead of copy
This is part one of a series of changes. The first part just changes
MemberIds from being Copy to Clone, so that they can later support more
use cases.
restrict KeyedSingleton to not allow asynchronous key removal
Previously, KeyedSingleton supported filter on unbounded collections
and latest to yield snapshots from a tick. Both of these APIs are
problematic because they let developers asynchronously remove elements
from the collection, which goes against the "Singleton" naming. We have
reasonable alternatives for all these use-sites, so remove for now to
minimize the semantic complexity (also for the simulator).

Eventually, we will want to bring back KeyedOptional which permits
such asynchronous removal.
add Ordering and Retries traits with basic axioms
Reduces the need to use nondet! inside broadcast APIs, and also
prepares us for graph viz metadata by requiring these traits to be
implemented for the type parameters in Stream / KeyedStream.
refine types when values of a KeyedSingleton are immutable
There are many cases where once a value is released into a
KeyedSingleton, it will never be changes. In such cases, we can permit
APIs like .entries even if the set of entries is growing.

We use this type to improve the API surface for looking up values for a
request. Now, a KeyedSingleton of requests can look up values from
another KeyedSingleton.

Bug Fixes (BREAKING)

fix cardinality for Optional::or
Using HydroNode::Chain is very dangerous for singletons / optionals,
because it can lead to cardinality > 1 within a single batch, which
breaks a fundamental invariant.

This introduces a new HydroNode::ChainFirst operator that only emits
the first value from the chain. This is paired with a new DFIR operator
chain_first for this behavior.

We also rewrite some Singleton logic to use Optional under the hood,
which reduces the places where we deal with chaining in the IR,
hopefully avoiding future incidents.

Refactor (BREAKING)

don't return results from SimSender::send
Also makes the assert_* APIs on SimReceiver more general to support
asymmetric PartialEq
allow sim feature without deploy feature
Also removes leftover prototype "properties" code.
migrate Hydro IR to have Flo semantics at all levels
Previously, the Hydro -> Hydro IR layer did a bunch of work to translate
between Flo semantics and DFIR semantics, so that Hydro IR -> DFIR was
generally 1:1. In preparation for the simulator and DFIR's migration to
Flo semantics, we now use Flo semantics in the Hydro IR (top level
operators do not reset their state on batch boundaries), and shift the
Flo -> DFIR semantics translation to the IR -> DFIR layer.

This also comes with a breaking API change to clean up naming and avoid
function overloading. In particular, batch / snapshot are renamed to
batch_atomic and snapshot_atomic for the case where the batched
result is supposed to be in-sync with the atomic execution context.

There were a couple of bugs during the transition to the new IR
semantics that were not caught by existing unit tests, so I added
additional tests that cover each of them.

Also makes some minor improvements / bug fixes the way:
- Restricts KeyedSingleton::snapshot to unbounded-value only, since
  bounded value keyed singletons do not replay their elements
- Groups together definitions for resolve_futures and
  resolve_futures_ordered
- Removes various unnecessary NoAtomic trait bounds
rename continue_if* to filter_if* and document
Eventually, we will want to change these APIs to take a "Condition" live
collection instead of an optional, but this at least improves the naming
for now.
clean up API for cycles / forward references, document
Ongoing quest to reduce the public API surface. Moves the DeferTick IR
node to be at the root of the output side of the cycle, rather than
inserted at the sink.
move boundedness module under live_collections
Type tags for boundedness are only used in the type parameters for live
collections.
set up prelude, move collections under live_collections module
Bigger refactor to clean up the top-level namespace for hydro_lang.
First, we move all the definitions for different live collections into
the live_collections module.

We also rename the unsafety module to nondet.

Then, instead of having a bunch of types exported directly from
hydro_lang, we instead create a prelude module that exports them.
This reduces the pollution of the namespace when importing from
hydro_lang.
reduce namespace pollution
rename ClusterId to MemberId
Will allow us to use this for external clients as well.
adjust cluster membership APIs to allow dynamic clusters
remove KeyedOptional
Keyed optionals don't make sense to distinguish from keyed singleton,
since a null optional is equivalent to the key not being present at all.
So we can capture all cases with just a keyed singleton.

Eventually... we may want to re-distinguish these if we have keys with a
null value. But right now we don't need that so avoid unnecessary code.
nondet instead of unsafe, snapshot instead of latest_tick, batch instead of tick_batch
Pulled together into one big PR to avoid conflicts, this makes three
significant changes to the core Hydro APIs. No semantic changes, only
syntax and naming.
1. Instead of non-deterministic functions being marked with the unsafe
  keyword, which conflates non-determinism with memory unsafety, we now
  pass around "non-determinism guards" that act as tokens forcing
  developers to be aware of the non-determinism. With the VSCode Highlight
  extension, we preserve highlighting of these instances.
2. We unify naming for "collection-like" types (Stream, KeyedStream) to
  use .batch to batch elements
3. We unify naming for "floating-value-like" types (Singleton, Optional,
  KeyedSingleton, KeyedOptional) to use .snapshot to grab an instance of
  the value (or its contents)
simplify networking APIs according to Process/Cluster types
Breaking change: when sending to a cluster, you must use
demux_bincode. The *_anonymous APIs have been removed in favor of
the .values() API on keyed streams. This also eliminates the
send_bytes APIs, in favor of the bidirectional external client APIs.

This refactors the networking APIs to rely less on complex traits and
instead use Process and Cluster types to determine the input /
output type restrictions. We also now emit KeyedStream whenever the
sender is a Cluster.

Commit Statistics

38 commits contributed to the release.
36 commits were understood as conventional.
36 unique issues were worked on: #1970, #1975, #1983, #1984, #1990, #1995, #2011, #2016, #2028, #2032, #2033, #2035, #2060, #2067, #2073, #2075, #2099, #2104, #2108, #2111, #2135, #2136, #2140, #2158, #2172, #2173, #2181, #2209, #2219, #2227, #2243, #2256, #2265, #2272, #2293, #2304

Commit Details

view details

#1970
- Generalize bench_client's workload generation (b412fa0)
#1975
- Simplify networking APIs according to Process/Cluster types (1a344a9)
#1983
- Add keyed singletons and optionals to eliminate unsafety in membership tracking (920d9a3)
#1984
- Remove uses of legacy *_keyed APIs (2e2cd77)
#1990
- Nondet instead of unsafe, snapshot instead of latest_tick, batch instead of tick_batch (804f995)
#1995
- Remove strange use of batching in bench_client (ab22c44)
#2011
- Remove KeyedOptional (381be86)
#2016
- Adjust cluster membership APIs to allow dynamic clusters (1c26bc7)
#2028
- Refactor github actions workflows, make stable the default toolchain (c40876e)
#2032
- Rename ClusterId to MemberId (628c1c8)
#2033
- Refine types when values of a KeyedSingleton are immutable (eadd7d0)
#2035
- Client aggregator waits for all clients before outputting (f9e5955)
#2060
- Reduce namespace pollution (4925e2c)
#2067
- Set up prelude, move collections under live_collections module (05145bf)
#2073
- Move boundedness module under live_collections (4bf1c05)
#2075
- Clean up API for cycles / forward references, document (5f8a4da)
#2099
- Add Ordering and Retries traits with basic axioms (e535156)
#2104
- Add docs for for_each / dest_sink, restrict to strict streams (1af3a66)
#2108
- Fix cardinality for Optional::or (21ce30c)
#2111
- Rename continue_if* to filter_if* and document (537309f)
#2135
- Staggered client (1c13977)
#2136
- Migrate Hydro IR to have Flo semantics at all levels (a4d8af6)
#2140
- Reduce atomic pollution in quorum counting (057192a)
#2158
- Deterministic simulator for Hydro programs (224b2db)
#2172
- Reduce syntactic overhead of connecting test inputs / outputs (0be5729)
#2173
- Add APIs for unordered stream assertions and tests for collect_quorum (a4bdf39)
#2181
- Replay failed simulation instances and show logs (1fbd6e0)
#2209
- Allow sim feature without deploy feature (fa4e9d9)
#2219
- Restrict KeyedSingleton to not allow asynchronous key removal (a157970)
#2227
- New template and walkthrough (3fd84aa)
#2243
- Don't return results from SimSender::send (b256cba)
#2256
- Introduce sliced! syntax for processing with anonymous ticks (194280b)
#2265
- Cluster member ids are now clone instead of copy (5579acd)
#2272
- Add cluster-membership ir node (98b899b)
#2293
- Add test for collecting unordered quorum (1fc7515)
#2304
- Add specialized sim_input / sim_output to reduce simulation boilerplate (6b1d66a)
Uncategorized
- Release sinktools v0.0.1, hydro_deploy_integration v0.15.0, lattices_macro v0.5.11, variadics_macro v0.6.2, lattices v0.6.2, multiplatform_test v0.6.0, dfir_rs v0.15.0, copy_span v0.1.0, hydro_deploy v0.15.0, hydro_lang v0.15.0, hydro_std v0.15.0 (ac88df1)
- Release hydro_build_utils v0.0.1, dfir_lang v0.15.0, dfir_macro v0.15.0, variadics v0.0.10, sinktools v0.0.1, hydro_deploy_integration v0.15.0, lattices_macro v0.5.11, variadics_macro v0.6.2, lattices v0.6.2, multiplatform_test v0.6.0, dfir_rs v0.15.0, copy_span v0.1.0, hydro_deploy v0.15.0, hydro_lang v0.15.0, hydro_std v0.15.0, safety bump 5 crates (092de25)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

hydro_std v0.15.0

Choose a tag to compare

Sorry, something went wrong.

Sorry, something went wrong.

Uh oh!

No results found

Documentation

New Features

Bug Fixes

Refactor

Test

Documentation (BREAKING)

New Features (BREAKING)

Bug Fixes (BREAKING)

Refactor (BREAKING)

Commit Statistics

Commit Details

Uh oh!