Skip to content

hydro_std v0.15.0

Latest

Choose a tag to compare

@hydro-project-bot hydro-project-bot released this 25 Nov 23:03
· 199 commits to main since this release

Documentation

  • new template and walkthrough

New Features

  • add cluster-membership ir node

  • introduce sliced! syntax for processing with anonymous ticks

  • replay failed simulation instances and show logs
    Previously, a failed simulation would only show the backtrace, and would
    have to be re-run with HYDRO_SIM_LOG=1 to display the simulation
    steps. This would result in lots of log pollution since logging would
    also be enabled for passing instances.

    Now, we've made some tweaks to bolero so that it re-executes the test
    after finding a failing input, passing an is_replay flag. We use this
    to enable rich logging, so that the failure is displayed with all the
    simulation steps that led to it.

  • add APIs for unordered stream assertions and tests for collect_quorum
    AI Disclosure: all the tests were generated using Kiro (!)

    Currently, the core functionality test has to be split up into several
    exhaustive units because otherwise the search space becomes too large
    for CI (~400s). Eventually, keyed streams may help but splitting up the
    test is a reasonable short-term fix.

  • deterministic simulator for Hydro programs
    This introduces a deterministic simulator for Hydro that can simulate
    various asynchronous scenarios to identify bugs in code that use
    non-deterministic operators, such as batch. This PR focuses on just
    the infrastructure and support for simulating batch on a
    totally-ordered, exactly-once stream. Support for additional
    non-deterministic operators will follow in separate PRs (currently, an
    exception is thrown to prevent use of the simulator on programs that use
    such operators).

    The simulator's job is to explore the space of potential asynchronous
    executions. Because "top-level" operators guarantee "eventual
    determinism" (per Flo), we do not need to simulate every possible
    interleaving of message arrivals and processing. Instead, we only need
    to simulate sources of non-determinism at the points in the program
    where a user intentionally observes them (such as batch or
    assume_ordering).

    When compiling a Hydro program for the simulator, we emit several DFIR
    programs. One of these is the async_dfir, which contains all
    asynchronously executed top-level operators in the Hydro program. Again,
    thanks to Flo semantics, we do not need to simulate the behavior of
    executing these operators on different prefixes of the input, since we
    know that none of the downstream operators change their behavior based
    on the provided prefix (this is somewhat more complicated for unbounded
    singletons, whose intermediate states are affected by the set of
    elements processed in each batch, but we will address this separately).

    Because each tick relies on a set of decisions being made to select
    their inputs (batch, snapshot), we emit each tick's code into a
    separate DFIR graph. The top-level simulator then schedules
    (LaunchedSim::scheduler) the async graph and tick graphs by always
    trying to make progress with the async graph first (so that we have the
    full set of possible inputs at each batch boundary), and when the async
    graph cannot make any further progress it selects one of the ticks,
    makes a batching decision for each of its inputs
    (autonomous_decision), and then executes the tick.

    The selection of which tick to run and which elements to release in each
    batch are driven by a source of non-determinism, which is either:
    a) libfuzzer (if using .sim().fuzz and running with cargo sim)
    b) a RNG with 8192 iterations (if using .sim().fuzz and running with
    cargo test and no reproducer is available)
    c) a static input of decisions (if using .sim().fuzz and running with
    cargo test and a reproducer is available)
    d) an exhaustive, depth-first search algorithm (if using
    .sim().exhaustive and running with cargo test)

    Whenever a fuzzer finds a failure, it stores the sequence of decisions
    that leads to the crash in a .bin file as a reproducer, so that we can
    re-execute quickly in testing environments.

    Because Hydro uses a staged compilation model, our approach to compiling
    and executing the Hydro program is also a bit unique. Because the fuzzer
    needs to track branch coverage inside the Hydro program to be effective,
    and because we need low-latency interactions between the user assertions
    and the Hydro code, we cannot run the compiled program in a separate
    process. Instead, we compile the Hydro code into a shared library and
    dynamically load it into the host process (which has the testing code).
    The shared library only provides the compiled DFIR graphs, so the
    simulator scheduler runs in the host process to enable low-latency
    switching between the DFIR and testing code.

    This PR includes a couple of toy examples testing the simulator's
    functionality. Of particular interest is
    sim_crash_with_fuzzed_batching, which takes forever with an exhaustive
    search but quickly finds a failure case with a fuzzer.

  • Generalize bench_client's workload generation
    Allow custom functions for generating bench_client workloads (beyond
    u32,u32).

  • add keyed singletons and optionals to eliminate unsafety in membership tracking

Bug Fixes

  • Staggered client
    Client initially outputs 1 message per tick instead of all messages in 1
    giant batch, so if downstream operators do not dynamically adjust the
    batch size, they are not overwhelmed
  • Client aggregator waits for all clients before outputting
    Outputs N/A as throughput and latency until all clients have responded
    with positive throughput.
  • refactor github actions workflows, make stable the default toolchain
  • remove strange use of batching in bench_client

Refactor

  • reduce syntactic overhead of connecting test inputs / outputs
    Rather than having separate source / sink / bytes / bincode
    APIs, we use a single connect method that uses a trait to resolve the
    appropriate connection result.

  • reduce atomic pollution in quorum counting
    Also fixes a sync bug in Compartmentalized Paxos. Due to batching
    semantics, we could end up in a situation where responses have missing
    metadata (for example if the batch refuses to release any elements).

    The simulator would hopefully have caught this, we can use this as an
    example.

  • remove uses of legacy *_keyed APIs
    Also adds missing doctests to the aggregation APIs on KeyedStream.

Test

  • add test for collecting unordered quorum

Documentation (BREAKING)

  • add docs for for_each / dest_sink, restrict to strict streams
    Breaking Change: for_each / dest_sink now require a totally-ordered,
    retry-free stream since the downstream may not tolerate such
    non-determinism. This helps highlight cases where the developer needs to
    reason carefully about the safety of their side effects.

New Features (BREAKING)

  • add specialized sim_input / sim_output to reduce simulation boilerplate

  • cluster member ids are now clone instead of copy
    This is part one of a series of changes. The first part just changes
    MemberIds from being Copy to Clone, so that they can later support more
    use cases.

  • restrict KeyedSingleton to not allow asynchronous key removal
    Previously, KeyedSingleton supported filter on unbounded collections
    and latest to yield snapshots from a tick. Both of these APIs are
    problematic because they let developers asynchronously remove elements
    from the collection, which goes against the "Singleton" naming. We have
    reasonable alternatives for all these use-sites, so remove for now to
    minimize the semantic complexity (also for the simulator).

    Eventually, we will want to bring back KeyedOptional which permits
    such asynchronous removal.

  • add Ordering and Retries traits with basic axioms
    Reduces the need to use nondet! inside broadcast APIs, and also
    prepares us for graph viz metadata by requiring these traits to be
    implemented for the type parameters in Stream / KeyedStream.

  • refine types when values of a KeyedSingleton are immutable
    There are many cases where once a value is released into a
    KeyedSingleton, it will never be changes. In such cases, we can permit
    APIs like .entries even if the set of entries is growing.

    We use this type to improve the API surface for looking up values for a
    request. Now, a KeyedSingleton of requests can look up values from
    another KeyedSingleton.

Bug Fixes (BREAKING)

  • fix cardinality for Optional::or
    Using HydroNode::Chain is very dangerous for singletons / optionals,
    because it can lead to cardinality > 1 within a single batch, which
    breaks a fundamental invariant.

    This introduces a new HydroNode::ChainFirst operator that only emits
    the first value from the chain. This is paired with a new DFIR operator
    chain_first for this behavior.

    We also rewrite some Singleton logic to use Optional under the hood,
    which reduces the places where we deal with chaining in the IR,
    hopefully avoiding future incidents.

Refactor (BREAKING)

  • don't return results from SimSender::send
    Also makes the assert_* APIs on SimReceiver more general to support
    asymmetric PartialEq

  • allow sim feature without deploy feature
    Also removes leftover prototype "properties" code.

  • migrate Hydro IR to have Flo semantics at all levels
    Previously, the Hydro -> Hydro IR layer did a bunch of work to translate
    between Flo semantics and DFIR semantics, so that Hydro IR -> DFIR was
    generally 1:1. In preparation for the simulator and DFIR's migration to
    Flo semantics, we now use Flo semantics in the Hydro IR (top level
    operators do not reset their state on batch boundaries), and shift the
    Flo -> DFIR semantics translation to the IR -> DFIR layer.

    This also comes with a breaking API change to clean up naming and avoid
    function overloading. In particular, batch / snapshot are renamed to
    batch_atomic and snapshot_atomic for the case where the batched
    result is supposed to be in-sync with the atomic execution context.

    There were a couple of bugs during the transition to the new IR
    semantics that were not caught by existing unit tests, so I added
    additional tests that cover each of them.

    Also makes some minor improvements / bug fixes the way:

    • Restricts KeyedSingleton::snapshot to unbounded-value only, since
      bounded value keyed singletons do not replay their elements
    • Groups together definitions for resolve_futures and
      resolve_futures_ordered
    • Removes various unnecessary NoAtomic trait bounds
  • rename continue_if* to filter_if* and document
    Eventually, we will want to change these APIs to take a "Condition" live
    collection instead of an optional, but this at least improves the naming
    for now.

  • clean up API for cycles / forward references, document
    Ongoing quest to reduce the public API surface. Moves the DeferTick IR
    node to be at the root of the output side of the cycle, rather than
    inserted at the sink.

  • move boundedness module under live_collections
    Type tags for boundedness are only used in the type parameters for live
    collections.

  • set up prelude, move collections under live_collections module
    Bigger refactor to clean up the top-level namespace for hydro_lang.
    First, we move all the definitions for different live collections into
    the live_collections module.

    We also rename the unsafety module to nondet.

    Then, instead of having a bunch of types exported directly from
    hydro_lang, we instead create a prelude module that exports them.
    This reduces the pollution of the namespace when importing from
    hydro_lang.

  • reduce namespace pollution

  • rename ClusterId to MemberId
    Will allow us to use this for external clients as well.

  • adjust cluster membership APIs to allow dynamic clusters

  • remove KeyedOptional
    Keyed optionals don't make sense to distinguish from keyed singleton,
    since a null optional is equivalent to the key not being present at all.
    So we can capture all cases with just a keyed singleton.

    Eventually... we may want to re-distinguish these if we have keys with a
    null value. But right now we don't need that so avoid unnecessary code.

  • nondet instead of unsafe, snapshot instead of latest_tick, batch instead of tick_batch
    Pulled together into one big PR to avoid conflicts, this makes three
    significant changes to the core Hydro APIs. No semantic changes, only
    syntax and naming.

    1. Instead of non-deterministic functions being marked with the unsafe
      keyword, which conflates non-determinism with memory unsafety, we now
      pass around "non-determinism guards" that act as tokens forcing
      developers to be aware of the non-determinism. With the VSCode Highlight
      extension, we preserve highlighting of these instances.
    2. We unify naming for "collection-like" types (Stream, KeyedStream) to
      use .batch to batch elements
    3. We unify naming for "floating-value-like" types (Singleton, Optional,
      KeyedSingleton, KeyedOptional) to use .snapshot to grab an instance of
      the value (or its contents)
  • simplify networking APIs according to Process/Cluster types
    Breaking change: when sending to a cluster, you must use
    demux_bincode. The *_anonymous APIs have been removed in favor of
    the .values() API on keyed streams. This also eliminates the
    send_bytes APIs, in favor of the bidirectional external client APIs.

    This refactors the networking APIs to rely less on complex traits and
    instead use Process and Cluster types to determine the input /
    output type restrictions. We also now emit KeyedStream whenever the
    sender is a Cluster.

Commit Statistics

Commit Details

view details
  • #1970
    • Generalize bench_client's workload generation (b412fa0)
  • #1975
    • Simplify networking APIs according to Process/Cluster types (1a344a9)
  • #1983
    • Add keyed singletons and optionals to eliminate unsafety in membership tracking (920d9a3)
  • #1984
    • Remove uses of legacy *_keyed APIs (2e2cd77)
  • #1990
    • Nondet instead of unsafe, snapshot instead of latest_tick, batch instead of tick_batch (804f995)
  • #1995
    • Remove strange use of batching in bench_client (ab22c44)
  • #2011
  • #2016
    • Adjust cluster membership APIs to allow dynamic clusters (1c26bc7)
  • #2028
    • Refactor github actions workflows, make stable the default toolchain (c40876e)
  • #2032
    • Rename ClusterId to MemberId (628c1c8)
  • #2033
    • Refine types when values of a KeyedSingleton are immutable (eadd7d0)
  • #2035
    • Client aggregator waits for all clients before outputting (f9e5955)
  • #2060
    • Reduce namespace pollution (4925e2c)
  • #2067
    • Set up prelude, move collections under live_collections module (05145bf)
  • #2073
    • Move boundedness module under live_collections (4bf1c05)
  • #2075
    • Clean up API for cycles / forward references, document (5f8a4da)
  • #2099
    • Add Ordering and Retries traits with basic axioms (e535156)
  • #2104
    • Add docs for for_each / dest_sink, restrict to strict streams (1af3a66)
  • #2108
    • Fix cardinality for Optional::or (21ce30c)
  • #2111
    • Rename continue_if* to filter_if* and document (537309f)
  • #2135
  • #2136
    • Migrate Hydro IR to have Flo semantics at all levels (a4d8af6)
  • #2140
    • Reduce atomic pollution in quorum counting (057192a)
  • #2158
    • Deterministic simulator for Hydro programs (224b2db)
  • #2172
    • Reduce syntactic overhead of connecting test inputs / outputs (0be5729)
  • #2173
    • Add APIs for unordered stream assertions and tests for collect_quorum (a4bdf39)
  • #2181
    • Replay failed simulation instances and show logs (1fbd6e0)
  • #2209
    • Allow sim feature without deploy feature (fa4e9d9)
  • #2219
    • Restrict KeyedSingleton to not allow asynchronous key removal (a157970)
  • #2227
    • New template and walkthrough (3fd84aa)
  • #2243
    • Don't return results from SimSender::send (b256cba)
  • #2256
    • Introduce sliced! syntax for processing with anonymous ticks (194280b)
  • #2265
    • Cluster member ids are now clone instead of copy (5579acd)
  • #2272
    • Add cluster-membership ir node (98b899b)
  • #2293
    • Add test for collecting unordered quorum (1fc7515)
  • #2304
    • Add specialized sim_input / sim_output to reduce simulation boilerplate (6b1d66a)
  • Uncategorized
    • Release sinktools v0.0.1, hydro_deploy_integration v0.15.0, lattices_macro v0.5.11, variadics_macro v0.6.2, lattices v0.6.2, multiplatform_test v0.6.0, dfir_rs v0.15.0, copy_span v0.1.0, hydro_deploy v0.15.0, hydro_lang v0.15.0, hydro_std v0.15.0 (ac88df1)
    • Release hydro_build_utils v0.0.1, dfir_lang v0.15.0, dfir_macro v0.15.0, variadics v0.0.10, sinktools v0.0.1, hydro_deploy_integration v0.15.0, lattices_macro v0.5.11, variadics_macro v0.6.2, lattices v0.6.2, multiplatform_test v0.6.0, dfir_rs v0.15.0, copy_span v0.1.0, hydro_deploy v0.15.0, hydro_lang v0.15.0, hydro_std v0.15.0, safety bump 5 crates (092de25)