Translator: JobIdNotFound error when multiple miners connect simultaneously in aggregated mode #223

@average-gary

Summary

The translator proxy fails with a JobIdNotFound error when multiple SV1 miners connect simultaneously while it is running with aggregate_channels = true. The error causes the translator to disconnect from the upstream pool and attempt failover, and it eventually shuts down.

Root Cause

In aggregated mode, when SetNewPrevHash is processed, the translator iterates over all channels in extended_channels and calls on_set_new_prev_hash() on each one:

// miner-apps/translator/src/lib/sv2/channel_manager/mining_message_handler.rs
for (_, channel) in channel_manager_data.extended_channels.iter() {
    let mut channel = channel.write()...;
    channel.on_set_new_prev_hash(m_static.clone())...;  // Called on ALL channels
}

The on_set_new_prev_hash() method in channels_sv2::client::extended::ExtendedChannel removes the job from future_jobs and then clears all remaining future jobs:

// stratum/protocols/v2/channels-sv2/src/client/extended.rs
match self.future_jobs.remove(&set_new_prev_hash.job_id) {
    Some(mut activated_job) => { ... }
    None => { return Err(ExtendedChannelError::JobIdNotFound); }
}
self.future_jobs.clear();  // Clears ALL future jobs
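
The remove-then-clear behaviour can be illustrated with a small stand-alone model. This is only a sketch: ModelChannel, ChannelError, and the String job payload are hypothetical stand-ins, not the real channels_sv2 types.

```rust
use std::collections::HashMap;

/// Simplified stand-in for the error named in this issue (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum ChannelError {
    JobIdNotFound,
}

/// Hypothetical, stripped-down model of a client-side extended channel.
#[derive(Debug, Default)]
pub struct ModelChannel {
    /// Future jobs keyed by job_id; the payload is just a label here.
    pub future_jobs: HashMap<u32, String>,
    /// The job most recently activated by a SetNewPrevHash.
    pub active_job: Option<(u32, String)>,
}

impl ModelChannel {
    /// Mirrors the remove-then-clear logic quoted above.
    pub fn on_set_new_prev_hash(&mut self, job_id: u32) -> Result<(), ChannelError> {
        match self.future_jobs.remove(&job_id) {
            Some(job) => self.active_job = Some((job_id, job)),
            None => return Err(ChannelError::JobIdNotFound),
        }
        // Every other pending future job is discarded.
        self.future_jobs.clear();
        Ok(())
    }
}
```

In this model, a second on_set_new_prev_hash call for the same job_id returns JobIdNotFound unless the job has been inserted into future_jobs again in between.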

The problem occurs with the following sequence:

  1. First OpenExtendedMiningChannelSuccess (channel_id=2) creates extended_channels[2]
  2. First NewExtendedMiningJob (job_id=1) stores job in extended_channels[2].future_jobs
  3. First SetNewPrevHash (job_id=1) activates job on channel_id=2, clearing future_jobs
  4. Second OpenExtendedMiningChannelSuccess (channel_id=3) creates extended_channels[3]
  5. Second NewExtendedMiningJob (job_id=1) stores job in ALL channels (2 and 3)
  6. Second SetNewPrevHash (job_id=1) iterates ALL channels:
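    • Tries to activate job_id=1 on channel_id=2 first → fails with JobIdNotFound (future_jobs on this channel has no entry for job_id=1 when it is processed, despite the re-add in step 5)
    • The error is returned immediately, so channel_id=3 is never processed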
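
The effect of the early return in step 6 can be sketched in isolation. The model below assumes that channel 2's future_jobs is empty when the second SetNewPrevHash arrives (which is what the JobIdNotFound error implies) and uses a BTreeMap so the iteration order is deterministic; ChannelMap and activate_on_all are hypothetical names, and the real extended_channels map may be a different type with a different iteration order.

```rust
use std::collections::{BTreeMap, HashSet};

/// channel_id -> set of future job_ids (hypothetical model, not the real type).
pub type ChannelMap = BTreeMap<u32, HashSet<u32>>;

/// Activates `job_id` on every channel, stopping at the first channel that lacks it.
/// Returns the ids of all channels on success, or the id of the channel that failed.
pub fn activate_on_all(channels: &mut ChannelMap, job_id: u32) -> Result<Vec<u32>, u32> {
    let mut activated = Vec::new();
    for (channel_id, future_jobs) in channels.iter_mut() {
        if !future_jobs.remove(&job_id) {
            // Corresponds to JobIdNotFound: the error aborts the whole loop.
            return Err(*channel_id);
        }
        // Remaining future jobs on this channel are dropped.
        future_jobs.clear();
        activated.push(*channel_id);
    }
    Ok(activated)
}
```

With channel 2 holding no future jobs and channel 3 holding job_id=1, the call fails at channel 2 and channel 3's job stays in future_jobs, never activated.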

Steps to Reproduce

  1. Configure translator with aggregate_channels = true
  2. Start pool with Template Provider
  3. Start translator connecting to pool
  4. Connect multiple SV1 miners simultaneously (e.g., 5-10 miners connecting within ~100ms)
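
If no physical miners are at hand, step 4 might be approximated with a small burst-connect helper. This is a rough sketch under several assumptions: connect_burst is a hypothetical helper, the address of the translator's SV1 listener has to be supplied by the caller, and it is not confirmed that a bare mining.subscribe is enough to make the translator open an upstream channel (real miners also authorize and submit shares).

```rust
use std::io::{self, Write};
use std::net::{SocketAddr, TcpStream};
use std::sync::{Arc, Barrier};
use std::thread;

/// Opens `n` TCP connections to `addr` at nearly the same moment and sends an
/// SV1 `mining.subscribe` request on each. Returns one result per connection.
pub fn connect_burst(addr: SocketAddr, n: usize) -> Vec<io::Result<TcpStream>> {
    // The barrier releases all threads together so the connects land close in time.
    let barrier = Arc::new(Barrier::new(n));
    let handles: Vec<_> = (0..n)
        .map(|_| {
            let barrier = Arc::clone(&barrier);
            thread::spawn(move || -> io::Result<TcpStream> {
                barrier.wait();
                let mut stream = TcpStream::connect(addr)?;
                // Newline-delimited JSON-RPC, as used by Stratum V1.
                stream.write_all(b"{\"id\":1,\"method\":\"mining.subscribe\",\"params\":[]}\n")?;
                Ok(stream)
            })
        })
        .collect();
    handles
        .into_iter()
        .map(|h| h.join().expect("connection thread panicked"))
        .collect()
}
```

The returned streams should be kept alive while the translator logs are inspected, since dropping them closes the connections.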

Expected Behavior

All miners should connect successfully and receive mining jobs.

Actual Behavior

2026-01-29T16:25:19.528350Z  INFO  Received: SetNewPrevHash(channel_id=3, job_id=1, ...)
2026-01-29T16:25:19.528422Z  ERROR Failed to set new prev hash: JobIdNotFound
2026-01-29T16:25:19.528514Z  WARN  Upstream connection dropped: FailedToProcessSetNewPrevHash

The translator disconnects and eventually shuts down after exhausting retry attempts.

Proposed Fix

Before calling on_set_new_prev_hash() on each channel, check if the channel actually has the referenced job in its future_jobs. Skip channels that don't have the job:

for (_, channel) in channel_manager_data.extended_channels.iter() {
    let mut channel = channel.write()...;
    // Skip channels that don't have this job as a future job
    if !channel.get_future_jobs().contains_key(&m_static.job_id) {
        continue;
    }
    channel.on_set_new_prev_hash(m_static.clone())...;
}
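
The skip logic can be checked against a simplified model before touching the real handler. As before, this is only a sketch: FutureJobsByChannel and activate_where_present are hypothetical names, and a set of job ids stands in for the real future_jobs map.

```rust
use std::collections::{BTreeMap, HashSet};

/// channel_id -> set of future job_ids (hypothetical model, not the real type).
pub type FutureJobsByChannel = BTreeMap<u32, HashSet<u32>>;

/// Activates `job_id` only on channels that hold it as a future job.
/// Returns the ids of the channels that were activated.
pub fn activate_where_present(channels: &mut FutureJobsByChannel, job_id: u32) -> Vec<u32> {
    let mut activated = Vec::new();
    for (channel_id, future_jobs) in channels.iter_mut() {
        // Skip channels that don't have this job as a future job.
        if !future_jobs.contains(&job_id) {
            continue;
        }
        // Activation consumes the job and drops the other future jobs.
        future_jobs.clear();
        activated.push(*channel_id);
    }
    activated
}
```

One open design question with this approach: if no channel holds the job, the function returns an empty list instead of an error, so the caller may still want to treat that case as a failure or at least log it.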

Environment

  • sv2-apps: main branch
  • stratum-core: main branch
  • Configuration: aggregate_channels = true

Logs

Log 1: Multiple miners connecting simultaneously (10 miners)

2026-01-29T16:25:17.108993Z  INFO translator_sv2: Starting Translator Proxy...
2026-01-29T16:25:18.111103Z  INFO translator_sv2::sv2::upstream::upstream: Connected to upstream at 127.0.0.1:3336
2026-01-29T16:25:19.522888Z  INFO translator_sv2::sv1::sv1_server::sv1_server: New SV1 downstream connection from 10.30.76.55:38518
2026-01-29T16:25:19.523029Z  INFO translator_sv2::sv1::sv1_server::sv1_server: Downstream 1 registered successfully
2026-01-29T16:25:19.523109Z  INFO translator_sv2::sv1::sv1_server::sv1_server: New SV1 downstream connection from 10.30.207.44:57182
2026-01-29T16:25:19.523181Z  INFO translator_sv2::sv1::sv1_server::sv1_server: Downstream 2 registered successfully
...
2026-01-29T16:25:19.526184Z  INFO Received: OpenExtendedMiningChannelSuccess(request_id: 1, channel_id: 2, ...)
2026-01-29T16:25:19.527239Z  INFO Received: NewExtendedMiningJob(channel_id: 2, job_id: 1, ...)
2026-01-29T16:25:19.527519Z  INFO Received: SetNewPrevHash(channel_id=2, job_id=1, ...)
2026-01-29T16:25:19.527655Z  INFO Received: OpenExtendedMiningChannelSuccess(request_id: 2, channel_id: 3, ...)
2026-01-29T16:25:19.528017Z  INFO Received: NewExtendedMiningJob(channel_id: 3, job_id: 1, ...)
2026-01-29T16:25:19.528350Z  INFO Received: SetNewPrevHash(channel_id=3, job_id=1, ...)
2026-01-29T16:25:19.528422Z ERROR Failed to set new prev hash: JobIdNotFound
2026-01-29T16:25:19.528514Z  WARN Upstream connection dropped: FailedToProcessSetNewPrevHash
2026-01-29T16:25:19.529520Z ERROR All upstreams failed after 3 retries each

Log 2: Similar failure with 6 miners

2026-01-29T16:06:52.129312Z  INFO translator_sv2: Starting Translator Proxy...
2026-01-29T16:06:53.259420Z  INFO Connected to upstream at 127.0.0.1:3336
2026-01-29T16:06:56.492618Z  INFO New SV1 downstream connection from 10.30.76.59:33732
2026-01-29T16:06:56.492774Z  INFO Downstream 1 registered successfully
...
2026-01-29T16:06:56.495542Z  INFO Received: OpenExtendedMiningChannelSuccess(request_id: 1, channel_id: 2, ...)
2026-01-29T16:06:56.495674Z  INFO Received: NewExtendedMiningJob(channel_id: 2, job_id: 1, ...)
2026-01-29T16:06:56.496138Z  INFO Received: SetNewPrevHash(channel_id=2, job_id=1, ...)
2026-01-29T16:06:56.496334Z ERROR Failed to set new prev hash: JobIdNotFound
2026-01-29T16:06:56.496479Z  WARN Upstream connection dropped: FailedToProcessSetNewPrevHash
2026-01-29T16:06:56.497513Z ERROR All upstreams failed after 3 retries each
