@average-gary
Contributor

Summary

Fixes #223

Prevents a JobIdNotFound error when multiple SV1 miners connect simultaneously in aggregated mode.

Problem

When multiple miners connect at the same time, the translator fails because handle_set_new_prev_hash iterates over ALL extended channels and calls on_set_new_prev_hash() on each one. Since on_set_new_prev_hash() removes the referenced job from future_jobs and clears the remaining future jobs, subsequent channels fail with JobIdNotFound when they try to find the same job_id.
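
The failure mode can be illustrated with a small model. This is only a sketch: `Channel`, `ChannelError`, and `with_future_jobs` are hypothetical stand-ins, not the translator's real types, and the only behaviour assumed is what is described above (the referenced job is removed from future_jobs and the remaining future jobs are cleared).

```rust
use std::collections::HashMap;

/// Hypothetical error type; only the variant name comes from the PR.
#[derive(Debug, PartialEq)]
enum ChannelError {
    JobIdNotFound,
}

/// Simplified stand-in for an extended channel, not the translator's real type.
struct Channel {
    future_jobs: HashMap<u32, String>,
    active_job: Option<String>,
}

impl Channel {
    /// Builds a channel holding the given (job_id, label) future jobs.
    fn with_future_jobs(jobs: &[(u32, &str)]) -> Self {
        Channel {
            future_jobs: jobs
                .iter()
                .map(|&(id, label)| (id, label.to_string()))
                .collect(),
            active_job: None,
        }
    }

    /// Activates the referenced future job and drops all other future jobs.
    /// Fails if the job is not (or no longer) in `future_jobs`.
    fn on_set_new_prev_hash(&mut self, job_id: u32) -> Result<(), ChannelError> {
        let job = self
            .future_jobs
            .remove(&job_id)
            .ok_or(ChannelError::JobIdNotFound)?;
        self.future_jobs.clear();
        self.active_job = Some(job);
        Ok(())
    }
}
```

In this model, the first call for a given job_id succeeds and empties future_jobs, so any later lookup of the same job_id against that state returns JobIdNotFound.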

Solution

Before calling on_set_new_prev_hash(), check if the channel actually has the referenced job in its future_jobs. Skip channels that don't have the job.

This fix is applied to all three code paths in handle_set_new_prev_hash:

  1. Aggregated mode (upstream_extended_channel and extended_channels loop)
  2. Non-aggregated group channel mode
  3. Non-aggregated individual channel mode
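
The guard described in this section can be sketched as follows. The types and the function name `set_new_prev_hash_guarded` are hypothetical and exist only for illustration; the actual change lives in mining_message_handler.rs and may be structured differently.

```rust
use std::collections::HashMap;

/// Hypothetical error type; only the variant name comes from the PR.
#[derive(Debug, PartialEq)]
enum ChannelError {
    JobIdNotFound,
}

/// Simplified stand-in for an extended channel, not the translator's real type.
struct Channel {
    id: u32,
    future_jobs: HashMap<u32, String>,
}

impl Channel {
    /// Builds a channel holding the given future job ids.
    fn new(id: u32, future_job_ids: &[u32]) -> Self {
        Channel {
            id,
            future_jobs: future_job_ids
                .iter()
                .map(|&job_id| (job_id, format!("job-{job_id}")))
                .collect(),
        }
    }

    /// Consumes the referenced future job and clears the rest.
    fn on_set_new_prev_hash(&mut self, job_id: u32) -> Result<(), ChannelError> {
        self.future_jobs
            .remove(&job_id)
            .ok_or(ChannelError::JobIdNotFound)?;
        self.future_jobs.clear();
        Ok(())
    }
}

/// Applies SetNewPrevHash only to channels that hold the referenced job.
/// Returns the ids of the channels that were actually updated.
fn set_new_prev_hash_guarded(
    channels: &mut [Channel],
    job_id: u32,
) -> Result<Vec<u32>, ChannelError> {
    let mut updated = Vec::new();
    for channel in channels.iter_mut() {
        // Defensive check: skip channels without the job instead of failing.
        if !channel.future_jobs.contains_key(&job_id) {
            continue;
        }
        channel.on_set_new_prev_hash(job_id)?;
        updated.push(channel.id);
    }
    Ok(updated)
}
```

With this check in place, a channel that lacks the job is skipped, so one missing job no longer aborts processing for the remaining channels.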

Testing

  • Existing unit tests pass
  • Manual testing with multiple simultaneous miner connections confirmed the fix works

Changes

  • miner-apps/translator/src/lib/sv2/channel_manager/mining_message_handler.rs: Added defensive checks before calling on_set_new_prev_hash() to verify job exists in future_jobs

…imultaneously

When multiple SV1 miners connect simultaneously in aggregated mode, the
translator would fail with JobIdNotFound error when processing SetNewPrevHash
messages for subsequent channels.

Root cause: In handle_set_new_prev_hash, the code iterates over ALL extended
channels and calls on_set_new_prev_hash() on each one. When on_set_new_prev_hash
is called on a channel, it removes the job from future_jobs and clears all
remaining future jobs. This causes subsequent channels to fail when they try
to find the same job_id.

Fix: Before calling on_set_new_prev_hash(), check if the channel actually has
the referenced job in its future_jobs. Skip channels that don't have the job.
Also ensure that SV1 message forwarding only happens after successful
on_set_new_prev_hash() calls to prevent potential panics from get_active_job().

This fix is applied to all three code paths:
1. Aggregated mode (upstream_extended_channel and extended_channels loop)
2. Non-aggregated group channel mode
3. Non-aggregated individual channel mode
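
The forwarding rule from the commit message above (forward to SV1 only after a successful on_set_new_prev_hash() call) can be sketched like this. Everything here is an assumption made for illustration: the `Channel` type, the panicking `get_active_job`, and the string standing in for an SV1 notification are not the translator's real API.

```rust
use std::collections::HashMap;

/// Simplified stand-in for an extended channel, not the translator's real type.
struct Channel {
    future_jobs: HashMap<u32, String>,
    active_job: Option<String>,
}

impl Channel {
    /// Builds a channel with the given future job ids and no active job.
    fn new(future_job_ids: &[u32]) -> Self {
        Channel {
            future_jobs: future_job_ids
                .iter()
                .map(|&job_id| (job_id, format!("job-{job_id}")))
                .collect(),
            active_job: None,
        }
    }

    /// Activates the referenced future job and clears the rest.
    fn on_set_new_prev_hash(&mut self, job_id: u32) -> Result<(), String> {
        let job = self
            .future_jobs
            .remove(&job_id)
            .ok_or_else(|| format!("JobIdNotFound: {job_id}"))?;
        self.future_jobs.clear();
        self.active_job = Some(job);
        Ok(())
    }

    /// Assumed to panic when no job is active, which is the hazard the
    /// commit message mentions.
    fn get_active_job(&self) -> &str {
        self.active_job.as_deref().expect("no active job")
    }
}

/// Builds SV1 notifications only for channels whose state update succeeded,
/// so `get_active_job` is never reached on a channel without an active job.
fn collect_sv1_notifications(channels: &mut [Channel], job_id: u32) -> Vec<String> {
    let mut notifications = Vec::new();
    for channel in channels.iter_mut() {
        if !channel.future_jobs.contains_key(&job_id) {
            continue; // channel does not hold the job: nothing to forward
        }
        if channel.on_set_new_prev_hash(job_id).is_ok() {
            notifications.push(format!("mining.notify for {}", channel.get_active_job()));
        }
    }
    notifications
}
```

The ordering is the point of the sketch: the active job is read only on the success path, so a skipped or failed channel never triggers the panic.
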
@average-gary average-gary force-pushed the fix/translator-job-id-not-found branch from 52a6f0d to 878a3ba Compare January 29, 2026 17:04
…gated mode

This test reproduces the JobIdNotFound bug that occurs when multiple SV1 miners
connect simultaneously in aggregated mode (aggregate_channels = true).

The bug manifests when SetNewPrevHash arrives targeting a group channel ID:
- Translator iterates over ALL extended channels in the group
- First channel removes the job from future_jobs and clears remaining future jobs
- Subsequent channels fail with JobIdNotFound when trying to find the same job_id

Test name: aggregated_translator_handles_group_channel_set_new_prev_hash_without_job_id_not_found

Test strategy:
1. Start translator in aggregated mode with MockUpstream
2. Connect 5 miners (all aggregated into same channel)
3. Send initial job (job_id=1) to establish working state
4. Send future job (job_id=2) to GROUP channel ID
5. Send SetNewPrevHash (job_id=2) to GROUP channel ID (bug trigger)
6. Verify translator survives and miners continue submitting shares

Related issue: stratum-mining#223
Related PR: stratum-mining#224
@average-gary
Contributor Author

The recently added test demonstrates the bug exists on main and is fixed by this PR:

Checkout main

git checkout main
git pull origin main

Cherry-pick ONLY the test commit (not the fix)

git cherry-pick e3246b69

Run the test - it should FAIL (translator crashes with JobIdNotFound)

cd integration-tests
RUST_LOG=error cargo test aggregated_translator_handles_group_channel_set_new_prev_hash_without_job_id_not_found -- --nocapture

Expected: Test FAILS - translator crashes, miners get "Connection refused"

Now cherry-pick the fix

git cherry-pick 878a3ba5

Run the test again - it should PASS

RUST_LOG=error cargo test aggregated_translator_handles_group_channel_set_new_prev_hash_without_job_id_not_found -- --nocapture

Expected: Test PASSES - all miners submit shares successfully

average-gary added a commit to average-gary/sv2-apps that referenced this pull request Jan 29, 2026
Development

Successfully merging this pull request may close these issues.

Translator: JobIdNotFound error when multiple miners connect simultaneously in aggregated mode
