feat: native bls verifier #8946

Draft
wemeetagain wants to merge 8 commits into bing/blst-z from cayman/bls-verifier

Conversation

@wemeetagain
Member

Motivation

  • Now that we have lodestar-z blst and pubkey caches integrated in feat(blst): replace blst and pubkeys with lodestar-z #8900, we can take advantage of this to revamp our BLS batch verifier logic.
  • The previous JS worker-thread approach incurred overhead from serializing signature data between threads, managing worker lifecycle, and coordinating job queues in JS. By moving BLS verification into native code and leveraging the libuv thread pool, the new design reduces per-verification overhead, eliminates JS worker management complexity, and provides more predictable performance characteristics.

Description

This PR replaces Lodestar's previous JavaScript-based multi-threaded BLS signature verification system with a new native NAPI BLS verifier powered by @chainsafe/lodestar-z/bls-batch. The old architecture used a custom JS worker-thread pool to parallelize BLS verification across multiple threads. The new architecture eliminates the JS worker pool entirely and instead delegates verification work directly to native code via NAPI, using the libuv thread pool for async concurrency.

Key Changes

New: BlsVerifier class (packages/beacon-node/src/chain/bls/blsVerifier.ts)

A single unified verifier that handles all BLS verification modes:

  • Synchronous main-thread verification — for time-critical paths like gossip blocks where async dispatch latency is unacceptable.
  • Immediate async verification — dispatches work to the libuv thread pool via native async NAPI calls (asyncVerifyIndexed, asyncVerifyAggregate, asyncVerifySingle).
  • Batched verification — accumulates batchable signature sets (up to 32 sets or 100ms) then flushes them as a single batch. On batch failure, retries each caller's sets individually to identify the invalid one(s). This can yield ~50% CPU improvement under normal gossip conditions.
  • Backpressure — tracks inflight jobs against a configurable maximum (40,000) and queues callers when at capacity, preventing unbounded resource consumption.
  • Signature type splitting — splits ISignatureSet arrays into three native-friendly buckets (indexed, aggregate, single) for optimized native codepaths. Indexed sets resolve public keys natively from a shared pubkey cache by validator index.
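
The signature type splitting described in the last bullet can be sketched roughly as follows. This is only an illustration: the set shapes (`IndexedSet`, `AggregateSet`, `SingleSet`), their field names, and the `splitByType` helper are assumptions made for this example and may not match the actual `ISignatureSet` definitions or the code in this PR.

```typescript
// Hypothetical set shapes for illustration; the real ISignatureSet types may differ.
type IndexedSet = {type: "indexed"; index: number; signingRoot: Uint8Array; signature: Uint8Array};
type AggregateSet = {type: "aggregate"; indices: number[]; signingRoot: Uint8Array; signature: Uint8Array};
type SingleSet = {type: "single"; pubkey: Uint8Array; signingRoot: Uint8Array; signature: Uint8Array};
type SignatureSet = IndexedSet | AggregateSet | SingleSet;

type Buckets = {indexed: IndexedSet[]; aggregate: AggregateSet[]; single: SingleSet[]};

/** Split a mixed array of sets into one bucket per set type (hypothetical helper). */
function splitByType(sets: SignatureSet[]): Buckets {
  const buckets: Buckets = {indexed: [], aggregate: [], single: []};
  for (const set of sets) {
    switch (set.type) {
      case "indexed":
        // Carries only a validator index; the pubkey would be resolved natively from the cache
        buckets.indexed.push(set);
        break;
      case "aggregate":
        buckets.aggregate.push(set);
        break;
      case "single":
        buckets.single.push(set);
        break;
    }
  }
  return buckets;
}
```

Each bucket could then be handed to the matching native call (asyncVerifyIndexed, asyncVerifyAggregate, asyncVerifySingle) named above; the exact argument shapes of those calls are not shown in this description.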

New: libuv thread pool sizing (packages/cli/src/setUvThreadPool.ts)

Automatically sizes UV_THREADPOOL_SIZE to match available CPU cores (clamped to 4–32), since the native verifier relies on libuv threads instead of JS workers. Set at the earliest CLI entry point before any async I/O.
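
As a rough sketch of the sizing rule above (clamp the core count to 4–32 and leave an explicit user setting alone), assuming a hypothetical helper named `resolveUvThreadPoolSize` that is not taken from the PR:

```typescript
const MIN_THREADS = 4;
const MAX_THREADS = 32;

/**
 * Decide the value for UV_THREADPOOL_SIZE (hypothetical helper).
 * `cores` is the detected core count, `existing` is the current env value, if any.
 */
function resolveUvThreadPoolSize(cores: number, existing: string | undefined): string {
  // Respect an explicit user override
  if (existing !== undefined && existing !== "") {
    return existing;
  }
  // Clamp the core count into [MIN_THREADS, MAX_THREADS]
  const clamped = Math.min(MAX_THREADS, Math.max(MIN_THREADS, Math.floor(cores)));
  return String(clamped);
}
```

The result would be assigned to `process.env.UV_THREADPOOL_SIZE` before the first use of the thread pool. Keeping the decision in a pure function makes the clamping easy to unit test independently of how the value is applied.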

TODO: this doesn't seem to work; we need a better setup here.

New: Grafana dashboard (dashboards/lodestar_bls_thread_pool.json)

A monitoring dashboard for the new BLS verifier with metrics for inflight jobs, batch sizes, flush durations, buffer wait times, retry rates, and per-sig-set timing.

Removed

It's unfortunate, but this design requires breaking the old interface, and it doesn't seem worth it to maintain both anymore.

  • The entire multithread/ worker pool subsystem (6 files: index.ts, jobItem.ts, poolSize.ts, types.ts, utils.ts, worker.ts).
  • The singleThread.ts fallback verifier.
  • maybeBatch.ts and utils.ts helpers that supported the old architecture.
  • The --chain.blsVerifyAllMultiThread CLI option (no longer needed — all async verification goes through native calls).
  • Associated tests and test infrastructure for the removed worker pool.

Updated

  • IBlsVerifier interface — streamlined; added verifySignatureSetsSameMessage for efficient same-message batch verification (used in attestation validation).
  • chain.ts — constructs the new BlsVerifier directly instead of the old multi-thread pool.
  • Metrics — replaced worker-pool metrics with verifier-specific metrics (inflight jobs, batch flush duration, buffer wait, retry counters, etc.).
  • Tests — updated mocks and test utilities to match the simplified verifier interface.

This is currently deployed on feat3

@gemini-code-assist
Contributor

Summary of Changes

Hello @wemeetagain, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly revamps Lodestar's BLS signature verification logic by transitioning from a JavaScript worker-thread based system to a native NAPI BLS verifier. The motivation behind this change is to overcome the overhead associated with JS worker-thread management, such as data serialization and job coordination, and to leverage native code for improved performance and more predictable characteristics. The new architecture delegates verification tasks directly to native code, utilizing the libuv thread pool for asynchronous concurrency, which is expected to reduce per-verification overhead and enhance overall efficiency, particularly under high-load conditions like gossip processing.

Highlights

  • Native BLS Verifier Implementation: Introduced a new BlsVerifier class that leverages @chainsafe/lodestar-z/bls-batch for native NAPI BLS verification, replacing the previous JavaScript-based worker-thread pool.
  • Flexible Verification Modes: The new BlsVerifier supports synchronous main-thread verification for time-critical operations, immediate async verification via the libuv thread pool, and batched verification for efficiency gains.
  • Optimized Batching and Backpressure: Implemented a batching mechanism that accumulates signature sets (up to 32 sets or 100ms) and flushes them as a single batch, with retry logic for individual invalid sets. Backpressure is applied by tracking inflight jobs against a configurable maximum (40,000).
  • Libuv Thread Pool Sizing: Added logic to automatically size the UV_THREADPOOL_SIZE environment variable to match available CPU cores (clamped to 4–32), ensuring optimal utilization for native async operations.
  • Enhanced Monitoring: A new Grafana dashboard (dashboards/lodestar_bls_thread_pool.json) was added to monitor the native BLS verifier's performance, including metrics for inflight jobs, batch sizes, flush durations, buffer wait times, and retry rates.
  • Removal of Old BLS Verification System: The entire multithread/ worker pool subsystem, singleThread.ts fallback verifier, and associated CLI options and tests have been removed, simplifying the BLS verification architecture.
  • Streamlined Interface and Metrics: The IBlsVerifier interface was streamlined, and existing metrics were updated to reflect the new native verifier's operations, providing more relevant performance insights.
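
The batching behaviour summarized above (buffer up to 32 sets or 100ms, flush as one batch, retry per caller on failure) might look roughly like the sketch below. The `BatchBuffer` class and the injected `verifyBatch` function are hypothetical stand-ins for the native batch call; the real BlsVerifier may structure this differently.

```typescript
type VerifyBatchFn<S> = (sets: S[]) => Promise<boolean>;
type PendingJob<S> = {sets: S[]; resolve: (valid: boolean) => void};

/** Hypothetical buffer: flushes at `maxSets` buffered sets or after `maxWaitMs`. */
class BatchBuffer<S> {
  private jobs: PendingJob<S>[] = [];
  private bufferedSets = 0;
  private timer: ReturnType<typeof setTimeout> | null = null;
  private readonly verifyBatch: VerifyBatchFn<S>;
  private readonly maxSets: number;
  private readonly maxWaitMs: number;

  constructor(verifyBatch: VerifyBatchFn<S>, maxSets = 32, maxWaitMs = 100) {
    this.verifyBatch = verifyBatch;
    this.maxSets = maxSets;
    this.maxWaitMs = maxWaitMs;
  }

  verify(sets: S[]): Promise<boolean> {
    return new Promise((resolve) => {
      this.jobs.push({sets, resolve});
      this.bufferedSets += sets.length;
      if (this.bufferedSets >= this.maxSets) {
        void this.flush();
      } else if (this.timer === null) {
        this.timer = setTimeout(() => void this.flush(), this.maxWaitMs);
      }
    });
  }

  private async flush(): Promise<void> {
    if (this.timer !== null) {
      clearTimeout(this.timer);
      this.timer = null;
    }
    const jobs = this.jobs;
    this.jobs = [];
    this.bufferedSets = 0;
    if (jobs.length === 0) return;

    const all: S[] = [];
    for (const job of jobs) all.push(...job.sets);

    // One batch call for every buffered caller; a thrown error counts as a failed batch
    const allValid = await this.verifyBatch(all).catch(() => false);
    if (allValid) {
      for (const job of jobs) job.resolve(true);
      return;
    }
    // Batch failed: re-verify each caller's sets separately to find the invalid one(s)
    await Promise.all(
      jobs.map(async (job) => job.resolve(await this.verifyBatch(job.sets).catch(() => false)))
    );
  }
}
```

With this shape, one invalid set costs one extra verification per buffered caller rather than failing every caller in the batch.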

Changelog

  • dashboards/lodestar_bls_thread_pool.json
    • Added new Grafana panels for monitoring the native BLS verifier's throughput, latency, batch performance, and same-message/sync verification metrics.
  • packages/beacon-node/src/chain/bls/blsVerifier.ts
    • Added the new BlsVerifier class, implementing synchronous, asynchronous, and batched BLS signature verification using native NAPI calls.
  • packages/beacon-node/src/chain/bls/index.ts
    • Updated exports to expose the new BlsVerifier and removed references to the old BlsMultiThreadWorkerPool and BlsSingleThreadVerifier.
  • packages/beacon-node/src/chain/bls/interface.ts
    • Removed PublicKey import, updated comments for verifyOnMainThread to clarify its synchronous nature, and refined the description for verifySignatureSetsSameMessage to reflect native pubkey resolution by index.
  • packages/beacon-node/src/chain/bls/maybeBatch.ts
    • Removed the maybeBatch.ts file, as its functionality is superseded by the new native batching logic.
  • packages/beacon-node/src/chain/bls/multithread/index.ts
    • Removed the multithread worker pool implementation.
  • packages/beacon-node/src/chain/bls/multithread/jobItem.ts
    • Removed the jobItem.ts file, which defined job queue items for the old multi-thread system.
  • packages/beacon-node/src/chain/bls/multithread/poolSize.ts
    • Removed the poolSize.ts file, which determined the size of the old worker thread pool.
  • packages/beacon-node/src/chain/bls/multithread/types.ts
    • Removed the types.ts file, which defined types for the old multi-thread worker communication.
  • packages/beacon-node/src/chain/bls/multithread/utils.ts
    • Removed the utils.ts file, which contained utilities for the old multi-thread system.
  • packages/beacon-node/src/chain/bls/multithread/worker.ts
    • Removed the worker.ts file, which implemented the worker logic for the old multi-thread system.
  • packages/beacon-node/src/chain/bls/singleThread.ts
    • Removed the singleThread.ts fallback verifier.
  • packages/beacon-node/src/chain/bls/utils.ts
    • Removed the utils.ts file, which contained BLS utility functions for the old system.
  • packages/beacon-node/src/chain/chain.ts
    • Updated the BeaconChain constructor to instantiate the new BlsVerifier directly, removing the conditional logic for BlsSingleThreadVerifier and BlsMultiThreadWorkerPool.
  • packages/beacon-node/src/chain/options.ts
    • Removed blsVerifyAllMainThread and blsVerifyAllMultiThread options from IChainOptions and defaultChainOptions.
  • packages/beacon-node/src/chain/validation/attestation.ts
    • Updated validateGossipAttestationsSameAttData to pass validator index instead of publicKey to verifySignatureSetsSameMessage, leveraging native pubkey resolution.
  • packages/beacon-node/src/metrics/metrics/lodestar.ts
    • Removed old BLS thread pool and single thread metrics, and added new, comprehensive blsVerifier metrics for the native implementation.
  • packages/beacon-node/test/e2e/api/impl/beacon/block/endpoint.test.ts
    • Removed blsVerifyAllMainThread from test configuration.
  • packages/beacon-node/test/e2e/api/impl/beacon/node/endpoints.test.ts
    • Removed blsVerifyAllMainThread from test configurations.
  • packages/beacon-node/test/e2e/api/impl/beacon/state/endpoint.test.ts
    • Removed blsVerifyAllMainThread from test configuration.
  • packages/beacon-node/test/e2e/api/impl/lightclient/endpoint.test.ts
    • Removed blsVerifyAllMainThread from test configuration.
  • packages/beacon-node/test/e2e/api/lodestar/lodestar.test.ts
    • Removed blsVerifyAllMainThread from test configurations.
  • packages/beacon-node/test/e2e/chain/bls/multithread.test.ts
    • Removed the multi-thread BLS test file.
  • packages/beacon-node/test/e2e/chain/lightclient.test.ts
    • Removed blsVerifyAllMainThread from test configuration.
  • packages/beacon-node/test/e2e/chain/proposerBoostReorg.test.ts
    • Removed blsVerifyAllMainThread from test configuration.
  • packages/beacon-node/test/e2e/chain/stateCache/nHistoricalStates.test.ts
    • Removed blsVerifyAllMainThread from test configurations.
  • packages/beacon-node/test/e2e/doppelganger/doppelganger.test.ts
    • Removed blsVerifyAllMainThread from test configuration.
  • packages/beacon-node/test/e2e/sync/finalizedSync.test.ts
    • Removed blsVerifyAllMainThread from test configurations.
  • packages/beacon-node/test/e2e/sync/unknownBlockSync.test.ts
    • Removed blsVerifyAllMainThread from test configurations.
  • packages/beacon-node/test/mocks/mockedBls.ts
    • Removed PublicKey import and updated verifySignatureSetsSameMessage to accept index instead of publicKey.
  • packages/beacon-node/test/perf/chain/validation/attestation.test.ts
    • Removed blsVerifyAllMainThread from test configuration.
  • packages/beacon-node/test/spec/presets/fork_choice.test.ts
    • Removed blsVerifyAllMainThread from test configuration.
  • packages/beacon-node/test/unit/chain/bls/bls.test.ts
    • Updated BLS unit tests to use the new BlsVerifier and getPubkeyCache for native pubkey resolution.
  • packages/beacon-node/test/unit/chain/bls/utils.test.ts
    • Removed the BLS utility test file.
  • packages/beacon-node/test/unit/chain/validation/attestation/validateGossipAttestationsSameAttData.test.ts
    • Updated attestation validation unit tests to use the new BlsVerifier and getPubkeyCache.
  • packages/beacon-node/test/utils/networkWithMockDb.ts
    • Removed blsVerifyAllMainThread from default network options.
  • packages/beacon-node/test/utils/validationData/attestation.ts
    • Removed blsVerifyAllMainThread option and updated the BLS verifier instantiation to use the new BlsVerifier.
  • packages/cli/src/index.ts
    • Added an import for setUvThreadPool.js to ensure early configuration of the libuv thread pool.
  • packages/cli/src/options/beaconNodeOptions/chain.ts
    • Removed chain.blsVerifyAllMultiThread and chain.blsVerifyAllMainThread CLI options.
  • packages/cli/src/setUvThreadPool.ts
    • Added a new file to dynamically size the UV_THREADPOOL_SIZE environment variable based on available CPU cores.
  • packages/cli/test/unit/options/beaconNodeOptions.test.ts
    • Removed chain.blsVerifyAllMultiThread and chain.blsVerifyAllMainThread from test options.
  • packages/state-transition/test/perf/util.ts
    • Updated createPubkeyCache to getPubkeyCache for consistency with native BLS operations.

@wemeetagain
Member Author

@nazarhussain can you re-review :) I moved this from #8900 to here

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

The pull request introduces a native BLS verifier using NAPI and libuv, replacing the previous JavaScript worker-thread pool. This is a significant architectural improvement that should reduce serialization overhead and improve verification throughput. The new BlsVerifier class handles synchronous, asynchronous, and batched verification modes with built-in backpressure. However, there are some issues regarding error handling during shutdown, missing state checks in the batching logic, and potential compatibility issues with older Node.js 18 versions in the thread pool sizing script.

Comment on lines +262 to +265
    } catch {
      // A signature could be malformed, causing a deserialization error
      return false;
    } finally {

high

The catch block in verifyAsync catches all errors, including the Error("BlsVerifier closing") thrown by trackJob when the verifier is shutting down. This causes the verifier to return false (indicating an invalid signature) instead of propagating the closure error. This can lead to incorrect behavior in callers, such as blacklisting valid blocks or attestations during node shutdown. It should distinguish between verification failures and system errors.
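
One way to make that distinction, sketched under the assumption that the closing error gets its own class: `VerifierClosedError` and `verifyOrFalse` are hypothetical names for this example, and the PR itself throws a plain `Error("BlsVerifier closing")`.

```typescript
/** Hypothetical dedicated error type so callers can tell shutdown apart from bad signatures. */
class VerifierClosedError extends Error {
  constructor() {
    super("BlsVerifier closing");
    this.name = "VerifierClosedError";
    // Keeps instanceof working when compiling to ES5 targets
    Object.setPrototypeOf(this, VerifierClosedError.prototype);
  }
}

/** Run a verification job: shutdown errors propagate, everything else maps to `false`. */
async function verifyOrFalse(run: () => Promise<boolean>): Promise<boolean> {
  try {
    return await run();
  } catch (e) {
    if (e instanceof VerifierClosedError) {
      // System-level condition: let the caller see it instead of reporting an invalid signature
      throw e;
    }
    // For example a malformed signature that failed to deserialize
    return false;
  }
}
```

Callers can then treat a rejected promise as "no verdict" and a resolved `false` as "invalid signature", so nothing is penalized during shutdown.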


  /** Run a native async job, waiting for a slot if at capacity */
  private async trackJob<T>(fn: () => Promise<T>): Promise<T> {
    if (this.inflightJobs >= this.maxInflightJobs) {

medium

The trackJob method should check if the verifier is closed at the very beginning. Currently, it only checks this.closed after waiting for a slot in the queue. If there are available slots, it proceeds to execute the job even if close() has been called.

  private async trackJob<T>(fn: () => Promise<T>): Promise<T> {
    if (this.closed) {
      throw Error("BlsVerifier closing");
    }
    if (this.inflightJobs >= this.maxInflightJobs) {
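
For context, a self-contained sketch of how the early `closed` check could sit alongside slot-based backpressure. `JobTracker`, its `waiters` queue, and `close()` are assumptions made for illustration; only `trackJob`, `inflightJobs`, `maxInflightJobs`, and `closed` appear in the reviewed code, and the real method body is not shown here.

```typescript
type Waiter = {resolve: () => void; reject: (e: Error) => void};

/** Hypothetical stand-alone version of the inflight-job tracking. */
class JobTracker {
  private inflightJobs = 0;
  private closed = false;
  private waiters: Waiter[] = [];
  private readonly maxInflightJobs: number;

  constructor(maxInflightJobs: number) {
    this.maxInflightJobs = maxInflightJobs;
  }

  async trackJob<T>(fn: () => Promise<T>): Promise<T> {
    // Reject up front, even when free slots are available
    if (this.closed) {
      throw Error("BlsVerifier closing");
    }
    if (this.inflightJobs >= this.maxInflightJobs) {
      // At capacity: wait until a finishing job hands over its slot
      await new Promise<void>((resolve, reject) => this.waiters.push({resolve, reject}));
    } else {
      this.inflightJobs++;
    }
    try {
      return await fn();
    } finally {
      const next = this.waiters.shift();
      if (next) {
        next.resolve(); // transfer the slot, so the count stays the same
      } else {
        this.inflightJobs--;
      }
    }
  }

  close(): void {
    this.closed = true;
    // Queued callers never received a slot, so rejecting them needs no counter update
    for (const waiter of this.waiters) waiter.reject(Error("BlsVerifier closing"));
    this.waiters = [];
  }
}
```

Handing the slot directly to the next waiter avoids a window in which a new caller could take the freed slot ahead of callers that were already queued.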

  }

  /** Enqueue a batchable job into the buffer */
  private enqueueBatchable(job: PendingJob, priority: boolean): void {

medium

The enqueueBatchable method does not check if the verifier is closed. This allows new batchable jobs to be accepted and buffered even after close() has been called, potentially leading to leaked promises or delayed rejections.

  private enqueueBatchable(job: PendingJob, priority: boolean): void {
    if (this.closed) {
      job.reject(Error("BlsVerifier closing"));
      return;
    }
    if (!this.buffer) {

      // Try aggregate verification first (1 native job)
      const isAllValid = await this.trackJob(() =>
        blsBatch.asyncVerifySameMessage(
          sets.map((s) => ({index: s.index, signature: s.signature})),

medium

The mapping here is redundant as the sets array already contains objects with the required {index, signature} shape. Removing this mapping avoids unnecessary allocations, especially for large batches of attestations.

          sets,

// read once by libuv before the first async I/O, so it must be set at the earliest
// entry point. Respect any explicit user override.

import {availableParallelism} from "node:os";

medium

availableParallelism was introduced in Node.js v18.14.0. Since Lodestar supports Node.js 18.0.0+, this import will cause a crash on older Node.js 18 versions. A fallback to os.cpus().length or a check for the existence of the function should be added.
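
A possible shape for that fallback, using a namespace import so that a missing export does not fail at module load time. The helper name `getCoreCount` is made up for this sketch.

```typescript
import * as os from "node:os";

/** Hypothetical helper: detected core count, with a fallback for older Node.js versions. */
function getCoreCount(): number {
  // os.availableParallelism exists from Node.js v18.14.0 / v19.4.0 onwards
  if (typeof os.availableParallelism === "function") {
    return os.availableParallelism();
  }
  // Older runtimes: fall back to the number of logical CPUs, never less than 1
  return Math.max(1, os.cpus().length);
}
```

A named ESM import of `availableParallelism` fails while the module is being linked if the export is absent, whereas the property lookup above simply yields `undefined` and takes the fallback path.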

 * - Backpressure via `canAcceptWork()` using `inflightJobs` counter against `maxInflightJobs`.
 */
export class BlsVerifier implements IBlsVerifier {
  private maxInflightJobs = 40_000;

medium

The maxInflightJobs limit is hardcoded to 40,000. While this might be a safe upper bound for memory, it is significantly higher than the previous limit (512) and is not configurable via CLI or constructor options, contrary to what the PR description suggests. Consider making this configurable or explaining the choice of 40,000.
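
If the limit were made configurable, a constructor option is one straightforward route. The sketch below is an assumption about how that could look; `BlsVerifierOpts`, `ConfigurableVerifier`, `jobStarted`, and `jobFinished` are hypothetical names, and only `maxInflightJobs`, `inflightJobs`, and `canAcceptWork()` come from the reviewed code.

```typescript
/** Hypothetical options type; not part of the PR. */
type BlsVerifierOpts = {maxInflightJobs?: number};

const DEFAULT_MAX_INFLIGHT_JOBS = 40_000;

class ConfigurableVerifier {
  private readonly maxInflightJobs: number;
  private inflightJobs = 0;

  constructor(opts: BlsVerifierOpts = {}) {
    const max = opts.maxInflightJobs ?? DEFAULT_MAX_INFLIGHT_JOBS;
    if (!Number.isInteger(max) || max < 1) {
      throw new Error(`maxInflightJobs must be a positive integer, got ${max}`);
    }
    this.maxInflightJobs = max;
  }

  /** Backpressure signal for callers: true while there is room for more jobs. */
  canAcceptWork(): boolean {
    return this.inflightJobs < this.maxInflightJobs;
  }

  /** Bookkeeping hooks standing in for the real job lifecycle. */
  jobStarted(): void {
    this.inflightJobs++;
  }

  jobFinished(): void {
    this.inflightJobs--;
  }
}
```

A CLI flag could then pass its value through the same option, with the default unchanged when the flag is absent.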

@spiral-ladder spiral-ladder force-pushed the bing/blst-z branch 2 times, most recently from 52d9206 to 740c395 Compare March 3, 2026 09:06