
feat(consume): add consume enginex simulator #1765


Draft · wants to merge 5 commits into main from feat/consume-enginex

Conversation

@danceratopz (Member) commented Jun 18, 2025

🗒️ Description

Adds a simulator, consume enginex, that runs BlockchainEngineXFixture against clients and has the potential to speed up consensus test execution by 10-50x.

Running the Simulator & Initial Results

I've been testing locally with a subset of 1846 Cancun tests that create 29 pre-allocation groups:

uv run fill --output=fixtures --clean -x -m "not zkevm and not slow" tests/cancun --fork=Cancun --evm-bin=../evmone/build/bin/evmone-t8n -n 8 --generate-pre-alloc-groups

Then consume against a hive dev server:

uv run consume enginex --input=fixtures-cancun --durations=5 --dist=loadgroup -n 8 --enginex-fcu-frequency=0

Results for 1846 tests with 29 groups (i.e., 29 client initializations):

  • reth: 85.37s (0:01:25).
  • besu: 361.71s (0:06:01).
    (client versions as of the Pectra fork).

FCU Behavior

# Disable all FCUs (fastest execution)
uv run consume enginex --enginex-fcu-frequency=0 fixtures/

# FCU every test (current behavior)  
uv run consume enginex --enginex-fcu-frequency=1 fixtures/

# FCU every 3rd test per pre-allocation group
uv run consume enginex --enginex-fcu-frequency=3 fixtures/
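For intuition, here is a minimal sketch of how a frequency tracker with these semantics could work; the method and counter names are illustrative assumptions, not necessarily the PR's implementation:

from dataclasses import dataclass, field
from typing import Dict


@dataclass
class FCUFrequencyTracker:
    """Illustrative sketch: decide whether to send an FCU after each test.

    fcu_frequency=0 disables FCUs entirely; fcu_frequency=N sends an FCU
    after every Nth test within a pre-allocation group.
    """

    fcu_frequency: int = 1
    # Number of tests executed so far, keyed by pre-allocation group.
    test_counts: Dict[str, int] = field(default_factory=dict)

    def should_perform_fcu(self, group_id: str) -> bool:
        """Return True if the current test in group_id should send an FCU."""
        if self.fcu_frequency == 0:
            return False
        count = self.test_counts.get(group_id, 0) + 1
        self.test_counts[group_id] = count
        return count % self.fcu_frequency == 0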

Xdist Behavior

Tests are distributed to xdist workers by pre-allocation group using --dist=loadgroup:

uv run consume enginex --input=fixtures-cancun --durations=5 --dist=loadgroup -n 8 --enginex-fcu-frequency=0

🔗 Related Issues

✅ Checklist

  • All: Set appropriate labels for the changes.
  • All: Considered squashing commits to improve commit history.
  • All: Added an entry to CHANGELOG.md.
  • All: Considered updating the online docs in the ./docs/ directory.
  • Tests: Ran mkdocs serve locally and verified the auto-generated docs for new tests in the Test Case Reference are correctly formatted.

@danceratopz danceratopz self-assigned this Jun 18, 2025
@danceratopz danceratopz added the type:feat (type: Feature) and scope:consume (Scope: Consume command suite) labels on Jun 18, 2025
@danceratopz danceratopz force-pushed the feat/consume-enginex branch from a2411f5 to b5a12fb on June 25, 2025 08:17
@danceratopz danceratopz force-pushed the feat/consume-enginex branch 2 times, most recently from c49dd35 to 19cf929 on June 30, 2025 14:38
@spencer-tb (Contributor) left a comment:

Just some small comments (mostly optional suggestions)! Will continue reviewing tomorrow :)

Still amazed at the speed up!!


# Store on session for later retrieval by test_tracker fixture
session._pre_alloc_group_counts = group_counts
logger.info(f"Collected {len(group_counts)} groups with tests: {dict(group_counts)}")
Contributor:

We could consider adding {dict(group_counts)} to the debug log level while keeping the group count at the info log level. I only say this as on my first run this output looked quite sporadic :D
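A sketch of the suggested split, reusing the variables from the snippet above:

logger.info(f"Collected {len(group_counts)} groups with tests.")
logger.debug(f"Group counts: {dict(group_counts)}")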

fcu_frequency = getattr(request.config, "enginex_fcu_frequency", 1)

tracker = FCUFrequencyTracker(fcu_frequency=fcu_frequency)
logger.info(f"FCU frequency tracker initialized with frequency: {fcu_frequency}")
Contributor:

Maybe only log if not 0 (the default), assuming the above; i.e., only log if the flag is provided.

Suggested change
logger.info(f"FCU frequency tracker initialized with frequency: {fcu_frequency}")

@danceratopz (Member Author):

Or debug level logs.


def __init__(self, max_group_size: int = 400):
    """Initialize the mapper with a maximum group size."""
    self.max_group_size = max_group_size
Contributor:

It could be nice to add another check here for max group size, re division by zero, even though the flag checks this.
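A sketch of such a guard, assuming it lives in XDistGroupMapper.__init__:

def __init__(self, max_group_size: int = 400):
    """Initialize the mapper with a maximum group size."""
    if max_group_size <= 0:  # guard against division by zero downstream
        raise ValueError(f"max_group_size must be positive, got {max_group_size}")
    self.max_group_size = max_group_size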

CACHED_DOWNLOADS_DIRECTORY = (
    Path(platformdirs.user_cache_dir("ethereum-execution-spec-tests")) / "cached_downloads"
)


class XDistGroupMapper:
Contributor:

I'm very happy with this here already, but it might be nice to consider moving it to its own file in helpers!

Comment on lines +100 to +112
group_counts = {}
for item in items:
    if hasattr(item, "callspec") and "test_case" in item.callspec.params:
        test_case = item.callspec.params["test_case"]
        if hasattr(test_case, "pre_hash"):
            # Get group identifier from xdist marker if available
            group_identifier = None
            for marker in item.iter_markers("xdist_group"):
                if hasattr(marker, "kwargs") and "name" in marker.kwargs:
                    group_identifier = marker.kwargs["name"]
                    break

            # Fallback to pre_hash if no xdist marker (sequential execution)
            if group_identifier is None:
                group_identifier = test_case.pre_hash

            group_counts[group_identifier] = group_counts.get(group_identifier, 0) + 1
Contributor:

I believe we are duplicating this logic in both the if and the else. Maybe we can shift it above the if/else?
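One way to do that is a small helper capturing the lookup once (a hypothetical helper, just to illustrate the shape):

def get_group_identifier(item, test_case) -> str:
    """Return the xdist group name, falling back to the pre-allocation hash."""
    for marker in item.iter_markers("xdist_group"):
        if hasattr(marker, "kwargs") and "name" in marker.kwargs:
            return marker.kwargs["name"]
    return test_case.pre_hash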

Comment on lines +31 to +35
Args:
    request: The pytest request object containing test metadata
    pre_hash: The pre-allocation group hash

Returns:
    Group identifier string to use for client tracking
Contributor:

Personal nit again, as I'm not a huge fan of these in the docstrings. There are some more like this within this file. I won't push hard though, just personal preference.

@marioevz (Member) left a comment:

I gave this a quick look and I have a couple of comments; we can chat about it when you have time.

Comment on lines +349 to +353
def __new__(cls) -> "MultiTestClientManager":
    """Ensure only one instance of MultiTestClientManager exists."""
    if cls._instance is None:
        cls._instance = super().__new__(cls)
        cls._instance._initialized = False
    return cls._instance
@marioevz (Member):

Is the purpose of this to have a single client instance across all xdist workers?
If so, I don't think it'll work because multi_test_client_manager(), even though marked with scope="session", will have one instance per worker.
If we want two different multi_test_client_manager instances from two different workers to access the same client, we need to have inter-process comms.
Here's an example:

@pytest.fixture(autouse=True, scope="session")
def client(
base_hive_test: HiveTest,
client_files: dict,
environment: dict,
client_type: ClientType,
session_temp_folder: Path,
) -> Generator[Client, None, None]:
"""Initialize the client with the appropriate files and environment variables."""
base_name = "hive_client"
base_file = session_temp_folder / base_name
base_error_file = session_temp_folder / f"{base_name}.err"
base_lock_file = session_temp_folder / f"{base_name}.lock"
client: Client | None = None
with FileLock(base_lock_file):
if not base_error_file.exists():
if base_file.exists():
with open(base_file, "r") as f:
client = Client(**json.load(f))
else:
base_error_file.touch() # Assume error
client = base_hive_test.start_client(
client_type=client_type, environment=environment, files=client_files
)
if client is not None:
base_error_file.unlink() # Success
with open(base_file, "w") as f:
json.dump(
asdict(replace(client, config=None)), # type: ignore
f,
)
error_message = (
f"Unable to connect to the client container ({client_type.name}) via Hive during test "
"setup. Check the client or Hive server logs for more information."
)
assert client is not None, error_message
users_file_name = f"{base_name}_users"
users_file = session_temp_folder / users_file_name
users_lock_file = session_temp_folder / f"{users_file_name}.lock"
with FileLock(users_lock_file):
if users_file.exists():
with open(users_file, "r") as f:
users = json.load(f)
else:
users = 0
users += 1
with open(users_file, "w") as f:
json.dump(users, f)
yield client
with FileLock(users_lock_file):
with open(users_file, "r") as f:
users = json.load(f)
users -= 1
with open(users_file, "w") as f:
json.dump(users, f)
if users == 0:
client.stop()
base_file.unlink()
users_file.unlink()

@danceratopz (Member Author) commented Jun 30, 2025:

This refers to only creating one instance of the client manager. It doesn't impose any constraints on how those clients are used by the manager.

It's a "multi-test client manager" in the sense "it manages a client on which multiple tests can be executed". However, what clearly needs better documentation/explanation is that this client will only ever execute payloads from tests from exactly one pre-alloc-group and never will from a different pre-alloc group. There's a 1-to-1 mapping between (sub-)groups and clients; "sub-" is explained below.

This behavior is enabled by the xdist flag --dist=loadgroup, which distributes tests by group, where a group is defined as a pre-alloc group. Unless --enginex-max-group-size=N is used, in which case sub-groups of at most N tests are created from the pre-alloc group, and each sub-group then maps 1-to-1 to a client instance.

The label is applied here (comment should be improved):

# Add xdist group marker for load balancing
markers.append(pytest.mark.xdist_group(name=xdist_group_name))

This means that we only ever send payloads from tests from a specific (sub-)group to any one worker (and a group corresponds to a client instance). So there is never any cross-worker interaction with regard to clients. Each worker is responsible for one client at a time, and on that single client, the tests from a (sub-)group are executed sequentially. Therefore, there's no need to lock clients. It's quite the trick, but it appears to work adequately.

Btw, I need to add --dist=loadgroup to enginex's pytest_configure() by default. This is a small change that's on the todo list.

Do you think there's an argument to allow multiple workers to send payloads to the same client instance?

@danceratopz (Member Author) commented Jun 30, 2025:

About --enginex-max-group-size=N: if you have 8 groups and 8 workers, but say group-0 has 1000 tests and all other groups have 100 tests, then we'll clearly be limited by the worker that gets group-0. If we set this flag to 100, then we'll get 19 groups of equal size across 8 workers. Not optimal, but much better.
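The splitting arithmetic this implies, as a sketch (illustrative only; the PR's XDistGroupMapper may split differently):

import math


def split_group(pre_hash: str, test_ids: list, max_group_size: int) -> dict:
    """Split one pre-allocation group into sub-groups of at most max_group_size tests.

    Each sub-group maps 1-to-1 to a client instance; e.g., a group of 1000
    tests with max_group_size=100 yields 10 sub-groups.
    """
    num_subgroups = math.ceil(len(test_ids) / max_group_size)
    return {
        f"{pre_hash}_{i}": test_ids[i * max_group_size : (i + 1) * max_group_size]
        for i in range(num_subgroups)
    }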

Contributor:

Setting the default to --enginex-max-group-size=100 seems reasonable imo!



@dataclass
class FCUFrequencyTracker:
@marioevz (Member):

This one will also need inter-process comms, and then we might need to lock the client when we send an FCU to a specific client.
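Following the FileLock pattern from the client fixture above, a cross-worker FCU counter might look something like this (a sketch, assuming workers share a session_temp_folder):

import json
from pathlib import Path

from filelock import FileLock


def increment_group_test_count(session_temp_folder: Path, group_id: str) -> int:
    """Atomically bump and return the per-group test counter across xdist workers."""
    counts_file = session_temp_folder / "fcu_test_counts.json"
    lock_file = session_temp_folder / "fcu_test_counts.lock"
    with FileLock(lock_file):
        counts = json.loads(counts_file.read_text()) if counts_file.exists() else {}
        counts[group_id] = counts.get(group_id, 0) + 1
        counts_file.write_text(json.dumps(counts))
        return counts[group_id]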

@@ -150,7 +168,7 @@ def test_blockchain_via_engine(
             f"Unexpected error code: {e.code}, expected: {payload.error_code}"
         ) from e

-    if payload.valid():
+    if payload.valid() and should_perform_fcus:
         with payload_timing.time(
@marioevz (Member):

A couple of questions:

  • Should we lock the client here?
  • Should we FCU back to genesis after the FCU is completed?
  • If we don't FCU back to genesis, what happens when:
    • On test A, we FCU to block A, where G <- A; the canonical head is now A.
    • On test B, we new-payload to block B, where G <- B?

@danceratopz danceratopz force-pushed the feat/consume-enginex branch from 743e6f7 to f2ec3eb on July 1, 2025 08:48
@spencer-tb (Contributor) commented Jul 1, 2025:

Just some additional ideas for enginex (which we already spoke about), either in this PR or in follow-ups:

  • Always apply --dist=loadgroup as the default. To me it makes sense to make this the default behaviour, and it looks dangerous for this not to be the case. My first run was on the hive server with -n 16 and no --dist=loadgroup. This led to the creation of 200+ instances of geth and maxed-out file usage for my user 😆

  • Automatic, optimised group/test distribution based primarily on available cores, with performance as the primary goal. This will be different for running all the static tests vs. just the Fusaka tests. My message from Discord is below.

It would be cool to derive the equation that gives us an optimum (using the worst-case client times), where -n, --enginex-max-group-size, and the number of tests are the parameters. Then maybe define a flag --auto-parallelize that sets -n and --enginex-max-group-size accordingly for approximately the fastest runtime based on the number of tests. This flag would only be used in CI or on the hive server, etc.!
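Under simplifying assumptions (a fixed client-startup cost, a uniform per-test cost, and evenly split groups), that equation might take a shape like this sketch; all parameters here are assumed, not measured:

import math


def estimated_runtime(
    total_tests: int, max_group_size: int, n_workers: int,
    client_init_s: float, per_test_s: float,
) -> float:
    """Rough wall-clock estimate: rounds of sub-groups across workers,
    each sub-group paying one client initialization plus its tests."""
    num_groups = math.ceil(total_tests / max_group_size)
    rounds = math.ceil(num_groups / n_workers)
    return rounds * (client_init_s + max_group_size * per_test_s)

Minimizing this over max_group_size for a given n_workers is roughly the optimisation --auto-parallelize would perform.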

@danceratopz danceratopz force-pushed the feat/consume-enginex branch from 9855463 to 6e7140e on July 15, 2025 11:19
danceratopz and others added 5 commits August 11, 2025 11:09
Also removes a file that was unintentionally added in #1718.
This creates pre_hash subgroups with a maximum size for better xdist balancing.

- Replace function name check with fixture format detection to properly distinguish between the engine and enginex simulators. Both use the same test function name but process different fixture formats:

  - Engine simulator: "blockchain_test_engine" → standard parametrization
  - EngineX simulator: "blockchain_test_engine_x" → enhanced parametrization with xdist group splitting and load balancing

- Fixes unit test failures by checking for the presence of the 'test_case' parameter instead of maintaining an allowlist of function names.
@danceratopz danceratopz force-pushed the feat/consume-enginex branch from e91d29b to cdf62c0 on August 11, 2025 10:53