
Conversation

@spencer-tb spencer-tb commented Nov 14, 2025

🗒️ Description

This PR adds a CLI tool benchmark_parser to automatically scan benchmark tests and generate a configuration file .fixed_opcode_counts.json for the --fixed-opcode-count feature from #1747.

Key Changes

  • New CLI tool: uv run benchmark_parser
    • Uses Python AST to scan tests/benchmark/ for tests with @pytest.mark.repricing marker
    • Extracts opcode patterns from @pytest.mark.parametrize decorators
    • Generates .fixed_opcode_counts.json at repo root with opcode counts mapping
    • Supports --check mode for CI validation: uv run benchmark_parser --check
  • Config file format: .fixed_opcode_counts.json
    • Gitignored (user-local configuration)
    • All patterns default to [1] (1K opcodes)
    • Users can customize counts per pattern (e.g. [1, 10, 100] for 1K, 10K, and 100K opcodes) by manually editing the file
    • Custom counts are preserved when re-running the parser.
  • Help text improvements:
    • Added benchmark options to fill --fill-help and execute remote --execute-remote-help
    • Simplified help text with examples for --gas-benchmark-values and --fixed-opcode-count
  • Test updates:
    • Renamed op parameters to opcode in test_arithmetic.py for consistency
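The AST-based scan described above can be sketched roughly as follows. This is an illustrative, self-contained example only: the names `find_repricing_tests` and `_marker_name` are hypothetical and not the PR's actual API.

```python
# Hypothetical sketch of the AST scan the PR describes: find test functions
# carrying @pytest.mark.repricing. Names here are illustrative only.
import ast


def _marker_name(dec: ast.expr) -> str:
    """Return the mark name for a pytest.mark.* decorator, '' otherwise."""
    target = dec.func if isinstance(dec, ast.Call) else dec
    if (
        isinstance(target, ast.Attribute)
        and isinstance(target.value, ast.Attribute)
        and isinstance(target.value.value, ast.Name)
        and target.value.value.id == "pytest"
        and target.value.attr == "mark"
    ):
        return target.attr
    return ""


def find_repricing_tests(source: str) -> list[str]:
    """Names of test functions marked @pytest.mark.repricing."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef) and node.name.startswith("test_"):
            if any(_marker_name(d) == "repricing" for d in node.decorator_list):
                found.append(node.name)
    return found


src = '''
import pytest

@pytest.mark.repricing
@pytest.mark.parametrize("opcode", [1, 2])
def test_add(opcode):
    pass

def test_other():
    pass
'''
print(find_repricing_tests(src))  # ['test_add']
```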

Usage

  1. Generate/update config (first time or after benchmark test changes): uv run benchmark_parser
  2. Customize counts by editing .fixed_opcode_counts.json:
{
  "scenario_configs": {
    "test_codecopy.*": [
      1
    ],
    ...
  }
}
  3. Run with configured opcode counts:
# Fill fixtures (useful for fast one-shot checks when the count is 1)
uv run fill --fixed-opcode-count --fork Prague -m repricing tests/benchmark

# Execute on remote RPC
uv run execute remote --fixed-opcode-count --fork Prague -m repricing tests/benchmark --rpc-seed-key <key> --rpc-endpoint <url> --chain-id <id>

Fill works correctly, and I tested execute remote on Hoodi with the 1K opcode count set; the latest txs: https://hoodi.etherscan.io/address/0x83fd666bfb2b345f932c3e4e04b6d85e5ed3568d

Future Items

  • Add CI for fill/execute with --fixed-opcode-count after generating the config file.
  • Verify --fixed-opcode-count with debug_traceTransaction using execute hive.
  • Add documentation & framework tests.

🔗 Related Issues or PRs

#1747

✅ Checklist

  • All: Ran fast tox checks to avoid unnecessary CI fails, see also Code Standards and Enabling Pre-commit Checks:
    uvx tox -e static
  • All: PR title adheres to the repo standard - it will be used as the squash commit message and should start with type(scope):.
  • All: Considered adding an entry to CHANGELOG.md.
  • All: Considered updating the online docs in the ./docs/ directory.
  • All: Set appropriate labels for the changes (only maintainers can apply labels).

@spencer-tb spencer-tb added C-enhance Category: an improvement or new feature A-test-benchmark Area: Tests Benchmarks—Performance measurement (eg. `tests/benchmark/*`, `p/t/s/e/benchmark/*`) P-high labels Nov 15, 2025

@LouisTsai-Csie LouisTsai-Csie left a comment


Thanks a lot for this! I left some suggestions, but I’m happy to discuss further. I’ll share this with Kamil to confirm it aligns with their needs.

Comment on lines 165 to 184
if has_repricing:
    if fixed_opcode_counts:
        opcode_counts = [
            int(x.strip()) for x in fixed_opcode_counts.split(",")
opcode_counts_to_use = None

This contains a few logic issues, but they will be resolved in my upcoming PR.

@spencer-tb spencer-tb force-pushed the enhance/benchmarking/fixed-opcode-count-config branch from 7ca99dc to 7c68d19 on December 4, 2025 16:34
codecov bot commented Dec 4, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 87.31%. Comparing base (2a6f9ee) to head (4427257).
⚠️ Report is 1 commit behind head on forks/osaka.

Additional details and impacted files
@@             Coverage Diff              @@
##           forks/osaka    #1790   +/-   ##
============================================
  Coverage        87.31%   87.31%           
============================================
  Files              541      541           
  Lines            32832    32832           
  Branches          3015     3015           
============================================
  Hits             28668    28668           
  Misses            3557     3557           
  Partials           607      607           
Flag Coverage Δ
unittests 87.31% <ø> (ø)

Flags with carried forward coverage won't be shown.



@LouisTsai-Csie LouisTsai-Csie left a comment


Added some suggestions for the parser! Thanks

Comment on lines 9 to 11
uv run python parser.py --check # Check for new/missing entries (CI)
uv run python parser.py --dry-run # Show config without changes
uv run python parser.py --update # Update fixed_opcode_counts.py

I'm not sure, but this doesn't seem to work for me. I still need to specify the path, like:

uv run tests/benchmark/configs/parser.py --check

I also wonder whether we should place this script under packages/testing/src/execution_testing/cli/, since most related scripts, such as extract_config and diff_opcode_count, are located there. wdyt?

Comment on lines 410 to 414
parser.add_argument(
    "--no-filter",
    action="store_true",
    help="Include all benchmark tests, not just those with code_generator",
)

My personal view: we could remove this and related logic, wdyt?

Looking into the fixed-opcode-count logic, only tests that are (1) BenchmarkTestFiller and (2) use a code_generator can support this mode.

This constraint was added in PR #1810 with a comment.

Comment on lines 37 to 49
if not node.name.startswith("test_"):
    self.generic_visit(node)
    return

# Check if function has benchmark_test parameter
if not self._has_benchmark_test_param(node):
    self.generic_visit(node)
    return

# Optional: Filter for code generator usage
if self.filter_code_gen and not self._uses_code_generator(node):
    self.generic_visit(node)
    return

I’m not sure whether we need self.generic_visit(node) in each of the if–else branches. My understanding is that once a condition matches, it will recursively visit the nested function anyway.

This seems relevant only for cases like:

def test_function1(...):

    def test_function2(...):
        ...

If so, maybe we could remove the logic here:

Suggested change

Before:

if not node.name.startswith("test_"):
    self.generic_visit(node)
    return
# Check if function has benchmark_test parameter
if not self._has_benchmark_test_param(node):
    self.generic_visit(node)
    return
# Optional: Filter for code generator usage
if self.filter_code_gen and not self._uses_code_generator(node):
    self.generic_visit(node)
    return

After:

if not node.name.startswith("test_"):
    return
# Check if function has benchmark_test parameter
if not self._has_benchmark_test_param(node):
    return
# Optional: Filter for code generator usage
if self.filter_code_gen and not self._uses_code_generator(node):
    return
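The behaviour in question is easy to check with a minimal visitor (illustrative only; `Collector` and `SRC` are hypothetical names): a `visit_FunctionDef` that returns without calling `generic_visit` does not descend into nested function definitions, which is the only situation the early `generic_visit` calls cover.

```python
# Minimal check of the generic_visit question above (illustrative only):
# without calling generic_visit inside visit_FunctionDef, nested defs
# are never visited.
import ast

SRC = """
def test_outer():
    def test_inner():
        pass
"""


class Collector(ast.NodeVisitor):
    def __init__(self, recurse: bool):
        self.recurse = recurse
        self.seen: list[str] = []

    def visit_FunctionDef(self, node: ast.FunctionDef) -> None:
        self.seen.append(node.name)
        if self.recurse:
            self.generic_visit(node)  # descend into the function body


shallow = Collector(recurse=False)
shallow.visit(ast.parse(SRC))
print(shallow.seen)  # ['test_outer']

deep = Collector(recurse=True)
deep.visit(ast.parse(SRC))
print(deep.seen)  # ['test_outer', 'test_inner']
```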

Comment on lines 100 to 102
# Check if "opcode" or "op" is in parameter names
if "opcode" not in param_str and param_str not in ("op", "op,"):
    continue

I would suggest using only opcode as the keyword, instead of three combinations. This needs detailed documentation!

This would require us to refactor the parametrization for the following tests:

  • test_arithmetic.py: test_mod (op), test_mod_arithmetic (op),

Comment on lines 138 to 157
# Case 1: Op.ADD
if isinstance(node, ast.Attribute):
    return node.attr

# Case 2: pytest.param(Op.ADD, ...) or pytest.param((Op.ADD, x), ...)
if isinstance(node, ast.Call):
    if len(node.args) > 0:
        first_arg = node.args[0]
        if isinstance(first_arg, ast.Attribute):
            return first_arg.attr
        elif isinstance(first_arg, ast.Tuple) and first_arg.elts:
            first_elem = first_arg.elts[0]
            if isinstance(first_elem, ast.Attribute):
                return first_elem.attr

# Case 3: Tuple like (Op.ADD, args)
if isinstance(node, ast.Tuple) and node.elts:
    first_elem = node.elts[0]
    if isinstance(first_elem, ast.Attribute):
        return first_elem.attr

More explanation for the logic here! Suggestion:

Case 1: Direct opcode reference
Example: @pytest.mark.parametrize("opcode", [Op.ADD, Op.MUL])
Result: ADD, MUL

Case 2a: Pytest param with tuple
Example: @pytest.mark.parametrize("opcode,args", [pytest.param((Op.ADD, 123))])
Result: ADD

Case 2b: Plain tuples
Example: @pytest.mark.parametrize("opcode,args", [(Op.ADD, 123), (Op.MUL, 456)])
Result: ADD, MUL

And here we assume the parametrized name is always "opcode"; this should be documented somewhere.


IIUC, this would not be recognized

@pytest.mark.parametrize("opcode_args,opcode", [pytest.param((123, Op.ADD))])

Comment on lines 205 to 221
categories: dict[str, list[str]] = {
    "ACCOUNT QUERY OPERATIONS": [],
    "ARITHMETIC OPERATIONS": [],
    "BITWISE OPERATIONS": [],
    "BLOCK CONTEXT OPERATIONS": [],
    "CALL CONTEXT OPERATIONS": [],
    "COMPARISON OPERATIONS": [],
    "CONTROL FLOW OPERATIONS": [],
    "HASHING OPERATIONS": [],
    "LOGGING OPERATIONS": [],
    "MEMORY OPERATIONS": [],
    "STACK OPERATIONS": [],
    "STORAGE OPERATIONS": [],
    "SYSTEM OPERATIONS": [],
    "TRANSACTION CONTEXT OPERATIONS": [],
    "OTHER": [],
}

The current keyword-based solution requires manually updating the mapping, but I notice a pattern here:

"ACCOUNT QUERY OPERATIONS": [
      "selfbalance",
      "codesize",
      "ext_account",
      "BALANCE",
      "EXTCODE",
  ],

The key (ACCOUNT QUERY OPERATIONS) here could be derived from the file name:
test_account_query_operations.py. So when we loop over the benchmark tests, we could also record each test's parent file (in the scan_benchmark_tests function).

And then update the categorize_patterns function like this:

    ...

    categories: dict[str, list[str]] = {}

    for pattern in config.keys():
        if pattern not in pattern_sources:
            # Fallback for patterns without source info
            category = "OTHER"
        else:
            source_file = pattern_sources[pattern]
            # Extract test file name: test_arithmetic.py -> arithmetic
            file_name = source_file.stem  # Gets filename without extension
            if file_name.startswith("test_"):
                category_name = file_name[5:]  # Remove "test_" prefix
                # arithmetic -> ARITHMETIC OPERATIONS
                category = (
                    f"{category_name.upper().replace('_', ' ')} OPERATIONS"
                )
            else:
                category = "OTHER"

        if category not in categories:
            categories[category] = []
        categories[category].append(pattern)

    # Sort patterns within each category and sort categories alphabetically
    return {k: sorted(v) for k, v in sorted(categories.items())}
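The filename-to-category derivation suggested above can be checked in isolation (illustrative sketch; `category_for` is a hypothetical name wrapping the same stem-based logic):

```python
# Standalone check of the filename-to-category derivation suggested above
# (illustrative only): test_arithmetic.py -> "ARITHMETIC OPERATIONS".
from pathlib import Path


def category_for(source_file: Path) -> str:
    file_name = source_file.stem  # filename without extension
    if file_name.startswith("test_"):
        name = file_name[len("test_"):]  # remove "test_" prefix
        return f"{name.upper().replace('_', ' ')} OPERATIONS"
    return "OTHER"


print(category_for(Path("tests/benchmark/test_arithmetic.py")))  # ARITHMETIC OPERATIONS
print(category_for(Path("tests/benchmark/conftest.py")))         # OTHER
```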


Based on this approach, we could remove category_keywords

@spencer-tb spencer-tb force-pushed the enhance/benchmarking/fixed-opcode-count-config branch 2 times, most recently from 5506d25 to 14b6b64 on December 5, 2025 19:15
@spencer-tb spencer-tb force-pushed the enhance/benchmarking/fixed-opcode-count-config branch from 14b6b64 to 697a7d0 on December 5, 2025 19:18
@spencer-tb spencer-tb marked this pull request as ready for review December 5, 2025 19:33
