
fix: Fix incomplete renaming of tool-specific methods, and imports in benchall.py.#5

Open
PacificViking wants to merge 2 commits into y-scope:main from PacificViking:john_fix_assets_rename

Conversation

@PacificViking
Contributor

@PacificViking PacificViking commented Jul 28, 2025

Description

In #1, we renamed some of the asset directories but didn't make the corresponding changes to the imports in benchall.py. In addition, we didn't rename all the tool-specific methods for consistency.

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

Ran scripts/benchall.py successfully.

Summary by CodeRabbit

  • Refactor
    • Updated benchmark class names for consistency across modules.
    • Enhanced error handling by allowing attachment on error during benchmark runs.

@PacificViking PacificViking requested a review from a team as a code owner July 28, 2025 06:09
@coderabbitai

coderabbitai bot commented Jul 28, 2025

Walkthrough

This change renames benchmark classes in four asset modules for consistency and updates all corresponding instantiations and imports. The benchmark orchestration script is updated to reflect the new class names, adjusts the import paths, adds a new parameter to the run function for conditional Docker attachment on error, and modifies the benchmark execution loop to call run with attach_on_error=True while leaving previous calls commented out.
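The attach_on_error flow described above can be sketched as follows. This is a hedged reconstruction from the walkthrough, not the repository's actual run() body: OkBench is an invented stand-in for the Benchmark subclasses, and the Docker attachment step is stubbed out as a print.

```python
def run(bencher, kwargs, bench_target, attach=False, attach_on_error=False):
    """Instantiate one benchmark class and run it; on failure, optionally
    attach for debugging (stubbed out in this sketch)."""
    try:
        bench = bencher(bench_target, **kwargs)
        bench.main()
    except Exception:
        if attach or attach_on_error:
            # In the real script this is where it attaches to the Docker
            # container so the failed environment can be inspected.
            print(f"attaching to debug failure in {bench_target}")
        raise


class OkBench:
    """Invented stand-in for a Benchmark subclass."""

    def __init__(self, target, **kwargs):
        self.target = target

    def main(self):
        print(f"ran {self.target}")


run(OkBench, {}, "data/mongod", attach_on_error=True)
```

The key behavioral change is the `attach or attach_on_error` condition: attachment can now happen on error even when general attachment is disabled.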

Changes

Cohort / File(s) Change Summary
Benchmark class renaming (ClickHouse)
assets/clickhouse/main.py
Renamed class clickhouse_native_json_bench to clickhouse_bench and updated instantiation in main().
Benchmark class renaming (CLP)
assets/clp/main.py
Renamed class clp_s_bench to clp_bench and updated instantiation in main().
Benchmark class renaming (Presto CLP)
assets/presto_clp/main.py
Renamed class clp_presto_bench to presto_clp_bench and updated instantiation in main().
Benchmark class renaming (Presto Parquet)
assets/presto_parquet/main.py
Renamed class parquet_bench to presto_parquet_bench and updated instantiation in main().
Benchmark orchestration and imports
scripts/benchall.py
Updated imports and benchmark list to use new class names and paths; modified run function to add attach_on_error parameter; changed error handling to use this parameter; replaced unconditional run calls with calls using attach_on_error=True; left previous calls and dataset filtering commented out.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~7 minutes



📜 Recent review details

Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d6d1ceb and 3b4aa8e.

📒 Files selected for processing (2)
  • assets/presto_parquet/main.py (2 hunks)
  • scripts/benchall.py (4 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (1)
assets/presto_parquet/main.py (1)
src/template.py (1)
  • Benchmark (55-420)
🪛 Ruff (0.12.2)
scripts/benchall.py

41-41: Trailing comma missing

Add trailing comma

(COM812)


44-44: Trailing comma missing

Add trailing comma

(COM812)


55-55: Missing return type annotation for public function run

Add return type annotation: None

(ANN201)


55-55: Missing type annotation for function argument bencher

(ANN001)


55-55: Missing type annotation for function argument kwargs

(ANN001)


55-55: Missing type annotation for function argument bench_target

(ANN001)


55-55: Boolean default positional argument in function definition

(FBT002)


55-55: Missing type annotation for function argument attach

(ANN001)


55-55: Boolean default positional argument in function definition

(FBT002)


55-55: Missing type annotation for function argument attach_on_error

(ANN001)

🔇 Additional comments (5)
assets/presto_parquet/main.py (2)

18-18: LGTM! Class renamed for consistency.

The class name change from parquet_bench to presto_parquet_bench aligns with the naming conventions established for other benchmark classes in the project.


148-148: LGTM! Instantiation updated correctly.

The class instantiation properly reflects the new class name, maintaining consistency with the class definition change.

scripts/benchall.py (3)

3-4: LGTM! Imports updated correctly.

The import statements have been properly updated to reflect the renamed benchmark classes, maintaining consistency across the codebase.

Also applies to: 6-6, 9-9


72-72: LGTM! Error handling logic enhanced.

The updated condition properly incorporates the new attach_on_error parameter, allowing attachment on error even when attach is False.


84-85: LGTM! Execution flow updated appropriately.

The main execution loop has been updated to use attach_on_error=True, which provides better debugging capabilities when benchmarks encounter errors.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

🔭 Outside diff range comments (2)
assets/overhead_test/Dockerfile (1)

5-11: Add cleanup flags to keep the Ubuntu image slim

After switching to ubuntu:jammy, the layer now includes the full APT cache, which bloats the final image by ~100 MB. Append an apt-get clean plus removal of /var/lib/apt/lists (and optionally --no-install-recommends) so the benchmark container pulls faster and uses less disk.

-RUN apt-get update \
-    && DEBIAN_FRONTEND=noninteractive apt-get install -y \
-    tmux \
-    vim
+RUN apt-get update \
+    && DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
+       tmux \
+       vim \
+    && apt-get clean \
+    && rm -rf /var/lib/apt/lists/*
assets/presto_clp/main.py (1)

19-23: Class name violates PEP 8 – prefer PascalCase

Python classes should use CapWords style. Renaming presto_clp_bench to PrestoClpBench improves readability and matches the rest of the ecosystem.

-class presto_clp_bench(Benchmark):
+class PrestoClpBench(Benchmark):

If you adopt this change, remember to update the instantiation at line 105 as well:

-bench = presto_clp_bench(sys.argv[1])
+bench = PrestoClpBench(sys.argv[1])
♻️ Duplicate comments (3)
assets/template/Dockerfile (1)

5-11: Mirror the image-size cleanup used elsewhere

Same remark as in assets/overhead_test/Dockerfile: add --no-install-recommends, apt-get clean and cache removal to avoid shipping unnecessary packages and APT metadata.
Keeping all Dockerfiles consistent also simplifies maintenance.

assets/clp/main.py (1)

8-10: Consistent class casing

Like the Presto benchmark, clp_bench should be ClpBench to follow PEP 8.
(This is the same naming concern; no functional impact.)

assets/clickhouse/main.py (1)

16-19: Apply PEP 8 class-name casing

Change clickhouse_bench → ClickhouseBench (and update usages) for consistency with standard Python conventions and the other benchmarks once renamed.

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 22783a0 and 0834906.

📒 Files selected for processing (9)
  • assets/clickhouse/main.py (2 hunks)
  • assets/clp/main.py (2 hunks)
  • assets/elasticsearch/Dockerfile (1 hunks)
  • assets/gzip/Dockerfile (1 hunks)
  • assets/gzip/main.py (1 hunks)
  • assets/overhead_test/Dockerfile (1 hunks)
  • assets/presto_clp/main.py (2 hunks)
  • assets/template/Dockerfile (1 hunks)
  • scripts/benchall.py (5 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (3)
assets/clp/main.py (1)
src/template.py (1)
  • Benchmark (55-420)
assets/gzip/main.py (1)
src/template.py (2)
  • docker_execute (240-272)
  • bench_start (293-307)
assets/presto_clp/main.py (1)
src/template.py (1)
  • Benchmark (55-420)
🪛 Checkov (3.2.334)
assets/elasticsearch/Dockerfile

[LOW] 1-24: Ensure that HEALTHCHECK instructions have been added to container images

(CKV_DOCKER_2)


[LOW] 1-24: Ensure that a user for the container has been created

(CKV_DOCKER_3)

🪛 Hadolint (2.12.0)
assets/gzip/Dockerfile

[warning] 8-8: Pin versions in apt get install. Instead of apt-get install <package> use apt-get install <package>=<version>

(DL3008)


[info] 8-8: Delete the apt-get lists after installing something

(DL3009)


[info] 8-8: Avoid additional packages by specifying --no-install-recommends

(DL3015)

🪛 Ruff (0.12.2)
assets/gzip/main.py

40-40: Function call with shell=True parameter identified, security issue

(S604)


42-42: Missing return type annotation for public function search

Add return type annotation: int

(ANN201)


42-42: Missing type annotation for function argument query

(ANN001)


46-46: Use of assert detected

(S101)


48-48: Function call with shell=True parameter identified, security issue

(S604)


50-50: Missing return type annotation for public function bench_search

Add return type annotation: None

(ANN201)


50-50: Boolean default positional argument in function definition

(FBT002)


50-50: Missing type annotation for function argument cold

(ANN001)


56-56: Function call with shell=True parameter identified, security issue

(S604)

scripts/benchall.py

41-41: Trailing comma missing

Add trailing comma

(COM812)


55-55: Missing return type annotation for public function run

Add return type annotation: None

(ANN201)


55-55: Missing type annotation for function argument bencher

(ANN001)


55-55: Missing type annotation for function argument kwargs

(ANN001)


55-55: Missing type annotation for function argument bench_target

(ANN001)


55-55: Boolean default positional argument in function definition

(FBT002)


55-55: Missing type annotation for function argument attach

(ANN001)


55-55: Boolean default positional argument in function definition

(FBT002)


55-55: Missing type annotation for function argument attach_on_error

(ANN001)

🔇 Additional comments (11)
assets/gzip/Dockerfile (1)

5-5: Verify compatibility with the new base image.

The change from the custom ghcr.io/y-scope/clp/clp-core-dependencies-x86-ubuntu-jammy:main image to the standard ubuntu:jammy image removes dependencies that may have been included in the custom image. Ensure that all required tools and libraries for the gzip benchmark are still available.

assets/elasticsearch/Dockerfile (1)

1-1: Consistent base image update aligns with project standardization.

The change from the custom CLP image to ubuntu:jammy is consistent with similar changes across other asset Dockerfiles. This standardization simplifies maintenance and reduces dependencies on custom images.

assets/gzip/main.py (4)

40-40: Appropriate use of shell=True for output redirection.

The addition of shell=True is necessary for the shell redirection (>) to work properly within the Docker container. This change aligns with the base image update in the Dockerfile, which may have affected shell behaviour.


48-48: Consistent shell parameter usage for redirection commands.

The shell=True parameter is correctly added to handle the output redirection. This maintains consistency with the other shell commands in the file.


56-56: Proper shell handling for decompression command.

The addition of shell=True ensures the gunzip command with output redirection executes correctly in the Docker environment.


40-84: Security consideration: shell=True usage is justified.

While static analysis flags the use of shell=True as a security concern, it's necessary here for shell features like output redirection. The commands use controlled file paths and don't include user input, mitigating the security risk.

The security risk is acceptable in this context because:

  1. Commands use predefined file paths (GZIP_FILE_PATH, DECOMPRESSED_FILE_PATH)
  2. No user input is directly interpolated into shell commands
  3. Shell features (redirection) are required for functionality
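The redirection point can be demonstrated in isolation. This minimal sketch is not the benchmark's docker_execute wrapper, just a plain subprocess call on a POSIX shell showing why '>' requires shell=True.

```python
import os
import subprocess
import tempfile

# '>' is interpreted by the shell, not by the program being run, so the
# redirection only takes effect when the command string goes through /bin/sh.
# Without shell=True, '>' and the path would be passed to echo as literal
# arguments.
tmp_dir = tempfile.mkdtemp()
out_path = os.path.join(tmp_dir, "out.txt")

subprocess.run(f"echo hello > {out_path}", shell=True, check=True)

with open(out_path) as f:
    content = f.read().strip()
print(content)
```

As the review notes, this is acceptable here because the paths are controlled constants, not user input.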
scripts/benchall.py (5)

3-9: Import fixes successfully address renamed benchmark classes.

The updated imports correctly reflect the renamed benchmark classes, which aligns perfectly with the PR objective "Fix renamed assets import in benchall.py". The changes maintain consistency across the benchmarking framework.


36-53: Benchmark list updated to use new class names.

The benchmarks list correctly references the renamed benchmark functions, ensuring the orchestration script works with the updated class names.


55-55: Enhanced error handling with attach_on_error parameter.

The addition of the attach_on_error parameter provides better debugging capabilities by allowing Docker container attachment specifically when errors occur, even if general attachment is disabled.


73-77: Improved exception handling logic.

The updated condition if attach or attach_on_error: properly implements the new error-specific attachment behaviour, enhancing debugging capabilities without always requiring full attachment.


83-86: No change needed for dataset restriction
The data/ directory currently contains only the mongod subdirectory, so the if dataset_name != 'mongod' check has no effect on today’s benchmarks.

• If you add more datasets in the future, you may revisit or remove this filter then.

Comment on lines 8 to 12
 RUN apt-get update \
-    && DEBIAN_FRONTEND=noninteractive apt-get install -y \
+    && DEBIAN_FRONTEND=noninteractive apt-get install -y --allow-downgrades \
     tmux \
     vim \
     gzip

🧹 Nitpick (assertive)

Consider Docker best practices for package management.

While not critical for functionality, consider implementing these Docker best practices:

  • Pin package versions for reproducible builds
  • Clean up apt cache to reduce image size
  • Use --no-install-recommends to avoid unnecessary packages
 RUN apt-get update \
-    && DEBIAN_FRONTEND=noninteractive apt-get install -y --allow-downgrades \
-    tmux \
-    vim \
-    gzip
+    && DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
+    tmux=3.2a-4build1 \
+    vim=2:8.2.3458-2ubuntu2.2 \
+    gzip=1.10-4ubuntu4.1 \
+    && apt-get clean \
+    && rm -rf /var/lib/apt/lists/*
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-RUN apt-get update \
-    && DEBIAN_FRONTEND=noninteractive apt-get install -y --allow-downgrades \
-    tmux \
-    vim \
-    gzip
+RUN apt-get update \
+    && DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
+    tmux=3.2a-4build1 \
+    vim=2:8.2.3458-2ubuntu2.2 \
+    gzip=1.10-4ubuntu4.1 \
+    && apt-get clean \
+    && rm -rf /var/lib/apt/lists/*
🧰 Tools
🪛 Checkov (3.2.334)

[LOW] 1-12: Ensure that HEALTHCHECK instructions have been added to container images

(CKV_DOCKER_2)


[LOW] 1-12: Ensure that a user for the container has been created

(CKV_DOCKER_3)

🪛 Hadolint (2.12.0)

[warning] 8-8: Pin versions in apt get install. Instead of apt-get install <package> use apt-get install <package>=<version>

(DL3008)


[info] 8-8: Delete the apt-get lists after installing something

(DL3009)


[info] 8-8: Avoid additional packages by specifying --no-install-recommends

(DL3015)

🤖 Prompt for AI Agents
In assets/gzip/Dockerfile around lines 8 to 12, improve the package installation
by pinning package versions to ensure reproducible builds, adding the
--no-install-recommends flag to avoid installing unnecessary packages, and
cleaning up the apt cache after installation to reduce the final image size.
Modify the RUN command to include these changes accordingly.

 # Install necessary packages
 RUN apt-get update \
-    && DEBIAN_FRONTEND=noninteractive apt-get install -y \
+    && DEBIAN_FRONTEND=noninteractive apt-get install -y --allow-downgrades \

@coderabbitai coderabbitai bot Jul 28, 2025


🧹 Nitpick (assertive)

Consider removing the --allow-downgrades flag if not necessary.

The addition of --allow-downgrades suggests potential package version conflicts. This flag should only be used if package downgrades are explicitly required for compatibility.

-    && DEBIAN_FRONTEND=noninteractive apt-get install -y --allow-downgrades \
+    && DEBIAN_FRONTEND=noninteractive apt-get install -y \
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-    && DEBIAN_FRONTEND=noninteractive apt-get install -y --allow-downgrades \
+    && DEBIAN_FRONTEND=noninteractive apt-get install -y \
🤖 Prompt for AI Agents
In assets/gzip/Dockerfile at line 9, the use of the --allow-downgrades flag in
the apt-get install command may be unnecessary and could introduce package
version conflicts. Review the package requirements and if no downgrades are
explicitly needed for compatibility, remove the --allow-downgrades flag from the
apt-get install command to avoid unintended downgrades.

Contributor Author


--allow-downgrades allows the benchmark to run on the specified version even when ubuntu has a newer version.


@PacificViking Thank you for the clarification! I understand now that the --allow-downgrades flag is intentionally used to ensure the benchmark can run with specific package versions for consistency, even when Ubuntu's repositories have newer versions available. This makes perfect sense in a benchmarking context where version consistency is crucial for reliable results.

I appreciate you explaining the rationale behind this decision - it helps me provide better suggestions in the future.

  /|   /|  
 ( :v:  )
  |(_)|

✏️ Learnings added
Learnt from: PacificViking
PR: y-scope/log-archival-bench#5
File: assets/gzip/Dockerfile:0-0
Timestamp: 2025-07-30T13:55:45.493Z
Learning: In benchmarking Docker files, the --allow-downgrades flag in apt-get install commands is intentionally used to ensure benchmarks can run on specified package versions even when Ubuntu has newer versions, maintaining consistency for reliable benchmark results.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.

@PacificViking PacificViking changed the title Fix renamed assets import in benchall.py fix: Fix renamed assets import in benchall.py Jul 28, 2025
@PacificViking PacificViking changed the title fix: Fix renamed assets import in benchall.py fix: Fix renamed assets import in benchall.py. Jul 28, 2025
@PacificViking PacificViking force-pushed the john_fix_assets_rename branch from 0834906 to d6d1ceb Compare July 28, 2025 19:14

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0834906 and d6d1ceb.

📒 Files selected for processing (4)
  • assets/clickhouse/main.py (2 hunks)
  • assets/clp/main.py (2 hunks)
  • assets/presto_clp/main.py (2 hunks)
  • scripts/benchall.py (5 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (2)
assets/presto_clp/main.py (1)
src/template.py (1)
  • Benchmark (55-420)
assets/clickhouse/main.py (1)
src/template.py (1)
  • Benchmark (55-420)
🪛 Ruff (0.12.2)
scripts/benchall.py

41-41: Trailing comma missing

Add trailing comma

(COM812)


55-55: Missing return type annotation for public function run

Add return type annotation: None

(ANN201)


55-55: Missing type annotation for function argument bencher

(ANN001)


55-55: Missing type annotation for function argument kwargs

(ANN001)


55-55: Missing type annotation for function argument bench_target

(ANN001)


55-55: Boolean default positional argument in function definition

(FBT002)


55-55: Missing type annotation for function argument attach

(ANN001)


55-55: Boolean default positional argument in function definition

(FBT002)


55-55: Missing type annotation for function argument attach_on_error

(ANN001)

🔇 Additional comments (9)
assets/presto_clp/main.py (2)

19-19: LGTM! Class renamed for consistency.

The class name change from clp_presto_bench to presto_clp_bench aligns with the standardization effort across benchmark modules and follows a consistent naming pattern.


105-105: LGTM! Instantiation updated correctly.

The class instantiation correctly uses the new class name presto_clp_bench.

assets/clp/main.py (2)

8-8: LGTM! Class renamed for consistency.

The class name change from clp_s_bench to clp_bench simplifies the naming and aligns with the standardization effort across benchmark modules.


64-64: LGTM! Instantiation updated correctly.

The class instantiation correctly uses the new class name clp_bench.

assets/clickhouse/main.py (2)

16-16: LGTM! Class renamed for consistency.

The class name change from clickhouse_native_json_bench to clickhouse_bench simplifies the naming and aligns with the standardization effort across benchmark modules.


157-157: LGTM! Instantiation updated correctly.

The class instantiation correctly uses the new class name clickhouse_bench.

scripts/benchall.py (3)

3-9: LGTM! Import statements updated correctly.

All import statements have been properly updated to reflect the renamed benchmark classes across the different modules.


73-73: LGTM! Exception handling logic improved.

The updated condition correctly implements the new attach_on_error functionality, allowing Docker attachment on errors even when attach is False.


83-86: Clarify the ‘mongod’-only filter in benchall.py

I noticed that under scripts/benchall.py (lines 83–86), the code now skips all datasets except ‘mongod’:

if dataset_name != 'mongod':  # only use mongod for now
    continue
run(bencher, kwargs, bench_target, attach_on_error=True)

However, your data/ directory contains multiple datasets (cockroachdb, elasticsearch, mongod, postgresql, spark-event-logs).
Please confirm whether restricting to ‘mongod’ is a deliberate, permanent change. If it’s only for testing, consider one of the following:

  • Add a TODO comment indicating it’s temporary.
  • Make the filter configurable (e.g. via a CLI flag or config file).
  • Document the rationale in the README.

[scripts/benchall.py:83–86]
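The "make the filter configurable" option above could look like the following. This is a hypothetical sketch: the --datasets flag and the should_run helper are invented for illustration and are not part of scripts/benchall.py.

```python
import argparse

# Invented CLI flag: restrict the benchmark run to named datasets instead of
# hard-coding `if dataset_name != 'mongod'` in the loop.
parser = argparse.ArgumentParser()
parser.add_argument(
    "--datasets",
    nargs="*",
    default=None,
    help="Only run these datasets; omit the flag to run all of them.",
)


def should_run(dataset_name, selected):
    # None means no filter was given: run every dataset.
    return selected is None or dataset_name in selected


args = parser.parse_args(["--datasets", "mongod"])
print(should_run("mongod", args.datasets))      # True
print(should_run("postgresql", args.datasets))  # False
```

A flag like this would preserve today's mongod-only behavior when requested while keeping the other datasets (cockroachdb, elasticsearch, postgresql, spark-event-logs) reachable without a code change.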

Comment on lines +36 to 45
         (clp_bench, {}),
         (clickhouse_bench, {
             'manual_column_names': False,
             'keys': [],
             'additional_order_by': [],
             'timestamp_key': True
             }),
-        (clp_presto_bench, {
+        (presto_clp_bench, {
             'dataset_variation': "cleaned_log"
             }),

🧹 Nitpick (assertive)

LGTM! Benchmark list updated correctly.

The benchmarks list has been properly updated to use the new class names, maintaining the same configuration parameters.

Consider adding a trailing comma after the dictionary value on line 44 for consistency:

        (presto_clp_bench, {
-            'dataset_variation': "cleaned_log"
+            'dataset_variation': "cleaned_log",
             }),

Committable suggestion skipped: line range outside the PR's diff.

🧰 Tools
🪛 Ruff (0.12.2)

41-41: Trailing comma missing

Add trailing comma

(COM812)


44-44: Trailing comma missing

Add trailing comma

(COM812)

🤖 Prompt for AI Agents
In scripts/benchall.py around lines 36 to 45, add a trailing comma after the
dictionary on line 44 to maintain consistency in the list formatting. This means
placing a comma after the closing brace of the dictionary assigned to
'dataset_variation' in the presto_clp_bench tuple.

 ]

-def run(bencher, kwargs, bench_target, attach=False):
+def run(bencher, kwargs, bench_target, attach=False, attach_on_error=False):

🛠️ Refactor suggestion

Add type annotations and consider parameter naming.

The function signature has been extended appropriately with the attach_on_error parameter. However, several improvements could be made for code quality.

Apply this diff to add type annotations and improve the function signature:

-def run(bencher, kwargs, bench_target, attach=False, attach_on_error=False):
+def run(bencher: type, kwargs: dict, bench_target: Path, attach: bool = False, attach_on_error: bool = False) -> None:

If bench_target is annotated as Path, make sure the file imports it at the top:

+from pathlib import Path

Committable suggestion skipped: line range outside the PR's diff.

🧰 Tools
🪛 Ruff (0.12.2)

55-55: Missing return type annotation for public function run

Add return type annotation: None

(ANN201)


55-55: Missing type annotation for function argument bencher

(ANN001)


55-55: Missing type annotation for function argument kwargs

(ANN001)


55-55: Missing type annotation for function argument bench_target

(ANN001)


55-55: Boolean default positional argument in function definition

(FBT002)


55-55: Missing type annotation for function argument attach

(ANN001)


55-55: Boolean default positional argument in function definition

(FBT002)


55-55: Missing type annotation for function argument attach_on_error

(ANN001)

🤖 Prompt for AI Agents
In scripts/benchall.py at line 55, the function run lacks type annotations and
could benefit from clearer parameter naming. Add appropriate type annotations to
the function signature for all parameters, including bencher, kwargs,
bench_target, attach, and attach_on_error. Also, rename parameters if needed to
improve clarity and maintain consistency. Import any necessary typing modules at
the top of the file to support these annotations.
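Beyond the annotations, making the two booleans keyword-only would also address the FBT002 findings above. The parameter types below are assumptions, and DummyBench is invented for illustration; this is a sketch, not the repository's signature.

```python
def run(bencher: type, kwargs: dict, bench_target: str, *,
        attach: bool = False, attach_on_error: bool = False) -> None:
    # The bare '*' forces attach and attach_on_error to be passed by keyword,
    # which resolves FBT002 (boolean default positional argument) alongside
    # the ANN201/ANN001 annotation findings.
    bencher(bench_target, **kwargs)


class DummyBench:
    """Invented stand-in for a Benchmark subclass."""

    def __init__(self, target: str) -> None:
        self.target = target


run(DummyBench, {}, "data/mongod", attach_on_error=True)
print("ok")
```

With this signature, a call like run(bencher, kwargs, target, True) fails fast with a TypeError, so attach can never be set positionally by accident.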

@kirkrodrigues kirkrodrigues changed the title fix: Fix renamed assets import in benchall.py. fix(: Fix incomplete renaming of imports in benchall.py. Jul 30, 2025
@kirkrodrigues kirkrodrigues changed the title fix(: Fix incomplete renaming of imports in benchall.py. fix: Fix incomplete renaming of imports in benchall.py. Jul 30, 2025
@kirkrodrigues kirkrodrigues changed the title fix: Fix incomplete renaming of imports in benchall.py. fix: Fix incomplete renaming of tool-specific methods, and imports in benchall.py. Jul 30, 2025
Member

@kirkrodrigues kirkrodrigues Jul 30, 2025


There's a reference to clp_presto in a comment on line 60.

 from assets.clickhouse.main import clickhouse_bench
 from assets.sparksql.main import sparksql_bench
-from assets.parquet.main import parquet_bench
+from assets.presto_parquet.main import parquet_bench
Member


Can we rename parquet_bench to presto_parquet_bench for consistency?

Comment on lines 83 to 84
if dataset_name != 'mongod': # only use mongod for now
continue
Member


Did you mean to commit this?

Contributor Author


Nope.

-run(bencher, kwargs, bench_target)
 if dataset_name != 'mongod': # only use mongod for now
     continue
 #run(bencher, kwargs, bench_target)
Member


Can we delete this?

Contributor Author


I have a change for it but I can't cherry pick it from the initial commit, and it doesn't fit in the scope of this PR. I'll open a separate PR for it when these get merged (or I can put it here).

Contributor

@Bill-hbrhbr Bill-hbrhbr left a comment


Will be fixed one by one in future PRs.
