fix: Fix incomplete renaming of tool-specific methods, and imports in benchall.py.#5
fix: Fix incomplete renaming of tool-specific methods, and imports in benchall.py.#5PacificViking wants to merge 2 commits intoy-scope:mainfrom
Conversation
WalkthroughThis change renames benchmark classes in four asset modules for consistency and updates all corresponding instantiations and imports. The benchmark orchestration script is updated to reflect the new class names, adjusts the import paths, adds a new parameter to the Changes
Estimated code review effort🎯 2 (Simple) | ⏱️ ~7 minutes Note ⚡️ Unit Test Generation is now available in beta!Learn more here, or try it out under "Finishing Touches" below. 📜 Recent review detailsConfiguration used: .coderabbit.yaml 📒 Files selected for processing (2)
🧰 Additional context used🧬 Code Graph Analysis (1)assets/presto_parquet/main.py (1)
🪛 Ruff (0.12.2)scripts/benchall.py41-41: Trailing comma missing Add trailing comma (COM812) 44-44: Trailing comma missing Add trailing comma (COM812) 55-55: Missing return type annotation for public function Add return type annotation: (ANN201) 55-55: Missing type annotation for function argument (ANN001) 55-55: Missing type annotation for function argument (ANN001) 55-55: Missing type annotation for function argument (ANN001) 55-55: Boolean default positional argument in function definition (FBT002) 55-55: Missing type annotation for function argument (ANN001) 55-55: Boolean default positional argument in function definition (FBT002) 55-55: Missing type annotation for function argument (ANN001) 🔇 Additional comments (5)
✨ Finishing Touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
Documentation and Community
|
There was a problem hiding this comment.
Actionable comments posted: 2
🔭 Outside diff range comments (2)
assets/overhead_test/Dockerfile (1)
5-11: Add cleanup flags to keep the Ubuntu image slimAfter switching to
ubuntu:jammy, the layer now includes the full APT cache, which bloats the final image by ~100 MB. Append anapt-get cleanplus removal of/var/lib/apt/lists(and optionally--no-install-recommends) so the benchmark container pulls faster and uses less disk.-RUN apt-get update \ - && DEBIAN_FRONTEND=noninteractive apt-get install -y \ - tmux \ - vim +RUN apt-get update \ + && DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \ + tmux \ + vim \ + && apt-get clean \ + && rm -rf /var/lib/apt/lists/*assets/presto_clp/main.py (1)
19-23: Class name violates PEP 8 – preferPascalCasePython classes should use
CapWordsstyle. Renamingpresto_clp_benchtoPrestoClpBenchimproves readability and matches the rest of the ecosystem.-class presto_clp_bench(Benchmark): +class PrestoClpBench(Benchmark):If you adopt this change, remember to update the instantiation at line 105 as well:
-bench = presto_clp_bench(sys.argv[1]) +bench = PrestoClpBench(sys.argv[1])
♻️ Duplicate comments (3)
assets/template/Dockerfile (1)
5-11: Mirror the image-size cleanup used elsewhereSame remark as in
assets/overhead_test/Dockerfile: add--no-install-recommends,apt-get cleanand cache removal to avoid shipping unnecessary packages and APT metadata.
Keeping all Dockerfiles consistent also simplifies maintenance.assets/clp/main.py (1)
8-10: Consistent class casingLike the Presto benchmark,
clp_benchshould beClpBenchto follow PEP 8.
(This is the same naming concern; no functional impact.)assets/clickhouse/main.py (1)
16-19: Apply PEP 8 class-name casingChange
clickhouse_bench→ClickhouseBench(and update usages) for consistency with standard Python conventions and the other benchmarks once renamed.
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (9)
assets/clickhouse/main.py(2 hunks)assets/clp/main.py(2 hunks)assets/elasticsearch/Dockerfile(1 hunks)assets/gzip/Dockerfile(1 hunks)assets/gzip/main.py(1 hunks)assets/overhead_test/Dockerfile(1 hunks)assets/presto_clp/main.py(2 hunks)assets/template/Dockerfile(1 hunks)scripts/benchall.py(5 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (3)
assets/clp/main.py (1)
src/template.py (1)
Benchmark(55-420)
assets/gzip/main.py (1)
src/template.py (2)
docker_execute(240-272)bench_start(293-307)
assets/presto_clp/main.py (1)
src/template.py (1)
Benchmark(55-420)
🪛 Checkov (3.2.334)
assets/elasticsearch/Dockerfile
[LOW] 1-24: Ensure that HEALTHCHECK instructions have been added to container images
(CKV_DOCKER_2)
[LOW] 1-24: Ensure that a user for the container has been created
(CKV_DOCKER_3)
🪛 Hadolint (2.12.0)
assets/gzip/Dockerfile
[warning] 8-8: Pin versions in apt get install. Instead of apt-get install <package> use apt-get install <package>=<version>
(DL3008)
[info] 8-8: Delete the apt-get lists after installing something
(DL3009)
[info] 8-8: Avoid additional packages by specifying --no-install-recommends
(DL3015)
🪛 Ruff (0.12.2)
assets/gzip/main.py
40-40: Function call with shell=True parameter identified, security issue
(S604)
42-42: Missing return type annotation for public function search
Add return type annotation: int
(ANN201)
42-42: Missing type annotation for function argument query
(ANN001)
46-46: Use of assert detected
(S101)
48-48: Function call with shell=True parameter identified, security issue
(S604)
50-50: Missing return type annotation for public function bench_search
Add return type annotation: None
(ANN201)
50-50: Boolean default positional argument in function definition
(FBT002)
50-50: Missing type annotation for function argument cold
(ANN001)
56-56: Function call with shell=True parameter identified, security issue
(S604)
scripts/benchall.py
41-41: Trailing comma missing
Add trailing comma
(COM812)
55-55: Missing return type annotation for public function run
Add return type annotation: None
(ANN201)
55-55: Missing type annotation for function argument bencher
(ANN001)
55-55: Missing type annotation for function argument kwargs
(ANN001)
55-55: Missing type annotation for function argument bench_target
(ANN001)
55-55: Boolean default positional argument in function definition
(FBT002)
55-55: Missing type annotation for function argument attach
(ANN001)
55-55: Boolean default positional argument in function definition
(FBT002)
55-55: Missing type annotation for function argument attach_on_error
(ANN001)
🔇 Additional comments (11)
assets/gzip/Dockerfile (1)
5-5: Verify compatibility with the new base image.The change from the custom
ghcr.io/y-scope/clp/clp-core-dependencies-x86-ubuntu-jammy:mainimage to the standardubuntu:jammyimage removes dependencies that may have been included in the custom image. Ensure that all required tools and libraries for the gzip benchmark are still available.assets/elasticsearch/Dockerfile (1)
1-1: Consistent base image update aligns with project standardization.The change from the custom CLP image to
ubuntu:jammyis consistent with similar changes across other asset Dockerfiles. This standardization simplifies maintenance and reduces dependencies on custom images.assets/gzip/main.py (4)
40-40: Appropriate use ofshell=Truefor output redirection.The addition of
shell=Trueis necessary for the shell redirection (>) to work properly within the Docker container. This change aligns with the base image update in the Dockerfile, which may have affected shell behaviour.
48-48: Consistent shell parameter usage for redirection commands.The
shell=Trueparameter is correctly added to handle the output redirection. This maintains consistency with the other shell commands in the file.
56-56: Proper shell handling for decompression command.The addition of
shell=Trueensures the gunzip command with output redirection executes correctly in the Docker environment.
40-84: Security consideration:shell=Trueusage is justified.While static analysis flags the use of
shell=Trueas a security concern, it's necessary here for shell features like output redirection. The commands use controlled file paths and don't include user input, mitigating the security risk.The security risk is acceptable in this context because:
- Commands use predefined file paths (
GZIP_FILE_PATH,DECOMPRESSED_FILE_PATH)- No user input is directly interpolated into shell commands
- Shell features (redirection) are required for functionality
scripts/benchall.py (5)
3-9: Import fixes successfully address renamed benchmark classes.The updated imports correctly reflect the renamed benchmark classes, which aligns perfectly with the PR objective "Fix renamed assets import in benchall.py". The changes maintain consistency across the benchmarking framework.
36-53: Benchmark list updated to use new class names.The benchmarks list correctly references the renamed benchmark functions, ensuring the orchestration script works with the updated class names.
55-55: Enhanced error handling withattach_on_errorparameter.The addition of the
attach_on_errorparameter provides better debugging capabilities by allowing Docker container attachment specifically when errors occur, even if general attachment is disabled.
73-77: Improved exception handling logic.The updated condition
if attach or attach_on_error:properly implements the new error-specific attachment behaviour, enhancing debugging capabilities without always requiring full attachment.
83-86: No change needed for dataset restriction
Thedata/directory currently contains only themongodsubdirectory, so theif dataset_name != 'mongod'check has no effect on today’s benchmarks.• If you add more datasets in the future, you may revisit or remove this filter then.
| RUN apt-get update \ | ||
| && DEBIAN_FRONTEND=noninteractive apt-get install -y \ | ||
| && DEBIAN_FRONTEND=noninteractive apt-get install -y --allow-downgrades \ | ||
| tmux \ | ||
| vim \ | ||
| gzip |
There was a problem hiding this comment.
🧹 Nitpick (assertive)
Consider Docker best practices for package management.
While not critical for functionality, consider implementing these Docker best practices:
- Pin package versions for reproducible builds
- Clean up apt cache to reduce image size
- Use
--no-install-recommendsto avoid unnecessary packages
RUN apt-get update \
- && DEBIAN_FRONTEND=noninteractive apt-get install -y --allow-downgrades \
- tmux \
- vim \
- gzip
+ && DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
+ tmux=3.2a-4build1 \
+ vim=2:8.2.3458-2ubuntu2.2 \
+ gzip=1.10-4ubuntu4.1 \
+ && apt-get clean \
+ && rm -rf /var/lib/apt/lists/*📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| RUN apt-get update \ | |
| && DEBIAN_FRONTEND=noninteractive apt-get install -y \ | |
| && DEBIAN_FRONTEND=noninteractive apt-get install -y --allow-downgrades \ | |
| tmux \ | |
| vim \ | |
| gzip | |
| RUN apt-get update \ | |
| && DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \ | |
| tmux=3.2a-4build1 \ | |
| vim=2:8.2.3458-2ubuntu2.2 \ | |
| gzip=1.10-4ubuntu4.1 \ | |
| && apt-get clean \ | |
| && rm -rf /var/lib/apt/lists/* |
🧰 Tools
🪛 Checkov (3.2.334)
[LOW] 1-12: Ensure that HEALTHCHECK instructions have been added to container images
(CKV_DOCKER_2)
[LOW] 1-12: Ensure that a user for the container has been created
(CKV_DOCKER_3)
🪛 Hadolint (2.12.0)
[warning] 8-8: Pin versions in apt get install. Instead of apt-get install <package> use apt-get install <package>=<version>
(DL3008)
[info] 8-8: Delete the apt-get lists after installing something
(DL3009)
[info] 8-8: Avoid additional packages by specifying --no-install-recommends
(DL3015)
🤖 Prompt for AI Agents
In assets/gzip/Dockerfile around lines 8 to 12, improve the package installation
by pinning package versions to ensure reproducible builds, adding the
--no-install-recommends flag to avoid installing unnecessary packages, and
cleaning up the apt cache after installation to reduce the final image size.
Modify the RUN command to include these changes accordingly.
assets/gzip/Dockerfile
Outdated
| # Install necessary packages | ||
| RUN apt-get update \ | ||
| && DEBIAN_FRONTEND=noninteractive apt-get install -y \ | ||
| && DEBIAN_FRONTEND=noninteractive apt-get install -y --allow-downgrades \ |
There was a problem hiding this comment.
🧹 Nitpick (assertive)
Consider removing the --allow-downgrades flag if not necessary.
The addition of --allow-downgrades suggests potential package version conflicts. This flag should only be used if package downgrades are explicitly required for compatibility.
- && DEBIAN_FRONTEND=noninteractive apt-get install -y --allow-downgrades \
+ && DEBIAN_FRONTEND=noninteractive apt-get install -y \📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| && DEBIAN_FRONTEND=noninteractive apt-get install -y --allow-downgrades \ | |
| && DEBIAN_FRONTEND=noninteractive apt-get install -y \ |
🤖 Prompt for AI Agents
In assets/gzip/Dockerfile at line 9, the use of the --allow-downgrades flag in
the apt-get install command may be unnecessary and could introduce package
version conflicts. Review the package requirements and if no downgrades are
explicitly needed for compatibility, remove the --allow-downgrades flag from the
apt-get install command to avoid unintended downgrades.
There was a problem hiding this comment.
--allow-downgrades allows the benchmark to run on the specified version even when ubuntu has a newer version.
There was a problem hiding this comment.
@PacificViking Thank you for the clarification! I understand now that the --allow-downgrades flag is intentionally used to ensure the benchmark can run with specific package versions for consistency, even when Ubuntu's repositories have newer versions available. This makes perfect sense in a benchmarking context where version consistency is crucial for reliable results.
I appreciate you explaining the rationale behind this decision - it helps me provide better suggestions in the future.
/| /|
( :v: )
|(_)|
✏️ Learnings added
Learnt from: PacificViking
PR: y-scope/log-archival-bench#5
File: assets/gzip/Dockerfile:0-0
Timestamp: 2025-07-30T13:55:45.493Z
Learning: In benchmarking Docker files, the --allow-downgrades flag in apt-get install commands is intentionally used to ensure benchmarks can run on specified package versions even when Ubuntu has newer versions, maintaining consistency for reliable benchmark results.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
0834906 to
d6d1ceb
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (4)
assets/clickhouse/main.py(2 hunks)assets/clp/main.py(2 hunks)assets/presto_clp/main.py(2 hunks)scripts/benchall.py(5 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (2)
assets/presto_clp/main.py (1)
src/template.py (1)
Benchmark(55-420)
assets/clickhouse/main.py (1)
src/template.py (1)
Benchmark(55-420)
🪛 Ruff (0.12.2)
scripts/benchall.py
41-41: Trailing comma missing
Add trailing comma
(COM812)
55-55: Missing return type annotation for public function run
Add return type annotation: None
(ANN201)
55-55: Missing type annotation for function argument bencher
(ANN001)
55-55: Missing type annotation for function argument kwargs
(ANN001)
55-55: Missing type annotation for function argument bench_target
(ANN001)
55-55: Boolean default positional argument in function definition
(FBT002)
55-55: Missing type annotation for function argument attach
(ANN001)
55-55: Boolean default positional argument in function definition
(FBT002)
55-55: Missing type annotation for function argument attach_on_error
(ANN001)
🔇 Additional comments (9)
assets/presto_clp/main.py (2)
19-19: LGTM! Class renamed for consistency.The class name change from
clp_presto_benchtopresto_clp_benchaligns with the standardization effort across benchmark modules and follows a consistent naming pattern.
105-105: LGTM! Instantiation updated correctly.The class instantiation correctly uses the new class name
presto_clp_bench.assets/clp/main.py (2)
8-8: LGTM! Class renamed for consistency.The class name change from
clp_s_benchtoclp_benchsimplifies the naming and aligns with the standardization effort across benchmark modules.
64-64: LGTM! Instantiation updated correctly.The class instantiation correctly uses the new class name
clp_bench.assets/clickhouse/main.py (2)
16-16: LGTM! Class renamed for consistency.The class name change from
clickhouse_native_json_benchtoclickhouse_benchsimplifies the naming and aligns with the standardization effort across benchmark modules.
157-157: LGTM! Instantiation updated correctly.The class instantiation correctly uses the new class name
clickhouse_bench.scripts/benchall.py (3)
3-9: LGTM! Import statements updated correctly.All import statements have been properly updated to reflect the renamed benchmark classes across the different modules.
73-73: LGTM! Exception handling logic improved.The updated condition correctly implements the new
attach_on_errorfunctionality, allowing Docker attachment on errors even whenattachis False.
83-86: Clarify the ‘mongod’-only filter in benchall.pyI noticed that under scripts/benchall.py (lines 83–86), the code now skips all datasets except ‘mongod’:
if dataset_name != 'mongod': # only use mongod for now continue run(bencher, kwargs, bench_target, attach_on_error=True)However, your data/ directory contains multiple datasets (cockroachdb, elasticsearch, mongod, postgresql, spark-event-logs).
Please confirm whether restricting to ‘mongod’ is a deliberate, permanent change. If it’s only for testing, consider one of the following:
- Add a TODO comment indicating it’s temporary.
- Make the filter configurable (e.g. via a CLI flag or config file).
- Document the rationale in the README.
[scripts/benchall.py:83–86]
| (clp_bench, {}), | ||
| (clickhouse_bench, { | ||
| 'manual_column_names': False, | ||
| 'keys': [], | ||
| 'additional_order_by': [], | ||
| 'timestamp_key': True | ||
| }), | ||
| (clp_presto_bench, { | ||
| (presto_clp_bench, { | ||
| 'dataset_variation': "cleaned_log" | ||
| }), |
There was a problem hiding this comment.
🧹 Nitpick (assertive)
LGTM! Benchmark list updated correctly.
The benchmarks list has been properly updated to use the new class names, maintaining the same configuration parameters.
Consider adding a trailing comma after the dictionary on line 44 for consistency:
(presto_clp_bench, {
'dataset_variation': "cleaned_log"
- }),
+ }),Committable suggestion skipped: line range outside the PR's diff.
🧰 Tools
🪛 Ruff (0.12.2)
41-41: Trailing comma missing
Add trailing comma
(COM812)
44-44: Trailing comma missing
Add trailing comma
(COM812)
🤖 Prompt for AI Agents
In scripts/benchall.py around lines 36 to 45, add a trailing comma after the
dictionary on line 44 to maintain consistency in the list formatting. This means
placing a comma after the closing brace of the dictionary assigned to
'dataset_variation' in the presto_clp_bench tuple.
| ] | ||
|
|
||
| def run(bencher, kwargs, bench_target, attach=False): | ||
| def run(bencher, kwargs, bench_target, attach=False, attach_on_error=False): |
There was a problem hiding this comment.
🛠️ Refactor suggestion
Add type annotations and consider parameter naming.
The function signature has been extended appropriately with the attach_on_error parameter. However, several improvements could be made for code quality.
Apply this diff to add type annotations and improve the function signature:
-def run(bencher, kwargs, bench_target, attach=False, attach_on_error=False):
+def run(bencher: type, kwargs: dict, bench_target: Path, attach: bool = False, attach_on_error: bool = False) -> None:You'll need to add this import at the top of the file:
+from typing import OptionalCommittable suggestion skipped: line range outside the PR's diff.
🧰 Tools
🪛 Ruff (0.12.2)
55-55: Missing return type annotation for public function run
Add return type annotation: None
(ANN201)
55-55: Missing type annotation for function argument bencher
(ANN001)
55-55: Missing type annotation for function argument kwargs
(ANN001)
55-55: Missing type annotation for function argument bench_target
(ANN001)
55-55: Boolean default positional argument in function definition
(FBT002)
55-55: Missing type annotation for function argument attach
(ANN001)
55-55: Boolean default positional argument in function definition
(FBT002)
55-55: Missing type annotation for function argument attach_on_error
(ANN001)
🤖 Prompt for AI Agents
In scripts/benchall.py at line 55, the function run lacks type annotations and
could benefit from clearer parameter naming. Add appropriate type annotations to
the function signature for all parameters, including bencher, kwargs,
bench_target, attach, and attach_on_error. Also, rename parameters if needed to
improve clarity and maintain consistency. Import any necessary typing modules at
the top of the file to support these annotations.
There was a problem hiding this comment.
There's a reference to clp_presto in a comment on line 60.
scripts/benchall.py
Outdated
| from assets.clickhouse.main import clickhouse_bench | ||
| from assets.sparksql.main import sparksql_bench | ||
| from assets.parquet.main import parquet_bench | ||
| from assets.presto_parquet.main import parquet_bench |
There was a problem hiding this comment.
Can we rename parquet_bench to presto_parquet_bench for consistency?
scripts/benchall.py
Outdated
| if dataset_name != 'mongod': # only use mongod for now | ||
| continue |
There was a problem hiding this comment.
Did you mean to commit this?
| run(bencher, kwargs, bench_target) | ||
| if dataset_name != 'mongod': # only use mongod for now | ||
| continue | ||
| #run(bencher, kwargs, bench_target) |
There was a problem hiding this comment.
I have a change for it but I can't cherry pick it from the initial commit, and it doesn't fit in the scope of this PR. I'll open a separate PR for it when these get merged (or I can put it here).
Bill-hbrhbr
left a comment
There was a problem hiding this comment.
Will be fixed one by one in future PRs.
Description
In #1, we renamed some of the asset directories but didn't make the corresponding changes to the imports in benchall.py. In addition, we didn't rename all the tool-specific methods for consistency.
Checklist
breaking change.
Validation performed
Ran
scripts/benchall.pysuccessfully.Summary by CodeRabbit
Summary by CodeRabbit