chore:benchmark #57
Conversation
adds `hivemind-encoding-bench` cli command

Benchmark Results:

| Encoding | Avg Encoding Time | Avg Decoding Time | Avg Size Increase | Performance | Bandwidth | Aggregate |
|---|---|---|---|---|---|---|
| JSON-B64 | 0.000001 | 0.000004 | 1.38 | 100.00 | 81.64 | 90.82 |
| JSON-URLSAFE-B64 | 0.000003 | 0.000005 | 1.38 | 63.15 | 81.64 | 72.39 |
| JSON-B64-stdlib | 0.000009 | 0.000009 | 1.38 | 29.11 | 81.64 | 55.38 |
| JSON-B91 | 0.001877 | 0.002637 | 1.24 | 1.00 | 100.00 | 50.50 |
| JSON-Z85B | 0.001310 | 0.001777 | 1.26 | 1.05 | 97.90 | 49.47 |
| JSON-Z85P | 0.001253 | 0.001550 | 1.31 | 1.07 | 91.12 | 46.09 |
| JSON-B32 | 0.000639 | 0.001146 | 1.60 | 1.17 | 53.26 | 27.21 |
| JSON-HEX | 0.000008 | 0.000008 | 2.00 | 32.74 | 1.00 | 16.87 |
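Every Aggregate value in the table equals the mean of its Performance and Bandwidth scores, consistent with the module's equal default weights; a minimal sketch of that combination (the helper name is hypothetical):

```python
# Sketch of how the Aggregate column appears to be derived; the weights
# mirror the module defaults (performance_weight = bandwidth_weight = 0.5).
PERFORMANCE_WEIGHT = 0.5
BANDWIDTH_WEIGHT = 0.5

def aggregate_score(performance: float, bandwidth: float) -> float:
    return PERFORMANCE_WEIGHT * performance + BANDWIDTH_WEIGHT * bandwidth

print(round(aggregate_score(100.00, 81.64), 2))  # 90.82, the JSON-B64 row
```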
Walkthrough

The pull request introduces a comprehensive encoding benchmarking module and updates the project's workflow and setup configurations. The changes include updating the GitHub Actions workflow to use Python 3.10, creating a new benchmarking module for JSON-based encryption encodings in `hivemind_bus_client/encodings/benchmark.py`, and registering a new `hivemind-encoding-bench` console script in `setup.py`.
Sequence Diagram

```mermaid
sequenceDiagram
    participant User
    participant CLI
    participant Benchmark
    participant Encoders
    User->>CLI: Run hivemind-encoding-bench
    CLI->>Benchmark: Generate random data
    Benchmark->>Encoders: Select encoding method
    Encoders-->>Benchmark: Perform encoding/decoding
    Benchmark->>Benchmark: Calculate performance metrics
    Benchmark->>CLI: Generate benchmark results
    CLI->>User: Display/Save results
```
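Per the diagram, the command appears to need no arguments for a default run (an assumption; any options are not shown here):

```bash
# run the encoding benchmark and display the results
hivemind-encoding-bench
```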
Actionable comments posted: 2
🧹 Nitpick comments (2)
hivemind_bus_client/encodings/benchmark.py (2)
16-17: Clarify global weights usage.
Since `performance_weight` and `bandwidth_weight` are declared at the top as globals, it might be clearer if these were passed as parameters or managed via a config object.
77-80: Efficient random data generation.
Using letters and digits is sufficient; consider also generating truly random bytes from `os.urandom` if needed for cryptographic-quality data.
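A minimal sketch of that alternative (it reuses the module's `generate_random_data` name but is illustrative only):

```python
import os

def generate_random_data(size: int) -> bytes:
    """Generate random binary data of a given size."""
    # os.urandom draws from the OS CSPRNG, yielding arbitrary bytes
    # rather than ASCII letters and digits
    return os.urandom(size)
```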
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
- .github/workflows/release_workflow.yml (1 hunks)
- hivemind_bus_client/encodings/benchmark.py (1 hunks)
- setup.py (1 hunks)
🧰 Additional context used
🪛 Ruff (0.8.2)
hivemind_bus_client/encodings/benchmark.py
234-234: Redefinition of unused main from line 184
(F811)
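For context, F811 fires when a name is rebound before the first binding is ever used; a self-contained illustration of the pattern being flagged (not the project's actual code):

```python
def main():
    print("first definition")  # never called, so it counts as unused

def main():  # F811: redefinition of unused `main`
    print("second definition shadows the first")

main()  # only the second definition ever runs
```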
🔇 Additional comments (4)
hivemind_bus_client/encodings/benchmark.py (2)
37-55: Graceful handling of unsupported encodings.
The `InvalidEncoding` exception is raised as expected. The approach is clean and maintainable.
131-151: Verify normalization logic.
`best_performance` is determined as `min(performance_scores)`, and `worst_performance` as `max(performance_scores)`. Typically, higher performance metrics are preferred, suggesting the logic might be reversed unless you confirmed that lower times equate to higher scores. Please verify correctness.
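For reference, a min-max scaling sketch in which lower raw times map to higher scores, which would make the `min`/`max` assignment intentional (illustrative only; the module's exact scaling may differ):

```python
def normalize(times: list[float], low: float = 1.0, high: float = 100.0) -> list[float]:
    # lower time is better: the fastest entry maps to `high`, the slowest to `low`
    best, worst = min(times), max(times)
    if best == worst:
        return [high] * len(times)
    return [low + (high - low) * (worst - t) / (worst - best) for t in times]

print(normalize([0.000001, 0.001877]))  # [100.0, 1.0]
```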
setup.py (1)

72-73: New entry point addition.
Adding the `hivemind-encoding-bench` CLI script is a great step for discoverability. Ensure it's clearly documented.
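A sketch of what the `setup.py` change likely looks like; the `:main` target is an assumption based on the module's `main` function:

```python
from setuptools import setup

# Hypothetical excerpt; all other metadata omitted.
setup(
    name="hivemind_bus_client",
    entry_points={
        "console_scripts": [
            "hivemind-encoding-bench=hivemind_bus_client.encodings.benchmark:main",
        ]
    },
)
```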
.github/workflows/release_workflow.yml (1)

49-49: Updated Python version to 3.10.
This helps maintain compatibility with newer language features.
Actionable comments posted: 2
🧹 Nitpick comments (5)
hivemind_bus_client/encodings/benchmark.py (5)
16-17: Consider making weights immutable and moving them to configuration.

The global weight variables are mutable and could lead to thread-safety issues if the module is used in a concurrent context.
Consider using a configuration class or constants:
```diff
-performance_weight = 0.5
-bandwidth_weight = 0.5
+class BenchmarkConfig:
+    PERFORMANCE_WEIGHT: float = 0.5
+    BANDWIDTH_WEIGHT: float = 0.5
```
20-35: Enhance enum documentation with encoding characteristics.

While the enum is well documented, it would be helpful to include characteristics of each encoding such as padding requirements, character set limitations, and typical use cases.
37-75: Consider using a mapping for encoder/decoder lookup.

The repetitive if-else structure in both `get_encoder` and `get_decoder` could be simplified using a mapping. Consider refactoring to use a dictionary:
```diff
+_ENCODERS = {
+    SupportedEncodings.JSON_B64_STD: base64.b64encode,
+    SupportedEncodings.JSON_B64: pybase64.b64encode,
+    # ... other encoders
+}

 def get_encoder(encoding: SupportedEncodings) -> Callable[[bytes], bytes]:
-    if encoding == SupportedEncodings.JSON_B64_STD:
-        return base64.b64encode
-    # ... other if statements
+    try:
+        return _ENCODERS[encoding]
+    except KeyError:
+        raise InvalidEncoding(f"Invalid encoding: {encoding}")
```
77-80: Enhance random data generation security.

The current implementation uses `random`, which is not cryptographically secure. While this might be sufficient for benchmarking, using `secrets` would be more appropriate for generating test data that might be used with cryptographic functions.

```diff
-import random
+import secrets

 def generate_random_data(size: int) -> bytes:
     """Generate random binary data of a given size."""
-    return ''.join(random.choice(string.ascii_letters + string.digits) for _ in range(size)).encode("utf-8")
+    return secrets.token_bytes(size)
```
246-253: Consider using a table formatting library.

The table output formatting could benefit from using a library like `tabulate` for better readability and maintainability.

```diff
+from tabulate import tabulate

-    print(
-        f"{'Encoding':<20} {'Avg Encoding Time':<20} {'Avg Decoding Time':<20} {'Avg Size Increase':<20} {'Performance':<12} {'Bandwidth':<10} {'Aggregate':<10}")
-    print("=" * 110)
-    for row in table:
-        print(
-            f"{row[0]:<20} {row[1]:<20.6f} {row[2]:<20.6f} {row[3]:<20.2f} {row[4]:<12.2f} {row[5]:<10.2f} {row[6]:<10.2f}")
+    headers = ["Encoding", "Avg Encoding Time", "Avg Decoding Time", "Avg Size Increase",
+               "Performance", "Bandwidth", "Aggregate"]
+    print(tabulate(table, headers=headers, floatfmt=".6f"))
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
hivemind_bus_client/encodings/benchmark.py (1 hunks)
🔇 Additional comments (1)
hivemind_bus_client/encodings/benchmark.py (1)
222-223: Verify result consistency across iterations.

The code assumes `encoded_size` and `original_size` remain constant across iterations, but this should be verified.
✅ Verification successful
Let me gather more context about how the data is generated and processed in the benchmark.
Let me try a different approach to examine the benchmark implementation.
The assumption about constant encoded and original sizes is correct and verified
After examining the implementation, the assumption is valid because:
- For each size iteration, the same input data is used across all encoding methods via `generate_random_data(size)`
- The encoding process is deterministic for all supported encodings (Base64, Base32, Hex, Z85B, Z85P, B91)
- The `encoded_size` and `original_size` are only copied from the last iteration because they remain constant for the same input data, as verified by:
  - The input size is fixed per benchmark run
  - Each encoding method produces consistent output size for the same input
  - The code validates correctness by checking `decoded_data != data`
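A quick standard-library check of the determinism point, using Base64 as one representative encoding:

```python
import base64

# same input size -> same encoded size, regardless of content
a = base64.b64encode(b"x" * 1024)
b = base64.b64encode(bytes(range(256)) * 4)
assert len(a) == len(b) == 1368  # 1024 bytes -> ceil(1024 / 3) * 4 = 1368 chars
```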
🏁 Scripts executed

The following scripts were executed for the analysis:
Script:
```bash
#!/bin/bash
# Description: Verify that the encoded size remains constant for the same
# input size across different encoding methods

# Search for size-related variables and their usage
rg -A 3 "encoded_size|original_size"
```

Length of output: 1847
Script:
```bash
#!/bin/bash
# Let's examine the full benchmark implementation to understand the data flow
ast-grep --pattern 'def benchmark_encoding($_) { $$$ }'
```

Length of output: 59
Script:
```bash
#!/bin/bash
# Let's examine the full benchmark implementation to understand the data flow
cat hivemind_bus_client/encodings/benchmark.py
```

Length of output: 10387
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
adds `hivemind-encoding-bench` cli command

Summary by CodeRabbit
New Features
- Added a `hivemind-encoding-bench` CLI command for benchmarking JSON-based encryption encodings.
Chores
- Updated the GitHub Actions release workflow to use Python 3.10.
Documentation