cleanup concolic dirs properly, add precommit #217
Conversation
PR Reviewer Guide 🔍 (Review updated until commit f513763)
Here are some key observations to aid the review process:

PR Code Suggestions ✨ (Latest suggestions up to f513763)
Previous suggestions (Suggestions up to commit 0a47165)
… by 111% in PR #217 (`proper-cleanup`)

Here is a rewritten, optimized version of your program, focusing on what the line profile indicates are the bottlenecks:

- **Reuse the cursor**: Opening a new cursor repeatedly is slow; maintain a persistent cursor.
- **Batch commits**: Commit after many inserts if possible. However, since the buffer is cleared after each write, one commit per call is necessary.
- **PRAGMA optimizations**: Set SQLite pragmas (`synchronous = OFF`, `journal_mode = MEMORY`) for faster inserts if durability isn't paramount.
- **Avoid excessive object recreation**: Only connect when needed, and clear but *do not reallocate* the benchmark list.
- **Reduce exception-handling cost**: Trap and re-raise only actual DB exceptions.

**Note:** For highest speed, `executemany` with single-transaction batch inserts is already near-optimal for SQLite. For even faster inserts, use a multi-row `INSERT INTO ... VALUES (...), (...), ...`, but this requires constructing SQL dynamically.

Key points of the optimized version:

- `self._ensure_connection()` ensures both a persistent connection and a persistent cursor.
- Pragmas are set only once per connection.
- `self.benchmark_timings.clear()` avoids list reallocation.
- The cursor is reused for the lifetime of the object.

**If your durability requirements are stricter**, remove or tune the PRAGMA statements. If you can collect many queries per transaction and want even higher throughput, consider a "bulk flush" mode to reduce commit frequency, though this requires an API change. This code preserves your public API and all comments while running considerably faster, especially on large inserts.
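The pattern above can be sketched as follows. Everything here is illustrative: the class, table, and column names (`BenchmarkWriter`, `timings`, `name`, `elapsed`) are assumptions, since the PR's actual code is not shown in this thread.

```python
import sqlite3


class BenchmarkWriter:
    """Hypothetical sketch: persistent connection and cursor, PRAGMAs set
    once per connection, executemany for bulk inserts, and clear() on the
    buffer instead of reallocating it."""

    def __init__(self, path=":memory:"):
        self.path = path
        self._connection = None
        self._cursor = None
        self.benchmark_timings = []  # buffer of (name, elapsed) tuples

    def _ensure_connection(self):
        # Connect lazily; keep both connection and cursor alive for reuse.
        if self._connection is None:
            self._connection = sqlite3.connect(self.path)
            # Faster writes at the cost of durability; remove if that matters.
            self._connection.execute("PRAGMA synchronous = OFF")
            self._connection.execute("PRAGMA journal_mode = MEMORY")
            self._connection.execute(
                "CREATE TABLE IF NOT EXISTS timings (name TEXT, elapsed REAL)"
            )
            self._cursor = self._connection.cursor()

    def flush(self):
        self._ensure_connection()
        # One executemany + one commit per flush call.
        self._cursor.executemany(
            "INSERT INTO timings (name, elapsed) VALUES (?, ?)",
            self.benchmark_timings,
        )
        self._connection.commit()
        # clear() reuses the existing list object instead of reallocating.
        self.benchmark_timings.clear()
```

A usage example: append timings to `benchmark_timings` and call `flush()` once per batch; the connection and cursor are created on the first flush and reused afterwards.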
⚡️ Codeflash found optimizations for this PR 📄 111% (1.11x) speedup for
…proper-cleanup`)
Here is a faster rewrite of your function. The main optimizations:
- Replace `{}` set literal with tuple `()` for membership check, as a tuple is faster for small constant sets and avoids dynamic hashing.
- Use string multiplication only as needed.
- Use guarded string concatenation to minimize interpretation overhead.
- Test alignments in order of likelihood (generally "left" or "right" is more common than "center" or "decimal"; adjust if your usage differs).
- Consolidate conditions to reduce branching where possible.
- This version uses direct comparison for the common cases and avoids the overhead of set/tuple lookup.
- The order of conditions can be adjusted depending on which alignment is most frequent in your workload for optimal branch prediction.
⚡️ Codeflash found optimizations for this PR 📄 11% (0.11x) speedup for
…per-cleanup`)

Here are the main performance issues and solutions for your program.

### Profile Insights

- The function **`_pipe_segment_with_colons`** is hit many times, and most time is spent creating new strings with expressions like `'-' * n` and concatenation.
- In **`_pipe_line_with_colons`**, almost all runtime is spent in the list comprehension calling `_pipe_segment_with_colons`.
- There are repeated lookups/checks for the alignment, which can be made faster by using a dictionary for dispatch.
- The repeated string multiplication and concatenation in `_pipe_segment_with_colons` can be accelerated for common values (such as small or frequent widths) via caching.

### Optimizations

1. **Function dispatch via dictionary** to avoid sequential `if`/`elif` chains.
2. **Cache small, frequently repeated templates** in `_pipe_segment_with_colons` using `functools.lru_cache`, which pays off when the same alignment and width are requested over and over.
3. **Pre-localize frequently used builtins** (such as `str.join`), since calling a local variable is faster than an attribute lookup on an object.
4. **Minor improvement**: Reduce `str` concatenations.

### Why this version is faster

1. **`lru_cache`** on `_pipe_segment_with_colons` memoizes results (Python keeps the most recently requested line segments in memory). This is effective since the profile shows thousands of hits with the same arguments.
2. **Reduced branching** inside the inner loop.
3. **Localized built-in lookups**, which are faster than repeated attribute/property lookups.

Together these changes should measurably improve runtime, especially for repeated, table-wide invocations. If you expect very large tables or many uncommon `(align, colwidth)` combinations, tune the cache size via `@lru_cache(maxsize=N)`; for typical markdown/pipe-aligned tables the default is more than enough.

You may accelerate further with Cython or dedicated C-based formatters, but not within pure-Python constraints.
⚡️ Codeflash found optimizations for this PR 📄 46% (0.46x) speedup for
Here’s an optimized version of your function. **Optimization notes:**

- `type(x) is y` is retained for speed.
- `set` instantiation is avoided in favor of direct comparison against `"True"` and `"False"` when dealing with strings.
- Check for `str` directly rather than combining with `bytes`, as `"True"` and `"False"` are not valid byte representations.
- If checking `bytes` is strictly required (not functionally needed given the literal values in the set), let me know.
⚡️ Codeflash found optimizations for this PR 📄 28% (0.28x) speedup for
Here is an optimized version of `_strip_ansi` focused on runtime speed.

The main bottleneck is the regular-expression replacement in `re.sub`: it returns the (possibly empty) group 4 (the link text in hyperlinks) or, when that group doesn't match, an empty string for ANSI codes. This can be significantly sped up by avoiding the costly `r"\4"` replacement template (which always triggers the regex engine's group-resolution machinery) and using a faster replacer callback instead. Since every match is either an ANSI escape code (where group 4 is `None`) or a hyperlink (where group 4 contains the visible link text), both cases can be handled in one simple function.

**Performance notes:**

- This avoids all regex group-substitution machinery for the common ANSI case.
- No change to visible/functional behavior.
- No changes to external function names or signatures.
- String and bytes cases are handled separately, so there are no unnecessary type checks inside tight loops.

No original comments were changed or removed, no changes were made to the public interface or expected output, and all logic concerning group 4 and escape-sequence removal is preserved.
⚡️ Codeflash found optimizations for this PR 📄 116% (1.16x) speedup for
… by 148% in PR #217 (`proper-cleanup`)

Below is an optimized version of your program with respect to the provided line-profiler results. The major bottleneck is `self._connection.commit()` and, to a lesser extent, `cur.executemany(...)`. We can greatly accelerate SQLite bulk inserts and commits by:

- Disabling SQLite's default autocommit behavior and wrapping the bulk inserts in a single explicit transaction.
- Using `with self._connection`, which handles the transaction/commit automatically, or using `BEGIN`/`COMMIT` explicitly.
- Setting `PRAGMA synchronous = OFF` and `PRAGMA journal_mode = MEMORY` if durability is not absolutely required, since this makes writes much faster (enable these once per connection only).

**Note:**

- These changes keep the function return value and signature identical.
- Connection and PRAGMA setup happen only once per connection.
- All existing comments are preserved; new comments only explain the modifications.

### Why this is faster

- **Explicit transaction**: `self._connection.execute('BEGIN')` plus one `commit()` is far faster than relying on SQLite's default per-statement behavior.
- **PRAGMA tweaks**: `synchronous = OFF` and `journal_mode = MEMORY` massively reduce disk sync/write overhead for benchmark data.
- **Batching**: `executemany` remains the most efficient bulk-insert mechanism.
- **Single cursor, closed immediately.**

*If your use case absolutely requires durability against power loss, remove the two PRAGMA settings (or use the `WAL` and `NORMAL` modes instead). This code retains the exact logic and return values but will be considerably faster in typical benchmarking scenarios.*
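The explicit-transaction pattern can be illustrated standalone. The function, table, and column names below are hypothetical; in `sqlite3`, using the connection as a context manager opens a transaction and commits it on successful exit, which is the behavior the suggestion relies on:

```python
import sqlite3


def flush_benchmarks(conn, rows):
    """Insert all rows inside one explicit transaction (hypothetical schema)."""
    with conn:  # BEGIN on entry, single COMMIT on successful exit
        conn.executemany(
            "INSERT INTO timings (name, elapsed) VALUES (?, ?)", rows
        )


conn = sqlite3.connect(":memory:")
# PRAGMAs are set once per connection; they trade durability for speed.
conn.execute("PRAGMA synchronous = OFF")
conn.execute("PRAGMA journal_mode = MEMORY")
conn.execute("CREATE TABLE timings (name TEXT, elapsed REAL)")

flush_benchmarks(conn, [("bench_a", 0.12), ("bench_b", 0.34)])
```

Wrapping many inserts in one transaction avoids a per-statement fsync, which is usually where the `commit()` time in the profile goes.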
⚡️ Codeflash found optimizations for this PR 📄 148% (1.48x) speedup for
43d1056 to f513763
Persistent review updated to latest commit f513763
5a2b80b to 8128559
PR Type
Enhancement
Description
• Add pre-commit config and CI linting setup
• Standardize formatting, imports, and line wrapping
• Introduce type hints and refine function signatures
• Simplify conditionals and remove redundancies
Changes walkthrough 📝

3 files:
- Clean up formatting and import ordering
- Reformat dataclass fields and add trailing commas
- Add noqa for lint exceptions

12 files:
- Add type hints and simplify exception handling
- Introduce TYPE_CHECKING and type annotations
- Remove unused imports, add TYPE_CHECKING
- Add return type hints and simplify code
- Add noqa directives and fix typing casts
- Add type hints to pytest hook signatures
- Simplify assertion and iterate diff mappings
- Add annotations and unify string quoting
- Clean up parameters and dict append formatting
- Add type hints and simplify decorator logic
- Add noqa and refactor helper loops
- Reorder imports and add noqa for D417

2 files:
- Create pre-commit configuration file
- Add GitHub pre-commit workflow

43 files