Reduce interpreter/JIT overhead #694
2 issues found across 7 files
Prompt for AI agents (all issues)
Check if these issues are valid — if so, understand the root cause of each and fix them.
<file name="src/jit.c">
<violation number="1" location="src/jit.c:2531">
P1: `vm_reg[0]` is used without initialization after `reset_reg()` clears the register mappings. Unlike `do_fuse9` which properly allocates with `map_vm_reg()`, this code uses an uninitialized register index that defaults to 0 (RAX/R0). This could generate incorrect JIT code that conflicts with calling conventions.</violation>
</file>
<file name="src/cache.c">
<violation number="1" location="src/cache.c:367">
P0: Missing call to `page_index_insert`. The page index is never populated because `page_index_insert` is defined but never called. This makes `cache_invalidate_va`'s O(1) lookup iterate over an empty list, failing to invalidate any blocks. Add a call to insert newly cached blocks into the page index.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
Benchmarks
| Benchmark suite | Current: 3283953 | Previous: ff7e565 | Ratio |
|---|---|---|---|
| Dhrystone | 1642.333 DMIPS | 1647.667 DMIPS | 1.00 |
| CoreMark | 1011.426 iterations/sec | 935.937 iterations/sec | 0.93 |
This comment was automatically generated by workflow using github-action-benchmark.
c1ec84e to ff7e565
This introduces three optimizations:

1. Block-level cycle counting
   - Remove per-instruction cycle++ from RVOP macro
   - Pre-compute block->cycle_cost at translation time
   - Add cycle cost at block entry (interpreter) or exit (JIT)
2. Timer derivation from cycle counter (SYSTEM mode)
   - Remove per-instruction rv->timer++
   - Derive timer on-demand: timer = csr_cycle + timer_offset
   - Extend CSR sync to TIME/TIMEH registers
3. Page-boundary block termination with fallthrough chaining
   - Terminate blocks at 4KB page boundaries
   - Implement fallthrough chaining via branch_taken pointer
   - Add page_index_insert() for O(1) cache invalidation

Fix JIT register allocation in GEN_LOAD/GEN_STORE macros:
- After reset_reg(), vm_reg[0] was stale (not reallocated)
- Use temp_reg for paddr, properly allocate registers for mem_base
- Aligns with patterns used in fused instruction handlers
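As a rough illustration of the first two optimizations, the sketch below charges a block's pre-computed cycle cost once at block entry and derives the timer from the cycle counter on demand. The `block_t` and `hart_t` layouts, every function name, and the one-cycle-per-instruction cost model are assumptions made for this example; only the relation `timer = csr_cycle + timer_offset` and the `cycle_cost` field name come from the commit message.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified types; the project's real structures differ. */
typedef struct {
    uint32_t n_insn;     /* instructions translated into this block */
    uint64_t cycle_cost; /* computed once, at translation time */
} block_t;

typedef struct {
    uint64_t csr_cycle;    /* advanced once per executed block */
    uint64_t timer_offset; /* timer minus csr_cycle, modulo 2^64 */
} hart_t;

/* Translation time: assume one cycle per instruction (an assumption). */
static void block_finalize(block_t *b)
{
    b->cycle_cost = b->n_insn;
}

/* Block entry: a single addition replaces n_insn separate increments. */
static void block_enter(hart_t *h, const block_t *b)
{
    assert(b->cycle_cost == b->n_insn); /* fails for an unfinalized, non-empty block */
    h->csr_cycle += b->cycle_cost;
}

/* The timer is never stored; it is derived whenever it is read. */
static uint64_t hart_get_timer(const hart_t *h)
{
    return h->csr_cycle + h->timer_offset;
}

/* Writing the timer only adjusts the offset (unsigned wraparound is fine). */
static void hart_set_timer(hart_t *h, uint64_t value)
{
    h->timer_offset = value - h->csr_cycle;
}

/* RV32 exposes the 64-bit timer as two 32-bit CSRs, TIME and TIMEH. */
static uint32_t csr_time_lo(const hart_t *h)
{
    return (uint32_t) hart_get_timer(h);
}

static uint32_t csr_time_hi(const hart_t *h)
{
    return (uint32_t) (hart_get_timer(h) >> 32);
}
```

The trade-off in this sketch is that the cycle counter is only exact at block boundaries, which is why the derived timer needs no per-instruction work of its own.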
The combination maintains correctness while reducing per-instruction overhead. Block chaining still works across page-bounded blocks through the fallthrough mechanism.
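One way to picture page-bounded blocks with fallthrough chaining is the sketch below. It assumes fixed 4-byte instructions and straight-line code, and the `block_t` layout and both function names are hypothetical; only the 4 KB page size and the reuse of a `branch_taken` pointer for the fallthrough successor are taken from the PR description.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096u
#define PAGE_MASK (PAGE_SIZE - 1u)
#define INSN_LEN 4u /* assumption: no compressed instructions */

/* Hypothetical block descriptor covering the range [pc_start, pc_end). */
typedef struct block {
    uint32_t pc_start;
    uint32_t pc_end;
    uint32_t n_insn;
    struct block *branch_taken; /* reused here as the fallthrough successor */
} block_t;

/* Translate straight-line code, stopping at max_insn instructions or as
 * soon as the next instruction would start on a new 4 KB page. */
static void block_translate(block_t *b, uint32_t pc, uint32_t max_insn)
{
    assert(max_insn > 0);
    b->pc_start = pc;
    b->n_insn = 0;
    b->branch_taken = NULL;
    while (b->n_insn < max_insn) {
        pc += INSN_LEN;
        b->n_insn++;
        if ((pc & PAGE_MASK) == 0)
            break; /* next instruction belongs to the following page */
    }
    b->pc_end = pc;
}

/* Link pred to succ when succ begins exactly where pred ends, so
 * execution can continue without going back through the dispatcher. */
static void block_chain_fallthrough(block_t *pred, block_t *succ)
{
    if (pred->pc_end == succ->pc_start)
        pred->branch_taken = succ;
}
```

For example, a block starting at `0x1FF8` would end at `0x2000` after two instructions, and a block translated from `0x2000` could then be chained as its fallthrough successor. A real translator would also end blocks at branches and jumps, which this sketch leaves out.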
Summary by cubic
Reduces hot-path overhead by deriving TIME from CYCLE, page-bounding blocks with fallthrough chaining, and using block-level cycle counting in JIT. Improves performance and enables O(1) cache invalidation per virtual page.
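The per-page invalidation could look roughly like the sketch below: because every block is confined to one page, a small table keyed by virtual page number is enough to find all blocks affected by a write. The function names `page_index_insert` and `cache_invalidate_va` appear in this PR, but their signatures here, the `page_index_t` and `block_t` layouts, the bucket count, and the `cache_insert` helper are assumptions for illustration. The sketch also shows the point raised in review: the index is only useful if `page_index_insert` is actually called when a block is cached.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define PAGE_INDEX_BUCKETS 64u /* illustrative size, must be a power of two */

/* Hypothetical cached block; page-bounded, so pc_start identifies its page. */
typedef struct block {
    uint32_t pc_start;
    bool valid;
    struct block *page_next; /* next block in the same bucket */
} block_t;

typedef struct {
    block_t *bucket[PAGE_INDEX_BUCKETS];
} page_index_t;

static size_t bucket_of(uint32_t va)
{
    return (va >> PAGE_SHIFT) & (PAGE_INDEX_BUCKETS - 1u);
}

/* Push the block onto the bucket list of the page it lives in. */
static void page_index_insert(page_index_t *idx, block_t *b)
{
    assert(idx != NULL && b != NULL);
    size_t i = bucket_of(b->pc_start);
    b->page_next = idx->bucket[i];
    idx->bucket[i] = b;
}

/* Caching a block must also register it in the page index. */
static void cache_insert(page_index_t *idx, block_t *b)
{
    b->valid = true;
    page_index_insert(idx, b);
}

/* Invalidate and unlink every block on the page containing va.
 * Returns the number of blocks invalidated. */
static unsigned cache_invalidate_va(page_index_t *idx, uint32_t va)
{
    uint32_t page = va >> PAGE_SHIFT;
    unsigned count = 0;
    block_t **link = &idx->bucket[bucket_of(va)];
    while (*link != NULL) {
        block_t *b = *link;
        if ((b->pc_start >> PAGE_SHIFT) == page) {
            b->valid = false;
            *link = b->page_next; /* unlink from the bucket */
            b->page_next = NULL;
            count++;
        } else {
            link = &b->page_next; /* different page sharing this bucket */
        }
    }
    return count;
}
```

Lookup cost in this sketch depends on the length of one bucket list rather than on the total number of cached blocks, which is the sense in which the invalidation is constant-time per page.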
Written for commit 3283953. Summary will update on new commits.