Reduce interpreter/JIT overhead #694

jserv · 2026-01-08T15:03:37Z

This introduces three optimizations:

Block-level cycle counting
- Remove per-instruction cycle++ from RVOP macro
- Pre-compute block->cycle_cost at translation time
- Add cycle cost at block entry (interpreter) or exit (JIT)
- Maintains accurate cycle counts with less overhead
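A minimal sketch of the block-level scheme, assuming every instruction costs one cycle (which is what the removed per-instruction `cycle++` counted). The structs are hypothetical, reduced stand-ins, not rv32emu's real `block_t` or `riscv_t`; only `cycle_cost` and `csr_cycle` are named in the PR text. Charging a block in one add is only exact when the block runs to completion, so a trap or early exit mid-block would need a correction (the cubic summary below notes that the interpreter ultimately keeps per-instruction `cycle++` to support chaining).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, reduced stand-ins: the real block_t and riscv_t carry many
 * more fields. Only cycle_cost and csr_cycle are named in the PR text. */
typedef struct {
    uint32_t n_insn;     /* decoded instructions in the block */
    uint32_t cycle_cost; /* pre-computed once, at translation time */
} block_t;

typedef struct {
    uint64_t csr_cycle;
} riscv_t;

/* Translation time: assume one cycle per instruction, which is what the
 * per-instruction cycle++ used to count. */
static void block_finalize(block_t *block)
{
    block->cycle_cost = block->n_insn;
}

/* Old scheme, for comparison: one increment per executed instruction. */
static void account_per_insn(riscv_t *rv, const block_t *block)
{
    for (uint32_t i = 0; i < block->n_insn; i++)
        rv->csr_cycle++;
}

/* New scheme: a single add per block (at entry or exit). Only exact if the
 * whole block executes; a mid-block trap would need a correction. */
static void account_per_block(riscv_t *rv, const block_t *block)
{
    assert(block->cycle_cost == block->n_insn); /* block_finalize() ran */
    rv->csr_cycle += block->cycle_cost;
}
```

Both schemes end with the same counter value for a fully executed block; the block-level one replaces `n_insn` increments with one addition.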
Timer derivation from cycle counter
- Remove per-instruction rv->timer++ in SYSTEM mode
- Derive timer on-demand: timer = csr_cycle + timer_offset
- Compute timer only at interrupt check points (rv_check_interrupt)
- Extend CSR sync to TIME/TIMEH registers for correct derivation
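A sketch of the on-demand derivation. `hart_t`, `timer_read()`, `timer_write()`, `csr_time()` and `csr_timeh()` are hypothetical names; only `csr_cycle` and `timer_offset` come from the PR text. Nothing runs per instruction: the timer is computed only when something (such as an interrupt check) asks for it. A timer write, assumed here to be possible (for example through a memory-mapped timer register), is folded into the offset. On RV32 the 64-bit value is read as TIME (low 32 bits) and TIMEH (high 32 bits).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical hart state; only csr_cycle and timer_offset are named in the
 * PR text. */
typedef struct {
    uint64_t csr_cycle;
    uint64_t timer_offset;
} hart_t;

/* Derived on demand (e.g. at an interrupt check), never incremented. */
static uint64_t timer_read(const hart_t *h)
{
    return h->csr_cycle + h->timer_offset;
}

/* Assumed timer write: fold the new value into the offset. Unsigned
 * wraparound keeps this correct even when value < csr_cycle. */
static void timer_write(hart_t *h, uint64_t value)
{
    h->timer_offset = value - h->csr_cycle;
    assert(timer_read(h) == value); /* next read returns what was written */
}

/* RV32 reads the 64-bit counter as two 32-bit CSRs. */
static uint32_t csr_time(const hart_t *h)
{
    return (uint32_t) timer_read(h);
}

static uint32_t csr_timeh(const hart_t *h)
{
    return (uint32_t) (timer_read(h) >> 32);
}
```

Because the timer is a pure function of `csr_cycle`, any code that reads TIME/TIMEH must see an up-to-date `csr_cycle`, which is why the CSR sync has to cover those registers too.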
Page-boundary block termination with fallthrough chaining
- Terminate blocks at 4KB page boundaries
- Add page_terminated flag to block_t structure
- Implement fallthrough chaining for non-branch block endings
- Use branch_taken pointer for fallthrough to next block
- Enables future O(1) cache invalidation via page index
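A sketch of page-bounded translation and fallthrough chaining under simplifying assumptions: fixed 4-byte instructions (no RVC, so no instruction can straddle a page), no branches inside the block, and a hypothetical reduced `block_t`. The field names `page_terminated` and `branch_taken` follow the PR text; `translate()`, `crosses_page()` and `chain_fallthrough()` are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12 /* 4 KB pages */

/* Hypothetical reduced block; page_terminated and branch_taken follow the
 * PR text, the remaining fields are assumptions. */
typedef struct block {
    uint32_t pc_start;
    uint32_t pc_end; /* address just past the last instruction */
    bool page_terminated;
    struct block *branch_taken;
} block_t;

/* True when next_pc is on a different page than the block's first insn. */
static bool crosses_page(uint32_t block_start, uint32_t next_pc)
{
    return (block_start >> PAGE_SHIFT) != (next_pc >> PAGE_SHIFT);
}

/* Translate straight-line code (4-byte insns assumed), stopping after
 * max_insn instructions or before the first one on the next page. */
static void translate(block_t *b, uint32_t pc, uint32_t max_insn)
{
    assert(max_insn > 0);
    b->pc_start = pc;
    b->page_terminated = false;
    b->branch_taken = NULL;
    for (uint32_t i = 0; i < max_insn; i++) {
        pc += 4;
        if (crosses_page(b->pc_start, pc)) {
            b->page_terminated = true;
            break;
        }
    }
    b->pc_end = pc;
}

/* A page-terminated block ends without a branch, so its only successor is
 * the block starting at pc_end; reuse branch_taken to chain to it. */
static void chain_fallthrough(block_t *b, block_t *next)
{
    if (b->page_terminated && next && next->pc_start == b->pc_end)
        b->branch_taken = next;
}
```

With compressed instructions enabled, a real implementation would also have to handle a 4-byte instruction that starts 2 bytes before the page end; this sketch sidesteps that by assumption.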

The combination maintains correctness while reducing per-instruction overhead. Block chaining still works across page-bounded blocks through the fallthrough mechanism.

Summary by cubic

Reduces hot-path overhead by deriving TIME from CYCLE, page-bounding blocks with fallthrough chaining, and using block-level cycle counting in JIT. Improves performance and enables O(1) cache invalidation per virtual page.

New Features
- Block-level cycle counting (JIT): precompute cycle_cost; add at JIT exit; interpreter keeps per-instruction cycle++ to support chaining.
- TIME derived from CYCLE: timer = csr_cycle + timer_offset; computed at interrupt checks; CSR sync extended to TIME/TIMEH.
- Page-bounded blocks with fallthrough chaining and a page index for O(1) SFENCE.VMA invalidate.
Bug Fixes
- Trap-safe MMU translation in JIT: skip RAM/MMIO ops when a fault occurs; treat faults as MMIO via flags.
- Validate full access range before treating as RAM to prevent boundary overflows.
- Fix JIT register allocation in GEN_LOAD/GEN_STORE to avoid stale regs after reset_reg and ensure correct mem_base/paddr handling.

Written for commit 3283953. Summary will update on new commits.
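The two memory-access fixes in the summary can be sketched as a fast-path check. The RAM window and the helpers `range_is_ram()` and `use_fast_ram_path()` are hypothetical. The sketch shows (1) validating the whole `[paddr, paddr + len)` range instead of `paddr` alone (which would accept a 4-byte access starting 2 bytes before the end of RAM), written with subtraction so the end address cannot wrap around 32 bits, and (2) never taking the inline RAM path after a translation fault, routing it down the slow (MMIO-like) path instead, roughly as the summary describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical guest RAM window, for illustration only. */
#define RAM_BASE 0x80000000u
#define RAM_SIZE 0x04000000u /* 64 MiB */

/* RAM only if the whole [paddr, paddr + len) range is inside the window.
 * Comparing offsets avoids computing paddr + len, which could wrap. */
static bool range_is_ram(uint32_t paddr, uint32_t len)
{
    assert(len > 0);
    if (paddr < RAM_BASE)
        return false;
    uint32_t off = paddr - RAM_BASE;
    return len <= RAM_SIZE && off <= RAM_SIZE - len;
}

/* A faulting translation never takes the inline RAM path, so no RAM access
 * is performed with a bogus address; it goes down the slow path instead. */
static bool use_fast_ram_path(bool faulted, uint32_t paddr, uint32_t len)
{
    return !faulted && range_is_ram(paddr, len);
}
```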

cubic-dev-ai

2 issues found across 7 files

Prompt for AI agents (all issues)


Check if these issues are valid — if so, understand the root cause of each and fix them.


<file name="src/jit.c">

<violation number="1" location="src/jit.c:2531">
P1: `vm_reg[0]` is used without initialization after `reset_reg()` clears the register mappings. Unlike `do_fuse9` which properly allocates with `map_vm_reg()`, this code uses an uninitialized register index that defaults to 0 (RAX/R0). This could generate incorrect JIT code that conflicts with calling conventions.</violation>
</file>

<file name="src/cache.c">

<violation number="1" location="src/cache.c:367">
P0: Missing call to `page_index_insert`. The page index is never populated because `page_index_insert` is defined but never called. This makes `cache_invalidate_va`'s O(1) lookup iterate over an empty list, failing to invalidate any blocks. Add a call to insert newly cached blocks into the page index.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
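For context on the P0 finding, a minimal page index might look like the following. This is a sketch with hypothetical types and a fixed-size hash of virtual page numbers, not the code in `src/cache.c`; only the name `page_index_insert` comes from the review. Because blocks are terminated at 4 KB page boundaries, every block lies in the page of its start address, so invalidating a page only has to walk that page's bucket (effectively O(1) while chains stay short). If insertion is never called, the bucket stays empty and invalidation silently does nothing, which is the failure the review describes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define PAGE_INDEX_BUCKETS 256 /* power of two; illustrative size */

/* Hypothetical reduced block: only what the index needs. */
typedef struct block {
    uint32_t pc_start;
    bool valid;
    struct block *page_next; /* next block in the same bucket */
} block_t;

typedef struct {
    block_t *bucket[PAGE_INDEX_BUCKETS];
} page_index_t;

static size_t page_bucket(uint32_t va)
{
    return (va >> PAGE_SHIFT) & (PAGE_INDEX_BUCKETS - 1);
}

/* Must run whenever a block enters the cache. */
static void page_index_insert(page_index_t *idx, block_t *b)
{
    assert(idx != NULL && b != NULL);
    size_t h = page_bucket(b->pc_start);
    b->page_next = idx->bucket[h];
    idx->bucket[h] = b;
}

/* Invalidate and unlink every block on the page containing va. Blocks from
 * other pages that share the bucket (hash collisions) are kept. */
static size_t page_index_invalidate(page_index_t *idx, uint32_t va)
{
    uint32_t vpn = va >> PAGE_SHIFT;
    block_t **link = &idx->bucket[page_bucket(va)];
    size_t n = 0;
    while (*link) {
        block_t *b = *link;
        if ((b->pc_start >> PAGE_SHIFT) == vpn) {
            b->valid = false;
            *link = b->page_next; /* unlink from the chain */
            b->page_next = NULL;
            n++;
        } else {
            link = &b->page_next;
        }
    }
    return n;
}
```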


jserv

Benchmarks


| Benchmark suite | Current: `3283953` | Previous: `ff7e565` | Ratio (previous/current) |
|---|---|---|---|
| `Dhrystone` | `1642.333` DMIPS | `1647.667` DMIPS | `1.00` |
| `CoreMark` | `1011.426` iterations/sec | `935.937` iterations/sec | `0.93` |

This comment was automatically generated by workflow using github-action-benchmark.
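Both Ratio entries are consistent with previous divided by current (an inference from the numbers, not stated by the workflow). Worked out:

```latex
\frac{1647.667}{1642.333} \approx 1.003 \ (\text{Dhrystone}), \qquad
\frac{935.937}{1011.426} \approx 0.925 \ (\text{CoreMark}), \qquad
\frac{1011.426}{935.937} \approx 1.081
```

So CoreMark runs about 8% more iterations per second on `3283953`, while Dhrystone is about 0.3% lower.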

This introduces three optimizations:

1. Block-level cycle counting
   - Remove per-instruction cycle++ from RVOP macro
   - Pre-compute block->cycle_cost at translation time
   - Add cycle cost at block entry (interpreter) or exit (JIT)
2. Timer derivation from cycle counter (SYSTEM mode)
   - Remove per-instruction rv->timer++
   - Derive timer on-demand: timer = csr_cycle + timer_offset
   - Extend CSR sync to TIME/TIMEH registers
3. Page-boundary block termination with fallthrough chaining
   - Terminate blocks at 4KB page boundaries
   - Implement fallthrough chaining via branch_taken pointer
   - Add page_index_insert() for O(1) cache invalidation

Fix JIT register allocation in GEN_LOAD/GEN_STORE macros:
- After reset_reg(), vm_reg[0] was stale (not reallocated)
- Use temp_reg for paddr, properly allocate registers for mem_base
- Aligns with patterns used in fused instruction handlers
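The GEN_LOAD/GEN_STORE fix is about a stale register index, and a toy allocator makes the hazard concrete. Everything here is hypothetical: `reset_reg()` and `map_vm_reg()` are named after the helpers mentioned in the review, but these are simplified stand-ins with made-up signatures, not rv32emu's code, and a real JIT would also emit spill and reload instructions.

```c
#include <assert.h>

#define N_HOST 4
#define N_GUEST 32
#define UNMAPPED (-1)

/* Toy mapping between guest (VM) registers and host registers. */
typedef struct {
    int host_of[N_GUEST]; /* guest reg -> host reg, or UNMAPPED */
    int guest_in[N_HOST]; /* host reg -> guest reg, or UNMAPPED */
    int next_victim;      /* round-robin allocation/eviction cursor */
} regmap_t;

/* Forget every mapping, e.g. after a helper call clobbers host registers. */
static void reset_reg(regmap_t *m)
{
    for (int i = 0; i < N_GUEST; i++)
        m->host_of[i] = UNMAPPED;
    for (int h = 0; h < N_HOST; h++)
        m->guest_in[h] = UNMAPPED;
    m->next_victim = 0;
}

/* Return the host register holding guest register g, taking the next host
 * register round-robin (evicting its old occupant) if g is not mapped. */
static int map_vm_reg(regmap_t *m, int g)
{
    assert(g >= 0 && g < N_GUEST);
    if (m->host_of[g] != UNMAPPED)
        return m->host_of[g];
    int h = m->next_victim;
    m->next_victim = (h + 1) % N_HOST;
    if (m->guest_in[h] != UNMAPPED)
        m->host_of[m->guest_in[h]] = UNMAPPED;
    m->guest_in[h] = g;
    m->host_of[g] = h;
    return h;
}

/* A saved host index is only trustworthy while the mapping still holds. */
static int is_live(const regmap_t *m, int host, int g)
{
    return host >= 0 && host < N_HOST && m->guest_in[host] == g;
}
```

After `reset_reg()`, the old index still looks like a valid host register number, but the allocator may hand that register to a different guest register; re-mapping after the reset, as the fix does, avoids the aliasing.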

jserv added this to the release-2026.1 milestone Jan 8, 2026

cubic-dev-ai bot reviewed Jan 8, 2026


src/cache.c (conversation resolved)

src/jit.c (conversation resolved)

jserv commented Jan 8, 2026


jserv force-pushed the system-jit branch 3 times, most recently from c1ec84e to ff7e565 (January 9, 2026 14:31)

jserv force-pushed the system-jit branch from ff7e565 to 3283953 (January 10, 2026 00:42)


2 participants
