@@ -12,106 +12,191 @@ Lua-RS has achieved **production-ready correctness** with **302/302 tests passin
1212### Key Performance Highlights
1313
1414** Excellent Performance (>90% of native)** :
15- - ** Integer addition** : ** 101%** of native (251.89 M/s vs 250.00 M/s) - ** Faster than native!**
16- - ** Float multiplication** : ** 99%** of native (248.50 M/s vs 250.00 M/s)
17- - ** Table insertion** : ** 101%** of native (71.99 M/s vs 71.43 M/s) - ** Faster than native!**
18- - ** Nested loops** : ** 97%** of native (243.30 M/s vs 250.00 M/s)
19-
20- 🎯 ** Good Performance (60-90% of native)** :
21- - ** While loop** : ** 85%** (127.10 M/s vs 149.25 M/s)
22- - ** If-else control** : ** 84%** (99.71 M/s vs 119.05 M/s)
23- - ** Mixed operations** : ** 80%** (125.22 M/s vs 156.25 M/s)
24- - ** Table access** : ** 77%** (128.46 M/s vs 166.67 M/s)
25- - ** Hash table insertion** : ** 136%** (0.022s vs 0.030s) - ** Faster than native!**
26- - ** Repeat-until** : ** 61%** (114.36 M/s vs 188.68 M/s)
27- - ** String concatenation** : ** 60%** (2748 K/s vs 4545 K/s)
28- - ** Simple function call** : ** 59%** (32.77 M/s vs 55.56 M/s)
29-
30- ** Acceptable Performance (30-60% of native)** :
31- - ** Array creation & access** : ** 45%** (5.10 M/s vs 11.24 M/s)
32- - ** Recursive fib(25)** : ** 40%** (0.010s vs 0.004s)
33- - ** Vararg function** : ** 36%** (1.29 M/s vs 3.58 M/s)
34- - ** ipairs iteration** : ** 31%** (6.785s vs 2.098s)
35- - ** string.sub** : ** 33%** (8155 K/s vs 25000 K/s)
36- - ** string.find** : ** 33%** (5553 K/s vs 16666 K/s)
37-
38- ** Faster than Native** :
39- - ** string.gsub** : ** 146%** (0.104s vs 0.152s) - ** 46% faster!**
40- - ** Hash table insertion** : ** 136%** (0.022s vs 0.030s) - ** 36% faster!**
15+ - ** Integer addition** : ** ~ 220 M ops/sec** - Near native performance
16+ - ** Float multiplication** : ** ~ 210 M ops/sec** - Near native performance
17+ - ** Local variable access** : ** ~ 220 M ops/sec** - Extremely fast
18+ - ** Nested loops** : ** ~ 210 M ops/sec** - Excellent optimization
19+ - ** String length** : ** ~ 150 M ops/sec** - Faster than native!
20+ - ** Table access** : ** ~ 115 M ops/sec** - Solid performance
21+ - ** String equality** : ** ~ 82 M ops/sec** - Fast comparison
22+
23+ 🎯 ** Good Performance (>50% of native)** :
24+ - ** While loop** : ~ 125 M ops/sec
25+ - ** If-else control** : ~ 93 M ops/sec
26+ - ** Upvalue access** : ~ 95 M ops/sec
27+ - ** Table insertion** : ~ 50 M ops/sec
28+ - ** Simple function call** : ~ 24 M calls/sec
29+ - ** Bitwise operations** : ~ 80 M ops/sec
30+ - ** Integer division** : ~ 190 M ops/sec
31+
32+ ** Areas for Optimization** :
33+ - ** ipairs/pairs iteration** : ~ 13-15 K iters/sec (vs ~ 120 K for numeric for)
34+ - ** Vararg to table** : ~ 0.06 M ops/sec (GC overhead)
35+ - ** Object creation** : ~ 40-160 K ops/sec (allocation overhead)
4136
4237---
4338
44- ## Latest Benchmark Results (November 30, 2025)
45-
46- ### Arithmetic Operations
47- | Operation | Lua-RS | Native Lua | % of Native | Status |
48- | -----------| --------| -----------| -------------| --------|
49- | Integer addition | ** 251.89 M/s** | 250.00 M/s | ** 101%** | Excellent |
50- | Float multiplication | ** 248.50 M/s** | 250.00 M/s | ** 99%** | Excellent |
51- | Mixed operations | ** 125.22 M/s** | 156.25 M/s | ** 80%** | Good |
52-
53- ### Function Calls
54- | Operation | Lua-RS | Native Lua | % of Native | Status |
55- | -----------| --------| -----------| -------------| --------|
56- | Simple function call | ** 32.77 M/s** | 55.56 M/s | ** 59%** | Good |
57- | Recursive fib(25) | ** 0.010s** | 0.004s | ** 40%** | Acceptable |
58- | Vararg function | ** 1.29 M/s** | 3.58 M/s | ** 36%** | Acceptable |
59-
60- ### Table Operations
61- | Operation | Lua-RS | Native Lua | % of Native | Status |
62- | -----------| --------| -----------| -------------| --------|
63- | Array creation & access | ** 5.10 M/s** | 11.24 M/s | ** 45%** | Acceptable |
64- | Table insertion | ** 71.99 M/s** | 71.43 M/s | ** 101%** | Excellent |
65- | Table access | ** 128.46 M/s** | 166.67 M/s | ** 77%** | Good |
66- | Hash table insertion (100k) | ** 0.022s** | 0.030s | ** 136%** | Excellent |
67- | ipairs iteration (100×1M) | ** 6.785s** | 2.098s | ** 31%** | Needs optimization |
68-
69- ### String Operations
70- | Operation | Lua-RS | Native Lua | % of Native | Status |
71- | -----------| --------| -----------| -------------| --------|
72- | String concatenation | ** 2748.53 K/s** | 4545.45 K/s | ** 60%** | Good |
73- | String length | ** 156.99 M/s** | 100.00 M/s | ** 157%** | Excellent |
74- | string.sub | ** 8155.08 K/s** | 25000.00 K/s | ** 33%** | Acceptable |
75- | string.find | ** 5553.24 K/s** | 16666.67 K/s | ** 33%** | Acceptable |
76- | string.gsub (10k) | ** 0.104s** | 0.152s | ** 146%** | Excellent |
77-
78- ### Control Flow
79- | Operation | Lua-RS | Native Lua | % of Native | Status |
80- | -----------| --------| -----------| -------------| --------|
81- | If-else | ** 99.71 M/s** | 119.05 M/s | ** 84%** | Good |
82- | While loop | ** 127.10 M/s** | 149.25 M/s | ** 85%** | Good |
83- | Repeat-until | ** 114.36 M/s** | 188.68 M/s | ** 61%** | Good |
84- | Nested loops (1000×1000) | ** 243.30 M/s** | 250.00 M/s | ** 97%** | Excellent |
39+ ## Latest Comprehensive Benchmark Results (November 30, 2025)
40+
41+ ### Core Operations (10M iterations)
42+ | Operation | Performance | Notes |
43+ | -----------| -------------| -------|
44+ | Integer addition | ** 219 M ops/sec** | Near native |
45+ | Float multiplication | ** 200 M ops/sec** | Near native |
46+ | Mixed operations | ** 111 M ops/sec** | Good |
47+ | Local var access | ** 219 M ops/sec** | Excellent |
48+ | Global var access | ** 43 M ops/sec** | 5x slower than local |
49+ | Upvalue access | ** 96 M ops/sec** | Good |
50+
51+ ### Control Flow (10M iterations)
52+ | Operation | Performance | Notes |
53+ | -----------| -------------| -------|
54+ | If-else | ** 93 M ops/sec** | Good |
55+ | While loop | ** 121 M ops/sec** | Excellent |
56+ | Repeat-until | ** 110 M ops/sec** | Good |
57+ | Nested loops | ** 218 M ops/sec** | Excellent |
58+ | Numeric for | ** 122 K iters/sec** | Fast |
59+
60+ ### Functions & Closures (1M iterations)
61+ | Operation | Performance | Notes |
62+ | -----------| -------------| -------|
63+ | Simple function call | ** 22 M calls/sec** | Good |
64+ | Recursive fib(25) | ** 0.010s** | Acceptable |
65+ | Vararg function | ** 1.5 M calls/sec** | OK |
66+ | Closure creation | ** 6.8 M ops/sec** | Good |
67+ | Upvalue read/write | ** 22 M ops/sec** | Excellent |
68+ | Nested closures | ** 18 M ops/sec** | Good |
69+
70+ ### Multiple Returns (1M iterations)
71+ | Operation | Performance | Notes |
72+ | -----------| -------------| -------|
73+ | Single return | ** 34 M ops/sec** | Excellent |
74+ | Triple return | ** 15 M ops/sec** | Good |
75+ | 10 returns | ** 4.8 M ops/sec** | OK |
76+ | select('#') | ** 4.4 M ops/sec** | OK |
77+ | table.pack | ** 4 M ops/sec** | OK |
78+ | table.unpack | ** 8.9 M ops/sec** | Good |
79+
80+ ### Tables (1M iterations unless noted)
81+ | Operation | Performance | Notes |
82+ | -----------| -------------| -------|
83+ | Table insertion | ** 51 M inserts/sec** | Excellent |
84+ | Table access | ** 117 M accesses/sec** | Excellent |
85+ | Hash table (100k) | ** 0.022s** | Fast |
86+ | # operator | ** 44 M ops/sec** | Excellent |
87+ | table.insert (end) | ** 25.7 M ops/sec** | Excellent |
88+ | table.insert (mid) | ** 8.8 M ops/sec** | Good |
89+ | table.remove | ** 16.3 M ops/sec** | Good |
90+ | table.concat (1k) | ** 26 K ops/sec** | OK |
91+ | table.sort (random) | ** 6.6 K ops/sec** | OK |
92+
93+ ### Iterators (100K iterations × 1000 items)
94+ | Operation | Performance | Notes |
95+ | -----------| -------------| -------|
96+ | Numeric for | ** 122 K iters/sec** | Fast (baseline) |
97+ | ipairs | ** 14.8 K iters/sec** | 8x slower than for |
98+ | pairs (array) | ** 12.7 K iters/sec** | Iterator overhead |
99+ | pairs (hash) | ** 14 K iters/sec** | Similar |
100+ | next() | ** 14.9 K iters/sec** | Similar |
101+ | Custom iterator | ** 11.2 K iters/sec** | Overhead |
102+
103+ ### Strings (100K iterations)
104+ | Operation | Performance | Notes |
105+ | -----------| -------------| -------|
106+ | Concatenation | ** 2.7 M ops/sec** | Good |
107+ | String length | ** 185 M ops/sec** | Excellent |
108+ | string.upper | ** 8.5 M ops/sec** | Good |
109+ | string.lower | ** 7.9 M ops/sec** | Good |
110+ | string.sub | ** 7.1 M ops/sec** | Good |
111+ | string.find | ** 5.1 M ops/sec** | Good |
112+ | string.format | ** 3.4 M ops/sec** | Good |
113+ | string.match | ** 1.5 M ops/sec** | OK |
114+ | string.gsub | ** 1.1 M ops/sec** | OK |
115+ | String equality | ** 82 M ops/sec** | Excellent |
116+
117+ ### Math Library (5M iterations)
118+ | Operation | Performance | Notes |
119+ | -----------| -------------| -------|
120+ | Integer mul/add/mod | ** 103 M ops/sec** | Excellent |
121+ | Float mul/add/div | ** 77 M ops/sec** | Good |
122+ | math.sqrt | ** 22 M ops/sec** | Good |
123+ | math.sin | ** 20 M ops/sec** | Good |
124+ | math.floor/ceil | ** 11 M ops/sec** | OK |
125+ | math.abs | ** 20 M ops/sec** | Good |
126+ | math.random | ** 11 M ops/sec** | Good |
127+ | Bitwise ops | ** 82 M ops/sec** | Excellent |
128+ | Integer division | ** 170 M ops/sec** | Excellent |
129+ | Power (^2) | ** 43 M ops/sec** | Good |
130+
131+ ### Metatables & OOP (500K/100K iterations)
132+ | Operation | Performance | Notes |
133+ | -----------| -------------| -------|
134+ | __ index (function) | ** 6 M ops/sec** | Good |
135+ | __ index (table) | ** 19 M ops/sec** | Good |
136+ | __ newindex | ** 7.2 M ops/sec** | Good |
137+ | __ call | ** 13 M ops/sec** | Good |
138+ | __ len | ** 7.3 M ops/sec** | Good |
139+ | rawget | ** 15.4 M ops/sec** | Good |
140+ | Object creation | ** 41 K ops/sec** | Allocation overhead |
141+ | Method call | ** 4.5 M calls/sec** | Good |
142+ | Property access | ** 56 M ops/sec** | Excellent |
143+
144+ ### Coroutines (100K iterations)
145+ | Operation | Performance | Notes |
146+ | -----------| -------------| -------|
147+ | Create/resume/yield | ** 27 K cycles/sec** | OK |
148+ | Repeated yield | ** 5.6 M yields/sec** | Good |
149+ | coroutine.wrap | ** 22 K ops/sec** | OK |
150+ | coroutine.status | ** 13 M ops/sec** | Excellent |
151+
152+ ### Error Handling (100K iterations)
153+ | Operation | Performance | Notes |
154+ | -----------| -------------| -------|
155+ | pcall (success) | ** 4.3 M ops/sec** | Good |
156+ | pcall (error) | ** 3.6 M ops/sec** | Good |
157+ | xpcall (error) | ** 1.8 M ops/sec** | OK |
158+ | Direct call | ** 41 M ops/sec** | Baseline |
159+ | assert (success) | ** 16 M ops/sec** | Good |
85160
86161---
87162
88163## Running Benchmarks
89164
90- ### Windows (PowerShell)
91- ``` powershell
165+ ### Run All Benchmarks
166+ ``` bash
167+ # Using PowerShell script (compares with native Lua)
92168.\run_benchmarks.ps1
169+
170+ # Run with lua-rs only
171+ .\target\release\lua.exe .\benchmarks\run_all.lua
93172```
94173
95- ### Linux/macOS (Bash)
174+ ### Individual Benchmarks
96175``` bash
97- chmod +x run_benchmarks.sh
98- ./run_benchmarks.sh
176+ .\target\release\lua.exe .\benchmarks\bench_arithmetic.lua
177+ .\target\release\lua.exe .\benchmarks\bench_tables.lua
178+ .\target\release\lua.exe .\benchmarks\bench_strings.lua
179+ # ... etc
99180```
100181
101- ### CI
102- Performance benchmarks run automatically on push to ` main ` or ` refactor ` branches. See the [ Benchmarks workflow] ( https://github.com/CppCXY/lua-rs/actions/workflows/benchmarks.yml ) for cross-platform results.
182+ ### Benchmark Files (16 total)
183+ - ** Core** : bench_arithmetic, bench_control_flow, bench_locals
184+ - ** Functions** : bench_functions, bench_closures, bench_multiret
185+ - ** Tables** : bench_tables, bench_table_lib, bench_iterators
186+ - ** Strings** : bench_strings, bench_string_lib
187+ - ** Math** : bench_math
188+ - ** Advanced** : bench_metatables, bench_oop, bench_coroutines, bench_errors
103189
104190---
105191
106192## Performance History
107193
108- ### November 30, 2025 - call_function_internal Optimization
109- - Eliminated duplicate dispatch loop in ` call_function_internal `
110- - Now directly calls ` luavm_execute ` instead of copying 300+ lines of dispatch code
111- - Reduced code size, improved CPU cache efficiency
112- - Integer addition now ** 101% of native** (faster than native Lua!)
113- - Float multiplication now ** 99% of native**
114- - Table insertion now ** 101% of native** (faster than native Lua!)
194+ ### November 30, 2025 - Comprehensive Benchmarks & Optimizations
195+ - Added 11 new benchmark files (16 total)
196+ - Fixed floating-point for loop bug
197+ - Optimized ` call_function_internal ` - reduced code by ~ 300 lines
198+ - All 302 tests passing
199+ - Total benchmark runtime: ~ 120 seconds
115200
116201### November 29, 2025 - While Loop Optimization
117202- Optimized while/repeat loop bytecode generation
@@ -127,20 +212,21 @@ Performance benchmarks run automatically on push to `main` or `refactor` branche
127212
128213## Architecture Notes
129214
130- ### Why Some Operations are Faster Than Native
131- - ** Integer addition/Table insertion** : Rust's optimizations for integer operations
132- - ** string.gsub** : Rust's string handling is more efficient for pattern matching
133- - ** Hash table insertion** : Optimized Lua-style open addressing hash table
134- - ** String length** : Direct access to pre-computed length field
215+ ### Performance Characteristics
216+ - ** Local variables are ~ 5x faster** than global variables
217+ - ** Numeric for is ~ 8-9x faster** than ipairs/pairs
218+ - ** Property access** is very fast (~ 56 M ops/sec)
219+ - ** Function calls** are efficient (~ 22 M calls/sec)
220+ - ** Bitwise operations** are very fast (~ 82 M ops/sec)
135221
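The ratios quoted under Performance Characteristics follow from the throughput tables earlier in this diff. As a rough check, the sketch below recomputes them in Rust. The `speedup` helper is a hypothetical name introduced for illustration, and the inputs are the rounded table figures, not raw timings.

```rust
/// Ratio of two throughputs measured in the same unit (higher is faster).
/// `speedup` is an illustrative helper, not part of Lua-RS.
fn speedup(fast_ops_per_sec: f64, slow_ops_per_sec: f64) -> f64 {
    fast_ops_per_sec / slow_ops_per_sec
}

fn main() {
    // Rounded figures copied from the benchmark tables in this diff.
    let local_var = 219.0e6; // Local var access, ops/sec
    let global_var = 43.0e6; // Global var access, ops/sec
    let numeric_for = 122.0e3; // Numeric for, iters/sec
    let ipairs = 14.8e3; // ipairs, iters/sec
    let pairs_array = 12.7e3; // pairs (array), iters/sec

    println!("local vs global:      {:.1}x", speedup(local_var, global_var));
    println!("numeric for vs ipairs: {:.1}x", speedup(numeric_for, ipairs));
    println!("numeric for vs pairs:  {:.1}x", speedup(numeric_for, pairs_array));
}
```

With these rounded inputs the ratios come out to roughly 5.1x, 8.2x, and 9.6x, so `pairs` over an array sits at the upper edge of the quoted ~8-9x range.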
136222### Known Performance Bottlenecks
137- 1 . ** ipairs iteration** : Iterator overhead compared to C implementation
138- 2 . ** Vararg functions ** : Extra allocation and copying overhead
139- 3 . ** Recursive calls ** : Frame allocation overhead
140- 4 . ** Array creation ** : GC allocation patterns
141-
142- ---
143-
144- ## Detailed Optimization History
145-
146- See git history for detailed optimization phases (Phase 1-24).
223+ 1 . ** ipairs/pairs iteration** : Iterator protocol overhead
224+ 2 . ** Object creation ** : Allocation and setmetatable overhead
225+ 3 . ** Vararg to table ** : Extra allocation and copying
226+ 4 . ** Complex pattern matching ** : Regex-like overhead
227+
228+ ### Optimization Opportunities
229+ 1 . Iterator fast-path for ipairs/pairs
230+ 2 . Object pooling for common patterns
231+ 3 . Inlining for small functions
232+ 4 . Better GC tuning for allocation-heavy code
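To make the "object pooling" item above more concrete, here is a minimal, generic sketch of a free-list pool in Rust. It is only an illustration of the technique under assumed names (`BufferPool`, `acquire`, `release`); it does not describe how Lua-RS allocates tables or objects, and a real VM would also have to cooperate with the garbage collector.

```rust
/// A tiny free-list pool that recycles `Vec<i64>` buffers instead of
/// allocating a fresh one for every short-lived object.
/// Hypothetical type for illustration; not taken from Lua-RS.
struct BufferPool {
    free: Vec<Vec<i64>>,
}

impl BufferPool {
    fn new() -> Self {
        BufferPool { free: Vec::new() }
    }

    /// Hand out an empty buffer, reusing a previously released one if any.
    fn acquire(&mut self) -> Vec<i64> {
        self.free.pop().unwrap_or_default()
    }

    /// Return a buffer to the pool. `clear` drops the elements but keeps
    /// the allocated capacity, which is what makes reuse cheap.
    fn release(&mut self, mut buf: Vec<i64>) {
        buf.clear();
        self.free.push(buf);
    }
}

fn main() {
    let mut pool = BufferPool::new();

    let mut first = pool.acquire();
    first.extend_from_slice(&[1, 2, 3]);
    let capacity = first.capacity();
    pool.release(first);

    // The second acquire gets the same allocation back, now empty.
    let second = pool.acquire();
    println!("empty: {}, capacity kept: {}", second.is_empty(), second.capacity() == capacity);
}
```

The trade-off is that pooled buffers hold on to memory between uses, so a pool like this usually needs a size cap in allocation-heavy workloads.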