Commit fd30d28

update
1 parent bbba36c commit fd30d28

22 files changed: +1525 / -308 lines changed

PERFORMANCE_REPORT.md

Lines changed: 182 additions & 96 deletions
@@ -12,106 +12,191 @@ Lua-RS has achieved **production-ready correctness** with **302/302 tests passin
1212
### Key Performance Highlights
1313

1414
🏆 **Excellent Performance (>90% of native)**:
15-
- **Integer addition**: **101%** of native (251.89 M/s vs 250.00 M/s) - **Faster than native!**
16-
- **Float multiplication**: **99%** of native (248.50 M/s vs 250.00 M/s)
17-
- **Table insertion**: **101%** of native (71.99 M/s vs 71.43 M/s) - **Faster than native!**
18-
- **Nested loops**: **97%** of native (243.30 M/s vs 250.00 M/s)
19-
20-
🎯 **Good Performance (60-90% of native)**:
21-
- **While loop**: **85%** (127.10 M/s vs 149.25 M/s)
22-
- **If-else control**: **84%** (99.71 M/s vs 119.05 M/s)
23-
- **Mixed operations**: **80%** (125.22 M/s vs 156.25 M/s)
24-
- **Table access**: **77%** (128.46 M/s vs 166.67 M/s)
25-
- **Hash table insertion**: **136%** (0.022s vs 0.030s) - **Faster than native!**
26-
- **Repeat-until**: **61%** (114.36 M/s vs 188.68 M/s)
27-
- **String concatenation**: **60%** (2748 K/s vs 4545 K/s)
28-
- **Simple function call**: **59%** (32.77 M/s vs 55.56 M/s)
29-
30-
📊 **Acceptable Performance (30-60% of native)**:
31-
- **Array creation & access**: **45%** (5.10 M/s vs 11.24 M/s)
32-
- **Recursive fib(25)**: **40%** (0.010s vs 0.004s)
33-
- **Vararg function**: **36%** (1.29 M/s vs 3.58 M/s)
34-
- **ipairs iteration**: **31%** (6.785s vs 2.098s)
35-
- **string.sub**: **33%** (8155 K/s vs 25000 K/s)
36-
- **string.find**: **33%** (5553 K/s vs 16666 K/s)
37-
38-
🏆 **Faster than Native**:
39-
- **string.gsub**: **146%** (0.104s vs 0.152s) - **46% faster!**
40-
- **Hash table insertion**: **136%** (0.022s vs 0.030s) - **36% faster!**
15+
- **Integer addition**: **~220 M ops/sec** - Near native performance
16+
- **Float multiplication**: **~210 M ops/sec** - Near native performance
17+
- **Local variable access**: **~220 M ops/sec** - Extremely fast
18+
- **Nested loops**: **~210 M ops/sec** - Excellent optimization
19+
- **String length**: **~185 M ops/sec** - Faster than native!
20+
- **Table access**: **~115 M ops/sec** - Solid performance
21+
- **String equality**: **~82 M ops/sec** - Fast comparison
22+
23+
🎯 **Good Performance (>50% of native)**:
24+
- **While loop**: ~125 M ops/sec
25+
- **If-else control**: ~93 M ops/sec
26+
- **Upvalue access**: ~95 M ops/sec
27+
- **Table insertion**: ~50 M ops/sec
28+
- **Simple function call**: ~22 M calls/sec
29+
- **Bitwise operations**: ~80 M ops/sec
30+
- **Integer division**: ~170 M ops/sec
31+
32+
📊 **Areas for Optimization**:
33+
- **ipairs/pairs iteration**: ~13-15 K iters/sec (vs ~120 K for numeric for)
34+
- **Vararg to table**: ~0.06 M ops/sec (GC overhead)
35+
- **Object creation**: ~40-160 K ops/sec (allocation overhead)
4136

4237
---
4338

44-
## Latest Benchmark Results (November 30, 2025)
45-
46-
### Arithmetic Operations
47-
| Operation | Lua-RS | Native Lua | % of Native | Status |
48-
|-----------|--------|-----------|-------------|--------|
49-
| Integer addition | **251.89 M/s** | 250.00 M/s | **101%** | Excellent 🏆 |
50-
| Float multiplication | **248.50 M/s** | 250.00 M/s | **99%** | Excellent 🏆 |
51-
| Mixed operations | **125.22 M/s** | 156.25 M/s | **80%** | Good |
52-
53-
### Function Calls
54-
| Operation | Lua-RS | Native Lua | % of Native | Status |
55-
|-----------|--------|-----------|-------------|--------|
56-
| Simple function call | **32.77 M/s** | 55.56 M/s | **59%** | Good |
57-
| Recursive fib(25) | **0.010s** | 0.004s | **40%** | Acceptable |
58-
| Vararg function | **1.29 M/s** | 3.58 M/s | **36%** | Acceptable |
59-
60-
### Table Operations
61-
| Operation | Lua-RS | Native Lua | % of Native | Status |
62-
|-----------|--------|-----------|-------------|--------|
63-
| Array creation & access | **5.10 M/s** | 11.24 M/s | **45%** | Acceptable |
64-
| Table insertion | **71.99 M/s** | 71.43 M/s | **101%** | Excellent 🏆 |
65-
| Table access | **128.46 M/s** | 166.67 M/s | **77%** | Good |
66-
| Hash table insertion (100k) | **0.022s** | 0.030s | **136%** | Excellent 🏆 |
67-
| ipairs iteration (100×1M) | **6.785s** | 2.098s | **31%** | Needs optimization |
68-
69-
### String Operations
70-
| Operation | Lua-RS | Native Lua | % of Native | Status |
71-
|-----------|--------|-----------|-------------|--------|
72-
| String concatenation | **2748.53 K/s** | 4545.45 K/s | **60%** | Good |
73-
| String length | **156.99 M/s** | 100.00 M/s | **157%** | Excellent 🏆 |
74-
| string.sub | **8155.08 K/s** | 25000.00 K/s | **33%** | Acceptable |
75-
| string.find | **5553.24 K/s** | 16666.67 K/s | **33%** | Acceptable |
76-
| string.gsub (10k) | **0.104s** | 0.152s | **146%** | Excellent 🏆 |
77-
78-
### Control Flow
79-
| Operation | Lua-RS | Native Lua | % of Native | Status |
80-
|-----------|--------|-----------|-------------|--------|
81-
| If-else | **99.71 M/s** | 119.05 M/s | **84%** | Good |
82-
| While loop | **127.10 M/s** | 149.25 M/s | **85%** | Good |
83-
| Repeat-until | **114.36 M/s** | 188.68 M/s | **61%** | Good |
84-
| Nested loops (1000×1000) | **243.30 M/s** | 250.00 M/s | **97%** | Excellent 🏆 |
39+
## Latest Comprehensive Benchmark Results (November 30, 2025)
40+
41+
### Core Operations (10M iterations)
42+
| Operation | Performance | Notes |
43+
|-----------|-------------|-------|
44+
| Integer addition | **219 M ops/sec** | Near native |
45+
| Float multiplication | **200 M ops/sec** | Near native |
46+
| Mixed operations | **111 M ops/sec** | Good |
47+
| Local var access | **219 M ops/sec** | Excellent |
48+
| Global var access | **43 M ops/sec** | 5x slower than local |
49+
| Upvalue access | **96 M ops/sec** | Good |
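
The `M ops/sec` figures in these tables read as throughput values: iterations divided by wall-clock time. The diff does not show how the benchmark scripts compute them, so the Rust sketch below is only an assumption about the arithmetic. `ops_per_sec` and `format_mops` are hypothetical helper names, not part of lua-rs.

```rust
use std::time::Duration;

/// Throughput in operations per second for `iterations` operations
/// completed in `elapsed` wall-clock time.
/// Hypothetical helper: not taken from lua-rs or its benchmark scripts.
fn ops_per_sec(iterations: u64, elapsed: Duration) -> f64 {
    iterations as f64 / elapsed.as_secs_f64()
}

/// Formats a throughput value in millions of operations per second,
/// matching the "M ops/sec" unit used in the tables.
fn format_mops(ops: f64) -> String {
    format!("{:.0} M ops/sec", ops / 1_000_000.0)
}
```

Under this reading, a 10M-iteration loop that finishes in about 50 ms would be reported as roughly 200 M ops/sec: `format_mops(ops_per_sec(10_000_000, Duration::from_millis(50)))`.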
50+
51+
### Control Flow (10M iterations)
52+
| Operation | Performance | Notes |
53+
|-----------|-------------|-------|
54+
| If-else | **93 M ops/sec** | Good |
55+
| While loop | **121 M ops/sec** | Excellent |
56+
| Repeat-until | **110 M ops/sec** | Good |
57+
| Nested loops | **218 M ops/sec** | Excellent |
58+
| Numeric for | **122 K iters/sec** | Fast |
59+
60+
### Functions & Closures (1M iterations)
61+
| Operation | Performance | Notes |
62+
|-----------|-------------|-------|
63+
| Simple function call | **22 M calls/sec** | Good |
64+
| Recursive fib(25) | **0.010s** | Acceptable |
65+
| Vararg function | **1.5 M calls/sec** | OK |
66+
| Closure creation | **6.8 M ops/sec** | Good |
67+
| Upvalue read/write | **22 M ops/sec** | Excellent |
68+
| Nested closures | **18 M ops/sec** | Good |
69+
70+
### Multiple Returns (1M iterations)
71+
| Operation | Performance | Notes |
72+
|-----------|-------------|-------|
73+
| Single return | **34 M ops/sec** | Excellent |
74+
| Triple return | **15 M ops/sec** | Good |
75+
| 10 returns | **4.8 M ops/sec** | OK |
76+
| select('#') | **4.4 M ops/sec** | OK |
77+
| table.pack | **4 M ops/sec** | OK |
78+
| table.unpack | **8.9 M ops/sec** | Good |
79+
80+
### Tables (1M iterations unless noted)
81+
| Operation | Performance | Notes |
82+
|-----------|-------------|-------|
83+
| Table insertion | **51 M inserts/sec** | Excellent |
84+
| Table access | **117 M accesses/sec** | Excellent |
85+
| Hash table (100k) | **0.022s** | Fast |
86+
| # operator | **44 M ops/sec** | Excellent |
87+
| table.insert (end) | **25.7 M ops/sec** | Excellent |
88+
| table.insert (mid) | **8.8 M ops/sec** | Good |
89+
| table.remove | **16.3 M ops/sec** | Good |
90+
| table.concat (1k) | **26 K ops/sec** | OK |
91+
| table.sort (random) | **6.6 K ops/sec** | OK |
92+
93+
### Iterators (100K iterations × 1000 items)
94+
| Operation | Performance | Notes |
95+
|-----------|-------------|-------|
96+
| Numeric for | **122 K iters/sec** | Fast (baseline) |
97+
| ipairs | **14.8 K iters/sec** | 8x slower than for |
98+
| pairs (array) | **12.7 K iters/sec** | Iterator overhead |
99+
| pairs (hash) | **14 K iters/sec** | Similar |
100+
| next() | **14.9 K iters/sec** | Similar |
101+
| Custom iterator | **11.2 K iters/sec** | Overhead |
102+
103+
### Strings (100K iterations)
104+
| Operation | Performance | Notes |
105+
|-----------|-------------|-------|
106+
| Concatenation | **2.7 M ops/sec** | Good |
107+
| String length | **185 M ops/sec** | Excellent |
108+
| string.upper | **8.5 M ops/sec** | Good |
109+
| string.lower | **7.9 M ops/sec** | Good |
110+
| string.sub | **7.1 M ops/sec** | Good |
111+
| string.find | **5.1 M ops/sec** | Good |
112+
| string.format | **3.4 M ops/sec** | Good |
113+
| string.match | **1.5 M ops/sec** | OK |
114+
| string.gsub | **1.1 M ops/sec** | OK |
115+
| String equality | **82 M ops/sec** | Excellent |
116+
117+
### Math Library (5M iterations)
118+
| Operation | Performance | Notes |
119+
|-----------|-------------|-------|
120+
| Integer mul/add/mod | **103 M ops/sec** | Excellent |
121+
| Float mul/add/div | **77 M ops/sec** | Good |
122+
| math.sqrt | **22 M ops/sec** | Good |
123+
| math.sin | **20 M ops/sec** | Good |
124+
| math.floor/ceil | **11 M ops/sec** | OK |
125+
| math.abs | **20 M ops/sec** | Good |
126+
| math.random | **11 M ops/sec** | Good |
127+
| Bitwise ops | **82 M ops/sec** | Excellent |
128+
| Integer division | **170 M ops/sec** | Excellent |
129+
| Power (^2) | **43 M ops/sec** | Good |
130+
131+
### Metatables & OOP (500K/100K iterations)
132+
| Operation | Performance | Notes |
133+
|-----------|-------------|-------|
134+
| __index (function) | **6 M ops/sec** | Good |
135+
| __index (table) | **19 M ops/sec** | Good |
136+
| __newindex | **7.2 M ops/sec** | Good |
137+
| __call | **13 M ops/sec** | Good |
138+
| __len | **7.3 M ops/sec** | Good |
139+
| rawget | **15.4 M ops/sec** | Good |
140+
| Object creation | **41 K ops/sec** | Allocation overhead |
141+
| Method call | **4.5 M calls/sec** | Good |
142+
| Property access | **56 M ops/sec** | Excellent |
143+
144+
### Coroutines (100K iterations)
145+
| Operation | Performance | Notes |
146+
|-----------|-------------|-------|
147+
| Create/resume/yield | **27 K cycles/sec** | OK |
148+
| Repeated yield | **5.6 M yields/sec** | Good |
149+
| coroutine.wrap | **22 K ops/sec** | OK |
150+
| coroutine.status | **13 M ops/sec** | Excellent |
151+
152+
### Error Handling (100K iterations)
153+
| Operation | Performance | Notes |
154+
|-----------|-------------|-------|
155+
| pcall (success) | **4.3 M ops/sec** | Good |
156+
| pcall (error) | **3.6 M ops/sec** | Good |
157+
| xpcall (error) | **1.8 M ops/sec** | OK |
158+
| Direct call | **41 M ops/sec** | Baseline |
159+
| assert (success) | **16 M ops/sec** | Good |
85160

86161
---
87162

88163
## Running Benchmarks
89164

90-
### Windows (PowerShell)
91-
```powershell
165+
### Run All Benchmarks
166+
```powershell
167+
# Using PowerShell script (compares with native Lua)
92168
.\run_benchmarks.ps1
169+
170+
# Run with lua-rs only
171+
.\target\release\lua.exe .\benchmarks\run_all.lua
93172
```
94173

95-
### Linux/macOS (Bash)
174+
### Individual Benchmarks
96175
```powershell
97-
chmod +x run_benchmarks.sh
98-
./run_benchmarks.sh
176+
.\target\release\lua.exe .\benchmarks\bench_arithmetic.lua
177+
.\target\release\lua.exe .\benchmarks\bench_tables.lua
178+
.\target\release\lua.exe .\benchmarks\bench_strings.lua
179+
# ... etc
99180
```
100181

101-
### CI
102-
Performance benchmarks run automatically on push to `main` or `refactor` branches. See the [Benchmarks workflow](https://github.com/CppCXY/lua-rs/actions/workflows/benchmarks.yml) for cross-platform results.
182+
### Benchmark Files (16 total)
183+
- **Core**: bench_arithmetic, bench_control_flow, bench_locals
184+
- **Functions**: bench_functions, bench_closures, bench_multiret
185+
- **Tables**: bench_tables, bench_table_lib, bench_iterators
186+
- **Strings**: bench_strings, bench_string_lib
187+
- **Math**: bench_math
188+
- **Advanced**: bench_metatables, bench_oop, bench_coroutines, bench_errors
103189

104190
---
105191

106192
## Performance History
107193

108-
### November 30, 2025 - call_function_internal Optimization
109-
- Eliminated duplicate dispatch loop in `call_function_internal`
110-
- Now directly calls `luavm_execute` instead of copying 300+ lines of dispatch code
111-
- Reduced code size, improved CPU cache efficiency
112-
- Integer addition now **101% of native** (faster than native Lua!)
113-
- Float multiplication now **99% of native**
114-
- Table insertion now **101% of native** (faster than native Lua!)
194+
### November 30, 2025 - Comprehensive Benchmarks & Optimizations
195+
- Added 11 new benchmark files (16 total)
196+
- Fixed floating-point for loop bug
197+
- Optimized `call_function_internal` - reduced code by ~300 lines
198+
- All 302 tests passing
199+
- Total benchmark runtime: ~120 seconds
115200

116201
### November 29, 2025 - While Loop Optimization
117202
- Optimized while/repeat loop bytecode generation
@@ -127,20 +212,21 @@ Performance benchmarks run automatically on push to `main` or `refactor` branche
127212

128213
## Architecture Notes
129214

130-
### Why Some Operations are Faster Than Native
131-
- **Integer addition/Table insertion**: Rust's optimizations for integer operations
132-
- **string.gsub**: Rust's string handling is more efficient for pattern matching
133-
- **Hash table insertion**: Optimized Lua-style open addressing hash table
134-
- **String length**: Direct access to pre-computed length field
215+
### Performance Characteristics
216+
- **Local variables are ~5x faster** than global variables
217+
- **Numeric for is ~8-10x faster** than ipairs/pairs
218+
- **Property access** is very fast (~56 M ops/sec)
219+
- **Function calls** are efficient (~22 M calls/sec)
220+
- **Bitwise operations** are very fast (~82 M ops/sec)
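
The ratios quoted in this list can be checked against the throughput figures in the benchmark tables. The sketch below redoes that arithmetic in Rust; the constants are copied from the tables in this report, and `speedup` is a hypothetical helper name used only for illustration.

```rust
/// Throughput figures copied from the benchmark tables in this report.
const LOCAL_VAR_MOPS: f64 = 219.0; // Local var access, M ops/sec
const GLOBAL_VAR_MOPS: f64 = 43.0; // Global var access, M ops/sec
const NUMERIC_FOR_KIPS: f64 = 122.0; // Numeric for, K iters/sec
const IPAIRS_KIPS: f64 = 14.8; // ipairs, K iters/sec
const PAIRS_ARRAY_KIPS: f64 = 12.7; // pairs (array), K iters/sec

/// How many times faster `fast` is than `slow`.
/// Both arguments must be expressed in the same unit.
fn speedup(fast: f64, slow: f64) -> f64 {
    fast / slow
}
```

With these inputs, `speedup(LOCAL_VAR_MOPS, GLOBAL_VAR_MOPS)` is about 5.1, `speedup(NUMERIC_FOR_KIPS, IPAIRS_KIPS)` is about 8.2, and `speedup(NUMERIC_FOR_KIPS, PAIRS_ARRAY_KIPS)` is about 9.6.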
135221

136222
### Known Performance Bottlenecks
137-
1. **ipairs iteration**: Iterator overhead compared to C implementation
138-
2. **Vararg functions**: Extra allocation and copying overhead
139-
3. **Recursive calls**: Frame allocation overhead
140-
4. **Array creation**: GC allocation patterns
141-
142-
---
143-
144-
## Detailed Optimization History
145-
146-
See git history for detailed optimization phases (Phase 1-24).
223+
1. **ipairs/pairs iteration**: Iterator protocol overhead
224+
2. **Object creation**: Allocation and setmetatable overhead
225+
3. **Vararg to table**: Extra allocation and copying
226+
4. **Complex pattern matching**: Regex-like overhead
227+
228+
### Optimization Opportunities
229+
1. Iterator fast-path for ipairs/pairs
230+
2. Object pooling for common patterns
231+
3. Inlining for small functions
232+
4. Better GC tuning for allocation-heavy code
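
As an illustration of the first item, an iterator fast path could walk a table's array part directly instead of calling the iterator function once per element. The Rust sketch below is a simplified model under stated assumptions, not lua-rs code: the `Value` enum, `ipairs_next`, `sum_generic`, and `sum_fast_path` are all hypothetical, and a real VM would apply the idea at the bytecode-dispatch level rather than in a helper like this.

```rust
/// Hypothetical stand-in for a Lua value; the real lua-rs type is not shown here.
#[derive(Clone, Debug, PartialEq)]
enum Value {
    Nil,
    Int(i64),
}

/// One step of the generic ipairs protocol: given the previous 1-based index,
/// return the next index and value, or None at the first nil or end of array.
fn ipairs_next(array: &[Value], prev: usize) -> Option<(usize, &Value)> {
    match array.get(prev) {
        Some(Value::Nil) | None => None,
        Some(v) => Some((prev + 1, v)),
    }
}

/// Generic path: one iterator call per element, as the iterator protocol requires.
fn sum_generic(array: &[Value]) -> i64 {
    let mut sum = 0;
    let mut index = 0;
    while let Some((next, value)) = ipairs_next(array, index) {
        if let Value::Int(n) = value {
            sum += n;
        }
        index = next;
    }
    sum
}

/// Fast path: iterate the array part directly, stopping at the first nil,
/// which yields the same elements as the generic path.
fn sum_fast_path(array: &[Value]) -> i64 {
    array
        .iter()
        .take_while(|v| **v != Value::Nil)
        .map(|v| match v {
            Value::Int(n) => *n,
            Value::Nil => 0,
        })
        .sum()
}
```

Both functions visit the same elements. The point of the fast path is that the per-element call through the iterator function disappears, which is where an interpreter typically pays the protocol overhead listed under the known bottlenecks.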

README.md

Lines changed: 18 additions & 9 deletions
@@ -18,16 +18,24 @@ Current test status: **302 out of 302 tests passing (100%)** ✅
1818

1919
[![Benchmarks](https://github.com/CppCXY/lua-rs/actions/workflows/benchmarks.yml/badge.svg)](https://github.com/CppCXY/lua-rs/actions/workflows/benchmarks.yml)
2020

21-
**Overall**: 30-100%+ of native Lua 5.4.6 performance
21+
**Overall**: 30-100%+ of native Lua 5.4.6 performance with **16 comprehensive benchmark suites**.
2222

2323
**Highlights** (November 30, 2025):
24-
- 🏆 **Integer addition**: **101%** of native (faster than native Lua!)
25-
- 🏆 **Float multiplication**: **99%** of native
26-
- 🏆 **Table insertion**: **101%** of native (faster than native Lua!)
27-
- 🏆 **Hash table insertion**: **136%** of native (36% faster!)
28-
- 🏆 **string.gsub**: **146%** of native (46% faster!)
29-
- 🎯 Good performance: Control flow (61-97%), Table access (77%)
30-
- 📊 Acceptable: Function calls (36-59%), String operations (33-60%)
24+
- 🏆 **Integer addition**: **~220 M ops/sec** (near native)
25+
- 🏆 **Local variable access**: **~220 M ops/sec** (5x faster than globals!)
26+
- 🏆 **Nested loops**: **~218 M ops/sec** (excellent)
27+
- 🏆 **Table access**: **~117 M ops/sec** (solid)
28+
- 🏆 **String length**: **~185 M ops/sec** (faster than native!)
29+
- 🎯 **Numeric for**: ~122 K iters/sec vs ~15 K for ipairs (8x faster)
30+
- 📊 **Function calls**: ~22 M calls/sec
31+
32+
**Benchmark Coverage** (16 benchmark files):
33+
- Core: arithmetic, control_flow, locals
34+
- Functions: functions, closures, multiret
35+
- Tables: tables, table_lib, iterators
36+
- Strings: strings, string_lib
37+
- Math: math
38+
- Advanced: metatables, oop, coroutines, errors
3139

3240
See detailed analysis: [Performance Report](PERFORMANCE_REPORT.md)
3341

@@ -200,9 +208,10 @@ The codebase was developed through iterative AI assistance with human oversight.
200208
- ✅ Reached **production-ready correctness** with **competitive performance in key areas**
201209

202210
### Recent Improvements (November 2025)
211+
- **November 30**: Added 11 new benchmark files (16 total) with comprehensive coverage
212+
- **November 30**: Fixed floating-point for loop bug
203213
- **November 30**: Optimized `call_function_internal` (eliminated duplicate dispatch loop)
204214
- **November 30**: Added 30 new tests for IO/OS standard libraries (302 total tests)
205-
- **November 30**: Integer addition/Table insertion now **faster than native Lua**
206215
- **November 29**: While loop bytecode optimization
207216
- **November 24**: CallFrame code pointer caching
208217
- **November 24**: C function call optimization (eliminated copying)
