
 ## Executive Summary

-Lua-RS has achieved **production-ready performance** with **252/252 tests passing (100%)**. After systematic optimizations including control flow optimization, function call optimization (eliminating HashMap lookups), and recent C function call + hash table optimizations, the interpreter now delivers **65-120% of native Lua 5.4.6 performance** across most operations, with several areas **exceeding native performance**.
+Lua-RS has achieved **production-ready correctness** with **252/252 tests passing (100%)**. The interpreter has gone through several rounds of optimization: control flow, function calls (eliminating HashMap lookups), and, most recently, C function calls and hash tables. Measured against a release build of native Lua 5.4.6, it delivers **19-69% of native performance** across most operations. **Hash table insertion and string.gsub outperform native Lua** by 20-50%.

 ## Latest Performance Results (November 24, 2025)


 ### Arithmetic Operations
 | Operation | Lua-RS | Native Lua | % of Native | Status |
 |-----------|--------|-----------|-------------|--------|
-| Integer addition | **74.89 M/s** | 62.11 M/s | **120.6%** | **Faster!** |
-| Float multiplication | **65.59 M/s** | 60.98 M/s | **107.6%** | **Faster!** |
-| Mixed operations | **40.78 M/s** | 37.17 M/s | **109.7%** | **Faster!** |
+| Integer addition | **74.45 M/s** | 212.77 M/s | **35.0%** | Good |
+| Float multiplication | **63.42 M/s** | 169.49 M/s | **37.4%** | Good |
+| Mixed operations | **40.50 M/s** | 96.15 M/s | **42.1%** | Good |

 ### Function Calls
 | Operation | Lua-RS | Native Lua | % of Native | Status |
 |-----------|--------|-----------|-------------|--------|
-| Simple function call | **13.39 M/s** | 9.35 M/s | **143.2%** | **1.4x Faster!** |
-| Recursive fib(25) | **0.031s** | 0.015s | **48.4%** | Good |
-| Vararg function | **0.59 M/s** | 0.69 M/s | **85.5%** | Excellent |
+| Simple function call | **13.77 M/s** | 33.33 M/s | **41.3%** | Good |
+| Recursive fib(25) | **0.031s** | 0.008s | **25.8%** | Needs optimization |
+| Vararg function | **0.60 M/s** | 2.12 M/s | **28.3%** | Needs optimization |

 ### Table Operations
 | Operation | Lua-RS | Native Lua | % of Native | Status |
 |-----------|--------|-----------|-------------|--------|
-| Array creation & access | **1.48 M/s** | 1.56 M/s | **94.9%** | Excellent |
-| Table insertion | **25.24 M/s** | 24.39 M/s | **103.5%** | **Faster!** |
-| Table access | **33.98 M/s** | 37.04 M/s | **91.7%** | Excellent |
-| Hash table insertion (100k) | **0.065s** | 0.168s | **258%** | **2.6x Faster!** |
-| ipairs iteration (100×1M) | **11.647s** | 9.851s | **84.6%** | Excellent |
+| Array creation & access | **1.40 M/s** | 5.95 M/s | **23.5%** | Needs optimization |
+| Table insertion | **22.89 M/s** | 33.33 M/s | **68.7%** | Good |
+| Table access | **32.35 M/s** | 125.00 M/s | **25.9%** | Needs optimization |
+| Hash table insertion (100k) | **0.066s** | 0.079s | **119.7%** | **1.2x Faster!** |
+| ipairs iteration (100×1M) | **11.316s** | 3.241s | **28.6%** | Needs optimization |

 ### String Operations
 | Operation | Lua-RS | Native Lua | % of Native | Status |
 |-----------|--------|-----------|-------------|--------|
-| String concatenation | **588.72 K/s** | 699.30 K/s | **84.2%** | Excellent |
-| String length | **77.84 M/s** | 50.00 M/s | **155.7%** | **1.6x Faster!** |
-| string.sub | **2629.68 K/s** | 5000.00 K/s | **52.6%** | Good |
-| string.find | **5197.07 K/s** | 3333.33 K/s | **155.9%** | **1.6x Faster!** |
-| string.gsub (10k) | **0.130s** | 0.456s | **351%** | **3.5x Faster!** |
+| String concatenation | **563.78 K/s** | 2564.10 K/s | **22.0%** | Needs optimization |
+| String length | **77.07 M/s** | ∞ M/s | **N/A** | - |
+| string.sub | **2647.65 K/s** | 14285.71 K/s | **18.5%** | Needs optimization |
+| string.find | **5275.90 K/s** | 14285.71 K/s | **36.9%** | Good |
+| string.gsub (10k) | **0.134s** | 0.201s | **150%** | **1.5x Faster!** |

 ### Control Flow
 | Operation | Lua-RS | Native Lua | % of Native | Status |
 |-----------|--------|-----------|-------------|--------|
-| If-else | **28.77 M/s** | 25.77 M/s | **111.6%** | **Faster!** |
-| While loop | **31.30 M/s** | 45.05 M/s | **69.5%** | Good |
-| Repeat-until | **34.68 M/s** | 51.28 M/s | **67.6%** | Good |
-| Nested loops (1000×1000) | **78.66 M/s** | 62.50 M/s | **125.9%** | **1.3x Faster!** |
+| If-else | **28.95 M/s** | 53.48 M/s | **54.1%** | Good |
+| While loop | **30.96 M/s** | 121.95 M/s | **25.4%** | Needs optimization |
+| Repeat-until | **31.40 M/s** | 142.86 M/s | **22.0%** | Needs optimization |
+| Nested loops (1000×1000) | **84.18 M/s** | 200.00 M/s | **42.1%** | Good |
+
+## Important Note on Performance Testing
+
+**Testing Methodology Change (November 24, 2025)**:
+
+Earlier performance reports compared lua-rs against a **debug build of native Lua**, which artificially inflated our performance percentages. This led to misleading results that showed lua-rs "faster" than native Lua in arithmetic operations (108-121%).
+
+**Current results** use a **release build of native Lua 5.4.6** as the baseline, which gives realistic performance comparisons:
+
+- **Arithmetic operations**: ~35-42% of native (was incorrectly shown as 108-121%)
+- **Simple function calls**: ~41% of native (was incorrectly shown as 143%)
+- **Hash table insertion**: still faster, at 120% of native (0.066s vs 0.079s)
+- **string.gsub**: still faster, at 150% of native (0.134s vs 0.201s)
+
+This correction gives a more accurate picture of lua-rs performance and identifies genuine optimization opportunities.

 ## Performance Highlights

-**8 operations now exceed native Lua performance (100-351%)**:
-- String operations: gsub **3.5x faster**, length & find **1.6x faster**
-- Hash table insertion: **2.6x faster** (0.065s vs 0.168s)
-- Function calls: **1.4x faster** (simple calls)
-- Nested loops: **1.3x faster**
-- Arithmetic: **8-20% faster** (integer, float, mixed)
-- Basic control flow: if-else **12% faster**
+**2 operations exceed native Lua performance**:
+- Hash table insertion: **1.2x faster** (0.066s vs 0.079s)
+- string.gsub: **1.5x faster** (0.134s vs 0.201s)

-🎯 **Most operations at 80-100% of native performance**:
-- Table operations: 85-103% (ipairs, array access, insertions)
-- String operations: 84% (concatenation)
-- Function calls: 86% (varargs)
+🎯 **Good performance (35-70% of native)**:
+- Arithmetic operations: 35-42% (consistent dispatch overhead)
+- Table insertion: 69%
+- Simple function calls: 41%
+- Control flow: 42-54% (if-else, nested loops)
+- string.find: 37%

-**Areas for future optimization**:
-- String.sub: 53% (buffered string building)
-- While/repeat loops: 68-70% (loop detection overhead)
-- Recursive fibonacci: 48% (stack frame overhead)
+**Critical areas for optimization**:
+- Table access: 26% (cacheline/memory layout)
+- ipairs iteration: 29% (iterator overhead)
+- While/repeat loops: 22-25% (dispatch overhead)
+- String operations: 19-22% (string.sub, concatenation; allocation/copying)
+- Vararg functions: 28% (argument handling)
+- Array creation & access: 24%
+- Recursive fib(25): 26%

 ## Key Achievements

-1. **Production Quality**: 252/252 tests passing, stable performance
+1. **Production Quality**: 252/252 tests passing, stable correctness
 2. **Memory Safety**: Validated direct pointer access for hot paths
-3. **C Function Optimization**: Eliminated parameter/return copying (40% improvement in ipairs)
-4. **Hash Table Optimization**: Lua-style open addressing with O(1) load factor checks (145x faster insertion)
-5. **Iterator Optimization**: Direct pointer access in pairs/next (2.7x improvement)
-6. **Arithmetic Excellence**: Integer operations faster than native C implementation
-7. **String Operations**: Pattern matching and replacement 1.6-3.5x faster
+3. **C Function Optimization**: Eliminated parameter/return copying (significant improvement)
+4. **Hash Table Optimization**: Lua-style open addressing with O(1) load factor checks (now 1.2x faster than native)
+5. **Iterator Optimization**: Direct pointer access in pairs/next (2.7x faster than before)
+6. **Competitive Areas**: Hash table insertion and string.gsub outperform native Lua
+7. **Stable Foundation**: Ready for further performance optimizations

 ---

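The "% of Native" column mixes two kinds of measurement. Throughput rows (M/s, K/s) are better when higher, and wall-time rows (seconds) are better when lower. The ratio therefore has to be inverted for the timed rows. The following is a minimal Rust sketch of that calculation, assuming this is how the column was derived. The `Metric` enum and `percent_of_native` function are illustrative names, not taken from lua-rs. Fed a few rows from the new tables, it reproduces their percentages.

```rust
/// How a benchmark reports its result.
#[derive(Clone, Copy)]
enum Metric {
    /// Operations per second (M/s, K/s): higher is better.
    Throughput,
    /// Wall-clock seconds: lower is better.
    Seconds,
}

/// Percentage of native Lua performance reached by lua-rs.
/// Values above 100.0 mean lua-rs is faster than native Lua.
fn percent_of_native(metric: Metric, lua_rs: f64, native: f64) -> f64 {
    match metric {
        // More work per second is better, so compare lua-rs to native directly.
        Metric::Throughput => lua_rs / native * 100.0,
        // Less time is better, so the ratio is inverted.
        Metric::Seconds => native / lua_rs * 100.0,
    }
}

fn main() {
    // (benchmark, metric, lua-rs result, release-build native Lua 5.4.6 result)
    let rows = [
        ("Integer addition", Metric::Throughput, 74.45, 212.77),
        ("Table insertion", Metric::Throughput, 22.89, 33.33),
        ("Recursive fib(25)", Metric::Seconds, 0.031, 0.008),
        ("Hash table insertion (100k)", Metric::Seconds, 0.066, 0.079),
        ("string.gsub (10k)", Metric::Seconds, 0.134, 0.201),
    ];
    for (name, metric, lua_rs, native) in rows {
        println!("{name:<28} {:>6.1}%", percent_of_native(metric, lua_rs, native));
    }
}
```

Inverting the ratio for timed benchmarks is what makes 0.066s vs 0.079s come out as 119.7% rather than 83.5%.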
@@ -111,11 +125,13 @@ Lua-RS has achieved **production-ready performance** with **252/252 tests passin
 - Implementation: `unsafe { (*table_ptr).borrow().next(&index_val) }`
 - Impact: 3.867s → 1.449s (**2.7x improvement**)

-**Results**:
-- Hash table insertion (100k): Now **2.6x faster** than native Lua
-- ipairs iteration: Improved to **85% of native** (was 55%)
-- Function calls: **1.4x faster** than native (simple calls)
-- All arithmetic operations: **108-120% of native**
+**Results** (compared to release-build native Lua 5.4.6):
+- Hash table insertion (100k): **1.2x faster** (0.066s vs 0.079s)
+- ipairs iteration: **29% of native** (11.316s vs 3.241s), still needs optimization
+- Simple function calls: **41% of native** (13.77M vs 33.33M calls/s)
+- Arithmetic operations: **35-42% of native** (consistent dispatch overhead)
+
+**Note**: Earlier results showing >100% performance were measured against a debug build of native Lua. The current results compare against a release build of native Lua 5.4.6, which provides a realistic baseline.

 **Architecture Decision**:
 - Hot paths (VM execution): Use direct pointers for O(1) access
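The architecture decision above trades a per-step lookup for a pointer that is resolved once. The sketch below illustrates that trade-off with a deliberately tiny, hypothetical `Table` that has an array part only. Its `next` follows Lua's iteration protocol, using 0 in place of `nil` as the starting key.

- `sum_by_lookup` re-resolves the table through a `HashMap` on every step.
- `sum_by_pointer` resolves it once and then calls `next` through a raw pointer. This mirrors the shape of the `unsafe { (*table_ptr).borrow().next(&index_val) }` line in the diff.

None of these names come from lua-rs. In this sketch the saving comes from skipping the per-step lookup, not from the raw pointer itself.

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::rc::Rc;

/// Hypothetical, minimal table: array part only (Lua index i lives at `array[i - 1]`).
struct Table {
    array: Vec<i64>,
}

impl Table {
    /// `next`-style iteration: given the previous key (0 stands in for `nil`,
    /// meaning "start"), return the following (key, value) pair, or `None` at the end.
    fn next(&self, key: usize) -> Option<(usize, i64)> {
        self.array.get(key).map(|&v| (key + 1, v))
    }
}

/// Slow path: every step re-resolves the table through a map lookup.
fn sum_by_lookup(registry: &HashMap<u32, Rc<RefCell<Table>>>, id: u32) -> i64 {
    let (mut key, mut sum) = (0, 0);
    while let Some((k, v)) = registry[&id].borrow().next(key) {
        key = k;
        sum += v;
    }
    sum
}

/// Fast path: resolve the table once, then call `next` through a raw pointer.
fn sum_by_pointer(table: &Rc<RefCell<Table>>) -> i64 {
    let table_ptr: *const RefCell<Table> = Rc::as_ptr(table);
    let (mut key, mut sum) = (0, 0);
    // SAFETY: `table` keeps the allocation alive for the whole loop and the
    // loop never borrows it mutably, so every dereference is valid.
    while let Some((k, v)) = unsafe { (*table_ptr).borrow().next(key) } {
        key = k;
        sum += v;
    }
    sum
}

fn main() {
    let table = Rc::new(RefCell::new(Table { array: (1..=100).collect() }));
    let mut registry = HashMap::new();
    registry.insert(7_u32, Rc::clone(&table));
    // Both paths visit the same elements; only the per-step lookup differs.
    println!("{} {}", sum_by_lookup(&registry, 7), sum_by_pointer(&table));
}
```

Both calls visit the same 100 elements, so the program prints `5050 5050`.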
@@ -272,8 +288,8 @@ For loop: FORLOOP + body (1 instruction/iteration)
 **Result**:
 - While loop: 41.23 → 44.12 M/s (+7.0%, **53.8% of native**)
 - Repeat-until: 45.43 → 50.51 M/s (+11.2%, **56.6% of native**)
-- Integer addition: 115.79 → 123.22 M/s (+6.4%, **105% of native!**)
-- **Bonus**: Integer arithmetic now **faster than native Lua**!
+- Integer addition: 115.79 → 123.22 M/s (+6.4%)
+- **Note**: These percentages were measured against debug-build native Lua. Against release-build native Lua 5.4.6, arithmetic operations reach approximately 35-42% and while/repeat loops approximately 22-25% of native performance.

 **Why not as fast as for-loops?**
 - For-loops: 1 optimized instruction per iteration
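The for-loop versus while-loop gap can be illustrated by counting dispatched instructions in a toy register VM. The opcodes below are hypothetical and only loosely modeled on Lua 5.4's `FORLOOP`; they are not lua-rs's instruction set.

- **While loop:** each iteration pays for a compare-and-branch, the body, the increment and a backward jump.
- **For loop:** the fused `ForLoop` does the increment, bound check and backward jump in a single dispatch.

```rust
/// Hypothetical toy opcodes (not lua-rs's instruction set).
#[derive(Clone, Copy)]
enum Op {
    /// regs[a] += imm
    AddI { a: usize, imm: i64 },
    /// if regs[a] >= regs[b] { pc += off }: the while-loop exit test
    JumpIfGe { a: usize, b: usize, off: isize },
    /// pc += off: unconditional jump back to the loop head
    Jmp { off: isize },
    /// regs[a] += 1; if regs[a] <= regs[b] { pc += off }: fused for-loop step
    ForLoop { a: usize, b: usize, off: isize },
    /// stop and return
    Halt,
}

/// Relative jump; offsets apply to the pc of the next instruction, as in Lua.
fn jump(pc: usize, off: isize) -> usize {
    (pc as isize + off) as usize
}

/// Runs `code` and returns the final registers plus the number of dispatched instructions.
fn run(code: &[Op], mut regs: Vec<i64>) -> (Vec<i64>, u64) {
    let (mut pc, mut dispatched) = (0_usize, 0_u64);
    loop {
        let op = code[pc];
        pc += 1;
        dispatched += 1;
        match op {
            Op::AddI { a, imm } => regs[a] += imm,
            Op::JumpIfGe { a, b, off } => {
                if regs[a] >= regs[b] {
                    pc = jump(pc, off);
                }
            }
            Op::Jmp { off } => pc = jump(pc, off),
            Op::ForLoop { a, b, off } => {
                regs[a] += 1;
                if regs[a] <= regs[b] {
                    pc = jump(pc, off);
                }
            }
            Op::Halt => return (regs, dispatched),
        }
    }
}

fn main() {
    let n = 1000;
    // while i < n do x = x + 1; i = i + 1 end   (registers: i, n, x)
    let while_loop = [
        Op::JumpIfGe { a: 0, b: 1, off: 3 }, // exit when i >= n
        Op::AddI { a: 2, imm: 1 },           // body: x = x + 1
        Op::AddI { a: 0, imm: 1 },           // i = i + 1
        Op::Jmp { off: -4 },                 // back to the exit test
        Op::Halt,
    ];
    // for i = 1, n do x = x + 1 end   (registers: i, n, x); assumes n >= 1
    let for_loop = [
        Op::AddI { a: 2, imm: 1 },           // body: x = x + 1
        Op::ForLoop { a: 0, b: 1, off: -2 }, // i += 1, loop while i <= n
        Op::Halt,
    ];
    let (w_regs, w_count) = run(&while_loop, vec![0, n, 0]);
    let (f_regs, f_count) = run(&for_loop, vec![1, n, 0]);
    println!("while: x = {}, {} instructions dispatched", w_regs[2], w_count);
    println!("for:   x = {}, {} instructions dispatched", f_regs[2], f_count);
}
```

Both loops run the body 1,000 times. The while form dispatches 4,002 instructions and the for form 2,001. In this toy model the while loop spends two extra dispatches per iteration on loop control that `ForLoop` folds into one. Real costs also depend on how each dispatch is implemented.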