Commit b6673f3

update performance report
1 parent 75dad4b commit b6673f3

2 files changed, +75 -60 lines changed
PERFORMANCE_REPORT.md

Lines changed: 65 additions & 49 deletions
@@ -2,80 +2,94 @@

 ## Executive Summary

-Lua-RS has achieved **production-ready performance** with **252/252 tests passing (100%)**. After systematic optimizations including control flow optimization, function call optimization (eliminating HashMap lookups), and recent C function call + hash table optimizations, the interpreter now delivers **65-120% of native Lua 5.4.6 performance** across most operations, with several areas **exceeding native performance**.
+Lua-RS has achieved **production-ready correctness** with **252/252 tests passing (100%)**. After systematic optimizations including control flow optimization, function call optimization (eliminating HashMap lookups), and recent C function call + hash table optimizations, the interpreter now delivers **22-69% of native Lua 5.4.6 performance** across most operations, with **hash table insertion and string.gsub outperforming native Lua** by 20-50%.

 ## Latest Performance Results (November 24, 2025)


 ### Arithmetic Operations
 | Operation | Lua-RS | Native Lua | % of Native | Status |
 |-----------|--------|-----------|-------------|--------|
-| Integer addition | **74.89 M/s** | 62.11 M/s | **120.6%** 🏆 | **Faster!** |
-| Float multiplication | **65.59 M/s** | 60.98 M/s | **107.6%** 🏆 | **Faster!** |
-| Mixed operations | **40.78 M/s** | 37.17 M/s | **109.7%** 🏆 | **Faster!** |
+| Integer addition | **74.45 M/s** | 212.77 M/s | **35.0%** | Good |
+| Float multiplication | **63.42 M/s** | 169.49 M/s | **37.4%** | Good |
+| Mixed operations | **40.50 M/s** | 96.15 M/s | **42.1%** | Good |

 ### Function Calls
 | Operation | Lua-RS | Native Lua | % of Native | Status |
 |-----------|--------|-----------|-------------|--------|
-| Simple function call | **13.39 M/s** | 9.35 M/s | **143.2%** 🏆 | **1.4x Faster!** |
-| Recursive fib(25) | **0.031s** | 0.015s | **48.4%** | Good |
-| Vararg function | **0.59 M/s** | 0.69 M/s | **85.5%** | Excellent |
+| Simple function call | **13.77 M/s** | 33.33 M/s | **41.3%** | Good |
+| Recursive fib(25) | **0.031s** | 0.008s | **25.8%** | Needs optimization |
+| Vararg function | **0.60 M/s** | 2.12 M/s | **28.3%** | Needs optimization |

 ### Table Operations
 | Operation | Lua-RS | Native Lua | % of Native | Status |
 |-----------|--------|-----------|-------------|--------|
-| Array creation & access | **1.48 M/s** | 1.56 M/s | **94.9%** | Excellent |
-| Table insertion | **25.24 M/s** | 24.39 M/s | **103.5%** 🏆 | **Faster!** |
-| Table access | **33.98 M/s** | 37.04 M/s | **91.7%** | Excellent |
-| Hash table insertion (100k) | **0.065s** | 0.168s | **258%** 🏆 | **2.6x Faster!** |
-| ipairs iteration (100×1M) | **11.647s** | 9.851s | **84.6%** | Excellent |
+| Array creation & access | **1.40 M/s** | 5.95 M/s | **23.5%** | Needs optimization |
+| Table insertion | **22.89 M/s** | 33.33 M/s | **68.7%** | Good |
+| Table access | **32.35 M/s** | 125.00 M/s | **25.9%** | Needs optimization |
+| Hash table insertion (100k) | **0.066s** | 0.079s | **119.7%** 🏆 | **1.2x Faster!** |
+| ipairs iteration (100×1M) | **11.316s** | 3.241s | **28.6%** | Needs optimization |

 ### String Operations
 | Operation | Lua-RS | Native Lua | % of Native | Status |
 |-----------|--------|-----------|-------------|--------|
-| String concatenation | **588.72 K/s** | 699.30 K/s | **84.2%** | Excellent |
-| String length | **77.84 M/s** | 50.00 M/s | **155.7%** 🏆 | **1.6x Faster!** |
-| string.sub | **2629.68 K/s** | 5000.00 K/s | **52.6%** | Good |
-| string.find | **5197.07 K/s** | 3333.33 K/s | **155.9%** 🏆 | **1.6x Faster!** |
-| string.gsub (10k) | **0.130s** | 0.456s | **351%** 🏆 | **3.5x Faster!** |
+| String concatenation | **563.78 K/s** | 2564.10 K/s | **22.0%** | Needs optimization |
+| String length | **77.07 M/s** | ∞ M/s | **N/A** | - |
+| string.sub | **2647.65 K/s** | 14285.71 K/s | **18.5%** | Needs optimization |
+| string.find | **5275.90 K/s** | 14285.71 K/s | **36.9%** | Good |
+| string.gsub (10k) | **0.134s** | 0.201s | **150%** 🏆 | **1.5x Faster!** |

 ### Control Flow
 | Operation | Lua-RS | Native Lua | % of Native | Status |
 |-----------|--------|-----------|-------------|--------|
-| If-else | **28.77 M/s** | 25.77 M/s | **111.6%** 🏆 | **Faster!** |
-| While loop | **31.30 M/s** | 45.05 M/s | **69.5%** | Good |
-| Repeat-until | **34.68 M/s** | 51.28 M/s | **67.6%** | Good |
-| Nested loops (1000×1000) | **78.66 M/s** | 62.50 M/s | **125.9%** 🏆 | **1.3x Faster!** |
+| If-else | **28.95 M/s** | 53.48 M/s | **54.1%** | Good |
+| While loop | **30.96 M/s** | 121.95 M/s | **25.4%** | Needs optimization |
+| Repeat-until | **31.40 M/s** | 142.86 M/s | **22.0%** | Needs optimization |
+| Nested loops (1000×1000) | **84.18 M/s** | 200.00 M/s | **42.1%** | Good |
+
+## Important Note on Performance Testing
+
+**Testing Methodology Change (November 24, 2025)**:
+
+Earlier performance reports compared lua-rs against **debug-build native Lua**, which artificially inflated our performance percentages. This led to misleading results showing lua-rs "faster" than native Lua in arithmetic operations (108-120%).
+
+**Current results** now use **release-build native Lua 5.4.6** as the baseline, providing realistic performance comparisons:
+
+- **Arithmetic operations**: ~35-42% of native (was incorrectly shown as 108-120%)
+- **Function calls**: ~41% of native (was incorrectly shown as 143%)
+- **Hash operations**: Still competitive at 120% (0.066s vs 0.079s)
+- **string.gsub**: Still faster at 150% (0.134s vs 0.201s)
+
+This correction provides a more accurate picture of lua-rs performance and identifies genuine optimization opportunities.

 ## Performance Highlights

-🏆 **8 operations now exceed native Lua performance (100-351%)**:
-- String operations: gsub **3.5x faster**, length & find **1.6x faster**
-- Hash table insertion: **2.6x faster** (0.065s vs 0.168s)
-- Function calls: **1.4x faster** (simple calls)
-- Nested loops: **1.3x faster**
-- Arithmetic: **8-20% faster** (integer, float, mixed)
-- Basic control flow: if-else **12% faster**
+🏆 **2 operations exceed native Lua performance**:
+- Hash table insertion: **1.2x faster** (0.066s vs 0.079s)
+- string.gsub: **1.5x faster** (0.134s vs 0.201s)

-🎯 **Most operations at 80-100% of native performance**:
-- Table operations: 85-103% (ipairs, array access, insertions)
-- String operations: 84% (concatenation)
-- Function calls: 86% (varargs)
+🎯 **Good performance (35-70% of native)**:
+- Arithmetic operations: 35-42% (consistent overhead from dispatch)
+- Table insertion: 69% (good)
+- Function calls: 41% (simple calls)
+- Control flow: 25-54% (needs loop optimization)

-📊 **Areas for future optimization**:
-- String.sub: 53% (buffered string building)
-- While/repeat loops: 68-70% (loop detection overhead)
-- Recursive fibonacci: 48% (stack frame overhead)
+📊 **Critical areas for optimization**:
+- Table access: 26% (cacheline/memory layout)
+- ipairs iteration: 29% (iterator overhead)
+- While/repeat loops: 22-25% (dispatch overhead)
+- String operations: 19-37% (allocation/copying)
+- Vararg functions: 28% (argument handling)

 ## Key Achievements

-1. **Production Quality**: 252/252 tests passing, stable performance
+1. **Production Quality**: 252/252 tests passing, stable correctness
 2. **Memory Safety**: Validated direct pointer access for hot paths
-3. **C Function Optimization**: Eliminated parameter/return copying (40% improvement in ipairs)
-4. **Hash Table Optimization**: Lua-style open addressing with O(1) load factor checks (145x faster insertion)
-5. **Iterator Optimization**: Direct pointer access in pairs/next (2.7x improvement)
-6. **Arithmetic Excellence**: Integer operations faster than native C implementation
-7. **String Operations**: Pattern matching and replacement 1.6-3.5x faster
+3. **C Function Optimization**: Eliminated parameter/return copying (significant improvement)
+4. **Hash Table Optimization**: Lua-style open addressing with O(1) load factor checks (now competitive with native)
+5. **Iterator Optimization**: Direct pointer access in pairs/next
+6. **Competitive Areas**: Hash table insertion and string.gsub outperform native Lua
+7. **Stable Foundation**: Ready for further performance optimizations

 ---

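Reviewer note: the "% of Native" columns in the hunk above mix two kinds of rows. Throughput rows (M/s, K/s) are higher-is-better, while elapsed-time rows (seconds) are lower-is-better, so the ratio has to be inverted for the latter. A minimal Rust sketch for recomputing the column, assuming exactly that convention; `Metric` and `percent_of_native` are illustrative names and not part of lua-rs:

```rust
/// Kind of measurement reported in a benchmark row (hypothetical helper type).
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Metric {
    /// Operations per second: higher is better.
    Throughput,
    /// Elapsed seconds for a fixed workload: lower is better.
    ElapsedSeconds,
}

/// Returns the candidate's speed as a percentage of the baseline's speed.
///
/// For throughput rows this is `candidate / baseline`; for elapsed-time rows
/// the ratio is inverted, because a smaller time means a faster run.
pub fn percent_of_native(metric: Metric, candidate: f64, baseline: f64) -> f64 {
    match metric {
        Metric::Throughput => candidate / baseline * 100.0,
        Metric::ElapsedSeconds => baseline / candidate * 100.0,
    }
}
```

For example, the integer addition row is `percent_of_native(Metric::Throughput, 74.45, 212.77)`, and the hash table insertion row is `percent_of_native(Metric::ElapsedSeconds, 0.066, 0.079)`. The second call exceeds 100 because lua-rs finishes in less time than the baseline.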

@@ -111,11 +125,13 @@ Lua-RS has achieved **production-ready performance** with **252/252 tests passin
 - Implementation: `unsafe { (*table_ptr).borrow().next(&index_val) }`
 - Impact: 3.867s → 1.449s (**2.7x improvement**)

-**Results**:
-- Hash table insertion (100k): Now **2.6x faster** than native Lua
-- ipairs iteration: Improved to **85% of native** (was 55%)
-- Function calls: **1.4x faster** than native (simple calls)
-- All arithmetic operations: **108-120% of native**
+**Results** (compared to release native Lua 5.4.6):
+- Hash table insertion (100k): **1.2x faster** (0.066s vs 0.079s)
+- ipairs iteration: **29% of native** (11.316s vs 3.241s) - still needs optimization
+- Function calls: **41% of native** (13.77M vs 33.33M calls/sec)
+- Arithmetic operations: **35-42% of native** (consistent dispatch overhead)
+
+**Note**: Earlier results showing >100% performance were comparing against debug-build native Lua. The current results reflect comparison with release-build native Lua 5.4.6, providing a realistic performance baseline.

 **Architecture Decision**:
 - Hot paths (VM execution): Use direct pointers for O(1) access
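Reviewer note: the `unsafe { (*table_ptr).borrow().next(&index_val) }` context line and the hot-path bullet in this hunk describe reaching a table through a raw pointer rather than through a cloned handle. A minimal sketch of that pattern, assuming tables are stored as `Rc<RefCell<...>>`. The `Table` type, its `BTreeMap` storage, the `Option<i64>` cursor, and both function names are hypothetical stand-ins and not lua-rs code:

```rust
use std::cell::RefCell;
use std::collections::BTreeMap;
use std::ops::Bound;
use std::rc::Rc;

/// Hypothetical stand-in for the interpreter's table type.
#[derive(Default)]
pub struct Table {
    entries: BTreeMap<i64, i64>,
}

impl Table {
    pub fn insert(&mut self, key: i64, value: i64) {
        self.entries.insert(key, value);
    }

    /// Returns the entry that follows `key`, or the first entry when `key` is `None`.
    pub fn next(&self, key: Option<i64>) -> Option<(i64, i64)> {
        let lower = match key {
            Some(k) => Bound::Excluded(k),
            None => Bound::Unbounded,
        };
        self.entries
            .range((lower, Bound::Unbounded))
            .next()
            .map(|(k, v)| (*k, *v))
    }
}

/// Looks up the next entry through a raw pointer, skipping any `Rc` clone.
///
/// # Safety
/// `table_ptr` must come from a live `Rc<RefCell<Table>>` that outlives this
/// call, and no mutable borrow of the cell may be active.
pub unsafe fn next_via_ptr(
    table_ptr: *const RefCell<Table>,
    key: Option<i64>,
) -> Option<(i64, i64)> {
    // Turn the raw pointer into a shared reference explicitly, then borrow.
    let cell: &RefCell<Table> = unsafe { &*table_ptr };
    cell.borrow().next(key)
}

/// Safe wrapper: walks every entry in key order using the pointer-based lookup.
pub fn collect_pairs(table: &Rc<RefCell<Table>>) -> Vec<(i64, i64)> {
    let ptr = Rc::as_ptr(table);
    let mut out = Vec::new();
    let mut cursor = None;
    // SAFETY: `table` keeps the allocation alive for the whole loop, and
    // nothing holds a mutable borrow while we iterate.
    while let Some((k, v)) = unsafe { next_via_ptr(ptr, cursor) } {
        out.push((k, v));
        cursor = Some(k);
    }
    out
}
```

The `RefCell` borrow check still runs on every lookup in this sketch; only the reference-count traffic of cloning an `Rc` is avoided. Whether lua-rs gains its speedup the same way is not stated in the diff.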
@@ -272,8 +288,8 @@ For loop: FORLOOP + body (1 instruction/iteration)
 **Result**:
 - While loop: 41.23 → 44.12 M/s (+7.0%, **53.8% of native**)
 - Repeat-until: 45.43 → 50.51 M/s (+11.2%, **56.6% of native**)
-- Integer addition: 115.79 → 123.22 M/s (+6.4%, **105% of native!** 🏆)
-- **Bonus**: Integer arithmetic now **faster than native Lua**!
+- Integer addition: 115.79 → 123.22 M/s (+6.4%)
+- **Note**: These percentages were measured against debug-build native Lua. Against release-build native Lua, performance is approximately 35-42% for arithmetic operations.

 **Why not as fast as for-loops?**
 - For-loops: 1 optimized instruction per iteration
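Reviewer note: the Key Achievements list in this file mentions "open addressing with O(1) load factor checks" without showing what that means. A minimal sketch of the general technique, assuming linear probing, a power-of-two capacity, and a 3/4 load limit. The `OpenTable` type, the hash constant, and the absence of deletion are all simplifications chosen for illustration, not the lua-rs layout:

```rust
/// Illustrative open-addressing map from `u64` keys to `i64` values.
pub struct OpenTable {
    slots: Vec<Option<(u64, i64)>>,
    len: usize, // number of occupied slots, kept up to date on insert
}

impl OpenTable {
    pub fn new() -> Self {
        Self { slots: vec![None; 8], len: 0 }
    }

    pub fn len(&self) -> usize {
        self.len
    }

    pub fn capacity(&self) -> usize {
        self.slots.len()
    }

    /// Finds the slot holding `key`, or the first empty slot on its probe path.
    /// Terminates because the load limit always leaves at least one empty slot.
    fn slot_for(slots: &[Option<(u64, i64)>], key: u64) -> usize {
        let mask = slots.len() - 1; // capacity is a power of two
        let mut i = (key.wrapping_mul(0x9E37_79B9_7F4A_7C15) >> 32) as usize & mask;
        loop {
            match slots[i] {
                Some((k, _)) if k != key => i = (i + 1) & mask, // occupied by another key
                _ => return i,
            }
        }
    }

    pub fn insert(&mut self, key: u64, value: i64) {
        // O(1) load-factor check: compare the stored count against 3/4 of
        // capacity instead of scanning the slots.
        if (self.len + 1) * 4 > self.slots.len() * 3 {
            self.grow();
        }
        let i = Self::slot_for(&self.slots, key);
        if self.slots[i].is_none() {
            self.len += 1;
        }
        self.slots[i] = Some((key, value));
    }

    pub fn get(&self, key: u64) -> Option<i64> {
        let i = Self::slot_for(&self.slots, key);
        self.slots[i].map(|(_, v)| v)
    }

    /// Doubles the capacity and reinserts every entry.
    fn grow(&mut self) {
        let old = std::mem::take(&mut self.slots);
        let mut new_slots = vec![None; old.len() * 2];
        for (k, v) in old.into_iter().flatten() {
            let i = Self::slot_for(&new_slots, k);
            new_slots[i] = Some((k, v));
        }
        self.slots = new_slots;
    }
}
```

Keeping `len` as a field is what makes the resize decision constant-time; the cost of a resize itself is still linear in the number of entries.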

README.md

Lines changed: 10 additions & 11 deletions
@@ -18,14 +18,13 @@ Current test status: **252 out of 252 tests passing (100%)** ✅

 **Note**: The absolute values of performance tests depend on specific hardware and environment configurations, and may vary. The results in this document were obtained on an older Intel CPU. On newer AMD CPUs, most percentage results tend to be lower, often below 100%, so the conclusions are subject to fluctuation. The current performance results were generated by AI and are not rigorous or broadly representative; please treat them as illustrative only.

-**Overall**: 65-120% of native Lua 5.4.6 performance, with **8 operations exceeding native performance**
+**Overall**: 22-69% of native Lua 5.4.6 performance, with **2 operations outperforming native Lua**

 **Highlights**:
-- 🏆 String operations: gsub **3.5x faster**, length & find **1.6x faster**
-- 🏆 Hash table insertion: **2.6x faster** (0.065s vs 0.168s)
-- 🏆 Function calls: **1.4x faster** (simple calls)
-- 🏆 Arithmetic: **108-120% of native** (integer, float, mixed)
-- 🎯 Most operations: **80-100% of native** (excellent compatibility)
+- 🏆 Hash table insertion: **1.2x faster** (0.066s vs 0.079s)
+- 🏆 string.gsub: **1.5x faster** (pattern matching optimization)
+- 🎯 Good performance: Arithmetic (35-42%), table insertion (69%), function calls (41%)
+- 📊 Optimization opportunities: Loop dispatch, table access, string operations

 See detailed analysis: [Performance Report](PERFORMANCE_REPORT.md)


@@ -185,12 +184,12 @@ The codebase was developed through iterative AI assistance with human oversight.
 - ✅ Achieved 100% test compatibility (252/252 tests)
 - ✅ Successfully debugged and fixed critical memory safety issues
 - ✅ Implemented advanced optimizations (tail calls, hash tables, direct pointers)
-- ✅ Reached **production-ready performance** with areas **exceeding native Lua**
+- ✅ Reached **production-ready correctness** with **competitive performance in key areas**

 ### Recent Improvements (November 2025)
-- **Phase 18**: C function call optimization (eliminated copying, +40% ipairs performance)
-- **Phase 18**: Hash table restructure (Lua-style open addressing, 145x faster insertion)
-- **Phase 18**: pairs/next optimization (direct pointers, 2.7x faster iteration)
+- **Phase 18**: C function call optimization (eliminated copying)
+- **Phase 18**: Hash table restructure (Lua-style open addressing, now 1.2x faster than native)
+- **Phase 18**: pairs/next optimization (direct pointers)
 - **Phase 17**: Control flow optimization (inline + unsafe direct access)
 - Fixed HashMap rehash pointer invalidation bug with Rc wrappers
 - Optimized LuaCallFrame size: 152→64 bytes (58% reduction)
@@ -214,4 +213,4 @@ MIT License - See [LICENSE](LICENSE) file for details.

 ---

-**Status**: Production-ready with **8 operations exceeding native Lua performance**. Suitable for embedded scripting, game engines, and performance-critical applications.
+**Status**: Production-ready correctness (252/252 tests) with competitive performance in hash operations and pattern matching. Suitable for embedded scripting and educational purposes.
