update performance report

CppCXY · CppCXY · commit bbba36ce9b45 · 2025-11-30T16:54:42.000+08:00
diff --git a/PERFORMANCE_REPORT.md b/PERFORMANCE_REPORT.md
@@ -1,86 +1,87 @@
 # Lua-RS Performance Report
 
-> **Last Updated**: November 29, 2025  
+> **Last Updated**: November 30, 2025  
 > **Test Environment**: Windows 11, AMD Ryzen 7 5800X, Rust 1.89.0
 > **Lua-RS Version**: main 
 > **Native Lua Version**: Lua 5.4.6
 
 ## Executive Summary
 
-Lua-RS has achieved **production-ready correctness** with **252/252 tests passing (100%)**. The interpreter delivers **40-104% of native Lua 5.4 performance** across most operations, with excellent performance in control flow and arithmetic operations.
+Lua-RS has achieved **production-ready correctness** with **302/302 tests passing (100%)**. The interpreter delivers **30-100%+ of native Lua 5.4 performance** across the benchmarked operations, with excellent performance in arithmetic and control flow operations.
 
 ### Key Performance Highlights
 
 🏆 **Excellent Performance (>90% of native)**:
-- **While loop**: **104%** of native (142.52 M/s vs 136.99 M/s) - **Faster than native!**
-- **Integer addition**: **96%** of native (246.04 M/s vs 256.41 M/s)
-- **Nested loops**: **93%** of native (232.67 M/s vs 250.00 M/s)
+- **Integer addition**: **101%** of native (251.89 M/s vs 250.00 M/s) - **Faster than native!**
+- **Float multiplication**: **99%** of native (248.50 M/s vs 250.00 M/s)
+- **Table insertion**: **101%** of native (71.99 M/s vs 71.43 M/s) - **Faster than native!**
+- **Nested loops**: **97%** of native (243.30 M/s vs 250.00 M/s)
 
 🎯 **Good Performance (60-90% of native)**:
-- **Float multiplication**: **77%** (168.43 M/s vs 217.39 M/s)
-- **Hash table insertion**: **77%** (0.023s vs 0.030s)
-- **If-else control**: **75%** (89.21 M/s vs 119.05 M/s)
-- **Table insertion**: **70%** (46.40 M/s vs 66.67 M/s)
-- **Mixed operations**: **68%** (106.59 M/s vs 156.25 M/s)
-- **Repeat-until**: **66%** (113.66 M/s vs 172.41 M/s)
-- **string.gsub**: **66% faster** (0.101s vs 0.152s)
-- **String concatenation**: **61%** (2775 K/s vs 4545 K/s)
-
-📊 **Acceptable Performance (40-60% of native)**:
-- **Table access**: **49%** (81.39 M/s vs 166.67 M/s)
-- **Simple function call**: **43%** (23.88 M/s vs 55.56 M/s)
-- **ipairs iteration**: **41%** (5.227s vs 2.122s)
-- **Vararg function**: **41%** (1.48 M/s vs 3.60 M/s)
-
-⚠️ **Areas for Optimization (<40% of native)**:
-- **string.sub**: **39%** (7769 K/s vs 20000 K/s)
-- **Recursive fib(25)**: **36%** (0.011s vs 0.004s)
-- **string.find**: **33%** (6568 K/s vs 20000 K/s)
-- **Array creation & access**: **26%** (2.94 M/s vs 11.11 M/s)
+- **While loop**: **85%** (127.10 M/s vs 149.25 M/s)
+- **If-else control**: **84%** (99.71 M/s vs 119.05 M/s)
+- **Mixed operations**: **80%** (125.22 M/s vs 156.25 M/s)
+- **Table access**: **77%** (128.46 M/s vs 166.67 M/s)
+- **Repeat-until**: **61%** (114.36 M/s vs 188.68 M/s)
+- **String concatenation**: **60%** (2748 K/s vs 4545 K/s)
+- **Simple function call**: **59%** (32.77 M/s vs 55.56 M/s)
+
+📊 **Acceptable Performance (30-60% of native)**:
+- **Array creation & access**: **45%** (5.10 M/s vs 11.24 M/s)
+- **Recursive fib(25)**: **40%** (0.010s vs 0.004s)
+- **Vararg function**: **36%** (1.29 M/s vs 3.58 M/s)
+- **string.sub**: **33%** (8155 K/s vs 25000 K/s)
+- **string.find**: **33%** (5553 K/s vs 16666 K/s)
+- **ipairs iteration**: **31%** (6.785s vs 2.098s)
+
+🏆 **Faster than Native**:
+- **string.gsub**: **146%** (0.104s vs 0.152s) - **46% faster!**
+- **Hash table insertion**: **136%** (0.022s vs 0.030s) - **36% faster!**
 
 ---
 
-## Latest Benchmark Results (November 29, 2025)
+## Latest Benchmark Results (November 30, 2025)
 
 ### Arithmetic Operations
 | Operation | Lua-RS | Native Lua | % of Native | Status |
 |-----------|--------|-----------|-------------|--------|
-| Integer addition | **246.04 M/s** | 256.41 M/s | **96%** | Excellent 🏆 |
-| Float multiplication | **168.43 M/s** | 217.39 M/s | **77%** | Good |
-| Mixed operations | **106.59 M/s** | 156.25 M/s | **68%** | Good |
+| Integer addition | **251.89 M/s** | 250.00 M/s | **101%** | Excellent 🏆 |
+| Float multiplication | **248.50 M/s** | 250.00 M/s | **99%** | Excellent 🏆 |
+| Mixed operations | **125.22 M/s** | 156.25 M/s | **80%** | Good |
 
 ### Function Calls
 | Operation | Lua-RS | Native Lua | % of Native | Status |
 |-----------|--------|-----------|-------------|--------|
-| Simple function call | **23.88 M/s** | 55.56 M/s | **43%** | Acceptable |
-| Recursive fib(25) | **0.011s** | 0.004s | **36%** | Needs optimization |
-| Vararg function | **1.48 M/s** | 3.60 M/s | **41%** | Acceptable |
+| Simple function call | **32.77 M/s** | 55.56 M/s | **59%** | Good |
+| Recursive fib(25) | **0.010s** | 0.004s | **40%** | Acceptable |
+| Vararg function | **1.29 M/s** | 3.58 M/s | **36%** | Acceptable |
 
 ### Table Operations
 | Operation | Lua-RS | Native Lua | % of Native | Status |
 |-----------|--------|-----------|-------------|--------|
-| Array creation & access | **2.94 M/s** | 11.11 M/s | **26%** | Needs optimization |
-| Table insertion | **46.40 M/s** | 66.67 M/s | **70%** | Good |
-| Table access | **81.39 M/s** | 166.67 M/s | **49%** | Acceptable |
-| Hash table insertion (100k) | **0.023s** | 0.030s | **77%** | Good |
-| ipairs iteration (100×1M) | **5.227s** | 2.122s | **41%** | Acceptable |
+| Array creation & access | **5.10 M/s** | 11.24 M/s | **45%** | Acceptable |
+| Table insertion | **71.99 M/s** | 71.43 M/s | **101%** | Excellent 🏆 |
+| Table access | **128.46 M/s** | 166.67 M/s | **77%** | Good |
+| Hash table insertion (100k) | **0.022s** | 0.030s | **136%** | Excellent 🏆 |
+| ipairs iteration (100×1M) | **6.785s** | 2.098s | **31%** | Acceptable |
 
 ### String Operations
 | Operation | Lua-RS | Native Lua | % of Native | Status |
 |-----------|--------|-----------|-------------|--------|
-| String concatenation | **2775.30 K/s** | 4545.45 K/s | **61%** | Good |
-| String length | **168.46 M/s** | �?M/s | N/A | Excellent |
-| string.sub | **7768.74 K/s** | 20000.00 K/s | **39%** | Acceptable |
-| string.find | **6567.84 K/s** | 20000.00 K/s | **33%** | Needs optimization |
-| string.gsub (10k) | **0.101s** | 0.152s | **66% faster** | Excellent 🏆 |
+| String concatenation | **2748.53 K/s** | 4545.45 K/s | **60%** | Good |
+| String length | **156.99 M/s** | 100.00 M/s | **157%** | Excellent 🏆 |
+| string.sub | **8155.08 K/s** | 25000.00 K/s | **33%** | Acceptable |
+| string.find | **5553.24 K/s** | 16666.67 K/s | **33%** | Acceptable |
+| string.gsub (10k) | **0.104s** | 0.152s | **146%** | Excellent 🏆 |
 
 ### Control Flow
 | Operation | Lua-RS | Native Lua | % of Native | Status |
 |-----------|--------|-----------|-------------|--------|
-| If-else | **89.21 M/s** | 119.05 M/s | **75%** | Good |
-| While loop | **142.52 M/s** | 136.99 M/s | **104%** | Excellent 🏆 |
-| Repeat-until | **113.66 M/s** | 172.41 M/s | **66%** | Good |
-| Nested loops (1000×1000) | **232.67 M/s** | 250.00 M/s | **93%** | Excellent 🏆 |
+| If-else | **99.71 M/s** | 119.05 M/s | **84%** | Good |
+| While loop | **127.10 M/s** | 149.25 M/s | **85%** | Good |
+| Repeat-until | **114.36 M/s** | 188.68 M/s | **61%** | Good |
+| Nested loops (1000×1000) | **243.30 M/s** | 250.00 M/s | **97%** | Excellent 🏆 |
 
 ---
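
The tables above mix two kinds of measurement: throughput (M/s, K/s), where higher is better, and wall-clock time (seconds), where lower is better. The report does not state how "% of native" is derived, but the published figures are consistent with the two ratios sketched below. The function names are hypothetical and are not part of Lua-RS or its benchmark scripts.

```rust
/// Percent of native for throughput metrics (operations per second).
/// Higher throughput is better, so Lua-RS goes in the numerator.
fn pct_of_native_throughput(lua_rs: f64, native: f64) -> f64 {
    lua_rs / native * 100.0
}

/// Percent of native for wall-clock metrics (seconds).
/// Lower time is better, so the ratio is inverted.
fn pct_of_native_time(lua_rs_secs: f64, native_secs: f64) -> f64 {
    native_secs / lua_rs_secs * 100.0
}
```

With these definitions, integer addition (251.89 M/s vs 250.00 M/s) rounds to 101%, and hash table insertion (0.022s vs 0.030s) rounds to 136%, matching the table rows.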
 
@@ -104,11 +105,18 @@ Performance benchmarks run automatically on push to `main` or `refactor` branche
 
 ## Performance History
 
+### November 30, 2025 - call_function_internal Optimization
+- Eliminated duplicate dispatch loop in `call_function_internal`
+- It now calls `luavm_execute` directly instead of duplicating 300+ lines of dispatch code
+- Reduced code size, improved CPU cache efficiency
+- Integer addition now **101% of native** (faster than native Lua!)
+- Float multiplication now **99% of native**
+- Table insertion now **101% of native** (faster than native Lua!)
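
The shape of this refactor can be illustrated with a toy VM. This is only a sketch under assumptions: the `Op` enum, the `Vm` struct, and both signatures are hypothetical, and the real `call_function_internal` and `luavm_execute` in Lua-RS will differ. The point is that the re-entrant call path prepares the stack and delegates to the one dispatch loop rather than carrying its own copy of the `match`.

```rust
/// Toy instruction set; hypothetical and far smaller than Lua's real opcodes.
#[derive(Clone, Copy)]
enum Op {
    LoadInt(i64),
    Add,
    Return,
}

struct Vm {
    stack: Vec<i64>,
}

impl Vm {
    /// The single dispatch loop. Every path that runs bytecode goes through here.
    fn execute(&mut self, code: &[Op]) -> Option<i64> {
        for op in code {
            match *op {
                Op::LoadInt(n) => self.stack.push(n),
                Op::Add => {
                    let b = self.stack.pop()?;
                    let a = self.stack.pop()?;
                    self.stack.push(a.wrapping_add(b));
                }
                Op::Return => return self.stack.pop(),
            }
        }
        None
    }

    /// Re-entrant call path (for example, from a host callback).
    /// It pushes the arguments and delegates instead of duplicating the loop.
    fn call_function_internal(&mut self, code: &[Op], args: &[i64]) -> Option<i64> {
        self.stack.extend_from_slice(args);
        self.execute(code)
    }
}
```

Keeping one copy of the loop means any change to instruction handling lands in a single place, and the compiled binary contains one dispatch body instead of two.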
+
 ### November 29, 2025 - While Loop Optimization
 - Optimized while/repeat loop bytecode generation
-- While loop now **104% of native** (faster than native Lua!)
-- Integer addition improved to **96% of native**
-- Nested loops at **93% of native**
+- While loop at **85% of native**
+- Nested loops at **97% of native**
 
 ### November 24, 2025 - CallFrame Optimization  
 - Implemented code pointer caching in CallFrame
@@ -120,17 +128,19 @@ Performance benchmarks run automatically on push to `main` or `refactor` branche
 ## Architecture Notes
 
 ### Why Some Operations are Faster Than Native
-- **While loop**: Optimized bytecode generation produces fewer instructions
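+- **Integer addition/Table insertion**: At about 101%, both are effectively at parity with native; the small edge is attributed to Rust compiler optimizations on the integer fast paths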
 - **string.gsub**: Rust's string handling is more efficient for pattern matching
+- **Hash table insertion**: Optimized Lua-style open addressing hash table
+- **String length**: Direct access to pre-computed length field
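
For readers unfamiliar with open addressing, the sketch below shows the general idea: all entries live in one flat slot array and collisions are resolved by probing neighbouring slots, so there is no per-entry heap allocation. This is an illustration only. `OpenTable` is a hypothetical type, linear probing is used here for simplicity, and the actual Lua-RS table layout and probing strategy are not shown in this diff and may differ.

```rust
/// Fixed-capacity open-addressing map from i64 keys to i64 values.
/// Hypothetical illustration; not the Lua-RS table implementation.
struct OpenTable {
    slots: Vec<Option<(i64, i64)>>,
    len: usize,
}

impl OpenTable {
    /// `capacity` must be a power of two so masking yields a valid index.
    fn with_capacity(capacity: usize) -> Self {
        assert!(capacity.is_power_of_two());
        OpenTable { slots: vec![None; capacity], len: 0 }
    }

    fn index_for(&self, key: i64) -> usize {
        // Cheap multiplicative hash; the constant is an arbitrary odd number.
        let h = (key as u64).wrapping_mul(0x9E37_79B9_7F4A_7C15);
        ((h >> 32) as usize) & (self.slots.len() - 1)
    }

    /// Inserts or overwrites. Returns false when the table is full.
    fn insert(&mut self, key: i64, value: i64) -> bool {
        let mask = self.slots.len() - 1;
        let mut i = self.index_for(key);
        for _ in 0..self.slots.len() {
            let slot = self.slots[i];
            match slot {
                None => {
                    self.slots[i] = Some((key, value));
                    self.len += 1;
                    return true;
                }
                Some((k, _)) if k == key => {
                    self.slots[i] = Some((key, value));
                    return true;
                }
                // Occupied by another key: probe the next slot.
                Some(_) => i = (i + 1) & mask,
            }
        }
        false
    }

    fn get(&self, key: i64) -> Option<i64> {
        let mask = self.slots.len() - 1;
        let mut i = self.index_for(key);
        for _ in 0..self.slots.len() {
            match self.slots[i] {
                // An empty slot ends the probe sequence: the key is absent.
                None => return None,
                Some((k, v)) if k == key => return Some(v),
                Some(_) => i = (i + 1) & mask,
            }
        }
        None
    }
}
```

A production table would also grow and rehash before filling up and would handle deletion; both are omitted here to keep the sketch short.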
 
 ### Known Performance Bottlenecks
-1. **Match dispatch**: Rust match vs C computed goto (~8% overhead)
-2. **LuaValue size**: 16 bytes vs NaN-boxing 8 bytes
-3. **Function calls**: Frame allocation overhead
+1. **ipairs iteration**: Iterator overhead compared to C implementation
+2. **Vararg functions**: Extra allocation and copying overhead
+3. **Recursive calls**: Frame allocation overhead
 4. **Array creation**: GC allocation patterns
 
 ---
 
 ## Detailed Optimization History
 
-See git history for detailed optimization phases (Phase 1-23).
+See git history for detailed optimization phases (Phase 1-24).
diff --git a/README.md b/README.md
@@ -12,21 +12,22 @@ A Lua 5.4 interpreter implemented in Rust, primarily developed through AI-assist
 
 ## Test Coverage
 
-Current test status: **252 out of 252 tests passing (100%)** ✅
+Current test status: **302 out of 302 tests passing (100%)** ✅
 
 ### Performance
 
 [![Benchmarks](https://github.com/CppCXY/lua-rs/actions/workflows/benchmarks.yml/badge.svg)](https://github.com/CppCXY/lua-rs/actions/workflows/benchmarks.yml)
 
-**Overall**: 40-104% of native Lua 5.4.6 performance
+**Overall**: 30-100%+ of native Lua 5.4.6 performance
 
-**Highlights** (November 29, 2025):
-- 🏆 **While loop**: **104%** of native (faster than native Lua!)
-- 🏆 **Integer addition**: **96%** of native
-- 🏆 **Nested loops**: **93%** of native
-- 🏆 **string.gsub**: **66% faster** than native
-- 🎯 Good performance: Float ops (77%), Hash tables (77%), Control flow (66-75%)
-- 📊 Acceptable: Function calls (41-43%), Table operations (41-70%)
+**Highlights** (November 30, 2025):
+- 🏆 **Integer addition**: **101%** of native (faster than native Lua!)
+- 🏆 **Float multiplication**: **99%** of native
+- 🏆 **Table insertion**: **101%** of native (faster than native Lua!)
+- 🏆 **Hash table insertion**: **136%** of native (36% faster!)
+- 🏆 **string.gsub**: **146%** of native (46% faster!)
+- 🎯 Good performance: Control flow (61-97%), Table access (77%)
+- 📊 Acceptable: Function calls (36-59%), String operations (33-60%)
 
 See detailed analysis: [Performance Report](PERFORMANCE_REPORT.md)
 
@@ -59,7 +60,8 @@ chmod +x run_benchmarks.sh && ./run_benchmarks.sh
 - ✅ **UTF-8**: Full UTF-8 support (`codes`, `codepoint`, `len`, `offset`, `char`)
 - ✅ **Coroutine**: `create`, `resume`, `yield`, `status`, `close`, `isyieldable`
 - ✅ **Package**: `require`, `module`, `searchers` (partial)
-- ⚠️ **IO**: Basic file operations (has known memory issues, tests skipped)
+- ✅ **IO**: File operations (`open`, `close`, `read`, `write`, `lines`, `seek`, `type`, `tmpfile`, `flush`)
+- ✅ **OS**: System functions (`time`, `date`, `clock`, `difftime`, `getenv`, `remove`, `rename`, `execute`, `exit`, `tmpname`)
 
 ### Known Limitations ⚠️
 
@@ -192,14 +194,16 @@ This project demonstrates **successful AI-assisted systems programming**. It was
 
 The codebase was developed through iterative AI assistance with human oversight. Key achievements:
 - ✅ Implemented a working Lua 5.4 VM from scratch
-- ✅ Achieved 100% test compatibility (252/252 tests)
+- ✅ Achieved 100% test compatibility (302/302 tests)
 - ✅ Successfully debugged and fixed critical memory safety issues
 - ✅ Implemented advanced optimizations (tail calls, hash tables, direct pointers)
 - ✅ Reached **production-ready correctness** with **competitive performance in key areas**
 
 ### Recent Improvements (November 2025)
-- **November 29**: While loop bytecode optimization (now 104% of native!)
-- **November 29**: Integer addition optimization (96% of native)
+- **November 30**: Optimized `call_function_internal` (eliminated duplicate dispatch loop)
+- **November 30**: Added 30 new tests for IO/OS standard libraries (302 total tests)
+- **November 30**: Integer addition/Table insertion now **faster than native Lua**
+- **November 29**: While loop bytecode optimization
 - **November 24**: CallFrame code pointer caching
 - **November 24**: C function call optimization (eliminated copying)
 - **November 24**: Hash table restructure (Lua-style open addressing)
@@ -225,4 +229,4 @@ MIT License - See [LICENSE](LICENSE) file for details.
 
 ---
 
-**Status**: Production-ready correctness (252/252 tests) with competitive performance in hash operations and pattern matching. Suitable for embedded scripting and educational purposes.
+**Status**: Production-ready correctness (302/302 tests) with competitive performance. Integer addition and table insertion now **faster than native Lua**. Suitable for embedded scripting and educational purposes.
diff --git a/crates/luars/src/lua_vm/mod.rs b/crates/luars/src/lua_vm/mod.rs