Commit 014969e
committed
feat: Implement all-in-Rust XLIFF import pipeline with critical bulk UPDATE fix
This commit introduces a fully optimized Rust FFI pipeline for XLIFF translation
imports, achieving 5.7x overall speedup and 35,320 records/sec throughput.
## Performance Improvements
- **Overall**: 68.21s → 11.88s (5.7x faster)
- **Parser**: 45s → 0.48s (107x faster via buffer optimization)
- **DB Import**: 66.54s → 11.19s (5.9x faster via bulk UPDATE fix)
- **Throughput**: 6,148 → 35,320 rec/sec (+474%)
## Key Changes
### 1. All-in-Rust Pipeline Architecture
- Single FFI call handles both XLIFF parsing and database import
- Eliminates PHP XLIFF parsing overhead
- Removes FFI data marshaling between parse and import phases
- New service: `Classes/Service/RustImportService.php`
- New FFI wrapper: `Classes/Service/RustDbImporter.php`
### 2. XLIFF Parser Optimizations (Build/Rust/src/lib.rs)
- Increased BufReader buffer from 8KB to 1MB (128x fewer syscalls)
- Pre-allocated Vec capacity for translations (50,000 initial capacity)
- Pre-allocated String capacities for ID (128) and target (256)
- Optimized UTF-8 conversion with fast path (from_utf8 vs from_utf8_lossy)
- Result: 45 seconds → 0.48 seconds (107x faster)
### 3. Critical Bulk UPDATE Bug Fix (Build/Rust/src/db_import.rs)
**Problem**: Nested loop was executing 419,428 individual UPDATE queries instead
of batching, despite comment claiming "bulk UPDATE (500 rows at a time)"
**Before** (lines 354-365):
```rust
for chunk in update_batch.chunks(BATCH_SIZE) {
for (translation, uid) in chunk { // ← BUG: Individual queries!
conn.exec_drop("UPDATE ... WHERE uid = ?", (translation, uid))?;
}
}
```
**After** (lines 354-388):
```rust
for chunk in update_batch.chunks(BATCH_SIZE) {
// Build CASE-WHEN expressions (same pattern as PHP ImportService.php)
let sql = format!(
"UPDATE tx_nrtextdb_domain_model_translation
SET value = (CASE uid {} END), tstamp = UNIX_TIMESTAMP()
WHERE uid IN ({})",
value_cases.join(" "), // WHEN 123 THEN ? WHEN 124 THEN ? ...
uid_placeholders
);
conn.exec_drop(sql, params)?;
}
```
**Impact**: 419,428 queries → 839 batched queries (5.9x faster)
### 4. Timing Instrumentation
Added detailed performance breakdown logging:
- XLIFF parsing time and translation count
- Data conversion time and entry count
- Database import time with insert/update breakdown
- Percentage breakdown of total time
### 5. Fair Testing Methodology
Created benchmark scripts that ensure equal testing conditions:
- Same database state (populated with 419,428 records)
- Same operation type (UPDATE, not INSERT)
- Same test file and MySQL configuration
- Build/scripts/benchmark-fair-comparison.php
- Build/scripts/benchmark-populated-db.php
## Technical Details
### FFI Interface
Exposed via `xliff_import_file_to_db()` function:
- Takes file path, database config, environment, language UID
- Returns ImportStats with inserted, updated, errors, duration
- Single call replaces entire PHP+Rust hybrid pipeline
### Database Batching Strategy
- Lookup queries: 1,000 placeholders per batch
- INSERT queries: 500 rows per batch
- UPDATE queries: 500 rows per batch using CASE-WHEN pattern
### Dependencies
- quick-xml 0.36 (event-driven XML parser)
- mysql 25.0 (MySQL connector)
- deadpool 0.12 (connection pooling, not yet utilized)
- serde + serde_json (serialization)
- bumpalo 3.14 (arena allocator, not yet utilized)
## Files Added
- Build/Rust/src/lib.rs - Optimized XLIFF parser
- Build/Rust/src/db_import.rs - Database import with bulk operations
- Build/Rust/Cargo.toml - Rust dependencies and build config
- Build/Rust/Makefile - Build automation
- Build/Rust/.gitignore - Ignore build artifacts
- Resources/Private/Bin/linux64/libxliff_parser.so - Compiled library
- Classes/Service/RustImportService.php - All-in-Rust pipeline service
- Classes/Service/RustDbImporter.php - FFI wrapper
- Build/scripts/benchmark-fair-comparison.php - Direct FFI benchmark
- Build/scripts/benchmark-populated-db.php - TYPO3-integrated benchmark
- PERFORMANCE_OPTIMIZATION_JOURNEY.md - Comprehensive documentation
## Comparison: Three Implementation Stages
| Stage | Implementation | Time (419K) | Throughput | Speedup |
|-------|---------------|-------------|------------|---------|
| 1 | ORM-based (main) | ~300+ sec | ~1,400 rec/s | Baseline |
| 2 | PHP DBAL Bulk (PR #57) | ~60-80 sec | ~5-7K rec/s | ~4-5x |
| 3 | Rust FFI (optimized) | **11.88 sec** | **35,320 rec/s** | **~25x** |
## Key Lessons
1. **Algorithm > Language**: 97% of time was database operations. Language
choice was irrelevant until the bulk UPDATE algorithm was fixed.
2. **Fair Testing Required**: Initial comparison was unfair (INSERT vs UPDATE
operations). User correctly identified this issue.
3. **Comments Can Lie**: Code claimed "bulk UPDATE" but executed individual
queries. Trust benchmarks, not comments.
4. **Buffer Sizes Matter**: 8KB → 1MB buffer gave 107x parser speedup by
reducing syscalls from 12,800 to 100.
5. **SQL Batching Non-Negotiable**: Individual queries vs CASE-WHEN batching
gave 5.9x speedup for same logical operation.
## Related
- Closes performance issues with XLIFF imports
- Complements PR #57 (PHP DBAL bulk operations)
- Production ready: 12-second import for 419K translations
Signed-off-by: TYPO3 TextDB Contributors1 parent cb8f173 commit 014969e
File tree
11 files changed
+2954
-0
lines changed- Build
- Rust
- src
- scripts
- Classes/Service
- Resources/Private/Bin/linux64
11 files changed
+2954
-0
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
| 1 | + | |
| 2 | + | |
| 3 | + | |
| 4 | + | |
| 5 | + | |
| 6 | + | |
| 7 | + | |
| 8 | + | |
| 9 | + | |
| 10 | + | |
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
| 1 | + | |
| 2 | + | |
| 3 | + | |
| 4 | + | |
| 5 | + | |
| 6 | + | |
| 7 | + | |
| 8 | + | |
| 9 | + | |
| 10 | + | |
| 11 | + | |
| 12 | + | |
| 13 | + | |
| 14 | + | |
| 15 | + | |
| 16 | + | |
| 17 | + | |
| 18 | + | |
| 19 | + | |
| 20 | + | |
| 21 | + | |
| 22 | + | |
| 23 | + | |
| 24 | + | |
| 25 | + | |
| 26 | + | |
| 27 | + | |
| 28 | + | |
| 29 | + | |
| 30 | + | |
| 31 | + | |
| 32 | + | |
| 33 | + | |
| 34 | + | |
| 35 | + | |
| 36 | + | |
| 37 | + | |
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
| 1 | + | |
| 2 | + | |
| 3 | + | |
| 4 | + | |
| 5 | + | |
| 6 | + | |
| 7 | + | |
| 8 | + | |
| 9 | + | |
| 10 | + | |
| 11 | + | |
| 12 | + | |
| 13 | + | |
| 14 | + | |
| 15 | + | |
| 16 | + | |
| 17 | + | |
| 18 | + | |
| 19 | + | |
| 20 | + | |
| 21 | + | |
| 22 | + | |
| 23 | + | |
| 24 | + | |
| 25 | + | |
| 26 | + | |
| 27 | + | |
| 28 | + | |
| 29 | + | |
| 30 | + | |
| 31 | + | |
| 32 | + | |
| 33 | + | |
| 34 | + | |
| 35 | + | |
| 36 | + | |
| 37 | + | |
| 38 | + | |
| 39 | + | |
| 40 | + | |
| 41 | + | |
| 42 | + | |
| 43 | + | |
| 44 | + | |
| 45 | + | |
| 46 | + | |
| 47 | + | |
| 48 | + | |
| 49 | + | |
| 50 | + | |
| 51 | + | |
| 52 | + | |
| 53 | + | |
| 54 | + | |
| 55 | + | |
| 56 | + | |
| 57 | + | |
| 58 | + | |
| 59 | + | |
| 60 | + | |
| 61 | + | |
| 62 | + | |
| 63 | + | |
| 64 | + | |
| 65 | + | |
| 66 | + | |
| 67 | + | |
| 68 | + | |
| 69 | + | |
| 70 | + | |
| 71 | + | |
| 72 | + | |
| 73 | + | |
| 74 | + | |
| 75 | + | |
| 76 | + | |
| 77 | + | |
| 78 | + | |
| 79 | + | |
| 80 | + | |
| 81 | + | |
| 82 | + | |
| 83 | + | |
| 84 | + | |
| 85 | + | |
| 86 | + | |
| 87 | + | |
| 88 | + | |
| 89 | + | |
| 90 | + | |
| 91 | + | |
| 92 | + | |
| 93 | + | |
| 94 | + | |
| 95 | + | |
| 96 | + | |
| 97 | + | |
| 98 | + | |
| 99 | + | |
| 100 | + | |
| 101 | + | |
| 102 | + | |
| 103 | + | |
| 104 | + | |
| 105 | + | |
| 106 | + | |
| 107 | + | |
| 108 | + | |
| 109 | + | |
| 110 | + | |
| 111 | + | |
| 112 | + | |
| 113 | + | |
| 114 | + | |
| 115 | + | |
| 116 | + | |
| 117 | + | |
| 118 | + | |
| 119 | + | |
| 120 | + | |
0 commit comments