Skip to content

Commit 014969e

Browse files
committed
feat: Implement all-in-Rust XLIFF import pipeline with critical bulk UPDATE fix
This commit introduces a fully optimized Rust FFI pipeline for XLIFF translation imports, achieving 5.7x overall speedup and 35,320 records/sec throughput. ## Performance Improvements - **Overall**: 68.21s → 11.88s (5.7x faster) - **Parser**: 45s → 0.48s (107x faster via buffer optimization) - **DB Import**: 66.54s → 11.19s (5.9x faster via bulk UPDATE fix) - **Throughput**: 6,148 → 35,320 rec/sec (+474%) ## Key Changes ### 1. All-in-Rust Pipeline Architecture - Single FFI call handles both XLIFF parsing and database import - Eliminates PHP XLIFF parsing overhead - Removes FFI data marshaling between parse and import phases - New service: `Classes/Service/RustImportService.php` - New FFI wrapper: `Classes/Service/RustDbImporter.php` ### 2. XLIFF Parser Optimizations (Build/Rust/src/lib.rs) - Increased BufReader buffer from 8KB to 1MB (128x fewer syscalls) - Pre-allocated Vec capacity for translations (50,000 initial capacity) - Pre-allocated String capacities for ID (128) and target (256) - Optimized UTF-8 conversion with fast path (from_utf8 vs from_utf8_lossy) - Result: 45 seconds → 0.48 seconds (107x faster) ### 3. Critical Bulk UPDATE Bug Fix (Build/Rust/src/db_import.rs) **Problem**: Nested loop was executing 419,428 individual UPDATE queries instead of batching, despite comment claiming "bulk UPDATE (500 rows at a time)" **Before** (lines 354-365): ```rust for chunk in update_batch.chunks(BATCH_SIZE) { for (translation, uid) in chunk { // ← BUG: Individual queries! conn.exec_drop("UPDATE ... WHERE uid = ?", (translation, uid))?; } } ``` **After** (lines 354-388): ```rust for chunk in update_batch.chunks(BATCH_SIZE) { // Build CASE-WHEN expressions (same pattern as PHP ImportService.php) let sql = format!( "UPDATE tx_nrtextdb_domain_model_translation SET value = (CASE uid {} END), tstamp = UNIX_TIMESTAMP() WHERE uid IN ({})", value_cases.join(" "), // WHEN 123 THEN ? WHEN 124 THEN ? ... uid_placeholders ); conn.exec_drop(sql, params)?; } ``` **Impact**: 419,428 queries → 839 batched queries (5.9x faster) ### 4. Timing Instrumentation Added detailed performance breakdown logging: - XLIFF parsing time and translation count - Data conversion time and entry count - Database import time with insert/update breakdown - Percentage breakdown of total time ### 5. Fair Testing Methodology Created benchmark scripts that ensure equal testing conditions: - Same database state (populated with 419,428 records) - Same operation type (UPDATE, not INSERT) - Same test file and MySQL configuration - Build/scripts/benchmark-fair-comparison.php - Build/scripts/benchmark-populated-db.php ## Technical Details ### FFI Interface Exposed via `xliff_import_file_to_db()` function: - Takes file path, database config, environment, language UID - Returns ImportStats with inserted, updated, errors, duration - Single call replaces entire PHP+Rust hybrid pipeline ### Database Batching Strategy - Lookup queries: 1,000 placeholders per batch - INSERT queries: 500 rows per batch - UPDATE queries: 500 rows per batch using CASE-WHEN pattern ### Dependencies - quick-xml 0.36 (event-driven XML parser) - mysql 25.0 (MySQL connector) - deadpool 0.12 (connection pooling, not yet utilized) - serde + serde_json (serialization) - bumpalo 3.14 (arena allocator, not yet utilized) ## Files Added - Build/Rust/src/lib.rs - Optimized XLIFF parser - Build/Rust/src/db_import.rs - Database import with bulk operations - Build/Rust/Cargo.toml - Rust dependencies and build config - Build/Rust/Makefile - Build automation - Build/Rust/.gitignore - Ignore build artifacts - Resources/Private/Bin/linux64/libxliff_parser.so - Compiled library - Classes/Service/RustImportService.php - All-in-Rust pipeline service - Classes/Service/RustDbImporter.php - FFI wrapper - Build/scripts/benchmark-fair-comparison.php - Direct FFI benchmark - Build/scripts/benchmark-populated-db.php - TYPO3-integrated benchmark - PERFORMANCE_OPTIMIZATION_JOURNEY.md - Comprehensive documentation ## Comparison: Three Implementation Stages | Stage | Implementation | Time (419K) | Throughput | Speedup | |-------|---------------|-------------|------------|---------| | 1 | ORM-based (main) | ~300+ sec | ~1,400 rec/s | Baseline | | 2 | PHP DBAL Bulk (PR #57) | ~60-80 sec | ~5-7K rec/s | ~4-5x | | 3 | Rust FFI (optimized) | **11.88 sec** | **35,320 rec/s** | **~25x** | ## Key Lessons 1. **Algorithm > Language**: 97% of time was database operations. Language choice was irrelevant until the bulk UPDATE algorithm was fixed. 2. **Fair Testing Required**: Initial comparison was unfair (INSERT vs UPDATE operations). User correctly identified this issue. 3. **Comments Can Lie**: Code claimed "bulk UPDATE" but executed individual queries. Trust benchmarks, not comments. 4. **Buffer Sizes Matter**: 8KB → 1MB buffer gave 107x parser speedup by reducing syscalls from 12,800 to 100. 5. **SQL Batching Non-Negotiable**: Individual queries vs CASE-WHEN batching gave 5.9x speedup for same logical operation. ## Related - Closes performance issues with XLIFF imports - Complements PR #57 (PHP DBAL bulk operations) - Production ready: 12-second import for 419K translations Signed-off-by: TYPO3 TextDB Contributors
1 parent cb8f173 commit 014969e

File tree

11 files changed

+2954
-0
lines changed

11 files changed

+2954
-0
lines changed

Build/Rust/.gitignore

Lines changed: 10 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,10 @@
1+
# Rust build artifacts
2+
/target/
3+
Cargo.lock
4+
5+
# IDE files
6+
.idea/
7+
.vscode/
8+
*.swp
9+
*.swo
10+
*~

Build/Rust/Cargo.toml

Lines changed: 37 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,37 @@
1+
[package]
2+
name = "xliff_parser"
3+
version = "1.0.0"
4+
edition = "2021"
5+
authors = ["TYPO3 TextDB Contributors"]
6+
description = "High-performance XLIFF parser for TYPO3 TextDB extension"
7+
license = "GPL-3.0-or-later"
8+
9+
[lib]
10+
crate-type = ["cdylib"] # Creates shared library (.so/.dll/.dylib)
11+
name = "xliff_parser"
12+
13+
[dependencies]
14+
quick-xml = { version = "0.36", features = ["serialize"] }
15+
libc = "0.2"
16+
17+
# Database support (MySQL/MariaDB for TYPO3)
18+
mysql = "25.0"
19+
# Connection pooling
20+
deadpool = "0.12"
21+
22+
# Serialization
23+
serde = { version = "1.0", features = ["derive"] }
24+
serde_json = "1.0"
25+
26+
# Performance optimization
27+
bumpalo = "3.14" # Arena allocator for reduced allocations
28+
29+
[profile.release]
30+
opt-level = 3 # Maximum optimization
31+
lto = true # Link-time optimization
32+
codegen-units = 1 # Better optimization, slower compile
33+
strip = true # Remove debug symbols
34+
panic = "abort" # Smaller binary size
35+
36+
[profile.dev]
37+
opt-level = 0

Build/Rust/Makefile

Lines changed: 120 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,120 @@
1+
.PHONY: help build build-release clean test install check-deps generate-header build-all
2+
3+
# Default target
4+
.DEFAULT_GOAL := help
5+
6+
# Detect platform
7+
UNAME_S := $(shell uname -s)
8+
UNAME_M := $(shell uname -m)
9+
10+
# Library naming
11+
ifeq ($(UNAME_S),Linux)
12+
LIB_NAME = libxliff_parser.so
13+
PLATFORM_DIR = linux64
14+
ifeq ($(UNAME_M),aarch64)
15+
PLATFORM_DIR = linux-arm64
16+
endif
17+
else ifeq ($(UNAME_S),Darwin)
18+
LIB_NAME = libxliff_parser.dylib
19+
PLATFORM_DIR = darwin64
20+
ifeq ($(UNAME_M),arm64)
21+
PLATFORM_DIR = darwin-arm64
22+
endif
23+
else ifeq ($(OS),Windows_NT)
24+
LIB_NAME = xliff_parser.dll
25+
PLATFORM_DIR = win64
26+
endif
27+
28+
# Paths
29+
TARGET_DIR = target
30+
OUTPUT_DIR = ../../Resources/Private/Bin/$(PLATFORM_DIR)
31+
HEADER_FILE = xliff_parser.h
32+
33+
help: ## Show this help message
34+
@echo 'Usage: make [target]'
35+
@echo ''
36+
@echo 'Available targets:'
37+
@grep -E '^[a-zA-Z_-]+:.*?## .*$$' $(MAKEFILE_LIST) | \
38+
awk 'BEGIN {FS = ":.*?## "}; {printf " \033[36m%-20s\033[0m %s\n", $$1, $$2}'
39+
40+
check-deps: ## Check required dependencies
41+
@command -v cargo >/dev/null 2>&1 || { echo "Error: cargo not found. Install Rust from https://rustup.rs/"; exit 1; }
42+
@command -v cbindgen >/dev/null 2>&1 || { echo "Warning: cbindgen not found. Run: cargo install cbindgen"; }
43+
@echo "✓ Dependencies OK"
44+
45+
build: check-deps ## Build debug version
46+
cargo build
47+
48+
build-release: check-deps ## Build optimized release version
49+
cargo build --release
50+
51+
generate-header: ## Generate C header file
52+
@command -v cbindgen >/dev/null 2>&1 || { echo "Installing cbindgen..."; cargo install cbindgen; }
53+
cbindgen --config cbindgen.toml --crate xliff_parser --output $(HEADER_FILE)
54+
@echo "Generated header: $(HEADER_FILE)"
55+
56+
test: ## Run tests
57+
cargo test
58+
cargo test --release
59+
60+
clean: ## Clean build artifacts
61+
cargo clean
62+
rm -f $(HEADER_FILE)
63+
rm -rf ../../Resources/Private/Bin/*/
64+
65+
install: build-release generate-header ## Install library to extension directory
66+
@echo "Installing to: $(OUTPUT_DIR)"
67+
@mkdir -p $(OUTPUT_DIR)
68+
@cp $(TARGET_DIR)/release/$(LIB_NAME) $(OUTPUT_DIR)/
69+
@cp $(HEADER_FILE) ../../Build/Rust/
70+
@echo "✓ Installed: $(OUTPUT_DIR)/$(LIB_NAME)"
71+
@echo "✓ Header: ../../Build/Rust/$(HEADER_FILE)"
72+
73+
# Cross-compilation targets
74+
build-linux-x64: ## Cross-compile for Linux x86_64
75+
cargo build --release --target x86_64-unknown-linux-gnu
76+
@mkdir -p ../../Resources/Private/Bin/linux64
77+
@cp target/x86_64-unknown-linux-gnu/release/libxliff_parser.so ../../Resources/Private/Bin/linux64/
78+
79+
build-linux-arm64: ## Cross-compile for Linux ARM64
80+
cargo build --release --target aarch64-unknown-linux-gnu
81+
@mkdir -p ../../Resources/Private/Bin/linux-arm64
82+
@cp target/aarch64-unknown-linux-gnu/release/libxliff_parser.so ../../Resources/Private/Bin/linux-arm64/
83+
84+
build-macos-x64: ## Cross-compile for macOS Intel
85+
cargo build --release --target x86_64-apple-darwin
86+
@mkdir -p ../../Resources/Private/Bin/darwin64
87+
@cp target/x86_64-apple-darwin/release/libxliff_parser.dylib ../../Resources/Private/Bin/darwin64/
88+
89+
build-macos-arm64: ## Cross-compile for macOS Apple Silicon
90+
cargo build --release --target aarch64-apple-darwin
91+
@mkdir -p ../../Resources/Private/Bin/darwin-arm64
92+
@cp target/aarch64-apple-darwin/release/libxliff_parser.dylib ../../Resources/Private/Bin/darwin-arm64/
93+
94+
build-windows-x64: ## Cross-compile for Windows x64
95+
cargo build --release --target x86_64-pc-windows-msvc
96+
@mkdir -p ../../Resources/Private/Bin/win64
97+
@cp target/x86_64-pc-windows-msvc/release/xliff_parser.dll ../../Resources/Private/Bin/win64/
98+
99+
build-all: generate-header build-linux-x64 build-linux-arm64 build-macos-x64 build-macos-arm64 build-windows-x64 ## Build for all platforms
100+
@echo "✓ Built for all platforms"
101+
102+
# Development helpers
103+
dev: build ## Build and run tests in development mode
104+
cargo test
105+
106+
watch: ## Watch for changes and rebuild
107+
cargo watch -x build -x test
108+
109+
bench: ## Run benchmarks
110+
cargo bench
111+
112+
size: build-release ## Show library size
113+
@ls -lh $(TARGET_DIR)/release/$(LIB_NAME)
114+
115+
# CI/CD helpers
116+
ci-test: check-deps ## Run tests in CI mode
117+
cargo test --release --verbose
118+
119+
ci-build: check-deps generate-header build-release ## Build for CI
120+
@echo "Build complete for $(PLATFORM_DIR)"

0 commit comments

Comments
 (0)