Commit 091037a

[hipify-perl][ROCm#1776][performance][fix] Major performance optimization and correctness fixes via O(1) Hash Architecture
### [Overview]

This PR delivers a fundamental algorithmic and architectural overhaul of hipify-perl.

### [Problem]

Previously, hipify-perl suffered from an O(N*M) time-complexity bottleneck. The C++ code generator wrote a script that executed thousands of sequential s/// substitutions and line-by-line regex evaluations for every file. For a standard file, this forced the Perl interpreter to perform millions of redundant regex evaluations to look for API translations, deprecated functions, and unsupported device types, resulting in severe performance degradation and “uninitialized value” warning spam.

### [Solution]

This PR replaces the O(N*M) search-and-replace approach with a single-pass O(N) tokenizer backed by hash lookups.

**Key Changes:**

* **Hash-Based Dictionary Generation:** Refactored the hipify-perl generator to output global Perl hashes instead of emitting thousands of hardcoded subroutine calls.
* **Single-Pass Identifier Extraction:** Replaced the brute-force API regexes with a single C-identifier regex, `qr/\b([a-zA-Z_]\w*)\b/`.
  * Identifiers are extracted in one pass and replaced via **O(1) hash lookups**, avoiding sequential regex scans.
* **Global Initialization:** Moved all hash dictionary building and regex compilation outside the file-processing loop. The initialization penalty is now paid exactly once per invocation, rather than once per file.

#### 1. Performance Optimization (Up to ~50x Speedup)

Removing the O(N*M) regex passes and the API “ZAP” loop greatly reduces translation time.

* Sequential batch processing: reduced from approximately 15.8 minutes to 22.6 seconds.
* Parallel batch processing (16 threads): reduced from approximately 98 seconds to under 2 seconds.
* Warning generation for unmapped identifiers now adds little overhead (compare Scenario A and Scenario B below).

#### 2. Correctness Improvements (Regex Collision Fixes)

Switching to O(1) whole-token matching fixes previous regex collision issues.

For example, for cuRAND and CUB headers, greedy regular expressions (e.g., s/curand/hiprand/g or namespace substitutions) used to mutate text before the more specific header translations could be evaluated. Includes like curand_discrete.h now map directly to hiprand/hiprand_kernel.h, strictly as defined by the mapping dictionaries, without intermediate string corruption.

### [Proof of Validation & Reproducibility]

To verify the performance claims and ensure strict output parity, a custom multi-threaded Python batch script was used to process 180 hipify-clang test files through both the legacy engine and the new O(1) engine. The measurements across those 180 files break down as follows:

**Scenario A: Standard Execution (Warnings Enabled)**

This tests the standard behavior, where the script must check for unmapped or unsupported CUDA APIs.

* **Sequential Processing:**
  * Old Engine: 947.7675 seconds
  * New O(1) Hash Engine: 22.6031 seconds
  * Improvement: ~41.9x faster
* **Parallel Processing (16 Threads):**
  * Old Engine: 98.2947 seconds
  * New O(1) Hash Engine: 1.9389 seconds
  * Improvement: ~50.7x faster

**Scenario B: Quiet Execution (-quiet-warnings Enabled)**

This tests the baseline translation speed by skipping the expensive unmapped API checks (the old “ZAP” loop).

* **Sequential Processing:**
  * Old Engine: 25.2505 seconds
  * New O(1) Hash Engine: 19.0246 seconds
  * Improvement: ~1.3x faster
* **Parallel Processing (16 Threads):**
  * Old Engine: 2.2796 seconds
  * New O(1) Hash Engine: 1.9331 seconds
  * Improvement: ~1.18x faster

**Key Analytical Takeaways:**

* **The “ZAP” Loop Bottleneck Eliminated:** With the old script, sequential execution time ballooned from about 25 seconds to about 948 seconds when warnings were enabled, which shows how inefficient the previous approach was.
* **Warnings are now nearly “Free”:** With single-pass extraction and O(1) hash lookups, enabling warnings costs about 3.6 seconds across 180 files in the sequential run (22.6 s with warnings vs. 19.0 s without) and is not measurable in the parallel run (1.94 s vs. 1.93 s), so developers can keep warnings on at little cost.
* **Parallel Scaling:** Processing 180 files in under 2 seconds on 16 threads is about 11.7x faster than the 22.6-second sequential run, with no per-file regex compilation overhead.

### [Conclusion]

`hipify-perl` achieves speeds similar to or better than `hipify-clang` on the same CUDA files under the tested conditions.

### [Notes for Reviewers]

To keep this PR strictly focused on the core engine optimization and O(1) architecture, some resulting cleanup tasks have been deferred. Follow-up PRs will address:

* Purging the remaining “dead” Perl subroutine generators from CUDA2HIP_Perl.cpp (e.g., the legacy generateDeviceFunctions and generateDeprecatedAndUnsupportedFunctions loop builders).
* General C++ generator refactoring (e.g., removing the unique_ptr-by-reference anti-patterns for cleaner stream handling).
* Minor performance improvements.
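
### [Illustrative Sketch]

The single-pass hash lookup from [Solution] and the cuRAND header collision from Correctness Improvements can be sketched in a few lines. This is a minimal Python illustration of the technique, not the generated Perl code: the tiny `API_MAP`, `HEADER_MAP`, and `UNSUPPORTED` tables, the function names, and the `cudaHypotheticalUnsupported` identifier are all assumptions made for the example, and `legacy_translate` is only a simplified reconstruction of the failure mode described above.

```python
import re

# Hypothetical miniature tables; the real dictionaries are emitted by the
# hipify-perl generator and are far larger.
API_MAP = {
    "cudaMalloc": "hipMalloc",
    "cudaFree": "hipFree",
    "cudaMemcpy": "hipMemcpy",
}
HEADER_MAP = {
    "curand_discrete.h": "hiprand/hiprand_kernel.h",
    "cuda_runtime.h": "hip/hip_runtime.h",
}
UNSUPPORTED = {"cudaHypotheticalUnsupported"}  # made-up name for the demo

# Compiled once at import time, mirroring the "global initialization" idea.
IDENT = re.compile(r"\b([A-Za-z_]\w*)\b", re.ASCII)
INCLUDE = re.compile(r'^\s*#\s*include\s*[<"]([^>"]+)[>"]')


def translate_line(line, warnings=None):
    """Translate one line with whole-token dictionary lookups."""
    m = INCLUDE.match(line)
    if m:
        # The header name is looked up as one key, so no partial
        # substitution can corrupt it first.
        header = m.group(1)
        return line[:m.start(1)] + HEADER_MAP.get(header, header) + line[m.end(1):]

    def replace(match):
        name = match.group(1)
        if warnings is not None and name in UNSUPPORTED:
            warnings.append(name)  # the warning check is just another lookup
        return API_MAP.get(name, name)

    return IDENT.sub(replace, line)


def legacy_translate(line):
    """Simplified reconstruction of the collision: a greedy substring
    substitution runs before the specific header rule."""
    line = line.replace("curand", "hiprand")
    for old, new in HEADER_MAP.items():
        line = line.replace(old, new)  # "curand_discrete.h" is already gone
    return line


if __name__ == "__main__":
    print(translate_line("#include <curand_discrete.h>"))
    # -> #include <hiprand/hiprand_kernel.h>
    print(legacy_translate("#include <curand_discrete.h>"))
    # -> #include <hiprand_discrete.h>
    print(translate_line("cudaMalloc(&p, n); my_cudaMalloc_wrapper(p);"))
    # -> hipMalloc(&p, n); my_cudaMalloc_wrapper(p);
```

Each identifier costs one dictionary lookup regardless of table size, which is why growing the mapping tables or enabling the unsupported-API check does not add further passes over the text. Matching whole tokens also leaves names such as `my_cudaMalloc_wrapper` untouched.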
1 parent a7295fe commit 091037a

2 files changed

+8342
-8412
lines changed
