@@ -1047,26 +1047,49 @@ if (!needs_escape)
10471047
10481048---
10491049
1050- # Optimization #3 : Fast Integer Conversion
1050+ # Optimization #3 : Fast Digit Counting
10511051
10521052**Traditional:**
10531053```cpp
1054- std::to_string (value); // 2 divisions per digit!
1055- ```
1054+ std::to_string(value).length(); // Allocates a string just to count digits!
10561055
1057- **Optimized:**
1058- ```cpp
1059- fast_itoa(value, buffer); // 50% fewer divisions + lookup tables
1060- ```
1056+ Optimized:
1057+ fast_digit_count(value); // Bit operations + lookup table, no allocation
10611058
1062- | Dataset | Baseline | No Fast Digits | Impact | ** Speedup** |
1063- | ---------| ----------| ----------------| --------| ---- ---------|
1064- | Twitter | 3,231 MB/s | 3,041 MB/s | -6% | ** 1.06x** |
1065- | CITM | 2,341 MB/s | 1,841 MB/s | -21% | ** 1.27x** |
1059+ | Dataset | Baseline | No Fast Digits | Impact | Speedup |
1060+ | ---------| ------------ | ----------------| --------| ---------|
1061+ | Twitter | 3,231 MB/s | 3,041 MB/s | -6% | 1.06x |
1062+ | CITM | 2,341 MB/s | 1,841 MB/s | -21% | 1.27x |
10661063
1067- ** CITM has ~ 10,000+ integers per document! **
1064+ CITM has 10,000+ numbers that need digit counts!
10681065
10691066---
1067+ How Fast Digit Counting Works
1068+
1069+ The Problem: We need to know how many characters a number will occupy so the buffer can be sized before the number is converted to a string.
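
To illustrate why the digit count is needed up front, here is a minimal sketch, not the library's actual code. `count_digits_simple` and `append_uint` are hypothetical names, and the counter here is a plain division loop used only as a stand-in; any correct digit counter could be dropped in its place.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical stand-in digit counter (division loop, for illustration only).
inline int count_digits_simple(std::uint64_t v) {
  int n = 1;
  while (v >= 10) {
    v /= 10;
    ++n;
  }
  return n;
}

// Appends the decimal form of v to out. Knowing the digit count first lets
// the buffer grow exactly once, and the digits are then written in place
// from the last position backward.
inline void append_uint(std::string &out, std::uint64_t v) {
  const int n = count_digits_simple(v);
  const std::size_t start = out.size();
  out.resize(start + static_cast<std::size_t>(n));
  for (int i = n - 1; i >= 0; --i) {
    out[start + static_cast<std::size_t>(i)] = static_cast<char>('0' + v % 10);
    v /= 10;
  }
}
```

Because the count is known before any digit is written, no temporary string is created and no reversal pass is needed.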
1070+
1071+ Traditional Approach (Disabled by NO_FAST_DIGITS):
1072+ size_t digit_count(uint64_t v) {
1073+ return std::to_string(v).length();
1074+ // 1. Allocates memory
1075+ // 2. Converts entire number to string
1076+ // 3. Gets length
1077+ // 4. Deallocates string
1078+ }
1079+
1080+ Our Optimization:
1081+ int fast_digit_count(uint64_t x) {
1082+ // Estimate floor(log10(x)) from log2(x): 19/64 ≈ log10(2), so a multiply and shift replace division
1083+ int y = (19 * int_log2(x) >> 6);
1084+
1085+ // The estimate may be one too low; table[y] is the largest value with y + 1 digits
1086+ static uint64_t table[] = {9, 99, 999, 9999, ...};
1087+ y += x > table[y];
1088+
1089+ return y + 1;
1090+ }
1091+
1092+ Zero allocations and no string conversion: one log2 estimate, one table lookup, and one comparison.
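
For readers who want to try the idea, the following is a self-contained sketch under stated assumptions rather than the project's exact source. The snippet above elides the table and does not show `int_log2`. Here `int_log2` is assumed to be floor(log2(x)) with zero treated as one, and the table is assumed to hold the values 9, 99, ... up to nineteen 9s. The helpers `agrees_with_to_string` and `check_power_of_ten_boundaries` are hypothetical names added only to compare against the `std::to_string` approach.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Assumed helper: floor(log2(x)), treating x == 0 as 1.
inline int int_log2(std::uint64_t x) {
#if defined(__GNUC__) || defined(__clang__)
  return 63 - __builtin_clzll(x | 1);
#else
  // Portable fallback: shift until the value is exhausted.
  int r = 0;
  x |= 1;
  while (x >>= 1) ++r;
  return r;
#endif
}

inline int fast_digit_count(std::uint64_t x) {
  // Assumed table contents: entry i is the largest value with i + 1 digits.
  static const std::uint64_t table[] = {
      9ULL, 99ULL, 999ULL, 9999ULL, 99999ULL, 999999ULL, 9999999ULL,
      99999999ULL, 999999999ULL, 9999999999ULL, 99999999999ULL,
      999999999999ULL, 9999999999999ULL, 99999999999999ULL,
      999999999999999ULL, 9999999999999999ULL, 99999999999999999ULL,
      999999999999999999ULL, 9999999999999999999ULL};
  // 19/64 approximates log10(2), so y estimates floor(log10(x)).
  int y = (19 * int_log2(x)) >> 6;
  // Correct the estimate when it is one too low.
  y += x > table[y];
  return y + 1;
}

// Hypothetical check: compare against the allocating std::to_string approach.
inline bool agrees_with_to_string(std::uint64_t v) {
  return static_cast<std::size_t>(fast_digit_count(v)) ==
         std::to_string(v).length();
}

// Hypothetical check: exercise the values around every power of ten.
inline bool check_power_of_ten_boundaries() {
  std::uint64_t p = 1;
  for (int i = 0; i < 20; ++i) {
    if (!agrees_with_to_string(p - 1) || !agrees_with_to_string(p) ||
        !agrees_with_to_string(p + 1)) {
      return false;
    }
    if (i < 19) p *= 10;  // 10^19 still fits in 64 bits; 10^20 would not
  }
  return agrees_with_to_string(UINT64_MAX);
}
```

The power-of-ten boundaries are the cases worth checking, since that is where the log2-based estimate and the true digit count can differ by one before the table comparison corrects it.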
10701093
10711094# Optimizations #4 & #5 : Branch Hints & Buffer Growth
10721095