Commit 2552607
authored
Optimize fix_nan_category
The optimized version achieves a **137% speedup** by eliminating unnecessary work through two key optimizations:
**What was optimized:**
1. **Pre-filtered categorical detection**: Instead of checking `column.dtype.name == "category"` for every column in the loop, the optimization identifies all categorical columns upfront using `enumerate(df.dtypes)` and stores their indices.
2. **Early exit for non-categorical DataFrames**: Added a guard clause that returns immediately if no categorical columns exist, avoiding any loop overhead.
**Why this is faster:**
- **Reduced dtype access overhead**: The original code called `df.iloc[:, i]` (expensive pandas indexing) for every column, then checked its dtype. The optimization accesses `df.dtypes` once, which is much faster than repeated `iloc` calls.
- **Eliminated wasted iterations**: For DataFrames with few/no categorical columns, the original code still iterates through all columns. The optimization skips non-categorical columns entirely and exits early when possible.
**Performance characteristics from tests:**
- **Large DataFrames with mixed types**: Shows significant gains (16-22% faster) when many columns exist but only some are categorical
- **No categorical columns**: Dramatic improvement (33-58% faster) due to early exit
- **Small DataFrames**: Slight overhead (9-16% slower) due to upfront processing, but this is negligible in absolute terms (microseconds)
The line profiler confirms this: the original spent 66.8% of time on `df.iloc` access across all columns, while the optimized version only accesses iloc for the pre-identified categorical columns, reducing this bottleneck substantially.1 parent 67a97b4 commit 2552607
1 file changed
+11
-9
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
21 | 21 | | |
22 | 22 | | |
23 | 23 | | |
24 | | - | |
25 | | - | |
26 | | - | |
27 | | - | |
28 | | - | |
29 | | - | |
30 | | - | |
31 | | - | |
32 | | - | |
| 24 | + | |
| 25 | + | |
| 26 | + | |
| 27 | + | |
| 28 | + | |
| 29 | + | |
| 30 | + | |
| 31 | + | |
| 32 | + | |
| 33 | + | |
| 34 | + | |
33 | 35 | | |
34 | 36 | | |
35 | 37 | | |
| |||
0 commit comments