Commit 688fe67
authored
Optimize validate_gantt
The optimization achieves a **58x speedup** by eliminating the major performance bottleneck in pandas DataFrame processing.
**Key optimizations:**
1. **Pre-fetch column data as numpy arrays**: The original code used `df.iloc[index][key]` for each cell access, which triggers pandas' slow row-based indexing mechanism. The optimized version extracts all column data upfront using `df[key].values` and stores it in a dictionary, then uses direct numpy array indexing `columns[key][index]` inside the loop.
2. **More efficient key validation**: Replaced the nested loop checking for missing keys with a single list comprehension `missing_keys = [key for key in REQUIRED_GANTT_KEYS if key not in df]`.
3. **Use actual DataFrame columns**: Instead of iterating over the DataFrame object itself (which includes metadata), the code now uses `list(df.columns)` to get only the actual column names.
**Why this is dramatically faster:**
- `df.iloc[index][key]` creates temporary pandas Series objects and involves complex indexing logic for each cell
- Direct numpy array indexing `columns[key][index]` is orders of magnitude faster
- The line profiler shows the original `df.iloc` line consumed 96.8% of execution time (523ms), while the optimized dictionary comprehension takes only 44.9% (4.2ms)
**Performance characteristics:**
- **Large DataFrames see massive gains**: 8000%+ speedup on 1000-row DataFrames
- **Small DataFrames**: 40-50% faster
- **List inputs**: Slight slowdown (3-13%) due to additional validation overhead, but still microsecond-level performance
- **Empty DataFrames**: Some slowdown due to upfront column extraction, but still fast overall
This optimization is most beneficial for DataFrame inputs with many rows, where the repeated `iloc` calls created a severe performance bottleneck.1 parent aac1b66 commit 688fe67
1 file changed
+12
-9
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
32 | 32 | | |
33 | 33 | | |
34 | 34 | | |
35 | | - | |
36 | | - | |
37 | | - | |
38 | | - | |
39 | | - | |
40 | | - | |
| 35 | + | |
| 36 | + | |
| 37 | + | |
| 38 | + | |
| 39 | + | |
| 40 | + | |
41 | 41 | | |
| 42 | + | |
| 43 | + | |
| 44 | + | |
42 | 45 | | |
43 | 46 | | |
| 47 | + | |
| 48 | + | |
44 | 49 | | |
45 | | - | |
46 | | - | |
47 | | - | |
| 50 | + | |
48 | 51 | | |
49 | 52 | | |
50 | 53 | | |
| |||
0 commit comments