Open
Labels: enhancement (New feature or request)
Description
What is your idea?
Hello,
I’ve been investigating performance in web-ifc, specifically around parsing and IfcLine storage.
Test data
- File size: ~420 MB
- Lines: 7,413,409
Your current workflow
- Read file in chunks
- Parse each line lazily
- Store parsed results in std::unordered_map
In this workflow, allocation and reallocation of std::unordered_map takes ~2.3s, which is a significant bottleneck.
Proposed approach
- Scan the file once to count the number of ; delimiters (used as an approximation of the number of IfcLines)
- Allocate an array of that size
- Parse lines and fill the array
- Sort the array by 'ref number'
This also enables easy parallel line parsing.
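The steps above could look roughly like the following. This is only a minimal sketch, not web-ifc code: `LineEntry`, `CountSemicolons` and `BuildIndex` are hypothetical names, the "parsing" only extracts the reference number and byte offset, and it assumes no ; appears inside string literals (a real tokenizer would have to handle that).

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string_view>
#include <vector>

// Hypothetical stand-in for a parsed IfcLine (not web-ifc's actual type).
struct LineEntry {
    std::uint32_t expressID;  // the number after '#', e.g. 42 for "#42=..."
    std::size_t offset;       // byte offset of the '#' in the buffer
};

// Count ';' bytes; used as an upper-bound estimate of the number of lines.
std::size_t CountSemicolons(std::string_view data) {
    return static_cast<std::size_t>(std::count(data.begin(), data.end(), ';'));
}

// Reserve once, fill from the buffer, then sort by reference number.
// Assumption: ';' never occurs inside a string literal.
std::vector<LineEntry> BuildIndex(std::string_view data) {
    std::vector<LineEntry> index;
    index.reserve(CountSemicolons(data));  // single allocation, no regrowth

    std::size_t pos = 0;
    while (pos < data.size()) {
        const std::size_t end = data.find(';', pos);
        if (end == std::string_view::npos) break;

        // Skip whitespace and line breaks before the statement.
        std::size_t start = pos;
        while (start < end && (data[start] == ' ' || data[start] == '\t' ||
                               data[start] == '\r' || data[start] == '\n')) {
            ++start;
        }

        // Only statements of the form "#<digits>..." are data lines;
        // header statements such as "ISO-10303-21;" are skipped.
        if (start < end && data[start] == '#') {
            std::uint32_t id = 0;
            bool hasDigits = false;
            for (std::size_t i = start + 1;
                 i < end && data[i] >= '0' && data[i] <= '9'; ++i) {
                id = id * 10 + static_cast<std::uint32_t>(data[i] - '0');
                hasDigits = true;
            }
            if (hasDigits) index.push_back({id, start});
        }
        pos = end + 1;
    }

    std::sort(index.begin(), index.end(),
              [](const LineEntry& a, const LineEntry& b) {
                  return a.expressID < b.expressID;
              });
    return index;
}
```

Because each statement is handled independently, the fill loop could be split across threads by giving each thread its own byte range of the buffer, with the sort running once at the end.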
Perform lookups using binary search. Better still, since IFC reference numbers are roughly ordered, the target index is usually close to the reference value itself, so the search can start there.
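One possible way to implement such a lookup is sketched below. It is an illustration under assumptions, not a tested design: `LineEntry` and `FindLine` are hypothetical names, the array is assumed to be sorted by reference number with unique IDs, and the initial guess assumes IDs start near 1. The search widens a window around the guess in doubling steps and then runs `std::lower_bound` inside that window.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a parsed IfcLine (not web-ifc's actual type).
struct LineEntry {
    std::uint32_t expressID;
    std::size_t offset;
};

// Look up `id` in an array sorted by expressID.
// Returns a pointer to the entry, or nullptr if the ID is not present.
const LineEntry* FindLine(const std::vector<LineEntry>& sorted,
                          std::uint32_t id) {
    if (sorted.empty()) return nullptr;
    const std::size_t size = sorted.size();

    // Guess: in a densely numbered file starting at #1, ID n sits at n - 1.
    const std::size_t guess =
        std::min<std::size_t>(id > 0 ? id - 1 : 0, size - 1);
    if (sorted[guess].expressID == id) return &sorted[guess];

    // Widen [lo, hi] around the guess in doubling steps until it
    // brackets the target or reaches an end of the array.
    std::size_t lo = guess, hi = guess, step = 1;
    if (sorted[guess].expressID < id) {
        while (hi + 1 < size && sorted[hi].expressID < id) {
            lo = hi;
            hi = std::min(hi + step, size - 1);
            step *= 2;
        }
    } else {
        while (lo > 0 && sorted[lo].expressID > id) {
            hi = lo;
            lo = lo > step ? lo - step : 0;
            step *= 2;
        }
    }

    // Ordinary binary search inside the bracketed window.
    const auto first = sorted.begin() + static_cast<std::ptrdiff_t>(lo);
    const auto last = sorted.begin() + static_cast<std::ptrdiff_t>(hi + 1);
    const auto it = std::lower_bound(
        first, last, id,
        [](const LineEntry& e, std::uint32_t value) {
            return e.expressID < value;
        });
    return (it != last && it->expressID == id) ? &*it : nullptr;
}
```

When the guess is close, the window stays small and the lookup touches only a few neighbouring entries. In the worst case it degrades to a regular binary search cost.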
Performance measurements
Test machine:
- CPU: AMD Ryzen 5 3600 3.6GHz
Counting ; in the file takes:
- SIMD 128: 72.6 ms
- 4 threads (native): 20.9 ms
- Single thread: 83.1 ms
Additional costs:
- Array allocation: ~20 µs
- Sorting: ~140 ms
Even in single-threaded mode, counting, allocation, and sorting add up to roughly 223 ms (83.1 ms + ~20 µs + ~140 ms), about a tenth of the ~2.3 s the current unordered_map allocation costs on its own.
Questions
- Are there known constraints (e.g. memory limits, streaming requirements) that would make full-file buffering unacceptable in some cases?
- Would you be interested in discussing this approach further?