[Feat]: Faster IFC parsing via preallocation and sorted storage #1752

### What is your idea?

Hello,

I’ve been investigating performance in web-ifc, specifically around parsing and IfcLine storage.

Test data
 - File size: ~420 MB
 - Lines: 7,413,409


Your current workflow
 1. Read file in chunks
 2. Parse each line lazily
 3. Store parsed results in `std::unordered_map`

In this workflow, allocation and reallocation of the `std::unordered_map` takes ~2.3 s for this file, which is a significant bottleneck.

<img width="1940" height="932" alt="Image" src="https://github.com/user-attachments/assets/7b9caf5c-22ac-4e1f-8ea0-e9c2e35945ae" />

## Proposed approach
Build a flat, sorted array instead of a hash map:
 1. Scan the file once to count the `;` delimiters (used as an approximation of the number of IfcLines).
 2. Allocate an array of that size.
 3. Parse the lines and fill the array.
 4. Sort the array by reference number (the `#N` ID).
This also enables easy parallel line parsing.

Lookups then use binary search. A likely improvement: since IFC reference numbers are roughly ordered, the target index is usually near the reference value, so the search can start from that position.
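One way the "start near the reference value" lookup could be written is a galloping search around an initial guess, followed by a binary search inside the bracket it finds. This is a sketch under assumptions: `LineEntry` and `FindLine` are hypothetical names, the array is already sorted by ID, and the guess `id - 1` assumes IDs start around 1 and are fairly dense.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical record; only the ID matters for the lookup.
struct LineEntry {
    std::uint32_t expressID;
    std::size_t   offset;
};

// Returns the entry with the given ID, or nullptr if it is absent.
// Precondition: `lines` is sorted by expressID.
const LineEntry* FindLine(const std::vector<LineEntry>& lines, std::uint32_t id) {
    if (lines.empty()) return nullptr;
    const std::size_t n = lines.size();

    // Initial guess: with dense IDs starting at 1, ID k sits near index k - 1.
    const std::size_t guess = std::min<std::size_t>(id > 0 ? id - 1 : 0, n - 1);

    std::size_t lo = guess, hi = guess, step = 1;
    if (lines[guess].expressID < id) {
        // Gallop right: everything below `lo` is known to be < id.
        lo = hi = guess + 1;
        while (hi < n && lines[hi].expressID < id) {
            lo = hi + 1;
            hi = std::min(n, hi + step);
            step *= 2;
        }
    } else {
        // Gallop left: lines[hi] is known to be >= id.
        while (lo > 0 && lines[lo - 1].expressID >= id) {
            hi = lo - 1;
            lo = (lo > step) ? lo - step : 0;
            step *= 2;
        }
    }

    // Binary search inside [lo, hi); if nothing matches there, the result is
    // `hi`, which is either the first element >= id or the end of the array.
    const auto first = lines.begin() + static_cast<std::ptrdiff_t>(lo);
    const auto last  = lines.begin() + static_cast<std::ptrdiff_t>(hi);
    const auto it = std::lower_bound(
        first, last, id,
        [](const LineEntry& e, std::uint32_t v) { return e.expressID < v; });

    return (it != lines.end() && it->expressID == id) ? &*it : nullptr;
}
```

When the guess is close, only a few elements are touched; in the worst case the cost stays logarithmic, like a plain binary search.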

## Performance measurements
Test machine:
 - CPU: AMD Ryzen 5 3600 3.6GHz

Counting `;` in the file takes:
 - SIMD 128: 72.6 ms
 - 4 threads (native): 20.9 ms
 - Single thread: 83.1 ms
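For reference, a multi-threaded count could look roughly like the sketch below. It is an illustration only, not the code behind the numbers above: `CountSemicolons` is a hypothetical name, the buffer is assumed to be fully in memory, and each thread simply runs `std::count` over its own slice.

```cpp
#include <algorithm>
#include <cstddef>
#include <string_view>
#include <thread>
#include <vector>

// Counts ';' in `data` by splitting the buffer into `threadCount` slices.
std::size_t CountSemicolons(std::string_view data, unsigned threadCount) {
    threadCount = std::max(1u, threadCount);  // treat 0 as "one thread"
    const std::size_t chunk = (data.size() + threadCount - 1) / threadCount;

    std::vector<std::size_t> partial(threadCount, 0);
    std::vector<std::thread> workers;
    workers.reserve(threadCount);

    for (unsigned t = 0; t < threadCount; ++t) {
        const std::size_t begin = std::min(data.size(), t * chunk);
        const std::size_t end   = std::min(data.size(), begin + chunk);
        // Each worker writes its result once, into its own slot.
        workers.emplace_back([&partial, data, begin, end, t] {
            partial[t] = static_cast<std::size_t>(
                std::count(data.begin() + begin, data.begin() + end, ';'));
        });
    }
    for (auto& w : workers) w.join();

    std::size_t total = 0;
    for (std::size_t c : partial) total += c;
    return total;
}
```

Because the count needs no knowledge of line boundaries, the slices can be cut at arbitrary byte offsets.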

Additional costs:
 - Array allocation: ~20 µs
 - Sorting: ~140 ms

Even in single-threaded mode, the added cost is about 83.1 ms + 140 ms ≈ 223 ms (the allocation is negligible), roughly a tenth of the ~2.3 s currently spent on `unordered_map` allocation alone.

## Questions
 1. Are there known constraints (e.g. memory limits, streaming requirements) that would make full-file buffering unacceptable in some cases?
 2. Would you be interested in discussing this approach further?
