
[Feat]: Faster IFC parsing via preallocation and sorted storage #1752

@barvirm

Description


What is your idea?

Hello,

I’ve been investigating performance in web-ifc, specifically around parsing and IfcLine storage.

Test data

  • File size: ~420 MB
  • Lines: 7,413,409

Your current workflow

  1. Read file in chunks
  2. Parse each line lazily
  3. Store parsed results in std::unordered_map

For this file, allocating and reallocating (rehashing) the std::unordered_map takes ~2.3 s, which is a significant bottleneck.
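For context on where that time goes: std::unordered_map is node-based, so every insert allocates a node. Whenever the element count would exceed max_load_factor() × bucket_count(), the bucket array is reallocated and all elements are relinked (a rehash). The sketch below counts those rehashes for a map grown one insert at a time. ParsedLine and count_rehashes are hypothetical names, and ParsedLine is a placeholder, not web-ifc's actual line type.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>

// Hypothetical placeholder for a parsed IFC line (not web-ifc's real type).
struct ParsedLine {
    std::uint32_t type_code = 0;
    std::string raw;
};

// Grows a map one insert at a time, as lazily filled storage would, and
// returns how many times the bucket array was reallocated (a rehash).
std::size_t count_rehashes(std::size_t n) {
    std::unordered_map<std::uint32_t, ParsedLine> lines;
    std::size_t rehashes = 0;
    std::size_t buckets = lines.bucket_count();
    for (std::size_t i = 1; i <= n; ++i) {
        // Each emplace allocates one node; growth past the load factor rehashes.
        lines.emplace(static_cast<std::uint32_t>(i), ParsedLine{});
        if (lines.bucket_count() != buckets) {
            ++rehashes;
            buckets = lines.bucket_count();
        }
    }
    return rehashes;
}
```

Calling count_rehashes(7413409) mimics the line count of the test file. The exact number of rehashes depends on the standard library implementation's growth policy.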


Proposed approach

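Scan the file once to count the ; delimiters, and use that count as an approximation of the number of IfcLines (it is an upper bound, since every entity line ends with ; but so do header statements such as HEADER; and ENDSEC;):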

  • Allocate an array of that size
  • Parse lines and fill the array
  • Sort the array by reference number (the #id at the start of each line)
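The steps above can be sketched in C++ as follows. This is a minimal sketch under simplifying assumptions, not the proposed implementation: LineEntry, parse_id and build_sorted_lines are hypothetical names, the split on ; ignores semicolons inside quoted strings, and "parsing" here only extracts the #id and keeps a view of the raw text, leaving everything else lazy.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string_view>
#include <vector>

// Hypothetical entry: the line's #id and a view of its text in the file buffer.
struct LineEntry {
    std::uint32_t id;
    std::string_view text;  // points into the buffer; nothing is copied
};

// Parses "#123=IFCWALL(...)" into 123; returns 0 for statements that are
// not entity instances (e.g. "HEADER", "ENDSEC", "FILE_NAME(...)").
std::uint32_t parse_id(std::string_view stmt) {
    std::size_t i = stmt.find_first_not_of(" \t\r\n");
    if (i == std::string_view::npos || stmt[i] != '#') return 0;
    std::uint32_t id = 0;
    for (++i; i < stmt.size() && stmt[i] >= '0' && stmt[i] <= '9'; ++i)
        id = id * 10 + static_cast<std::uint32_t>(stmt[i] - '0');
    return id;
}

std::vector<LineEntry> build_sorted_lines(std::string_view buf) {
    // 1. Count ';' once: an upper bound on the number of entity lines.
    const auto upper = static_cast<std::size_t>(std::count(buf.begin(), buf.end(), ';'));

    // 2. Allocate once; the push_back calls below never reallocate.
    std::vector<LineEntry> lines;
    lines.reserve(upper);

    // 3. Split on ';' and keep statements that start with "#<id>".
    std::size_t start = 0;
    for (std::size_t pos = buf.find(';'); pos != std::string_view::npos;
         pos = buf.find(';', start)) {
        const std::string_view stmt = buf.substr(start, pos - start);
        if (const std::uint32_t id = parse_id(stmt); id != 0) lines.push_back({id, stmt});
        start = pos + 1;
    }

    // 4. Sort by #id so that lookups can use binary search.
    std::sort(lines.begin(), lines.end(),
              [](const LineEntry& a, const LineEntry& b) { return a.id < b.id; });
    return lines;
}
```

Keeping string_view slices instead of copies means the whole file buffer must stay alive as long as the array does, which is the full-file buffering trade-off raised in the questions below.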

This layout also makes it easy to parse lines in parallel.
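One way a preallocated array keeps parallel parsing simple is a two-pass split: each thread first counts ; in its own chunk, prefix sums turn those counts into disjoint output ranges, and each thread then writes into its own range without locks. This is a hedged sketch, not web-ifc code: chunk_bounds and split_parallel are hypothetical names, semicolons inside string literals are not handled, and the per-line ID parsing is left out.

```cpp
#include <algorithm>
#include <cstddef>
#include <string_view>
#include <thread>
#include <vector>

// Chunk boundaries that each sit just after a ';', so no statement straddles
// two chunks. Returns k + 1 offsets: chunk t is [bounds[t], bounds[t + 1]).
std::vector<std::size_t> chunk_bounds(std::string_view buf, std::size_t k) {
    std::vector<std::size_t> bounds{0};
    for (std::size_t i = 1; i < k; ++i) {
        const std::size_t nominal = std::max(bounds.back(), buf.size() * i / k);
        const std::size_t semi = buf.find(';', nominal);
        bounds.push_back(semi == std::string_view::npos ? buf.size() : semi + 1);
    }
    bounds.push_back(buf.size());
    return bounds;
}

// Splits `buf` into ';'-terminated statements using k threads and a single
// preallocated output array. Text after the last ';' is ignored.
std::vector<std::string_view> split_parallel(std::string_view buf, std::size_t k) {
    if (k == 0) k = 1;
    const std::vector<std::size_t> bounds = chunk_bounds(buf, k);
    auto at = [&](std::size_t off) { return buf.begin() + static_cast<std::ptrdiff_t>(off); };

    // Pass 1: each thread counts the statements in its own chunk.
    std::vector<std::size_t> counts(k, 0);
    std::vector<std::thread> pool;
    for (std::size_t t = 0; t < k; ++t)
        pool.emplace_back([&, t] {
            counts[t] = static_cast<std::size_t>(std::count(at(bounds[t]), at(bounds[t + 1]), ';'));
        });
    for (auto& th : pool) th.join();

    // Prefix sums give every thread a disjoint slice of the output.
    std::vector<std::size_t> offsets(k + 1, 0);
    for (std::size_t t = 0; t < k; ++t) offsets[t + 1] = offsets[t] + counts[t];
    std::vector<std::string_view> stmts(offsets[k]);  // one allocation

    // Pass 2: each thread fills only its own slice, so no locking is needed.
    // Per-line parsing (e.g. extracting the #id) would also go in this loop.
    pool.clear();
    for (std::size_t t = 0; t < k; ++t)
        pool.emplace_back([&, t] {
            std::size_t out = offsets[t];
            std::size_t start = bounds[t];
            for (std::size_t pos = start; pos < bounds[t + 1]; ++pos) {
                if (buf[pos] == ';') {
                    stmts[out++] = buf.substr(start, pos - start);
                    start = pos + 1;
                }
            }
        });
    for (auto& th : pool) th.join();
    return stmts;
}
```

The output order matches file order regardless of thread count, so the sort step is unchanged. On some toolchains std::thread needs linking with -pthread.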

Perform lookups with binary search. Better still, since IFC reference numbers are roughly ordered, the target index is usually close to the reference value itself, so the search can start at that index.
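A sketch of that lookup, assuming IDs start at 1 and are mostly dense so that entry id usually sits at index id - 1: check that guess first, and on a miss fall back to std::lower_bound on whichever side of the guess the ID must lie. LineEntry and find_line are hypothetical names chosen for illustration.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string_view>
#include <vector>

// Hypothetical entry, sorted by id.
struct LineEntry {
    std::uint32_t id;
    std::string_view text;
};

// Returns the index of `id` in `lines` (sorted by id), or -1 if absent.
// Assumes IDs start near 1 and are mostly dense, so index ~= id - 1.
std::ptrdiff_t find_line(const std::vector<LineEntry>& lines, std::uint32_t id) {
    if (lines.empty() || id == 0) return -1;
    const std::size_t guess = std::min<std::size_t>(id - 1, lines.size() - 1);
    if (lines[guess].id == id) return static_cast<std::ptrdiff_t>(guess);  // fast path

    // The array is sorted, so the guessed entry tells us which side to search.
    auto first = lines.begin();
    auto last = lines.end();
    if (lines[guess].id < id)
        first += static_cast<std::ptrdiff_t>(guess) + 1;
    else
        last = first + static_cast<std::ptrdiff_t>(guess);

    auto it = std::lower_bound(first, last, id,
                               [](const LineEntry& e, std::uint32_t v) { return e.id < v; });
    if (it != last && it->id == id) return it - lines.begin();
    return -1;
}
```

A hit on the guess costs a single comparison; a miss degrades to an ordinary O(log n) binary search, so sparse or reordered IDs stay correct, just slower.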

Performance measurements

Test machine:

  • CPU: AMD Ryzen 5 3600 (3.6 GHz)

Counting the ; characters in the file takes:

  • SIMD 128: 72.6 ms
  • 4 threads (native): 20.9 ms
  • Single thread: 83.1 ms
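The issue does not say which 128-bit SIMD instruction set the "SIMD 128" figure used. As an illustration only, the sketch below counts ; with x86 SSE2 intrinsics, comparing 16 bytes at a time and popcounting the resulting mask, and falls back to a scalar loop when __SSE2__ is not defined (e.g. on non-x86 targets or MSVC). count_semicolons is a hypothetical name.

```cpp
#include <bitset>
#include <cstddef>
#include <string_view>
#if defined(__SSE2__)
#include <emmintrin.h>  // x86 SSE2 intrinsics (128-bit vectors)
#endif

std::size_t count_semicolons(std::string_view buf) {
    std::size_t count = 0;
    std::size_t i = 0;
#if defined(__SSE2__)
    const __m128i needle = _mm_set1_epi8(';');
    for (; i + 16 <= buf.size(); i += 16) {
        // Load 16 bytes, compare each with ';', and collapse to a 16-bit mask.
        const __m128i chunk = _mm_loadu_si128(reinterpret_cast<const __m128i*>(buf.data() + i));
        const int mask = _mm_movemask_epi8(_mm_cmpeq_epi8(chunk, needle));
        count += std::bitset<16>(static_cast<unsigned long>(mask)).count();
    }
#endif
    // Scalar tail, or the whole buffer when SSE2 is unavailable.
    for (; i < buf.size(); ++i) count += (buf[i] == ';');
    return count;
}
```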

Additional costs:

  • Array allocation: ~20 µs
  • Sorting: ~140 ms

Even single-threaded, these steps add up to roughly 223 ms (83.1 ms counting + ~0.02 ms allocation + ~140 ms sorting), about a tenth of the ~2.3 s that the current unordered_map allocation alone costs.

Questions

  1. Are there known constraints (e.g. memory limits, streaming requirements) that would make full-file buffering unacceptable in some cases?
  2. Would you be interested in discussing this approach further?

Metadata


Assignees

No one assigned

Labels

enhancement (New feature or request)

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
