Skip to content

Optimization of memory usage by dirents during ZIM creation#1055

Open
veloman-yunkan wants to merge 2 commits intomainfrom
writer_dirent_memusage_optimization
Open

Optimization of memory usage by dirents during ZIM creation#1055
veloman-yunkan wants to merge 2 commits intomainfrom
writer_dirent_memusage_optimization

Conversation

@veloman-yunkan
Copy link
Copy Markdown
Collaborator

This is the easy part of the optimization addressing #1048.

Using the test flow from #1045:

def test_creator_many_redirects(fpath, lipsum):
    random.seed(123)
    with Creator(fpath) as c:
        for item_i in range(1000000):
            c.add_item(StaticItem(path=f"dedup{item_i}", content=lipsum, mimetype="text/html"))
            for redirect_i in range(10):
                to_index = random.randint(0, item_i)
                c.add_redirection(f"home{item_i}{redirect_i}", "", f"dedup{to_index}", {})

the performance figures changed as follows:

High watermark memory usage (MB) Runtime (seconds)
main branch 2241 661
This PR 1690 540

As you can see, there is improvement in speed too.

@codecov
Copy link
Copy Markdown

codecov bot commented Mar 27, 2026

Codecov Report

❌ Patch coverage is 58.33333% with 5 lines in your changes missing coverage. Please review.
✅ Project coverage is 56.33%. Comparing base (0f16c3c) to head (bc0fa61).

Files with missing lines Patch % Lines
src/writer/creator.cpp 58.33% 0 Missing and 5 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1055      +/-   ##
==========================================
+ Coverage   56.26%   56.33%   +0.07%     
==========================================
  Files         101      101              
  Lines        5014     5013       -1     
  Branches     2186     2183       -3     
==========================================
+ Hits         2821     2824       +3     
  Misses        737      737              
+ Partials     1456     1452       -4     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@veloman-yunkan
Copy link
Copy Markdown
Collaborator Author

@benoit74 @kelson42 Further optimization is possible and I plan to pursue it in any case. Should I include it in this PR or you want to be able to enjoy current improvements by merging the changes early?

@benoit74
Copy link
Copy Markdown

I'm not in a hurry to have anything merge on this matter. This is more a medium term (very important) improvement for me.

@kelson42
Copy link
Copy Markdown
Contributor

@benoit74 @kelson42 Further optimization is possible and I plan to pursue it in any case. Should I include it in this PR or you want to be able to enjoy current improvements by merging the changes early?

Great to see this PR coming!

It is really up to you, but if you bundle these optims in one PR, this sounds optimal to me.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants