Skip to content

Conversation

@sbc100
Copy link
Collaborator

@sbc100 sbc100 commented Aug 27, 2024

IIUC, there are cases where the allocator can take advantage of the fact that it knows the memory is fresh and avoid the memset completely.

@sbc100 sbc100 requested review from dschuff and kripken August 27, 2024 22:07
@sbc100 sbc100 force-pushed the calloc branch 2 times, most recently from 6ef9c14 to 8c40b7d Compare August 27, 2024 22:51
Copy link
Member

@kripken kripken left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if we actually benefit from this atm, but we could with some work. At least mimalloc could benefit iirc, if we tracked the last sbrk point, and then reasoned that each sbrk increment is fresh zeroed memory.

@@ -1 +1 @@
9347
9760
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is calloc really 413 bytes larger than malloc? That seems surprising.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think its a huge deal the JS size went down a few bytes the native size went up a few bytes.

Also this is the only code size test in the test suite that was effected, so it only appear to effect the dyanmic linking using case.. at least that is the only test where any change occured.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, but it is more than a few bytes: 413 bytes seems quite a lot for calloc to add over malloc. It should just have a loop that writes zeroes? I'm worries something else is going on there.

(The JS change in the other direction is only 9 bytes, which does make sense.)

@sbc100
Copy link
Collaborator Author

sbc100 commented Aug 28, 2024

I don't know if we actually benefit from this atm, but we could with some work. At least mimalloc could benefit iirc, if we tracked the last sbrk point, and then reasoned that each sbrk increment is fresh zeroed memory.

Agreed, that is a forward looking goal. Regarding this makes the code more compact (just one call instead of two), and more clearly shows intent, so I think its worth landing purely on that basis.

@juj
Copy link
Collaborator

juj commented Aug 28, 2024

I looked into this pristine memory optimization for emmalloc some time ago. The takeaway is that it is not really that helpful, since sbrk grows occur diminishingly rarely compared to reallocating memory in a program. The overhead of keeping track of pristine memory is nonzero.

Only after we get memory.discard to free up pages (if that ever becomes a thing), then we can really optimize calloc implementation.

One thing I worry with assuming everything above sbrk would be free is that we have users who say they rerun the application on their web page by reinitializing it. I am not sure exactly how they go about doing this, but if they might be rerunning static ctors i.e. effectively rerunning the program on a now dirty heap, then the pristine sbrk assumption would no longer hold. We would need memory.discard support for that guarantee to happen.

Nevertheless, using calloc in JS library code instead of zeroMemory is a nice cleanup.

@sbc100
Copy link
Collaborator Author

sbc100 commented Aug 28, 2024

One thing I worry with assuming everything above sbrk would be free is that we have users who say they rerun the application on their web page by reinitializing it. I am not sure exactly how they go about doing this, but if they might be rerunning static ctors i.e. effectively rerunning the program on a now dirty heap, then the pristine sbrk assumption would no longer hold. We would need memory.discard support for that guarantee to happen.

Regarding this particular point. When the wasm module itself creates and exports its memory we already rely on the fact that the initial memory is clean. When the wasm module imports its memory we do not make this assumption. For example when the module initially loads we have a BSS segment that contains all zeros will initialize the static data region to zero. In the case we know the memory is fresh (i.e. the memory itself is created within the module) we simple elide the BSS section since it would serve no purpose to initialize the fresh memory with zeros.

If a user wants to take an existing instance and somehow restore it to it initial state they already have their work cut for them. They would need to somehow restore all the static data to its initial state. I don't know of any way to do that.. since the data data is stored in active segments (i.e. segments that are active on initialization, cannot be used with the data.init instruction later on). So such as user would have find a way to re-iniitalize static data. Asking such users to also zero the whole heap when the "rewind" the sbkr pointer seems like one of the easier parts of such a multi-step, and error-prone, process.

@dschuff
Copy link
Member

dschuff commented Aug 28, 2024

+1 that this is a nice cleanup and worth doing. But also that it would be good to understand why the wasm code size increases as much as it does.
Also, the plan of record is still to enable bulk memory as soon as we get around to turning the crank and working out the kinks, so it would be good to see the effect with that enabled, maybe it's better.

IIUC, there are cases where the allocator can take advantage of the fact
that it knows the memory is fresh and avoid the memset completely.
@sbc100
Copy link
Collaborator Author

sbc100 commented Sep 7, 2024

It looks like the codesize regression in that one tests is simply the delta between dlmalloc and dlcalloc.

I've attached the old and new wast files and the diff.

files.zip

@sbc100 sbc100 merged commit 6619b4a into emscripten-core:main Sep 9, 2024
2 checks passed
@sbc100 sbc100 deleted the calloc branch September 9, 2024 20:21
@kripken
Copy link
Member

kripken commented Sep 9, 2024

I've attached the old and new wast files and the diff.

Looks like calloc adds an overflow check (which I was not aware the spec required, of the multiplication of num * size) and a call to memset, so almost all the size is due to memset it seems.

memset does seem surprisingly large. When we enable bulk memory by default that can shrink, at least?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants