Something wrong with integer hashes and/or set grow, Cuis is 1000x faster #129

@dtlewis290

Reported by Eliot on squeak-dev:
https://lists.squeakfoundation.org/archives/list/[email protected]/thread/ABGLWILW6YJ2NPAHICWSLCOUIAKXKLFL/

Copied from the original report:

Hi All,

   recently the identityHash implementation in the VM has been upgraded. 

Open Smalltalk Cog[Spur] VM [CoInterpreterPrimitives VMMaker.oscog-eem.3500] 5.20250103.0325
platform sources revision VM: 202501030325 [email protected]:oscogvm Date: Thu Jan 2 19:25:19 2025 CommitHash: 5e05e38

It now uses an xorshift-register PRNG that cycles through all 2^22 - 1 valid identity hashes in 2^22 - 1 iterations. One way to test this is to create a set, create 2^22 - 1 objects in a loop, add each object's identity hash to the set, and check the size of the set at the end.
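
For readers unfamiliar with the technique, here is a minimal sketch of a 22-bit xorshift step and a check of its cycle length. It is an illustration only: the block names `step` and `period` and the shift triple 3, 5, 7 are placeholders chosen here, not the constants used by the VM, and this triple is not claimed to give the full period.

```smalltalk
"Hypothetical sketch of a 22-bit xorshift step (left, right, left).
 The shift amounts a, b, c are parameters; the VM's own values are
 not given in this report."
| mask step period |
mask := (1 bitShift: 22) - 1.

"One xorshift step, kept within 22 bits by masking the left shifts."
step := [:x :a :b :c | | y |
    y := x bitXor: ((x bitShift: a) bitAnd: mask).
    y := y bitXor: (y bitShift: b negated).
    y bitXor: ((y bitShift: c) bitAnd: mask)].

"Cycle length starting from 1. Each step is invertible, so the
 sequence always returns to 1 and the loop terminates."
period := [:a :b :c | | x n |
    x := 1.
    n := 0.
    [x := step value: x value: a value: b value: c.
     n := n + 1.
     x ~= 1] whileTrue.
    n].

"Arbitrary example triple, not the VM's."
period value: 3 value: 5 value: 7
```

A triple that visits every nonzero 22-bit value answers 4194303 here, which is the same count the Set-based test checks.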

So let's do this and time it as well. For example, on my 2021 M1 Max MBP:

| size |
{ [| hashes |
    hashes := Set new: (2 raisedTo: 22).
    1 to: (2 raisedTo: 22) - 1 do:
        [:ign| hashes add: Object new identityHash].
    size := hashes size] timeToRun.
    size.
    (2 raisedTo: 22) - 1 }.
"=> #(450 4194303 4194303)"

also #(483 4194303 4194303)

So when the set never has to grow, this takes less than half a second.

However, if we just use Set new and rely on it growing, we get roughly a 1,600- to 1,800-fold slowdown:

| size |
{ [| hashes |
    hashes := Set new.
    1 to: (2 raisedTo: 22) - 1 do:
        [:ign| hashes add: Object new identityHash].
    size := hashes size] timeToRun.
    size.
    (2 raisedTo: 22) - 1 }.
"=> #(768992 4194303 4194303)"

also #(800874 4194303 4194303).

768992 / 483.0 ≈ 1592
800874 / 450.0 ≈ 1780

Now, Cuis no longer allows growable collections to be presized with new:; one has to use newWithRoomForMoreThan: instead. So:

| size |
{ [| hashes |
    hashes := Set newWithRoomForMoreThan: (2 raisedTo: 22).
    1 to: (2 raisedTo: 22) - 1 do:
        [:ign| hashes add: Object new identityHash].
    size := hashes size] timeToRun.
    size.
    (2 raisedTo: 22) - 1 }.
"=> #(506 4194303 4194303)"

BUT! If we just use Set new and let Cuis grow the set, we get e.g.:

| size |
{ [| hashes |
    hashes := Set new.
    1 to: (2 raisedTo: 22) - 1 do:
        [:ign| hashes add: Object new identityHash].
    size := hashes size] timeToRun.
    size.
    (2 raisedTo: 22) - 1 }.
"=> #(725 4194303 4194303)"

This is at least 1,000 times faster than Squeak.  Something is seriously wrong with the Squeak code.

768992 / 725.0 ≈ 1061

,,,^..^,,,
Happy New Year! Eliot
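
One possible way to narrow down where the time goes in the growing case is to time the adds in equal chunks instead of as a single run. The sketch below is a suggestion, not part of the original report: the chunk count of 16 and the variable names `chunk` and `timings` are arbitrary choices, and it assumes timeToRun answers milliseconds as in the measurements above.

```smalltalk
"Sketch: add the hashes in 16 equal chunks and record, for each chunk,
 the set size afterwards and the milliseconds that chunk took."
| hashes chunk timings |
hashes := Set new.
chunk := (2 raisedTo: 22) // 16.
timings := OrderedCollection new.
1 to: 16 do: [:i | | ms |
    ms := [chunk timesRepeat: [hashes add: Object new identityHash]] timeToRun.
    timings add: hashes size -> ms].
timings
```

If the per-chunk times stay roughly flat, the cost is spread evenly across adds. If they climb steeply as the set gets larger, that points at growing or at lookups in the larger table rather than at a fixed per-add overhead. Running the same snippet in Cuis would give a baseline for comparison.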
