@Explorer09
Contributor

No description provided.

@Explorer09 Explorer09 changed the title from "Hashtable code shrink" to "Hashtable code improvements" Dec 24, 2025
@BenBE BenBE added the labels "enhancement" (Extension or improvement to existing feature) and "code quality ♻️" (Code quality enhancement) Dec 24, 2025
Hashtable.c Outdated
if (SIZE_MAX / 2 < this->size)
   CRT_fatalError("Hashtable: size overflow");

if (10 * this->items > 7 * this->size)
Member

The only reason this is safe from overflows is that sizeof(HashtableItem) > 7.
But I'm not sure the same can necessarily be said about sizeof(HashtableItem) > 10, so the multiplications here should technically be overflow-checked …

Contributor Author

sizeof(HashtableItem) is currently 12 for 32-bit systems and 24 for 64-bit systems. Since htop doesn't support 16-bit systems, I really doubt there would be a case where sizeof(HashtableItem) can be less than 10.

I can add a sizeof(HashtableItem) > 10 assertion just to be safe.
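
A compile-time check along these lines could back that assertion. This is only a sketch: `DemoItem` is a hypothetical stand-in whose field layout is assumed rather than copied from Hashtable.c, and `load_exceeds_70_percent` is an illustrative helper name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for HashtableItem; the field layout is an assumption. */
typedef struct {
   unsigned int key;
   size_t probe;
   void* value;
} DemoItem;

/* An array of DemoItem holds at most SIZE_MAX / sizeof(DemoItem) elements,
 * so with sizeof(DemoItem) >= 10 neither 10 * items nor 7 * size can wrap
 * for a table that was actually allocated. */
static_assert(sizeof(DemoItem) >= 10, "10 * items could overflow size_t");

/* Growth test from the diff. The caller must guarantee items <= size and
 * size <= SIZE_MAX / sizeof(DemoItem). */
bool load_exceeds_70_percent(size_t items, size_t size) {
   assert(size <= SIZE_MAX / sizeof(DemoItem));
   assert(items <= size);
   return 10 * items > 7 * size;
}
```

With a table of size 7, the test first fires at 5 items (50 > 49) and not at 4 (40 > 49 is false).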

Member

TIA.

@Explorer09 Explorer09 force-pushed the hashtable-primes branch 3 times, most recently from fabbd8e to 3870132 Compare December 25, 2025 09:54
Hashtable.c Outdated
if (sizeof(HashtableItem) < 10 && SIZE_MAX / 10 < this->size)
   CRT_fatalError("Hashtable: size overflow");

Hashtable_setSize(this, 2 * this->size);
Contributor Author

I am tempted to adjust this line so that it becomes Hashtable_setSize(this, this->size + 1);

Hashtable.size is supposed to be a prime number close to a power of 2, so there can be a case where multiplying the size by 2 skips a whole step (one power of 2) in the sequence of buffer sizes. Example: (2^14 - 3) * 2 = 2^15 - 6 > 2^15 - 19, thus (2^15 - 19) might be skipped in the buffer size allocation.

I'm just not sure whether this change is safe.
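
The skipped step can be checked with a toy version of the rounding. This sketch assumes setSize rounds a requested size up to the next prime just below a power of 2; `demo_next_prime` and its three-entry table are hypothetical and not the htop function.

```c
#include <stddef.h>

/* Largest primes below 2^14, 2^15 and 2^16. */
static const size_t demo_primes[] = { 16381, 32749, 65521 };

/* Hypothetical helper: smallest table entry >= n, or 0 if n is too large. */
size_t demo_next_prime(size_t n) {
   for (size_t i = 0; i < sizeof(demo_primes) / sizeof(demo_primes[0]); i++) {
      if (demo_primes[i] >= n)
         return demo_primes[i];
   }
   return 0;
}
```

Starting from size 16381, `demo_next_prime(2 * 16381)` lands on 65521, whereas `demo_next_prime(16381 + 1)` lands on 32749.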

Member

At first glance this should work. I haven't tested it in detail, though. But given that setSize rounds up to the next prime anyway, this should likely work.

Hashtable.c Outdated

assert(Hashtable_isConsistent(this));
assert(Hashtable_get(this, key) != NULL);
assert(this->size > this->items);
Contributor Author

Not sure if I can ask @cgzones a question:

Would it be acceptable to change the assertion on this line to assert(this->size >= this->items);?

In other words, what would happen if this->items == this->size?

There is a commit, b45eaf2, that changed the minimum size to 7, but the reason stated in that commit didn't fully make sense to me. Between 2 and 3, the growth factor ((3 - 2)/2 = 50%) is indeed less than 70%, but between 3 and 7 there would be no problem with the growth factor ((7 - 3)/3 = 133%). The cause of the assertion error was more of an off-by-one in the conditional 10 * this->items > 7 * this->size. It should be 10 * (this->items + 1) > 7 * this->size instead if we have to satisfy the assertion this->size > this->items.

I am reluctant to add a +1 to the this->items conditional above, as I guess that the whole Hashtable structure should work fine if we allow this->items == this->size.
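
To make the off-by-one concrete, here is a small sketch that counts how many items a table holds when the growth check first fires. It assumes the check runs in Hashtable_put before the new item is inserted and that every put adds a new key; the helper names are made up for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Current conditional: grow when the load before inserting exceeds 70%. */
bool grows_current(size_t items, size_t size) {
   return 10 * items > 7 * size;
}

/* Variant with +1: grow when the load after inserting would exceed 70%. */
bool grows_plus_one(size_t items, size_t size) {
   return 10 * (items + 1) > 7 * size;
}

/* Number of items stored when the check first asks for growth,
 * assuming every put inserts one new key. */
size_t items_at_first_grow(size_t size, bool (*grows)(size_t, size_t)) {
   size_t items = 0;
   while (!grows(items, size))
      items++;
   return items;
}
```

For sizes 2 and 3 the current conditional lets items reach the full size (2 and 3 respectively), while at size 7 it stops at 5. The +1 variant stops at 1, 2 and 4 for the same sizes.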

Member

Given that size is the number of allocated entries and items is the number of entries actually in use, the >= should be fine.

The reason for avoiding hash tables below 7 is efficiency: It doesn't make sense to allocate smaller blocks as most of our hash tables are far larger anyway.

Contributor Author

@BenBE I know efficiency is a good reason, but I just want to write a good technical reason in the code comments, especially since some numbers in the primeDiffs array (in my commit) will be intentionally unused.

Member

Another (technical) reason is memory fragmentation. Allocating small blocks tends to fragment memory far more than using larger blocks (which for small collections might not even need resizing).

@BenBE
Member

BenBE commented Dec 26, 2025

How's the last commit related?

@Explorer09 Explorer09 force-pushed the hashtable-primes branch 5 times, most recently from be3e2f7 to 493c3cb Compare December 26, 2025 21:05
@Explorer09
Contributor Author

> How's the last commit related?

My improvements to the Hashtable code are made in the hope that the ht_key_t type can be upgraded from unsigned int to size_t; otherwise it would make no sense to support a Hashtable size of more than 2^32 entries.

The last commit might look like a distraction from the code improvement commits. I apologize. If the last commit needs more review, I'm happy to move it to a separate pull request.

@BenBE
Member

BenBE commented Dec 26, 2025

NP with the commit itself. Just wondered if they are complete. Also, does the current set of changes in the first 3 commits work without the last one?

@Explorer09
Contributor Author

> NP with the commit itself. Just wondered if they are complete. Also, does the current set of changes in the first 3 commits work without the last one?

I think the last commit needs some cleanup or discussion, but the first 3 commits are ready and can be cherry-picked to main early.

The last commit depends on the first 3 commits but the first 3 can work without the last.

@BenBE
Member

BenBE commented Dec 27, 2025

Can you split off the last commit into its own PR? TIA.

if (this->items >= this->size * 7 / 10)
   Hashtable_setSize(this, this->size + 1);
Member

AFAICS Hashtable_setSize should be called in either path to allocate new entries as needed.

Contributor Author

What do you mean? This part of the code handles expanding the buffer. The buckets buffer is allocated starting at Hashtable_new.

Member

Yes, but once the items get near the size because it can't allocate any more buffer space, you could at some point reach items == size, and thus the next insert will fail due to no more space allocated.

Instead, when nearing the maximum capacity we should fall back to a more linear allocation regime …

Contributor Author

@BenBE When it "can't allocate any more buffer space" htop will exit, because of the xCalloc call.

The case where items == size can only happen on small sizes such as 2 or 3 (the minimum size is 7 now, so the sizes of 2 and 3 are theoretical situations), but even when that happens, the next Hashtable_put call will always grow the buffer. Thus there's no problem here.

Member

I was thinking more of very large allocations. Will have to take a closer look after New Year's …

Member

Any harm in making the call to Hashtable_setSize unconditional? The above bounds check should be part of that function already.

Contributor Author

@Explorer09 Explorer09 Jan 10, 2026

I was thinking how that would affect the shrinking of the buffer, with respect to this code:

https://github.com/Explorer09/htop-1/blob/3dc65f62da56befae7ed7dcb6e66ca5bea856710/Hashtable.c#L292

I personally like the idea: by centralizing the conditionals that readjust the buffer size, we can save some sanity checks in the Hashtable_setSize function.

Contributor Author

@Explorer09 Explorer09 Jan 11, 2026

Update: It seems that there's a side effect if I try to merge the conditionals of expanding and shrinking the buffer in Hashtable_setSize, thus I have to give up on the idea.

When creating a Hashtable through Hashtable_new, it is allowed to specify a larger size for the initial allocation. During the initial population of the items, this avoids unnecessary expansion or relocation of the buffer. If I move the shrinking condition to Hashtable_setSize, then the buffer will shrink automatically when adding an element to it. This would remove the benefit of initializing a Hashtable with a larger size.
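
The side effect can be reproduced with a toy policy function. The 70% growth threshold comes from the diff in this thread; the 1/8 shrink threshold and all names here are assumptions made for illustration only.

```c
#include <stddef.h>

typedef enum { RESIZE_KEEP, RESIZE_GROW, RESIZE_SHRINK } ResizeAction;

/* Policy applied while inserting: it only ever grows the buffer. */
ResizeAction policy_on_put(size_t items, size_t size) {
   return (items >= size * 7 / 10) ? RESIZE_GROW : RESIZE_KEEP;
}

/* Hypothetical merged policy: one function decides both directions. */
ResizeAction policy_merged(size_t items, size_t size) {
   if (items >= size * 7 / 10)
      return RESIZE_GROW;
   if (items < size / 8)   /* assumed shrink threshold */
      return RESIZE_SHRINK;
   return RESIZE_KEEP;
}
```

For a table preallocated with size 1021 and holding a single item, `policy_on_put(1, 1021)` keeps the buffer while `policy_merged(1, 1021)` asks to shrink it, which is the loss of preallocation described above.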

Member

@BenBE BenBE Jan 12, 2026

I think we arrived at this issue before … IIRC.

Maybe inhibit shrinking while we try to insert items …

Contributor Author

> Maybe inhibit shrinking while we try to insert items …

Inhibiting shrinking means adding a flag argument to Hashtable_setSize. It seems that we cannot have fewer than 2 arguments for the setSize function. If we cannot reduce the number of arguments, then I'd like to keep the current function prototype and use the size argument to determine whether the buffer should grow or shrink.

@Explorer09 Explorer09 force-pushed the hashtable-primes branch 2 times, most recently from 7cf3e2a to 3dc65f6 Compare January 1, 2026 18:57
@Explorer09 Explorer09 force-pushed the hashtable-primes branch 4 times, most recently from bd9690c to 80c8562 Compare January 11, 2026 04:01
Extend the nextPrime() function so that it can report primes up to
(2^63 - 25).

The old prime number sequence stops at (2^37 - 25). Although it's
possible to extend the sequence https://oeis.org/A014234 to support
up to 2^63, the code size can be smaller by encoding the differences to
the prime numbers (i.e. the sequence https://oeis.org/A013603 ) as the
array instead.

The next prime in the sequence, (2^64 - 59), is not useful for the
Hashtable purpose (and it exceeds SSIZE_MAX) and so is left out.

Signed-off-by: Kang-Che Sung <[email protected]>
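
As a rough illustration of the encoding described in this commit message, the sketch below stores only the differences 2^n - p, where p is the largest prime below 2^n. The table covers n = 3 to 20 only, where each difference fits in a uint8_t, and `demo_prime_diffs` and `demo_next_prime_from_diffs` are hypothetical names, not the ones used in the commit.

```c
#include <stddef.h>
#include <stdint.h>

/* demo_prime_diffs[i] = 2^(i + 3) - (largest prime below 2^(i + 3)). */
static const uint8_t demo_prime_diffs[] = {
   1, 3, 1, 3, 1, 5, 3, 3, 9, 3, 1, 3, 19, 15, 1, 5, 1, 3
};

/* Returns the smallest encoded prime >= n, or 0 if n exceeds the table. */
uint32_t demo_next_prime_from_diffs(uint32_t n) {
   for (size_t i = 0; i < sizeof(demo_prime_diffs); i++) {
      uint32_t prime = (UINT32_C(1) << (i + 3)) - demo_prime_diffs[i];
      if (prime >= n)
         return prime;
   }
   return 0;
}
```

Each prime is rebuilt from a shift and a subtraction, so the table needs one byte per entry here instead of a full-width integer.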
* Move assertions about hash table sizes to Hashtable_isConsistent() so
  they can be checked in all Hashtable methods.
* Slightly improve conditionals of growing and shrinking the "buckets"
  buffer. Specifically the calculations are now less prone to
  arithmetic overflow and can work with Hashtable.size value up to
  (SIZE_MAX / 7). (Original limit was (SIZE_MAX / 10)).
* If `Hashtable.size > SIZE_MAX / sizeof(HashtableItem)`, allow the
  compiler to optimize out one conditional of checking overflow.
  (The buffer allocation would still fail at xCalloc() in that case.)
* Hashtable_setSize() is now a private method.

Signed-off-by: Kang-Che Sung <[email protected]>
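
The overflow claim in the second bullet can be illustrated by comparing two forms of the growth test at the largest supported size. This is a sketch with made-up helper names; it only demonstrates unsigned wraparound and is not the actual htop code path.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Older form: 10 * items wraps around once items > SIZE_MAX / 10. */
bool grow_old_form(size_t items, size_t size) {
   return 10 * items > 7 * size;
}

/* Newer form: only size * 7 is computed, which is exact
 * for size <= SIZE_MAX / 7. */
bool grow_new_form(size_t items, size_t size) {
   return items >= size * 7 / 10;
}
```

With items = size = SIZE_MAX / 7 (a completely full table), the older form wraps and reports that no growth is needed, while the newer form still asks for growth. The two forms are not identical at the boundary: at 4 items in a table of size 7, only the newer one fires.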