-
Notifications
You must be signed in to change notification settings - Fork 1.1k
feat(tiering): Serialize hashes #6015
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: Vladislav Oleshko <[email protected]>
1839c60 to
a6f53a6
Compare
Signed-off-by: Vladislav Oleshko <[email protected]>
a6f53a6 to
4205592
Compare
Signed-off-by: Vladislav Oleshko <[email protected]>
9dce7a9 to
c178a09
Compare
src/core/compact_object.h
Outdated
|
|
||
| // Prequisite: IsCool() is true. | ||
| // Keeps cool record only as external value and discard in-memory part. | ||
| void KeepExternal(size_t offset, size_t sz); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
KeepExternal is not clear imho.
Freeze - a more creative option, and maybe DropCool is more direct explaining what it does?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok.. The names just become more and more cryptic 😃
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renamed
src/server/tiered_storage.cc
Outdated
| auto estimated = EstimateSerializedSize(*value); | ||
| if (OccupiesWholePages(estimated->first)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was a bug, we have to compute it not by value->Size(), but by the estimated serialization size. This might become an issue with large maps
Signed-off-by: Vladislav Oleshko <[email protected]>
ecdfad3 to
40d054b
Compare
| if (pv.IsInline()) | ||
| return {}; | ||
| return std::make_pair(pv.GetRawString().view().size(), CompactObj::ExternalRep::STRING); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because we can't call GetRawString unconditionally. I'll try to make it nicer in the future, for now it just becomes too much for a single PR
romange
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a big PR, I wish it could be split into multiple parts.
| {"third key", "third value"}, | ||
| {"fourth key", "fourth value"}, | ||
| {"fifth key", "fifth value"}}; | ||
| const vector<std::pair<string, string>> kBase = {{"first key", "first value"}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
curious, what's the reason for string?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Turing listpack into vector<p<string_view, string_view>> is even more code, so I just changed the type to string. In the future, the serialized map constructor needs to be updated eitehr way to be copy-less, but its too much for this PR. Its all temporary
src/server/tiered_storage.cc
Outdated
| if (pv.Encoding() == kEncodingListPack) { | ||
| auto* lp = static_cast<uint8_t*>(pv.RObjPtr()); | ||
| size_t bytes = lpBytes(lp); | ||
| bytes += lpLength(lp) * 2 * 4; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a comment explaining the formula?
src/server/tiered_storage.cc
Outdated
| // TODO(vlad): Maybe split into different accessors? | ||
| // Do NOT enforce rules depending on dynamic runtime values as this is called | ||
| // when scheduling stash and just before succeeeding and is expected to return the same results | ||
| optional<pair<size_t /*size*/, CompactObj::ExternalRep>> EstimateSerializedSize( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does more than EstimateSize, and with having optional and precise ExternalRep the behavior is confusing. Maybe call it GetSerializationDescriptor which will return
struct SerializationDescriptor {
size estimated_size;
CompactObj::ExternalRep repr;
and add NONE to ExternalRep enum? or alternatively add is_valid() { return size> 0; } and return size=0 for unfit objects.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Personally, I am not a big fan of chaining optionals—or other types like std::expected—one within another, especially when we control the wrapped class and it can describe the "undef" state itself. For example, in our codebase, we have using Result = std::optional; where ResultType is another optional, or std::optional<facade::ErrorReply> where ErrorReply can hold an empty state. These levels of indirection decrease readability, imho.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree here, but in a properly structured code base type composition is always almost preferable
src/server/tiered_storage.cc
Outdated
| stats->tiered_used_bytes += segment.length; | ||
| stats_.total_stashes++; | ||
|
|
||
| CompactObj::ExternalRep rep = EstimateSerializedSize(*pv)->second; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ADD a DCHECK before dereferencing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the nullopt access with throw an assert either way
|
Removed the optional and just keeped a pair |
Ok, so this changed a lot. Add active path for serializing hashes:
EstimateSerializedSizeSerialize