-
Notifications
You must be signed in to change notification settings - Fork 25.7k
Add ordinal range encode for tsid #133018
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
3ff7761 to
f535b79
Compare
f535b79 to
a5e4ee0
Compare
server/src/main/java/org/elasticsearch/index/codec/tsdb/es819/ES819TSDBDocValuesProducer.java
Show resolved
Hide resolved
martijnvg
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took a first look and this is a great idea to speedup reading the _tsid (and other doc values fields that are primary sort field) without having to enable a doc value skipper.
server/src/main/java/org/elasticsearch/index/codec/tsdb/es819/ES819TSDBDocValuesConsumer.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/codec/tsdb/es819/ES819TSDBDocValuesFormat.java
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/codec/tsdb/es819/ES819TSDBDocValuesProducer.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/codec/tsdb/es819/ES819TSDBDocValuesProducer.java
Outdated
Show resolved
Hide resolved
server/src/main/java/org/elasticsearch/index/codec/tsdb/es819/ES819TSDBDocValuesProducer.java
Show resolved
Hide resolved
| () -> new ES819TSDBDocValuesFormat(random().nextInt(Integer.MIN_VALUE, 2), random().nextInt(1, 32), random().nextBoolean()) | ||
| ); | ||
| assertTrue(ex.getMessage().contains("skipIndexIntervalSize must be > 1")); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think in TsdbDocValueBwcTests we should also add a test method that tests ES819TSDBDocValuesFormat without ordinal range encoding to with ordinal range encoding for bwc purposes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a test in 6bdfd69
|
I executed the tsdb track with this change and seeing the following results: The p50 latency of rate based queries nicely improved. I think the difference of p50 of other queries is within noise range. Other details: Indexing throughput is similar, while tsid disk usage got much smaller (compared to total store size this is neglectable). Full rally compare: |
martijnvg
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Nhat! LGTM
| startDocs.add(doc); | ||
| } | ||
| } | ||
| startDocs.add(maxDoc); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible that we've already inserted maxDoc in the loop above?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should not, because startDocs should be at most maxDoc - 1.
| private long rangeEndExclusive = -1; | ||
|
|
||
| SortedOrdinalReader(long maxOrd, DirectMonotonicReader startDocs) { | ||
| this.maxOrd = Math.toIntExact(maxOrd); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: toIntExact not needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks - I removed it in 8dc9641.
kkrik-es
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Nhat!
|
Pinging @elastic/es-storage-engine (Team:StorageEngine) |
|
Hi @dnhatn, I've created a changelog YAML for you. |
|
@martijnvg @kkrik-es Thank you so much for the discussion and review! |
When a keyword is the primary sort field, we store the starting document of each ordinal instead of blocks of ordinals. By default, this is not enabled if the average number of documents per ordinal is less than 512, as storing block values may be more efficient and safer. Reading a large range of documents—a common pattern in ES|QL—can be more efficient with this approach.