Conditionally use sequential stored field reader in LuceneSyntheticSourceChangesSnapshot #121636
Changes from 8 commits
@@ -9,6 +9,8 @@
 package org.elasticsearch.index.engine;
 
+import com.carrotsearch.hppc.IntArrayList;
+
 import org.apache.lucene.index.LeafReaderContext;
 import org.apache.lucene.search.FieldDoc;
 import org.apache.lucene.search.ScoreDoc;
@@ -83,6 +85,7 @@ public LuceneSyntheticSourceChangesSnapshot(
         this.maxMemorySizeInBytes = maxMemorySizeInBytes > 0 ? maxMemorySizeInBytes : 1;
         this.sourceLoader = mapperService.mappingLookup().newSourceLoader(null, SourceFieldMetrics.NOOP);
         Set<String> storedFields = sourceLoader.requiredStoredFields();
+
         this.storedFieldLoader = StoredFieldLoader.create(false, storedFields);
         this.lastSeenSeqNo = fromSeqNo - 1;
     }
@@ -191,8 +194,24 @@ private Translog.Operation[] loadDocuments(List<SearchRecord> documentRecords) t
             maxDoc = leafReaderContext.reader().maxDoc();
         } while (docRecord.docID() >= docBase + maxDoc);
 
-        leafFieldLoader = storedFieldLoader.getLoader(leafReaderContext, null);
-        leafSourceLoader = sourceLoader.leaf(leafReaderContext.reader(), null);
+        // TODO: instead of building an array, let's just check whether doc ids are (semi) dense
+        IntArrayList nextDocIds = new IntArrayList();
+        for (int j = i; j < documentRecords.size(); j++) {
As far as I understand we're not increasing the complexity of the method by iterating on documentRecords here, right?

Yes, we only compute the doc ids for the current leaf reader. If the doc ID is higher than or equal to docBase + maxDoc, then the current document record belongs to the next leaf reader.
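To make that reply concrete, here is a small, self-contained sketch (my own illustration, not code from this PR; the class and method names are hypothetical) of how ascending global doc IDs split across leaves using each leaf's docBase and maxDoc:

```java
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.LeafReaderContext;

import java.util.ArrayList;
import java.util.List;

class LeafDocIdGrouping {
    /**
     * Groups ascending global doc ids by the leaf (segment) they belong to and
     * converts them to segment-local ids, mirroring the per-leaf loop in the diff.
     */
    static List<int[]> groupByLeaf(DirectoryReader reader, int[] ascendingGlobalDocIds) {
        List<int[]> perLeaf = new ArrayList<>();
        int i = 0;
        for (LeafReaderContext leaf : reader.leaves()) {
            int docBase = leaf.docBase;
            int maxDoc = leaf.reader().maxDoc();
            ArrayList<Integer> local = new ArrayList<>();
            // A global doc id belongs to this leaf while it is below docBase + maxDoc;
            // once it is >= docBase + maxDoc, the record belongs to a later leaf.
            while (i < ascendingGlobalDocIds.length && ascendingGlobalDocIds[i] < docBase + maxDoc) {
                local.add(ascendingGlobalDocIds[i] - docBase); // segment-local doc id
                i++;
            }
            perLeaf.add(local.stream().mapToInt(Integer::intValue).toArray());
        }
        return perLeaf;
    }
}
```

Each leaf is visited once and each doc id is consumed once, so the overall pass stays linear in the number of records.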
+            var record = documentRecords.get(j);
+            if (record.isTombstone()) {
+                continue;
+            }
+            int docID = record.docID();
+            if (docID >= docBase + maxDoc) {
+                break;
+            }
+            int segmentDocID = docID - docBase;
+            nextDocIds.add(segmentDocID);
+        }
+
+        int[] nextDocIdArray = nextDocIds.toArray();
+        leafFieldLoader = storedFieldLoader.getLoader(leafReaderContext, nextDocIdArray);
+        leafSourceLoader = sourceLoader.leaf(leafReaderContext.reader(), nextDocIdArray);
Another side effect of providing the array of document IDs is that some field loaders may choose to load their values eagerly. I don't see this as a problem, but I wanted to point out that we would lose this behavior if we implement the TODO above.

Agreed. I will update the TODO to include that perspective.

Note that some doc values loaders already apply this strategy when doc ids are provided and there is a single value per field:

Line 55 in b6facb2
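As a hedged illustration of the eager-loading point (hypothetical class, not Elasticsearch's loader API): when the doc ids are known up front, a single-valued doc-values field can be read into an array in one pass instead of being advanced lazily per document.

```java
import org.apache.lucene.index.LeafReader;
import org.apache.lucene.index.NumericDocValues;

import java.io.IOException;

class EagerSingleValuedLoader {
    private final long[] values;

    /** Eagerly reads a single-valued numeric doc-values field for the given ascending doc ids. */
    EagerSingleValuedLoader(LeafReader reader, String field, int[] ascendingDocIds) throws IOException {
        values = new long[ascendingDocIds.length];
        NumericDocValues dv = reader.getNumericDocValues(field);
        if (dv != null) {
            for (int i = 0; i < ascendingDocIds.length; i++) {
                // advanceExact is legal here because the doc ids come in ascending order
                if (dv.advanceExact(ascendingDocIds[i])) {
                    values[i] = dv.longValue();
                }
            }
        }
    }

    long valueAt(int index) {
        return values[index];
    }
}
```

With the TODO's density check instead of an explicit array, a loader like this would no longer know the target docs in advance, which is the behavior the comment says would be lost.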
         setNextSourceMetadataReader(leafReaderContext);
     }
     int segmentDocID = docRecord.docID() - docBase;
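For the TODO above, one possible shape of a "(semi) dense" check (a sketch under my own assumptions, not the heuristic Elasticsearch or Lucene actually uses) is to compare how many doc ids were requested against the range they span, and only prefer sequential stored-field reading above some threshold:

```java
class DocIdDensity {
    /**
     * Returns true when the ascending doc ids cover at least minDensity of the range
     * they span, i.e. they are dense enough that sequential reading likely pays off.
     */
    static boolean isSemiDense(int[] ascendingDocIds, double minDensity) {
        if (ascendingDocIds.length < 2) {
            return false;
        }
        int span = ascendingDocIds[ascendingDocIds.length - 1] - ascendingDocIds[0] + 1;
        return (double) ascendingDocIds.length / span >= minDensity;
    }
}
```

As I read the PR title, passing the int[] to getLoader instead lets the stored field loader itself decide whether a sequential reader is worthwhile.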
Did you intend to use this namespace (com.carrotsearch.hppc)? It seems odd, since it comes from a testing tool.

I think hppc may have originated from the randomized testing framework that both Elasticsearch and Lucene use today. However, it is currently a standalone high-performance primitive collections library, https://github.com/carrotsearch/hppc, which does have other production usage in Elasticsearch.
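For reference, a minimal usage sketch of the hppc IntArrayList used in the diff (a standalone example based on the hppc API, not taken from this PR):

```java
import com.carrotsearch.hppc.IntArrayList;

class HppcIntArrayListExample {
    /** Collects even numbers below the limit without boxing, then copies them to a plain int[]. */
    static int[] evensBelow(int limit) {
        IntArrayList ids = new IntArrayList();
        for (int i = 0; i < limit; i += 2) {
            ids.add(i);
        }
        return ids.toArray();
    }
}
```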
What about using the Lucene one: package org.apache.lucene.internal.hppc?

The Lucene core library only forked a subset of the hppc primitive collection library, and IntArrayList is included. But the javadocs say it was forked from version 0.10.0, and Elasticsearch uses version 0.8.1 of that library. I see most Elasticsearch usages of hppc use the dependency directly.
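If the Lucene fork were used instead, only the import would change, since its IntArrayList exposes a very similar API (a sketch based on my reading of Lucene's org.apache.lucene.internal.hppc package, which is marked internal):

```java
import org.apache.lucene.internal.hppc.IntArrayList;

class LuceneHppcExample {
    /** Same pattern as above, written against Lucene's forked IntArrayList. */
    static int[] firstN(int n) {
        IntArrayList ids = new IntArrayList();
        for (int i = 0; i < n; i++) {
            ids.add(i);
        }
        return ids.toArray();
    }
}
```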