- 
                Notifications
    You must be signed in to change notification settings 
- Fork 25.6k
Remove upper limit for chunking settings #133718
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove upper limit for chunking settings #133718
Conversation
| Hi @kderusso, I've created a changelog YAML for you. | 
| Pinging @elastic/ml-core (Team:ML) | 
| ChunkingSettings chunkingSettings | ||
| ) { | ||
| super(inferenceEntityId, taskType, service, serviceSettings, taskSettings, chunkingSettings); | ||
| if (chunkingSettings != null && chunkingSettings.maxChunkSize() != null) { | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add the same check to MultilingualE5SmallModel so multilingual-e5-small will also pick up the restriction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added in f19f2ff
| return field; | ||
| } | ||
|  | ||
| public static Integer extractRequiredPositiveIntegerGreaterThanOrEqualToMin( | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be possible to add some tests for this new method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added in f104004
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
The boundary for chunking settings was set at a max chunk size of 300, which is too small for many models with large window sizes. This PR removes that cap, and instead enforces a cap for ELSER models so they are not truncated.