-
Notifications
You must be signed in to change notification settings - Fork 171
MAGE-1109: Add Batching Optimizer feature #1797
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: release/3.17.0-dev
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Love where this is going. I know this is WIP - just a couple of small observations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is just fantastic work @damcou !! 🙌
I do see some issues if you wouldn't mind taking a look and had a few suggestions.
I also think we should add some language that says something along the lines that these numbers are estimates only and that indexing activity should be monitored after making changes to ensure batches are not exceeding the recommended size of 10 MB.
const INCREASED_MARGIN = 50; | ||
|
||
const DEFAULT_SAMPLE_SIZE = 20; | ||
const LARGE_SAMPLE_SIZE = 100; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should keep the default but allow the user to explicitly set the sample size to whatever they want.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what was causing that but I just added a POC for a sample size option (did not add for margin). Is this what you were going for or something different?
* Arbitrary increased margin to ensure not to exceed recommended batch size when catalog is a mix between complex and other product types | ||
* (i.e. with a lot of record sizes variations) | ||
*/ | ||
const INCREASED_MARGIN = 50; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just like with sample size I think having an option to set the margin makes sense.
{ | ||
return [ | ||
new InputOption( | ||
self::LARGE_SAMPLE_OPTION, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would recommend replacing this with two options: margin and sample size
$standardDeviation = $this->getStandardDeviation($sample, $sizeAverage); | ||
$this->output->writeln('<comment>Standard Deviation</comment> : ' . $standardDeviation); | ||
|
||
$recommendedBatchCountLow = $this->getRecommendedBatchCount($sizeAverage, $standardDeviation, self::INCREASED_MARGIN); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we know the max possible size in the sample we can recommend an absolute floor that would be below the standard deviation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a POC for the issue you mentioned - if you think this can work then we could do something for margin as well.
Also noted one issue on the division by zero check.
Additional comments in Jira.
*/ | ||
protected function getSizeAverage(array $sizes): int | ||
{ | ||
if (count($sizes) <= 1) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (count($sizes) <= 1) { | |
if (empty($sizes)) { |
That earlier suggestion was intended for the standard deviation n-1
adjustment which doesn't apply here. Here we can just check for 0
.
This PR contains the Batching Optimizer CLI command which include the following process:
store_id
argument).simple
,virtual
,downloadable
,giftcard
) and "complex products" (configurable
,bundle
,grouped
) in the catalog. (for example, a sample of 20 products from a catalog composed of 40% of simple products and 60% of complex products will have 8 simple products and 12 complex products).