Avoid definition computation if results are in the existing definition #1306
Check whether the harvest results have already been included in the computed definition before triggering a definition compute. This prevents unnecessary computations, reduces load on internal stores, and improves performance.
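The check can be sketched as a set comparison. This is a minimal sketch under stated assumptions, not the code in this PR. It assumes the existing definition records which tool results (as hypothetical `tool/version` strings) were merged into it. `Definition` and `needs_compute` are illustrative names. `compute_if_necessary` echoes the computeIfNecessary name used in this PR, but its body here is an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, Optional, Set


@dataclass
class Definition:
    # Hypothetical field: tool results (e.g. "scancode/30.3.0") already merged
    # into this definition the last time it was computed.
    tools: Set[str] = field(default_factory=set)


def needs_compute(existing: Optional[Definition], harvested: Iterable[str]) -> bool:
    """True if the harvest store holds any tool result missing from the definition."""
    if existing is None:
        return True  # no definition has been computed yet
    return not set(harvested) <= existing.tools


def compute_if_necessary(
    existing: Optional[Definition],
    harvested: Iterable[str],
    compute: Callable[[], Definition],
) -> Definition:
    """Run the (expensive) compute only when new harvest results are present."""
    harvested = list(harvested)  # materialize so a one-shot iterator can be inspected
    if needs_compute(existing, harvested):
        return compute()
    return existing  # not None here: needs_compute returned False
```

Suppose an existing definition was built from `clearlydefined` and `scancode` results. A later message carrying only those same results returns the existing definition without calling `compute`, while a message that also carries a `licensee` result triggers a recompute.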
Does this mean that even if all tools have completed processing, the final result will only be available in 5 minutes?
Tool results often become available at different times and are pushed into the result queue store as they become ready, so the first tool to complete is well defined. If all tools finish within a 5-minute window, the final definition is available 5 minutes after the first tool result has been pushed into the queue.
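The timing described above can be illustrated with a toy model. This is a sketch built on assumptions, not the crawler's or service's actual scheduling. Each tool result enqueues one message that becomes visible `delay` seconds after it is pushed. Processing a visible message recomputes the definition from every result pushed so far, but only if some of them are not yet included. `compute_times` is a hypothetical helper.

```python
from typing import List


def compute_times(push_times: List[float], delay: float) -> List[float]:
    """Times (in seconds) at which a definition compute actually runs in this toy model."""
    ordered = sorted(push_times)
    included = 0  # number of results already merged into the definition
    times = []
    for visible_at in (t + delay for t in ordered):
        # Every result pushed up to this moment is in the harvest store.
        available = sum(1 for t in ordered if t <= visible_at)
        if available > included:  # "compute if necessary"
            times.append(visible_at)
            included = available
    return times
```

Suppose results are pushed at 0, 60, 120 and 240 seconds with a 300-second delay. The model then runs a single compute at 300 s, which is the first push plus the delay. A straggler pushed at 400 s would trigger a second compute at 700 s. A delay of 0 reproduces one compute per result.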
If I understand correctly, this might have an impact on integration tests, increasing the time it takes to run the complete test suite. If I'm not mistaken, the integration tests compute each definition one after another, not in parallel. Some of those definitions are quite small and become available much sooner than 5 minutes (especially when running on a dedicated VM), so the overall process will get slower.
@RomanIakovlev Thanks for the feedback! I had the same question about whether integration tests could be impacted. The development deployment does not use
If …
@qtomlinson Thanks for the clarification!
Refactor the computeIfNecessary logic into definitionService and reuse it in the webhook.
@elrayle @RomanIakovlev I have refactored the "compute if necessary" logic into the definitionService, so it can be used in the webhook API as well.
Background
Whenever a harvest tool result becomes available, the service currently recomputes the component's definition. Because four harvest tools are involved in the harvest process, the definition can be computed up to four times per component.
Further analysis of harvest result timings from March 24 to April 24 indicated that approximately 80% of components have two or more results available within a 5-minute window.

Additionally, based on a 5-day analysis from April 19 to April 24, 50% of components have all four tool results available within the same 5-minute window.

This presents an opportunity to reduce the number of definition computations, ideally to a single computation per component for the 50% of components whose four results all arrive within one 5-minute window.
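The two metrics above could be computed from per-component result timestamps with a sliding window. The PR does not say how its analysis was run, so this is a sketch under stated assumptions. Results count as "within a 5-minute window" when their timestamps are at most 5 minutes apart. Each component has at most four results, one per tool. The input shape and function names are hypothetical, and no real data is included.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

WINDOW = timedelta(minutes=5)
TOOL_COUNT = 4  # one result per harvest tool, per the Background section


def max_results_in_window(times: List[datetime], window: timedelta = WINDOW) -> int:
    """Largest number of results whose timestamps fit inside a single window."""
    ordered = sorted(times)
    best, start = 0, 0
    for end, t in enumerate(ordered):
        # Shrink the window from the left until it spans at most `window`.
        while t - ordered[start] > window:
            start += 1
        best = max(best, end - start + 1)
    return best


def summarize(results: Dict[str, List[datetime]]) -> Tuple[float, float]:
    """(fraction with >= 2 results in one window, fraction with all tools in one window)."""
    counts = [max_results_in_window(ts) for ts in results.values()]
    n = len(counts)
    two_or_more = sum(c >= 2 for c in counts) / n
    all_tools = sum(c >= TOOL_COUNT for c in counts) / n
    return two_or_more, all_tools
```

The second fraction approximates the share of components that a 5-minute delay could bring down to one compute. The first is a looser signal that at least some computes can be merged.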
Solution
To address this, a delay mechanism is implemented to defer the definition computation for a short period (e.g., 5 minutes). This allows multiple harvest tool results to be aggregated before recomputing the definition, thereby reducing the computation load on the service.
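A queue with a visibility timeout is what lets results accumulate before the first compute. The real delay is configured on the harvests queue store in the crawler. `DelayedQueue` is a hypothetical in-memory stand-in that only illustrates the idea. A pushed message stays hidden until `delay` seconds have passed, and `receive` returns only messages whose hidden period has elapsed. Time is passed in explicitly to keep the sketch deterministic.

```python
import heapq
import itertools
from typing import Any, List, Tuple


class DelayedQueue:
    """Toy queue whose messages become visible only `delay` seconds after being pushed."""

    def __init__(self, delay: float) -> None:
        self.delay = delay
        self._heap: List[Tuple[float, int, Any]] = []
        self._seq = itertools.count()  # tie-breaker keeps FIFO order for equal times

    def push(self, message: Any, now: float) -> None:
        heapq.heappush(self._heap, (now + self.delay, next(self._seq), message))

    def receive(self, now: float) -> List[Any]:
        """Pop and return every message whose visibility time has been reached."""
        ready = []
        while self._heap and self._heap[0][0] <= now:
            ready.append(heapq.heappop(self._heap)[2])
        return ready
```

For example, with `delay=300`, a message pushed at t=0 is not returned by `receive(now=200)` but is returned by `receive(now=300)`. By then, any results pushed during the first five minutes are already in the harvest store, so a single compute can include them all.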
Benefits:
Reduced Computation Load:
Improved Performance:
Changes:
Visibility Timeout Implementation:
…/harvests queue store with a visibility timeout. This is handled by the crawler PR.
Service-Side Optimization:
Future work:
Documentation:
Update the relevant documentation to explain the environment variable introduced by the delay mechanism.
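The environment variable's name is not given here, so the sketch below uses a hypothetical `HARVEST_DELAY_SECONDS` with a 300-second (5-minute) default. It only illustrates the behavior such documentation would need to describe: the default when the variable is unset, and validation of its value. Treating 0 as "no delay" is an assumption of this sketch.

```python
import os
from typing import Mapping

DEFAULT_DELAY_SECONDS = 5 * 60  # the 5-minute window discussed in this PR
DELAY_ENV_VAR = "HARVEST_DELAY_SECONDS"  # hypothetical name, not from the PR


def delay_from_env(env: Mapping[str, str] = os.environ, name: str = DELAY_ENV_VAR) -> int:
    """Read the compute delay in seconds, falling back to the default when unset or blank."""
    raw = env.get(name)
    if raw is None or not raw.strip():
        return DEFAULT_DELAY_SECONDS
    value = int(raw)  # raises ValueError for non-integer input
    if value < 0:
        raise ValueError(f"{name} must be >= 0, got {value}")
    return value  # 0 means results are processed without delay
```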
Notes:
…crawler repository, as specified in the crawler PR.
The service and crawler changes can be deployed independently, but for the best result both should be deployed together.