DOG-6508: Fix metrics pickling issue in multiprocessing runtime#497
Conversation
Summary of Changes

Hello @Hmnt39, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request addresses a critical issue where metrics instances failed to serialize correctly when passed between processes in a multiprocessing runtime, leading to errors and unreliable metrics collection. The solution refactors the metrics handling mechanism to pass the metrics class itself, allowing each process to initialize its own metrics instance using a controlled singleton pattern. This ensures proper metrics collection, avoids conflicts with the Prometheus registry, and enhances the robustness of the metrics system in distributed environments.
Code Review
This pull request effectively addresses the metrics serialization issue in a multiprocessing environment by shifting from passing metric instances to passing metric classes. The introduction of a singleton pattern for metrics initialization within the extractor is a solid approach to prevent Prometheus registry conflicts. The changes are well-structured and the addition of a comprehensive integration test in test_runtime.py is excellent for verifying the fix. I've found one critical issue in the implementation of the metrics loading logic that could lead to runtime errors and inconsistencies. My feedback focuses on making this implementation more robust and consistent.
```python
if metrics_class:
    metrics_instance = safe_get(metrics_class)
else:
    metrics_instance = BaseMetrics(extractor_name=self.EXTERNAL_ID, extractor_version=self.VERSION)
```
This logic for creating the metrics instance has two issues:

- The call to `safe_get(metrics_class)` assumes a no-argument constructor, which contradicts the `BaseMetrics` constructor and the `MyMetrics` example in the docstring. This will lead to a `TypeError`.
- The `else` block instantiates `BaseMetrics` directly, bypassing `safe_get`. This breaks the singleton pattern and can cause Prometheus registry conflicts if `BaseMetrics` is instantiated via `safe_get` elsewhere.
A more robust implementation would be to always use `safe_get` and pass the required `extractor_name` and `extractor_version` arguments. This will require updating any custom metric classes in tests (like `TestMetrics`) to accept these arguments in their `__init__` method.
```python
cls_to_use = metrics_class or BaseMetrics
metrics_instance = safe_get(cls_to_use, extractor_name=self.EXTERNAL_ID, extractor_version=self.VERSION)
```
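For context, the singleton behavior this suggestion relies on can be sketched as below. This is a minimal, hypothetical version of `safe_get`; the actual helper in this codebase may differ in signature and caching details:

```python
import threading

# per-process cache of metrics instances, keyed by class
_instances: dict = {}
_lock = threading.Lock()

def safe_get(cls, *args, **kwargs):
    """Return a per-process singleton instance of cls.

    Constructing a metrics class twice in the same process would
    re-register its collectors with the Prometheus registry and
    raise an error, so the first instance per class is cached and
    reused on subsequent calls.
    """
    with _lock:
        if cls not in _instances:
            _instances[cls] = cls(*args, **kwargs)
        return _instances[cls]
```

Because the cache lives in module state, each worker process gets its own cache after fork/spawn, so every process ends up with exactly one instance per metrics class.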
This PR resolves metrics serialization errors when passing `BaseMetrics` instances through multiprocessing by changing the runtime to pass the metrics class instead of an instance. The extractor now initializes metrics internally using a singleton pattern to avoid Prometheus registry conflicts. Comprehensive tests were added to verify that metrics work correctly across process boundaries.
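The failure mode and the fix can be illustrated with a toy stand-in (the class and attribute names here are hypothetical; in the real code it is Prometheus collector state that makes instances unserializable):

```python
import pickle

class UnpicklableHandle:
    # stand-in for registry/collector state that cannot cross
    # a process boundary
    def __reduce__(self):
        raise TypeError("cannot pickle metrics instance")

class BaseMetrics:
    def __init__(self, extractor_name, extractor_version):
        self.extractor_name = extractor_name
        self.extractor_version = extractor_version
        self._registry_handle = UnpicklableHandle()

# an instance fails to serialize...
try:
    pickle.dumps(BaseMetrics("my-extractor", "1.0.0"))
    print("instance pickled")
except TypeError:
    print("instance failed to pickle")

# ...but the class itself pickles by reference, so the runtime can
# hand it to a child process, which then builds its own instance
assert pickle.loads(pickle.dumps(BaseMetrics)) is BaseMetrics
```

This is why passing the class and constructing per-process (combined with the singleton guard) sidesteps both the pickling error and duplicate Prometheus registration.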