-
Notifications
You must be signed in to change notification settings - Fork 14
execution_plans: add MetricsCollectingStream #150
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
7b98cac to
30f44db
Compare
NGA-TRAN
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code looks good. I just have some minor suggestions for the tests
| stage_id: 2, | ||
| task_number: 2, | ||
| }, | ||
| ]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you describe the purpose of this test? Do you want to test collecting metrics for one query with 2 stages or you want to test collecting metrics for 2 queries each has one stage?
If you want to test both, you may want to split them in 2 different tests and make it clear in the names of the tests or add comments to explain so. Also, do we collect metrics of different queries in one pass and need such tests?
Either way, you want to make the test valid by:
- Have the same
query_idfor the test on the same query. And have differentquery_idif they are for 2 different queries - Use smaller number for child
stage_id
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you describe the purpose of this test?
I added a comment.
Do you want to test collecting metrics for one query with 2 stages or you want to test collecting metrics for 2 queries each has one stage?
This metrics stream doesn't really know the semantics of how queries are executed. The only "guarantee" it offers is that it collects metrics by StageKey but it doesn't know what those are. That's why I just used random StageKeys. However, I do agree that it's confusing, so I updated the test to use same query_id and stage_id with two different task ids. This should make more sense.
| panic!("expected Gauge metric"); | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test is a bit long. What do you think to make different functions for creating metrics, stages and result verification, then create a macro to use them. There are many examples for that in Vanilla DF tests and I think Gabriel has that in this repro, too. That way, you can reuse them and it is easier for us to review
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gabotechs : I am a bit ambiguous here. It would be great, if you can hep point Jayant to some good examples how to do so
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed. Plus I made it much more concise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added a test utils for metrics creation as well :) Now, you can call make_test_metrics_set_proto_from_seed to make test data. Also all metrics protos implement Eq now so it's easy to assert metrics equality in tests
30f44db to
0147b8d
Compare
gabotechs
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome! not much to say, this looks just right
| where | ||
| S: Stream<Item = Result<FlightData, FlightError>> + Send + Unpin, | ||
| { | ||
| inner: S, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if this should instead be:
#[pin_project]
pub struct MetricsCollectingStream<S>
where
S: Stream<Item = Result<FlightData, FlightError>> + Send + Unpin,
{
#[pin]
inner: S,I don't know the details about when this is needed, but I usually see wrapping streams be implemented this way in Rust.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👀 Will look into it.
34185c1 to
147579f
Compare
This change adds a new type called `MetricsCollectingStream`. It wraps a stream of `FlightData` and collects any metrics that are passed in the `app_metadata`. This change also introduces an `FlightAppMetadata` enum proto which can be used to define our app metadata protocol.
147579f to
19e5222
Compare
This change adds a new type called
MetricsCollectingStream. It wraps a stream ofFlightDataand collects any metrics that are passed in the
app_metadata. This change also introducesan
FlightAppMetadataenum proto which can be used to define our app metadata protocol.Informs #123