flight_service: emit metrics from ArrowFlightEndpoint #160
Conversation
Force-pushed 8d54d00 to cbfa6da
Force-pushed cbfa6da to 543a085
gabotechs left a comment
Looking good! 🚀 This seems to go in the right direction.
A thought unrelated to this PR:
The src/execution_plans module is dedicated to DataFusion ExecutionPlan implementations. If a file inside it is not providing an ExecutionPlan implementation, it should probably not live there alongside the other ExecutionPlan implementations.
This makes sense to me. I think it's okay to address in this PR. I pushed a commit 'refactor modules and files' to ensure that:
- all metrics-related code is in the `metrics` module
- only the `MetricsWrapperExec` lives in `execution_plans`
```rust
/// sends metrics for a task to the last NetworkCoalesceExec to read from it, which may or may
/// not be this instance.
pub(crate) metrics_collection: Arc<DashMap<StageKey, Vec<MetricsSetProto>>>,
pub(crate) child_task_metrics: Arc<DashMap<StageKey, Vec<MetricsSetProto>>>,
```
Nit: it would be consistent with the naming in other parts of the project to call this `input_task_metrics`.
Updated in latest commit
src/flight_service/do_get.rs
Outdated
```rust
// Creates a tonic response from a stream of record batches. Handles
// - RecordBatch to flight conversion, partition tracking, stage eviction, and trailing metrics.
```
In Rust, documentation for functions is declared with triple `///`. A very immediate benefit is that IDEs are smart enough to display the docs when you hover over usages.
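As a quick illustration (hypothetical function, not from this PR), the same comment written as a `//` line comment vs. a `///` doc comment:

```rust
// A plain `//` comment: ignored by rustdoc, not shown on hover.

/// Computes the nth triangular number.
///
/// A `///` doc comment is attached to the item below it; rustdoc renders
/// it, and IDEs display it when hovering over calls to `triangular`.
fn triangular(n: u64) -> u64 {
    n * (n + 1) / 2
}

fn main() {
    println!("{}", triangular(4)); // prints 10
}
```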
src/flight_service/do_get.rs
Outdated
```rust
stage: Arc<StageExec>,
stage_data: TaskData,
```
It seems like we are unnecessarily passing `Arc<StageExec>` twice; `TaskData` already contains an `Arc<StageExec>`.
```rust
/// metrics_collection is used to collect metrics from child tasks. It is empty when a
/// [NetworkBoundary] is instantiated (deserialized, created via new() etc...).
/// Metrics are populated by executing() the [NetworkBoundary]. It's expected that the
/// collection is complete after the [NetworkBoundary] has been executed. It is undefined
/// what this returns during execution.
///
/// An instance may receive metrics for 0 to N child tasks, where N is the number of tasks
/// in the stage it is reading from. This is because, by convention, the ArrowFlightEndpoint
/// sends metrics for a task to the last [NetworkBoundary] to read from it, which may or may
/// not be this instance.
fn metrics_collection(&self) -> Option<Arc<DashMap<StageKey, Vec<MetricsSetProto>>>> {
    None
}
```
I think it might be better to not have this method here.
The only place where this is used is in:
datafusion-distributed/src/execution_plans/metrics.rs
Lines 56 to 72 in 543a085
```rust
let metrics_collection =
    if let Some(node) = plan.as_any().downcast_ref::<NetworkShuffleExec>() {
        node.metrics_collection()
            .map(Some)
            .ok_or(DataFusionError::Internal(
                "could not collect metrics from NetworkShuffleExec".to_string(),
            ))
    } else if let Some(node) = plan.as_any().downcast_ref::<NetworkCoalesceExec>() {
        node.metrics_collection()
            .map(Some)
            .ok_or(DataFusionError::Internal(
                "could not collect metrics from NetworkCoalesceExec".to_string(),
            ))
    } else {
        Ok(None)
    }?;
```
And there, we are not making use of the NetworkBoundary trait, so these could perfectly well just be normal methods on both NetworkShuffleExec and NetworkCoalesceExec.
It'd be nice to scope the functionality of NetworkBoundary as tightly as possible, leaving it exclusively dedicated to distributed planning; otherwise, unrelated features can creep in over time, bloating the trait and making the SOLID principle gods unhappy because we are not complying with the I (interface segregation).
Done.
I do think that network boundaries should be responsible for metrics collection. Unfortunately, we don't gain the benefit of putting metrics_collection() in the trait here because we can't cast from Arc&lt;dyn ExecutionPlan&gt; to NetworkBoundary, sadly. So, I've dropped the commit distributed_physical_optimizer_rule: move metrics_collection to NetworkBoundary trait
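For context on why the downcast has to target concrete types, here is a minimal sketch with simplified stand-ins for `ExecutionPlan` and `NetworkBoundary` (not the project's real types): `Any::downcast_ref` only resolves concrete types, so each implementor must be tried explicitly; there is no `downcast_ref::<dyn SomeTrait>()`.

```rust
use std::any::Any;
use std::sync::Arc;

// Stand-in for the NetworkBoundary-style trait.
trait Boundary {
    fn label(&self) -> &str;
}

// Stand-in for a concrete plan node such as NetworkShuffleExec.
struct Shuffle;
impl Boundary for Shuffle {
    fn label(&self) -> &str {
        "shuffle"
    }
}

// Stand-in for the ExecutionPlan trait with its `as_any` escape hatch.
trait Plan {
    fn as_any(&self) -> &dyn Any;
}
impl Plan for Shuffle {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

// `downcast_ref` only works against a concrete type, so the caller must
// enumerate implementors; a downcast to `dyn Boundary` is not possible.
fn describe(plan: Arc<dyn Plan>) -> Option<String> {
    plan.as_any()
        .downcast_ref::<Shuffle>()
        .map(|s| s.label().to_string())
}

fn main() {
    assert_eq!(describe(Arc::new(Shuffle)), Some("shuffle".to_string()));
}
```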
I see that the functions in this file are not really test utils used in different places of the codebase; they are only used in one place.
I'd advocate towards:
- If a function is used only once, just place it where it's used
- If a function is used twice or more, promote it to a "utils"/"helpers"/"common" module for reuse
Otherwise, "utils"/"helpers"/"common" modules will grow unbounded with functions that are not really commonly used.
src/execution_plans/metrics.rs
Outdated
```rust
#[allow(dead_code)]
pub struct MetricsCollectorResult {
```
I think this is no longer dead code right?
Done. Removed all unnecessary dead code markers.
src/execution_plans/metrics.rs
Outdated
```rust
    .collect::<Result<Vec<MetricsSetProto>, FlightError>>()?;
result
    .child_task_metrics
    .insert(stage_key.clone(), proto_task_metrics.clone());
```
These two clones seem unnecessary:
```diff
-    .insert(stage_key.clone(), proto_task_metrics.clone());
+    .insert(stage_key, proto_task_metrics);
```
Done. Good catch :)
src/execution_plans/metrics.rs
Outdated
```rust
    df_metrics_set_to_proto(metrics)
        .map_err(|err| FlightError::ProtocolError(err.to_string()))
})
.collect::<Result<Vec<MetricsSetProto>, FlightError>>()?;
```
If you prefer, you can even let Rust infer the types:
```diff
-    .collect::<Result<Vec<MetricsSetProto>, FlightError>>()?;
+    .collect::<Result<Vec<_>, _>>()?;
```
Done
src/execution_plans/metrics.rs
Outdated
```rust
}

#[cfg(test)]
mod tests {
```
It looks like several structs in this file are unused in production (only used in tests):
- `TaskMetricsRewriter` is unused in production code; it's just used in the tests, so it probably should move there
- same for `MetricsWrapperExec`; it's unused in production code, so a better place for it could be under `mod tests {}`
The rewriter and wrapper will both be used in production soon when I implement the display portion of explain analyze, so I think it's okay to leave them.
Force-pushed 543a085 to 3f5ce99
TYFR @gabotechs :) I've addressed the comments
Force-pushed 6257e29 to 4453842
gabotechs left a comment
Awesome work 💯
```rust
/// Creates a FlightData with the given app_metadata and empty RecordBatch using the provided schema.
/// We don't use [arrow_flight::encode::FlightDataEncoder] (and by extension, the [arrow_flight::encode::FlightDataEncoderBuilder])
/// since they skip messages with empty RecordBatch data.
pub fn empty_flight_data_with_app_metadata(
```
Same here, it's only used once, so it can be private:
```diff
-pub fn empty_flight_data_with_app_metadata(
+fn empty_flight_data_with_app_metadata(
```
```rust
}

#[tokio::test]
async fn test_metrics_collection_e2e_1() {
```
Nice! Any opinions on adding an integration test that runs, for example, a TPCH query and expects that all nodes have collected metrics?
These tests already seem to provide good coverage; just offering ideas in case you want to take it one step further.
Yes. I'll add these when I can display the full explain output and assert its contents. I'll probably need some way to obfuscate/hide the numbers since those are nondeterministic.
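One possible approach for that (a hypothetical helper, not part of this PR): normalize every run of digits in the explain output before snapshotting, so nondeterministic counters compare stably.

```rust
/// Replaces every run of ASCII digits with "N" so explain-analyze output
/// containing nondeterministic counters can be compared against a stable
/// expected string.
fn normalize_metrics(s: &str) -> String {
    let mut out = String::new();
    let mut in_digits = false;
    for c in s.chars() {
        if c.is_ascii_digit() {
            // Emit a single "N" for the whole digit run.
            if !in_digits {
                out.push('N');
                in_digits = true;
            }
        } else {
            in_digits = false;
            out.push(c);
        }
    }
    out
}

fn main() {
    assert_eq!(
        normalize_metrics("output_rows=1024, elapsed_compute=3ms"),
        "output_rows=N, elapsed_compute=Nms"
    );
}
```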
```rust
/// TaskMetricsRewriter is used to enrich a task with metrics by re-writing the plan using [MetricsWrapperExec] nodes.
///
/// Ex. for a plan with the form
///   AggregateExec
///   └── ProjectionExec
///       └── NetworkShuffleExec
///
/// the task will be rewritten as
///
///   MetricsWrapperExec (wrapped: AggregateExec)
///   └── MetricsWrapperExec (wrapped: ProjectionExec)
///       └── NetworkShuffleExec
///
/// (Note that the NetworkShuffleExec node is not wrapped)
```
I know this is not specific to this PR, but here's a question just for my understanding:
How is it that we cannot rely on the existing mechanism in ExecutionPlans for storing metrics?
The reason we don't is that these metrics are populated during execution (they cannot be written directly) and cannot be serialized. So, if a remote plan node runs, there's no way to have a copy of the same plan node return those metrics by calling metrics(). The workaround is to wrap the nodes.
It would be nice if we could use a more native method to inject metrics, though, if the ExecutionPlan interface were to allow it.
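A minimal sketch of the wrapping idea described above, using simplified stand-ins for `ExecutionPlan` and its metrics (not the project's actual types): the wrapper delegates everything to the inner node but answers `metrics()` with values shipped over the wire.

```rust
use std::sync::Arc;

// Simplified stand-in for ExecutionPlan; metrics are just a Vec<u64> here.
trait Plan {
    fn name(&self) -> &str;
    fn metrics(&self) -> Option<Vec<u64>>;
}

struct Projection;
impl Plan for Projection {
    fn name(&self) -> &str {
        "ProjectionExec"
    }
    // A deserialized copy of a remote node cannot expose the metrics that
    // were recorded during execution: those live in the worker's process.
    fn metrics(&self) -> Option<Vec<u64>> {
        None
    }
}

/// Wraps a node and answers `metrics()` with injected values received
/// from the worker, without modifying the wrapped node itself.
struct MetricsWrapper {
    inner: Arc<dyn Plan>,
    injected: Vec<u64>,
}
impl Plan for MetricsWrapper {
    fn name(&self) -> &str {
        self.inner.name()
    }
    fn metrics(&self) -> Option<Vec<u64>> {
        Some(self.injected.clone())
    }
}

fn main() {
    let wrapped = MetricsWrapper {
        inner: Arc::new(Projection),
        injected: vec![42],
    };
    assert_eq!(wrapped.name(), "ProjectionExec");
    assert_eq!(wrapped.metrics(), Some(vec![42]));
}
```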
This change updates the ArrowFlightEndpoint to collect metrics and emit them. When the last partition in a task is finished, the ArrowFlightEndpoint collects metrics and emits them via the TrailingFlightDataStream.

Previously, we would determine whether a partition was finished when the request first hit the endpoint. Now, we do it on stream completion. This is crucial for metrics collection because we need to know that the stream is exhausted, meaning that there's no data flowing in the plan and metrics are not actively being updated.

Since the ArrowFlightEndpoint now emits metrics and NetworkBoundary plan nodes collect metrics, all coordinating StageExecs will now have the full collection of metrics for all tasks. This commit adds integration-style tests that assert that the coordinator is receiving the full set of metrics.

Finally, this change refactors and unskips some old metrics tests. These tests were skipped because the plans would change. Now, we use test utils to count the number of nodes etc. to make these tests more resilient to changes.

Follow-up work:
- Only collect metrics if a configuration is set in the SessionContext, removing extra overhead
- Display metrics in the plan using EXPLAIN (ANALYZE)
- Consider using sqllogictest or similar to test the output

Closes: #94
Informs: #123
Force-pushed 9dab95c to 2163a73