Collaborator

@gabotechs gabotechs commented Aug 1, 2025

Allows propagating DataFusionErrors over the wire so that upstream stages can read the original error without losing information.

This is done by providing mirroring protobuf messages for all the current variants of DataFusionError. As these errors might also hold references to errors from other libraries, those other errors also get mirroring protobuf messages.

This PR ships a bunch of boilerplate code, but the actual important parts and the only ones that are exposed to the rest of the project are just two functions:

/// Encodes a [DataFusionError] into a [tonic::Status] error. The produced error is suitable
/// to be sent over the wire and decoded by the receiving end, recovering the original
/// [DataFusionError] across a network boundary with [tonic_status_to_datafusion_error].
pub fn datafusion_error_to_tonic_status(err: &DataFusionError) -> tonic::Status;

/// Decodes a [DataFusionError] from a [tonic::Status] error. If the provided [tonic::Status]
/// error was produced with [datafusion_error_to_tonic_status], this function will be able to
/// recover it even across a network boundary.
///
/// The provided [tonic::Status] error might also be something else, like an actual network
/// failure. This function returns `None` for those cases.
pub fn tonic_status_to_datafusion_error(status: &tonic::Status) -> Option<DataFusionError>;

Everything else is hidden in the src/errors/ modules.
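
For illustration, here is a minimal std-only sketch of the round trip these two functions are meant to provide. `QueryError`, `Status`, `error_to_status` and `status_to_error` are hypothetical stand-ins for `DataFusionError`, `tonic::Status` and the two functions above; the one-byte tag encoding is an assumption made for brevity, not the prost encoding used in this PR.

```rust
// Hypothetical stand-in for DataFusionError (only two variants for brevity).
#[derive(Debug, Clone, PartialEq)]
enum QueryError {
    Execution(String),
    Internal(String),
}

// Hypothetical stand-in for tonic::Status: a human-readable message plus opaque detail bytes.
#[derive(Debug)]
struct Status {
    message: String,
    details: Vec<u8>,
}

const EXECUTION_TAG: u8 = 1;
const INTERNAL_TAG: u8 = 2;

// Encodes the error as one tag byte followed by the UTF-8 message.
fn error_to_status(err: &QueryError) -> Status {
    let (tag, msg) = match err {
        QueryError::Execution(m) => (EXECUTION_TAG, m),
        QueryError::Internal(m) => (INTERNAL_TAG, m),
    };
    let mut details = vec![tag];
    details.extend_from_slice(msg.as_bytes());
    Status { message: msg.clone(), details }
}

// Returns None when the status carries no recognizable payload,
// e.g. because it describes an actual network failure.
fn status_to_error(status: &Status) -> Option<QueryError> {
    let (tag, payload) = status.details.split_first()?;
    let msg = String::from_utf8(payload.to_vec()).ok()?;
    match *tag {
        EXECUTION_TAG => Some(QueryError::Execution(msg)),
        INTERNAL_TAG => Some(QueryError::Internal(msg)),
        _ => None,
    }
}

fn main() {
    // An encoded error survives the round trip unchanged.
    let original = QueryError::Execution("something failed".to_string());
    let status = error_to_status(&original);
    assert_eq!(status.message, "something failed");
    assert_eq!(status_to_error(&status), Some(original));

    // A status that was not produced by the encoder decodes to None.
    let network_failure = Status {
        message: "connection reset".to_string(),
        details: Vec::new(),
    };
    assert_eq!(status_to_error(&network_failure), None);
}
```

The `Option` return type lets the caller tell a propagated query error apart from a transport-level failure and handle each differently.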


The PR ships both unit tests for all the error variants and an integration test that checks that errors are properly propagated over the network and can be recovered on the receiving end.

Collaborator

@NGA-TRAN NGA-TRAN left a comment

Very nice infrastructure, Gabriel. You make it very easy for us to send and use errors.

Comment on lines +5 to +13
pub struct ArrowErrorProto {
#[prost(string, optional, tag = "1")]
pub ctx: Option<String>,
#[prost(
oneof = "ArrowErrorInnerProto",
tags = "2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19"
)]
pub inner: Option<ArrowErrorInnerProto>,
}
Collaborator

@NGA-TRAN NGA-TRAN Aug 1, 2025

Question: If a new error number 20 is added in the future, can we simply add 20 here without worrying about backward compatibility?
Also, this means that whenever we upgrade DataFusion and there are new Arrow (DataFusion, Parquet, ...) errors, we have to add them here (or in the right error file), right?

Collaborator Author

@gabotechs gabotechs Aug 1, 2025

That's right. The only things we cannot do are reorder the existing tag numbers or delete them; adding new numbers (and therefore error variants) is backwards compatible.

If there's a DataFusion upgrade and a new error variant is added, we'll need to come here and add it ourselves, otherwise the Rust compiler will tell us there are some error variants not handled.
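
As a rough sketch of why the compiler catches this: a `match` without a wildcard arm must cover every variant, so a new upstream variant breaks the build until it gets its own arm. `UpstreamError`, `ErrorProto` and `to_proto` below are hypothetical names used only for illustration, and the sketch assumes the upstream enum is not marked `#[non_exhaustive]` (for such enums defined in another crate, Rust requires a wildcard arm and this check does not apply).

```rust
// Hypothetical stand-in for an upstream error enum such as DataFusionError.
#[derive(Debug, Clone, PartialEq)]
enum UpstreamError {
    Execution(String),
    Plan(String),
    // Adding a variant here, e.g. `Configuration(String)`, makes `to_proto`
    // fail to compile until a matching arm is added.
}

// Hypothetical mirror type standing in for the generated protobuf oneof.
#[derive(Debug, PartialEq)]
enum ErrorProto {
    Execution(String), // would carry tag 1
    Plan(String),      // would carry tag 2
}

fn to_proto(err: &UpstreamError) -> ErrorProto {
    // No `_ =>` arm on purpose: the match must stay exhaustive.
    match err {
        UpstreamError::Execution(m) => ErrorProto::Execution(m.clone()),
        UpstreamError::Plan(m) => ErrorProto::Plan(m.clone()),
    }
}

fn main() {
    let proto = to_proto(&UpstreamError::Plan("bad plan".to_string()));
    assert_eq!(proto, ErrorProto::Plan("bad plan".to_string()));
}
```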

Collaborator

@NGA-TRAN NGA-TRAN Aug 1, 2025

It would be perfect if we got a compile error.

Collection(DataFusionCollectionErrorProto),
#[prost(message, boxed, tag = "19")]
Shared(Box<DataFusionErrorProto>),
}
Collaborator

It is very nice that you put all the Arrow, Parquet, Schema, Context, SQL parser, ... error protos in here, so we only have to worry about serializing and deserializing one type.

Collaborator Author

Yeah, it was quite a bit of work, but it's something that's probably going to need close to zero maintenance, and it's going to help us a lot in knowing what goes wrong.

I expect initially to see more errors than actual queries running.

assert_eq!(
DataFusionError::Execution("something failed".to_string()).to_string(),
err.to_string()
);
Collaborator

I am amazed reading this test. We do not have to convert anything here. You already deserialize and convert the error in arrow_flight_read.rs.

Collaborator Author

Yeah, I expect these integration tests to be key not only for ensuring our changes remain healthy, but also for development and debugging.

If they feel like magic it means that we are on the right track.

Collaborator

@robtandy robtandy left a comment

This makes sense to me and will allow for propagation of root-cause errors; thank you @gabotechs!

@gabotechs gabotechs merged commit ebde2bf into main Aug 4, 2025
3 checks passed
@gabotechs gabotechs deleted the add-error-serde branch August 4, 2025 13:16