@gabotechs
Collaborator

@gabotechs gabotechs commented Aug 18, 2025

DataFusion allows users to provide their own ConfigExtensions that get carried by the SessionConfig through the execution of a query.

https://github.com/apache/datafusion/blob/b84ddfde0689e8ecb9b71521fd95c7245d68d4ca/datafusion/common/src/config.rs#L1174-L1223

Users can thread their own ConfigExtension in order to provide additional context to their own execution nodes or other custom implementations.


In Distributed DataFusion, sessions are not automatically propagated across network boundaries, and any ConfigExtension set in the head stage will be lost when doing a network hop.

This PR propagates user-provided ConfigExtensions in a distributed context by carrying them across network boundaries in gRPC metadata.

For example, if a user has this ConfigExtension:

    extensions_options! {
        pub struct CustomExtension {
            pub foo: String, default = "".to_string()
            pub bar: usize, default = 0
            pub baz: bool, default = false
        }
    }

    impl ConfigExtension for CustomExtension {
        const PREFIX: &'static str = "custom";
    }

It can be:

  1. added to the session in the head stage:
        let custom_extension = CustomExtension {
            foo: "foo".to_string(),
            bar: 1,
            baz: true,
        };
-       state.config_mut().options_mut().extensions.insert(custom_extension); // <- this is normal DataFusion
+       state.add_distributed_option_extension(custom_extension)?;
  2. recursively propagated across network calls in the Arrow Flight endpoint:
        #[async_trait]
        impl SessionBuilder for CustomSessionBuilder {
            async fn session_state(&self, mut state: SessionState) -> Result<SessionState, DataFusionError> {
+               state.propagate_distributed_option_extension::<CustomExtension>()?;
                Ok(state)
            }
        }

With just the two lines above, any user-provided CustomExtension will be able to cross network boundaries and be accessible anywhere in the plan, regardless of the physical node executing the query.
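
To make the mechanism more concrete, here is a minimal, std-only sketch of the general idea: flatten the extension's fields into string key/value pairs on the sending side and rebuild the extension on the receiving side. It is not the code in this PR. The `BTreeMap` stands in for real gRPC metadata, the struct is a plain stand-in for the macro-generated `CustomExtension`, and `HEADER_PREFIX`, `to_metadata`, and `from_metadata` are hypothetical names chosen for illustration.

```rust
use std::collections::BTreeMap;

// Hypothetical header prefix; the one actually used by the PR may differ.
const HEADER_PREFIX: &str = "x-datafusion-distributed-config-";

// Plain-struct stand-in for the user-defined ConfigExtension shown above.
#[derive(Debug, Clone, PartialEq, Default)]
pub struct CustomExtension {
    pub foo: String,
    pub bar: usize,
    pub baz: bool,
}

impl CustomExtension {
    // Mirrors `ConfigExtension::PREFIX` from the example above.
    pub const PREFIX: &'static str = "custom";

    // Sender side: one metadata entry per field, keyed as
    // "<HEADER_PREFIX><PREFIX>.<field>", with the value rendered as a string.
    pub fn to_metadata(&self) -> BTreeMap<String, String> {
        let key = |field: &str| format!("{}{}.{}", HEADER_PREFIX, Self::PREFIX, field);
        BTreeMap::from([
            (key("foo"), self.foo.clone()),
            (key("bar"), self.bar.to_string()),
            (key("baz"), self.baz.to_string()),
        ])
    }

    // Receiver side: start from the defaults and override every field that
    // is present in the incoming metadata. Unrelated entries are ignored.
    pub fn from_metadata(metadata: &BTreeMap<String, String>) -> Result<Self, String> {
        let mut ext = Self::default();
        let prefix = format!("{}{}.", HEADER_PREFIX, Self::PREFIX);
        for (k, v) in metadata {
            let Some(field) = k.strip_prefix(prefix.as_str()) else {
                continue;
            };
            match field {
                "foo" => ext.foo = v.clone(),
                "bar" => {
                    ext.bar = v
                        .parse::<usize>()
                        .map_err(|e| format!("invalid bar: {}", e))?
                }
                "baz" => {
                    ext.baz = v
                        .parse::<bool>()
                        .map_err(|e| format!("invalid baz: {}", e))?
                }
                other => return Err(format!("unknown field: {}", other)),
            }
        }
        Ok(ext)
    }
}
```

Under these assumptions, calling `to_metadata` on the head stage and `from_metadata` on the worker gives back an equal `CustomExtension`, and a request without any matching entries yields the defaults.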

@NGA-TRAN
Collaborator

NGA-TRAN commented Aug 18, 2025

I did not review the details, only the high-level design. Intuitively, this makes sense and I support it. What I am trying to figure out is where we would use it. I am working on a very specific example for explain analyze and will see whether it needs this, which would make it a use case. Otherwise, could you help come up with one?

@gabotechs
Collaborator Author

> I did not review the details, only the high-level design. Intuitively, this makes sense and I support it. What I am trying to figure out is where we would use it. I am working on a very specific example for explain analyze and will see whether it needs this, which would make it a use case. Otherwise, could you help come up with one?

This satisfies the same use cases as upstream's ConfigExtension: whatever reasons users of normal DataFusion have for wanting ConfigExtensions in their programs also apply when running in distributed mode.

One example: a JWT auth token that needs to make its way from an end-user request all the way to a leaf execution node in a distributed query with 10 stages.
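
A minimal, std-only sketch of that scenario, assuming each network hop re-attaches the extension to the outgoing request and the receiver rebuilds it before running its own stage. `AuthExtension`, `JWT_KEY`, and `run_stage` are hypothetical names used only here, and the `BTreeMap` stands in for gRPC metadata; none of this is the PR's actual code.

```rust
use std::collections::BTreeMap;

// Stand-in for the gRPC metadata attached to a request between two stages.
type Metadata = BTreeMap<String, String>;

// Hypothetical metadata key, chosen only for this example.
const JWT_KEY: &str = "x-auth.jwt";

// Hypothetical extension carrying the end user's token.
#[derive(Debug, Clone, PartialEq)]
pub struct AuthExtension {
    pub jwt: String,
}

impl AuthExtension {
    // Sender side: attach the token to the outgoing request metadata.
    pub fn to_metadata(&self) -> Metadata {
        Metadata::from([(JWT_KEY.to_string(), self.jwt.clone())])
    }

    // Receiver side: rebuild the extension, failing if the token is absent.
    pub fn from_metadata(metadata: &Metadata) -> Result<Self, String> {
        metadata
            .get(JWT_KEY)
            .map(|jwt| Self { jwt: jwt.clone() })
            .ok_or_else(|| format!("missing metadata key: {}", JWT_KEY))
    }
}

// Simulates one stage. A leaf (no hops left) returns the token it can see;
// any other stage serializes the extension, "sends" it over one hop, and the
// receiving stage rebuilds it and repeats the process.
pub fn run_stage(ext: &AuthExtension, hops_left: usize) -> Result<String, String> {
    if hops_left == 0 {
        return Ok(ext.jwt.clone());
    }
    let request_metadata = ext.to_metadata();
    let received = AuthExtension::from_metadata(&request_metadata)?;
    run_stage(&received, hops_left - 1)
}
```

A query with 10 stages involves 9 hops, so `run_stage(&ext, 9)` models the token travelling from the head stage down to the leaf.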

Collaborator

@fmonjalet fmonjalet left a comment

Mostly minor remarks, the design looks good to me!

@gabotechs gabotechs force-pushed the gabrielmusat/propagate-config-extensions branch from 0d7febf to 104e3fa Compare August 19, 2025 15:22
Collaborator

@robtandy robtandy left a comment

I totally understand the need for this now and this design looks good.

@gabotechs gabotechs merged commit 2715587 into main Aug 21, 2025
2 of 3 checks passed
@gabotechs gabotechs deleted the gabrielmusat/propagate-config-extensions branch August 21, 2025 12:49