Skip to content

make DefaultLogicalExtensionCodec support serialisation of build in file formats #16944

@milenkovicm

Description

@milenkovicm

Is your feature request related to a problem or challenge?

At the moment ballista overrides LogicalExtensionCodec::try_decode_file_format & LogicalExtensionCodec::try_encode_file_format providing support for:

file_format_codecs: vec![
            Arc::new(ParquetLogicalExtensionCodec {}),
            Arc::new(CsvLogicalExtensionCodec {}),
            Arc::new(JsonLogicalExtensionCodec {}),
            Arc::new(ArrowLogicalExtensionCodec {}),
            Arc::new(AvroLogicalExtensionCodec {}),
        ],

as seen at 1.

Should we want to integrate ballista with datafusion python we would need to provide a custom LogicalExtensionCodec implementing same logic or reuse ballista LogicalExtensionCodec implementation, which would complicate integration a bit.

As this file types are supported out of the box in datafusion would it make sense to implement encoder/decoder or them in DefaultLogicalExtensionCodec?

Describe the solution you'd like

Should this proposal make sense, implement support for it in DefaultLogicalExtensionCodec similar to what is supported in ballista already:

fn try_decode_file_format(
      &self,
      buf: &[u8],
      ctx: &datafusion::prelude::SessionContext,
  ) -> Result<Arc<dyn datafusion::datasource::file_format::FileFormatFactory>> {
      let proto = FileFormatProto::decode(buf)
          .map_err(|e| DataFusionError::Internal(e.to_string()))?;

      let codec = self
          .file_format_codecs
          .get(proto.encoder_position as usize)
          .ok_or(DataFusionError::Internal(
              "Can't find required codec in file codec list".to_owned(),
          ))?;

      codec.try_decode_file_format(&proto.blob, ctx)
  }

  fn try_encode_file_format(
      &self,
      buf: &mut Vec<u8>,
      node: Arc<dyn datafusion::datasource::file_format::FileFormatFactory>,
  ) -> Result<()> {
      let mut blob = vec![];
      let (encoder_position, _) =
          self.try_any(|codec| codec.try_encode_file_format(&mut blob, node.clone()))?;

      let proto = FileFormatProto {
          encoder_position,
          blob,
      };
      proto
          .encode(buf)
          .map_err(|e| DataFusionError::Internal(e.to_string()))
  }

as seen at 2

Describe alternatives you've considered

alternative would be to keep everything as it is.

Additional context

No response

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or requestgood first issueGood for newcomershelp wantedExtra attention is neededprotoRelated to proto crate

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions