- Updated the `Step` trait to use async methods for execution, enhancing the pipeline's ability to handle asynchronous operations.
- Modified various pipeline steps (CSV, Avro, Parquet, ORC, JSON, XLSX, YAML) to implement the new async `execute` method.
- Adjusted command implementations in `convert`, `head`, `tail`, and other features to await the execution of steps, ensuring proper async behavior.
- Enhanced tests to validate the new async functionality across different file formats and commands, improving overall robustness and performance.
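As a rough illustration of the async `execute` shape described above: the PR uses the `async-trait` crate, which rewrites an `async fn` in a trait into a method returning a pinned, boxed future. The dependency-free sketch below writes that form out by hand. `Batch`, `StepFuture`, `Head`, and `block_on` are hypothetical names for illustration only, not the project's actual types, and the real `Step` trait may take different arguments.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical stand-in for the pipeline's real record batches.
pub type Batch = Vec<String>;

/// Boxed future type, similar to what `#[async_trait]` generates for `async fn execute`.
pub type StepFuture<'a> = Pin<Box<dyn Future<Output = Result<Batch, String>> + Send + 'a>>;

/// A pipeline step whose execution must be awaited (assumed shape).
pub trait Step {
    fn execute<'a>(&'a self, input: Batch) -> StepFuture<'a>;
}

/// Hypothetical step that keeps only the first `n` rows, like a `head` command.
pub struct Head {
    pub n: usize,
}

impl Step for Head {
    fn execute<'a>(&'a self, input: Batch) -> StepFuture<'a> {
        let rows: Batch = input.into_iter().take(self.n).collect();
        // The result is already known, so wrap it in an immediately ready future.
        Box::pin(std::future::ready(Ok(rows)))
    }
}

/// Waker that does nothing; sufficient for the polling loop below.
struct NoopWaker;

impl Wake for NoopWaker {
    fn wake(self: Arc<Self>) {}
}

/// Minimal polling executor so the sketch runs without an async runtime.
/// A real command would `.await` the step inside its runtime instead.
pub fn block_on<'a, T>(mut fut: Pin<Box<dyn Future<Output = T> + Send + 'a>>) -> T {
    let waker = Waker::from(Arc::new(NoopWaker));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
        std::thread::yield_now();
    }
}
```

With this sketch, `block_on(Head { n: 2 }.execute(rows))` yields the first two rows. In the PR itself the commands await `execute` directly rather than using a hand-rolled executor.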
- Added `async-trait` as a dependency in `Cargo.toml` to support asynchronous trait methods.
- Updated the REPL evaluation logic to construct and execute pipelines asynchronously, improving performance and responsiveness.
- Refactored the `PipelineStage` enum to include a `Print` stage and modified the `exec_select` method to handle column specifications more effectively.
- Enhanced the display functionality for pipeline stages, providing clearer output for users during REPL interactions.
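One possible way to picture the `Print` stage and the stage display is sketched below. Only `PipelineStage` and `Print` come from the commit message; the `Read` and `Select` variants, their payloads, the output format, and the `describe_pipeline` helper are assumptions made for illustration.

```rust
use std::fmt;

/// Assumed shape of the stage enum; only `Print` is named in the commit message.
#[derive(Debug, Clone, PartialEq)]
pub enum PipelineStage {
    /// Hypothetical: read a file at the given path.
    Read(String),
    /// Hypothetical: keep only the listed columns.
    Select(Vec<String>),
    /// Print the current result.
    Print,
}

impl fmt::Display for PipelineStage {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        match self {
            PipelineStage::Read(path) => write!(f, "read({})", path),
            PipelineStage::Select(cols) => write!(f, "select({})", cols.join(", ")),
            PipelineStage::Print => write!(f, "print"),
        }
    }
}

/// Render a whole pipeline on one line, e.g. to show it in the REPL.
pub fn describe_pipeline(stages: &[PipelineStage]) -> String {
    stages
        .iter()
        .map(|stage| stage.to_string())
        .collect::<Vec<_>>()
        .join(" | ")
}
```

Implementing `Display` on the enum keeps the formatting in one place, so the REPL can show a pipeline with a single `describe_pipeline` call or by formatting individual stages.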
- Updated the `PipelineStage` enum to replace `Vec<String>` with `Vec<ColumnSpec>` for column selection, enhancing type safety and clarity.
- Refactored the `exec_select` method and related functions to handle `ColumnSpec`, allowing for both exact and case-insensitive column matching.
- Introduced `resolve_column_specs` function to resolve column specifications against the schema, improving the selection logic.
- Updated tests to validate the new column selection behavior, ensuring robust functionality across various scenarios.
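The column resolution described above might look roughly like the following. The names `ColumnSpec` and `resolve_column_specs` come from the commit message, but the variants, the signature, and the error type are assumptions; the real function presumably works against an Arrow or DataFusion schema rather than a slice of names.

```rust
/// Assumed shape of a column specification: exact or case-insensitive match.
#[derive(Debug, Clone, PartialEq)]
pub enum ColumnSpec {
    Exact(String),
    CaseInsensitive(String),
}

/// Resolve each spec to a concrete column name from `schema`.
/// Returns an error for the first spec that matches no column.
pub fn resolve_column_specs(
    schema: &[&str],
    specs: &[ColumnSpec],
) -> Result<Vec<String>, String> {
    specs
        .iter()
        .map(|spec| {
            let found = match spec {
                // Exact: the name must match byte for byte.
                ColumnSpec::Exact(name) => schema.iter().find(|col| **col == name.as_str()),
                // Case-insensitive: the first column equal ignoring ASCII case wins.
                ColumnSpec::CaseInsensitive(name) => {
                    schema.iter().find(|col| col.eq_ignore_ascii_case(name))
                }
            };
            found
                .map(|col| col.to_string())
                .ok_or_else(|| format!("column not found: {:?}", spec))
        })
        .collect()
}
```

Resolving to the schema's own spelling means later stages always see the canonical column name, regardless of how the user typed it.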
- Renamed `data_frame_reader` module to `dataframe` for consistency and clarity.
- Updated references in `pipeline.rs` and `convert.rs` to use the new `dataframe` module.
- Introduced `DataFrameSource` and `DataFrameWriter` structs in the new `dataframe.rs` file, encapsulating DataFrame reading and writing functionality.
- Enhanced the `read_to_batches` and `write_batches` functions to utilize the new DataFrame structures, improving modularity and readability.
Summary
- Convert the `Step` trait to async (`async_trait`) and update command paths (`convert`, `head`, `tail`) plus tests to await step execution.
- `src/pipeline/dataframe.rs` (renamed from `data_frame_reader.rs`) now holds `DataFrameSource`/`DataFrameWriter`, so they live with pipeline code.
- In the REPL, `eval` now returns stages and `execute_pipeline` then runs them; the resolved pipeline stages are printed before execution.

Why
These changes align pipeline step execution with async DataFusion usage, reduce cross-module coupling between CLI and pipeline internals, and make REPL execution flow clearer for future async stage expansion.