Array JSON support for DataFusion #27
Merged
zhuqi-lucas merged 3 commits into branch-51 on Jan 22, 2026
Pull request overview
This PR adds support for reading JSON files in array format [{...}, {...}] in addition to the existing line-delimited (NDJSON) format. The implementation adds a new format_array boolean option to JsonOptions, along with a compression_level field for future compression support.
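Here is a minimal sketch of the two input layouts and the new option. `JsonLayoutOptions` is a hypothetical stand-in for `JsonOptions`. The field types (`bool`, `Option<u32>`) and the `false` default for `format_array`, which keeps NDJSON as the default, are assumptions and not the PR's actual definitions. `layout_matches` is a hypothetical sanity check. The option itself selects the layout and does not sniff the data.

```rust
/// Hypothetical stand-in for DataFusion's `JsonOptions`; real field types
/// and defaults may differ.
#[derive(Debug, Clone, Default)]
pub struct JsonLayoutOptions {
    /// `false` (assumed default): newline-delimited JSON, one object per line.
    /// `true`: a single top-level array of objects.
    pub format_array: bool,
    /// Reserved for future compression support (type assumed).
    pub compression_level: Option<u32>,
}

/// The same two records in each layout.
pub const NDJSON_INPUT: &str = "{\"a\": 1}\n{\"a\": 2}\n";
pub const ARRAY_INPUT: &str = "[{\"a\": 1}, {\"a\": 2}]";

/// Returns true if the first non-whitespace byte of `input` is what the
/// configured layout expects: `[` for array format, `{` for NDJSON.
pub fn layout_matches(opts: &JsonLayoutOptions, input: &str) -> bool {
    let first = input.trim_start().bytes().next();
    matches!(
        (opts.format_array, first),
        (true, Some(b'[')) | (false, Some(b'{'))
    )
}
```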
Changes:
- Added `format_array` and `compression_level` fields to the `JsonOptions` protobuf and configuration structures
- Implemented JSON array format parsing in the `datasource-json` module with proper schema inference
- Added validation to prevent incompatible range-based scanning with array format
- Comprehensive test coverage, including unit tests and sqllogictest
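The range-scan restriction exists because NDJSON can be split at arbitrary byte offsets. Every record ends at a newline, so a reader that starts mid-file can skip ahead to the next newline. A top-level JSON array has no such boundary marker, because elements can span lines and contain commas and brackets. An array-format file therefore has to be read as a whole. The sketch below shows that check. `ScanUnit`, `UnsupportedRangeScan`, and `validate_scan` are hypothetical names, and the real code presumably reports a DataFusion error instead.

```rust
use std::fmt;

/// Hypothetical unit of scan work: a file plus an optional byte range.
#[derive(Debug, Clone)]
pub struct ScanUnit {
    pub path: String,
    /// `Some((start, end))` when the planner split the file into byte ranges.
    pub range: Option<(u64, u64)>,
}

/// Hypothetical error returned when a range scan is requested for array format.
#[derive(Debug, PartialEq)]
pub struct UnsupportedRangeScan(pub String);

impl fmt::Display for UnsupportedRangeScan {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "range-based scanning is not supported for JSON array format: {}", self.0)
    }
}

/// Rejects byte-range scans of array-format files; NDJSON ranges are allowed.
pub fn validate_scan(format_array: bool, unit: &ScanUnit) -> Result<(), UnsupportedRangeScan> {
    if format_array && unit.range.is_some() {
        return Err(UnsupportedRangeScan(unit.path.clone()));
    }
    Ok(())
}
```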
Reviewed changes
Copilot reviewed 14 out of 18 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| datafusion/proto-common/proto/datafusion_common.proto | Added compression_level and format_array fields to JsonOptions protobuf definition |
| datafusion/proto-common/src/generated/prost.rs | Generated protobuf code with new JsonOptions fields |
| datafusion/proto-common/src/generated/pbjson.rs | Generated JSON serialization code for new fields |
| datafusion/proto-common/src/to_proto/mod.rs | Serialization logic for JsonOptions (has type mismatch bug) |
| datafusion/proto-common/src/from_proto/mod.rs | Deserialization logic for JsonOptions (missing compression_level field) |
| datafusion/proto/src/generated/datafusion_proto_common.rs | Duplicate generated protobuf code for JsonOptions |
| datafusion/proto/src/logical_plan/file_formats.rs | Proto conversion for JsonOptions with correct type casting |
| datafusion/common/src/config.rs | Added compression_level and format_array to JsonOptions config |
| datafusion/datasource-json/src/source.rs | Implemented JSON array parsing logic with memory-based approach |
| datafusion/datasource-json/src/file_format.rs | Added schema inference for JSON array format |
| datafusion/datasource-json/Cargo.toml | Added serde_json dependency |
| datafusion/core/src/datasource/file_format/options.rs | Added format_array option to NdJsonReadOptions |
| datafusion/core/src/datasource/file_format/json.rs | Comprehensive unit tests for array format functionality |
| datafusion/core/tests/data/json_array.json | Test data file with JSON array format |
| datafusion/core/tests/data/json_empty_array.json | Test data file with empty JSON array |
| datafusion/sqllogictest/test_files/json.slt | Integration tests for JSON array format |
| datafusion-examples/examples/csv_json_opener.rs | Updated example to pass new format_array parameter |
| Cargo.lock | Updated with serde_json dependency |
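The `source.rs` row describes a memory-based approach. One plausible reading is that the whole file is loaded and its top-level array is split into records. The sketch below does that with the standard library only, then joins the records as NDJSON. `array_to_ndjson` is a hypothetical helper, and converting to NDJSON is an illustrative choice, not necessarily what the PR does (it adds `serde_json`, so it likely parses with that crate instead). The splitter tracks only nesting depth and string/escape state and does not validate each element. An empty array yields zero records, the case the `json_empty_array.json` test data covers.

```rust
/// Splits an in-memory top-level JSON array into element texts and joins
/// them as newline-delimited JSON. Elements themselves are not validated.
pub fn array_to_ndjson(input: &str) -> Result<String, String> {
    let inner = input
        .trim()
        .strip_prefix('[')
        .and_then(|s| s.strip_suffix(']'))
        .ok_or_else(|| "expected a top-level JSON array".to_string())?;

    let mut elements: Vec<&str> = Vec::new();
    let mut depth = 0usize; // nesting of {} and [] inside the current element
    let mut in_string = false;
    let mut escaped = false;
    let mut start = 0usize; // byte offset where the current element begins

    for (i, c) in inner.char_indices() {
        if in_string {
            // Inside a string, only an unescaped quote ends it.
            if escaped {
                escaped = false;
            } else if c == '\\' {
                escaped = true;
            } else if c == '"' {
                in_string = false;
            }
            continue;
        }
        match c {
            '"' => in_string = true,
            '{' | '[' => depth += 1,
            '}' | ']' => depth = depth.checked_sub(1).ok_or("unbalanced brackets")?,
            // A comma at depth 0 separates top-level array elements.
            ',' if depth == 0 => {
                elements.push(inner[start..i].trim());
                start = i + 1;
            }
            _ => {}
        }
    }
    if in_string || depth != 0 {
        return Err("unterminated string or unbalanced brackets".to_string());
    }
    let last = inner[start..].trim();
    if !last.is_empty() {
        elements.push(last);
    } else if !elements.is_empty() {
        return Err("trailing comma in array".to_string());
    }
    if elements.iter().any(|e| e.is_empty()) {
        return Err("empty array element".to_string());
    }
    Ok(elements.join("\n"))
}
```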
xudong963 approved these changes on Jan 22, 2026
zhuqi-lucas added a commit that referenced this pull request on Jan 30, 2026:
This reverts commit 8583685.
Array json support
apache#19924