Commit f475a0f
[logical-types] update working branch (#12812)
* Add support for external tables with qualified names (#12645)
* Make support schemas
* Set default name to table
* Remove print statements and stale comment
* Add tests for create table
* Fix typo
* Update datafusion/sql/src/statement.rs
Co-authored-by: Jonah Gao <[email protected]>
* convert create_external_table to objectname
* Add sqllogic tests
* Fix failing tests
---------
Co-authored-by: Jonah Gao <[email protected]>
* Fix Regex signature types (#12690)
* Fix Regex signature types
* Uncomment the shared tests in string_query.slt.part and removed tests copies everywhere else
* Test `LIKE` and `MATCH` with flags; Remove new tests from regexp.slt
* Refactor `ByteGroupValueBuilder` to use `MaybeNullBufferBuilder` (#12681)
* Fix malformed hex string literal in docs (#12708)
* Simplify match patterns in coercion rules (#12711)
Remove conditions where unnecessary.
Refactor to improve readability.
* Remove aggregate functions dependency on frontend (#12715)
* Remove aggregate functions dependency on frontend
DataFusion is a SQL query engine and also a reusable library for
building query engines. The core functionality should not depend on
frontend related functionalities like `sqlparser` or `datafusion-sql`.
* Remove duplicate license header
* Minor: Remove clone in `transform_to_states` (#12707)
* rm clone
Signed-off-by: jayzhan211 <[email protected]>
* fmt
Signed-off-by: jayzhan211 <[email protected]>
---------
Signed-off-by: jayzhan211 <[email protected]>
* Refactor tests for union sorting properties, add tests for unions and constants (#12702)
* Refactor tests for union sorting properties
* update doc test
* Undo import reordering
* remove unecessary static lifetimes
* Fix: support Qualified Wildcard in count aggregate function (#12673)
* Reduce code duplication in `PrimitiveGroupValueBuilder` with const generics (#12703)
* Reduce code duplication in `PrimitiveGroupValueBuilder` with const generics
* Fix docs
* Disallow duplicated qualified field names (#12608)
* Disallow duplicated qualified field names
* Fix tests
* Optimize base64/hex decoding by pre-allocating output buffers (~2x faster) (#12675)
* add bench
* replace macro with generic function
* remove duplicated code
* optimize base64/hex decode
* Allow DynamicFileCatalog support to query partitioned file (#12683)
* support to query partitioned table for dynamic file catalog
* cargo clippy
* split partitions inferring to another function
* Support `LIMIT` Push-down logical plan optimization for `Extension` nodes (#12685)
* Update trait `UserDefinedLogicalNodeCore`
Signed-off-by: Austin Liu <[email protected]>
* Update corresponding interface
Signed-off-by: Austin Liu <[email protected]>
Add rewrite rule for `push-down-limit` for `Extension`
Signed-off-by: Austin Liu <[email protected]>
* Add rewrite rule for `push-down-limit` for `Extension` and tests
Signed-off-by: Austin Liu <[email protected]>
* Update corresponding interface
Signed-off-by: Austin Liu <[email protected]>
* Reorganize to match guard
Signed-off-by: Austin Liu <[email protected]>
* Clena up
Signed-off-by: Austin Liu <[email protected]>
Clean up
Signed-off-by: Austin Liu <[email protected]>
---------
Signed-off-by: Austin Liu <[email protected]>
* Fix AvroReader: Add union resolving for nested struct arrays (#12686)
* Add union resolving for nested struct arrays
* Add test
* Change test
* Reproduce index error
* fmt
---------
Co-authored-by: Andrew Lamb <[email protected]>
* Adds macros for creating `WindowUDF` and `WindowFunction` expression (#12693)
* Adds macro for udwf singleton
* Adds a doc comment parameter to macro
* Add doc comment for `create_udwf` macro
* Uses default constructor
* Update `Cargo.lock` in `datafusion-cli`
* Fixes: expand `$FN_NAME` in doc strings
* Adds example for macro usage
* Renames macro
* Improve doc comments
* Rename udwf macro
* Minor: doc copy edits
* Adds macro for creating fluent-style expression API
* Adds support for 1 or more parameters in expression function
* Rewrite doc comments
* Rename parameters
* Minor: formatting
* Adds doc comment for `create_udwf_expr` macro
* Improve example docs
* Hides extraneous code in doc comments
* Add a one-line readme
* Adds doc test assertions + minor formatting fixes
* Adds common macro for defining user-defined window functions
* Adds doc comment for `define_udwf_and_expr`
* Defines `RowNumber` using common macro
* Add usage example for common macro
* Adds usage for custom constructor
* Add examples for remaining patterns
* Improve doc comments for usage examples
* Rewrite inner line docs
* Rewrite `create_udwf_expr!` doc comments
* Minor doc improvements
* Fix doc test and usage example
* Add inline comments for macro patterns
* Minor: change doc comment in example
* Support unparsing plans with both Aggregation and Window functions (#12705)
* Support unparsing plans with both Aggregation and Window functions (#35)
* Fix unparsing for aggregation grouping sets
* Add test for grouping set unparsing
* Update datafusion/sql/src/unparser/utils.rs
Co-authored-by: Jax Liu <[email protected]>
* Update datafusion/sql/src/unparser/utils.rs
Co-authored-by: Jax Liu <[email protected]>
* Update
* More tests
---------
Co-authored-by: Jax Liu <[email protected]>
* Fix strpos invocation with dictionary and null (#12712)
In 1b3608d `strpos` signature was
modified to indicate it supports dictionary as input argument, but the
invoke method doesn't support them.
* docs: Update DataFusion introduction to clarify that DataFusion does provide an "out of the box" query engine (#12666)
* Update DataFusion introduction to show that DataFusion offers packaged versions for end users
* change order
* Update README.md
Co-authored-by: Andrew Lamb <[email protected]>
* refine wording and update user guide for consistency
* prettier
---------
Co-authored-by: Andrew Lamb <[email protected]>
* Framework for generating function docs from embedded code documentation (#12668)
* Initial work on #12432 to allow for generation of udf docs from embedded documentation in the code
* Add missing license header.
* Fixed examples.
* Fixing a really weird RustRover/wsl ... something. No clue what happened there.
* permission change
* Cargo fmt update.
* Refactored Documentation to allow it to be used in a const.
* Add documentation for syntax_example
* Refactoring Documentation based on PR feedback.
* Cargo fmt update.
* Doc update
* Fixed copy/paste error.
* Minor text updates.
---------
Co-authored-by: Andrew Lamb <[email protected]>
* Add IMDB(JOB) Benchmark [2/N] (imdb queries) (#12529)
* imdb dataset
* cargo fmt
* Add 113 queries for IMDB(JOB)
Signed-off-by: Austin Liu <[email protected]>
* Add `get_query_sql` from `query_id` string
Signed-off-by: Austin Liu <[email protected]>
* Fix CSV reader & Remove Parquet partition
Signed-off-by: Austin Liu <[email protected]>
* Add benchmark IMDB runner
Signed-off-by: Austin Liu <[email protected]>
* Add `run_imdb` script
Signed-off-by: Austin Liu <[email protected]>
* Add checker for imdb option
Signed-off-by: Austin Liu <[email protected]>
* Add SLT for IMDB
Signed-off-by: Austin Liu <[email protected]>
* Fix `get_query_sql()` for CI roundtrip test
Signed-off-by: Austin Liu <[email protected]>
Fix `get_query_sql()` for CI roundtrip test
Signed-off-by: Austin Liu <[email protected]>
Fix `get_query_sql()` for CI roundtrip test
Signed-off-by: Austin Liu <[email protected]>
* Clean up
Signed-off-by: Austin Liu <[email protected]>
* Add missing license
Signed-off-by: Austin Liu <[email protected]>
* Add IMDB(JOB) queries `2b` to `5c`
Signed-off-by: Austin Liu <[email protected]>
* Add `INCLUDE_IMDB` in CI verify-benchmark-results
Signed-off-by: Austin Liu <[email protected]>
* Prepare IMDB dataset
Signed-off-by: Austin Liu <[email protected]>
Prepare IMDB dataset
Signed-off-by: Austin Liu <[email protected]>
* use uint as id type
* format
* Seperate `tpch` and `imdb` benchmarking CI jobs
Signed-off-by: Austin Liu <[email protected]>
Fix path
Signed-off-by: Austin Liu <[email protected]>
Fix path
Signed-off-by: Austin Liu <[email protected]>
Remove `tpch` in `imdb` benchmark
Signed-off-by: Austin Liu <[email protected]>
* Remove IMDB(JOB) slt in CI
Signed-off-by: Austin Liu <[email protected]>
Remove IMDB(JOB) slt in CI
Signed-off-by: Austin Liu <[email protected]>
---------
Signed-off-by: Austin Liu <[email protected]>
Co-authored-by: DouPache <[email protected]>
* Minor: avoid clone while calculating union equivalence properties (#12722)
* Minor: avoid clone while calculating union equivalence properties
* Update datafusion/physical-expr/src/equivalence/properties.rs
* fmt
* Simplify streaming_merge function parameters (#12719)
* simplify streaming_merge function parameters
* revert test change
* change StreamingMergeConfig into builder pattern
* Fix links on docs index page (#12750)
* Provide field and schema metadata missing on cross joins, and union with null fields. (#12729)
* test: reproducer for missing schema metadata on cross join
* fix: pass thru schema metadata on cross join
* fix: preserve metadata when transforming to view types
* test: reproducer for missing field metadata in left hand NULL field of union
* fix: preserve field metadata from right side of union
* chore: safe indexing
* Minor: Update string tests for strpos (#12739)
* Apply `type_union_resolution` to array and values (#12753)
* cleanup make array coercion rule
Signed-off-by: jayzhan211 <[email protected]>
* change to type union resolution
Signed-off-by: jayzhan211 <[email protected]>
* change value too
Signed-off-by: jayzhan211 <[email protected]>
* fix tpyo
Signed-off-by: jayzhan211 <[email protected]>
---------
Signed-off-by: jayzhan211 <[email protected]>
* Add `DocumentationBuilder::with_standard_argument` to reduce copy/paste (#12747)
* Add `DocumentationBuilder::with_standard_expression` to reduce copy/paste
* fix doc
* fix standard argument
* Update docs
* Improve documentation to explain what is different
* fix `equal_to` in `PrimitiveGroupValueBuilder` (#12758)
* fix `equal_to` in `PrimitiveGroupValueBuilder`.
* fix typo.
* add uts.
* reduce calling of `is_null`.
* Minor: doc how field name is to be set (#12757)
* Fix `equal_to` in `ByteGroupValueBuilder` (#12770)
* Fix `equal_to` in `ByteGroupValueBuilder`
* refactor null_equal_to
* Update datafusion/physical-plan/src/aggregates/group_values/group_column.rs
* Allow simplification even when nullable (#12746)
The nullable requirement seem to have been added in #1401 but as far as
I can tell they are not needed for these 2 cases.
I think this can be shown using this truth table: (generated using
datafusion-cli without this patch)
```
> CREATE TABLE t (v BOOLEAN) as values (true), (false), (NULL);
> select t.v, t2.v, t.v AND (t.v OR t2.v), t.v OR (t.v AND t2.v) from t cross join t as t2;
+-------+-------+---------------------+---------------------+
| v | v | t.v AND t.v OR t2.v | t.v OR t.v AND t2.v |
+-------+-------+---------------------+---------------------+
| true | true | true | true |
| true | false | true | true |
| true | | true | true |
| false | true | false | false |
| false | false | false | false |
| false | | false | false |
| | true | | |
| | false | | |
| | | | |
+-------+-------+---------------------+---------------------+
```
And it seems Spark applies both of these and DuckDB applies only the
first one.
* Fix unnest conjunction with selecting wildcard expression (#12760)
* fix unnest statement with wildcard expression
* add commnets
* Improve `round` scalar function unparsing for Postgres (#12744)
* Postgres: enforce required `NUMERIC` type for `round` scalar function (#34)
Includes initial support for dialects to override scalar functions unparsing
* Document scalar_function_to_sql_overrides fn
* Fix stack overflow calculating projected orderings (#12759)
* Fix stack overflow calculating projected orderings
* fix docs
* Port / Add Documentation for `VarianceSample` and `VariancePopulation` (#12742)
* Upgrade arrow/parquet to `53.1.0` / fix clippy (#12724)
* Update to arrow/parquet 53.1.0
* Update some API
* update for changed file sizes
* Use non deprecated APIs
* Use ParquetMetadataReader from @etseidl
* remove upstreamed implementation
* Update CSV schema
* Use upstream is_null and is_not_null kernels
* feat: add support for Substrait ExtendedExpression (#12728)
* Add support for serializing and deserializing Substrait ExtendedExpr message
* Address clippy reviews
* Reuse existing rename method
* Transformed::new_transformed: Fix documentation formatting (#12787)
Co-authored-by: Andrew Lamb <[email protected]>
* fix: Correct results for grouping sets when columns contain nulls (#12571)
* Fix grouping sets behavior when data contains nulls
* PR suggestion comment
* Update new test case
* Add grouping_id to the logical plan
* Add doc comment next to INTERNAL_GROUPING_ID
* Fix unparsing of Aggregate with grouping sets
---------
Co-authored-by: Andrew Lamb <[email protected]>
* Migrate documentation for all string functions from scalar_functions.md to code (#12775)
* Added documentation for string and unicode functions.
* Fixed issues with aliases.
* Cargo fmt.
* Minor doc fixes.
* Update docs for var_pop/samp
---------
Co-authored-by: Andrew Lamb <[email protected]>
* Account for constant equivalence properties in union, tests (#12562)
* Minor: clarify comment about empty dependencies (#12786)
* Introduce Signature::String and return error if input of `strpos` is integer (#12751)
* fix sig
Signed-off-by: jayzhan211 <[email protected]>
* fix
Signed-off-by: jayzhan211 <[email protected]>
* fix error
Signed-off-by: jayzhan211 <[email protected]>
* fix all signature
Signed-off-by: jayzhan211 <[email protected]>
* fix all signature
Signed-off-by: jayzhan211 <[email protected]>
* change default type
Signed-off-by: jayzhan211 <[email protected]>
* clippy
Signed-off-by: jayzhan211 <[email protected]>
* fix docs
Signed-off-by: jayzhan211 <[email protected]>
* rm deadcode
Signed-off-by: jayzhan211 <[email protected]>
* cleanup
Signed-off-by: jayzhan211 <[email protected]>
* cleanup
Signed-off-by: jayzhan211 <[email protected]>
* rm test
Signed-off-by: jayzhan211 <[email protected]>
---------
Signed-off-by: jayzhan211 <[email protected]>
* Minor: improve docs on MovingMin/MovingMax (#12790)
* Add slt tests (#12721)
---------
Signed-off-by: jayzhan211 <[email protected]>
Signed-off-by: Austin Liu <[email protected]>
Co-authored-by: OussamaSaoudi <[email protected]>
Co-authored-by: Jonah Gao <[email protected]>
Co-authored-by: Dmitrii Blaginin <[email protected]>
Co-authored-by: Andrew Lamb <[email protected]>
Co-authored-by: Tomoaki Kawada <[email protected]>
Co-authored-by: Piotr Findeisen <[email protected]>
Co-authored-by: Jay Zhan <[email protected]>
Co-authored-by: HuSen <[email protected]>
Co-authored-by: Emil Ejbyfeldt <[email protected]>
Co-authored-by: Simon Vandel Sillesen <[email protected]>
Co-authored-by: Jax Liu <[email protected]>
Co-authored-by: Austin Liu <[email protected]>
Co-authored-by: JonasDev1 <[email protected]>
Co-authored-by: jcsherin <[email protected]>
Co-authored-by: Sergei Grebnov <[email protected]>
Co-authored-by: Andy Grove <[email protected]>
Co-authored-by: Bruce Ritchie <[email protected]>
Co-authored-by: DouPache <[email protected]>
Co-authored-by: mertak-synnada <[email protected]>
Co-authored-by: Bryce Mecum <[email protected]>
Co-authored-by: wiedld <[email protected]>
Co-authored-by: kamille <[email protected]>
Co-authored-by: Weston Pace <[email protected]>
Co-authored-by: Val Lorentz <[email protected]>1 parent 454db7e commit f475a0f
File tree
287 files changed
+9951
-3563
lines changed- .github/workflows
- benchmarks
- queries/imdb
- src
- bin
- imdb
- datafusion-cli
- datafusion
- common/src
- core
- src
- bin
- catalog_common
- dataframe
- datasource
- avro_to_arrow
- file_format
- listing
- tests
- expr_api
- fuzz_cases
- user_defined
- expr-common/src
- type_coercion
- expr
- src
- logical_plan
- type_coercion
- functions-aggregate
- src
- functions-nested/src
- functions-window
- src
- functions
- benches
- src
- core
- crypto
- datetime
- encoding
- math
- regex
- string
- unicode
- optimizer
- src
- analyzer
- optimize_projections
- simplify_expressions
- test
- tests
- physical-expr-common/src
- physical-expr/src
- equivalence
- expressions
- physical-plan/src
- aggregates
- group_values
- joins
- repartition
- sorts
- proto/tests/cases
- sqllogictest
- src
- test_files
- string
- sql
- src
- expr
- unparser
- tests
- cases
- substrait
- src/logical_plan
- tests/cases
- dev
- docs/source
- user-guide
- sql
Some content is hidden
Large Commits have some content hidden by default. Use the searchbox below for content that may be hidden.
287 files changed
+9951
-3563
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
521 | 521 | | |
522 | 522 | | |
523 | 523 | | |
524 | | - | |
| 524 | + | |
525 | 525 | | |
526 | 526 | | |
527 | 527 | | |
| |||
542 | 542 | | |
543 | 543 | | |
544 | 544 | | |
| 545 | + | |
| 546 | + | |
| 547 | + | |
| 548 | + | |
| 549 | + | |
545 | 550 | | |
546 | 551 | | |
547 | 552 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
70 | 70 | | |
71 | 71 | | |
72 | 72 | | |
73 | | - | |
| 73 | + | |
74 | 74 | | |
75 | 75 | | |
76 | | - | |
| 76 | + | |
77 | 77 | | |
78 | 78 | | |
79 | | - | |
80 | | - | |
| 79 | + | |
| 80 | + | |
81 | 81 | | |
82 | 82 | | |
83 | | - | |
| 83 | + | |
84 | 84 | | |
85 | 85 | | |
86 | | - | |
87 | | - | |
88 | | - | |
| 86 | + | |
| 87 | + | |
| 88 | + | |
89 | 89 | | |
90 | 90 | | |
91 | 91 | | |
| |||
126 | 126 | | |
127 | 127 | | |
128 | 128 | | |
129 | | - | |
| 129 | + | |
130 | 130 | | |
131 | 131 | | |
132 | 132 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
42 | 42 | | |
43 | 43 | | |
44 | 44 | | |
45 | | - | |
| 45 | + | |
| 46 | + | |
| 47 | + | |
| 48 | + | |
| 49 | + | |
| 50 | + | |
| 51 | + | |
| 52 | + | |
| 53 | + | |
| 54 | + | |
| 55 | + | |
| 56 | + | |
| 57 | + | |
46 | 58 | | |
47 | 59 | | |
48 | 60 | | |
49 | | - | |
| 61 | + | |
50 | 62 | | |
51 | 63 | | |
52 | | - | |
53 | 64 | | |
54 | 65 | | |
55 | 66 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
211 | 211 | | |
212 | 212 | | |
213 | 213 | | |
| 214 | + | |
214 | 215 | | |
215 | 216 | | |
216 | 217 | | |
| |||
239 | 240 | | |
240 | 241 | | |
241 | 242 | | |
| 243 | + | |
| 244 | + | |
| 245 | + | |
242 | 246 | | |
243 | 247 | | |
244 | 248 | | |
| |||
510 | 514 | | |
511 | 515 | | |
512 | 516 | | |
| 517 | + | |
| 518 | + | |
| 519 | + | |
| 520 | + | |
| 521 | + | |
| 522 | + | |
| 523 | + | |
| 524 | + | |
| 525 | + | |
| 526 | + | |
513 | 527 | | |
514 | 528 | | |
515 | 529 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
| 1 | + | |
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
| 1 | + | |
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
| 1 | + | |
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
| 1 | + | |
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
| 1 | + | |
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
| 1 | + | |
0 commit comments