feat: add tiledb_query_add_predicate API to parse a SQL expression string into a QueryCondition
#5566
Open: rroelke wants to merge 56 commits into main from rr/core-25-add-predicate
+4,437
−1,377
Changes from 13 commits
94eeeff tiledb_query_add_predicate (rroelke)
ad3f43c Remove assert post rebase (rroelke)
c10ef48 Fill in WHERE a IS NOT NULL example (rroelke)
6017582 Fill in other global order unit test examples (rroelke)
303e831 unit-query-add-predicate.cc tests for other readers (rroelke)
7aaedd5 Move datafusion session to query instead of ContextResources (rroelke)
3668cda Tweak example comment (rroelke)
4e2eebf Enumeration::create const std::vector (rroelke)
bd16e7b Tweak example (rroelke)
40cb894 cpp example (rroelke)
2139297 clippy (rroelke)
f10d157 Add test on evolved schema (rroelke)
2b72ba4 Change test names (rroelke)
1d962ec Fix non-rust build (rroelke)
b0d36d3 Fix osx build errors (rroelke)
f246bd7 Fix C API example print_elem buffer (rroelke)
744728d Comment new/updated test support functions (rroelke)
bead919 Remove unnecessary ExternType impl (rroelke)
02fb0ca Self-review code comments (rroelke)
ec829e0 Attempt to fix query_add_predicate error (rroelke)
f2ddb4b Undo clang-format-17 string splits (rroelke)
176a021 Change C++ API to use std::string (rroelke)
d1c7680 Remove logger_->status (rroelke)
653c890 SQL dialect in API comments (rroelke)
a00301c Query add predicate to in progress query (rroelke)
be32216 Fix bizarre -Warray-bounds error for b_data_offsets (rroelke)
763a3e2 Query add predicate with query condition (rroelke)
1ab697f Add tests demonstrating field escaping (rroelke)
2c05e33 Add some FFI for sm Buffer (rroelke)
666e138 FFI use_enumeration (rroelke)
069162b Bindings for accessing enumeration contents and locating them in a sc… (rroelke)
9758e7f ArrowSchema => ArrowArraySchema, contains dyn ArrowArray for enumerat… (rroelke)
b41d1aa Move definitions to .cc file to avoid multiple definition error (rroelke)
d78e434 Add WhichSchema to distinguish schema for view vs. storage, passes un… (rroelke)
860d3d5 Fix wrong write size in unit_query_condition.cc (rroelke)
9d8e4ff Fix UTF-8, unit_query_condition passes (rroelke)
d6889bd Stopgap for enumerations in WhichSchema::View (rroelke)
d5caf2c clippy (rroelke)
314b169 Merge remote-tracking branch 'origin/main' into rr/core-25-add-predicate (rroelke)
9046ef1 RestClientFactory can construct in place (rroelke)
33c8a72 Handle TILEDB_RUST=OFF in unit-query-add-predicate.cc (rroelke)
a3e6617 HeapMemoryLinter ignores oxidize dir (rroelke)
e18ccd7 Fix empty dimension tuple (rroelke)
f1d7eb5 make format (rroelke)
27a1be3 Merge remote-tracking branch 'origin/main' into rr/core-25-add-predicate (rroelke)
572ba80 Remove non-experimental version of add_predicate (rroelke)
0edb237 Merge remote-tracking branch 'origin/main' into rr/core-25-add-predicate (rroelke)
77c0f85 Merge remote-tracking branch 'origin/main' into rr/core-25-add-predicate (rroelke)
1823e8a Fix error message grammar (rroelke)
c242c16 Split no predicate and WHERE TRUE in examples (rroelke)
d8eb419 cargo update and remove cxxbridge version pin (rroelke)
66ff1fc Single QueryPredicates FFI boundary compiles, not tested (rroelke)
5d2521e Single QueryPredicates FFI boundary passes existing tests (rroelke)
48195cf unit_query_condition and API test both pass (rroelke)
6f7d080 self review (rroelke)
199ea2f clippy (rroelke)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,348 @@ | ||
| /** | ||
| * @file query_add_predicate.cc | ||
| * | ||
| * @section LICENSE | ||
| * | ||
| * The MIT License | ||
| * | ||
| * @copyright Copyright (c) 2025 TileDB, Inc. | ||
| * | ||
| * Permission is hereby granted, free of charge, to any person obtaining a copy | ||
| * of this software and associated documentation files (the "Software"), to deal | ||
| * in the Software without restriction, including without limitation the rights | ||
| * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell | ||
| * copies of the Software, and to permit persons to whom the Software is | ||
| * furnished to do so, subject to the following conditions: | ||
| * | ||
| * The above copyright notice and this permission notice shall be included in | ||
| * all copies or substantial portions of the Software. | ||
| * | ||
| * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR | ||
| * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, | ||
| * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE | ||
| * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER | ||
| * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, | ||
| * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN | ||
| * THE SOFTWARE. | ||
| * | ||
| * @section DESCRIPTION | ||
| * | ||
| * This example demonstrates using the `QueryExperimental::add_predicate` | ||
| * API to add one or more text predicates to a query. This API parses a SQL | ||
| * predicate and uses it to filter results inside of the storage engine | ||
| * before returning them to the user. | ||
| * | ||
| * The array used in this example is identical to that of the | ||
| * `query_condition_sparse` example. The first group of predicates which | ||
| * run are text equivalents of the predicates in that example, and produce | ||
| * the same results. | ||
| * | ||
| * This example also has additional queries which use predicates which | ||
| * combine dimensions and attributes, highlighting a capability which | ||
| * cannot be replicated by just subarrays and query conditions. | ||
| */ | ||
|
|
||
| #include <iostream> | ||
| #include <optional> | ||
| #include <tiledb/tiledb> | ||
| #include <tiledb/tiledb_experimental> | ||
| #include <vector> | ||
|
|
||
| using namespace tiledb; | ||
|
|
||
| // Name of array. | ||
| std::string array_name("array_query_add_predicate"); | ||
|
|
||
| // Enumeration variants | ||
| const std::vector<std::string> us_states = { | ||
| "alabama", | ||
| "alaska", | ||
| "arizona", | ||
| "arkansas", | ||
| "california", | ||
| "colorado", | ||
| "connecticut", | ||
| "etc"}; | ||
|
|
||
| /** | ||
| * @brief Function to print the values of all the attributes for one | ||
| * index of this array. | ||
| * | ||
| * @param a Attribute a's value. | ||
| * @param b Attribute b's value. | ||
| * @param c Attribute c's value. | ||
| * @param d Attribute d's value. | ||
| * @param e Attribute e's enumeration key, printed as its us_states variant. | ||
| */ | ||
| void print_elem( | ||
| std::optional<int> a, | ||
| std::string b, | ||
| int32_t c, | ||
| float d, | ||
| std::optional<uint8_t> e) { | ||
| std::cout << "{" << (a.has_value() ? std::to_string(a.value()) : "null") | ||
| << ", " << b << ", " << c << ", " << d << ", " | ||
| << (e.has_value() ? | ||
| (e.value() < us_states.size() ? | ||
| us_states[e.value()] : | ||
| "(invalid key " + std::to_string(e.value()) + ")") : | ||
| "null") | ||
| << "}" << std::endl; | ||
| } | ||
|
|
||
| /** | ||
| * @brief Function to create the TileDB array used in this example. | ||
| * The array will be 1D, with a single dimension "index". | ||
| * The bounds on the index will be 0 through 9, inclusive. | ||
| * | ||
| * The array has five attributes: | ||
| * - "a" (type int, nullable) | ||
| * - "b" (type std::string) | ||
| * - "c" (type int32_t) | ||
| * - "d" (type float) | ||
| * - "e" (type uint8_t, nullable, keys into the "us_states" enumeration) | ||
| * | ||
| * @param ctx The context. | ||
| */ | ||
| void create_array(Context& ctx) { | ||
| // Creating the domain and the dimensions. | ||
| Domain domain(ctx); | ||
| domain.add_dimension(Dimension::create<int32_t>(ctx, "index", {{0, 9}})); | ||
|
|
||
| // The array will be sparse. | ||
| ArraySchema schema(ctx, TILEDB_SPARSE); | ||
| schema.set_domain(domain).set_order({{TILEDB_ROW_MAJOR}}); | ||
|
|
||
| // Adding the attributes of the array to the array schema. | ||
| Attribute a = Attribute::create<int>(ctx, "a").set_nullable(true); | ||
| schema.add_attribute(a) | ||
| .add_attribute(Attribute::create<std::string>(ctx, "b")) | ||
| .add_attribute(Attribute::create<int32_t>(ctx, "c")) | ||
| .add_attribute(Attribute::create<float>(ctx, "d")); | ||
|
|
||
| // Create enumeration and an attribute using it | ||
| ArraySchemaExperimental::add_enumeration( | ||
| ctx, | ||
| schema, | ||
| Enumeration::create(ctx, std::string("us_states"), us_states)); | ||
|
|
||
| { | ||
| auto e = Attribute::create<uint8_t>(ctx, "e").set_nullable(true); | ||
| AttributeExperimental::set_enumeration_name(ctx, e, "us_states"); | ||
| schema.add_attribute(e); | ||
| } | ||
|
|
||
| // Create the (empty) array. | ||
| Array::create(ctx, array_name, schema); | ||
| } | ||
|
|
||
| /** | ||
| * @brief Executes a write on the array named by array_name, | ||
| * storing the following data in it. The table has one column per | ||
| * dimension/attribute; column e shows the enumeration variant for each key. | ||
| * | ||
| * index | a | b | c | d | e | ||
| * --------------------------------------------- | ||
| * 0 | null | alice | 0 | 4.1 | arizona | ||
| * 1 | 2 | bob | 0 | 3.4 | etc | ||
| * 2 | null | craig | 0 | 5.6 | colorado | ||
| * 3 | 4 | dave | 0 | 3.7 | connecticut | ||
| * 4 | null | erin | 0 | 2.3 | null | ||
| * 5 | 6 | frank | 0 | 1.7 | arkansas | ||
| * 6 | null | grace | 1 | 3.8 | etc | ||
| * 7 | 8 | heidi | 2 | 4.9 | etc | ||
| * 8 | null | ivan | 3 | 3.2 | colorado | ||
| * 9 | 10 | judy | 4 | 3.1 | california | ||
| * | ||
| * @param ctx The context. | ||
| */ | ||
| void write_array(Context& ctx) { | ||
| // Create data buffers that store the values to be written in. | ||
| std::vector<int32_t> dim_data = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}; | ||
| std::vector<int32_t> a_data = {0, 2, 0, 4, 0, 6, 0, 8, 0, 10}; | ||
| std::vector<uint8_t> a_data_validity = {0, 1, 0, 1, 0, 1, 0, 1, 0, 1}; | ||
| std::vector<std::string> b_strs = { | ||
| "alice", | ||
| "bob", | ||
| "craig", | ||
| "dave", | ||
| "erin", | ||
| "frank", | ||
| "grace", | ||
| "heidi", | ||
| "ivan", | ||
| "judy"}; | ||
| std::string b_data = ""; | ||
| std::vector<uint64_t> b_data_offsets; | ||
| for (const auto& elem : b_strs) { | ||
| b_data_offsets.push_back(b_data.size()); | ||
| b_data += elem; | ||
| } | ||
| std::vector<int32_t> c_data = {0, 0, 0, 0, 0, 0, 1, 2, 3, 4}; | ||
| std::vector<float> d_data = { | ||
| 4.1, 3.4, 5.6, 3.7, 2.3, 1.7, 3.8, 4.9, 3.2, 3.1}; | ||
|
|
||
| std::vector<uint8_t> e_keys = {2, 7, 5, 6, 100, 3, 7, 7, 5, 4}; | ||
| std::vector<uint8_t> e_validity = {1, 1, 1, 1, 0, 1, 1, 1, 1, 1}; | ||
|
|
||
| // Execute the write query. | ||
| Array array_w(ctx, array_name, TILEDB_WRITE); | ||
| Query query_w(ctx, array_w); | ||
| query_w.set_layout(TILEDB_UNORDERED) | ||
| .set_data_buffer("index", dim_data) | ||
| .set_data_buffer("a", a_data) | ||
| .set_validity_buffer("a", a_data_validity) | ||
| .set_data_buffer("b", b_data) | ||
| .set_offsets_buffer("b", b_data_offsets) | ||
| .set_data_buffer("c", c_data) | ||
| .set_data_buffer("d", d_data) | ||
| .set_data_buffer("e", e_keys) | ||
| .set_validity_buffer("e", e_validity); | ||
|
|
||
| query_w.submit(); | ||
| query_w.finalize(); | ||
| array_w.close(); | ||
| } | ||
|
|
||
| /** | ||
| * @brief Executes the read query for the array created in write_array. | ||
| * | ||
| * @param ctx The context. | ||
| * @param predicates The SQL predicates to add to the query. | ||
| */ | ||
| void read_array_with_predicates( | ||
| Context& ctx, std::vector<std::string> predicates) { | ||
| const unsigned reserve_cells = 16; | ||
|
|
||
| // Create data buffers to read the values into. | ||
| std::vector<int> a_data(reserve_cells); | ||
| std::vector<uint8_t> a_data_validity(reserve_cells); | ||
|
|
||
| // We initialize the string b_data to have enough space to | ||
| // contain the total length of all of the strings written | ||
| // into attribute b | ||
| std::string b_data; | ||
| b_data.resize(256); | ||
|
|
||
| std::vector<uint64_t> b_data_offsets(reserve_cells); | ||
| std::vector<int32_t> c_data(reserve_cells); | ||
| std::vector<float> d_data(reserve_cells); | ||
| std::vector<uint8_t> e_keys(reserve_cells); | ||
| std::vector<uint8_t> e_validity(reserve_cells); | ||
|
|
||
| // Execute the read query. | ||
| Array array(ctx, array_name, TILEDB_READ); | ||
| Query query(ctx, array); | ||
| query.set_layout(TILEDB_GLOBAL_ORDER) | ||
| .set_data_buffer("a", a_data) | ||
| .set_validity_buffer("a", a_data_validity) | ||
| .set_data_buffer("b", b_data) | ||
| .set_offsets_buffer("b", b_data_offsets) | ||
| .set_data_buffer("c", c_data) | ||
| .set_data_buffer("d", d_data) | ||
| .set_data_buffer("e", e_keys) | ||
| .set_validity_buffer("e", e_validity); | ||
|
|
||
| for (const auto& predicate : predicates) { | ||
| QueryExperimental::add_predicate(ctx, query, predicate.c_str()); | ||
| } | ||
|
|
||
| query.submit(); | ||
|
|
||
| // Collect the results of the read query. The number of cells that | ||
| // passed the predicates is in num_elements_result, and the total | ||
| // length of their b values is in b_str_length. The starting offset | ||
| // of each returned b value is in b_data_offsets. | ||
| auto table = query.result_buffer_elements_nullable(); | ||
| size_t num_elements_result = std::get<1>(table["c"]); | ||
| uint64_t b_str_length = std::get<1>(table["b"]); | ||
| // Terminate the offsets right after the last returned cell. The buffer | ||
| // holds reserve_cells entries, more than the 10 cells in the array, so | ||
| // this index is in bounds (push_back would append past the unused slots). | ||
| b_data_offsets[num_elements_result] = b_str_length; | ||
|
|
||
| // Here we print all the elements that are returned by the query. | ||
| for (size_t i = 0; i < num_elements_result; ++i) { | ||
| // We pass in nullopt if the data is invalid, per the validity buffer. | ||
| print_elem( | ||
| (a_data_validity[i] ? std::optional{a_data[i]} : std::nullopt), | ||
| b_data.substr( | ||
| b_data_offsets[i], b_data_offsets[i + 1] - b_data_offsets[i]), | ||
| c_data[i], | ||
| d_data[i], | ||
| (e_validity[i] ? std::optional{e_keys[i]} : std::nullopt)); | ||
| } | ||
|
|
||
| query.finalize(); | ||
| array.close(); | ||
| } | ||
|
|
||
| int main() { | ||
| // Create the context. | ||
| Context ctx; | ||
| VFS vfs(ctx); | ||
| if (!vfs.is_dir(array_name)) { | ||
| // Create and write data to the array. | ||
| create_array(ctx); | ||
| write_array(ctx); | ||
| } | ||
|
|
||
| // EXAMPLES FROM query_condition_sparse.cc EXAMPLE | ||
|
|
||
| // Printing the entire array. | ||
| std::cout << "WHERE TRUE" << std::endl; | ||
| read_array_with_predicates(ctx, {}); | ||
|
||
| std::cout << std::endl; | ||
|
|
||
| // Execute a read query with query condition `a = null`. | ||
| std::cout << "WHERE a IS NULL" << std::endl; | ||
| read_array_with_predicates(ctx, {"a IS NULL"}); | ||
| std::cout << std::endl; | ||
|
|
||
| // Execute a read query with query condition `b < "eve"`. | ||
| std::cout << "WHERE b < 'eve'" << std::endl; | ||
| read_array_with_predicates(ctx, {"b < 'eve'"}); | ||
| std::cout << std::endl; | ||
|
|
||
| // Execute a read query with query condition `c >= 1`. | ||
| std::cout << "WHERE c >= 1" << std::endl; | ||
| read_array_with_predicates(ctx, {"c >= 1"}); | ||
| std::cout << std::endl; | ||
|
|
||
| // Execute a read query with query condition `3.0f <= d AND d <= 4.0f`. | ||
| std::cout << "WHERE d BETWEEN 3.0 AND 4.0" << std::endl; | ||
| read_array_with_predicates(ctx, {"d BETWEEN 3.0 AND 4.0"}); | ||
| std::cout << std::endl; | ||
|
|
||
| // Execute a read query with query condition `3.0f <= d AND d <= 4.0f AND a != | ||
| // null AND b < \"eve\"`. | ||
| std::cout << "WHERE d BETWEEN 3.0 AND 4.0 AND a IS NOT NULL AND b < 'eve'" | ||
| << std::endl; | ||
| read_array_with_predicates( | ||
| ctx, {"d BETWEEN 3.0 AND 4.0", "a IS NOT NULL", "b < 'eve'"}); | ||
| std::cout << std::endl; | ||
|
|
||
| // BEGIN EXAMPLES WITH ENUMERATIONS | ||
| // error is expected as enumerations are not supported yet | ||
| std::cout << "WHERE e = 'california'" << std::endl; | ||
| try { | ||
| read_array_with_predicates(ctx, {"e = 'california'"}); | ||
| // should not get here | ||
| return TILEDB_ERR; | ||
| } catch (const std::exception& e) { | ||
| std::cout << e.what() << std::endl; | ||
| } | ||
| std::cout << std::endl; | ||
|
|
||
| // BEGIN EXAMPLES WITH NO EQUIVALENT | ||
| // these examples cannot be expressed using subarray + query condition | ||
|
|
||
| // query condition does not have functions, here we use coalesce | ||
| std::cout << "WHERE coalesce(a, 2) + c < index" << std::endl; | ||
| read_array_with_predicates(ctx, {"coalesce(a, 2) + c < index"}); | ||
| std::cout << std::endl; | ||
|
|
||
| // FIXME: this is query-condition-able, use arithmetic | ||
| std::cout << "WHERE a > 6 OR a IS NULL" << std::endl; | ||
| read_array_with_predicates(ctx, {"a > 6 OR a IS NULL"}); | ||
| std::cout << std::endl; | ||
|
|
||
| return 0; | ||
| } | ||
I doubt that an example in C is useful to have. As I have said in the past, the C API is hard to use directly, and users should prefer using a higher-level language.
I think the C examples are useful for developers, if not customers (who I agree are better off using higher-level APIs). I have referred to them often.