Changes from 45 commits
56 commits
94eeeff
tiledb_query_add_predicate
rroelke Apr 4, 2025
ad3f43c
Remove assert post rebase
rroelke Jun 30, 2025
c10ef48
Fill in WHERE a IS NOT NULL example
rroelke Jun 30, 2025
6017582
Fill in other global order unit test examples
rroelke Jun 30, 2025
303e831
unit-query-add-predicate.cc tests for other readers
rroelke Jun 30, 2025
7aaedd5
Move datafusion session to query instead of ContextResources
rroelke Jul 1, 2025
3668cda
Tweak example comment
rroelke Jul 1, 2025
4e2eebf
Enumeration::create const std::vector
rroelke Jul 1, 2025
bd16e7b
Tweak example
rroelke Jul 1, 2025
40cb894
cpp example
rroelke Jul 1, 2025
2139297
clippy
rroelke Jul 1, 2025
f10d157
Add test on evolved schema
rroelke Jul 1, 2025
2b72ba4
Change test names
rroelke Jul 1, 2025
1d962ec
Fix non-rust build
rroelke Jul 1, 2025
b0d36d3
Fix osx build errors
rroelke Jul 1, 2025
f246bd7
Fix C API example print_elem buffer
rroelke Jul 1, 2025
744728d
Comment new/updated test support functions
rroelke Jul 1, 2025
bead919
Remove unnecessary ExternType impl
rroelke Jul 1, 2025
02fb0ca
Self-review code comments
rroelke Jul 1, 2025
ec829e0
Attempt to fix query_add_predicate error
rroelke Jul 1, 2025
f2ddb4b
Undo clang-format-17 string splits
rroelke Jul 2, 2025
176a021
Change C++ API to use std::string
rroelke Jul 2, 2025
d1c7680
Remove logger_->status
rroelke Jul 2, 2025
653c890
SQL dialect in API comments
rroelke Jul 2, 2025
a00301c
Query add predicate to in progress query
rroelke Jul 2, 2025
be32216
Fix bizarre -Warray-bounds error for b_data_offsets
rroelke Jul 2, 2025
763a3e2
Query add predicate with query condition
rroelke Jul 2, 2025
1ab697f
Add tests demonstrating field escaping
rroelke Jul 2, 2025
2c05e33
Add some FFI for sm Buffer
rroelke Jul 3, 2025
666e138
FFI use_enumeration
rroelke Jul 3, 2025
069162b
Bindings for accessing enumeration contents and locating them in a sc…
rroelke Jul 3, 2025
9758e7f
ArrowSchema => ArrowArraySchema, contains dyn ArrowArray for enumerat…
rroelke Jul 3, 2025
b41d1aa
Move definitions to .cc file to avoid multiple definition error
rroelke Jul 3, 2025
d78e434
Add WhichSchema to distinguish schema for view vs. storage, passes un…
rroelke Jul 3, 2025
860d3d5
Fix wrong write size in unit_query_condition.cc
rroelke Jul 3, 2025
9d8e4ff
Fix UTF-8, unit_query_condition passes
rroelke Jul 3, 2025
d6889bd
Stopgap for enumerations in WhichSchema::View
rroelke Jul 3, 2025
d5caf2c
clippy
rroelke Jul 3, 2025
314b169
Merge remote-tracking branch 'origin/main' into rr/core-25-add-predicate
rroelke Jul 21, 2025
9046ef1
RestClientFactory can construct in place
rroelke Jul 21, 2025
33c8a72
Handle TILEDB_RUST=OFF in unit-query-add-predicate.cc
rroelke Jul 21, 2025
a3e6617
HeapMemoryLinter ignores oxidize dir
rroelke Jul 21, 2025
e18ccd7
Fix empty dimension tuple
rroelke Jul 21, 2025
f1d7eb5
make format
rroelke Jul 21, 2025
27a1be3
Merge remote-tracking branch 'origin/main' into rr/core-25-add-predicate
rroelke Sep 15, 2025
572ba80
Remove non-experimental version of add_predicate
rroelke Oct 22, 2025
0edb237
Merge remote-tracking branch 'origin/main' into rr/core-25-add-predicate
rroelke Oct 22, 2025
77c0f85
Merge remote-tracking branch 'origin/main' into rr/core-25-add-predicate
rroelke Nov 12, 2025
1823e8a
Fix error message grammar
rroelke Nov 12, 2025
c242c16
Split no predicate and WHERE TRUE in examples
rroelke Nov 12, 2025
d8eb419
cargo update and remove cxxbridge version pin
rroelke Nov 12, 2025
66ff1fc
Single QueryPredicates FFI boundary compiles, not tested
rroelke Nov 13, 2025
5d2521e
Single QueryPredicates FFI boundary passes existing tests
rroelke Nov 14, 2025
48195cf
unit_query_condition and API test both pass
rroelke Nov 17, 2025
6f7d080
self review
rroelke Nov 17, 2025
199ea2f
clippy
rroelke Nov 17, 2025
503 changes: 503 additions & 0 deletions examples/c_api/query_add_predicate.c
Member

I doubt that an example in C is useful to have. As I have said in the past, the C API is hard to use directly, and users should prefer using a higher-level language.

Member Author

I think the C examples are useful for developers, if not customers (who I agree are better off using higher-level APIs). I have referred to them often.

Large diffs are not rendered by default.

358 changes: 358 additions & 0 deletions examples/cpp_api/query_add_predicate.cc
@@ -0,0 +1,358 @@
/**
* @file query_add_predicate.cc
*
* @section LICENSE
*
* The MIT License
*
* @copyright Copyright (c) 2025 TileDB, Inc.
*
* Permission is hereby granted, free of charge, to any person obtaining a copy
* of this software and associated documentation files (the "Software"), to deal
* in the Software without restriction, including without limitation the rights
* to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
* copies of the Software, and to permit persons to whom the Software is
* furnished to do so, subject to the following conditions:
*
* The above copyright notice and this permission notice shall be included in
* all copies or substantial portions of the Software.
*
* THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
* IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
* FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
* AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
* LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
* OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
* THE SOFTWARE.
*
* @section DESCRIPTION
*
* This example demonstrates using the `QueryExperimental::add_predicate`
* API to add one or more text predicates to a query. This API parses a SQL
* predicate and uses it to filter results inside of the storage engine
* before returning them to the user.
*
* The array used in this example is identical to that of the
* `query_condition_sparse` example. The first group of predicates which
* run are text equivalents of the predicates in that example, and produce
* the same results.
*
* This example also has additional queries which use predicates which
* combine dimensions and attributes, highlighting a capability which
* cannot be replicated by just subarrays and query conditions.
*/

#include <iostream>
#include <optional>
#include <tiledb/tiledb>
#include <tiledb/tiledb_experimental>
#include <vector>

using namespace tiledb;

// Name of array.
std::string array_name("array_query_add_predicate");

// Enumeration variants
const std::vector<std::string> us_states = {
"alabama",
"alaska",
"arizona",
"arkansas",
"california",
"colorado",
"connecticut",
"etc"};

/**
* @brief Function to print the values of all the attributes for one
* index of this array.
*
* @param a Attribute a's value, or nullopt if the cell is null.
* @param b Attribute b's value.
* @param c Attribute c's value.
* @param d Attribute d's value.
* @param e Attribute e's enumeration key, or nullopt if the cell is null.
*/
void print_elem(
std::optional<int> a,
std::string b,
int32_t c,
float d,
std::optional<uint8_t> e) {
std::cout << "{" << (a.has_value() ? std::to_string(a.value()) : "null")
<< ", " << b << ", " << c << ", " << d << ", "
<< (e.has_value() ?
(e.value() < us_states.size() ?
us_states[e.value()] :
"(invalid key " + std::to_string(e.value()) + ")") :
"null")
<< "}" << std::endl;
}

/**
* @brief Function to create the TileDB array used in this example.
* The array is 1D and sparse, with a single dimension "index".
* The bounds on the index are 0 through 9, inclusive.
*
* The array has five attributes:
* - "a" (type int, nullable)
* - "b" (type std::string)
* - "c" (type int32_t)
* - "d" (type float)
* - "e" (type uint8_t, nullable, keys into the "us_states" enumeration)
*
* @param ctx The context.
*/
void create_array(Context& ctx) {
// Creating the domain and the dimensions.
Domain domain(ctx);
domain.add_dimension(Dimension::create<int32_t>(ctx, "index", {{0, 9}}));

// The array will be sparse.
ArraySchema schema(ctx, TILEDB_SPARSE);
schema.set_domain(domain).set_order({{TILEDB_ROW_MAJOR}});

// Adding the attributes of the array to the array schema.
Attribute a = Attribute::create<int>(ctx, "a").set_nullable(true);
schema.add_attribute(a)
.add_attribute(Attribute::create<std::string>(ctx, "b"))
.add_attribute(Attribute::create<int32_t>(ctx, "c"))
.add_attribute(Attribute::create<float>(ctx, "d"));

// Create enumeration and an attribute using it
ArraySchemaExperimental::add_enumeration(
ctx,
schema,
Enumeration::create(ctx, std::string("us_states"), us_states));

{
auto e = Attribute::create<uint8_t>(ctx, "e").set_nullable(true);
AttributeExperimental::set_enumeration_name(ctx, e, "us_states");
schema.add_attribute(e);
}

// Create the (empty) array.
Array::create(ctx, array_name, schema);
}

/**
* @brief Execute a write on the array created in create_array,
* which stores the following data in the array. The table
* is organized by dimension/attribute. Attribute e is shown as the
* enumeration value that its key refers to.
*
* index | a | b | c | d | e
* ----------------------------------------------
* 0 | null | alice | 0 | 4.1 | arizona
* 1 | 2 | bob | 0 | 3.4 | etc
* 2 | null | craig | 0 | 5.6 | colorado
* 3 | 4 | dave | 0 | 3.7 | connecticut
* 4 | null | erin | 0 | 2.3 | null
* 5 | 6 | frank | 0 | 1.7 | arkansas
* 6 | null | grace | 1 | 3.8 | etc
* 7 | 8 | heidi | 2 | 4.9 | etc
* 8 | null | ivan | 3 | 3.2 | colorado
* 9 | 10 | judy | 4 | 3.1 | california
*
* @param ctx The context.
*/
void write_array(Context& ctx) {
// Create data buffers that store the values to be written in.
std::vector<int32_t> dim_data = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
std::vector<int32_t> a_data = {0, 2, 0, 4, 0, 6, 0, 8, 0, 10};
std::vector<uint8_t> a_data_validity = {0, 1, 0, 1, 0, 1, 0, 1, 0, 1};
std::vector<std::string> b_strs = {
"alice",
"bob",
"craig",
"dave",
"erin",
"frank",
"grace",
"heidi",
"ivan",
"judy"};
std::string b_data = "";
std::vector<uint64_t> b_data_offsets;
for (const auto& elem : b_strs) {
b_data_offsets.push_back(b_data.size());
b_data += elem;
}
std::vector<int32_t> c_data = {0, 0, 0, 0, 0, 0, 1, 2, 3, 4};
std::vector<float> d_data = {
4.1, 3.4, 5.6, 3.7, 2.3, 1.7, 3.8, 4.9, 3.2, 3.1};

std::vector<uint8_t> e_keys = {2, 7, 5, 6, 100, 3, 7, 7, 5, 4};
std::vector<uint8_t> e_validity = {1, 1, 1, 1, 0, 1, 1, 1, 1, 1};

// Execute the write query.
Array array_w(ctx, array_name, TILEDB_WRITE);
Query query_w(ctx, array_w);
query_w.set_layout(TILEDB_UNORDERED)
.set_data_buffer("index", dim_data)
.set_data_buffer("a", a_data)
.set_validity_buffer("a", a_data_validity)
.set_data_buffer("b", b_data)
.set_offsets_buffer("b", b_data_offsets)
.set_data_buffer("c", c_data)
.set_data_buffer("d", d_data)
.set_data_buffer("e", e_keys)
.set_validity_buffer("e", e_validity);

query_w.submit();
query_w.finalize();
array_w.close();
}

/**
* @brief Executes the read query for the array created in write_array.
*
* @param ctx The context.
* @param predicates The SQL predicate strings to add to the query.
* An empty list reads the whole array.
*/
void read_array_with_predicates(
Context& ctx, std::vector<std::string> predicates) {
const unsigned reserve_cells = 16;

// Create data buffers to read the values into.
std::vector<int> a_data(reserve_cells);
std::vector<uint8_t> a_data_validity(reserve_cells);

// We initialize the string b_data to have enough space to
// contain the total length of all of the strings written
// into attribute b
std::string b_data;
b_data.resize(256);

std::vector<uint64_t> b_data_offsets(reserve_cells);
std::vector<int32_t> c_data(reserve_cells);
std::vector<float> d_data(reserve_cells);
std::vector<uint8_t> e_keys(reserve_cells);
std::vector<uint8_t> e_validity(reserve_cells);

// reserve additional space so we can push a trailing offset
// to make the printing logic more straightforward
// (this should not be necessary but without this the `push_back`
// flags -Werror=array-bounds in some compilers)
b_data_offsets.reserve(reserve_cells + 1);

// Execute the read query.
Array array(ctx, array_name, TILEDB_READ);
Query query(ctx, array);
query.set_layout(TILEDB_GLOBAL_ORDER)
.set_data_buffer("a", a_data)
.set_validity_buffer("a", a_data_validity)
.set_data_buffer("b", b_data)
.set_offsets_buffer("b", b_data_offsets)
.set_data_buffer("c", c_data)
.set_data_buffer("d", d_data)
.set_data_buffer("e", e_keys)
.set_validity_buffer("e", e_validity);

for (const auto& predicate : predicates) {
QueryExperimental::add_predicate(ctx, query, predicate);
}

query.submit();

// Collect the results of the read query. The number of cells
// in the filtered result is in num_elements_result.
// The total length of the filtered strings stored in b_data
// is in b_str_length, and the offsets of the individual strings
// are in b_data_offsets.
auto table = query.result_buffer_elements_nullable();
size_t num_elements_result = std::get<1>(table["c"]);
uint64_t b_str_length = std::get<1>(table["b"]);
if (num_elements_result < b_data_offsets.size()) {
b_data_offsets[num_elements_result] = b_str_length;
} else {
b_data_offsets.push_back(b_str_length);
}

// Here we print all the elements that are returned by the query.
for (size_t i = 0; i < num_elements_result; ++i) {
// We pass in nullopt if the data is invalid, per the validity buffer.
print_elem(
(a_data_validity[i] ? std::optional{a_data[i]} : std::nullopt),
b_data.substr(
b_data_offsets[i], b_data_offsets[i + 1] - b_data_offsets[i]),
c_data[i],
d_data[i],
(e_validity[i] ? std::optional{e_keys[i]} : std::nullopt));
}

query.finalize();
array.close();
}

int main() {
// Create the context.
Context ctx;
VFS vfs(ctx);
if (!vfs.is_dir(array_name)) {
// Create and write data to the array.
create_array(ctx);
write_array(ctx);
}

// EXAMPLES FROM query_condition_sparse.cc EXAMPLE

// Print the entire array. No predicates are added, which selects the
// same cells as WHERE TRUE.
std::cout << "WHERE TRUE" << std::endl;
read_array_with_predicates(ctx, {});
std::cout << std::endl;

// Execute a read query with query condition `a = null`.
std::cout << "WHERE a IS NULL" << std::endl;
read_array_with_predicates(ctx, {"a IS NULL"});
std::cout << std::endl;

// Execute a read query with query condition `b < "eve"`.
std::cout << "WHERE b < 'eve'" << std::endl;
read_array_with_predicates(ctx, {"b < 'eve'"});
std::cout << std::endl;

// Execute a read query with query condition `c >= 1`.
std::cout << "WHERE c >= 1" << std::endl;
read_array_with_predicates(ctx, {"c >= 1"});
std::cout << std::endl;

// Execute a read query with query condition `3.0f <= d AND d <= 4.0f`.
std::cout << "WHERE d BETWEEN 3.0 AND 4.0" << std::endl;
read_array_with_predicates(ctx, {"d BETWEEN 3.0 AND 4.0"});
std::cout << std::endl;

// Execute a read query with query condition `3.0f <= d AND d <= 4.0f AND a !=
// null AND b < \"eve\"`.
std::cout << "WHERE d BETWEEN 3.0 AND 4.0 AND a IS NOT NULL AND b < 'eve'"
<< std::endl;
read_array_with_predicates(
ctx, {"d BETWEEN 3.0 AND 4.0", "a IS NOT NULL", "b < 'eve'"});
std::cout << std::endl;

// BEGIN EXAMPLES WITH ENUMERATIONS
// error is expected as enumerations are not supported yet
std::cout << "WHERE e = 'california'" << std::endl;
try {
read_array_with_predicates(ctx, {"e = 'california'"});
// should not get here
return TILEDB_ERR;
} catch (const std::exception& e) {
std::cout << e.what() << std::endl;
}
std::cout << std::endl;

// BEGIN EXAMPLES WITH NO EQUIVALENT
// these examples cannot be expressed using subarray + query condition

// query condition does not have functions, here we use coalesce
std::cout << "WHERE coalesce(a, 2) + c < index" << std::endl;
read_array_with_predicates(ctx, {"coalesce(a, 2) + c < index"});
std::cout << std::endl;

// FIXME: this is query-condition-able, use arithmetic
std::cout << "WHERE a > 6 OR a IS NULL" << std::endl;
read_array_with_predicates(ctx, {"a > 6 OR a IS NULL"});
std::cout << std::endl;

return 0;
}
6 changes: 6 additions & 0 deletions scripts/linter.py
@@ -141,6 +141,12 @@ def accept_path(self, file_name: str) -> bool:
path_components = file_name.split(os.sep)
if 'test' in path_components or 'test-support' in path_components:
return False

# the Rust/C++ inter-op using Rust's `cxx` crate can only pass values from
# C++ to Rust using std::unique_ptr
if 'oxidize' in path_components:
return False

return path_components[-1] not in heap_memory_ignored_files


1 change: 1 addition & 0 deletions test/CMakeLists.txt
@@ -114,6 +114,7 @@ set(TILEDB_UNIT_TEST_SOURCES
src/unit-ordered-dim-label-reader.cc
src/unit-tile-metadata.cc
src/unit-tile-metadata-generator.cc
src/unit-query-add-predicate.cc
src/unit-query-plan.cc
src/unit-ReadCellSlabIter.cc
src/unit-Reader.cc