diff --git a/.github/workflows/linux-ci.yml b/.github/workflows/linux-ci.yml index 51c42058..7d87cacb 100644 --- a/.github/workflows/linux-ci.yml +++ b/.github/workflows/linux-ci.yml @@ -169,7 +169,6 @@ jobs: conan install . --output-folder=build --build=missing -s build_type=RelWithDebInfo -s compiler.cppstd=20 -o "plotjuggler_core/*:with_tests=True" - -o "plotjuggler_core/*:with_parquet_example=False" - name: Save Conan cache to ghcr.io # Only push from the canonical repo on real pushes (forks lack write @@ -208,7 +207,6 @@ jobs: -DCMAKE_BUILD_TYPE=RelWithDebInfo -DCMAKE_C_COMPILER_LAUNCHER=ccache -DCMAKE_CXX_COMPILER_LAUNCHER=ccache - -DPJ_BUILD_PARQUET_IMPORT_EXAMPLE=OFF -DPJ_ENABLE_ABI_CHECK=ON - name: Build diff --git a/.github/workflows/macos-ci.yml b/.github/workflows/macos-ci.yml index 24999752..6865d251 100644 --- a/.github/workflows/macos-ci.yml +++ b/.github/workflows/macos-ci.yml @@ -30,14 +30,12 @@ jobs: conan install . --output-folder=build --build=missing -s build_type=RelWithDebInfo -s compiler.cppstd=20 -o "plotjuggler_core/*:with_tests=True" - -o "plotjuggler_core/*:with_parquet_example=False" - name: Configure run: > cmake -S . -B build -G Ninja -DCMAKE_TOOLCHAIN_FILE=${{ github.workspace }}/build/conan_toolchain.cmake -DCMAKE_BUILD_TYPE=RelWithDebInfo - -DPJ_BUILD_PARQUET_IMPORT_EXAMPLE=OFF - name: Build run: cmake --build build diff --git a/.github/workflows/release.yml b/.github/workflows/release.yml index d821f128..ac869d8c 100644 --- a/.github/workflows/release.yml +++ b/.github/workflows/release.yml @@ -261,5 +261,5 @@ jobs: ``` See [README.md](https://github.com/PlotJuggler/plotjuggler_core/blob/main/README.md) - for available components (`base`, `datastore`, `plugin_sdk`, `plugin_host`) + for available components (`base`, `plugin_sdk`, `plugin_host`) and consumer examples. 
diff --git a/.github/workflows/windows-ci.yml b/.github/workflows/windows-ci.yml index 175376c8..7dfad0ba 100644 --- a/.github/workflows/windows-ci.yml +++ b/.github/workflows/windows-ci.yml @@ -177,7 +177,6 @@ jobs: -DCMAKE_TOOLCHAIN_FILE=${{ github.workspace }}/build/conan_toolchain.cmake -DCMAKE_C_COMPILER_LAUNCHER=sccache -DCMAKE_CXX_COMPILER_LAUNCHER=sccache - -DPJ_BUILD_PARQUET_IMPORT_EXAMPLE=OFF - name: Build run: cmake --build build --config Release diff --git a/CLAUDE.md b/CLAUDE.md index d2ee1da2..44e48695 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -2,10 +2,15 @@ ## Project Overview -PlotJuggler Core — C++20 foundation libraries for PlotJuggler storage, plugin SDKs, and host-side +PlotJuggler Core — C++20 foundation libraries that make up the PlotJuggler plugin SDK and host-side plugin loading. **Read-only submodule** inside PJ4: consumed as-is; changes happen in this repo, not in the PJ4 superproject. This file is the single navigation node for the whole submodule — the -three modules below have no own CLAUDE.md. +two modules below have no own CLAUDE.md. + +> The columnar storage engine (`pj_datastore`) used to live here. It now lives in the PlotJuggler +> application repo as a top-level module: plugins reach storage only through the C ABI defined in +> `pj_base` (the host-side write implementations are not part of the SDK), so the engine does not +> belong in the plugin SDK. ### Modules @@ -15,8 +20,6 @@ three modules below have no own CLAUDE.md. SceneEntities, RobotDescription, CameraInfo, Log, ImageAnnotations, FrameTransforms) and their 14 wire codecs (RobotDescription carries source text as-is — no codec), the C-ABI protocol headers for DataSource/MessageParser/Toolbox + the C++ SDK base classes / host-view helpers built on them. -- **pj_datastore** — columnar storage engine (`DataEngine`) + `ObjectStore` (media/opaque blobs) + - `DerivedEngine` (fmt, tsl::robin_map, nanoarrow). Plugin-data host implementations live here. 
- **pj_plugins** — host-side loaders + RAII handles + plugin discovery/catalog for four plugin families (DataSource, MessageParser, Dialog, Toolbox), config-envelope helpers, and the **dialog C ABI** (`pj_plugins/dialog_protocol/`). Note the split: the DataSource/MessageParser/Toolbox C-ABI @@ -24,7 +27,6 @@ three modules below have no own CLAUDE.md. ### Dependency graph -- `pj_datastore` → `pj_base` (+ fmt, nanoarrow) - `pj_plugins` → `pj_base` (+ nlohmann/json) ## Read path @@ -53,11 +55,6 @@ documentation check before commit. | `docs/toolbox-porting-gap-analysis.md` | Historical PJ3→PJ4 toolbox SDK gap analysis (most gaps now closed; read as context, not current reference) | | `V4_STORE.md` | ObjectStore plugin ABI: services, ownership rules, lazy fetch | -**Datastore** (`pj_datastore/docs/`): `REQUIREMENTS.md` (data model, ingest contract, schema -evolution, query) · `ARCHITECTURE.md` (domain model, layers, encoding, DerivedEngine) · -`USER_GUIDE.md` (plugin-author write/read patterns, ValueRef, TypedNull) · -`OBJECT_STORE_DESIGN.md` (lazy-fetch blobs, retention). - **Plugin system** (`pj_plugins/docs/`): `REQUIREMENTS.md` (families, capability system, config contract) · `ARCHITECTURE.md` (C ABI protocols, SDK base classes, host loaders, dialog protocol) · `data-source-guide.md` · `message-parser-guide.md` · `dialog-plugin-guide.md` · `toolbox-guide.md`. @@ -123,7 +120,7 @@ or push a release without the user's go-ahead. ## Instructions Glossary - **"Read all documentation"** — read every `.md` in the tree (`find . -name '*.md'`), including - `docs/`, `pj_datastore/docs/`, `pj_plugins/docs/`. + `docs/` and `pj_plugins/docs/`. - **"Update the documentation"** — correct any doc made outdated/inaccurate this session; if a doc disagrees with code, fix the doc to match reality; add info whose absence caused a bug. 
- **"Check documentation"** — review the docs related to the changed module/API; confirm they still diff --git a/CMakeLists.txt b/CMakeLists.txt index e0318488..15e1860d 100644 --- a/CMakeLists.txt +++ b/CMakeLists.txt @@ -16,13 +16,7 @@ include(PjPluginManifest) option(PJ_ASSERT_THROWS "Use exceptions instead of assert() for PJ_ASSERT" OFF) option(PJ_ENABLE_SANITIZERS "Enable ASAN for Debug builds" OFF) option(PJ_ENABLE_TSAN "Enable ThreadSanitizer for Debug builds" OFF) -option(PJ_INSTALL_SDK "Install plotjuggler_core CMake package (base/datastore/plugin_sdk/plugin_host)" OFF) -option( - PJ_BUILD_PARQUET_IMPORT_EXAMPLE - "Build parquet_import example (requires full Arrow C++ and Parquet)" - ON -) -option(PJ_BUILD_DATASTORE "Build pj_datastore module (requires nanoarrow)" ON) +option(PJ_INSTALL_SDK "Install plotjuggler_core CMake package (base/plugin_sdk/plugin_host)" OFF) option(PJ_BUILD_PORTED_PLUGINS "Build pj_ported_plugins (ported plugins collection)" ON) option(PJ_BUILD_TESTS "Build tests, benchmarks, and examples" ON) option(PJ_ENABLE_ABI_CHECK "Enable abidiff-based ABI drift gate (requires libabigail)" OFF) @@ -87,87 +81,6 @@ target_link_libraries(pj_internal_fmt INTERFACE ${PJ_FMT_TARGET}) # package under the capitalised name `FastFloat`. 
find_package(FastFloat REQUIRED) -if(PJ_BUILD_DATASTORE) - find_package(tsl-robin-map REQUIRED) - if(PJ_BUILD_TESTS) - find_package(benchmark CONFIG REQUIRED) - endif() - -# --- nanoarrow core --- - -find_package(nanoarrow CONFIG QUIET) - -set(PJ_NANOARROW_TARGET "") -if(TARGET nanoarrow::nanoarrow) - set(PJ_NANOARROW_TARGET nanoarrow::nanoarrow) -elseif(TARGET nanoarrow::nanoarrow_static) - set(PJ_NANOARROW_TARGET nanoarrow::nanoarrow_static) -elseif(TARGET nanoarrow::nanoarrow_shared) - set(PJ_NANOARROW_TARGET nanoarrow::nanoarrow_shared) -else() - find_package(PkgConfig QUIET) - if(PkgConfig_FOUND) - pkg_check_modules(NANOARROW QUIET nanoarrow) - if(NANOARROW_FOUND) - add_library(pj_nanoarrow INTERFACE) - target_include_directories(pj_nanoarrow INTERFACE - ${NANOARROW_INCLUDE_DIRS} - ) - target_link_libraries(pj_nanoarrow INTERFACE ${NANOARROW_LIBRARIES}) - set(PJ_NANOARROW_TARGET pj_nanoarrow) - endif() - endif() -endif() - -if(PJ_NANOARROW_TARGET STREQUAL "") - message(FATAL_ERROR - "nanoarrow is required for pj_datastore. " - "Install nanoarrow and expose a CMake target " - "(nanoarrow::nanoarrow / nanoarrow::nanoarrow_static / nanoarrow::nanoarrow_shared) " - "or provide a pkg-config entry named 'nanoarrow'." 
- ) -endif() - -# --- nanoarrow IPC --- - -set(PJ_NANOARROW_IPC_TARGET "") -if(TARGET nanoarrow::nanoarrow_ipc) - set(PJ_NANOARROW_IPC_TARGET nanoarrow::nanoarrow_ipc) -elseif(TARGET nanoarrow::nanoarrow_ipc_static) - set(PJ_NANOARROW_IPC_TARGET nanoarrow::nanoarrow_ipc_static) -elseif(TARGET nanoarrow::nanoarrow_ipc_shared) - set(PJ_NANOARROW_IPC_TARGET nanoarrow::nanoarrow_ipc_shared) -else() - if(PkgConfig_FOUND) - pkg_check_modules(NANOARROW_IPC QUIET nanoarrow_ipc) - if(NANOARROW_IPC_FOUND) - add_library(pj_nanoarrow_ipc INTERFACE) - target_include_directories(pj_nanoarrow_ipc INTERFACE - ${NANOARROW_IPC_INCLUDE_DIRS} - ) - target_link_libraries(pj_nanoarrow_ipc INTERFACE ${NANOARROW_IPC_LIBRARIES}) - set(PJ_NANOARROW_IPC_TARGET pj_nanoarrow_ipc) - endif() - endif() -endif() - -if(PJ_NANOARROW_IPC_TARGET STREQUAL "") - message(FATAL_ERROR - "nanoarrow IPC is required for pj_datastore. " - "Install nanoarrow with IPC support and expose a CMake target " - "(nanoarrow::nanoarrow_ipc / nanoarrow::nanoarrow_ipc_static / nanoarrow::nanoarrow_ipc_shared) " - "or provide a pkg-config entry named 'nanoarrow_ipc'." - ) -endif() - -# nanoarrow IPC internally depends on the flatcc runtime (libflatccrt). -# Some package managers (e.g. Conan) bundle it alongside the IPC library -# but don't declare it as a transitive link dependency. Since the IPC target -# already sets INTERFACE_LINK_DIRECTORIES, linking by name suffices. 
-target_link_libraries(${PJ_NANOARROW_IPC_TARGET} INTERFACE flatccrt) - -endif() # PJ_BUILD_DATASTORE - # --------------------------------------------------------------------------- # Modules # --------------------------------------------------------------------------- @@ -177,9 +90,6 @@ if(PJ_BUILD_TESTS) endif() add_subdirectory(pj_base) -if(PJ_BUILD_DATASTORE) - add_subdirectory(pj_datastore) -endif() add_subdirectory(pj_plugins) if(PJ_BUILD_PORTED_PLUGINS AND EXISTS "${CMAKE_CURRENT_SOURCE_DIR}/pj_ported_plugins/CMakeLists.txt") @@ -193,7 +103,6 @@ endif() # Exported CMake namespace: plotjuggler_core:: # Components: # base — vocabulary types (always available) -# datastore — columnar engine (optional, requires PJ_BUILD_DATASTORE) # plugin_sdk — plugin-author surface: base + dialog SDK + parser SDK # plugin_host — host-side loaders (data_source, message_parser, toolbox, # dialog, catalogs) @@ -202,7 +111,7 @@ endif() if(PJ_INSTALL_SDK) include(CMakePackageConfigHelpers) - set(PJ_PACKAGE_VERSION "0.5.1") + set(PJ_PACKAGE_VERSION "0.6.0") set(PJ_PACKAGE_CMAKE_DIR ${CMAKE_INSTALL_LIBDIR}/cmake/plotjuggler_core) install(EXPORT plotjuggler_coreTargets diff --git a/LICENSE b/LICENSE index 984cc9a9..487ed7f2 100644 --- a/LICENSE +++ b/LICENSE @@ -6,7 +6,6 @@ SPDX-License-Identifier header that is authoritative for that file. pj_base Apache-2.0 LICENSE-APACHE pj_plugins Apache-2.0 LICENSE-APACHE examples Apache-2.0 LICENSE-APACHE - pj_datastore MPL-2.0 LICENSE-MPL Rationale: @@ -15,12 +14,8 @@ Rationale: the SDK without restriction. Apache-2.0 also grants an explicit patent license to downstream users. - - The storage engine (pj_datastore) is MPL-2.0. MPL-2.0 is file-level - (weak) copyleft: modifications to the engine's own source files must be - published, but the engine may be combined with proprietary code and - linked into proprietary applications. 
- -Plugins load through a stable C ABI and never statically link pj_datastore, -so the MPL-2.0 engine imposes no obligations on plugin authors. +The columnar storage engine (formerly the MPL-2.0 `pj_datastore` module) has +moved to the PlotJuggler application repository; this SDK is now Apache-2.0 +in its entirety. Copyright (c) 2026 Davide Faconti diff --git a/LICENSE-MPL b/LICENSE-MPL deleted file mode 100644 index d0a1fa14..00000000 --- a/LICENSE-MPL +++ /dev/null @@ -1,373 +0,0 @@ -Mozilla Public License Version 2.0 -================================== - -1. Definitions --------------- - -1.1. "Contributor" - means each individual or legal entity that creates, contributes to - the creation of, or owns Covered Software. - -1.2. "Contributor Version" - means the combination of the Contributions of others (if any) used - by a Contributor and that particular Contributor's Contribution. - -1.3. "Contribution" - means Covered Software of a particular Contributor. - -1.4. "Covered Software" - means Source Code Form to which the initial Contributor has attached - the notice in Exhibit A, the Executable Form of such Source Code - Form, and Modifications of such Source Code Form, in each case - including portions thereof. - -1.5. "Incompatible With Secondary Licenses" - means - - (a) that the initial Contributor has attached the notice described - in Exhibit B to the Covered Software; or - - (b) that the Covered Software was made available under the terms of - version 1.1 or earlier of the License, but not also under the - terms of a Secondary License. - -1.6. "Executable Form" - means any form of the work other than Source Code Form. - -1.7. "Larger Work" - means a work that combines Covered Software with other material, in - a separate file or files, that is not Covered Software. - -1.8. "License" - means this document. - -1.9. 
"Licensable" - means having the right to grant, to the maximum extent possible, - whether at the time of the initial grant or subsequently, any and - all of the rights conveyed by this License. - -1.10. "Modifications" - means any of the following: - - (a) any file in Source Code Form that results from an addition to, - deletion from, or modification of the contents of Covered - Software; or - - (b) any new file in Source Code Form that contains any Covered - Software. - -1.11. "Patent Claims" of a Contributor - means any patent claim(s), including without limitation, method, - process, and apparatus claims, in any patent Licensable by such - Contributor that would be infringed, but for the grant of the - License, by the making, using, selling, offering for sale, having - made, import, or transfer of either its Contributions or its - Contributor Version. - -1.12. "Secondary License" - means either the GNU General Public License, Version 2.0, the GNU - Lesser General Public License, Version 2.1, the GNU Affero General - Public License, Version 3.0, or any later versions of those - licenses. - -1.13. "Source Code Form" - means the form of the work preferred for making modifications. - -1.14. "You" (or "Your") - means an individual or a legal entity exercising rights under this - License. For legal entities, "You" includes any entity that - controls, is controlled by, or is under common control with You. For - purposes of this definition, "control" means (a) the power, direct - or indirect, to cause the direction or management of such entity, - whether by contract or otherwise, or (b) ownership of more than - fifty percent (50%) of the outstanding shares or beneficial - ownership of such entity. - -2. License Grants and Conditions --------------------------------- - -2.1. 
Grants - -Each Contributor hereby grants You a world-wide, royalty-free, -non-exclusive license: - -(a) under intellectual property rights (other than patent or trademark) - Licensable by such Contributor to use, reproduce, make available, - modify, display, perform, distribute, and otherwise exploit its - Contributions, either on an unmodified basis, with Modifications, or - as part of a Larger Work; and - -(b) under Patent Claims of such Contributor to make, use, sell, offer - for sale, have made, import, and otherwise transfer either its - Contributions or its Contributor Version. - -2.2. Effective Date - -The licenses granted in Section 2.1 with respect to any Contribution -become effective for each Contribution on the date the Contributor first -distributes such Contribution. - -2.3. Limitations on Grant Scope - -The licenses granted in this Section 2 are the only rights granted under -this License. No additional rights or licenses will be implied from the -distribution or licensing of Covered Software under this License. -Notwithstanding Section 2.1(b) above, no patent license is granted by a -Contributor: - -(a) for any code that a Contributor has removed from Covered Software; - or - -(b) for infringements caused by: (i) Your and any other third party's - modifications of Covered Software, or (ii) the combination of its - Contributions with other software (except as part of its Contributor - Version); or - -(c) under Patent Claims infringed by Covered Software in the absence of - its Contributions. - -This License does not grant any rights in the trademarks, service marks, -or logos of any Contributor (except as may be necessary to comply with -the notice requirements in Section 3.4). - -2.4. 
Subsequent Licenses - -No Contributor makes additional grants as a result of Your choice to -distribute the Covered Software under a subsequent version of this -License (see Section 10.2) or under the terms of a Secondary License (if -permitted under the terms of Section 3.3). - -2.5. Representation - -Each Contributor represents that the Contributor believes its -Contributions are its original creation(s) or it has sufficient rights -to grant the rights to its Contributions conveyed by this License. - -2.6. Fair Use - -This License is not intended to limit any rights You have under -applicable copyright doctrines of fair use, fair dealing, or other -equivalents. - -2.7. Conditions - -Sections 3.1, 3.2, 3.3, and 3.4 are conditions of the licenses granted -in Section 2.1. - -3. Responsibilities -------------------- - -3.1. Distribution of Source Form - -All distribution of Covered Software in Source Code Form, including any -Modifications that You create or to which You contribute, must be under -the terms of this License. You must inform recipients that the Source -Code Form of the Covered Software is governed by the terms of this -License, and how they can obtain a copy of this License. You may not -attempt to alter or restrict the recipients' rights in the Source Code -Form. - -3.2. 
Distribution of Executable Form - -If You distribute Covered Software in Executable Form then: - -(a) such Covered Software must also be made available in Source Code - Form, as described in Section 3.1, and You must inform recipients of - the Executable Form how they can obtain a copy of such Source Code - Form by reasonable means in a timely manner, at a charge no more - than the cost of distribution to the recipient; and - -(b) You may distribute such Executable Form under the terms of this - License, or sublicense it under different terms, provided that the - license for the Executable Form does not attempt to limit or alter - the recipients' rights in the Source Code Form under this License. - -3.3. Distribution of a Larger Work - -You may create and distribute a Larger Work under terms of Your choice, -provided that You also comply with the requirements of this License for -the Covered Software. If the Larger Work is a combination of Covered -Software with a work governed by one or more Secondary Licenses, and the -Covered Software is not Incompatible With Secondary Licenses, this -License permits You to additionally distribute such Covered Software -under the terms of such Secondary License(s), so that the recipient of -the Larger Work may, at their option, further distribute the Covered -Software under the terms of either this License or such Secondary -License(s). - -3.4. Notices - -You may not remove or alter the substance of any license notices -(including copyright notices, patent notices, disclaimers of warranty, -or limitations of liability) contained within the Source Code Form of -the Covered Software, except that You may alter any license notices to -the extent required to remedy known factual inaccuracies. - -3.5. Application of Additional Terms - -You may choose to offer, and to charge a fee for, warranty, support, -indemnity or liability obligations to one or more recipients of Covered -Software. 
However, You may do so only on Your own behalf, and not on -behalf of any Contributor. You must make it absolutely clear that any -such warranty, support, indemnity, or liability obligation is offered by -You alone, and You hereby agree to indemnify every Contributor for any -liability incurred by such Contributor as a result of warranty, support, -indemnity or liability terms You offer. You may include additional -disclaimers of warranty and limitations of liability specific to any -jurisdiction. - -4. Inability to Comply Due to Statute or Regulation ---------------------------------------------------- - -If it is impossible for You to comply with any of the terms of this -License with respect to some or all of the Covered Software due to -statute, judicial order, or regulation then You must: (a) comply with -the terms of this License to the maximum extent possible; and (b) -describe the limitations and the code they affect. Such description must -be placed in a text file included with all distributions of the Covered -Software under this License. Except to the extent prohibited by statute -or regulation, such description must be sufficiently detailed for a -recipient of ordinary skill to be able to understand it. - -5. Termination --------------- - -5.1. The rights granted under this License will terminate automatically -if You fail to comply with any of its terms. However, if You become -compliant, then the rights granted under this License from a particular -Contributor are reinstated (a) provisionally, unless and until such -Contributor explicitly and finally terminates Your grants, and (b) on an -ongoing basis, if such Contributor fails to notify You of the -non-compliance by some reasonable means prior to 60 days after You have -come back into compliance. 
Moreover, Your grants from a particular -Contributor are reinstated on an ongoing basis if such Contributor -notifies You of the non-compliance by some reasonable means, this is the -first time You have received notice of non-compliance with this License -from such Contributor, and You become compliant prior to 30 days after -Your receipt of the notice. - -5.2. If You initiate litigation against any entity by asserting a patent -infringement claim (excluding declaratory judgment actions, -counter-claims, and cross-claims) alleging that a Contributor Version -directly or indirectly infringes any patent, then the rights granted to -You by any and all Contributors for the Covered Software under Section -2.1 of this License shall terminate. - -5.3. In the event of termination under Sections 5.1 or 5.2 above, all -end user license agreements (excluding distributors and resellers) which -have been validly granted by You or Your distributors under this License -prior to termination shall survive termination. - -************************************************************************ -* * -* 6. Disclaimer of Warranty * -* ------------------------- * -* * -* Covered Software is provided under this License on an "as is" * -* basis, without warranty of any kind, either expressed, implied, or * -* statutory, including, without limitation, warranties that the * -* Covered Software is free of defects, merchantable, fit for a * -* particular purpose or non-infringing. The entire risk as to the * -* quality and performance of the Covered Software is with You. * -* Should any Covered Software prove defective in any respect, You * -* (not any Contributor) assume the cost of any necessary servicing, * -* repair, or correction. This disclaimer of warranty constitutes an * -* essential part of this License. No use of any Covered Software is * -* authorized under this License except under this disclaimer. 
* -* * -************************************************************************ - -************************************************************************ -* * -* 7. Limitation of Liability * -* -------------------------- * -* * -* Under no circumstances and under no legal theory, whether tort * -* (including negligence), contract, or otherwise, shall any * -* Contributor, or anyone who distributes Covered Software as * -* permitted above, be liable to You for any direct, indirect, * -* special, incidental, or consequential damages of any character * -* including, without limitation, damages for lost profits, loss of * -* goodwill, work stoppage, computer failure or malfunction, or any * -* and all other commercial damages or losses, even if such party * -* shall have been informed of the possibility of such damages. This * -* limitation of liability shall not apply to liability for death or * -* personal injury resulting from such party's negligence to the * -* extent applicable law prohibits such limitation. Some * -* jurisdictions do not allow the exclusion or limitation of * -* incidental or consequential damages, so this exclusion and * -* limitation may not apply to You. * -* * -************************************************************************ - -8. Litigation -------------- - -Any litigation relating to this License may be brought only in the -courts of a jurisdiction where the defendant maintains its principal -place of business and such litigation shall be governed by laws of that -jurisdiction, without reference to its conflict-of-law provisions. -Nothing in this Section shall prevent a party's ability to bring -cross-claims or counter-claims. - -9. Miscellaneous ----------------- - -This License represents the complete agreement concerning the subject -matter hereof. If any provision of this License is held to be -unenforceable, such provision shall be reformed only to the extent -necessary to make it enforceable. 
Any law or regulation which provides -that the language of a contract shall be construed against the drafter -shall not be used to construe this License against a Contributor. - -10. Versions of the License ---------------------------- - -10.1. New Versions - -Mozilla Foundation is the license steward. Except as provided in Section -10.3, no one other than the license steward has the right to modify or -publish new versions of this License. Each version will be given a -distinguishing version number. - -10.2. Effect of New Versions - -You may distribute the Covered Software under the terms of the version -of the License under which You originally received the Covered Software, -or under the terms of any subsequent version published by the license -steward. - -10.3. Modified Versions - -If you create software not governed by this License, and you want to -create a new license for such software, you may create and use a -modified version of this License if you rename the license and remove -any references to the name of the license steward (except to note that -such modified license differs from this License). - -10.4. Distributing Source Code Form that is Incompatible With Secondary -Licenses - -If You choose to distribute Source Code Form that is Incompatible With -Secondary Licenses under the terms of this version of the License, the -notice described in Exhibit B of this License must be attached. - -Exhibit A - Source Code Form License Notice -------------------------------------------- - - This Source Code Form is subject to the terms of the Mozilla Public - License, v. 2.0. If a copy of the MPL was not distributed with this - file, You can obtain one at https://mozilla.org/MPL/2.0/. - -If it is not possible or desirable to put the notice in a particular -file, then You may include the notice in a location (such as a LICENSE -file in a relevant directory) where a recipient would be likely to look -for such a notice. 
- -You may add additional accurate notices of copyright ownership. - -Exhibit B - "Incompatible With Secondary Licenses" Notice ---------------------------------------------------------- - - This Source Code Form is "Incompatible With Secondary Licenses", as - defined by the Mozilla Public License, v. 2.0. diff --git a/README.md b/README.md index 7c097f29..a33e0c15 100644 --- a/README.md +++ b/README.md @@ -10,7 +10,6 @@ C++20 foundation libraries for [PlotJuggler](https://github.com/facontidavide/Pl | Module | Description | Dependencies | License | |--------|-------------|--------------|---------| | **pj_base** | Vocabulary types: `Timestamp`, `DatasetId`, `TopicId`, type trees, `Expected`, `Span` | None | Apache-2.0 | -| **pj_datastore** | Columnar in-memory storage engine + `ObjectStore` (for media blobs) + `DerivedEngine`; typed schemas, chunk-based encoding, range/latest-at queries, derived transform DAG, Arrow IPC import | pj_base, fmt, tsl::robin_map, nanoarrow | MPL-2.0 | | **pj_plugins** | C-ABI plugin protocol (DataSource, MessageParser, Dialog, Toolbox families), C++ SDK base classes, plugin discovery, host-side loaders, and config helpers | pj_base, nlohmann/json | Apache-2.0 | ## Getting Started @@ -42,25 +41,17 @@ cd plotjuggler_core ``` pj_base/ Vocabulary types (zero deps) -pj_datastore/ Columnar engine + ObjectStore + DerivedEngine pj_plugins/ C-ABI plugin protocol, SDK, host loaders docs/ Project-wide design guides ``` ## License -PlotJuggler Core is licensed per-module; each source file carries an -authoritative `SPDX-License-Identifier` header. - -- **pj_base** and **pj_plugins** — the plugin-facing SDK — are **Apache-2.0** - ([LICENSE-APACHE](LICENSE-APACHE)). You may build **proprietary plugins and - applications** on the SDK without restriction; Apache-2.0 also grants an - explicit patent license. -- **pj_datastore** — the storage engine — is **MPL-2.0** - ([LICENSE-MPL](LICENSE-MPL)). 
MPL-2.0 is file-level (weak) copyleft: - modifications to the engine's own files must be published, but it may be - linked into proprietary software. - -Plugins load through a stable C ABI and never statically link `pj_datastore`, -so the MPL-2.0 engine imposes no obligations on plugin authors. See -[LICENSE](LICENSE) for the full mapping. +PlotJuggler Core is **Apache-2.0** ([LICENSE-APACHE](LICENSE-APACHE)); each +source file carries an authoritative `SPDX-License-Identifier` header. You may +build **proprietary plugins and applications** on the SDK without restriction; +Apache-2.0 also grants an explicit patent license. + +The columnar storage engine (formerly the MPL-2.0 `pj_datastore` module) has +moved to the PlotJuggler application repo; plugins load through a stable C ABI +and never link it. See [LICENSE](LICENSE) for the full mapping. diff --git a/V4_STORE.md b/V4_STORE.md index 9a27873a..00a30a77 100644 --- a/V4_STORE.md +++ b/V4_STORE.md @@ -34,8 +34,8 @@ object topics. The source-scoped write host supports: - `setRetentionBudget(topic, time_window_ns, max_memory_bytes)`: configure automatic eviction for a topic. -The host-side implementation is `DatastoreSourceObjectWriteHost` in -`pj_datastore/include/pj_datastore/plugin_data_host.hpp`. +The host-side implementation is `DatastoreSourceObjectWriteHost` (in the +`pj_datastore` module of the PlotJuggler application repo, not part of this SDK). ## Parser Object Writes @@ -93,11 +93,11 @@ interpretation above the core store. 
## Tests -The ObjectStore ABI surface is covered by datastore tests: +The ObjectStore ABI surface is covered by tests in the `pj_datastore` module of +the PlotJuggler application repo: -- `pj_datastore/tests/plugin_data_host_object_test.cpp` -- `pj_datastore/tests/plugin_data_host_object_read_test.cpp` -- `pj_datastore/tests/plugin_parser_object_write_test.cpp` +- `tests/plugin_data_host_object_test.cpp` +- `tests/plugin_data_host_object_read_test.cpp` +- `tests/plugin_parser_object_write_test.cpp` -The underlying store behavior is covered by -`pj_datastore/tests/object_store_test.cpp`. +The underlying store behavior is covered by `tests/object_store_test.cpp`. diff --git a/build.sh b/build.sh index 19411bd7..3b294991 100755 --- a/build.sh +++ b/build.sh @@ -70,7 +70,6 @@ build_config() { conan install "$SCRIPT_DIR" --output-folder="$build_dir" --build=missing \ -s build_type="$build_type" -s compiler.cppstd=20 \ -o "plotjuggler_core/*:with_tests=True" \ - -o "plotjuggler_core/*:with_parquet_example=True" \ "${conan_extra[@]+"${conan_extra[@]}"}" # Install dependencies from subdirectory conanfiles (e.g. pj_ported_plugins) diff --git a/cmake/plotjuggler_coreConfig.cmake.in b/cmake/plotjuggler_coreConfig.cmake.in index c31d368e..fbb20fee 100644 --- a/cmake/plotjuggler_coreConfig.cmake.in +++ b/cmake/plotjuggler_coreConfig.cmake.in @@ -16,11 +16,6 @@ foreach(_comp ${plotjuggler_core_FIND_COMPONENTS}) if(_comp STREQUAL "base") set(plotjuggler_core_base_FOUND TRUE) - elseif(_comp STREQUAL "datastore") - find_dependency(tsl-robin-map) - find_dependency(nanoarrow) - set(plotjuggler_core_datastore_FOUND TRUE) - elseif(_comp STREQUAL "plugin_sdk") find_dependency(nlohmann_json) # Ship the cmake/PjPluginManifest.cmake helper so plugin authors can call diff --git a/conanfile.py b/conanfile.py index 68913376..dc949e8e 100644 --- a/conanfile.py +++ b/conanfile.py @@ -1,13 +1,12 @@ """Conan 2 recipe for plotjuggler_core. 
-Exposes four CMake components under the `plotjuggler_core::` namespace: +Exposes three CMake components under the `plotjuggler_core::` namespace: base — pj_base, vocabulary types (always available) - datastore — pj_datastore, columnar engine (option: with_datastore) plugin_sdk — umbrella for plugin authors (base + dialog SDK + parser SDK) plugin_host — umbrella for host loaders (data_source/parser/toolbox/dialog) -A consuming Conan recipe declares e.g. `plotjuggler_core/0.5.1` and then: +A consuming Conan recipe declares e.g. `plotjuggler_core/0.6.0` and then: find_package(plotjuggler_core REQUIRED COMPONENTS plugin_sdk) target_link_libraries(my_plugin PRIVATE plotjuggler_core::plugin_sdk) @@ -15,8 +14,12 @@ The `plugin_sdk` component also ships `PjPluginManifest.cmake`, so authors can call `pj_emit_plugin_manifest()` without copying the helper into their tree. +The columnar storage engine (formerly the `datastore` component) is no longer +part of this SDK package — it now lives in the PlotJuggler application repo, +since plugins reach it only through the C ABI, never by linking it. + Local development (build.sh) uses this same recipe with `with_tests=True` so -gtest/benchmark/arrow are resolved as test_requires. +gtest is resolved as a test_require. """ from conan import ConanFile @@ -27,54 +30,37 @@ class PlotjugglerCoreConan(ConanFile): name = "plotjuggler_core" - version = "0.5.1" - # Apache-2.0 covers pj_base + pj_plugins (the plugin-facing SDK); - # MPL-2.0 covers pj_datastore (the storage engine). See LICENSE. - license = "Apache-2.0 AND MPL-2.0" + version = "0.6.0" + # Apache-2.0 covers the whole SDK (pj_base + pj_plugins). See LICENSE. + license = "Apache-2.0" url = "https://github.com/PlotJuggler/plotjuggler_core" - description = "C++20 foundation libraries for PlotJuggler: storage engine, plugin SDK, plugin host loaders." + description = "C++20 foundation libraries for PlotJuggler: plugin SDK and plugin host loaders." 
topics = ("plotjuggler", "plugin-sdk", "telemetry", "data-visualization") package_type = "static-library" settings = "os", "compiler", "build_type", "arch" options = { "fPIC": [True, False], - "with_datastore": [True, False], "with_host": [True, False], "with_tests": [True, False], - "with_parquet_example": [True, False], "assert_throws": [True, False], } default_options = { "fPIC": True, - "with_datastore": True, "with_host": True, "with_tests": False, - "with_parquet_example": False, "assert_throws": False, - # Arrow build flags (only resolved when with_parquet_example=True). - "arrow/*:parquet": True, - "arrow/*:with_snappy": True, - # pj_datastore's Arrow IPC import path needs nanoarrow_ipc + flatcc. - "nanoarrow/*:with_ipc": True, # fmt is an implementation detail. Compile it header-only so the # static archives do not export a downstream fmt link dependency. "fmt/*:header_only": True, - # Boost is pulled in transitively by arrow. without_cobalt avoids a - # known upstream packaging error in recent boost recipes; without_test - # trims unneeded modules. - "boost/*:without_test": True, - "boost/*:without_cobalt": True, } exports_sources = ( "CMakeLists.txt", "LICENSE", "LICENSE-APACHE", - "LICENSE-MPL", "cmake/*", "pj_base/*", - "pj_datastore/*", "pj_plugins/*", "examples/*", ) @@ -110,20 +96,10 @@ def requirements(self): transitive_libs=False, ) - if self.options.with_datastore: - # tsl-robin-map is header-only; nanoarrow is in public headers. - self.requires("tsl-robin-map/1.4.0", transitive_headers=True) - self.requires("nanoarrow/0.7.0", transitive_headers=True) - def build_requirements(self): - # Tests + benchmarks + parquet example are local-dev only. Consumers - # of plotjuggler_core never see these. + # Tests are local-dev only. Consumers of plotjuggler_core never see this. 
if self.options.with_tests: self.test_requires("gtest/1.17.0") - if self.options.with_datastore: - self.test_requires("benchmark/1.9.4") - if self.options.with_parquet_example: - self.test_requires("arrow/23.0.1") def generate(self): deps = CMakeDeps(self) @@ -131,12 +107,8 @@ def generate(self): tc = CMakeToolchain(self) tc.cache_variables["PJ_INSTALL_SDK"] = True - tc.cache_variables["PJ_BUILD_DATASTORE"] = bool(self.options.with_datastore) tc.cache_variables["PJ_BUILD_TESTS"] = bool(self.options.with_tests) tc.cache_variables["PJ_BUILD_PORTED_PLUGINS"] = False - tc.cache_variables["PJ_BUILD_PARQUET_IMPORT_EXAMPLE"] = bool( - self.options.with_parquet_example - ) tc.cache_variables["PJ_ASSERT_THROWS"] = bool(self.options.assert_throws) tc.generate() @@ -148,8 +120,8 @@ def build(self): def package(self): cmake = CMake(self) cmake.install() - # Ships LICENSE (the per-module map) plus the full Apache-2.0 and - # MPL-2.0 texts (LICENSE-APACHE, LICENSE-MPL). + # Ships LICENSE (the per-module map) plus the full Apache-2.0 text + # (LICENSE-APACHE). copy( self, "LICENSE*", @@ -159,7 +131,7 @@ def package(self): def package_info(self): self.cpp_info.set_property("cmake_file_name", "plotjuggler_core") - # No top-level umbrella target: the four components have + # No top-level umbrella target: the three components have # mutually-exclusive audiences. Consumers must request a component. 
# Conan 2's CMakeDeps only aggregates cmake_build_modules declared at @@ -178,18 +150,6 @@ def package_info(self): base.libs = ["pj_base"] base.includedirs = ["include"] - # --- datastore (optional) --- - if self.options.with_datastore: - ds = self.cpp_info.components["datastore"] - ds.set_property("cmake_target_name", "plotjuggler_core::datastore") - ds.libs = ["pj_datastore"] - ds.includedirs = ["include"] - ds.requires = [ - "base", - "tsl-robin-map::tsl-robin-map", - "nanoarrow::nanoarrow", - ] - # --- plugin_sdk (umbrella INTERFACE: pj_base + pj_dialog_sdk) --- sdk = self.cpp_info.components["plugin_sdk"] sdk.set_property("cmake_target_name", "plotjuggler_core::plugin_sdk") diff --git a/pj_datastore/CLAUDE.md b/pj_datastore/CLAUDE.md deleted file mode 100644 index cc2346af..00000000 --- a/pj_datastore/CLAUDE.md +++ /dev/null @@ -1,31 +0,0 @@ -# pj_datastore — columnar time-series storage engine - -Level-0 foundation library (in the `plotjuggler_core` submodule). Owns the in-memory columnar store that every plugin writes to and every consumer reads from: datasets/topics/chunks/columns, adaptive per-column encoding, schema evolution, derived-series DAG (`DerivedEngine`), the opaque-blob `ObjectStore`, and the host-side C-ABI bridges that translate `pj_plugins` calls into engine operations. Pure C++20 (fmt, tsl::robin_map, nanoarrow); **no Qt, no `pj_plugins` dependency** — `pj_datastore → pj_base` only. It does NOT decode media, choose renderers, own UI/time-display policy, or know about plugin discovery (that is `pj_plugins`). Timestamps are absolute int64 nanoseconds; do not subtract a display base here. - -## Layout -- `include/pj_datastore/` — public headers (engine/writer/reader/query/chunk, `object_store`, `derived_engine` + `builtin_transforms`, `plugin_data_host`, `colormap_registry`, `arrow_import`, low-level buffer/column_buffer/encoding/topic_storage/type_registry). -- `src/` — implementations, one `.cpp` per header. 
-- `tests/` — one GTest binary per layer (see `CMakeLists.txt` for the live set; several v3-ABI tests are commented out pending Phase 1b). -- `benchmarks/` — `read_benchmark`, `ingest_benchmark`. -- `examples/` — `parquet_import` (gated by `PJ_BUILD_PARQUET_IMPORT_EXAMPLE`). -- `docs/` — see table below. - -## Gotchas -- **`readNumericAsDouble()` does not null-check** — returns 0.0 at nulls. Use `isNull()` first, or batch via `readColumnAsDoubles()` which writes NaN at nulls. See `docs/USER_GUIDE.md §5`. -- **Columns can appear mid-stream**: a new field after rows exist seals the current chunk; earlier chunks have fewer columns. Always bounds-check `col_index < chunk->columns.size()`. See `docs/USER_GUIDE.md §6` / `docs/REQUIREMENTS.md §4.5`. -- **`readString()` returns a `string_view` into chunk dictionary memory** — must not outlive the chunk. -- **Transforms have a strict sequential contract**: `calculate()` is called in ascending timestamp order; state persists across chunks and is cleared only by `reset()` before a batch recompute. See `include/pj_datastore/derived_engine.hpp`. -- **`ObjectStore` is independent storage** alongside `DataEngine`, with its own mutex-per-series threading and lazy/owned payloads — it is NOT covered by `ARCHITECTURE.md`; read `docs/OBJECT_STORE_DESIGN.md`. 
- -## Read deeper -| For | Read | -|---|---| -| What it must do / data model / schema-evolution contract | `docs/REQUIREMENTS.md` | -| How the scalar engine works (domain model, layers, encoding, DerivedEngine, data flow) | `docs/ARCHITECTURE.md` | -| Plugin-author write/read patterns, ValueRef, pitfalls | `docs/USER_GUIDE.md` | -| Opaque timestamped blob storage (lazy/owned, retention, ABI bridge) | `docs/OBJECT_STORE_DESIGN.md` | -| Engine entry point + commit/flush cycle | `include/pj_datastore/engine.hpp` | -| Write / read facades | `include/pj_datastore/writer.hpp`, `reader.hpp` | -| Series / range / latest-at queries | `include/pj_datastore/query.hpp` | -| Transform interfaces + built-ins | `include/pj_datastore/derived_engine.hpp`, `builtin_transforms.hpp` | -| C-ABI host bridges (source/parser/toolbox, object surfaces) | `include/pj_datastore/plugin_data_host.hpp`, `docs/OBJECT_STORE_DESIGN.md` | diff --git a/pj_datastore/CMakeLists.txt b/pj_datastore/CMakeLists.txt deleted file mode 100644 index cea6bde1..00000000 --- a/pj_datastore/CMakeLists.txt +++ /dev/null @@ -1,156 +0,0 @@ -# --------------------------------------------------------------------------- -# pj_datastore — storage engine, depends on pj_base + fmt + nanoarrow -# --------------------------------------------------------------------------- - -add_library(pj_datastore STATIC - src/buffer.cpp - src/column_buffer.cpp - src/encoding.cpp - src/chunk.cpp - src/topic_storage.cpp - src/type_registry.cpp - src/query.cpp - src/writer.cpp - src/reader.cpp - src/engine.cpp - src/arrow_import.cpp - src/derived_engine.cpp - src/builtin_transforms.cpp - src/plugin_data_host.cpp - src/colormap_registry.cpp - src/colormap_registry_host.cpp - src/object_store.cpp -) -target_include_directories(pj_datastore PUBLIC - $ - $ -) -target_compile_features(pj_datastore PUBLIC cxx_std_20) -target_compile_options(pj_datastore PRIVATE - ${PJ_WARNING_FLAGS} ${PJ_SANITIZER_FLAGS} -) -set_target_properties(pj_datastore 
PROPERTIES - POSITION_INDEPENDENT_CODE ON - EXPORT_NAME datastore -) -add_library(plotjuggler_core::datastore ALIAS pj_datastore) -if(PJ_ASSERT_THROWS) - target_compile_definitions(pj_datastore PUBLIC PJ_ASSERT_THROWS) -endif() -target_link_libraries(pj_datastore - PUBLIC - pj_base - PRIVATE - tsl::robin_map - $ - ${PJ_NANOARROW_TARGET} - ${PJ_NANOARROW_IPC_TARGET} -) - -# --------------------------------------------------------------------------- -# Install (guarded by PJ_INSTALL_SDK in root CMakeLists.txt) -# --------------------------------------------------------------------------- - -if(PJ_INSTALL_SDK) - install(TARGETS pj_datastore EXPORT plotjuggler_coreTargets - ARCHIVE DESTINATION ${CMAKE_INSTALL_LIBDIR} - LIBRARY DESTINATION ${CMAKE_INSTALL_LIBDIR} - ) - install(DIRECTORY include/ DESTINATION ${CMAKE_INSTALL_INCLUDEDIR}) -endif() - -# --------------------------------------------------------------------------- -# Tests -# --------------------------------------------------------------------------- - -if(PJ_BUILD_TESTS) - set(PJ_DATASTORE_TESTS - tests/type_registry_test.cpp - tests/buffer_test.cpp - tests/column_buffer_test.cpp - tests/encoding_test.cpp - tests/chunk_test.cpp - tests/topic_storage_test.cpp - tests/query_test.cpp - tests/series_reader_test.cpp - tests/engine_integration_test.cpp - tests/derived_engine_test.cpp - tests/array_expansion_test.cpp - tests/regression_test.cpp - tests/object_store_test.cpp - tests/plugin_data_host_object_test.cpp - tests/plugin_data_host_object_read_test.cpp - # tests/plugin_host_read_test.cpp # disabled until Phase 1b lands - # (exercises v3 toolbox read path; rewrite for read_series_arrow) - ) - - foreach(test_src ${PJ_DATASTORE_TESTS}) - get_filename_component(test_name ${test_src} NAME_WE) - add_executable(${test_name} ${test_src}) - target_link_libraries(${test_name} PRIVATE pj_datastore GTest::gtest_main) - add_test(NAME ${test_name} COMMAND ${test_name}) - endforeach() - - # plugin_parser_object_write_test 
uses MessageParserPluginBase, so it needs - # the plugin SDK for message_parser_plugin_base.hpp and builtin object types. - add_executable(plugin_parser_object_write_test tests/plugin_parser_object_write_test.cpp) - target_link_libraries(plugin_parser_object_write_test PRIVATE - pj_datastore pj_plugin_sdk GTest::gtest_main - ) - add_test(NAME plugin_parser_object_write_test COMMAND plugin_parser_object_write_test) - - # Arrow import test (needs nanoarrow explicitly for building test IPC data) - add_executable(arrow_import_test tests/arrow_import_test.cpp) - target_link_libraries(arrow_import_test PRIVATE - pj_datastore - ${PJ_NANOARROW_TARGET} - ${PJ_NANOARROW_IPC_TARGET} - GTest::gtest_main) - add_test(NAME arrow_import_test COMMAND arrow_import_test) - - # v4 Arrow C Data Interface round-trip test (Phase 1b). - add_executable(arrow_stream_round_trip_test tests/arrow_stream_round_trip_test.cpp) - target_link_libraries(arrow_stream_round_trip_test PRIVATE - pj_datastore - ${PJ_NANOARROW_TARGET} - GTest::gtest_main) - add_test(NAME arrow_stream_round_trip_test COMMAND arrow_stream_round_trip_test) - - # Plugin host write test — DISABLED for v4 ABI migration. - # Exercises v3 appendArrowIpc/readSeries; rewrite in Phase 1b when the - # Arrow-stream write path and read_series_arrow are implemented. 
- # add_executable(plugin_host_write_test tests/plugin_host_write_test.cpp) - # target_link_libraries(plugin_host_write_test PRIVATE - # pj_datastore - # ${PJ_NANOARROW_TARGET} - # ${PJ_NANOARROW_IPC_TARGET} - # GTest::gtest_main) - # add_test(NAME plugin_host_write_test COMMAND plugin_host_write_test) - - # --------------------------------------------------------------------------- - # Benchmarks - # --------------------------------------------------------------------------- - - add_executable(read_benchmark benchmarks/read_benchmark.cpp) - target_link_libraries(read_benchmark PRIVATE pj_datastore benchmark::benchmark) - - add_executable(ingest_benchmark benchmarks/ingest_benchmark.cpp) - target_link_libraries(ingest_benchmark PRIVATE pj_datastore benchmark::benchmark) -endif() # PJ_BUILD_TESTS - -# --------------------------------------------------------------------------- -# Examples -# --------------------------------------------------------------------------- - -if(PJ_BUILD_PARQUET_IMPORT_EXAMPLE) - if(NOT TARGET Arrow::arrow_static) - find_package(Arrow REQUIRED) - endif() - if(NOT TARGET Parquet::parquet_static) - find_package(Parquet REQUIRED) - endif() - add_executable(parquet_import examples/parquet_import.cpp) - target_link_libraries(parquet_import PRIVATE - pj_datastore Arrow::arrow_static Parquet::parquet_static - ) -endif() diff --git a/pj_datastore/benchmarks/ingest_benchmark.cpp b/pj_datastore/benchmarks/ingest_benchmark.cpp deleted file mode 100644 index 8704d9a9..00000000 --- a/pj_datastore/benchmarks/ingest_benchmark.cpp +++ /dev/null @@ -1,262 +0,0 @@ -// Copyright 2026 Davide Faconti -// SPDX-License-Identifier: MPL-2.0 - -#include -#include -#include -#include -#include -#include - -#include "benchmark/benchmark.h" -#include "pj_base/dataset.hpp" -#include "pj_base/type_tree.hpp" -#include "pj_base/types.hpp" -#include "pj_datastore/chunk.hpp" -#include "pj_datastore/column_buffer.hpp" -#include "pj_datastore/engine.hpp" -#include 
"pj_datastore/writer.hpp" - -namespace PJ { -namespace { - -constexpr int kRowCount = 100'000; -constexpr uint32_t kChunkSize = 16'384; - -// --------------------------------------------------------------------------- -// Helpers -// --------------------------------------------------------------------------- - -ColumnDescriptor make_descriptor(PrimitiveType type, std::string path) { - return ColumnDescriptor{/*field_id=*/0, type, std::move(path)}; -} - -// Pre-generate test data arrays (allocated once, reused across iterations). -struct TestData { - std::vector timestamps; - std::vector floats; - std::vector doubles; - std::vector int32s; - std::vector int64s; - - TestData() { - timestamps.resize(kRowCount); - floats.resize(kRowCount); - doubles.resize(kRowCount); - int32s.resize(kRowCount); - int64s.resize(kRowCount); - for (int i = 0; i < kRowCount; ++i) { - timestamps[static_cast(i)] = static_cast(i); - floats[static_cast(i)] = static_cast(i) * 0.1f; - doubles[static_cast(i)] = static_cast(i) * 0.1; - int32s[static_cast(i)] = i % 100; - int64s[static_cast(i)] = static_cast(i); - } - } -}; - -static const TestData& get_test_data() { - static TestData data; - return data; -} - -// =========================================================================== -// TopicChunkBuilder: row-at-a-time vs bulk (single column) -// =========================================================================== - -void BM_Builder_RowAtATime_Float32(benchmark::State& state) { - const auto& data = get_test_data(); - std::vector cols = {make_descriptor(PrimitiveType::kFloat32, "value")}; - - for (auto _ : state) { - TopicChunkBuilder builder(1, 1, cols, kRowCount); - for (int i = 0; i < kRowCount; ++i) { - builder.beginRow(data.timestamps[static_cast(i)]); - builder.set(0, data.floats[static_cast(i)]); - builder.finishRow(); - } - auto chunk = builder.seal(); - benchmark::DoNotOptimize(chunk); - benchmark::ClobberMemory(); - } - state.SetItemsProcessed(static_cast(state.iterations()) 
* kRowCount); -} - -void BM_Builder_Bulk_Float32(benchmark::State& state) { - const auto& data = get_test_data(); - std::vector cols = {make_descriptor(PrimitiveType::kFloat32, "value")}; - - for (auto _ : state) { - TopicChunkBuilder builder(1, 1, cols, kRowCount); - builder.appendTimestamps(data.timestamps); - builder.appendColumn(0, data.floats); - builder.finishBulkAppend(); - auto chunk = builder.seal(); - benchmark::DoNotOptimize(chunk); - benchmark::ClobberMemory(); - } - state.SetItemsProcessed(static_cast(state.iterations()) * kRowCount); -} - -BENCHMARK(BM_Builder_RowAtATime_Float32); -BENCHMARK(BM_Builder_Bulk_Float32); - -// =========================================================================== -// TopicChunkBuilder: row-at-a-time vs bulk (int64) -// =========================================================================== - -void BM_Builder_RowAtATime_Int64(benchmark::State& state) { - const auto& data = get_test_data(); - std::vector cols = {make_descriptor(PrimitiveType::kInt64, "value")}; - - for (auto _ : state) { - TopicChunkBuilder builder(1, 1, cols, kRowCount); - for (int i = 0; i < kRowCount; ++i) { - builder.beginRow(data.timestamps[static_cast(i)]); - builder.set(0, data.int64s[static_cast(i)]); - builder.finishRow(); - } - auto chunk = builder.seal(); - benchmark::DoNotOptimize(chunk); - benchmark::ClobberMemory(); - } - state.SetItemsProcessed(static_cast(state.iterations()) * kRowCount); -} - -void BM_Builder_Bulk_Int64(benchmark::State& state) { - const auto& data = get_test_data(); - std::vector cols = {make_descriptor(PrimitiveType::kInt64, "value")}; - - for (auto _ : state) { - TopicChunkBuilder builder(1, 1, cols, kRowCount); - builder.appendTimestamps(data.timestamps); - builder.appendColumn(0, data.int64s); - builder.finishBulkAppend(); - auto chunk = builder.seal(); - benchmark::DoNotOptimize(chunk); - benchmark::ClobberMemory(); - } - state.SetItemsProcessed(static_cast(state.iterations()) * kRowCount); -} - 
-BENCHMARK(BM_Builder_RowAtATime_Int64); -BENCHMARK(BM_Builder_Bulk_Int64); - -// =========================================================================== -// Multi-column: row-at-a-time vs bulk (10 float32 columns) -// =========================================================================== - -constexpr int kMultiColCount = 10; - -void BM_Builder_RowAtATime_MultiCol(benchmark::State& state) { - const auto& data = get_test_data(); - std::vector cols; - for (int c = 0; c < kMultiColCount; ++c) { - cols.push_back(make_descriptor(PrimitiveType::kFloat32, "col_" + std::to_string(c))); - } - - for (auto _ : state) { - TopicChunkBuilder builder(1, 1, cols, kRowCount); - for (int i = 0; i < kRowCount; ++i) { - builder.beginRow(data.timestamps[static_cast(i)]); - for (int c = 0; c < kMultiColCount; ++c) { - builder.set(static_cast(c), data.floats[static_cast(i)]); - } - builder.finishRow(); - } - auto chunk = builder.seal(); - benchmark::DoNotOptimize(chunk); - benchmark::ClobberMemory(); - } - state.SetItemsProcessed(static_cast(state.iterations()) * kRowCount * kMultiColCount); -} - -void BM_Builder_Bulk_MultiCol(benchmark::State& state) { - const auto& data = get_test_data(); - std::vector cols; - for (int c = 0; c < kMultiColCount; ++c) { - cols.push_back(make_descriptor(PrimitiveType::kFloat32, "col_" + std::to_string(c))); - } - - for (auto _ : state) { - TopicChunkBuilder builder(1, 1, cols, kRowCount); - builder.appendTimestamps(data.timestamps); - for (int c = 0; c < kMultiColCount; ++c) { - builder.appendColumn(static_cast(c), data.floats); - } - builder.finishBulkAppend(); - auto chunk = builder.seal(); - benchmark::DoNotOptimize(chunk); - benchmark::ClobberMemory(); - } - state.SetItemsProcessed(static_cast(state.iterations()) * kRowCount * kMultiColCount); -} - -BENCHMARK(BM_Builder_RowAtATime_MultiCol); -BENCHMARK(BM_Builder_Bulk_MultiCol); - -// =========================================================================== -// DataWriter: row-at-a-time vs 
appendColumns (end-to-end through engine) -// =========================================================================== - -void BM_Writer_RowAtATime_Float32(benchmark::State& state) { - const auto& data = get_test_data(); - - // Build type tree once outside the loop (structure is cheap, reusable) - auto value_field = makePrimitive("value", PrimitiveType::kFloat32); - auto root = makeStruct("data", {value_field}); - - for (auto _ : state) { - DataEngine engine; - auto ds_id = *engine.createDataset(DatasetDescriptor{.source_name = "bench", .time_domain_id = 0}); - auto writer = engine.createWriter(); - auto schema_id = *writer.registerSchema("bench_schema", root); - TopicDescriptor desc; - desc.name = "bench_topic"; - desc.schema_id = schema_id; - auto topic_id = *writer.registerTopic(ds_id, desc); - - for (int i = 0; i < kRowCount; ++i) { - (void)writer.beginRow(topic_id, data.timestamps[static_cast(i)]); - writer.set(topic_id, 0, data.floats[static_cast(i)]); - (void)writer.finishRow(topic_id); - } - auto chunks = writer.flush(topic_id); - benchmark::DoNotOptimize(chunks); - benchmark::ClobberMemory(); - } - state.SetItemsProcessed(static_cast(state.iterations()) * kRowCount); -} - -void BM_Writer_AppendColumns_Float32(benchmark::State& state) { - const auto& data = get_test_data(); - - auto value_field = makePrimitive("value", PrimitiveType::kFloat32); - auto root = makeStruct("data", {value_field}); - - for (auto _ : state) { - DataEngine engine; - auto ds_id = *engine.createDataset(DatasetDescriptor{.source_name = "bench", .time_domain_id = 0}); - auto writer = engine.createWriter(); - auto schema_id = *writer.registerSchema("bench_schema", root); - TopicDescriptor desc; - desc.name = "bench_topic"; - desc.schema_id = schema_id; - auto topic_id = *writer.registerTopic(ds_id, desc); - - std::vector columns = {ColumnData::Float32(0, data.floats)}; - (void)writer.appendColumns(topic_id, data.timestamps, columns); - auto chunks = writer.flush(topic_id); - 
benchmark::DoNotOptimize(chunks); - benchmark::ClobberMemory(); - } - state.SetItemsProcessed(static_cast(state.iterations()) * kRowCount); -} - -BENCHMARK(BM_Writer_RowAtATime_Float32); -BENCHMARK(BM_Writer_AppendColumns_Float32); - -} // namespace -} // namespace PJ - -BENCHMARK_MAIN(); diff --git a/pj_datastore/benchmarks/read_benchmark.cpp b/pj_datastore/benchmarks/read_benchmark.cpp deleted file mode 100644 index 510b3aab..00000000 --- a/pj_datastore/benchmarks/read_benchmark.cpp +++ /dev/null @@ -1,518 +0,0 @@ -// Copyright 2026 Davide Faconti -// SPDX-License-Identifier: MPL-2.0 - -#include -#include -#include -#include -#include -#include -#include - -#include "benchmark/benchmark.h" -#include "pj_base/type_tree.hpp" -#include "pj_base/types.hpp" -#include "pj_datastore/chunk.hpp" -#include "pj_datastore/column_buffer.hpp" -#include "pj_datastore/query.hpp" - -namespace PJ { -namespace { - -constexpr int kPointCount = 100'000; -constexpr uint32_t kChunkSize = 1024; - -// --------------------------------------------------------------------------- -// Helpers -// --------------------------------------------------------------------------- - -ColumnDescriptor make_descriptor(PrimitiveType type, std::string path) { - return ColumnDescriptor{/*field_id=*/0, type, std::move(path)}; -} - -// Build a single sealed TopicChunk with kPointCount rows and one column. -template -TopicChunk build_typed_chunk(PrimitiveType type, SetFn set_fn) { - std::vector cols = {make_descriptor(type, "value")}; - TopicChunkBuilder builder(/*topic_id=*/1, /*schema_id=*/1, cols, kPointCount); - - for (int i = 0; i < kPointCount; ++i) { - builder.beginRow(static_cast(i)); - set_fn(builder, 0, i); - builder.finishRow(); - } - return builder.seal(); -} - -// Build a deque of sealed chunks (kChunkSize rows each) for cursor tests. 
-template -std::deque build_chunked_deque(PrimitiveType type, SetFn set_fn) { - std::deque chunks; - std::vector cols = {make_descriptor(type, "value")}; - - TopicChunkBuilder* builder = nullptr; - std::unique_ptr owned; - - for (int i = 0; i < kPointCount; ++i) { - if (!builder || builder->isFull()) { - if (builder) { - chunks.push_back(builder->seal()); - } - owned = std::make_unique( - /*topic_id=*/1, /*schema_id=*/1, cols, kChunkSize); - builder = owned.get(); - } - builder->beginRow(static_cast(i)); - set_fn(*builder, 0, i); - builder->finishRow(); - } - if (builder && builder->rowCount() > 0) { - chunks.push_back(builder->seal()); - } - return chunks; -} - -template -std::deque> BuildDequeData(F value_fn) { - std::deque> data; - for (int i = 0; i < kPointCount; ++i) { - data.emplace_back(static_cast(i), value_fn(i)); - } - return data; -} - -// Helper to compute encoded bytes for a chunk column, accounting for new encodings -double encoded_bytes_per_row(const TopicChunk& chunk, std::size_t col) { - switch (chunk.columnEncoding(col)) { - case EncodingType::kConstant: { - const auto& enc = std::get(chunk.columns[col].data); - return static_cast(enc.value_size) / chunk.stats.row_count; - } - case EncodingType::kFrameOfReference: { - const auto& enc = std::get(chunk.columns[col].data); - return static_cast(enc.offsets.size()) / chunk.stats.row_count; - } - case EncodingType::kDictionary: { - const auto& dict = std::get(chunk.columns[col].data); - std::size_t dict_bytes = 0; - for (const auto& s : dict.dictionary) { - dict_bytes += s.size(); - } - return static_cast(dict.indices.size() + dict_bytes) / chunk.stats.row_count; - } - default: - return static_cast(std::get(chunk.columns[col].data).size()) / chunk.stats.row_count; - } -} - -// =========================================================================== -// Tier 1 — TypedColumnBuffer (raw unencoded read) -// =========================================================================== - -void 
BM_ColumnBuffer_ReadFloat32(benchmark::State& state) { - static TypedColumnBuffer buf(make_descriptor(PrimitiveType::kFloat32, "value")); - static bool init = [&] { - for (int i = 0; i < kPointCount; ++i) { - buf.appendFloat32(static_cast(i) * 0.1f); - } - return true; - }(); - (void)init; - - for (auto _ : state) { - double sum = 0.0; - for (int i = 0; i < kPointCount; ++i) { - sum += buf.readFloat32(static_cast(i)); - } - benchmark::DoNotOptimize(sum); - benchmark::ClobberMemory(); - } - state.SetItemsProcessed(static_cast(state.iterations()) * kPointCount); -} - -void BM_ColumnBuffer_ReadInt64(benchmark::State& state) { - static TypedColumnBuffer buf(make_descriptor(PrimitiveType::kInt64, "value")); - static bool init = [&] { - for (int i = 0; i < kPointCount; ++i) { - buf.appendInt64(static_cast(i)); - } - return true; - }(); - (void)init; - - for (auto _ : state) { - int64_t sum = 0; - for (int i = 0; i < kPointCount; ++i) { - sum += buf.readInt64(static_cast(i)); - } - benchmark::DoNotOptimize(sum); - benchmark::ClobberMemory(); - } - state.SetItemsProcessed(static_cast(state.iterations()) * kPointCount); -} - -void BM_ColumnBuffer_ReadString(benchmark::State& state) { - static const std::vector kEnumValues = {"IDLE", "RUN", "WARN", "ERROR"}; - static TypedColumnBuffer buf(make_descriptor(PrimitiveType::kString, "state")); - static bool init = [&] { - for (int i = 0; i < kPointCount; ++i) { - buf.appendString(kEnumValues[static_cast(i) % 4]); - } - return true; - }(); - (void)init; - - for (auto _ : state) { - std::size_t total_len = 0; - for (int i = 0; i < kPointCount; ++i) { - total_len += buf.readString(static_cast(i)).size(); - } - benchmark::DoNotOptimize(total_len); - benchmark::ClobberMemory(); - } - state.SetItemsProcessed(static_cast(state.iterations()) * kPointCount); -} - -BENCHMARK(BM_ColumnBuffer_ReadFloat32); -BENCHMARK(BM_ColumnBuffer_ReadInt64); -BENCHMARK(BM_ColumnBuffer_ReadString); - -// 
=========================================================================== -// Tier 2 — Sealed chunk decode (encoded read) -// =========================================================================== - -void BM_Chunk_ReadFloat32(benchmark::State& state) { - static TopicChunk chunk = build_typed_chunk( - PrimitiveType::kFloat32, - [](TopicChunkBuilder& b, std::size_t col, int i) { b.set(col, static_cast(i) * 0.1f); }); - - for (auto _ : state) { - double sum = 0.0; - for (int i = 0; i < kPointCount; ++i) { - sum += chunk.readNumericAsDouble(0, static_cast(i)); - } - benchmark::DoNotOptimize(sum); - benchmark::ClobberMemory(); - } - state.SetItemsProcessed(static_cast(state.iterations()) * kPointCount); - state.counters["bytes_per_row"] = encoded_bytes_per_row(chunk, 0); -} - -void BM_Chunk_ReadInt64(benchmark::State& state) { - static TopicChunk chunk = build_typed_chunk( - PrimitiveType::kInt64, [](TopicChunkBuilder& b, std::size_t col, int i) { b.set(col, static_cast(i)); }); - - for (auto _ : state) { - double sum = 0.0; - for (int i = 0; i < kPointCount; ++i) { - sum += chunk.readNumericAsDouble(0, static_cast(i)); - } - benchmark::DoNotOptimize(sum); - benchmark::ClobberMemory(); - } - state.SetItemsProcessed(static_cast(state.iterations()) * kPointCount); - state.counters["bytes_per_row"] = encoded_bytes_per_row(chunk, 0); -} - -void BM_Chunk_ReadString(benchmark::State& state) { - static const std::vector kEnumValues = {"IDLE", "RUN", "WARN", "ERROR"}; - static TopicChunk chunk = build_typed_chunk(PrimitiveType::kString, [](TopicChunkBuilder& b, std::size_t col, int i) { - b.set(col, std::string_view(kEnumValues[static_cast(i) % 4])); - }); - - for (auto _ : state) { - std::size_t total_len = 0; - for (int i = 0; i < kPointCount; ++i) { - total_len += chunk.readString(0, static_cast(i)).size(); - } - benchmark::DoNotOptimize(total_len); - benchmark::ClobberMemory(); - } - state.SetItemsProcessed(static_cast(state.iterations()) * kPointCount); - 
state.counters["bytes_per_row"] = encoded_bytes_per_row(chunk, 0); -} - -BENCHMARK(BM_Chunk_ReadFloat32); -BENCHMARK(BM_Chunk_ReadInt64); -BENCHMARK(BM_Chunk_ReadString); - -// =========================================================================== -// Tier 2b — Bulk read (switch-once, tight inner loop) -// =========================================================================== - -void BM_Chunk_BulkReadFloat32(benchmark::State& state) { - static TopicChunk chunk = build_typed_chunk( - PrimitiveType::kFloat32, - [](TopicChunkBuilder& b, std::size_t col, int i) { b.set(col, static_cast(i) * 0.1f); }); - - std::vector buf(kPointCount); - for (auto _ : state) { - chunk.readColumnAsDoubles(0, Span(buf), 0); - benchmark::DoNotOptimize(buf.data()); - benchmark::ClobberMemory(); - } - state.SetItemsProcessed(static_cast(state.iterations()) * kPointCount); - state.counters["bytes_per_row"] = encoded_bytes_per_row(chunk, 0); -} - -void BM_Chunk_BulkReadInt64(benchmark::State& state) { - static TopicChunk chunk = build_typed_chunk( - PrimitiveType::kInt64, [](TopicChunkBuilder& b, std::size_t col, int i) { b.set(col, static_cast(i)); }); - - std::vector buf(kPointCount); - for (auto _ : state) { - chunk.readColumnAsDoubles(0, Span(buf), 0); - benchmark::DoNotOptimize(buf.data()); - benchmark::ClobberMemory(); - } - state.SetItemsProcessed(static_cast(state.iterations()) * kPointCount); - state.counters["bytes_per_row"] = encoded_bytes_per_row(chunk, 0); -} - -BENCHMARK(BM_Chunk_BulkReadFloat32); -BENCHMARK(BM_Chunk_BulkReadInt64); - -// =========================================================================== -// Tier 2c — FOR and Constant compressed benchmarks -// =========================================================================== - -void BM_Chunk_ReadInt64_FOR(benchmark::State& state) { - // int64 values mod 100 → range [0,99], FOR uses 1 byte offsets - static TopicChunk chunk = build_typed_chunk(PrimitiveType::kInt64, [](TopicChunkBuilder& b, std::size_t 
col, int i) { - b.set(col, static_cast(i % 100)); - }); - - for (auto _ : state) { - double sum = 0.0; - for (int i = 0; i < kPointCount; ++i) { - sum += chunk.readNumericAsDouble(0, static_cast(i)); - } - benchmark::DoNotOptimize(sum); - benchmark::ClobberMemory(); - } - state.SetItemsProcessed(static_cast(state.iterations()) * kPointCount); - state.counters["bytes_per_row"] = encoded_bytes_per_row(chunk, 0); - state.counters["encoding"] = static_cast(chunk.columnEncoding(0)); -} - -void BM_Chunk_BulkReadInt64_FOR(benchmark::State& state) { - // Same FOR-compressed int64 column, bulk read - static TopicChunk chunk = build_typed_chunk(PrimitiveType::kInt64, [](TopicChunkBuilder& b, std::size_t col, int i) { - b.set(col, static_cast(i % 100)); - }); - - std::vector buf(kPointCount); - for (auto _ : state) { - chunk.readColumnAsDoubles(0, Span(buf), 0); - benchmark::DoNotOptimize(buf.data()); - benchmark::ClobberMemory(); - } - state.SetItemsProcessed(static_cast(state.iterations()) * kPointCount); - state.counters["bytes_per_row"] = encoded_bytes_per_row(chunk, 0); - state.counters["encoding"] = static_cast(chunk.columnEncoding(0)); -} - -void BM_Chunk_ReadInt32_Constant(benchmark::State& state) { - // Constant int32 column - static TopicChunk chunk = build_typed_chunk( - PrimitiveType::kInt32, [](TopicChunkBuilder& b, std::size_t col, int /*i*/) { b.set(col, 42); }); - - for (auto _ : state) { - double sum = 0.0; - for (int i = 0; i < kPointCount; ++i) { - sum += chunk.readNumericAsDouble(0, static_cast(i)); - } - benchmark::DoNotOptimize(sum); - benchmark::ClobberMemory(); - } - state.SetItemsProcessed(static_cast(state.iterations()) * kPointCount); - state.counters["bytes_per_row"] = encoded_bytes_per_row(chunk, 0); - state.counters["encoding"] = static_cast(chunk.columnEncoding(0)); -} - -void BM_Chunk_BulkReadInt32_Constant(benchmark::State& state) { - // Same constant column, bulk read - static TopicChunk chunk = build_typed_chunk( - PrimitiveType::kInt32, 
[](TopicChunkBuilder& b, std::size_t col, int /*i*/) { b.set(col, 42); }); - - std::vector buf(kPointCount); - for (auto _ : state) { - chunk.readColumnAsDoubles(0, Span(buf), 0); - benchmark::DoNotOptimize(buf.data()); - benchmark::ClobberMemory(); - } - state.SetItemsProcessed(static_cast(state.iterations()) * kPointCount); - state.counters["bytes_per_row"] = encoded_bytes_per_row(chunk, 0); - state.counters["encoding"] = static_cast(chunk.columnEncoding(0)); -} - -BENCHMARK(BM_Chunk_ReadInt64_FOR); -BENCHMARK(BM_Chunk_BulkReadInt64_FOR); -BENCHMARK(BM_Chunk_ReadInt32_Constant); -BENCHMARK(BM_Chunk_BulkReadInt32_Constant); - -// =========================================================================== -// Tier 3 — RangeCursor iteration (cross-chunk) -// =========================================================================== - -void BM_Cursor_ReadFloat32(benchmark::State& state) { - static std::deque chunks = build_chunked_deque( - PrimitiveType::kFloat32, - [](TopicChunkBuilder& b, std::size_t col, int i) { b.set(col, static_cast(i) * 0.1f); }); - - for (auto _ : state) { - double sum = 0.0; - auto cursor = rangeQuery(chunks, 0, kPointCount - 1); - cursor.forEach([&](const SampleRow& row) { sum += row.chunk->readNumericAsDouble(0, row.row_index); }); - benchmark::DoNotOptimize(sum); - benchmark::ClobberMemory(); - } - state.SetItemsProcessed(static_cast(state.iterations()) * kPointCount); -} - -void BM_Cursor_ReadInt64(benchmark::State& state) { - static std::deque chunks = build_chunked_deque( - PrimitiveType::kInt64, [](TopicChunkBuilder& b, std::size_t col, int i) { b.set(col, static_cast(i)); }); - - for (auto _ : state) { - double sum = 0.0; - auto cursor = rangeQuery(chunks, 0, kPointCount - 1); - cursor.forEach([&](const SampleRow& row) { sum += row.chunk->readNumericAsDouble(0, row.row_index); }); - benchmark::DoNotOptimize(sum); - benchmark::ClobberMemory(); - } - state.SetItemsProcessed(static_cast(state.iterations()) * kPointCount); -} - -void 
BM_Cursor_ReadString(benchmark::State& state) { - static const std::vector kEnumValues = {"IDLE", "RUN", "WARN", "ERROR"}; - static std::deque chunks = - build_chunked_deque(PrimitiveType::kString, [](TopicChunkBuilder& b, std::size_t col, int i) { - b.set(col, std::string_view(kEnumValues[static_cast(i) % 4])); - }); - - for (auto _ : state) { - std::size_t total_len = 0; - auto cursor = rangeQuery(chunks, 0, kPointCount - 1); - cursor.forEach([&](const SampleRow& row) { total_len += row.chunk->readString(0, row.row_index).size(); }); - benchmark::DoNotOptimize(total_len); - benchmark::ClobberMemory(); - } - state.SetItemsProcessed(static_cast(state.iterations()) * kPointCount); -} - -BENCHMARK(BM_Cursor_ReadFloat32); -BENCHMARK(BM_Cursor_ReadInt64); -BENCHMARK(BM_Cursor_ReadString); - -// =========================================================================== -// Tier 3b — Chunk-at-a-time cursor + bulk read -// =========================================================================== - -void BM_Cursor_ChunkAtATime_Float32(benchmark::State& state) { - static std::deque chunks = build_chunked_deque( - PrimitiveType::kFloat32, - [](TopicChunkBuilder& b, std::size_t col, int i) { b.set(col, static_cast(i) * 0.1f); }); - - std::vector buf(kChunkSize); - for (auto _ : state) { - double sum = 0.0; - auto cursor = rangeQuery(chunks, 0, kPointCount - 1); - cursor.forEachChunk([&](const ChunkRowRange& range) { - const std::size_t n = range.row_end - range.row_start; - range.chunk->readColumnAsDoubles(0, Span(buf.data(), n), range.row_start); - for (std::size_t i = 0; i < n; ++i) { - sum += buf[i]; - } - }); - benchmark::DoNotOptimize(sum); - benchmark::ClobberMemory(); - } - state.SetItemsProcessed(static_cast(state.iterations()) * kPointCount); -} - -void BM_Cursor_ChunkAtATime_Int64(benchmark::State& state) { - static std::deque chunks = build_chunked_deque( - PrimitiveType::kInt64, [](TopicChunkBuilder& b, std::size_t col, int i) { b.set(col, static_cast(i)); }); 
- - std::vector buf(kChunkSize); - for (auto _ : state) { - double sum = 0.0; - auto cursor = rangeQuery(chunks, 0, kPointCount - 1); - cursor.forEachChunk([&](const ChunkRowRange& range) { - const std::size_t n = range.row_end - range.row_start; - range.chunk->readColumnAsDoubles(0, Span(buf.data(), n), range.row_start); - for (std::size_t i = 0; i < n; ++i) { - sum += buf[i]; - } - }); - benchmark::DoNotOptimize(sum); - benchmark::ClobberMemory(); - } - state.SetItemsProcessed(static_cast(state.iterations()) * kPointCount); -} - -BENCHMARK(BM_Cursor_ChunkAtATime_Float32); -BENCHMARK(BM_Cursor_ChunkAtATime_Int64); - -// =========================================================================== -// Tier 4 — Deque baseline (reference overhead) -// =========================================================================== - -void BM_Deque_ReadFloat(benchmark::State& state) { - static std::deque> data = - BuildDequeData([](int i) { return static_cast(i) * 0.1f; }); - - for (auto _ : state) { - double sum = 0.0; - for (auto& [ts, v] : data) { - benchmark::DoNotOptimize(ts); - sum += v; - } - benchmark::DoNotOptimize(sum); - benchmark::ClobberMemory(); - } - state.SetItemsProcessed(static_cast(state.iterations()) * kPointCount); -} - -void BM_Deque_ReadInt64(benchmark::State& state) { - static std::deque> data = - BuildDequeData([](int i) { return static_cast(i); }); - - for (auto _ : state) { - int64_t sum = 0; - for (auto& [ts, v] : data) { - benchmark::DoNotOptimize(ts); - sum += v; - } - benchmark::DoNotOptimize(sum); - benchmark::ClobberMemory(); - } - state.SetItemsProcessed(static_cast(state.iterations()) * kPointCount); -} - -void BM_Deque_ReadString(benchmark::State& state) { - static const std::vector kEnumValues = {"IDLE", "RUN", "WARN", "ERROR"}; - static std::deque> data = - BuildDequeData([](int i) { return kEnumValues[static_cast(i) % kEnumValues.size()]; }); - - for (auto _ : state) { - std::size_t total_len = 0; - for (auto& [ts, v] : data) { - 
benchmark::DoNotOptimize(ts); - total_len += v.size(); - } - benchmark::DoNotOptimize(total_len); - benchmark::ClobberMemory(); - } - state.SetItemsProcessed(static_cast(state.iterations()) * kPointCount); -} - -BENCHMARK(BM_Deque_ReadFloat); -BENCHMARK(BM_Deque_ReadInt64); -BENCHMARK(BM_Deque_ReadString); - -} // namespace -} // namespace PJ - -BENCHMARK_MAIN(); diff --git a/pj_datastore/docs/ARCHITECTURE.md b/pj_datastore/docs/ARCHITECTURE.md deleted file mode 100644 index 31d033e9..00000000 --- a/pj_datastore/docs/ARCHITECTURE.md +++ /dev/null @@ -1,265 +0,0 @@ -# pj_datastore Architecture - -> **Scope:** this document covers the **scalar `DataEngine` path** — the columnar -> store, encoding, queries, and `DerivedEngine`. The library also compiles three -> first-class components documented elsewhere or summarized below: the opaque-blob -> **`ObjectStore`** (its own doc, [`OBJECT_STORE_DESIGN.md`](./OBJECT_STORE_DESIGN.md)), -> the **`ColorMapRegistry`** + its C-ABI host (§3 *Color Map Layer*), and the -> **streaming two-engine `flushTo`/`setTarget`** swap (§3 *Plugin Host Layer*). - -## 1. Module Structure - -Two libraries with a strict dependency direction: - -- **pj_base** (`pj_base/`): Vocabulary types and SDK headers with zero external dependencies. Defines: - - `Timestamp` (`int64_t`, nanoseconds since Unix epoch) - - `Range` inclusive min/max pairs - - Identity types: `DatasetId`, `TopicId`, `FieldId`, `SchemaId`, `TimeDomainId` (all `uint32_t`), `ChunkId` (`uint64_t`), `NodeId` (`uint32_t`) - - `PrimitiveType` enum (12 variants: kFloat32..kUint64, kBool, kString) - - `NumericType`, `NumericValue` (variant of all numeric scalars) - - `TypeTreeNode` (recursive schema tree: primitive, struct, array, enum nodes) - - `Span` (alias for `std::span`), `BitSpan` (bit-level view with offset) - - `Expected` / `Status` for fallible operations, `PJ_ASSERT` for invariants - -- **pj_datastore** (`pj_datastore/`): Engine implementation. 
External dependencies: fmt, tsl::robin_map, nanoarrow (Arrow IPC import). - -Dependency: `pj_datastore` -> `pj_base`. No reverse dependency. `pj_plugins` depends on `pj_base` only, never on `pj_datastore`. - -## 2. Domain Model - -Hierarchy: **Dataset -> Topic -> Chunk -> Column** - -- **Dataset** (`DatasetInfo`): Identity (`DatasetId`), `source_name` string, bound `TimeDomain`. Represents one data source (file, live connection). - -- **Topic** (`TopicDescriptor` + `TopicStorage`): Named data stream within a dataset. Key fields: `schema_id` (0 = schemaless), `max_chunk_rows` (default 1024), `array_expansion_limit` (default 64). `TopicStorage` owns the committed chunk deque and column descriptors for schemaless topics. - -- **Schema** (`TypeRegistry` + `TypeTreeNode`): Tree of named typed nodes. `TypeRegistry` assigns `SchemaId` values, supports lookup by id/name, and additive-only evolution via `evolveSchema()`. Schemas are shared across topics. - -- **Chunk** (`TopicChunk`): Sealed, immutable storage unit produced by `TopicChunkBuilder::seal()`. Contains: `ChunkId` (monotonic atomic counter), timestamps vector, columns vector (each holding `EncodedData` + optional `BitVector` validity bitmap + `shared_ptr`), and `ChunkStats` (t_min, t_max, row_count, per-column `ColumnStats`). - -- **Column** (`TopicChunk::Column`): The physical unit. `EncodedData` is a variant of 5 encoding types (see section 5). `ColumnDescriptor` carries field_id, logical_type, and fully-qualified field_path (e.g. `"pose.position.x"`). - -## 3. Layer Architecture - -### Logical Layer - -**`DataEngine`** — Central owner of all state. Stores datasets, topics (as `TopicStorage`), time domains, and the global `TypeRegistry`, all in hash map containers (tsl::robin_map internally, std::unordered_map in headers). 
Provides: -- `createDataset()`, `createTopic()`, `createTimeDomain()` with monotonic ID allocation -- `commitChunks()` — appends sealed chunks to `TopicStorage`, returns deduplicated list of changed `TopicId`s -- `enforceRetention()` — evicts old chunks across all topics -- Factory methods `createWriter()` and `createReader()` - -**`TypeRegistry`** — `registerSchema()` assigns a new `SchemaId`. `registerOrGet()` returns an existing ID if the name matches (for late-discovery schemas). `evolveSchema()` validates additive-only changes (no field removal or type change). - -### Storage Layer - -**`TopicChunkBuilder`** — Mutable builder that accumulates rows for one topic. Two append paths: -- *Row-at-a-time*: `beginRow(timestamp)` -> `set(col, value)` / `setNull(col)` -> `finishRow()`. `finishRow()` pads unset columns with null. -- *Bulk*: `appendTimestamps(span)` -> `appendColumn(col, span)` per column -> `appendColumnValidity(col, bitspan)` -> `finishBulkAppend()` (computes stats). - -Tracks per-column `ColumnStats` incrementally (min, max, null_count, is_constant, run_count). `seal()` encodes columns, assigns a monotonic `ChunkId` via `std::atomic`, and produces an immutable `TopicChunk`. - -**`TopicStorage`** — Per-topic container of committed chunks in a `std::deque`. `appendSealedChunk()` validates ordering: each chunk's `t_min >= previous chunk's t_max`. `evictBefore()` removes chunks whose `t_max < threshold`. Also stores column descriptors for schemaless (schema_id == 0) topics and per-field array expansion counts. - -**`TypedColumnBuffer`** — In-memory typed column buffer used internally by the builder. One buffer per column. Supports per-type single-row append (`appendFloat64`, `appendInt64`, etc.) and bulk append (`appendFloat64Bulk`, etc.). String storage uses a separate offsets buffer (`RawBuffer`) with Arrow-compatible uint32 offset layout. Validity bitmap (`BitVector`) is lazily initialized -- only allocated when the first null is appended. 
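The lazy validity bitmap described for `TypedColumnBuffer` can be sketched as follows. This is a minimal illustration, not the removed implementation: `LazyValidity` and all of its members are hypothetical names, and the only behavior taken from the text is that the bitmap is allocated on the first null and uses an LSB-first bit order.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical illustration only: LazyValidity is not the pj_datastore
// BitVector, it just mimics the "allocate on first null" behavior.
class LazyValidity {
 public:
  // Record one valid row. Nothing is allocated while no null has been seen.
  void appendValid() {
    if (allocated_) {
      growTo(rows_ + 1);
      setBit(rows_, true);
    }
    ++rows_;
  }

  // Record one null row.
  void appendNull() {
    if (!allocated_) {
      // First null: materialize the bitmap and mark every earlier row valid.
      bits_.assign((rows_ + 7) / 8, 0xFF);
      allocated_ = true;
    }
    growTo(rows_ + 1);
    setBit(rows_, false);
    ++rows_;
    ++nulls_;
  }

  bool isValid(std::size_t row) const {
    if (!allocated_) return true;  // no bitmap means no nulls so far
    return ((bits_[row / 8] >> (row % 8)) & 1u) != 0;  // LSB-first bit order
  }

  bool allocated() const { return allocated_; }
  std::size_t size() const { return rows_; }
  std::size_t nullCount() const { return nulls_; }

 private:
  // Make sure the byte vector can hold `rows` bits.
  void growTo(std::size_t rows) { bits_.resize((rows + 7) / 8, 0); }

  void setBit(std::size_t row, bool valid) {
    const auto mask = static_cast<std::uint8_t>(1u << (row % 8));
    if (valid) {
      bits_[row / 8] = static_cast<std::uint8_t>(bits_[row / 8] | mask);
    } else {
      bits_[row / 8] = static_cast<std::uint8_t>(bits_[row / 8] & ~mask);
    }
  }

  std::vector<std::uint8_t> bits_;
  std::size_t rows_ = 0;
  std::size_t nulls_ = 0;
  bool allocated_ = false;
};
```

A column that never sees a null pays no bitmap cost in this sketch, which is presumably the motivation for the lazy allocation described above.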
- -**`RawBuffer`** — Growable byte vector wrapping `std::vector`. Used for column value storage and encoded payloads. - -**`BitVector`** — Owning packed validity bitmap with Arrow-compatible LSB-first layout. Supports `initValid()`, `setNull()`, `isValid()`, `countNulls()`, and bulk `assignBytes()`. - -### Writer Layer - -**`DataWriter`** — High-level write facade bound to a `DataEngine`. Manages one `TopicChunkBuilder` and a pending chunk list per topic. Key operations: -- `registerTopic()` / `registerScalarSeries()` — topic creation -- `ensureColumn()` — dynamic column addition for schemaless topics. Rejects if a row is in progress (for new columns only). Seals any pending builder before modifying column layout. -- `expandArray()` — variable-length array expansion. Seals current builder, adds new `ColumnDescriptor`s, updates `TopicStorage`. Clamps to `array_expansion_limit`. -- `appendColumns()` — bulk ingest with auto-chunking: splits batches that exceed `max_chunk_rows`, calling `appendTimestamps` / `appendColumn` / `finishBulkAppend` / `autoSeal` per sub-batch. -- `flush()` / `flushAll()` — seals remaining builders and returns `vector>` for `commitChunks()`. -- `autoSeal()` — called when builder is full (`rowCount >= max_chunk_rows`). Seals builder, moves chunk to `pending_chunks_`. - -### Reader Layer - -**`DataReader`** — Read-only facade over committed `DataEngine` storage. Provides `listDatasets()`, `listTopics()`, `getTypeTree()`, `getMetadata()`, `rangeQuery()`, `latestAt()`, and `series(topic_id, column_index)`. - -### Query Layer - -**`RangeCursor`** — Iterates rows in `[t_min, t_max]` across the chunk deque. Constructor binary-searches the deque for the first overlapping chunk and row. 
Supports: -- `forEach(callback)` — per-row iteration via `SampleRow` (timestamp + chunk pointer + row index) -- `forEachChunk(callback)` — bulk iteration via `ChunkRowRange` (chunk pointer + row start/end) - -**`latestAt(chunks, t)`** — Binary search for the most recent row at or before timestamp `t`. Returns `optional`. - -Both are free functions operating on `const std::deque&`. - -**`SeriesReader`** — Views one numeric/bool topic column as a virtual vector of `(timestamp, value)` samples. It is bound to a topic and column when created through `DataReader::series()`. Null physical rows and chunks where the column does not exist are skipped by definition. Provides: -- `size()` / `empty()` — sample count, not physical row count -- `sampleAt(index)` — lookup by virtual series index -- `sampleAtOrBeforeTime(t)` / `sampleAtOrAfterTime(t)` — lookup by timestamp over value-bearing samples -- `samples(Range)` — cursor over samples in an inclusive time range -- `bounds()` / `bounds(Range)` — valid time and value ranges for the series - -**`SeriesCursor`** — Iterates value-bearing samples in `[time.min, time.max]` across chunks. It returns `SeriesSample` values with timestamp, double value, chunk pointer, and physical row index for low-level consumers that need provenance. 
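The "virtual series" idea behind `SeriesReader` (sample count differs from physical row count because null rows are skipped) can be sketched as below. This is an assumption-laden illustration: `Sample` and `valueBearingSamples` are hypothetical names, and the sketch materializes the samples eagerly, whereas the text describes a view over chunks.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical sample type: value plus the physical row it came from.
struct Sample {
  std::int64_t timestamp;
  double value;
  std::size_t physical_row;
};

// Collect only rows that carry a value. The position of a sample in the
// returned vector plays the role of the virtual series index.
std::vector<Sample> valueBearingSamples(
    const std::vector<std::int64_t>& timestamps,
    const std::vector<std::optional<double>>& column) {
  std::vector<Sample> out;
  const std::size_t rows =
      timestamps.size() < column.size() ? timestamps.size() : column.size();
  for (std::size_t row = 0; row < rows; ++row) {
    if (column[row].has_value()) {
      out.push_back(Sample{timestamps[row], *column[row], row});
    }
  }
  return out;
}
```

With four physical rows of which two are null, the result holds two samples, and each sample still records its physical row index for consumers that need provenance.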
-
-### Encoding Layer
-
-Seven `StorageKind` values map logical `PrimitiveType` to physical storage:
-
-| StorageKind | Source PrimitiveTypes |
-|---|---|
-| kFloat32 | kFloat32 |
-| kFloat64 | kFloat64 |
-| kInt32 | kInt32 |
-| kInt64 | kInt8, kInt16, kInt64 |
-| kUint64 | kUint8, kUint16, kUint32, kUint64 |
-| kBool | kBool |
-| kString | kString |
-
-Five `EncodingType` values, selected at seal time per column:
-
-| EncodingType | When used | Representation |
-|---|---|---|
-| kRaw (`RawBuffer`) | Default for float/uint64 when not constant | Typed byte buffer |
-| kConstant (`ConstantEncoded`) | All non-null values equal (`is_constant && rowCount > 0`) | 8-byte value + count |
-| kFrameOfReference (`FrameOfReferenceEncoded`) | kInt32/kInt64 when range fits in fewer bytes | int64 reference + packed uint8/16/32 offsets |
-| kDictionary (`DictionaryEncoded`) | Always for kString | Unique string list + narrowed uint8/16/32 indices |
-| kPackedBool (`PackedBools`) | kBool when not constant | 1 bit per value, LSB first |
-
-`EncodedData = std::variant<RawBuffer, ConstantEncoded, FrameOfReferenceEncoded, DictionaryEncoded, PackedBools>`
-
-Encoding selection in `TopicChunkBuilder::seal()`:
-1. **Strings**: always dictionary encoded via `dictionaryEncodeStrings()`.
-2. **Bools**: constant if `is_constant`, otherwise packed bits via `packBools()`.
-3. **Signed integers** (kInt32/kInt64): recomputes exact int64 min/max from raw buffer (avoids double-precision loss). Constant if all equal, frame-of-reference if `offsetBytesFor(range) < storageKindSize(kind)`, otherwise raw.
-4. **Float32/float64/uint64**: constant if `is_constant`, otherwise raw.
-
-### Derived Layer
-
-**`DerivedEngine`** — Manages a transform DAG. Uses pimpl (`DerivedEngineImpl`). Key operations:
-
-- `addSisoTransform()` — registers a single-input/single-output node. Input must be a single-column topic. Creates an output scalar topic. Returns `NodeId`.
-- `addMimoTransform()` — registers a multi-input/multi-output node. All inputs must be single-column. Creates N output topics.
-- `topologicalOrder()` — Kahn's algorithm. Cycle detection via DFS at registration time.
-- `onSourceCommitted(changed_topics)` — marks directly dependent nodes dirty.
-- `scheduleAll()` — processes all dirty nodes in topological order (incremental path).
-- `scheduleActive(active_nodes)` — processes only specified nodes and their transitive upstream dependencies.
-- `recompute_batch(node_id)` — clears output topic, calls `transform.reset()`, replays full input history.
-
-**`ISISOTransform`** — Point-at-a-time interface. `calculate(time, input, &out_time, &out_value) -> bool`. Called in strictly ascending timestamp order. State persists across chunk boundaries. `reset()` clears state for batch recompute. `outputKind()` declares output `StorageKind` (default kFloat64).
-
-**`IMIMOTransform`** — N inputs -> M outputs. `calculate(time, inputs_span, &out_time, &outputs_vec) -> bool`. Exact-timestamp inner join: only called when ALL input topics have a sample at the same timestamp. `outputKinds(input_kinds)` declares one `StorageKind` per output topic.
-
-**`VarValue = std::variant<double, int64_t, uint64_t, std::string>`** — Universal value type for transform I/O. Mapping: float32/float64 -> double, int8..int64/bool -> int64_t, uint64 -> uint64_t, string -> std::string.
-
-Incremental scheduling: each node tracks a `last_processed_chunk_id` watermark. `scheduleAll()` iterates only chunks with id > watermark, reads each row, calls `calculate()`, writes output via `beginRow`/`set`/`finishRow`, then flushes and commits.
-
-### Color Map Layer
-
-**`ColorMapRegistry`** (`colormap_registry.hpp`) — Registry of named colormap callbacks. A plugin registers one or more named `ColorMapEvalFn`s (scalar -> CSS color / `#rrggbb`) via `registerMap()` and selects an active one with `setActive()`; consumers (chart renderers, exporters) evaluate the active map per data point via `evaluate()`. The registry holds raw callback pointers + user contexts and does not own the plugin that supplied them.
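The register / select / evaluate flow described for `ColorMapRegistry` might look roughly like the sketch below. Everything here is an assumption made for illustration: `MiniColorMapRegistry`, `EvalFn`, `GrayRange`, and `grayscale` are hypothetical, the callback signature is invented (the real `ColorMapEvalFn` is not shown in this diff), and the sketch is plain C++17 rather than the C-ABI form.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical callback shape: scalar plus opaque user context in,
// "#rrggbb" string out. Not the real ColorMapEvalFn signature.
using EvalFn = std::string (*)(double value, void* user_ctx);

class MiniColorMapRegistry {
 public:
  // Stores the raw pointer pair; the registry does not own the context.
  void registerMap(const std::string& name, EvalFn fn, void* user_ctx) {
    maps_[name] = Entry{fn, user_ctx};
  }

  // Returns false when no map with that name was registered.
  bool setActive(const std::string& name) {
    if (maps_.find(name) == maps_.end()) return false;
    active_ = name;
    return true;
  }

  // Evaluates the active map, or returns nullopt when none is active.
  std::optional<std::string> evaluate(double value) const {
    if (!active_) return std::nullopt;
    const auto it = maps_.find(*active_);
    if (it == maps_.end()) return std::nullopt;
    return it->second.fn(value, it->second.user_ctx);
  }

 private:
  struct Entry {
    EvalFn fn;
    void* user_ctx;
  };
  std::unordered_map<std::string, Entry> maps_;
  std::optional<std::string> active_;
};

// Example plugin-side state and callback (both hypothetical).
struct GrayRange {
  double lo;
  double hi;
};

std::string grayscale(double value, void* user_ctx) {
  const auto* range = static_cast<const GrayRange*>(user_ctx);
  const double t =
      std::clamp((value - range->lo) / (range->hi - range->lo), 0.0, 1.0);
  const auto level = static_cast<unsigned>(std::lround(t * 255.0));
  char buf[8];  // '#' + 6 hex digits + terminating NUL
  std::snprintf(buf, sizeof(buf), "#%02x%02x%02x", level, level, level);
  return std::string(buf);
}
```

Because the registry keeps only raw pointers, the `GrayRange` object in this sketch has to outlive every `evaluate()` call, which mirrors the ownership note in the paragraph above.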
- -**`makeColorMapRegistryHost()`** (`colormap_registry_host.hpp`) — Wraps a `ColorMapRegistry` as a C-ABI `PJ_colormap_registry_t` fat pointer for `bind_colormap_registry`. The fat pointer references the registry by address (the registry must outlive every bound plugin); the vtable is a static singleton, safe to share across plugins and threads. - -### Plugin Host Layer - -**`DatastoreSourceWriteHost`** / **`DatastoreParserWriteHost`** / **`DatastoreToolboxHost`** — Bridge between the C ABI scalar-write protocol (`PJ_source_write_host_t`, etc.) and the C++ `DataWriter`/`DataEngine`. Each wraps a pimpl state struct. Provides `raw()` to get the C function-pointer table and `flushPending()` to seal/commit accumulated data. - -The host translates C ABI calls (ensureTopic, ensureField, appendRecord) into `DataWriter` operations (registerTopic/ensureColumn, beginRow/set/finishRow). - -**Object hosts** — `DatastoreSourceObjectWriteHost`, `DatastoreParserObjectWriteHost`, and `DatastoreToolboxObjectReadHost` are the `ObjectStore` peers of the scalar hosts (surfaces `pj.source_object_write` / `pj.parser_object_write` / `pj.toolbox_object_read`). They bridge the C ABI onto an `ObjectStore` rather than a `DataEngine`; see [`OBJECT_STORE_DESIGN.md`](./OBJECT_STORE_DESIGN.md) for their write/read contracts. - -**Streaming two-engine swap** — During paused streaming the host can route pushes to a *secondary* engine/store and swap back on resume. `DataEngine::flushTo(dst)` zero-copy-moves committed chunks from the secondary into the primary (topics matched by descriptor, per-topic monotonicity enforced); `DatastoreSourceWriteHost::setTarget()` / `DatastoreParserWriteHost::setTarget()` (and the object hosts' `setTarget()`) flush pending rows then atomically retarget the destination. The streaming manager keeps both engines/stores registered in lockstep so the bound topics exist on the target. `ObjectStore::flushTo()` provides the matching blob-side move. 
- -### Import Adapter - -**Arrow IPC** (`PJ::arrow_import` namespace): -- `schemaFromIpc()` — parses Arrow schema from IPC stream bytes via nanoarrow. Maps Arrow fields to `ArrowColumnMapping` (arrow column index -> PJ column index + PrimitiveType). Unsupported Arrow types are skipped. -- `importIpcStream()` — reads record batches, appends via `DataWriter::appendColumns()` with validity bitmaps. - -## 4. Data Flow - -### Row-at-a-time Ingest (via plugin host) - -1. Plugin calls `writeHost.ensureTopic(name)` -> host creates topic via `DataWriter::registerTopic()` -2. Plugin calls `writeHost.ensureField(topic, name, type)` -> host calls `DataWriter::ensureColumn()` -3. Plugin calls `writeHost.appendRecord(topic, timestamp, fields)` -> host calls `DataWriter::beginRow()`, `set()` per field, `finishRow()` -4. `finishRow()` pads unset columns with null, updates per-column stats -5. If builder is full (`rowCount >= max_chunk_rows`): `autoSeal()` seals builder, moves `TopicChunk` to `pending_chunks_` -6. Host calls `flushPending()` -> `DataWriter::flushAll()` seals remaining builders -> `DataEngine::commitChunks()` appends to `TopicStorage` -7. Chunks are now visible to readers and derived engine - -### Bulk Ingest - -1. `DataWriter::appendColumns(topic_id, timestamps, column_data_array)` -2. Auto-chunks: splits batch into sub-batches fitting builder's `remainingCapacity()` -3. For each sub-batch: `appendTimestamps()`, `appendColumn()` per column, `finishBulkAppend()` (computes stats), `autoSeal()` if full -4. Flush + commit same as row-at-a-time - -### Series Query - -1. `DataReader::series(topic_id, column_index)` validates the topic, column bounds, and numeric/bool value type -2. The returned `SeriesReader` treats the column as a field-level time series -3. Null physical rows are skipped; every `SeriesSample` has a value -4. Time lookups return valid series samples and virtual series indices -5. 
`bounds()` returns min/max over value-bearing samples only - -### Row Query - -1. `DataReader::rangeQuery(QueryRange{topic_id, t_min, t_max})` -> `RangeCursor` -2. Cursor binary-searches the chunk deque for start position -3. `forEach` callback receives `SampleRow` (chunk pointer + row index) -4. Caller reads values via `chunk->readNumericAsDouble(col, row)`, `readString()`, `readBool()`, `isNull()`, or batch via `readColumnAsDoubles()` - -### Derived Scheduling - -1. `engine.commitChunks()` returns changed topic IDs -2. Caller invokes `derivedEngine.onSourceCommitted(changed_topics)` -> marks dependent nodes dirty -3. `derivedEngine.scheduleAll()` processes dirty nodes in topological order -4. For each dirty SISO node: iterate new input chunks (id > `last_processed_chunk_id`), read each row, call `transform.calculate()`, write output via `beginRow`/`set`/`finishRow`, flush + commit -5. For each dirty MIMO node: iterate primary input's new chunks, for each timestamp check all other inputs via `latestAt()`, call `transform.calculate()` only when all inputs have matching timestamps - -## 5. Key Invariants - -- **Dense field IDs**: Field IDs within a topic are always 0, 1, 2, ... with no gaps. -- **Monotonic timestamps**: Timestamps within a topic are monotonically non-decreasing. Enforced at `beginRow()` and `appendTimestamps()`. -- **ensureColumn guards**: Rejects new columns after a row is in progress. Invalidates stale 0-row builders when adding columns. -- **expandArray seals first**: `expandArray()` seals the current builder before modifying column layout, preventing mid-chunk schema changes. -- **Chunk ordering**: Each chunk's `t_min >= previous chunk's t_max`. Enforced by `TopicStorage::appendSealedChunk()`. -- **Lazy validity bitmaps**: `TypedColumnBuffer` only allocates a `BitVector` on first `appendNull()`. Sealed chunks include validity only when `hasNulls()` is true. 
-- **NaN for nulls**: `readColumnAsDoubles()` writes `NaN` at null positions, preventing confusion between null and zero. -- **ChunkId monotonicity**: `TopicChunkBuilder` uses a `static atomic` counter starting at 1. `kInvalidChunkId` (0) is the sentinel for "no chunk seen yet". - -## 6. Threading Model - -Effectively single-threaded. `DataWriter` accumulates in-memory. `DataEngine::commitChunks()` is synchronous. No internal locks or queues. The only concurrency point is `TopicChunkBuilder::next_chunk_id_` (a `std::atomic`), which allows multiple builders to generate unique chunk IDs without coordination. Plugin sources that receive data on network threads must queue messages internally and process them in `onPoll()`. - -## 7. Testing - -18 core test executables covering all layers (the authoritative live set is the -`PJ_DATASTORE_TESTS` list plus the explicitly-added targets in `CMakeLists.txt`): - -| Test | Coverage | -|---|---| -| `buffer_test` | `RawBuffer`, `BitVector` | -| `column_buffer_test` | `TypedColumnBuffer` per-type append/read, validity | -| `type_registry_test` | Schema registration, lookup, evolution | -| `encoding_test` | All 5 encoding types: constant, FOR, dictionary, packed bool, raw | -| `chunk_test` | `TopicChunkBuilder` row/bulk paths, seal, read-back | -| `topic_storage_test` | Commit ordering, eviction, metadata | -| `query_test` | `RangeCursor`, `latestAt`, edge cases | -| `series_reader_test` | `SeriesReader`, `SeriesCursor`, series sample bounds/lookups | -| `engine_integration_test` | Full `DataEngine` + `DataWriter` + `DataReader` round-trip | -| `derived_engine_test` | SISO/MIMO transforms, topological order, incremental + batch recompute | -| `array_expansion_test` | `expandArray`, clamping, cross-builder expansion | -| `regression_test` | Bug-specific regression cases | -| `object_store_test` | `ObjectStore` owned/lazy push, latest-at, retention, `flushTo` | -| `plugin_data_host_object_test` | Object-write host bridges 
(`pj.source_object_write` / `pj.parser_object_write`) onto `ObjectStore` | -| `plugin_data_host_object_read_test` | Toolbox object-read host (`pj.toolbox_object_read`) queries | -| `plugin_parser_object_write_test` | Parser plugin writing canonical builtin objects through the object-write host | -| `arrow_import_test` | Arrow IPC schema parsing and batch import | -| `arrow_stream_round_trip_test` | v4 Arrow C Data Interface round-trip (Phase 1b) | - -Disabled pending the Phase 1b v4-ABI rewrite (commented out in `CMakeLists.txt`, -`.cpp` files still on disk): `plugin_host_write_test` (v3 `appendArrowIpc`/`readSeries` -write path) and `plugin_host_read_test` (v3 toolbox read path; rewrite for -`read_series_arrow`). - -Build and run: `./build.sh --debug && ./test.sh` diff --git a/pj_datastore/docs/OBJECT_STORE_DESIGN.md b/pj_datastore/docs/OBJECT_STORE_DESIGN.md deleted file mode 100644 index b34bf650..00000000 --- a/pj_datastore/docs/OBJECT_STORE_DESIGN.md +++ /dev/null @@ -1,149 +0,0 @@ -# ObjectStore Design - -`ObjectStore` stores timestamped opaque byte payloads alongside the columnar -`DataEngine`. It is for data that should be selected by time but should not be -expanded into scalar columns at ingest time. - -## Responsibilities - -`ObjectStore` owns: - -- object topic registration scoped by `DatasetId` -- monotonically non-decreasing timestamped entries per topic -- eager payload storage through `pushOwned()` -- lazy payload storage through `pushLazy()` -- at-or-before timestamp lookup -- entry-index lookup and timestamp views -- per-topic retention budgets -- explicit topic eviction, removal, and clear operations - -It does not decode payloads, interpret metadata, choose renderers, or own UI -policy. Topic metadata is opaque JSON retained verbatim for callers that need to -interpret object bytes. 
- -## Data Model - -```cpp -struct ObjectTopicDescriptor { - DatasetId dataset_id; - std::string topic_name; - std::string metadata_json; -}; - -// Eager payload: store-owned bytes, counted against the retention budget. -using SharedBuffer = std::shared_ptr>; -// Lazy payload: idempotent fetcher returning a view + ownership anchor. -using LazyCallback = std::function; - -struct ObjectEntry { - Timestamp timestamp; - std::variant payload; -}; - -struct RetentionBudget { - int64_t time_window_ns; - size_t max_memory_bytes; -}; -``` - -Topic names must be unique within one dataset. The same topic name may appear in -different datasets. - -Entries in a topic must be pushed in monotonically non-decreasing timestamp -order. Equal timestamps are allowed. Out-of-order writes fail. - -## Write Paths - -`pushOwned(id, timestamp, payload)` moves the caller-provided vector into a -shared buffer owned by the store. Owned entries contribute to `memoryUsage()`. - -`pushLazy(id, timestamp, fetch)` stores a callable instead of bytes. The callable -is invoked on each read and returns a `sdk::PayloadView` — a `Span` -paired with a type-erased `BufferAnchor` that keeps those bytes alive for as long -as the resolved view is held. The producer anchors on whatever already owns the -bytes (a decompressed chunk, an mmap, or a fresh allocation via -`sdk::makePayloadView`), so the store never copies on resolve. Lazy entries do not -contribute to `memoryUsage()` because the store retains the callable, not the -fetched bytes. - -Both write paths apply the topic retention budget after the new entry is -inserted. - -## Read Paths - -`latestAt(id, timestamp)` returns the newest entry whose timestamp is less than -or equal to the query timestamp. It returns `std::nullopt` if the topic is -unknown, empty, or has no entry at or before that time. - -`at(id, index)` resolves an entry by sequence index. - -`indexAt(id, timestamp)` returns the index that `latestAt()` would resolve. 
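The at-or-before lookup shared by `latestAt()` and `indexAt()` can be sketched with a binary search over sorted timestamps. This is a minimal sketch under stated assumptions: `indexAtOrBefore` is a hypothetical free function, timestamps are modeled as a plain `std::vector<int64_t>`, and "newest" among equal timestamps is assumed to mean the last one pushed.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <optional>
#include <vector>

// Index of the last entry whose timestamp is <= query, or nullopt when the
// series is empty or every entry is later than the query.
// Precondition: sorted_ts is in non-decreasing order (equal values allowed).
std::optional<std::size_t> indexAtOrBefore(
    const std::vector<std::int64_t>& sorted_ts, std::int64_t query) {
  // upper_bound finds the first element strictly greater than query.
  const auto it = std::upper_bound(sorted_ts.begin(), sorted_ts.end(), query);
  if (it == sorted_ts.begin()) return std::nullopt;
  return static_cast<std::size_t>(std::distance(sorted_ts.begin(), it)) - 1;
}
```

Using `std::upper_bound` rather than `std::lower_bound` is what makes the sketch pick the last of several equal timestamps, which matters because equal timestamps are allowed within a topic.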
- -`entryTimestamps(id)` returns an `EntryTimestampsView` that holds the series read -lock while the timestamp span is inspected. - -Resolved entries contain: - -```cpp -struct ResolvedObjectEntry { - Timestamp timestamp; - sdk::PayloadView payload; // { Span bytes; BufferAnchor anchor; } -}; -``` - -`payload.bytes` is the resolved view; `payload.anchor` keeps those bytes alive -independently of later store mutation (eviction, removal, or `clear()`). For an -owned entry the anchor is the store's `SharedBuffer`; for a lazy entry it is -whatever the fetcher anchored on. An empty `anchor` means "no bytes". - -## Retention - -Retention is configured per topic: - -- `time_window_ns > 0`: drop entries older than `newest_push_ts - - time_window_ns`. -- `max_memory_bytes > 0`: drop oldest entries until owned-payload memory is at - or below the cap. - -Either axis can be zero to disable that axis. Both zero disables automatic -retention. - -Automatic retention runs only during `pushOwned()` and `pushLazy()`. Explicit -eviction is available through `evictBefore(id, threshold)` and -`evictAllBefore(threshold)`. - -Memory accounting includes only owned payloads. Lazy entries are counted as zero -bytes because the store retains a fetch callable, not the fetched payload. - -## Threading - -The store has one global shared mutex for topic lookup and one shared mutex per -topic series. Reads can proceed concurrently with reads on the same or different -topics. Writes take the target topic's exclusive lock. Topic registration, -removal, and `clear()` take the global exclusive lock. - -## Plugin ABI Bridge - -Plugin access to `ObjectStore` is provided by three optional v4 services: - -| Service | Host implementation | Purpose | -|---|---|---| -| `pj.source_object_write.v1` | `DatastoreSourceObjectWriteHost` | DataSource plugins register object topics and push owned or lazy entries. 
| -| `pj.parser_object_write.v1` | `DatastoreParserObjectWriteHost` | MessageParser plugins push entries to a host-bound object topic. | -| `pj.toolbox_object_read.v1` | `DatastoreToolboxObjectReadHost` | Toolbox plugins look up topics and read entries as owning byte handles. | - -The raw ABI lives in `pj_base/include/pj_base/plugin_data_api.h`; the C++ SDK -views live in `pj_base/include/pj_base/sdk/plugin_data_api.hpp`. - -The toolbox read ABI allocates one owning handle per successful read. The handle -keeps bytes alive until the plugin releases it, even if the store evicts or -removes the underlying topic. - -## Tests - -Core behavior is covered by: - -- `pj_datastore/tests/object_store_test.cpp` -- `pj_datastore/tests/plugin_data_host_object_test.cpp` -- `pj_datastore/tests/plugin_data_host_object_read_test.cpp` -- `pj_datastore/tests/plugin_parser_object_write_test.cpp` diff --git a/pj_datastore/docs/REQUIREMENTS.md b/pj_datastore/docs/REQUIREMENTS.md deleted file mode 100644 index b1ffee41..00000000 --- a/pj_datastore/docs/REQUIREMENTS.md +++ /dev/null @@ -1,119 +0,0 @@ -# pj_datastore Requirements - -## 1. Purpose - -Columnar time-series storage engine for PlotJuggler Core. Decouples data storage from UI, plugin formats, and transport protocols. Provides a single, type-safe data layer that all plugins write to and all consumers read from. - -## 2. Goals - -- Memory-efficient columnar storage with adaptive per-column encoding (constant, frame-of-reference, dictionary, packed bools) -- Full type fidelity: float32, float64, int32, int64, uint64, bool, string — no lossy double-only representation -- Derived (computed) series via a transform DAG supporting both single-input/single-output (SISO) and multi-input/multi-output (MIMO) transforms -- Decoupled plugin API: plugins interact through host-provided interfaces, never touching engine internals -- Independent time domains for comparing data from multiple sources with different clocks - -## 3. 
Use Cases - -- **One-shot file import**: CSV, Parquet, ULog, MCAP — read entire file, write all data, done -- **Continuous streaming ingest**: ZMQ, MQTT, WebSocket — data arrives indefinitely, rolling buffer with retention -- **Delegated ingest**: Source plugins push raw bytes; host routes to appropriate parser plugin which writes parsed fields -- **Derived series**: User-defined transforms (derivative, moving average, quaternion-to-RPY, Lua scripts) compute new series from existing data -- **Time-range queries**: Retrieve all samples in a time window for plotting -- **Latest-at queries**: Retrieve the most recent sample at or before a given time (for real-time displays) -- **Rolling buffer with eviction**: Streaming sources need bounded memory; old data evicted outside a configurable retention window - -## 4. Functional Requirements - -### 4.1 Data Model - -- Data is organized as: Dataset, Topic, Field (column) -- A Dataset represents one data source (e.g., one file, one network connection) -- A Topic represents one logical data stream (e.g., one sensor, one message type) -- Fields are typed columns within a topic, sharing the same timestamp column -- Schemas describe the structure of typed topics (struct with named, typed fields) - -### 4.2 Type System - -- Supported primitive types: float32, float64, int32, int64, uint64, bool, string -- Values represented as a type-safe variant (ValueRef) preserving native precision -- No implicit narrowing: int64 and uint64 values must not be silently cast to double - -### 4.3 Ingest - -- Row-at-a-time append: set fields by name or by pre-resolved handle, then commit the row -- Bulk columnar append: write arrays of timestamps + column data in one call -- Sparse records: fields not included in a row are automatically null-filled -- Fields may be introduced at any point during ingestion — not just before the first row. 
This is the expected behavior for variable-length sequences (ROS, Protobuf, DDS) and dynamic-schema formats (JSON). When a new field appears after rows exist, the engine seals the current chunk and continues with the expanded column set. -- Pre-registration with `ensureField()` is an optional optimization for the minority of sources with a fixed, fully-known schema. It enables the faster bound-write path (`appendBoundRecord`) and avoids mid-stream chunk sealing. -- Timestamps must be monotonically increasing within each topic (nanosecond resolution, absolute epoch time) -- Arrow IPC import: accept Arrow record batches for high-throughput bulk ingest - -### 4.4 Storage - -- Chunked columnar storage: data accumulated in mutable builders, sealed into immutable chunks at configurable row thresholds -- Per-column encoding selected at seal time based on data statistics (constant values, integer ranges, string cardinality, boolean density) -- Validity bitmaps (Arrow-compatible, LSB-first) track null values per column -- Null values in batch reads returned as NaN so consumers don't confuse null with zero - -### 4.5 Schema Evolution - -The column set of a topic evolves during ingestion. This is the common case, not the exception — most real-world data sources produce a column count that is unknown at startup and changes as messages arrive: - -- **Variable-length sequences** are the norm in schema-based protocols (ROS, DDS, Protobuf, IDL, FlatBuffers). A `repeated float data` field is flattened to columns `data[0]`, `data[1]`, etc. The column count changes with every message that has a different sequence length. Nested messages containing sequences multiply the effect. Even a single ROS `sensor_msgs/PointCloud2` can produce hundreds of dynamically-sized columns. -- **Dynamic-schema formats** (JSON, MessagePack, CBOR) have no fixed schema at all. Each message may introduce new keys. 
-- **Fixed column count is the exception**, limited to formats where every field is a scalar with no sequences (e.g., a flat CSV, a Protobuf message with only scalar fields). The engine must not be designed around this minority case. - -Requirements: - -- New fields may appear at any point during ingestion. The engine seals the current chunk and continues with the expanded column set. Rows in earlier chunks have no value for the new column; readers treat absent columns as null. -- Field IDs are append-only and stable — existing handles are never invalidated by later column additions. -- Once a field is created with a given type, subsequent writes must use the same type. Type mismatches are rejected with a clear error. -- Schema changes are resolved between rows, never mid-row. -- Pre-registration with `ensureField()` is an optimization for the fixed-schema case, not a prerequisite for writing data. - -### 4.6 Query - -- Series queries: view one numeric/bool topic column as a time series of value-bearing samples -- Series sample count, bounds, and timestamp lookups always skip null physical rows -- Range queries: iterate all rows in a time window for a topic -- Latest-at queries: find the most recent row at or before a given timestamp -- Per-row and per-chunk-range iteration for physical-row consumers that need explicit null handling -- Read methods for each type: double, int64, uint64, bool, string, with explicit null checking - -### 4.7 Derived Series - -- Transform DAG: register transforms that take one or more input topics and produce one or more output topics -- SISO transforms: single input to single output, point-at-a-time sequential contract -- MIMO transforms: N inputs to M outputs, exact-timestamp inner join (all inputs must have matching timestamps) -- Incremental scheduling: only process new data since last run (watermark tracking) -- Batch recompute: clear output, reset transform state, replay full input history -- Topological ordering: transforms 
scheduled in dependency order, cycle detection on registration - -### 4.8 Retention - -- Configurable retention window (nanoseconds) -- Old chunks evicted when their maximum timestamp falls outside the retention window -- Eviction is per-topic, triggered externally (not automatic) - -## 5. Non-Functional Requirements - -- Pure C++20 with fmt + tsl::robin_map (no Qt dependency) -- Clean under AddressSanitizer (ASAN) in debug builds -- Deterministic chunk ordering: no internal threading, synchronous commit model -- Zero-copy string reads where possible (string_view into dictionary-encoded column memory) -- Builds with -Wall -Wextra -Werror - -## 6. Plugin Contract - -- Plugins interact through typed host views (SourceWriteHostView, ParserWriteHostView, ToolboxHostView) -- Plugins never instantiate or reference engine classes directly -- Error model: Expected for fallible operations, Status for success/failure — no exceptions cross the plugin ABI boundary -- Host manages flush and commit lifecycle — plugins don't call flush or commit directly - -## 7. Deferred / Out of Scope - -- RLE (run-length encoding) for repetitive numeric data -- Asynchronous plugin staging queue with background commit thread -- Schema version history tracking -- Persistence (save/load to disk) -- Advanced time-domain alignment (cross-source interpolation) diff --git a/pj_datastore/docs/USER_GUIDE.md b/pj_datastore/docs/USER_GUIDE.md deleted file mode 100644 index eb365eb8..00000000 --- a/pj_datastore/docs/USER_GUIDE.md +++ /dev/null @@ -1,384 +0,0 @@ -# pj_datastore User Guide - -How to use pj_datastore to read and write time-series data. This guide is for plugin developers (DataSource, MessageParser, Toolbox) and AI agents implementing plugins. - -Plugins interact with the datastore through host-provided views — never through engine classes directly. 
The three views are: - -- **SourceWriteHostView** — for DataSource plugins (file importers, streamers) -- **ParserWriteHostView** — for MessageParser plugins (decoders) -- **ToolboxHostView** — for Toolbox plugins (read + write + catalog) - -All are defined in `pj_base/include/pj_base/sdk/plugin_data_api.hpp`. - ---- - -## 1. Data Model - -- **Dataset**: One data source (one file, one network connection). Created automatically by the host. -- **Topic**: A named data stream within a dataset. Has typed columns sharing one timestamp column. Example: a ROS topic `/imu/data` with columns `angular_velocity.x`, `angular_velocity.y`, etc. -- **Field**: A typed column within a topic. Types: `float32`, `float64`, `int32`, `int64`, `uint64`, `bool`, `string`. -- **Timestamp**: `int64_t` nanoseconds since Unix epoch. Always absolute — never subtract a base time during ingestion. - -### ValueRef — Preserve Native Types - -```cpp -using ValueRef = std::variant; -``` - -**Never cast int64 or uint64 to double.** Values larger than 2^53 lose precision. Push native types directly: - -```cpp -// WRONG — loses precision for large integers -fields.push_back({"counter", static_cast<double>(value)}); - -// CORRECT — preserves full precision -fields.push_back({"counter", value}); // value is int64_t -``` - ---- - -## 2. Writing Data - -DataSource and Toolbox write hosts name the target topic on each write. A -MessageParser write host is already bound to one topic by the host, so parser -calls omit the topic argument and write into that bound topic. - -### Step 1: Get a topic handle - -DataSource and Toolbox plugins create or resolve topics explicitly: - -```cpp -auto topic = writeHost().ensureTopic("sensor/imu"); -if (!topic) { /* handle error */ } -``` - -MessageParser plugins do not call `ensureTopic()` on the parser write host.
-They only create fields inside the already-bound topic: - -```cpp -auto field = writeHost().ensureField("temperature", PJ::PrimitiveType::kFloat64); -if (!field) { /* handle error */ } -``` - -### Step 2 (optional): Pre-register fields for the bound-write fast path - -For DataSource and Toolbox writers: - -```cpp -writeHost().ensureField(*topic, "accel.x", PJ::PrimitiveType::kFloat64); -writeHost().ensureField(*topic, "accel.y", PJ::PrimitiveType::kFloat64); -writeHost().ensureField(*topic, "accel.z", PJ::PrimitiveType::kFloat64); -writeHost().ensureField(*topic, "label", PJ::PrimitiveType::kString); -``` - -Pre-registration is **optional**. If you skip it, fields are auto-created on -first non-null write via `appendRecord()`. Pre-registering is recommended when -the schema is known upfront because it enables the faster `appendBoundRecord()` -path and avoids mid-stream chunk sealing. - -### Step 3: Append records - -**By name** (flexible, resolves names each call). DataSource and Toolbox -writers pass the topic: - -```cpp -std::vector fields = { - {"accel.x", 9.81}, - {"accel.y", 0.0}, - {"accel.z", -0.05}, - {"label", std::string_view("moving")}, -}; -auto status = writeHost().appendRecord(*topic, timestamp_ns, PJ::Span(fields)); -``` - -MessageParser writers omit the topic: - -```cpp -std::vector fields = { - {"temperature", 23.5}, - {"humidity", 61.0}, -}; -auto status = writeHost().appendRecord(timestamp_ns, PJ::Span(fields)); -``` - -**By handle** (pre-resolved, faster for high-rate data). DataSource and -Toolbox writers pass the topic: - -```cpp -auto fx = writeHost().ensureField(*topic, "accel.x", PJ::PrimitiveType::kFloat64); -auto fy = writeHost().ensureField(*topic, "accel.y", PJ::PrimitiveType::kFloat64); -// ... resolve all fields once ... 
- -std::vector bound = { - {*fx, 9.81}, - {*fy, 0.0}, -}; -writeHost().appendBoundRecord(*topic, timestamp_ns, PJ::Span(bound)); -``` - -Parser writers use the field handles from their bound topic: - -```cpp -auto temp = writeHost().ensureField("temperature", PJ::PrimitiveType::kFloat64); -std::vector bound = {{*temp, 23.5}}; -writeHost().appendBoundRecord(timestamp_ns, PJ::Span(bound)); -``` - -### Sparse Records - -Not every field needs data on every row. Fields omitted from `appendRecord()` are automatically null-filled. This is the correct way to handle sparse data: - -```cpp -// Row 1: only accel.x has data -fields = {{"accel.x", 1.0}}; -writeHost().appendRecord(*topic, t1, PJ::Span(fields)); -// accel.y, accel.z, label are null for this row - -// Row 2: all fields have data -fields = {{"accel.x", 2.0}, {"accel.y", 3.0}, {"accel.z", 4.0}}; -writeHost().appendRecord(*topic, t2, PJ::Span(fields)); -``` - -### NamedFieldValue.name is std::string - -The `name` field in `NamedFieldValue` is `std::string` (not `string_view`). You can safely use temporary string expressions: - -```cpp -fields.push_back({prefix + "/" + key, value}); // safe — name is owned -``` - -### Bulk Arrow Import - -For high-throughput imports or parser-shaped payloads that already have Arrow -data, use the Arrow C Data Interface (`ArrowArrayStream`). The byte-based -`appendArrowIpc` slot was removed in ABI v4. - -DataSource and Toolbox writers pass the destination topic: - -```cpp -PJ::sdk::ArrowStreamHolder stream(buildMyStream()); -auto status = writeHost().appendArrowStream(*topic, std::move(stream), "timestamp"); -``` - -Parser writers omit the topic because the parser host is bound to one topic: - -```cpp -PJ::sdk::ArrowStreamHolder stream(buildMyPayloadStream()); -auto status = writeHost().appendArrowStream(std::move(stream), "timestamp"); -``` - -`ArrowStreamHolder` is an RAII wrapper that auto-releases the stream; the -`std::move` overload disarms it on success. 
See `pj_base/sdk/arrow.hpp` for -the holder + stream-builder helpers. - ---- - -## 3. Timestamps - -- Type: `int64_t` (nanoseconds since Unix epoch) -- Must be **monotonically increasing** within each topic -- Convert from seconds: `auto ts = static_cast(epoch_seconds * 1e9);` -- **Never subtract a base time** — display-time subtraction belongs in the UI layer - ---- - -## 4. Delegated Ingest (Streaming Sources) - -Streaming sources that receive pre-encoded messages (e.g., ROS CDR, Protobuf) use delegated ingest. The host routes raw bytes to the appropriate MessageParser plugin. - -```cpp -// In onStart(): bind a parser for each topic/encoding -auto binding = runtimeHost().ensureParserBinding({ - .topic_name = "/camera/image", - .parser_encoding = "cdr", - .type_name = "sensor_msgs/msg/Image", - .schema = PJ::Span(schema_data, schema_size), - .parser_config_json = config_json, -}); - -// In onPoll(): push incoming messages (the host invokes the fetcher per policy) -runtimeHost().pushMessage( - *binding, timestamp_ns, [bytes = payload]() -> std::vector { return bytes; }); -``` - -- `parser_encoding` is the wire format (e.g., `"cdr"`, `"json"`, `"protobuf"`), not the schema format -- Cache binding handles — don't re-resolve on every message -- `onPoll()` must not block — drain buffered data and return immediately - ---- - -## 5. Reading Data - -### Series Reads - -Use `SeriesReader` for application-level access to one numeric/bool field as a -time series. It is created once from a topic and column index, then behaves like -a virtual vector of `(timestamp, value)` samples. Null physical rows are skipped: -if the field has no value at a row timestamp, that row is not a series sample. 
- -```cpp -auto series_or = reader.series(topic_id, col_index); -if (!series_or) { /* handle invalid topic/column/type */ } - -const PJ::SeriesReader series = *series_or; -auto bounds = series.bounds(); -auto latest = series.sampleAtOrBeforeTime(now_ns); - -series.samples(PJ::Range{.min = t0, .max = t1}) - .forEach([](const PJ::SeriesSample& sample) { - double value = sample.value; - PJ::Timestamp ts = sample.timestamp; - }); -``` - -Series APIs are the default for plotting and other field-level consumers. -Row-level APIs below intentionally expose physical rows and nulls. - -### Range Query - -```cpp -auto reader = engine.createReader(); -auto cursor = reader.rangeQuery({.topic_id = topic_id, .t_min = 0, .t_max = INT64_MAX}); -if (!cursor) { /* handle error */ } - -cursor->forEach([](const PJ::SampleRow& row) { - double x = row.chunk->readNumericAsDouble(0, row.row_index); - int64_t ts = row.chunk->readTimestamp(row.row_index); -}); -``` - -### Latest-At Query - -```cpp -auto sample = reader.latestAt({.topic_id = topic_id, .t = now_ns}); -if (sample && *sample) { - double val = (*sample)->chunk->readNumericAsDouble(0, (*sample)->row_index); -} -``` - -### Read Methods - -| Method | Returns | Null behavior | -|--------|---------|---------------| -| `readNumericAsDouble(col, row)` | `double` | Returns 0.0 for nulls — **caller must check isNull()** | -| `readColumnAsDoubles(col, span, row_start)` | batch into span | **Returns NaN for null positions** (safe) | -| `readNumericAsInt64(col, row)` | `int64_t` | Returns 0 for nulls | -| `readNumericAsUint64(col, row)` | `uint64_t` | Returns 0 for nulls | -| `readString(col, row)` | `string_view` | Points into chunk memory — don't outlive the chunk | -| `readBool(col, row)` | `bool` | Returns false for nulls | -| `isNull(col, row)` | `bool` | Explicit null check | - -### Schema Evolution and Column Bounds - -Early chunks may have fewer columns than later ones (if columns were added via array expansion between chunks). 
Always check bounds: - -```cpp -if (col_index < row.chunk->columns.size()) { - double val = row.chunk->readNumericAsDouble(col_index, row.row_index); -} -``` - ---- - -## 6. Common Pitfalls - -### Column addition auto-seals the current chunk - -When `appendRecord()` encounters a new field after rows have been written, -the datastore seals the current chunk and adds the column to a fresh one. -Earlier rows (in sealed chunks) have no value for the new column — readers -treat absent columns as null. - -You do NOT need to pre-register all fields before writing. Fields may appear -at any time. Pre-registration with `ensureField()` is still recommended when -the schema is known upfront, as it avoids mid-stream chunk sealing. - -For schema-based parsers (ROS, Protobuf) where a field's type is known but -the value is null, use `TypedNull{type}` instead of `kNull` to create the -column immediately. - -### Don't cast int64/uint64 to double - -Values above 2^53 lose precision. Push the native type via ValueRef. - -### Timestamps are nanoseconds, not seconds - -`auto ts = static_cast(epoch_seconds * 1e9);` — don't forget the `* 1e9`. - -### readNumericAsDouble doesn't check nulls - -For single-value reads, check `isNull(col, row)` first. For batch reads, use `readColumnAsDoubles()` which returns NaN for nulls. - -### string_view lifetime - -`readString()` returns a `string_view` pointing into the chunk's dictionary-encoded memory. Don't store it beyond the chunk's lifetime. - -### Sparse data: pre-register, let null-fill work - -Don't skip rows for topics that have no data at a given timestamp. Instead, write a record with only the fields that have data. The engine null-fills the rest. - ---- - -## 7. 
Minimal Examples - -### File Importer (CSV pattern) - -```cpp -class MyImporter : public PJ::FileSourceBase { - uint64_t extraCapabilities() const override { return PJ::kCapabilityDirectIngest; } - - PJ::Status importData() override { - auto topic = writeHost().ensureTopic("my_data"); - if (!topic) return PJ::unexpected(topic.error()); - - // Pre-register ALL fields - for (const auto& col_name : column_names) { - writeHost().ensureField(*topic, col_name, PJ::PrimitiveType::kFloat64); - } - - // Write rows - for (const auto& row : parsed_rows) { - std::vector fields; - for (size_t i = 0; i < row.values.size(); i++) { - fields.push_back({column_names[i], row.values[i]}); - } - writeHost().appendRecord(*topic, row.timestamp_ns, PJ::Span(fields)); - } - return PJ::okStatus(); - } -}; -``` - -### Streaming Source (Delegated Ingest) - -```cpp -class MyStreamer : public PJ::StreamSourceBase { - uint64_t extraCapabilities() const override { return PJ::kCapabilityDelegatedIngest; } - - PJ::Status onStart() override { - // Connect, discover topics, create parser bindings - binding_ = *runtimeHost().ensureParserBinding({ - .topic_name = "/data", - .parser_encoding = "json", - .type_name = "MyMessage", - .schema = {}, - }); - return PJ::okStatus(); - } - - PJ::Status onPoll() override { - // Drain buffered messages (must not block!) - while (auto msg = dequeue()) { - runtimeHost().pushMessage( - binding_, msg->timestamp_ns, [bytes = msg->payload]() -> std::vector { return bytes; }); - } - return PJ::okStatus(); - } - - void onStop() override { /* close connections */ } -}; -``` diff --git a/pj_datastore/examples/parquet_import.cpp b/pj_datastore/examples/parquet_import.cpp deleted file mode 100644 index c7aa6003..00000000 --- a/pj_datastore/examples/parquet_import.cpp +++ /dev/null @@ -1,396 +0,0 @@ -// Copyright 2026 Davide Faconti -// SPDX-License-Identifier: MPL-2.0 - -// parquet_import — load a Parquet file into DataEngine, report memory stats. 
-// -// Usage: ./parquet_import [chunk_rows] - -#include -#include -#include -#include -#include - -#include -#include -#include -#include -#include -#include -#include -#include -#include - -#include "pj_base/number_parse.hpp" -#include "pj_base/span.hpp" -#include "pj_base/types.hpp" -#include "pj_datastore/arrow_import.hpp" -#include "pj_datastore/chunk.hpp" -#include "pj_datastore/encoding.hpp" -#include "pj_datastore/engine.hpp" -#include "pj_datastore/topic_storage.hpp" - -namespace { - -using PJ::DataEngine; -using PJ::EncodingType; -using PJ::PrimitiveType; -using PJ::TopicChunk; - -// ── Local column mapping for report formatting ────────────────────────────── - -struct ColumnMapping { - int arrow_index; // column index in Arrow table - std::size_t pj_col_index; // column index in PJ schema - PrimitiveType pj_type; - std::string name; -}; - -std::string_view encoding_name(EncodingType enc) { - switch (enc) { - case EncodingType::kRaw: - return "Raw"; - case EncodingType::kDictionary: - return "Dictionary"; - case EncodingType::kPackedBool: - return "PackedBool"; - case EncodingType::kConstant: - return "Constant"; - case EncodingType::kFrameOfReference: - return "FrameOfRef"; - } - return "Unknown"; -} - -std::string arrow_type_name(const std::shared_ptr& type) { - return type->name(); -} - -// ── Per-column memory measurement ─────────────────────────────────────────── - -struct ColumnMemory { - std::size_t actual_bytes = 0; - std::size_t theoretical_bytes = 0; - EncodingType dominant_encoding = EncodingType::kRaw; -}; - -std::size_t encoded_column_bytes(const TopicChunk& chunk, std::size_t col) { - return std::visit( - [](const auto& v) -> std::size_t { - using T = std::decay_t; - if constexpr (std::is_same_v) { - return v.size(); - } else if constexpr (std::is_same_v) { - return v.value_size; - } else if constexpr (std::is_same_v) { - return v.offsets.size(); - } else if constexpr (std::is_same_v) { - std::size_t dict_bytes = 0; - for (const auto& s : 
v.dictionary) { - dict_bytes += s.size(); - } - return v.indices.size() + dict_bytes; - } else if constexpr (std::is_same_v) { - return v.bits.size(); - } - }, - chunk.columns[col].data); -} - -std::vector measure_memory( - const std::deque& chunks, std::size_t num_columns, const std::vector& mappings, - const std::shared_ptr& table) { - std::vector result(num_columns); - - // Count encoding occurrences per column to determine dominant encoding - constexpr int kNumEncodings = 5; - std::vector> enc_counts(num_columns, std::vector(kNumEncodings, 0)); - - for (const auto& chunk : chunks) { - for (std::size_t col = 0; col < num_columns; ++col) { - result[col].actual_bytes += encoded_column_bytes(chunk, col); - if (chunk.columns[col].validity_bitmap) { - result[col].actual_bytes += chunk.columns[col].validity_bitmap->sizeBytes(); - } - enc_counts[col][static_cast(chunk.columnEncoding(col))]++; - } - } - - // Determine dominant encoding per column - for (std::size_t col = 0; col < num_columns; ++col) { - uint32_t max_count = 0; - for (int enc = 0; enc < kNumEncodings; ++enc) { - if (enc_counts[col][static_cast(enc)] > max_count) { - max_count = enc_counts[col][static_cast(enc)]; - result[col].dominant_encoding = static_cast(enc); - } - } - } - - // Theoretical bytes: arrow type byte width * total_rows, or actual string - // length for string columns - auto total_rows = static_cast(table->num_rows()); - for (std::size_t col = 0; col < num_columns; ++col) { - const auto& mapping = mappings[col]; - auto arrow_col = table->column(mapping.arrow_index); - auto arrow_type = arrow_col->type(); - - if (arrow_type->id() == arrow::Type::STRING || arrow_type->id() == arrow::Type::LARGE_STRING) { - // Sum actual string data length across all chunks - std::size_t string_bytes = 0; - for (int i = 0; i < arrow_col->num_chunks(); ++i) { - if (arrow_type->id() == arrow::Type::STRING) { - auto arr = std::static_pointer_cast(arrow_col->chunk(i)); - string_bytes += 
static_cast(arr->total_values_length()); - } else { - auto arr = std::static_pointer_cast(arrow_col->chunk(i)); - string_bytes += static_cast(arr->total_values_length()); - } - } - result[col].theoretical_bytes = string_bytes; - } else if (arrow_type->id() == arrow::Type::BOOL) { - // 1 byte per bool uncompressed - result[col].theoretical_bytes = total_rows; - } else { - auto byte_width = static_cast(arrow_type->byte_width()); - result[col].theoretical_bytes = byte_width * total_rows; - } - } - - return result; -} - -// ── Report formatting ─────────────────────────────────────────────────────── - -void print_report( - const std::vector& mappings, const std::vector& memory, - const std::shared_ptr& table, double ingest_seconds) { - // Column widths - constexpr int kNameW = 24; - constexpr int kTypeW = 12; - constexpr int kEncW = 14; - constexpr int kBytesW = 12; - constexpr int kRatioW = 8; - int total_w = kNameW + kTypeW + kEncW + kBytesW * 2 + kRatioW; - - std::cout << "\n"; - std::cout << "Rows: " << table->num_rows() << "\n"; - std::cout << "Columns: " << mappings.size() << "\n"; - std::cout << "Ingest: " << std::fixed << std::setprecision(3) << ingest_seconds << " s\n\n"; - - // Header - std::cout << std::left << std::setw(kNameW) << "Column" << std::setw(kTypeW) << "Arrow Type" << std::setw(kEncW) - << "PJ Encoding" << std::right << std::setw(kBytesW) << "Actual" << std::setw(kBytesW) << "Theoretical" - << std::setw(kRatioW) << "Ratio" - << "\n"; - std::cout << std::string(static_cast(total_w), '-') << "\n"; - - std::size_t total_actual = 0; - std::size_t total_theoretical = 0; - - for (std::size_t i = 0; i < mappings.size(); ++i) { - const auto& m = mappings[i]; - const auto& mem = memory[i]; - - auto arrow_type = table->column(m.arrow_index)->type(); - double ratio = mem.theoretical_bytes > 0 - ? 
static_cast(mem.actual_bytes) / static_cast(mem.theoretical_bytes) - : 0.0; - - std::cout << std::left << std::setw(kNameW) << m.name << std::setw(kTypeW) << arrow_type_name(arrow_type) - << std::setw(kEncW) << encoding_name(mem.dominant_encoding) << std::right << std::setw(kBytesW) - << mem.actual_bytes << std::setw(kBytesW) << mem.theoretical_bytes << std::setw(kRatioW - 1) << std::fixed - << std::setprecision(2) << ratio << "x\n"; - - total_actual += mem.actual_bytes; - total_theoretical += mem.theoretical_bytes; - } - - std::cout << std::string(static_cast(total_w), '-') << "\n"; - - double total_ratio = - total_theoretical > 0 ? static_cast(total_actual) / static_cast(total_theoretical) : 0.0; - - std::cout << std::left << std::setw(kNameW) << "TOTAL" << std::setw(kTypeW) << "" << std::setw(kEncW) << "" - << std::right << std::setw(kBytesW) << total_actual << std::setw(kBytesW) << total_theoretical - << std::setw(kRatioW - 1) << std::fixed << std::setprecision(2) << total_ratio << "x\n\n"; -} - -} // namespace - -int main(int argc, char* argv[]) { - if (argc < 2) { - std::cerr << "Usage: " << argv[0] << " [chunk_rows]\n"; - return 1; - } - - const std::string path = argv[1]; - uint32_t chunk_rows = 8192; - if (argc >= 3) { - const auto parsed = PJ::parseNumber(argv[2]); - if (!parsed.has_value() || *parsed == 0) { - std::cerr << "Invalid chunk_rows value: " << argv[2] << " (expected a positive integer)\n"; - return 1; - } - chunk_rows = *parsed; - } - - // ── 1. 
Open Parquet file ────────────────────────────────────────────────── - - auto maybe_infile = arrow::io::ReadableFile::Open(path); - if (!maybe_infile.ok()) { - std::cerr << "Failed to open file: " << maybe_infile.status().ToString() << "\n"; - return 1; - } - - auto maybe_reader = parquet::arrow::OpenFile(*maybe_infile, arrow::default_memory_pool()); - if (!maybe_reader.ok()) { - std::cerr << "Failed to open Parquet reader: " << maybe_reader.status().ToString() << "\n"; - return 1; - } - auto arrow_reader = std::move(*maybe_reader); - - std::shared_ptr table; - auto st = arrow_reader->ReadTable(&table); - if (!st.ok()) { - std::cerr << "Failed to read table: " << st.ToString() << "\n"; - return 1; - } - - std::cout << "Loaded " << path << ": " << table->num_rows() << " rows, " << table->num_columns() << " columns\n"; - - // ── 2. Serialize Arrow Table to IPC stream bytes ───────────────────────── - - auto sink_result = arrow::io::BufferOutputStream::Create(); - if (!sink_result.ok()) { - std::cerr << "Failed to create buffer output stream: " << sink_result.status().ToString() << "\n"; - return 1; - } - auto sink = *sink_result; - - auto ipc_writer_result = arrow::ipc::MakeStreamWriter(sink, table->schema()); - if (!ipc_writer_result.ok()) { - std::cerr << "Failed to create IPC writer: " << ipc_writer_result.status().ToString() << "\n"; - return 1; - } - auto ipc_writer = *ipc_writer_result; - - st = ipc_writer->WriteTable(*table); - if (!st.ok()) { - std::cerr << "IPC WriteTable failed: " << st.ToString() << "\n"; - return 1; - } - st = ipc_writer->Close(); - if (!st.ok()) { - std::cerr << "IPC writer Close failed: " << st.ToString() << "\n"; - return 1; - } - - auto ipc_buf_result = sink->Finish(); - if (!ipc_buf_result.ok()) { - std::cerr << "Failed to finish IPC buffer: " << ipc_buf_result.status().ToString() << "\n"; - return 1; - } - auto ipc_buffer = *ipc_buf_result; - - PJ::Span ipc_bytes(ipc_buffer->data(), static_cast(ipc_buffer->size())); - - std::cout << 
"Serialized to IPC: " << ipc_buffer->size() << " bytes\n"; - - // ── 3. Map IPC schema → TypeTreeNode via arrow_import ────────────────── - - auto schema_result = PJ::arrow_import::schemaFromIpc(ipc_bytes); - if (!schema_result.has_value()) { - std::cerr << "Schema conversion failed: " << schema_result.error() << "\n"; - return 1; - } - auto& [type_tree, arrow_mappings] = *schema_result; - - // Build local ColumnMapping for the report - std::vector<ColumnMapping> mappings; - mappings.reserve(arrow_mappings.size()); - for (const auto& am : arrow_mappings) { - ColumnMapping m; - m.arrow_index = am.arrow_column_index; - m.pj_col_index = am.pj_column_index; - m.pj_type = am.pj_type; - m.name = am.field_name; - mappings.push_back(std::move(m)); - } - - // ── 4. Create DataEngine, dataset, schema, topic ────────────────────────── - - DataEngine engine; - - auto td_or = engine.createTimeDomain("default"); - if (!td_or.has_value()) { - std::cerr << "Failed to create time domain: " << td_or.error() << "\n"; - return 1; - } - - PJ::DatasetDescriptor ds_desc; - ds_desc.source_name = path; - ds_desc.time_domain_id = *td_or; - auto ds_or = engine.createDataset(std::move(ds_desc)); - if (!ds_or.has_value()) { - std::cerr << "Failed to create dataset: " << ds_or.error() << "\n"; - return 1; - } - auto dataset_id = *ds_or; - - auto writer = engine.createWriter(); - - auto schema_or = writer.registerSchema("parquet_schema", type_tree); - if (!schema_or.has_value()) { - std::cerr << "Failed to register schema: " << schema_or.error() << "\n"; - return 1; - } - - PJ::TopicDescriptor topic_desc; - topic_desc.name = "parquet_data"; - topic_desc.schema_id = *schema_or; - topic_desc.dataset_id = dataset_id; - topic_desc.max_chunk_rows = chunk_rows; - - auto topic_or = writer.registerTopic(dataset_id, std::move(topic_desc)); - if (!topic_or.has_value()) { - std::cerr << "Failed to register topic: " << topic_or.error() << "\n"; - return 1; - } - auto topic_id = *topic_or; - - // ── 5. 
Bulk ingest via IPC import API ──────────────────────────────────── - - auto t_start = std::chrono::steady_clock::now(); - - auto import_st = PJ::arrow_import::importIpcStream(writer, topic_id, ipc_bytes, arrow_mappings); - if (!import_st.has_value()) { - std::cerr << "import_ipc_stream failed: " << import_st.error() << "\n"; - return 1; - } - - // Flush and commit - auto flushed = writer.flushAll(); - engine.commitChunks(std::move(flushed)); - - auto t_end = std::chrono::steady_clock::now(); - double ingest_seconds = std::chrono::duration<double>(t_end - t_start).count(); - - // ── 6. Memory report ───────────────────────────────────────────────────── - - const auto* storage = engine.getTopicStorage(topic_id); - if (storage == nullptr) { - std::cerr << "Topic storage not found after commit.\n"; - return 1; - } - - auto memory = measure_memory(storage->sealedChunks(), mappings.size(), mappings, table); - print_report(mappings, memory, table, ingest_seconds); - - // Cross-check with TopicMetadata - auto meta = storage->metadata(); - std::cout << "TopicMetadata.total_byte_size: " << meta.total_byte_size << "\n"; - std::cout << "TopicMetadata.total_row_count: " << meta.total_row_count << "\n"; - - return 0; -} diff --git a/pj_datastore/include/pj_datastore/arrow_import.hpp b/pj_datastore/include/pj_datastore/arrow_import.hpp deleted file mode 100644 index d328989c..00000000 --- a/pj_datastore/include/pj_datastore/arrow_import.hpp +++ /dev/null @@ -1,68 +0,0 @@ -#pragma once -// Copyright 2026 Davide Faconti -// SPDX-License-Identifier: MPL-2.0 - -#include -#include -#include -#include -#include -#include - -#include "pj_base/expected.hpp" -#include "pj_base/plugin_data_api.h" // for ArrowArrayStream forward-declared in the Arrow C Data Interface block -#include "pj_base/span.hpp" -#include "pj_base/type_tree.hpp" -#include "pj_base/types.hpp" -#include "pj_datastore/writer.hpp" - -namespace PJ::arrow_import { - -/// Describes how an Arrow column maps to a PJ topic column. 
-struct ArrowColumnMapping { - /// Source column index in Arrow batch/table schema. - int arrow_column_index; - /// Destination column index in PJ flattened schema. - std::size_t pj_column_index; - /// Target primitive type used by PJ storage. - PrimitiveType pj_type; - /// Source field name. - std::string field_name; -}; - -/// Parse schema from Arrow IPC stream bytes (reads first message). -/// Returns a TypeTreeNode and column mappings for supported types. -/// Unsupported Arrow types are skipped. -[[nodiscard]] PJ::Expected, std::vector>> schemaFromIpc( - PJ::Span ipc_stream); - -/// Import all record batches from Arrow IPC stream bytes into a DataWriter topic. -/// -/// timestamp_column: which Arrow column contains timestamps (as int64). -/// If -1, row indices (0, 1, 2, ...) are used as timestamps. -[[nodiscard]] PJ::Status importIpcStream( - DataWriter& writer, TopicId topic_id, PJ::Span ipc_stream, - const std::vector& mappings, int timestamp_column = -1); - -/// Import all record batches from a live Arrow C Data Interface stream. -/// This is the v4 in-memory path — no IPC parse, plugin hands the stream. -/// -/// Ownership: on success, the caller retains responsibility for releasing -/// @p stream (the importer does NOT call stream->release). This lets the -/// caller enforce the ownership contract on the ABI boundary: host-side -/// code that got the stream from a plugin releases on success, retains on -/// failure, all at the outermost ABI frame. -/// -/// The mappings vector must match the stream's schema (same columns in -/// the same order). @p timestamp_column is an index into the stream's -/// schema, or -1 for synthetic sequential timestamps. -[[nodiscard]] PJ::Status importArrowStream( - DataWriter& writer, TopicId topic_id, struct ::ArrowArrayStream* stream, - const std::vector& mappings, int timestamp_column = -1); - -/// Parse schema from a live Arrow C Data Interface stream (reads schema only; -/// does not consume batches). 
Caller retains ownership of @p stream. -[[nodiscard]] PJ::Expected, std::vector>> -schemaFromArrowStream(struct ::ArrowArrayStream* stream); - -} // namespace PJ::arrow_import diff --git a/pj_datastore/include/pj_datastore/buffer.hpp b/pj_datastore/include/pj_datastore/buffer.hpp deleted file mode 100644 index 28ca943e..00000000 --- a/pj_datastore/include/pj_datastore/buffer.hpp +++ /dev/null @@ -1,106 +0,0 @@ -#pragma once -// Copyright 2026 Davide Faconti -// SPDX-License-Identifier: MPL-2.0 - -#include -#include -#include - -#include "pj_base/span.hpp" - -namespace PJ { - -/// Growable byte buffer used by column/chunk encodings. -class RawBuffer { - public: - /// Construct an empty buffer. - RawBuffer() = default; - - /// Construct an empty buffer reserving `initial_capacity` bytes. - explicit RawBuffer(std::size_t initial_capacity); - - /// Ensure the buffer can hold at least `capacity` bytes without reallocation. - void reserve(std::size_t capacity); - - /// Append `size` bytes from `data`. - void append(const void* data, std::size_t size); - - /// Resize to exactly `new_size` bytes. - void resize(std::size_t new_size); - - /// Reset size to zero, preserving capacity. - void clear(); - - [[nodiscard]] const uint8_t* data() const noexcept; - - [[nodiscard]] uint8_t* mutable_data() noexcept; - - [[nodiscard]] std::size_t size() const noexcept; - - [[nodiscard]] std::size_t capacity() const noexcept; - - [[nodiscard]] bool empty() const noexcept; - - private: - std::vector data_; -}; - -/// Owning packed validity bitmap (Arrow-compatible LSB-first layout). -class BitVector { - public: - /// Construct an empty bit vector. - BitVector() = default; - - /// Required bytes for `num_bits` bits. - [[nodiscard]] static constexpr std::size_t bytesForBits(std::size_t num_bits) noexcept { - return (num_bits + 7) / 8; - } - - /// Initialize to `num_bits` bits, all set to valid. - void initValid(std::size_t num_bits); - - /// Ensure capacity for at least `num_bits` bits. 
- void ensureSize(std::size_t num_bits); - - /// Mark one bit as valid. - void setValid(std::size_t bit_index); - - /// Mark one bit as null. - void setNull(std::size_t bit_index); - - /// Return true if one bit is valid. - [[nodiscard]] bool isValid(std::size_t bit_index) const; - - /// Count null bits in the first `num_bits` bits. - [[nodiscard]] std::size_t countNulls(std::size_t num_bits) const; - - /// Replace bytes from `bytes` and set total bit count. - void assignBytes(Span bytes, std::size_t bit_count); - - /// Reset to empty. - void clear(); - - /// Return non-owning bit view. - [[nodiscard]] PJ::BitSpan bitSpan() const noexcept; - - /// Return underlying bytes. - [[nodiscard]] const uint8_t* data() const noexcept; - - /// Return mutable underlying bytes. - [[nodiscard]] uint8_t* mutable_data() noexcept; - - /// Return byte count of packed storage. - [[nodiscard]] std::size_t sizeBytes() const noexcept; - - /// Return bit count tracked by this vector. - [[nodiscard]] std::size_t sizeBits() const noexcept; - - /// Return true when no bits are stored. - [[nodiscard]] bool empty() const noexcept; - - private: - std::vector bytes_; - std::size_t bit_count_ = 0; -}; - -} // namespace PJ diff --git a/pj_datastore/include/pj_datastore/builtin_transforms.hpp b/pj_datastore/include/pj_datastore/builtin_transforms.hpp deleted file mode 100644 index 11d04690..00000000 --- a/pj_datastore/include/pj_datastore/builtin_transforms.hpp +++ /dev/null @@ -1,31 +0,0 @@ -#pragma once -// Copyright 2026 Davide Faconti -// SPDX-License-Identifier: MPL-2.0 - -// Built-in ISISOTransform implementations registerable with DerivedEngine. -// See derived_engine.hpp for the sequential calculate()/reset() contract. 
- -#include "pj_base/types.hpp" -#include "pj_datastore/derived_engine.hpp" - -namespace PJ { - -// --------------------------------------------------------------------------- -// DerivativeTransform -// --------------------------------------------------------------------------- -// Numerical derivative: d(value)/d(t) in units/second. -// Skips the first row (no previous sample). Assumes float64 input/output. -class DerivativeTransform : public ISISOTransform { - PJ::Timestamp prev_time_ = 0; - double prev_value_ = 0.0; - bool has_prev_ = false; - - public: - void reset() override; - - [[nodiscard]] StorageKind outputKind(StorageKind input_kind) const override; - - bool calculate(PJ::Timestamp time, const VarValue& input, PJ::Timestamp& out_time, VarValue& out_value) override; -}; - -} // namespace PJ diff --git a/pj_datastore/include/pj_datastore/chunk.hpp b/pj_datastore/include/pj_datastore/chunk.hpp deleted file mode 100644 index e6b7ac03..00000000 --- a/pj_datastore/include/pj_datastore/chunk.hpp +++ /dev/null @@ -1,211 +0,0 @@ -#pragma once -// Copyright 2026 Davide Faconti -// SPDX-License-Identifier: MPL-2.0 - -#include -#include -#include -#include -#include -#include - -#include "pj_base/span.hpp" -#include "pj_base/types.hpp" -#include "pj_datastore/buffer.hpp" -#include "pj_datastore/column_buffer.hpp" -#include "pj_datastore/encoding.hpp" - -namespace PJ { - -// Import base types into engine namespace -using PJ::BitSpan; -using PJ::ChunkId; -using PJ::kInvalidChunkId; -using PJ::SchemaId; -using PJ::Span; -using PJ::Timestamp; -using PJ::TopicId; - -struct ColumnStats { - /// Number of null rows in this column within the chunk. - uint32_t null_count = 0; - /// Number of value runs (for compression heuristics). - uint32_t run_count = 0; - /// True if all non-null values are equal. - bool is_constant = true; - /// Minimum non-null numeric value, if applicable. - std::optional min_value; - /// Maximum non-null numeric value, if applicable. 
- std::optional max_value; -}; - -/// Per-chunk aggregate statistics. -struct ChunkStats { - /// Minimum timestamp in chunk. - Timestamp t_min = std::numeric_limits::max(); - /// Maximum timestamp in chunk. - Timestamp t_max = std::numeric_limits::min(); - /// Number of rows in chunk. - uint32_t row_count = 0; - /// Per-column statistics aligned to schema columns. - std::vector column_stats; -}; - -/// Immutable sealed storage unit produced by TopicChunkBuilder::seal(): a -/// timestamp column plus per-column EncodedData + optional validity bitmap. -/// Read accessors do NOT null-check unless noted (readColumnAsDoubles fills NaN); -/// readString() views chunk-internal dictionary memory. -struct TopicChunk { - /// Chunk identifier. - ChunkId id = 0; - /// Owning topic id. - TopicId topic_id = 0; - /// Schema version used when chunk was produced. - SchemaId schema_version = 0; - /// Chunk-level and column-level stats. - ChunkStats stats; - - // Raw timestamp column (one int64 per row) - /// One timestamp per row. - std::vector timestamps; - - struct Column { - encoding::EncodedData data; - std::optional validity_bitmap; - std::shared_ptr descriptor; - }; - - std::vector columns; - - /// Derive encoding type from the EncodedData variant index. - [[nodiscard]] EncodingType columnEncoding(std::size_t index) const; - - /// Read timestamp at row index. - [[nodiscard]] Timestamp readTimestamp(std::size_t row) const; - - /// Bulk-read timestamps into `out` starting at `row_start`. - void readTimestamps(Span out, std::size_t row_start) const; - - /// Read numeric value at `col_index,row` as double. - [[nodiscard]] double readNumericAsDouble(std::size_t col_index, std::size_t row) const; - - /// Read numeric value at `col_index,row` as int64_t. - [[nodiscard]] int64_t readNumericAsInt64(std::size_t col_index, std::size_t row) const; - - /// Read numeric value at `col_index,row` as uint64_t. 
- [[nodiscard]] uint64_t readNumericAsUint64(std::size_t col_index, std::size_t row) const; - - /// Read string value at `col_index,row`. - [[nodiscard]] std::string_view readString(std::size_t col_index, std::size_t row) const; - - /// Read boolean value at `col_index,row`. - [[nodiscard]] bool readBool(std::size_t col_index, std::size_t row) const; - - /// Return true if value at `col_index,row` is null. - [[nodiscard]] bool isNull(std::size_t col_index, std::size_t row) const; - - // Bulk read: switch on type once, then tight inner loop. - // For kBool/kString columns, fills NaN. - /// Decode a numeric range into `out` starting at `row_start`. - void readColumnAsDoubles(std::size_t col_index, Span out, std::size_t row_start) const; -}; - -/// Mutable accumulator for one topic's rows; sealed into an immutable TopicChunk. -/// Two append paths (row-at-a-time begin/set/finish, and bulk -/// appendTimestamps/appendColumn/finishBulkAppend) that must not be interleaved -/// within a row. Picks per-column encoding at seal() time from accumulated stats. -class TopicChunkBuilder { - public: - /// Create a builder for one topic/schema pair. - TopicChunkBuilder(TopicId topic_id, SchemaId schema_id, std::vector columns, uint32_t max_rows); - - // Start a new row with the given timestamp - /// Begin a new row at `timestamp`. - void beginRow(Timestamp timestamp); - - /// Set a typed value for the current row. - /// Supported T: float, double, int32_t, int64_t, uint64_t, bool, std::string_view. - template - void set(std::size_t col_index, T value); - - /// Mark value as null for current row. - void setNull(std::size_t col_index); - - // Finalize the current row (append all columns) - /// Finalize current row, auto-filling unset columns with null. - void finishRow(); - - // ---- Bulk column append ---- - // Call appendTimestamps first, then appendColumn for each column, - // then appendColumnValidity for columns with nulls, then finishBulkAppend. 
- // Stats are computed in finishBulkAppend using the column's validity bitmap. - - /// Append a contiguous timestamp batch. - void appendTimestamps(Span timestamps); - - /// Append a typed column batch. - /// Supported T: float, double, int32_t, int64_t, uint64_t, uint8_t (bool bytes). - template - void appendColumn(std::size_t col_index, Span data); - - /// Append a string column batch from offsets+data views. - void appendColumnStrings(std::size_t col_index, Span offsets, Span data); - - /// Append validity bits for last appended rows of this column. - void appendColumnValidity(std::size_t col_index, BitSpan validity); - - /// Finalize pending bulk append and compute stats. - void finishBulkAppend(); - - /// Remaining row capacity before auto-seal. - [[nodiscard]] uint32_t remainingCapacity() const noexcept; - - /// True if no more rows can be appended. - [[nodiscard]] bool isFull() const noexcept; - - /// Number of finalized rows. - [[nodiscard]] uint32_t rowCount() const noexcept; - - /// True if beginRow() has been called but finishRow() has not yet been called. - [[nodiscard]] bool isRowInProgress() const noexcept; - - /// Current chunk statistics. - [[nodiscard]] const ChunkStats& stats() const noexcept; - - /// Last appended timestamp. - [[nodiscard]] Timestamp lastTimestamp() const noexcept; - - // Seal: finalize stats, apply encodings, produce immutable TopicChunk - /// Seal builder into immutable chunk. 
- [[nodiscard]] TopicChunk seal(); - - private: - TopicId topic_id_; - SchemaId schema_id_; - uint32_t max_rows_; - static inline std::atomic next_chunk_id_{1}; // monotonic counter - - std::vector timestamps_; - std::vector columns_; - std::vector column_descriptors_; - ChunkStats stats_; - - // Track per-row state during begin_row/finish_row - bool row_in_progress_ = false; - Timestamp current_timestamp_ = 0; - - Timestamp last_timestamp_ = std::numeric_limits::min(); - std::vector last_column_values_; - std::size_t bulk_pending_rows_ = 0; // rows added via bulk but not yet finished - - void updateColumnStats(std::size_t col_index, double value); - - // Bulk stats computation: single pass over column buffer data. - // Called by finishBulkAppend() after both data and validity are set. - // Reads from column buffer and skips null positions via validity bitmap. - void computeBulkNumericStats(std::size_t col_index, StorageKind kind, std::size_t first_row, std::size_t count); - - void computeBulkStringStats(std::size_t col_index, std::size_t first_row, std::size_t count); -}; - -} // namespace PJ diff --git a/pj_datastore/include/pj_datastore/colormap_registry.hpp b/pj_datastore/include/pj_datastore/colormap_registry.hpp deleted file mode 100644 index d43776cc..00000000 --- a/pj_datastore/include/pj_datastore/colormap_registry.hpp +++ /dev/null @@ -1,66 +0,0 @@ -#pragma once -// Copyright 2026 Davide Faconti -// SPDX-License-Identifier: MPL-2.0 - -#include -#include -#include - -namespace PJ { - -/// Signature of a color evaluation callback. Receives a scalar value and a -/// user-provided context pointer; returns a CSS color name or "#rrggbb" hex -/// string. The returned pointer must remain valid until the next call to the -/// same callback. -using ColorMapEvalFn = const char* (*)(double value, void* user_ctx); - -/// Registry of named colormap callbacks. 
-/// -/// Plugins register one or more named maps during the lifetime of their -/// dialog, pick one as active, and consumers (chart renderers, exporters) -/// evaluate the active map per data point. -/// -/// The `DatastoreToolboxHost` owns one instance and forwards -/// `register_colormap`/`unregister_colormap` calls received through the C ABI -/// vtable. Consumers read through `DatastoreToolboxHost::colorMaps()`. -class ColorMapRegistry { - public: - ColorMapRegistry() = default; - ~ColorMapRegistry() = default; - - ColorMapRegistry(const ColorMapRegistry&) = delete; - ColorMapRegistry& operator=(const ColorMapRegistry&) = delete; - - /// Register or replace a named colormap. The newly registered map becomes - /// active; call `setActive()` afterwards to switch to a different one. - void registerMap(std::string_view name, ColorMapEvalFn eval_fn, void* user_ctx); - - /// Unregister a colormap by name. If it was active, clears the active - /// selection — subsequent `evaluate()` calls return an empty string. - void unregisterMap(std::string_view name); - - /// Set the active colormap by name. No-op if `name` is not registered. - void setActive(std::string_view name); - - /// Evaluate the active colormap for a scalar value. Returns empty when no - /// colormap is active. - [[nodiscard]] std::string evaluate(double value) const; - - /// True when a colormap is active and its callback is available. - [[nodiscard]] bool hasActive() const; - - /// Name of the currently active colormap, or empty string when none. 
- [[nodiscard]] const std::string& activeName() const { - return active_; - } - - private: - struct Entry { - ColorMapEvalFn eval_fn; - void* user_ctx; - }; - std::unordered_map maps_; - std::string active_; -}; - -} // namespace PJ diff --git a/pj_datastore/include/pj_datastore/colormap_registry_host.hpp b/pj_datastore/include/pj_datastore/colormap_registry_host.hpp deleted file mode 100644 index 1d445ad6..00000000 --- a/pj_datastore/include/pj_datastore/colormap_registry_host.hpp +++ /dev/null @@ -1,19 +0,0 @@ -#pragma once -// Copyright 2026 Davide Faconti -// SPDX-License-Identifier: MPL-2.0 - -#include "pj_base/plugin_data_api.h" - -namespace PJ { - -class ColorMapRegistry; - -/// Wrap a `ColorMapRegistry` as a C ABI `PJ_colormap_registry_t` so it can be -/// bound into a toolbox plugin via `bind_colormap_registry`. -/// -/// The returned fat pointer references `registry` by address; the registry -/// must outlive every plugin instance it is bound to. The vtable itself is a -/// static singleton — safe to share across plugins and threads. -[[nodiscard]] PJ_colormap_registry_t makeColorMapRegistryHost(ColorMapRegistry& registry); - -} // namespace PJ diff --git a/pj_datastore/include/pj_datastore/column_buffer.hpp b/pj_datastore/include/pj_datastore/column_buffer.hpp deleted file mode 100644 index cfba3e65..00000000 --- a/pj_datastore/include/pj_datastore/column_buffer.hpp +++ /dev/null @@ -1,242 +0,0 @@ -#pragma once -// Copyright 2026 Davide Faconti -// SPDX-License-Identifier: MPL-2.0 - -#include -#include -#include -#include - -#include "pj_base/span.hpp" -#include "pj_base/type_tree.hpp" -#include "pj_base/types.hpp" -#include "pj_datastore/buffer.hpp" - -namespace PJ { - -// Import base types into engine namespace -using PJ::BitSpan; -using PJ::FieldId; -using PJ::PrimitiveType; -using PJ::Span; - -// Physical storage category. Narrow integers (int8, int16) widen to int64; - -// int32 is kept as a dedicated storage kind because it's extremely common. 
-// Narrow unsigned integers (uint8..uint32) widen to uint64. -// FOR compression recovers further byte savings at seal time. -enum class StorageKind : uint8_t { - kFloat32, - kFloat64, - kInt32, - kInt64, - kUint64, - kBool, - kString, -}; - -[[nodiscard]] constexpr StorageKind storageKindOf(PrimitiveType t) noexcept { - switch (t) { - case PrimitiveType::kFloat32: - return StorageKind::kFloat32; - case PrimitiveType::kFloat64: - return StorageKind::kFloat64; - case PrimitiveType::kInt8: - case PrimitiveType::kInt16: - return StorageKind::kInt64; - case PrimitiveType::kInt32: - return StorageKind::kInt32; - case PrimitiveType::kInt64: - return StorageKind::kInt64; - case PrimitiveType::kUint8: - case PrimitiveType::kUint16: - case PrimitiveType::kUint32: - case PrimitiveType::kUint64: - return StorageKind::kUint64; - case PrimitiveType::kBool: - return StorageKind::kBool; - case PrimitiveType::kString: - return StorageKind::kString; - case PrimitiveType::kUnspecified: - break; - } - return StorageKind::kFloat64; -} - -// Byte size of a StorageKind's fixed-width element. Returns 0 for kString. -[[nodiscard]] constexpr std::size_t storageKindSize(StorageKind k) noexcept { - switch (k) { - case StorageKind::kFloat32: - return sizeof(float); - case StorageKind::kFloat64: - return sizeof(double); - case StorageKind::kInt32: - return sizeof(int32_t); - case StorageKind::kInt64: - return sizeof(int64_t); - case StorageKind::kUint64: - return sizeof(uint64_t); - case StorageKind::kBool: - return sizeof(uint8_t); - case StorageKind::kString: - return 0; - } - return 0; -} - -enum class EncodingType : uint8_t { - kRaw, // Unencoded typed storage - kDictionary, // Dictionary encoding (strings) - kPackedBool, // Packed bitfield (bools) - kConstant, // Single repeated value - kFrameOfReference, // Min-subtracted narrowed offsets -}; - -/// Flattened column descriptor derived from a schema leaf. -struct ColumnDescriptor { - /// Stable field id assigned by the writer. 
- FieldId field_id; - /// Logical field type from schema. - PrimitiveType logical_type; // Full logical type for metadata/schema - /// Fully-qualified field path (e.g. "pose.position.x"). - std::string field_path; // e.g., "position.x" -}; - -/// In-memory typed column buffer with optional validity bitmap. -class TypedColumnBuffer { - public: - /// Construct a buffer for one logical column. - explicit TypedColumnBuffer(ColumnDescriptor descriptor); - - /// Descriptor metadata used by this buffer. - [[nodiscard]] const ColumnDescriptor& descriptor() const noexcept; - - /// Number of appended rows. - [[nodiscard]] std::size_t rowCount() const noexcept; - - /// True if at least one row is null. - [[nodiscard]] bool hasNulls() const noexcept; - - /// True if row is valid (non-null). - [[nodiscard]] bool isValid(std::size_t row) const noexcept; - - // Append typed values (7 storage types) - /// Append one float32 value. - void appendFloat32(float value); - - /// Append one float64 value. - void appendFloat64(double value); - - /// Append one int32 value. - void appendInt32(int32_t value); - - /// Append one int64 value. - void appendInt64(int64_t value); - - /// Append one uint64 value. - void appendUint64(uint64_t value); - - /// Append one bool value (stored as uint8 0/1). - void appendBool(bool value); - - /// Append one UTF-8 string. - void appendString(std::string_view value); - - /// Append a null row (value slot is zero-filled). - void appendNull(); - - // Read typed values (7 storage types) - /// Read one float32 value. - [[nodiscard]] float readFloat32(std::size_t row) const; - - /// Read one float64 value. - [[nodiscard]] double readFloat64(std::size_t row) const; - - /// Read one int32 value. - [[nodiscard]] int32_t readInt32(std::size_t row) const; - - /// Read one int64 value. - [[nodiscard]] int64_t readInt64(std::size_t row) const; - - /// Read one uint64 value. - [[nodiscard]] uint64_t readUint64(std::size_t row) const; - - /// Read one bool value. 
- [[nodiscard]] bool readBool(std::size_t row) const; - - /// Read one string view. - [[nodiscard]] std::string_view readString(std::size_t row) const; - - /// Return true if row is null. - [[nodiscard]] bool isNull(std::size_t row) const; - - // Read any numeric column as double (for stats, display). - // For string columns, returns NaN. - [[nodiscard]] double readAsDouble(std::size_t row) const; - - // ---- Bulk append (contiguous memcpy-based) ---- - /// Append contiguous float32 values. - void appendFloat32Bulk(Span data); - - /// Append contiguous float64 values. - void appendFloat64Bulk(Span data); - - /// Append contiguous int32 values. - void appendInt32Bulk(Span data); - - /// Append contiguous int64 values. - void appendInt64Bulk(Span data); - - /// Append contiguous uint64 values. - void appendUint64Bulk(Span data); - - /// Append contiguous bool bytes (0/1). - void appendBoolBulk(Span data); - - /// Append strings from Arrow-compatible offset+data layout. - /// offsets has (count + 1) entries; data contains the concatenated strings. - void appendStringsBulk(Span offsets, Span data); - - /// Append a validity bitmap for the most recently appended `count` rows. - /// Arrow-compatible bit layout. bit_offset is the starting bit within bitmap. - void appendValidityBulk(BitSpan validity); - - // Access underlying buffers (for encoding at seal time) - /// Raw value bytes. - [[nodiscard]] const RawBuffer& valueBuffer() const noexcept; - - /// Packed validity bitmap. - [[nodiscard]] const BitVector& validityBuffer() const noexcept; - - /// String offsets bytes (uint32 array). - [[nodiscard]] const RawBuffer& offsetsBuffer() const noexcept; // strings only - - private: - /// Column descriptor metadata. - ColumnDescriptor descriptor_; - /// Raw value payload. - RawBuffer values_; - /// Packed validity bits. - BitVector validity_; - /// String offset buffer. 
- RawBuffer offsets_; // For string: offset array (uint32_t per entry + 1 sentinel) - /// Number of rows currently stored. - std::size_t row_count_ = 0; - /// Number of null rows. - std::size_t null_count_ = 0; - /// Whether validity_ has been initialized. - bool validity_initialized_ = false; - - void ensureValidityInitialized(); - - template - void appendFixed(T value); - - template - void appendFixedBulk(Span data); - - template - [[nodiscard]] T readFixed(std::size_t row) const; -}; - -} // namespace PJ diff --git a/pj_datastore/include/pj_datastore/derived_engine.hpp b/pj_datastore/include/pj_datastore/derived_engine.hpp deleted file mode 100644 index a3f523c1..00000000 --- a/pj_datastore/include/pj_datastore/derived_engine.hpp +++ /dev/null @@ -1,180 +0,0 @@ -#pragma once -// Copyright 2026 Davide Faconti -// SPDX-License-Identifier: MPL-2.0 - -#include -#include -#include -#include -#include - -#include "pj_base/expected.hpp" -#include "pj_base/span.hpp" -#include "pj_base/types.hpp" -#include "pj_datastore/column_buffer.hpp" - -namespace PJ { - -class DataEngine; - -/// Implementation struct — defined in derived_engine.cpp, hidden from callers. 
-struct DerivedEngineImpl; - -// --------------------------------------------------------------------------- -// VarValue — universal column value type for transform I/O -// --------------------------------------------------------------------------- -// Engine storage kinds map as follows: -// kFloat32, kFloat64 → double (float32 widens losslessly) -// kInt8 … kInt64, kBool → int64_t (sign-extend; bool → 0/1) -// kUint64 → uint64_t (lossless) -// kString → std::string -using VarValue = std::variant; - -// --------------------------------------------------------------------------- -// ISISOTransform — single-input / single-output transform -// --------------------------------------------------------------------------- -// SEQUENTIAL CONTRACT (fundamental): -// The engine calls calculate() once per sample, strictly in ascending -// timestamp order. Implementations may therefore accumulate state freely -// in member variables between calls (e.g. previous value for derivative, -// ring buffer for moving average, running sum for integral). -// State persists across chunk boundaries — the engine never resets it -// during incremental scheduling. -// reset() is the only path that clears state; the engine calls it -// exclusively before a full batch recompute. -class ISISOTransform { - public: - virtual ~ISISOTransform() = default; - - /// Clear all accumulated state. Called by DerivedEngine before batch recompute. - /// After reset(), the next calculate() call must behave as if no data has - /// been seen (same as a freshly constructed instance). - virtual void reset() {} - - /// Declare the StorageKind of the output column. Called once at registration. - /// Default: kFloat64 (suitable for most numeric filters). - /// Override to preserve integer types or produce strings. - virtual StorageKind outputKind(StorageKind input_kind) const { - (void)input_kind; - return StorageKind::kFloat64; - } - - /// Process one sample. Called in strictly ascending timestamp order. 
- /// time: sample timestamp (nanoseconds since epoch) - /// input: sample value decoded as VarValue - /// out_time: output timestamp (written by callee; read by engine only when true) - /// out_value: output value (written by callee; read by engine only when true) - /// - /// Returns true to emit a row, false to suppress (e.g. first row of derivative). - /// - /// out_time MAY differ from `time` — time-offset transforms and interpolation - /// may produce output on a different time grid than their input. - /// When true is returned, out_time must be >= all previously returned out_times. - virtual bool calculate(PJ::Timestamp time, const VarValue& input, PJ::Timestamp& out_time, VarValue& out_value) = 0; -}; - -// --------------------------------------------------------------------------- -// IMIMOTransform — multi-input / multi-output transform -// --------------------------------------------------------------------------- -// SEQUENTIAL CONTRACT (fundamental, same as ISISOTransform): -// The engine calls calculate() once per joined sample, strictly in ascending -// timestamp order. State may be accumulated in member variables between calls. -// reset() clears all state; called exclusively before batch recompute. -class IMIMOTransform { - public: - virtual ~IMIMOTransform() = default; - - /// Clear all accumulated state. Called by DerivedEngine before batch recompute. - virtual void reset() {} - - /// Declare output StorageKind for each output topic. - /// Called once at registration with the input kinds (one per input topic). - /// Return one StorageKind per output topic name passed to add_mimo_transform. - virtual std::vector outputKinds(PJ::Span input_kinds) const = 0; - - /// Process one joined sample. Called in strictly ascending timestamp order, - /// only when ALL input topics have a sample at exactly `time`. - /// inputs[i] = value from input topic i (in add_mimo_transform order). - /// out_time = output timestamp (written by callee; read only when true). 
- /// output = pre-allocated buffer (size == num output topics); fill in-place. - /// output[k] corresponds to output_topic_names[k] from add_mimo_transform. - /// - /// Returns true to emit a row; false to suppress. - /// out_time MAY differ from `time`. When true is returned, out_time must be - /// >= all previously returned out_times. All M output topics share this timestamp. - virtual bool calculate( - PJ::Timestamp time, PJ::Span inputs, PJ::Timestamp& out_time, std::vector& output) = 0; -}; - -// --------------------------------------------------------------------------- -// DerivedEngine -// --------------------------------------------------------------------------- -class DerivedEngine { - public: - explicit DerivedEngine(DataEngine& engine); - ~DerivedEngine(); - DerivedEngine(const DerivedEngine&) = delete; - DerivedEngine& operator=(const DerivedEngine&) = delete; - - // ---- SISO ---------------------------------------------------------------- - // Creates one scalar output topic (StorageKind from op->outputKind()). - // Returns error if: - // - input_topic_id does not exist - // - input topic has more than one column - // - output_topic_name already registered within output_dataset_id - // - // Topics created via DataWriter::registerScalarSeries (schema_id == 0) - // are supported even before any data has been committed: the column layout - // is stored in TopicStorage at registration time. Topics created via - // DataWriter::register_topic with schema_id != 0 are always supported. - // Returns error if the column layout cannot be determined (e.g. a topic - // created with schema_id==0 via the low-level register_topic API with no - // committed chunks and no stored column descriptors). 
- [[nodiscard]] PJ::Expected addSisoTransform( - PJ::TopicId input_topic_id, std::string output_topic_name, PJ::DatasetId output_dataset_id, - std::unique_ptr op); - - // ---- MIMO ----------------------------------------------------------------- - // All input topics must be single-column (scalar). - // A row is emitted only when ALL input topics share the exact same timestamp. - // Creates output_topic_names.size() new topics (kinds from op->outputKinds()). - [[nodiscard]] PJ::Expected addMimoTransform( - std::vector input_topic_ids, std::vector output_topic_names, - PJ::DatasetId output_dataset_id, std::unique_ptr op); - - // ---- Node management ----------------------------------------------------- - PJ::Status removeNode(PJ::NodeId id); - [[nodiscard]] bool hasNode(PJ::NodeId id) const noexcept; - - // Returns output topic IDs: 1 for SISO, M for MIMO. - [[nodiscard]] std::vector outputTopics(PJ::NodeId id) const; - - // Kahn's topological order (upstream → downstream). - [[nodiscard]] std::vector topologicalOrder() const; - - // ---- Commit-cycle hook --------------------------------------------------- - // Call after DataEngine::commitChunks() with the set of changed topic IDs. - // Marks directly dependent nodes dirty. - void onSourceCommitted(PJ::Span changed_topics); - - // ---- Scheduling ---------------------------------------------------------- - // Run all dirty nodes in topological order (incremental path). - // Use for file/batch playback and tests. Equivalent to passing every - // registered node to scheduleActive(). - PJ::Status scheduleAll(); - - // Run only the specified nodes (and their transitive upstream dependencies) - // that are dirty. Pass the set of nodes whose output topics are currently - // visible in the UI to implement display-lazy scheduling. - PJ::Status scheduleActive(const std::unordered_set& active_nodes); - - // Full history recompute: clear output, reset transform, replay all input. 
- PJ::Status recompute_batch(PJ::NodeId node_id); - - private: - DataEngine& engine_; - PJ::NodeId next_node_id_ = 1; - std::unique_ptr impl_; -}; - -} // namespace PJ diff --git a/pj_datastore/include/pj_datastore/encoding.hpp b/pj_datastore/include/pj_datastore/encoding.hpp deleted file mode 100644 index 1dcde130..00000000 --- a/pj_datastore/include/pj_datastore/encoding.hpp +++ /dev/null @@ -1,149 +0,0 @@ -#pragma once -// Copyright 2026 Davide Faconti -// SPDX-License-Identifier: MPL-2.0 - -#include -#include -#include -#include -#include -#include -#include - -#include "pj_base/span.hpp" -#include "pj_datastore/buffer.hpp" -#include "pj_datastore/column_buffer.hpp" // StorageKind - -namespace PJ::encoding { - -// --------------------------------------------------------------------------- -// Constant encoding — stores a single repeated value -// --------------------------------------------------------------------------- -struct ConstantEncoded { - /// Raw bytes of the repeated value (up to 8 bytes). - std::array value_bytes{}; // Single value (max StorageKind size) - /// Storage kind used to interpret `value_bytes`. - StorageKind value_kind; - /// Number of valid bytes in `value_bytes`. - uint8_t value_size = 0; - /// Number of rows represented by this constant. - std::size_t count = 0; -}; - -// --------------------------------------------------------------------------- -// Frame of Reference — subtract min, store narrowed offsets -// (Applied to kInt32 and kInt64 columns; kUint64 is excluded because large -// unsigned values overflow the int64_t reference.) -// --------------------------------------------------------------------------- -struct FrameOfReferenceEncoded { - /// Base value added to each offset during decode. - int64_t reference = 0; // min value - /// Bytes per encoded offset (1, 2, or 4). - uint8_t offset_bytes = 0; // 1, 2, or 4 - /// Packed offset payload. - RawBuffer offsets; // packed as uint8/uint16/uint32 - /// Number of encoded rows. 
- std::size_t count = 0; -}; - -// --------------------------------------------------------------------------- -// Dictionary encoding for strings (with narrowed indices) -// --------------------------------------------------------------------------- -struct DictionaryEncoded { - /// Unique dictionary entries. - std::vector dictionary; // unique values in insertion order - /// Packed dictionary indices. - RawBuffer indices; // stored as uint8/uint16/uint32 - /// Bytes per index (1, 2, or 4). - uint8_t index_bytes = 4; // 1, 2, or 4 - /// Number of encoded rows. - std::size_t count = 0; -}; - -// Encode a string column into dictionary form. -// Takes raw string data (offsets buffer + value buffer from TypedColumnBuffer). -[[nodiscard]] DictionaryEncoded dictionaryEncodeStrings( - PJ::Span offsets_data, PJ::Span values_data, std::size_t row_count); - -[[nodiscard]] std::string_view dictionaryLookup(const DictionaryEncoded& encoded, std::size_t row); - -// --------------------------------------------------------------------------- -// Packed bitfield for bools (1 bit per value, LSB first like Arrow validity bitmaps) -// --------------------------------------------------------------------------- -struct PackedBools { - /// Packed bit values (LSB first). - RawBuffer bits; - /// Number of boolean rows. - std::size_t count = 0; -}; - -// Pack bool values (stored as uint8_t 0/1) into a bitfield -/// Pack bool bytes into a compact bitfield. -[[nodiscard]] PackedBools packBools(PJ::Span values); - -/// Read one bool from a packed bitfield. 
-[[nodiscard]] bool unpackBool(const PackedBools& packed, std::size_t index); - -// --------------------------------------------------------------------------- -// Unified per-column encoding data variant -// --------------------------------------------------------------------------- -using EncodedData = std::variant; - -// --------------------------------------------------------------------------- -// Constant encoding functions -// --------------------------------------------------------------------------- -/// Build constant encoding from one repeated storage-kind value. -[[nodiscard]] ConstantEncoded constantEncode(PJ::Span data, StorageKind kind, std::size_t count); - -/// Decode constant numeric value as double. -[[nodiscard]] double constantDecodeAsDouble(const ConstantEncoded& enc); - -/// Decode constant numeric value as int64_t (no precision loss for integer types). -[[nodiscard]] int64_t constantDecodeAsInt64(const ConstantEncoded& enc); - -/// Decode constant numeric value as uint64_t (no precision loss for integer types). -[[nodiscard]] uint64_t constantDecodeAsUint64(const ConstantEncoded& enc); - -// --------------------------------------------------------------------------- -// Frame of Reference encoding functions -// Data must be kInt32 or kInt64 values. -// --------------------------------------------------------------------------- -/// Encode signed integers as offsets from `min_val`. -[[nodiscard]] FrameOfReferenceEncoded forEncode( - PJ::Span data, StorageKind kind, std::size_t count, int64_t min_val, int64_t max_val); -/// Decode one FOR value as double. -[[nodiscard]] double forDecodeOneAsDouble(const FrameOfReferenceEncoded& enc, std::size_t row); - -/// Decode one FOR value as int64_t (no precision loss). -[[nodiscard]] int64_t forDecodeOneAsInt64(const FrameOfReferenceEncoded& enc, std::size_t row); - -/// Decode a contiguous FOR range into `out`. 
-void forDecodeRangeAsDoubles(const FrameOfReferenceEncoded& enc, PJ::Span out, std::size_t row_start); - -// --------------------------------------------------------------------------- -// Byte-width helpers -// --------------------------------------------------------------------------- -[[nodiscard]] constexpr uint8_t indexBytesFor(std::size_t dict_size) noexcept { - if (dict_size <= 256) { - return 1; - } - if (dict_size <= 65536) { - return 2; - } - return 4; -} - -[[nodiscard]] constexpr uint8_t offsetBytesFor(uint64_t range) noexcept { - if (range < 256) { - return 1; - } - if (range < 65536) { - return 2; - } - if (range < uint64_t{1} << 32) { - return 4; - } - return 8; // signals: stay kRaw -} - -} // namespace PJ::encoding diff --git a/pj_datastore/include/pj_datastore/engine.hpp b/pj_datastore/include/pj_datastore/engine.hpp deleted file mode 100644 index ac80546c..00000000 --- a/pj_datastore/include/pj_datastore/engine.hpp +++ /dev/null @@ -1,117 +0,0 @@ -#pragma once -// Copyright 2026 Davide Faconti -// SPDX-License-Identifier: MPL-2.0 - -#include -#include -#include - -#include "pj_base/dataset.hpp" -#include "pj_base/expected.hpp" -#include "pj_base/types.hpp" -#include "pj_datastore/topic_storage.hpp" -#include "pj_datastore/type_registry.hpp" - -namespace PJ { - -class DataWriter; -class DataReader; - -/// Central owner of datasets, topics, schemas, and committed chunks. -class DataEngine { - public: - /// Construct an empty engine instance. - DataEngine(); - - /// Destructor (defined in .cpp for pimpl). - ~DataEngine(); - - /// Move constructor. - DataEngine(DataEngine&&) noexcept; - - /// Move assignment. - DataEngine& operator=(DataEngine&&) noexcept; - - /// Deleted copy constructor. - DataEngine(const DataEngine&) = delete; - - /// Deleted copy assignment. - DataEngine& operator=(const DataEngine&) = delete; - - // Dataset management - /// Create and register a dataset. 
- [[nodiscard]] PJ::Expected createDataset(PJ::DatasetDescriptor descriptor); - - /// Lookup dataset by id (nullptr if missing). - [[nodiscard]] const PJ::DatasetInfo* getDataset(PJ::DatasetId id) const; - - // Topic management (called by DataWriter) - /// Create a topic under a dataset. - [[nodiscard]] PJ::Expected createTopic(PJ::DatasetId dataset_id, TopicDescriptor descriptor); - - /// Mutable topic storage lookup (nullptr if missing). - [[nodiscard]] TopicStorage* getTopicStorage(PJ::TopicId id); - - /// Const topic storage lookup (nullptr if missing). - [[nodiscard]] const TopicStorage* getTopicStorage(PJ::TopicId id) const; - - // Schema registry access - /// Mutable schema registry access. - [[nodiscard]] TypeRegistry& typeRegistry(); - - /// Const schema registry access. - [[nodiscard]] const TypeRegistry& typeRegistry() const; - - // Time domains - /// Create a new time domain. - [[nodiscard]] PJ::Expected createTimeDomain(std::string name); - - /// Lookup time domain by id (nullptr if missing). - [[nodiscard]] const PJ::TimeDomain* getTimeDomain(PJ::TimeDomainId id) const; - - /// Update display offset for one time domain. - void setDisplayOffset(PJ::TimeDomainId id, PJ::Timestamp offset); - - // Commit cycle: commit sealed chunks, enforce retention - /// Commit flushed chunks into topic storage. - /// Returns the deduplicated set of topic IDs that received at least one new chunk. - /// Pass the return value directly to DerivedEngine::onSourceCommitted(): - /// derived.onSourceCommitted(engine.commitChunks(writer.flushAll())); - std::vector commitChunks(std::vector> chunks); - - /// Evict old chunks outside the retention window. - void enforceRetention(PJ::Timestamp retention_window_ns); - - /// Move every committed chunk into `dst`, leaving this engine's storages - /// empty (datasets, topics, schemas, time domains stay registered). Topics - /// are matched by descriptor (`dataset_id` + `name`); both engines must have - /// them registered. 
Monotonicity is enforced per topic: the source's - /// earliest chunk timestamp must be >= the destination's `time_max()`. Any - /// failure mutates neither engine. - /// - /// Zero-copy: dst's `std::deque` receives the chunks via - /// `std::move` (column buffers/value arrays are pointer moves). Schema - /// compatibility is the caller's responsibility — typically dst is kept in - /// lockstep with the source via parallel registration at startup. - PJ::Status flushTo(DataEngine& dst); - - // Writer/Reader factories - /// Create a writer bound to this engine. - [[nodiscard]] DataWriter createWriter(); - - /// Create a reader bound to this engine. - [[nodiscard]] DataReader createReader() const; - - // Topic listing by dataset - /// List all dataset ids. - [[nodiscard]] std::vector listDatasets() const; - - /// List topic ids for a dataset. - [[nodiscard]] std::vector listTopics(PJ::DatasetId dataset_id) const; - - private: - struct Impl; - std::unique_ptr impl_; -}; - -} // namespace PJ diff --git a/pj_datastore/include/pj_datastore/object_store.hpp b/pj_datastore/include/pj_datastore/object_store.hpp deleted file mode 100644 index f0149e13..00000000 --- a/pj_datastore/include/pj_datastore/object_store.hpp +++ /dev/null @@ -1,210 +0,0 @@ -#pragma once -// Copyright 2026 Davide Faconti -// SPDX-License-Identifier: MPL-2.0 - -// ObjectStore: timestamped opaque byte payloads stored beside the columnar -// DataEngine, selectable by time. See docs/OBJECT_STORE_DESIGN.md. 
- -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include - -#include "pj_base/buffer_anchor.hpp" -#include "pj_base/expected.hpp" -#include "pj_base/span.hpp" -#include "pj_base/types.hpp" - -namespace PJ { - -struct ObjectTopicId { - uint32_t id = 0; - - bool operator==(const ObjectTopicId& other) const { - return id == other.id; - } - bool operator!=(const ObjectTopicId& other) const { - return id != other.id; - } -}; - -/// Identity for an object topic: dataset scope + name (unique per dataset) plus -/// opaque metadata_json retained verbatim for callers that interpret bytes. -struct ObjectTopicDescriptor { - DatasetId dataset_id = 0; - std::string topic_name; - std::string metadata_json; -}; - -/// Eager payload: store-owned bytes, counted against the retention budget. -using SharedBuffer = std::shared_ptr>; - -/// Lazy payload: idempotent, thread-safe fetcher returning bytes + anchor. -/// Invoked on every read; bytes are not counted against the retention budget. -using LazyCallback = std::function; - -struct ObjectEntry { - Timestamp timestamp = 0; - // Eager owned bytes or a lazy resolver; resolveEntry discriminates via std::get_if. - std::variant payload; -}; - -struct ResolvedObjectEntry { - Timestamp timestamp = 0; - // Non-owning Span over the bytes plus an opaque anchor (any shared_ptr). - // Consumers read `payload.bytes`; retain `payload.anchor` to keep the bytes - // alive past the resolve call. resolveEntry never casts the anchor. - sdk::PayloadView payload; -}; - -struct RetentionBudget { - int64_t time_window_ns = 0; - size_t max_memory_bytes = 0; -}; - -/// Read view over a topic's entry timestamps that holds the series read lock for -/// its lifetime; keep it short-lived to avoid blocking writers. 
-class EntryTimestampsView { - public: - EntryTimestampsView() = default; - EntryTimestampsView(std::shared_lock lock, const std::vector* timestamps) - : lock_(std::move(lock)), timestamps_(timestamps) {} - - [[nodiscard]] bool empty() const { - return timestamps_ == nullptr || timestamps_->empty(); - } - [[nodiscard]] size_t size() const { - return timestamps_ != nullptr ? timestamps_->size() : 0; - } - [[nodiscard]] Timestamp operator[](size_t i) const { - return (*timestamps_)[i]; - } - [[nodiscard]] const Timestamp* begin() const { - return timestamps_ != nullptr ? timestamps_->data() : nullptr; - } - [[nodiscard]] const Timestamp* end() const { - return timestamps_ != nullptr ? timestamps_->data() + timestamps_->size() : nullptr; - } - - private: - std::shared_lock lock_; - const std::vector* timestamps_ = nullptr; -}; - -/// Timestamped opaque-blob store living alongside DataEngine: payloads selected -/// by time but never expanded into scalar columns (images, point clouds, -/// annotations). Owned (`pushOwned`) or lazy (`pushLazy`) entries, per-topic -/// retention, at-or-before lookup. Thread-safe: one shared_mutex per series + -/// one global lock for registration. Does NOT decode payloads or own renderer/ -/// UI policy. See docs/OBJECT_STORE_DESIGN.md. -class ObjectStore { - public: - ObjectStore() = default; - ~ObjectStore() = default; - - ObjectStore(const ObjectStore&) = delete; - ObjectStore& operator=(const ObjectStore&) = delete; - ObjectStore(ObjectStore&&) = delete; - ObjectStore& operator=(ObjectStore&&) = delete; - - // --- Registration --- - - Expected registerTopic(const ObjectTopicDescriptor& descriptor); - - // Resolve a topic id by (dataset_id, topic_name) without registering. Returns - // nullopt if no topic with that key exists. Used by hosts that need to bind a - // parser-side write surface to a topic the source already registered. 
- std::optional findTopic(DatasetId dataset_id, std::string_view topic_name) const; - - const ObjectTopicDescriptor& descriptor(ObjectTopicId id) const; - - std::vector listTopics() const; - std::vector listTopics(DatasetId dataset_id) const; - - // --- Write --- - - Status pushOwned(ObjectTopicId id, Timestamp timestamp, std::vector payload); - - // Fetcher runs on every read; the store retains the anchor via PayloadView - // and never copies. The closure can return a view over bytes the producer - // already owns (chunk cache, mmap, hand-off between stores). - Status pushLazy(ObjectTopicId id, Timestamp timestamp, LazyCallback fetch); - - // --- Read --- - - std::optional latestAt(ObjectTopicId id, Timestamp timestamp) const; - - std::optional at(ObjectTopicId id, size_t index) const; - - std::optional indexAt(ObjectTopicId id, Timestamp timestamp) const; - - size_t entryCount(ObjectTopicId id) const; - - std::pair timeRange(ObjectTopicId id) const; - - EntryTimestampsView entryTimestamps(ObjectTopicId id) const; - - // --- Retention --- - - void setRetentionBudget(ObjectTopicId id, RetentionBudget budget); - RetentionBudget retentionBudget(ObjectTopicId id) const; - size_t memoryUsage(ObjectTopicId id) const; - - // --- Explicit eviction --- - - void evictBefore(ObjectTopicId id, Timestamp threshold); - void evictAllBefore(Timestamp threshold); - - // --- Cross-store flush --- - - // Move every entry into `dst`, leaving this store empty (registrations kept). - // Topics are matched by descriptor (dataset_id + topic_name); both stores - // must share descriptors. Monotonicity is enforced per series: the earliest - // moved timestamp must be >= the destination's last. Any validation failure - // returns an error and mutates neither store. - // - // Zero-copy: each ObjectEntry is moved by value, so the variant's shared_ptr - // or closure transfers as a pointer move — bytes are never copied. 
Lazy - // entries keep their semantics; their closure re-runs only on a dst read. - // Afterward, dst's retention budget is applied to each touched series. - Status flushTo(ObjectStore& dst); - - // --- Lifecycle --- - - void removeTopic(ObjectTopicId id); - void clear(); - - private: - struct ObjectSeries { - ObjectTopicDescriptor descriptor; - std::deque entries; - std::vector entry_timestamps; - RetentionBudget budget; - size_t memory_bytes = 0; - mutable std::shared_mutex mutex; - }; - - ObjectSeries* findSeries(ObjectTopicId id); - const ObjectSeries* findSeries(ObjectTopicId id) const; - - static std::optional upperBoundIndex(const std::vector& timestamps, Timestamp ts); - static ResolvedObjectEntry resolveEntry(const ObjectEntry& entry); - - void evictFront(ObjectSeries& series); - void applyRetention(ObjectSeries& series, Timestamp newest_ts); - - mutable std::shared_mutex store_mutex_; - std::vector>> topics_; - uint32_t next_id_ = 1; -}; - -} // namespace PJ diff --git a/pj_datastore/include/pj_datastore/plugin_data_host.hpp b/pj_datastore/include/pj_datastore/plugin_data_host.hpp deleted file mode 100644 index c28cf8f0..00000000 --- a/pj_datastore/include/pj_datastore/plugin_data_host.hpp +++ /dev/null @@ -1,181 +0,0 @@ -#pragma once -// Copyright 2026 Davide Faconti -// SPDX-License-Identifier: MPL-2.0 - -// Host-side bridges that translate pj_plugins C-ABI write/read calls into -// DataWriter / DataEngine / ObjectStore operations. raw() yields the C vtable; -// flushPending() seals+commits. See docs/OBJECT_STORE_DESIGN.md (ABI bridge). 
- -#include - -#include "pj_base/plugin_data_api.h" -#include "pj_base/types.hpp" - -namespace PJ { - -class DataEngine; -class ObjectStore; -struct DatastoreSourceWriteHostState; -struct DatastoreSourceObjectWriteHostState; -struct DatastoreParserWriteHostState; -struct DatastoreParserObjectWriteHostState; -struct DatastoreToolboxHostState; -struct DatastoreToolboxObjectReadHostState; - -/// Bridges the `pj.source_write` C ABI onto DataWriter for a DataSource session. -/// Owns a pimpl state; flushPending() seals+commits accumulated scalar rows. -/// setTarget() supports the streaming two-engine pause/resume swap. -class DatastoreSourceWriteHost { - public: - DatastoreSourceWriteHost(DataEngine& engine, PJ_data_source_handle_t source); - ~DatastoreSourceWriteHost(); - - DatastoreSourceWriteHost(const DatastoreSourceWriteHost&) = delete; - DatastoreSourceWriteHost& operator=(const DatastoreSourceWriteHost&) = delete; - DatastoreSourceWriteHost(DatastoreSourceWriteHost&&) noexcept; - DatastoreSourceWriteHost& operator=(DatastoreSourceWriteHost&&) noexcept; - - [[nodiscard]] PJ_source_write_host_t raw() noexcept; - void flushPending(); - - // Atomically swap the destination DataEngine. Mirrors the parser write - // host's setTarget: the streaming two-engine flow routes source-level - // scalar pushes to a secondary DataEngine during pause and back to the - // primary on resume. Pending rows are flushed to the current engine - // before the switch. Does not take ownership. - void setTarget(DataEngine* target); - - private: - std::unique_ptr state_; -}; - -/// Host-side implementation of the scalar-peer object-write surface exposed -/// as `pj.source_object_write.v1`. Bridges the C ABI onto -/// `pj_datastore::ObjectStore`. One instance per DataSource session; the -/// `DatasetId` scopes newly-registered topics to the enclosing dataset. 
-class DatastoreSourceObjectWriteHost { - public: - DatastoreSourceObjectWriteHost(ObjectStore& store, DatasetId dataset_id); - ~DatastoreSourceObjectWriteHost(); - - DatastoreSourceObjectWriteHost(const DatastoreSourceObjectWriteHost&) = delete; - DatastoreSourceObjectWriteHost& operator=(const DatastoreSourceObjectWriteHost&) = delete; - DatastoreSourceObjectWriteHost(DatastoreSourceObjectWriteHost&&) noexcept; - DatastoreSourceObjectWriteHost& operator=(DatastoreSourceObjectWriteHost&&) noexcept; - - [[nodiscard]] PJ_object_write_host_t raw() noexcept; - - // Atomically swap the destination store. Used by the streaming two-store - // flow to route pushes to a secondary ObjectStore during pause and back to - // the primary on resume. Must point to a live store; the host does not - // take ownership and the caller must keep the target alive for as long as - // the host can receive pushes against it. - void setTarget(ObjectStore* target) noexcept; - - private: - std::unique_ptr state_; -}; - -/// Bridges the `pj.parser_write` C ABI onto DataWriter, bound to one host-chosen -/// topic (parsers never name topics). flushPending() seals+commits; setTarget() -/// is the streaming two-engine swap (bound topic must exist on the target). -class DatastoreParserWriteHost { - public: - DatastoreParserWriteHost(DataEngine& engine, PJ_topic_handle_t topic); - ~DatastoreParserWriteHost(); - - DatastoreParserWriteHost(const DatastoreParserWriteHost&) = delete; - DatastoreParserWriteHost& operator=(const DatastoreParserWriteHost&) = delete; - DatastoreParserWriteHost(DatastoreParserWriteHost&&) noexcept; - DatastoreParserWriteHost& operator=(DatastoreParserWriteHost&&) noexcept; - - [[nodiscard]] PJ_parser_write_host_t raw() noexcept; - void flushPending(); - - // Atomically swap the destination DataEngine. Mirrors the object write - // host's setTarget: the streaming two-store flow routes scalar pushes to a - // secondary DataEngine during pause and back to the primary on resume. 
- // Pending rows are flushed to the current engine before the switch; the - // bound topic must exist in `target` with the same TopicId (the streaming - // manager registers it lockstep on both engines). Does not take ownership. - void setTarget(DataEngine* target); - - private: - std::unique_ptr state_; -}; - -/// Host-side implementation of the toolbox object-read surface exposed as -/// `pj.toolbox_object_read.v1`. Bridges the C ABI onto -/// `pj_datastore::ObjectStore`, allocating an owning handle per successful -/// `read_latest_at`. The handle keeps bytes alive independent of the -/// store's internal state, matching the `shared_ptr` model. -class DatastoreToolboxObjectReadHost { - public: - explicit DatastoreToolboxObjectReadHost(ObjectStore& store); - ~DatastoreToolboxObjectReadHost(); - - DatastoreToolboxObjectReadHost(const DatastoreToolboxObjectReadHost&) = delete; - DatastoreToolboxObjectReadHost& operator=(const DatastoreToolboxObjectReadHost&) = delete; - DatastoreToolboxObjectReadHost(DatastoreToolboxObjectReadHost&&) noexcept; - DatastoreToolboxObjectReadHost& operator=(DatastoreToolboxObjectReadHost&&) noexcept; - - [[nodiscard]] PJ_object_read_host_t raw() noexcept; - - private: - std::unique_ptr state_; -}; - -/// Host-side implementation of the parser-scoped object write surface -/// exposed as `pj.parser_object_write.v1`. The target ObjectTopic is bound -/// at construction time (matching the scalar `DatastoreParserWriteHost` -/// pattern); the parser never names topics. -/// -/// @param topic_id the raw `ObjectTopicId::id` of the bound topic. 
-class DatastoreParserObjectWriteHost { - public: - DatastoreParserObjectWriteHost(ObjectStore& store, uint32_t topic_id); - ~DatastoreParserObjectWriteHost(); - - DatastoreParserObjectWriteHost(const DatastoreParserObjectWriteHost&) = delete; - DatastoreParserObjectWriteHost& operator=(const DatastoreParserObjectWriteHost&) = delete; - DatastoreParserObjectWriteHost(DatastoreParserObjectWriteHost&&) noexcept; - DatastoreParserObjectWriteHost& operator=(DatastoreParserObjectWriteHost&&) noexcept; - - [[nodiscard]] PJ_parser_object_write_host_t raw() noexcept; - - // Atomically swap the destination store. Used by the streaming two-store - // flow to route pushes to a secondary ObjectStore during pause and back to - // the primary on resume. The bound topic id must exist in `target` (the - // streaming manager ensures this via lockstep registerTopic on both - // stores). The host does not take ownership of the target. - void setTarget(ObjectStore* target) noexcept; - - private: - std::unique_ptr state_; -}; - -/// Bridges the toolbox C ABI onto both a DataEngine (scalar/Arrow columns) and an -/// ObjectStore (media blobs); a toolbox plugin writes into either via one host -/// fat pointer. raw() yields the vtable; flushPending() seals+commits. -class DatastoreToolboxHost { - public: - /// Construct with both an engine (scalar/Arrow column writes) and an - /// object store (canonical media payloads — images, point clouds, - /// annotations). The two are independent storage backends; toolbox - /// plugins write into one or both via the same host fat pointer. 
- DatastoreToolboxHost(DataEngine& engine, ObjectStore& object_store); - ~DatastoreToolboxHost(); - - DatastoreToolboxHost(const DatastoreToolboxHost&) = delete; - DatastoreToolboxHost& operator=(const DatastoreToolboxHost&) = delete; - DatastoreToolboxHost(DatastoreToolboxHost&&) noexcept; - DatastoreToolboxHost& operator=(DatastoreToolboxHost&&) noexcept; - - [[nodiscard]] PJ_toolbox_host_t raw() noexcept; - void flushPending(); - - private: - std::unique_ptr state_; -}; - -} // namespace PJ diff --git a/pj_datastore/include/pj_datastore/query.hpp b/pj_datastore/include/pj_datastore/query.hpp deleted file mode 100644 index 4c300cf7..00000000 --- a/pj_datastore/include/pj_datastore/query.hpp +++ /dev/null @@ -1,189 +0,0 @@ -#pragma once -// Copyright 2026 Davide Faconti -// SPDX-License-Identifier: MPL-2.0 - -// Read-side query primitives over committed chunk deques: RangeCursor (rows), -// SeriesReader/SeriesCursor (one column as a value-bearing time series), and -// latestAt(). Reached via DataReader. See docs/USER_GUIDE.md §5. - -#include -#include -#include -#include -#include - -#include "pj_base/types.hpp" -#include "pj_datastore/chunk.hpp" - -namespace PJ { - -struct QueryRange { - /// Topic to query. - PJ::TopicId topic_id = 0; - /// Inclusive range start. - PJ::Timestamp t_min = 0; - /// Inclusive range end. - PJ::Timestamp t_max = 0; -}; - -/// Point query descriptor for latest-at lookup. -struct QueryPoint { - /// Topic to query. - PJ::TopicId topic_id = 0; - /// Query timestamp. - PJ::Timestamp t = 0; -}; - -/// One materialized series sample. A series is a topic column viewed as a -/// time series; rows where the column is null are not samples. -struct SeriesSample { - /// Sample timestamp. - PJ::Timestamp timestamp = 0; - /// Numeric sample value, converted to double for display/analysis. - double value = 0.0; - /// Pointer to source chunk containing this sample. - const TopicChunk* chunk = nullptr; - /// Physical row index inside `chunk`. 
- std::size_t row_index = 0; -}; - -/// Bounds for a series over a query window. -struct SeriesBounds { - /// Time range covered by value-bearing samples. - PJ::Range time; - /// Finite value range covered by value-bearing samples. - PJ::Range value; - /// Number of value-bearing samples in the range. - std::size_t sample_count = 0; -}; - -/// One materialized row reference returned by cursors. -struct SampleRow { - /// Sample timestamp. - PJ::Timestamp timestamp = 0; - /// Pointer to source chunk containing this row. - const TopicChunk* chunk = nullptr; - /// Row index inside `chunk`. - std::size_t row_index = 0; -}; - -/// Contiguous row interval inside one chunk. -struct ChunkRowRange { - /// Source chunk. - const TopicChunk* chunk = nullptr; - /// Inclusive start row. - std::size_t row_start = 0; - /// Exclusive end row. - std::size_t row_end = 0; // exclusive -}; - -// Cursor for iterating range query results across chunks -class RangeCursor { - public: - /// Construct cursor over [t_min, t_max] from committed chunks. - RangeCursor(const std::deque& chunks, PJ::Timestamp t_min, PJ::Timestamp t_max); - - [[nodiscard]] bool valid() const noexcept; - - /// Advance to next matching row. - void advance(); - - /// Return current row descriptor. - [[nodiscard]] SampleRow current() const; - - // Iterate all results via callback (per-row) - void forEach(std::function callback); - - // Iterate chunk-at-a-time (bulk path) - void forEachChunk(std::function callback); - - private: - const std::deque* chunks_; - PJ::Timestamp t_min_; - PJ::Timestamp t_max_; - std::size_t chunk_index_ = 0; - std::size_t row_index_ = 0; - - void findFirstValid(); - - void skipToValid(); -}; - -/// Cursor for iterating a topic column as a time series. It skips null rows by -/// definition; every current() value is a value-bearing sample for the bound -/// column. -class SeriesCursor { - public: - /// Construct cursor over [time_range.min, time_range.max] from committed chunks. 
- SeriesCursor(const std::deque& chunks, std::size_t column_index, PJ::Range time_range); - - [[nodiscard]] bool valid() const noexcept; - - /// Advance to next matching sample. - void advance(); - - /// Return current sample descriptor. - [[nodiscard]] SeriesSample current() const; - - /// Iterate all results via callback. - void forEach(std::function callback); - - private: - const std::deque* chunks_; - std::size_t column_index_ = 0; - PJ::Range time_range_; - std::size_t chunk_index_ = 0; - std::size_t row_index_ = 0; - - void skipToSample(); -}; - -/// View a topic column as a virtual vector of value-bearing time series -/// samples. Null rows are storage details and are not visible through this API. -class SeriesReader { - public: - /// Construct a series reader over committed chunks. - SeriesReader(const std::deque& chunks, std::size_t column_index); - - /// Number of samples in the virtual series. - [[nodiscard]] std::size_t size() const; - - /// True when the virtual series contains no samples. - [[nodiscard]] bool empty() const; - - /// Return the sample at a virtual series index. - [[nodiscard]] std::optional sampleAt(std::size_t index) const; - - /// Return the virtual series index of the latest sample at or before `t`. - [[nodiscard]] std::optional indexAtOrBeforeTime(PJ::Timestamp t) const; - - /// Return the virtual series index of the first sample at or after `t`. - [[nodiscard]] std::optional indexAtOrAfterTime(PJ::Timestamp t) const; - - /// Return the latest sample at or before `t`. - [[nodiscard]] std::optional sampleAtOrBeforeTime(PJ::Timestamp t) const; - - /// Return the first sample at or after `t`. - [[nodiscard]] std::optional sampleAtOrAfterTime(PJ::Timestamp t) const; - - /// Iterate samples in an inclusive time range. - [[nodiscard]] SeriesCursor samples(PJ::Range time_range) const; - - /// Return bounds over the entire series. - [[nodiscard]] std::optional bounds() const; - - /// Return bounds over an inclusive time range. 
- [[nodiscard]] std::optional bounds(PJ::Range time_range) const; - - private: - const std::deque* chunks_; - std::size_t column_index_ = 0; -}; - -// Find the most recent sample at or before time t; nullopt if none exists. -[[nodiscard]] std::optional latestAt(const std::deque& chunks, PJ::Timestamp t); - -// Create a range cursor -[[nodiscard]] RangeCursor rangeQuery(const std::deque& chunks, PJ::Timestamp t_min, PJ::Timestamp t_max); - -} // namespace PJ diff --git a/pj_datastore/include/pj_datastore/reader.hpp b/pj_datastore/include/pj_datastore/reader.hpp deleted file mode 100644 index 57ad3ee4..00000000 --- a/pj_datastore/include/pj_datastore/reader.hpp +++ /dev/null @@ -1,53 +0,0 @@ -#pragma once -// Copyright 2026 Davide Faconti -// SPDX-License-Identifier: MPL-2.0 - -#include -#include -#include - -#include "pj_base/expected.hpp" -#include "pj_base/type_tree.hpp" -#include "pj_base/types.hpp" -#include "pj_datastore/query.hpp" -#include "pj_datastore/topic_storage.hpp" - -namespace PJ { - -class DataEngine; - -/// Read-only facade over committed DataEngine storage. -/// Provides listing, metadata, type-tree lookup, range queries, -/// and latest-at point queries. -class DataReader { - public: - /// Create a read-only facade bound to `engine`. - explicit DataReader(const DataEngine& engine); - - /// List all dataset ids known by the engine. - [[nodiscard]] std::vector listDatasets() const; - - /// List topic ids for one dataset. - [[nodiscard]] std::vector listTopics(PJ::DatasetId dataset_id) const; - - /// Lookup schema tree for a topic (nullptr if unknown). - [[nodiscard]] const PJ::TypeTreeNode* getTypeTree(PJ::TopicId topic_id) const; - - /// Return topic metadata if topic exists. - [[nodiscard]] std::optional getMetadata(PJ::TopicId topic_id) const; - - /// Create range cursor over [t_min, t_max]. 
- [[nodiscard]] PJ::Expected rangeQuery(const QueryRange& range) const; - - /// Return latest sample at or before query time; nullopt payload if no row exists. - [[nodiscard]] PJ::Expected> latestAt(const QueryPoint& point) const; - - /// Create a series view over one numeric/bool topic column. The returned - /// reader exposes only value-bearing samples; null rows are skipped. - [[nodiscard]] PJ::Expected series(PJ::TopicId topic_id, std::size_t column_index) const; - - private: - const DataEngine& engine_; -}; - -} // namespace PJ diff --git a/pj_datastore/include/pj_datastore/topic_storage.hpp b/pj_datastore/include/pj_datastore/topic_storage.hpp deleted file mode 100644 index 7f51391b..00000000 --- a/pj_datastore/include/pj_datastore/topic_storage.hpp +++ /dev/null @@ -1,148 +0,0 @@ -#pragma once -// Copyright 2026 Davide Faconti -// SPDX-License-Identifier: MPL-2.0 - -#include -#include -#include -#include - -#include "pj_base/expected.hpp" -#include "pj_base/types.hpp" -#include "pj_datastore/chunk.hpp" - -namespace PJ { - -// Import base types into engine namespace -using PJ::DatasetId; -using PJ::SchemaId; -using PJ::Timestamp; -using PJ::TopicId; - -struct TopicDescriptor { - /// Topic display/name key. - std::string name; - /// Active schema for newly written chunks. - SchemaId schema_id = 0; - /// Owning dataset. - DatasetId dataset_id = 0; - /// Target maximum rows per chunk for writers. - uint32_t max_chunk_rows = 1024; // Default chunk size - /// Maximum number of element columns to expand per variable-length array field. - /// Prevents column explosion. expandArray() clamps to this limit. - uint32_t array_expansion_limit = 64; -}; - -/// Aggregated metadata snapshot for one topic. -struct TopicMetadata { - /// Topic identifier. - TopicId topic_id = 0; - /// Topic display/name key. - std::string name; - /// Current schema id. - SchemaId current_schema = 0; - /// Owning dataset id. 
- DatasetId dataset_id = 0; - /// Minimum timestamp across retained chunks. - Timestamp time_range_min = 0; - /// Maximum timestamp across retained chunks. - Timestamp time_range_max = 0; - /// Total rows across retained chunks. - uint64_t total_row_count = 0; - /// Approximate total memory footprint across retained chunks. - uint64_t total_byte_size = 0; // approximate - /// Largest array length ever passed to expandArray() for any field in this topic. - uint32_t max_observed_array_length = 0; - /// Number of times expandArray() clamped due to array_expansion_limit. - uint32_t truncated_sample_count = 0; -}; - -class DataEngine; - -/// Per-topic container of committed chunks (commit-ordered deque). -/// appendSealedChunk() enforces chunk t_min >= previous t_max; evictBefore() -/// drops chunks fully older than a threshold. Also holds the column layout for -/// schemaless (schema_id==0) topics and per-field array-expansion counts. -class TopicStorage { - public: - /// Create storage for one topic descriptor. - TopicStorage(TopicId topic_id, TopicDescriptor descriptor); - - /// Append a sealed chunk; rejects out-of-order chunk timestamps. - [[nodiscard]] PJ::Status appendSealedChunk(TopicChunk chunk); - - /// Remove chunks whose max time is strictly before `t_keep_min`. - void evictBefore(Timestamp t_keep_min); - - /// Unconditionally remove all retained sealed chunks. - void clearChunks() noexcept; - - /// Access retained sealed chunks in commit order. - [[nodiscard]] const std::deque& sealedChunks() const noexcept; - - /// Store column layout for schema_id==0 topics (populated at writer registration time). - /// Allows derived engine and fresh writers to resolve the layout without a committed chunk. - void setColumnDescriptors(std::vector descs) noexcept; - - /// Inline column layout (non-empty for schema_id==0 topics after the first writer is created). 
- [[nodiscard]] const std::vector& columnDescriptors() const noexcept; - - /// Compute aggregated metadata for current retained chunks. - [[nodiscard]] TopicMetadata metadata() const; - - /// Access topic descriptor. - [[nodiscard]] const TopicDescriptor& descriptor() const noexcept; - - /// Topic identifier. - [[nodiscard]] TopicId topic_id() const noexcept; - - /// True if no chunks are retained. - [[nodiscard]] bool empty() const noexcept; - - /// Minimum timestamp of retained chunks (0 if empty). - [[nodiscard]] Timestamp time_min() const noexcept; - - /// Maximum timestamp of retained chunks (0 if empty). - [[nodiscard]] Timestamp time_max() const noexcept; - - /// Update descriptor schema id for future writes. - void updateSchema(SchemaId new_schema); - - /// Track the largest observed array length (called by DataWriter::expand_array). - void updateMaxObservedArrayLength(uint32_t observed_length); - - /// Increment the truncation counter (called when expand_array clamps due to limit). - void incrementTruncatedSampleCount(); - - /// Largest array length ever passed to expandArray() for any field in this topic. - [[nodiscard]] uint32_t maxObservedArrayLength() const noexcept; - - /// Number of times expandArray() clamped due to array_expansion_limit. - [[nodiscard]] uint32_t truncatedSampleCount() const noexcept; - - /// Return the current expansion count for a variable-length array field. - /// Returns 0 if the field has not been expanded yet. - [[nodiscard]] uint32_t arrayExpansionCount(const std::string& field_path) const noexcept; - - /// Update the expansion count for a variable-length array field. - void setArrayExpansionCount(const std::string& field_path, uint32_t count); - - private: - // DataEngine::flushTo needs to move sealed_chunks_ between TopicStorage - // instances of different engines without copying. Friending it lets the - // transfer happen entirely inside DataEngine without exposing the move - // primitive on the public TopicStorage API. 
- friend class DataEngine; - - TopicId topic_id_; - TopicDescriptor descriptor_; - std::deque sealed_chunks_; - std::vector column_descriptors_; // for schema_id==0 topics - uint32_t max_observed_array_length_ = 0; - uint32_t truncated_sample_count_ = 0; - // Authoritative expansion count per variable-length array field path. - // Shared across all DataWriter instances writing to this topic. - std::unordered_map array_expansion_counts_; -}; - -} // namespace PJ diff --git a/pj_datastore/include/pj_datastore/type_registry.hpp b/pj_datastore/include/pj_datastore/type_registry.hpp deleted file mode 100644 index 42a11332..00000000 --- a/pj_datastore/include/pj_datastore/type_registry.hpp +++ /dev/null @@ -1,54 +0,0 @@ -#pragma once -// Copyright 2026 Davide Faconti -// SPDX-License-Identifier: MPL-2.0 - -#include -#include -#include - -#include "pj_base/expected.hpp" -#include "pj_base/type_tree.hpp" -#include "pj_base/types.hpp" - -namespace PJ { - -/// Engine-wide schema registry: assigns SchemaId to named TypeTreeNode trees, -/// shared across topics. registerOrGet() supports late discovery (return existing -/// id by name); evolveSchema() permits additive-only changes (no field removal or -/// type change). -class TypeRegistry { - public: - TypeRegistry(); - ~TypeRegistry(); - TypeRegistry(TypeRegistry&&) noexcept; - TypeRegistry& operator=(TypeRegistry&&) noexcept; - - TypeRegistry(const TypeRegistry&) = delete; - TypeRegistry& operator=(const TypeRegistry&) = delete; - - // Register a known schema (from Protobuf, ROS, etc.) - // Fails if schema_name already exists. - [[nodiscard]] PJ::Expected registerSchema( - std::string schema_name, std::shared_ptr type_tree); - - // Late discovery: register from first message (JSON, etc.) - // Returns existing schema ID if name already registered. 
- [[nodiscard]] PJ::Expected registerOrGet( - std::string schema_name, std::shared_ptr type_tree); - - // Lookup by ID — returns nullptr if not found - [[nodiscard]] const PJ::TypeTreeNode* lookup(PJ::SchemaId id) const; - - // Lookup by name — returns nullopt if not found - [[nodiscard]] std::optional findByName(std::string_view name) const; - - // Schema evolution: add fields to existing schema (additive only). - // Fails if: ID not found, existing fields changed type, fields removed. - [[nodiscard]] PJ::Status evolveSchema(PJ::SchemaId id, std::shared_ptr updated_tree); - - private: - struct Impl; - std::unique_ptr impl_; -}; - -} // namespace PJ diff --git a/pj_datastore/include/pj_datastore/writer.hpp b/pj_datastore/include/pj_datastore/writer.hpp deleted file mode 100644 index c38629d2..00000000 --- a/pj_datastore/include/pj_datastore/writer.hpp +++ /dev/null @@ -1,213 +0,0 @@ -#pragma once -// Copyright 2026 Davide Faconti -// SPDX-License-Identifier: MPL-2.0 - -#include -#include -#include -#include -#include -#include -#include -#include - -#include "pj_base/expected.hpp" -#include "pj_base/span.hpp" -#include "pj_base/type_tree.hpp" -#include "pj_base/types.hpp" -#include "pj_datastore/chunk.hpp" -#include "pj_datastore/column_buffer.hpp" -#include "pj_datastore/topic_storage.hpp" - -namespace PJ { - -class DataEngine; // forward declaration - -/// Handle returned by bind_topic_writer for fast-path column access. -struct TopicWriteHandle { - /// Topic associated with this handle. - PJ::TopicId topic_id; - /// Field ids aligned to writer column order. - std::vector field_ids; -}; - -/// Handle for scalar convenience API — single numeric column per topic. -struct ScalarSeriesHandle { - /// Scalar topic id. - PJ::TopicId topic_id; - /// Value field id (always one logical scalar field). - PJ::FieldId value_field; -}; - -/// Describes one column's data for bulk appendColumns(). -struct ColumnData { - /// Column index in topic schema. 
- std::size_t col_index; - - /// Arrow-compatible string data (offsets + concatenated bytes). - struct StringData { - PJ::Span offsets; // (row_count + 1) entries - PJ::Span values; - }; - - /// Type-safe column payload — variant index determines StorageKind. - using Data = std::variant< - PJ::Span, // kFloat32 - PJ::Span, // kFloat64 - PJ::Span, // kInt32 - PJ::Span, // kInt64 - PJ::Span, // kUint64 - PJ::Span, // kBool (one byte per bool) - StringData // kString - >; - - Data data; - - /// Optional validity bits aligned to row order (empty = all valid). - PJ::BitSpan validity; - - /// Derive row count from the active variant alternative. - [[nodiscard]] std::size_t rowCount() const; - - /// Derive StorageKind from the active variant alternative. - [[nodiscard]] StorageKind kind() const; - - // Convenience factories - static ColumnData Float32(std::size_t col, PJ::Span values, PJ::BitSpan validity = {}) { - return {col, Data{values}, validity}; - } - static ColumnData Float64(std::size_t col, PJ::Span values, PJ::BitSpan validity = {}) { - return {col, Data{values}, validity}; - } - static ColumnData Int32(std::size_t col, PJ::Span values, PJ::BitSpan validity = {}) { - return {col, Data{values}, validity}; - } - static ColumnData Int64(std::size_t col, PJ::Span values, PJ::BitSpan validity = {}) { - return {col, Data{values}, validity}; - } - static ColumnData Uint64(std::size_t col, PJ::Span values, PJ::BitSpan validity = {}) { - return {col, Data{values}, validity}; - } - static ColumnData Bool(std::size_t col, PJ::Span values, PJ::BitSpan validity = {}) { - return {col, Data{values}, validity}; - } - static ColumnData String( - std::size_t col, PJ::Span offsets, PJ::Span str_data, PJ::BitSpan validity = {}) { - return {col, Data{StringData{offsets, str_data}}, validity}; - } -}; - -/// High-level write facade bound to one DataEngine. 
Accumulates rows in per-topic -/// builders (row-at-a-time, bulk, or scalar) and seals them into chunks on -/// flush(); the engine becomes visible to readers only after -/// DataEngine::commitChunks(flushAll()). Supports mid-stream column addition and -/// array expansion. See docs/USER_GUIDE.md and docs/ARCHITECTURE.md. -class DataWriter { - public: - /// Create a writer bound to one engine instance. - explicit DataWriter(DataEngine& engine); - - ~DataWriter(); - DataWriter(DataWriter&&) noexcept; - DataWriter& operator=(DataWriter&&) noexcept; - DataWriter(const DataWriter&) = delete; - DataWriter& operator=(const DataWriter&) = delete; - - // ---- Schema registration (delegates to engine's TypeRegistry) ---- - /// Register a schema name -> type tree mapping. - [[nodiscard]] PJ::Expected registerSchema( - std::string schema_name, std::shared_ptr type_tree); - - // ---- Topic registration ---- - /// Register a topic under `dataset_id`. - [[nodiscard]] PJ::Expected registerTopic(PJ::DatasetId dataset_id, TopicDescriptor descriptor); - - // ---- Bind for fast path ---- - /// Resolve and cache topic columns for low-overhead writes. - [[nodiscard]] PJ::Expected bindTopicWriter(PJ::TopicId topic_id); - - // ---- Field resolution ---- - /// Resolve one field path to its field id. - [[nodiscard]] PJ::Expected resolveField(PJ::TopicId topic_id, std::string_view field_path); - - // ---- Row-at-a-time append ---- - /// Begin one row at timestamp `t`. - [[nodiscard]] PJ::Status beginRow(PJ::TopicId topic_id, PJ::Timestamp t); - - /// Finalize current row for `topic_id`. Returns error if begin_row was not called first. - [[nodiscard]] PJ::Status finishRow(PJ::TopicId topic_id); - - /// Set a typed value in the current row. - /// Supported T: float, double, int32_t, int64_t, uint64_t, bool, std::string_view. - template - void set(PJ::TopicId topic_id, std::size_t col_index, T value); - - /// Mark current row value as null. 
- void setNull(PJ::TopicId topic_id, std::size_t col_index); - - // ---- Bulk column append ---- - /// Append aligned column batches and timestamps (auto-chunking if needed). - [[nodiscard]] PJ::Status appendColumns( - PJ::TopicId topic_id, PJ::Span timestamps, PJ::Span columns); - - // ---- Scalar convenience API ---- - /// Create/register a single-column scalar topic. - [[nodiscard]] PJ::Expected registerScalarSeries( - PJ::DatasetId dataset_id, std::string_view topic_name, PJ::NumericType value_type); - /// Append one scalar sample. - void appendScalar(const ScalarSeriesHandle& handle, PJ::Timestamp t, PJ::NumericValue value); - - // ---- Dynamic column addition ---- - /// Ensure a column with `field_path` and `type` exists for `topic_id`. - /// - No-op if a column with this exact path already exists with the same type; returns its FieldId. - /// - Returns error if the path already exists with a DIFFERENT type. - /// - If new and a row is IN PROGRESS: returns error (applies only to new columns; existing columns - /// are returned safely even mid-row). - /// - If new and no row in progress: seals any pending builder, appends a ColumnDescriptor, persists layout. - /// Works for both typed (schema_id != 0) and schemaless (schema_id == 0) topics. - /// NOTE: on typed topics, columns added via ensure_column are NOT reflected in getTypeTree() — - /// they exist only in the physical column layout (TopicStorage::column_descriptors / chunk descriptors). - [[nodiscard]] PJ::Expected ensureColumn( - PJ::TopicId topic_id, std::string_view field_path, PJ::PrimitiveType type); - - // ---- Variable-length array expansion ---- - /// Ensure the variable-length array at `array_field_path` has at least `new_length` - /// element columns. Must be called OUTSIDE a begin_row/finish_row block. - /// - new_length <= current expansion: no-op, returns current count. - /// - new_length > array_expansion_limit: clamps to limit, records truncation. 
- /// - Otherwise: seals current builder, adds new ColumnDescriptors, updates TopicStorage. - /// Returns actual expansion count (may be less than new_length if clamped). - /// For typed topics (schema_id != 0): validates field against schema; element_type ignored. - /// For schemaless topics (schema_id == 0): any field path accepted; uses element_type. - [[nodiscard]] PJ::Expected expandArray( - PJ::TopicId topic_id, std::string_view array_field_path, uint32_t new_length, - PJ::PrimitiveType element_type = PJ::PrimitiveType::kFloat64); - - // ---- Flush ---- - /// Seal and return pending chunks for one topic. - [[nodiscard]] std::vector flush(PJ::TopicId topic_id); - - /// Seal and return all pending chunks for all topics. - [[nodiscard]] std::vector> flushAll(); - - private: - struct Impl; - std::unique_ptr impl_; - - TopicChunkBuilder& getOrCreateBuilder(PJ::TopicId topic_id); - - // Populate topic_columns[topic_id] from TopicStorage if not already cached. - void ensureColsLoaded(PJ::TopicId topic_id, const TopicStorage& storage); - - // Build column descriptors from a type tree - static std::vector buildColumnDescriptors(const PJ::TypeTreeNode& root); - - // Seal current builder and move chunk to pending list - void autoSeal(PJ::TopicId topic_id); - - // Seal and erase the current builder (if any) before a column layout change. - // No-op if no builder exists; skips sealing if builder has zero rows. 
- void sealBeforeLayoutChange(PJ::TopicId topic_id); -}; - -} // namespace PJ diff --git a/pj_datastore/src/arrow_import.cpp b/pj_datastore/src/arrow_import.cpp deleted file mode 100644 index 7e328359..00000000 --- a/pj_datastore/src/arrow_import.cpp +++ /dev/null @@ -1,469 +0,0 @@ -// Copyright 2026 Davide Faconti -// SPDX-License-Identifier: MPL-2.0 - -#include "pj_datastore/arrow_import.hpp" - -#include - -#include -#include -#include -#include -#include -#include -#include -#include - -#include "nanoarrow/nanoarrow.h" -#include "nanoarrow/nanoarrow.hpp" -#include "nanoarrow/nanoarrow_ipc.h" -#include "pj_base/expected.hpp" -#include "pj_base/span.hpp" -#include "pj_base/type_tree.hpp" -#include "pj_base/types.hpp" -#include "pj_datastore/column_buffer.hpp" -#include "pj_datastore/writer.hpp" - -namespace PJ::arrow_import { -namespace { - -// --------------------------------------------------------------------------- -// Non-owning IPC input stream from PJ::Span -// --------------------------------------------------------------------------- - -struct SpanInputStreamData { - const uint8_t* data; - int64_t size; - int64_t offset; -}; - -ArrowErrorCode span_input_stream_read( - ArrowIpcInputStream* stream, uint8_t* buf, int64_t buf_size_bytes, int64_t* size_read_out, ArrowError* /*error*/) { - auto* s = static_cast(stream->private_data); - const int64_t available = s->size - s->offset; - const int64_t to_read = std::min(buf_size_bytes, available); - if (to_read > 0) { - std::memcpy(buf, s->data + s->offset, static_cast(to_read)); - s->offset += to_read; - } - *size_read_out = to_read; - return NANOARROW_OK; -} - -void span_input_stream_release(ArrowIpcInputStream* stream) { - delete static_cast(stream->private_data); - stream->private_data = nullptr; - stream->release = nullptr; -} - -void init_span_input_stream(ArrowIpcInputStream* stream, PJ::Span span) { - stream->read = span_input_stream_read; - stream->release = span_input_stream_release; - 
stream->private_data = new SpanInputStreamData{span.data(), static_cast(span.size()), 0}; -} - -// --------------------------------------------------------------------------- -// nanoarrow ArrowType → PrimitiveType -// --------------------------------------------------------------------------- - -std::optional nanoarrow_type_to_primitive(ArrowType type) { - switch (type) { - case NANOARROW_TYPE_INT8: - return PrimitiveType::kInt8; - case NANOARROW_TYPE_INT16: - return PrimitiveType::kInt16; - case NANOARROW_TYPE_INT32: - return PrimitiveType::kInt32; - case NANOARROW_TYPE_INT64: - return PrimitiveType::kInt64; - case NANOARROW_TYPE_UINT8: - return PrimitiveType::kUint8; - case NANOARROW_TYPE_UINT16: - return PrimitiveType::kUint16; - case NANOARROW_TYPE_UINT32: - return PrimitiveType::kUint32; - case NANOARROW_TYPE_UINT64: - return PrimitiveType::kUint64; - case NANOARROW_TYPE_FLOAT: - return PrimitiveType::kFloat32; - case NANOARROW_TYPE_DOUBLE: - return PrimitiveType::kFloat64; - case NANOARROW_TYPE_BOOL: - return PrimitiveType::kBool; - case NANOARROW_TYPE_STRING: - case NANOARROW_TYPE_LARGE_STRING: - return PrimitiveType::kString; - default: - return std::nullopt; - } -} - -// --------------------------------------------------------------------------- -// Helpers: extract raw data from nanoarrow ArrowArrayView children -// --------------------------------------------------------------------------- - -struct ColumnDataWithBuffer { - ColumnData col_data; - std::vector int64_buf; - std::vector uint64_buf; - std::vector offset_buf; -}; - -ColumnDataWithBuffer make_column_data_nanoarrow( - const ArrowArrayView* child, const ArrowColumnMapping& mapping, int64_t length) { - ColumnDataWithBuffer result; - const auto sk = storageKindOf(mapping.pj_type); - const auto n = static_cast(length); - - // Validity bitmap - BitSpan validity_view; - if (child->null_count > 0 && child->buffer_views[0].data.as_uint8 != nullptr) { - const std::size_t validity_offset = 
static_cast(child->offset); - const std::size_t validity_bytes = (validity_offset + n + 7) / 8; - validity_view = - BitSpan{Span(child->buffer_views[0].data.as_uint8, validity_bytes), validity_offset, n}; - } - - switch (sk) { - case StorageKind::kFloat32: { - result.col_data = ColumnData::Float32( - mapping.pj_column_index, Span(child->buffer_views[1].data.as_float + child->offset, n), - validity_view); - break; - } - case StorageKind::kFloat64: { - result.col_data = ColumnData::Float64( - mapping.pj_column_index, Span(child->buffer_views[1].data.as_double + child->offset, n), - validity_view); - break; - } - case StorageKind::kInt32: { - result.col_data = ColumnData::Int32( - mapping.pj_column_index, Span(child->buffer_views[1].data.as_int32 + child->offset, n), - validity_view); - break; - } - case StorageKind::kInt64: { - switch (mapping.pj_type) { - case PrimitiveType::kInt8: - case PrimitiveType::kInt16: { - result.int64_buf.resize(n); - for (int64_t i = 0; i < length; ++i) { - result.int64_buf[static_cast(i)] = ArrowArrayViewGetIntUnsafe(child, i); - } - result.col_data = ColumnData::Int64( - mapping.pj_column_index, Span(result.int64_buf.data(), n), validity_view); - break; - } - case PrimitiveType::kInt64: { - result.col_data = ColumnData::Int64( - mapping.pj_column_index, Span(child->buffer_views[1].data.as_int64 + child->offset, n), - validity_view); - break; - } - default: - break; - } - break; - } - case StorageKind::kUint64: { - switch (mapping.pj_type) { - case PrimitiveType::kUint8: - case PrimitiveType::kUint16: - case PrimitiveType::kUint32: { - result.uint64_buf.resize(n); - for (int64_t i = 0; i < length; ++i) { - result.uint64_buf[static_cast(i)] = ArrowArrayViewGetUIntUnsafe(child, i); - } - result.col_data = ColumnData::Uint64( - mapping.pj_column_index, Span(result.uint64_buf.data(), n), validity_view); - break; - } - case PrimitiveType::kUint64: { - result.col_data = ColumnData::Uint64( - mapping.pj_column_index, 
Span(child->buffer_views[1].data.as_uint64 + child->offset, n), - validity_view); - break; - } - default: - break; - } - break; - } - case StorageKind::kBool: { - // Arrow stores bools as packed bits; we need unpacked uint8_t. - std::vector bool_buf(n); - for (int64_t i = 0; i < length; ++i) { - bool_buf[static_cast(i)] = ArrowArrayViewGetIntUnsafe(child, i) != 0 ? uint8_t{1} : uint8_t{0}; - } - // Store in uint64_buf as raw bytes - result.uint64_buf.resize((n + sizeof(uint64_t) - 1) / sizeof(uint64_t)); - std::memcpy(result.uint64_buf.data(), bool_buf.data(), n); - result.col_data = ColumnData::Bool( - mapping.pj_column_index, Span(reinterpret_cast(result.uint64_buf.data()), n), - validity_view); - break; - } - case StorageKind::kString: { - // STRING: Arrow uses int32_t offsets; PJ uses uint32_t. Copy with cast to avoid UB. - const auto* offsets_ptr = child->buffer_views[1].data.as_int32 + child->offset; - result.offset_buf.resize(n + 1); - for (std::size_t i = 0; i <= n; ++i) { - result.offset_buf[i] = static_cast(offsets_ptr[i]); - } - result.col_data = ColumnData::String( - mapping.pj_column_index, Span(result.offset_buf.data(), n + 1), - Span( - child->buffer_views[2].data.as_char, static_cast(child->buffer_views[2].size_bytes)), - validity_view); - break; - } - } - - return result; -} - -/// Extract timestamps from an ArrowArrayView child column. 
-std::vector extract_timestamps_nanoarrow(const ArrowArrayView* view, int64_t length) { - const auto n = static_cast(length); - std::vector result(n); - - if (view->storage_type == NANOARROW_TYPE_INT64) { - const auto* raw = view->buffer_views[1].data.as_int64 + view->offset; - std::memcpy(result.data(), raw, n * sizeof(Timestamp)); - } else if (view->storage_type == NANOARROW_TYPE_UINT64) { - const auto* raw = view->buffer_views[1].data.as_uint64 + view->offset; - std::memcpy(result.data(), raw, n * sizeof(Timestamp)); - } else if (view->storage_type == NANOARROW_TYPE_INT32) { - const auto* raw = view->buffer_views[1].data.as_int32 + view->offset; - for (int64_t i = 0; i < length; ++i) { - result[static_cast(i)] = static_cast(raw[i]); - } - } else { - for (int64_t i = 0; i < length; ++i) { - result[static_cast(i)] = i; - } - } - - return result; -} - -std::vector generate_sequential_timestamps(int64_t length) { - const auto n = static_cast(length); - std::vector result(n); - for (int64_t i = 0; i < length; ++i) { - result[static_cast(i)] = i; - } - return result; -} - -} // namespace - -// --------------------------------------------------------------------------- -// schema_from_ipc -// --------------------------------------------------------------------------- - -namespace { - -// Derive column mappings + type tree from an already-populated nanoarrow -// schema. Shared between schemaFromIpc and schemaFromArrowStream. 
-PJ::Expected, std::vector>> mappingsFromSchema( - const ArrowSchema* schema) { - std::vector mappings; - std::vector> children; - - for (int64_t i = 0; i < schema->n_children; ++i) { - ArrowSchemaView view; - ArrowError error; - const int rc = ArrowSchemaViewInit(&view, schema->children[i], &error); - if (rc != NANOARROW_OK) { - continue; // skip unrecognized types - } - - auto pj_type = nanoarrow_type_to_primitive(view.type); - if (!pj_type.has_value()) { - continue; // skip unsupported types - } - - ArrowColumnMapping m; - m.arrow_column_index = static_cast(i); - m.pj_column_index = mappings.size(); - m.pj_type = *pj_type; - m.field_name = schema->children[i]->name != nullptr ? schema->children[i]->name : ""; - - children.push_back(PJ::makePrimitive(m.field_name, *pj_type)); - mappings.push_back(std::move(m)); - } - - if (mappings.empty()) { - return PJ::unexpected("No supported columns found in Arrow schema"); - } - - auto type_tree = PJ::makeStruct("arrow_row", std::move(children)); - return std::make_pair(std::move(type_tree), std::move(mappings)); -} - -// Pull record batches from an ArrowArrayStream* and feed them into the -// writer. The stream's schema must already be known (caller passes it in). -// Ownership: the caller retains ownership of @p stream; this helper does -// NOT call stream->release. -PJ::Status ingestBatchesFromStream( - DataWriter& writer, TopicId topic_id, ArrowArrayStream* stream, const ArrowSchema* schema, - const std::vector& mappings, int timestamp_column) { - nanoarrow::UniqueArrayView array_view; - int rc = ArrowArrayViewInitFromSchema(array_view.get(), const_cast(schema), nullptr); - if (rc != NANOARROW_OK) { - return PJ::unexpected("Failed to initialize ArrowArrayView from schema"); - } - - nanoarrow::UniqueArray batch; - while (true) { - batch.reset(); - rc = stream->get_next(stream, batch.get()); - if (rc != NANOARROW_OK) { - const char* err = stream->get_last_error != nullptr ? 
stream->get_last_error(stream) : nullptr; - return PJ::unexpected(fmt::format("Failed to read next batch: {}", err != nullptr ? err : "unknown")); - } - if (batch->release == nullptr) { - break; // end of stream - } - - const int64_t num_rows = batch->length; - if (num_rows == 0) { - continue; - } - - rc = ArrowArrayViewSetArray(array_view.get(), batch.get(), nullptr); - if (rc != NANOARROW_OK) { - return PJ::unexpected("Failed to set array on ArrowArrayView"); - } - - std::vector timestamps; - if (timestamp_column >= 0) { - if (timestamp_column >= static_cast(array_view->n_children)) { - return PJ::unexpected( - fmt::format("timestamp_column {} out of range ({} children)", timestamp_column, array_view->n_children)); - } - timestamps = extract_timestamps_nanoarrow(array_view->children[timestamp_column], num_rows); - } else { - timestamps = generate_sequential_timestamps(num_rows); - } - - std::vector col_buffers; - col_buffers.reserve(mappings.size()); - for (const auto& mapping : mappings) { - if (mapping.arrow_column_index >= static_cast(array_view->n_children)) { - return PJ::unexpected(fmt::format("Arrow column index {} out of range", mapping.arrow_column_index)); - } - col_buffers.push_back( - make_column_data_nanoarrow(array_view->children[mapping.arrow_column_index], mapping, num_rows)); - } - - std::vector col_data_vec; - col_data_vec.reserve(col_buffers.size()); - for (auto& cb : col_buffers) { - col_data_vec.push_back(cb.col_data); - } - - auto status = writer.appendColumns(topic_id, timestamps, col_data_vec); - if (!status.has_value()) { - return status; - } - } - - return PJ::okStatus(); -} - -} // namespace - -// --------------------------------------------------------------------------- -// schemaFromIpc -// --------------------------------------------------------------------------- - -PJ::Expected, std::vector>> schemaFromIpc( - PJ::Span ipc_stream) { - ArrowIpcInputStream input; - init_span_input_stream(&input, ipc_stream); - - 
nanoarrow::UniqueArrayStream stream; - int rc = ArrowIpcArrayStreamReaderInit(stream.get(), &input, nullptr); - if (rc != NANOARROW_OK) { - return PJ::unexpected("Failed to initialize IPC stream reader"); - } - - nanoarrow::UniqueSchema schema; - rc = stream->get_schema(stream.get(), schema.get()); - if (rc != NANOARROW_OK) { - return PJ::unexpected("Failed to read schema from IPC stream"); - } - - return mappingsFromSchema(schema.get()); -} - -// --------------------------------------------------------------------------- -// schemaFromArrowStream -// --------------------------------------------------------------------------- - -PJ::Expected, std::vector>> schemaFromArrowStream( - ArrowArrayStream* stream) { - if (stream == nullptr || stream->get_schema == nullptr) { - return PJ::unexpected("null ArrowArrayStream or missing get_schema"); - } - - nanoarrow::UniqueSchema schema; - const int rc = stream->get_schema(stream, schema.get()); - if (rc != NANOARROW_OK) { - const char* err = stream->get_last_error != nullptr ? stream->get_last_error(stream) : nullptr; - return PJ::unexpected(fmt::format("Failed to read schema from ArrowArrayStream: {}", err != nullptr ? 
err : "")); - } - - return mappingsFromSchema(schema.get()); -} - -// --------------------------------------------------------------------------- -// importIpcStream -// --------------------------------------------------------------------------- - -PJ::Status importIpcStream( - DataWriter& writer, TopicId topic_id, PJ::Span ipc_stream, - const std::vector& mappings, int timestamp_column) { - ArrowIpcInputStream input; - init_span_input_stream(&input, ipc_stream); - - nanoarrow::UniqueArrayStream stream; - int rc = ArrowIpcArrayStreamReaderInit(stream.get(), &input, nullptr); - if (rc != NANOARROW_OK) { - return PJ::unexpected("Failed to initialize IPC stream reader"); - } - - nanoarrow::UniqueSchema schema; - rc = stream->get_schema(stream.get(), schema.get()); - if (rc != NANOARROW_OK) { - return PJ::unexpected("Failed to read schema from IPC stream"); - } - - return ingestBatchesFromStream(writer, topic_id, stream.get(), schema.get(), mappings, timestamp_column); -} - -// --------------------------------------------------------------------------- -// importArrowStream (v4 Arrow C Data Interface path) -// --------------------------------------------------------------------------- - -PJ::Status importArrowStream( - DataWriter& writer, TopicId topic_id, ArrowArrayStream* stream, const std::vector& mappings, - int timestamp_column) { - if (stream == nullptr || stream->get_schema == nullptr || stream->get_next == nullptr) { - return PJ::unexpected("null ArrowArrayStream or missing callbacks"); - } - - nanoarrow::UniqueSchema schema; - int rc = stream->get_schema(stream, schema.get()); - if (rc != NANOARROW_OK) { - const char* err = stream->get_last_error != nullptr ? stream->get_last_error(stream) : nullptr; - return PJ::unexpected(fmt::format("Failed to read schema from ArrowArrayStream: {}", err != nullptr ? 
err : "")); - } - - return ingestBatchesFromStream(writer, topic_id, stream, schema.get(), mappings, timestamp_column); -} - -} // namespace PJ::arrow_import diff --git a/pj_datastore/src/buffer.cpp b/pj_datastore/src/buffer.cpp deleted file mode 100644 index c1fd360d..00000000 --- a/pj_datastore/src/buffer.cpp +++ /dev/null @@ -1,146 +0,0 @@ -// Copyright 2026 Davide Faconti -// SPDX-License-Identifier: MPL-2.0 - -#include "pj_datastore/buffer.hpp" - -#include -#include - -namespace PJ { - -// --------------------------------------------------------------------------- -// RawBuffer -// --------------------------------------------------------------------------- - -RawBuffer::RawBuffer(std::size_t initial_capacity) { - data_.reserve(initial_capacity); -} - -void RawBuffer::reserve(std::size_t capacity) { - data_.reserve(capacity); -} - -void RawBuffer::append(const void* data, std::size_t size) { - const auto* begin = static_cast(data); - data_.insert(data_.end(), begin, begin + size); -} - -void RawBuffer::resize(std::size_t new_size) { - data_.resize(new_size); -} - -void RawBuffer::clear() { - data_.clear(); -} - -const uint8_t* RawBuffer::data() const noexcept { - return data_.data(); -} - -uint8_t* RawBuffer::mutable_data() noexcept { - return data_.data(); -} - -std::size_t RawBuffer::size() const noexcept { - return data_.size(); -} - -std::size_t RawBuffer::capacity() const noexcept { - return data_.capacity(); -} - -bool RawBuffer::empty() const noexcept { - return data_.empty(); -} - -// --------------------------------------------------------------------------- -// BitVector -// --------------------------------------------------------------------------- - -void BitVector::initValid(std::size_t num_bits) { - bit_count_ = num_bits; - bytes_.resize(bytesForBits(num_bits)); - if (!bytes_.empty()) { - std::memset(bytes_.data(), 0xFF, bytes_.size()); - } -} - -void BitVector::ensureSize(std::size_t num_bits) { - if (num_bits > bit_count_) { - bit_count_ = 
num_bits; - } - const std::size_t needed = bytesForBits(num_bits); - if (bytes_.size() < needed) { - bytes_.resize(needed); - } -} - -void BitVector::setValid(std::size_t bit_index) { - bytes_[bit_index / 8] |= static_cast(1u << (bit_index % 8)); -} - -void BitVector::setNull(std::size_t bit_index) { - bytes_[bit_index / 8] &= static_cast(~(1u << (bit_index % 8))); -} - -bool BitVector::isValid(std::size_t bit_index) const { - return (bytes_[bit_index / 8] & (1u << (bit_index % 8))) != 0; -} - -std::size_t BitVector::countNulls(std::size_t num_bits) const { - const std::size_t num_bytes = bytesForBits(num_bits); - const uint8_t* ptr = bytes_.data(); - - std::size_t total_set_bits = 0; - - // Process full bytes - const std::size_t full_bytes = num_bits / 8; - for (std::size_t i = 0; i < full_bytes; ++i) { - total_set_bits += static_cast(std::popcount(ptr[i])); - } - - // Process remaining bits in the last partial byte (if any) - const std::size_t remaining_bits = num_bits % 8; - if (remaining_bits > 0 && num_bytes > 0) { - const uint8_t mask = static_cast((1u << remaining_bits) - 1u); - total_set_bits += static_cast(std::popcount(static_cast(ptr[full_bytes] & mask))); - } - - return num_bits - total_set_bits; -} - -void BitVector::assignBytes(Span bytes, std::size_t bit_count) { - bytes_.assign(bytes.begin(), bytes.end()); - bit_count_ = bit_count; -} - -void BitVector::clear() { - bytes_.clear(); - bit_count_ = 0; -} - -PJ::BitSpan BitVector::bitSpan() const noexcept { - return PJ::BitSpan{PJ::Span(bytes_.data(), bytes_.size()), 0, bit_count_}; -} - -const uint8_t* BitVector::data() const noexcept { - return bytes_.data(); -} - -uint8_t* BitVector::mutable_data() noexcept { - return bytes_.data(); -} - -std::size_t BitVector::sizeBytes() const noexcept { - return bytes_.size(); -} - -std::size_t BitVector::sizeBits() const noexcept { - return bit_count_; -} - -bool BitVector::empty() const noexcept { - return bit_count_ == 0; -} - -} // namespace PJ diff --git 
a/pj_datastore/src/builtin_transforms.cpp b/pj_datastore/src/builtin_transforms.cpp deleted file mode 100644 index b90bf463..00000000 --- a/pj_datastore/src/builtin_transforms.cpp +++ /dev/null @@ -1,53 +0,0 @@ -// Copyright 2026 Davide Faconti -// SPDX-License-Identifier: MPL-2.0 - -#include "pj_datastore/builtin_transforms.hpp" - -namespace PJ { - -// --------------------------------------------------------------------------- -// DerivativeTransform -// --------------------------------------------------------------------------- - -void DerivativeTransform::reset() { - has_prev_ = false; - prev_value_ = 0.0; - prev_time_ = 0; -} - -StorageKind DerivativeTransform::outputKind(StorageKind /*input_kind*/) const { - return StorageKind::kFloat64; -} - -bool DerivativeTransform::calculate( - PJ::Timestamp time, const VarValue& input, PJ::Timestamp& out_time, VarValue& out_value) { - // Input is decoded as VarValue{double} because outputKind() → kFloat64 and - // the engine widens all numeric inputs to double for float64 output columns. - double v = std::visit( - [](const auto& val) -> double { - using T = std::decay_t; - if constexpr (std::is_same_v) { - return 0.0; - } else { - return static_cast(val); - } - }, - input); - - if (!has_prev_) { - prev_time_ = time; - prev_value_ = v; - has_prev_ = true; - return false; // suppress first row — no previous sample - } - - double dt = static_cast(time - prev_time_) * 1e-9; // ns → seconds - out_time = time; - out_value = (dt > 0.0) ? 
(v - prev_value_) / dt : 0.0; - - prev_time_ = time; - prev_value_ = v; - return true; -} - -} // namespace PJ diff --git a/pj_datastore/src/chunk.cpp b/pj_datastore/src/chunk.cpp deleted file mode 100644 index 41b738e0..00000000 --- a/pj_datastore/src/chunk.cpp +++ /dev/null @@ -1,706 +0,0 @@ -// Copyright 2026 Davide Faconti -// SPDX-License-Identifier: MPL-2.0 - -#include "pj_datastore/chunk.hpp" - -#include -#include -#include -#include -#include -#include - -#include "pj_base/assert.hpp" - -namespace PJ { - -namespace { - -// Dispatch a callable with the correct numeric type tag. -// Returns true if kind is numeric (kFloat32..kUint64), false for kBool/kString. -template -bool dispatch_numeric_kind(StorageKind kind, F&& fn) { - switch (kind) { - case StorageKind::kFloat32: - fn(static_cast(nullptr)); - return true; - case StorageKind::kFloat64: - fn(static_cast(nullptr)); - return true; - case StorageKind::kInt32: - fn(static_cast(nullptr)); - return true; - case StorageKind::kInt64: - fn(static_cast(nullptr)); - return true; - case StorageKind::kUint64: - fn(static_cast(nullptr)); - return true; - default: - return false; - } -} - -// Read a raw numeric value from a buffer, dispatching on StorageKind. -template -[[nodiscard]] R read_raw_as(const RawBuffer& buf, StorageKind kind, std::size_t row) { - const std::size_t elem_size = storageKindSize(kind); - const uint8_t* ptr = buf.data() + row * elem_size; - - R result{}; - dispatch_numeric_kind(kind, [&](const T* /*tag*/) { - T v{}; - std::memcpy(&v, ptr, sizeof(v)); - result = static_cast(v); - }); - return result; -} - -template -struct overloaded : Ts... { - using Ts::operator()...; -}; -template -overloaded(Ts...) 
-> overloaded; - -} // namespace - -// =========================================================================== -// TopicChunkBuilder -// =========================================================================== - -TopicChunkBuilder::TopicChunkBuilder( - TopicId topic_id, SchemaId schema_id, std::vector columns, uint32_t max_rows) - : topic_id_(topic_id), schema_id_(schema_id), max_rows_(max_rows), column_descriptors_(std::move(columns)) { - columns_.reserve(column_descriptors_.size()); - for (const auto& desc : column_descriptors_) { - columns_.emplace_back(desc); - } - stats_.column_stats.resize(column_descriptors_.size()); - last_column_values_.resize(column_descriptors_.size(), 0.0); -} - -void TopicChunkBuilder::beginRow(Timestamp timestamp) { - PJ_ASSERT(!row_in_progress_, "begin_row called while row already in progress"); - PJ_ASSERT(timestamp >= last_timestamp_, "timestamps must be monotonically non-decreasing"); - row_in_progress_ = true; - current_timestamp_ = timestamp; - last_timestamp_ = timestamp; -} - -// --------------------------------------------------------------------------- -// set — templatized value setters -// --------------------------------------------------------------------------- - -template <> -void TopicChunkBuilder::set(std::size_t col_index, float value) { - PJ_ASSERT(row_in_progress_, "set called without begin_row"); - PJ_ASSERT(col_index < columns_.size(), "col_index out of bounds"); - columns_[col_index].appendFloat32(value); - updateColumnStats(col_index, static_cast(value)); -} - -template <> -void TopicChunkBuilder::set(std::size_t col_index, double value) { - PJ_ASSERT(row_in_progress_, "set called without begin_row"); - PJ_ASSERT(col_index < columns_.size(), "col_index out of bounds"); - columns_[col_index].appendFloat64(value); - updateColumnStats(col_index, value); -} - -template <> -void TopicChunkBuilder::set(std::size_t col_index, int32_t value) { - PJ_ASSERT(row_in_progress_, "set called without begin_row"); - 
PJ_ASSERT(col_index < columns_.size(), "col_index out of bounds"); - columns_[col_index].appendInt32(value); - updateColumnStats(col_index, static_cast(value)); -} - -template <> -void TopicChunkBuilder::set(std::size_t col_index, int64_t value) { - PJ_ASSERT(row_in_progress_, "set called without begin_row"); - PJ_ASSERT(col_index < columns_.size(), "col_index out of bounds"); - columns_[col_index].appendInt64(value); - updateColumnStats(col_index, static_cast(value)); -} - -template <> -void TopicChunkBuilder::set(std::size_t col_index, uint64_t value) { - PJ_ASSERT(row_in_progress_, "set called without begin_row"); - PJ_ASSERT(col_index < columns_.size(), "col_index out of bounds"); - columns_[col_index].appendUint64(value); - updateColumnStats(col_index, static_cast(value)); -} - -template <> -void TopicChunkBuilder::set(std::size_t col_index, bool value) { - PJ_ASSERT(row_in_progress_, "set called without begin_row"); - PJ_ASSERT(col_index < columns_.size(), "col_index out of bounds"); - columns_[col_index].appendBool(value); - updateColumnStats(col_index, value ? 
1.0 : 0.0); -} - -template <> -void TopicChunkBuilder::set(std::size_t col_index, std::string_view value) { - PJ_ASSERT(row_in_progress_, "set called without begin_row"); - PJ_ASSERT(col_index < columns_.size(), "col_index out of bounds"); - columns_[col_index].appendString(value); - auto& cs = stats_.column_stats[col_index]; - const std::size_t current_row = columns_[col_index].rowCount() - 1; - if (current_row == 0) { - cs.run_count = 1; - } else { - std::string_view prev = columns_[col_index].readString(current_row - 1); - if (value != prev) { - cs.is_constant = false; - cs.run_count++; - } - } -} - -void TopicChunkBuilder::setNull(std::size_t col_index) { - PJ_ASSERT(row_in_progress_, "set_null called without begin_row"); - PJ_ASSERT(col_index < columns_.size(), "col_index out of bounds"); - columns_[col_index].appendNull(); - stats_.column_stats[col_index].null_count++; -} - -void TopicChunkBuilder::finishRow() { - PJ_ASSERT(row_in_progress_, "finish_row called without begin_row"); - const std::size_t expected = rowCount() + 1; - for (std::size_t i = 0; i < columns_.size(); ++i) { - if (columns_[i].rowCount() < expected) { - columns_[i].appendNull(); - stats_.column_stats[i].null_count++; - } - } - - timestamps_.push_back(current_timestamp_); - stats_.t_min = std::min(stats_.t_min, current_timestamp_); - stats_.t_max = std::max(stats_.t_max, current_timestamp_); - stats_.row_count++; - row_in_progress_ = false; -} - -// --------------------------------------------------------------------------- -// Bulk column append -// --------------------------------------------------------------------------- - -void TopicChunkBuilder::appendTimestamps(Span timestamps) { - PJ_ASSERT(!row_in_progress_, "append_timestamps called while row in progress"); - const std::size_t count = timestamps.size(); - if (count == 0) { - return; - } - - PJ_ASSERT(timestamps[0] >= last_timestamp_, "timestamps must be monotonically non-decreasing"); - - timestamps_.reserve(timestamps_.size() + 
count); - timestamps_.insert(timestamps_.end(), timestamps.begin(), timestamps.end()); - - stats_.t_min = std::min(stats_.t_min, timestamps[0]); - stats_.t_max = std::max(stats_.t_max, timestamps[count - 1]); - last_timestamp_ = timestamps[count - 1]; - - bulk_pending_rows_ = count; -} - -template <> -void TopicChunkBuilder::appendColumn(std::size_t col_index, Span data) { - PJ_ASSERT(col_index < columns_.size(), "col_index out of bounds"); - columns_[col_index].appendFloat32Bulk(data); -} - -template <> -void TopicChunkBuilder::appendColumn(std::size_t col_index, Span data) { - PJ_ASSERT(col_index < columns_.size(), "col_index out of bounds"); - columns_[col_index].appendFloat64Bulk(data); -} - -template <> -void TopicChunkBuilder::appendColumn(std::size_t col_index, Span data) { - PJ_ASSERT(col_index < columns_.size(), "col_index out of bounds"); - columns_[col_index].appendInt32Bulk(data); -} - -template <> -void TopicChunkBuilder::appendColumn(std::size_t col_index, Span data) { - PJ_ASSERT(col_index < columns_.size(), "col_index out of bounds"); - columns_[col_index].appendInt64Bulk(data); -} - -template <> -void TopicChunkBuilder::appendColumn(std::size_t col_index, Span data) { - PJ_ASSERT(col_index < columns_.size(), "col_index out of bounds"); - columns_[col_index].appendUint64Bulk(data); -} - -template <> -void TopicChunkBuilder::appendColumn(std::size_t col_index, Span data) { - PJ_ASSERT(col_index < columns_.size(), "col_index out of bounds"); - columns_[col_index].appendBoolBulk(data); -} - -void TopicChunkBuilder::appendColumnStrings( - std::size_t col_index, Span offsets, Span data) { - PJ_ASSERT(col_index < columns_.size(), "col_index out of bounds"); - columns_[col_index].appendStringsBulk(offsets, data); -} - -void TopicChunkBuilder::appendColumnValidity(std::size_t col_index, BitSpan validity) { - PJ_ASSERT(col_index < columns_.size(), "col_index out of bounds"); - columns_[col_index].appendValidityBulk(validity); -} - -void 
TopicChunkBuilder::finishBulkAppend() { - PJ_ASSERT(!row_in_progress_, "finish_bulk_append called while row in progress"); - if (bulk_pending_rows_ == 0) { - return; - } - - const std::size_t count = bulk_pending_rows_; - - for (std::size_t col = 0; col < columns_.size(); ++col) { - PJ_ASSERT( - columns_[col].rowCount() >= count, - "finishBulkAppend: column has fewer rows than bulk_pending_rows_ — " - "appendColumn*() must be called with exactly bulk_pending_rows_ values"); - const std::size_t first_row = columns_[col].rowCount() - count; - const auto kind = storageKindOf(column_descriptors_[col].logical_type); - - if (kind == StorageKind::kString) { - computeBulkStringStats(col, first_row, count); - } else { - computeBulkNumericStats(col, kind, first_row, count); - } - } - - stats_.row_count += static_cast(count); - bulk_pending_rows_ = 0; -} - -uint32_t TopicChunkBuilder::remainingCapacity() const noexcept { - return max_rows_ - rowCount(); -} - -// --------------------------------------------------------------------------- -// Bulk stats helpers -// --------------------------------------------------------------------------- - -void TopicChunkBuilder::computeBulkNumericStats( - std::size_t col_index, StorageKind kind, std::size_t first_row, std::size_t count) { - if (count == 0) { - return; - } - - auto& cs = stats_.column_stats[col_index]; - const auto& col = columns_[col_index]; - const bool has_validity = col.hasNulls(); - - auto process = [&](const T* /*tag*/) { - const auto* buf = reinterpret_cast(col.valueBuffer().data()); - double local_min = cs.min_value.value_or(std::numeric_limits::max()); - double local_max = cs.max_value.value_or(std::numeric_limits::lowest()); - double prev = last_column_values_[col_index]; - bool had_valid = cs.run_count > 0; - - for (std::size_t i = 0; i < count; ++i) { - const std::size_t row = first_row + i; - if (has_validity && !col.isValid(row)) { - cs.null_count++; - continue; - } - const double v = static_cast(buf[row]); - 
if (v < local_min) { - local_min = v; - } - if (v > local_max) { - local_max = v; - } - if (!had_valid) { - cs.run_count = 1; - had_valid = true; - } else if (v != prev) { - cs.is_constant = false; - cs.run_count++; - } - prev = v; - } - if (had_valid) { - cs.min_value = local_min; - cs.max_value = local_max; - } - last_column_values_[col_index] = prev; - }; - - if (!dispatch_numeric_kind(kind, process)) { - if (kind == StorageKind::kBool) { - const auto* buf = col.valueBuffer().data(); - double prev = last_column_values_[col_index]; - double local_min = cs.min_value.value_or(std::numeric_limits::max()); - double local_max = cs.max_value.value_or(std::numeric_limits::lowest()); - bool had_valid = cs.run_count > 0; - for (std::size_t i = 0; i < count; ++i) { - const std::size_t row = first_row + i; - if (has_validity && !col.isValid(row)) { - cs.null_count++; - continue; - } - const double v = buf[row] ? 1.0 : 0.0; - if (v < local_min) { - local_min = v; - } - if (v > local_max) { - local_max = v; - } - if (!had_valid) { - cs.run_count = 1; - had_valid = true; - } else if (v != prev) { - cs.is_constant = false; - cs.run_count++; - } - prev = v; - } - if (had_valid) { - cs.min_value = local_min; - cs.max_value = local_max; - } - last_column_values_[col_index] = prev; - } - } -} - -void TopicChunkBuilder::computeBulkStringStats(std::size_t col_index, std::size_t first_row, std::size_t count) { - if (count == 0) { - return; - } - - auto& cs = stats_.column_stats[col_index]; - const auto& col = columns_[col_index]; - const bool has_validity = col.hasNulls(); - - std::optional last_valid; - - if (first_row > 0 && cs.run_count > 0) { - for (std::size_t j = first_row; j > 0; --j) { - if (!has_validity || col.isValid(j - 1)) { - last_valid = col.readString(j - 1); - break; - } - } - } - - for (std::size_t i = 0; i < count; ++i) { - const std::size_t row = first_row + i; - if (has_validity && !col.isValid(row)) { - cs.null_count++; - continue; - } - - std::string_view 
current = col.readString(row); - if (!last_valid.has_value()) { - cs.run_count = 1; - } else if (current != *last_valid) { - cs.is_constant = false; - cs.run_count++; - } - last_valid = current; - } -} - -bool TopicChunkBuilder::isFull() const noexcept { - return rowCount() >= max_rows_; -} - -uint32_t TopicChunkBuilder::rowCount() const noexcept { - return stats_.row_count; -} - -bool TopicChunkBuilder::isRowInProgress() const noexcept { - return row_in_progress_; -} - -const ChunkStats& TopicChunkBuilder::stats() const noexcept { - return stats_; -} - -Timestamp TopicChunkBuilder::lastTimestamp() const noexcept { - return last_timestamp_; -} - -void TopicChunkBuilder::updateColumnStats(std::size_t col_index, double value) { - auto& cs = stats_.column_stats[col_index]; - const std::size_t current_row = columns_[col_index].rowCount() - 1; - - if (!cs.min_value.has_value() || value < *cs.min_value) { - cs.min_value = value; - } - if (!cs.max_value.has_value() || value > *cs.max_value) { - cs.max_value = value; - } - - if (current_row == 0) { - cs.run_count = 1; - } else { - if (value != last_column_values_[col_index]) { - cs.is_constant = false; - cs.run_count++; - } - } - last_column_values_[col_index] = value; -} - -// --------------------------------------------------------------------------- -// seal -// --------------------------------------------------------------------------- - -TopicChunk TopicChunkBuilder::seal() { - TopicChunk chunk; - chunk.id = next_chunk_id_++; - chunk.topic_id = topic_id_; - chunk.schema_version = schema_id_; - chunk.stats = stats_; - - chunk.timestamps = std::move(timestamps_); - - const std::size_t num_cols = columns_.size(); - chunk.columns.resize(num_cols); - - for (std::size_t i = 0; i < num_cols; ++i) { - const auto& col = columns_[i]; - const StorageKind kind = storageKindOf(column_descriptors_[i].logical_type); - const auto& cs = stats_.column_stats[i]; - - chunk.columns[i].descriptor = std::make_shared(column_descriptors_[i]); 
- - switch (kind) { - case StorageKind::kString: { - chunk.columns[i].data = encoding::dictionaryEncodeStrings( - Span(col.offsetsBuffer().data(), col.offsetsBuffer().size()), - Span(col.valueBuffer().data(), col.valueBuffer().size()), col.rowCount()); - break; - } - case StorageKind::kBool: { - if (cs.is_constant && col.rowCount() > 0) { - chunk.columns[i].data = encoding::constantEncode( - Span(col.valueBuffer().data(), col.valueBuffer().size()), kind, col.rowCount()); - } else { - chunk.columns[i].data = encoding::packBools(Span(col.valueBuffer().data(), col.rowCount())); - } - break; - } - case StorageKind::kInt32: - case StorageKind::kInt64: { - // Compute exact integer min/max from the raw column buffer to avoid - // precision loss from the double-based stats (BUG-1/2). - const std::size_t row_count = col.rowCount(); - const uint8_t* buf_data = col.valueBuffer().data(); - const std::size_t esize = storageKindSize(kind); - - int64_t exact_min = std::numeric_limits::max(); - int64_t exact_max = std::numeric_limits::min(); - bool exact_is_constant = true; - int64_t first_val{}; - - for (std::size_t r = 0; r < row_count; ++r) { - int64_t v{}; - if (kind == StorageKind::kInt32) { - int32_t tmp{}; - std::memcpy(&tmp, buf_data + r * esize, sizeof(tmp)); - v = tmp; - } else { - std::memcpy(&v, buf_data + r * esize, sizeof(v)); - } - if (r == 0) { - first_val = v; - } else if (v != first_val) { - exact_is_constant = false; - } - exact_min = std::min(exact_min, v); - exact_max = std::max(exact_max, v); - } - - if (exact_is_constant && row_count > 0) { - chunk.columns[i].data = - encoding::constantEncode(Span(buf_data, col.valueBuffer().size()), kind, row_count); - } else if (row_count > 0) { - const auto range = static_cast(exact_max - exact_min); - const uint8_t ob = encoding::offsetBytesFor(range); - - if (ob < storageKindSize(kind)) { - chunk.columns[i].data = encoding::forEncode( - Span(buf_data, col.valueBuffer().size()), kind, row_count, exact_min, exact_max); - 
} else { - RawBuffer raw; - raw.append(buf_data, col.valueBuffer().size()); - chunk.columns[i].data = std::move(raw); - } - } else { - RawBuffer raw; - raw.append(buf_data, col.valueBuffer().size()); - chunk.columns[i].data = std::move(raw); - } - break; - } - default: { - if (cs.is_constant && col.rowCount() > 0) { - chunk.columns[i].data = encoding::constantEncode( - Span(col.valueBuffer().data(), col.valueBuffer().size()), kind, col.rowCount()); - } else { - RawBuffer raw; - raw.append(col.valueBuffer().data(), col.valueBuffer().size()); - chunk.columns[i].data = std::move(raw); - } - break; - } - } - - if (col.hasNulls()) { - BitVector bv; - bv.assignBytes( - Span(col.validityBuffer().data(), col.validityBuffer().sizeBytes()), - col.validityBuffer().sizeBits()); - chunk.columns[i].validity_bitmap = std::move(bv); - } - } - - return chunk; -} - -// =========================================================================== -// TopicChunk decode helpers -// =========================================================================== - -EncodingType TopicChunk::columnEncoding(std::size_t index) const { - return std::visit( - overloaded{ - [](const RawBuffer&) { return EncodingType::kRaw; }, - [](const encoding::ConstantEncoded&) { return EncodingType::kConstant; }, - [](const encoding::FrameOfReferenceEncoded&) { return EncodingType::kFrameOfReference; }, - [](const encoding::DictionaryEncoded&) { return EncodingType::kDictionary; }, - [](const encoding::PackedBools&) { return EncodingType::kPackedBool; }, - }, - columns[index].data); -} - -Timestamp TopicChunk::readTimestamp(std::size_t row) const { - return timestamps[row]; -} - -void TopicChunk::readTimestamps(Span out, std::size_t row_start) const { - std::memcpy(out.data(), timestamps.data() + row_start, out.size() * sizeof(Timestamp)); -} - -double TopicChunk::readNumericAsDouble(std::size_t col_index, std::size_t row) const { - const auto& col = columns[col_index]; - return std::visit( - overloaded{ - 
[&](const RawBuffer& buf) { - return read_raw_as(buf, storageKindOf(col.descriptor->logical_type), row); - }, - [](const encoding::ConstantEncoded& enc) { return encoding::constantDecodeAsDouble(enc); }, - [row](const encoding::FrameOfReferenceEncoded& enc) { return encoding::forDecodeOneAsDouble(enc, row); }, - [](const auto&) { return 0.0; }, - }, - col.data); -} - -int64_t TopicChunk::readNumericAsInt64(std::size_t col_index, std::size_t row) const { - const auto& col = columns[col_index]; - return std::visit( - overloaded{ - [&](const RawBuffer& buf) { - return read_raw_as(buf, storageKindOf(col.descriptor->logical_type), row); - }, - [](const encoding::ConstantEncoded& enc) { return encoding::constantDecodeAsInt64(enc); }, - [row](const encoding::FrameOfReferenceEncoded& enc) { return encoding::forDecodeOneAsInt64(enc, row); }, - [](const auto&) { return static_cast(0); }, - }, - col.data); -} - -uint64_t TopicChunk::readNumericAsUint64(std::size_t col_index, std::size_t row) const { - const auto& col = columns[col_index]; - return std::visit( - overloaded{ - [&](const RawBuffer& buf) { - return read_raw_as(buf, storageKindOf(col.descriptor->logical_type), row); - }, - [](const encoding::ConstantEncoded& enc) { return encoding::constantDecodeAsUint64(enc); }, - [row](const encoding::FrameOfReferenceEncoded& enc) { - return static_cast(encoding::forDecodeOneAsInt64(enc, row)); - }, - [](const auto&) { return static_cast(0); }, - }, - col.data); -} - -std::string_view TopicChunk::readString(std::size_t col_index, std::size_t row) const { - return encoding::dictionaryLookup(std::get(columns[col_index].data), row); -} - -bool TopicChunk::readBool(std::size_t col_index, std::size_t row) const { - return std::visit( - overloaded{ - [](const encoding::ConstantEncoded& enc) { - uint8_t v = 0; - std::memcpy(&v, enc.value_bytes.data(), sizeof(v)); - return v != 0; - }, - [row](const encoding::PackedBools& enc) { return encoding::unpackBool(enc, row); }, - [](const 
auto&) { return false; }, - }, - columns[col_index].data); -} - -bool TopicChunk::isNull(std::size_t col_index, std::size_t row) const { - const auto& bm = columns[col_index].validity_bitmap; - if (!bm.has_value() || bm->empty()) { - return false; - } - return !bm->isValid(row); -} - -void TopicChunk::readColumnAsDoubles(std::size_t col_index, Span out, std::size_t row_start) const { - const std::size_t count = out.size(); - const auto& col = columns[col_index]; - std::visit( - overloaded{ - [&](const encoding::ConstantEncoded& enc) { - std::fill(out.begin(), out.end(), encoding::constantDecodeAsDouble(enc)); - }, - [&](const encoding::FrameOfReferenceEncoded& enc) { encoding::forDecodeRangeAsDoubles(enc, out, row_start); }, - [&](const RawBuffer& buf) { - const StorageKind kind = storageKindOf(col.descriptor->logical_type); - const uint8_t* base = buf.data(); - const std::size_t esize = storageKindSize(kind); - auto convert = [&](const T* /*tag*/) { - const uint8_t* src = base + row_start * esize; - for (std::size_t i = 0; i < count; ++i) { - T v{}; - std::memcpy(&v, src + i * esize, sizeof(v)); - out[i] = static_cast(v); - } - }; - if (!dispatch_numeric_kind(kind, convert)) { - std::fill(out.begin(), out.end(), std::numeric_limits::quiet_NaN()); - } - }, - [&](const auto&) { std::fill(out.begin(), out.end(), std::numeric_limits::quiet_NaN()); }, - }, - col.data); - - // Replace values at null positions with NaN so consumers don't confuse - // null (no data) with actual zero values. 
- if (col.validity_bitmap.has_value()) { - const auto& bm = *col.validity_bitmap; - for (std::size_t i = 0; i < count; ++i) { - if (!bm.isValid(row_start + i)) { - out[i] = std::numeric_limits::quiet_NaN(); - } - } - } -} - -} // namespace PJ diff --git a/pj_datastore/src/colormap_registry.cpp b/pj_datastore/src/colormap_registry.cpp deleted file mode 100644 index d7c61b3e..00000000 --- a/pj_datastore/src/colormap_registry.cpp +++ /dev/null @@ -1,45 +0,0 @@ -// Copyright 2026 Davide Faconti -// SPDX-License-Identifier: MPL-2.0 - -#include "pj_datastore/colormap_registry.hpp" - -namespace PJ { - -void ColorMapRegistry::registerMap(std::string_view name, ColorMapEvalFn eval_fn, void* user_ctx) { - std::string key(name); - maps_[key] = Entry{eval_fn, user_ctx}; - active_ = std::move(key); -} - -void ColorMapRegistry::unregisterMap(std::string_view name) { - std::string key(name); - maps_.erase(key); - if (active_ == key) { - active_.clear(); - } -} - -void ColorMapRegistry::setActive(std::string_view name) { - std::string key(name); - if (maps_.find(key) != maps_.end()) { - active_ = std::move(key); - } -} - -std::string ColorMapRegistry::evaluate(double value) const { - if (active_.empty()) { - return {}; - } - auto it = maps_.find(active_); - if (it == maps_.end()) { - return {}; - } - const char* result = it->second.eval_fn(value, it->second.user_ctx); - return result ? 
std::string{result} : std::string{}; -} - -bool ColorMapRegistry::hasActive() const { - return !active_.empty() && maps_.find(active_) != maps_.end(); -} - -} // namespace PJ diff --git a/pj_datastore/src/colormap_registry_host.cpp b/pj_datastore/src/colormap_registry_host.cpp deleted file mode 100644 index d00aac0d..00000000 --- a/pj_datastore/src/colormap_registry_host.cpp +++ /dev/null @@ -1,73 +0,0 @@ -// Copyright 2026 Davide Faconti -// SPDX-License-Identifier: MPL-2.0 - -#include "pj_datastore/colormap_registry_host.hpp" - -#include - -#include "pj_base/sdk/plugin_data_api.hpp" -#include "pj_datastore/colormap_registry.hpp" - -namespace PJ { - -namespace { - -std::string_view toStringView(PJ_string_view_t s) { - return std::string_view(s.data, s.size); -} - -bool registryRegisterMap( - void* ctx, PJ_string_view_t name, const char* (*eval_fn)(double, void*), void* user_ctx, - PJ_error_t* out_error) noexcept { - if (ctx == nullptr || eval_fn == nullptr) { - sdk::fillError(out_error, 2, "colormap", "null registry ctx or eval_fn"); - return false; - } - auto* reg = static_cast(ctx); - try { - reg->registerMap(toStringView(name), eval_fn, user_ctx); - } catch (const std::exception& e) { - sdk::fillError(out_error, 1, "colormap", std::string("registerMap threw: ") + e.what()); - return false; - } catch (...) { - sdk::fillError(out_error, 1, "colormap", "registerMap threw unknown exception"); - return false; - } - return true; -} - -bool registryUnregisterMap(void* ctx, PJ_string_view_t name, PJ_error_t* out_error) noexcept { - if (ctx == nullptr) { - sdk::fillError(out_error, 2, "colormap", "null registry ctx"); - return false; - } - auto* reg = static_cast(ctx); - try { - reg->unregisterMap(toStringView(name)); - } catch (const std::exception& e) { - sdk::fillError(out_error, 1, "colormap", std::string("unregisterMap threw: ") + e.what()); - return false; - } catch (...) 
{ - sdk::fillError(out_error, 1, "colormap", "unregisterMap threw unknown exception"); - return false; - } - return true; -} - -constexpr PJ_colormap_registry_vtable_t kRegistryVTable = { - PJ_PLUGIN_DATA_API_VERSION, - sizeof(PJ_colormap_registry_vtable_t), - registryRegisterMap, - registryUnregisterMap, -}; - -} // namespace - -PJ_colormap_registry_t makeColorMapRegistryHost(ColorMapRegistry& registry) { - return PJ_colormap_registry_t{ - .ctx = &registry, - .vtable = &kRegistryVTable, - }; -} - -} // namespace PJ diff --git a/pj_datastore/src/column_buffer.cpp b/pj_datastore/src/column_buffer.cpp deleted file mode 100644 index 413fec49..00000000 --- a/pj_datastore/src/column_buffer.cpp +++ /dev/null @@ -1,373 +0,0 @@ -// Copyright 2026 Davide Faconti -// SPDX-License-Identifier: MPL-2.0 - -#include "pj_datastore/column_buffer.hpp" - -#include -#include -#include -#include -#include - -namespace PJ { - -// --------------------------------------------------------------------------- -// Construction -// --------------------------------------------------------------------------- - -TypedColumnBuffer::TypedColumnBuffer(ColumnDescriptor descriptor) : descriptor_(std::move(descriptor)) {} - -// --------------------------------------------------------------------------- -// Accessors -// --------------------------------------------------------------------------- - -const ColumnDescriptor& TypedColumnBuffer::descriptor() const noexcept { - return descriptor_; -} - -std::size_t TypedColumnBuffer::rowCount() const noexcept { - return row_count_; -} - -bool TypedColumnBuffer::hasNulls() const noexcept { - return null_count_ > 0; -} - -bool TypedColumnBuffer::isValid(std::size_t row) const noexcept { - if (!validity_initialized_) { - return true; - } - return validity_.isValid(row); -} - -// --------------------------------------------------------------------------- -// Underlying buffers -// --------------------------------------------------------------------------- - -const 
RawBuffer& TypedColumnBuffer::valueBuffer() const noexcept { - return values_; -} - -const BitVector& TypedColumnBuffer::validityBuffer() const noexcept { - return validity_; -} - -const RawBuffer& TypedColumnBuffer::offsetsBuffer() const noexcept { - return offsets_; -} - -// --------------------------------------------------------------------------- -// Validity (lazy initialization) -// --------------------------------------------------------------------------- - -void TypedColumnBuffer::ensureValidityInitialized() { - if (validity_initialized_) { - return; - } - // Initialize bitmap with all existing rows marked valid. - validity_.initValid(row_count_); - validity_initialized_ = true; -} - -// --------------------------------------------------------------------------- -// Fixed-size append / read templates -// --------------------------------------------------------------------------- - -template -void TypedColumnBuffer::appendFixed(T value) { - values_.append(&value, sizeof(T)); - if (validity_initialized_) { - // Ensure bitmap has room for this row, then mark valid. 
- validity_.ensureSize(row_count_ + 1); - validity_.setValid(row_count_); - } - ++row_count_; -} - -template -T TypedColumnBuffer::readFixed(std::size_t row) const { - T result{}; - std::memcpy(&result, values_.data() + row * sizeof(T), sizeof(T)); - return result; -} - -// --------------------------------------------------------------------------- -// Typed append functions (6 storage types) -// --------------------------------------------------------------------------- - -void TypedColumnBuffer::appendFloat32(float value) { - assert(storageKindOf(descriptor_.logical_type) == StorageKind::kFloat32); - appendFixed(value); -} - -void TypedColumnBuffer::appendFloat64(double value) { - assert(storageKindOf(descriptor_.logical_type) == StorageKind::kFloat64); - appendFixed(value); -} - -void TypedColumnBuffer::appendInt32(int32_t value) { - assert(storageKindOf(descriptor_.logical_type) == StorageKind::kInt32); - appendFixed(value); -} - -void TypedColumnBuffer::appendInt64(int64_t value) { - assert(storageKindOf(descriptor_.logical_type) == StorageKind::kInt64); - appendFixed(value); -} - -void TypedColumnBuffer::appendUint64(uint64_t value) { - assert(storageKindOf(descriptor_.logical_type) == StorageKind::kUint64); - appendFixed(value); -} - -void TypedColumnBuffer::appendBool(bool value) { - assert(storageKindOf(descriptor_.logical_type) == StorageKind::kBool); - const auto byte = static_cast(value ? 1 : 0); - appendFixed(byte); -} - -void TypedColumnBuffer::appendString(std::string_view value) { - assert(storageKindOf(descriptor_.logical_type) == StorageKind::kString); - // Write the initial offset (0) on first row. - if (row_count_ == 0) { - const uint32_t zero = 0; - offsets_.append(&zero, sizeof(zero)); - } - - // Append string bytes to value buffer. - values_.append(value.data(), value.size()); - - // Append new end offset. 
- const auto end_offset = static_cast<uint32_t>(values_.size()); - offsets_.append(&end_offset, sizeof(end_offset)); - - // Update validity if initialized. - if (validity_initialized_) { - validity_.ensureSize(row_count_ + 1); - validity_.setValid(row_count_); - } - - ++row_count_; -} - -void TypedColumnBuffer::appendNull() { - ensureValidityInitialized(); - - // Ensure bitmap has room for this row. - validity_.ensureSize(row_count_ + 1); - validity_.setNull(row_count_); - - // Append zero bytes for the value slot. - const StorageKind kind = storageKindOf(descriptor_.logical_type); - if (kind == StorageKind::kString) { - // For strings: write the initial offset if this is the first row. - if (row_count_ == 0) { - const uint32_t zero = 0; - offsets_.append(&zero, sizeof(zero)); - } - // Duplicate the current end offset (no new string data). - const auto current_offset = static_cast<uint32_t>(values_.size()); - offsets_.append(&current_offset, sizeof(current_offset)); - } else { - const std::size_t type_size = storageKindSize(kind); - // Append type_size zero bytes. 
- const uint64_t zero = 0; // 8 bytes, enough for any fixed type - values_.append(&zero, type_size); - } - - ++row_count_; - ++null_count_; -} - -// --------------------------------------------------------------------------- -// Bulk fixed-size append template -// --------------------------------------------------------------------------- - -template -void TypedColumnBuffer::appendFixedBulk(Span data) { - const std::size_t count = data.size(); - if (count == 0) { - return; - } - values_.reserve(values_.size() + count * sizeof(T)); - values_.append(data.data(), count * sizeof(T)); - if (validity_initialized_) { - const std::size_t new_total = row_count_ + count; - validity_.ensureSize(new_total); - for (std::size_t i = row_count_; i < new_total; ++i) { - validity_.setValid(i); - } - } - row_count_ += count; -} - -// --------------------------------------------------------------------------- -// Typed bulk append functions (7 storage types) -// --------------------------------------------------------------------------- - -void TypedColumnBuffer::appendFloat32Bulk(Span data) { - assert(storageKindOf(descriptor_.logical_type) == StorageKind::kFloat32); - appendFixedBulk(data); -} - -void TypedColumnBuffer::appendFloat64Bulk(Span data) { - assert(storageKindOf(descriptor_.logical_type) == StorageKind::kFloat64); - appendFixedBulk(data); -} - -void TypedColumnBuffer::appendInt32Bulk(Span data) { - assert(storageKindOf(descriptor_.logical_type) == StorageKind::kInt32); - appendFixedBulk(data); -} - -void TypedColumnBuffer::appendInt64Bulk(Span data) { - assert(storageKindOf(descriptor_.logical_type) == StorageKind::kInt64); - appendFixedBulk(data); -} - -void TypedColumnBuffer::appendUint64Bulk(Span data) { - assert(storageKindOf(descriptor_.logical_type) == StorageKind::kUint64); - appendFixedBulk(data); -} - -void TypedColumnBuffer::appendBoolBulk(Span data) { - assert(storageKindOf(descriptor_.logical_type) == StorageKind::kBool); - // Bool stored as uint8_t per 
element (1 byte per bool, not packed) - appendFixedBulk(data); -} - -void TypedColumnBuffer::appendStringsBulk(Span offsets, Span data) { - assert(storageKindOf(descriptor_.logical_type) == StorageKind::kString); - if (offsets.empty()) { - return; - } - const std::size_t count = offsets.size() - 1; - if (count == 0) { - return; - } - - // Write the initial offset (0) on first row if buffer is empty - const uint32_t base_data_offset = static_cast(values_.size()); - if (row_count_ == 0) { - const uint32_t zero = 0; - offsets_.append(&zero, sizeof(zero)); - } - - // Append all string data at once - const uint32_t total_string_bytes = offsets[count] - offsets[0]; - assert(offsets[0] <= static_cast(data.size())); - assert(offsets[count] <= static_cast(data.size())); - values_.reserve(values_.size() + total_string_bytes); - values_.append(data.data() + offsets[0], total_string_bytes); - - // Append adjusted offsets (rebase to our value buffer position) - const uint32_t src_base = offsets[0]; - for (std::size_t i = 1; i <= count; ++i) { - const uint32_t adjusted = base_data_offset + (offsets[i] - src_base); - offsets_.append(&adjusted, sizeof(adjusted)); - } - - // Update validity if initialized - if (validity_initialized_) { - const std::size_t new_total = row_count_ + count; - validity_.ensureSize(new_total); - for (std::size_t i = row_count_; i < new_total; ++i) { - validity_.setValid(i); - } - } - - row_count_ += count; -} - -void TypedColumnBuffer::appendValidityBulk(BitSpan validity) { - const std::size_t count = validity.bit_length; - if (count == 0) { - return; - } - ensureValidityInitialized(); - - // The validity bitmap covers the last `count` rows that were just appended. - // row_count_ already includes them, so the range is - // [row_count_ - count, row_count_). 
- const std::size_t start_row = row_count_ - count; - validity_.ensureSize(row_count_); - - for (std::size_t i = 0; i < count; ++i) { - const bool valid = validity.test(i); - if (valid) { - validity_.setValid(start_row + i); - } else { - validity_.setNull(start_row + i); - ++null_count_; - } - } -} - -// --------------------------------------------------------------------------- -// Typed read functions (6 storage types) -// --------------------------------------------------------------------------- - -float TypedColumnBuffer::readFloat32(std::size_t row) const { - return readFixed(row); -} - -double TypedColumnBuffer::readFloat64(std::size_t row) const { - return readFixed(row); -} - -int32_t TypedColumnBuffer::readInt32(std::size_t row) const { - return readFixed(row); -} - -int64_t TypedColumnBuffer::readInt64(std::size_t row) const { - return readFixed(row); -} - -uint64_t TypedColumnBuffer::readUint64(std::size_t row) const { - return readFixed(row); -} - -bool TypedColumnBuffer::readBool(std::size_t row) const { - return readFixed(row) != 0; -} - -std::string_view TypedColumnBuffer::readString(std::size_t row) const { - uint32_t start_offset = 0; - uint32_t end_offset = 0; - std::memcpy(&start_offset, offsets_.data() + row * sizeof(uint32_t), sizeof(uint32_t)); - std::memcpy(&end_offset, offsets_.data() + (row + 1) * sizeof(uint32_t), sizeof(uint32_t)); - return {reinterpret_cast(values_.data()) + start_offset, end_offset - start_offset}; -} - -bool TypedColumnBuffer::isNull(std::size_t row) const { - if (!validity_initialized_) { - return false; - } - return !validity_.isValid(row); -} - -// --------------------------------------------------------------------------- -// read_as_double -// --------------------------------------------------------------------------- - -double TypedColumnBuffer::readAsDouble(std::size_t row) const { - switch (storageKindOf(descriptor_.logical_type)) { - case StorageKind::kFloat32: - return static_cast(readFloat32(row)); - case 
StorageKind::kFloat64: - return readFloat64(row); - case StorageKind::kInt32: - return static_cast(readInt32(row)); - case StorageKind::kInt64: - return static_cast(readInt64(row)); - case StorageKind::kUint64: - return static_cast(readUint64(row)); - case StorageKind::kBool: - return readBool(row) ? 1.0 : 0.0; - case StorageKind::kString: - return std::numeric_limits::quiet_NaN(); - } - return std::numeric_limits::quiet_NaN(); // unreachable -} - -} // namespace PJ diff --git a/pj_datastore/src/derived_engine.cpp b/pj_datastore/src/derived_engine.cpp deleted file mode 100644 index d47d9537..00000000 --- a/pj_datastore/src/derived_engine.cpp +++ /dev/null @@ -1,1015 +0,0 @@ -// Copyright 2026 Davide Faconti -// SPDX-License-Identifier: MPL-2.0 - -#include "pj_datastore/derived_engine.hpp" - -#include -#include -#include - -#include -#include -#include -#include -#include -#include - -#include "pj_base/type_tree.hpp" -#include "pj_datastore/engine.hpp" -#include "pj_datastore/query.hpp" -#include "pj_datastore/topic_storage.hpp" -#include "pj_datastore/writer.hpp" - -namespace PJ { - -// --------------------------------------------------------------------------- -// Helpers -// --------------------------------------------------------------------------- - -// Walk a TypeTreeNode DFS to find the first primitive leaf's PrimitiveType. 
-static std::optional find_first_leaf(const PJ::TypeTreeNode& node) { - switch (node.kind) { - case PJ::TypeKind::kPrimitive: - return node.primitive_type; - case PJ::TypeKind::kEnum: - return node.primitive_type; // set by make_enum via primitive_type field - case PJ::TypeKind::kStruct: - for (const auto& child : node.children) { - if (auto r = find_first_leaf(*child)) { - return r; - } - } - return std::nullopt; - case PJ::TypeKind::kArray: - if (node.element_type) { - return find_first_leaf(*node.element_type); - } - return std::nullopt; - } - return std::nullopt; -} - -static PJ::PrimitiveType storage_kind_to_primitive(StorageKind k) { - switch (k) { - case StorageKind::kFloat32: - return PJ::PrimitiveType::kFloat32; - case StorageKind::kFloat64: - return PJ::PrimitiveType::kFloat64; - case StorageKind::kInt32: - return PJ::PrimitiveType::kInt32; - case StorageKind::kInt64: - return PJ::PrimitiveType::kInt64; - case StorageKind::kUint64: - return PJ::PrimitiveType::kUint64; - case StorageKind::kBool: - return PJ::PrimitiveType::kBool; - case StorageKind::kString: - return PJ::PrimitiveType::kString; - } - return PJ::PrimitiveType::kFloat64; -} - -// Decode one row of a chunk column into a VarValue, based on the column's StorageKind. -static VarValue decode_as_varvalue(const TopicChunk& chunk, std::size_t col, std::size_t row, StorageKind kind) { - switch (kind) { - case StorageKind::kFloat32: - case StorageKind::kFloat64: - return chunk.readNumericAsDouble(col, row); - case StorageKind::kInt32: - case StorageKind::kInt64: - return chunk.readNumericAsInt64(col, row); - case StorageKind::kUint64: - return chunk.readNumericAsUint64(col, row); - case StorageKind::kBool: - return static_cast(chunk.readBool(col, row) ? 1 : 0); - case StorageKind::kString: - return std::string(chunk.readString(col, row)); - } - return 0.0; -} - -// Write a VarValue to a DataWriter row at (topic, col), coercing to out_kind. 
-static void write_varvalue( - DataWriter& writer, PJ::TopicId tid, std::size_t col, const VarValue& val, StorageKind out_kind) { - if (out_kind == StorageKind::kString) { - if (const auto* s = std::get_if(&val)) { - writer.set(tid, col, std::string_view(*s)); - } - return; - } - - // Integer→integer fast paths: avoid the lossy double round-trip. - if (out_kind == StorageKind::kUint64) { - if (const auto* u = std::get_if(&val)) { - writer.set(tid, col, *u); - return; - } - if (const auto* i = std::get_if(&val)) { - writer.set(tid, col, static_cast(*i)); - return; - } - } - if (out_kind == StorageKind::kInt64) { - if (const auto* i = std::get_if(&val)) { - writer.set(tid, col, *i); - return; - } - if (const auto* u = std::get_if(&val)) { - writer.set(tid, col, static_cast(*u)); - return; - } - } - if (out_kind == StorageKind::kInt32) { - if (const auto* i = std::get_if(&val)) { - writer.set(tid, col, static_cast(*i)); - return; - } - if (const auto* u = std::get_if(&val)) { - writer.set(tid, col, static_cast(*u)); - return; - } - } - - // Fallback: extract as double then coerce to the target type. 
- double dval = std::visit( - [](const auto& v) -> double { - using T = std::decay_t; - if constexpr (std::is_same_v) { - return 0.0; - } else { - return static_cast(v); - } - }, - val); - - switch (out_kind) { - case StorageKind::kFloat32: - writer.set(tid, col, static_cast(dval)); - break; - case StorageKind::kFloat64: - writer.set(tid, col, dval); - break; - case StorageKind::kInt32: - writer.set(tid, col, static_cast(dval)); - break; - case StorageKind::kInt64: - writer.set(tid, col, static_cast(dval)); - break; - case StorageKind::kUint64: - writer.set(tid, col, static_cast(dval)); - break; - case StorageKind::kBool: - writer.set(tid, col, dval != 0.0); - break; - case StorageKind::kString: - break; // handled above - } -} - -// --------------------------------------------------------------------------- -// Internal data structures (hidden in .cpp — not exposed in header) -// --------------------------------------------------------------------------- - -struct DerivedNode { - PJ::NodeId id = PJ::kInvalidNodeId; - bool is_mimo = false; - - // SISO fields - PJ::TopicId siso_input_topic_id = 0; - StorageKind siso_input_kind = StorageKind::kFloat64; - StorageKind siso_output_kind = StorageKind::kFloat64; - std::unique_ptr siso_op; - - // MIMO fields (flat list; no primary/secondary distinction) - std::vector mimo_input_topic_ids; - std::vector mimo_input_kinds; - std::vector mimo_output_kinds; - std::unique_ptr mimo_op; - - // Common - std::vector all_input_topic_ids; // unified input list for all types - std::vector output_topic_ids; // 1 for SISO, M for MIMO - bool dirty = true; - PJ::ChunkId last_processed_chunk_id = 0; // SISO: chunk watermark - PJ::Timestamp mimo_last_ts = std::numeric_limits::min(); // MIMO: timestamp watermark - - // Reusable decode buffers (avoid per-row allocation) - VarValue in_val_buf = 0.0; // SISO input - VarValue out_val_buf = 0.0; // SISO output - std::vector mimo_in_buf; // MIMO inputs - std::vector mimo_out_buf; // MIMO outputs 
-}; - -struct DatasetNameHash { - std::size_t operator()(const std::pair& key) const noexcept { - std::size_t h1 = std::hash{}(key.first); - std::size_t h2 = std::hash{}(key.second); - return h1 ^ (h2 << 1); - } -}; - -struct DerivedEngineImpl { - tsl::robin_map nodes; - - // downstream_of[N] = list of nodes whose inputs include an output of N - tsl::robin_map> downstream_of; - - // topic_to_nodes[T] = list of nodes that use T as an input - tsl::robin_map> topic_to_nodes; - - // output_topic_to_node[T] = node that produces T (for cycle detection) - tsl::robin_map output_topic_to_node; - - // Name uniqueness within dataset: (dataset_id, topic_name) → topic_id - tsl::robin_map, PJ::TopicId, DatasetNameHash> registered_output_names; -}; - -// --------------------------------------------------------------------------- -// DerivedEngine — constructor / destructor -// --------------------------------------------------------------------------- - -DerivedEngine::DerivedEngine(DataEngine& engine) : engine_(engine), impl_(std::make_unique()) {} - -DerivedEngine::~DerivedEngine() = default; - -// --------------------------------------------------------------------------- -// Cycle detection (DFS) -// --------------------------------------------------------------------------- -// Returns an error string if adding a node with `input_topics → output_topics` -// would create a cycle. Otherwise returns empty string. -static std::string check_cycle( - const DerivedEngineImpl& impl, const std::vector& input_topics, - const std::vector& output_topics) { - tsl::robin_set outputs(output_topics.begin(), output_topics.end()); - - // DFS from each input: follow upstream edges (output → producing node → its inputs). - // If we ever reach a topic in `outputs`, we have a cycle. 
- std::vector stack(input_topics.begin(), input_topics.end()); - tsl::robin_set visited; - - while (!stack.empty()) { - PJ::TopicId t = stack.back(); - stack.pop_back(); - if (!visited.insert(t).second) { - continue; - } - - if (outputs.contains(t)) { - return fmt::format("cycle detected: topic {} is both an input and an output", t); - } - - auto it = impl.output_topic_to_node.find(t); - if (it == impl.output_topic_to_node.end()) { - continue; // source topic, no upstream - } - - PJ::NodeId producer = it->second; - auto nit = impl.nodes.find(producer); - if (nit == impl.nodes.end()) { - continue; - } - - for (PJ::TopicId in : nit->second.all_input_topic_ids) { - if (!visited.contains(in)) { - stack.push_back(in); - } - } - } - return ""; // no cycle -} - -// --------------------------------------------------------------------------- -// add_siso_transform -// --------------------------------------------------------------------------- - -PJ::Expected DerivedEngine::addSisoTransform( - PJ::TopicId input_topic_id, std::string output_topic_name, PJ::DatasetId output_dataset_id, - std::unique_ptr op) { - // 1. Check input topic exists - const TopicStorage* in_storage = engine_.getTopicStorage(input_topic_id); - if (!in_storage) { - return PJ::unexpected(fmt::format("add_siso_transform: input topic {} not found", input_topic_id)); - } - - // 2. Determine the single leaf column's StorageKind. - // Prefer TypeRegistry (via schema_id). Fall back to the first sealed chunk's - // column_descriptors when schema_id == 0 (e.g. topics created via - // register_scalar_series, which stores schema only in the writer's internal state). 
- PJ::SchemaId schema_id = in_storage->descriptor().schema_id; - - std::size_t num_cols = 0; - std::optional leaf_primitive; - - if (schema_id != 0) { - const PJ::TypeTreeNode* root = engine_.typeRegistry().lookup(schema_id); - if (root) { - num_cols = PJ::countLeafFields(*root); - leaf_primitive = find_first_leaf(*root); - } - } - - if (num_cols == 0) { - // Fall back 1: inline column layout stored in TopicStorage at registration time. - // This covers schema_id==0 topics (register_scalar_series) with no committed chunks yet. - const auto& stored = in_storage->columnDescriptors(); - if (!stored.empty()) { - num_cols = stored.size(); - leaf_primitive = stored[0].logical_type; - } - } - - if (num_cols == 0) { - // Fall back 2: first committed chunk's columnDescriptors (legacy path). - const auto& chunks = in_storage->sealedChunks(); - if (!chunks.empty() && !chunks[0].columns.empty()) { - num_cols = chunks[0].columns.size(); - leaf_primitive = chunks[0].columns[0].descriptor->logical_type; - } - } - - if (num_cols == 0) { - return PJ::unexpected( - fmt::format( - "add_siso_transform: cannot determine column layout for topic {}" - " (no schema_id, no stored column layout, and no committed chunks)", - input_topic_id)); - } - if (num_cols != 1) { - return PJ::unexpected( - fmt::format("add_siso_transform: SISO requires single-column input, got {} columns", num_cols)); - } - if (!leaf_primitive) { - return PJ::unexpected("add_siso_transform: could not determine leaf primitive type"); - } - StorageKind in_kind = storageKindOf(*leaf_primitive); - - // 3. Determine output kind - StorageKind out_kind = op->outputKind(in_kind); - - // 4. 
Check output name uniqueness within dataset - auto name_key = std::make_pair(output_dataset_id, output_topic_name); - if (impl_->registered_output_names.contains(name_key)) { - return PJ::unexpected( - fmt::format( - "add_siso_transform: output topic '{}' already registered in dataset {}", output_topic_name, - output_dataset_id)); - } - - // 5. Cycle detection (structurally impossible for SISO fresh output, but guard correctly) - std::string cycle_err = check_cycle(*impl_, {input_topic_id}, {}); // output topic doesn't exist yet - if (!cycle_err.empty()) { - return PJ::unexpected(cycle_err); - } - - // 6. Create output schema (single column, output_kind, name = "value") - PJ::PrimitiveType out_primitive = storage_kind_to_primitive(out_kind); - std::string schema_name = fmt::format("derived_siso_{}_{}", output_topic_name, next_node_id_); - auto out_type_tree = PJ::makePrimitive("value", out_primitive); - auto out_schema_or = engine_.typeRegistry().registerOrGet(schema_name, out_type_tree); - if (!out_schema_or.has_value()) { - return PJ::unexpected(out_schema_or.error()); - } - - // 7. Create output topic - auto out_topic_or = engine_.createTopic( - output_dataset_id, - TopicDescriptor{.name = output_topic_name, .schema_id = *out_schema_or, .dataset_id = output_dataset_id}); - if (!out_topic_or.has_value()) { - return PJ::unexpected(out_topic_or.error()); - } - PJ::TopicId out_topic_id = *out_topic_or; - - // 8. 
Register node - PJ::NodeId node_id = next_node_id_++; - DerivedNode node; - node.id = node_id; - node.is_mimo = false; - node.siso_input_topic_id = input_topic_id; - node.siso_input_kind = in_kind; - node.siso_output_kind = out_kind; - node.siso_op = std::move(op); - node.all_input_topic_ids = {input_topic_id}; - node.output_topic_ids = {out_topic_id}; - node.dirty = true; - - impl_->registered_output_names[name_key] = out_topic_id; - impl_->topic_to_nodes[input_topic_id].push_back(node_id); - impl_->output_topic_to_node[out_topic_id] = node_id; - - // Update downstream_of: if input_topic_id is produced by another node, register dependency - auto prod_it = impl_->output_topic_to_node.find(input_topic_id); - if (prod_it != impl_->output_topic_to_node.end()) { - impl_->downstream_of[prod_it->second].push_back(node_id); - } - - impl_->nodes[node_id] = std::move(node); - return node_id; -} - -// --------------------------------------------------------------------------- -// add_mimo_transform -// --------------------------------------------------------------------------- - -PJ::Expected DerivedEngine::addMimoTransform( - std::vector input_topic_ids, std::vector output_topic_names, - PJ::DatasetId output_dataset_id, std::unique_ptr op) { - if (input_topic_ids.empty()) { - return PJ::unexpected("add_mimo_transform: requires at least one input topic"); - } - if (output_topic_names.empty()) { - return PJ::unexpected("add_mimo_transform: requires at least one output topic name"); - } - if (!op) { - return PJ::unexpected("add_mimo_transform: null transform op"); - } - - // 1. Validate all inputs and determine their StorageKinds. - // Same 3-tier fallback as add_siso_transform: type_registry → stored - // column_descriptors → first sealed chunk. 
- std::vector input_kinds; - input_kinds.reserve(input_topic_ids.size()); - - for (PJ::TopicId tid : input_topic_ids) { - const TopicStorage* storage = engine_.getTopicStorage(tid); - if (!storage) { - return PJ::unexpected(fmt::format("add_mimo_transform: input topic {} not found", tid)); - } - - PJ::SchemaId schema_id = storage->descriptor().schema_id; - std::size_t num_cols = 0; - std::optional leaf_primitive; - - if (schema_id != 0) { - const PJ::TypeTreeNode* root = engine_.typeRegistry().lookup(schema_id); - if (root) { - num_cols = PJ::countLeafFields(*root); - leaf_primitive = find_first_leaf(*root); - } - } - if (num_cols == 0) { - const auto& stored = storage->columnDescriptors(); - if (!stored.empty()) { - num_cols = stored.size(); - leaf_primitive = stored[0].logical_type; - } - } - if (num_cols == 0) { - const auto& chunks = storage->sealedChunks(); - if (!chunks.empty() && !chunks[0].columns.empty()) { - num_cols = chunks[0].columns.size(); - leaf_primitive = chunks[0].columns[0].descriptor->logical_type; - } - } - - if (num_cols == 0) { - return PJ::unexpected(fmt::format("add_mimo_transform: cannot determine column layout for input topic {}", tid)); - } - if (num_cols != 1) { - return PJ::unexpected( - fmt::format( - "add_mimo_transform: MIMO requires single-column inputs; topic {} has {} columns", tid, num_cols)); - } - if (!leaf_primitive) { - return PJ::unexpected(fmt::format("add_mimo_transform: cannot determine primitive type for input topic {}", tid)); - } - input_kinds.push_back(storageKindOf(*leaf_primitive)); - } - - // 2. Check output name uniqueness within dataset. - for (const auto& name : output_topic_names) { - auto key = std::make_pair(output_dataset_id, name); - if (impl_->registered_output_names.contains(key)) { - return PJ::unexpected( - fmt::format( - "add_mimo_transform: output topic '{}' already registered in dataset {}", name, output_dataset_id)); - } - } - - // 3. Cycle detection. 
- { - std::string cycle_err = check_cycle(*impl_, input_topic_ids, {}); - if (!cycle_err.empty()) { - return PJ::unexpected(cycle_err); - } - } - - // 4. Query output StorageKinds from the transform. - std::vector output_kinds = op->outputKinds(PJ::Span(input_kinds)); - if (output_kinds.size() != output_topic_names.size()) { - return PJ::unexpected( - fmt::format( - "add_mimo_transform: op->outputKinds() returned {} kinds but {} output names provided", output_kinds.size(), - output_topic_names.size())); - } - - // 5. Create output schema (single "value" column) and topic for each output. - PJ::NodeId node_id = next_node_id_++; - std::vector out_topic_ids; - out_topic_ids.reserve(output_topic_names.size()); - - for (std::size_t k = 0; k < output_topic_names.size(); ++k) { - PJ::PrimitiveType out_primitive = storage_kind_to_primitive(output_kinds[k]); - std::string schema_name = fmt::format("derived_mimo_{}_{}", node_id, k); - auto out_type_tree = PJ::makePrimitive("value", out_primitive); - auto out_schema_or = engine_.typeRegistry().registerOrGet(schema_name, out_type_tree); - if (!out_schema_or.has_value()) { - return PJ::unexpected(out_schema_or.error()); - } - auto out_topic_or = engine_.createTopic( - output_dataset_id, - TopicDescriptor{.name = output_topic_names[k], .schema_id = *out_schema_or, .dataset_id = output_dataset_id}); - if (!out_topic_or.has_value()) { - return PJ::unexpected(out_topic_or.error()); - } - out_topic_ids.push_back(*out_topic_or); - } - - // 6. Build and register the node. 
- DerivedNode node; - node.id = node_id; - node.is_mimo = true; - node.mimo_input_topic_ids = input_topic_ids; - node.mimo_input_kinds = std::move(input_kinds); - node.mimo_output_kinds = std::move(output_kinds); - node.mimo_op = std::move(op); - node.mimo_last_ts = std::numeric_limits::min(); - node.all_input_topic_ids = std::move(input_topic_ids); - node.output_topic_ids = std::move(out_topic_ids); - node.dirty = true; - - // Register output names for uniqueness enforcement. - for (std::size_t k = 0; k < output_topic_names.size(); ++k) { - impl_->registered_output_names[std::make_pair(output_dataset_id, output_topic_names[k])] = node.output_topic_ids[k]; - } - - // Map input topics to this node (for dirty propagation via on_source_committed). - for (PJ::TopicId in_tid : node.all_input_topic_ids) { - impl_->topic_to_nodes[in_tid].push_back(node_id); - } - - // Map output topics to this node (for cycle detection of downstream nodes). - for (PJ::TopicId out_tid : node.output_topic_ids) { - impl_->output_topic_to_node[out_tid] = node_id; - } - - // Update downstream_of: if any input is produced by another derived node, - // record that node_id depends on the producer (deduplicated for multi-input). 
- for (PJ::TopicId in_tid : node.all_input_topic_ids) { - auto prod_it = impl_->output_topic_to_node.find(in_tid); - if (prod_it != impl_->output_topic_to_node.end()) { - auto& list = impl_->downstream_of[prod_it->second]; - if (std::find(list.begin(), list.end(), node_id) == list.end()) { - list.push_back(node_id); - } - } - } - - impl_->nodes[node_id] = std::move(node); - return node_id; -} - -// --------------------------------------------------------------------------- -// Node management -// --------------------------------------------------------------------------- - -PJ::Status DerivedEngine::removeNode(PJ::NodeId id) { - auto it = impl_->nodes.find(id); - if (it == impl_->nodes.end()) { - return PJ::unexpected(fmt::format("remove_node: node {} not found", id)); - } - - const DerivedNode& node = it->second; - - // Remove from topic_to_nodes - for (PJ::TopicId in_tid : node.all_input_topic_ids) { - auto& v = impl_->topic_to_nodes[in_tid]; - v.erase(std::remove(v.begin(), v.end(), id), v.end()); - } - - // Remove from output_topic_to_node and registered_output_names - for (PJ::TopicId out_tid : node.output_topic_ids) { - impl_->output_topic_to_node.erase(out_tid); - // Remove from registered_output_names (scan for the value) - for (auto sit = impl_->registered_output_names.begin(); sit != impl_->registered_output_names.end(); ++sit) { - if (sit->second == out_tid) { - impl_->registered_output_names.erase(sit); - break; - } - } - } - - // Remove from downstream_of - impl_->downstream_of.erase(id); - for (auto dit = impl_->downstream_of.begin(); dit != impl_->downstream_of.end(); ++dit) { - auto& list = dit.value(); - list.erase(std::remove(list.begin(), list.end(), id), list.end()); - } - - impl_->nodes.erase(it); - return PJ::okStatus(); -} - -bool DerivedEngine::hasNode(PJ::NodeId id) const noexcept { - return impl_->nodes.contains(id); -} - -std::vector DerivedEngine::outputTopics(PJ::NodeId id) const { - auto it = impl_->nodes.find(id); - if (it == 
impl_->nodes.end()) { - return {}; - } - return it->second.output_topic_ids; -} - -// --------------------------------------------------------------------------- -// topological_order — Kahn's algorithm -// --------------------------------------------------------------------------- - -std::vector DerivedEngine::topologicalOrder() const { - tsl::robin_map in_degree; - for (const auto& [id, _] : impl_->nodes) { - in_degree[id] = 0; - } - - for (const auto& [upstream, downstream_list] : impl_->downstream_of) { - for (PJ::NodeId downstream : downstream_list) { - if (impl_->nodes.contains(downstream)) { - in_degree[downstream]++; - } - } - } - - // Seed queue with in-degree 0 nodes (sorted for determinism) - std::vector ready; - ready.reserve(in_degree.size()); - for (const auto& [id, deg] : in_degree) { - if (deg == 0) { - ready.push_back(id); - } - } - std::sort(ready.begin(), ready.end()); - - std::vector order; - order.reserve(impl_->nodes.size()); - std::size_t head = 0; - - while (head < ready.size()) { - PJ::NodeId n = ready[head++]; - order.push_back(n); - - auto it = impl_->downstream_of.find(n); - if (it == impl_->downstream_of.end()) { - continue; - } - - std::vector newly_ready; - for (PJ::NodeId m : it->second) { - if (!impl_->nodes.contains(m)) { - continue; - } - if (--in_degree[m] == 0) { - newly_ready.push_back(m); - } - } - // Keep deterministic order within the newly ready set - std::sort(newly_ready.begin(), newly_ready.end()); - for (PJ::NodeId m : newly_ready) { - ready.push_back(m); - } - } - - return order; -} - -// --------------------------------------------------------------------------- -// on_source_committed -// --------------------------------------------------------------------------- - -void DerivedEngine::onSourceCommitted(PJ::Span changed_topics) { - for (PJ::TopicId tid : changed_topics) { - auto it = impl_->topic_to_nodes.find(tid); - if (it == impl_->topic_to_nodes.end()) { - continue; - } - for (PJ::NodeId nid : it->second) { - 
auto nit = impl_->nodes.find(nid); - if (nit != impl_->nodes.end()) { - nit.value().dirty = true; - } - } - } -} - -// --------------------------------------------------------------------------- -// run_node_incremental (private helper) -// --------------------------------------------------------------------------- - -static PJ::Status run_siso_incremental(DerivedEngineImpl& /*impl*/, DataEngine& engine, DerivedNode& node) { - const TopicStorage* in_storage = engine.getTopicStorage(node.siso_input_topic_id); - if (!in_storage) { - return PJ::unexpected(fmt::format("run_siso_incremental: input topic {} not found", node.siso_input_topic_id)); - } - - const std::deque& all_chunks = in_storage->sealedChunks(); - - DataWriter writer = engine.createWriter(); - PJ::TopicId out_tid = node.output_topic_ids[0]; - PJ::ChunkId max_seen = node.last_processed_chunk_id; - bool wrote_any = false; - PJ::Timestamp out_ts = 0; - - for (const TopicChunk& chunk : all_chunks) { - if (chunk.id <= node.last_processed_chunk_id) { - continue; - } - max_seen = std::max(max_seen, chunk.id); - - for (uint32_t i = 0; i < chunk.stats.row_count; ++i) { - PJ::Timestamp ts = chunk.timestamps[i]; - node.in_val_buf = decode_as_varvalue(chunk, 0, i, node.siso_input_kind); - - if (node.siso_op->calculate(ts, node.in_val_buf, out_ts, node.out_val_buf)) { - auto s = writer.beginRow(out_tid, out_ts); - if (!s.has_value()) { - return s; - } - write_varvalue(writer, out_tid, 0, node.out_val_buf, node.siso_output_kind); - s = writer.finishRow(out_tid); - if (!s.has_value()) { - return s; - } - wrote_any = true; - } - } - } - - if (wrote_any) { - auto chunks = writer.flushAll(); - engine.commitChunks(std::move(chunks)); - } - - node.last_processed_chunk_id = max_seen; - return PJ::okStatus(); -} - -// --------------------------------------------------------------------------- -// run_mimo_incremental -// --------------------------------------------------------------------------- - -static PJ::Status 
run_mimo_incremental(DerivedEngineImpl& /*impl*/, DataEngine& engine, DerivedNode& node) { - const std::size_t num_inputs = node.mimo_input_topic_ids.size(); - if (num_inputs == 0) { - return PJ::okStatus(); - } - - // 1. Collect (timestamp, chunk*, row_index) for each input topic, - // only for rows strictly newer than the watermark. - struct SampleLoc { - PJ::Timestamp ts; - const TopicChunk* chunk; - uint32_t row; - }; - std::vector> per_topic(num_inputs); - - for (std::size_t i = 0; i < num_inputs; ++i) { - const TopicStorage* storage = engine.getTopicStorage(node.mimo_input_topic_ids[i]); - if (!storage) { - return PJ::unexpected( - fmt::format("run_mimo_incremental: input topic {} not found", node.mimo_input_topic_ids[i])); - } - for (const TopicChunk& chunk : storage->sealedChunks()) { - if (chunk.stats.t_max <= node.mimo_last_ts) { - continue; // entire chunk already processed - } - for (uint32_t r = 0; r < chunk.stats.row_count; ++r) { - PJ::Timestamp ts = chunk.timestamps[r]; - if (ts <= node.mimo_last_ts) { - continue; - } - per_topic[i].push_back({ts, &chunk, r}); - } - } - // Early exit: if any topic has no new data, no join is possible. - if (per_topic[i].empty()) { - return PJ::okStatus(); - } - } - - // 2. N-way timestamp intersection: find timestamps present in ALL input topics. - // Start from topic 0's sorted timestamps, remove any not in subsequent topics. - std::vector joined_ts; - joined_ts.reserve(per_topic[0].size()); - for (const auto& s : per_topic[0]) { - joined_ts.push_back(s.ts); - } - - for (std::size_t i = 1; i < num_inputs; ++i) { - tsl::robin_set topic_set; - topic_set.reserve(per_topic[i].size()); - for (const auto& s : per_topic[i]) { - topic_set.insert(s.ts); - } - auto new_end = - std::remove_if(joined_ts.begin(), joined_ts.end(), [&](PJ::Timestamp t) { return !topic_set.contains(t); }); - joined_ts.erase(new_end, joined_ts.end()); - if (joined_ts.empty()) { - return PJ::okStatus(); - } - } - - // 2b. 
Deduplicate joined_ts: if topic[0] has two rows at the same timestamp, - // that timestamp appears twice in joined_ts. We must process it exactly once - // (joined_ts is already sorted because per_topic[0] preserves chunk order). - { - auto new_end = std::unique(joined_ts.begin(), joined_ts.end()); - joined_ts.erase(new_end, joined_ts.end()); - } - if (joined_ts.empty()) { - return PJ::okStatus(); - } - - // 3. Build per-topic lookup: timestamp → (chunk*, row_index). - // insert_or_assign gives last-write-wins semantics for duplicate timestamps - // within a topic, producing a well-defined and consistent result. - std::vector>> lookups(num_inputs); - for (std::size_t i = 0; i < num_inputs; ++i) { - lookups[i].reserve(per_topic[i].size()); - for (const auto& s : per_topic[i]) { - lookups[i].insert_or_assign(s.ts, std::make_pair(s.chunk, s.row)); - } - } - - // 4. Process each joined timestamp: decode, call transform, emit output. - const std::size_t num_outputs = node.output_topic_ids.size(); - node.mimo_in_buf.resize(num_inputs); - node.mimo_out_buf.resize(num_outputs); - - DataWriter writer = engine.createWriter(); - bool wrote_any = false; - - for (PJ::Timestamp ts : joined_ts) { - for (std::size_t i = 0; i < num_inputs; ++i) { - const auto& [chp, row] = lookups[i].at(ts); - node.mimo_in_buf[i] = decode_as_varvalue(*chp, 0, row, node.mimo_input_kinds[i]); - } - - PJ::Timestamp out_ts = ts; - if (node.mimo_op->calculate(ts, node.mimo_in_buf, out_ts, node.mimo_out_buf)) { - for (std::size_t k = 0; k < num_outputs; ++k) { - auto s = writer.beginRow(node.output_topic_ids[k], out_ts); - if (!s.has_value()) { - return s; - } - write_varvalue(writer, node.output_topic_ids[k], 0, node.mimo_out_buf[k], node.mimo_output_kinds[k]); - s = writer.finishRow(node.output_topic_ids[k]); - if (!s.has_value()) { - return s; - } - } - wrote_any = true; - } - } - - if (wrote_any) { - engine.commitChunks(writer.flushAll()); - } - - // Advance watermark to the last joined input 
timestamp. - // Data is monotonically increasing, so timestamps ≤ joined_ts.back() won't - // produce new joins in the future even if not all of them generated output. - node.mimo_last_ts = joined_ts.back(); - - return PJ::okStatus(); -} - -// --------------------------------------------------------------------------- -// scheduleAll / scheduleActive -// --------------------------------------------------------------------------- - -PJ::Status DerivedEngine::scheduleAll() { - return scheduleActive({}); -} - -PJ::Status DerivedEngine::scheduleActive(const std::unordered_set& active_nodes) { - auto order = topologicalOrder(); - - // Compute the set of nodes to consider (active_nodes ∪ their transitive upstream deps). - tsl::robin_set filter; - if (!active_nodes.empty()) { - std::queue bfs; - for (PJ::NodeId n : active_nodes) { - if (impl_->nodes.contains(n)) { - filter.insert(n); - bfs.push(n); - } - } - while (!bfs.empty()) { - PJ::NodeId curr = bfs.front(); - bfs.pop(); - auto nit = impl_->nodes.find(curr); - if (nit == impl_->nodes.end()) { - continue; - } - for (PJ::TopicId in_tid : nit->second.all_input_topic_ids) { - auto prod_it = impl_->output_topic_to_node.find(in_tid); - if (prod_it == impl_->output_topic_to_node.end()) { - continue; - } - PJ::NodeId prod = prod_it->second; - if (filter.insert(prod).second) { - bfs.push(prod); - } - } - } - } - - for (PJ::NodeId node_id : order) { - if (!active_nodes.empty() && !filter.contains(node_id)) { - continue; - } - - auto& node = impl_->nodes.at(node_id); - if (!node.dirty) { - continue; - } - - PJ::Status s = PJ::okStatus(); - if (!node.is_mimo) { - s = run_siso_incremental(*impl_, engine_, node); - } else { - s = run_mimo_incremental(*impl_, engine_, node); - } - - if (!s.has_value()) { - return s; - } - - node.dirty = false; - - // Propagate dirty to downstream nodes - auto dit = impl_->downstream_of.find(node_id); - if (dit != impl_->downstream_of.end()) { - for (PJ::NodeId downstream : dit->second) { - auto 
dnit = impl_->nodes.find(downstream); - if (dnit != impl_->nodes.end()) { - dnit.value().dirty = true; - } - } - } - } - - return PJ::okStatus(); -} - -// --------------------------------------------------------------------------- -// recompute_batch -// --------------------------------------------------------------------------- - -PJ::Status DerivedEngine::recompute_batch(PJ::NodeId node_id) { - auto it = impl_->nodes.find(node_id); - if (it == impl_->nodes.end()) { - return PJ::unexpected(fmt::format("recompute_batch: node {} not found", node_id)); - } - DerivedNode& node = it.value(); - - // 1. Clear all output chunks unconditionally. - for (PJ::TopicId out_tid : node.output_topic_ids) { - TopicStorage* storage = engine_.getTopicStorage(out_tid); - if (storage) { - storage->clearChunks(); - } - } - - // 2. Reset transform state - if (!node.is_mimo) { - if (node.siso_op) { - node.siso_op->reset(); - } - } else { - if (node.mimo_op) { - node.mimo_op->reset(); - } - } - - // 3. Reset processed chunk watermark - node.last_processed_chunk_id = 0; - if (node.is_mimo) { - node.mimo_last_ts = std::numeric_limits::min(); - } - - // 4. 
Full replay - PJ::Status s = PJ::okStatus(); - if (!node.is_mimo) { - s = run_siso_incremental(*impl_, engine_, node); - } else { - s = run_mimo_incremental(*impl_, engine_, node); - } - - if (!s.has_value()) { - return s; - } - node.dirty = false; - return PJ::okStatus(); -} - -} // namespace PJ diff --git a/pj_datastore/src/encoding.cpp b/pj_datastore/src/encoding.cpp deleted file mode 100644 index 023663c7..00000000 --- a/pj_datastore/src/encoding.cpp +++ /dev/null @@ -1,385 +0,0 @@ -// Copyright 2026 Davide Faconti -// SPDX-License-Identifier: MPL-2.0 - -#include "pj_datastore/encoding.hpp" - -#include - -#include - -namespace PJ::encoding { - -using PJ::Span; - -namespace { - -// Write an index in the given byte width -void write_index(RawBuffer& buf, uint32_t index, uint8_t bytes) { - switch (bytes) { - case 1: { - auto v = static_cast(index); - buf.append(&v, sizeof(v)); - break; - } - case 2: { - auto v = static_cast(index); - buf.append(&v, sizeof(v)); - break; - } - default: { - buf.append(&index, sizeof(index)); - break; - } - } -} - -// Read an index at the given byte width -[[nodiscard]] uint32_t read_index(const uint8_t* data, std::size_t row, uint8_t bytes) { - switch (bytes) { - case 1: { - uint8_t v = 0; - std::memcpy(&v, data + row, sizeof(v)); - return v; - } - case 2: { - uint16_t v = 0; - std::memcpy(&v, data + row * 2, sizeof(v)); - return v; - } - default: { - uint32_t v = 0; - std::memcpy(&v, data + row * 4, sizeof(v)); - return v; - } - } -} - -} // namespace - -// --------------------------------------------------------------------------- -// Constant encoding -// --------------------------------------------------------------------------- - -ConstantEncoded constantEncode(Span data, StorageKind kind, std::size_t count) { - ConstantEncoded result; - result.value_kind = kind; - result.count = count; - - const std::size_t esize = storageKindSize(kind); - result.value_size = static_cast(esize); - if (esize > 0 && count > 0) { - 
std::memcpy(result.value_bytes.data(), data.data(), esize); - } - return result; -} - -double constantDecodeAsDouble(const ConstantEncoded& enc) { - const uint8_t* ptr = enc.value_bytes.data(); - - auto load = [&](const T* /*tag*/) -> double { - T v{}; - std::memcpy(&v, ptr, sizeof(v)); - return static_cast(v); - }; - - switch (enc.value_kind) { - case StorageKind::kFloat32: - return load(static_cast(nullptr)); - case StorageKind::kFloat64: - return load(static_cast(nullptr)); - case StorageKind::kInt32: - return load(static_cast(nullptr)); - case StorageKind::kInt64: - return load(static_cast(nullptr)); - case StorageKind::kUint64: - return load(static_cast(nullptr)); - case StorageKind::kBool: - case StorageKind::kString: - break; - } - return 0.0; -} - -int64_t constantDecodeAsInt64(const ConstantEncoded& enc) { - const uint8_t* ptr = enc.value_bytes.data(); - switch (enc.value_kind) { - case StorageKind::kInt32: { - int32_t v{}; - std::memcpy(&v, ptr, sizeof(v)); - return static_cast(v); - } - case StorageKind::kInt64: { - int64_t v{}; - std::memcpy(&v, ptr, sizeof(v)); - return v; - } - case StorageKind::kUint64: { - uint64_t v{}; - std::memcpy(&v, ptr, sizeof(v)); - return static_cast(v); - } - case StorageKind::kFloat32: { - float v{}; - std::memcpy(&v, ptr, sizeof(v)); - return static_cast(v); - } - case StorageKind::kFloat64: { - double v{}; - std::memcpy(&v, ptr, sizeof(v)); - return static_cast(v); - } - default: - return 0; - } -} - -uint64_t constantDecodeAsUint64(const ConstantEncoded& enc) { - const uint8_t* ptr = enc.value_bytes.data(); - switch (enc.value_kind) { - case StorageKind::kUint64: { - uint64_t v{}; - std::memcpy(&v, ptr, sizeof(v)); - return v; - } - case StorageKind::kInt32: { - int32_t v{}; - std::memcpy(&v, ptr, sizeof(v)); - return static_cast(v); - } - case StorageKind::kInt64: { - int64_t v{}; - std::memcpy(&v, ptr, sizeof(v)); - return static_cast(v); - } - case StorageKind::kFloat32: { - float v{}; - std::memcpy(&v, ptr, 
sizeof(v)); - return static_cast(v); - } - case StorageKind::kFloat64: { - double v{}; - std::memcpy(&v, ptr, sizeof(v)); - return static_cast(v); - } - default: - return 0; - } -} - -// --------------------------------------------------------------------------- -// Frame of Reference encoding -// Data must be int64_t values. -// --------------------------------------------------------------------------- - -FrameOfReferenceEncoded forEncode( - Span data, StorageKind kind, std::size_t count, int64_t min_val, int64_t max_val) { - FrameOfReferenceEncoded result; - result.reference = min_val; - result.count = count; - - const auto range = static_cast(max_val - min_val); - result.offset_bytes = offsetBytesFor(range); - - const std::size_t esize = storageKindSize(kind); - result.offsets.reserve(count * result.offset_bytes); - - for (std::size_t i = 0; i < count; ++i) { - int64_t val{}; - if (kind == StorageKind::kInt32) { - int32_t tmp{}; - std::memcpy(&tmp, data.data() + i * esize, sizeof(tmp)); - val = tmp; // sign-extend - } else { - std::memcpy(&val, data.data() + i * esize, sizeof(val)); - } - const auto offset = static_cast(val - min_val); - - switch (result.offset_bytes) { - case 1: { - auto v = static_cast(offset); - result.offsets.append(&v, sizeof(v)); - break; - } - case 2: { - auto v = static_cast(offset); - result.offsets.append(&v, sizeof(v)); - break; - } - default: { - auto v = static_cast(offset); - result.offsets.append(&v, sizeof(v)); - break; - } - } - } - - return result; -} - -namespace { - -uint64_t for_read_offset(const uint8_t* data, std::size_t row, uint8_t offset_bytes) { - switch (offset_bytes) { - case 1: { - uint8_t v = 0; - std::memcpy(&v, data + row, sizeof(v)); - return v; - } - case 2: { - uint16_t v = 0; - std::memcpy(&v, data + row * 2, sizeof(v)); - return v; - } - default: { - uint32_t v = 0; - std::memcpy(&v, data + row * 4, sizeof(v)); - return v; - } - } -} - -} // namespace - -double forDecodeOneAsDouble(const 
FrameOfReferenceEncoded& enc, std::size_t row) { - const uint64_t offset = for_read_offset(enc.offsets.data(), row, enc.offset_bytes); - return static_cast(enc.reference) + static_cast(offset); -} - -int64_t forDecodeOneAsInt64(const FrameOfReferenceEncoded& enc, std::size_t row) { - const uint64_t offset = for_read_offset(enc.offsets.data(), row, enc.offset_bytes); - return enc.reference + static_cast(offset); -} - -void forDecodeRangeAsDoubles(const FrameOfReferenceEncoded& enc, Span out, std::size_t row_start) { - const std::size_t count = out.size(); - const double ref = static_cast(enc.reference); - const uint8_t* base = enc.offsets.data(); - - switch (enc.offset_bytes) { - case 1: { - const uint8_t* src = base + row_start; - for (std::size_t i = 0; i < count; ++i) { - out[i] = ref + static_cast(src[i]); - } - break; - } - case 2: { - const uint8_t* src = base + row_start * 2; - for (std::size_t i = 0; i < count; ++i) { - uint16_t v{}; - std::memcpy(&v, src + i * 2, sizeof(v)); - out[i] = ref + static_cast(v); - } - break; - } - default: { - const uint8_t* src = base + row_start * 4; - for (std::size_t i = 0; i < count; ++i) { - uint32_t v{}; - std::memcpy(&v, src + i * 4, sizeof(v)); - out[i] = ref + static_cast(v); - } - break; - } - } -} - -// --------------------------------------------------------------------------- -// Dictionary encoding for strings -// --------------------------------------------------------------------------- - -DictionaryEncoded dictionaryEncodeStrings( - Span offsets_data, Span values_data, std::size_t row_count) { - DictionaryEncoded result; - result.count = row_count; - - if (row_count == 0) { - result.index_bytes = 1; - return result; - } - - // First pass: build dictionary to determine size - tsl::robin_map lookup; - std::vector temp_indices; - temp_indices.reserve(row_count); - - for (std::size_t row = 0; row < row_count; ++row) { - uint32_t start_offset = 0; - uint32_t end_offset = 0; - std::memcpy(&start_offset, 
offsets_data.data() + row * sizeof(uint32_t), sizeof(uint32_t)); - std::memcpy(&end_offset, offsets_data.data() + (row + 1) * sizeof(uint32_t), sizeof(uint32_t)); - - std::string_view str_view( - reinterpret_cast(values_data.data() + start_offset), end_offset - start_offset); - - std::string key_str(str_view); - auto it = lookup.find(key_str); - uint32_t index = 0; - if (it != lookup.end()) { - index = it->second; - } else { - index = static_cast(result.dictionary.size()); - result.dictionary.push_back(key_str); - lookup[std::move(key_str)] = index; - } - temp_indices.push_back(index); - } - - // Determine narrowed index width - result.index_bytes = indexBytesFor(result.dictionary.size()); - result.indices.reserve(row_count * result.index_bytes); - - for (uint32_t idx : temp_indices) { - write_index(result.indices, idx, result.index_bytes); - } - - return result; -} - -std::string_view dictionaryLookup(const DictionaryEncoded& encoded, std::size_t row) { - uint32_t index = read_index(encoded.indices.data(), row, encoded.index_bytes); - if (index >= encoded.dictionary.size()) { - return {}; - } - return encoded.dictionary[index]; -} - -// --------------------------------------------------------------------------- -// Packed bools -// --------------------------------------------------------------------------- - -PackedBools packBools(Span values) { - const std::size_t count = values.size(); - PackedBools result; - result.count = count; - - if (count == 0) { - return result; - } - - std::size_t num_bytes = (count + 7) / 8; - result.bits.resize(num_bytes); - - uint8_t* bit_data = result.bits.mutable_data(); - std::memset(bit_data, 0, num_bytes); - - for (std::size_t i = 0; i < count; ++i) { - if (values[i] != 0) { - std::size_t byte_idx = i / 8; - std::size_t bit_idx = i % 8; - bit_data[byte_idx] |= static_cast(1u << bit_idx); - } - } - - return result; -} - -bool unpackBool(const PackedBools& packed, std::size_t index) { - std::size_t byte_idx = index / 8; - 
std::size_t bit_idx = index % 8; - uint8_t byte_val = 0; - std::memcpy(&byte_val, packed.bits.data() + byte_idx, sizeof(uint8_t)); - return (byte_val & (1u << bit_idx)) != 0; -} - -} // namespace PJ::encoding diff --git a/pj_datastore/src/engine.cpp b/pj_datastore/src/engine.cpp deleted file mode 100644 index 693e1d26..00000000 --- a/pj_datastore/src/engine.cpp +++ /dev/null @@ -1,275 +0,0 @@ -// Copyright 2026 Davide Faconti -// SPDX-License-Identifier: MPL-2.0 - -#include "pj_datastore/engine.hpp" - -#include -#include - -#include -#include - -#include "pj_base/expected.hpp" -#include "pj_datastore/reader.hpp" -#include "pj_datastore/writer.hpp" - -namespace PJ { - -struct DataEngine::Impl { - TypeRegistry type_registry; - PJ::DatasetId next_dataset_id = 1; - PJ::TopicId next_topic_id = 1; - PJ::TimeDomainId next_time_domain_id = 1; - tsl::robin_map datasets; - tsl::robin_map topics; - tsl::robin_map time_domains; -}; - -DataEngine::DataEngine() : impl_(std::make_unique()) {} - -DataEngine::~DataEngine() = default; - -DataEngine::DataEngine(DataEngine&&) noexcept = default; - -DataEngine& DataEngine::operator=(DataEngine&&) noexcept = default; - -// --------------------------------------------------------------------------- -// Dataset management -// --------------------------------------------------------------------------- - -Expected DataEngine::createDataset(DatasetDescriptor descriptor) { - DatasetId id = impl_->next_dataset_id++; - - // Verify time domain exists if specified - if (descriptor.time_domain_id != 0) { - auto it = impl_->time_domains.find(descriptor.time_domain_id); - if (it == impl_->time_domains.end()) { - return PJ::unexpected(fmt::format("Time domain {} not found", descriptor.time_domain_id)); - } - } - - DatasetInfo info; - info.id = id; - info.source_name = std::move(descriptor.source_name); - if (descriptor.time_domain_id != 0) { - info.time_domain = impl_->time_domains.at(descriptor.time_domain_id); - } - impl_->datasets.emplace(id, 
std::move(info)); - return id; -} - -const DatasetInfo* DataEngine::getDataset(DatasetId id) const { - auto it = impl_->datasets.find(id); - if (it == impl_->datasets.end()) { - return nullptr; - } - return &it->second; -} - -// --------------------------------------------------------------------------- -// Topic management -// --------------------------------------------------------------------------- - -Expected DataEngine::createTopic(DatasetId dataset_id, TopicDescriptor descriptor) { - auto it = impl_->datasets.find(dataset_id); - if (it == impl_->datasets.end()) { - return PJ::unexpected(fmt::format("Dataset {} not found", dataset_id)); - } - - // Validate schema_id if non-zero (zero means inline columns, e.g. scalar series) - if (descriptor.schema_id != 0) { - if (impl_->type_registry.lookup(descriptor.schema_id) == nullptr) { - return PJ::unexpected(fmt::format("Schema {} not found", descriptor.schema_id)); - } - } - - TopicId id = impl_->next_topic_id++; - descriptor.dataset_id = dataset_id; - impl_->topics.emplace( - std::piecewise_construct, std::forward_as_tuple(id), std::forward_as_tuple(id, std::move(descriptor))); - it.value().topic_ids.push_back(id); - return id; -} - -TopicStorage* DataEngine::getTopicStorage(TopicId id) { - auto it = impl_->topics.find(id); - if (it == impl_->topics.end()) { - return nullptr; - } - return &it.value(); -} - -const TopicStorage* DataEngine::getTopicStorage(TopicId id) const { - auto it = impl_->topics.find(id); - if (it == impl_->topics.end()) { - return nullptr; - } - return &it->second; -} - -// --------------------------------------------------------------------------- -// Schema registry -// --------------------------------------------------------------------------- - -TypeRegistry& DataEngine::typeRegistry() { - return impl_->type_registry; -} - -const TypeRegistry& DataEngine::typeRegistry() const { - return impl_->type_registry; -} - -// 
--------------------------------------------------------------------------- -// Time domains -// --------------------------------------------------------------------------- - -Expected DataEngine::createTimeDomain(std::string name) { - TimeDomainId id = impl_->next_time_domain_id++; - TimeDomain td; - td.id = id; - td.name = std::move(name); - impl_->time_domains.emplace(id, std::move(td)); - return id; -} - -const TimeDomain* DataEngine::getTimeDomain(TimeDomainId id) const { - auto it = impl_->time_domains.find(id); - if (it == impl_->time_domains.end()) { - return nullptr; - } - return &it->second; -} - -void DataEngine::setDisplayOffset(TimeDomainId id, Timestamp offset) { - auto it = impl_->time_domains.find(id); - if (it != impl_->time_domains.end()) { - it.value().display_offset = offset; - } -} - -// --------------------------------------------------------------------------- -// Commit cycle -// --------------------------------------------------------------------------- - -std::vector DataEngine::commitChunks( - std::vector> chunks) { // NOLINT(performance-unnecessary-value-param) - std::vector changed; - for (auto& [topic_id, chunk] : chunks) { - auto* storage = getTopicStorage(topic_id); - if (storage != nullptr) { - auto status = storage->appendSealedChunk(std::move(chunk)); - if (!status.has_value()) { - continue; // chunk rejected (e.g. out-of-order); do not mark topic as changed - } - if (changed.empty() || changed.back() != topic_id) { - changed.push_back(topic_id); - } - } - } - // Deduplicate (flushAll() may emit multiple chunks for one topic). 
- std::sort(changed.begin(), changed.end()); - changed.erase(std::unique(changed.begin(), changed.end()), changed.end()); - return changed; -} - -void DataEngine::enforceRetention(Timestamp retention_window_ns) { - for (auto it = impl_->topics.begin(); it != impl_->topics.end(); ++it) { - auto& storage = it.value(); - if (!storage.empty()) { - Timestamp t_max = storage.time_max(); - storage.evictBefore(t_max - retention_window_ns); - } - } -} - -Status DataEngine::flushTo(DataEngine& dst) { - if (&dst == this) { - return PJ::unexpected("flushTo: source and destination are the same engine"); - } - - // Phase 1: validate. Walk every src topic with sealed chunks and look up - // the matching dst topic by descriptor (dataset_id + name). Verify - // monotonicity against dst's current time_max. No mutation yet. - struct Step { - TopicStorage* src; - TopicStorage* dst; - }; - std::vector plan; - plan.reserve(impl_->topics.size()); - - for (auto it = impl_->topics.begin(); it != impl_->topics.end(); ++it) { - auto& src_storage = it.value(); - if (src_storage.empty()) { - continue; - } - TopicStorage* dst_storage = nullptr; - for (auto dst_it = dst.impl_->topics.begin(); dst_it != dst.impl_->topics.end(); ++dst_it) { - auto& candidate = dst_it.value(); - if (candidate.descriptor().dataset_id == src_storage.descriptor().dataset_id && - candidate.descriptor().name == src_storage.descriptor().name) { - dst_storage = &candidate; - break; - } - } - if (dst_storage == nullptr) { - return PJ::unexpected( - "flushTo: destination has no topic '" + src_storage.descriptor().name + "' for dataset " + - std::to_string(src_storage.descriptor().dataset_id)); - } - if (!dst_storage->empty() && src_storage.time_min() < dst_storage->time_max()) { - return PJ::unexpected("flushTo: monotonicity violation for topic '" + src_storage.descriptor().name + "'"); - } - plan.push_back({&src_storage, dst_storage}); - } - - // Phase 2: execute. 
friend access lets us move sealed_chunks_ directly - // between TopicStorage instances of different engines — the deque move - // transfers chunk ownership without copying any column data or value - // buffers. Each chunk's TopicChunkStats (t_min/t_max/row_count) rides - // along inside the chunk by value, so dst's time_min/time_max queries - // reflect the new state immediately after the move. - for (auto& step : plan) { - auto drained = std::move(step.src->sealed_chunks_); - step.src->sealed_chunks_.clear(); // post-move state: deque is valid but empty. - for (auto& chunk : drained) { - step.dst->sealed_chunks_.push_back(std::move(chunk)); - } - } - - return {}; -} - -// --------------------------------------------------------------------------- -// Listing helpers -// --------------------------------------------------------------------------- - -std::vector DataEngine::listDatasets() const { - std::vector result; - result.reserve(impl_->datasets.size()); - for (const auto& [id, info] : impl_->datasets) { - result.push_back(id); - } - return result; -} - -std::vector DataEngine::listTopics(DatasetId dataset_id) const { - auto it = impl_->datasets.find(dataset_id); - if (it == impl_->datasets.end()) { - return {}; - } - return it->second.topic_ids; -} - -// --------------------------------------------------------------------------- -// Writer/Reader factories -// --------------------------------------------------------------------------- - -DataWriter DataEngine::createWriter() { - return DataWriter(*this); -} - -DataReader DataEngine::createReader() const { - return DataReader(*this); -} - -} // namespace PJ diff --git a/pj_datastore/src/object_store.cpp b/pj_datastore/src/object_store.cpp deleted file mode 100644 index f24fc0e9..00000000 --- a/pj_datastore/src/object_store.cpp +++ /dev/null @@ -1,422 +0,0 @@ -// Copyright 2026 Davide Faconti -// SPDX-License-Identifier: MPL-2.0 - -#include "pj_datastore/object_store.hpp" - -#include - -namespace PJ { - -// --- 
Registration --- - -Expected ObjectStore::registerTopic(const ObjectTopicDescriptor& descriptor) { - std::unique_lock lock(store_mutex_); - for (const auto& [tid, series] : topics_) { - if (series->descriptor.topic_name == descriptor.topic_name && - series->descriptor.dataset_id == descriptor.dataset_id) { - return unexpected("topic already registered: " + descriptor.topic_name); - } - } - ObjectTopicId id{next_id_++}; - auto series = std::make_unique(); - series->descriptor = descriptor; - topics_.emplace_back(id, std::move(series)); - return id; -} - -std::optional ObjectStore::findTopic(DatasetId dataset_id, std::string_view topic_name) const { - std::shared_lock lock(store_mutex_); - for (const auto& [tid, series] : topics_) { - if (series->descriptor.dataset_id == dataset_id && series->descriptor.topic_name == topic_name) { - return tid; - } - } - return std::nullopt; -} - -const ObjectTopicDescriptor& ObjectStore::descriptor(ObjectTopicId id) const { - std::shared_lock lock(store_mutex_); - const auto* s = findSeries(id); - if (s == nullptr) { - static const ObjectTopicDescriptor kEmpty{}; - return kEmpty; - } - return s->descriptor; -} - -std::vector ObjectStore::listTopics() const { - std::shared_lock lock(store_mutex_); - std::vector result; - result.reserve(topics_.size()); - for (const auto& [tid, _] : topics_) { - result.push_back(tid); - } - return result; -} - -std::vector ObjectStore::listTopics(DatasetId dataset_id) const { - std::shared_lock lock(store_mutex_); - std::vector result; - for (const auto& [tid, series] : topics_) { - if (series->descriptor.dataset_id == dataset_id) { - result.push_back(tid); - } - } - return result; -} - -// --- Write --- - -Status ObjectStore::pushOwned(ObjectTopicId id, Timestamp timestamp, std::vector payload) { - std::shared_lock store_lock(store_mutex_); - auto* series = findSeries(id); - if (series == nullptr) { - return unexpected("unknown topic"); - } - - std::unique_lock lock(series->mutex); - if 
(!series->entry_timestamps.empty() && timestamp < series->entry_timestamps.back()) { - return unexpected("timestamp not monotonically non-decreasing"); - } - - const size_t payload_size = payload.size(); - - auto shared_data = std::make_shared>(std::move(payload)); - - ObjectEntry entry; - entry.timestamp = timestamp; - entry.payload = std::move(shared_data); - series->entries.push_back(std::move(entry)); - series->entry_timestamps.push_back(timestamp); - series->memory_bytes += payload_size; - - applyRetention(*series, timestamp); - return {}; -} - -Status ObjectStore::pushLazy(ObjectTopicId id, Timestamp timestamp, LazyCallback fetch) { - std::shared_lock store_lock(store_mutex_); - auto* series = findSeries(id); - if (series == nullptr) { - return unexpected("unknown topic"); - } - - std::unique_lock lock(series->mutex); - if (!series->entry_timestamps.empty() && timestamp < series->entry_timestamps.back()) { - return unexpected("timestamp not monotonically non-decreasing"); - } - - ObjectEntry entry; - entry.timestamp = timestamp; - entry.payload = std::move(fetch); - series->entries.push_back(std::move(entry)); - series->entry_timestamps.push_back(timestamp); - - applyRetention(*series, timestamp); - return {}; -} - -// --- Read --- - -std::optional ObjectStore::latestAt(ObjectTopicId id, Timestamp timestamp) const { - std::shared_lock store_lock(store_mutex_); - const auto* series = findSeries(id); - if (series == nullptr) { - return std::nullopt; - } - - std::shared_lock lock(series->mutex); - if (series->entry_timestamps.empty()) { - return std::nullopt; - } - - auto it = std::upper_bound(series->entry_timestamps.begin(), series->entry_timestamps.end(), timestamp); - if (it == series->entry_timestamps.begin()) { - return std::nullopt; - } - --it; - auto idx = static_cast(it - series->entry_timestamps.begin()); - return resolveEntry(series->entries[idx]); -} - -std::optional ObjectStore::at(ObjectTopicId id, size_t index) const { - std::shared_lock 
store_lock(store_mutex_); - const auto* series = findSeries(id); - if (series == nullptr) { - return std::nullopt; - } - - std::shared_lock lock(series->mutex); - if (index >= series->entries.size()) { - return std::nullopt; - } - return resolveEntry(series->entries[index]); -} - -std::optional ObjectStore::indexAt(ObjectTopicId id, Timestamp timestamp) const { - std::shared_lock store_lock(store_mutex_); - const auto* series = findSeries(id); - if (series == nullptr) { - return std::nullopt; - } - - std::shared_lock lock(series->mutex); - if (series->entry_timestamps.empty()) { - return std::nullopt; - } - - auto it = std::upper_bound(series->entry_timestamps.begin(), series->entry_timestamps.end(), timestamp); - if (it == series->entry_timestamps.begin()) { - return std::nullopt; - } - --it; - return static_cast(it - series->entry_timestamps.begin()); -} - -size_t ObjectStore::entryCount(ObjectTopicId id) const { - std::shared_lock store_lock(store_mutex_); - const auto* series = findSeries(id); - if (series == nullptr) { - return 0; - } - - std::shared_lock lock(series->mutex); - return series->entries.size(); -} - -std::pair ObjectStore::timeRange(ObjectTopicId id) const { - std::shared_lock store_lock(store_mutex_); - const auto* series = findSeries(id); - if (series == nullptr) { - return {0, 0}; - } - - std::shared_lock lock(series->mutex); - if (series->entry_timestamps.empty()) { - return {0, 0}; - } - return {series->entry_timestamps.front(), series->entry_timestamps.back()}; -} - -EntryTimestampsView ObjectStore::entryTimestamps(ObjectTopicId id) const { - std::shared_lock store_lock(store_mutex_); - const auto* series = findSeries(id); - if (series == nullptr) { - return {}; - } - - std::shared_lock lock(series->mutex); - return {std::move(lock), &series->entry_timestamps}; -} - -// --- Retention --- - -void ObjectStore::setRetentionBudget(ObjectTopicId id, RetentionBudget budget) { - std::shared_lock store_lock(store_mutex_); - auto* series = 
findSeries(id); - if (series == nullptr) { - return; - } - std::unique_lock lock(series->mutex); - series->budget = budget; -} - -RetentionBudget ObjectStore::retentionBudget(ObjectTopicId id) const { - std::shared_lock store_lock(store_mutex_); - const auto* series = findSeries(id); - if (series == nullptr) { - return {}; - } - std::shared_lock lock(series->mutex); - return series->budget; -} - -size_t ObjectStore::memoryUsage(ObjectTopicId id) const { - std::shared_lock store_lock(store_mutex_); - const auto* series = findSeries(id); - if (series == nullptr) { - return 0; - } - std::shared_lock lock(series->mutex); - return series->memory_bytes; -} - -// --- Explicit eviction --- - -void ObjectStore::evictBefore(ObjectTopicId id, Timestamp threshold) { - std::shared_lock store_lock(store_mutex_); - auto* series = findSeries(id); - if (series == nullptr) { - return; - } - std::unique_lock lock(series->mutex); - while (!series->entries.empty() && series->entry_timestamps.front() < threshold) { - evictFront(*series); - } -} - -void ObjectStore::evictAllBefore(Timestamp threshold) { - std::shared_lock store_lock(store_mutex_); - for (auto& [tid, series] : topics_) { - std::unique_lock lock(series->mutex); - while (!series->entries.empty() && series->entry_timestamps.front() < threshold) { - evictFront(*series); - } - } -} - -// --- Cross-store flush --- - -Status ObjectStore::flushTo(ObjectStore& dst) { - if (&dst == this) { - return unexpected("flushTo: source and destination are the same store"); - } - - // Deterministic lock order by address to avoid deadlock with concurrent flushTo calls. - ObjectStore* first = this < &dst ? this : &dst; - ObjectStore* second = first == this ? &dst : this; - std::unique_lock first_lock(first->store_mutex_); - std::unique_lock second_lock(second->store_mutex_); - - // Phase 1: validate every source series can be matched to a destination - // topic by descriptor and that the move respects monotonicity. No mutation. 
- struct Step { - ObjectSeries* src; - ObjectSeries* dst; - }; - std::vector<Step> plan; - plan.reserve(topics_.size()); - - for (auto& [src_id, src_series] : topics_) { - if (src_series->entry_timestamps.empty()) { - continue; - } - ObjectSeries* dst_series = nullptr; - for (auto& [dst_id, dst_series_ptr] : dst.topics_) { - if (dst_series_ptr->descriptor.dataset_id == src_series->descriptor.dataset_id && - dst_series_ptr->descriptor.topic_name == src_series->descriptor.topic_name) { - dst_series = dst_series_ptr.get(); - break; - } - } - if (dst_series == nullptr) { - return unexpected( - "flushTo: destination has no topic '" + src_series->descriptor.topic_name + "' for dataset " + - std::to_string(src_series->descriptor.dataset_id)); - } - if (!dst_series->entry_timestamps.empty() && - src_series->entry_timestamps.front() < dst_series->entry_timestamps.back()) { - return unexpected("flushTo: monotonicity violation for topic '" + src_series->descriptor.topic_name + "'"); - } - plan.push_back({src_series.get(), dst_series}); - } - - // Phase 2: execute the moves. Holding both store_mutex_ unique means no - // other reader or writer can observe an intermediate state; per-series - // mutexes are not needed because no concurrent access can occur. - for (auto& step : plan) { - for (auto& entry : step.src->entries) { - step.dst->entries.push_back(std::move(entry)); - } - step.dst->entry_timestamps.insert( - step.dst->entry_timestamps.end(), step.src->entry_timestamps.begin(), step.src->entry_timestamps.end()); - step.dst->memory_bytes += step.src->memory_bytes; - - step.src->entries.clear(); - step.src->entry_timestamps.clear(); - step.src->memory_bytes = 0; - - const Timestamp newest = step.dst->entry_timestamps.empty() ? 
0 : step.dst->entry_timestamps.back(); - applyRetention(*step.dst, newest); - } - - return {}; -} - -// --- Lifecycle --- - -void ObjectStore::removeTopic(ObjectTopicId id) { - std::unique_lock lock(store_mutex_); - auto it = std::find_if(topics_.begin(), topics_.end(), [&](const auto& pair) { return pair.first == id; }); - if (it != topics_.end()) { - topics_.erase(it); - } -} - -void ObjectStore::clear() { - std::unique_lock lock(store_mutex_); - topics_.clear(); - next_id_ = 1; -} - -// --- Private helpers --- - -ObjectStore::ObjectSeries* ObjectStore::findSeries(ObjectTopicId id) { - for (auto& [tid, series] : topics_) { - if (tid == id) { - return series.get(); - } - } - return nullptr; -} - -const ObjectStore::ObjectSeries* ObjectStore::findSeries(ObjectTopicId id) const { - for (const auto& [tid, series] : topics_) { - if (tid == id) { - return series.get(); - } - } - return nullptr; -} - -ResolvedObjectEntry ObjectStore::resolveEntry(const ObjectEntry& entry) { - ResolvedObjectEntry resolved; - resolved.timestamp = entry.timestamp; - - if (const auto* owned = std::get_if(&entry.payload)) { - // Span the vector, anchor on the same shared_ptr — refcount bump, no copy. - // A default-constructed entry holds a null SharedBuffer, so guard it. - if (*owned) { - resolved.payload = sdk::PayloadView{ - Span{(*owned)->data(), (*owned)->size()}, - sdk::BufferAnchor{*owned}, - }; - } - } else if (const auto* lazy = std::get_if(&entry.payload)) { - // Forward the closure's PayloadView verbatim. The anchor stays opaque (no - // cast), so producers can back it with arrow::Buffer, mmap, or a C-ABI anchor. 
- resolved.payload = (*lazy)(); - } - - return resolved; -} - -void ObjectStore::evictFront(ObjectSeries& series) { - if (series.entries.empty()) { - return; - } - - const auto& front = series.entries.front(); - if (const auto* owned = std::get_if(&front.payload); owned != nullptr && *owned) { - series.memory_bytes -= (*owned)->size(); - } - - series.entries.pop_front(); - series.entry_timestamps.erase(series.entry_timestamps.begin()); -} - -void ObjectStore::applyRetention(ObjectSeries& series, Timestamp newest_ts) { - if (series.budget.time_window_ns > 0) { - Timestamp threshold = newest_ts - series.budget.time_window_ns; - while (!series.entries.empty() && series.entry_timestamps.front() < threshold) { - evictFront(series); - } - } - if (series.budget.max_memory_bytes > 0) { - while (!series.entries.empty() && series.memory_bytes > series.budget.max_memory_bytes) { - evictFront(series); - } - } -} - -} // namespace PJ diff --git a/pj_datastore/src/plugin_data_host.cpp b/pj_datastore/src/plugin_data_host.cpp deleted file mode 100644 index ced50479..00000000 --- a/pj_datastore/src/plugin_data_host.cpp +++ /dev/null @@ -1,1826 +0,0 @@ -// Copyright 2026 Davide Faconti -// SPDX-License-Identifier: MPL-2.0 - -#include "pj_datastore/plugin_data_host.hpp" - -#include -#include -#include - -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include -#include - -#include "nanoarrow/nanoarrow.h" -#include "nanoarrow/nanoarrow.hpp" -#include "pj_base/dataset.hpp" -#include "pj_base/plugin_data_api.h" -#include "pj_base/sdk/plugin_data_api.hpp" -#include "pj_base/type_tree.hpp" -#include "pj_datastore/arrow_import.hpp" -#include "pj_datastore/chunk.hpp" -#include "pj_datastore/column_buffer.hpp" -#include "pj_datastore/encoding.hpp" -#include "pj_datastore/engine.hpp" -#include "pj_datastore/object_store.hpp" -#include "pj_datastore/topic_storage.hpp" -#include "pj_datastore/writer.hpp" - -namespace PJ { -namespace 
{ - -using DataSourceHandle = PJ_data_source_handle_t; -using TopicHandle = PJ_topic_handle_t; -using FieldHandle = PJ_field_handle_t; - -[[nodiscard]] std::string_view toStringView(PJ_string_view_t view) { - return std::string_view(view.data == nullptr ? "" : view.data, view.size); -} - -[[nodiscard]] Expected fromAbiType(PJ_primitive_type_t type) { - const auto raw = static_cast(type); - if (raw > static_cast(PrimitiveType::kString)) { - return unexpected(fmt::format("unsupported primitive type value {}", raw)); - } - return static_cast(type); -} - -template -[[nodiscard]] T loadFromBytes(const uint8_t* data) { - T value{}; - std::memcpy(&value, data, sizeof(T)); - return value; -} - -[[nodiscard]] uint64_t readForOffset(const encoding::FrameOfReferenceEncoded& enc, std::size_t row) { - const uint8_t* data = enc.offsets.data(); - switch (enc.offset_bytes) { - case 1: - return loadFromBytes(data + row); - case 2: - return loadFromBytes(data + row * 2); - default: - return loadFromBytes(data + row * 4); - } -} - -template -[[nodiscard]] T decodeNumericExact(const TopicChunk& chunk, std::size_t col_index, std::size_t row) { - switch (chunk.columnEncoding(col_index)) { - case EncodingType::kConstant: { - const auto& enc = std::get(chunk.columns[col_index].data); - return loadFromBytes(enc.value_bytes.data()); - } - case EncodingType::kFrameOfReference: { - const auto& enc = std::get(chunk.columns[col_index].data); - const uint64_t offset = readForOffset(enc, row); - return static_cast(enc.reference + static_cast(offset)); - } - case EncodingType::kRaw: { - const StorageKind kind = storageKindOf(chunk.columns[col_index].descriptor->logical_type); - const uint8_t* base = std::get(chunk.columns[col_index].data).data(); - switch (kind) { - case StorageKind::kFloat32: - return static_cast(loadFromBytes(base + row * sizeof(float))); - case StorageKind::kFloat64: - return static_cast(loadFromBytes(base + row * sizeof(double))); - case StorageKind::kInt32: - return 
static_cast(loadFromBytes(base + row * sizeof(int32_t))); - case StorageKind::kInt64: - return static_cast(loadFromBytes(base + row * sizeof(int64_t))); - case StorageKind::kUint64: - return static_cast(loadFromBytes(base + row * sizeof(uint64_t))); - case StorageKind::kBool: - return static_cast(chunk.readBool(col_index, row)); - case StorageKind::kString: - return T{}; - } - return T{}; - } - default: - return T{}; - } -} - -void flattenColumnsImpl( - const TypeTreeNode& node, std::string_view prefix, FieldId& next_id, std::vector& out) { - std::string path = prefix.empty() ? node.name : fmt::format("{}.{}", prefix, node.name); - switch (node.kind) { - case TypeKind::kPrimitive: { - ColumnDescriptor desc; - desc.field_id = next_id++; - desc.logical_type = node.primitive_type.value_or(PrimitiveType::kFloat64); - desc.field_path = std::move(path); - out.push_back(std::move(desc)); - return; - } - case TypeKind::kEnum: { - ColumnDescriptor desc; - desc.field_id = next_id++; - desc.logical_type = node.primitive_type.value_or(PrimitiveType::kInt32); - desc.field_path = std::move(path); - out.push_back(std::move(desc)); - return; - } - case TypeKind::kStruct: - for (const auto& child : node.children) { - flattenColumnsImpl(*child, path, next_id, out); - } - return; - case TypeKind::kArray: - return; - } -} - -[[nodiscard]] std::vector buildSchemaColumns(const TypeTreeNode& root) { - std::vector result; - FieldId next_id = 0; - if (root.kind == TypeKind::kStruct) { - for (const auto& child : root.children) { - flattenColumnsImpl(*child, "", next_id, result); - } - } else { - flattenColumnsImpl(root, "", next_id, result); - } - return result; -} - -[[nodiscard]] std::vector effectiveColumns(const DataEngine& engine, const TopicStorage& storage) { - const auto& stored = storage.columnDescriptors(); - if (!stored.empty()) { - return stored; - } - if (const auto* type_tree = engine.typeRegistry().lookup(storage.descriptor().schema_id)) { - return 
buildSchemaColumns(*type_tree); - } - const auto& chunks = storage.sealedChunks(); - if (!chunks.empty()) { - std::vector result; - result.reserve(chunks.front().columns.size()); - for (const auto& col : chunks.front().columns) { - result.push_back(*col.descriptor); - } - return result; - } - return {}; -} - -[[nodiscard]] const ColumnDescriptor* findFieldDescriptor( - const std::vector& columns, FieldId field_id) { - for (const auto& col : columns) { - if (col.field_id == field_id) { - return &col; - } - } - return nullptr; -} - -} // namespace - -struct WriteCore { - explicit WriteCore(DataEngine& engine) : engine_(engine), writer_(engine.createWriter()) {} - - DataEngine& engine_; - DataWriter writer_; - std::string last_error_; - - struct DatasetTopicKey { - DatasetId dataset_id; - std::string topic_name; - - friend bool operator==(const DatasetTopicKey& a, const DatasetTopicKey& b) { - return a.dataset_id == b.dataset_id && a.topic_name == b.topic_name; - } - }; - - struct DatasetTopicKeyHash { - std::size_t operator()(const DatasetTopicKey& key) const noexcept { - std::size_t h1 = std::hash{}(key.dataset_id); - std::size_t h2 = std::hash{}(key.topic_name); - return h1 ^ (h2 << 1); - } - }; - - struct TopicFieldKey { - TopicId topic_id; - std::string field_name; - - friend bool operator==(const TopicFieldKey& a, const TopicFieldKey& b) { - return a.topic_id == b.topic_id && a.field_name == b.field_name; - } - }; - - struct TopicFieldKeyHash { - std::size_t operator()(const TopicFieldKey& key) const noexcept { - std::size_t h1 = std::hash{}(key.topic_id); - std::size_t h2 = std::hash{}(key.field_name); - return h1 ^ (h2 << 1); - } - }; - - struct TopicFieldIdKey { - TopicId topic_id; - FieldId field_id; - - friend bool operator==(const TopicFieldIdKey& a, const TopicFieldIdKey& b) { - return a.topic_id == b.topic_id && a.field_id == b.field_id; - } - }; - - struct TopicFieldIdKeyHash { - std::size_t operator()(const TopicFieldIdKey& key) const noexcept { - 
std::size_t h1 = std::hash{}(key.topic_id); - std::size_t h2 = std::hash{}(key.field_id); - return h1 ^ (h2 << 1); - } - }; - - tsl::robin_map topic_cache_; - tsl::robin_map field_cache_; - tsl::robin_map field_types_; - - void setError(std::string message) { - last_error_ = std::move(message); - } - - [[nodiscard]] const char* lastError() const { - return last_error_.empty() ? nullptr : last_error_.c_str(); - } - - [[nodiscard]] bool createDataSource(std::string_view name, DataSourceHandle* out_source) { - auto id_or = engine_.createDataset(DatasetDescriptor{.source_name = std::string(name), .time_domain_id = 0}); - if (!id_or.has_value()) { - setError(id_or.error()); - return false; - } - *out_source = DataSourceHandle{.id = *id_or}; - last_error_.clear(); - return true; - } - - [[nodiscard]] bool ensureTopic(DataSourceHandle source, std::string_view topic_name, TopicHandle* out_topic) { - const auto* dataset = engine_.getDataset(source.id); - if (dataset == nullptr) { - setError(fmt::format("data source {} not found", source.id)); - return false; - } - - DatasetTopicKey key{.dataset_id = source.id, .topic_name = std::string(topic_name)}; - if (auto it = topic_cache_.find(key); it != topic_cache_.end()) { - *out_topic = it->second; - last_error_.clear(); - return true; - } - - auto topic_ids = engine_.listTopics(source.id); - std::sort(topic_ids.begin(), topic_ids.end()); - for (TopicId tid : topic_ids) { - const auto* storage = engine_.getTopicStorage(tid); - if (storage != nullptr && storage->descriptor().name == topic_name) { - *out_topic = TopicHandle{.id = tid}; - topic_cache_.emplace(std::move(key), *out_topic); - last_error_.clear(); - return true; - } - } - - TopicDescriptor desc; - desc.name = std::string(topic_name); - desc.schema_id = 0; - auto tid_or = writer_.registerTopic(source.id, std::move(desc)); - if (!tid_or.has_value()) { - setError(tid_or.error()); - return false; - } - - *out_topic = TopicHandle{.id = *tid_or}; - 
topic_cache_.emplace(std::move(key), *out_topic); - last_error_.clear(); - return true; - } - - [[nodiscard]] bool lookupFieldType(TopicHandle topic, FieldId field_id, PrimitiveType* out_type) { - const TopicFieldIdKey key{.topic_id = topic.id, .field_id = field_id}; - if (auto it = field_types_.find(key); it != field_types_.end()) { - *out_type = it->second; - return true; - } - - const auto* storage = engine_.getTopicStorage(topic.id); - if (storage == nullptr) { - setError(fmt::format("topic {} not found", topic.id)); - return false; - } - const auto columns = effectiveColumns(engine_, *storage); - const auto* desc = findFieldDescriptor(columns, field_id); - if (desc == nullptr) { - setError(fmt::format("field {} not found in topic {}", field_id, topic.id)); - return false; - } - - *out_type = desc->logical_type; - field_types_[key] = desc->logical_type; - field_cache_[{.topic_id = topic.id, .field_name = desc->field_path}] = FieldHandle{.topic = topic, .id = field_id}; - return true; - } - - [[nodiscard]] bool ensureField( - TopicHandle topic, std::string_view field_name, PJ_primitive_type_t abi_type, FieldHandle* out_field) { - const auto* storage = engine_.getTopicStorage(topic.id); - if (storage == nullptr) { - setError(fmt::format("topic {} not found", topic.id)); - return false; - } - - auto type_or = fromAbiType(abi_type); - if (!type_or.has_value()) { - setError(type_or.error()); - return false; - } - const PrimitiveType type = *type_or; - - TopicFieldKey key{.topic_id = topic.id, .field_name = std::string(field_name)}; - if (auto it = field_cache_.find(key); it != field_cache_.end()) { - PrimitiveType existing{}; - if (!lookupFieldType(topic, it->second.id, &existing)) { - return false; - } - if (existing != type) { - setError(fmt::format("field '{}' already exists with a different type", field_name)); - return false; - } - *out_field = it->second; - last_error_.clear(); - return true; - } - - auto field_id_or = writer_.ensureColumn(topic.id, 
field_name, type); - if (!field_id_or.has_value()) { - setError(field_id_or.error()); - return false; - } - - *out_field = FieldHandle{.topic = topic, .id = *field_id_or}; - field_cache_.emplace(std::move(key), *out_field); - field_types_[{.topic_id = topic.id, .field_id = *field_id_or}] = type; - last_error_.clear(); - return true; - } - - [[nodiscard]] bool validateScalar(const PJ_scalar_value_t& value, PrimitiveType expected, std::string_view where) { - auto actual_or = fromAbiType(value.type); - if (!actual_or.has_value()) { - setError(actual_or.error()); - return false; - } - if (*actual_or != expected) { - setError(fmt::format("{}: scalar type mismatch", where)); - return false; - } - return true; - } - - void setFieldValue( - TopicId topic_id, std::size_t col_index, PrimitiveType logical_type, const PJ_scalar_value_t& value) { - switch (logical_type) { - case PrimitiveType::kFloat32: - writer_.set(topic_id, col_index, value.data.as_float32); - break; - case PrimitiveType::kFloat64: - writer_.set(topic_id, col_index, value.data.as_float64); - break; - case PrimitiveType::kInt8: - writer_.set(topic_id, col_index, static_cast(value.data.as_int8)); - break; - case PrimitiveType::kInt16: - writer_.set(topic_id, col_index, static_cast(value.data.as_int16)); - break; - case PrimitiveType::kInt32: - writer_.set(topic_id, col_index, value.data.as_int32); - break; - case PrimitiveType::kInt64: - writer_.set(topic_id, col_index, value.data.as_int64); - break; - case PrimitiveType::kUint8: - writer_.set(topic_id, col_index, static_cast(value.data.as_uint8)); - break; - case PrimitiveType::kUint16: - writer_.set(topic_id, col_index, static_cast(value.data.as_uint16)); - break; - case PrimitiveType::kUint32: - writer_.set(topic_id, col_index, static_cast(value.data.as_uint32)); - break; - case PrimitiveType::kUint64: - writer_.set(topic_id, col_index, value.data.as_uint64); - break; - case PrimitiveType::kBool: - writer_.set(topic_id, col_index, value.data.as_bool != 0); 
- break; - case PrimitiveType::kString: - writer_.set(topic_id, col_index, toStringView(value.data.as_string)); - break; - case PrimitiveType::kUnspecified: - break; - } - } - - [[nodiscard]] bool appendRecord( - TopicHandle topic, Timestamp timestamp, const PJ_named_field_value_t* fields, std::size_t field_count) { - if (engine_.getTopicStorage(topic.id) == nullptr) { - setError(fmt::format("topic {} not found", topic.id)); - return false; - } - - tsl::robin_set seen_names; - struct ResolvedField { - FieldHandle handle; - PrimitiveType type; - const PJ_named_field_value_t* raw; - }; - std::vector resolved; - resolved.reserve(field_count); - for (std::size_t i = 0; i < field_count; ++i) { - const auto& field = fields[i]; - const auto name = toStringView(field.name); - if (!seen_names.insert(name).second) { - setError(fmt::format("duplicate field name '{}'", name)); - return false; - } - if (field.is_null) { - // Null values: look up existing field by name. - TopicFieldKey key{.topic_id = topic.id, .field_name = std::string(name)}; - auto it = field_cache_.find(key); - if (it == field_cache_.end()) { - // Field has never been seen. Check if this is a typed null (the ABI - // carries value.type even when is_null is true). A valid type lets - // us create the column now; an untyped null (kNull) is silently - // skipped — the column will be created when a non-null value arrives. 
- auto type_or = fromAbiType(field.value.type); - if (type_or.has_value()) { - FieldHandle handle{}; - if (!ensureField(topic, name, field.value.type, &handle)) { - return false; - } - resolved.push_back({handle, *type_or, &field}); - } - continue; - } - PrimitiveType existing{}; - if (!lookupFieldType(topic, it->second.id, &existing)) { - return false; - } - resolved.push_back({it->second, existing, &field}); - } else { - auto type_or = fromAbiType(field.value.type); - if (!type_or.has_value()) { - setError(type_or.error()); - return false; - } - FieldHandle handle{}; - if (!ensureField(topic, name, field.value.type, &handle)) { - return false; - } - if (!validateScalar(field.value, *type_or, "appendRecord")) { - return false; - } - resolved.push_back({handle, *type_or, &field}); - } - } - - auto begin_status = writer_.beginRow(topic.id, timestamp); - if (!begin_status.has_value()) { - setError(begin_status.error()); - return false; - } - for (const auto& field : resolved) { - if (field.raw->is_null) { - writer_.setNull(topic.id, static_cast(field.handle.id)); - } else { - setFieldValue(topic.id, static_cast(field.handle.id), field.type, field.raw->value); - } - } - auto finish_status = writer_.finishRow(topic.id); - if (!finish_status.has_value()) { - setError(finish_status.error()); - return false; - } - last_error_.clear(); - return true; - } - - [[nodiscard]] bool appendBoundRecord( - TopicHandle topic, Timestamp timestamp, const PJ_bound_field_value_t* fields, std::size_t field_count) { - if (engine_.getTopicStorage(topic.id) == nullptr) { - setError(fmt::format("topic {} not found", topic.id)); - return false; - } - - tsl::robin_set seen_ids; - struct ResolvedField { - PrimitiveType type; - const PJ_bound_field_value_t* raw; - }; - std::vector resolved; - resolved.reserve(field_count); - for (std::size_t i = 0; i < field_count; ++i) { - const auto& field = fields[i]; - if (field.field.topic.id != topic.id) { - setError("field handle does not belong to the 
target topic"); - return false; - } - if (!seen_ids.insert(field.field.id).second) { - setError(fmt::format("duplicate field id {}", field.field.id)); - return false; - } - PrimitiveType type{}; - if (!lookupFieldType(topic, field.field.id, &type)) { - return false; - } - if (!field.is_null && !validateScalar(field.value, type, "appendBoundRecord")) { - return false; - } - resolved.push_back({type, &field}); - } - - auto begin_status = writer_.beginRow(topic.id, timestamp); - if (!begin_status.has_value()) { - setError(begin_status.error()); - return false; - } - for (const auto& field : resolved) { - if (field.raw->is_null) { - writer_.setNull(topic.id, static_cast(field.raw->field.id)); - } else { - setFieldValue(topic.id, static_cast(field.raw->field.id), field.type, field.raw->value); - } - } - auto finish_status = writer_.finishRow(topic.id); - if (!finish_status.has_value()) { - setError(finish_status.error()); - return false; - } - last_error_.clear(); - return true; - } - - /// Ingest a whole Arrow C Data Interface stream into a topic. - /// - /// Ownership contract: callers pass a producer-owned @p stream. The caller - /// decides whether to release after this call — this method does NOT - /// call stream->release. That lets the outermost ABI trampoline enforce - /// the "success releases, failure retains" rule uniformly. 
- [[nodiscard]] bool appendArrowStream( - TopicHandle topic, struct ArrowArrayStream* stream, PJ_string_view_t timestamp_column) { - if (stream == nullptr) { - setError("append_arrow_stream: null stream"); - return false; - } - if (engine_.getTopicStorage(topic.id) == nullptr) { - setError(fmt::format("topic {} not found", topic.id)); - return false; - } - - auto schema_or = arrow_import::schemaFromArrowStream(stream); - if (!schema_or.has_value()) { - setError(schema_or.error()); - return false; - } - - const std::string_view timestamp_name = toStringView(timestamp_column); - int ts_arrow_col = -1; - std::vector mappings; - for (const auto& mapping : schema_or->second) { - if (!timestamp_name.empty() && mapping.field_name == timestamp_name) { - ts_arrow_col = mapping.arrow_column_index; - continue; - } - - FieldHandle field{}; - if (!ensureField(topic, mapping.field_name, static_cast(mapping.pj_type), &field)) { - return false; - } - auto adjusted = mapping; - adjusted.pj_column_index = static_cast(field.id); - mappings.push_back(std::move(adjusted)); - } - - if (!timestamp_name.empty() && ts_arrow_col < 0) { - setError(fmt::format("timestamp column '{}' not found in stream schema", timestamp_name)); - return false; - } - - auto status = arrow_import::importArrowStream(writer_, topic.id, stream, mappings, ts_arrow_col); - if (!status.has_value()) { - setError(status.error()); - return false; - } - last_error_.clear(); - return true; - } - - void flushPending() { - auto flushed = writer_.flushAll(); - if (!flushed.empty()) { - engine_.commitChunks(std::move(flushed)); - } - } -}; - -struct CatalogSnapshotState { - std::deque names; - std::vector data_sources; - std::vector topics; - std::vector fields; -}; - -void releaseCatalogSnapshot(void* ctx) { - delete static_cast(ctx); -} - -PJ_string_view_t storeString(CatalogSnapshotState& state, std::string_view value) { - state.names.emplace_back(value); - const auto& stored = state.names.back(); - return 
PJ_string_view_t{stored.data(), stored.size()}; -} - -struct ToolboxCore { - explicit ToolboxCore(DataEngine& engine) : write(engine), engine_(engine) {} - - WriteCore write; - DataEngine& engine_; - - [[nodiscard]] bool acquireCatalogSnapshot(PJ_catalog_snapshot_t* out_snapshot) { - auto* state = new CatalogSnapshotState{}; - auto dataset_ids = engine_.listDatasets(); - std::sort(dataset_ids.begin(), dataset_ids.end()); - state->data_sources.reserve(dataset_ids.size()); - - for (DatasetId ds_id : dataset_ids) { - const auto* dataset = engine_.getDataset(ds_id); - if (dataset == nullptr) { - continue; - } - - const uint32_t first_topic = static_cast(state->topics.size()); - auto topic_ids = engine_.listTopics(ds_id); - std::sort(topic_ids.begin(), topic_ids.end()); - for (TopicId tid : topic_ids) { - const auto* storage = engine_.getTopicStorage(tid); - if (storage == nullptr) { - continue; - } - const uint32_t first_field = static_cast(state->fields.size()); - const auto columns = effectiveColumns(engine_, *storage); - for (const auto& col : columns) { - state->fields.push_back( - PJ_field_info_t{ - .handle = FieldHandle{.topic = TopicHandle{.id = tid}, .id = col.field_id}, - .name = storeString(*state, col.field_path), - .type = static_cast(col.logical_type), - }); - } - state->topics.push_back( - PJ_topic_info_t{ - .handle = TopicHandle{.id = tid}, - .source = DataSourceHandle{.id = ds_id}, - .name = storeString(*state, storage->descriptor().name), - .first_field = first_field, - .field_count = static_cast(state->fields.size()) - first_field, - }); - } - - state->data_sources.push_back( - PJ_data_source_info_t{ - .handle = DataSourceHandle{.id = ds_id}, - .name = storeString(*state, dataset->source_name), - .first_topic = first_topic, - .topic_count = static_cast(state->topics.size()) - first_topic, - }); - } - - *out_snapshot = PJ_catalog_snapshot_t{ - .data_sources = state->data_sources.data(), - .data_source_count = state->data_sources.size(), - .topics = 
state->topics.data(), - .topic_count = state->topics.size(), - .fields = state->fields.data(), - .field_count = state->fields.size(), - .release_ctx = state, - .release = releaseCatalogSnapshot, - }; - write.last_error_.clear(); - return true; - } - - // v4: materialise one field's time series into host-owned Arrow structs. - // Output is a struct array with 2 columns: ["timestamp" (int64), - // (typed)]. The caller must invoke out_schema->release and - // out_array->release when done; release callbacks are set by nanoarrow - // and free all allocated buffers. - [[nodiscard]] bool readSeriesArrow(FieldHandle field, struct ArrowSchema* out_schema, struct ArrowArray* out_array) { - if (out_schema == nullptr || out_array == nullptr) { - write.setError("readSeriesArrow: out_schema and out_array must be non-null"); - return false; - } - - const auto* storage = engine_.getTopicStorage(field.topic.id); - if (storage == nullptr) { - write.setError(fmt::format("topic {} not found", field.topic.id)); - return false; - } - const auto columns = effectiveColumns(engine_, *storage); - const auto* desc = findFieldDescriptor(columns, field.id); - if (desc == nullptr) { - write.setError(fmt::format("field {} not found in topic {}", field.id, field.topic.id)); - return false; - } - - const ArrowType value_arrow_type = [&]() { - switch (desc->logical_type) { - case PrimitiveType::kFloat32: - return NANOARROW_TYPE_FLOAT; - case PrimitiveType::kFloat64: - return NANOARROW_TYPE_DOUBLE; - case PrimitiveType::kInt8: - return NANOARROW_TYPE_INT8; - case PrimitiveType::kInt16: - return NANOARROW_TYPE_INT16; - case PrimitiveType::kInt32: - return NANOARROW_TYPE_INT32; - case PrimitiveType::kInt64: - return NANOARROW_TYPE_INT64; - case PrimitiveType::kUint8: - return NANOARROW_TYPE_UINT8; - case PrimitiveType::kUint16: - return NANOARROW_TYPE_UINT16; - case PrimitiveType::kUint32: - return NANOARROW_TYPE_UINT32; - case PrimitiveType::kUint64: - return NANOARROW_TYPE_UINT64; - case 
PrimitiveType::kBool: - return NANOARROW_TYPE_BOOL; - case PrimitiveType::kString: - return NANOARROW_TYPE_STRING; - case PrimitiveType::kUnspecified: - return NANOARROW_TYPE_NA; - } - return NANOARROW_TYPE_NA; - }(); - - nanoarrow::UniqueSchema schema; - ArrowSchemaInit(schema.get()); - if (ArrowSchemaSetTypeStruct(schema.get(), 2) != NANOARROW_OK) { - write.setError("readSeriesArrow: ArrowSchemaSetTypeStruct failed"); - return false; - } - ArrowSchemaInit(schema->children[0]); - if (ArrowSchemaSetType(schema->children[0], NANOARROW_TYPE_INT64) != NANOARROW_OK || - ArrowSchemaSetName(schema->children[0], "timestamp") != NANOARROW_OK) { - write.setError("readSeriesArrow: failed to set timestamp child schema"); - return false; - } - ArrowSchemaInit(schema->children[1]); - if (ArrowSchemaSetType(schema->children[1], value_arrow_type) != NANOARROW_OK || - ArrowSchemaSetName(schema->children[1], desc->field_path.c_str()) != NANOARROW_OK) { - write.setError("readSeriesArrow: failed to set value child schema"); - return false; - } - - nanoarrow::UniqueArray array; - ArrowError arrow_err; - if (ArrowArrayInitFromSchema(array.get(), schema.get(), &arrow_err) != NANOARROW_OK) { - write.setError(std::string("readSeriesArrow: ArrowArrayInitFromSchema failed: ") + arrow_err.message); - return false; - } - if (ArrowArrayStartAppending(array.get()) != NANOARROW_OK) { - write.setError("readSeriesArrow: ArrowArrayStartAppending failed"); - return false; - } - - auto* ts_child = array->children[0]; - auto* val_child = array->children[1]; - - for (const auto& chunk : storage->sealedChunks()) { - int col_index = -1; - for (std::size_t i = 0; i < chunk.columns.size(); ++i) { - if (chunk.columns[i].descriptor->field_id == field.id) { - col_index = static_cast(i); - break; - } - } - if (col_index < 0) { - continue; - } - const auto col_sz = static_cast(col_index); - - for (uint32_t row = 0; row < chunk.stats.row_count; ++row) { - if (ArrowArrayAppendInt(ts_child, 
chunk.readTimestamp(row)) != NANOARROW_OK) { - write.setError("readSeriesArrow: timestamp append failed"); - return false; - } - - const bool is_null = chunk.isNull(col_sz, row); - if (is_null) { - if (ArrowArrayAppendNull(val_child, 1) != NANOARROW_OK) { - write.setError("readSeriesArrow: null append failed"); - return false; - } - } else { - ArrowErrorCode rc = NANOARROW_OK; - switch (desc->logical_type) { - case PrimitiveType::kFloat32: - rc = ArrowArrayAppendDouble(val_child, decodeNumericExact(chunk, col_sz, row)); - break; - case PrimitiveType::kFloat64: - rc = ArrowArrayAppendDouble(val_child, decodeNumericExact(chunk, col_sz, row)); - break; - case PrimitiveType::kInt8: - rc = ArrowArrayAppendInt(val_child, decodeNumericExact(chunk, col_sz, row)); - break; - case PrimitiveType::kInt16: - rc = ArrowArrayAppendInt(val_child, decodeNumericExact(chunk, col_sz, row)); - break; - case PrimitiveType::kInt32: - rc = ArrowArrayAppendInt(val_child, decodeNumericExact(chunk, col_sz, row)); - break; - case PrimitiveType::kInt64: - rc = ArrowArrayAppendInt(val_child, decodeNumericExact(chunk, col_sz, row)); - break; - case PrimitiveType::kUint8: - rc = ArrowArrayAppendUInt(val_child, decodeNumericExact(chunk, col_sz, row)); - break; - case PrimitiveType::kUint16: - rc = ArrowArrayAppendUInt(val_child, decodeNumericExact(chunk, col_sz, row)); - break; - case PrimitiveType::kUint32: - rc = ArrowArrayAppendUInt(val_child, decodeNumericExact(chunk, col_sz, row)); - break; - case PrimitiveType::kUint64: - rc = ArrowArrayAppendUInt(val_child, decodeNumericExact(chunk, col_sz, row)); - break; - case PrimitiveType::kBool: - rc = ArrowArrayAppendInt(val_child, chunk.readBool(col_sz, row) ? 
1 : 0); - break; - case PrimitiveType::kString: { - const auto text = chunk.readString(col_sz, row); - const ArrowStringView sv{text.data(), static_cast(text.size())}; - rc = ArrowArrayAppendString(val_child, sv); - break; - } - case PrimitiveType::kUnspecified: - rc = ArrowArrayAppendNull(val_child, 1); - break; - } - if (rc != NANOARROW_OK) { - write.setError("readSeriesArrow: value append failed"); - return false; - } - } - - if (ArrowArrayFinishElement(array.get()) != NANOARROW_OK) { - write.setError("readSeriesArrow: ArrowArrayFinishElement failed"); - return false; - } - } - } - - if (ArrowArrayFinishBuildingDefault(array.get(), &arrow_err) != NANOARROW_OK) { - write.setError(std::string("readSeriesArrow: finish building failed: ") + arrow_err.message); - return false; - } - - // Move schema + array into caller-provided out params (transfers release - // callbacks; the UniqueXxx destructors become no-ops). - ArrowSchemaMove(schema.get(), out_schema); - ArrowArrayMove(array.get(), out_array); - write.last_error_.clear(); - return true; - } -}; - -struct DatastoreSourceWriteHostState { - DatastoreSourceWriteHostState(DataEngine& engine, DataSourceHandle source_handle) - : core(std::make_unique(engine)), source(source_handle) {} - // Held by pointer so setTarget() can rebind to a different engine (streaming - // two-engine pause/resume) by reconstructing the WriteCore — WriteCore holds - // DataEngine by reference and is not reseatable. - std::unique_ptr core; - DataSourceHandle source; -}; - -struct DatastoreParserWriteHostState { - DatastoreParserWriteHostState(DataEngine& engine, TopicHandle topic_handle) - : core(std::make_unique(engine)), topic(topic_handle) {} - // Held by pointer so setTarget() can rebind to a different engine (streaming - // two-store pause/resume) by reconstructing the WriteCore — its writer and - // caches are engine-specific. WriteCore itself is not reassignable (holds a - // DataEngine reference). 
-  std::unique_ptr<WriteCore> core;
-  TopicHandle topic;
-};
-
-struct DatastoreToolboxHostState {
-  DatastoreToolboxHostState(DataEngine& engine, ObjectStore& store) : core(engine), object_store(store) {}
-  ToolboxCore core;
-  // Toolbox plugins share the session's object store; the host holds a
-  // reference so register_object_topic + push_owned_object can forward
-  // without going back through the engine.
-  ObjectStore& object_store;
-  std::string object_last_error;
-
-  void setObjectError(std::string msg) {
-    object_last_error = std::move(msg);
-  }
-};
-
-struct DatastoreSourceObjectWriteHostState {
-  DatastoreSourceObjectWriteHostState(ObjectStore& s, DatasetId dataset) : target(&s), dataset_id(dataset) {}
-  // Atomic pointer rather than reference: the streaming two-store flow
-  // retargets the host between the primary and secondary ObjectStore on each
-  // pause/resume transition. Plain reference would not be reassignable. The
-  // atomic guarantees the worker thread sees a fully-published swap from the
-  // manager thread without locking on the hot push path.
-  std::atomic<ObjectStore*> target;
-  DatasetId dataset_id;
-  std::string last_error;
-
-  void setError(std::string msg) {
-    last_error = std::move(msg);
-  }
-};
-
-struct DatastoreToolboxObjectReadHostState {
-  explicit DatastoreToolboxObjectReadHostState(ObjectStore& s) : store(s) {}
-  ObjectStore& store;
-  std::string last_error;
-
-  void setError(std::string msg) {
-    last_error = std::move(msg);
-  }
-};
-
-struct DatastoreParserObjectWriteHostState {
-  DatastoreParserObjectWriteHostState(ObjectStore& s, ObjectTopicId topic) : target(&s), bound_topic(topic) {}
-  // Atomic pointer rather than reference: see DatastoreSourceObjectWriteHostState.
-  std::atomic<ObjectStore*> target;
-  ObjectTopicId bound_topic;
-  std::string last_error;
-
-  void setError(std::string msg) {
-    last_error = std::move(msg);
-  }
-};
-
-void propagateError(PJ_error_t* out_error, const char* msg) {
-  sdk::fillError(out_error, 1, "datastore", msg != nullptr ? std::string_view(msg) : std::string_view{});
-}
-
-template <typename Fn>
-bool guardHostCallback(PJ_error_t* out_error, Fn&& fn) noexcept {
-  try {
-    return fn();
-  } catch (const std::exception& e) {
-    propagateError(out_error, e.what());
-  } catch (...) {
-    propagateError(out_error, "unknown datastore host exception");
-  }
-  return false;
-}
-
-bool sourceEnsureTopic(void* ctx, PJ_string_view_t topic_name, TopicHandle* out_topic, PJ_error_t* out_error) noexcept {
-  return guardHostCallback(out_error, [&] {
-    auto* impl = static_cast<DatastoreSourceWriteHostState*>(ctx);
-    if (!impl->core->ensureTopic(impl->source, toStringView(topic_name), out_topic)) {
-      propagateError(out_error, impl->core->lastError());
-      return false;
-    }
-    return true;
-  });
-}
-
-bool sourceEnsureField(
-    void* ctx, TopicHandle topic, PJ_string_view_t field_name, PJ_primitive_type_t type, FieldHandle* out_field,
-    PJ_error_t* out_error) noexcept {
-  return guardHostCallback(out_error, [&] {
-    auto* impl = static_cast<DatastoreSourceWriteHostState*>(ctx);
-    if (!impl->core->ensureField(topic, toStringView(field_name), type, out_field)) {
-      propagateError(out_error, impl->core->lastError());
-      return false;
-    }
-    return true;
-  });
-}
-
-bool sourceAppendRecord(
-    void* ctx, TopicHandle topic, int64_t timestamp, const PJ_named_field_value_t* fields, uint64_t field_count,
-    PJ_error_t* out_error) noexcept {
-  return guardHostCallback(out_error, [&] {
-    auto* impl = static_cast<DatastoreSourceWriteHostState*>(ctx);
-    if (!impl->core->appendRecord(topic, timestamp, fields, field_count)) {
-      propagateError(out_error, impl->core->lastError());
-      return false;
-    }
-    return true;
-  });
-}
-
-bool sourceAppendBoundRecord(
-    void* ctx, TopicHandle topic, int64_t timestamp, const PJ_bound_field_value_t* fields, uint64_t field_count,
-    PJ_error_t* out_error) noexcept {
-  return guardHostCallback(out_error, [&] {
-    auto* impl = static_cast<DatastoreSourceWriteHostState*>(ctx);
-    if (!impl->core->appendBoundRecord(topic, timestamp, fields, field_count)) {
-      propagateError(out_error, impl->core->lastError());
-      return false;
-    }
-    return true;
-  });
-}
-
-bool sourceAppendArrowStream(
-    void* ctx, TopicHandle topic, struct ArrowArrayStream* stream, PJ_string_view_t timestamp_column,
-    PJ_error_t* out_error) noexcept {
-  return guardHostCallback(out_error, [&] {
-    auto* impl = static_cast<DatastoreSourceWriteHostState*>(ctx);
-    if (!impl->core->appendArrowStream(topic, stream, timestamp_column)) {
-      // Failure: plugin retains ownership of the stream; we do NOT release.
-      propagateError(out_error, impl->core->lastError());
-      return false;
-    }
-    // Success: host now owns the stream — release it.
-    if (stream != nullptr && stream->release != nullptr) {
-      stream->release(stream);
-    }
-    return true;
-  });
-}
-
-bool parserEnsureField(
-    void* ctx, PJ_string_view_t field_name, PJ_primitive_type_t type, FieldHandle* out_field,
-    PJ_error_t* out_error) noexcept {
-  return guardHostCallback(out_error, [&] {
-    auto* impl = static_cast<DatastoreParserWriteHostState*>(ctx);
-    if (!impl->core->ensureField(impl->topic, toStringView(field_name), type, out_field)) {
-      propagateError(out_error, impl->core->lastError());
-      return false;
-    }
-    return true;
-  });
-}
-
-bool parserAppendRecord(
-    void* ctx, int64_t timestamp, const PJ_named_field_value_t* fields, uint64_t field_count,
-    PJ_error_t* out_error) noexcept {
-  return guardHostCallback(out_error, [&] {
-    auto* impl = static_cast<DatastoreParserWriteHostState*>(ctx);
-    if (!impl->core->appendRecord(impl->topic, timestamp, fields, field_count)) {
-      propagateError(out_error, impl->core->lastError());
-      return false;
-    }
-    return true;
-  });
-}
-
-bool parserAppendBoundRecord(
-    void* ctx, int64_t timestamp, const PJ_bound_field_value_t* fields, uint64_t field_count,
-    PJ_error_t* out_error) noexcept {
-  return guardHostCallback(out_error, [&] {
-    auto* impl = static_cast<DatastoreParserWriteHostState*>(ctx);
-    if (!impl->core->appendBoundRecord(impl->topic, timestamp, fields, field_count)) {
-      propagateError(out_error, impl->core->lastError());
-      return false;
-    }
-    return true;
-  });
-}
-
-bool parserAppendArrowStream(
-    void* ctx, struct ArrowArrayStream* stream, PJ_string_view_t timestamp_column, PJ_error_t* out_error) noexcept {
-  return guardHostCallback(out_error, [&] {
-    auto* impl = static_cast<DatastoreParserWriteHostState*>(ctx);
-    if (!impl->core->appendArrowStream(impl->topic, stream, timestamp_column)) {
-      propagateError(out_error, impl->core->lastError());
-      return false;
-    }
-    if (stream != nullptr && stream->release != nullptr) {
-      stream->release(stream);
-    }
-    return true;
-  });
-}
-
-bool toolboxCreateDataSource(
-    void* ctx, PJ_string_view_t name, DataSourceHandle* out_source, PJ_error_t* out_error) noexcept {
-  return guardHostCallback(out_error, [&] {
-    auto* impl = static_cast<DatastoreToolboxHostState*>(ctx);
-    if (!impl->core.write.createDataSource(toStringView(name), out_source)) {
-      propagateError(out_error, impl->core.write.lastError());
-      return false;
-    }
-    return true;
-  });
-}
-
-bool toolboxEnsureTopic(
-    void* ctx, DataSourceHandle source, PJ_string_view_t topic_name, TopicHandle* out_topic,
-    PJ_error_t* out_error) noexcept {
-  return guardHostCallback(out_error, [&] {
-    auto* impl = static_cast<DatastoreToolboxHostState*>(ctx);
-    if (!impl->core.write.ensureTopic(source, toStringView(topic_name), out_topic)) {
-      propagateError(out_error, impl->core.write.lastError());
-      return false;
-    }
-    return true;
-  });
-}
-
-bool toolboxEnsureField(
-    void* ctx, TopicHandle topic, PJ_string_view_t field_name, PJ_primitive_type_t type, FieldHandle* out_field,
-    PJ_error_t* out_error) noexcept {
-  return guardHostCallback(out_error, [&] {
-    auto* impl = static_cast<DatastoreToolboxHostState*>(ctx);
-    if (!impl->core.write.ensureField(topic, toStringView(field_name), type, out_field)) {
-      propagateError(out_error, impl->core.write.lastError());
-      return false;
-    }
-    return true;
-  });
-}
-
-bool toolboxAppendRecord(
-    void* ctx, TopicHandle topic, int64_t timestamp, const PJ_named_field_value_t* fields, uint64_t field_count,
-    PJ_error_t* out_error) noexcept {
-  return guardHostCallback(out_error, [&] {
-    auto* impl = static_cast<DatastoreToolboxHostState*>(ctx);
-    if (!impl->core.write.appendRecord(topic, timestamp, fields, field_count)) {
-      propagateError(out_error, impl->core.write.lastError());
-      return false;
-    }
-    return true;
-  });
-}
-
-bool toolboxAppendBoundRecord(
-    void* ctx, TopicHandle topic, int64_t timestamp, const PJ_bound_field_value_t* fields, uint64_t field_count,
-    PJ_error_t* out_error) noexcept {
-  return guardHostCallback(out_error, [&] {
-    auto* impl = static_cast<DatastoreToolboxHostState*>(ctx);
-    if (!impl->core.write.appendBoundRecord(topic, timestamp, fields, field_count)) {
-      propagateError(out_error, impl->core.write.lastError());
-      return false;
-    }
-    return true;
-  });
-}
-
-bool toolboxAppendArrowStream(
-    void* ctx, TopicHandle topic, struct ArrowArrayStream* stream, PJ_string_view_t timestamp_column,
-    PJ_error_t* out_error) noexcept {
-  return guardHostCallback(out_error, [&] {
-    auto* impl = static_cast<DatastoreToolboxHostState*>(ctx);
-    if (!impl->core.write.appendArrowStream(topic, stream, timestamp_column)) {
-      propagateError(out_error, impl->core.write.lastError());
-      return false;
-    }
-    if (stream != nullptr && stream->release != nullptr) {
-      stream->release(stream);
-    }
-    return true;
-  });
-}
-
-bool toolboxAcquireCatalogSnapshot(void* ctx, PJ_catalog_snapshot_t* out_snapshot, PJ_error_t* out_error) noexcept {
-  return guardHostCallback(out_error, [&] {
-    auto* impl = static_cast<DatastoreToolboxHostState*>(ctx);
-    if (!impl->core.acquireCatalogSnapshot(out_snapshot)) {
-      propagateError(out_error, impl->core.write.lastError());
-      return false;
-    }
-    return true;
-  });
-}
-
-bool toolboxReadSeriesArrow(
-    void* ctx, FieldHandle field, struct ArrowSchema* out_schema, struct ArrowArray* out_array,
-    PJ_error_t* out_error) noexcept {
-  return guardHostCallback(out_error, [&] {
-    auto* impl = static_cast<DatastoreToolboxHostState*>(ctx);
-    if (!impl->core.readSeriesArrow(field, out_schema, out_array)) {
-      propagateError(out_error, impl->core.write.lastError());
-      return false;
-    }
-    return true;
-  });
-}
-
-bool toolboxRegisterObjectTopic(
-    void* ctx, DataSourceHandle source, PJ_string_view_t topic_name, PJ_string_view_t metadata_json,
-    PJ_object_topic_handle_t* out_handle, PJ_error_t* out_error) noexcept {
-  auto* impl = static_cast<DatastoreToolboxHostState*>(ctx);
-  if (out_handle == nullptr) {
-    propagateError(out_error, "out_handle must not be null");
-    return false;
-  }
-  // Validate the source handle against the engine — same check used by
-  // scalar ensureTopic so the toolbox can't register a topic against a
-  // dataset that doesn't exist.
-  if (impl->core.engine_.getDataset(source.id) == nullptr) {
-    impl->setObjectError(fmt::format("data source {} not found", source.id));
-    propagateError(out_error, impl->object_last_error.c_str());
-    return false;
-  }
-  try {
-    ObjectTopicDescriptor desc{};
-    desc.dataset_id = source.id;
-    desc.topic_name = std::string(toStringView(topic_name));
-    desc.metadata_json = std::string(toStringView(metadata_json));
-    auto result = impl->object_store.registerTopic(desc);
-    if (!result) {
-      impl->setObjectError(result.error());
-      propagateError(out_error, impl->object_last_error.c_str());
-      return false;
-    }
-    out_handle->id = result->id;
-    impl->object_last_error.clear();
-    return true;
-  } catch (const std::exception& e) {
-    impl->setObjectError(e.what());
-    propagateError(out_error, impl->object_last_error.c_str());
-    return false;
-  } catch (...) {
-    impl->setObjectError("registerObjectTopic: unknown exception");
-    propagateError(out_error, impl->object_last_error.c_str());
-    return false;
-  }
-}
-
-bool toolboxPushOwnedObject(
-    void* ctx, PJ_object_topic_handle_t topic, int64_t timestamp_ns, const uint8_t* data, uint64_t size,
-    PJ_error_t* out_error) noexcept {
-  auto* impl = static_cast<DatastoreToolboxHostState*>(ctx);
-  try {
-    std::vector<uint8_t> bytes;
-    if (data != nullptr && size > 0) {
-      bytes.assign(data, data + size);
-    }
-    auto result = impl->object_store.pushOwned(ObjectTopicId{topic.id}, timestamp_ns, std::move(bytes));
-    if (!result) {
-      impl->setObjectError(result.error());
-      propagateError(out_error, impl->object_last_error.c_str());
-      return false;
-    }
-    impl->object_last_error.clear();
-    return true;
-  } catch (const std::exception& e) {
-    impl->setObjectError(e.what());
-    propagateError(out_error, impl->object_last_error.c_str());
-    return false;
-  } catch (...) {
-    impl->setObjectError("pushOwnedObject: unknown exception");
-    propagateError(out_error, impl->object_last_error.c_str());
-    return false;
-  }
-}
-
-/// RAII holder for the plugin-owned `fetch_ctx` passed to push_lazy. Stores
-/// the destroy callback pointer and the ctx value; destroys both on drop.
-/// Wrapped in a shared_ptr so the lambda that ObjectStore stores remains
-/// copyable (std::function requires copyable targets).
-class PluginFetchCtx {
- public:
-  PluginFetchCtx(PJ_lazy_fetch_fn_t fetch_fn, void* fetch_ctx, void (*destroy_fn)(void*)) noexcept
-      : fetch_fn_(fetch_fn), ctx_(fetch_ctx), destroy_fn_(destroy_fn) {}
-
-  ~PluginFetchCtx() {
-    if (destroy_fn_ != nullptr) {
-      destroy_fn_(ctx_);
-    }
-  }
-
-  PluginFetchCtx(const PluginFetchCtx&) = delete;
-  PluginFetchCtx& operator=(const PluginFetchCtx&) = delete;
-  PluginFetchCtx(PluginFetchCtx&&) = delete;
-  PluginFetchCtx& operator=(PluginFetchCtx&&) = delete;
-
-  [[nodiscard]] std::vector<uint8_t> invoke() const {
-    if (fetch_fn_ == nullptr) {
-      return {};
-    }
-    const uint8_t* data = nullptr;
-    uint64_t size = 0;  // matches PJ_lazy_fetch_fn_t out_size (uint64_t*)
-    if (!fetch_fn_(ctx_, &data, &size) || data == nullptr) {
-      return {};
-    }
-    return std::vector<uint8_t>(data, data + size);
-  }
-
- private:
-  PJ_lazy_fetch_fn_t fetch_fn_;
-  void* ctx_;
-  void (*destroy_fn_)(void*);
-};
-
-bool sourceObjectRegisterTopic(
-    void* ctx, PJ_string_view_t topic_name, PJ_string_view_t metadata_json, PJ_object_topic_handle_t* out_handle,
-    PJ_error_t* out_error) noexcept {
-  auto* impl = static_cast<DatastoreSourceObjectWriteHostState*>(ctx);
-  if (out_handle == nullptr) {
-    propagateError(out_error, "out_handle must not be null");
-    return false;
-  }
-  auto* target = impl->target.load(std::memory_order_acquire);
-  try {
-    ObjectTopicDescriptor desc{};
-    desc.dataset_id = impl->dataset_id;
-    desc.topic_name = std::string(toStringView(topic_name));
-    desc.metadata_json = std::string(toStringView(metadata_json));
-    auto result = target->registerTopic(desc);
-    if (!result) {
-      impl->setError(result.error());
-      propagateError(out_error, impl->last_error.c_str());
-      return false;
-    }
-    out_handle->id = result->id;
-    impl->last_error.clear();
-    return true;
-  } catch (const std::exception& e) {
-    impl->setError(e.what());
-    propagateError(out_error, impl->last_error.c_str());
-    return false;
-  } catch (...) {
-    impl->setError("registerTopic: unknown exception");
-    propagateError(out_error, impl->last_error.c_str());
-    return false;
-  }
-}
-
-bool sourceObjectPushOwned(
-    void* ctx, PJ_object_topic_handle_t topic, int64_t timestamp_ns, const uint8_t* data, uint64_t size,
-    PJ_error_t* out_error) noexcept {
-  auto* impl = static_cast<DatastoreSourceObjectWriteHostState*>(ctx);
-  auto* target = impl->target.load(std::memory_order_acquire);
-  try {
-    std::vector<uint8_t> bytes;
-    if (data != nullptr && size > 0) {
-      bytes.assign(data, data + size);
-    }
-    auto result = target->pushOwned(ObjectTopicId{topic.id}, timestamp_ns, std::move(bytes));
-    if (!result) {
-      impl->setError(result.error());
-      propagateError(out_error, impl->last_error.c_str());
-      return false;
-    }
-    impl->last_error.clear();
-    return true;
-  } catch (const std::exception& e) {
-    impl->setError(e.what());
-    propagateError(out_error, impl->last_error.c_str());
-    return false;
-  } catch (...) {
-    impl->setError("pushOwned: unknown exception");
-    propagateError(out_error, impl->last_error.c_str());
-    return false;
-  }
-}
-
-bool sourceObjectPushLazy(
-    void* ctx, PJ_object_topic_handle_t topic, int64_t timestamp_ns, PJ_lazy_fetch_fn_t fetch_fn, void* fetch_ctx,
-    void (*fetch_ctx_destroy)(void*), PJ_error_t* out_error) noexcept {
-  auto* impl = static_cast<DatastoreSourceObjectWriteHostState*>(ctx);
-  if (fetch_fn == nullptr) {
-    if (fetch_ctx_destroy != nullptr) {
-      fetch_ctx_destroy(fetch_ctx);
-    }
-    propagateError(out_error, "fetch_fn must not be null");
-    return false;
-  }
-  auto* target = impl->target.load(std::memory_order_acquire);
-  try {
-    // shared_ptr keeps the ctx holder alive as long as ObjectStore keeps
-    // the lambda; destructor runs exactly once when ObjectStore drops the
-    // entry (retention, evict, removeTopic, clear, or store teardown).
-    auto holder = std::make_shared<PluginFetchCtx>(fetch_fn, fetch_ctx, fetch_ctx_destroy);
-    // Plugins return raw bytes via the C ABI; wrap them as a PayloadView whose
-    // anchor is a shared_ptr<std::vector<uint8_t>>, per the pushLazy contract.
-    // Target pointer comes from the atomic swap layer (so writes follow the
-    // current target store, not a captured-at-construction one).
-    auto closure = [holder]() -> sdk::PayloadView { return sdk::makePayloadView(holder->invoke()); };
-    auto result = target->pushLazy(ObjectTopicId{topic.id}, timestamp_ns, std::move(closure));
-    if (!result) {
-      impl->setError(result.error());
-      propagateError(out_error, impl->last_error.c_str());
-      // `holder` is the only reference to the ctx on failure; dropping it
-      // runs fetch_ctx_destroy exactly once (the destructor already does it).
-      return false;
-    }
-    impl->last_error.clear();
-    return true;
-  } catch (const std::exception& e) {
-    impl->setError(e.what());
-    propagateError(out_error, impl->last_error.c_str());
-    // On exception before the ObjectStore took ownership, PluginFetchCtx's
-    // destructor runs as part of shared_ptr teardown — single destroy call.
-    return false;
-  } catch (...) {
-    impl->setError("pushLazy: unknown exception");
-    propagateError(out_error, impl->last_error.c_str());
-    return false;
-  }
-}
-
-void sourceObjectSetRetentionBudget(
-    void* ctx, PJ_object_topic_handle_t topic, int64_t time_window_ns, uint64_t max_memory_bytes) noexcept {
-  auto* impl = static_cast<DatastoreSourceObjectWriteHostState*>(ctx);
-  auto* target = impl->target.load(std::memory_order_acquire);
-  try {
-    RetentionBudget budget{};
-    budget.time_window_ns = time_window_ns;
-    budget.max_memory_bytes = max_memory_bytes;
-    target->setRetentionBudget(ObjectTopicId{topic.id}, budget);
-  } catch (...) {
-    // Infallible by contract — swallow any exception from the store.
-  }
-}
-
-// ---------------------------------------------------------------------------
-// Toolbox object read host trampolines
-// ---------------------------------------------------------------------------
-
-// PJ_object_bytes_handle_t is a heap-allocated sdk::PayloadView: its anchor
-// keeps the buffer alive until the plugin calls release_bytes. No wrapper
-// struct needed — PayloadView already carries the Span + anchor.
-PJ_object_topic_handle_t toolboxObjectLookupTopic(void* ctx, PJ_string_view_t topic_name) noexcept {
-  auto* impl = static_cast<DatastoreToolboxObjectReadHostState*>(ctx);
-  try {
-    const auto needle = toStringView(topic_name);
-    for (const auto id : impl->store.listTopics()) {
-      if (impl->store.descriptor(id).topic_name == needle) {
-        return PJ_object_topic_handle_t{id.id};
-      }
-    }
-  } catch (...) {
-    // Fall through to invalid handle.
-  }
-  return PJ_object_topic_handle_t{0};
-}
-
-bool toolboxObjectListTopics(
-    void* ctx, PJ_object_topic_handle_t* out_buffer, uint64_t buffer_capacity, uint64_t* out_count,
-    PJ_error_t* out_error) noexcept {
-  auto* impl = static_cast<DatastoreToolboxObjectReadHostState*>(ctx);
-  if (out_count == nullptr) {
-    propagateError(out_error, "out_count must not be null");
-    return false;
-  }
-  try {
-    const auto ids = impl->store.listTopics();
-    *out_count = ids.size();
-    if (out_buffer != nullptr) {
-      // buffer_capacity is uint64_t (ABI); ids.size() is size_t. Compare in
-      // uint64_t, then index with size_t (n <= ids.size(), so it fits).
-      const std::size_t n = static_cast<std::size_t>(std::min<uint64_t>(buffer_capacity, ids.size()));
-      for (std::size_t i = 0; i < n; ++i) {
-        out_buffer[i] = PJ_object_topic_handle_t{ids[i].id};
-      }
-    }
-    return true;
-  } catch (const std::exception& e) {
-    impl->setError(e.what());
-    propagateError(out_error, impl->last_error.c_str());
-    return false;
-  } catch (...) {
-    impl->setError("listTopics: unknown exception");
-    propagateError(out_error, impl->last_error.c_str());
-    return false;
-  }
-}
-
-const char* toolboxObjectTopicMetadata(void* ctx, PJ_object_topic_handle_t topic) noexcept {
-  auto* impl = static_cast<DatastoreToolboxObjectReadHostState*>(ctx);
-  try {
-    const auto& desc = impl->store.descriptor(ObjectTopicId{topic.id});
-    // Descriptor is stored in the series and lives as long as the topic;
-    // the pointer remains stable until the topic is removed.
-    return desc.metadata_json.c_str();
-  } catch (...) {
-    return nullptr;
-  }
-}
-
-bool toolboxObjectReadLatestAt(
-    void* ctx, PJ_object_topic_handle_t topic, int64_t timestamp_ns, PJ_object_bytes_handle_t* out_handle,
-    int64_t* out_timestamp, PJ_error_t* out_error) noexcept {
-  auto* impl = static_cast<DatastoreToolboxObjectReadHostState*>(ctx);
-  if (out_handle == nullptr) {
-    propagateError(out_error, "out_handle must not be null");
-    return false;
-  }
-  *out_handle = nullptr;
-  try {
-    auto entry = impl->store.latestAt(ObjectTopicId{topic.id}, timestamp_ns);
-    if (!entry.has_value() || entry->payload.anchor == nullptr) {
-      impl->setError("no entry at-or-before timestamp");
-      propagateError(out_error, impl->last_error.c_str());
-      return false;
-    }
-    auto* payload_handle = new sdk::PayloadView(std::move(entry->payload));
-    *out_handle = reinterpret_cast<PJ_object_bytes_handle_t>(payload_handle);
-    if (out_timestamp != nullptr) {
-      *out_timestamp = entry->timestamp;
-    }
-    impl->last_error.clear();
-    return true;
-  } catch (const std::exception& e) {
-    impl->setError(e.what());
-    propagateError(out_error, impl->last_error.c_str());
-    return false;
-  } catch (...) {
-    impl->setError("readLatestAt: unknown exception");
-    propagateError(out_error, impl->last_error.c_str());
-    return false;
-  }
-}
-
-void toolboxObjectGetBytes(PJ_object_bytes_handle_t handle, const uint8_t** out_data, uint64_t* out_size) noexcept {
-  if (out_data != nullptr) {
-    *out_data = nullptr;
-  }
-  if (out_size != nullptr) {
-    *out_size = 0;
-  }
-  if (handle == nullptr) {
-    return;
-  }
-  const auto* payload = reinterpret_cast<const sdk::PayloadView*>(handle);
-  if (payload->anchor == nullptr) {
-    return;
-  }
-  if (out_data != nullptr) {
-    *out_data = payload->bytes.data();
-  }
-  if (out_size != nullptr) {
-    *out_size = payload->bytes.size();
-  }
-}
-
-void toolboxObjectReleaseBytes(PJ_object_bytes_handle_t handle) noexcept {
-  if (handle == nullptr) {
-    return;
-  }
-  delete reinterpret_cast<sdk::PayloadView*>(handle);
-}
-
-uint64_t toolboxObjectEntryCount(void* ctx, PJ_object_topic_handle_t topic) noexcept {
-  auto* impl = static_cast<DatastoreToolboxObjectReadHostState*>(ctx);
-  try {
-    return impl->store.entryCount(ObjectTopicId{topic.id});
-  } catch (...) {
-    return 0;
-  }
-}
-
-bool toolboxObjectTimeRange(
-    void* ctx, PJ_object_topic_handle_t topic, int64_t* out_min_ts, int64_t* out_max_ts) noexcept {
-  auto* impl = static_cast<DatastoreToolboxObjectReadHostState*>(ctx);
-  try {
-    if (impl->store.entryCount(ObjectTopicId{topic.id}) == 0) {
-      return false;
-    }
-    const auto range = impl->store.timeRange(ObjectTopicId{topic.id});
-    if (out_min_ts != nullptr) {
-      *out_min_ts = range.first;
-    }
-    if (out_max_ts != nullptr) {
-      *out_max_ts = range.second;
-    }
-    return true;
-  } catch (...) {
-    return false;
-  }
-}
-
-// ---------------------------------------------------------------------------
-// Parser object write host trampolines — topic bound at service-create time.
-// ---------------------------------------------------------------------------
-
-bool parserObjectPushOwned(
-    void* ctx, int64_t timestamp_ns, const uint8_t* data, uint64_t size, PJ_error_t* out_error) noexcept {
-  auto* impl = static_cast<DatastoreParserObjectWriteHostState*>(ctx);
-  auto* target = impl->target.load(std::memory_order_acquire);
-  try {
-    std::vector<uint8_t> bytes;
-    if (data != nullptr && size > 0) {
-      bytes.assign(data, data + size);
-    }
-    auto result = target->pushOwned(impl->bound_topic, timestamp_ns, std::move(bytes));
-    if (!result) {
-      impl->setError(result.error());
-      propagateError(out_error, impl->last_error.c_str());
-      return false;
-    }
-    impl->last_error.clear();
-    return true;
-  } catch (const std::exception& e) {
-    impl->setError(e.what());
-    propagateError(out_error, impl->last_error.c_str());
-    return false;
-  } catch (...) {
-    impl->setError("parser pushOwned: unknown exception");
-    propagateError(out_error, impl->last_error.c_str());
-    return false;
-  }
-}
-
-bool parserObjectPushLazy(
-    void* ctx, int64_t timestamp_ns, PJ_lazy_fetch_fn_t fetch_fn, void* fetch_ctx, void (*fetch_ctx_destroy)(void*),
-    PJ_error_t* out_error) noexcept {
-  auto* impl = static_cast<DatastoreParserObjectWriteHostState*>(ctx);
-  if (fetch_fn == nullptr) {
-    if (fetch_ctx_destroy != nullptr) {
-      fetch_ctx_destroy(fetch_ctx);
-    }
-    propagateError(out_error, "fetch_fn must not be null");
-    return false;
-  }
-  auto* target = impl->target.load(std::memory_order_acquire);
-  try {
-    auto holder = std::make_shared<PluginFetchCtx>(fetch_fn, fetch_ctx, fetch_ctx_destroy);
-    auto closure = [holder]() -> sdk::PayloadView { return sdk::makePayloadView(holder->invoke()); };
-    auto result = target->pushLazy(impl->bound_topic, timestamp_ns, std::move(closure));
-    if (!result) {
-      impl->setError(result.error());
-      propagateError(out_error, impl->last_error.c_str());
-      return false;
-    }
-    impl->last_error.clear();
-    return true;
-  } catch (const std::exception& e) {
-    impl->setError(e.what());
-    propagateError(out_error, impl->last_error.c_str());
-    return false;
-  } catch (...) {
-    impl->setError("parser pushLazy: unknown exception");
-    propagateError(out_error, impl->last_error.c_str());
-    return false;
-  }
-}
-
-const PJ_source_write_host_vtable_t kSourceWriteVTable = {
-    PJ_PLUGIN_DATA_API_VERSION, sizeof(PJ_source_write_host_vtable_t),
-    sourceEnsureTopic, sourceEnsureField,
-    sourceAppendRecord, sourceAppendBoundRecord,
-    sourceAppendArrowStream,
-};
-
-const PJ_parser_write_host_vtable_t kParserWriteVTable = {
-    PJ_PLUGIN_DATA_API_VERSION, sizeof(PJ_parser_write_host_vtable_t),
-    parserEnsureField, parserAppendRecord,
-    parserAppendBoundRecord, parserAppendArrowStream,
-};
-
-const PJ_toolbox_host_vtable_t kToolboxVTable = {
-    PJ_PLUGIN_DATA_API_VERSION,
-    sizeof(PJ_toolbox_host_vtable_t),
-    toolboxCreateDataSource,
-    toolboxEnsureTopic,
-    toolboxEnsureField,
-    toolboxAppendRecord,
-    toolboxAppendBoundRecord,
-    toolboxAppendArrowStream,
-    toolboxAcquireCatalogSnapshot,
-    toolboxReadSeriesArrow,
-    toolboxRegisterObjectTopic,
-    toolboxPushOwnedObject,
-};
-
-const PJ_object_write_host_vtable_t kSourceObjectWriteVTable = {
-    PJ_PLUGIN_DATA_API_VERSION, sizeof(PJ_object_write_host_vtable_t), sourceObjectRegisterTopic, sourceObjectPushOwned,
-    sourceObjectPushLazy, sourceObjectSetRetentionBudget,
-};
-
-const PJ_object_read_host_vtable_t kToolboxObjectReadVTable = {
-    PJ_PLUGIN_DATA_API_VERSION, sizeof(PJ_object_read_host_vtable_t),
-    toolboxObjectLookupTopic, toolboxObjectListTopics,
-    toolboxObjectTopicMetadata, toolboxObjectReadLatestAt,
-    toolboxObjectGetBytes, toolboxObjectReleaseBytes,
-    toolboxObjectEntryCount, toolboxObjectTimeRange,
-};
-
-const PJ_parser_object_write_host_vtable_t kParserObjectWriteVTable = {
-    PJ_PLUGIN_DATA_API_VERSION,
-    sizeof(PJ_parser_object_write_host_vtable_t),
-    parserObjectPushOwned,
-    parserObjectPushLazy,
-};
-
-DatastoreSourceWriteHost::DatastoreSourceWriteHost(DataEngine& engine, DataSourceHandle source)
-    : state_(std::make_unique<DatastoreSourceWriteHostState>(engine, source)) {}
-DatastoreSourceWriteHost::~DatastoreSourceWriteHost() = default;
-DatastoreSourceWriteHost::DatastoreSourceWriteHost(DatastoreSourceWriteHost&&) noexcept = default;
-DatastoreSourceWriteHost& DatastoreSourceWriteHost::operator=(DatastoreSourceWriteHost&&) noexcept = default;
-
-PJ_source_write_host_t DatastoreSourceWriteHost::raw() noexcept {
-  return PJ_source_write_host_t{.ctx = state_.get(), .vtable = &kSourceWriteVTable};
-}
-
-void DatastoreSourceWriteHost::flushPending() {
-  state_->core->flushPending();
-}
-
-void DatastoreSourceWriteHost::setTarget(DataEngine* target) {
-  // Seal + commit any open chunk to the current engine so no rows are lost,
-  // then rebind to the new engine with a fresh WriteCore (its writer and
-  // per-engine caches must not carry over). Mirrors DatastoreParserWriteHost.
-  state_->core->flushPending();
-  state_->core = std::make_unique<WriteCore>(*target);
-}
-
-DatastoreParserWriteHost::DatastoreParserWriteHost(DataEngine& engine, TopicHandle topic)
-    : state_(std::make_unique<DatastoreParserWriteHostState>(engine, topic)) {}
-DatastoreParserWriteHost::~DatastoreParserWriteHost() = default;
-DatastoreParserWriteHost::DatastoreParserWriteHost(DatastoreParserWriteHost&&) noexcept = default;
-DatastoreParserWriteHost& DatastoreParserWriteHost::operator=(DatastoreParserWriteHost&&) noexcept = default;
-
-PJ_parser_write_host_t DatastoreParserWriteHost::raw() noexcept {
-  return PJ_parser_write_host_t{.ctx = state_.get(), .vtable = &kParserWriteVTable};
-}
-
-void DatastoreParserWriteHost::flushPending() {
-  state_->core->flushPending();
-}
-
-void DatastoreParserWriteHost::setTarget(DataEngine* target) {
-  // Seal + commit any open chunk to the current engine so no rows are lost,
-  // then rebind to the new engine with a fresh WriteCore (its writer and
-  // per-engine caches must not carry over). The bound topic is expected to
-  // already exist in `target` with the same TopicId.
- state_->core->flushPending(); - state_->core = std::make_unique(*target); -} - -DatastoreToolboxHost::DatastoreToolboxHost(DataEngine& engine, ObjectStore& object_store) - : state_(std::make_unique(engine, object_store)) {} -DatastoreToolboxHost::~DatastoreToolboxHost() = default; -DatastoreToolboxHost::DatastoreToolboxHost(DatastoreToolboxHost&&) noexcept = default; -DatastoreToolboxHost& DatastoreToolboxHost::operator=(DatastoreToolboxHost&&) noexcept = default; - -PJ_toolbox_host_t DatastoreToolboxHost::raw() noexcept { - return PJ_toolbox_host_t{.ctx = state_.get(), .vtable = &kToolboxVTable}; -} - -void DatastoreToolboxHost::flushPending() { - state_->core.write.flushPending(); -} - -DatastoreSourceObjectWriteHost::DatastoreSourceObjectWriteHost(ObjectStore& store, DatasetId dataset_id) - : state_(std::make_unique(store, dataset_id)) {} -DatastoreSourceObjectWriteHost::~DatastoreSourceObjectWriteHost() = default; -DatastoreSourceObjectWriteHost::DatastoreSourceObjectWriteHost(DatastoreSourceObjectWriteHost&&) noexcept = default; -DatastoreSourceObjectWriteHost& DatastoreSourceObjectWriteHost::operator=(DatastoreSourceObjectWriteHost&&) noexcept = - default; - -PJ_object_write_host_t DatastoreSourceObjectWriteHost::raw() noexcept { - return PJ_object_write_host_t{.ctx = state_.get(), .vtable = &kSourceObjectWriteVTable}; -} - -void DatastoreSourceObjectWriteHost::setTarget(ObjectStore* target) noexcept { - state_->target.store(target, std::memory_order_release); -} - -DatastoreToolboxObjectReadHost::DatastoreToolboxObjectReadHost(ObjectStore& store) - : state_(std::make_unique(store)) {} -DatastoreToolboxObjectReadHost::~DatastoreToolboxObjectReadHost() = default; -DatastoreToolboxObjectReadHost::DatastoreToolboxObjectReadHost(DatastoreToolboxObjectReadHost&&) noexcept = default; -DatastoreToolboxObjectReadHost& DatastoreToolboxObjectReadHost::operator=(DatastoreToolboxObjectReadHost&&) noexcept = - default; - -PJ_object_read_host_t 
DatastoreToolboxObjectReadHost::raw() noexcept { - return PJ_object_read_host_t{.ctx = state_.get(), .vtable = &kToolboxObjectReadVTable}; -} - -DatastoreParserObjectWriteHost::DatastoreParserObjectWriteHost(ObjectStore& store, uint32_t topic_id) - : state_(std::make_unique(store, ObjectTopicId{topic_id})) {} -DatastoreParserObjectWriteHost::~DatastoreParserObjectWriteHost() = default; -DatastoreParserObjectWriteHost::DatastoreParserObjectWriteHost(DatastoreParserObjectWriteHost&&) noexcept = default; -DatastoreParserObjectWriteHost& DatastoreParserObjectWriteHost::operator=(DatastoreParserObjectWriteHost&&) noexcept = - default; - -PJ_parser_object_write_host_t DatastoreParserObjectWriteHost::raw() noexcept { - return PJ_parser_object_write_host_t{.ctx = state_.get(), .vtable = &kParserObjectWriteVTable}; -} - -void DatastoreParserObjectWriteHost::setTarget(ObjectStore* target) noexcept { - state_->target.store(target, std::memory_order_release); -} - -} // namespace PJ diff --git a/pj_datastore/src/query.cpp b/pj_datastore/src/query.cpp deleted file mode 100644 index 9fbbcb3b..00000000 --- a/pj_datastore/src/query.cpp +++ /dev/null @@ -1,430 +0,0 @@ -// Copyright 2026 Davide Faconti -// SPDX-License-Identifier: MPL-2.0 - -#include "pj_datastore/query.hpp" - -#include -#include -#include -#include -#include - -namespace PJ { -namespace { - -[[nodiscard]] Range normalized(Range range) { - if (range.max < range.min) { - std::swap(range.min, range.max); - } - return range; -} - -[[nodiscard]] bool isBoolColumn(const TopicChunk& chunk, std::size_t column_index) { - return column_index < chunk.columns.size() && chunk.columns[column_index].descriptor && - chunk.columns[column_index].descriptor->logical_type == PrimitiveType::kBool; -} - -[[nodiscard]] std::optional readSeriesValue( - const TopicChunk& chunk, std::size_t column_index, std::size_t row) { - if (column_index >= chunk.columns.size() || row >= chunk.stats.row_count || chunk.isNull(column_index, row)) { - 
return std::nullopt; - } - if (isBoolColumn(chunk, column_index)) { - return chunk.readBool(column_index, row) ? 1.0 : 0.0; - } - return chunk.readNumericAsDouble(column_index, row); -} - -[[nodiscard]] SeriesSample makeSeriesSample(const TopicChunk& chunk, std::size_t column_index, std::size_t row) { - const auto value = readSeriesValue(chunk, column_index, row); - assert(value.has_value()); - return SeriesSample{chunk.readTimestamp(row), *value, &chunk, row}; -} - -[[nodiscard]] Range allTime() { - return Range{ - .min = std::numeric_limits::min(), - .max = std::numeric_limits::max(), - }; -} - -} // namespace - -// =========================================================================== -// RangeCursor -// =========================================================================== - -RangeCursor::RangeCursor(const std::deque& chunks, Timestamp t_min, Timestamp t_max) - : chunks_(&chunks), t_min_(t_min), t_max_(t_max) { - findFirstValid(); -} - -bool RangeCursor::valid() const noexcept { - return chunk_index_ < chunks_->size(); -} - -SampleRow RangeCursor::current() const { - assert(valid()); - const auto& chunk = (*chunks_)[chunk_index_]; - return SampleRow{chunk.readTimestamp(row_index_), &chunk, row_index_}; -} - -void RangeCursor::advance() { - assert(valid()); - const auto& chunk = (*chunks_)[chunk_index_]; - ++row_index_; - if (row_index_ >= chunk.stats.row_count) { - ++chunk_index_; - row_index_ = 0; - } - skipToValid(); -} - -void RangeCursor::forEach(std::function callback) { - while (valid()) { - callback(current()); - advance(); - } -} - -void RangeCursor::forEachChunk(std::function callback) { - while (chunk_index_ < chunks_->size()) { - const auto& chunk = (*chunks_)[chunk_index_]; - - // Skip chunks entirely before our range - if (chunk.stats.t_max < t_min_) { - ++chunk_index_; - continue; - } - // Stop if chunk is entirely after our range - if (chunk.stats.t_min > t_max_) { - break; - } - - // Find first valid row in this chunk (>= t_min_) - 
std::size_t first = row_index_; - while (first < chunk.stats.row_count && chunk.readTimestamp(first) < t_min_) { - ++first; - } - - // Find one-past-last valid row in this chunk (<= t_max_) - std::size_t end = first; - while (end < chunk.stats.row_count && chunk.readTimestamp(end) <= t_max_) { - ++end; - } - - if (first < end) { - callback(ChunkRowRange{&chunk, first, end}); - } - - // Move to next chunk - ++chunk_index_; - row_index_ = 0; - } - // Mark cursor exhausted - chunk_index_ = chunks_->size(); -} - -void RangeCursor::findFirstValid() { - const auto& chunks = *chunks_; - - // First chunk that could contain a row in range, i.e. whose t_max >= t_min_. - // Committed chunks are non-empty and time-ordered (each chunk's t_min >= the - // previous chunk's t_max), so t_max is non-decreasing across the deque and we - // can binary-search it. - const auto chunk_it = std::lower_bound( - chunks.begin(), chunks.end(), t_min_, - [](const TopicChunk& chunk, Timestamp value) { return chunk.stats.t_max < value; }); - if (chunk_it == chunks.end()) { - // All data is strictly before t_min_. - chunk_index_ = chunks.size(); - row_index_ = 0; - return; - } - chunk_index_ = static_cast(chunk_it - chunks.begin()); - - // First row with timestamp >= t_min_ within that chunk. Such a row exists - // because t_max (the chunk's last timestamp) >= t_min_. - const TopicChunk& chunk = *chunk_it; - const auto ts_begin = chunk.timestamps.begin(); - const auto ts_end = ts_begin + static_cast(chunk.stats.row_count); - const auto row_it = std::lower_bound(ts_begin, ts_end, t_min_); - row_index_ = static_cast(row_it - ts_begin); - - // If the first row at or after t_min_ is already past t_max_, nothing in the - // deque falls inside [t_min_, t_max_]. 
- if (row_it == ts_end || *row_it > t_max_) { - chunk_index_ = chunks.size(); - row_index_ = 0; - } -} - -void RangeCursor::skipToValid() { - if (!valid()) { - return; - } - const auto& chunk = (*chunks_)[chunk_index_]; - Timestamp ts = chunk.readTimestamp(row_index_); - if (ts > t_max_) { - // Past the end of the query range - chunk_index_ = chunks_->size(); - return; - } - // ts >= t_min_ is guaranteed by how we advance through sorted data -} - -// =========================================================================== -// latest_at -// =========================================================================== - -std::optional latestAt(const std::deque& chunks, Timestamp t) { - // Last chunk that can contain a row at or before t, i.e. the latest chunk - // whose t_min <= t. Committed chunks are non-empty and have non-decreasing - // t_min, so upper_bound finds the first chunk strictly after t; the chunk - // before it is the answer. (At a shared boundary timestamp this selects the - // later chunk, matching the previous reverse-scan behaviour.) - const auto after = std::upper_bound(chunks.begin(), chunks.end(), t, [](Timestamp value, const TopicChunk& chunk) { - return value < chunk.stats.t_min; - }); - if (after == chunks.begin()) { - // Empty deque, or every chunk starts strictly after t. - return std::nullopt; - } - const TopicChunk& chunk = *(after - 1); - - // Last row with timestamp <= t within that chunk. Such a row exists because - // the chunk's first timestamp (t_min) is <= t. 
- const auto ts_begin = chunk.timestamps.begin(); - const auto ts_end = ts_begin + static_cast(chunk.stats.row_count); - const auto row_after = std::upper_bound(ts_begin, ts_end, t); - if (row_after == ts_begin) { - return std::nullopt; // unreachable for committed chunks (row 0 ts == t_min <= t) - } - const std::size_t row = static_cast((row_after - 1) - ts_begin); - return SampleRow{chunk.readTimestamp(row), &chunk, row}; -} - -// =========================================================================== -// range_query -// =========================================================================== - -RangeCursor rangeQuery(const std::deque& chunks, Timestamp t_min, Timestamp t_max) { - return RangeCursor(chunks, t_min, t_max); -} - -// =========================================================================== -// SeriesCursor -// =========================================================================== - -SeriesCursor::SeriesCursor(const std::deque& chunks, std::size_t column_index, Range time_range) - : chunks_(&chunks), column_index_(column_index), time_range_(normalized(time_range)) { - skipToSample(); -} - -bool SeriesCursor::valid() const noexcept { - return chunk_index_ < chunks_->size(); -} - -SeriesSample SeriesCursor::current() const { - assert(valid()); - return makeSeriesSample((*chunks_)[chunk_index_], column_index_, row_index_); -} - -void SeriesCursor::advance() { - assert(valid()); - ++row_index_; - skipToSample(); -} - -void SeriesCursor::forEach(std::function callback) { - while (valid()) { - callback(current()); - advance(); - } -} - -void SeriesCursor::skipToSample() { - while (chunk_index_ < chunks_->size()) { - const auto& chunk = (*chunks_)[chunk_index_]; - - if (chunk.stats.row_count == 0 || chunk.stats.t_max < time_range_.min || column_index_ >= chunk.columns.size()) { - ++chunk_index_; - row_index_ = 0; - continue; - } - - if (chunk.stats.t_min > time_range_.max) { - chunk_index_ = chunks_->size(); - return; - } - - while (row_index_ 
< chunk.stats.row_count) { - const Timestamp ts = chunk.readTimestamp(row_index_); - if (ts < time_range_.min) { - ++row_index_; - continue; - } - if (ts > time_range_.max) { - chunk_index_ = chunks_->size(); - return; - } - if (readSeriesValue(chunk, column_index_, row_index_).has_value()) { - return; - } - ++row_index_; - } - - ++chunk_index_; - row_index_ = 0; - } -} - -// =========================================================================== -// SeriesReader -// =========================================================================== - -SeriesReader::SeriesReader(const std::deque& chunks, std::size_t column_index) - : chunks_(&chunks), column_index_(column_index) {} - -std::size_t SeriesReader::size() const { - std::size_t count = 0; - for (const TopicChunk& chunk : *chunks_) { - if (column_index_ >= chunk.columns.size()) { - continue; - } - for (std::size_t row = 0; row < chunk.stats.row_count; ++row) { - if (readSeriesValue(chunk, column_index_, row).has_value()) { - ++count; - } - } - } - return count; -} - -bool SeriesReader::empty() const { - return size() == 0; -} - -std::optional SeriesReader::sampleAt(std::size_t index) const { - std::size_t series_index = 0; - for (const TopicChunk& chunk : *chunks_) { - if (column_index_ >= chunk.columns.size()) { - continue; - } - for (std::size_t row = 0; row < chunk.stats.row_count; ++row) { - if (!readSeriesValue(chunk, column_index_, row).has_value()) { - continue; - } - if (series_index == index) { - return makeSeriesSample(chunk, column_index_, row); - } - ++series_index; - } - } - return std::nullopt; -} - -std::optional SeriesReader::indexAtOrBeforeTime(Timestamp t) const { - std::optional latest; - std::size_t series_index = 0; - for (const TopicChunk& chunk : *chunks_) { - if (chunk.stats.row_count == 0 || column_index_ >= chunk.columns.size()) { - continue; - } - if (chunk.stats.t_min > t) { - break; - } - for (std::size_t row = 0; row < chunk.stats.row_count; ++row) { - const Timestamp ts = 
chunk.readTimestamp(row); - if (ts > t) { - return latest; - } - if (readSeriesValue(chunk, column_index_, row).has_value()) { - latest = series_index; - ++series_index; - } - } - } - return latest; -} - -std::optional SeriesReader::indexAtOrAfterTime(Timestamp t) const { - std::size_t series_index = 0; - for (const TopicChunk& chunk : *chunks_) { - if (chunk.stats.row_count == 0 || column_index_ >= chunk.columns.size()) { - continue; - } - if (chunk.stats.t_max < t) { - for (std::size_t row = 0; row < chunk.stats.row_count; ++row) { - if (readSeriesValue(chunk, column_index_, row).has_value()) { - ++series_index; - } - } - continue; - } - for (std::size_t row = 0; row < chunk.stats.row_count; ++row) { - if (!readSeriesValue(chunk, column_index_, row).has_value()) { - continue; - } - if (chunk.readTimestamp(row) >= t) { - return series_index; - } - ++series_index; - } - } - return std::nullopt; -} - -std::optional SeriesReader::sampleAtOrBeforeTime(Timestamp t) const { - const auto index = indexAtOrBeforeTime(t); - return index.has_value() ? sampleAt(*index) : std::nullopt; -} - -std::optional SeriesReader::sampleAtOrAfterTime(Timestamp t) const { - const auto index = indexAtOrAfterTime(t); - return index.has_value() ? 
sampleAt(*index) : std::nullopt; -} - -SeriesCursor SeriesReader::samples(Range time_range) const { - return SeriesCursor(*chunks_, column_index_, time_range); -} - -std::optional SeriesReader::bounds() const { - return bounds(allTime()); -} - -std::optional SeriesReader::bounds(Range time_range) const { - SeriesBounds result; - bool found_time = false; - bool found_value = false; - auto cursor = samples(time_range); - cursor.forEach([&](const SeriesSample& sample) { - if (!found_time) { - result.time.min = sample.timestamp; - result.time.max = sample.timestamp; - found_time = true; - } else { - result.time.min = std::min(result.time.min, sample.timestamp); - result.time.max = std::max(result.time.max, sample.timestamp); - } - - if (std::isfinite(sample.value)) { - if (!found_value) { - result.value.min = sample.value; - result.value.max = sample.value; - found_value = true; - } else { - result.value.min = std::min(result.value.min, sample.value); - result.value.max = std::max(result.value.max, sample.value); - } - } - ++result.sample_count; - }); - - if (!found_time || !found_value) { - return std::nullopt; - } - return result; -} - -} // namespace PJ diff --git a/pj_datastore/src/reader.cpp b/pj_datastore/src/reader.cpp deleted file mode 100644 index 76279e53..00000000 --- a/pj_datastore/src/reader.cpp +++ /dev/null @@ -1,183 +0,0 @@ -// Copyright 2026 Davide Faconti -// SPDX-License-Identifier: MPL-2.0 - -#include "pj_datastore/reader.hpp" - -#include - -#include -#include -#include -#include -#include - -#include "pj_base/expected.hpp" -#include "pj_datastore/chunk.hpp" -#include "pj_datastore/engine.hpp" -#include "pj_datastore/query.hpp" -#include "pj_datastore/topic_storage.hpp" -#include "pj_datastore/type_registry.hpp" - -namespace PJ { -namespace { - -[[nodiscard]] bool isSeriesValueType(PrimitiveType type) noexcept { - switch (type) { - case PrimitiveType::kFloat32: - case PrimitiveType::kFloat64: - case PrimitiveType::kInt8: - case 
PrimitiveType::kInt16: - case PrimitiveType::kInt32: - case PrimitiveType::kInt64: - case PrimitiveType::kUint8: - case PrimitiveType::kUint16: - case PrimitiveType::kUint32: - case PrimitiveType::kUint64: - case PrimitiveType::kBool: - return true; - case PrimitiveType::kString: - case PrimitiveType::kUnspecified: - return false; - } - return false; -} - -void flattenColumns( - const TypeTreeNode& node, std::string_view prefix, FieldId& next_id, std::vector& out) { - const std::string path = prefix.empty() ? node.name : std::string(prefix) + "." + node.name; - - if (node.kind == TypeKind::kStruct) { - for (const auto& child : node.children) { - flattenColumns(*child, path, next_id, out); - } - return; - } - - if (node.kind == TypeKind::kArray) { - if (!node.element_type || !node.fixed_array_size.has_value()) { - return; - } - for (uint32_t i = 0; i < *node.fixed_array_size; ++i) { - const std::string element_path = path + "[" + std::to_string(i) + "]"; - if (node.element_type->kind == TypeKind::kStruct) { - for (const auto& child : node.element_type->children) { - flattenColumns(*child, element_path, next_id, out); - } - } else { - out.push_back( - ColumnDescriptor{ - .field_id = next_id++, - .logical_type = node.element_type->primitive_type.value_or(PrimitiveType::kFloat64), - .field_path = element_path, - }); - } - } - return; - } - - out.push_back( - ColumnDescriptor{ - .field_id = next_id++, - .logical_type = node.primitive_type.value_or(PrimitiveType::kFloat64), - .field_path = path, - }); -} - -[[nodiscard]] std::vector columnsForTopic(const DataEngine& engine, const TopicStorage& storage) { - if (!storage.columnDescriptors().empty()) { - return storage.columnDescriptors(); - } - - const auto& chunks = storage.sealedChunks(); - if (!chunks.empty()) { - std::vector columns; - columns.reserve(chunks.front().columns.size()); - for (const auto& column : chunks.front().columns) { - if (column.descriptor) { - columns.push_back(*column.descriptor); - } - } - return 
columns; - } - - const TypeTreeNode* type_tree = engine.typeRegistry().lookup(storage.descriptor().schema_id); - if (type_tree == nullptr) { - return {}; - } - - std::vector columns; - FieldId next_id = 0; - if (type_tree->kind == TypeKind::kStruct) { - for (const auto& child : type_tree->children) { - flattenColumns(*child, "", next_id, columns); - } - } else { - flattenColumns(*type_tree, "", next_id, columns); - } - return columns; -} - -} // namespace - -DataReader::DataReader(const DataEngine& engine) : engine_(engine) {} - -std::vector DataReader::listDatasets() const { - return engine_.listDatasets(); -} - -std::vector DataReader::listTopics(DatasetId dataset_id) const { - return engine_.listTopics(dataset_id); -} - -const TypeTreeNode* DataReader::getTypeTree(TopicId topic_id) const { - const TopicStorage* storage = engine_.getTopicStorage(topic_id); - if (storage == nullptr) { - return nullptr; - } - SchemaId schema_id = storage->descriptor().schema_id; - return engine_.typeRegistry().lookup(schema_id); -} - -std::optional DataReader::getMetadata(TopicId topic_id) const { - const TopicStorage* storage = engine_.getTopicStorage(topic_id); - if (storage == nullptr) { - return std::nullopt; - } - return storage->metadata(); -} - -Expected DataReader::rangeQuery(const QueryRange& range) const { - const TopicStorage* storage = engine_.getTopicStorage(range.topic_id); - if (storage == nullptr) { - return PJ::unexpected(fmt::format("Topic {} not found", range.topic_id)); - } - return PJ::rangeQuery(storage->sealedChunks(), range.t_min, range.t_max); -} - -PJ::Expected> DataReader::latestAt(const QueryPoint& point) const { - const TopicStorage* storage = engine_.getTopicStorage(point.topic_id); - if (storage == nullptr) { - return PJ::unexpected(fmt::format("Topic {} not found", point.topic_id)); - } - return PJ::latestAt(storage->sealedChunks(), point.t); -} - -Expected DataReader::series(TopicId topic_id, std::size_t column_index) const { - const TopicStorage* 
storage = engine_.getTopicStorage(topic_id); - if (storage == nullptr) { - return PJ::unexpected(fmt::format("Topic {} not found", topic_id)); - } - - const std::vector columns = columnsForTopic(engine_, *storage); - if (column_index >= columns.size()) { - return PJ::unexpected(fmt::format("Column {} not found in topic {}", column_index, topic_id)); - } - - if (!isSeriesValueType(columns[column_index].logical_type)) { - return PJ::unexpected(fmt::format("Column {} in topic {} is not a numeric series", column_index, topic_id)); - } - - return SeriesReader(storage->sealedChunks(), column_index); -} - -} // namespace PJ diff --git a/pj_datastore/src/topic_storage.cpp b/pj_datastore/src/topic_storage.cpp deleted file mode 100644 index 8d91c7b0..00000000 --- a/pj_datastore/src/topic_storage.cpp +++ /dev/null @@ -1,168 +0,0 @@ -// Copyright 2026 Davide Faconti -// SPDX-License-Identifier: MPL-2.0 - -#include "pj_datastore/topic_storage.hpp" - -#include - -#include -#include - -#include "pj_base/expected.hpp" - -namespace PJ { - -TopicStorage::TopicStorage(TopicId topic_id, TopicDescriptor descriptor) - : topic_id_(topic_id), descriptor_(std::move(descriptor)) {} - -PJ::Status TopicStorage::appendSealedChunk(TopicChunk chunk) { - if (!sealed_chunks_.empty() && chunk.stats.t_min < sealed_chunks_.back().stats.t_max) { - // Reject any chunk whose t_min overlaps with the previous chunk's time range. - // Using t_max (not t_min) as the boundary: a new chunk starting exactly at - // the previous t_max is allowed (equal-boundary chunks from normal chunking). - return PJ::unexpected( - fmt::format( - "Overlapping chunk: new t_min={} < last t_max={}", chunk.stats.t_min, sealed_chunks_.back().stats.t_max)); - } - sealed_chunks_.push_back(std::move(chunk)); - return PJ::okStatus(); -} - -void TopicStorage::evictBefore(Timestamp t_keep_min) { - // Chunks are commit-ordered: evict the contiguous prefix whose chunks are - // entirely older than t_keep_min. 
- size_t end_to_remove = 0; - while (end_to_remove < sealed_chunks_.size() && sealed_chunks_[end_to_remove].stats.t_max < t_keep_min) { - ++end_to_remove; - } - - if (end_to_remove > 0) { - sealed_chunks_.erase(sealed_chunks_.begin(), sealed_chunks_.begin() + static_cast(end_to_remove)); - } -} - -void TopicStorage::clearChunks() noexcept { - sealed_chunks_.clear(); -} - -void TopicStorage::setColumnDescriptors(std::vector descs) noexcept { - column_descriptors_ = std::move(descs); -} - -const std::vector& TopicStorage::columnDescriptors() const noexcept { - return column_descriptors_; -} - -const std::deque& TopicStorage::sealedChunks() const noexcept { - return sealed_chunks_; -} - -TopicMetadata TopicStorage::metadata() const { - TopicMetadata meta; - meta.topic_id = topic_id_; - meta.name = descriptor_.name; - meta.current_schema = descriptor_.schema_id; - meta.dataset_id = descriptor_.dataset_id; - - meta.max_observed_array_length = max_observed_array_length_; - meta.truncated_sample_count = truncated_sample_count_; - - if (sealed_chunks_.empty()) { - return meta; - } - - meta.time_range_min = sealed_chunks_.front().stats.t_min; - meta.time_range_max = sealed_chunks_.back().stats.t_max; - - for (const auto& chunk : sealed_chunks_) { - meta.total_row_count += chunk.stats.row_count; - - // Approximate byte size: sum encoded timestamp buffer + all encoded column buffers - meta.total_byte_size += chunk.timestamps.size() * sizeof(Timestamp); - for (const auto& col : chunk.columns) { - std::visit( - [&](const auto& v) { - using T = std::decay_t; - if constexpr (std::is_same_v) { - meta.total_byte_size += v.size(); - } else if constexpr (std::is_same_v) { - meta.total_byte_size += v.indices.size(); - for (const auto& s : v.dictionary) { - meta.total_byte_size += s.size(); - } - } else if constexpr (std::is_same_v) { - meta.total_byte_size += v.bits.size(); - } else if constexpr (std::is_same_v) { - meta.total_byte_size += v.value_size; - } else if constexpr 
(std::is_same_v) { - meta.total_byte_size += v.offsets.size(); - } - }, - col.data); - if (col.validity_bitmap) { - meta.total_byte_size += col.validity_bitmap->sizeBytes(); - } - } - } - - return meta; -} - -const TopicDescriptor& TopicStorage::descriptor() const noexcept { - return descriptor_; -} - -TopicId TopicStorage::topic_id() const noexcept { - return topic_id_; -} - -bool TopicStorage::empty() const noexcept { - return sealed_chunks_.empty(); -} - -Timestamp TopicStorage::time_min() const noexcept { - if (sealed_chunks_.empty()) { - return 0; - } - return sealed_chunks_.front().stats.t_min; -} - -Timestamp TopicStorage::time_max() const noexcept { - if (sealed_chunks_.empty()) { - return 0; - } - return sealed_chunks_.back().stats.t_max; -} - -void TopicStorage::updateSchema(SchemaId new_schema) { - descriptor_.schema_id = new_schema; -} - -void TopicStorage::updateMaxObservedArrayLength(uint32_t observed_length) { - if (observed_length > max_observed_array_length_) { - max_observed_array_length_ = observed_length; - } -} - -void TopicStorage::incrementTruncatedSampleCount() { - ++truncated_sample_count_; -} - -uint32_t TopicStorage::maxObservedArrayLength() const noexcept { - return max_observed_array_length_; -} - -uint32_t TopicStorage::truncatedSampleCount() const noexcept { - return truncated_sample_count_; -} - -uint32_t TopicStorage::arrayExpansionCount(const std::string& field_path) const noexcept { - auto it = array_expansion_counts_.find(field_path); - return it != array_expansion_counts_.end() ? 
it->second : 0; -} - -void TopicStorage::setArrayExpansionCount(const std::string& field_path, uint32_t count) { - array_expansion_counts_[field_path] = count; -} - -} // namespace PJ diff --git a/pj_datastore/src/type_registry.cpp b/pj_datastore/src/type_registry.cpp deleted file mode 100644 index 927d4b36..00000000 --- a/pj_datastore/src/type_registry.cpp +++ /dev/null @@ -1,151 +0,0 @@ -// Copyright 2026 Davide Faconti -// SPDX-License-Identifier: MPL-2.0 - -#include "pj_datastore/type_registry.hpp" - -#include -#include - -#include -#include -#include - -#include "pj_base/expected.hpp" - -namespace PJ { -namespace { - -// Flatten a type tree into leaf paths paired with their PrimitiveType. -// For primitives: uses primitive_type directly. -// For enums: uses the underlying primitive_type. -// For arrays: uses the element_type's primitive_type (if primitive/enum). -// For structs: recurses into children. -void flatten_leaf_types_impl( - const PJ::TypeTreeNode& node, std::string_view prefix, - std::vector>& out) { - std::string current_path = prefix.empty() ? node.name : fmt::format("{}.{}", prefix, node.name); - - switch (node.kind) { - case PJ::TypeKind::kPrimitive: - if (node.primitive_type.has_value()) { - out.emplace_back(std::move(current_path), *node.primitive_type); - } - return; - case PJ::TypeKind::kEnum: - if (node.primitive_type.has_value()) { - out.emplace_back(std::move(current_path), *node.primitive_type); - } - return; - case PJ::TypeKind::kArray: - // Treat the array itself as a leaf node with its element's type - if (node.element_type && node.element_type->primitive_type.has_value()) { - out.emplace_back(std::move(current_path), *node.element_type->primitive_type); - } - return; - case PJ::TypeKind::kStruct: - for (const auto& child : node.children) { - flatten_leaf_types_impl(*child, current_path, out); - } - return; - } -} - -// Flatten starting from root, skipping the root struct name (same convention -// as flatten_field_paths). 
-std::vector> flatten_leaf_types(const PJ::TypeTreeNode& root) { - std::vector> result; - if (root.kind != PJ::TypeKind::kStruct) { - if (root.primitive_type.has_value()) { - result.emplace_back(root.name, *root.primitive_type); - } - return result; - } - for (const auto& child : root.children) { - flatten_leaf_types_impl(*child, "", result); - } - return result; -} - -} // namespace - -struct TypeRegistry::Impl { - PJ::SchemaId next_id = 1; - tsl::robin_map> schemas; - tsl::robin_map name_to_id; -}; - -TypeRegistry::TypeRegistry() : impl_(std::make_unique()) {} -TypeRegistry::~TypeRegistry() = default; -TypeRegistry::TypeRegistry(TypeRegistry&&) noexcept = default; -TypeRegistry& TypeRegistry::operator=(TypeRegistry&&) noexcept = default; - -PJ::Expected TypeRegistry::registerSchema( - std::string schema_name, std::shared_ptr type_tree) { - if (impl_->name_to_id.contains(schema_name)) { - return PJ::unexpected(fmt::format("Schema '{}' already registered", schema_name)); - } - PJ::SchemaId id = impl_->next_id++; - impl_->name_to_id.emplace(schema_name, id); - impl_->schemas.emplace(id, std::move(type_tree)); - return id; -} - -PJ::Expected TypeRegistry::registerOrGet( - std::string schema_name, std::shared_ptr type_tree) { - auto it = impl_->name_to_id.find(schema_name); - if (it != impl_->name_to_id.end()) { - return it->second; - } - return registerSchema(std::move(schema_name), std::move(type_tree)); -} - -const PJ::TypeTreeNode* TypeRegistry::lookup(PJ::SchemaId id) const { - auto it = impl_->schemas.find(id); - if (it == impl_->schemas.end()) { - return nullptr; - } - return it->second.get(); -} - -std::optional TypeRegistry::findByName(std::string_view name) const { - auto it = impl_->name_to_id.find(std::string(name)); - if (it == impl_->name_to_id.end()) { - return std::nullopt; - } - return it->second; -} - -PJ::Status TypeRegistry::evolveSchema(PJ::SchemaId id, std::shared_ptr updated_tree) { - auto it = impl_->schemas.find(id); - if (it == 
impl_->schemas.end()) { - return PJ::unexpected(fmt::format("Schema ID {} not found", id)); - } - - const auto& old_tree = it->second; - auto old_leaves = flatten_leaf_types(*old_tree); - auto new_leaves = flatten_leaf_types(*updated_tree); - - // Build a map from path -> PrimitiveType for the new tree - tsl::robin_map new_leaf_map; - new_leaf_map.reserve(new_leaves.size()); - for (auto& [path, ptype] : new_leaves) { - new_leaf_map.emplace(std::move(path), ptype); - } - - // Every old leaf must exist in the new tree with the same type - for (const auto& [old_path, old_type] : old_leaves) { - auto new_it = new_leaf_map.find(old_path); - if (new_it == new_leaf_map.end()) { - return PJ::unexpected(fmt::format("Field '{}' was removed in the updated schema", old_path)); - } - if (new_it->second != old_type) { - return PJ::unexpected(fmt::format("Field '{}' changed type in the updated schema", old_path)); - } - } - - // Validation passed — replace with updated tree - it.value() = std::move(updated_tree); - return PJ::okStatus(); -} - -} // namespace PJ diff --git a/pj_datastore/src/writer.cpp b/pj_datastore/src/writer.cpp deleted file mode 100644 index b42ee91f..00000000 --- a/pj_datastore/src/writer.cpp +++ /dev/null @@ -1,804 +0,0 @@ -// Copyright 2026 Davide Faconti -// SPDX-License-Identifier: MPL-2.0 - -#include "pj_datastore/writer.hpp" - -#include -#include - -#include -#include -#include -#include -#include -#include -#include - -#include "pj_base/assert.hpp" -#include "pj_base/expected.hpp" -#include "pj_base/type_tree.hpp" -#include "pj_base/types.hpp" -#include "pj_datastore/chunk.hpp" -#include "pj_datastore/column_buffer.hpp" -#include "pj_datastore/engine.hpp" -#include "pj_datastore/topic_storage.hpp" -#include "pj_datastore/type_registry.hpp" - -namespace PJ { - -// --------------------------------------------------------------------------- -// Impl definition -// --------------------------------------------------------------------------- - -struct 
DataWriter::Impl { - explicit Impl(DataEngine& eng) : engine(eng) {} - DataEngine& engine; - tsl::robin_map builders; - tsl::robin_map> pending_chunks; - tsl::robin_map> topic_columns; -}; - -// --------------------------------------------------------------------------- -// ColumnData methods -// --------------------------------------------------------------------------- - -namespace { - -template -struct overloaded : Ts... { - using Ts::operator()...; -}; -template -overloaded(Ts...) -> overloaded; - -} // namespace - -std::size_t ColumnData::rowCount() const { - return std::visit( - overloaded{ - [](const StringData& s) -> std::size_t { return s.offsets.empty() ? 0 : s.offsets.size() - 1; }, - [](const auto& span) -> std::size_t { return span.size(); }, - }, - data); -} - -StorageKind ColumnData::kind() const { - static constexpr StorageKind kinds[] = { - StorageKind::kFloat32, StorageKind::kFloat64, StorageKind::kInt32, StorageKind::kInt64, - StorageKind::kUint64, StorageKind::kBool, StorageKind::kString, - }; - return kinds[data.index()]; -} - -namespace { - -/// Map NumericType to PrimitiveType (same enum values, different enum types). 
-constexpr PrimitiveType numeric_to_primitive(NumericType nt) noexcept { - switch (nt) { - case NumericType::kFloat32: - return PrimitiveType::kFloat32; - case NumericType::kFloat64: - return PrimitiveType::kFloat64; - case NumericType::kInt8: - return PrimitiveType::kInt8; - case NumericType::kInt16: - return PrimitiveType::kInt16; - case NumericType::kInt32: - return PrimitiveType::kInt32; - case NumericType::kInt64: - return PrimitiveType::kInt64; - case NumericType::kUint8: - return PrimitiveType::kUint8; - case NumericType::kUint16: - return PrimitiveType::kUint16; - case NumericType::kUint32: - return PrimitiveType::kUint32; - case NumericType::kUint64: - return PrimitiveType::kUint64; - } - return PrimitiveType::kFloat64; // unreachable -} - -// Forward declarations — flatten_columns_impl and flatten_array_element_impl are mutually recursive. -void flatten_columns_impl( - const TypeTreeNode& node, std::string_view prefix, FieldId& next_field_id, std::vector& out); - -// Expand one array element (at path `element_path`, e.g., "poses[0]") into ColumnDescriptors. -// Handles: struct element (recurse into children), primitive/enum element (single column). -void flatten_array_element_impl( - const TypeTreeNode& element_type, std::string_view element_path, FieldId& next_field_id, - std::vector& out) { - if (element_type.kind == TypeKind::kStruct) { - for (const auto& child : element_type.children) { - flatten_columns_impl(*child, element_path, next_field_id, out); - } - } else { - ColumnDescriptor desc; - desc.field_id = next_field_id++; - desc.field_path = std::string(element_path); - desc.logical_type = element_type.primitive_type.value_or(PrimitiveType::kFloat64); - out.push_back(std::move(desc)); - } -} - -/// Recursively flatten a type tree into ColumnDescriptors, collecting both -/// field paths and PrimitiveTypes for each leaf node. 
-void flatten_columns_impl( - const TypeTreeNode& node, std::string_view prefix, FieldId& next_field_id, std::vector& out) { - std::string current_path = prefix.empty() ? node.name : fmt::format("{}.{}", prefix, node.name); - - if (node.kind == TypeKind::kStruct) { - for (const auto& child : node.children) { - flatten_columns_impl(*child, current_path, next_field_id, out); - } - return; - } - - if (node.kind == TypeKind::kArray) { - if (node.fixed_array_size.has_value()) { - // Fixed-size: expand all elements now at schema registration time - for (uint32_t i = 0; i < *node.fixed_array_size; ++i) { - std::string elem_path = fmt::format("{}[{}]", current_path, i); - flatten_array_element_impl(*node.element_type, elem_path, next_field_id, out); - } - } - // Variable-length: 0 columns initially — caller uses expandArray() to grow dynamically - return; - } - - // Leaf node (primitive or enum) -- produce a column descriptor - ColumnDescriptor desc; - desc.field_id = next_field_id++; - desc.field_path = std::move(current_path); - desc.logical_type = node.primitive_type.value_or(PrimitiveType::kFloat64); - out.push_back(std::move(desc)); -} - -// Find a TypeTreeNode child by dotted path relative to root's children. -// E.g., find_child_at_path(root, "body.poses") returns the "poses" node inside "body". -// Returns nullptr if any segment is not found or path passes through a non-struct. -const TypeTreeNode* find_child_at_path(const TypeTreeNode& root, std::string_view path) { - std::string_view remaining = path; - const TypeTreeNode* cur = &root; - while (!remaining.empty()) { - size_t dot = remaining.find('.'); - std::string_view segment = remaining.substr(0, dot); - remaining = (dot == std::string_view::npos) ? 
std::string_view{} : remaining.substr(dot + 1); - if (cur->kind != TypeKind::kStruct) { - return nullptr; - } - const TypeTreeNode* found = nullptr; - for (const auto& child : cur->children) { - if (child->name == segment) { - found = child.get(); - break; - } - } - if (!found) { - return nullptr; - } - cur = found; - } - return cur; -} - -} // namespace - -// --------------------------------------------------------------------------- -// Construction / destruction / move -// --------------------------------------------------------------------------- - -DataWriter::DataWriter(DataEngine& engine) : impl_(std::make_unique(engine)) {} - -DataWriter::~DataWriter() = default; -DataWriter::DataWriter(DataWriter&&) noexcept = default; -DataWriter& DataWriter::operator=(DataWriter&&) noexcept = default; - -// --------------------------------------------------------------------------- -// Schema registration -// --------------------------------------------------------------------------- - -Expected DataWriter::registerSchema(std::string schema_name, std::shared_ptr type_tree) { - return impl_->engine.typeRegistry().registerSchema(std::move(schema_name), std::move(type_tree)); -} - -// --------------------------------------------------------------------------- -// Topic registration -// --------------------------------------------------------------------------- - -Expected DataWriter::registerTopic(DatasetId dataset_id, TopicDescriptor descriptor) { - return impl_->engine.createTopic(dataset_id, std::move(descriptor)); -} - -// --------------------------------------------------------------------------- -// Bind for fast-path access -// --------------------------------------------------------------------------- - -Expected DataWriter::bindTopicWriter(TopicId topic_id) { - const auto* storage = impl_->engine.getTopicStorage(topic_id); - if (storage == nullptr) { - return PJ::unexpected(fmt::format("Topic {} not found", topic_id)); - } - - // Ensure column descriptors are 
cached - auto& builder = getOrCreateBuilder(topic_id); - (void)builder; // we just need the side effect of caching columns - - const auto& columns = impl_->topic_columns.at(topic_id); - TopicWriteHandle handle; - handle.topic_id = topic_id; - handle.field_ids.reserve(columns.size()); - for (const auto& col : columns) { - handle.field_ids.push_back(col.field_id); - } - return handle; -} - -// --------------------------------------------------------------------------- -// Field resolution -// --------------------------------------------------------------------------- - -Expected DataWriter::resolveField(TopicId topic_id, std::string_view field_path) { - // Ensure columns are cached by getting or creating the builder - auto& builder = getOrCreateBuilder(topic_id); - (void)builder; - - auto col_it = impl_->topic_columns.find(topic_id); - if (col_it == impl_->topic_columns.end()) { - return PJ::unexpected(fmt::format("Topic {} not found", topic_id)); - } - - for (const auto& col : col_it->second) { - if (col.field_path == field_path) { - return col.field_id; - } - } - return PJ::unexpected(fmt::format("Field '{}' not found in topic {}", field_path, topic_id)); -} - -// --------------------------------------------------------------------------- -// Row-at-a-time append -// --------------------------------------------------------------------------- - -PJ::Status DataWriter::beginRow(TopicId topic_id, Timestamp t) { - auto* storage = impl_->engine.getTopicStorage(topic_id); - if (storage == nullptr) { - return PJ::unexpected(fmt::format("Topic {} not found", topic_id)); - } - auto& builder = getOrCreateBuilder(topic_id); - if (builder.rowCount() > 0 && t < builder.lastTimestamp()) { - return PJ::unexpected(fmt::format("Out-of-order timestamp: t={} < last_timestamp={}", t, builder.lastTimestamp())); - } - builder.beginRow(t); - return PJ::okStatus(); -} - -PJ::Status DataWriter::finishRow(PJ::TopicId topic_id) { - auto it = impl_->builders.find(topic_id); - if (it == 
impl_->builders.end()) { - return PJ::unexpected(fmt::format("finish_row: no active row for topic {}", topic_id)); - } - it.value().finishRow(); - - if (it->second.isFull()) { - autoSeal(topic_id); - } - return PJ::okStatus(); -} - -// --------------------------------------------------------------------------- -// Set values — templatized -// --------------------------------------------------------------------------- - -template -void DataWriter::set(TopicId topic_id, std::size_t col_index, T value) { - auto it = impl_->builders.find(topic_id); - PJ_ASSERT(it != impl_->builders.end(), "set: no builder for topic"); - if (it != impl_->builders.end()) { - it.value().set(col_index, value); - } -} - -template void DataWriter::set(TopicId, std::size_t, float); -template void DataWriter::set(TopicId, std::size_t, double); -template void DataWriter::set(TopicId, std::size_t, int32_t); -template void DataWriter::set(TopicId, std::size_t, int64_t); -template void DataWriter::set(TopicId, std::size_t, uint64_t); -template void DataWriter::set(TopicId, std::size_t, bool); -template void DataWriter::set(TopicId, std::size_t, std::string_view); - -void DataWriter::setNull(TopicId topic_id, std::size_t col_index) { - auto it = impl_->builders.find(topic_id); - PJ_ASSERT(it != impl_->builders.end(), "set_null: no builder for topic"); - if (it != impl_->builders.end()) { - it.value().setNull(col_index); - } -} - -// --------------------------------------------------------------------------- -// Bulk column append -// --------------------------------------------------------------------------- - -namespace { - -void append_single_column_to_builder( - TopicChunkBuilder& builder, const ColumnData& col, std::size_t offset, std::size_t batch_size) { - std::visit( - overloaded{ - [&](Span d) { builder.appendColumn(col.col_index, d.subspan(offset, batch_size)); }, - [&](Span d) { builder.appendColumn(col.col_index, d.subspan(offset, batch_size)); }, - [&](Span d) { 
builder.appendColumn(col.col_index, d.subspan(offset, batch_size)); }, - [&](Span d) { builder.appendColumn(col.col_index, d.subspan(offset, batch_size)); }, - [&](Span d) { builder.appendColumn(col.col_index, d.subspan(offset, batch_size)); }, - [&](Span d) { builder.appendColumn(col.col_index, d.subspan(offset, batch_size)); }, - [&](const ColumnData::StringData& s) { - builder.appendColumnStrings(col.col_index, s.offsets.subspan(offset, batch_size + 1), s.values); - }, - }, - col.data); - - // Apply validity bitmap if present - if (!col.validity.empty()) { - builder.appendColumnValidity(col.col_index, col.validity.subspan(offset, batch_size)); - } -} - -} // namespace - -PJ::Status DataWriter::appendColumns( - TopicId topic_id, Span timestamps, Span columns) { - auto* storage = impl_->engine.getTopicStorage(topic_id); - if (storage == nullptr) { - return PJ::unexpected(fmt::format("Topic {} not found", topic_id)); - } - - // Validate all column row counts match timestamp count - for (const auto& col : columns) { - const std::size_t n = col.rowCount(); - if (n != timestamps.size()) { - return PJ::unexpected( - fmt::format("Column {} has {} rows but {} timestamps provided", col.col_index, n, timestamps.size())); - } - - if (!col.validity.empty()) { - if (col.validity.bit_length != n) { - return PJ::unexpected(fmt::format("Column {} validity bit_length mismatch", col.col_index)); - } - const std::size_t available_bits = col.validity.bytes.size() * 8; - if (col.validity.bit_offset + col.validity.bit_length > available_bits) { - return PJ::unexpected(fmt::format("Column {} validity range out of bounds", col.col_index)); - } - } - } - - if (timestamps.empty()) { - return PJ::okStatus(); - } - - // Validate timestamp ordering - auto& builder = getOrCreateBuilder(topic_id); - if (builder.rowCount() > 0 && timestamps[0] < builder.lastTimestamp()) { - return PJ::unexpected( - fmt::format("Out-of-order timestamp: t={} < last_timestamp={}", timestamps[0], 
builder.lastTimestamp())); - } - - std::size_t offset = 0; - const std::size_t total = timestamps.size(); - - while (offset < total) { - auto& b = getOrCreateBuilder(topic_id); - const std::size_t batch_size = std::min(total - offset, static_cast(b.remainingCapacity())); - - b.appendTimestamps(timestamps.subspan(offset, batch_size)); - for (const auto& col : columns) { - append_single_column_to_builder(b, col, offset, batch_size); - } - b.finishBulkAppend(); - - if (b.isFull()) { - autoSeal(topic_id); - } - - offset += batch_size; - } - - return PJ::okStatus(); -} - -// --------------------------------------------------------------------------- -// Scalar convenience API -// --------------------------------------------------------------------------- - -Expected DataWriter::registerScalarSeries( - DatasetId dataset_id, std::string_view topic_name, NumericType value_type) { - // Create a topic descriptor for a scalar series (schema_id = 0) - TopicDescriptor desc; - desc.name = std::string(topic_name); - desc.schema_id = 0; - - auto topic_id_or = impl_->engine.createTopic(dataset_id, std::move(desc)); - if (!topic_id_or.has_value()) { - return PJ::unexpected(topic_id_or.error()); - } - TopicId topic_id = *topic_id_or; - - // Build a single column descriptor for the "value" field - ColumnDescriptor col_desc; - col_desc.field_id = 0; - col_desc.logical_type = numeric_to_primitive(value_type); - col_desc.field_path = "value"; - - std::vector columns; - columns.push_back(std::move(col_desc)); - impl_->topic_columns[topic_id] = columns; - - // Persist the column layout in TopicStorage so fresh writers and the derived - // engine can resolve it without requiring a committed (sealed) chunk. 
- if (auto* storage = impl_->engine.getTopicStorage(topic_id)) { - storage->setColumnDescriptors(std::move(columns)); - } - - ScalarSeriesHandle handle{topic_id, 0}; - return handle; -} - -void DataWriter::appendScalar(const ScalarSeriesHandle& handle, Timestamp t, NumericValue value) { - auto& builder = getOrCreateBuilder(handle.topic_id); - PJ_ASSERT(builder.rowCount() == 0 || t >= builder.lastTimestamp(), "append_scalar: out-of-order timestamp"); - builder.beginRow(t); - - const auto col = static_cast(handle.value_field); - std::visit( - [&builder, col](const auto& v) { - using T = std::decay_t; - if constexpr (std::is_same_v) { - builder.set(col, v); - } else if constexpr (std::is_same_v) { - builder.set(col, v); - } else if constexpr (std::is_same_v) { - builder.set(col, v); - } else if constexpr (std::is_same_v || std::is_same_v || std::is_same_v) { - builder.set(col, static_cast(v)); - } else if constexpr ( - std::is_same_v || std::is_same_v || std::is_same_v || - std::is_same_v) { - builder.set(col, static_cast(v)); - } - }, - value); - - builder.finishRow(); - - if (builder.isFull()) { - autoSeal(handle.topic_id); - } -} - -// --------------------------------------------------------------------------- -// Flush -// --------------------------------------------------------------------------- - -std::vector DataWriter::flush(TopicId topic_id) { - std::vector result; - - // Collect any pending (auto-sealed) chunks - auto pending_it = impl_->pending_chunks.find(topic_id); - if (pending_it != impl_->pending_chunks.end()) { - result = std::move(pending_it.value()); - impl_->pending_chunks.erase(pending_it); - } - - // Seal the current builder if it has rows - auto builder_it = impl_->builders.find(topic_id); - if (builder_it != impl_->builders.end() && builder_it->second.rowCount() > 0) { - result.push_back(builder_it.value().seal()); - impl_->builders.erase(builder_it); - } - - return result; -} - -std::vector> DataWriter::flushAll() { - std::vector> result; - - 
// Collect all pending chunks - for (auto it = impl_->pending_chunks.begin(); it != impl_->pending_chunks.end(); ++it) { - for (auto& chunk : it.value()) { - result.emplace_back(it->first, std::move(chunk)); - } - } - impl_->pending_chunks.clear(); - - // Seal all non-empty builders - // Collect topic IDs first to avoid modifying map during iteration - std::vector builder_ids; - builder_ids.reserve(impl_->builders.size()); - for (auto it = impl_->builders.begin(); it != impl_->builders.end(); ++it) { - if (it->second.rowCount() > 0) { - builder_ids.push_back(it->first); - } - } - for (TopicId topic_id : builder_ids) { - auto it = impl_->builders.find(topic_id); - if (it != impl_->builders.end()) { - result.emplace_back(topic_id, it.value().seal()); - impl_->builders.erase(it); - } - } - - return result; -} - -// --------------------------------------------------------------------------- -// Dynamic column addition -// --------------------------------------------------------------------------- - -Expected DataWriter::ensureColumn(TopicId topic_id, std::string_view field_path, PrimitiveType type) { - auto* storage = impl_->engine.getTopicStorage(topic_id); - if (!storage) { - return PJ::unexpected(fmt::format("ensure_column: topic {} not found", topic_id)); - } - - ensureColsLoaded(topic_id, *storage); - auto& cols = impl_->topic_columns[topic_id]; - - // No-op: column already exists — return existing field id. - // Type mismatch is an error: caller must not re-register with a different type. 
- for (const auto& col : cols) { - if (col.field_path == field_path) { - if (col.logical_type != type) { - return PJ::unexpected( - fmt::format("ensure_column: field '{}' already exists with a different type", field_path)); - } - return col.field_id; - } - } - - // Guard: no row in progress - auto builder_it = impl_->builders.find(topic_id); - if (builder_it != impl_->builders.end() && builder_it->second.isRowInProgress()) { - return PJ::unexpected( - fmt::format( - "ensure_column: topic {} has a row in progress; call finishRow() before adding new columns", topic_id)); - } - - // Seal the current builder (if any) before changing the column layout. - sealBeforeLayoutChange(topic_id); - - // Append new column (field ids are always dense starting at 0 — assert the invariant) - PJ_ASSERT( - cols.empty() || cols.back().field_id == static_cast(cols.size() - 1), - "ensure_column: field_id invariant broken — non-dense column ids detected"); - FieldId new_id = static_cast(cols.size()); - ColumnDescriptor desc; - desc.field_id = new_id; - desc.logical_type = type; - desc.field_path = std::string(field_path); - cols.push_back(std::move(desc)); - - storage->setColumnDescriptors(cols); - return new_id; -} - -// --------------------------------------------------------------------------- -// Variable-length array expansion -// --------------------------------------------------------------------------- - -PJ::Expected DataWriter::expandArray( - PJ::TopicId topic_id, std::string_view array_field_path, uint32_t new_length, PJ::PrimitiveType element_type) { - // Validate topic exists - TopicStorage* storage = impl_->engine.getTopicStorage(topic_id); - if (!storage) { - return PJ::unexpected(fmt::format("expand_array: topic {} not found", topic_id)); - } - - // Track the largest observed array length for metadata - storage->updateMaxObservedArrayLength(new_length); - - // Read authoritative expansion count from TopicStorage — shared across all DataWriter instances. 
- std::string path_key(array_field_path); - const uint32_t current = storage->arrayExpansionCount(path_key); - - // Fast no-op - if (new_length <= current) { - return current; - } - - // Get type tree — may be null for schemaless topics (schema_id == 0). - SchemaId schema_id = storage->descriptor().schema_id; - const TypeTreeNode* type_tree = impl_->engine.typeRegistry().lookup(schema_id); - - // Typed topics: validate the array field against the schema before touching any state. - const TypeTreeNode* array_node = nullptr; - if (type_tree) { - array_node = find_child_at_path(*type_tree, array_field_path); - if (!array_node) { - return PJ::unexpected(fmt::format("expand_array: field '{}' not found in schema", array_field_path)); - } - if (array_node->kind != TypeKind::kArray) { - return PJ::unexpected(fmt::format("expand_array: field '{}' is not an array node", array_field_path)); - } - if (array_node->fixed_array_size.has_value()) { - return PJ::unexpected( - fmt::format("expand_array: field '{}' is fixed-size; use schema declaration", array_field_path)); - } - } - - // Apply expansion limit (clamp and record truncation) — common to both paths. - uint32_t limit = storage->descriptor().array_expansion_limit; - uint32_t actual = std::min(new_length, limit); - if (new_length > limit) { - storage->incrementTruncatedSampleCount(); - } - if (actual <= current) { - return current; - } - - // Reject expansion if a row is currently in progress (between begin_row and finish_row). - auto builder_it = impl_->builders.find(topic_id); - if (builder_it != impl_->builders.end() && builder_it->second.isRowInProgress()) { - return PJ::unexpected( - fmt::format( - "expand_array: topic {}" - " has a row in progress; call finishRow() or abandon the row before calling expandArray()", - topic_id)); - } - - // Seal and stage the current builder (if any) before changing the column layout. - sealBeforeLayoutChange(topic_id); - - // Load current column descriptor list for this topic. 
- ensureColsLoaded(topic_id, *storage); - auto& cols = impl_->topic_columns[topic_id]; - - if (!type_tree) { - // Schemaless path: any field path is accepted; use element_type for new columns. - PJ_ASSERT( - cols.empty() || cols.back().field_id == static_cast(cols.size() - 1), - "expand_array: field_id invariant broken — non-dense column ids detected"); - FieldId next_field_id = static_cast(cols.size()); - for (uint32_t i = current; i < actual; ++i) { - std::string elem_path = fmt::format("{}[{}]", array_field_path, i); - // Idempotent: skip if already present (e.g. added via ensure_column) - bool already_exists = false; - for (const auto& col : cols) { - if (col.field_path == elem_path) { - already_exists = true; - break; - } - } - if (!already_exists) { - ColumnDescriptor desc; - desc.field_id = next_field_id++; - desc.logical_type = element_type; - desc.field_path = std::move(elem_path); - cols.push_back(std::move(desc)); - } - } - } else { - // Typed path: generate columns from the schema element type. - // Use a per-index existence check (same as schemaless) so that columns - // manually added via ensure_column are not duplicated. - PJ_ASSERT( - cols.empty() || cols.back().field_id == static_cast(cols.size() - 1), - "expand_array: field_id invariant broken — non-dense column ids detected"); - FieldId next_field_id = static_cast(cols.size()); - for (uint32_t i = current; i < actual; ++i) { - std::string elem_prefix = fmt::format("{}[{}]", array_field_path, i); - // Check for both exact match (primitive element) and prefix match (struct element fields). 
- bool already_exists = false; - for (const auto& col : cols) { - if (col.field_path == elem_prefix || col.field_path.starts_with(elem_prefix + ".")) { - already_exists = true; - break; - } - } - if (!already_exists) { - flatten_array_element_impl(*array_node->element_type, elem_prefix, next_field_id, cols); - } - } - } - - // Persist updated layout and expansion count in TopicStorage - storage->setColumnDescriptors(cols); - storage->setArrayExpansionCount(path_key, actual); - - return actual; -} - -// --------------------------------------------------------------------------- -// Private helpers -// --------------------------------------------------------------------------- - -void DataWriter::ensureColsLoaded(TopicId topic_id, const TopicStorage& storage) { - auto& cols = impl_->topic_columns[topic_id]; - if (!cols.empty()) { - return; - } - // Always prefer the layout persisted in TopicStorage when it is non-empty. - // expandArray() / ensureColumn() call storage->setColumnDescriptors() to record the - // current (potentially grown) column layout. A second DataWriter created - // after an expansion must see the expanded layout, not a stale rebuild. - const auto& stored = storage.columnDescriptors(); - if (!stored.empty()) { - cols = stored; - return; - } - const auto* type_tree = impl_->engine.typeRegistry().lookup(storage.descriptor().schema_id); - if (type_tree) { - cols = buildColumnDescriptors(*type_tree); - return; - } - // schema_id==0 with no stored layout: fall back to first committed chunk. 
- const auto& chunks = storage.sealedChunks(); - if (!chunks.empty()) { - cols.reserve(chunks[0].columns.size()); - for (const auto& col : chunks[0].columns) { - cols.push_back(*col.descriptor); - } - } - // else: stays empty — valid for brand-new schemaless topic -} - -TopicChunkBuilder& DataWriter::getOrCreateBuilder(TopicId topic_id) { - auto it = impl_->builders.find(topic_id); - if (it != impl_->builders.end()) { - return it.value(); - } - - const auto* storage = impl_->engine.getTopicStorage(topic_id); - PJ_ASSERT(storage != nullptr, "get_or_create_builder: topic storage not found"); - - const auto& desc = storage->descriptor(); - uint32_t max_rows = desc.max_chunk_rows; - - ensureColsLoaded(topic_id, *storage); - auto col_it = impl_->topic_columns.find(topic_id); - - auto [insert_it, inserted] = impl_->builders.emplace( - std::piecewise_construct, std::forward_as_tuple(topic_id), - std::forward_as_tuple(topic_id, desc.schema_id, col_it->second, max_rows)); - - return insert_it.value(); -} - -std::vector DataWriter::buildColumnDescriptors(const TypeTreeNode& root) { - std::vector result; - FieldId next_id = 0; - - if (root.kind != TypeKind::kStruct) { - ColumnDescriptor desc; - desc.field_id = next_id++; - desc.field_path = root.name; - desc.logical_type = root.primitive_type.value_or(PrimitiveType::kFloat64); - result.push_back(std::move(desc)); - return result; - } - - for (const auto& child : root.children) { - flatten_columns_impl(*child, "", next_id, result); - } - return result; -} - -void DataWriter::autoSeal(TopicId topic_id) { - auto it = impl_->builders.find(topic_id); - if (it == impl_->builders.end()) { - return; - } - impl_->pending_chunks[topic_id].push_back(it.value().seal()); - impl_->builders.erase(it); -} - -void DataWriter::sealBeforeLayoutChange(TopicId topic_id) { - auto it = impl_->builders.find(topic_id); - if (it == impl_->builders.end()) { - return; - } - if (it->second.rowCount() > 0) { - 
impl_->pending_chunks[topic_id].push_back(it.value().seal()); - } - impl_->builders.erase(it); -} - -} // namespace PJ diff --git a/pj_datastore/tests/array_expansion_test.cpp b/pj_datastore/tests/array_expansion_test.cpp deleted file mode 100644 index 15dd185c..00000000 --- a/pj_datastore/tests/array_expansion_test.cpp +++ /dev/null @@ -1,1276 +0,0 @@ -// Copyright 2026 Davide Faconti -// SPDX-License-Identifier: MPL-2.0 - -#include - -#include "pj_base/dataset.hpp" -#include "pj_base/type_tree.hpp" -#include "pj_datastore/engine.hpp" -#include "pj_datastore/query.hpp" -#include "pj_datastore/reader.hpp" -#include "pj_datastore/writer.hpp" - -using namespace PJ; - -namespace { - -// ───────────────────────────────────────────────────────────────── -// Task 1: flatten_columns_impl for fixed-size and variable arrays -// ───────────────────────────────────────────────────────────────── - -TEST(ArrayExpansionTest, FixedSizeArray_Primitive_ProducesNColumns) { - // Schema: struct msg { float32[3] accel } - // Expected: 3 columns named accel[0], accel[1], accel[2] - DataEngine engine; - DatasetId ds = *engine.createDataset(PJ::DatasetDescriptor{.source_name = "t"}); - DataWriter writer = engine.createWriter(); - - auto accel = PJ::makeArray("accel", PJ::makePrimitive("", PJ::PrimitiveType::kFloat32), 3u); - auto root = PJ::makeStruct("msg", {accel}); - auto sid = *writer.registerSchema("msg_fixed", root); - - TopicDescriptor desc; - desc.name = "imu"; - desc.schema_id = sid; - auto topic_id = *writer.registerTopic(ds, desc); - - auto handle = *writer.bindTopicWriter(topic_id); - ASSERT_EQ(handle.field_ids.size(), 3u); - - EXPECT_EQ(*writer.resolveField(topic_id, "accel[0]"), 0u); - EXPECT_EQ(*writer.resolveField(topic_id, "accel[1]"), 1u); - EXPECT_EQ(*writer.resolveField(topic_id, "accel[2]"), 2u); -} - -TEST(ArrayExpansionTest, FixedSizeArray_StructElement_ProducesNxMColumns) { - // Schema: struct msg { Pose[2] poses } where Pose = struct { float32 x, y, z } - // 
Expected: 6 columns: poses[0].x, poses[0].y, poses[0].z, poses[1].x, poses[1].y, poses[1].z - DataEngine engine; - DatasetId ds = *engine.createDataset(PJ::DatasetDescriptor{.source_name = "t"}); - DataWriter writer = engine.createWriter(); - - // Element type: anonymous struct with x, y, z - auto pose_elem = PJ::makeStruct( - "", { - PJ::makePrimitive("x", PJ::PrimitiveType::kFloat32), - PJ::makePrimitive("y", PJ::PrimitiveType::kFloat32), - PJ::makePrimitive("z", PJ::PrimitiveType::kFloat32), - }); - auto poses = PJ::makeArray("poses", pose_elem, 2u); - auto root = PJ::makeStruct("msg", {poses}); - auto sid = *writer.registerSchema("msg_struct_arr", root); - - TopicDescriptor desc; - desc.name = "poses_topic"; - desc.schema_id = sid; - auto topic_id = *writer.registerTopic(ds, desc); - - auto handle = *writer.bindTopicWriter(topic_id); - ASSERT_EQ(handle.field_ids.size(), 6u); - - EXPECT_EQ(*writer.resolveField(topic_id, "poses[0].x"), 0u); - EXPECT_EQ(*writer.resolveField(topic_id, "poses[0].y"), 1u); - EXPECT_EQ(*writer.resolveField(topic_id, "poses[0].z"), 2u); - EXPECT_EQ(*writer.resolveField(topic_id, "poses[1].x"), 3u); - EXPECT_EQ(*writer.resolveField(topic_id, "poses[1].y"), 4u); - EXPECT_EQ(*writer.resolveField(topic_id, "poses[1].z"), 5u); -} - -TEST(ArrayExpansionTest, VarLenArray_InitiallyZeroColumns) { - // Schema: struct msg { float64[] data } - // Variable-length: no fixed_size → 0 columns until expandArray() is called - DataEngine engine; - DatasetId ds = *engine.createDataset(PJ::DatasetDescriptor{.source_name = "t"}); - DataWriter writer = engine.createWriter(); - - auto data_arr = PJ::makeArray("data", PJ::makePrimitive("", PJ::PrimitiveType::kFloat64), std::nullopt); - auto root = PJ::makeStruct("msg", {data_arr}); - auto sid = *writer.registerSchema("msg_varlen", root); - - TopicDescriptor desc; - desc.name = "varlen_topic"; - desc.schema_id = sid; - auto topic_id = *writer.registerTopic(ds, desc); - - auto handle = 
*writer.bindTopicWriter(topic_id); - EXPECT_EQ(handle.field_ids.size(), 0u); // 0 columns initially -} - -TEST(ArrayExpansionTest, FixedSizeArray_WriteAndRead) { - // Schema: struct msg { float32[3] accel } - // Write 4 rows. Read back all values via range_query. - DataEngine engine; - DatasetId ds = *engine.createDataset(PJ::DatasetDescriptor{.source_name = "t"}); - DataWriter writer = engine.createWriter(); - - auto accel = PJ::makeArray("accel", PJ::makePrimitive("", PJ::PrimitiveType::kFloat32), 3u); - auto root = PJ::makeStruct("msg", {accel}); - auto sid = *writer.registerSchema("msg_wr", root); - - TopicDescriptor desc; - desc.name = "imu_wr"; - desc.schema_id = sid; - auto topic_id = *writer.registerTopic(ds, desc); - - for (int i = 0; i < 4; ++i) { - ASSERT_TRUE(writer.beginRow(topic_id, PJ::Timestamp(i) * 1000).has_value()); - writer.set(topic_id, 0, static_cast<float>(i) * 1.0f); - writer.set(topic_id, 1, static_cast<float>(i) * 2.0f); - writer.set(topic_id, 2, static_cast<float>(i) * 3.0f); - ASSERT_TRUE(writer.finishRow(topic_id).has_value()); - } - engine.commitChunks(writer.flushAll()); - - DataReader reader = engine.createReader(); - auto cursor = *reader.rangeQuery(QueryRange{.topic_id = topic_id, .t_min = 0, .t_max = 4000}); - std::size_t count = 0; - cursor.forEach([&](const SampleRow& row) { - EXPECT_NEAR(row.chunk->readNumericAsDouble(0, row.row_index), count * 1.0, 1e-4); - EXPECT_NEAR(row.chunk->readNumericAsDouble(1, row.row_index), count * 2.0, 1e-4); - EXPECT_NEAR(row.chunk->readNumericAsDouble(2, row.row_index), count * 3.0, 1e-4); - ++count; - }); - EXPECT_EQ(count, 4u); -} - -// ───────────────────────────────────────────────────────────────── -// Task 2: DataWriter::expand_array -// ───────────────────────────────────────────────────────────────── - -TEST(ArrayExpansionTest, VarLenArray_ExpandArray_AddsColumns) { - // expandArray("data", 3) on a float64[] topic → 3 columns: data[0], data[1], data[2] - DataEngine engine; - DatasetId ds =
*engine.createDataset(PJ::DatasetDescriptor{.source_name = "t"}); - DataWriter writer = engine.createWriter(); - - auto data_arr = PJ::makeArray("data", PJ::makePrimitive("", PJ::PrimitiveType::kFloat64), std::nullopt); - auto root = PJ::makeStruct("msg", {data_arr}); - auto sid = *writer.registerSchema("exp_test", root); - - TopicDescriptor desc; - desc.name = "exp_topic"; - desc.schema_id = sid; - auto topic_id = *writer.registerTopic(ds, desc); - - // Initially 0 columns - EXPECT_EQ(writer.bindTopicWriter(topic_id)->field_ids.size(), 0u); - - // Expand to 3 - auto result = writer.expandArray(topic_id, "data", 3u); - ASSERT_TRUE(result.has_value()) << result.error(); - EXPECT_EQ(*result, 3u); - - // Now 3 columns - auto handle = *writer.bindTopicWriter(topic_id); - ASSERT_EQ(handle.field_ids.size(), 3u); - EXPECT_EQ(*writer.resolveField(topic_id, "data[0]"), 0u); - EXPECT_EQ(*writer.resolveField(topic_id, "data[1]"), 1u); - EXPECT_EQ(*writer.resolveField(topic_id, "data[2]"), 2u); -} - -TEST(ArrayExpansionTest, VarLenArray_ExpandIsIdempotent_ShrinkIsNoop) { - // expand(3) then expand(2) → no-op; still 3 columns - DataEngine engine; - DatasetId ds = *engine.createDataset(PJ::DatasetDescriptor{.source_name = "t"}); - DataWriter writer = engine.createWriter(); - - auto data_arr = PJ::makeArray("data", PJ::makePrimitive("", PJ::PrimitiveType::kFloat64), std::nullopt); - auto root = PJ::makeStruct("msg", {data_arr}); - auto sid = *writer.registerSchema("idem_test", root); - TopicDescriptor desc; - desc.name = "idem_topic"; - desc.schema_id = sid; - auto topic_id = *writer.registerTopic(ds, desc); - - ASSERT_TRUE(writer.expandArray(topic_id, "data", 3u).has_value()); - - // Try to shrink: must be no-op, return current count (3) - auto result = writer.expandArray(topic_id, "data", 2u); - ASSERT_TRUE(result.has_value()); - EXPECT_EQ(*result, 3u); - - EXPECT_EQ(writer.bindTopicWriter(topic_id)->field_ids.size(), 3u); -} - -TEST(ArrayExpansionTest, 
VarLenArray_WriteAndRead_BasicValues) { - // expand(3), write 4 rows with all 3 elements, read back values - DataEngine engine; - DatasetId ds = *engine.createDataset(PJ::DatasetDescriptor{.source_name = "t"}); - DataWriter writer = engine.createWriter(); - - auto data_arr = PJ::makeArray("data", PJ::makePrimitive("", PJ::PrimitiveType::kFloat64), std::nullopt); - auto root = PJ::makeStruct("msg", {data_arr}); - auto sid = *writer.registerSchema("varlen_wr", root); - TopicDescriptor desc; - desc.name = "varlen_wr"; - desc.schema_id = sid; - auto topic_id = *writer.registerTopic(ds, desc); - - ASSERT_TRUE(writer.expandArray(topic_id, "data", 3u).has_value()); - - for (int i = 0; i < 4; ++i) { - ASSERT_TRUE(writer.beginRow(topic_id, PJ::Timestamp(i) * 1000).has_value()); - writer.set(topic_id, 0, i * 1.0); - writer.set(topic_id, 1, i * 2.0); - writer.set(topic_id, 2, i * 3.0); - ASSERT_TRUE(writer.finishRow(topic_id).has_value()); - } - engine.commitChunks(writer.flushAll()); - - DataReader reader = engine.createReader(); - auto cursor = *reader.rangeQuery(QueryRange{.topic_id = topic_id, .t_min = 0, .t_max = 4000}); - std::size_t count = 0; - cursor.forEach([&](const SampleRow& row) { - EXPECT_NEAR(row.chunk->readNumericAsDouble(0, row.row_index), count * 1.0, 1e-9); - EXPECT_NEAR(row.chunk->readNumericAsDouble(1, row.row_index), count * 2.0, 1e-9); - EXPECT_NEAR(row.chunk->readNumericAsDouble(2, row.row_index), count * 3.0, 1e-9); - ++count; - }); - EXPECT_EQ(count, 4u); -} - -TEST(ArrayExpansionTest, VarLenArray_ExpandUnknownTopic_ReturnsError) { - DataEngine engine; - DataWriter writer = engine.createWriter(); - auto result = writer.expandArray(/*topic_id=*/999u, "data", 3u); - EXPECT_FALSE(result.has_value()); -} - -TEST(ArrayExpansionTest, VarLenArray_ExpandNonArrayField_ReturnsError) { - // Schema: struct { float64 value } — "value" is not an array - DataEngine engine; - DatasetId ds = *engine.createDataset(PJ::DatasetDescriptor{.source_name = "t"}); - 
DataWriter writer = engine.createWriter(); - - auto root = PJ::makeStruct("msg", {PJ::makePrimitive("value", PJ::PrimitiveType::kFloat64)}); - auto sid = *writer.registerSchema("non_arr", root); - TopicDescriptor desc; - desc.name = "non_arr_topic"; - desc.schema_id = sid; - auto topic_id = *writer.registerTopic(ds, desc); - - auto result = writer.expandArray(topic_id, "value", 3u); - EXPECT_FALSE(result.has_value()); -} - -// ───────────────────────────────────────────────────────────────── -// Task 3: array_expansion_limit and metadata tracking -// ───────────────────────────────────────────────────────────────── - -TEST(ArrayExpansionTest, ExpansionLimit_ClampsColumns) { - // array_expansion_limit = 4; expand to 10 → actual = 4 - DataEngine engine; - DatasetId ds = *engine.createDataset(PJ::DatasetDescriptor{.source_name = "t"}); - DataWriter writer = engine.createWriter(); - - auto data_arr = PJ::makeArray("data", PJ::makePrimitive("", PJ::PrimitiveType::kFloat64), std::nullopt); - auto root = PJ::makeStruct("msg", {data_arr}); - auto sid = *writer.registerSchema("limit_test", root); - - TopicDescriptor desc; - desc.name = "limited"; - desc.schema_id = sid; - desc.array_expansion_limit = 4; - auto topic_id = *writer.registerTopic(ds, desc); - - auto result = writer.expandArray(topic_id, "data", 10u); - ASSERT_TRUE(result.has_value()) << result.error(); - EXPECT_EQ(*result, 4u); // clamped to limit - - EXPECT_EQ(writer.bindTopicWriter(topic_id)->field_ids.size(), 4u); -} - -TEST(ArrayExpansionTest, MaxObservedArrayLength_TrackedInMetadata) { - DataEngine engine; - DatasetId ds = *engine.createDataset(PJ::DatasetDescriptor{.source_name = "t"}); - DataWriter writer = engine.createWriter(); - - auto data_arr = PJ::makeArray("data", PJ::makePrimitive("", PJ::PrimitiveType::kFloat64), std::nullopt); - auto root = PJ::makeStruct("msg", {data_arr}); - auto sid = *writer.registerSchema("obs_test", root); - TopicDescriptor desc; - desc.name = "observed"; - desc.schema_id 
= sid; - desc.array_expansion_limit = 10; - auto topic_id = *writer.registerTopic(ds, desc); - - ASSERT_TRUE(writer.expandArray(topic_id, "data", 3u).has_value()); - - const TopicStorage* storage = engine.getTopicStorage(topic_id); - ASSERT_NE(storage, nullptr); - EXPECT_EQ(storage->maxObservedArrayLength(), 3u); - - ASSERT_TRUE(writer.expandArray(topic_id, "data", 5u).has_value()); - EXPECT_EQ(storage->maxObservedArrayLength(), 5u); - - ASSERT_TRUE(writer.expandArray(topic_id, "data", 8u).has_value()); - EXPECT_EQ(storage->maxObservedArrayLength(), 8u); -} - -TEST(ArrayExpansionTest, TruncatedSampleCount_TrackedOnClamping) { - // array_expansion_limit = 3. expand(10) → 1 truncation. expand(2) → no truncation. - DataEngine engine; - DatasetId ds = *engine.createDataset(PJ::DatasetDescriptor{.source_name = "t"}); - DataWriter writer = engine.createWriter(); - - auto data_arr = PJ::makeArray("data", PJ::makePrimitive("", PJ::PrimitiveType::kFloat64), std::nullopt); - auto root = PJ::makeStruct("msg", {data_arr}); - auto sid = *writer.registerSchema("trunc_test", root); - TopicDescriptor desc; - desc.name = "truncated"; - desc.schema_id = sid; - desc.array_expansion_limit = 3; - auto topic_id = *writer.registerTopic(ds, desc); - - const TopicStorage* storage = engine.getTopicStorage(topic_id); - - // expand to 10 exceeds limit=3 → 1 truncation - ASSERT_TRUE(writer.expandArray(topic_id, "data", 10u).has_value()); - EXPECT_EQ(storage->truncatedSampleCount(), 1u); - - // expand to 2 — no-op (current=3 >= 2); no new truncation - ASSERT_TRUE(writer.expandArray(topic_id, "data", 2u).has_value()); - EXPECT_EQ(storage->truncatedSampleCount(), 1u); - - // expand to 20 — actual=3 (same as current), but still truncated (20 > limit) - ASSERT_TRUE(writer.expandArray(topic_id, "data", 20u).has_value()); - EXPECT_EQ(storage->truncatedSampleCount(), 2u); -} - -TEST(ArrayExpansionTest, Metadata_ExposedViaTopicMetadata) { - DataEngine engine; - DatasetId ds = 
*engine.createDataset(PJ::DatasetDescriptor{.source_name = "t"}); - DataWriter writer = engine.createWriter(); - - auto data_arr = PJ::makeArray("data", PJ::makePrimitive("", PJ::PrimitiveType::kFloat64), std::nullopt); - auto root = PJ::makeStruct("msg", {data_arr}); - auto sid = *writer.registerSchema("meta_test", root); - TopicDescriptor desc; - desc.name = "meta_topic"; - desc.schema_id = sid; - desc.array_expansion_limit = 4; - auto topic_id = *writer.registerTopic(ds, desc); - - ASSERT_TRUE(writer.expandArray(topic_id, "data", 10u).has_value()); // clamped to 4; 1 truncation; max_observed=10 - - ASSERT_TRUE(writer.beginRow(topic_id, 1000).has_value()); - writer.set(topic_id, 0, 1.0); - writer.set(topic_id, 1, 2.0); - writer.set(topic_id, 2, 3.0); - writer.set(topic_id, 3, 4.0); - ASSERT_TRUE(writer.finishRow(topic_id).has_value()); - engine.commitChunks(writer.flushAll()); - - DataReader reader = engine.createReader(); - auto meta = reader.getMetadata(topic_id); - ASSERT_TRUE(meta.has_value()); - EXPECT_EQ(meta->max_observed_array_length, 10u); - EXPECT_EQ(meta->truncated_sample_count, 1u); -} - -// ───────────────────────────────────────────────────────────────── -// Task 4: Cross-chunk expansion and null auto-fill -// ───────────────────────────────────────────────────────────────── - -TEST(ArrayExpansionTest, CrossChunkExpansion_OldChunksHaveFewerColumns) { - DataEngine engine; - DatasetId ds = *engine.createDataset(PJ::DatasetDescriptor{.source_name = "t"}); - DataWriter writer = engine.createWriter(); - - auto data_arr = PJ::makeArray("data", PJ::makePrimitive("", PJ::PrimitiveType::kFloat64), std::nullopt); - auto root = PJ::makeStruct("msg", {data_arr}); - auto sid = *writer.registerSchema("cross_chunk", root); - TopicDescriptor desc; - desc.name = "cc"; - desc.schema_id = sid; - desc.max_chunk_rows = 8; - auto topic_id = *writer.registerTopic(ds, desc); - - // Phase 1: expand to 2, write 10 rows (8 auto-sealed + 2 in builder) - 
ASSERT_TRUE(writer.expandArray(topic_id, "data", 2u).has_value()); - for (int i = 0; i < 10; ++i) { - ASSERT_TRUE(writer.beginRow(topic_id, PJ::Timestamp(i) * 1000).has_value()); - writer.set(topic_id, 0, i * 1.0); - writer.set(topic_id, 1, i * 2.0); - ASSERT_TRUE(writer.finishRow(topic_id).has_value()); - } - - // Phase 2: expand to 4 — seals the in-progress builder (rows 8–9), creates new 4-column layout - ASSERT_TRUE(writer.expandArray(topic_id, "data", 4u).has_value()); - - // Write 5 more rows with all 4 elements - for (int i = 10; i < 15; ++i) { - ASSERT_TRUE(writer.beginRow(topic_id, PJ::Timestamp(i) * 1000).has_value()); - writer.set(topic_id, 0, i * 1.0); - writer.set(topic_id, 1, i * 2.0); - writer.set(topic_id, 2, i * 3.0); - writer.set(topic_id, 3, i * 4.0); - ASSERT_TRUE(writer.finishRow(topic_id).has_value()); - } - engine.commitChunks(writer.flushAll()); - - // Verify chunk column counts - const TopicStorage* storage = engine.getTopicStorage(topic_id); - ASSERT_NE(storage, nullptr); - for (const auto& chunk : storage->sealedChunks()) { - if (chunk.stats.t_max < PJ::Timestamp(10) * 1000) { - EXPECT_EQ(chunk.columns.size(), 2u) << "pre-expansion chunk must have 2 columns"; - } else { - EXPECT_EQ(chunk.columns.size(), 4u) << "post-expansion chunk must have 4 columns"; - } - } - - // Range query: all 15 rows accessible - DataReader reader = engine.createReader(); - auto cursor = *reader.rangeQuery(QueryRange{.topic_id = topic_id, .t_min = 0, .t_max = 15000}); - std::size_t count = 0; - cursor.forEach([&](const SampleRow& row) { - int i = static_cast<int>(count); - EXPECT_NEAR(row.chunk->readNumericAsDouble(0, row.row_index), i * 1.0, 1e-9); - EXPECT_NEAR(row.chunk->readNumericAsDouble(1, row.row_index), i * 2.0, 1e-9); - if (i >= 10) { - EXPECT_NEAR(row.chunk->readNumericAsDouble(2, row.row_index), i * 3.0, 1e-9); - EXPECT_NEAR(row.chunk->readNumericAsDouble(3, row.row_index), i * 4.0, 1e-9); - } - ++count; - }); - EXPECT_EQ(count, 15u); -} -
-TEST(ArrayExpansionTest, UnsetElementsAutoNullFilled) { - // expand(4), write rows setting only elements [0] and [1]. - // Elements [2] and [3] must be null (auto-filled by finish_row). - DataEngine engine; - DatasetId ds = *engine.createDataset(PJ::DatasetDescriptor{.source_name = "t"}); - DataWriter writer = engine.createWriter(); - - auto data_arr = PJ::makeArray("data", PJ::makePrimitive("", PJ::PrimitiveType::kFloat64), std::nullopt); - auto root = PJ::makeStruct("msg", {data_arr}); - auto sid = *writer.registerSchema("partial_set", root); - TopicDescriptor desc; - desc.name = "partial"; - desc.schema_id = sid; - auto topic_id = *writer.registerTopic(ds, desc); - - ASSERT_TRUE(writer.expandArray(topic_id, "data", 4u).has_value()); - - // Write row with only first 2 elements; cols 2 and 3 are unset - ASSERT_TRUE(writer.beginRow(topic_id, 1000).has_value()); - writer.set(topic_id, 0, 1.0); - writer.set(topic_id, 1, 2.0); - // col 2 and col 3 intentionally not set - ASSERT_TRUE(writer.finishRow(topic_id).has_value()); - engine.commitChunks(writer.flushAll()); - - DataReader reader = engine.createReader(); - auto cursor = *reader.rangeQuery(QueryRange{.topic_id = topic_id, .t_min = 0, .t_max = 2000}); - bool visited = false; - cursor.forEach([&](const SampleRow& row) { - EXPECT_NEAR(row.chunk->readNumericAsDouble(0, row.row_index), 1.0, 1e-9); - EXPECT_NEAR(row.chunk->readNumericAsDouble(1, row.row_index), 2.0, 1e-9); - EXPECT_TRUE(row.chunk->isNull(2, row.row_index)); - EXPECT_TRUE(row.chunk->isNull(3, row.row_index)); - visited = true; - }); - EXPECT_TRUE(visited); -} - -TEST(ArrayExpansionTest, VarLenStructArray_ExpandAndWrite) { - // Schema: struct msg { Pose[] poses } where Pose = struct { float32 x, y } - // expandArray("poses", 2) → 4 columns: poses[0].x, poses[0].y, poses[1].x, poses[1].y - DataEngine engine; - DatasetId ds = *engine.createDataset(PJ::DatasetDescriptor{.source_name = "t"}); - DataWriter writer = engine.createWriter(); - - auto pose_elem = 
PJ::makeStruct( - "", { - PJ::makePrimitive("x", PJ::PrimitiveType::kFloat32), - PJ::makePrimitive("y", PJ::PrimitiveType::kFloat32), - }); - auto poses_arr = PJ::makeArray("poses", pose_elem, std::nullopt); - auto root = PJ::makeStruct("msg", {poses_arr}); - auto sid = *writer.registerSchema("struct_arr_var", root); - - TopicDescriptor desc; - desc.name = "poses"; - desc.schema_id = sid; - auto topic_id = *writer.registerTopic(ds, desc); - - // Initially 0 columns - EXPECT_EQ(writer.bindTopicWriter(topic_id)->field_ids.size(), 0u); - - // Expand to 2 Pose elements → 4 columns - auto result = writer.expandArray(topic_id, "poses", 2u); - ASSERT_TRUE(result.has_value()) << result.error(); - EXPECT_EQ(*result, 2u); - - auto handle = *writer.bindTopicWriter(topic_id); - ASSERT_EQ(handle.field_ids.size(), 4u); - EXPECT_EQ(*writer.resolveField(topic_id, "poses[0].x"), 0u); - EXPECT_EQ(*writer.resolveField(topic_id, "poses[0].y"), 1u); - EXPECT_EQ(*writer.resolveField(topic_id, "poses[1].x"), 2u); - EXPECT_EQ(*writer.resolveField(topic_id, "poses[1].y"), 3u); - - // Write and read back - ASSERT_TRUE(writer.beginRow(topic_id, 1000).has_value()); - writer.set(topic_id, 0, 1.0f); - writer.set(topic_id, 1, 2.0f); - writer.set(topic_id, 2, 3.0f); - writer.set(topic_id, 3, 4.0f); - ASSERT_TRUE(writer.finishRow(topic_id).has_value()); - engine.commitChunks(writer.flushAll()); - - DataReader reader = engine.createReader(); - auto cursor = *reader.rangeQuery(QueryRange{.topic_id = topic_id, .t_min = 0, .t_max = 2000}); - bool visited = false; - cursor.forEach([&](const SampleRow& row) { - EXPECT_NEAR(row.chunk->readNumericAsDouble(0, row.row_index), 1.0, 1e-4); - EXPECT_NEAR(row.chunk->readNumericAsDouble(1, row.row_index), 2.0, 1e-4); - EXPECT_NEAR(row.chunk->readNumericAsDouble(2, row.row_index), 3.0, 1e-4); - EXPECT_NEAR(row.chunk->readNumericAsDouble(3, row.row_index), 4.0, 1e-4); - visited = true; - }); - EXPECT_TRUE(visited); -} - -// 
───────────────────────────────────────────────────────────────── -// Bug 1: Second DataWriter for a typed variable-length array topic -// ignores the expanded layout stored in TopicStorage. -// -// Root cause: get_or_create_builder checks `type_tree != nullptr` first. -// For typed topics (schema_id != 0), it always rebuilds from the type tree -// via buildColumnDescriptors(), which yields 0 columns for variable-length -// arrays. The `else` branch that reads storage->columnDescriptors() is -// only reachable when schema_id == 0, so the expanded layout persisted by -// Writer A via setColumnDescriptors() is silently ignored by Writer B. -// ───────────────────────────────────────────────────────────────── - -TEST(ArrayExpansionTest, SecondWriter_PicksUpExpandedLayout) { - // Writer A expands data[] to 3 elements and writes one row. - // Writer B is created fresh for the same engine/topic. - // Writer B must see 3 columns — not 0. - DataEngine engine; - DatasetId ds = *engine.createDataset(PJ::DatasetDescriptor{.source_name = "t"}); - - // Writer A: register schema + topic, expand and write - DataWriter writerA = engine.createWriter(); - auto data_arr = PJ::makeArray("data", PJ::makePrimitive("", PJ::PrimitiveType::kFloat64), std::nullopt); - auto root = PJ::makeStruct("msg", {data_arr}); - auto sid = *writerA.registerSchema("second_writer_schema", root); - - TopicDescriptor desc; - desc.name = "second_writer_topic"; - desc.schema_id = sid; - auto topic_id = *writerA.registerTopic(ds, desc); - - auto expand_result = writerA.expandArray(topic_id, "data", 3u); - ASSERT_TRUE(expand_result.has_value()) << expand_result.error(); - ASSERT_EQ(*expand_result, 3u); - - ASSERT_TRUE(writerA.beginRow(topic_id, 1000).has_value()); - writerA.set(topic_id, 0, 10.0); - writerA.set(topic_id, 1, 20.0); - writerA.set(topic_id, 2, 30.0); - ASSERT_TRUE(writerA.finishRow(topic_id).has_value()); - engine.commitChunks(writerA.flushAll()); - - // Writer B: a brand new DataWriter on the 
same engine/topic. - // It must see the 3-column layout that Writer A established. - DataWriter writerB = engine.createWriter(); - - // Verify column count via bind_topic_writer - auto handle = writerB.bindTopicWriter(topic_id); - ASSERT_TRUE(handle.has_value()) << handle.error(); - // BUG: This will be 0 instead of 3 before the fix. - EXPECT_EQ(handle->field_ids.size(), 3u) << "Writer B must inherit the 3-column layout expanded by Writer A"; - - // Also verify field resolution works correctly - EXPECT_EQ(*writerB.resolveField(topic_id, "data[0]"), 0u); - EXPECT_EQ(*writerB.resolveField(topic_id, "data[1]"), 1u); - EXPECT_EQ(*writerB.resolveField(topic_id, "data[2]"), 2u); -} - -// ───────────────────────────────────────────────────────────────── -// Bug 2: expandArray() called while a row is in progress (between -// begin_row and finish_row) originally silently discarded the -// in-progress row. -// -// Root cause: expand_array checked `builder.rowCount() > 0` before -// sealing. rowCount() counts only *completed* rows (incremented in -// finish_row). An active row started with beginRow() but not yet -// finished had rowCount() == 0, so the builder was erased without -// sealing. The incomplete row was permanently lost with no error. -// -// Fixed behavior: expand_array now returns an error when called with -// a row in progress, and the previously completed rows remain intact -// in the pending chunks. -// ───────────────────────────────────────────────────────────────── - -TEST(ArrayExpansionTest, ExpandArray_WhileRowInProgress_ReturnsError) { - // Write two complete rows, begin a third row, then call expand_array. - // The expand_array call must fail with an error (row in progress). - // The two previously completed rows must still be accessible after flush. 
- DataEngine engine; - DatasetId ds = *engine.createDataset(PJ::DatasetDescriptor{.source_name = "t"}); - DataWriter writer = engine.createWriter(); - - auto data_arr = PJ::makeArray("data", PJ::makePrimitive("", PJ::PrimitiveType::kFloat64), std::nullopt); - auto root = PJ::makeStruct("msg", {data_arr}); - auto sid = *writer.registerSchema("in_progress_schema", root); - - TopicDescriptor desc; - desc.name = "in_progress_topic"; - desc.schema_id = sid; - auto topic_id = *writer.registerTopic(ds, desc); - - // First expand so we have 2 columns to write. - auto init_expand = writer.expandArray(topic_id, "data", 2u); - ASSERT_TRUE(init_expand.has_value()) << init_expand.error(); - - // Row 1 (completed) - ASSERT_TRUE(writer.beginRow(topic_id, 1000).has_value()); - writer.set(topic_id, 0, 1.0); - writer.set(topic_id, 1, 2.0); - ASSERT_TRUE(writer.finishRow(topic_id).has_value()); - - // Row 2 (completed) - ASSERT_TRUE(writer.beginRow(topic_id, 2000).has_value()); - writer.set(topic_id, 0, 3.0); - writer.set(topic_id, 1, 4.0); - ASSERT_TRUE(writer.finishRow(topic_id).has_value()); - - // Row 3: begin_row but do NOT call finish_row before expand_array. - // expand_array must return an error — it must not silently drop the row. 
- ASSERT_TRUE(writer.beginRow(topic_id, 3000).has_value()); - writer.set(topic_id, 0, 5.0); - writer.set(topic_id, 1, 6.0); - auto expand_result = writer.expandArray(topic_id, "data", 3u); - EXPECT_FALSE(expand_result.has_value()) - << "expand_array must fail when a row is in progress; got success unexpectedly"; - - // Finish the in-progress row, then flush - ASSERT_TRUE(writer.finishRow(topic_id).has_value()); - engine.commitChunks(writer.flushAll()); - - // All 3 rows must be readable — the failed expand did not corrupt data - DataReader reader = engine.createReader(); - auto cursor = *reader.rangeQuery(QueryRange{.topic_id = topic_id, .t_min = 0, .t_max = 4000}); - std::size_t count = 0; - cursor.forEach([&](const SampleRow&) { ++count; }); - EXPECT_EQ(count, 3u) << "All 3 rows must survive; the failed expand must not corrupt the builder"; -} - -// ───────────────────────────────────────────────────────────────── -// Bug 3: Cross-writer re-expansion of the same array field can -// duplicate existing indices. -// -// Root cause: expandArray() tracks current expansion in -// DataWriter::expanded_arrays_, which is per-writer state. A fresh -// DataWriter sees the persisted expanded columns via TopicStorage, but -// its expanded_arrays_ entry starts at 0. Expanding "data" from 3 to 5 -// can incorrectly append data[0], data[1], data[2] again, producing -// 8 columns instead of 5. -// ───────────────────────────────────────────────────────────────── - -TEST(ArrayExpansionTest, SecondWriter_ReExpandSameField_DoesNotDuplicateColumns) { - DataEngine engine; - DatasetId ds = *engine.createDataset(PJ::DatasetDescriptor{.source_name = "t"}); - - // Writer A creates topic and expands data[] to 3. 
- DataWriter writerA = engine.createWriter(); - auto data_arr = PJ::makeArray("data", PJ::makePrimitive("", PJ::PrimitiveType::kFloat64), std::nullopt); - auto root = PJ::makeStruct("msg", {data_arr}); - auto sid = *writerA.registerSchema("reexpand_schema", root); - - TopicDescriptor desc; - desc.name = "reexpand_topic"; - desc.schema_id = sid; - auto topic_id = *writerA.registerTopic(ds, desc); - - auto r1 = writerA.expandArray(topic_id, "data", 3u); - ASSERT_TRUE(r1.has_value()) << r1.error(); - ASSERT_EQ(*r1, 3u); - - // Writer B starts fresh, then expands the same field to 5. - DataWriter writerB = engine.createWriter(); - auto before = writerB.bindTopicWriter(topic_id); - ASSERT_TRUE(before.has_value()) << before.error(); - ASSERT_EQ(before->field_ids.size(), 3u); - - auto r2 = writerB.expandArray(topic_id, "data", 5u); - ASSERT_TRUE(r2.has_value()) << r2.error(); - EXPECT_EQ(*r2, 5u); - - // Expected layout is exactly data[0..4] => 5 columns total. - auto after = writerB.bindTopicWriter(topic_id); - ASSERT_TRUE(after.has_value()) << after.error(); - EXPECT_EQ(after->field_ids.size(), 5u) - << "Re-expanding in a new writer must append only new indices [3..4], not duplicate [0..2]"; - EXPECT_EQ(*writerB.resolveField(topic_id, "data[0]"), 0u); - EXPECT_EQ(*writerB.resolveField(topic_id, "data[1]"), 1u); - EXPECT_EQ(*writerB.resolveField(topic_id, "data[2]"), 2u); - EXPECT_EQ(*writerB.resolveField(topic_id, "data[3]"), 3u); - EXPECT_EQ(*writerB.resolveField(topic_id, "data[4]"), 4u); -} - -// ───────────────────────────────────────────────────────────────── -// Correctness lock-in: mixed-schema topic (scalar + variable-length -// array) — verify column ordering and field IDs after expansion. -// -// This was not tested in the original suite. 
It verifies that when a -// schema has pre-existing scalar columns, expand_array appends the new -// array element columns at the correct positions with the correct -// FieldIds, and that the scalar column is still readable after the -// layout change. -// ───────────────────────────────────────────────────────────────── - -TEST(ArrayExpansionTest, MixedSchema_ScalarPlusVarLenArray_CorrectColumnOrder) { - // Schema: struct msg { float64 timestamp_sec; float64[] values } - // After expandArray("values", 3): - // col 0 → timestamp_sec (field_id 0) - // col 1 → values[0] (field_id 1) - // col 2 → values[1] (field_id 2) - // col 3 → values[2] (field_id 3) - DataEngine engine; - DatasetId ds = *engine.createDataset(PJ::DatasetDescriptor{.source_name = "t"}); - DataWriter writer = engine.createWriter(); - - auto ts_field = PJ::makePrimitive("timestamp_sec", PJ::PrimitiveType::kFloat64); - auto values_arr = PJ::makeArray("values", PJ::makePrimitive("", PJ::PrimitiveType::kFloat64), std::nullopt); - auto root = PJ::makeStruct("msg", {ts_field, values_arr}); - auto sid = *writer.registerSchema("mixed_schema", root); - - TopicDescriptor desc; - desc.name = "mixed_topic"; - desc.schema_id = sid; - auto topic_id = *writer.registerTopic(ds, desc); - - // Before expansion: 1 scalar column only - auto handle_before = *writer.bindTopicWriter(topic_id); - ASSERT_EQ(handle_before.field_ids.size(), 1u); - EXPECT_EQ(*writer.resolveField(topic_id, "timestamp_sec"), 0u); - - // Expand values[] to 3 elements - auto result = writer.expandArray(topic_id, "values", 3u); - ASSERT_TRUE(result.has_value()) << result.error(); - EXPECT_EQ(*result, 3u); - - // Now 4 columns in total - auto handle_after = *writer.bindTopicWriter(topic_id); - ASSERT_EQ(handle_after.field_ids.size(), 4u); - - // Verify field path resolution - EXPECT_EQ(*writer.resolveField(topic_id, "timestamp_sec"), 0u); - EXPECT_EQ(*writer.resolveField(topic_id, "values[0]"), 1u); - EXPECT_EQ(*writer.resolveField(topic_id, 
"values[1]"), 2u); - EXPECT_EQ(*writer.resolveField(topic_id, "values[2]"), 3u); - - // Write and read back to confirm data integrity - ASSERT_TRUE(writer.beginRow(topic_id, 1000).has_value()); - writer.set(topic_id, 0, 100.0); // timestamp_sec - writer.set(topic_id, 1, 1.0); // values[0] - writer.set(topic_id, 2, 2.0); // values[1] - writer.set(topic_id, 3, 3.0); // values[2] - ASSERT_TRUE(writer.finishRow(topic_id).has_value()); - engine.commitChunks(writer.flushAll()); - - DataReader reader = engine.createReader(); - auto cursor = *reader.rangeQuery(QueryRange{.topic_id = topic_id, .t_min = 0, .t_max = 2000}); - bool visited = false; - cursor.forEach([&](const SampleRow& row) { - EXPECT_NEAR(row.chunk->readNumericAsDouble(0, row.row_index), 100.0, 1e-9); - EXPECT_NEAR(row.chunk->readNumericAsDouble(1, row.row_index), 1.0, 1e-9); - EXPECT_NEAR(row.chunk->readNumericAsDouble(2, row.row_index), 2.0, 1e-9); - EXPECT_NEAR(row.chunk->readNumericAsDouble(3, row.row_index), 3.0, 1e-9); - visited = true; - }); - EXPECT_TRUE(visited); -} - -// ───────────────────────────────────────────────────────────────── -// Correctness lock-in: two independent expansions of different fields -// on the same topic. -// -// Ensures that expand_array for a second distinct field appends -// correctly after the columns of the first expansion, producing the -// right field_id sequence and readable data. -// ───────────────────────────────────────────────────────────────── - -TEST(ArrayExpansionTest, TwoDistinctArrayFields_ExpandBoth_CorrectLayout) { - // Schema: struct msg { float64[] positions; float64[] velocities } - // Expand positions to 2, then velocities to 3. 
- // Expected columns: - // positions[0] (id 0), positions[1] (id 1), - // velocities[0] (id 2), velocities[1] (id 3), velocities[2] (id 4) - DataEngine engine; - DatasetId ds = *engine.createDataset(PJ::DatasetDescriptor{.source_name = "t"}); - DataWriter writer = engine.createWriter(); - - auto pos_arr = PJ::makeArray("positions", PJ::makePrimitive("", PJ::PrimitiveType::kFloat64), std::nullopt); - auto vel_arr = PJ::makeArray("velocities", PJ::makePrimitive("", PJ::PrimitiveType::kFloat64), std::nullopt); - auto root = PJ::makeStruct("msg", {pos_arr, vel_arr}); - auto sid = *writer.registerSchema("two_arrays_schema", root); - - TopicDescriptor desc; - desc.name = "two_arrays_topic"; - desc.schema_id = sid; - auto topic_id = *writer.registerTopic(ds, desc); - - // Initially 0 columns - EXPECT_EQ(writer.bindTopicWriter(topic_id)->field_ids.size(), 0u); - - // Expand positions to 2 - auto r1 = writer.expandArray(topic_id, "positions", 2u); - ASSERT_TRUE(r1.has_value()) << r1.error(); - EXPECT_EQ(*r1, 2u); - EXPECT_EQ(writer.bindTopicWriter(topic_id)->field_ids.size(), 2u); - - // Expand velocities to 3 - auto r2 = writer.expandArray(topic_id, "velocities", 3u); - ASSERT_TRUE(r2.has_value()) << r2.error(); - EXPECT_EQ(*r2, 3u); - EXPECT_EQ(writer.bindTopicWriter(topic_id)->field_ids.size(), 5u); - - // Verify field ordering - EXPECT_EQ(*writer.resolveField(topic_id, "positions[0]"), 0u); - EXPECT_EQ(*writer.resolveField(topic_id, "positions[1]"), 1u); - EXPECT_EQ(*writer.resolveField(topic_id, "velocities[0]"), 2u); - EXPECT_EQ(*writer.resolveField(topic_id, "velocities[1]"), 3u); - EXPECT_EQ(*writer.resolveField(topic_id, "velocities[2]"), 4u); - - // Write a row and read it back to confirm no data corruption - ASSERT_TRUE(writer.beginRow(topic_id, 1000).has_value()); - writer.set(topic_id, 0, 1.0); - writer.set(topic_id, 1, 2.0); - writer.set(topic_id, 2, 10.0); - writer.set(topic_id, 3, 20.0); - writer.set(topic_id, 4, 30.0); - 
ASSERT_TRUE(writer.finishRow(topic_id).has_value()); - engine.commitChunks(writer.flushAll()); - - DataReader reader = engine.createReader(); - auto cursor = *reader.rangeQuery(QueryRange{.topic_id = topic_id, .t_min = 0, .t_max = 2000}); - bool visited = false; - cursor.forEach([&](const SampleRow& row) { - EXPECT_NEAR(row.chunk->readNumericAsDouble(0, row.row_index), 1.0, 1e-9); - EXPECT_NEAR(row.chunk->readNumericAsDouble(1, row.row_index), 2.0, 1e-9); - EXPECT_NEAR(row.chunk->readNumericAsDouble(2, row.row_index), 10.0, 1e-9); - EXPECT_NEAR(row.chunk->readNumericAsDouble(3, row.row_index), 20.0, 1e-9); - EXPECT_NEAR(row.chunk->readNumericAsDouble(4, row.row_index), 30.0, 1e-9); - visited = true; - }); - EXPECT_TRUE(visited); -} - -// ───────────────────────────────────────────────────────────────── -// Group A: schemaless ensure_column -// ───────────────────────────────────────────────────────────────── - -TEST(ArrayExpansionTest, Schemaless_EnsureColumn_AddsColumn) { - DataEngine engine; - DatasetId ds = *engine.createDataset(PJ::DatasetDescriptor{.source_name = "t"}); - DataWriter writer = engine.createWriter(); - - TopicDescriptor desc; - desc.name = "sl_ec_add"; - desc.schema_id = 0; - auto topic_id = *writer.registerTopic(ds, desc); - - auto fid = writer.ensureColumn(topic_id, "value", PJ::PrimitiveType::kFloat64); - ASSERT_TRUE(fid.has_value()) << fid.error(); - EXPECT_EQ(*fid, 0u); - - auto handle = *writer.bindTopicWriter(topic_id); - ASSERT_EQ(handle.field_ids.size(), 1u); - - ASSERT_TRUE(writer.beginRow(topic_id, 1000).has_value()); - writer.set(topic_id, 0, 42.0); - ASSERT_TRUE(writer.finishRow(topic_id).has_value()); - engine.commitChunks(writer.flushAll()); - - DataReader reader = engine.createReader(); - auto cursor = *reader.rangeQuery(QueryRange{.topic_id = topic_id, .t_min = 0, .t_max = 2000}); - bool visited = false; - cursor.forEach([&](const SampleRow& row) { - EXPECT_NEAR(row.chunk->readNumericAsDouble(0, row.row_index), 42.0, 1e-9); - 
visited = true; - }); - EXPECT_TRUE(visited); -} - -TEST(ArrayExpansionTest, Schemaless_EnsureColumn_Idempotent) { - DataEngine engine; - DatasetId ds = *engine.createDataset(PJ::DatasetDescriptor{.source_name = "t"}); - DataWriter writer = engine.createWriter(); - - TopicDescriptor desc; - desc.name = "sl_ec_idem"; - desc.schema_id = 0; - auto topic_id = *writer.registerTopic(ds, desc); - - auto fid1 = writer.ensureColumn(topic_id, "x", PJ::PrimitiveType::kFloat64); - ASSERT_TRUE(fid1.has_value()); - EXPECT_EQ(*fid1, 0u); - - auto fid2 = writer.ensureColumn(topic_id, "x", PJ::PrimitiveType::kFloat64); - ASSERT_TRUE(fid2.has_value()); - EXPECT_EQ(*fid2, 0u); // same field id — idempotent - - auto handle = *writer.bindTopicWriter(topic_id); - EXPECT_EQ(handle.field_ids.size(), 1u); // no duplicate -} - -TEST(ArrayExpansionTest, Schemaless_EnsureColumn_MultipleColumns_WriteAndRead) { - DataEngine engine; - DatasetId ds = *engine.createDataset(PJ::DatasetDescriptor{.source_name = "t"}); - DataWriter writer = engine.createWriter(); - - TopicDescriptor desc; - desc.name = "sl_ec_multi"; - desc.schema_id = 0; - auto topic_id = *writer.registerTopic(ds, desc); - - EXPECT_EQ(*writer.ensureColumn(topic_id, "x", PJ::PrimitiveType::kFloat64), 0u); - EXPECT_EQ(*writer.ensureColumn(topic_id, "y", PJ::PrimitiveType::kFloat64), 1u); - EXPECT_EQ(*writer.ensureColumn(topic_id, "z", PJ::PrimitiveType::kFloat32), 2u); - - auto handle = *writer.bindTopicWriter(topic_id); - ASSERT_EQ(handle.field_ids.size(), 3u); - - ASSERT_TRUE(writer.beginRow(topic_id, 1000).has_value()); - writer.set(topic_id, 0, 1.0); - writer.set(topic_id, 1, 2.0); - writer.set(topic_id, 2, 3.0f); - ASSERT_TRUE(writer.finishRow(topic_id).has_value()); - engine.commitChunks(writer.flushAll()); - - DataReader reader = engine.createReader(); - auto cursor = *reader.rangeQuery(QueryRange{.topic_id = topic_id, .t_min = 0, .t_max = 2000}); - bool visited = false; - cursor.forEach([&](const SampleRow& row) { - 
EXPECT_NEAR(row.chunk->readNumericAsDouble(0, row.row_index), 1.0, 1e-9); - EXPECT_NEAR(row.chunk->readNumericAsDouble(1, row.row_index), 2.0, 1e-9); - EXPECT_NEAR(row.chunk->readNumericAsDouble(2, row.row_index), 3.0, 1e-4); - visited = true; - }); - EXPECT_TRUE(visited); -} - -TEST(ArrayExpansionTest, Schemaless_EnsureColumn_SecondWriter_PicksUpLayout) { - DataEngine engine; - DatasetId ds = *engine.createDataset(PJ::DatasetDescriptor{.source_name = "t"}); - DataWriter writerA = engine.createWriter(); - - TopicDescriptor desc; - desc.name = "sl_ec_second_writer"; - desc.schema_id = 0; - auto topic_id = *writerA.registerTopic(ds, desc); - - ASSERT_TRUE(writerA.ensureColumn(topic_id, "v", PJ::PrimitiveType::kFloat64).has_value()); - ASSERT_TRUE(writerA.beginRow(topic_id, 1000).has_value()); - writerA.set(topic_id, 0, 99.0); - ASSERT_TRUE(writerA.finishRow(topic_id).has_value()); - engine.commitChunks(writerA.flushAll()); - - DataWriter writerB = engine.createWriter(); - auto handle = writerB.bindTopicWriter(topic_id); - ASSERT_TRUE(handle.has_value()) << handle.error(); - ASSERT_EQ(handle->field_ids.size(), 1u); - EXPECT_EQ(*writerB.resolveField(topic_id, "v"), 0u); -} - -TEST(ArrayExpansionTest, Schemaless_EnsureColumn_WhileRowInProgress_ReturnsError) { - DataEngine engine; - DatasetId ds = *engine.createDataset(PJ::DatasetDescriptor{.source_name = "t"}); - DataWriter writer = engine.createWriter(); - - TopicDescriptor desc; - desc.name = "sl_ec_row_progress"; - desc.schema_id = 0; - auto topic_id = *writer.registerTopic(ds, desc); - - ASSERT_TRUE(writer.ensureColumn(topic_id, "x", PJ::PrimitiveType::kFloat64).has_value()); - ASSERT_TRUE(writer.beginRow(topic_id, 1000).has_value()); - - auto result = writer.ensureColumn(topic_id, "y", PJ::PrimitiveType::kFloat64); - EXPECT_FALSE(result.has_value()) << "ensure_column must fail while a row is in progress"; - - ASSERT_TRUE(writer.finishRow(topic_id).has_value()); - - // After finishing the row, ensure_column should 
auto-seal and succeed. - auto result2 = writer.ensureColumn(topic_id, "y", PJ::PrimitiveType::kFloat64); - EXPECT_TRUE(result2.has_value()) << "ensure_column should auto-seal and succeed after rows"; -} - -// ───────────────────────────────────────────────────────────────── -// Group B: schemaless expand_array -// ───────────────────────────────────────────────────────────────── - -TEST(ArrayExpansionTest, Schemaless_ExpandArray_AddsColumns) { - DataEngine engine; - DatasetId ds = *engine.createDataset(PJ::DatasetDescriptor{.source_name = "t"}); - DataWriter writer = engine.createWriter(); - - TopicDescriptor desc; - desc.name = "sl_ea_add"; - desc.schema_id = 0; - auto topic_id = *writer.registerTopic(ds, desc); - - auto result = writer.expandArray(topic_id, "data", 3u, PJ::PrimitiveType::kFloat64); - ASSERT_TRUE(result.has_value()) << result.error(); - EXPECT_EQ(*result, 3u); - - auto handle = *writer.bindTopicWriter(topic_id); - ASSERT_EQ(handle.field_ids.size(), 3u); - EXPECT_EQ(*writer.resolveField(topic_id, "data[0]"), 0u); - EXPECT_EQ(*writer.resolveField(topic_id, "data[1]"), 1u); - EXPECT_EQ(*writer.resolveField(topic_id, "data[2]"), 2u); -} - -TEST(ArrayExpansionTest, Schemaless_ExpandArray_IncrementalExpansion) { - DataEngine engine; - DatasetId ds = *engine.createDataset(PJ::DatasetDescriptor{.source_name = "t"}); - DataWriter writer = engine.createWriter(); - - TopicDescriptor desc; - desc.name = "sl_ea_incremental"; - desc.schema_id = 0; - auto topic_id = *writer.registerTopic(ds, desc); - - auto r1 = writer.expandArray(topic_id, "data", 2u, PJ::PrimitiveType::kFloat64); - ASSERT_TRUE(r1.has_value()) << r1.error(); - EXPECT_EQ(*r1, 2u); - - ASSERT_TRUE(writer.beginRow(topic_id, 1000).has_value()); - writer.set(topic_id, 0, 1.0); - writer.set(topic_id, 1, 2.0); - ASSERT_TRUE(writer.finishRow(topic_id).has_value()); - engine.commitChunks(writer.flushAll()); - - auto r2 = writer.expandArray(topic_id, "data", 5u, PJ::PrimitiveType::kFloat64); - 
ASSERT_TRUE(r2.has_value()) << r2.error(); - EXPECT_EQ(*r2, 5u); - - ASSERT_TRUE(writer.beginRow(topic_id, 2000).has_value()); - writer.set(topic_id, 0, 10.0); - writer.set(topic_id, 1, 20.0); - writer.set(topic_id, 2, 30.0); - writer.set(topic_id, 3, 40.0); - writer.set(topic_id, 4, 50.0); - ASSERT_TRUE(writer.finishRow(topic_id).has_value()); - engine.commitChunks(writer.flushAll()); - - const TopicStorage* storage = engine.getTopicStorage(topic_id); - ASSERT_NE(storage, nullptr); - ASSERT_GE(storage->sealedChunks().size(), 2u); - EXPECT_EQ(storage->sealedChunks().back().columns.size(), 5u); -} - -TEST(ArrayExpansionTest, Schemaless_ExpandArray_Idempotent_ShrinkIsNoop) { - DataEngine engine; - DatasetId ds = *engine.createDataset(PJ::DatasetDescriptor{.source_name = "t"}); - DataWriter writer = engine.createWriter(); - - TopicDescriptor desc; - desc.name = "sl_ea_idem"; - desc.schema_id = 0; - auto topic_id = *writer.registerTopic(ds, desc); - - EXPECT_EQ(*writer.expandArray(topic_id, "data", 3u, PJ::PrimitiveType::kFloat64), 3u); - EXPECT_EQ(*writer.expandArray(topic_id, "data", 3u, PJ::PrimitiveType::kFloat64), 3u); - EXPECT_EQ(*writer.expandArray(topic_id, "data", 1u, PJ::PrimitiveType::kFloat64), 3u); - - EXPECT_EQ(writer.bindTopicWriter(topic_id)->field_ids.size(), 3u); -} - -TEST(ArrayExpansionTest, Schemaless_ExpandArray_DefaultType_IsFloat64) { - DataEngine engine; - DatasetId ds = *engine.createDataset(PJ::DatasetDescriptor{.source_name = "t"}); - DataWriter writer = engine.createWriter(); - - TopicDescriptor desc; - desc.name = "sl_ea_default_type"; - desc.schema_id = 0; - auto topic_id = *writer.registerTopic(ds, desc); - - // No element_type arg — default is kFloat64 - auto result = writer.expandArray(topic_id, "data", 2u); - ASSERT_TRUE(result.has_value()) << result.error(); - - const TopicStorage* storage = engine.getTopicStorage(topic_id); - ASSERT_NE(storage, nullptr); - const auto& cols = storage->columnDescriptors(); - ASSERT_EQ(cols.size(), 
2u); - EXPECT_EQ(cols[0].logical_type, PJ::PrimitiveType::kFloat64); - EXPECT_EQ(cols[1].logical_type, PJ::PrimitiveType::kFloat64); -} - -// ───────────────────────────────────────────────────────────────── -// Group C: mixed usage -// ───────────────────────────────────────────────────────────────── - -TEST(ArrayExpansionTest, Mixed_EnsureColumnThenExpandArray_NoConflict) { - DataEngine engine; - DatasetId ds = *engine.createDataset(PJ::DatasetDescriptor{.source_name = "t"}); - DataWriter writer = engine.createWriter(); - - TopicDescriptor desc; - desc.name = "mixed_ec_ea"; - desc.schema_id = 0; - auto topic_id = *writer.registerTopic(ds, desc); - - // Pre-add data[0] via ensure_column - auto fid0 = writer.ensureColumn(topic_id, "data[0]", PJ::PrimitiveType::kFloat64); - ASSERT_TRUE(fid0.has_value()); - EXPECT_EQ(*fid0, 0u); - - // expand_array sees data[0] already exists, adds data[1] and data[2] - auto result = writer.expandArray(topic_id, "data", 3u, PJ::PrimitiveType::kFloat64); - ASSERT_TRUE(result.has_value()) << result.error(); - EXPECT_EQ(*result, 3u); - - auto handle = *writer.bindTopicWriter(topic_id); - EXPECT_EQ(handle.field_ids.size(), 3u); // exactly 3, no duplicates - EXPECT_EQ(*writer.resolveField(topic_id, "data[0]"), 0u); - EXPECT_EQ(*writer.resolveField(topic_id, "data[1]"), 1u); - EXPECT_EQ(*writer.resolveField(topic_id, "data[2]"), 2u); -} - -TEST(ArrayExpansionTest, Mixed_ExpandArrayThenEnsureColumn_NoConflict) { - DataEngine engine; - DatasetId ds = *engine.createDataset(PJ::DatasetDescriptor{.source_name = "t"}); - DataWriter writer = engine.createWriter(); - - TopicDescriptor desc; - desc.name = "mixed_ea_ec"; - desc.schema_id = 0; - auto topic_id = *writer.registerTopic(ds, desc); - - auto r = writer.expandArray(topic_id, "data", 2u, PJ::PrimitiveType::kFloat64); - ASSERT_TRUE(r.has_value()) << r.error(); - EXPECT_EQ(*r, 2u); - - // data[0] already exists — should be no-op, returns existing field id - auto fid0 = 
writer.ensureColumn(topic_id, "data[0]", PJ::PrimitiveType::kFloat64); - ASSERT_TRUE(fid0.has_value()); - EXPECT_EQ(*fid0, 0u); - - // extra column not part of array - auto fid_extra = writer.ensureColumn(topic_id, "extra", PJ::PrimitiveType::kFloat32); - ASSERT_TRUE(fid_extra.has_value()); - EXPECT_EQ(*fid_extra, 2u); // new column at index 2 - - auto handle = *writer.bindTopicWriter(topic_id); - EXPECT_EQ(handle.field_ids.size(), 3u); // data[0], data[1], extra -} - -// ───────────────────────────────────────────────────────────────── -// Group C2: regression tests for review findings -// ───────────────────────────────────────────────────────────────── - -// Review finding 1: typed expand_array duplicated element columns when -// ensure_column had already added an array-indexed path. -TEST(ArrayExpansionTest, Typed_EnsureColumnThenExpandArray_NoDuplicateColumns) { - DataEngine engine; - DatasetId ds = *engine.createDataset(PJ::DatasetDescriptor{.source_name = "t"}); - DataWriter writer = engine.createWriter(); - - auto data_arr = PJ::makeArray("data", PJ::makePrimitive("", PJ::PrimitiveType::kFloat64), std::nullopt); - auto root = PJ::makeStruct("msg", {data_arr}); - auto sid = *writer.registerSchema("typed_ec_ea_dedup", root); - - TopicDescriptor desc; - desc.name = "typed_ec_ea_dedup"; - desc.schema_id = sid; - auto topic_id = *writer.registerTopic(ds, desc); - - // Manually add data[0] via ensureColumn (does not update array_expansion_count) - auto fid0 = writer.ensureColumn(topic_id, "data[0]", PJ::PrimitiveType::kFloat64); - ASSERT_TRUE(fid0.has_value()) << fid0.error(); - EXPECT_EQ(*fid0, 0u); - - // expand_array for the same typed field: must NOT duplicate data[0] - auto result = writer.expandArray(topic_id, "data", 3u); - ASSERT_TRUE(result.has_value()) << result.error(); - EXPECT_EQ(*result, 3u); - - auto handle = *writer.bindTopicWriter(topic_id); - EXPECT_EQ(handle.field_ids.size(), 3u) << "Must have exactly 3 columns, no duplicates"; - 
EXPECT_EQ(*writer.resolveField(topic_id, "data[0]"), 0u); - EXPECT_EQ(*writer.resolveField(topic_id, "data[1]"), 1u); - EXPECT_EQ(*writer.resolveField(topic_id, "data[2]"), 2u); -} - -// Review finding 2: ensure_column must return error when the path already exists -// with a different type instead of silently accepting the mismatch. -TEST(ArrayExpansionTest, EnsureColumn_TypeMismatch_ReturnsError) { - DataEngine engine; - DatasetId ds = *engine.createDataset(PJ::DatasetDescriptor{.source_name = "t"}); - DataWriter writer = engine.createWriter(); - - TopicDescriptor desc; - desc.name = "type_mismatch"; - desc.schema_id = 0; - auto topic_id = *writer.registerTopic(ds, desc); - - // Register as float64 - auto fid = writer.ensureColumn(topic_id, "value", PJ::PrimitiveType::kFloat64); - ASSERT_TRUE(fid.has_value()); - EXPECT_EQ(*fid, 0u); - - // Re-register same path with float32: must fail - auto mismatch = writer.ensureColumn(topic_id, "value", PJ::PrimitiveType::kFloat32); - EXPECT_FALSE(mismatch.has_value()) << "Type mismatch must return error, not silent success"; - - // Re-register same path with same type: must succeed (idempotent) - auto same_type = writer.ensureColumn(topic_id, "value", PJ::PrimitiveType::kFloat64); - ASSERT_TRUE(same_type.has_value()); - EXPECT_EQ(*same_type, 0u); // same field id -} - -// ───────────────────────────────────────────────────────────────── -// Group D: typed topic with ensure_column -// ───────────────────────────────────────────────────────────────── - -TEST(ArrayExpansionTest, Typed_EnsureColumn_AddsColumnOutsideSchema) { - DataEngine engine; - DatasetId ds = *engine.createDataset(PJ::DatasetDescriptor{.source_name = "t"}); - DataWriter writer = engine.createWriter(); - - auto root = PJ::makeStruct("msg", {PJ::makePrimitive("x", PJ::PrimitiveType::kFloat64)}); - auto sid = *writer.registerSchema("typed_ensure", root); - - TopicDescriptor desc; - desc.name = "typed_ensure_topic"; - desc.schema_id = sid; - auto topic_id = 
*writer.registerTopic(ds, desc); - - // Schema gives field "x" at id=0 - EXPECT_EQ(*writer.resolveField(topic_id, "x"), 0u); - - // Add an extra column outside the schema - auto fid = writer.ensureColumn(topic_id, "debug_extra", PJ::PrimitiveType::kFloat64); - ASSERT_TRUE(fid.has_value()) << fid.error(); - EXPECT_EQ(*fid, 1u); - - auto handle = *writer.bindTopicWriter(topic_id); - ASSERT_EQ(handle.field_ids.size(), 2u); - - ASSERT_TRUE(writer.beginRow(topic_id, 1000).has_value()); - writer.set(topic_id, 0, 1.0); - writer.set(topic_id, 1, 2.0); - ASSERT_TRUE(writer.finishRow(topic_id).has_value()); - engine.commitChunks(writer.flushAll()); - - DataReader reader = engine.createReader(); - auto cursor = *reader.rangeQuery(QueryRange{.topic_id = topic_id, .t_min = 0, .t_max = 2000}); - bool visited = false; - cursor.forEach([&](const SampleRow& row) { - EXPECT_NEAR(row.chunk->readNumericAsDouble(0, row.row_index), 1.0, 1e-9); - EXPECT_NEAR(row.chunk->readNumericAsDouble(1, row.row_index), 2.0, 1e-9); - visited = true; - }); - EXPECT_TRUE(visited); -} - -} // namespace diff --git a/pj_datastore/tests/arrow_import_test.cpp b/pj_datastore/tests/arrow_import_test.cpp deleted file mode 100644 index 8b2aee8d..00000000 --- a/pj_datastore/tests/arrow_import_test.cpp +++ /dev/null @@ -1,570 +0,0 @@ -// Copyright 2026 Davide Faconti -// SPDX-License-Identifier: MPL-2.0 - -#include "pj_datastore/arrow_import.hpp" - -#include - -#include -#include -#include -#include -#include -#include - -#include "nanoarrow/nanoarrow.h" -#include "nanoarrow/nanoarrow.hpp" -#include "nanoarrow/nanoarrow_ipc.h" -#include "pj_base/dataset.hpp" -#include "pj_base/span.hpp" -#include "pj_base/type_tree.hpp" -#include "pj_base/types.hpp" -#include "pj_datastore/engine.hpp" -#include "pj_datastore/query.hpp" -#include "pj_datastore/reader.hpp" -#include "pj_datastore/writer.hpp" - -namespace PJ::arrow_import { -namespace { - -// 
--------------------------------------------------------------------------- -// Helper: serialize an ArrowArrayStream to IPC bytes using nanoarrow -// --------------------------------------------------------------------------- - -// Build an IPC byte buffer from a schema + array. The array must be a struct -// array whose children are the column arrays. -std::vector serialize_to_ipc(ArrowSchema* schema, ArrowArray* array) { - // Create output buffer - ArrowBuffer out_buf; - ArrowBufferInit(&out_buf); - - // Create output stream backed by buffer - ArrowIpcOutputStream out_stream; - EXPECT_EQ(ArrowIpcOutputStreamInitBuffer(&out_stream, &out_buf), NANOARROW_OK); - - // Create writer - ArrowIpcWriter writer; - EXPECT_EQ(ArrowIpcWriterInit(&writer, &out_stream), NANOARROW_OK); - - // Write schema - ArrowError error; - EXPECT_EQ(ArrowIpcWriterWriteSchema(&writer, schema, &error), NANOARROW_OK) << error.message; - - // Create array view from schema for writing - nanoarrow::UniqueArrayView view; - EXPECT_EQ(ArrowArrayViewInitFromSchema(view.get(), schema, nullptr), NANOARROW_OK); - EXPECT_EQ(ArrowArrayViewSetArray(view.get(), array, nullptr), NANOARROW_OK); - - // Write array as a record batch - EXPECT_EQ(ArrowIpcWriterWriteArrayView(&writer, view.get(), &error), NANOARROW_OK) << error.message; - - // Write end-of-stream marker - EXPECT_EQ(ArrowIpcWriterWriteArrayView(&writer, nullptr, &error), NANOARROW_OK); - - ArrowIpcWriterReset(&writer); - - // Copy bytes out - std::vector result(static_cast(out_buf.size_bytes)); - std::memcpy(result.data(), out_buf.data, result.size()); - ArrowBufferReset(&out_buf); - - return result; -} - -// Serialize multiple batches into a single IPC stream -std::vector serialize_batches_to_ipc(ArrowSchema* schema, std::vector batches) { - ArrowBuffer out_buf; - ArrowBufferInit(&out_buf); - - ArrowIpcOutputStream out_stream; - EXPECT_EQ(ArrowIpcOutputStreamInitBuffer(&out_stream, &out_buf), NANOARROW_OK); - - ArrowIpcWriter writer; - 
EXPECT_EQ(ArrowIpcWriterInit(&writer, &out_stream), NANOARROW_OK); - - ArrowError error; - EXPECT_EQ(ArrowIpcWriterWriteSchema(&writer, schema, &error), NANOARROW_OK) << error.message; - - for (auto* batch : batches) { - nanoarrow::UniqueArrayView view; - EXPECT_EQ(ArrowArrayViewInitFromSchema(view.get(), schema, nullptr), NANOARROW_OK); - EXPECT_EQ(ArrowArrayViewSetArray(view.get(), batch, nullptr), NANOARROW_OK); - EXPECT_EQ(ArrowIpcWriterWriteArrayView(&writer, view.get(), &error), NANOARROW_OK) << error.message; - } - - EXPECT_EQ(ArrowIpcWriterWriteArrayView(&writer, nullptr, &error), NANOARROW_OK); - ArrowIpcWriterReset(&writer); - - std::vector result(static_cast(out_buf.size_bytes)); - std::memcpy(result.data(), out_buf.data, result.size()); - ArrowBufferReset(&out_buf); - - return result; -} - -// =========================================================================== -// Test: schema_from_ipc — mixed supported/unsupported types -// =========================================================================== - -TEST(ArrowImportTest, SchemaFromIpc) { - // Build schema: float32 "x", float64 "y", utf8 "name", list (skip) - nanoarrow::UniqueSchema schema; - ASSERT_EQ(ArrowSchemaInitFromType(schema.get(), NANOARROW_TYPE_STRUCT), NANOARROW_OK); - ASSERT_EQ(ArrowSchemaAllocateChildren(schema.get(), 4), NANOARROW_OK); - - ArrowSchemaInit(schema->children[0]); - ASSERT_EQ(ArrowSchemaSetType(schema->children[0], NANOARROW_TYPE_FLOAT), NANOARROW_OK); - ASSERT_EQ(ArrowSchemaSetName(schema->children[0], "x"), NANOARROW_OK); - - ArrowSchemaInit(schema->children[1]); - ASSERT_EQ(ArrowSchemaSetType(schema->children[1], NANOARROW_TYPE_DOUBLE), NANOARROW_OK); - ASSERT_EQ(ArrowSchemaSetName(schema->children[1], "y"), NANOARROW_OK); - - ArrowSchemaInit(schema->children[2]); - ASSERT_EQ(ArrowSchemaSetType(schema->children[2], NANOARROW_TYPE_STRING), NANOARROW_OK); - ASSERT_EQ(ArrowSchemaSetName(schema->children[2], "name"), NANOARROW_OK); - - // Unsupported type: list - // 
ArrowSchemaSetType for LIST auto-allocates one child - ArrowSchemaInit(schema->children[3]); - ASSERT_EQ(ArrowSchemaSetType(schema->children[3], NANOARROW_TYPE_LIST), NANOARROW_OK); - ASSERT_EQ(ArrowSchemaSetName(schema->children[3], "unsupported"), NANOARROW_OK); - ASSERT_EQ(ArrowSchemaSetType(schema->children[3]->children[0], NANOARROW_TYPE_INT32), NANOARROW_OK); - ASSERT_EQ(ArrowSchemaSetName(schema->children[3]->children[0], "item"), NANOARROW_OK); - - // Build a minimal struct array with 0 rows just to serialize - nanoarrow::UniqueArray array; - ASSERT_EQ(ArrowArrayInitFromSchema(array.get(), schema.get(), nullptr), NANOARROW_OK); - ASSERT_EQ(ArrowArrayStartAppending(array.get()), NANOARROW_OK); - ASSERT_EQ(ArrowArrayFinishBuildingDefault(array.get(), nullptr), NANOARROW_OK); - - auto ipc_bytes = serialize_to_ipc(schema.get(), array.get()); - - auto result_or = schemaFromIpc(PJ::Span(ipc_bytes.data(), ipc_bytes.size())); - ASSERT_TRUE(result_or.has_value()) << result_or.error(); - - const auto& [type_tree, mappings] = *result_or; - ASSERT_EQ(mappings.size(), 3u); - - EXPECT_EQ(mappings[0].field_name, "x"); - EXPECT_EQ(mappings[0].pj_type, PrimitiveType::kFloat32); - EXPECT_EQ(mappings[0].pj_column_index, 0u); - EXPECT_EQ(mappings[0].arrow_column_index, 0); - - EXPECT_EQ(mappings[1].field_name, "y"); - EXPECT_EQ(mappings[1].pj_type, PrimitiveType::kFloat64); - - EXPECT_EQ(mappings[2].field_name, "name"); - EXPECT_EQ(mappings[2].pj_type, PrimitiveType::kString); - EXPECT_EQ(mappings[2].arrow_column_index, 2); - - EXPECT_EQ(type_tree->name, "arrow_row"); - EXPECT_EQ(type_tree->children.size(), 3u); -} - -// =========================================================================== -// Test: import float32 columns via IPC -// =========================================================================== - -TEST(ArrowImportTest, ImportFloat32) { - DataEngine engine; - auto ds_or = engine.createDataset(DatasetDescriptor{.source_name = "test", .time_domain_id = 0}); - 
ASSERT_TRUE(ds_or.has_value()); - - DataWriter writer = engine.createWriter(); - - // Build schema: struct { float32 "x", float32 "y" } - nanoarrow::UniqueSchema schema; - ASSERT_EQ(ArrowSchemaInitFromType(schema.get(), NANOARROW_TYPE_STRUCT), NANOARROW_OK); - ASSERT_EQ(ArrowSchemaAllocateChildren(schema.get(), 2), NANOARROW_OK); - - ArrowSchemaInit(schema->children[0]); - ASSERT_EQ(ArrowSchemaSetType(schema->children[0], NANOARROW_TYPE_FLOAT), NANOARROW_OK); - ASSERT_EQ(ArrowSchemaSetName(schema->children[0], "x"), NANOARROW_OK); - - ArrowSchemaInit(schema->children[1]); - ASSERT_EQ(ArrowSchemaSetType(schema->children[1], NANOARROW_TYPE_FLOAT), NANOARROW_OK); - ASSERT_EQ(ArrowSchemaSetName(schema->children[1], "y"), NANOARROW_OK); - - // Build array with 100 rows - constexpr int64_t N = 100; - nanoarrow::UniqueArray array; - ASSERT_EQ(ArrowArrayInitFromSchema(array.get(), schema.get(), nullptr), NANOARROW_OK); - ASSERT_EQ(ArrowArrayStartAppending(array.get()), NANOARROW_OK); - - std::vector x_vals(N), y_vals(N); - for (int64_t i = 0; i < N; ++i) { - x_vals[static_cast(i)] = static_cast(i) * 0.1F; - y_vals[static_cast(i)] = static_cast(i) * 0.2F; - - ASSERT_EQ( - ArrowArrayAppendDouble(array->children[0], static_cast(x_vals[static_cast(i)])), - NANOARROW_OK); - ASSERT_EQ( - ArrowArrayAppendDouble(array->children[1], static_cast(y_vals[static_cast(i)])), - NANOARROW_OK); - ASSERT_EQ(ArrowArrayFinishElement(array.get()), NANOARROW_OK); - } - - ASSERT_EQ(ArrowArrayFinishBuildingDefault(array.get(), nullptr), NANOARROW_OK); - - auto ipc_bytes = serialize_to_ipc(schema.get(), array.get()); - - // Parse schema and register - auto [type_tree, mappings] = *schemaFromIpc(PJ::Span(ipc_bytes.data(), ipc_bytes.size())); - auto schema_id = *writer.registerSchema("test_schema", type_tree); - - TopicDescriptor desc; - desc.name = "test_topic"; - desc.schema_id = schema_id; - auto topic_id = *writer.registerTopic(*ds_or, desc); - - // Import - auto status = - 
importIpcStream(writer, topic_id, PJ::Span(ipc_bytes.data(), ipc_bytes.size()), mappings); - ASSERT_TRUE(status.has_value()) << status.error(); - - auto flushed = writer.flushAll(); - engine.commitChunks(std::move(flushed)); - - // Verify round-trip - DataReader reader = engine.createReader(); - std::size_t count = 0; - auto cursor_or = reader.rangeQuery(QueryRange{.topic_id = topic_id, .t_min = 0, .t_max = N - 1}); - ASSERT_TRUE(cursor_or.has_value()) << cursor_or.error(); - cursor_or->forEach([&](const SampleRow& row) { - auto x = static_cast(row.chunk->readNumericAsDouble(0, row.row_index)); - EXPECT_FLOAT_EQ(x, x_vals[count]); - ++count; - }); - EXPECT_EQ(count, static_cast(N)); -} - -// =========================================================================== -// Test: import with explicit timestamp column -// =========================================================================== - -TEST(ArrowImportTest, ImportWithTimestampColumn) { - DataEngine engine; - auto ds_or = engine.createDataset(DatasetDescriptor{.source_name = "test", .time_domain_id = 0}); - ASSERT_TRUE(ds_or.has_value()); - - DataWriter writer = engine.createWriter(); - - // Build schema: struct { int64 "timestamp", float64 "value" } - nanoarrow::UniqueSchema schema; - ASSERT_EQ(ArrowSchemaInitFromType(schema.get(), NANOARROW_TYPE_STRUCT), NANOARROW_OK); - ASSERT_EQ(ArrowSchemaAllocateChildren(schema.get(), 2), NANOARROW_OK); - - ArrowSchemaInit(schema->children[0]); - ASSERT_EQ(ArrowSchemaSetType(schema->children[0], NANOARROW_TYPE_INT64), NANOARROW_OK); - ASSERT_EQ(ArrowSchemaSetName(schema->children[0], "timestamp"), NANOARROW_OK); - - ArrowSchemaInit(schema->children[1]); - ASSERT_EQ(ArrowSchemaSetType(schema->children[1], NANOARROW_TYPE_DOUBLE), NANOARROW_OK); - ASSERT_EQ(ArrowSchemaSetName(schema->children[1], "value"), NANOARROW_OK); - - constexpr int64_t N = 50; - nanoarrow::UniqueArray array; - ASSERT_EQ(ArrowArrayInitFromSchema(array.get(), schema.get(), nullptr), NANOARROW_OK); - 
ASSERT_EQ(ArrowArrayStartAppending(array.get()), NANOARROW_OK); - - for (int64_t i = 0; i < N; ++i) { - ASSERT_EQ(ArrowArrayAppendInt(array->children[0], i * 1000), NANOARROW_OK); - ASSERT_EQ(ArrowArrayAppendDouble(array->children[1], static_cast(i) * 0.5), NANOARROW_OK); - ASSERT_EQ(ArrowArrayFinishElement(array.get()), NANOARROW_OK); - } - - ASSERT_EQ(ArrowArrayFinishBuildingDefault(array.get(), nullptr), NANOARROW_OK); - - auto ipc_bytes = serialize_to_ipc(schema.get(), array.get()); - - // Only map "value" column (not timestamp) - std::vector mappings = {{ - .arrow_column_index = 1, - .pj_column_index = 0, - .pj_type = PrimitiveType::kFloat64, - .field_name = "value", - }}; - - auto val_tree = makePrimitive("value", PrimitiveType::kFloat64); - auto sid = *writer.registerSchema("ts_schema", val_tree); - TopicDescriptor desc; - desc.name = "ts_topic"; - desc.schema_id = sid; - auto tid = *writer.registerTopic(*ds_or, desc); - - // Import with timestamp_column=0 - auto status = importIpcStream(writer, tid, PJ::Span(ipc_bytes.data(), ipc_bytes.size()), mappings, 0); - ASSERT_TRUE(status.has_value()) << status.error(); - - auto flushed = writer.flushAll(); - engine.commitChunks(std::move(flushed)); - - // Verify timestamps - DataReader reader = engine.createReader(); - auto latest_or = reader.latestAt(QueryPoint{.topic_id = tid, .t = 25000}); - ASSERT_TRUE(latest_or.has_value()) << latest_or.error(); - ASSERT_TRUE(latest_or->has_value()); - EXPECT_EQ((*latest_or)->timestamp, 25000); - EXPECT_DOUBLE_EQ((*latest_or)->chunk->readNumericAsDouble(0, (*latest_or)->row_index), 25.0 * 0.5); -} - -// =========================================================================== -// Test: import string columns -// =========================================================================== - -TEST(ArrowImportTest, ImportStrings) { - DataEngine engine; - auto ds_or = engine.createDataset(DatasetDescriptor{.source_name = "test", .time_domain_id = 0}); - 
ASSERT_TRUE(ds_or.has_value()); - - DataWriter writer = engine.createWriter(); - - // Build schema: struct { utf8 "name" } - nanoarrow::UniqueSchema schema; - ASSERT_EQ(ArrowSchemaInitFromType(schema.get(), NANOARROW_TYPE_STRUCT), NANOARROW_OK); - ASSERT_EQ(ArrowSchemaAllocateChildren(schema.get(), 1), NANOARROW_OK); - - ArrowSchemaInit(schema->children[0]); - ASSERT_EQ(ArrowSchemaSetType(schema->children[0], NANOARROW_TYPE_STRING), NANOARROW_OK); - ASSERT_EQ(ArrowSchemaSetName(schema->children[0], "name"), NANOARROW_OK); - - nanoarrow::UniqueArray array; - ASSERT_EQ(ArrowArrayInitFromSchema(array.get(), schema.get(), nullptr), NANOARROW_OK); - ASSERT_EQ(ArrowArrayStartAppending(array.get()), NANOARROW_OK); - - ArrowStringView sv; - - sv = ArrowCharView("alpha"); - ASSERT_EQ(ArrowArrayAppendString(array->children[0], sv), NANOARROW_OK); - ASSERT_EQ(ArrowArrayFinishElement(array.get()), NANOARROW_OK); - - sv = ArrowCharView("bravo"); - ASSERT_EQ(ArrowArrayAppendString(array->children[0], sv), NANOARROW_OK); - ASSERT_EQ(ArrowArrayFinishElement(array.get()), NANOARROW_OK); - - sv = ArrowCharView("charlie"); - ASSERT_EQ(ArrowArrayAppendString(array->children[0], sv), NANOARROW_OK); - ASSERT_EQ(ArrowArrayFinishElement(array.get()), NANOARROW_OK); - - ASSERT_EQ(ArrowArrayFinishBuildingDefault(array.get(), nullptr), NANOARROW_OK); - - auto ipc_bytes = serialize_to_ipc(schema.get(), array.get()); - - auto [type_tree, mappings] = *schemaFromIpc(PJ::Span(ipc_bytes.data(), ipc_bytes.size())); - auto sid = *writer.registerSchema("str_schema", type_tree); - TopicDescriptor desc; - desc.name = "str_topic"; - desc.schema_id = sid; - auto tid = *writer.registerTopic(*ds_or, desc); - - auto status = importIpcStream(writer, tid, PJ::Span(ipc_bytes.data(), ipc_bytes.size()), mappings); - ASSERT_TRUE(status.has_value()) << status.error(); - - auto flushed = writer.flushAll(); - engine.commitChunks(std::move(flushed)); - - // Verify strings - DataReader reader = engine.createReader(); 
- std::vector read_strings; - auto cursor_or = reader.rangeQuery(QueryRange{.topic_id = tid, .t_min = 0, .t_max = 10}); - ASSERT_TRUE(cursor_or.has_value()); - cursor_or->forEach([&](const SampleRow& row) { read_strings.emplace_back(row.chunk->readString(0, row.row_index)); }); - ASSERT_EQ(read_strings.size(), 3u); - EXPECT_EQ(read_strings[0], "alpha"); - EXPECT_EQ(read_strings[1], "bravo"); - EXPECT_EQ(read_strings[2], "charlie"); -} - -// =========================================================================== -// Test: narrow integer widening (int8 → int64) -// =========================================================================== - -TEST(ArrowImportTest, ImportNarrowIntegerWidening) { - DataEngine engine; - auto ds_or = engine.createDataset(DatasetDescriptor{.source_name = "test", .time_domain_id = 0}); - ASSERT_TRUE(ds_or.has_value()); - - DataWriter writer = engine.createWriter(); - - // Build schema: struct { int8 "val" } - nanoarrow::UniqueSchema schema; - ASSERT_EQ(ArrowSchemaInitFromType(schema.get(), NANOARROW_TYPE_STRUCT), NANOARROW_OK); - ASSERT_EQ(ArrowSchemaAllocateChildren(schema.get(), 1), NANOARROW_OK); - - ArrowSchemaInit(schema->children[0]); - ASSERT_EQ(ArrowSchemaSetType(schema->children[0], NANOARROW_TYPE_INT8), NANOARROW_OK); - ASSERT_EQ(ArrowSchemaSetName(schema->children[0], "val"), NANOARROW_OK); - - nanoarrow::UniqueArray array; - ASSERT_EQ(ArrowArrayInitFromSchema(array.get(), schema.get(), nullptr), NANOARROW_OK); - ASSERT_EQ(ArrowArrayStartAppending(array.get()), NANOARROW_OK); - - ASSERT_EQ(ArrowArrayAppendInt(array->children[0], 10), NANOARROW_OK); - ASSERT_EQ(ArrowArrayFinishElement(array.get()), NANOARROW_OK); - ASSERT_EQ(ArrowArrayAppendInt(array->children[0], -20), NANOARROW_OK); - ASSERT_EQ(ArrowArrayFinishElement(array.get()), NANOARROW_OK); - ASSERT_EQ(ArrowArrayAppendInt(array->children[0], 127), NANOARROW_OK); - ASSERT_EQ(ArrowArrayFinishElement(array.get()), NANOARROW_OK); - - 
ASSERT_EQ(ArrowArrayFinishBuildingDefault(array.get(), nullptr), NANOARROW_OK); - - auto ipc_bytes = serialize_to_ipc(schema.get(), array.get()); - - auto [type_tree, mappings] = *schemaFromIpc(PJ::Span(ipc_bytes.data(), ipc_bytes.size())); - auto sid = *writer.registerSchema("i8_schema", type_tree); - TopicDescriptor desc; - desc.name = "i8_topic"; - desc.schema_id = sid; - auto tid = *writer.registerTopic(*ds_or, desc); - - auto status = importIpcStream(writer, tid, PJ::Span(ipc_bytes.data(), ipc_bytes.size()), mappings); - ASSERT_TRUE(status.has_value()) << status.error(); - - auto flushed = writer.flushAll(); - engine.commitChunks(std::move(flushed)); - - DataReader reader = engine.createReader(); - std::vector<double> values; - auto cursor_or = reader.rangeQuery(QueryRange{.topic_id = tid, .t_min = 0, .t_max = 10}); - ASSERT_TRUE(cursor_or.has_value()); - cursor_or->forEach([&](const SampleRow& row) { values.push_back(row.chunk->readNumericAsDouble(0, row.row_index)); }); - ASSERT_EQ(values.size(), 3u); - EXPECT_DOUBLE_EQ(values[0], 10.0); - EXPECT_DOUBLE_EQ(values[1], -20.0); - EXPECT_DOUBLE_EQ(values[2], 127.0); -} - -// =========================================================================== -// Test: large dataset (500+ rows, multiple chunks) -// =========================================================================== - -TEST(ArrowImportTest, ImportLargeDataset) { - DataEngine engine; - auto ds_or = engine.createDataset(DatasetDescriptor{.source_name = "test", .time_domain_id = 0}); - ASSERT_TRUE(ds_or.has_value()); - - DataWriter writer = engine.createWriter(); - - // Build schema: struct { float64 "value" } - nanoarrow::UniqueSchema schema; - ASSERT_EQ(ArrowSchemaInitFromType(schema.get(), NANOARROW_TYPE_STRUCT), NANOARROW_OK); - ASSERT_EQ(ArrowSchemaAllocateChildren(schema.get(), 1), NANOARROW_OK); - - ArrowSchemaInit(schema->children[0]); - ASSERT_EQ(ArrowSchemaSetType(schema->children[0], NANOARROW_TYPE_DOUBLE), NANOARROW_OK); -
ASSERT_EQ(ArrowSchemaSetName(schema->children[0], "value"), NANOARROW_OK); - - constexpr int64_t N = 500; - nanoarrow::UniqueArray array; - ASSERT_EQ(ArrowArrayInitFromSchema(array.get(), schema.get(), nullptr), NANOARROW_OK); - ASSERT_EQ(ArrowArrayStartAppending(array.get()), NANOARROW_OK); - - for (int64_t i = 0; i < N; ++i) { - ASSERT_EQ(ArrowArrayAppendDouble(array->children[0], static_cast<double>(i)), NANOARROW_OK); - ASSERT_EQ(ArrowArrayFinishElement(array.get()), NANOARROW_OK); - } - - ASSERT_EQ(ArrowArrayFinishBuildingDefault(array.get(), nullptr), NANOARROW_OK); - - auto ipc_bytes = serialize_to_ipc(schema.get(), array.get()); - - auto [type_tree, mappings] = *schemaFromIpc(PJ::Span(ipc_bytes.data(), ipc_bytes.size())); - auto sid = *writer.registerSchema("tbl_schema", type_tree); - TopicDescriptor desc; - desc.name = "tbl_topic"; - desc.schema_id = sid; - desc.max_chunk_rows = 128; - auto tid = *writer.registerTopic(*ds_or, desc); - - auto status = importIpcStream(writer, tid, PJ::Span(ipc_bytes.data(), ipc_bytes.size()), mappings); - ASSERT_TRUE(status.has_value()) << status.error(); - - auto flushed = writer.flushAll(); - engine.commitChunks(std::move(flushed)); - - DataReader reader = engine.createReader(); - std::size_t count = 0; - auto cursor_or = reader.rangeQuery(QueryRange{.topic_id = tid, .t_min = 0, .t_max = N - 1}); - ASSERT_TRUE(cursor_or.has_value()); - cursor_or->forEach([&](const SampleRow&) { ++count; }); - EXPECT_EQ(count, static_cast<std::size_t>(N)); -} - -// =========================================================================== -// Test: import with nulls -// =========================================================================== - -TEST(ArrowImportTest, ImportWithNulls) { - DataEngine engine; - auto ds_or = engine.createDataset(DatasetDescriptor{.source_name = "test", .time_domain_id = 0}); - ASSERT_TRUE(ds_or.has_value()); - - DataWriter writer = engine.createWriter(); - - // Build schema: struct { float32 "val" } - nanoarrow::UniqueSchema
schema; - ASSERT_EQ(ArrowSchemaInitFromType(schema.get(), NANOARROW_TYPE_STRUCT), NANOARROW_OK); - ASSERT_EQ(ArrowSchemaAllocateChildren(schema.get(), 1), NANOARROW_OK); - - ArrowSchemaInit(schema->children[0]); - ASSERT_EQ(ArrowSchemaSetType(schema->children[0], NANOARROW_TYPE_FLOAT), NANOARROW_OK); - ASSERT_EQ(ArrowSchemaSetName(schema->children[0], "val"), NANOARROW_OK); - - nanoarrow::UniqueArray array; - ASSERT_EQ(ArrowArrayInitFromSchema(array.get(), schema.get(), nullptr), NANOARROW_OK); - ASSERT_EQ(ArrowArrayStartAppending(array.get()), NANOARROW_OK); - - ASSERT_EQ(ArrowArrayAppendDouble(array->children[0], 1.0), NANOARROW_OK); - ASSERT_EQ(ArrowArrayFinishElement(array.get()), NANOARROW_OK); - - ASSERT_EQ(ArrowArrayAppendNull(array->children[0], 1), NANOARROW_OK); - ASSERT_EQ(ArrowArrayFinishElement(array.get()), NANOARROW_OK); - - ASSERT_EQ(ArrowArrayAppendDouble(array->children[0], 3.0), NANOARROW_OK); - ASSERT_EQ(ArrowArrayFinishElement(array.get()), NANOARROW_OK); - - ASSERT_EQ(ArrowArrayAppendNull(array->children[0], 1), NANOARROW_OK); - ASSERT_EQ(ArrowArrayFinishElement(array.get()), NANOARROW_OK); - - ASSERT_EQ(ArrowArrayFinishBuildingDefault(array.get(), nullptr), NANOARROW_OK); - - auto ipc_bytes = serialize_to_ipc(schema.get(), array.get()); - - auto [type_tree, mappings] = *schemaFromIpc(PJ::Span(ipc_bytes.data(), ipc_bytes.size())); - auto sid = *writer.registerSchema("null_schema", type_tree); - TopicDescriptor desc; - desc.name = "null_topic"; - desc.schema_id = sid; - auto tid = *writer.registerTopic(*ds_or, desc); - - auto status = importIpcStream(writer, tid, PJ::Span(ipc_bytes.data(), ipc_bytes.size()), mappings); - ASSERT_TRUE(status.has_value()) << status.error(); - - auto flushed = writer.flushAll(); - engine.commitChunks(std::move(flushed)); - - DataReader reader = engine.createReader(); - auto cursor_or = reader.rangeQuery(QueryRange{.topic_id = tid, .t_min = 0, .t_max = 10}); - ASSERT_TRUE(cursor_or.has_value()); - std::size_t row = 
0; - cursor_or->forEach([&](const SampleRow& r) { - if (row == 0) { - EXPECT_FALSE(r.chunk->isNull(0, r.row_index)); - EXPECT_FLOAT_EQ(static_cast<float>(r.chunk->readNumericAsDouble(0, r.row_index)), 1.0F); - } else if (row == 1) { - EXPECT_TRUE(r.chunk->isNull(0, r.row_index)); - } else if (row == 2) { - EXPECT_FALSE(r.chunk->isNull(0, r.row_index)); - EXPECT_FLOAT_EQ(static_cast<float>(r.chunk->readNumericAsDouble(0, r.row_index)), 3.0F); - } else if (row == 3) { - EXPECT_TRUE(r.chunk->isNull(0, r.row_index)); - } - ++row; - }); - EXPECT_EQ(row, 4u); -} - -} // namespace -} // namespace PJ::arrow_import diff --git a/pj_datastore/tests/arrow_stream_round_trip_test.cpp b/pj_datastore/tests/arrow_stream_round_trip_test.cpp deleted file mode 100644 index 12c9e61e..00000000 --- a/pj_datastore/tests/arrow_stream_round_trip_test.cpp +++ /dev/null @@ -1,289 +0,0 @@ -/** - * @file arrow_stream_round_trip_test.cpp - * @brief End-to-end round trip through the v4 Arrow C Data Interface path. - * - * Writes known small time series into the datastore via - * DatastoreSourceWriteHost::append_arrow_stream and - * DatastoreParserWriteHost::append_arrow_stream (the v4 ABI slots), then - * reads them back via DatastoreToolboxHost::read_series_arrow. - * - * This exercises the Phase 1b host-side implementation without going through - * a dlopen'd plugin — all ABI calls are made directly on the C vtable.
- */ -// Copyright 2026 Davide Faconti -// SPDX-License-Identifier: MPL-2.0 - -#include - -#include -#include -#include - -#include "nanoarrow/nanoarrow.h" -#include "nanoarrow/nanoarrow.hpp" -#include "pj_base/dataset.hpp" -#include "pj_base/plugin_data_api.h" -#include "pj_base/type_tree.hpp" -#include "pj_base/types.hpp" -#include "pj_datastore/engine.hpp" -#include "pj_datastore/object_store.hpp" -#include "pj_datastore/plugin_data_host.hpp" - -namespace PJ { -namespace { - -// --------------------------------------------------------------------------- -// Build a one-batch ArrowArrayStream with columns {timestamp: int64, value: double} -// --------------------------------------------------------------------------- - -struct BuiltStream { - nanoarrow::UniqueSchema schema; - nanoarrow::UniqueArray array; -}; - -BuiltStream makeStream(const std::vector<int64_t>& timestamps, const std::vector<double>& values) { - EXPECT_EQ(timestamps.size(), values.size()); - const int64_t n = static_cast<int64_t>(timestamps.size()); - - BuiltStream result; - ArrowSchemaInit(result.schema.get()); - EXPECT_EQ(ArrowSchemaSetTypeStruct(result.schema.get(), 2), NANOARROW_OK); - ArrowSchemaInit(result.schema->children[0]); - EXPECT_EQ(ArrowSchemaSetType(result.schema->children[0], NANOARROW_TYPE_INT64), NANOARROW_OK); - EXPECT_EQ(ArrowSchemaSetName(result.schema->children[0], "ts_col"), NANOARROW_OK); - ArrowSchemaInit(result.schema->children[1]); - EXPECT_EQ(ArrowSchemaSetType(result.schema->children[1], NANOARROW_TYPE_DOUBLE), NANOARROW_OK); - EXPECT_EQ(ArrowSchemaSetName(result.schema->children[1], "value"), NANOARROW_OK); - - ArrowError err; - EXPECT_EQ(ArrowArrayInitFromSchema(result.array.get(), result.schema.get(), &err), NANOARROW_OK) << err.message; - EXPECT_EQ(ArrowArrayStartAppending(result.array.get()), NANOARROW_OK); - for (int64_t i = 0; i < n; ++i) { - EXPECT_EQ(ArrowArrayAppendInt(result.array->children[0], timestamps[static_cast<std::size_t>(i)]), NANOARROW_OK); -
EXPECT_EQ(ArrowArrayAppendDouble(result.array->children[1], values[static_cast<std::size_t>(i)]), NANOARROW_OK); - EXPECT_EQ(ArrowArrayFinishElement(result.array.get()), NANOARROW_OK); - } - EXPECT_EQ(ArrowArrayFinishBuildingDefault(result.array.get(), &err), NANOARROW_OK) << err.message; - return result; -} - -/// Stream producer that yields one batch then end-of-stream. -struct OneBatchStreamState { - nanoarrow::UniqueSchema schema; - nanoarrow::UniqueArray array; - bool exhausted = false; - std::string last_error_buf; -}; - -int onebatch_get_schema(ArrowArrayStream* stream, ArrowSchema* out) { - auto* s = static_cast<OneBatchStreamState*>(stream->private_data); - return ArrowSchemaDeepCopy(s->schema.get(), out); -} - -int onebatch_get_next(ArrowArrayStream* stream, ArrowArray* out) { - auto* s = static_cast<OneBatchStreamState*>(stream->private_data); - if (s->exhausted) { - out->release = nullptr; // sentinel for end-of-stream per Arrow spec - return NANOARROW_OK; - } - ArrowArrayMove(s->array.get(), out); - s->exhausted = true; - return NANOARROW_OK; -} - -const char* onebatch_get_last_error(ArrowArrayStream* stream) { - auto* s = static_cast<OneBatchStreamState*>(stream->private_data); - return s->last_error_buf.empty() ?
nullptr : s->last_error_buf.c_str(); -} - -void onebatch_release(ArrowArrayStream* stream) { - delete static_cast(stream->private_data); - stream->private_data = nullptr; - stream->release = nullptr; -} - -void initOneBatchStream(ArrowArrayStream* out_stream, BuiltStream built) { - auto* state = new OneBatchStreamState{std::move(built.schema), std::move(built.array), false, {}}; - out_stream->get_schema = onebatch_get_schema; - out_stream->get_next = onebatch_get_next; - out_stream->get_last_error = onebatch_get_last_error; - out_stream->release = onebatch_release; - out_stream->private_data = state; -} - -// --------------------------------------------------------------------------- -// Round-trip test -// --------------------------------------------------------------------------- - -TEST(ArrowStreamRoundTripTest, WriteViaAppendArrowStreamReadViaReadSeriesArrow) { - // Set up engine + dataset. - DataEngine engine; - auto td_id = engine.createTimeDomain("test_td"); - ASSERT_TRUE(td_id.has_value()) << td_id.error(); - auto ds_id = engine.createDataset(DatasetDescriptor{.source_name = "test", .time_domain_id = *td_id}); - ASSERT_TRUE(ds_id.has_value()) << ds_id.error(); - - // Write host bound to that dataset. - DatastoreSourceWriteHost write_host(engine, PJ_data_source_handle_t{static_cast(*ds_id)}); - auto write_vtable = write_host.raw(); - - // Ensure a topic named "metric" up-front (matches the stream's later schema). - PJ_topic_handle_t topic{}; - PJ_error_t err{}; - PJ_string_view_t topic_name{"metric", 6}; - ASSERT_TRUE(write_vtable.vtable->ensure_topic(write_vtable.ctx, topic_name, &topic, &err)) << err.message; - - // Build a stream with {timestamp, value} and feed it through append_arrow_stream. 
- const std::vector<int64_t> timestamps = {1000, 2000, 3000, 4000, 5000}; - const std::vector<double> values = {1.5, 2.5, 3.5, 4.5, 5.5}; - auto built = makeStream(timestamps, values); - - ArrowArrayStream stream{}; - initOneBatchStream(&stream, std::move(built)); - - PJ_string_view_t ts_col_name{"ts_col", 6}; - ASSERT_TRUE(write_vtable.vtable->append_arrow_stream(write_vtable.ctx, topic, &stream, ts_col_name, &err)) - << err.message; - - // append_arrow_stream ABI: on success, the host takes ownership of the - // stream and releases it before returning. Our local `stream` must now - // have a null release pointer (it was zeroed by the release callback). - EXPECT_EQ(stream.release, nullptr); - - write_host.flushPending(); - - // Catalog snapshot — look up the field handle for "value". - ObjectStore object_store; - DatastoreToolboxHost tb_host(engine, object_store); - auto tb_vtable = tb_host.raw(); - - PJ_catalog_snapshot_t snapshot{}; - ASSERT_TRUE(tb_vtable.vtable->acquire_catalog_snapshot(tb_vtable.ctx, &snapshot, &err)) << err.message; - - PJ_field_handle_t value_field{}; - bool value_found = false; - for (std::size_t i = 0; i < snapshot.field_count; ++i) { - const auto& f = snapshot.fields[i]; - if (std::string(f.name.data, f.name.size).find("value") != std::string::npos) { - value_field = f.handle; - value_found = true; - break; - } - } - snapshot.release(snapshot.release_ctx); - ASSERT_TRUE(value_found) << "field 'value' missing from catalog"; - - // Read it back via read_series_arrow.
- ArrowSchema out_schema{}; - ArrowArray out_array{}; - ASSERT_TRUE(tb_vtable.vtable->read_series_arrow(tb_vtable.ctx, value_field, &out_schema, &out_array, &err)) - << err.message; - ASSERT_NE(out_schema.release, nullptr); - ASSERT_NE(out_array.release, nullptr); - - // Schema: struct { timestamp: int64, : double } - EXPECT_EQ(std::string(out_schema.format), "+s"); - ASSERT_EQ(out_schema.n_children, 2); - EXPECT_EQ(std::string(out_schema.children[0]->name), "timestamp"); - EXPECT_EQ(std::string(out_schema.children[0]->format), "l"); // int64 - EXPECT_EQ(std::string(out_schema.children[1]->format), "g"); // float64 - - // Array layout matches. - ASSERT_EQ(out_array.length, static_cast(timestamps.size())); - ASSERT_EQ(out_array.n_children, 2); - - // Walk via ArrowArrayView to extract the values. - nanoarrow::UniqueArrayView view; - ArrowError vf_err; - ASSERT_EQ(ArrowArrayViewInitFromSchema(view.get(), &out_schema, &vf_err), NANOARROW_OK) << vf_err.message; - ASSERT_EQ(ArrowArrayViewSetArray(view.get(), &out_array, &vf_err), NANOARROW_OK) << vf_err.message; - - for (int64_t i = 0; i < out_array.length; ++i) { - EXPECT_EQ(ArrowArrayViewGetIntUnsafe(view->children[0], i), timestamps[static_cast(i)]); - EXPECT_DOUBLE_EQ(ArrowArrayViewGetDoubleUnsafe(view->children[1], i), values[static_cast(i)]); - } - - // Release the host-owned structs as per the ABI contract. 
- out_schema.release(&out_schema); - out_array.release(&out_array); - EXPECT_EQ(out_schema.release, nullptr); - EXPECT_EQ(out_array.release, nullptr); -} - -TEST(ArrowStreamRoundTripTest, ParserWriteHostAppendArrowStreamWritesBoundTopic) { - DataEngine engine; - auto td_id = engine.createTimeDomain("parser_td"); - ASSERT_TRUE(td_id.has_value()) << td_id.error(); - auto ds_id = engine.createDataset(DatasetDescriptor{.source_name = "parser", .time_domain_id = *td_id}); - ASSERT_TRUE(ds_id.has_value()) << ds_id.error(); - - DatastoreSourceWriteHost source_write_host(engine, PJ_data_source_handle_t{static_cast(*ds_id)}); - auto source_vtable = source_write_host.raw(); - - PJ_topic_handle_t topic{}; - PJ_error_t err{}; - PJ_string_view_t topic_name{"parser_metric", 13}; - ASSERT_TRUE(source_vtable.vtable->ensure_topic(source_vtable.ctx, topic_name, &topic, &err)) << err.message; - - DatastoreParserWriteHost parser_write_host(engine, topic); - auto parser_vtable = parser_write_host.raw(); - - const std::vector timestamps = {10, 20, 30}; - const std::vector values = {7.0, 8.5, 9.25}; - auto built = makeStream(timestamps, values); - - ArrowArrayStream stream{}; - initOneBatchStream(&stream, std::move(built)); - - PJ_string_view_t ts_col_name{"ts_col", 6}; - ASSERT_TRUE(parser_vtable.vtable->append_arrow_stream(parser_vtable.ctx, &stream, ts_col_name, &err)) << err.message; - EXPECT_EQ(stream.release, nullptr); - - parser_write_host.flushPending(); - - ObjectStore object_store; - DatastoreToolboxHost tb_host(engine, object_store); - auto tb_vtable = tb_host.raw(); - - PJ_catalog_snapshot_t snapshot{}; - ASSERT_TRUE(tb_vtable.vtable->acquire_catalog_snapshot(tb_vtable.ctx, &snapshot, &err)) << err.message; - - PJ_field_handle_t value_field{}; - bool value_found = false; - for (std::size_t i = 0; i < snapshot.field_count; ++i) { - const auto& f = snapshot.fields[i]; - if (std::string(f.name.data, f.name.size).find("value") != std::string::npos) { - value_field = f.handle; - 
value_found = true; - break; - } - } - snapshot.release(snapshot.release_ctx); - ASSERT_TRUE(value_found) << "field 'value' missing from catalog"; - - ArrowSchema out_schema{}; - ArrowArray out_array{}; - ASSERT_TRUE(tb_vtable.vtable->read_series_arrow(tb_vtable.ctx, value_field, &out_schema, &out_array, &err)) - << err.message; - ASSERT_NE(out_schema.release, nullptr); - ASSERT_NE(out_array.release, nullptr); - - nanoarrow::UniqueArrayView view; - ArrowError vf_err; - ASSERT_EQ(ArrowArrayViewInitFromSchema(view.get(), &out_schema, &vf_err), NANOARROW_OK) << vf_err.message; - ASSERT_EQ(ArrowArrayViewSetArray(view.get(), &out_array, &vf_err), NANOARROW_OK) << vf_err.message; - - ASSERT_EQ(out_array.length, static_cast<int64_t>(timestamps.size())); - for (int64_t i = 0; i < out_array.length; ++i) { - EXPECT_EQ(ArrowArrayViewGetIntUnsafe(view->children[0], i), timestamps[static_cast<std::size_t>(i)]); - EXPECT_DOUBLE_EQ(ArrowArrayViewGetDoubleUnsafe(view->children[1], i), values[static_cast<std::size_t>(i)]); - } - - out_schema.release(&out_schema); - out_array.release(&out_array); - EXPECT_EQ(out_schema.release, nullptr); - EXPECT_EQ(out_array.release, nullptr); -} - -} // namespace -} // namespace PJ diff --git a/pj_datastore/tests/buffer_test.cpp b/pj_datastore/tests/buffer_test.cpp deleted file mode 100644 index 9e4835ff..00000000 --- a/pj_datastore/tests/buffer_test.cpp +++ /dev/null @@ -1,182 +0,0 @@ -// Copyright 2026 Davide Faconti -// SPDX-License-Identifier: MPL-2.0 - -#include "pj_datastore/buffer.hpp" - -#include - -#include -#include - -namespace PJ { -namespace { - -// =========================================================================== -// RawBuffer tests -// =========================================================================== - -TEST(RawBufferTest, DefaultConstructEmpty) { - RawBuffer buf; - EXPECT_TRUE(buf.empty()); - EXPECT_EQ(buf.size(), 0u); -} - -TEST(RawBufferTest, ConstructWithCapacity) { - constexpr std::size_t kCap = 256; - RawBuffer buf(kCap); -
EXPECT_TRUE(buf.empty()); - EXPECT_GE(buf.capacity(), kCap); -} - -TEST(RawBufferTest, AppendData) { - RawBuffer buf; - const std::array<uint8_t, 4> payload = {0xDE, 0xAD, 0xBE, 0xEF}; - buf.append(payload.data(), payload.size()); - - EXPECT_EQ(buf.size(), 4u); - EXPECT_FALSE(buf.empty()); - EXPECT_EQ(buf.data()[0], 0xDE); - EXPECT_EQ(buf.data()[1], 0xAD); - EXPECT_EQ(buf.data()[2], 0xBE); - EXPECT_EQ(buf.data()[3], 0xEF); -} - -TEST(RawBufferTest, AppendMultipleTimes) { - RawBuffer buf; - const std::array<uint8_t, 3> first = {1, 2, 3}; - const std::array<uint8_t, 5> second = {4, 5, 6, 7, 8}; - - buf.append(first.data(), first.size()); - buf.append(second.data(), second.size()); - - EXPECT_EQ(buf.size(), 8u); - for (uint8_t i = 0; i < 8; ++i) { - EXPECT_EQ(buf.data()[i], i + 1); - } -} - -TEST(RawBufferTest, Resize) { - RawBuffer buf; - buf.resize(16); - EXPECT_EQ(buf.size(), 16u); - EXPECT_FALSE(buf.empty()); -} - -TEST(RawBufferTest, Clear) { - RawBuffer buf; - const std::array<uint8_t, 4> payload = {1, 2, 3, 4}; - buf.append(payload.data(), payload.size()); - EXPECT_FALSE(buf.empty()); - - buf.clear(); - EXPECT_TRUE(buf.empty()); - EXPECT_EQ(buf.size(), 0u); -} - -TEST(RawBufferTest, Reserve) { - RawBuffer buf; - buf.reserve(1000); - EXPECT_GE(buf.capacity(), 1000u); - EXPECT_EQ(buf.size(), 0u); - EXPECT_TRUE(buf.empty()); -} - -// =========================================================================== -// BitVector tests -// =========================================================================== - -TEST(BitVectorTest, BytesForBits) { - EXPECT_EQ(BitVector::bytesForBits(0), 0u); - EXPECT_EQ(BitVector::bytesForBits(1), 1u); - EXPECT_EQ(BitVector::bytesForBits(7), 1u); - EXPECT_EQ(BitVector::bytesForBits(8), 1u); - EXPECT_EQ(BitVector::bytesForBits(9), 2u); - EXPECT_EQ(BitVector::bytesForBits(16), 2u); - EXPECT_EQ(BitVector::bytesForBits(17), 3u); -} - -TEST(BitVectorTest, InitAllValid) { - BitVector bits; - bits.initValid(16); - - EXPECT_EQ(bits.sizeBytes(), 2u); - for (std::size_t i = 0; i < 16; ++i)
{ - EXPECT_TRUE(bits.isValid(i)) << "bit " << i << " should be valid after init"; - } -} - -TEST(BitVectorTest, SetNull) { - BitVector bits; - bits.initValid(16); - - bits.setNull(5); - EXPECT_FALSE(bits.isValid(5)); - - // All other bits should remain valid. - for (std::size_t i = 0; i < 16; ++i) { - if (i == 5) { - continue; - } - EXPECT_TRUE(bits.isValid(i)) << "bit " << i << " should still be valid"; - } -} - -TEST(BitVectorTest, SetValidAfterNull) { - BitVector bits; - bits.initValid(16); - - bits.setNull(5); - EXPECT_FALSE(bits.isValid(5)); - - bits.setValid(5); - EXPECT_TRUE(bits.isValid(5)); -} - -TEST(BitVectorTest, CountNulls) { - BitVector bits; - bits.initValid(16); - - bits.setNull(3); - bits.setNull(7); - bits.setNull(15); - - EXPECT_EQ(bits.countNulls(16), 3u); -} - -TEST(BitVectorTest, ByteBoundary) { - BitVector bits; - bits.initValid(16); - - // Bit 7 is the last bit in byte 0; bit 8 is the first bit in byte 1. - bits.setNull(7); - bits.setNull(8); - - EXPECT_FALSE(bits.isValid(7)); - EXPECT_FALSE(bits.isValid(8)); - - // Neighbours should be unaffected. 
- EXPECT_TRUE(bits.isValid(6)); - EXPECT_TRUE(bits.isValid(9)); -} - -TEST(BitVectorTest, CountNullsWithNoNulls) { - BitVector bits; - bits.initValid(32); - EXPECT_EQ(bits.countNulls(32), 0u); -} - -TEST(BitVectorTest, BitSpanView) { - BitVector bits; - bits.initValid(8); - bits.setNull(1); - bits.setNull(6); - - const BitSpan view = bits.bitSpan(); - EXPECT_EQ(view.sizeBits(), 8u); - EXPECT_TRUE(view.test(0)); - EXPECT_FALSE(view.test(1)); - EXPECT_FALSE(view.test(6)); -} - -} // namespace -} // namespace PJ diff --git a/pj_datastore/tests/chunk_test.cpp b/pj_datastore/tests/chunk_test.cpp deleted file mode 100644 index 6dfdb4d6..00000000 --- a/pj_datastore/tests/chunk_test.cpp +++ /dev/null @@ -1,1191 +0,0 @@ -// Copyright 2026 Davide Faconti -// SPDX-License-Identifier: MPL-2.0 - -#include "pj_datastore/chunk.hpp" - -#include - -#include -#include -#include -#include -#include -#include - -namespace PJ { -namespace { - -// --------------------------------------------------------------------------- -// Helper: create a vector of ColumnDescriptors -// --------------------------------------------------------------------------- - -ColumnDescriptor make_col(FieldId id, PrimitiveType type, std::string path) { - return ColumnDescriptor{id, type, std::move(path)}; -} - -// =========================================================================== -// Test 1: Build and seal float32 chunk -// =========================================================================== - -TEST(ChunkTest, BuildAndSealFloat32Chunk) { - std::vector cols = { - make_col(1, PrimitiveType::kFloat32, "x"), - make_col(2, PrimitiveType::kFloat32, "y"), - make_col(3, PrimitiveType::kFloat32, "z"), - }; - TopicChunkBuilder builder(/*topic_id=*/10, /*schema_id=*/1, std::move(cols), /*max_rows=*/100); - - // Add 5 rows - for (uint32_t i = 0; i < 5; ++i) { - Timestamp ts = 1000 + static_cast(i) * 100; - builder.beginRow(ts); - builder.set(0, static_cast(i) * 1.0F); - builder.set(1, static_cast(i) * 
2.0F); - builder.set(2, static_cast<float>(i) * 3.0F); - builder.finishRow(); - } - - EXPECT_EQ(builder.rowCount(), 5U); - EXPECT_FALSE(builder.isFull()); - - const auto& stats = builder.stats(); - EXPECT_EQ(stats.t_min, 1000); - EXPECT_EQ(stats.t_max, 1400); - EXPECT_EQ(stats.row_count, 5U); - - // Column 0 (x): values 0, 1, 2, 3, 4 - EXPECT_DOUBLE_EQ(*stats.column_stats[0].min_value, 0.0); - EXPECT_DOUBLE_EQ(*stats.column_stats[0].max_value, 4.0); - - // Column 1 (y): values 0, 2, 4, 6, 8 - EXPECT_DOUBLE_EQ(*stats.column_stats[1].min_value, 0.0); - EXPECT_DOUBLE_EQ(*stats.column_stats[1].max_value, 8.0); - - // Column 2 (z): values 0, 3, 6, 9, 12 - EXPECT_DOUBLE_EQ(*stats.column_stats[2].min_value, 0.0); - EXPECT_DOUBLE_EQ(*stats.column_stats[2].max_value, 12.0); - - TopicChunk chunk = builder.seal(); - EXPECT_NE(chunk.id, 0U); - EXPECT_EQ(chunk.topic_id, 10U); - EXPECT_EQ(chunk.schema_version, 1U); - EXPECT_EQ(chunk.stats.row_count, 5U); - EXPECT_EQ(chunk.columns.size(), 3U); - for (std::size_t c = 0; c < 3; ++c) { - EXPECT_EQ(chunk.columnEncoding(c), EncodingType::kRaw); - } -} - -// =========================================================================== -// Test 2: Read back sealed values -// =========================================================================== - -TEST(ChunkTest, ReadBackSealedValues) { - std::vector<ColumnDescriptor> cols = { - make_col(1, PrimitiveType::kFloat32, "x"), - make_col(2, PrimitiveType::kFloat64, "y"), - make_col(3, PrimitiveType::kInt32, "z"), - }; - TopicChunkBuilder builder(/*topic_id=*/20, /*schema_id=*/2, std::move(cols), /*max_rows=*/100); - - Timestamp timestamps[] = {1000, 1100, 1200, 1300, 1400}; - float x_vals[] = {1.5F, 2.5F, 3.5F, 4.5F, 5.5F}; - double y_vals[] = {10.0, 20.0, 30.0, 40.0, 50.0}; - int32_t z_vals[] = {-1, 0, 1, 2, 3}; - - for (int i = 0; i < 5; ++i) { - builder.beginRow(timestamps[i]); - builder.set(0, x_vals[i]); - builder.set(1, y_vals[i]); - builder.set(2, z_vals[i]); - builder.finishRow(); - } - - TopicChunk chunk
= builder.seal(); - - // Read back timestamps - for (std::size_t i = 0; i < 5; ++i) { - EXPECT_EQ(chunk.readTimestamp(i), timestamps[i]) << "row " << i; - } - - // Read back float32 column as double - for (std::size_t i = 0; i < 5; ++i) { - EXPECT_FLOAT_EQ(static_cast(chunk.readNumericAsDouble(0, i)), x_vals[i]) << "row " << i; - } - - // Read back float64 column - for (std::size_t i = 0; i < 5; ++i) { - EXPECT_DOUBLE_EQ(chunk.readNumericAsDouble(1, i), y_vals[i]) << "row " << i; - } - - // Read back int32 column as double (may be FOR-encoded since range is small) - for (std::size_t i = 0; i < 5; ++i) { - EXPECT_DOUBLE_EQ(chunk.readNumericAsDouble(2, i), static_cast(z_vals[i])) << "row " << i; - } -} - -// =========================================================================== -// Test 3: is_full -// =========================================================================== - -TEST(ChunkTest, IsFull) { - std::vector cols = { - make_col(1, PrimitiveType::kFloat32, "val"), - }; - TopicChunkBuilder builder(/*topic_id=*/30, /*schema_id=*/1, std::move(cols), /*max_rows=*/3); - - EXPECT_FALSE(builder.isFull()); - EXPECT_EQ(builder.rowCount(), 0U); - - for (uint32_t i = 0; i < 3; ++i) { - builder.beginRow(static_cast(i)); - builder.set(0, static_cast(i)); - builder.finishRow(); - } - - EXPECT_TRUE(builder.isFull()); - EXPECT_EQ(builder.rowCount(), 3U); -} - -// =========================================================================== -// Test 4: String column -// =========================================================================== - -TEST(ChunkTest, StringColumn) { - std::vector cols = { - make_col(1, PrimitiveType::kString, "label"), - }; - TopicChunkBuilder builder(/*topic_id=*/40, /*schema_id=*/1, std::move(cols), /*max_rows=*/100); - - std::string_view strings[] = {"hello", "world", "hello", "world"}; - for (int i = 0; i < 4; ++i) { - builder.beginRow(static_cast(i * 100)); - builder.set(0, strings[i]); - builder.finishRow(); - } - - TopicChunk chunk = 
builder.seal(); - - EXPECT_EQ(chunk.columnEncoding(0), EncodingType::kDictionary); - const auto& dict = std::get(chunk.columns[0].data); - // 2 unique strings: "hello" and "world" - EXPECT_EQ(dict.dictionary.size(), 2U); - - // Read back all strings - for (std::size_t i = 0; i < 4; ++i) { - EXPECT_EQ(chunk.readString(0, i), strings[i]) << "row " << i; - } -} - -// =========================================================================== -// Test 5: Bool column -// =========================================================================== - -TEST(ChunkTest, BoolColumn) { - std::vector cols = { - make_col(1, PrimitiveType::kBool, "flag"), - }; - TopicChunkBuilder builder(/*topic_id=*/50, /*schema_id=*/1, std::move(cols), /*max_rows=*/100); - - bool bools[] = {true, false, true, true, false}; - for (int i = 0; i < 5; ++i) { - builder.beginRow(static_cast(i)); - builder.set(0, bools[i]); - builder.finishRow(); - } - - TopicChunk chunk = builder.seal(); - - EXPECT_EQ(chunk.columnEncoding(0), EncodingType::kPackedBool); - - for (std::size_t i = 0; i < 5; ++i) { - EXPECT_EQ(chunk.readBool(0, i), bools[i]) << "row " << i; - } -} - -// =========================================================================== -// Test 6: Null handling -// =========================================================================== - -TEST(ChunkTest, NullHandling) { - std::vector cols = { - make_col(1, PrimitiveType::kFloat64, "val"), - }; - TopicChunkBuilder builder(/*topic_id=*/60, /*schema_id=*/1, std::move(cols), /*max_rows=*/100); - - // Row 0: 10.0, Row 1: null, Row 2: 30.0, Row 3: null, Row 4: 50.0 - builder.beginRow(100); - builder.set(0, 10.0); - builder.finishRow(); - - builder.beginRow(200); - builder.setNull(0); - builder.finishRow(); - - builder.beginRow(300); - builder.set(0, 30.0); - builder.finishRow(); - - builder.beginRow(400); - builder.setNull(0); - builder.finishRow(); - - builder.beginRow(500); - builder.set(0, 50.0); - builder.finishRow(); - - const auto& stats = 
builder.stats(); - EXPECT_EQ(stats.column_stats[0].null_count, 2U); - - TopicChunk chunk = builder.seal(); - - EXPECT_FALSE(chunk.isNull(0, 0)); - EXPECT_TRUE(chunk.isNull(0, 1)); - EXPECT_FALSE(chunk.isNull(0, 2)); - EXPECT_TRUE(chunk.isNull(0, 3)); - EXPECT_FALSE(chunk.isNull(0, 4)); - - // Non-null values should read back correctly - EXPECT_DOUBLE_EQ(chunk.readNumericAsDouble(0, 0), 10.0); - EXPECT_DOUBLE_EQ(chunk.readNumericAsDouble(0, 2), 30.0); - EXPECT_DOUBLE_EQ(chunk.readNumericAsDouble(0, 4), 50.0); -} - -// =========================================================================== -// Test 7: Mixed types -// =========================================================================== - -TEST(ChunkTest, MixedTypes) { - std::vector<ColumnDescriptor> cols = { - make_col(1, PrimitiveType::kFloat32, "position"), - make_col(2, PrimitiveType::kString, "label"), - make_col(3, PrimitiveType::kBool, "active"), - }; - TopicChunkBuilder builder(/*topic_id=*/70, /*schema_id=*/1, std::move(cols), /*max_rows=*/100); - - builder.beginRow(1000); - builder.set(0, 1.5F); - builder.set(1, std::string_view("alpha")); - builder.set(2, true); - builder.finishRow(); - - builder.beginRow(2000); - builder.set(0, 2.5F); - builder.set(1, std::string_view("beta")); - builder.set(2, false); - builder.finishRow(); - - builder.beginRow(3000); - builder.set(0, 3.5F); - builder.set(1, std::string_view("alpha")); - builder.set(2, true); - builder.finishRow(); - - TopicChunk chunk = builder.seal(); - - // Check encodings - EXPECT_EQ(chunk.columnEncoding(0), EncodingType::kRaw); - EXPECT_EQ(chunk.columnEncoding(1), EncodingType::kDictionary); - EXPECT_EQ(chunk.columnEncoding(2), EncodingType::kPackedBool); - - // Read back all values - EXPECT_FLOAT_EQ(static_cast<float>(chunk.readNumericAsDouble(0, 0)), 1.5F); - EXPECT_FLOAT_EQ(static_cast<float>(chunk.readNumericAsDouble(0, 1)), 2.5F); - EXPECT_FLOAT_EQ(static_cast<float>(chunk.readNumericAsDouble(0, 2)), 3.5F); - - EXPECT_EQ(chunk.readString(1, 0), "alpha"); -
EXPECT_EQ(chunk.readString(1, 1), "beta"); - EXPECT_EQ(chunk.readString(1, 2), "alpha"); - - EXPECT_TRUE(chunk.readBool(2, 0)); - EXPECT_FALSE(chunk.readBool(2, 1)); - EXPECT_TRUE(chunk.readBool(2, 2)); - - // Timestamps - EXPECT_EQ(chunk.readTimestamp(0), 1000); - EXPECT_EQ(chunk.readTimestamp(1), 2000); - EXPECT_EQ(chunk.readTimestamp(2), 3000); -} - -// =========================================================================== -// Test 8: Column stats (min/max, is_constant, run_count) -// =========================================================================== - -TEST(ChunkTest, ColumnStatsNumeric) { - std::vector cols = { - make_col(1, PrimitiveType::kFloat64, "varying"), - make_col(2, PrimitiveType::kFloat64, "constant"), - }; - TopicChunkBuilder builder(/*topic_id=*/80, /*schema_id=*/1, std::move(cols), /*max_rows=*/100); - - // varying: -5, 0, 10, 3, 10 - // constant: 42, 42, 42, 42, 42 - double varying[] = {-5.0, 0.0, 10.0, 3.0, 10.0}; - for (int i = 0; i < 5; ++i) { - builder.beginRow(static_cast(i)); - builder.set(0, varying[i]); - builder.set(1, 42.0); - builder.finishRow(); - } - - const auto& stats = builder.stats(); - - // Varying column - EXPECT_DOUBLE_EQ(*stats.column_stats[0].min_value, -5.0); - EXPECT_DOUBLE_EQ(*stats.column_stats[0].max_value, 10.0); - EXPECT_FALSE(stats.column_stats[0].is_constant); - // run_count: -5->0 (change), 0->10 (change), 10->3 (change), 3->10 (change) = 1 + 4 = 5 - EXPECT_EQ(stats.column_stats[0].run_count, 5U); - - // Constant column - EXPECT_DOUBLE_EQ(*stats.column_stats[1].min_value, 42.0); - EXPECT_DOUBLE_EQ(*stats.column_stats[1].max_value, 42.0); - EXPECT_TRUE(stats.column_stats[1].is_constant); - EXPECT_EQ(stats.column_stats[1].run_count, 1U); -} - -// =========================================================================== -// Test: Unique chunk IDs -// =========================================================================== - -TEST(ChunkTest, UniqueChunkIds) { - std::vector cols = { - make_col(1, 
PrimitiveType::kFloat32, "val"), - }; - - TopicChunkBuilder builder1(1, 1, cols, 10); - builder1.beginRow(100); - builder1.set(0, 1.0F); - builder1.finishRow(); - TopicChunk c1 = builder1.seal(); - - TopicChunkBuilder builder2(1, 1, cols, 10); - builder2.beginRow(200); - builder2.set(0, 2.0F); - builder2.finishRow(); - TopicChunk c2 = builder2.seal(); - - EXPECT_NE(c1.id, c2.id); - EXPECT_NE(c1.id, kInvalidChunkId); - EXPECT_NE(c2.id, kInvalidChunkId); -} - -// =========================================================================== -// Test: Integer types round-trip -// =========================================================================== - -TEST(ChunkTest, IntegerTypesRoundTrip) { - // int8/int16 logical types widen to int64 storage; int32 has its own storage; - // uint8/uint16/uint32 widen to uint64 storage. - std::vector cols = { - make_col(1, PrimitiveType::kInt8, "i8"), make_col(2, PrimitiveType::kInt16, "i16"), - make_col(3, PrimitiveType::kInt32, "i32"), make_col(4, PrimitiveType::kInt64, "i64"), - make_col(5, PrimitiveType::kUint8, "u8"), make_col(6, PrimitiveType::kUint16, "u16"), - make_col(7, PrimitiveType::kUint32, "u32"), make_col(8, PrimitiveType::kUint64, "u64"), - }; - TopicChunkBuilder builder(/*topic_id=*/90, /*schema_id=*/1, std::move(cols), /*max_rows=*/100); - - builder.beginRow(1000); - builder.set(0, static_cast(-42)); // int8 → int64 storage - builder.set(1, static_cast(-1000)); // int16 → int64 storage - builder.set(2, -999999); // int32 → int32 storage - builder.set(3, static_cast(123456789012345LL)); - builder.set(4, static_cast(255)); // uint8 → uint64 storage - builder.set(5, static_cast(65535)); // uint16 → uint64 storage - builder.set(6, static_cast(4000000000U)); // uint32 → uint64 storage - builder.set(7, static_cast(18000000000000000000ULL)); - builder.finishRow(); - - TopicChunk chunk = builder.seal(); - - // Single-row chunks will be constant-encoded, but readback should be the same - 
EXPECT_DOUBLE_EQ(chunk.readNumericAsDouble(0, 0), -42.0); - EXPECT_DOUBLE_EQ(chunk.readNumericAsDouble(1, 0), -1000.0); - EXPECT_DOUBLE_EQ(chunk.readNumericAsDouble(2, 0), -999999.0); - EXPECT_DOUBLE_EQ(chunk.readNumericAsDouble(3, 0), 123456789012345.0); - EXPECT_DOUBLE_EQ(chunk.readNumericAsDouble(4, 0), 255.0); - EXPECT_DOUBLE_EQ(chunk.readNumericAsDouble(5, 0), 65535.0); - EXPECT_DOUBLE_EQ(chunk.readNumericAsDouble(6, 0), 4000000000.0); - // uint64 large values may lose precision in double, so just check close - EXPECT_NEAR(chunk.readNumericAsDouble(7, 0), 1.8e19, 1e4); -} - -// =========================================================================== -// Test: No nulls means is_null always returns false -// =========================================================================== - -TEST(ChunkTest, NoNullsIsNullReturnsFalse) { - std::vector cols = { - make_col(1, PrimitiveType::kFloat32, "val"), - }; - TopicChunkBuilder builder(/*topic_id=*/100, /*schema_id=*/1, std::move(cols), /*max_rows=*/100); - - for (int i = 0; i < 3; ++i) { - builder.beginRow(static_cast(i)); - builder.set(0, static_cast(i)); - builder.finishRow(); - } - - TopicChunk chunk = builder.seal(); - - for (std::size_t i = 0; i < 3; ++i) { - EXPECT_FALSE(chunk.isNull(0, i)) << "row " << i; - } -} - -// =========================================================================== -// Test: String column is_constant and run_count -// =========================================================================== - -TEST(ChunkTest, StringColumnStats) { - std::vector cols = { - make_col(1, PrimitiveType::kString, "tag"), - }; - TopicChunkBuilder builder(/*topic_id=*/110, /*schema_id=*/1, std::move(cols), /*max_rows=*/100); - - // All same string -> is_constant = true, run_count = 1 - for (int i = 0; i < 4; ++i) { - builder.beginRow(static_cast(i)); - builder.set(0, std::string_view("same")); - builder.finishRow(); - } - - const auto& stats = builder.stats(); - 
EXPECT_TRUE(stats.column_stats[0].is_constant); - EXPECT_EQ(stats.column_stats[0].run_count, 1U); - // String columns should not have numeric min/max - EXPECT_FALSE(stats.column_stats[0].min_value.has_value()); - EXPECT_FALSE(stats.column_stats[0].max_value.has_value()); -} - -// =========================================================================== -// Test: Empty chunk (0 rows) -// =========================================================================== - -TEST(ChunkTest, EmptyChunk) { - std::vector cols = { - make_col(1, PrimitiveType::kFloat32, "val"), - }; - TopicChunkBuilder builder(/*topic_id=*/120, /*schema_id=*/1, std::move(cols), /*max_rows=*/100); - - EXPECT_EQ(builder.rowCount(), 0U); - EXPECT_FALSE(builder.isFull()); - - TopicChunk chunk = builder.seal(); - EXPECT_EQ(chunk.stats.row_count, 0U); - EXPECT_TRUE(chunk.timestamps.empty()); -} - -// =========================================================================== -// Test: Bulk read column as doubles (float32) -// =========================================================================== - -TEST(ChunkTest, BulkReadFloat32) { - std::vector cols = { - make_col(1, PrimitiveType::kFloat32, "x"), - }; - TopicChunkBuilder builder(/*topic_id=*/130, /*schema_id=*/1, std::move(cols), /*max_rows=*/100); - - constexpr int kRows = 10; - for (int i = 0; i < kRows; ++i) { - builder.beginRow(static_cast(i * 100)); - builder.set(0, static_cast(i) * 1.5F); - builder.finishRow(); - } - - TopicChunk chunk = builder.seal(); - - // Read all rows - std::vector out(kRows); - chunk.readColumnAsDoubles(0, Span(out), 0); - for (int i = 0; i < kRows; ++i) { - EXPECT_FLOAT_EQ(static_cast(out[static_cast(i)]), static_cast(i) * 1.5F) << "row " << i; - } - - // Read a sub-range [3, 7) - std::vector sub(4); - chunk.readColumnAsDoubles(0, Span(sub), 3); - for (int i = 0; i < 4; ++i) { - EXPECT_FLOAT_EQ(static_cast(sub[static_cast(i)]), static_cast(i + 3) * 1.5F) - << "sub row " << i; - } -} - -// 
=========================================================================== -// Test: Bulk read column as doubles (int64) -// =========================================================================== - -TEST(ChunkTest, BulkReadInt64) { - std::vector cols = { - make_col(1, PrimitiveType::kInt64, "val"), - }; - TopicChunkBuilder builder(/*topic_id=*/140, /*schema_id=*/1, std::move(cols), /*max_rows=*/100); - - constexpr int kRows = 5; - int64_t values[] = {-100, 0, 42, 999, -1}; - for (int i = 0; i < kRows; ++i) { - builder.beginRow(static_cast(i)); - builder.set(0, values[i]); - builder.finishRow(); - } - - TopicChunk chunk = builder.seal(); - - std::vector out(kRows); - chunk.readColumnAsDoubles(0, Span(out), 0); - for (int i = 0; i < kRows; ++i) { - EXPECT_DOUBLE_EQ(out[static_cast(i)], static_cast(values[i])) << "row " << i; - } -} - -// =========================================================================== -// Test: Bulk read bool/string columns returns NaN -// =========================================================================== - -TEST(ChunkTest, BulkReadBoolStringReturnsNaN) { - std::vector cols = { - make_col(1, PrimitiveType::kBool, "flag"), - make_col(2, PrimitiveType::kString, "label"), - }; - TopicChunkBuilder builder(/*topic_id=*/150, /*schema_id=*/1, std::move(cols), /*max_rows=*/100); - - builder.beginRow(100); - builder.set(0, true); - builder.set(1, std::string_view("hello")); - builder.finishRow(); - - builder.beginRow(200); - builder.set(0, false); - builder.set(1, std::string_view("world")); - builder.finishRow(); - - TopicChunk chunk = builder.seal(); - - std::vector out(2); - - chunk.readColumnAsDoubles(0, Span(out.data(), 2), 0); - EXPECT_TRUE(std::isnan(out[0])); - EXPECT_TRUE(std::isnan(out[1])); - - chunk.readColumnAsDoubles(1, Span(out.data(), 2), 0); - EXPECT_TRUE(std::isnan(out[0])); - EXPECT_TRUE(std::isnan(out[1])); -} - -// =========================================================================== -// Test: Bulk read 
zero rows (boundary) -// =========================================================================== - -TEST(ChunkTest, BulkReadZeroRows) { - std::vector cols = { - make_col(1, PrimitiveType::kFloat64, "x"), - }; - TopicChunkBuilder builder(/*topic_id=*/160, /*schema_id=*/1, std::move(cols), /*max_rows=*/100); - - builder.beginRow(100); - builder.set(0, 1.0); - builder.finishRow(); - - TopicChunk chunk = builder.seal(); - - // Should not crash when reading 0 rows - double dummy = 0.0; - chunk.readColumnAsDoubles(0, Span(&dummy, 0), 0); - EXPECT_DOUBLE_EQ(dummy, 0.0); // untouched -} - -// =========================================================================== -// NEW: Constant int column gets constant encoding -// =========================================================================== - -TEST(ChunkTest, ConstantIntColumnGetsConstantEncoding) { - std::vector cols = { - make_col(1, PrimitiveType::kInt32, "const_val"), - }; - TopicChunkBuilder builder(/*topic_id=*/200, /*schema_id=*/1, std::move(cols), /*max_rows=*/1000); - - for (int i = 0; i < 100; ++i) { - builder.beginRow(static_cast(i)); - builder.set(0, 42); - builder.finishRow(); - } - - TopicChunk chunk = builder.seal(); - - EXPECT_EQ(chunk.columnEncoding(0), EncodingType::kConstant); - - // Read back every row - for (std::size_t i = 0; i < 100; ++i) { - EXPECT_DOUBLE_EQ(chunk.readNumericAsDouble(0, i), 42.0) << "row " << i; - } - - // Bulk read - std::vector out(100); - chunk.readColumnAsDoubles(0, Span(out), 0); - for (std::size_t i = 0; i < 100; ++i) { - EXPECT_DOUBLE_EQ(out[i], 42.0) << "bulk row " << i; - } -} - -// =========================================================================== -// NEW: Constant float column gets constant encoding -// =========================================================================== - -TEST(ChunkTest, ConstantFloatColumnGetsConstantEncoding) { - std::vector cols = { - make_col(1, PrimitiveType::kFloat64, "const_f64"), - }; - TopicChunkBuilder 
builder(/*topic_id=*/201, /*schema_id=*/1, std::move(cols), /*max_rows=*/1000); - - for (int i = 0; i < 50; ++i) { - builder.beginRow(static_cast(i)); - builder.set(0, 3.14); - builder.finishRow(); - } - - TopicChunk chunk = builder.seal(); - - EXPECT_EQ(chunk.columnEncoding(0), EncodingType::kConstant); - - for (std::size_t i = 0; i < 50; ++i) { - EXPECT_DOUBLE_EQ(chunk.readNumericAsDouble(0, i), 3.14) << "row " << i; - } -} - -// =========================================================================== -// NEW: Narrow range int column gets FOR encoding -// =========================================================================== - -TEST(ChunkTest, NarrowRangeIntColumnGetsFOR) { - std::vector cols = { - make_col(1, PrimitiveType::kInt32, "narrow"), - }; - TopicChunkBuilder builder(/*topic_id=*/202, /*schema_id=*/1, std::move(cols), /*max_rows=*/1000); - - // Values in [1000, 1100] — range=100, fits in uint8 (1 byte vs 4 native) - for (int i = 0; i < 101; ++i) { - builder.beginRow(static_cast(i)); - builder.set(0, 1000 + static_cast(i)); - builder.finishRow(); - } - - TopicChunk chunk = builder.seal(); - - EXPECT_EQ(chunk.columnEncoding(0), EncodingType::kFrameOfReference); - const auto& for_enc = std::get(chunk.columns[0].data); - EXPECT_EQ(for_enc.offset_bytes, 1); - EXPECT_EQ(for_enc.reference, 1000); - - // Per-row read - for (std::size_t i = 0; i < 101; ++i) { - EXPECT_DOUBLE_EQ(chunk.readNumericAsDouble(0, i), 1000.0 + static_cast(i)) << "row " << i; - } - - // Bulk read - std::vector out(101); - chunk.readColumnAsDoubles(0, Span(out), 0); - for (std::size_t i = 0; i < 101; ++i) { - EXPECT_DOUBLE_EQ(out[i], 1000.0 + static_cast(i)) << "bulk row " << i; - } -} - -// =========================================================================== -// NEW: Wide range int column stays raw -// =========================================================================== - -TEST(ChunkTest, WideRangeIntColumnStaysRaw) { - std::vector cols = { - make_col(1, 
PrimitiveType::kInt32, "wide"), - }; - TopicChunkBuilder builder(/*topic_id=*/203, /*schema_id=*/1, std::move(cols), /*max_rows=*/1000); - - // Range that spans full int32 — FOR can't narrow below 4 bytes - builder.beginRow(0); - builder.set(0, -2000000000); - builder.finishRow(); - - builder.beginRow(1); - builder.set(0, 2000000000); - builder.finishRow(); - - TopicChunk chunk = builder.seal(); - - EXPECT_EQ(chunk.columnEncoding(0), EncodingType::kRaw); - - EXPECT_DOUBLE_EQ(chunk.readNumericAsDouble(0, 0), -2000000000.0); - EXPECT_DOUBLE_EQ(chunk.readNumericAsDouble(0, 1), 2000000000.0); -} - -// =========================================================================== -// NEW: Float column always stays raw (never gets FOR) -// =========================================================================== - -TEST(ChunkTest, FloatColumnAlwaysStaysRaw) { - std::vector cols = { - make_col(1, PrimitiveType::kFloat32, "f32"), - make_col(2, PrimitiveType::kFloat64, "f64"), - }; - TopicChunkBuilder builder(/*topic_id=*/204, /*schema_id=*/1, std::move(cols), /*max_rows=*/1000); - - // Varying float values - for (int i = 0; i < 10; ++i) { - builder.beginRow(static_cast(i)); - builder.set(0, static_cast(i) * 0.1F); - builder.set(1, static_cast(i) * 0.1); - builder.finishRow(); - } - - TopicChunk chunk = builder.seal(); - - EXPECT_EQ(chunk.columnEncoding(0), EncodingType::kRaw); - EXPECT_EQ(chunk.columnEncoding(1), EncodingType::kRaw); -} - -// =========================================================================== -// NEW: Constant bool column gets constant encoding -// =========================================================================== - -TEST(ChunkTest, ConstantBoolGetsConstantEncoding) { - std::vector cols = { - make_col(1, PrimitiveType::kBool, "always_true"), - }; - TopicChunkBuilder builder(/*topic_id=*/205, /*schema_id=*/1, std::move(cols), /*max_rows=*/1000); - - for (int i = 0; i < 50; ++i) { - builder.beginRow(static_cast(i)); - builder.set(0, true); - 
builder.finishRow(); - } - - TopicChunk chunk = builder.seal(); - - EXPECT_EQ(chunk.columnEncoding(0), EncodingType::kConstant); - - for (std::size_t i = 0; i < 50; ++i) { - EXPECT_TRUE(chunk.readBool(0, i)) << "row " << i; - } -} - -// =========================================================================== -// Bulk append: basic float32 -// =========================================================================== - -TEST(ChunkTest, BulkAppendFloat32) { - std::vector cols = { - make_col(1, PrimitiveType::kFloat32, "x"), - make_col(2, PrimitiveType::kFloat32, "y"), - }; - TopicChunkBuilder builder(/*topic_id=*/500, /*schema_id=*/1, std::move(cols), /*max_rows=*/1000); - - constexpr std::size_t N = 100; - std::vector ts(N); - std::vector x_vals(N), y_vals(N); - for (std::size_t i = 0; i < N; ++i) { - ts[i] = static_cast(i) * 10; - x_vals[i] = static_cast(i) * 1.0F; - y_vals[i] = static_cast(i) * 2.0F; - } - - builder.appendTimestamps(ts); - builder.appendColumn(0, x_vals); - builder.appendColumn(1, y_vals); - builder.finishBulkAppend(); - - EXPECT_EQ(builder.rowCount(), 100U); - EXPECT_EQ(builder.lastTimestamp(), 990); - - TopicChunk chunk = builder.seal(); - EXPECT_EQ(chunk.stats.row_count, 100U); - EXPECT_EQ(chunk.stats.t_min, 0); - EXPECT_EQ(chunk.stats.t_max, 990); - - // Verify round-trip - for (std::size_t i = 0; i < N; ++i) { - EXPECT_EQ(chunk.readTimestamp(i), static_cast(i) * 10); - EXPECT_FLOAT_EQ(static_cast(chunk.readNumericAsDouble(0, i)), x_vals[i]); - EXPECT_FLOAT_EQ(static_cast(chunk.readNumericAsDouble(1, i)), y_vals[i]); - } -} - -// =========================================================================== -// Bulk append: stats are correct -// =========================================================================== - -TEST(ChunkTest, BulkAppendStats) { - std::vector cols = { - make_col(1, PrimitiveType::kFloat64, "val"), - }; - TopicChunkBuilder builder(/*topic_id=*/501, /*schema_id=*/1, std::move(cols), /*max_rows=*/1000); - - const 
double data[] = {3.0, 1.0, 4.0, 1.0, 5.0}; - const Timestamp ts[] = {10, 20, 30, 40, 50}; - - builder.appendTimestamps(Span(ts, 5)); - builder.appendColumn(0, Span(data, 5)); - builder.finishBulkAppend(); - - const auto& cs = builder.stats().column_stats[0]; - EXPECT_DOUBLE_EQ(*cs.min_value, 1.0); - EXPECT_DOUBLE_EQ(*cs.max_value, 5.0); - EXPECT_FALSE(cs.is_constant); - EXPECT_GT(cs.run_count, 1U); -} - -// =========================================================================== -// Bulk append: constant column -// =========================================================================== - -TEST(ChunkTest, BulkAppendConstantColumn) { - std::vector cols = { - make_col(1, PrimitiveType::kInt32, "const"), - }; - TopicChunkBuilder builder(/*topic_id=*/502, /*schema_id=*/1, std::move(cols), /*max_rows=*/1000); - - constexpr std::size_t N = 50; - std::vector ts(N); - std::vector vals(N, 42); - for (std::size_t i = 0; i < N; ++i) { - ts[i] = static_cast(i); - } - - builder.appendTimestamps(ts); - builder.appendColumn(0, vals); - builder.finishBulkAppend(); - - const auto& cs = builder.stats().column_stats[0]; - EXPECT_TRUE(cs.is_constant); - EXPECT_DOUBLE_EQ(*cs.min_value, 42.0); - EXPECT_DOUBLE_EQ(*cs.max_value, 42.0); - - TopicChunk chunk = builder.seal(); - EXPECT_EQ(chunk.columnEncoding(0), EncodingType::kConstant); - for (std::size_t i = 0; i < N; ++i) { - EXPECT_DOUBLE_EQ(chunk.readNumericAsDouble(0, i), 42.0); - } -} - -// =========================================================================== -// Bulk append: string column -// =========================================================================== - -TEST(ChunkTest, BulkAppendStrings) { - std::vector cols = { - make_col(1, PrimitiveType::kString, "name"), - }; - TopicChunkBuilder builder(/*topic_id=*/503, /*schema_id=*/1, std::move(cols), /*max_rows=*/1000); - - const char string_data[] = "alphaBravoCharlie"; - const uint32_t offsets[] = {0, 5, 10, 17}; - const Timestamp ts[] = {10, 20, 30}; - - 
builder.appendTimestamps(Span(ts, 3)); - builder.appendColumnStrings(0, Span(offsets, 4), Span(string_data, 17)); - builder.finishBulkAppend(); - - EXPECT_EQ(builder.rowCount(), 3U); - - TopicChunk chunk = builder.seal(); - EXPECT_EQ(chunk.readString(0, 0), "alpha"); - EXPECT_EQ(chunk.readString(0, 1), "Bravo"); - EXPECT_EQ(chunk.readString(0, 2), "Charlie"); -} - -// =========================================================================== -// Bulk append: remaining_capacity -// =========================================================================== - -TEST(ChunkTest, BulkRemainingCapacity) { - std::vector cols = { - make_col(1, PrimitiveType::kFloat32, "x"), - }; - TopicChunkBuilder builder(/*topic_id=*/504, /*schema_id=*/1, std::move(cols), /*max_rows=*/100); - - EXPECT_EQ(builder.remainingCapacity(), 100U); - - const Timestamp ts[] = {1, 2, 3}; - const float vals[] = {1.0F, 2.0F, 3.0F}; - builder.appendTimestamps(Span(ts, 3)); - builder.appendColumn(0, Span(vals, 3)); - builder.finishBulkAppend(); - - EXPECT_EQ(builder.remainingCapacity(), 97U); -} - -// =========================================================================== -// Bulk append: with validity bitmap -// =========================================================================== - -TEST(ChunkTest, BulkAppendWithValidity) { - std::vector cols = { - make_col(1, PrimitiveType::kFloat64, "val"), - }; - TopicChunkBuilder builder(/*topic_id=*/505, /*schema_id=*/1, std::move(cols), /*max_rows=*/1000); - - const double data[] = {1.0, 0.0, 3.0, 0.0}; - const Timestamp ts[] = {10, 20, 30, 40}; - - builder.appendTimestamps(Span(ts, 4)); - builder.appendColumn(0, Span(data, 4)); - // validity: bits [1, 0, 1, 0] = 0b0101 = 0x05 - const uint8_t bitmap[] = {0x05}; - builder.appendColumnValidity(0, BitSpan{Span(bitmap, 1), 0, 4}); - builder.finishBulkAppend(); - - TopicChunk chunk = builder.seal(); - EXPECT_FALSE(chunk.isNull(0, 0)); - EXPECT_TRUE(chunk.isNull(0, 1)); - EXPECT_FALSE(chunk.isNull(0, 2)); - 
EXPECT_TRUE(chunk.isNull(0, 3)); - EXPECT_EQ(chunk.stats.column_stats[0].null_count, 2U); -} - -// =========================================================================== -// Bulk append: mixed types -// =========================================================================== - -TEST(ChunkTest, BulkAppendMixedTypes) { - std::vector cols = { - make_col(1, PrimitiveType::kFloat32, "f32"), - make_col(2, PrimitiveType::kInt64, "i64"), - make_col(3, PrimitiveType::kBool, "b"), - }; - TopicChunkBuilder builder(/*topic_id=*/506, /*schema_id=*/1, std::move(cols), /*max_rows=*/1000); - - const Timestamp ts[] = {100, 200, 300}; - const float f32[] = {1.0F, 2.0F, 3.0F}; - const int64_t i64[] = {10, 20, 30}; - const uint8_t bools[] = {1, 0, 1}; - - builder.appendTimestamps(Span(ts, 3)); - builder.appendColumn(0, Span(f32, 3)); - builder.appendColumn(1, Span(i64, 3)); - builder.appendColumn(2, Span(bools, 3)); - builder.finishBulkAppend(); - - TopicChunk chunk = builder.seal(); - EXPECT_EQ(chunk.stats.row_count, 3U); - EXPECT_FLOAT_EQ(static_cast(chunk.readNumericAsDouble(0, 1)), 2.0F); - EXPECT_DOUBLE_EQ(chunk.readNumericAsDouble(1, 2), 30.0); - EXPECT_TRUE(chunk.readBool(2, 0)); - EXPECT_FALSE(chunk.readBool(2, 1)); - EXPECT_TRUE(chunk.readBool(2, 2)); -} - -// =========================================================================== -// BUG-3: readNumericAsInt64 precision loss through double for constant/FOR -// =========================================================================== - -TEST(ChunkTest, ReadInt64PrecisionConstant) { - std::vector cols = { - make_col(1, PrimitiveType::kInt64, "big"), - }; - TopicChunkBuilder builder(/*topic_id=*/700, /*schema_id=*/1, std::move(cols), /*max_rows=*/100); - - // Value that exceeds 2^53 — not exactly representable as double - constexpr int64_t kBig = (int64_t{1} << 53) + 1; // 9007199254740993 - - // All same value → constant encoding - for (int i = 0; i < 3; ++i) { - builder.beginRow(static_cast(i)); - builder.set(0, 
kBig); - builder.finishRow(); - } - - TopicChunk chunk = builder.seal(); - EXPECT_EQ(chunk.columnEncoding(0), EncodingType::kConstant); - - for (std::size_t i = 0; i < 3; ++i) { - EXPECT_EQ(chunk.readNumericAsInt64(0, i), kBig) << "row " << i; - } -} - -TEST(ChunkTest, ReadInt64PrecisionFOR) { - std::vector cols = { - make_col(1, PrimitiveType::kInt64, "big"), - }; - TopicChunkBuilder builder(/*topic_id=*/701, /*schema_id=*/1, std::move(cols), /*max_rows=*/100); - - // Two values > 2^53 with small range → FOR encoding - constexpr int64_t kBase = (int64_t{1} << 53) + 1; - constexpr int64_t kValues[] = {kBase, kBase + 100, kBase + 50}; - - for (int i = 0; i < 3; ++i) { - builder.beginRow(static_cast(i)); - builder.set(0, kValues[i]); - builder.finishRow(); - } - - TopicChunk chunk = builder.seal(); - // Range is 100, fits in 1 byte offset vs 8 byte native → FOR - EXPECT_EQ(chunk.columnEncoding(0), EncodingType::kFrameOfReference); - - for (std::size_t i = 0; i < 3; ++i) { - EXPECT_EQ(chunk.readNumericAsInt64(0, i), kValues[i]) << "row " << i; - } -} - -// =========================================================================== -// BUG-4: readNumericAsUint64 precision loss through double for constant/FOR -// =========================================================================== - -TEST(ChunkTest, ReadUint64PrecisionConstant) { - std::vector cols = { - make_col(1, PrimitiveType::kUint64, "big_u"), - }; - TopicChunkBuilder builder(/*topic_id=*/702, /*schema_id=*/1, std::move(cols), /*max_rows=*/100); - - constexpr uint64_t kBig = (uint64_t{1} << 53) + 1; - - for (int i = 0; i < 3; ++i) { - builder.beginRow(static_cast(i)); - builder.set(0, kBig); - builder.finishRow(); - } - - TopicChunk chunk = builder.seal(); - EXPECT_EQ(chunk.columnEncoding(0), EncodingType::kConstant); - - for (std::size_t i = 0; i < 3; ++i) { - EXPECT_EQ(chunk.readNumericAsUint64(0, i), kBig) << "row " << i; - } -} - -// 
=========================================================================== -// BUG-1: Stats tracking loses int64 precision via double cast -// BUG-2: FOR reference computed from lossy double min/max -// =========================================================================== - -TEST(ChunkTest, Int64StatsPreservePrecision) { - std::vector cols = { - make_col(1, PrimitiveType::kInt64, "precise"), - }; - TopicChunkBuilder builder(/*topic_id=*/703, /*schema_id=*/1, std::move(cols), /*max_rows=*/100); - - // Two int64 values beyond 2^53 that map to the same double - constexpr int64_t kA = (int64_t{1} << 54) - 1; // rounds to 2^54 - constexpr int64_t kB = (int64_t{1} << 54) + 1; // also rounds to 2^54 - // These must NOT be detected as constant (even though static_cast(kA) == static_cast(kB)) - static_assert( - static_cast(kA) == static_cast(kB), - "test premise: kA and kB must have the same double representation"); - - builder.beginRow(0); - builder.set(0, kA); - builder.finishRow(); - - builder.beginRow(1); - builder.set(0, kB); - builder.finishRow(); - - TopicChunk chunk = builder.seal(); - - // The column must NOT be constant-encoded since kA != kB - EXPECT_NE(chunk.columnEncoding(0), EncodingType::kConstant); - - // Values must round-trip exactly - EXPECT_EQ(chunk.readNumericAsInt64(0, 0), kA); - EXPECT_EQ(chunk.readNumericAsInt64(0, 1), kB); -} - -// =========================================================================== -// BUG-8: FOR encode offset truncation (offset_bytes default case) -// =========================================================================== - -TEST(ChunkTest, FOREncodeLargeRange) { - std::vector cols = { - make_col(1, PrimitiveType::kInt64, "wide_for"), - }; - TopicChunkBuilder builder(/*topic_id=*/704, /*schema_id=*/1, std::move(cols), /*max_rows=*/100); - - // Range fits in uint32 (< 2^32) but needs 4-byte offsets - constexpr int64_t kMin = 0; - constexpr int64_t kMax = int64_t{1} << 31; // 2^31, range fits in uint32 - - 
builder.beginRow(0); - builder.set(0, kMin); - builder.finishRow(); - builder.beginRow(1); - builder.set(0, kMax); - builder.finishRow(); - builder.beginRow(2); - builder.set(0, kMax / 2); // mid-range value - builder.finishRow(); - - TopicChunk chunk = builder.seal(); - - // Should be FOR encoded with 4-byte offsets (range 2^31 < 2^32 and < 8 byte native) - EXPECT_EQ(chunk.columnEncoding(0), EncodingType::kFrameOfReference); - - EXPECT_EQ(chunk.readNumericAsInt64(0, 0), kMin); - EXPECT_EQ(chunk.readNumericAsInt64(0, 1), kMax); - EXPECT_EQ(chunk.readNumericAsInt64(0, 2), kMax / 2); -} - -// =========================================================================== -// Death tests: debug asserts catch misuse -// Active in debug builds (assert) or when PJ_ASSERT_THROWS is defined. -// =========================================================================== - -#if !defined(NDEBUG) || defined(PJ_ASSERT_THROWS) - -#ifdef PJ_ASSERT_THROWS -#define PJ_EXPECT_ASSERT_FAIL(stmt, msg) EXPECT_THROW(stmt, std::runtime_error) -#else -#define PJ_EXPECT_ASSERT_FAIL(stmt, msg) ASSERT_DEATH(stmt, msg) -#endif - -TEST(ChunkDeathTest, SetWithoutBeginRowAsserts) { - std::vector cols = { - make_col(1, PrimitiveType::kFloat32, "val"), - }; - TopicChunkBuilder builder(/*topic_id=*/300, /*schema_id=*/1, std::move(cols), /*max_rows=*/100); - - PJ_EXPECT_ASSERT_FAIL(builder.set(0, 1.0F), "set_float32 called without begin_row"); -} - -TEST(ChunkDeathTest, FinishRowWithoutBeginRowAsserts) { - std::vector cols = { - make_col(1, PrimitiveType::kFloat32, "val"), - }; - TopicChunkBuilder builder(/*topic_id=*/301, /*schema_id=*/1, std::move(cols), /*max_rows=*/100); - - PJ_EXPECT_ASSERT_FAIL(builder.finishRow(), "finish_row called without begin_row"); -} - -TEST(ChunkDeathTest, BeginRowWhileRowInProgressAsserts) { - std::vector cols = { - make_col(1, PrimitiveType::kFloat32, "val"), - }; - TopicChunkBuilder builder(/*topic_id=*/302, /*schema_id=*/1, std::move(cols), /*max_rows=*/100); - - 
builder.beginRow(100); - PJ_EXPECT_ASSERT_FAIL(builder.beginRow(200), "begin_row called while row already in progress"); -} - -TEST(ChunkDeathTest, OutOfBoundsColIndexAsserts) { - std::vector cols = { - make_col(1, PrimitiveType::kFloat32, "val"), - }; - TopicChunkBuilder builder(/*topic_id=*/303, /*schema_id=*/1, std::move(cols), /*max_rows=*/100); - - builder.beginRow(100); - PJ_EXPECT_ASSERT_FAIL(builder.set(5, 1.0F), "col_index out of bounds"); -} - -TEST(ChunkDeathTest, OutOfOrderTimestampAsserts) { - std::vector cols = { - make_col(1, PrimitiveType::kFloat32, "val"), - }; - TopicChunkBuilder builder(/*topic_id=*/304, /*schema_id=*/1, std::move(cols), /*max_rows=*/100); - - builder.beginRow(200); - builder.set(0, 1.0F); - builder.finishRow(); - - PJ_EXPECT_ASSERT_FAIL(builder.beginRow(100), "timestamps must be monotonically non-decreasing"); -} - -#endif // !defined(NDEBUG) || defined(PJ_ASSERT_THROWS) - -} // namespace -} // namespace PJ diff --git a/pj_datastore/tests/column_buffer_test.cpp b/pj_datastore/tests/column_buffer_test.cpp deleted file mode 100644 index 909aa2df..00000000 --- a/pj_datastore/tests/column_buffer_test.cpp +++ /dev/null @@ -1,485 +0,0 @@ -// Copyright 2026 Davide Faconti -// SPDX-License-Identifier: MPL-2.0 - -#include "pj_datastore/column_buffer.hpp" - -#include - -#include -#include -#include - -namespace PJ { -namespace { - -// Helper: create a ColumnDescriptor for a given PrimitiveType. -ColumnDescriptor make_descriptor(PrimitiveType type, std::string path = "test_field") { - return ColumnDescriptor{.field_id = 0, .logical_type = type, .field_path = std::move(path)}; -} - -// ----------------------------------------------------------------------- -// 1. 
Float32 append/read -// ----------------------------------------------------------------------- -TEST(TypedColumnBufferTest, Float32AppendRead) { - TypedColumnBuffer buf(make_descriptor(PrimitiveType::kFloat32)); - buf.appendFloat32(1.5f); - buf.appendFloat32(-3.14f); - buf.appendFloat32(0.0f); - - EXPECT_EQ(buf.rowCount(), 3u); - EXPECT_FLOAT_EQ(buf.readFloat32(0), 1.5f); - EXPECT_FLOAT_EQ(buf.readFloat32(1), -3.14f); - EXPECT_FLOAT_EQ(buf.readFloat32(2), 0.0f); -} - -// ----------------------------------------------------------------------- -// 2. Float64 append/read -// ----------------------------------------------------------------------- -TEST(TypedColumnBufferTest, Float64AppendRead) { - TypedColumnBuffer buf(make_descriptor(PrimitiveType::kFloat64)); - buf.appendFloat64(2.718281828459045); - buf.appendFloat64(-1e308); - - EXPECT_EQ(buf.rowCount(), 2u); - EXPECT_DOUBLE_EQ(buf.readFloat64(0), 2.718281828459045); - EXPECT_DOUBLE_EQ(buf.readFloat64(1), -1e308); -} - -// ----------------------------------------------------------------------- -// 3. Int32 append/read (including negative) -// ----------------------------------------------------------------------- -TEST(TypedColumnBufferTest, Int32AppendRead) { - TypedColumnBuffer buf(make_descriptor(PrimitiveType::kInt32)); - buf.appendInt32(42); - buf.appendInt32(-100); - buf.appendInt32(0); - - EXPECT_EQ(buf.rowCount(), 3u); - EXPECT_EQ(buf.readInt32(0), 42); - EXPECT_EQ(buf.readInt32(1), -100); - EXPECT_EQ(buf.readInt32(2), 0); -} - -// ----------------------------------------------------------------------- -// 4. 
Int64 append/read (large values) -// ----------------------------------------------------------------------- -TEST(TypedColumnBufferTest, Int64AppendRead) { - TypedColumnBuffer buf(make_descriptor(PrimitiveType::kInt64)); - buf.appendInt64(INT64_MAX); - buf.appendInt64(INT64_MIN); - buf.appendInt64(0); - - EXPECT_EQ(buf.rowCount(), 3u); - EXPECT_EQ(buf.readInt64(0), INT64_MAX); - EXPECT_EQ(buf.readInt64(1), INT64_MIN); - EXPECT_EQ(buf.readInt64(2), 0); -} - -// ----------------------------------------------------------------------- -// 5. Uint64 append/read (uint8 logical type widens to uint64 storage) -// ----------------------------------------------------------------------- -TEST(TypedColumnBufferTest, Uint64AppendRead) { - TypedColumnBuffer buf(make_descriptor(PrimitiveType::kUint64)); - buf.appendUint64(255); - buf.appendUint64(0); - buf.appendUint64(18000000000000000000ULL); - - EXPECT_EQ(buf.rowCount(), 3u); - EXPECT_EQ(buf.readUint64(0), 255U); - EXPECT_EQ(buf.readUint64(1), 0U); - EXPECT_EQ(buf.readUint64(2), 18000000000000000000ULL); -} - -// ----------------------------------------------------------------------- -// 6. Bool append/read -// ----------------------------------------------------------------------- -TEST(TypedColumnBufferTest, BoolAppendRead) { - TypedColumnBuffer buf(make_descriptor(PrimitiveType::kBool)); - buf.appendBool(true); - buf.appendBool(false); - buf.appendBool(true); - - EXPECT_EQ(buf.rowCount(), 3u); - EXPECT_TRUE(buf.readBool(0)); - EXPECT_FALSE(buf.readBool(1)); - EXPECT_TRUE(buf.readBool(2)); -} - -// ----------------------------------------------------------------------- -// 7. 
String append/read -// ----------------------------------------------------------------------- -TEST(TypedColumnBufferTest, StringAppendRead) { - TypedColumnBuffer buf(make_descriptor(PrimitiveType::kString)); - buf.appendString("hello"); - buf.appendString("world"); - buf.appendString("test"); - - EXPECT_EQ(buf.rowCount(), 3u); - EXPECT_EQ(buf.readString(0), "hello"); - EXPECT_EQ(buf.readString(1), "world"); - EXPECT_EQ(buf.readString(2), "test"); -} - -// ----------------------------------------------------------------------- -// 8. Null handling -// ----------------------------------------------------------------------- -TEST(TypedColumnBufferTest, NullHandling) { - TypedColumnBuffer buf(make_descriptor(PrimitiveType::kFloat32)); - buf.appendFloat32(1.0f); - buf.appendNull(); - buf.appendFloat32(3.0f); - - EXPECT_EQ(buf.rowCount(), 3u); - EXPECT_FALSE(buf.isNull(0)); - EXPECT_TRUE(buf.isNull(1)); - EXPECT_FALSE(buf.isNull(2)); - - // Non-null values are still readable. - EXPECT_FLOAT_EQ(buf.readFloat32(0), 1.0f); - EXPECT_FLOAT_EQ(buf.readFloat32(2), 3.0f); -} - -// ----------------------------------------------------------------------- -// 9. has_nulls -// ----------------------------------------------------------------------- -TEST(TypedColumnBufferTest, HasNulls) { - TypedColumnBuffer buf(make_descriptor(PrimitiveType::kInt32)); - EXPECT_FALSE(buf.hasNulls()); - - buf.appendInt32(10); - EXPECT_FALSE(buf.hasNulls()); - - buf.appendNull(); - EXPECT_TRUE(buf.hasNulls()); -} - -// ----------------------------------------------------------------------- -// 10. 
row_count increments correctly -// ----------------------------------------------------------------------- -TEST(TypedColumnBufferTest, RowCountIncrements) { - TypedColumnBuffer buf(make_descriptor(PrimitiveType::kFloat64)); - EXPECT_EQ(buf.rowCount(), 0u); - - buf.appendFloat64(1.0); - EXPECT_EQ(buf.rowCount(), 1u); - - buf.appendFloat64(2.0); - EXPECT_EQ(buf.rowCount(), 2u); - - buf.appendNull(); - EXPECT_EQ(buf.rowCount(), 3u); - - buf.appendFloat64(4.0); - EXPECT_EQ(buf.rowCount(), 4u); -} - -// ----------------------------------------------------------------------- -// 11. read_as_double for float32 -// ----------------------------------------------------------------------- -TEST(TypedColumnBufferTest, ReadAsDoubleFloat32) { - TypedColumnBuffer buf(make_descriptor(PrimitiveType::kFloat32)); - buf.appendFloat32(1.5f); - - EXPECT_DOUBLE_EQ(buf.readAsDouble(0), 1.5); -} - -// ----------------------------------------------------------------------- -// 12. read_as_double for int32 -// ----------------------------------------------------------------------- -TEST(TypedColumnBufferTest, ReadAsDoubleInt32) { - TypedColumnBuffer buf(make_descriptor(PrimitiveType::kInt32)); - buf.appendInt32(42); - - EXPECT_DOUBLE_EQ(buf.readAsDouble(0), 42.0); -} - -// ----------------------------------------------------------------------- -// 13. 
Multiple strings of varying lengths -// ----------------------------------------------------------------------- -TEST(TypedColumnBufferTest, MultipleStringsVaryingLengths) { - TypedColumnBuffer buf(make_descriptor(PrimitiveType::kString)); - buf.appendString("a"); - buf.appendString("bb"); - buf.appendString("ccc"); - buf.appendString("dddd"); - buf.appendString("eeeee"); - - EXPECT_EQ(buf.rowCount(), 5u); - EXPECT_EQ(buf.readString(0), "a"); - EXPECT_EQ(buf.readString(1), "bb"); - EXPECT_EQ(buf.readString(2), "ccc"); - EXPECT_EQ(buf.readString(3), "dddd"); - EXPECT_EQ(buf.readString(4), "eeeee"); -} - -// ----------------------------------------------------------------------- -// 14. Empty string -// ----------------------------------------------------------------------- -TEST(TypedColumnBufferTest, EmptyString) { - TypedColumnBuffer buf(make_descriptor(PrimitiveType::kString)); - buf.appendString(""); - - EXPECT_EQ(buf.rowCount(), 1u); - EXPECT_EQ(buf.readString(0), ""); - EXPECT_TRUE(buf.readString(0).empty()); -} - -// ----------------------------------------------------------------------- -// Additional: descriptor accessor -// ----------------------------------------------------------------------- -TEST(TypedColumnBufferTest, DescriptorAccessor) { - auto desc = make_descriptor(PrimitiveType::kFloat64, "position.x"); - TypedColumnBuffer buf(desc); - - EXPECT_EQ(buf.descriptor().logical_type, PrimitiveType::kFloat64); - EXPECT_EQ(buf.descriptor().field_path, "position.x"); -} - -// ----------------------------------------------------------------------- -// Additional: read_as_double for bool -// ----------------------------------------------------------------------- -TEST(TypedColumnBufferTest, ReadAsDoubleBool) { - TypedColumnBuffer buf(make_descriptor(PrimitiveType::kBool)); - buf.appendBool(true); - buf.appendBool(false); - - EXPECT_DOUBLE_EQ(buf.readAsDouble(0), 1.0); - EXPECT_DOUBLE_EQ(buf.readAsDouble(1), 0.0); -} - -// 
----------------------------------------------------------------------- -// Additional: read_as_double for string returns NaN -// ----------------------------------------------------------------------- -TEST(TypedColumnBufferTest, ReadAsDoubleStringReturnsNaN) { - TypedColumnBuffer buf(make_descriptor(PrimitiveType::kString)); - buf.appendString("hello"); - - EXPECT_TRUE(std::isnan(buf.readAsDouble(0))); -} - -// ----------------------------------------------------------------------- -// Additional: null in string column -// ----------------------------------------------------------------------- -TEST(TypedColumnBufferTest, NullInStringColumn) { - TypedColumnBuffer buf(make_descriptor(PrimitiveType::kString)); - buf.appendString("hello"); - buf.appendNull(); - buf.appendString("world"); - - EXPECT_EQ(buf.rowCount(), 3u); - EXPECT_FALSE(buf.isNull(0)); - EXPECT_TRUE(buf.isNull(1)); - EXPECT_FALSE(buf.isNull(2)); - EXPECT_EQ(buf.readString(0), "hello"); - EXPECT_EQ(buf.readString(1), ""); // null string reads as empty - EXPECT_EQ(buf.readString(2), "world"); -} - -// ----------------------------------------------------------------------- -// Additional: buffer accessors are non-empty after appends -// ----------------------------------------------------------------------- -TEST(TypedColumnBufferTest, ValueBufferNonEmpty) { - TypedColumnBuffer buf(make_descriptor(PrimitiveType::kInt32)); - EXPECT_TRUE(buf.valueBuffer().empty()); - - buf.appendInt32(10); - EXPECT_FALSE(buf.valueBuffer().empty()); - EXPECT_EQ(buf.valueBuffer().size(), sizeof(int32_t)); -} - -TEST(TypedColumnBufferTest, OffsetsBufferForStrings) { - TypedColumnBuffer buf(make_descriptor(PrimitiveType::kString)); - EXPECT_TRUE(buf.offsetsBuffer().empty()); - - buf.appendString("hi"); - // offsets: [0, 2] => 2 * sizeof(uint32_t) = 8 - EXPECT_EQ(buf.offsetsBuffer().size(), 2 * sizeof(uint32_t)); -} - -// ----------------------------------------------------------------------- -// Bulk append: Float32 -// 
----------------------------------------------------------------------- -TEST(TypedColumnBufferTest, BulkFloat32AppendRead) { - TypedColumnBuffer buf(make_descriptor(PrimitiveType::kFloat32)); - const float data[] = {1.0f, 2.0f, 3.0f, 4.0f, 5.0f}; - buf.appendFloat32Bulk(Span(data, 5)); - - EXPECT_EQ(buf.rowCount(), 5u); - for (std::size_t i = 0; i < 5; ++i) { - EXPECT_FLOAT_EQ(buf.readFloat32(i), data[i]); - } -} - -// ----------------------------------------------------------------------- -// Bulk append: Float64 -// ----------------------------------------------------------------------- -TEST(TypedColumnBufferTest, BulkFloat64AppendRead) { - TypedColumnBuffer buf(make_descriptor(PrimitiveType::kFloat64)); - const double data[] = {1.1, 2.2, 3.3}; - buf.appendFloat64Bulk(Span(data, 3)); - - EXPECT_EQ(buf.rowCount(), 3u); - for (std::size_t i = 0; i < 3; ++i) { - EXPECT_DOUBLE_EQ(buf.readFloat64(i), data[i]); - } -} - -// ----------------------------------------------------------------------- -// Bulk append: Int32 -// ----------------------------------------------------------------------- -TEST(TypedColumnBufferTest, BulkInt32AppendRead) { - TypedColumnBuffer buf(make_descriptor(PrimitiveType::kInt32)); - const int32_t data[] = {-100, 0, 42, 999}; - buf.appendInt32Bulk(Span(data, 4)); - - EXPECT_EQ(buf.rowCount(), 4u); - for (std::size_t i = 0; i < 4; ++i) { - EXPECT_EQ(buf.readInt32(i), data[i]); - } -} - -// ----------------------------------------------------------------------- -// Bulk append: Int64 -// ----------------------------------------------------------------------- -TEST(TypedColumnBufferTest, BulkInt64AppendRead) { - TypedColumnBuffer buf(make_descriptor(PrimitiveType::kInt64)); - const int64_t data[] = {INT64_MIN, 0, INT64_MAX}; - buf.appendInt64Bulk(Span(data, 3)); - - EXPECT_EQ(buf.rowCount(), 3u); - for (std::size_t i = 0; i < 3; ++i) { - EXPECT_EQ(buf.readInt64(i), data[i]); - } -} - -// 
----------------------------------------------------------------------- -// Bulk append: Uint64 -// ----------------------------------------------------------------------- -TEST(TypedColumnBufferTest, BulkUint64AppendRead) { - TypedColumnBuffer buf(make_descriptor(PrimitiveType::kUint64)); - const uint64_t data[] = {0, 255, 18000000000000000000ULL}; - buf.appendUint64Bulk(Span(data, 3)); - - EXPECT_EQ(buf.rowCount(), 3u); - for (std::size_t i = 0; i < 3; ++i) { - EXPECT_EQ(buf.readUint64(i), data[i]); - } -} - -// ----------------------------------------------------------------------- -// Bulk append: Bool -// ----------------------------------------------------------------------- -TEST(TypedColumnBufferTest, BulkBoolAppendRead) { - TypedColumnBuffer buf(make_descriptor(PrimitiveType::kBool)); - const uint8_t data[] = {1, 0, 1, 1, 0}; - buf.appendBoolBulk(Span(data, 5)); - - EXPECT_EQ(buf.rowCount(), 5u); - EXPECT_TRUE(buf.readBool(0)); - EXPECT_FALSE(buf.readBool(1)); - EXPECT_TRUE(buf.readBool(2)); - EXPECT_TRUE(buf.readBool(3)); - EXPECT_FALSE(buf.readBool(4)); -} - -// ----------------------------------------------------------------------- -// Bulk append: Strings -// ----------------------------------------------------------------------- -TEST(TypedColumnBufferTest, BulkStringAppendRead) { - TypedColumnBuffer buf(make_descriptor(PrimitiveType::kString)); - // "hello" "world" "!" 
- const char string_data[] = "helloworld!"; - const uint32_t offsets[] = {0, 5, 10, 11}; - buf.appendStringsBulk(Span(offsets, 4), Span(string_data, 11)); - - EXPECT_EQ(buf.rowCount(), 3u); - EXPECT_EQ(buf.readString(0), "hello"); - EXPECT_EQ(buf.readString(1), "world"); - EXPECT_EQ(buf.readString(2), "!"); -} - -// ----------------------------------------------------------------------- -// Bulk append: Validity bitmap -// ----------------------------------------------------------------------- -TEST(TypedColumnBufferTest, BulkValidityBitmap) { - TypedColumnBuffer buf(make_descriptor(PrimitiveType::kFloat32)); - const float data[] = {1.0f, 0.0f, 3.0f, 0.0f}; - buf.appendFloat32Bulk(Span(data, 4)); - - // Validity bitmap: bits [1,0,1,0] = 0b0101 = 0x05 - const uint8_t bitmap[] = {0x05}; - buf.appendValidityBulk(BitSpan{Span(bitmap, 1), 0, 4}); - - EXPECT_FALSE(buf.isNull(0)); - EXPECT_TRUE(buf.isNull(1)); - EXPECT_FALSE(buf.isNull(2)); - EXPECT_TRUE(buf.isNull(3)); - EXPECT_TRUE(buf.hasNulls()); -} - -// ----------------------------------------------------------------------- -// Bulk append: Mixed single + bulk -// ----------------------------------------------------------------------- -TEST(TypedColumnBufferTest, BulkAfterSingleAppend) { - TypedColumnBuffer buf(make_descriptor(PrimitiveType::kFloat32)); - buf.appendFloat32(1.0f); - buf.appendFloat32(2.0f); - - const float bulk[] = {3.0f, 4.0f, 5.0f}; - buf.appendFloat32Bulk(Span(bulk, 3)); - - EXPECT_EQ(buf.rowCount(), 5u); - EXPECT_FLOAT_EQ(buf.readFloat32(0), 1.0f); - EXPECT_FLOAT_EQ(buf.readFloat32(1), 2.0f); - EXPECT_FLOAT_EQ(buf.readFloat32(2), 3.0f); - EXPECT_FLOAT_EQ(buf.readFloat32(3), 4.0f); - EXPECT_FLOAT_EQ(buf.readFloat32(4), 5.0f); -} - -// ----------------------------------------------------------------------- -// Bulk append: Zero count is a no-op -// ----------------------------------------------------------------------- -TEST(TypedColumnBufferTest, BulkZeroCount) { - TypedColumnBuffer 
buf(make_descriptor(PrimitiveType::kFloat32)); - buf.appendFloat32Bulk(Span()); - EXPECT_EQ(buf.rowCount(), 0u); -} - -// ----------------------------------------------------------------------- -// Bulk append: Strings with non-zero base offset -// ----------------------------------------------------------------------- -TEST(TypedColumnBufferTest, BulkStringsNonZeroBaseOffset) { - TypedColumnBuffer buf(make_descriptor(PrimitiveType::kString)); - // Simulates Arrow-style offsets that don't start at 0 - const char string_data[] = "XXXXXhelloworld"; - const uint32_t offsets[] = {5, 10, 15}; // 2 strings: "hello", "world" - buf.appendStringsBulk(Span(offsets, 3), Span(string_data, 15)); - - EXPECT_EQ(buf.rowCount(), 2u); - EXPECT_EQ(buf.readString(0), "hello"); - EXPECT_EQ(buf.readString(1), "world"); -} - -// ----------------------------------------------------------------------- -// Bulk append: Validity with bit_offset -// ----------------------------------------------------------------------- -TEST(TypedColumnBufferTest, BulkValidityWithBitOffset) { - TypedColumnBuffer buf(make_descriptor(PrimitiveType::kInt32)); - const int32_t data[] = {10, 20, 30}; - buf.appendInt32Bulk(Span(data, 3)); - - // bitmap = 0b01010000, bit_offset = 4, so bits 4,5,6 = 1,0,1 - // In Arrow's LSB-first layout: byte 0x50 = 0b01010000 - // bit 4 = (0x50 >> 4) & 1 = 1 (valid) - // bit 5 = (0x50 >> 5) & 1 = 0 (null) - // bit 6 = (0x50 >> 6) & 1 = 1 (valid) - const uint8_t bitmap[] = {0x50}; - buf.appendValidityBulk(BitSpan{Span(bitmap, 1), 4, 3}); - - EXPECT_FALSE(buf.isNull(0)); // bit 4 = 1 -> valid - EXPECT_TRUE(buf.isNull(1)); // bit 5 = 0 -> null - EXPECT_FALSE(buf.isNull(2)); // bit 6 = 1 -> valid -} - -} // namespace -} // namespace PJ diff --git a/pj_datastore/tests/derived_engine_test.cpp b/pj_datastore/tests/derived_engine_test.cpp deleted file mode 100644 index 1b5de7e8..00000000 --- a/pj_datastore/tests/derived_engine_test.cpp +++ /dev/null @@ -1,1264 +0,0 @@ -// Copyright 2026 
Davide Faconti -// SPDX-License-Identifier: MPL-2.0 - -#include "pj_datastore/derived_engine.hpp" - -#include - -#include -#include -#include -#include -#include - -#include "pj_base/dataset.hpp" -#include "pj_base/type_tree.hpp" -#include "pj_base/types.hpp" -#include "pj_datastore/builtin_transforms.hpp" -#include "pj_datastore/chunk.hpp" -#include "pj_datastore/column_buffer.hpp" -#include "pj_datastore/engine.hpp" -#include "pj_datastore/query.hpp" -#include "pj_datastore/topic_storage.hpp" -#include "pj_datastore/writer.hpp" - -namespace PJ { -namespace { - -// --------------------------------------------------------------------------- -// Test helpers -// --------------------------------------------------------------------------- - -// Create a dataset and return its id. -static PJ::DatasetId make_dataset(DataEngine& engine, const std::string& name = "test") { - auto id_or = engine.createDataset(PJ::DatasetDescriptor{.source_name = name, .time_domain_id = 0}); - return *id_or; -} - -// Write `n` rows to a float64 scalar topic with value = slope * (t_ns / 1e9). -// Commits the chunk. Returns the TopicId. -// Timestamps: 0, step_ns, 2*step_ns, ... -static PJ::TopicId make_linear_topic( - DataEngine& engine, PJ::DatasetId dataset_id, double slope, int n, PJ::Timestamp step_ns = 1'000'000'000LL) { - DataWriter writer = engine.createWriter(); - auto handle_or = writer.registerScalarSeries(dataset_id, "src", PJ::NumericType::kFloat64); - PJ::TopicId tid = handle_or->topic_id; - for (int i = 0; i < n; ++i) { - PJ::Timestamp ts = static_cast(i) * step_ns; - double v = slope * (static_cast(i) * static_cast(step_ns) * 1e-9); - writer.appendScalar(*handle_or, ts, v); - } - // commit_chunks returns the changed topic IDs — pass directly to on_source_committed. - engine.commitChunks(writer.flushAll()); - return tid; -} - -// Append `n` more rows to an existing scalar topic (continuing timestamps from start_i). 
-static void append_linear_rows( - DataEngine& engine, PJ::TopicId src_topic_id, double slope, int n, int start_i, - PJ::Timestamp step_ns = 1'000'000'000LL) { - // We need to write to an existing topic via begin_row / set_float64 / finish_row. - DataWriter writer = engine.createWriter(); - for (int i = start_i; i < start_i + n; ++i) { - PJ::Timestamp ts = static_cast(i) * step_ns; - double v = slope * (static_cast(i) * static_cast(step_ns) * 1e-9); - auto s = writer.beginRow(src_topic_id, ts); - (void)s; - writer.set(src_topic_id, 0, v); - auto s2 = writer.finishRow(src_topic_id); - (void)s2; - } - engine.commitChunks(writer.flushAll()); -} - -// Collect all float64 values from a topic in timestamp order. -static std::vector collect_values(DataEngine& engine, PJ::TopicId topic_id) { - const TopicStorage* storage = engine.getTopicStorage(topic_id); - if (!storage) { - return {}; - } - std::vector out; - auto cursor = rangeQuery(storage->sealedChunks(), 0, std::numeric_limits::max()); - cursor.forEach([&](const SampleRow& row) { out.push_back(row.chunk->readNumericAsDouble(0, row.row_index)); }); - return out; -} - -// Wrapper: on_source_committed from an initializer list (std::span can't take {}). -static void notify(DerivedEngine& derived, std::initializer_list topics) { - std::vector v(topics); - derived.onSourceCommitted(v); -} - -// Collect (timestamp, value) pairs. 
-static std::vector> collect_rows(DataEngine& engine, PJ::TopicId topic_id) { - const TopicStorage* storage = engine.getTopicStorage(topic_id); - if (!storage) { - return {}; - } - std::vector> out; - auto cursor = rangeQuery(storage->sealedChunks(), 0, std::numeric_limits::max()); - cursor.forEach( - [&](const SampleRow& row) { out.emplace_back(row.timestamp, row.chunk->readNumericAsDouble(0, row.row_index)); }); - return out; -} - -// --------------------------------------------------------------------------- -// DerivativeTransform — unit tests (no engine needed) -// --------------------------------------------------------------------------- - -TEST(DerivativeTransformTest, SkipsFirstRow) { - DerivativeTransform op; - PJ::Timestamp t = 0; - VarValue v = 0.0; - EXPECT_FALSE(op.calculate(0, VarValue{0.0}, t, v)); -} - -TEST(DerivativeTransformTest, CorrectDerivative_ConstantRate) { - DerivativeTransform op; - PJ::Timestamp out_t; - VarValue out_v = 0.0; - - // slope = 3.0, step = 1s → dy/dt should be 3.0 - EXPECT_FALSE(op.calculate(0, VarValue{0.0}, out_t, out_v)); - EXPECT_TRUE(op.calculate(1'000'000'000LL, VarValue{3.0}, out_t, out_v)); - EXPECT_EQ(out_t, 1'000'000'000LL); - EXPECT_NEAR(std::get(out_v), 3.0, 1e-9); - - EXPECT_TRUE(op.calculate(2'000'000'000LL, VarValue{6.0}, out_t, out_v)); - EXPECT_NEAR(std::get(out_v), 3.0, 1e-9); -} - -TEST(DerivativeTransformTest, ZeroDt_DoesNotDivideByZero) { - // dt = 0 should produce 0.0, not inf/nan - DerivativeTransform op; - PJ::Timestamp out_t; - VarValue out_v = 0.0; - EXPECT_FALSE(op.calculate(0, VarValue{1.0}, out_t, out_v)); - EXPECT_TRUE(op.calculate(0, VarValue{2.0}, out_t, out_v)); - EXPECT_EQ(std::get(out_v), 0.0); -} - -TEST(DerivativeTransformTest, Reset_ClearsState) { - DerivativeTransform op; - PJ::Timestamp out_t; - VarValue out_v = 0.0; - - // First pass - EXPECT_FALSE(op.calculate(0, VarValue{0.0}, out_t, out_v)); - EXPECT_TRUE(op.calculate(1'000'000'000LL, VarValue{5.0}, out_t, out_v)); - - // After 
reset, first row must be suppressed again - op.reset(); - EXPECT_FALSE(op.calculate(0, VarValue{0.0}, out_t, out_v)); - EXPECT_TRUE(op.calculate(1'000'000'000LL, VarValue{5.0}, out_t, out_v)); - EXPECT_NEAR(std::get(out_v), 5.0, 1e-9); -} - -TEST(DerivativeTransformTest, OutputKind_IsFloat64) { - DerivativeTransform op; - EXPECT_EQ(op.outputKind(StorageKind::kFloat64), StorageKind::kFloat64); - EXPECT_EQ(op.outputKind(StorageKind::kFloat32), StorageKind::kFloat64); - EXPECT_EQ(op.outputKind(StorageKind::kInt64), StorageKind::kFloat64); -} - -// --------------------------------------------------------------------------- -// add_siso_transform -// --------------------------------------------------------------------------- - -TEST(DerivedEngineTest, AddTransform_CreatesOutputTopic) { - DataEngine engine; - DerivedEngine derived(engine); - PJ::DatasetId ds = make_dataset(engine); - - PJ::TopicId src = make_linear_topic(engine, ds, 1.0, 5); - auto node_or = derived.addSisoTransform(src, "deriv", ds, std::make_unique()); - ASSERT_TRUE(node_or.has_value()) << node_or.error(); - PJ::NodeId node = *node_or; - EXPECT_TRUE(derived.hasNode(node)); - - auto out_topics = derived.outputTopics(node); - ASSERT_EQ(out_topics.size(), 1u); - EXPECT_NE(engine.getTopicStorage(out_topics[0]), nullptr); -} - -TEST(DerivedEngineTest, AddTransform_DuplicateOutputName_Fails) { - DataEngine engine; - DerivedEngine derived(engine); - PJ::DatasetId ds = make_dataset(engine); - - PJ::TopicId src = make_linear_topic(engine, ds, 1.0, 5); - ASSERT_TRUE(derived.addSisoTransform(src, "deriv", ds, std::make_unique()).has_value()); - // Same output name → should fail - auto r = derived.addSisoTransform(src, "deriv", ds, std::make_unique()); - EXPECT_FALSE(r.has_value()); -} - -TEST(DerivedEngineTest, AddTransform_UnknownInputTopic_Fails) { - DataEngine engine; - DerivedEngine derived(engine); - PJ::DatasetId ds = make_dataset(engine); - - auto r = derived.addSisoTransform(9999u, "deriv", ds, 
std::make_unique()); - EXPECT_FALSE(r.has_value()); -} - -// --------------------------------------------------------------------------- -// topological_order -// --------------------------------------------------------------------------- - -TEST(DerivedEngineTest, TopologicalOrder_SingleNode) { - DataEngine engine; - DerivedEngine derived(engine); - PJ::DatasetId ds = make_dataset(engine); - - PJ::TopicId src = make_linear_topic(engine, ds, 1.0, 5); - PJ::NodeId n = *derived.addSisoTransform(src, "d1", ds, std::make_unique()); - auto order = derived.topologicalOrder(); - ASSERT_EQ(order.size(), 1u); - EXPECT_EQ(order[0], n); -} - -TEST(DerivedEngineTest, TopologicalOrder_Chain_ABOrder) { - // A → B: output of A is input of B. Order must be [A, B]. - DataEngine engine; - DerivedEngine derived(engine); - PJ::DatasetId ds = make_dataset(engine); - - PJ::TopicId src = make_linear_topic(engine, ds, 1.0, 10); - PJ::NodeId a = *derived.addSisoTransform(src, "d1", ds, std::make_unique()); - PJ::TopicId a_out = derived.outputTopics(a)[0]; - PJ::NodeId b = *derived.addSisoTransform(a_out, "d2", ds, std::make_unique()); - - auto order = derived.topologicalOrder(); - ASSERT_EQ(order.size(), 2u); - EXPECT_EQ(order[0], a); - EXPECT_EQ(order[1], b); -} - -TEST(DerivedEngineTest, TopologicalOrder_Fork) { - // A → B and A → C: A must appear before both B and C. 
- DataEngine engine; - DerivedEngine derived(engine); - PJ::DatasetId ds = make_dataset(engine); - - PJ::TopicId src = make_linear_topic(engine, ds, 1.0, 10); - PJ::NodeId a = *derived.addSisoTransform(src, "d1", ds, std::make_unique()); - PJ::TopicId a_out = derived.outputTopics(a)[0]; - PJ::NodeId b = *derived.addSisoTransform(a_out, "d2", ds, std::make_unique()); - PJ::NodeId c = *derived.addSisoTransform(a_out, "d3", ds, std::make_unique()); - - auto order = derived.topologicalOrder(); - ASSERT_EQ(order.size(), 3u); - EXPECT_EQ(order[0], a); - // B and C must both appear after A - EXPECT_NE(std::find(order.begin(), order.end(), b), order.end()); - EXPECT_NE(std::find(order.begin(), order.end(), c), order.end()); -} - -// --------------------------------------------------------------------------- -// on_source_committed / dirty propagation -// --------------------------------------------------------------------------- - -TEST(DerivedEngineTest, DirtyPropagation_SourceChanged) { - DataEngine engine; - DerivedEngine derived(engine); - PJ::DatasetId ds = make_dataset(engine); - - PJ::TopicId src = make_linear_topic(engine, ds, 1.0, 5); - PJ::NodeId n = *derived.addSisoTransform(src, "d1", ds, std::make_unique()); - - // Run schedule to clear dirty flag - notify(derived, {src}); - ASSERT_TRUE(derived.scheduleAll().has_value()); - - // Append more data and notify - append_linear_rows(engine, src, 1.0, 5, 5); - notify(derived, {src}); - - // Node must be dirty again — schedule should produce more rows - auto before = collect_values(engine, derived.outputTopics(n)[0]).size(); - ASSERT_TRUE(derived.scheduleAll().has_value()); - auto after = collect_values(engine, derived.outputTopics(n)[0]).size(); - EXPECT_GT(after, before); -} - -TEST(DerivedEngineTest, DirtyPropagation_Chain) { - // A → B: committing source dirtifies A; schedule runs A and dirtifies B; - // subsequent schedule runs B. 
- DataEngine engine; - DerivedEngine derived(engine); - PJ::DatasetId ds = make_dataset(engine); - - PJ::TopicId src = make_linear_topic(engine, ds, 1.0, 10); - PJ::NodeId a = *derived.addSisoTransform(src, "d1", ds, std::make_unique()); - PJ::TopicId a_out = derived.outputTopics(a)[0]; - PJ::NodeId b = *derived.addSisoTransform(a_out, "d2", ds, std::make_unique()); - - notify(derived, {src}); - ASSERT_TRUE(derived.scheduleAll().has_value()); - - // Both A and B should have been processed - EXPECT_FALSE(collect_values(engine, derived.outputTopics(a)[0]).empty()); - EXPECT_FALSE(collect_values(engine, derived.outputTopics(b)[0]).empty()); -} - -// --------------------------------------------------------------------------- -// schedule (incremental) -// --------------------------------------------------------------------------- - -TEST(DerivedEngineTest, Schedule_ProducesCorrectDerivative) { - // slope=2.0, step=1s, 11 rows → derivative is always 2.0 (10 rows output) - DataEngine engine; - DerivedEngine derived(engine); - PJ::DatasetId ds = make_dataset(engine); - - PJ::TopicId src = make_linear_topic(engine, ds, 2.0, 11); - PJ::NodeId node = *derived.addSisoTransform(src, "deriv", ds, std::make_unique()); - notify(derived, {src}); - ASSERT_TRUE(derived.scheduleAll().has_value()); - - auto vals = collect_values(engine, derived.outputTopics(node)[0]); - ASSERT_EQ(vals.size(), 10u); - for (double v : vals) { - EXPECT_NEAR(v, 2.0, 1e-6); - } -} - -TEST(DerivedEngineTest, Schedule_SecondCallNoNewChunks_NoOp) { - DataEngine engine; - DerivedEngine derived(engine); - PJ::DatasetId ds = make_dataset(engine); - - PJ::TopicId src = make_linear_topic(engine, ds, 1.0, 5); - PJ::NodeId node = *derived.addSisoTransform(src, "d1", ds, std::make_unique()); - notify(derived, {src}); - ASSERT_TRUE(derived.scheduleAll().has_value()); - - auto count1 = collect_values(engine, derived.outputTopics(node)[0]).size(); - - // No new data — second schedule should not change output count - 
ASSERT_TRUE(derived.scheduleAll().has_value()); - auto count2 = collect_values(engine, derived.outputTopics(node)[0]).size(); - EXPECT_EQ(count1, count2); -} - -TEST(DerivedEngineTest, Schedule_Lazy_SkipsInactiveNode) { - // Two independent source nodes. schedule({a}) should not run b. - DataEngine engine; - DerivedEngine derived(engine); - PJ::DatasetId ds = make_dataset(engine); - - PJ::TopicId src1 = make_linear_topic(engine, ds, 1.0, 5); - PJ::TopicId src2 = make_linear_topic(engine, ds, 2.0, 5); - - PJ::NodeId a = *derived.addSisoTransform(src1, "da", ds, std::make_unique()); - PJ::NodeId b = *derived.addSisoTransform(src2, "db", ds, std::make_unique()); - - notify(derived, {src1, src2}); - // Only process node A - ASSERT_TRUE(derived.scheduleActive({a}).has_value()); - - auto a_vals = collect_values(engine, derived.outputTopics(a)[0]); - auto b_vals = collect_values(engine, derived.outputTopics(b)[0]); - - EXPECT_FALSE(a_vals.empty()); // A was processed - EXPECT_TRUE(b_vals.empty()); // B was skipped -} - -TEST(DerivedEngineTest, Schedule_Chain_BothNodesRun) { - DataEngine engine; - DerivedEngine derived(engine); - PJ::DatasetId ds = make_dataset(engine); - - PJ::TopicId src = make_linear_topic(engine, ds, 1.0, 12); - PJ::NodeId a = *derived.addSisoTransform(src, "d1", ds, std::make_unique()); - PJ::TopicId a_out = derived.outputTopics(a)[0]; - PJ::NodeId b = *derived.addSisoTransform(a_out, "d2", ds, std::make_unique()); - - notify(derived, {src}); - ASSERT_TRUE(derived.scheduleAll().has_value()); - - // A: 11 derivative rows of linear → all constant - auto a_vals = collect_values(engine, derived.outputTopics(a)[0]); - EXPECT_EQ(a_vals.size(), 11u); - - // B: derivative of constant → all zero (10 rows, first suppressed) - auto b_vals = collect_values(engine, derived.outputTopics(b)[0]); - EXPECT_EQ(b_vals.size(), 10u); - for (double v : b_vals) { - EXPECT_NEAR(v, 0.0, 1e-6); - } -} - -// 
--------------------------------------------------------------------------- -// recompute_batch -// --------------------------------------------------------------------------- - -TEST(DerivedEngineTest, RecomputeBatch_ClearsAndRegenerates) { - DataEngine engine; - DerivedEngine derived(engine); - PJ::DatasetId ds = make_dataset(engine); - - PJ::TopicId src = make_linear_topic(engine, ds, 1.0, 6); - PJ::NodeId node = *derived.addSisoTransform(src, "d1", ds, std::make_unique()); - notify(derived, {src}); - ASSERT_TRUE(derived.scheduleAll().has_value()); - - auto before = collect_values(engine, derived.outputTopics(node)[0]); - ASSERT_FALSE(before.empty()); - - // recompute_batch clears output and replays from scratch - ASSERT_TRUE(derived.recompute_batch(node).has_value()); - - auto after = collect_values(engine, derived.outputTopics(node)[0]); - EXPECT_EQ(before.size(), after.size()); - for (std::size_t i = 0; i < before.size(); ++i) { - EXPECT_NEAR(before[i], after[i], 1e-9); - } -} - -// --------------------------------------------------------------------------- -// Parity: incremental == batch -// --------------------------------------------------------------------------- - -TEST(DerivedEngineTest, Parity_SingleChunk) { - DataEngine engine; - DerivedEngine derived(engine); - PJ::DatasetId ds = make_dataset(engine); - - PJ::TopicId src = make_linear_topic(engine, ds, 3.0, 11); - PJ::NodeId node = *derived.addSisoTransform(src, "d1", ds, std::make_unique()); - notify(derived, {src}); - ASSERT_TRUE(derived.scheduleAll().has_value()); - auto incremental = collect_values(engine, derived.outputTopics(node)[0]); - - ASSERT_TRUE(derived.recompute_batch(node).has_value()); - auto batch = collect_values(engine, derived.outputTopics(node)[0]); - - ASSERT_EQ(incremental.size(), batch.size()); - for (std::size_t i = 0; i < batch.size(); ++i) { - EXPECT_NEAR(incremental[i], batch[i], 1e-9) << "mismatch at row " << i; - } -} - -TEST(DerivedEngineTest, 
Parity_TwoChunks_CrossBoundary) { - // The cross-chunk boundary row is the key correctness test: state must carry - // over naturally in the incremental path. - DataEngine engine; - DerivedEngine derived(engine); - PJ::DatasetId ds = make_dataset(engine); - - // Chunk 1: 20 rows (forces auto-chunk at 1024 capacity — but step_ns large enough - // that all rows stay in one chunk unless we push more) - PJ::TopicId src = make_linear_topic(engine, ds, 2.0, 20); - PJ::NodeId node = *derived.addSisoTransform(src, "d1", ds, std::make_unique()); - notify(derived, {src}); - ASSERT_TRUE(derived.scheduleAll().has_value()); - - // Chunk 2: append 20 more rows - append_linear_rows(engine, src, 2.0, 20, 20); - notify(derived, {src}); - ASSERT_TRUE(derived.scheduleAll().has_value()); - - auto incremental = collect_values(engine, derived.outputTopics(node)[0]); - - // Batch recompute and compare - ASSERT_TRUE(derived.recompute_batch(node).has_value()); - auto batch = collect_values(engine, derived.outputTopics(node)[0]); - - ASSERT_EQ(incremental.size(), batch.size()); - for (std::size_t i = 0; i < batch.size(); ++i) { - EXPECT_NEAR(incremental[i], batch[i], 1e-9) << "mismatch at row " << i; - } -} - -TEST(DerivedEngineTest, Parity_ThreeChunks) { - DataEngine engine; - DerivedEngine derived(engine); - PJ::DatasetId ds = make_dataset(engine); - - PJ::TopicId src = make_linear_topic(engine, ds, 5.0, 10); - PJ::NodeId node = *derived.addSisoTransform(src, "d1", ds, std::make_unique()); - notify(derived, {src}); - ASSERT_TRUE(derived.scheduleAll().has_value()); - - append_linear_rows(engine, src, 5.0, 10, 10); - notify(derived, {src}); - ASSERT_TRUE(derived.scheduleAll().has_value()); - - append_linear_rows(engine, src, 5.0, 10, 20); - notify(derived, {src}); - ASSERT_TRUE(derived.scheduleAll().has_value()); - - auto incremental = collect_values(engine, derived.outputTopics(node)[0]); - - ASSERT_TRUE(derived.recompute_batch(node).has_value()); - auto batch = collect_values(engine, 
derived.outputTopics(node)[0]); - - ASSERT_EQ(incremental.size(), batch.size()); - for (std::size_t i = 0; i < batch.size(); ++i) { - EXPECT_NEAR(incremental[i], batch[i], 1e-9) << "mismatch at row " << i; - } -} - -// commit_chunks returns deduplicated changed topic IDs usable directly with -// on_source_committed — the streaming frame-loop pattern. -TEST(DerivedEngineTest, CommitCycle_ReturnValueDrivesNotify) { - DataEngine engine; - DerivedEngine derived(engine); - PJ::DatasetId ds = make_dataset(engine); - - DataWriter writer = engine.createWriter(); - auto handle = *writer.registerScalarSeries(ds, "sig", PJ::NumericType::kFloat64); - PJ::TopicId src = handle.topic_id; - - PJ::NodeId node = *derived.addSisoTransform(src, "d_sig", ds, std::make_unique()); - PJ::TopicId out = derived.outputTopics(node)[0]; - - // Frame 1: write 5 samples and use the one-liner pattern. - for (int i = 0; i < 5; ++i) { - writer.appendScalar(handle, static_cast(i) * 1'000'000'000LL, static_cast(i)); - } - derived.onSourceCommitted(engine.commitChunks(writer.flushAll())); - ASSERT_TRUE(derived.scheduleAll().has_value()); - auto after_frame1 = collect_values(engine, out).size(); - EXPECT_GT(after_frame1, 0u); - - // Frame 2: write 5 more samples. - for (int i = 5; i < 10; ++i) { - writer.appendScalar(handle, static_cast(i) * 1'000'000'000LL, static_cast(i)); - } - derived.onSourceCommitted(engine.commitChunks(writer.flushAll())); - ASSERT_TRUE(derived.scheduleAll().has_value()); - auto after_frame2 = collect_values(engine, out).size(); - EXPECT_GT(after_frame2, after_frame1); - - // Verify return value: single topic flushed → exactly one ID returned. 
- for (int i = 10; i < 13; ++i) { - writer.appendScalar(handle, static_cast(i) * 1'000'000'000LL, static_cast(i)); - } - auto changed = engine.commitChunks(writer.flushAll()); - ASSERT_EQ(changed.size(), 1u); - EXPECT_EQ(changed[0], src); -} - -// Regression: add_siso_transform must work on a series created via -// register_scalar_series even when no chunk has been committed yet -// (fewer rows than max_chunk_rows, or no flush/commit called at all). -TEST(DerivedEngineTest, AddTransform_NoCommittedChunks_Succeeds) { - DataEngine engine; - DerivedEngine derived(engine); - PJ::DatasetId ds = make_dataset(engine); - - // Create a topic with a few rows but do NOT commit any chunks. - DataWriter writer = engine.createWriter(); - auto handle = *writer.registerScalarSeries(ds, "tiny_series", PJ::NumericType::kFloat64); - PJ::TopicId src = handle.topic_id; - - // Write 3 rows (well below default max_chunk_rows=1024) — no commit. - for (int i = 0; i < 3; ++i) { - writer.appendScalar(handle, static_cast(i) * 1'000'000'000LL, static_cast(i)); - } - // Deliberately skip flush / commit_chunks. - - // add_siso_transform must succeed without any committed chunks in storage. - auto result = derived.addSisoTransform(src, "d_tiny", ds, std::make_unique()); - EXPECT_TRUE(result.has_value()) << "Expected success, got: " << (result.has_value() ? "" : result.error()); -} - -// --------------------------------------------------------------------------- -// MIMO transform helpers -// --------------------------------------------------------------------------- - -// SumMimoTransform: N inputs → 1 output (sum of all inputs as double). 
-class SumMimoTransform : public IMIMOTransform { - public: - std::vector outputKinds(PJ::Span /*input_kinds*/) const override { - return {StorageKind::kFloat64}; - } - - bool calculate( - PJ::Timestamp time, PJ::Span inputs, PJ::Timestamp& out_time, - std::vector& output) override { - out_time = time; - double sum = 0.0; - for (const auto& v : inputs) { - sum += std::get(v); - } - output[0] = sum; - return true; - } -}; - -// DiffMimoTransform: 2 inputs → 1 output (inputs[0] - inputs[1]). -class DiffMimoTransform : public IMIMOTransform { - public: - std::vector outputKinds(PJ::Span /*input_kinds*/) const override { - return {StorageKind::kFloat64}; - } - - bool calculate( - PJ::Timestamp time, PJ::Span inputs, PJ::Timestamp& out_time, - std::vector& output) override { - out_time = time; - output[0] = std::get(inputs[0]) - std::get(inputs[1]); - return true; - } -}; - -// Collect (timestamp, value) pairs for a given column index. -static std::vector> collect_rows_col( - DataEngine& engine, PJ::TopicId topic_id, std::size_t col = 0) { - const TopicStorage* storage = engine.getTopicStorage(topic_id); - if (!storage) { - return {}; - } - std::vector> out; - auto cursor = rangeQuery(storage->sealedChunks(), 0, std::numeric_limits::max()); - cursor.forEach([&](const SampleRow& row) { - out.emplace_back(row.timestamp, row.chunk->readNumericAsDouble(col, row.row_index)); - }); - return out; -} - -// --------------------------------------------------------------------------- -// MIMO — add_mimo_transform: basic registration -// --------------------------------------------------------------------------- - -TEST(MimoTransformTest, AddMimo_CreatesOutputTopic) { - DataEngine engine; - DerivedEngine derived(engine); - PJ::DatasetId ds = make_dataset(engine); - - PJ::TopicId t1 = make_linear_topic(engine, ds, 1.0, 5); - PJ::TopicId t2 = make_linear_topic(engine, ds, 2.0, 5); - - auto node_or = derived.addMimoTransform({t1, t2}, {"sum_out"}, ds, std::make_unique()); - 
ASSERT_TRUE(node_or.has_value()) << node_or.error(); - - PJ::NodeId node = *node_or; - EXPECT_TRUE(derived.hasNode(node)); - auto outs = derived.outputTopics(node); - ASSERT_EQ(outs.size(), 1u); - EXPECT_NE(engine.getTopicStorage(outs[0]), nullptr); -} - -TEST(MimoTransformTest, AddMimo_UnknownInputTopic_Fails) { - DataEngine engine; - DerivedEngine derived(engine); - PJ::DatasetId ds = make_dataset(engine); - - auto r = derived.addMimoTransform({9999u}, {"out"}, ds, std::make_unique()); - EXPECT_FALSE(r.has_value()); -} - -TEST(MimoTransformTest, AddMimo_DuplicateOutputName_Fails) { - DataEngine engine; - DerivedEngine derived(engine); - PJ::DatasetId ds = make_dataset(engine); - - PJ::TopicId t1 = make_linear_topic(engine, ds, 1.0, 5); - PJ::TopicId t2 = make_linear_topic(engine, ds, 2.0, 5); - - ASSERT_TRUE(derived.addMimoTransform({t1, t2}, {"dup"}, ds, std::make_unique()).has_value()); - // Same output name in same dataset must fail. - auto r = derived.addMimoTransform({t1, t2}, {"dup"}, ds, std::make_unique()); - EXPECT_FALSE(r.has_value()); -} - -TEST(MimoTransformTest, AddMimo_MultipleOutputTopics) { - DataEngine engine; - DerivedEngine derived(engine); - PJ::DatasetId ds = make_dataset(engine); - - PJ::TopicId t1 = make_linear_topic(engine, ds, 1.0, 5); - PJ::TopicId t2 = make_linear_topic(engine, ds, 2.0, 5); - - // DiffMimoTransform produces 1 output; use two separate nodes for two outputs. 
- auto node_or = derived.addMimoTransform({t1, t2}, {"diff_out"}, ds, std::make_unique()); - ASSERT_TRUE(node_or.has_value()) << node_or.error(); - ASSERT_EQ(derived.outputTopics(*node_or).size(), 1u); -} - -// --------------------------------------------------------------------------- -// MIMO — join semantics (inner join on exact timestamps) -// --------------------------------------------------------------------------- - -TEST(MimoTransformTest, JoinSemantics_OnlyMatchingTimestamps) { - // Topic A: t=0,1,2,3,4 s (all 5 timestamps) - // Topic B: t=0,2,4 s (3 timestamps, a subset) - // Sum at t=0: 0+0=0, t=2: 2+4=6, t=4: 4+8=12 - // t=1 and t=3 are in A but not B → no output row. - DataEngine engine; - DerivedEngine derived(engine); - PJ::DatasetId ds = make_dataset(engine); - - // Topic A: t = 0,1,2,3,4 s, value = i seconds - PJ::TopicId ta; - { - DataWriter w = engine.createWriter(); - auto h = *w.registerScalarSeries(ds, "a", PJ::NumericType::kFloat64); - ta = h.topic_id; - for (int i = 0; i < 5; ++i) { - w.appendScalar(h, static_cast(i) * 1'000'000'000LL, static_cast(i)); - } - engine.commitChunks(w.flushAll()); - } - - // Topic B: t = 0,2,4 s, value = 0,4,8 - PJ::TopicId tb; - { - DataWriter w = engine.createWriter(); - auto h = *w.registerScalarSeries(ds, "b", PJ::NumericType::kFloat64); - tb = h.topic_id; - for (int i = 0; i < 3; ++i) { - w.appendScalar(h, static_cast(i * 2) * 1'000'000'000LL, static_cast(i * 2 * 2)); - } - engine.commitChunks(w.flushAll()); - } - - auto node_or = derived.addMimoTransform({ta, tb}, {"sum"}, ds, std::make_unique()); - ASSERT_TRUE(node_or.has_value()) << node_or.error(); - - notify(derived, {ta, tb}); - ASSERT_TRUE(derived.scheduleAll().has_value()); - - auto rows = collect_rows_col(engine, derived.outputTopics(*node_or)[0]); - ASSERT_EQ(rows.size(), 3u); // only t=0,2,4 produce output - EXPECT_NEAR(rows[0].second, 0.0, 1e-9); // 0+0 - EXPECT_NEAR(rows[1].second, 6.0, 1e-9); // 2+4 - EXPECT_NEAR(rows[2].second, 12.0, 1e-9); 
// 4+8 -} - -TEST(MimoTransformTest, JoinSemantics_NoCommonTimestamps_NoOutput) { - // A: t=0,1 s; B: t=2,3 s → no overlap → no output rows - DataEngine engine; - DerivedEngine derived(engine); - PJ::DatasetId ds = make_dataset(engine); - - PJ::TopicId ta; - { - DataWriter w = engine.createWriter(); - auto h = *w.registerScalarSeries(ds, "a2", PJ::NumericType::kFloat64); - ta = h.topic_id; - for (int i = 0; i < 2; ++i) { - w.appendScalar(h, static_cast(i) * 1'000'000'000LL, static_cast(i)); - } - engine.commitChunks(w.flushAll()); - } - PJ::TopicId tb; - { - DataWriter w = engine.createWriter(); - auto h = *w.registerScalarSeries(ds, "b2", PJ::NumericType::kFloat64); - tb = h.topic_id; - for (int i = 2; i < 4; ++i) { - w.appendScalar(h, static_cast(i) * 1'000'000'000LL, static_cast(i)); - } - engine.commitChunks(w.flushAll()); - } - - auto node_or = derived.addMimoTransform({ta, tb}, {"sum"}, ds, std::make_unique()); - ASSERT_TRUE(node_or.has_value()) << node_or.error(); - notify(derived, {ta, tb}); - ASSERT_TRUE(derived.scheduleAll().has_value()); - - EXPECT_TRUE(collect_values(engine, derived.outputTopics(*node_or)[0]).empty()); -} - -// --------------------------------------------------------------------------- -// MIMO — schedule (incremental) -// --------------------------------------------------------------------------- - -TEST(MimoTransformTest, Schedule_ProducesCorrectSum) { - // Two topics with the same timestamps (0,1,2,...,9 seconds). - // A[i] = 1.0 * i, B[i] = 2.0 * i. Sum[i] = 3.0 * i. 
- DataEngine engine; - DerivedEngine derived(engine); - PJ::DatasetId ds = make_dataset(engine); - - PJ::TopicId t1 = make_linear_topic(engine, ds, 1.0, 10); - PJ::TopicId t2 = make_linear_topic(engine, ds, 2.0, 10); - - auto node_or = derived.addMimoTransform({t1, t2}, {"sum"}, ds, std::make_unique()); - ASSERT_TRUE(node_or.has_value()) << node_or.error(); - PJ::TopicId out = derived.outputTopics(*node_or)[0]; - - notify(derived, {t1, t2}); - ASSERT_TRUE(derived.scheduleAll().has_value()); - - auto rows = collect_rows_col(engine, out); - ASSERT_EQ(rows.size(), 10u); - for (int i = 0; i < 10; ++i) { - double expected = 3.0 * static_cast(i); // (1.0 + 2.0) * i - EXPECT_NEAR(rows[i].second, expected, 1e-9) << "row " << i; - } -} - -TEST(MimoTransformTest, Schedule_IncrementalTwoChunks) { - DataEngine engine; - DerivedEngine derived(engine); - PJ::DatasetId ds = make_dataset(engine); - - PJ::TopicId t1 = make_linear_topic(engine, ds, 1.0, 10); - PJ::TopicId t2 = make_linear_topic(engine, ds, 2.0, 10); - - auto node_or = derived.addMimoTransform({t1, t2}, {"sum"}, ds, std::make_unique()); - ASSERT_TRUE(node_or.has_value()) << node_or.error(); - PJ::TopicId out = derived.outputTopics(*node_or)[0]; - - notify(derived, {t1, t2}); - ASSERT_TRUE(derived.scheduleAll().has_value()); - std::size_t after_first = collect_values(engine, out).size(); - EXPECT_EQ(after_first, 10u); - - // Second batch of data - append_linear_rows(engine, t1, 1.0, 10, 10); - append_linear_rows(engine, t2, 2.0, 10, 10); - notify(derived, {t1, t2}); - ASSERT_TRUE(derived.scheduleAll().has_value()); - - std::size_t after_second = collect_values(engine, out).size(); - EXPECT_EQ(after_second, 20u); -} - -// --------------------------------------------------------------------------- -// MIMO — recompute_batch + parity -// --------------------------------------------------------------------------- - -TEST(MimoTransformTest, Parity_IncrementalMatchesBatch_SingleChunk) { - DataEngine engine; - DerivedEngine 
derived(engine); - PJ::DatasetId ds = make_dataset(engine); - - PJ::TopicId t1 = make_linear_topic(engine, ds, 1.0, 10); - PJ::TopicId t2 = make_linear_topic(engine, ds, 3.0, 10); - - auto node_or = derived.addMimoTransform({t1, t2}, {"sum"}, ds, std::make_unique()); - ASSERT_TRUE(node_or.has_value()) << node_or.error(); - PJ::NodeId node = *node_or; - PJ::TopicId out = derived.outputTopics(node)[0]; - - notify(derived, {t1, t2}); - ASSERT_TRUE(derived.scheduleAll().has_value()); - auto incremental = collect_values(engine, out); - - ASSERT_TRUE(derived.recompute_batch(node).has_value()); - auto batch = collect_values(engine, out); - - ASSERT_EQ(incremental.size(), batch.size()); - for (std::size_t i = 0; i < batch.size(); ++i) { - EXPECT_NEAR(incremental[i], batch[i], 1e-9) << "mismatch at row " << i; - } -} - -TEST(MimoTransformTest, Parity_IncrementalMatchesBatch_MultipleChunks) { - DataEngine engine; - DerivedEngine derived(engine); - PJ::DatasetId ds = make_dataset(engine); - - PJ::TopicId t1 = make_linear_topic(engine, ds, 1.0, 10); - PJ::TopicId t2 = make_linear_topic(engine, ds, 2.0, 10); - - auto node_or = derived.addMimoTransform({t1, t2}, {"sum"}, ds, std::make_unique()); - ASSERT_TRUE(node_or.has_value()) << node_or.error(); - PJ::NodeId node = *node_or; - PJ::TopicId out = derived.outputTopics(node)[0]; - - notify(derived, {t1, t2}); - ASSERT_TRUE(derived.scheduleAll().has_value()); - - append_linear_rows(engine, t1, 1.0, 10, 10); - append_linear_rows(engine, t2, 2.0, 10, 10); - notify(derived, {t1, t2}); - ASSERT_TRUE(derived.scheduleAll().has_value()); - - auto incremental = collect_values(engine, out); - - ASSERT_TRUE(derived.recompute_batch(node).has_value()); - auto batch = collect_values(engine, out); - - ASSERT_EQ(incremental.size(), batch.size()); - for (std::size_t i = 0; i < batch.size(); ++i) { - EXPECT_NEAR(incremental[i], batch[i], 1e-9) << "mismatch at row " << i; - } -} - -// 
--------------------------------------------------------------------------- -// MIMO — chained with SISO -// --------------------------------------------------------------------------- - -TEST(MimoTransformTest, ChainedSisoThenMimo) { - // Compute derivative of two source series (SISO), then sum the derivatives (MIMO). - // src1: y = 2*t → dy/dt = 2.0 - // src2: y = 3*t → dy/dt = 3.0 - // MIMO sum = 5.0 for each output row. - DataEngine engine; - DerivedEngine derived(engine); - PJ::DatasetId ds = make_dataset(engine); - - PJ::TopicId src1 = make_linear_topic(engine, ds, 2.0, 11); - PJ::TopicId src2 = make_linear_topic(engine, ds, 3.0, 11); - - PJ::NodeId n1 = *derived.addSisoTransform(src1, "d1", ds, std::make_unique()); - PJ::NodeId n2 = *derived.addSisoTransform(src2, "d2", ds, std::make_unique()); - PJ::TopicId d1_out = derived.outputTopics(n1)[0]; - PJ::TopicId d2_out = derived.outputTopics(n2)[0]; - - auto node_or = derived.addMimoTransform({d1_out, d2_out}, {"sum_deriv"}, ds, std::make_unique()); - ASSERT_TRUE(node_or.has_value()) << node_or.error(); - PJ::TopicId sum_out = derived.outputTopics(*node_or)[0]; - - notify(derived, {src1, src2}); - ASSERT_TRUE(derived.scheduleAll().has_value()); - - auto vals = collect_values(engine, sum_out); - EXPECT_FALSE(vals.empty()); - for (double v : vals) { - EXPECT_NEAR(v, 5.0, 1e-6); - } -} - -// --------------------------------------------------------------------------- -// MIMO — duplicate timestamp bug (C1) -// --------------------------------------------------------------------------- - -// When topic A has two rows at t=5 (equal timestamps are permitted by the -// engine), the N-way join must produce exactly one output row at t=5 using -// consistent (last-write-wins) values — NOT two output rows at t=5 (duplicate). 
-TEST(MimoTransformTest, DuplicateTimestamp_ProducesOneOutputRow) { - // Topic A: t=0,5,5,10 ns — t=5 appears twice (values 1.0 and 2.0) - // Topic B: t=0,5,10 ns — normal, no duplicates - // Expected join at t=0,5,10 → 3 output rows. - // Bug: without dedup, joined_ts=[0,5,5,10] → 4 output rows, t=5 appears twice. - DataEngine engine; - DerivedEngine derived(engine); - PJ::DatasetId ds = make_dataset(engine); - - PJ::TopicId ta; - { - DataWriter w = engine.createWriter(); - auto h = *w.registerScalarSeries(ds, "a_dup", PJ::NumericType::kFloat64); - ta = h.topic_id; - w.appendScalar(h, 0LL, 0.0); - w.appendScalar(h, 5LL, 1.0); // first row at t=5 - w.appendScalar(h, 5LL, 2.0); // second row at t=5 (equal timestamp allowed) - w.appendScalar(h, 10LL, 3.0); - engine.commitChunks(w.flushAll()); - } - - PJ::TopicId tb; - { - DataWriter w = engine.createWriter(); - auto h = *w.registerScalarSeries(ds, "b_dup", PJ::NumericType::kFloat64); - tb = h.topic_id; - w.appendScalar(h, 0LL, 10.0); - w.appendScalar(h, 5LL, 20.0); - w.appendScalar(h, 10LL, 30.0); - engine.commitChunks(w.flushAll()); - } - - auto node_or = derived.addMimoTransform({ta, tb}, {"sum_dup"}, ds, std::make_unique()); - ASSERT_TRUE(node_or.has_value()) << node_or.error(); - - notify(derived, {ta, tb}); - ASSERT_TRUE(derived.scheduleAll().has_value()); - - auto rows = collect_rows_col(engine, derived.outputTopics(*node_or)[0]); - // Must produce exactly 3 rows (t=0, t=5, t=10), not 4. 
- ASSERT_EQ(rows.size(), 3u) << "Duplicate timestamp in input caused duplicate output rows"; - EXPECT_NEAR(rows[0].second, 10.0, 1e-9); // 0+10 - // t=5: last-write-wins for A → 2.0; B=20.0 → sum=22.0 - EXPECT_NEAR(rows[1].second, 22.0, 1e-9); - EXPECT_NEAR(rows[2].second, 33.0, 1e-9); // 3+30 -} - -// --------------------------------------------------------------------------- -// MIMO — staggered cross-chunk parity (I1) -// --------------------------------------------------------------------------- - -// Parity with staggered inputs: A advances ahead of B across chunk boundaries. -// This tests the watermark's correctness when topic rates differ. -TEST(MimoTransformTest, Parity_StaggeredChunks_IncrementalMatchesBatch) { - // Chunk 1: A commits t=0..9 (10 rows), B commits t=0,2,4,6,8 (5 rows, even only) - // Chunk 2: A commits t=10..14 (5 rows), B commits t=10,12,14 (3 rows, even only) - // Expected joins: t=0,2,4,6,8,10,12,14 (8 rows) - DataEngine engine; - DerivedEngine derived(engine); - PJ::DatasetId ds = make_dataset(engine); - - // Register both topics upfront - DataWriter wa = engine.createWriter(); - auto ha = *wa.registerScalarSeries(ds, "stag_a", PJ::NumericType::kFloat64); - PJ::TopicId ta = ha.topic_id; - - DataWriter wb = engine.createWriter(); - auto hb = *wb.registerScalarSeries(ds, "stag_b", PJ::NumericType::kFloat64); - PJ::TopicId tb = hb.topic_id; - - auto node_or = derived.addMimoTransform({ta, tb}, {"stag_sum"}, ds, std::make_unique()); - ASSERT_TRUE(node_or.has_value()) << node_or.error(); - PJ::NodeId node = *node_or; - PJ::TopicId out = derived.outputTopics(node)[0]; - - // Chunk 1 - for (int i = 0; i < 10; ++i) { - wa.appendScalar(ha, static_cast(i) * 1'000'000'000LL, static_cast(i)); - } - for (int i = 0; i < 5; ++i) { - wb.appendScalar(hb, static_cast(i * 2) * 1'000'000'000LL, static_cast(i * 2)); - } - derived.onSourceCommitted(engine.commitChunks(wa.flushAll())); - derived.onSourceCommitted(engine.commitChunks(wb.flushAll())); - 
ASSERT_TRUE(derived.scheduleAll().has_value()); - - // Chunk 2 - for (int i = 10; i < 15; ++i) { - wa.appendScalar(ha, static_cast(i) * 1'000'000'000LL, static_cast(i)); - } - for (int i = 5; i < 8; ++i) { - wb.appendScalar(hb, static_cast(i * 2) * 1'000'000'000LL, static_cast(i * 2)); - } - derived.onSourceCommitted(engine.commitChunks(wa.flushAll())); - derived.onSourceCommitted(engine.commitChunks(wb.flushAll())); - ASSERT_TRUE(derived.scheduleAll().has_value()); - - auto incremental = collect_rows_col(engine, out); - ASSERT_EQ(incremental.size(), 8u) << "Expected 8 joined rows (even timestamps 0..14)"; - - ASSERT_TRUE(derived.recompute_batch(node).has_value()); - auto batch = collect_rows_col(engine, out); - - ASSERT_EQ(incremental.size(), batch.size()); - for (std::size_t i = 0; i < batch.size(); ++i) { - EXPECT_EQ(incremental[i].first, batch[i].first) << "timestamp mismatch at row " << i; - EXPECT_NEAR(incremental[i].second, batch[i].second, 1e-9) << "value mismatch at row " << i; - } -} - -// --------------------------------------------------------------------------- -// MIMO — stateful transform reset (I2) -// --------------------------------------------------------------------------- - -// A stateful MIMO transform that accumulates a running sum across calls. -// recompute_batch must call reset() and produce the same result as incremental. 
-class AccumulatingSumMimoTransform : public IMIMOTransform { - public: - std::vector outputKinds(PJ::Span /*input_kinds*/) const override { - return {StorageKind::kFloat64}; - } - - void reset() override { - running_sum_ = 0.0; - } - - bool calculate( - PJ::Timestamp time, PJ::Span inputs, PJ::Timestamp& out_time, - std::vector& output) override { - out_time = time; - running_sum_ += std::get(inputs[0]) + std::get(inputs[1]); - output[0] = running_sum_; - return true; - } - - private: - double running_sum_ = 0.0; -}; - -TEST(MimoTransformTest, StatefulTransform_RecomputeBatchCallsReset) { - DataEngine engine; - DerivedEngine derived(engine); - PJ::DatasetId ds = make_dataset(engine); - - PJ::TopicId t1 = make_linear_topic(engine, ds, 1.0, 5); - PJ::TopicId t2 = make_linear_topic(engine, ds, 1.0, 5); - - auto node_or = derived.addMimoTransform({t1, t2}, {"acc"}, ds, std::make_unique()); - ASSERT_TRUE(node_or.has_value()) << node_or.error(); - PJ::NodeId node = *node_or; - PJ::TopicId out = derived.outputTopics(node)[0]; - - notify(derived, {t1, t2}); - ASSERT_TRUE(derived.scheduleAll().has_value()); - auto incremental = collect_values(engine, out); - - // recompute_batch must reset the transform and produce identical output - ASSERT_TRUE(derived.recompute_batch(node).has_value()); - auto batch = collect_values(engine, out); - - ASSERT_EQ(incremental.size(), batch.size()); - for (std::size_t i = 0; i < batch.size(); ++i) { - EXPECT_NEAR(incremental[i], batch[i], 1e-9) << "mismatch at row " << i; - } -} - -// --------------------------------------------------------------------------- -// MIMO — on_source_committed with partial inputs (I4) -// --------------------------------------------------------------------------- - -// Notifying about only one of two MIMO inputs should cause the node to run -// but find no new data from the other input and produce no output. -// The watermark must NOT advance, so future data from the other input can match. 
-TEST(MimoTransformTest, PartialNotify_DoesNotAdvanceWatermark) { - DataEngine engine; - DerivedEngine derived(engine); - PJ::DatasetId ds = make_dataset(engine); - - // Register both topics; initially only commit data for t1 - DataWriter w1 = engine.createWriter(); - auto h1 = *w1.registerScalarSeries(ds, "p1", PJ::NumericType::kFloat64); - PJ::TopicId t1 = h1.topic_id; - - DataWriter w2 = engine.createWriter(); - auto h2 = *w2.registerScalarSeries(ds, "p2", PJ::NumericType::kFloat64); - PJ::TopicId t2 = h2.topic_id; - - auto node_or = derived.addMimoTransform({t1, t2}, {"partial_sum"}, ds, std::make_unique()); - ASSERT_TRUE(node_or.has_value()) << node_or.error(); - PJ::TopicId out = derived.outputTopics(*node_or)[0]; - - // Commit t1 only, notify only t1 - for (int i = 0; i < 5; ++i) { - w1.appendScalar(h1, static_cast(i) * 1'000'000'000LL, static_cast(i)); - } - derived.onSourceCommitted(engine.commitChunks(w1.flushAll())); - ASSERT_TRUE(derived.scheduleAll().has_value()); - - // No output: t2 has no data yet - EXPECT_TRUE(collect_values(engine, out).empty()); - - // Now commit t2 with the SAME timestamps as t1 - for (int i = 0; i < 5; ++i) { - w2.appendScalar(h2, static_cast(i) * 1'000'000'000LL, static_cast(i * 2)); - } - derived.onSourceCommitted(engine.commitChunks(w2.flushAll())); - ASSERT_TRUE(derived.scheduleAll().has_value()); - - // Now all 5 joins should be found (watermark was NOT advanced by the first schedule) - auto rows = collect_rows_col(engine, out); - ASSERT_EQ(rows.size(), 5u) << "Watermark advanced incorrectly; missed joins after lazy topic"; - for (int i = 0; i < 5; ++i) { - EXPECT_NEAR(rows[i].second, static_cast(i) + static_cast(i * 2), 1e-9); - } -} - -// --------------------------------------------------------------------------- -// MIMO — wrong output_kinds count is rejected (M4) -// --------------------------------------------------------------------------- - -class WrongOutputKindsMimoTransform : public IMIMOTransform { - public: - 
std::vector outputKinds(PJ::Span /*input_kinds*/) const override { - return {StorageKind::kFloat64, StorageKind::kFloat64}; // returns 2 but caller expects 1 - } - - bool calculate(PJ::Timestamp, PJ::Span, PJ::Timestamp&, std::vector&) override { - return false; - } -}; - -TEST(MimoTransformTest, WrongOutputKindsCount_Fails) { - DataEngine engine; - DerivedEngine derived(engine); - PJ::DatasetId ds = make_dataset(engine); - - PJ::TopicId t1 = make_linear_topic(engine, ds, 1.0, 5); - // One output topic name but op returns two kinds → error - auto r = derived.addMimoTransform({t1}, {"single_out"}, ds, std::make_unique()); - EXPECT_FALSE(r.has_value()); -} - -TEST(MimoTransformTest, TopologicalOrder_MimoComesAfterSiso) { - // n1: src→d1, n2: src→d2, n_mimo: (d1,d2)→sum. Order: n1,n2 before n_mimo. - DataEngine engine; - DerivedEngine derived(engine); - PJ::DatasetId ds = make_dataset(engine); - - PJ::TopicId src = make_linear_topic(engine, ds, 1.0, 5); - PJ::NodeId n1 = *derived.addSisoTransform(src, "d1", ds, std::make_unique()); - PJ::NodeId n2 = *derived.addSisoTransform(src, "d2", ds, std::make_unique()); - PJ::TopicId d1_out = derived.outputTopics(n1)[0]; - PJ::TopicId d2_out = derived.outputTopics(n2)[0]; - - PJ::NodeId n_mimo = *derived.addMimoTransform({d1_out, d2_out}, {"sum"}, ds, std::make_unique()); - - auto order = derived.topologicalOrder(); - ASSERT_EQ(order.size(), 3u); - - auto pos = [&](PJ::NodeId id) { - return static_cast(std::find(order.begin(), order.end(), id) - order.begin()); - }; - EXPECT_LT(pos(n1), pos(n_mimo)); - EXPECT_LT(pos(n2), pos(n_mimo)); -} - -// --------------------------------------------------------------------------- -// Uint64 precision round-trip (BUG-001) -// --------------------------------------------------------------------------- - -// Identity SISO transform: passes value through unchanged, preserving StorageKind. 
-class Uint64IdentityTransform : public ISISOTransform { - public: - StorageKind outputKind(StorageKind input_kind) const override { - return input_kind; - } - - bool calculate(PJ::Timestamp time, const VarValue& input, PJ::Timestamp& out_time, VarValue& out_value) override { - out_time = time; - out_value = input; - return true; - } -}; - -TEST(DerivedEngine, Uint64PrecisionRoundTrip) { - DataEngine engine; - auto ds = make_dataset(engine); - - // Create a uint64 scalar topic with values that exceed double precision (>2^53) - // and exceed int64_t range (>INT64_MAX). - DataWriter writer = engine.createWriter(); - auto handle_or = writer.registerScalarSeries(ds, "u64_src", PJ::NumericType::kUint64); - ASSERT_TRUE(handle_or.has_value()); - PJ::TopicId src_tid = handle_or->topic_id; - - const std::vector test_values = { - 0ULL, - 42ULL, - (1ULL << 53) + 1, // first value not exactly representable as double - std::numeric_limits::max(), // INT64_MAX - static_cast(std::numeric_limits::max()) + 1, // INT64_MAX + 1 - std::numeric_limits::max(), // UINT64_MAX - }; - - for (std::size_t i = 0; i < test_values.size(); ++i) { - auto ts = static_cast(i * 1'000'000'000LL); - writer.appendScalar(*handle_or, ts, test_values[i]); - } - engine.commitChunks(writer.flushAll()); - - // Register identity transform: kUint64 → kUint64 - DerivedEngine derived(engine); - auto node_or = derived.addSisoTransform(src_tid, "u64_out", ds, std::make_unique()); - ASSERT_TRUE(node_or.has_value()); - - derived.onSourceCommitted({&src_tid, 1}); - auto s = derived.scheduleAll(); - ASSERT_TRUE(s.has_value()) << s.error(); - - // Read back and verify exact bit-for-bit equality. 
- auto out_topics = derived.outputTopics(*node_or); - ASSERT_EQ(out_topics.size(), 1u); - - const TopicStorage* out_storage = engine.getTopicStorage(out_topics[0]); - ASSERT_NE(out_storage, nullptr); - const auto& chunks = out_storage->sealedChunks(); - ASSERT_EQ(chunks.size(), 1u); - ASSERT_EQ(chunks[0].stats.row_count, test_values.size()); - - for (std::size_t i = 0; i < test_values.size(); ++i) { - uint64_t actual = chunks[0].readNumericAsUint64(0, i); - EXPECT_EQ(actual, test_values[i]) << "Mismatch at row " << i; - } -} - -} // namespace -} // namespace PJ diff --git a/pj_datastore/tests/encoding_test.cpp b/pj_datastore/tests/encoding_test.cpp deleted file mode 100644 index c7ff4890..00000000 --- a/pj_datastore/tests/encoding_test.cpp +++ /dev/null @@ -1,356 +0,0 @@ -// Copyright 2026 Davide Faconti -// SPDX-License-Identifier: MPL-2.0 - -#include "pj_datastore/encoding.hpp" - -#include - -#include -#include -#include -#include -#include - -#include "pj_base/type_tree.hpp" // PrimitiveType - -namespace PJ::encoding { -namespace { - -using PJ::Span; - -// ========================================================================== -// Helper: build offsets + values buffers from a vector of strings, mimicking -// what TypedColumnBuffer would produce for a string column. 
-// ========================================================================== -struct StringColumnData { - std::vector offsets; - std::vector values; -}; - -StringColumnData make_string_column(const std::vector& strings) { - StringColumnData col; - - // offsets: (strings.size() + 1) uint32_t entries - col.offsets.resize((strings.size() + 1) * sizeof(uint32_t)); - uint32_t running = 0; - for (std::size_t i = 0; i < strings.size(); ++i) { - std::memcpy(col.offsets.data() + i * sizeof(uint32_t), &running, sizeof(uint32_t)); - running += static_cast(strings[i].size()); - } - std::memcpy(col.offsets.data() + strings.size() * sizeof(uint32_t), &running, sizeof(uint32_t)); - - // values: concatenated string bytes - col.values.reserve(running); - for (const auto& s : strings) { - col.values.insert(col.values.end(), s.begin(), s.end()); - } - - return col; -} - -// ========================================================================== -// Dictionary encoding tests -// ========================================================================== - -TEST(DictionaryEncoding, RepeatedStrings) { - auto col = make_string_column({"base", "world", "base", "base"}); - - auto encoded = dictionaryEncodeStrings(Span(col.offsets), Span(col.values), 4); - - EXPECT_EQ(encoded.count, 4u); - EXPECT_EQ(encoded.dictionary.size(), 2u); - EXPECT_EQ(encoded.dictionary[0], "base"); - EXPECT_EQ(encoded.dictionary[1], "world"); - - EXPECT_EQ(dictionaryLookup(encoded, 0), "base"); - EXPECT_EQ(dictionaryLookup(encoded, 1), "world"); - EXPECT_EQ(dictionaryLookup(encoded, 2), "base"); - EXPECT_EQ(dictionaryLookup(encoded, 3), "base"); -} - -TEST(DictionaryEncoding, AllUniqueStrings) { - auto col = make_string_column({"alpha", "beta", "gamma", "delta"}); - - auto encoded = dictionaryEncodeStrings(Span(col.offsets), Span(col.values), 4); - - EXPECT_EQ(encoded.count, 4u); - EXPECT_EQ(encoded.dictionary.size(), 4u); -} - -TEST(DictionaryEncoding, LookupCorrectness) { - std::vector strings = {"foo", 
"bar", "baz", "foo", "qux"}; - auto col = make_string_column(strings); - - auto encoded = dictionaryEncodeStrings(Span(col.offsets), Span(col.values), 5); - - EXPECT_EQ(encoded.count, 5u); - - for (std::size_t i = 0; i < strings.size(); ++i) { - EXPECT_EQ(dictionaryLookup(encoded, i), strings[i]) << "mismatch at row " << i; - } -} - -// ========================================================================== -// Packed bools tests -// ========================================================================== - -TEST(PackedBools, SixteenValues) { - // 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 - std::array values = {1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1}; - - auto packed = packBools(values); - - EXPECT_EQ(packed.count, 16u); - EXPECT_EQ(packed.bits.size(), 2u); // 16 bits = 2 bytes - - for (std::size_t i = 0; i < values.size(); ++i) { - EXPECT_EQ(unpackBool(packed, i), values[i] != 0) << "mismatch at index " << i; - } -} - -TEST(PackedBools, ExactByteBoundary) { - // Exactly 8 values - std::array values = {1, 1, 0, 1, 0, 1, 1, 0}; - - auto packed = packBools(values); - - EXPECT_EQ(packed.count, 8u); - EXPECT_EQ(packed.bits.size(), 1u); // 8 bits = 1 byte - - for (std::size_t i = 0; i < values.size(); ++i) { - EXPECT_EQ(unpackBool(packed, i), values[i] != 0) << "mismatch at index " << i; - } -} - -TEST(PackedBools, NineValues) { - // One past a byte boundary - std::array values = {1, 0, 1, 1, 0, 0, 1, 0, 1}; - - auto packed = packBools(values); - - EXPECT_EQ(packed.count, 9u); - EXPECT_EQ(packed.bits.size(), 2u); // ceil(9/8) = 2 bytes - - for (std::size_t i = 0; i < values.size(); ++i) { - EXPECT_EQ(unpackBool(packed, i), values[i] != 0) << "mismatch at index " << i; - } -} - -TEST(PackedBools, Empty) { - auto packed = packBools(Span()); - - EXPECT_EQ(packed.count, 0u); - EXPECT_TRUE(packed.bits.empty()); -} - -// ========================================================================== -// Constant encoding tests -// 
========================================================================== - -TEST(ConstantEncoding, Float64) { - constexpr std::size_t count = 100; - std::vector buf(count * sizeof(double)); - for (std::size_t i = 0; i < count; ++i) { - double val = 3.14; - std::memcpy(buf.data() + i * sizeof(double), &val, sizeof(double)); - } - - auto enc = constantEncode(buf, StorageKind::kFloat64, count); - EXPECT_EQ(enc.count, count); - EXPECT_EQ(enc.value_kind, StorageKind::kFloat64); - EXPECT_EQ(enc.value_size, sizeof(double)); - - EXPECT_DOUBLE_EQ(constantDecodeAsDouble(enc), 3.14); -} - -TEST(ConstantEncoding, Int32) { - constexpr std::size_t count = 100; - std::vector buf(count * sizeof(int32_t)); - for (std::size_t i = 0; i < count; ++i) { - int32_t val = -42; - std::memcpy(buf.data() + i * sizeof(int32_t), &val, sizeof(int32_t)); - } - - auto enc = constantEncode(buf, StorageKind::kInt32, count); - EXPECT_EQ(enc.count, count); - EXPECT_EQ(enc.value_kind, StorageKind::kInt32); - - EXPECT_DOUBLE_EQ(constantDecodeAsDouble(enc), -42.0); -} - -// ========================================================================== -// Frame of Reference encoding tests -// ========================================================================== - -TEST(FOREncoding, NarrowRange_Int32) { - // Values [1000..1100]: range=100, fits in uint8 - constexpr std::size_t count = 101; - std::vector buf(count * sizeof(int32_t)); - for (std::size_t i = 0; i < count; ++i) { - auto val = static_cast(1000 + i); - std::memcpy(buf.data() + i * sizeof(int32_t), &val, sizeof(int32_t)); - } - - auto enc = forEncode(buf, StorageKind::kInt32, count, 1000, 1100); - EXPECT_EQ(enc.offset_bytes, 1); - EXPECT_EQ(enc.reference, 1000); - EXPECT_EQ(enc.count, count); - EXPECT_EQ(enc.offsets.size(), count * 1); // 1 byte per offset - - // Verify round-trip all values - for (std::size_t i = 0; i < count; ++i) { - EXPECT_DOUBLE_EQ(forDecodeOneAsDouble(enc, i), 1000.0 + static_cast(i)) << "row " << i; - } -} - 
-TEST(FOREncoding, MediumRange_Int32) { - // Values [0..50000]: range=50000, fits in uint16 - constexpr std::size_t count = 501; - std::vector buf(count * sizeof(int32_t)); - for (std::size_t i = 0; i < count; ++i) { - auto val = static_cast(i * 100); - std::memcpy(buf.data() + i * sizeof(int32_t), &val, sizeof(int32_t)); - } - - auto enc = forEncode(buf, StorageKind::kInt32, count, 0, 50000); - EXPECT_EQ(enc.offset_bytes, 2); - EXPECT_EQ(enc.reference, 0); - - for (std::size_t i = 0; i < count; ++i) { - EXPECT_DOUBLE_EQ(forDecodeOneAsDouble(enc, i), static_cast(i * 100)) << "row " << i; - } -} - -TEST(FOREncoding, NegativeRange_Int64) { - // Values [-100..100]: range=200, fits in uint8 (int64 storage) - constexpr std::size_t count = 201; - std::vector buf(count * sizeof(int64_t)); - for (std::size_t i = 0; i < count; ++i) { - auto val = static_cast(-100 + static_cast(i)); - std::memcpy(buf.data() + i * sizeof(int64_t), &val, sizeof(int64_t)); - } - - auto enc = forEncode(buf, StorageKind::kInt64, count, -100, 100); - EXPECT_EQ(enc.offset_bytes, 1); - EXPECT_EQ(enc.reference, -100); - - for (std::size_t i = 0; i < count; ++i) { - EXPECT_DOUBLE_EQ(forDecodeOneAsDouble(enc, i), -100.0 + static_cast(i)) << "row " << i; - } -} - -TEST(FOREncoding, BulkDecode) { - constexpr std::size_t count = 50; - std::vector buf(count * sizeof(int32_t)); - for (std::size_t i = 0; i < count; ++i) { - auto val = static_cast(500 + i); - std::memcpy(buf.data() + i * sizeof(int32_t), &val, sizeof(int32_t)); - } - - auto enc = forEncode(buf, StorageKind::kInt32, count, 500, 549); - - std::vector out(count); - forDecodeRangeAsDoubles(enc, out, 0); - - for (std::size_t i = 0; i < count; ++i) { - EXPECT_DOUBLE_EQ(out[i], 500.0 + static_cast(i)) << "row " << i; - } - - // Decode a sub-range - std::vector sub(10); - forDecodeRangeAsDoubles(enc, sub, 20); - for (std::size_t i = 0; i < 10; ++i) { - EXPECT_DOUBLE_EQ(sub[i], 520.0 + static_cast(i)) << "sub " << i; - } -} - -// 
========================================================================== -// Dictionary encoding with narrowed indices -// ========================================================================== - -TEST(DictionaryEncoding, NarrowIndices) { - // 4 unique strings, 1000 rows → indices should be uint8 (1 byte each) - std::vector data; - data.reserve(1000); - for (int i = 0; i < 1000; ++i) { - switch (i % 4) { - case 0: - data.push_back("alpha"); - break; - case 1: - data.push_back("beta"); - break; - case 2: - data.push_back("gamma"); - break; - case 3: - data.push_back("delta"); - break; - } - } - auto col = make_string_column(data); - - auto encoded = dictionaryEncodeStrings(Span(col.offsets), Span(col.values), 1000); - - EXPECT_EQ(encoded.dictionary.size(), 4u); - EXPECT_EQ(encoded.index_bytes, 1); - // Indices buffer should be 1000 bytes (uint8), not 4000 (uint32) - EXPECT_EQ(encoded.indices.size(), 1000u); - - // Verify lookups - for (std::size_t i = 0; i < 1000; ++i) { - EXPECT_EQ(dictionaryLookup(encoded, i), data[i]) << "row " << i; - } -} - -TEST(DictionaryEncoding, MediumIndices) { - // 300 unique strings → indices should be uint16 (2 bytes each) - std::vector dict_values; - dict_values.reserve(300); - for (int i = 0; i < 300; ++i) { - dict_values.push_back("str_" + std::to_string(i)); - } - - // Build data with 600 rows cycling through all 300 - std::vector data; - data.reserve(600); - for (int i = 0; i < 600; ++i) { - data.push_back(dict_values[static_cast(i % 300)]); - } - auto col = make_string_column(data); - - auto encoded = dictionaryEncodeStrings(Span(col.offsets), Span(col.values), 600); - - EXPECT_EQ(encoded.dictionary.size(), 300u); - EXPECT_EQ(encoded.index_bytes, 2); - // Indices buffer should be 600*2 = 1200 bytes - EXPECT_EQ(encoded.indices.size(), 1200u); - - for (std::size_t i = 0; i < 600; ++i) { - EXPECT_EQ(dictionaryLookup(encoded, i), data[i]) << "row " << i; - } -} - -// 
========================================================================== -// Byte-width helper tests -// ========================================================================== - -TEST(ByteWidthHelpers, IndexBytesFor) { - EXPECT_EQ(indexBytesFor(1), 1); - EXPECT_EQ(indexBytesFor(256), 1); - EXPECT_EQ(indexBytesFor(257), 2); - EXPECT_EQ(indexBytesFor(65536), 2); - EXPECT_EQ(indexBytesFor(65537), 4); -} - -TEST(ByteWidthHelpers, OffsetBytesFor) { - EXPECT_EQ(offsetBytesFor(0), 1); - EXPECT_EQ(offsetBytesFor(255), 1); - EXPECT_EQ(offsetBytesFor(256), 2); - EXPECT_EQ(offsetBytesFor(65535), 2); - EXPECT_EQ(offsetBytesFor(65536), 4); - EXPECT_EQ(offsetBytesFor(uint64_t{1} << 32), 8); -} - -} // namespace -} // namespace PJ::encoding diff --git a/pj_datastore/tests/engine_integration_test.cpp b/pj_datastore/tests/engine_integration_test.cpp deleted file mode 100644 index 3d771734..00000000 --- a/pj_datastore/tests/engine_integration_test.cpp +++ /dev/null @@ -1,1123 +0,0 @@ -// Copyright 2026 Davide Faconti -// SPDX-License-Identifier: MPL-2.0 - -#include - -#include -#include -#include -#include -#include -#include - -#include "pj_base/dataset.hpp" -#include "pj_base/type_tree.hpp" -#include "pj_base/types.hpp" -#include "pj_datastore/chunk.hpp" -#include "pj_datastore/column_buffer.hpp" -#include "pj_datastore/engine.hpp" -#include "pj_datastore/query.hpp" -#include "pj_datastore/reader.hpp" -#include "pj_datastore/topic_storage.hpp" -#include "pj_datastore/type_registry.hpp" -#include "pj_datastore/writer.hpp" - -namespace PJ { -namespace { - -// =========================================================================== -// Test 1: End-to-end scalar write + read -// -// Creates an engine, registers a float64 scalar series, appends 5000 values -// (spanning multiple chunks at default max_chunk_rows=1024), flushes, -// commits, then verifies range_query returns all 5000 rows and latest_at -// returns the correct value at the midpoint. 
-// =========================================================================== - -TEST(EngineIntegrationTest, EndToEndScalarWriteRead) { - DataEngine engine; - - // Create dataset with default time domain id=0 - auto dataset_id_or = engine.createDataset(DatasetDescriptor{.source_name = "test_source", .time_domain_id = 0}); - ASSERT_TRUE(dataset_id_or.has_value()) << dataset_id_or.error(); - DatasetId dataset_id = *dataset_id_or; - - // Create writer, register scalar series (float64) - DataWriter writer = engine.createWriter(); - auto handle_or = writer.registerScalarSeries(dataset_id, "temperature", NumericType::kFloat64); - ASSERT_TRUE(handle_or.has_value()) << handle_or.error(); - ScalarSeriesHandle handle = *handle_or; - - // Append 5000 scalar values: timestamps 0, 1000, 2000, ... - // Values: i * 0.5 - constexpr std::size_t kRowCount = 5000; - for (std::size_t i = 0; i < kRowCount; ++i) { - Timestamp ts = static_cast(i) * 1000; - double value = static_cast(i) * 0.5; - writer.appendScalar(handle, ts, value); - } - - // Flush all pending chunks and commit to engine - auto flushed = writer.flushAll(); - EXPECT_FALSE(flushed.empty()); - engine.commitChunks(std::move(flushed)); - - // Verify via reader: range_query full range - DataReader reader = engine.createReader(); - - std::size_t count = 0; - auto cursor_or = reader.rangeQuery( - QueryRange{.topic_id = handle.topic_id, .t_min = 0, .t_max = static_cast(kRowCount - 1) * 1000}); - ASSERT_TRUE(cursor_or.has_value()) << cursor_or.error(); - cursor_or->forEach([&count](const SampleRow& row) { - (void)row; - ++count; - }); - EXPECT_EQ(count, kRowCount); - - // Verify multiple chunks were created (5000 rows / 1024 max = 5 chunks) - const TopicStorage* storage = engine.getTopicStorage(handle.topic_id); - ASSERT_NE(storage, nullptr); - EXPECT_GE(storage->sealedChunks().size(), 2U); - - // latest_at at midpoint: t = 2500 * 1000 = 2500000 - // Expected value at i=2500: 2500 * 0.5 = 1250.0 - Timestamp midpoint_ts = 
static_cast(2500) * 1000; - auto latest_or = reader.latestAt(QueryPoint{.topic_id = handle.topic_id, .t = midpoint_ts}); - ASSERT_TRUE(latest_or.has_value()) << latest_or.error(); - auto& latest = *latest_or; - ASSERT_TRUE(latest.has_value()); - EXPECT_EQ(latest->timestamp, midpoint_ts); - ASSERT_NE(latest->chunk, nullptr); - double midpoint_value = latest->chunk->readNumericAsDouble(0, latest->row_index); - EXPECT_DOUBLE_EQ(midpoint_value, 1250.0); - - // Also verify metadata - auto metadata_opt = reader.getMetadata(handle.topic_id); - ASSERT_TRUE(metadata_opt.has_value()); - EXPECT_EQ(metadata_opt->total_row_count, kRowCount); - EXPECT_EQ(metadata_opt->time_range_min, 0); - EXPECT_EQ(metadata_opt->time_range_max, static_cast(kRowCount - 1) * 1000); -} - -// =========================================================================== -// Test 2: End-to-end structured write + read -// -// Registers a schema for a robot_pose struct (float32 x, y, z + string -// frame_name), creates a topic bound to that schema, writes 200 rows, -// flushes, commits, then verifies field values round-trip via range_query -// and checks that the string column uses dictionary encoding. 
-// =========================================================================== - -TEST(EngineIntegrationTest, EndToEndStructuredWriteRead) { - DataEngine engine; - - // Build type tree: struct robot_pose { float32 x, y, z; string frame_name } - auto x = makePrimitive("x", PrimitiveType::kFloat32); - auto y = makePrimitive("y", PrimitiveType::kFloat32); - auto z = makePrimitive("z", PrimitiveType::kFloat32); - auto frame = makePrimitive("frame_name", PrimitiveType::kString); - auto robot_pose = makeStruct("robot_pose", {x, y, z, frame}); - - // Create dataset - auto dataset_id_or = engine.createDataset(DatasetDescriptor{.source_name = "robot", .time_domain_id = 0}); - ASSERT_TRUE(dataset_id_or.has_value()) << dataset_id_or.error(); - DatasetId dataset_id = *dataset_id_or; - - // Register schema and create topic via writer - DataWriter writer = engine.createWriter(); - auto schema_id_or = writer.registerSchema("robot_pose", robot_pose); - ASSERT_TRUE(schema_id_or.has_value()) << schema_id_or.error(); - SchemaId schema_id = *schema_id_or; - - TopicDescriptor topic_desc; - topic_desc.name = "pose"; - topic_desc.schema_id = schema_id; - auto topic_id_or = writer.registerTopic(dataset_id, topic_desc); - ASSERT_TRUE(topic_id_or.has_value()) << topic_id_or.error(); - TopicId topic_id = *topic_id_or; - - // Bind to get field IDs (column indices) - auto write_handle_or = writer.bindTopicWriter(topic_id); - ASSERT_TRUE(write_handle_or.has_value()) << write_handle_or.error(); - TopicWriteHandle write_handle = *write_handle_or; - ASSERT_EQ(write_handle.field_ids.size(), 4U); - - // Column layout after flatten: col 0=x, col 1=y, col 2=z, col 3=frame_name - constexpr std::size_t kRows = 200; - for (std::size_t i = 0; i < kRows; ++i) { - Timestamp ts = static_cast(i) * 1000000; // 1ms apart - ASSERT_TRUE(writer.beginRow(topic_id, ts).has_value()); - writer.set(topic_id, 0, static_cast(i) * 1.0F); - writer.set(topic_id, 1, static_cast(i) * 2.0F); - writer.set(topic_id, 2, 
static_cast(i) * 3.0F); - // Alternate frame name for variety - std::string_view frame_name = (i % 2 == 0) ? "base_link" : "odom"; - writer.set(topic_id, 3, frame_name); - ASSERT_TRUE(writer.finishRow(topic_id).has_value()); - } - - // Flush + commit - auto flushed = writer.flushAll(); - EXPECT_FALSE(flushed.empty()); - engine.commitChunks(std::move(flushed)); - - // Read back via reader - DataReader reader = engine.createReader(); - - // Range query: verify total count - std::size_t count = 0; - auto cursor_or = reader.rangeQuery( - QueryRange{.topic_id = topic_id, .t_min = 0, .t_max = static_cast(kRows - 1) * 1000000}); - ASSERT_TRUE(cursor_or.has_value()) << cursor_or.error(); - // Collect first and last rows for verification - SampleRow first_row{}; - SampleRow last_row{}; - cursor_or->forEach([&count, &first_row, &last_row](const SampleRow& row) { - if (count == 0) { - first_row = row; - } - last_row = row; - ++count; - }); - EXPECT_EQ(count, kRows); - - // Verify first row (i=0): x=0, y=0, z=0, frame_name="base_link" - ASSERT_NE(first_row.chunk, nullptr); - EXPECT_FLOAT_EQ(static_cast(first_row.chunk->readNumericAsDouble(0, first_row.row_index)), 0.0F); - EXPECT_FLOAT_EQ(static_cast(first_row.chunk->readNumericAsDouble(1, first_row.row_index)), 0.0F); - EXPECT_FLOAT_EQ(static_cast(first_row.chunk->readNumericAsDouble(2, first_row.row_index)), 0.0F); - EXPECT_EQ(first_row.chunk->readString(3, first_row.row_index), "base_link"); - - // Verify last row (i=199): x=199, y=398, z=597, frame_name="odom" - ASSERT_NE(last_row.chunk, nullptr); - EXPECT_FLOAT_EQ(static_cast(last_row.chunk->readNumericAsDouble(0, last_row.row_index)), 199.0F); - EXPECT_FLOAT_EQ(static_cast(last_row.chunk->readNumericAsDouble(1, last_row.row_index)), 398.0F); - EXPECT_FLOAT_EQ(static_cast(last_row.chunk->readNumericAsDouble(2, last_row.row_index)), 597.0F); - EXPECT_EQ(last_row.chunk->readString(3, last_row.row_index), "odom"); - - // Verify dictionary encoding on the string column (col 
3) in sealed chunks - const TopicStorage* storage = engine.getTopicStorage(topic_id); - ASSERT_NE(storage, nullptr); - for (const auto& chunk : storage->sealedChunks()) { - ASSERT_GT(chunk.columns.size(), 3U); - EXPECT_EQ(chunk.columnEncoding(3), EncodingType::kDictionary) << "String column should use dictionary encoding"; - const auto& dict = std::get(chunk.columns[3].data); - // At most 2 unique values: "base_link" and "odom" - EXPECT_LE(dict.dictionary.size(), 2U); - } - - // Verify type tree is retrievable via reader - const TypeTreeNode* tree = reader.getTypeTree(topic_id); - ASSERT_NE(tree, nullptr); - EXPECT_EQ(tree->name, "robot_pose"); - EXPECT_EQ(tree->kind, TypeKind::kStruct); - EXPECT_EQ(tree->children.size(), 4U); -} - -// =========================================================================== -// Test 3: Retention eviction -// -// Writes data spanning a timestamp range, commits, then enforces retention -// to evict old chunks. Verifies that old data is gone and recent data -// remains intact. 
-// =========================================================================== - -TEST(EngineIntegrationTest, RetentionEviction) { - DataEngine engine; - - auto dataset_id_or = engine.createDataset(DatasetDescriptor{.source_name = "retention_test", .time_domain_id = 0}); - ASSERT_TRUE(dataset_id_or.has_value()) << dataset_id_or.error(); - DatasetId dataset_id = *dataset_id_or; - - DataWriter writer = engine.createWriter(); - - // Use small max_chunk_rows to force many chunks for fine-grained eviction - // We use scalar API but with a custom topic for control over chunk size - auto x_tree = makePrimitive("value", PrimitiveType::kFloat64); - auto schema_id_or = writer.registerSchema("scalar_retention", x_tree); - ASSERT_TRUE(schema_id_or.has_value()) << schema_id_or.error(); - SchemaId schema_id = *schema_id_or; - - TopicDescriptor topic_desc; - topic_desc.name = "sensor"; - topic_desc.schema_id = schema_id; - topic_desc.max_chunk_rows = 100; // small chunks for easier eviction testing - auto topic_id_or = writer.registerTopic(dataset_id, topic_desc); - ASSERT_TRUE(topic_id_or.has_value()) << topic_id_or.error(); - TopicId topic_id = *topic_id_or; - - auto write_handle_or = writer.bindTopicWriter(topic_id); - ASSERT_TRUE(write_handle_or.has_value()) << write_handle_or.error(); - - // Write 3000 rows: timestamps 0 to 9999 (spacing ~3.333) - // Simplify: timestamps from 0 to 2999, one per integer timestamp - constexpr std::size_t kRowCount = 3000; - for (std::size_t i = 0; i < kRowCount; ++i) { - Timestamp ts = static_cast(i); - ASSERT_TRUE(writer.beginRow(topic_id, ts).has_value()); - writer.set(topic_id, 0, static_cast(i) * 0.1); - ASSERT_TRUE(writer.finishRow(topic_id).has_value()); - } - - auto flushed = writer.flushAll(); - engine.commitChunks(std::move(flushed)); - - // Verify all data present - DataReader reader = engine.createReader(); - { - std::size_t count = 0; - auto cursor_or = reader.rangeQuery(QueryRange{.topic_id = topic_id, .t_min = 0, .t_max = 
2999}); - ASSERT_TRUE(cursor_or.has_value()) << cursor_or.error(); - cursor_or->forEach([&count](const SampleRow&) { ++count; }); - EXPECT_EQ(count, kRowCount); - } - - const TopicStorage* storage = engine.getTopicStorage(topic_id); - ASSERT_NE(storage, nullptr); - std::size_t chunks_before = storage->sealedChunks().size(); - EXPECT_GT(chunks_before, 1U); - - // Enforce retention window of 1500ns. - // t_max = 2999, so evictBefore(2999 - 1500 = 1499). - // Chunks with t_max < 1499 are evicted. - engine.enforceRetention(1500); - - std::size_t chunks_after = storage->sealedChunks().size(); - EXPECT_LT(chunks_after, chunks_before) << "Some chunks should have been evicted"; - - // Query old range [0, 999]: should return fewer or zero rows - { - std::size_t count = 0; - auto cursor_or = reader.rangeQuery(QueryRange{.topic_id = topic_id, .t_min = 0, .t_max = 999}); - ASSERT_TRUE(cursor_or.has_value()) << cursor_or.error(); - cursor_or->forEach([&count](const SampleRow&) { ++count; }); - // Old chunks (t_max < 1499) are fully evicted. - // Chunks with rows in [0, 999] and t_max < 1499 are gone. - // At chunk size 100: chunks [0..99], [100..199], ..., [900..999] all have - // t_max < 1499, so they should be evicted. 
- EXPECT_EQ(count, 0U) << "Old data should be fully evicted"; - } - - // Query recent range [1500, 2999]: should return all data in that range - { - std::size_t count = 0; - auto cursor_or = reader.rangeQuery(QueryRange{.topic_id = topic_id, .t_min = 1500, .t_max = 2999}); - ASSERT_TRUE(cursor_or.has_value()) << cursor_or.error(); - cursor_or->forEach([&count](const SampleRow&) { ++count; }); - EXPECT_EQ(count, 1500U) << "Recent data should be intact"; - } -} - -// =========================================================================== -// Test 4: Schema evolution -// -// Registers a topic with schema v1 (x, y, z as float32), writes 100 rows, -// evolves the schema to v2 (adds w as float32), writes 100 more rows with -// the new column, then verifies that old rows have 3 columns and new rows -// have 4 columns accessible. -// =========================================================================== - -TEST(EngineIntegrationTest, SchemaEvolution) { - DataEngine engine; - - auto dataset_id_or = engine.createDataset(DatasetDescriptor{.source_name = "evolution", .time_domain_id = 0}); - ASSERT_TRUE(dataset_id_or.has_value()) << dataset_id_or.error(); - DatasetId dataset_id = *dataset_id_or; - - // Schema v1: struct { float32 x, y, z } - auto x = makePrimitive("x", PrimitiveType::kFloat32); - auto y = makePrimitive("y", PrimitiveType::kFloat32); - auto z = makePrimitive("z", PrimitiveType::kFloat32); - auto schema_v1 = makeStruct("pose", {x, y, z}); - - DataWriter writer = engine.createWriter(); - auto schema_id_or = writer.registerSchema("pose", schema_v1); - ASSERT_TRUE(schema_id_or.has_value()) << schema_id_or.error(); - SchemaId schema_id = *schema_id_or; - - TopicDescriptor topic_desc; - topic_desc.name = "position"; - topic_desc.schema_id = schema_id; - topic_desc.max_chunk_rows = 200; // Ensure v1 data fits in one chunk - auto topic_id_or = writer.registerTopic(dataset_id, topic_desc); - ASSERT_TRUE(topic_id_or.has_value()) << topic_id_or.error(); - TopicId 
topic_id = *topic_id_or; - - auto wh_or = writer.bindTopicWriter(topic_id); - ASSERT_TRUE(wh_or.has_value()) << wh_or.error(); - EXPECT_EQ(wh_or->field_ids.size(), 3U); - - // Write 100 rows with v1 schema (3 columns) - for (std::size_t i = 0; i < 100; ++i) { - Timestamp ts = static_cast(i) * 1000; - ASSERT_TRUE(writer.beginRow(topic_id, ts).has_value()); - writer.set(topic_id, 0, static_cast(i) * 1.0F); - writer.set(topic_id, 1, static_cast(i) * 2.0F); - writer.set(topic_id, 2, static_cast(i) * 3.0F); - ASSERT_TRUE(writer.finishRow(topic_id).has_value()); - } - - // Flush v1 data and commit - auto flushed_v1 = writer.flushAll(); - EXPECT_FALSE(flushed_v1.empty()); - engine.commitChunks(std::move(flushed_v1)); - - // Evolve schema: add float32 w - auto w = makePrimitive("w", PrimitiveType::kFloat32); - auto schema_v2 = makeStruct("pose", {x, y, z, w}); - auto evolve_status = engine.typeRegistry().evolveSchema(schema_id, schema_v2); - ASSERT_TRUE(evolve_status.has_value()) << evolve_status.error(); - - // Create a new writer so it picks up the evolved schema's column layout - DataWriter writer2 = engine.createWriter(); - auto wh2_or = writer2.bindTopicWriter(topic_id); - ASSERT_TRUE(wh2_or.has_value()) << wh2_or.error(); - EXPECT_EQ(wh2_or->field_ids.size(), 4U); - - // Write 100 more rows with v2 schema (4 columns) - for (std::size_t i = 0; i < 100; ++i) { - Timestamp ts = static_cast(100 + i) * 1000; - ASSERT_TRUE(writer2.beginRow(topic_id, ts).has_value()); - writer2.set(topic_id, 0, static_cast(100 + i) * 1.0F); - writer2.set(topic_id, 1, static_cast(100 + i) * 2.0F); - writer2.set(topic_id, 2, static_cast(100 + i) * 3.0F); - writer2.set(topic_id, 3, static_cast(100 + i) * 4.0F); - ASSERT_TRUE(writer2.finishRow(topic_id).has_value()); - } - - auto flushed_v2 = writer2.flushAll(); - EXPECT_FALSE(flushed_v2.empty()); - engine.commitChunks(std::move(flushed_v2)); - - // Read back data spanning both versions - DataReader reader = engine.createReader(); - - // Query 
old rows [0, 99000]: should return 100 rows with 3 columns - { - std::size_t count = 0; - auto cursor_or = reader.rangeQuery(QueryRange{.topic_id = topic_id, .t_min = 0, .t_max = 99000}); - ASSERT_TRUE(cursor_or.has_value()) << cursor_or.error(); - cursor_or->forEach([&count](const SampleRow& row) { - ASSERT_NE(row.chunk, nullptr); - // Old chunks should have 3 column descriptors - EXPECT_EQ(row.chunk->columns.size(), 3U); - if (count == 0) { - // Verify first old row: x=0, y=0, z=0 - EXPECT_FLOAT_EQ(static_cast(row.chunk->readNumericAsDouble(0, row.row_index)), 0.0F); - EXPECT_FLOAT_EQ(static_cast(row.chunk->readNumericAsDouble(1, row.row_index)), 0.0F); - EXPECT_FLOAT_EQ(static_cast(row.chunk->readNumericAsDouble(2, row.row_index)), 0.0F); - } - ++count; - }); - EXPECT_EQ(count, 100U); - } - - // Query new rows [100000, 199000]: should return 100 rows with 4 columns - { - std::size_t count = 0; - auto cursor_or = reader.rangeQuery(QueryRange{.topic_id = topic_id, .t_min = 100000, .t_max = 199000}); - ASSERT_TRUE(cursor_or.has_value()) << cursor_or.error(); - cursor_or->forEach([&count](const SampleRow& row) { - ASSERT_NE(row.chunk, nullptr); - // New chunks should have 4 column descriptors - EXPECT_EQ(row.chunk->columns.size(), 4U); - if (count == 0) { - // Verify first new row (i=100): x=100, y=200, z=300, w=400 - EXPECT_FLOAT_EQ(static_cast(row.chunk->readNumericAsDouble(0, row.row_index)), 100.0F); - EXPECT_FLOAT_EQ(static_cast(row.chunk->readNumericAsDouble(1, row.row_index)), 200.0F); - EXPECT_FLOAT_EQ(static_cast(row.chunk->readNumericAsDouble(2, row.row_index)), 300.0F); - EXPECT_FLOAT_EQ(static_cast(row.chunk->readNumericAsDouble(3, row.row_index)), 400.0F); - } - ++count; - }); - EXPECT_EQ(count, 100U); - } - - // Verify full range returns all 200 rows - { - std::size_t count = 0; - auto cursor_or = reader.rangeQuery(QueryRange{.topic_id = topic_id, .t_min = 0, .t_max = 199000}); - ASSERT_TRUE(cursor_or.has_value()) << cursor_or.error(); - 
cursor_or->forEach([&count](const SampleRow&) { ++count; }); - EXPECT_EQ(count, 200U); - } -} - -// =========================================================================== -// Test 5: Time domain offset -// -// Creates 2 time domains with different names and display_offsets, creates -// datasets bound to each, and verifies the offsets are stored and -// retrievable. Also verifies the display_time = raw_time - offset -// relationship. -// =========================================================================== - -TEST(EngineIntegrationTest, TimeDomainOffset) { - DataEngine engine; - - // Create two time domains - auto td1_or = engine.createTimeDomain("ros_time"); - ASSERT_TRUE(td1_or.has_value()) << td1_or.error(); - TimeDomainId td1_id = *td1_or; - - auto td2_or = engine.createTimeDomain("sim_time"); - ASSERT_TRUE(td2_or.has_value()) << td2_or.error(); - TimeDomainId td2_id = *td2_or; - - // Verify they have distinct IDs - EXPECT_NE(td1_id, td2_id); - - // Set display offsets - constexpr Timestamp kOffset1 = 1000000000; // 1 second - constexpr Timestamp kOffset2 = 5000000000; // 5 seconds - engine.setDisplayOffset(td1_id, kOffset1); - engine.setDisplayOffset(td2_id, kOffset2); - - // Verify offsets stored and retrievable - const TimeDomain* td1 = engine.getTimeDomain(td1_id); - ASSERT_NE(td1, nullptr); - EXPECT_EQ(td1->name, "ros_time"); - EXPECT_EQ(td1->display_offset, kOffset1); - - const TimeDomain* td2 = engine.getTimeDomain(td2_id); - ASSERT_NE(td2, nullptr); - EXPECT_EQ(td2->name, "sim_time"); - EXPECT_EQ(td2->display_offset, kOffset2); - - // Create datasets bound to each time domain - auto ds1_or = engine.createDataset(DatasetDescriptor{.source_name = "robot1", .time_domain_id = td1_id}); - ASSERT_TRUE(ds1_or.has_value()) << ds1_or.error(); - DatasetId ds1_id = *ds1_or; - - auto ds2_or = engine.createDataset(DatasetDescriptor{.source_name = "simulator", .time_domain_id = td2_id}); - ASSERT_TRUE(ds2_or.has_value()) << ds2_or.error(); - DatasetId 
ds2_id = *ds2_or; - - // Verify datasets have the correct time domains - const DatasetInfo* ds1 = engine.getDataset(ds1_id); - ASSERT_NE(ds1, nullptr); - EXPECT_EQ(ds1->time_domain.id, td1_id); - EXPECT_EQ(ds1->time_domain.name, "ros_time"); - EXPECT_EQ(ds1->time_domain.display_offset, kOffset1); - - const DatasetInfo* ds2 = engine.getDataset(ds2_id); - ASSERT_NE(ds2, nullptr); - EXPECT_EQ(ds2->time_domain.id, td2_id); - EXPECT_EQ(ds2->time_domain.name, "sim_time"); - EXPECT_EQ(ds2->time_domain.display_offset, kOffset2); - - // Verify display_time = raw_time - offset relationship - constexpr Timestamp kRawTime = 10000000000; // 10 seconds - Timestamp display_time_1 = kRawTime - td1->display_offset; - Timestamp display_time_2 = kRawTime - td2->display_offset; - - EXPECT_EQ(display_time_1, 9000000000); // 10s - 1s = 9s - EXPECT_EQ(display_time_2, 5000000000); // 10s - 5s = 5s - - // Verify that updating offset is reflected immediately - constexpr Timestamp kNewOffset1 = 2000000000; // 2 seconds - engine.setDisplayOffset(td1_id, kNewOffset1); - - const TimeDomain* td1_updated = engine.getTimeDomain(td1_id); - ASSERT_NE(td1_updated, nullptr); - EXPECT_EQ(td1_updated->display_offset, kNewOffset1); - - Timestamp display_time_1_updated = kRawTime - td1_updated->display_offset; - EXPECT_EQ(display_time_1_updated, 8000000000); // 10s - 2s = 8s - - // Verify listing datasets shows both - auto datasets = engine.listDatasets(); - EXPECT_EQ(datasets.size(), 2U); - - // Verify non-existent time domain returns nullptr - EXPECT_EQ(engine.getTimeDomain(999), nullptr); - - // Verify creating dataset with non-existent time domain fails - auto bad_ds = engine.createDataset(DatasetDescriptor{.source_name = "bad", .time_domain_id = 999}); - EXPECT_FALSE(bad_ds.has_value()); -} - -// =========================================================================== -// Test 6: create_topic rejects non-existent schema_id -// -// Before fix: create_topic accepted any schema_id, leading to empty 
columns -// and UB when setting values on the non-existent columns. -// =========================================================================== - -TEST(EngineIntegrationTest, CreateTopicRejectsInvalidSchemaId) { - DataEngine engine; - - auto dataset_id_or = engine.createDataset(DatasetDescriptor{.source_name = "test", .time_domain_id = 0}); - ASSERT_TRUE(dataset_id_or.has_value()) << dataset_id_or.error(); - DatasetId dataset_id = *dataset_id_or; - - // schema_id=999 doesn't exist — should fail - TopicDescriptor desc; - desc.name = "bad_topic"; - desc.schema_id = 999; - auto result = engine.createTopic(dataset_id, desc); - EXPECT_FALSE(result.has_value()); - - // schema_id=0 (inline columns) should still succeed - TopicDescriptor scalar_desc; - scalar_desc.name = "scalar_topic"; - scalar_desc.schema_id = 0; - auto scalar_result = engine.createTopic(dataset_id, scalar_desc); - EXPECT_TRUE(scalar_result.has_value()) << scalar_result.error(); -} - -// =========================================================================== -// Test 7: begin_row rejects non-existent topic_id -// -// Before fix: begin_row called get_or_create_builder which hit an assert -// (UB in Release builds) when the topic didn't exist. 
-// =========================================================================== - -TEST(EngineIntegrationTest, BeginRowRejectsInvalidTopicId) { - DataEngine engine; - - auto dataset_id_or = engine.createDataset(DatasetDescriptor{.source_name = "test", .time_domain_id = 0}); - ASSERT_TRUE(dataset_id_or.has_value()) << dataset_id_or.error(); - - DataWriter writer = engine.createWriter(); - auto status = writer.beginRow(/*topic_id=*/999, /*t=*/1000); - EXPECT_FALSE(status.has_value()); -} - -// =========================================================================== -// Test 8: Partial row auto-fills missing columns with null -// -// Before fix: finish_row incremented row_count but left column buffers -// with divergent lengths, causing later reads to go out-of-bounds. -// =========================================================================== - -TEST(EngineIntegrationTest, PartialRowAutoFillsNulls) { - DataEngine engine; - - auto dataset_id_or = engine.createDataset(DatasetDescriptor{.source_name = "test", .time_domain_id = 0}); - ASSERT_TRUE(dataset_id_or.has_value()) << dataset_id_or.error(); - DatasetId dataset_id = *dataset_id_or; - - // Create a 3-column schema: float32 x, y, z - auto x = makePrimitive("x", PrimitiveType::kFloat32); - auto y = makePrimitive("y", PrimitiveType::kFloat32); - auto z = makePrimitive("z", PrimitiveType::kFloat32); - auto schema_tree = makeStruct("point", {x, y, z}); - - DataWriter writer = engine.createWriter(); - auto schema_id_or = writer.registerSchema("point", schema_tree); - ASSERT_TRUE(schema_id_or.has_value()) << schema_id_or.error(); - - TopicDescriptor topic_desc; - topic_desc.name = "partial"; - topic_desc.schema_id = *schema_id_or; - auto topic_id_or = writer.registerTopic(dataset_id, topic_desc); - ASSERT_TRUE(topic_id_or.has_value()) << topic_id_or.error(); - TopicId topic_id = *topic_id_or; - - auto wh_or = writer.bindTopicWriter(topic_id); - ASSERT_TRUE(wh_or.has_value()) << wh_or.error(); - - // Row 1: set only 
x (columns y and z should be auto-null-filled) - ASSERT_TRUE(writer.beginRow(topic_id, 1000).has_value()); - writer.set(topic_id, 0, 1.0F); - ASSERT_TRUE(writer.finishRow(topic_id).has_value()); - - // Row 2: set all 3 columns (no nulls) - ASSERT_TRUE(writer.beginRow(topic_id, 2000).has_value()); - writer.set(topic_id, 0, 2.0F); - writer.set(topic_id, 1, 3.0F); - writer.set(topic_id, 2, 4.0F); - ASSERT_TRUE(writer.finishRow(topic_id).has_value()); - - auto flushed = writer.flushAll(); - ASSERT_FALSE(flushed.empty()); - engine.commitChunks(std::move(flushed)); - - // Read back and verify - DataReader reader = engine.createReader(); - auto cursor_or = reader.rangeQuery(QueryRange{.topic_id = topic_id, .t_min = 0, .t_max = 3000}); - ASSERT_TRUE(cursor_or.has_value()) << cursor_or.error(); - - std::size_t count = 0; - cursor_or->forEach([&count](const SampleRow& row) { - ASSERT_NE(row.chunk, nullptr); - if (count == 0) { - // Row 1: x=1.0, y=null, z=null - EXPECT_FLOAT_EQ(static_cast<float>(row.chunk->readNumericAsDouble(0, row.row_index)), 1.0F); - EXPECT_TRUE(row.chunk->isNull(1, row.row_index)); - EXPECT_TRUE(row.chunk->isNull(2, row.row_index)); - } else if (count == 1) { - // Row 2: x=2.0, y=3.0, z=4.0 (no nulls) - EXPECT_FLOAT_EQ(static_cast<float>(row.chunk->readNumericAsDouble(0, row.row_index)), 2.0F); - EXPECT_FLOAT_EQ(static_cast<float>(row.chunk->readNumericAsDouble(1, row.row_index)), 3.0F); - EXPECT_FLOAT_EQ(static_cast<float>(row.chunk->readNumericAsDouble(2, row.row_index)), 4.0F); - EXPECT_FALSE(row.chunk->isNull(0, row.row_index)); - EXPECT_FALSE(row.chunk->isNull(1, row.row_index)); - EXPECT_FALSE(row.chunk->isNull(2, row.row_index)); - } - ++count; - }); - EXPECT_EQ(count, 2U); -} - -// =========================================================================== -// Test 9: Retention works with negative timestamps -// -// Before fix: enforce_retention checked `t_max > 0` to skip empty topics, -// which also skipped topics with legitimate non-positive timestamps. 
-// =========================================================================== - -TEST(EngineIntegrationTest, RetentionWorksWithNegativeTimestamps) { - DataEngine engine; - - auto dataset_id_or = engine.createDataset(DatasetDescriptor{.source_name = "test", .time_domain_id = 0}); - ASSERT_TRUE(dataset_id_or.has_value()) << dataset_id_or.error(); - DatasetId dataset_id = *dataset_id_or; - - DataWriter writer = engine.createWriter(); - auto handle_or = writer.registerScalarSeries(dataset_id, "negative_ts", NumericType::kFloat64); - ASSERT_TRUE(handle_or.has_value()) << handle_or.error(); - ScalarSeriesHandle handle = *handle_or; - - // Write data with negative timestamps: -1000, -900, ..., -100, 0. - // The scalar API uses the default chunk size of 1024 rows, so all 11 - // values fit in one chunk. That's fine for testing retention logic. - for (int i = -1000; i <= 0; i += 100) { - writer.appendScalar(handle, static_cast<Timestamp>(i), static_cast<double>(i)); - } - - auto flushed = writer.flushAll(); - ASSERT_FALSE(flushed.empty()); - engine.commitChunks(std::move(flushed)); - - const TopicStorage* storage = engine.getTopicStorage(handle.topic_id); - ASSERT_NE(storage, nullptr); - EXPECT_FALSE(storage->empty()); - EXPECT_EQ(storage->time_max(), 0); - - // Enforce retention with window of 500: evictBefore(0 - 500 = -500) - // Chunks with t_max < -500 should be evicted. - // Our single chunk spans [-1000, 0], so t_max=0 > -500 → not evicted. 
- engine.enforceRetention(500); - - // Data should still be present (the chunk wasn't evicted) - EXPECT_FALSE(storage->empty()); - - DataReader reader = engine.createReader(); - auto cursor_or = reader.rangeQuery(QueryRange{.topic_id = handle.topic_id, .t_min = -1000, .t_max = 0}); - ASSERT_TRUE(cursor_or.has_value()) << cursor_or.error(); - std::size_t count = 0; - cursor_or->forEach([&count](const SampleRow&) { ++count; }); - EXPECT_EQ(count, 11U); -} - -// =========================================================================== -// Test 10: range_query / latest_at return NotFound for non-existent topics -// -// Before fix: these returned empty results indistinguishable from -// "topic exists but has no data." -// =========================================================================== - -TEST(EngineIntegrationTest, QueryReturnsErrorForMissingTopic) { - DataEngine engine; - - DataReader reader = engine.createReader(); - - // range_query with non-existent topic - auto cursor_or = reader.rangeQuery(QueryRange{.topic_id = 999, .t_min = 0, .t_max = 1000}); - EXPECT_FALSE(cursor_or.has_value()); - - // latest_at with non-existent topic - auto latest_or = reader.latestAt(QueryPoint{.topic_id = 999, .t = 500}); - EXPECT_FALSE(latest_or.has_value()); -} - -// =========================================================================== -// Test 11: begin_row rejects out-of-order timestamp -// =========================================================================== - -TEST(EngineIntegrationTest, BeginRowRejectsOutOfOrderTimestamp) { - DataEngine engine; - - auto dataset_id_or = engine.createDataset(DatasetDescriptor{.source_name = "test", .time_domain_id = 0}); - ASSERT_TRUE(dataset_id_or.has_value()) << dataset_id_or.error(); - DatasetId dataset_id = *dataset_id_or; - - DataWriter writer = engine.createWriter(); - auto handle_or = writer.registerScalarSeries(dataset_id, "ordered", NumericType::kFloat64); - ASSERT_TRUE(handle_or.has_value()) << handle_or.error(); - 
TopicId topic_id = handle_or->topic_id; - - // First row at t=200 succeeds - ASSERT_TRUE(writer.beginRow(topic_id, 200).has_value()); - writer.set(topic_id, 0, 1.0); - ASSERT_TRUE(writer.finishRow(topic_id).has_value()); - - // Second row at t=100 (out of order) should fail - auto status = writer.beginRow(topic_id, 100); - EXPECT_FALSE(status.has_value()); -} - -// =========================================================================== -// Test 12: Equal timestamps are allowed (non-decreasing) -// =========================================================================== - -TEST(EngineIntegrationTest, EqualTimestampsAllowed) { - DataEngine engine; - - auto dataset_id_or = engine.createDataset(DatasetDescriptor{.source_name = "test", .time_domain_id = 0}); - ASSERT_TRUE(dataset_id_or.has_value()) << dataset_id_or.error(); - DatasetId dataset_id = *dataset_id_or; - - DataWriter writer = engine.createWriter(); - auto handle_or = writer.registerScalarSeries(dataset_id, "equal_ts", NumericType::kFloat64); - ASSERT_TRUE(handle_or.has_value()) << handle_or.error(); - TopicId topic_id = handle_or->topic_id; - - // Two rows at t=100 should both succeed - ASSERT_TRUE(writer.beginRow(topic_id, 100).has_value()); - writer.set(topic_id, 0, 1.0); - ASSERT_TRUE(writer.finishRow(topic_id).has_value()); - - ASSERT_TRUE(writer.beginRow(topic_id, 100).has_value()); - writer.set(topic_id, 0, 2.0); - ASSERT_TRUE(writer.finishRow(topic_id).has_value()); - - // Third row at higher timestamp also succeeds - ASSERT_TRUE(writer.beginRow(topic_id, 200).has_value()); - writer.set(topic_id, 0, 3.0); - ASSERT_TRUE(writer.finishRow(topic_id).has_value()); - - auto flushed = writer.flushAll(); - engine.commitChunks(std::move(flushed)); - - DataReader reader = engine.createReader(); - std::size_t count = 0; - auto cursor_or = reader.rangeQuery(QueryRange{.topic_id = topic_id, .t_min = 0, .t_max = 300}); - ASSERT_TRUE(cursor_or.has_value()) << cursor_or.error(); - 
cursor_or->forEach([&count](const SampleRow&) { ++count; }); - EXPECT_EQ(count, 3U); -} - -// =========================================================================== -// Test 13: Bulk append_columns spanning multiple chunks -// =========================================================================== - -TEST(EngineIntegrationTest, BulkAppendColumnsMultiChunk) { - DataEngine engine; - - auto dataset_id_or = engine.createDataset(DatasetDescriptor{.source_name = "bulk_test", .time_domain_id = 0}); - ASSERT_TRUE(dataset_id_or.has_value()) << dataset_id_or.error(); - DatasetId dataset_id = *dataset_id_or; - - // Create a 3-column schema: float32 x, y, z - auto x = makePrimitive("x", PrimitiveType::kFloat32); - auto y = makePrimitive("y", PrimitiveType::kFloat32); - auto z = makePrimitive("z", PrimitiveType::kFloat32); - auto schema_tree = makeStruct("imu", {x, y, z}); - - DataWriter writer = engine.createWriter(); - auto schema_id_or = writer.registerSchema("imu_schema", schema_tree); - ASSERT_TRUE(schema_id_or.has_value()) << schema_id_or.error(); - - TopicDescriptor topic_desc; - topic_desc.name = "imu_data"; - topic_desc.schema_id = *schema_id_or; - topic_desc.max_chunk_rows = 256; // small chunks to force splitting - auto topic_id_or = writer.registerTopic(dataset_id, topic_desc); - ASSERT_TRUE(topic_id_or.has_value()) << topic_id_or.error(); - TopicId topic_id = *topic_id_or; - - // Prepare 1000 rows of bulk data - constexpr std::size_t N = 1000; - std::vector timestamps(N); - std::vector x_vals(N), y_vals(N), z_vals(N); - for (std::size_t i = 0; i < N; ++i) { - timestamps[i] = static_cast(i) * 1000; - x_vals[i] = static_cast(i) * 0.1F; - y_vals[i] = static_cast(i) * 0.2F; - z_vals[i] = static_cast(i) * 0.3F; - } - - std::vector columns = { - ColumnData::Float32(0, x_vals), - ColumnData::Float32(1, y_vals), - ColumnData::Float32(2, z_vals), - }; - auto status = writer.appendColumns(topic_id, timestamps, columns); - ASSERT_TRUE(status.has_value()) << 
status.error(); - - auto flushed = writer.flushAll(); - EXPECT_FALSE(flushed.empty()); - engine.commitChunks(std::move(flushed)); - - // Should have multiple chunks (1000 / 256 = 4 chunks) - const TopicStorage* storage = engine.getTopicStorage(topic_id); - ASSERT_NE(storage, nullptr); - EXPECT_GE(storage->sealedChunks().size(), 3U); - - // Verify round-trip via range_query - DataReader reader = engine.createReader(); - std::size_t count = 0; - auto cursor_or = - reader.rangeQuery(QueryRange{.topic_id = topic_id, .t_min = 0, .t_max = static_cast(N - 1) * 1000}); - ASSERT_TRUE(cursor_or.has_value()) << cursor_or.error(); - cursor_or->forEach([&count](const SampleRow& row) { - ASSERT_NE(row.chunk, nullptr); - ++count; - }); - EXPECT_EQ(count, N); - - // Spot-check specific values via latest_at - auto latest_or = reader.latestAt(QueryPoint{.topic_id = topic_id, .t = 500 * 1000}); - ASSERT_TRUE(latest_or.has_value()) << latest_or.error(); - ASSERT_TRUE(latest_or->has_value()); - EXPECT_EQ((*latest_or)->timestamp, 500 * 1000); - EXPECT_FLOAT_EQ( - static_cast((*latest_or)->chunk->readNumericAsDouble(0, (*latest_or)->row_index)), 500.0F * 0.1F); - EXPECT_FLOAT_EQ( - static_cast((*latest_or)->chunk->readNumericAsDouble(1, (*latest_or)->row_index)), 500.0F * 0.2F); -} - -// =========================================================================== -// Test 14: Bulk append error handling -// =========================================================================== - -TEST(EngineIntegrationTest, BulkAppendErrorHandling) { - DataEngine engine; - - auto dataset_id_or = engine.createDataset(DatasetDescriptor{.source_name = "err_test", .time_domain_id = 0}); - ASSERT_TRUE(dataset_id_or.has_value()); - DatasetId dataset_id = *dataset_id_or; - - DataWriter writer = engine.createWriter(); - - // Non-existent topic - { - const Timestamp ts[] = {1}; - const float vals[] = {1.0F}; - std::vector cols = { - ColumnData::Float32(0, Span(vals, 1)), - }; - auto status = 
writer.appendColumns(999, Span(ts, 1), cols); - EXPECT_FALSE(status.has_value()); - } - - // Mismatched column count - { - auto tree = makePrimitive("val", PrimitiveType::kFloat32); - auto sid = *writer.registerSchema("s1", tree); - TopicDescriptor desc; - desc.name = "t1"; - desc.schema_id = sid; - auto tid = *writer.registerTopic(dataset_id, desc); - - const Timestamp ts[] = {1, 2, 3}; - const float vals[] = {1.0F, 2.0F}; // 2 values, 3 timestamps - std::vector cols = { - ColumnData::Float32(0, Span(vals, 2)), - }; - auto status = writer.appendColumns(tid, Span(ts, 3), cols); - EXPECT_FALSE(status.has_value()); - } -} - -// =========================================================================== -// Test 15: Bulk append empty is a no-op -// =========================================================================== - -TEST(EngineIntegrationTest, BulkAppendEmpty) { - DataEngine engine; - - auto dataset_id_or = engine.createDataset(DatasetDescriptor{.source_name = "empty_test", .time_domain_id = 0}); - ASSERT_TRUE(dataset_id_or.has_value()); - DatasetId dataset_id = *dataset_id_or; - - auto tree = makePrimitive("val", PrimitiveType::kFloat64); - DataWriter writer = engine.createWriter(); - auto sid = *writer.registerSchema("s_empty", tree); - TopicDescriptor desc; - desc.name = "empty_topic"; - desc.schema_id = sid; - auto tid = *writer.registerTopic(dataset_id, desc); - - // Empty append should succeed and produce no data - auto status = writer.appendColumns(tid, Span(), Span()); - EXPECT_TRUE(status.has_value()) << status.error(); - - auto flushed = writer.flushAll(); - EXPECT_TRUE(flushed.empty()); -} - -// ========================================================================= -// Cross-engine flush (flushTo) — zero-copy chunk transfer -// ========================================================================= - -namespace { - -// Builds two engines with the same topic registered (lockstep pattern used by -// pj4's StreamingSourceManager dual-buffer) and 
writes `row_count` rows to src. -struct FlushFixture { - DataEngine src; - DataEngine dst; - DatasetId src_dataset = 0; - DatasetId dst_dataset = 0; - ScalarSeriesHandle src_handle; - ScalarSeriesHandle dst_handle; -}; - -FlushFixture buildFlushFixture(const std::string& topic = "scalar/topic") { - FlushFixture f; - f.src_dataset = *f.src.createDataset(DatasetDescriptor{.source_name = "src", .time_domain_id = 0}); - f.dst_dataset = *f.dst.createDataset(DatasetDescriptor{.source_name = "dst", .time_domain_id = 0}); - - DataWriter sw = f.src.createWriter(); - f.src_handle = *sw.registerScalarSeries(f.src_dataset, topic, NumericType::kFloat64); - - DataWriter dw = f.dst.createWriter(); - f.dst_handle = *dw.registerScalarSeries(f.dst_dataset, topic, NumericType::kFloat64); - return f; -} - -void writeScalars(DataEngine& engine, ScalarSeriesHandle handle, Timestamp start, std::size_t count) { - DataWriter w = engine.createWriter(); - for (std::size_t i = 0; i < count; ++i) { - w.appendScalar(handle, start + static_cast(i) * 1000, static_cast(i)); - } - auto flushed = w.flushAll(); - engine.commitChunks(std::move(flushed)); -} - -} // namespace - -TEST(DataEngineFlushTest, MovesAllChunksFromSrcToDst) { - auto f = buildFlushFixture(); - writeScalars(f.src, f.src_handle, /*start=*/0, /*count=*/2500); // ~3 chunks at default 1024 rows. 
- - const auto* src_storage = f.src.getTopicStorage(f.src_handle.topic_id); - const auto* dst_storage = f.dst.getTopicStorage(f.dst_handle.topic_id); - ASSERT_NE(src_storage, nullptr); - ASSERT_NE(dst_storage, nullptr); - - const std::size_t pre_src_chunks = src_storage->sealedChunks().size(); - ASSERT_GE(pre_src_chunks, 2U); - ASSERT_EQ(dst_storage->sealedChunks().size(), 0U); - - auto result = f.src.flushTo(f.dst); - ASSERT_TRUE(result.has_value()) << result.error(); - - EXPECT_EQ(src_storage->sealedChunks().size(), 0U); - EXPECT_EQ(dst_storage->sealedChunks().size(), pre_src_chunks); - - // The destination can read the data via the standard reader interface. - DataReader reader = f.dst.createReader(); - std::size_t count = 0; - auto cursor = reader.rangeQuery( - QueryRange{ - .topic_id = f.dst_handle.topic_id, - .t_min = 0, - .t_max = static_cast(2499) * 1000, - }); - ASSERT_TRUE(cursor.has_value()) << cursor.error(); - cursor->forEach([&count](const SampleRow& row) { - (void)row; - ++count; - }); - EXPECT_EQ(count, 2500U); -} - -TEST(DataEngineFlushTest, AppendsToExistingDstChunks) { - auto f = buildFlushFixture(); - // dst already has data covering [0, 1023*1000]. - writeScalars(f.dst, f.dst_handle, /*start=*/0, /*count=*/1024); - // src has the next window [1024*1000, 2047*1000]. 
- writeScalars(f.src, f.src_handle, /*start=*/static_cast(1024) * 1000, /*count=*/1024); - - ASSERT_TRUE(f.src.flushTo(f.dst).has_value()); - - DataReader reader = f.dst.createReader(); - std::size_t count = 0; - auto cursor = reader.rangeQuery( - QueryRange{ - .topic_id = f.dst_handle.topic_id, - .t_min = 0, - .t_max = static_cast(2047) * 1000, - }); - ASSERT_TRUE(cursor.has_value()); - cursor->forEach([&count](const SampleRow& row) { - (void)row; - ++count; - }); - EXPECT_EQ(count, 2048U); -} - -TEST(DataEngineFlushTest, RejectsMonotonicityViolation) { - auto f = buildFlushFixture(); - writeScalars(f.dst, f.dst_handle, /*start=*/static_cast(1000) * 1000, /*count=*/1024); - writeScalars(f.src, f.src_handle, /*start=*/0, /*count=*/1024); // earlier than dst. - - const auto* src_storage = f.src.getTopicStorage(f.src_handle.topic_id); - const auto* dst_storage = f.dst.getTopicStorage(f.dst_handle.topic_id); - const std::size_t pre_src = src_storage->sealedChunks().size(); - const std::size_t pre_dst = dst_storage->sealedChunks().size(); - - auto result = f.src.flushTo(f.dst); - EXPECT_FALSE(result.has_value()); - - // Neither engine mutated. 
- EXPECT_EQ(src_storage->sealedChunks().size(), pre_src); - EXPECT_EQ(dst_storage->sealedChunks().size(), pre_dst); -} - -TEST(DataEngineFlushTest, RejectsUnknownTopicInDst) { - DataEngine src, dst; - DatasetId src_dataset = *src.createDataset(DatasetDescriptor{.source_name = "src", .time_domain_id = 0}); - DatasetId dst_dataset = *dst.createDataset(DatasetDescriptor{.source_name = "dst", .time_domain_id = 0}); - - DataWriter sw = src.createWriter(); - auto src_handle = *sw.registerScalarSeries(src_dataset, "only/in/src", NumericType::kFloat64); - - DataWriter dw = dst.createWriter(); - (void)dw.registerScalarSeries(dst_dataset, "only/in/dst", NumericType::kFloat64); - - writeScalars(src, src_handle, /*start=*/0, /*count=*/100); - - auto result = src.flushTo(dst); - EXPECT_FALSE(result.has_value()); - const auto* src_storage = src.getTopicStorage(src_handle.topic_id); - EXPECT_GE(src_storage->sealedChunks().size(), 1U); // src not mutated. -} - -TEST(DataEngineFlushTest, RejectsSameEngine) { - DataEngine engine; - auto dataset_id = *engine.createDataset(DatasetDescriptor{.source_name = "self", .time_domain_id = 0}); - DataWriter w = engine.createWriter(); - auto handle = *w.registerScalarSeries(dataset_id, "topic", NumericType::kFloat64); - writeScalars(engine, handle, /*start=*/0, /*count=*/100); - - auto result = engine.flushTo(engine); - EXPECT_FALSE(result.has_value()); - const auto* storage = engine.getTopicStorage(handle.topic_id); - EXPECT_GE(storage->sealedChunks().size(), 1U); // not mutated. -} - -TEST(DataEngineFlushTest, PreservesTopicRegistrationOnSrc) { - auto f = buildFlushFixture(); - writeScalars(f.src, f.src_handle, /*start=*/0, /*count=*/1024); - - ASSERT_TRUE(f.src.flushTo(f.dst).has_value()); - - // src topic is still registered — a fresh writer can push more data. 
- const auto* src_storage = f.src.getTopicStorage(f.src_handle.topic_id); - ASSERT_NE(src_storage, nullptr); - EXPECT_TRUE(src_storage->empty()); - - writeScalars(f.src, f.src_handle, /*start=*/static_cast(2000) * 1000, /*count=*/100); - EXPECT_FALSE(src_storage->empty()); -} - -} // namespace -} // namespace PJ diff --git a/pj_datastore/tests/object_store_test.cpp b/pj_datastore/tests/object_store_test.cpp deleted file mode 100644 index bd825073..00000000 --- a/pj_datastore/tests/object_store_test.cpp +++ /dev/null @@ -1,737 +0,0 @@ -// Copyright 2026 Davide Faconti -// SPDX-License-Identifier: MPL-2.0 - -#include "pj_datastore/object_store.hpp" - -#include - -#include -#include -#include -#include -#include -#include -#include - -namespace PJ { -namespace { - -ObjectTopicId registerTestTopic(ObjectStore& store, const std::string& name = "test/topic") { - auto id_or = store.registerTopic({.dataset_id = 1, .topic_name = name, .metadata_json = "{}"}); - EXPECT_TRUE(id_or.has_value()); - return *id_or; -} - -std::vector makePayload(size_t size, uint8_t fill = 0xAB) { - return std::vector(size, fill); -} - -// ========================================================================= -// Registration -// ========================================================================= - -TEST(ObjectStoreTest, RegisterAndDescriptor) { - ObjectStore store; - auto id = registerTestTopic(store, "cam/image"); - auto desc = store.descriptor(id); - EXPECT_EQ(desc.topic_name, "cam/image"); - EXPECT_EQ(desc.dataset_id, 1u); -} - -TEST(ObjectStoreTest, DuplicateRegistrationFails) { - ObjectStore store; - registerTestTopic(store, "cam/image"); - auto dup = store.registerTopic({.dataset_id = 1, .topic_name = "cam/image", .metadata_json = "{}"}); - EXPECT_FALSE(dup.has_value()); -} - -TEST(ObjectStoreTest, SameNameDifferentDatasetOk) { - ObjectStore store; - auto id1 = registerTestTopic(store, "cam/image"); - auto id2_or = store.registerTopic({.dataset_id = 2, .topic_name = "cam/image", 
.metadata_json = "{}"}); - ASSERT_TRUE(id2_or.has_value()); - EXPECT_NE(id1.id, id2_or->id); -} - -TEST(ObjectStoreTest, FindTopicReturnsRegisteredId) { - ObjectStore store; - auto id = registerTestTopic(store, "cam/image"); - auto found = store.findTopic(1, "cam/image"); - ASSERT_TRUE(found.has_value()); - EXPECT_EQ(found->id, id.id); -} - -TEST(ObjectStoreTest, FindTopicMissingReturnsNullopt) { - ObjectStore store; - registerTestTopic(store, "cam/image"); - EXPECT_FALSE(store.findTopic(1, "other/topic").has_value()); - EXPECT_FALSE(store.findTopic(99, "cam/image").has_value()); -} - -TEST(ObjectStoreTest, ListTopics) { - ObjectStore store; - auto id1 = registerTestTopic(store, "topic_a"); - auto id2 = registerTestTopic(store, "topic_b"); - auto all = store.listTopics(); - EXPECT_EQ(all.size(), 2u); -} - -TEST(ObjectStoreTest, ListTopicsByDataset) { - ObjectStore store; - store.registerTopic({.dataset_id = 1, .topic_name = "a", .metadata_json = "{}"}); - store.registerTopic({.dataset_id = 2, .topic_name = "b", .metadata_json = "{}"}); - store.registerTopic({.dataset_id = 1, .topic_name = "c", .metadata_json = "{}"}); - auto ds1 = store.listTopics(1); - auto ds2 = store.listTopics(2); - EXPECT_EQ(ds1.size(), 2u); - EXPECT_EQ(ds2.size(), 1u); -} - -// ========================================================================= -// Push + basic queries -// ========================================================================= - -TEST(ObjectStoreTest, PushOwnedAndEntryCount) { - ObjectStore store; - auto id = registerTestTopic(store); - constexpr size_t kCount = 100; - for (size_t i = 0; i < kCount; ++i) { - auto ts = static_cast(i) * 1000; - auto result = store.pushOwned(id, ts, makePayload(64)); - ASSERT_TRUE(result.has_value()) << result.error(); - } - EXPECT_EQ(store.entryCount(id), kCount); -} - -TEST(ObjectStoreTest, TimeRange) { - ObjectStore store; - auto id = registerTestTopic(store); - store.pushOwned(id, 100, makePayload(8)); - store.pushOwned(id, 500, 
makePayload(8)); - store.pushOwned(id, 900, makePayload(8)); - auto [t_min, t_max] = store.timeRange(id); - EXPECT_EQ(t_min, 100); - EXPECT_EQ(t_max, 900); -} - -TEST(ObjectStoreTest, EmptyTopicQueries) { - ObjectStore store; - auto id = registerTestTopic(store); - EXPECT_EQ(store.entryCount(id), 0u); - auto [t_min, t_max] = store.timeRange(id); - EXPECT_EQ(t_min, 0); - EXPECT_EQ(t_max, 0); - EXPECT_FALSE(store.latestAt(id, 1000).has_value()); - EXPECT_FALSE(store.at(id, 0).has_value()); - EXPECT_FALSE(store.indexAt(id, 1000).has_value()); -} - -TEST(ObjectStoreTest, UnknownTopicQueries) { - ObjectStore store; - ObjectTopicId bogus{999}; - EXPECT_EQ(store.entryCount(bogus), 0u); - EXPECT_FALSE(store.latestAt(bogus, 0).has_value()); - EXPECT_FALSE(store.at(bogus, 0).has_value()); - auto result = store.pushOwned(bogus, 0, makePayload(8)); - EXPECT_FALSE(result.has_value()); -} - -// ========================================================================= -// latestAt semantics -// ========================================================================= - -TEST(ObjectStoreTest, LatestAtExact) { - ObjectStore store; - auto id = registerTestTopic(store); - store.pushOwned(id, 100, makePayload(4, 0x01)); - store.pushOwned(id, 200, makePayload(4, 0x02)); - store.pushOwned(id, 300, makePayload(4, 0x03)); - - auto r = store.latestAt(id, 200); - ASSERT_TRUE(r.has_value()); - EXPECT_EQ(r->timestamp, 200); - EXPECT_EQ(r->payload.bytes[0], 0x02); -} - -TEST(ObjectStoreTest, LatestAtBetween) { - ObjectStore store; - auto id = registerTestTopic(store); - store.pushOwned(id, 100, makePayload(4, 0x01)); - store.pushOwned(id, 300, makePayload(4, 0x03)); - - auto r = store.latestAt(id, 200); - ASSERT_TRUE(r.has_value()); - EXPECT_EQ(r->timestamp, 100); -} - -TEST(ObjectStoreTest, LatestAtBeforeFirst) { - ObjectStore store; - auto id = registerTestTopic(store); - store.pushOwned(id, 100, makePayload(4)); - EXPECT_FALSE(store.latestAt(id, 50).has_value()); -} - -TEST(ObjectStoreTest, 
LatestAtAfterLast) { - ObjectStore store; - auto id = registerTestTopic(store); - store.pushOwned(id, 100, makePayload(4, 0x01)); - store.pushOwned(id, 200, makePayload(4, 0x02)); - - auto r = store.latestAt(id, 999); - ASSERT_TRUE(r.has_value()); - EXPECT_EQ(r->timestamp, 200); -} - -// ========================================================================= -// at(index) -// ========================================================================= - -TEST(ObjectStoreTest, AtValidIndex) { - ObjectStore store; - auto id = registerTestTopic(store); - store.pushOwned(id, 100, makePayload(4, 0x01)); - store.pushOwned(id, 200, makePayload(4, 0x02)); - - auto r0 = store.at(id, 0); - ASSERT_TRUE(r0.has_value()); - EXPECT_EQ(r0->timestamp, 100); - - auto r1 = store.at(id, 1); - ASSERT_TRUE(r1.has_value()); - EXPECT_EQ(r1->timestamp, 200); -} - -TEST(ObjectStoreTest, AtOutOfRange) { - ObjectStore store; - auto id = registerTestTopic(store); - store.pushOwned(id, 100, makePayload(4)); - EXPECT_FALSE(store.at(id, 1).has_value()); - EXPECT_FALSE(store.at(id, 999).has_value()); -} - -// ========================================================================= -// indexAt -// ========================================================================= - -TEST(ObjectStoreTest, IndexAtExact) { - ObjectStore store; - auto id = registerTestTopic(store); - store.pushOwned(id, 100, makePayload(4)); - store.pushOwned(id, 200, makePayload(4)); - store.pushOwned(id, 300, makePayload(4)); - - auto idx = store.indexAt(id, 200); - ASSERT_TRUE(idx.has_value()); - EXPECT_EQ(*idx, 1u); -} - -TEST(ObjectStoreTest, IndexAtBetween) { - ObjectStore store; - auto id = registerTestTopic(store); - store.pushOwned(id, 100, makePayload(4)); - store.pushOwned(id, 300, makePayload(4)); - - auto idx = store.indexAt(id, 250); - ASSERT_TRUE(idx.has_value()); - EXPECT_EQ(*idx, 0u); -} - -TEST(ObjectStoreTest, IndexAtBeforeFirst) { - ObjectStore store; - auto id = registerTestTopic(store); - 
store.pushOwned(id, 100, makePayload(4)); - EXPECT_FALSE(store.indexAt(id, 50).has_value()); -} - -// ========================================================================= -// EntryTimestampsView -// ========================================================================= - -TEST(ObjectStoreTest, EntryTimestampsView) { - ObjectStore store; - auto id = registerTestTopic(store); - store.pushOwned(id, 100, makePayload(4)); - store.pushOwned(id, 200, makePayload(4)); - store.pushOwned(id, 300, makePayload(4)); - - auto view = store.entryTimestamps(id); - EXPECT_EQ(view.size(), 3u); - EXPECT_EQ(view[0], 100); - EXPECT_EQ(view[1], 200); - EXPECT_EQ(view[2], 300); - - size_t count = 0; - for (auto it = view.begin(); it != view.end(); ++it) { - ++count; - } - EXPECT_EQ(count, 3u); -} - -TEST(ObjectStoreTest, EntryTimestampsViewEmpty) { - ObjectStore store; - auto id = registerTestTopic(store); - auto view = store.entryTimestamps(id); - EXPECT_TRUE(view.empty()); - EXPECT_EQ(view.size(), 0u); -} - -// ========================================================================= -// pushLazy -// ========================================================================= - -TEST(ObjectStoreTest, PushLazyResolves) { - ObjectStore store; - auto id = registerTestTopic(store); - int call_count = 0; - store.pushLazy(id, 100, [&call_count]() -> sdk::PayloadView { - ++call_count; - return sdk::makePayloadView({0xDE, 0xAD}); - }); - - EXPECT_EQ(call_count, 0); - auto r = store.latestAt(id, 100); - ASSERT_TRUE(r.has_value()); - EXPECT_EQ(call_count, 1); - EXPECT_EQ(r->payload.bytes.size(), 2u); - EXPECT_EQ(r->payload.bytes[0], 0xDE); -} - -// Regression: anchor type-erasure must survive resolveEntry. The anchor here -// is a shared_ptr<TestBuffer> (not vector<uint8_t>); a prior static_pointer_cast to -// vector<uint8_t> would be UB. 
-TEST(ObjectStoreTest, PushLazyPreservesAnchorType) { - struct TestBuffer { - std::array bytes{0x11, 0x22, 0x33, 0x44}; - }; - ObjectStore store; - auto id = registerTestTopic(store); - - auto buffer = std::make_shared(); - std::weak_ptr weak_buffer = buffer; - - store.pushLazy(id, 100, [buffer]() -> sdk::PayloadView { - return sdk::PayloadView{ - Span{buffer->bytes.data(), buffer->bytes.size()}, - sdk::BufferAnchor{buffer}, - }; - }); - - auto r = store.latestAt(id, 100); - ASSERT_TRUE(r.has_value()); - ASSERT_EQ(r->payload.bytes.size(), 4u); - EXPECT_EQ(r->payload.bytes[0], 0x11); - EXPECT_EQ(r->payload.bytes[3], 0x44); - EXPECT_FALSE(weak_buffer.expired()); // anchor still holds the buffer alive -} - -// Regression: the producer's Span is a sub-range of the anchor's storage. -// resolveEntry must propagate it verbatim — not the anchor's full extent. -TEST(ObjectStoreTest, PushLazyHonorsSpanSubview) { - ObjectStore store; - auto id = registerTestTopic(store); - - auto chunk = std::make_shared>(100); - for (size_t i = 0; i < chunk->size(); ++i) { - (*chunk)[i] = static_cast(i); - } - - store.pushLazy(id, 100, [chunk]() -> sdk::PayloadView { - return sdk::PayloadView{ - Span{chunk->data() + 20, 10}, // bytes [20, 30) - sdk::BufferAnchor{chunk}, - }; - }); - - auto r = store.latestAt(id, 100); - ASSERT_TRUE(r.has_value()); - EXPECT_EQ(r->payload.bytes.size(), 10u); - EXPECT_EQ(r->payload.bytes.data(), chunk->data() + 20); - EXPECT_EQ(r->payload.bytes[0], 20); - EXPECT_EQ(r->payload.bytes[9], 29); -} - -// ========================================================================= -// Timestamp monotonicity -// ========================================================================= - -TEST(ObjectStoreTest, OutOfOrderPushFails) { - ObjectStore store; - auto id = registerTestTopic(store); - store.pushOwned(id, 200, makePayload(4)); - auto result = store.pushOwned(id, 100, makePayload(4)); - EXPECT_FALSE(result.has_value()); - EXPECT_EQ(store.entryCount(id), 1u); -} - 
-TEST(ObjectStoreTest, EqualTimestampAllowed) { - ObjectStore store; - auto id = registerTestTopic(store); - store.pushOwned(id, 100, makePayload(4)); - auto result = store.pushOwned(id, 100, makePayload(4)); - EXPECT_TRUE(result.has_value()); - EXPECT_EQ(store.entryCount(id), 2u); -} - -// ========================================================================= -// Owning handle survives eviction -// ========================================================================= - -TEST(ObjectStoreTest, HandleSurvivesEviction) { - ObjectStore store; - auto id = registerTestTopic(store); - store.pushOwned(id, 100, makePayload(4, 0xAA)); - store.pushOwned(id, 200, makePayload(4, 0xBB)); - - auto handle = store.latestAt(id, 100); - ASSERT_TRUE(handle.has_value()); - EXPECT_EQ(handle->payload.bytes[0], 0xAA); - - store.evictBefore(id, 150); - EXPECT_EQ(store.entryCount(id), 1u); - - EXPECT_EQ(handle->payload.bytes.size(), 4u); - EXPECT_EQ(handle->payload.bytes[0], 0xAA); -} - -// ========================================================================= -// evictBefore -// ========================================================================= - -TEST(ObjectStoreTest, EvictBefore) { - ObjectStore store; - auto id = registerTestTopic(store); - for (int i = 0; i < 10; ++i) { - store.pushOwned(id, static_cast(i) * 100, makePayload(8)); - } - EXPECT_EQ(store.entryCount(id), 10u); - - store.evictBefore(id, 500); - EXPECT_EQ(store.entryCount(id), 5u); - auto [t_min, t_max] = store.timeRange(id); - EXPECT_EQ(t_min, 500); - EXPECT_EQ(t_max, 900); -} - -// ========================================================================= -// removeTopic / clear -// ========================================================================= - -TEST(ObjectStoreTest, RemoveTopic) { - ObjectStore store; - auto id = registerTestTopic(store, "to_remove"); - store.pushOwned(id, 100, makePayload(4)); - store.removeTopic(id); - EXPECT_EQ(store.entryCount(id), 0u); - 
EXPECT_TRUE(store.listTopics().empty()); -} - -TEST(ObjectStoreTest, Clear) { - ObjectStore store; - registerTestTopic(store, "a"); - registerTestTopic(store, "b"); - EXPECT_EQ(store.listTopics().size(), 2u); - store.clear(); - EXPECT_TRUE(store.listTopics().empty()); -} - -// ========================================================================= -// Retention budget -// ========================================================================= - -TEST(ObjectStoreTest, TimeWindowRetention) { - ObjectStore store; - auto id = registerTestTopic(store); - store.setRetentionBudget(id, {.time_window_ns = 2000, .max_memory_bytes = 0}); - - for (int i = 0; i < 100; ++i) { - store.pushOwned(id, static_cast(i) * 100, makePayload(8)); - } - - auto [t_min, t_max] = store.timeRange(id); - EXPECT_GE(t_min, t_max - 2000); -} - -TEST(ObjectStoreTest, MemoryRetention) { - ObjectStore store; - auto id = registerTestTopic(store); - store.setRetentionBudget(id, {.time_window_ns = 0, .max_memory_bytes = 500}); - - for (int i = 0; i < 100; ++i) { - store.pushOwned(id, static_cast(i) * 100, makePayload(100)); - } - - EXPECT_LE(store.memoryUsage(id), 500u); -} - -TEST(ObjectStoreTest, DefaultBudgetNoEviction) { - ObjectStore store; - auto id = registerTestTopic(store); - for (int i = 0; i < 100; ++i) { - store.pushOwned(id, static_cast(i) * 100, makePayload(100)); - } - EXPECT_EQ(store.entryCount(id), 100u); -} - -TEST(ObjectStoreTest, LazyEntriesZeroMemory) { - ObjectStore store; - auto id = registerTestTopic(store); - for (int i = 0; i < 10; ++i) { - store.pushLazy(id, static_cast(i) * 100, []() -> sdk::PayloadView { - return sdk::makePayloadView(makePayload(1000)); - }); - } - EXPECT_EQ(store.memoryUsage(id), 0u); -} - -// ========================================================================= -// Concurrency (basic smoke test — M2 will add thorough tests) -// ========================================================================= - -TEST(ObjectStoreTest, 
ConcurrentReadWriteSmoke) { - ObjectStore store; - auto id = registerTestTopic(store); - - constexpr int kPushCount = 1000; - std::thread writer([&]() { - for (int i = 0; i < kPushCount; ++i) { - store.pushOwned(id, static_cast(i) * 100, makePayload(16)); - } - }); - - std::thread reader([&]() { - for (int i = 0; i < kPushCount; ++i) { - store.latestAt(id, static_cast(i) * 100); - store.entryCount(id); - } - }); - - writer.join(); - reader.join(); - EXPECT_EQ(store.entryCount(id), static_cast(kPushCount)); -} - -// ========================================================================= -// Cross-store flush (flushTo) -// ========================================================================= - -namespace { - -ObjectTopicId registerSameDescriptor( - ObjectStore& store, DatasetId dataset_id = 1, const std::string& name = "test/topic") { - auto id_or = store.registerTopic({.dataset_id = dataset_id, .topic_name = name, .metadata_json = "{}"}); - EXPECT_TRUE(id_or.has_value()); - return *id_or; -} - -} // namespace - -TEST(ObjectStoreFlushTest, BasicMoveOwnedEntries) { - ObjectStore src, dst; - auto src_id = registerSameDescriptor(src); - auto dst_id = registerSameDescriptor(dst); - - src.pushOwned(src_id, 100, makePayload(8, 0xAA)); - src.pushOwned(src_id, 200, makePayload(8, 0xBB)); - src.pushOwned(src_id, 300, makePayload(8, 0xCC)); - - ASSERT_EQ(src.entryCount(src_id), 3u); - ASSERT_EQ(dst.entryCount(dst_id), 0u); - - auto result = src.flushTo(dst); - ASSERT_TRUE(result.has_value()) << result.error(); - - EXPECT_EQ(src.entryCount(src_id), 0u); - EXPECT_EQ(dst.entryCount(dst_id), 3u); - - auto e0 = dst.at(dst_id, 0); - ASSERT_TRUE(e0.has_value()); - EXPECT_EQ(e0->timestamp, 100); - EXPECT_EQ(e0->payload.bytes.size(), 8u); - EXPECT_EQ(e0->payload.bytes[0], 0xAA); - - auto e2 = dst.at(dst_id, 2); - ASSERT_TRUE(e2.has_value()); - EXPECT_EQ(e2->timestamp, 300); - EXPECT_EQ(e2->payload.bytes[0], 0xCC); -} - -TEST(ObjectStoreFlushTest, PreservesTopicRegistrationOnSrc) 
{ - ObjectStore src, dst; - auto src_id = registerSameDescriptor(src); - registerSameDescriptor(dst); - - src.pushOwned(src_id, 100, makePayload(4)); - - ASSERT_TRUE(src.flushTo(dst).has_value()); - - // src is empty but the topic is still registered — the same id can accept new pushes. - EXPECT_EQ(src.entryCount(src_id), 0u); - auto post_push = src.pushOwned(src_id, 400, makePayload(4)); - EXPECT_TRUE(post_push.has_value()); - EXPECT_EQ(src.entryCount(src_id), 1u); -} - -TEST(ObjectStoreFlushTest, AppendsToExistingDstEntries) { - ObjectStore src, dst; - auto src_id = registerSameDescriptor(src); - auto dst_id = registerSameDescriptor(dst); - - dst.pushOwned(dst_id, 50, makePayload(4, 0x11)); - dst.pushOwned(dst_id, 100, makePayload(4, 0x22)); - src.pushOwned(src_id, 200, makePayload(4, 0x33)); - src.pushOwned(src_id, 300, makePayload(4, 0x44)); - - ASSERT_TRUE(src.flushTo(dst).has_value()); - - EXPECT_EQ(dst.entryCount(dst_id), 4u); - EXPECT_EQ(dst.at(dst_id, 0)->timestamp, 50); - EXPECT_EQ(dst.at(dst_id, 1)->timestamp, 100); - EXPECT_EQ(dst.at(dst_id, 2)->timestamp, 200); - EXPECT_EQ(dst.at(dst_id, 3)->timestamp, 300); -} - -TEST(ObjectStoreFlushTest, RejectsMonotonicityViolation) { - ObjectStore src, dst; - auto src_id = registerSameDescriptor(src); - auto dst_id = registerSameDescriptor(dst); - - dst.pushOwned(dst_id, 200, makePayload(4)); - src.pushOwned(src_id, 100, makePayload(4)); // earlier than dst's last - - auto result = src.flushTo(dst); - EXPECT_FALSE(result.has_value()); - - // Neither store is mutated on failure. - EXPECT_EQ(src.entryCount(src_id), 1u); - EXPECT_EQ(dst.entryCount(dst_id), 1u); -} - -TEST(ObjectStoreFlushTest, RejectsUnknownTopicInDst) { - ObjectStore src, dst; - auto src_id = registerSameDescriptor(src, 1, "missing/topic"); - // dst has a different topic. 
- registerSameDescriptor(dst, 1, "other/topic"); - - src.pushOwned(src_id, 100, makePayload(4)); - - auto result = src.flushTo(dst); - EXPECT_FALSE(result.has_value()); - EXPECT_EQ(src.entryCount(src_id), 1u); -} - -TEST(ObjectStoreFlushTest, EmptySourceSeriesSkipped) { - ObjectStore src, dst; - auto src_id_with_data = registerSameDescriptor(src, 1, "with/data"); - registerSameDescriptor(src, 1, "empty/series"); // registered but no pushes - - auto dst_id_with_data = registerSameDescriptor(dst, 1, "with/data"); - // dst does NOT register "empty/series" — flush should still succeed since src - // has no entries for it. - - src.pushOwned(src_id_with_data, 100, makePayload(4)); - - auto result = src.flushTo(dst); - EXPECT_TRUE(result.has_value()) << (result.has_value() ? "" : result.error()); - EXPECT_EQ(dst.entryCount(dst_id_with_data), 1u); -} - -TEST(ObjectStoreFlushTest, RejectsSameStore) { - ObjectStore store; - auto id = registerSameDescriptor(store); - store.pushOwned(id, 100, makePayload(4)); - - auto result = store.flushTo(store); - EXPECT_FALSE(result.has_value()); - EXPECT_EQ(store.entryCount(id), 1u); -} - -TEST(ObjectStoreFlushTest, MultipleTopicsFlushed) { - ObjectStore src, dst; - auto src_a = registerSameDescriptor(src, 1, "topic/a"); - auto src_b = registerSameDescriptor(src, 1, "topic/b"); - auto dst_a = registerSameDescriptor(dst, 1, "topic/a"); - auto dst_b = registerSameDescriptor(dst, 1, "topic/b"); - - src.pushOwned(src_a, 100, makePayload(4, 0xAA)); - src.pushOwned(src_b, 100, makePayload(4, 0xBB)); - src.pushOwned(src_a, 200, makePayload(4, 0xCC)); - src.pushOwned(src_b, 200, makePayload(4, 0xDD)); - - ASSERT_TRUE(src.flushTo(dst).has_value()); - - EXPECT_EQ(dst.entryCount(dst_a), 2u); - EXPECT_EQ(dst.entryCount(dst_b), 2u); - EXPECT_EQ(dst.at(dst_a, 0)->payload.bytes[0], 0xAA); - EXPECT_EQ(dst.at(dst_b, 1)->payload.bytes[0], 0xDD); -} - -TEST(ObjectStoreFlushTest, LazyEntriesPreserveSemantics) { - ObjectStore src, dst; - auto src_id = 
registerSameDescriptor(src); - auto dst_id = registerSameDescriptor(dst); - - int src_invocations = 0; - src.pushLazy(src_id, 100, [&src_invocations]() -> sdk::PayloadView { - ++src_invocations; - return sdk::makePayloadView({0xDE, 0xAD, 0xBE, 0xEF}); - }); - - // Flush itself must NOT invoke the closure — it just moves the std::function. - EXPECT_EQ(src_invocations, 0); - ASSERT_TRUE(src.flushTo(dst).has_value()); - EXPECT_EQ(src_invocations, 0); - - EXPECT_EQ(src.entryCount(src_id), 0u); - EXPECT_EQ(dst.entryCount(dst_id), 1u); - - // Reading from dst invokes the closure once, exactly like in src. - auto resolved = dst.latestAt(dst_id, 100); - ASSERT_TRUE(resolved.has_value()); - EXPECT_EQ(src_invocations, 1); - ASSERT_EQ(resolved->payload.bytes.size(), 4u); - EXPECT_EQ(resolved->payload.bytes[0], 0xDE); -} - -TEST(ObjectStoreFlushTest, RetentionBudgetAppliedAfterFlush) { - ObjectStore src, dst; - auto src_id = registerSameDescriptor(src); - auto dst_id = registerSameDescriptor(dst); - - // Time-window retention of 100 ns on dst: anything older than newest - 100 evicts. - dst.setRetentionBudget(dst_id, RetentionBudget{.time_window_ns = 100, .max_memory_bytes = 0}); - - src.pushOwned(src_id, 100, makePayload(4)); - src.pushOwned(src_id, 150, makePayload(4)); - src.pushOwned(src_id, 250, makePayload(4)); // newest after flush; evict anything < 150. - - ASSERT_TRUE(src.flushTo(dst).has_value()); - - // After flush: newest is 250, threshold = 150, so the entry at 100 evicts. - EXPECT_EQ(dst.entryCount(dst_id), 2u); - EXPECT_EQ(dst.at(dst_id, 0)->timestamp, 150); - EXPECT_EQ(dst.at(dst_id, 1)->timestamp, 250); -} - -TEST(ObjectStoreFlushTest, ZeroCopyOwnershipChainSurvives) { - // A shared_ptr handed in via pushOwned is shared between the entry's payload - // and any consumer of `at()`. After flushTo, the same shared_ptr instance - // must be the one observed in the destination — no copy, just refcount - // transfer through the move of the underlying ObjectEntry. 
- ObjectStore src, dst; - auto src_id = registerSameDescriptor(src); - auto dst_id = registerSameDescriptor(dst); - - src.pushOwned(src_id, 100, makePayload(16, 0x77)); - auto pre_handle = src.at(src_id, 0); - ASSERT_TRUE(pre_handle.has_value()); - const auto* pre_ptr = pre_handle->payload.anchor.get(); - - ASSERT_TRUE(src.flushTo(dst).has_value()); - - auto post_handle = dst.at(dst_id, 0); - ASSERT_TRUE(post_handle.has_value()); - EXPECT_EQ(post_handle->payload.anchor.get(), pre_ptr) << "shared_ptr identity must survive the flush"; -} - -} // namespace -} // namespace PJ diff --git a/pj_datastore/tests/plugin_data_host_object_read_test.cpp b/pj_datastore/tests/plugin_data_host_object_read_test.cpp deleted file mode 100644 index 30cd5e0c..00000000 --- a/pj_datastore/tests/plugin_data_host_object_read_test.cpp +++ /dev/null @@ -1,199 +0,0 @@ -// Copyright 2026 Davide Faconti -// SPDX-License-Identifier: MPL-2.0 - -#include - -#include -#include -#include -#include - -#include "pj_base/sdk/object_bytes.hpp" -#include "pj_base/sdk/plugin_data_api.hpp" -#include "pj_datastore/object_store.hpp" -#include "pj_datastore/plugin_data_host.hpp" - -namespace PJ { -namespace { - -using sdk::ObjectBytes; -using sdk::ObjectTopicHandle; -using sdk::SourceObjectWriteHostView; -using sdk::ToolboxObjectReadHostView; - -constexpr DatasetId kDatasetId = 99; - -struct Fixture { - ObjectStore store; - DatastoreSourceObjectWriteHost write_impl{store, kDatasetId}; - DatastoreToolboxObjectReadHost read_impl{store}; - SourceObjectWriteHostView writer{write_impl.raw()}; - ToolboxObjectReadHostView reader{read_impl.raw()}; -}; - -TEST(ToolboxObjectReadHostTest, ReadsBytesWrittenByWriteHost) { - Fixture f; - const auto topic = *f.writer.registerTopic("markers", R"({"media_class":"scene"})"); - - const std::vector payload{0x01, 0x02, 0x03, 0x04}; - ASSERT_TRUE(f.writer.pushOwned(topic, 1000, payload).has_value()); - - auto bytes = f.reader.readLatestAt(topic, 1500); - 
ASSERT_TRUE(bytes.has_value()) << bytes.error(); - ASSERT_TRUE(*bytes); - const auto view = bytes->view(); - EXPECT_EQ(view.size(), payload.size()); - EXPECT_EQ(std::vector(view.begin(), view.end()), payload); -} - -TEST(ToolboxObjectReadHostTest, ObjectBytesDestructorReleasesExactlyOnce) { - Fixture f; - const auto topic = *f.writer.registerTopic("images", "{}"); - const std::vector payload{0xAA, 0xBB}; - ASSERT_TRUE(f.writer.pushOwned(topic, 1, payload).has_value()); - - // Scope the ObjectBytes holder; destructor must release without leaks. - { - auto bytes = f.reader.readLatestAt(topic, 1); - ASSERT_TRUE(bytes.has_value()); - EXPECT_FALSE(bytes->empty()); - // Holder goes out of scope here — vtable->release_bytes runs exactly - // once. ASAN would flag double-free or leak. - } - - // Subsequent reads still work (store state unaffected). - auto again = f.reader.readLatestAt(topic, 1); - ASSERT_TRUE(again.has_value()); - EXPECT_EQ(again->view().size(), payload.size()); -} - -TEST(ToolboxObjectReadHostTest, OwningHandleSurvivesStoreMutation) { - Fixture f; - const auto topic = *f.writer.registerTopic("pointclouds", "{}"); - const std::vector original{0x10, 0x20, 0x30}; - ASSERT_TRUE(f.writer.pushOwned(topic, 100, original).has_value()); - - auto bytes = f.reader.readLatestAt(topic, 100); - ASSERT_TRUE(bytes.has_value()); - - // Mutate the store: push a new entry, evict the first one. - const std::vector replacement{0xFF}; - ASSERT_TRUE(f.writer.pushOwned(topic, 200, replacement).has_value()); - f.store.evictBefore(ObjectTopicId{topic.id}, 150); - EXPECT_EQ(f.store.entryCount(ObjectTopicId{topic.id}), 1U); - - // The original handle still points at the original bytes — the - // shared_ptr inside the handle kept them alive despite eviction. 
- const auto view = bytes->view(); - EXPECT_EQ(std::vector(view.begin(), view.end()), original); -} - -TEST(ToolboxObjectReadHostTest, LookupTopicByName) { - Fixture f; - const auto registered = *f.writer.registerTopic("lidar/front", "{}"); - - const auto found = f.reader.lookupTopic("lidar/front"); - ASSERT_TRUE(found.has_value()); - EXPECT_EQ(found->id, registered.id); - - EXPECT_FALSE(f.reader.lookupTopic("no-such-topic").has_value()); -} - -TEST(ToolboxObjectReadHostTest, ListTopicsReturnsAllRegistered) { - Fixture f; - const auto a = *f.writer.registerTopic("a", "{}"); - const auto b = *f.writer.registerTopic("b", "{}"); - const auto c = *f.writer.registerTopic("c", "{}"); - - auto topics = f.reader.listTopics(); - ASSERT_TRUE(topics.has_value()) << topics.error(); - ASSERT_EQ(topics->size(), 3U); - // Order matches insertion in the ObjectStore. - EXPECT_EQ((*topics)[0].id, a.id); - EXPECT_EQ((*topics)[1].id, b.id); - EXPECT_EQ((*topics)[2].id, c.id); -} - -TEST(ToolboxObjectReadHostTest, TopicMetadataRoundTrip) { - Fixture f; - const auto topic = *f.writer.registerTopic("camera", R"({"media_class":"image","encoding":"jpeg"})"); - EXPECT_EQ(f.reader.topicMetadata(topic), R"({"media_class":"image","encoding":"jpeg"})"); -} - -TEST(ToolboxObjectReadHostTest, EntryCountAndTimeRange) { - Fixture f; - const auto topic = *f.writer.registerTopic("stream", "{}"); - const std::vector one{0x01}; - const std::vector two{0x02}; - const std::vector three{0x03}; - ASSERT_TRUE(f.writer.pushOwned(topic, 10, one).has_value()); - ASSERT_TRUE(f.writer.pushOwned(topic, 20, two).has_value()); - ASSERT_TRUE(f.writer.pushOwned(topic, 30, three).has_value()); - - EXPECT_EQ(f.reader.entryCount(topic), 3U); - const auto range = f.reader.timeRange(topic); - EXPECT_EQ(range.first, 10); - EXPECT_EQ(range.second, 30); -} - -TEST(ToolboxObjectReadHostTest, ReadLatestAtReturnsErrorOnMiss) { - Fixture f; - const auto topic = *f.writer.registerTopic("empty", "{}"); - - auto bytes = 
f.reader.readLatestAt(topic, 42); - EXPECT_FALSE(bytes.has_value()); -} - -TEST(ToolboxObjectReadHostTest, HandleSurvivesAcrossThreads) { - Fixture f; - const auto topic = *f.writer.registerTopic("threaded", "{}"); - const std::vector payload(256, 0x7F); - ASSERT_TRUE(f.writer.pushOwned(topic, 1, payload).has_value()); - - auto bytes = f.reader.readLatestAt(topic, 1); - ASSERT_TRUE(bytes.has_value()); - - // Move the holder into a worker thread. Writer mutates the store - // concurrently; the consumer's view remains valid until the worker - // drops the holder. - std::thread worker([b = std::move(*bytes), &payload]() { - const auto view = b.view(); - ASSERT_EQ(view.size(), payload.size()); - for (std::size_t i = 0; i < payload.size(); ++i) { - ASSERT_EQ(view[i], payload[i]); - } - }); - // Meanwhile the main thread can still push new entries and evict. - const std::vector other{0x00}; - ASSERT_TRUE(f.writer.pushOwned(topic, 2, other).has_value()); - f.store.evictBefore(ObjectTopicId{topic.id}, 2); - - worker.join(); -} - -TEST(ToolboxObjectReadHostTest, ViewReportsInvalidWhenUnbound) { - ToolboxObjectReadHostView empty; - EXPECT_FALSE(empty.valid()); - EXPECT_FALSE(empty.lookupTopic("x").has_value()); - EXPECT_FALSE(empty.listTopics().has_value()); - EXPECT_FALSE(empty.readLatestAt(ObjectTopicHandle{1}, 0).has_value()); - EXPECT_EQ(empty.entryCount(ObjectTopicHandle{1}), 0U); - EXPECT_EQ(empty.timeRange(ObjectTopicHandle{1}).first, 0); -} - -TEST(ToolboxObjectReadHostTest, MovedObjectBytesIsSafelyEmptied) { - Fixture f; - const auto topic = *f.writer.registerTopic("move", "{}"); - const std::vector one_byte{0xAB}; - ASSERT_TRUE(f.writer.pushOwned(topic, 5, one_byte).has_value()); - - auto a = f.reader.readLatestAt(topic, 5); - ASSERT_TRUE(a.has_value()); - ObjectBytes moved = std::move(*a); - EXPECT_FALSE(a->empty() && moved.empty()) << "both cannot be empty after move"; - EXPECT_TRUE(a->empty()); // moved-from holder releases nothing on destruction. 
- EXPECT_FALSE(moved.empty()); -} - -} // namespace -} // namespace PJ diff --git a/pj_datastore/tests/plugin_data_host_object_test.cpp b/pj_datastore/tests/plugin_data_host_object_test.cpp deleted file mode 100644 index 9014525a..00000000 --- a/pj_datastore/tests/plugin_data_host_object_test.cpp +++ /dev/null @@ -1,290 +0,0 @@ -// Copyright 2026 Davide Faconti -// SPDX-License-Identifier: MPL-2.0 - -#include - -#include -#include -#include -#include -#include -#include - -#include "pj_base/sdk/plugin_data_api.hpp" -#include "pj_datastore/object_store.hpp" -#include "pj_datastore/plugin_data_host.hpp" - -namespace PJ { -namespace { - -using sdk::ObjectTopicHandle; -using sdk::SourceObjectWriteHostView; - -constexpr DatasetId kDatasetId = 42; - -struct Fixture { - ObjectStore store; - DatastoreSourceObjectWriteHost host_impl{store, kDatasetId}; - SourceObjectWriteHostView host{host_impl.raw()}; -}; - -TEST(PluginDataHostObjectTest, RegisterTopicReturnsUsableHandle) { - Fixture f; - auto handle = f.host.registerTopic("markers", R"({"media_class":"scene"})"); - ASSERT_TRUE(handle.has_value()) << handle.error(); - EXPECT_NE(handle->id, 0U); - - // Metadata round-trips through the store. 
- const auto topics = f.store.listTopics(kDatasetId); - ASSERT_EQ(topics.size(), 1U); - const auto& desc = f.store.descriptor(topics[0]); - EXPECT_EQ(desc.topic_name, "markers"); - EXPECT_EQ(desc.metadata_json, R"({"media_class":"scene"})"); - EXPECT_EQ(desc.dataset_id, kDatasetId); -} - -TEST(PluginDataHostObjectTest, RegisterTopicRejectsDuplicateName) { - Fixture f; - ASSERT_TRUE(f.host.registerTopic("markers", "{}").has_value()); - auto again = f.host.registerTopic("markers", "{}"); - EXPECT_FALSE(again.has_value()); -} - -TEST(PluginDataHostObjectTest, PushOwnedStoresBytes) { - Fixture f; - const auto topic = *f.host.registerTopic("markers", "{}"); - - const std::vector payload = {1, 2, 3, 4, 5}; - auto status = f.host.pushOwned(topic, 1000, payload); - ASSERT_TRUE(status.has_value()) << status.error(); - status = f.host.pushOwned(topic, 2000, payload); - ASSERT_TRUE(status.has_value()) << status.error(); - - const ObjectTopicId store_id{topic.id}; - EXPECT_EQ(f.store.entryCount(store_id), 2U); - auto resolved = f.store.latestAt(store_id, 2000); - ASSERT_TRUE(resolved.has_value()); - ASSERT_NE(resolved->payload.anchor, nullptr); - EXPECT_EQ(resolved->payload.bytes.size(), payload.size()); - EXPECT_TRUE( - std::equal(resolved->payload.bytes.begin(), resolved->payload.bytes.end(), payload.begin(), payload.end())); -} - -TEST(PluginDataHostObjectTest, PushLazyRetainsClosureUntilEviction) { - Fixture f; - const auto topic = *f.host.registerTopic("images", R"({"media_class":"image"})"); - - // Use an atomic destroy-counter embedded in the shared state to prove the - // fetch_ctx_destroy callback runs exactly once per evicted entry. 
- struct SharedState { - std::atomic fetch_calls{0}; - std::atomic destroy_calls{0}; - std::vector payload; - }; - auto shared = std::make_shared(); - shared->payload = {0xDE, 0xAD, 0xBE, 0xEF}; - - auto closure = [shared]() -> std::vector { - shared->fetch_calls.fetch_add(1); - return shared->payload; - }; - - auto status = f.host.pushLazy(topic, 42, closure); - ASSERT_TRUE(status.has_value()) << status.error(); - - // Each read invokes the fetch closure. - auto first = f.store.latestAt(ObjectTopicId{topic.id}, 42); - ASSERT_TRUE(first.has_value()); - ASSERT_NE(first->payload.anchor, nullptr); - EXPECT_TRUE( - std::equal( - first->payload.bytes.begin(), first->payload.bytes.end(), shared->payload.begin(), shared->payload.end())); - EXPECT_GE(shared->fetch_calls.load(), 1); - - auto second = f.store.latestAt(ObjectTopicId{topic.id}, 42); - ASSERT_TRUE(second.has_value()); - EXPECT_GE(shared->fetch_calls.load(), 2); - - // Destroy has not been invoked yet — the entry is still alive. - // (The test's `shared` is one ref; the closure captured in the store is - // another; a temporary held by the ObjectStore's fetch wrapper is a - // third. Refcount is implementation-detail — assert the visible effect.) - // SharedState has NOT been destroyed, so destroy_calls is still 0. - EXPECT_EQ(shared->destroy_calls.load(), 0); - - // Evict: store drops the entry, which drops the std::function, which drops - // the plugin's shared holder, which runs fetch_ctx_destroy. This test - // can't directly observe fetch_ctx_destroy because the SDK's LazyBox - // destroy just `delete`s its box; but by construction `closure` owns - // only `shared`, and when the store drops its copy of closure, only our - // local `closure` variable + our local `shared` remain. We can still - // verify that the closure is gone from the store by observing - // `entryCount` drop to zero. 
- f.store.evictBefore(ObjectTopicId{topic.id}, 100); - EXPECT_EQ(f.store.entryCount(ObjectTopicId{topic.id}), 0U); -} - -TEST(PluginDataHostObjectTest, PushLazyDestroyCallbackRunsExactlyOnceOnEviction) { - // Integration test using the raw C ABI — explicitly verifies the destroy - // callback fires exactly once when the entry is evicted from the store. - Fixture f; - const auto topic = *f.host.registerTopic("pointclouds", R"({"media_class":"pointcloud"})"); - - struct Ctx { - std::atomic destroy_count{0}; - std::vector last_bytes; - std::vector payload{0x11, 0x22, 0x33}; - }; - auto* ctx = new Ctx(); - - auto fetch_fn = [](void* c, const uint8_t** out_data, uint64_t* out_size) noexcept -> bool { - auto* self = static_cast(c); - self->last_bytes = self->payload; - *out_data = self->last_bytes.data(); - *out_size = self->last_bytes.size(); - return true; - }; - auto destroy_fn = [](void* c) noexcept { static_cast(c)->destroy_count.fetch_add(1); }; - - const auto raw = f.host.raw(); - PJ_error_t err{}; - ASSERT_TRUE(raw.vtable->push_lazy(raw.ctx, topic, 100, fetch_fn, ctx, destroy_fn, &err)); - - // Fetch once — the callback runs but the ctx stays alive. - auto resolved = f.store.latestAt(ObjectTopicId{topic.id}, 100); - ASSERT_TRUE(resolved.has_value()); - EXPECT_TRUE( - std::equal( - resolved->payload.bytes.begin(), resolved->payload.bytes.end(), ctx->payload.begin(), ctx->payload.end())); - EXPECT_EQ(ctx->destroy_count.load(), 0); - - // Evict — destroy_fn runs exactly once. - f.store.evictBefore(ObjectTopicId{topic.id}, 1000); - EXPECT_EQ(f.store.entryCount(ObjectTopicId{topic.id}), 0U); - EXPECT_EQ(ctx->destroy_count.load(), 1); - - delete ctx; // clean up the raw box we allocated in the test. 
-} - -TEST(PluginDataHostObjectTest, PushLazyWithNullFetchFnFails) { - Fixture f; - const auto topic = *f.host.registerTopic("bogus", "{}"); - - const auto raw = f.host.raw(); - PJ_error_t err{}; - std::atomic destroyed{0}; - auto destroy_fn = [](void* c) noexcept { static_cast*>(c)->fetch_add(1); }; - EXPECT_FALSE(raw.vtable->push_lazy(raw.ctx, topic, 1, nullptr, &destroyed, destroy_fn, &err)); - // Even on failure, the store calls destroy_fn to free plugin-owned ctx. - EXPECT_EQ(destroyed.load(), 1); -} - -TEST(PluginDataHostObjectTest, PushRejectsUnknownTopicHandle) { - Fixture f; - const ObjectTopicHandle bogus{99999}; - const std::vector payload{1, 2, 3}; - auto status = f.host.pushOwned(bogus, 1, payload); - EXPECT_FALSE(status.has_value()); -} - -TEST(PluginDataHostObjectTest, SetRetentionBudgetEnforcesTimeWindow) { - Fixture f; - const auto topic = *f.host.registerTopic("rolling", "{}"); - - // 10 ns window. Pushes at t=0,1,...,100 — only entries within 10 ns of - // the newest timestamp should survive. - f.host.setRetentionBudget(topic, /*time_window_ns=*/10, /*max_memory_bytes=*/0); - const std::vector payload{0xAA}; - for (int64_t t = 0; t <= 100; ++t) { - ASSERT_TRUE(f.host.pushOwned(topic, t, payload).has_value()); - } - // Entries older than 90 ns (100 - 10) are evicted. 
- const auto range = f.store.timeRange(ObjectTopicId{topic.id}); - EXPECT_GE(range.first, 90); - EXPECT_EQ(range.second, 100); -} - -TEST(PluginDataHostObjectTest, ViewReportsNotBoundWhenRawIsEmpty) { - SourceObjectWriteHostView empty; - EXPECT_FALSE(empty.valid()); - auto status = empty.pushOwned(ObjectTopicHandle{1}, 0, {}); - EXPECT_FALSE(status.has_value()); -} - -// =========================================================================== -// setTarget — streaming two-store flow -// =========================================================================== - -TEST(PluginDataHostObjectTest, SetTargetRedirectsRegisterAndPushToSecondary) { - // Simulates the streaming pause/resume routing: the manager flips the host - // between a primary and a secondary store. After setTarget(secondary) the - // host's registerTopic + pushOwned land in the secondary; pushes against - // primary stop. Flipping back to primary resumes routing there. - ObjectStore primary; - ObjectStore secondary; - DatastoreSourceObjectWriteHost host_impl{primary, kDatasetId}; - SourceObjectWriteHostView host{host_impl.raw()}; - - // Lockstep registration on both stores BEFORE the swap, so the topic id - // is the same on each side (auto-counter ticks identically). This matches - // the manager wiring described in the two-store plan. - const auto primary_topic = *host.registerTopic("cam", "{}"); - host_impl.setTarget(&secondary); - const auto secondary_topic = *host.registerTopic("cam", "{}"); - EXPECT_EQ(primary_topic.id, secondary_topic.id); - - // A push now must land in secondary, not primary. - const std::vector payload_a{0x01, 0x02}; - ASSERT_TRUE(host.pushOwned(secondary_topic, 1000, payload_a).has_value()); - EXPECT_EQ(primary.entryCount(ObjectTopicId{primary_topic.id}), 0U); - EXPECT_EQ(secondary.entryCount(ObjectTopicId{secondary_topic.id}), 1U); - - // Flip back: subsequent pushes return to primary. 
- host_impl.setTarget(&primary); - const std::vector payload_b{0x03}; - ASSERT_TRUE(host.pushOwned(primary_topic, 2000, payload_b).has_value()); - EXPECT_EQ(primary.entryCount(ObjectTopicId{primary_topic.id}), 1U); - EXPECT_EQ(secondary.entryCount(ObjectTopicId{secondary_topic.id}), 1U); -} - -TEST(PluginDataHostObjectTest, ParserSetTargetRedirectsPushToSecondary) { - // Same test as above but for the parser-scoped host, which is the one the - // streaming worker actually drives (parser-bound topic id captured at - // bind() — invariant under the swap because the manager has registered - // the topic in both stores via lockstep registerTopic). - ObjectStore primary; - ObjectStore secondary; - // Pre-register on both stores in lockstep. Both auto-assign the same id. - const auto primary_topic = - *primary.registerTopic({.dataset_id = kDatasetId, .topic_name = "cam", .metadata_json = "{}"}); - const auto secondary_topic = - *secondary.registerTopic({.dataset_id = kDatasetId, .topic_name = "cam", .metadata_json = "{}"}); - ASSERT_EQ(primary_topic.id, secondary_topic.id); - - DatastoreParserObjectWriteHost host_impl{primary, primary_topic.id}; - sdk::ParserObjectWriteHostView host{host_impl.raw()}; - - const std::vector payload_a{0x01}; - const std::vector payload_b{0x02}; - const std::vector payload_c{0x03}; - - // Push before swap goes to primary. - ASSERT_TRUE(host.pushOwned(1000, payload_a).has_value()); - EXPECT_EQ(primary.entryCount(primary_topic), 1U); - EXPECT_EQ(secondary.entryCount(secondary_topic), 0U); - - // Swap and push: now lands in secondary, primary untouched. - host_impl.setTarget(&secondary); - ASSERT_TRUE(host.pushOwned(2000, payload_b).has_value()); - EXPECT_EQ(primary.entryCount(primary_topic), 1U); - EXPECT_EQ(secondary.entryCount(secondary_topic), 1U); - - // Swap back: resumes pushing into primary. 
- host_impl.setTarget(&primary); - ASSERT_TRUE(host.pushOwned(3000, payload_c).has_value()); - EXPECT_EQ(primary.entryCount(primary_topic), 2U); - EXPECT_EQ(secondary.entryCount(secondary_topic), 1U); -} - -} // namespace -} // namespace PJ diff --git a/pj_datastore/tests/plugin_host_read_test.cpp b/pj_datastore/tests/plugin_host_read_test.cpp deleted file mode 100644 index d0e74e26..00000000 --- a/pj_datastore/tests/plugin_host_read_test.cpp +++ /dev/null @@ -1,453 +0,0 @@ -// Copyright 2026 Davide Faconti -// SPDX-License-Identifier: MPL-2.0 - -#include - -#include -#include -#include -#include -#include -#include - -#include "pj_base/sdk/plugin_data_api.hpp" -#include "pj_base/type_tree.hpp" -#include "pj_datastore/engine.hpp" -#include "pj_datastore/object_store.hpp" -#include "pj_datastore/plugin_data_host.hpp" -#include "pj_datastore/writer.hpp" - -namespace PJ { -namespace { - -using namespace PJ::sdk; - -struct Fixture { - DataEngine engine; - ObjectStore object_store; - DatastoreToolboxHost toolbox_impl{engine, object_store}; - ToolboxHostView toolbox{toolbox_impl.raw()}; -}; - -TEST(PluginDataHostReadTest, CatalogSnapshotIsDeterministicAndIncludesSchemaBackedTopics) { - Fixture f; - DataWriter writer(f.engine); - - auto schema = - makeStruct("pose", {makePrimitive("x", PrimitiveType::kFloat32), makePrimitive("y", PrimitiveType::kInt16)}); - const auto schema_id = *writer.registerSchema("pose_schema", schema); - const auto source_a = *f.toolbox.createDataSource("robot_b"); - const auto source_b = *f.toolbox.createDataSource("robot_a"); - - TopicDescriptor desc; - desc.name = "pose"; - desc.schema_id = schema_id; - ASSERT_TRUE(writer.registerTopic(source_a.id, desc).has_value()); - - auto topic_b = *f.toolbox.ensureTopic(source_b, "imu"); - ASSERT_TRUE(f.toolbox.ensureField(topic_b, "ax", PrimitiveType::kFloat32).has_value()); - - auto snapshot_or = f.toolbox.catalogSnapshot(); - ASSERT_TRUE(snapshot_or.has_value()); - auto snapshot = 
std::move(snapshot_or.value()); - ASSERT_EQ(snapshot.dataSources().size(), 2U); - EXPECT_EQ(snapshot.dataSources()[0].handle.id, source_a.id); - EXPECT_EQ(snapshot.dataSources()[1].handle.id, source_b.id); - - std::vector field_names; - for (const auto& field : snapshot.fields()) { - field_names.push_back(toStringView(field.name)); - } - EXPECT_NE(std::find(field_names.begin(), field_names.end(), "x"), field_names.end()); - EXPECT_NE(std::find(field_names.begin(), field_names.end(), "y"), field_names.end()); - EXPECT_NE(std::find(field_names.begin(), field_names.end(), "ax"), field_names.end()); -} - -TEST(PluginDataHostReadTest, CatalogSnapshotMustBeReacquiredAfterStructuralMutation) { - Fixture f; - const auto source = *f.toolbox.createDataSource("src"); - const auto topic = *f.toolbox.ensureTopic(source, "data"); - ASSERT_TRUE(f.toolbox.ensureField(topic, "a", PrimitiveType::kFloat64).has_value()); - - auto snapshot_before_or = f.toolbox.catalogSnapshot(); - ASSERT_TRUE(snapshot_before_or.has_value()); - auto snapshot_before = std::move(snapshot_before_or.value()); - EXPECT_EQ(snapshot_before.fields().size(), 1U); - - ASSERT_TRUE(f.toolbox.ensureField(topic, "b", PrimitiveType::kFloat64).has_value()); - - auto snapshot_after_or = f.toolbox.catalogSnapshot(); - ASSERT_TRUE(snapshot_after_or.has_value()); - auto snapshot_after = std::move(snapshot_after_or.value()); - EXPECT_EQ(snapshot_after.fields().size(), 2U); -} - -TEST(PluginDataHostReadTest, ReadSeriesPreservesExactPrimitiveTypesAndNulls) { - Fixture f; - const auto source = *f.toolbox.createDataSource("src"); - const auto topic = *f.toolbox.ensureTopic(source, "data"); - const auto i8 = *f.toolbox.ensureField(topic, "i8", PrimitiveType::kInt8); - const auto u32 = *f.toolbox.ensureField(topic, "u32", PrimitiveType::kUint32); - const auto u64 = *f.toolbox.ensureField(topic, "u64", PrimitiveType::kUint64); - const auto flag = *f.toolbox.ensureField(topic, "flag", PrimitiveType::kBool); - const auto label = 
*f.toolbox.ensureField(topic, "label", PrimitiveType::kString); - - const std::vector row1 = { - {.name = "i8", .value = int8_t{-5}}, - {.name = "u32", .value = uint32_t{123456}}, - {.name = "u64", .value = uint64_t{(uint64_t{1} << 60) + 7}}, - {.name = "flag", .value = true}, - {.name = "label", .value = std::string_view("alpha")}, - }; - const std::vector row2 = { - {.name = "i8", .value = PJ::kNull}, - {.name = "u32", .value = uint32_t{42}}, - {.name = "u64", .value = uint64_t{9}}, - {.name = "flag", .value = false}, - {.name = "label", .value = std::string_view("beta")}, - }; - ASSERT_TRUE(f.toolbox.appendRecord(topic, 1, row1).has_value()); - ASSERT_TRUE(f.toolbox.appendRecord(topic, 2, row2).has_value()); - f.toolbox_impl.flushPending(); - - auto i8_series_or = f.toolbox.readSeries(i8); - ASSERT_TRUE(i8_series_or.has_value()); - auto i8_series = std::move(i8_series_or.value()); - ASSERT_EQ(i8_series.type(), PrimitiveType::kInt8); - ASSERT_EQ(i8_series.timestamps().size(), 2U); - EXPECT_EQ(i8_series.raw().values.as_int8[0], -5); - EXPECT_EQ(i8_series.raw().values.as_int8[1], 0); - EXPECT_EQ(i8_series.raw().validity_bits[0] & 0b10U, 0U); - - auto u32_series_or = f.toolbox.readSeries(u32); - ASSERT_TRUE(u32_series_or.has_value()); - auto u32_series = std::move(u32_series_or.value()); - ASSERT_EQ(u32_series.type(), PrimitiveType::kUint32); - EXPECT_EQ(u32_series.raw().values.as_uint32[0], 123456U); - EXPECT_EQ(u32_series.raw().values.as_uint32[1], 42U); - - auto u64_series_or = f.toolbox.readSeries(u64); - ASSERT_TRUE(u64_series_or.has_value()); - auto u64_series = std::move(u64_series_or.value()); - ASSERT_EQ(u64_series.type(), PrimitiveType::kUint64); - EXPECT_EQ(u64_series.raw().values.as_uint64[0], (uint64_t{1} << 60) + 7); - EXPECT_EQ(u64_series.raw().values.as_uint64[1], 9U); - - auto flag_series_or = f.toolbox.readSeries(flag); - ASSERT_TRUE(flag_series_or.has_value()); - auto flag_series = std::move(flag_series_or.value()); - ASSERT_EQ(flag_series.type(), 
PrimitiveType::kBool); - EXPECT_EQ(flag_series.raw().values.as_bool[0], 1U); - EXPECT_EQ(flag_series.raw().values.as_bool[1], 0U); - - auto label_series_or = f.toolbox.readSeries(label); - ASSERT_TRUE(label_series_or.has_value()); - auto label_series = std::move(label_series_or.value()); - ASSERT_EQ(label_series.type(), PrimitiveType::kString); - ASSERT_EQ(label_series.raw().values.as_string.offset_count, 3U); - const auto bytes = - std::string_view(label_series.raw().values.as_string.bytes, label_series.raw().values.as_string.byte_count); - EXPECT_EQ(bytes, "alphabeta"); -} - -TEST(PluginDataHostReadTest, ReadSeriesRejectsUnknownField) { - Fixture f; - const FieldHandle bad_field{.topic = TopicHandle{.id = 999}, .id = 1}; - const auto result = f.toolbox.readSeries(bad_field); - EXPECT_FALSE(result.has_value()); -} - -// --------------------------------------------------------------------------- -// Catalog edge cases -// --------------------------------------------------------------------------- - -TEST(PluginDataHostReadTest, EmptyCatalogSnapshotReturnsZeroCounts) { - Fixture f; - auto snapshot_or = f.toolbox.catalogSnapshot(); - ASSERT_TRUE(snapshot_or.has_value()); - auto snapshot = std::move(snapshot_or.value()); - EXPECT_EQ(snapshot.dataSources().size(), 0U); - EXPECT_EQ(snapshot.topics().size(), 0U); - EXPECT_EQ(snapshot.fields().size(), 0U); -} - -TEST(PluginDataHostReadTest, FieldHandleTopicBindingMatchesContainingTopic) { - Fixture f; - const auto source = *f.toolbox.createDataSource("src"); - const auto topic = *f.toolbox.ensureTopic(source, "data"); - ASSERT_TRUE(f.toolbox.ensureField(topic, "a", PrimitiveType::kFloat32).has_value()); - ASSERT_TRUE(f.toolbox.ensureField(topic, "b", PrimitiveType::kInt32).has_value()); - - auto snapshot_or = f.toolbox.catalogSnapshot(); - ASSERT_TRUE(snapshot_or.has_value()); - auto snapshot = std::move(snapshot_or.value()); - - for (const auto& topic_info : snapshot.topics()) { - for (uint32_t i = 
topic_info.first_field; i < topic_info.first_field + topic_info.field_count; ++i) { - ASSERT_LT(i, snapshot.fields().size()); - EXPECT_EQ(snapshot.fields()[i].handle.topic.id, topic_info.handle.id) - << "field index " << i << " should belong to topic " << topic_info.handle.id; - } - } -} - -TEST(PluginDataHostReadTest, CatalogSnapshotTopicOrderIsStableWithinSource) { - Fixture f; - const auto source = *f.toolbox.createDataSource("src"); - ASSERT_TRUE(f.toolbox.ensureTopic(source, "b").has_value()); - ASSERT_TRUE(f.toolbox.ensureTopic(source, "a").has_value()); - - auto snapshot_or = f.toolbox.catalogSnapshot(); - ASSERT_TRUE(snapshot_or.has_value()); - auto snapshot = std::move(snapshot_or.value()); - ASSERT_EQ(snapshot.topics().size(), 2U); - EXPECT_LT(snapshot.topics()[0].handle.id, snapshot.topics()[1].handle.id); -} - -TEST(PluginDataHostReadTest, ReadSeriesRejectsUnknownTopic) { - Fixture f; - const FieldHandle bad_field{.topic = TopicHandle{.id = 999}, .id = 0}; - const auto result = f.toolbox.readSeries(bad_field); - EXPECT_FALSE(result.has_value()); -} - -// --------------------------------------------------------------------------- -// Type-specific reads -// --------------------------------------------------------------------------- - -TEST(PluginDataHostReadTest, ReadSeriesFloat32RoundTrip) { - Fixture f; - const auto source = *f.toolbox.createDataSource("src"); - const auto topic = *f.toolbox.ensureTopic(source, "data"); - const auto field = *f.toolbox.ensureField(topic, "val", PrimitiveType::kFloat32); - - const float values[] = {1.5F, -2.25F, 0.0F}; - for (int i = 0; i < 3; ++i) { - const std::vector fields = {{.name = "val", .value = values[i]}}; - ASSERT_TRUE(f.toolbox.appendRecord(topic, i, fields).has_value()); - } - f.toolbox_impl.flushPending(); - - auto series_or = f.toolbox.readSeries(field); - ASSERT_TRUE(series_or.has_value()); - auto series = std::move(series_or.value()); - ASSERT_EQ(series.type(), PrimitiveType::kFloat32); - 
ASSERT_EQ(series.timestamps().size(), 3U); - EXPECT_FLOAT_EQ(series.raw().values.as_float32[0], 1.5F); - EXPECT_FLOAT_EQ(series.raw().values.as_float32[1], -2.25F); - EXPECT_FLOAT_EQ(series.raw().values.as_float32[2], 0.0F); -} - -TEST(PluginDataHostReadTest, ReadSeriesFloat64RoundTrip) { - Fixture f; - const auto source = *f.toolbox.createDataSource("src"); - const auto topic = *f.toolbox.ensureTopic(source, "data"); - const auto field = *f.toolbox.ensureField(topic, "val", PrimitiveType::kFloat64); - - const double values[] = {1e-15, -3.14159265358979, 1e+300}; - for (int i = 0; i < 3; ++i) { - const std::vector fields = {{.name = "val", .value = values[i]}}; - ASSERT_TRUE(f.toolbox.appendRecord(topic, i, fields).has_value()); - } - f.toolbox_impl.flushPending(); - - auto series_or = f.toolbox.readSeries(field); - ASSERT_TRUE(series_or.has_value()); - auto series = std::move(series_or.value()); - ASSERT_EQ(series.type(), PrimitiveType::kFloat64); - ASSERT_EQ(series.timestamps().size(), 3U); - EXPECT_DOUBLE_EQ(series.raw().values.as_float64[0], 1e-15); - EXPECT_DOUBLE_EQ(series.raw().values.as_float64[1], -3.14159265358979); - EXPECT_DOUBLE_EQ(series.raw().values.as_float64[2], 1e+300); -} - -TEST(PluginDataHostReadTest, ReadSeriesInt32RoundTrip) { - Fixture f; - const auto source = *f.toolbox.createDataSource("src"); - const auto topic = *f.toolbox.ensureTopic(source, "data"); - const auto field = *f.toolbox.ensureField(topic, "val", PrimitiveType::kInt32); - - const int32_t values[] = {-1, 0, INT32_MAX}; - for (int i = 0; i < 3; ++i) { - const std::vector fields = {{.name = "val", .value = values[i]}}; - ASSERT_TRUE(f.toolbox.appendRecord(topic, i, fields).has_value()); - } - f.toolbox_impl.flushPending(); - - auto series_or = f.toolbox.readSeries(field); - ASSERT_TRUE(series_or.has_value()); - auto series = std::move(series_or.value()); - ASSERT_EQ(series.type(), PrimitiveType::kInt32); - ASSERT_EQ(series.timestamps().size(), 3U); - 
EXPECT_EQ(series.raw().values.as_int32[0], -1); - EXPECT_EQ(series.raw().values.as_int32[1], 0); - EXPECT_EQ(series.raw().values.as_int32[2], INT32_MAX); -} - -TEST(PluginDataHostReadTest, ReadSeriesInt64RoundTrip) { - Fixture f; - const auto source = *f.toolbox.createDataSource("src"); - const auto topic = *f.toolbox.ensureTopic(source, "data"); - const auto field = *f.toolbox.ensureField(topic, "val", PrimitiveType::kInt64); - - // Keep spread within uint32_t range to avoid Frame-of-Reference offset overflow. - const int64_t values[] = {-500'000'000LL, 0, 500'000'000LL}; - for (int i = 0; i < 3; ++i) { - const std::vector fields = {{.name = "val", .value = values[i]}}; - ASSERT_TRUE(f.toolbox.appendRecord(topic, i, fields).has_value()); - } - f.toolbox_impl.flushPending(); - - auto series_or = f.toolbox.readSeries(field); - ASSERT_TRUE(series_or.has_value()); - auto series = std::move(series_or.value()); - ASSERT_EQ(series.type(), PrimitiveType::kInt64); - ASSERT_EQ(series.timestamps().size(), 3U); - EXPECT_EQ(series.raw().values.as_int64[0], -500'000'000LL); - EXPECT_EQ(series.raw().values.as_int64[1], 0); - EXPECT_EQ(series.raw().values.as_int64[2], 500'000'000LL); -} - -TEST(PluginDataHostReadTest, ReadSeriesStringWithNulls) { - Fixture f; - const auto source = *f.toolbox.createDataSource("src"); - const auto topic = *f.toolbox.ensureTopic(source, "data"); - const auto field = *f.toolbox.ensureField(topic, "label", PrimitiveType::kString); - - const std::vector row0 = {{.name = "label", .value = std::string_view("abc")}}; - const std::vector row1 = {{.name = "label", .value = PJ::kNull}}; - const std::vector row2 = {{.name = "label", .value = std::string_view("de")}}; - ASSERT_TRUE(f.toolbox.appendRecord(topic, 0, row0).has_value()); - ASSERT_TRUE(f.toolbox.appendRecord(topic, 1, row1).has_value()); - ASSERT_TRUE(f.toolbox.appendRecord(topic, 2, row2).has_value()); - f.toolbox_impl.flushPending(); - - auto series_or = f.toolbox.readSeries(field); - 
ASSERT_TRUE(series_or.has_value()); - auto series = std::move(series_or.value()); - ASSERT_EQ(series.type(), PrimitiveType::kString); - ASSERT_EQ(series.timestamps().size(), 3U); - - // 3 rows → 4 offsets (rows+1) - EXPECT_EQ(series.raw().values.as_string.offset_count, 4U); - - // The null row (index 1) should have its validity bit cleared. - // Bit 1 in byte 0: mask is 0b10. - EXPECT_EQ(series.raw().validity_bits[0] & 0b010U, 0U); - - // Total bytes = "abc" + "de" = 5 (null row contributes zero bytes). - EXPECT_EQ(series.raw().values.as_string.byte_count, 5U); -} - -// --------------------------------------------------------------------------- -// Boundary conditions -// --------------------------------------------------------------------------- - -TEST(PluginDataHostReadTest, ReadSeriesEmptyField) { - Fixture f; - const auto source = *f.toolbox.createDataSource("src"); - const auto topic = *f.toolbox.ensureTopic(source, "data"); - const auto field = *f.toolbox.ensureField(topic, "val", PrimitiveType::kFloat64); - - f.toolbox_impl.flushPending(); - - auto series_or = f.toolbox.readSeries(field); - ASSERT_TRUE(series_or.has_value()); - auto series = std::move(series_or.value()); - EXPECT_EQ(series.timestamps().size(), 0U); -} - -TEST(PluginDataHostReadTest, ReadSeriesMultiChunkFloat64) { - Fixture f; - const auto source = *f.toolbox.createDataSource("src"); - const auto topic = *f.toolbox.ensureTopic(source, "data"); - const auto field = *f.toolbox.ensureField(topic, "val", PrimitiveType::kFloat64); - - constexpr int kRowCount = 1100; // > 1024 default chunk size - for (int i = 0; i < kRowCount; ++i) { - const std::vector fields = {{.name = "val", .value = double(i) * 0.1}}; - ASSERT_TRUE(f.toolbox.appendRecord(topic, i, fields).has_value()); - } - f.toolbox_impl.flushPending(); - - auto series_or = f.toolbox.readSeries(field); - ASSERT_TRUE(series_or.has_value()); - auto series = std::move(series_or.value()); - ASSERT_EQ(series.type(), PrimitiveType::kFloat64); - 
ASSERT_EQ(series.timestamps().size(), static_cast(kRowCount)); - - EXPECT_EQ(series.timestamps()[0], 0); - EXPECT_EQ(series.timestamps()[kRowCount - 1], kRowCount - 1); - EXPECT_EQ(series.timestamps()[550], 550); - - EXPECT_DOUBLE_EQ(series.raw().values.as_float64[0], 0.0); - EXPECT_DOUBLE_EQ(series.raw().values.as_float64[kRowCount - 1], double(kRowCount - 1) * 0.1); - EXPECT_DOUBLE_EQ(series.raw().values.as_float64[550], 550.0 * 0.1); -} - -// --------------------------------------------------------------------------- -// SDK RAII tests -// --------------------------------------------------------------------------- - -TEST(PluginDataHostReadTest, CatalogSnapshotDefaultConstructorIsEmpty) { - CatalogSnapshot snapshot; - EXPECT_EQ(snapshot.dataSources().size(), 0U); - EXPECT_EQ(snapshot.topics().size(), 0U); - EXPECT_EQ(snapshot.fields().size(), 0U); - // Destructor runs safely — no crash. -} - -TEST(PluginDataHostReadTest, CatalogSnapshotMoveTransfersOwnership) { - Fixture f; - const auto source = *f.toolbox.createDataSource("src"); - ASSERT_TRUE(f.toolbox.ensureTopic(source, "t").has_value()); - - auto original_or = f.toolbox.catalogSnapshot(); - ASSERT_TRUE(original_or.has_value()); - auto original = std::move(original_or.value()); - ASSERT_GT(original.dataSources().size(), 0U); - - // Move-construct. - CatalogSnapshot moved(std::move(original)); - EXPECT_GT(moved.dataSources().size(), 0U); - EXPECT_EQ(original.dataSources().size(), 0U); // NOLINT(bugprone-use-after-move) - - // Move-assign. 
- CatalogSnapshot assigned; - assigned = std::move(moved); - EXPECT_GT(assigned.dataSources().size(), 0U); - EXPECT_EQ(moved.dataSources().size(), 0U); // NOLINT(bugprone-use-after-move) -} - -TEST(PluginDataHostReadTest, MaterializedSeriesMoveTransfersOwnership) { - Fixture f; - const auto source = *f.toolbox.createDataSource("src"); - const auto topic = *f.toolbox.ensureTopic(source, "data"); - const auto field = *f.toolbox.ensureField(topic, "val", PrimitiveType::kFloat64); - - const std::vector fields = {{.name = "val", .value = 3.14}}; - ASSERT_TRUE(f.toolbox.appendRecord(topic, 1, fields).has_value()); - f.toolbox_impl.flushPending(); - - auto original_or = f.toolbox.readSeries(field); - ASSERT_TRUE(original_or.has_value()); - auto original = std::move(original_or.value()); - ASSERT_EQ(original.timestamps().size(), 1U); - ASSERT_EQ(original.type(), PrimitiveType::kFloat64); - - // Move-construct. - MaterializedSeries moved(std::move(original)); - EXPECT_EQ(moved.type(), PrimitiveType::kFloat64); - EXPECT_EQ(moved.timestamps().size(), 1U); - EXPECT_EQ(original.timestamps().size(), 0U); // NOLINT(bugprone-use-after-move) - - // Move-assign. 
- MaterializedSeries assigned; - assigned = std::move(moved); - EXPECT_EQ(assigned.type(), PrimitiveType::kFloat64); - EXPECT_EQ(assigned.timestamps().size(), 1U); - EXPECT_EQ(moved.timestamps().size(), 0U); // NOLINT(bugprone-use-after-move) -} - -} // namespace -} // namespace PJ diff --git a/pj_datastore/tests/plugin_host_write_test.cpp b/pj_datastore/tests/plugin_host_write_test.cpp deleted file mode 100644 index a027de47..00000000 --- a/pj_datastore/tests/plugin_host_write_test.cpp +++ /dev/null @@ -1,775 +0,0 @@ -// Copyright 2026 Davide Faconti -// SPDX-License-Identifier: MPL-2.0 - -#include - -#include -#include -#include -#include -#include - -#include "nanoarrow/nanoarrow.h" -#include "nanoarrow/nanoarrow.hpp" -#include "nanoarrow/nanoarrow_ipc.h" -#include "pj_base/sdk/plugin_data_api.hpp" -#include "pj_datastore/engine.hpp" -#include "pj_datastore/object_store.hpp" -#include "pj_datastore/plugin_data_host.hpp" - -namespace PJ { -namespace { - -using namespace PJ::sdk; - -std::vector serializeToIpc(ArrowSchema* schema, ArrowArray* array) { - ArrowBuffer out_buf; - ArrowBufferInit(&out_buf); - - ArrowIpcOutputStream out_stream; - EXPECT_EQ(ArrowIpcOutputStreamInitBuffer(&out_stream, &out_buf), NANOARROW_OK); - - ArrowIpcWriter writer; - EXPECT_EQ(ArrowIpcWriterInit(&writer, &out_stream), NANOARROW_OK); - - ArrowError error; - EXPECT_EQ(ArrowIpcWriterWriteSchema(&writer, schema, &error), NANOARROW_OK) << error.message; - - nanoarrow::UniqueArrayView view; - EXPECT_EQ(ArrowArrayViewInitFromSchema(view.get(), schema, nullptr), NANOARROW_OK); - EXPECT_EQ(ArrowArrayViewSetArray(view.get(), array, nullptr), NANOARROW_OK); - EXPECT_EQ(ArrowIpcWriterWriteArrayView(&writer, view.get(), &error), NANOARROW_OK) << error.message; - EXPECT_EQ(ArrowIpcWriterWriteArrayView(&writer, nullptr, &error), NANOARROW_OK); - - ArrowIpcWriterReset(&writer); - - std::vector result(static_cast(out_buf.size_bytes)); - std::memcpy(result.data(), out_buf.data, result.size()); - 
ArrowBufferReset(&out_buf); - return result; -} - -struct Fixture { - DataEngine engine; - ObjectStore object_store; - DatastoreToolboxHost toolbox_impl{engine, object_store}; - ToolboxHostView toolbox{toolbox_impl.raw()}; -}; - -TEST(PluginDataHostWriteTest, SourceHostWritesWithinBoundDataSource) { - Fixture f; - const auto source = *f.toolbox.createDataSource("sensor"); - - DatastoreSourceWriteHost source_impl(f.engine, source); - SourceWriteHostView writer(source_impl.raw()); - - const auto topic = *writer.ensureTopic("imu"); - const auto field = *writer.ensureField(topic, "ax", PrimitiveType::kFloat32); - EXPECT_EQ(field.topic.id, topic.id); - - const std::vector fields = {{.name = "ax", .value = 1.25F}}; - ASSERT_TRUE(writer.appendRecord(topic, 10, fields).has_value()); - source_impl.flushPending(); - - auto series_or = f.toolbox.readSeries(field); - ASSERT_TRUE(series_or.has_value()); - auto series = std::move(series_or.value()); - ASSERT_EQ(series.type(), PrimitiveType::kFloat32); - ASSERT_EQ(series.timestamps().size(), 1U); - EXPECT_EQ(series.timestamps()[0], 10); - EXPECT_FLOAT_EQ(series.raw().values.as_float32[0], 1.25F); -} - -TEST(PluginDataHostWriteTest, AppendRecordRejectsTypeMismatch) { - Fixture f; - const auto source = *f.toolbox.createDataSource("sensor"); - - DatastoreSourceWriteHost source_impl(f.engine, source); - SourceWriteHostView writer(source_impl.raw()); - - const auto topic = *writer.ensureTopic("imu"); - ASSERT_TRUE(writer.ensureField(topic, "ax", PrimitiveType::kInt32).has_value()); - - const std::vector fields = {{.name = "ax", .value = int16_t{7}}}; - const auto status = writer.appendRecord(topic, 1, fields); - EXPECT_FALSE(status.has_value()); -} - -TEST(PluginDataHostWriteTest, AppendRecordFastRejectsUnknownFieldHandle) { - Fixture f; - const auto source = *f.toolbox.createDataSource("sensor"); - - DatastoreSourceWriteHost source_impl(f.engine, source); - SourceWriteHostView writer(source_impl.raw()); - - const auto topic = 
*writer.ensureTopic("imu"); - ASSERT_TRUE(writer.ensureField(topic, "ax", PrimitiveType::kFloat64).has_value()); - - const FieldHandle bad_field{.topic = topic, .id = 999}; - const std::vector fields = {{.field = bad_field, .value = 1.0}}; - const auto status = writer.appendBoundRecord(topic, 1, fields); - EXPECT_FALSE(status.has_value()); -} - -TEST(PluginDataHostWriteTest, ParserHostIsBoundToSingleTopic) { - Fixture f; - const auto source = *f.toolbox.createDataSource("sensor"); - const auto topic = *f.toolbox.ensureTopic(source, "packets"); - const auto other_topic = *f.toolbox.ensureTopic(source, "other"); - const auto foreign_field = *f.toolbox.ensureField(other_topic, "count", PrimitiveType::kInt32); - - DatastoreParserWriteHost parser_impl(f.engine, topic); - ParserWriteHostView parser(parser_impl.raw()); - - const auto count_field = *parser.ensureField("count", PrimitiveType::kInt32); - const std::vector good_fields = {{.name = "count", .value = int32_t{42}}}; - ASSERT_TRUE(parser.appendRecord(100, good_fields).has_value()); - - const std::vector bad_fields = {{.field = foreign_field, .value = int32_t{9}}}; - const auto status = parser.appendBoundRecord(101, bad_fields); - EXPECT_FALSE(status.has_value()); - - parser_impl.flushPending(); - auto series_or = f.toolbox.readSeries(count_field); - ASSERT_TRUE(series_or.has_value()); - auto series = std::move(series_or.value()); - ASSERT_EQ(series.timestamps().size(), 1U); - EXPECT_EQ(series.raw().values.as_int32[0], 42); -} - -TEST(PluginDataHostWriteTest, ToolboxCanWriteIntoExistingDataSource) { - Fixture f; - const auto source = *f.toolbox.createDataSource("sensor"); - const auto topic = *f.toolbox.ensureTopic(source, "labels"); - const auto field = *f.toolbox.ensureField(topic, "name", PrimitiveType::kString); - - const std::vector fields = {{.name = "name", .value = std::string_view("hello")}}; - ASSERT_TRUE(f.toolbox.appendRecord(topic, 5, fields).has_value()); - f.toolbox_impl.flushPending(); - - auto 
series_or = f.toolbox.readSeries(field); - ASSERT_TRUE(series_or.has_value()); - auto series = std::move(series_or.value()); - ASSERT_EQ(series.type(), PrimitiveType::kString); - ASSERT_EQ(series.raw().values.as_string.offset_count, 2U); - const auto bytes = std::string_view(series.raw().values.as_string.bytes, series.raw().values.as_string.byte_count); - EXPECT_EQ(bytes, "hello"); -} - -TEST(PluginDataHostWriteTest, ArrowIpcPreservesExactNarrowPrimitiveTypes) { - Fixture f; - const auto source = *f.toolbox.createDataSource("sensor"); - - DatastoreSourceWriteHost source_impl(f.engine, source); - SourceWriteHostView writer(source_impl.raw()); - const auto topic = *writer.ensureTopic("narrow"); - - nanoarrow::UniqueSchema schema; - ASSERT_EQ(ArrowSchemaInitFromType(schema.get(), NANOARROW_TYPE_STRUCT), NANOARROW_OK); - ASSERT_EQ(ArrowSchemaAllocateChildren(schema.get(), 4), NANOARROW_OK); - ArrowSchemaInit(schema->children[0]); - ASSERT_EQ(ArrowSchemaSetName(schema->children[0], "_timestamp"), NANOARROW_OK); - ASSERT_EQ(ArrowSchemaSetType(schema->children[0], NANOARROW_TYPE_INT64), NANOARROW_OK); - ArrowSchemaInit(schema->children[1]); - ASSERT_EQ(ArrowSchemaSetName(schema->children[1], "i8"), NANOARROW_OK); - ASSERT_EQ(ArrowSchemaSetType(schema->children[1], NANOARROW_TYPE_INT8), NANOARROW_OK); - ArrowSchemaInit(schema->children[2]); - ASSERT_EQ(ArrowSchemaSetName(schema->children[2], "u16"), NANOARROW_OK); - ASSERT_EQ(ArrowSchemaSetType(schema->children[2], NANOARROW_TYPE_UINT16), NANOARROW_OK); - ArrowSchemaInit(schema->children[3]); - ASSERT_EQ(ArrowSchemaSetName(schema->children[3], "u32"), NANOARROW_OK); - ASSERT_EQ(ArrowSchemaSetType(schema->children[3], NANOARROW_TYPE_UINT32), NANOARROW_OK); - - nanoarrow::UniqueArray array; - ASSERT_EQ(ArrowArrayInitFromSchema(array.get(), schema.get(), nullptr), NANOARROW_OK); - ASSERT_EQ(ArrowArrayStartAppending(array.get()), NANOARROW_OK); - - for (int64_t i = 0; i < 3; ++i) { - 
ASSERT_EQ(ArrowArrayAppendInt(array->children[0], 100 + i), NANOARROW_OK); - ASSERT_EQ(ArrowArrayAppendInt(array->children[1], -5 + i), NANOARROW_OK); - ASSERT_EQ(ArrowArrayAppendUInt(array->children[2], 1000 + static_cast(i)), NANOARROW_OK); - ASSERT_EQ(ArrowArrayAppendUInt(array->children[3], 70000 + static_cast(i)), NANOARROW_OK); - ASSERT_EQ(ArrowArrayFinishElement(array.get()), NANOARROW_OK); - } - ASSERT_EQ(ArrowArrayFinishBuildingDefault(array.get(), nullptr), NANOARROW_OK); - - const auto ipc = serializeToIpc(schema.get(), array.get()); - ASSERT_TRUE(writer.appendArrowIpc(topic, ipc).has_value()); - source_impl.flushPending(); - - auto snapshot_or = f.toolbox.catalogSnapshot(); - ASSERT_TRUE(snapshot_or.has_value()); - auto snapshot = std::move(snapshot_or.value()); - PrimitiveType i8_type = PrimitiveType::kFloat64; - PrimitiveType u16_type = PrimitiveType::kFloat64; - PrimitiveType u32_type = PrimitiveType::kFloat64; - for (const auto& field : snapshot.fields()) { - const auto name = toStringView(field.name); - if (name == "i8") { - i8_type = fromAbiType(field.type); - } else if (name == "u16") { - u16_type = fromAbiType(field.type); - } else if (name == "u32") { - u32_type = fromAbiType(field.type); - } - } - - EXPECT_EQ(i8_type, PrimitiveType::kInt8); - EXPECT_EQ(u16_type, PrimitiveType::kUint16); - EXPECT_EQ(u32_type, PrimitiveType::kUint32); -} - -TEST(PluginDataHostWriteTest, CreateDataSourceReturnsDistinctHandles) { - Fixture f; - const auto source_a = *f.toolbox.createDataSource("alpha"); - const auto source_b = *f.toolbox.createDataSource("beta"); - EXPECT_NE(source_a.id, source_b.id); -} - -TEST(PluginDataHostWriteTest, EnsureTopicIsIdempotent) { - Fixture f; - const auto source = *f.toolbox.createDataSource("sensor"); - - DatastoreSourceWriteHost source_impl(f.engine, source); - SourceWriteHostView writer(source_impl.raw()); - - const auto topic1 = *writer.ensureTopic("imu"); - const auto topic2 = *writer.ensureTopic("imu"); - 
EXPECT_EQ(topic1.id, topic2.id); -} - -TEST(PluginDataHostWriteTest, EnsureTopicDifferentNamesYieldDifferentIds) { - Fixture f; - const auto source = *f.toolbox.createDataSource("sensor"); - - DatastoreSourceWriteHost source_impl(f.engine, source); - SourceWriteHostView writer(source_impl.raw()); - - const auto topic_a = *writer.ensureTopic("imu"); - const auto topic_b = *writer.ensureTopic("gps"); - EXPECT_NE(topic_a.id, topic_b.id); -} - -TEST(PluginDataHostWriteTest, EnsureFieldIsIdempotentForSameType) { - Fixture f; - const auto source = *f.toolbox.createDataSource("sensor"); - - DatastoreSourceWriteHost source_impl(f.engine, source); - SourceWriteHostView writer(source_impl.raw()); - - const auto topic = *writer.ensureTopic("imu"); - const auto field1 = *writer.ensureField(topic, "ax", PrimitiveType::kFloat64); - const auto field2 = *writer.ensureField(topic, "ax", PrimitiveType::kFloat64); - EXPECT_EQ(field1.id, field2.id); - EXPECT_EQ(field1.topic.id, field2.topic.id); -} - -TEST(PluginDataHostWriteTest, EnsureFieldRejectsTypeMismatchOnExistingField) { - Fixture f; - const auto source = *f.toolbox.createDataSource("sensor"); - - DatastoreSourceWriteHost source_impl(f.engine, source); - SourceWriteHostView writer(source_impl.raw()); - - const auto topic = *writer.ensureTopic("imu"); - ASSERT_TRUE(writer.ensureField(topic, "x", PrimitiveType::kFloat64).has_value()); - const auto result = writer.ensureField(topic, "x", PrimitiveType::kInt32); - EXPECT_FALSE(result.has_value()); -} - -TEST(PluginDataHostWriteTest, AppendRecordRejectsDuplicateFieldNames) { - Fixture f; - const auto source = *f.toolbox.createDataSource("sensor"); - - DatastoreSourceWriteHost source_impl(f.engine, source); - SourceWriteHostView writer(source_impl.raw()); - - const auto topic = *writer.ensureTopic("imu"); - ASSERT_TRUE(writer.ensureField(topic, "x", PrimitiveType::kInt32).has_value()); - - const std::vector fields = { - {.name = "x", .value = int32_t{1}}, - {.name = "x", .value = 
int32_t{2}}, - }; - const auto status = writer.appendRecord(topic, 1, fields); - EXPECT_FALSE(status.has_value()); -} - -TEST(PluginDataHostWriteTest, AppendRecordRejectsInvalidTopicHandle) { - Fixture f; - const auto source = *f.toolbox.createDataSource("sensor"); - - DatastoreSourceWriteHost source_impl(f.engine, source); - SourceWriteHostView writer(source_impl.raw()); - - const TopicHandle bad_topic{.id = 999}; - const std::vector fields = {{.name = "x", .value = int32_t{1}}}; - const auto status = writer.appendRecord(bad_topic, 1, fields); - EXPECT_FALSE(status.has_value()); -} - -TEST(PluginDataHostWriteTest, AppendRecordSparseFieldsProduceNullsForMissingColumns) { - Fixture f; - const auto source = *f.toolbox.createDataSource("sensor"); - - DatastoreSourceWriteHost source_impl(f.engine, source); - SourceWriteHostView writer(source_impl.raw()); - - const auto topic = *writer.ensureTopic("imu"); - const auto field_a = *writer.ensureField(topic, "a", PrimitiveType::kFloat64); - const auto field_b = *writer.ensureField(topic, "b", PrimitiveType::kFloat64); - - // Append only field "a", leaving "b" missing. - const std::vector fields = {{.name = "a", .value = double{3.14}}}; - ASSERT_TRUE(writer.appendRecord(topic, 100, fields).has_value()); - source_impl.flushPending(); - - auto series_b = std::move(*f.toolbox.readSeries(field_b)); - ASSERT_EQ(series_b.timestamps().size(), 1U); - // Validity bit for row 0 should be cleared (null). - ASSERT_GE(series_b.validityBits().size(), 1U); - EXPECT_EQ(series_b.validityBits()[0] & 0x01, 0) << "Expected null for missing field b"; - - // Field "a" should have valid data. 
- auto series_a = std::move(*f.toolbox.readSeries(field_a)); - ASSERT_EQ(series_a.timestamps().size(), 1U); - EXPECT_DOUBLE_EQ(series_a.raw().values.as_float64[0], 3.14); -} - -TEST(PluginDataHostWriteTest, AppendRecordTimestampOnlyRow) { - Fixture f; - const auto source = *f.toolbox.createDataSource("sensor"); - - DatastoreSourceWriteHost source_impl(f.engine, source); - SourceWriteHostView writer(source_impl.raw()); - - const auto topic = *writer.ensureTopic("heartbeat"); - const std::vector fields = {}; - const auto status = writer.appendRecord(topic, 42, fields); - // Just verify it does not crash. Accept either success or error. - (void)status; -} - -TEST(PluginDataHostWriteTest, AppendRecordFastHappyPath) { - Fixture f; - const auto source = *f.toolbox.createDataSource("sensor"); - - DatastoreSourceWriteHost source_impl(f.engine, source); - SourceWriteHostView writer(source_impl.raw()); - - const auto topic = *writer.ensureTopic("imu"); - const auto field = *writer.ensureField(topic, "ax", PrimitiveType::kFloat64); - - const std::vector fields = {{.field = field, .value = double{2.5}}}; - ASSERT_TRUE(writer.appendBoundRecord(topic, 10, fields).has_value()); - source_impl.flushPending(); - - auto series = std::move(*f.toolbox.readSeries(field)); - ASSERT_EQ(series.timestamps().size(), 1U); - EXPECT_EQ(series.timestamps()[0], 10); - EXPECT_DOUBLE_EQ(series.raw().values.as_float64[0], 2.5); -} - -TEST(PluginDataHostWriteTest, AppendRecordFastRejectsWrongTopic) { - Fixture f; - const auto source = *f.toolbox.createDataSource("sensor"); - - DatastoreSourceWriteHost source_impl(f.engine, source); - SourceWriteHostView writer(source_impl.raw()); - - const auto topic_a = *writer.ensureTopic("imu"); - const auto topic_b = *writer.ensureTopic("gps"); - const auto field_a = *writer.ensureField(topic_a, "ax", PrimitiveType::kFloat64); - - // Use field from topic_a with topic_b. 
- const std::vector fields = {{.field = field_a, .value = double{1.0}}}; - const auto status = writer.appendBoundRecord(topic_b, 1, fields); - EXPECT_FALSE(status.has_value()); -} - -TEST(PluginDataHostWriteTest, AppendRecordFastRejectsDuplicateFieldIds) { - Fixture f; - const auto source = *f.toolbox.createDataSource("sensor"); - - DatastoreSourceWriteHost source_impl(f.engine, source); - SourceWriteHostView writer(source_impl.raw()); - - const auto topic = *writer.ensureTopic("imu"); - const auto field = *writer.ensureField(topic, "ax", PrimitiveType::kFloat64); - - const std::vector fields = { - {.field = field, .value = double{1.0}}, - {.field = field, .value = double{2.0}}, - }; - const auto status = writer.appendBoundRecord(topic, 1, fields); - EXPECT_FALSE(status.has_value()); -} - -TEST(PluginDataHostWriteTest, AppendRecordFastRejectsValueTypeMismatch) { - Fixture f; - const auto source = *f.toolbox.createDataSource("sensor"); - - DatastoreSourceWriteHost source_impl(f.engine, source); - SourceWriteHostView writer(source_impl.raw()); - - const auto topic = *writer.ensureTopic("imu"); - const auto field = *writer.ensureField(topic, "ax", PrimitiveType::kFloat64); - - // Pass int32_t value for a kFloat64 field. 
- const std::vector fields = {{.field = field, .value = int32_t{42}}}; - const auto status = writer.appendBoundRecord(topic, 1, fields); - EXPECT_FALSE(status.has_value()); -} - -TEST(PluginDataHostWriteTest, AppendRecordNullValueViaIsNull) { - Fixture f; - const auto source = *f.toolbox.createDataSource("sensor"); - - DatastoreSourceWriteHost source_impl(f.engine, source); - SourceWriteHostView writer(source_impl.raw()); - - const auto topic = *writer.ensureTopic("imu"); - const auto field = *writer.ensureField(topic, "ax", PrimitiveType::kFloat64); - - const std::vector fields = {{.name = "ax", .value = PJ::kNull}}; - ASSERT_TRUE(writer.appendRecord(topic, 10, fields).has_value()); - source_impl.flushPending(); - - auto series = std::move(*f.toolbox.readSeries(field)); - ASSERT_EQ(series.timestamps().size(), 1U); - ASSERT_GE(series.validityBits().size(), 1U); - EXPECT_EQ(series.validityBits()[0] & 0x01, 0) << "Expected null for NullValue field"; -} - -TEST(PluginDataHostWriteTest, AppendRecordFastNullValue) { - Fixture f; - const auto source = *f.toolbox.createDataSource("sensor"); - - DatastoreSourceWriteHost source_impl(f.engine, source); - SourceWriteHostView writer(source_impl.raw()); - - const auto topic = *writer.ensureTopic("imu"); - const auto field = *writer.ensureField(topic, "ax", PrimitiveType::kFloat64); - - const std::vector fields = {{.field = field, .value = PJ::kNull}}; - ASSERT_TRUE(writer.appendBoundRecord(topic, 10, fields).has_value()); - source_impl.flushPending(); - - auto series = std::move(*f.toolbox.readSeries(field)); - ASSERT_EQ(series.timestamps().size(), 1U); - ASSERT_GE(series.validityBits().size(), 1U); - EXPECT_EQ(series.validityBits()[0] & 0x01, 0) << "Expected null for NullValue field via fast path"; -} - -TEST(PluginDataHostWriteTest, ArrowIpcFloat64RoundTrip) { - Fixture f; - const auto source = *f.toolbox.createDataSource("sensor"); - - DatastoreSourceWriteHost source_impl(f.engine, source); - SourceWriteHostView 
writer(source_impl.raw()); - const auto topic = *writer.ensureTopic("data"); - - // Build a struct schema: {_timestamp: INT64, value: DOUBLE} - nanoarrow::UniqueSchema schema; - ASSERT_EQ(ArrowSchemaInitFromType(schema.get(), NANOARROW_TYPE_STRUCT), NANOARROW_OK); - ASSERT_EQ(ArrowSchemaAllocateChildren(schema.get(), 2), NANOARROW_OK); - ArrowSchemaInit(schema->children[0]); - ASSERT_EQ(ArrowSchemaSetName(schema->children[0], "_timestamp"), NANOARROW_OK); - ASSERT_EQ(ArrowSchemaSetType(schema->children[0], NANOARROW_TYPE_INT64), NANOARROW_OK); - ArrowSchemaInit(schema->children[1]); - ASSERT_EQ(ArrowSchemaSetName(schema->children[1], "value"), NANOARROW_OK); - ASSERT_EQ(ArrowSchemaSetType(schema->children[1], NANOARROW_TYPE_DOUBLE), NANOARROW_OK); - - // Append 3 rows: {100,1.5}, {200,2.5}, {300,3.5} - nanoarrow::UniqueArray array; - ASSERT_EQ(ArrowArrayInitFromSchema(array.get(), schema.get(), nullptr), NANOARROW_OK); - ASSERT_EQ(ArrowArrayStartAppending(array.get()), NANOARROW_OK); - const int64_t timestamps[] = {100, 200, 300}; - const double values[] = {1.5, 2.5, 3.5}; - for (int i = 0; i < 3; ++i) { - ASSERT_EQ(ArrowArrayAppendInt(array->children[0], timestamps[i]), NANOARROW_OK); - ASSERT_EQ(ArrowArrayAppendDouble(array->children[1], values[i]), NANOARROW_OK); - ASSERT_EQ(ArrowArrayFinishElement(array.get()), NANOARROW_OK); - } - ASSERT_EQ(ArrowArrayFinishBuildingDefault(array.get(), nullptr), NANOARROW_OK); - - const auto ipc = serializeToIpc(schema.get(), array.get()); - ASSERT_TRUE(writer.appendArrowIpc(topic, Span(ipc.data(), ipc.size())).has_value()); - source_impl.flushPending(); - - // Find the "value" field handle via catalog snapshot. 
- auto snapshot = std::move(*f.toolbox.catalogSnapshot()); - FieldHandle value_field{}; - bool found = false; - for (const auto& fi : snapshot.fields()) { - if (toStringView(fi.name) == "value") { - value_field = fi.handle; - found = true; - break; - } - } - ASSERT_TRUE(found) << "Field 'value' not found in catalog"; - - auto series = std::move(*f.toolbox.readSeries(value_field)); - ASSERT_EQ(series.type(), PrimitiveType::kFloat64); - ASSERT_EQ(series.timestamps().size(), 3U); - for (int i = 0; i < 3; ++i) { - EXPECT_EQ(series.timestamps()[static_cast(i)], timestamps[i]); - EXPECT_DOUBLE_EQ(series.raw().values.as_float64[i], values[i]); - } -} - -TEST(PluginDataHostWriteTest, ArrowIpcCustomTimestampColumnName) { - Fixture f; - const auto source = *f.toolbox.createDataSource("sensor"); - - DatastoreSourceWriteHost source_impl(f.engine, source); - SourceWriteHostView writer(source_impl.raw()); - const auto topic = *writer.ensureTopic("custom_ts"); - - // Build a struct schema: {ts: INT64, val: FLOAT} - nanoarrow::UniqueSchema schema; - ASSERT_EQ(ArrowSchemaInitFromType(schema.get(), NANOARROW_TYPE_STRUCT), NANOARROW_OK); - ASSERT_EQ(ArrowSchemaAllocateChildren(schema.get(), 2), NANOARROW_OK); - ArrowSchemaInit(schema->children[0]); - ASSERT_EQ(ArrowSchemaSetName(schema->children[0], "ts"), NANOARROW_OK); - ASSERT_EQ(ArrowSchemaSetType(schema->children[0], NANOARROW_TYPE_INT64), NANOARROW_OK); - ArrowSchemaInit(schema->children[1]); - ASSERT_EQ(ArrowSchemaSetName(schema->children[1], "val"), NANOARROW_OK); - ASSERT_EQ(ArrowSchemaSetType(schema->children[1], NANOARROW_TYPE_FLOAT), NANOARROW_OK); - - // Append 2 rows. 
- nanoarrow::UniqueArray array; - ASSERT_EQ(ArrowArrayInitFromSchema(array.get(), schema.get(), nullptr), NANOARROW_OK); - ASSERT_EQ(ArrowArrayStartAppending(array.get()), NANOARROW_OK); - ASSERT_EQ(ArrowArrayAppendInt(array->children[0], 10), NANOARROW_OK); - ASSERT_EQ(ArrowArrayAppendDouble(array->children[1], 1.0), NANOARROW_OK); - ASSERT_EQ(ArrowArrayFinishElement(array.get()), NANOARROW_OK); - ASSERT_EQ(ArrowArrayAppendInt(array->children[0], 20), NANOARROW_OK); - ASSERT_EQ(ArrowArrayAppendDouble(array->children[1], 2.0), NANOARROW_OK); - ASSERT_EQ(ArrowArrayFinishElement(array.get()), NANOARROW_OK); - ASSERT_EQ(ArrowArrayFinishBuildingDefault(array.get(), nullptr), NANOARROW_OK); - - const auto ipc = serializeToIpc(schema.get(), array.get()); - ASSERT_TRUE(writer.appendArrowIpc(topic, Span(ipc.data(), ipc.size()), "ts").has_value()); - source_impl.flushPending(); - - // Find the "val" field handle via catalog snapshot. - auto snapshot = std::move(*f.toolbox.catalogSnapshot()); - FieldHandle val_field{}; - bool found = false; - for (const auto& fi : snapshot.fields()) { - if (toStringView(fi.name) == "val") { - val_field = fi.handle; - found = true; - break; - } - } - ASSERT_TRUE(found) << "Field 'val' not found in catalog"; - - auto series = std::move(*f.toolbox.readSeries(val_field)); - ASSERT_EQ(series.type(), PrimitiveType::kFloat32); - ASSERT_EQ(series.timestamps().size(), 2U); - EXPECT_EQ(series.timestamps()[0], 10); - EXPECT_EQ(series.timestamps()[1], 20); - EXPECT_FLOAT_EQ(series.raw().values.as_float32[0], 1.0F); - EXPECT_FLOAT_EQ(series.raw().values.as_float32[1], 2.0F); -} - -TEST(PluginDataHostWriteTest, ArrowIpcRejectsMissingTimestampColumn) { - Fixture f; - const auto source = *f.toolbox.createDataSource("sensor"); - - DatastoreSourceWriteHost source_impl(f.engine, source); - SourceWriteHostView writer(source_impl.raw()); - const auto topic = *writer.ensureTopic("no_ts"); - - // Build a struct schema with only a data column (no "_timestamp"). 
- nanoarrow::UniqueSchema schema; - ASSERT_EQ(ArrowSchemaInitFromType(schema.get(), NANOARROW_TYPE_STRUCT), NANOARROW_OK); - ASSERT_EQ(ArrowSchemaAllocateChildren(schema.get(), 1), NANOARROW_OK); - ArrowSchemaInit(schema->children[0]); - ASSERT_EQ(ArrowSchemaSetName(schema->children[0], "data"), NANOARROW_OK); - ASSERT_EQ(ArrowSchemaSetType(schema->children[0], NANOARROW_TYPE_INT32), NANOARROW_OK); - - nanoarrow::UniqueArray array; - ASSERT_EQ(ArrowArrayInitFromSchema(array.get(), schema.get(), nullptr), NANOARROW_OK); - ASSERT_EQ(ArrowArrayStartAppending(array.get()), NANOARROW_OK); - ASSERT_EQ(ArrowArrayAppendInt(array->children[0], 42), NANOARROW_OK); - ASSERT_EQ(ArrowArrayFinishElement(array.get()), NANOARROW_OK); - ASSERT_EQ(ArrowArrayFinishBuildingDefault(array.get(), nullptr), NANOARROW_OK); - - const auto ipc = serializeToIpc(schema.get(), array.get()); - const auto status = writer.appendArrowIpc(topic, Span(ipc.data(), ipc.size())); - EXPECT_FALSE(status.has_value()) << "Expected error when timestamp column is missing"; -} - -// --------------------------------------------------------------------------- -// Late column addition (schema evolution) -// --------------------------------------------------------------------------- - -TEST(PluginDataHostWriteTest, LateColumnAddition) { - Fixture f; - const auto source = *f.toolbox.createDataSource("sensor"); - - DatastoreSourceWriteHost source_impl(f.engine, source); - SourceWriteHostView writer(source_impl.raw()); - const auto topic = *writer.ensureTopic("json"); - - // Row 1: only field "x" - const std::vector row1 = {{.name = "x", .value = 1.0}}; - ASSERT_TRUE(writer.appendRecord(topic, 10, row1).has_value()); - - // Row 2: "x" plus new field "y" — triggers auto-seal of chunk 1 - const std::vector row2 = { - {.name = "x", .value = 2.0}, - {.name = "y", .value = 3.0}, - }; - ASSERT_TRUE(writer.appendRecord(topic, 20, row2).has_value()); - source_impl.flushPending(); - - // Find field handles via catalog - auto 
snapshot = std::move(*f.toolbox.catalogSnapshot()); - FieldHandle x_field{}, y_field{}; - for (const auto& fi : snapshot.fields()) { - auto name = std::string_view(fi.name.data, fi.name.size); - if (name == "x") { - x_field = fi.handle; - } - if (name == "y") { - y_field = fi.handle; - } - } - - // x should have 2 rows across 2 chunks - auto x_series = std::move(*f.toolbox.readSeries(x_field)); - ASSERT_EQ(x_series.timestamps().size(), 2U); - EXPECT_DOUBLE_EQ(x_series.raw().values.as_float64[0], 1.0); - EXPECT_DOUBLE_EQ(x_series.raw().values.as_float64[1], 2.0); - - // y should have 1 row (only in chunk 2) - auto y_series = std::move(*f.toolbox.readSeries(y_field)); - ASSERT_EQ(y_series.timestamps().size(), 1U); - EXPECT_EQ(y_series.timestamps()[0], 20); - EXPECT_DOUBLE_EQ(y_series.raw().values.as_float64[0], 3.0); -} - -TEST(PluginDataHostWriteTest, UntypedNullForUnknownFieldIsSkipped) { - Fixture f; - const auto source = *f.toolbox.createDataSource("sensor"); - - DatastoreSourceWriteHost source_impl(f.engine, source); - SourceWriteHostView writer(source_impl.raw()); - const auto topic = *writer.ensureTopic("sparse"); - - // Row 1: "x" is non-null, "y" is untyped null (kNull) and never seen → skipped - const std::vector row = { - {.name = "x", .value = 1.0}, - {.name = "y", .value = PJ::kNull}, - }; - ASSERT_TRUE(writer.appendRecord(topic, 10, row).has_value()); - source_impl.flushPending(); - - // Only field "x" should exist - auto snapshot = std::move(*f.toolbox.catalogSnapshot()); - int field_count = 0; - for (const auto& fi : snapshot.fields()) { - auto name = std::string_view(fi.name.data, fi.name.size); - EXPECT_EQ(name, "x") << "unexpected field: " << name; - ++field_count; - } - EXPECT_EQ(field_count, 1); -} - -TEST(PluginDataHostWriteTest, TypedNullForUnknownFieldCreatesColumn) { - Fixture f; - const auto source = *f.toolbox.createDataSource("sensor"); - - DatastoreSourceWriteHost source_impl(f.engine, source); - SourceWriteHostView 
writer(source_impl.raw()); - const auto topic = *writer.ensureTopic("typed"); - - // Row 1: "x" non-null, "y" is a typed null (type known but value absent) - const std::vector row = { - {.name = "x", .value = 1.0}, - {.name = "y", .value = TypedNull{PrimitiveType::kFloat64}}, - }; - ASSERT_TRUE(writer.appendRecord(topic, 10, row).has_value()); - source_impl.flushPending(); - - // Both fields should exist - auto snapshot = std::move(*f.toolbox.catalogSnapshot()); - FieldHandle x_field{}, y_field{}; - int field_count = 0; - for (const auto& fi : snapshot.fields()) { - auto name = std::string_view(fi.name.data, fi.name.size); - if (name == "x") { - x_field = fi.handle; - } - if (name == "y") { - y_field = fi.handle; - } - ++field_count; - } - EXPECT_EQ(field_count, 2); - - // y should have 1 row with null value - auto y_series = std::move(*f.toolbox.readSeries(y_field)); - ASSERT_EQ(y_series.timestamps().size(), 1U); - ASSERT_GE(y_series.validityBits().size(), 1U); - EXPECT_EQ(y_series.validityBits()[0] & 0x01, 0) << "Expected null for TypedNull field"; -} - -TEST(PluginDataHostWriteTest, VariableLengthArraySimulation) { - Fixture f; - const auto source = *f.toolbox.createDataSource("sensor"); - - DatastoreSourceWriteHost source_impl(f.engine, source); - SourceWriteHostView writer(source_impl.raw()); - const auto topic = *writer.ensureTopic("varlen"); - - // Row 1: 2-element array - const std::vector row1 = { - {.name = "data[0]", .value = 10.0}, - {.name = "data[1]", .value = 20.0}, - }; - ASSERT_TRUE(writer.appendRecord(topic, 100, row1).has_value()); - - // Row 2: 4-element array — data[2] and data[3] are new columns - const std::vector row2 = { - {.name = "data[0]", .value = 11.0}, - {.name = "data[1]", .value = 21.0}, - {.name = "data[2]", .value = 31.0}, - {.name = "data[3]", .value = 41.0}, - }; - ASSERT_TRUE(writer.appendRecord(topic, 200, row2).has_value()); - source_impl.flushPending(); - - // Find field handles - auto snapshot = 
std::move(*f.toolbox.catalogSnapshot()); - FieldHandle d0{}, d2{}; - for (const auto& fi : snapshot.fields()) { - auto name = std::string_view(fi.name.data, fi.name.size); - if (name == "data[0]") { - d0 = fi.handle; - } - if (name == "data[2]") { - d2 = fi.handle; - } - } - - // data[0] should have 2 rows across 2 chunks - auto s0 = std::move(*f.toolbox.readSeries(d0)); - ASSERT_EQ(s0.timestamps().size(), 2U); - EXPECT_DOUBLE_EQ(s0.raw().values.as_float64[0], 10.0); - EXPECT_DOUBLE_EQ(s0.raw().values.as_float64[1], 11.0); - - // data[2] should have 1 row (only in chunk 2) - auto s2 = std::move(*f.toolbox.readSeries(d2)); - ASSERT_EQ(s2.timestamps().size(), 1U); - EXPECT_EQ(s2.timestamps()[0], 200); - EXPECT_DOUBLE_EQ(s2.raw().values.as_float64[0], 31.0); -} - -} // namespace -} // namespace PJ diff --git a/pj_datastore/tests/plugin_parser_object_write_test.cpp b/pj_datastore/tests/plugin_parser_object_write_test.cpp deleted file mode 100644 index 19068efe..00000000 --- a/pj_datastore/tests/plugin_parser_object_write_test.cpp +++ /dev/null @@ -1,221 +0,0 @@ -// Copyright 2026 Davide Faconti -// SPDX-License-Identifier: MPL-2.0 - -// Phase 3 — verify that a parser can resolve both the scalar and -// object write hosts from the service registry and write to each from -// a single parse() call. Exercises the service-registry composition -// path without the host-side delegated-ingest wiring (that lives in -// pj_plugins and lands with the MCAP port). 
- -#include - -#include -#include -#include -#include -#include - -#include "pj_base/sdk/plugin_data_api.hpp" -#include "pj_base/sdk/service_registry.hpp" -#include "pj_datastore/engine.hpp" -#include "pj_datastore/object_store.hpp" -#include "pj_datastore/plugin_data_host.hpp" -#include "pj_plugins/sdk/message_parser_plugin_base.hpp" - -namespace PJ { -namespace { - -using sdk::ObjectBytes; -using sdk::ObjectTopicHandle; -using sdk::ParserObjectWriteHostService; -using sdk::ParserObjectWriteHostView; -using sdk::ParserWriteHostService; - -/// A mock parser that expects both hosts. parse() peels a trivial -/// "seq:;bytes:" envelope and writes seq to the scalar host -/// and the raw bytes to the object host. -class MediaParser : public MessageParserPluginBase { - public: - Status parse(Timestamp timestamp_ns, Span payload) override { - // Envelope: first 8 bytes little-endian seq; rest = bytes. - if (payload.size() < sizeof(uint64_t)) { - return unexpected("payload too small"); - } - uint64_t seq = 0; - std::memcpy(&seq, payload.data(), sizeof(uint64_t)); - Span body(payload.data() + sizeof(uint64_t), payload.size() - sizeof(uint64_t)); - - // 1. Scalar side — always required. - const std::vector fields = {{.name = "seq", .value = static_cast(seq)}}; - if (auto s = writeHost().appendRecord(timestamp_ns, fields); !s) { - return s; - } - - // 2. Object side — only if the host registered it. - if (auto* obj = objectWriteHost()) { - if (auto s = obj->pushOwned(timestamp_ns, body); !s) { - return s; - } - } - return okStatus(); - } -}; - -// Minimal implementation of PJ_service_registry_vtable_t for tests. -// Stores a static map of service name -> PJ_service_t fat pointer. 
-struct MockRegistryState { - std::unordered_map services; -}; - -bool mockGetService( - void* ctx, PJ_string_view_t name, uint32_t /*min_version*/, PJ_service_t* out_service, - PJ_error_t* out_error) noexcept { - auto* state = static_cast(ctx); - try { - std::string key(name.data, name.size); - auto it = state->services.find(key); - if (it == state->services.end()) { - if (out_error != nullptr) { - sdk::fillError(out_error, 1, "registry", "service not found"); - } - return false; - } - *out_service = it->second; - return true; - } catch (...) { - if (out_error != nullptr) { - sdk::fillError(out_error, 1, "registry", "exception in lookup"); - } - return false; - } -} - -TEST(ParserObjectWriteHostTest, ParserWritesToBothHostsFromOneParse) { - // Host setup: one scalar topic + one object topic. - DataEngine engine; - auto dataset_or = engine.createDataset(DatasetDescriptor{.source_name = "t", .time_domain_id = 0}); - ASSERT_TRUE(dataset_or.has_value()) << dataset_or.error(); - PJ_data_source_handle_t source_handle{static_cast(*dataset_or)}; - - // Scalar: ensure topic + DatastoreParserWriteHost bound to it. - DatastoreSourceWriteHost scalar_impl(engine, source_handle); - auto scalar_view = sdk::SourceWriteHostView{scalar_impl.raw()}; - const auto topic = *scalar_view.ensureTopic("media_topic"); - DatastoreParserWriteHost parser_write_impl(engine, topic); - - // Object: register topic in ObjectStore; bind DatastoreParserObjectWriteHost. - ObjectStore store; - DatastoreSourceObjectWriteHost obj_source(store, *dataset_or); - const auto obj_topic = - *sdk::SourceObjectWriteHostView{obj_source.raw()}.registerTopic("media_topic", R"({"media_class":"image"})"); - DatastoreParserObjectWriteHost parser_obj_impl(store, obj_topic.id); - - // Build the registry with both services. 
- MockRegistryState registry_state; - const auto scalar_raw = parser_write_impl.raw(); - const auto obj_raw = parser_obj_impl.raw(); - registry_state.services[ParserWriteHostService::kName] = PJ_service_t{scalar_raw.ctx, scalar_raw.vtable}; - registry_state.services[ParserObjectWriteHostService::kName] = PJ_service_t{obj_raw.ctx, obj_raw.vtable}; - - static const PJ_service_registry_vtable_t registry_vtable = { - PJ_PLUGIN_DATA_API_VERSION, - sizeof(PJ_service_registry_vtable_t), - mockGetService, - }; - const PJ_service_registry_t registry_raw{&registry_state, &registry_vtable}; - - // Bind the parser through the SDK. - MediaParser parser; - ASSERT_TRUE(parser.bind(sdk::ServiceRegistry{registry_raw}).has_value()); - - // parse() one message: seq=7, payload=[0xAA 0xBB 0xCC]. - std::vector payload(sizeof(uint64_t) + 3); - uint64_t seq = 7; - std::memcpy(payload.data(), &seq, sizeof(uint64_t)); - payload[sizeof(uint64_t) + 0] = 0xAA; - payload[sizeof(uint64_t) + 1] = 0xBB; - payload[sizeof(uint64_t) + 2] = 0xCC; - - ASSERT_TRUE(parser.parse(100, Span(payload.data(), payload.size())).has_value()); - - // Object-store side: bytes landed. - auto resolved = store.latestAt(ObjectTopicId{obj_topic.id}, 100); - ASSERT_TRUE(resolved.has_value()); - ASSERT_NE(resolved->payload.anchor, nullptr); - const std::vector expected{0xAA, 0xBB, 0xCC}; - EXPECT_TRUE( - std::equal(resolved->payload.bytes.begin(), resolved->payload.bytes.end(), expected.begin(), expected.end())); - - // (Scalar side requires flushing + a read path; Phase-3 scope is proving - // both hosts were resolved and invoked. Scalar writes go into DataEngine - // and are covered by plugin_host_write_test's existing scalar tests.) 
-} - -TEST(ParserObjectWriteHostTest, ParserFallsBackToScalarOnlyWhenObjectServiceAbsent) { - DataEngine engine; - auto dataset_or = engine.createDataset(DatasetDescriptor{.source_name = "t", .time_domain_id = 0}); - ASSERT_TRUE(dataset_or.has_value()) << dataset_or.error(); - PJ_data_source_handle_t source_handle{static_cast(*dataset_or)}; - - DatastoreSourceWriteHost scalar_impl(engine, source_handle); - auto scalar_view = sdk::SourceWriteHostView{scalar_impl.raw()}; - const auto topic = *scalar_view.ensureTopic("scalar_only"); - DatastoreParserWriteHost parser_write_impl(engine, topic); - - MockRegistryState registry_state; - const auto scalar_raw = parser_write_impl.raw(); - registry_state.services[ParserWriteHostService::kName] = PJ_service_t{scalar_raw.ctx, scalar_raw.vtable}; - // Note: no ParserObjectWriteHostService registered. - - static const PJ_service_registry_vtable_t registry_vtable = { - PJ_PLUGIN_DATA_API_VERSION, - sizeof(PJ_service_registry_vtable_t), - mockGetService, - }; - const PJ_service_registry_t registry_raw{&registry_state, &registry_vtable}; - - MediaParser parser; - ASSERT_TRUE(parser.bind(sdk::ServiceRegistry{registry_raw}).has_value()); - - // The parser's view into the object host is empty — it's the scalar-only - // path. parse() should take the non-media branch and still succeed. - std::vector payload(sizeof(uint64_t)); - uint64_t seq = 1; - std::memcpy(payload.data(), &seq, sizeof(uint64_t)); - ASSERT_TRUE(parser.parse(1, Span(payload.data(), payload.size())).has_value()); -} - -TEST(ParserObjectWriteHostTest, ObjectHostViewPushLazyThroughSdk) { - // Exercises the SDK pushLazy(Fetch&&) path for parsers — proves the - // heap-allocated LazyBox box is wired through the parser vtable. 
- ObjectStore store; - DatastoreSourceObjectWriteHost src(store, DatasetId{1}); - const auto topic = *sdk::SourceObjectWriteHostView{src.raw()}.registerTopic("lazy", "{}"); - - DatastoreParserObjectWriteHost parser_obj(store, topic.id); - ParserObjectWriteHostView view{parser_obj.raw()}; - - int fetch_calls = 0; - auto fetch = [&fetch_calls]() -> std::vector { - ++fetch_calls; - return {0xAA, 0xBB}; - }; - ASSERT_TRUE(view.pushLazy(10, fetch).has_value()); - - auto resolved = store.latestAt(ObjectTopicId{topic.id}, 10); - ASSERT_TRUE(resolved.has_value()); - const std::vector expected{0xAA, 0xBB}; - EXPECT_TRUE( - std::equal(resolved->payload.bytes.begin(), resolved->payload.bytes.end(), expected.begin(), expected.end())); - EXPECT_GE(fetch_calls, 1); -} - -TEST(ParserObjectWriteHostTest, UnboundViewReturnsError) { - ParserObjectWriteHostView empty; - EXPECT_FALSE(empty.valid()); - auto status = empty.pushOwned(0, {}); - EXPECT_FALSE(status.has_value()); -} - -} // namespace -} // namespace PJ diff --git a/pj_datastore/tests/query_test.cpp b/pj_datastore/tests/query_test.cpp deleted file mode 100644 index 62360f15..00000000 --- a/pj_datastore/tests/query_test.cpp +++ /dev/null @@ -1,300 +0,0 @@ -// Copyright 2026 Davide Faconti -// SPDX-License-Identifier: MPL-2.0 - -#include "pj_datastore/query.hpp" - -#include - -#include -#include -#include - -#include "pj_base/types.hpp" -#include "pj_datastore/chunk.hpp" - -namespace PJ { -namespace { - -// Helper: build a test chunk with sequential timestamps. 
-TopicChunk make_test_chunk(Timestamp t_start, uint32_t num_rows, Timestamp step) { - std::vector cols = {{0, PrimitiveType::kFloat32, "value"}}; - TopicChunkBuilder builder(1, 1, cols, num_rows); - for (uint32_t i = 0; i < num_rows; ++i) { - Timestamp t = t_start + static_cast(i) * step; - builder.beginRow(t); - builder.set(0, static_cast(i) * 1.0f); - builder.finishRow(); - } - return builder.seal(); -} - -// Helper: build a chunk from an explicit (possibly non-uniform / duplicated) -// timestamp list. The column value equals the row index, so a returned -// row_index can be cross-checked against value. -TopicChunk make_chunk_from_timestamps(const std::vector& ts) { - std::vector cols = {{0, PrimitiveType::kFloat32, "value"}}; - TopicChunkBuilder builder(1, 1, cols, static_cast(ts.size())); - for (std::size_t i = 0; i < ts.size(); ++i) { - builder.beginRow(ts[i]); - builder.set(0, static_cast(i)); - builder.finishRow(); - } - return builder.seal(); -} - -// Build the standard 5-chunk test fixture: -// Chunk 0: t=[0, 90], step=10 -// Chunk 1: t=[100, 190], step=10 -// Chunk 2: t=[200, 290], step=10 -// Chunk 3: t=[300, 390], step=10 -// Chunk 4: t=[400, 490], step=10 -std::deque make_standard_chunks() { - std::deque chunks; - for (int i = 0; i < 5; ++i) { - chunks.push_back(make_test_chunk(static_cast(i) * 100, 10, 10)); - } - return chunks; -} - -// ========================================================================= -// Range query tests -// ========================================================================= - -TEST(QueryTest, RangeQuerySpanningTwoChunks) { - auto chunks = make_standard_chunks(); - auto cursor = rangeQuery(chunks, 150, 250); - - std::vector timestamps; - while (cursor.valid()) { - timestamps.push_back(cursor.current().timestamp); - cursor.advance(); - } - - // Expected: 150, 160, 170, 180, 190, 200, 210, 220, 230, 240, 250 - ASSERT_EQ(timestamps.size(), 11u); - EXPECT_EQ(timestamps.front(), 150); - EXPECT_EQ(timestamps.back(), 250); - 
for (std::size_t i = 1; i < timestamps.size(); ++i) { - EXPECT_EQ(timestamps[i] - timestamps[i - 1], 10); - } -} - -TEST(QueryTest, RangeQueryWithinSingleChunk) { - auto chunks = make_standard_chunks(); - auto cursor = rangeQuery(chunks, 100, 190); - - std::size_t count = 0; - while (cursor.valid()) { - count++; - cursor.advance(); - } - EXPECT_EQ(count, 10u); -} - -TEST(QueryTest, RangeQueryHittingNoChunks) { - auto chunks = make_standard_chunks(); - auto cursor = rangeQuery(chunks, 500, 600); - EXPECT_FALSE(cursor.valid()); -} - -TEST(QueryTest, RangeQueryAllData) { - auto chunks = make_standard_chunks(); - auto cursor = rangeQuery(chunks, 0, 490); - - std::size_t count = 0; - while (cursor.valid()) { - count++; - cursor.advance(); - } - EXPECT_EQ(count, 50u); -} - -TEST(QueryTest, RangeQueryExactChunkBoundary) { - auto chunks = make_standard_chunks(); - // query [100, 199] should return only samples from chunk 1: 100..190 - auto cursor = rangeQuery(chunks, 100, 199); - - std::vector timestamps; - while (cursor.valid()) { - timestamps.push_back(cursor.current().timestamp); - cursor.advance(); - } - - // Chunk 1 has t = 100, 110, ..., 190. All 10 are in [100, 199]. 
- ASSERT_EQ(timestamps.size(), 10u); - EXPECT_EQ(timestamps.front(), 100); - EXPECT_EQ(timestamps.back(), 190); -} - -TEST(QueryTest, ForEachCallback) { - auto chunks = make_standard_chunks(); - auto cursor = rangeQuery(chunks, 200, 390); - - std::size_t count = 0; - cursor.forEach([&count](const SampleRow& /*row*/) { ++count; }); - EXPECT_EQ(count, 20u); // chunks 2 and 3, 10 rows each -} - -// ========================================================================= -// latest_at tests -// ========================================================================= - -TEST(QueryTest, LatestAtInMiddleOfChunk) { - auto chunks = make_standard_chunks(); - auto result = latestAt(chunks, 155); - ASSERT_TRUE(result.has_value()); - EXPECT_EQ(result->timestamp, 150); -} - -TEST(QueryTest, LatestAtExactTimestamp) { - auto chunks = make_standard_chunks(); - auto result = latestAt(chunks, 200); - ASSERT_TRUE(result.has_value()); - EXPECT_EQ(result->timestamp, 200); -} - -TEST(QueryTest, LatestAtBeforeAllData) { - auto chunks = make_standard_chunks(); - auto result = latestAt(chunks, -10); - EXPECT_FALSE(result.has_value()); -} - -TEST(QueryTest, LatestAtAfterAllData) { - auto chunks = make_standard_chunks(); - auto result = latestAt(chunks, 1000); - ASSERT_TRUE(result.has_value()); - EXPECT_EQ(result->timestamp, 490); -} - -TEST(QueryTest, LatestAtBetweenChunks) { - auto chunks = make_standard_chunks(); - // t=95 is between chunk 0 (t_max=90) and chunk 1 (t_min=100) - auto result = latestAt(chunks, 95); - ASSERT_TRUE(result.has_value()); - EXPECT_EQ(result->timestamp, 90); -} - -// ========================================================================= -// Binary-search edge cases (duplicate timestamps, shared chunk boundaries) -// ========================================================================= - -TEST(QueryTest, LatestAtWithDuplicateTimestampsReturnsLastDuplicate) { - std::deque chunks; - // Rows: 0 1 2 3 4 - chunks.push_back(make_chunk_from_timestamps({10, 20, 20, 
20, 30})); - - auto result = latestAt(chunks, 20); - ASSERT_TRUE(result.has_value()); - EXPECT_EQ(result->timestamp, 20); - // upper_bound semantics: the last row with ts <= 20 is row index 3. - EXPECT_EQ(result->row_index, 3u); -} - -TEST(QueryTest, RangeQueryWithDuplicateTimestampsStartsAtFirstDuplicate) { - std::deque chunks; - // Rows: 0 1 2 3 4 - chunks.push_back(make_chunk_from_timestamps({10, 20, 20, 20, 30})); - - auto cursor = rangeQuery(chunks, 20, 20); - std::vector rows; - cursor.forEach([&](const SampleRow& row) { rows.push_back(row.row_index); }); - - // lower_bound semantics: starts at the first ts >= 20 (row 1) and includes - // every row with ts <= 20 (rows 1, 2, 3). - ASSERT_EQ(rows.size(), 3u); - EXPECT_EQ(rows.front(), 1u); - EXPECT_EQ(rows.back(), 3u); -} - -TEST(QueryTest, LatestAtAtSharedChunkBoundarySelectsLaterChunk) { - std::deque chunks; - chunks.push_back(make_chunk_from_timestamps({70, 80, 90})); // chunk A, t_max=90 - chunks.push_back(make_chunk_from_timestamps({90, 100, 110})); // chunk B, t_min=90 - - auto result = latestAt(chunks, 90); - ASSERT_TRUE(result.has_value()); - EXPECT_EQ(result->timestamp, 90); - // The boundary value 90 exists in both chunks; the later chunk (B, row 0) wins. - EXPECT_EQ(result->chunk, &chunks[1]); - EXPECT_EQ(result->row_index, 0u); -} - -TEST(QueryTest, RangeQuerySingleTimestampPoint) { - auto chunks = make_standard_chunks(); - // Degenerate inclusive range [200, 200] hits exactly one row. 
- auto cursor = rangeQuery(chunks, 200, 200); - std::vector timestamps; - cursor.forEach([&](const SampleRow& row) { timestamps.push_back(row.timestamp); }); - ASSERT_EQ(timestamps.size(), 1u); - EXPECT_EQ(timestamps.front(), 200); -} - -// ========================================================================= -// Empty deque tests -// ========================================================================= - -TEST(QueryTest, EmptyDequeRangeQuery) { - std::deque empty; - auto cursor = rangeQuery(empty, 0, 100); - EXPECT_FALSE(cursor.valid()); -} - -TEST(QueryTest, EmptyDequeLatestAt) { - std::deque empty; - auto result = latestAt(empty, 50); - EXPECT_FALSE(result.has_value()); -} - -// ========================================================================= -// for_each_chunk tests -// ========================================================================= - -TEST(QueryTest, ForEachChunkMatchesForEach) { - auto chunks = make_standard_chunks(); - - // Collect per-row results via for_each - auto cursor1 = rangeQuery(chunks, 150, 350); - std::vector per_row_ts; - cursor1.forEach([&](const SampleRow& row) { per_row_ts.push_back(row.timestamp); }); - - // Collect per-chunk results via for_each_chunk - auto cursor2 = rangeQuery(chunks, 150, 350); - std::vector chunk_ts; - cursor2.forEachChunk([&](const ChunkRowRange& range) { - for (std::size_t r = range.row_start; r < range.row_end; ++r) { - chunk_ts.push_back(range.chunk->readTimestamp(r)); - } - }); - - ASSERT_EQ(per_row_ts.size(), chunk_ts.size()); - for (std::size_t i = 0; i < per_row_ts.size(); ++i) { - EXPECT_EQ(per_row_ts[i], chunk_ts[i]) << "mismatch at index " << i; - } -} - -TEST(QueryTest, ForEachChunkAllData) { - auto chunks = make_standard_chunks(); - auto cursor = rangeQuery(chunks, 0, 490); - - std::size_t total_rows = 0; - std::size_t chunk_count = 0; - cursor.forEachChunk([&](const ChunkRowRange& range) { - ++chunk_count; - total_rows += range.row_end - range.row_start; - }); - - 
EXPECT_EQ(total_rows, 50u); - EXPECT_EQ(chunk_count, 5u); -} - -TEST(QueryTest, ForEachChunkNoResults) { - auto chunks = make_standard_chunks(); - auto cursor = rangeQuery(chunks, 500, 600); - - std::size_t count = 0; - cursor.forEachChunk([&](const ChunkRowRange& /*range*/) { ++count; }); - EXPECT_EQ(count, 0u); -} - -} // namespace -} // namespace PJ diff --git a/pj_datastore/tests/regression_test.cpp b/pj_datastore/tests/regression_test.cpp deleted file mode 100644 index 3fa5e878..00000000 --- a/pj_datastore/tests/regression_test.cpp +++ /dev/null @@ -1,202 +0,0 @@ -// Copyright 2026 Davide Faconti -// SPDX-License-Identifier: MPL-2.0 - -// Regression tests for bugs found during code review. -// -// Each test FAILS against the current (buggy) code and PASSES once the -// corresponding fix is applied. Run with ./build.sh --debug && ./test.sh -// (ASAN is required for Bug #2). - -#include - -#include -#include - -#include "pj_base/type_tree.hpp" -#include "pj_datastore/chunk.hpp" -#include "pj_datastore/engine.hpp" -#include "pj_datastore/topic_storage.hpp" -#include "pj_datastore/writer.hpp" - -namespace PJ { -namespace { - -// --------------------------------------------------------------------------- -// Helper: build and seal a test chunk with given time range. -// Reuses the same pattern as topic_storage_test.cpp::make_test_chunk. -// --------------------------------------------------------------------------- - -TopicChunk makeChunkWithRange(TopicId tid, Timestamp t_start, Timestamp t_end, uint32_t num_rows) { - std::vector cols = {{0, PrimitiveType::kFloat32, "v"}}; - TopicChunkBuilder b(tid, /*schema_id=*/1, cols, num_rows); - Timestamp step = (num_rows > 1) ? 
(t_end - t_start) / static_cast(num_rows - 1) : 0; - for (uint32_t i = 0; i < num_rows; ++i) { - b.beginRow(t_start + static_cast(i) * step); - b.set(0, static_cast(i)); - b.finishRow(); - } - return b.seal(); -} - -// =========================================================================== -// Bug #1 — flush() with an in-progress row corrupts chunk stats -// -// writer.cpp:512 flush() checks rowCount() > 0 but not isRowInProgress(). -// chunk.cpp:77-78 beginRow() immediately updates stats_.t_min / stats_.t_max. -// -// When flush() is called after beginRow(200) but before finishRow(), the -// sealed chunk has stats_.t_max = 200 even though only the row at t = 100 -// was ever committed to timestamps_. -// =========================================================================== - -TEST(RegressionTest, Bug1_FlushWithRowInProgress_CorruptsChunkStats) { - DataEngine engine; - auto ds = *engine.createDataset(DatasetDescriptor{.source_name = "test", .time_domain_id = 0}); - DataWriter writer = engine.createWriter(); - - auto schema = makePrimitive("v", PrimitiveType::kFloat64); - auto sid = *writer.registerSchema("s", schema); - auto tid = *writer.registerTopic(ds, TopicDescriptor{.name = "t", .schema_id = sid}); - - // Commit one complete row at t=100. - ASSERT_TRUE(writer.beginRow(tid, 100).has_value()); - writer.set(tid, 0, 1.0); - ASSERT_TRUE(writer.finishRow(tid).has_value()); - - // Start a second row at t=200 but do NOT call finishRow(). - ASSERT_TRUE(writer.beginRow(tid, 200).has_value()); - writer.set(tid, 0, 2.0); - - // flush() should only seal the one committed row. - auto chunks = writer.flush(tid); - - ASSERT_EQ(chunks.size(), 1u); - EXPECT_EQ(chunks[0].stats.row_count, 1u); - // BUG: currently returns 200 — the in-progress row's timestamp leaked into stats_. 
- EXPECT_EQ(chunks[0].stats.t_max, 100); - EXPECT_EQ(chunks[0].timestamps.size(), 1u); -} - -// =========================================================================== -// Bug #2 — finishBulkAppend() underflows when a column has fewer rows -// than bulk_pending_rows_ -// -// chunk.cpp:249 -// const std::size_t first_row = columns_[col].rowCount() - count; -// -// If the caller appends N timestamps but only N-1 values to a column, -// rowCount() - count wraps to SIZE_MAX. The statistics loop then reads -// buf[SIZE_MAX], which ASAN catches as an out-of-bounds access. -// -// The test uses EXPECT_DEATH; it passes both before the fix (ASAN OOB) and -// after the fix (PJ_ASSERT). Its purpose is to document that this scenario -// must always fail rather than silently corrupting stats. -// =========================================================================== - -TEST(RegressionTest, Bug2_FinishBulkAppend_ColumnRowCountMismatch_TriggersUB) { - // Declare test data outside the EXPECT_DEATH block: the C preprocessor does - // not track {} when splitting macro arguments, so vector-initializer commas - // inside the block would be misinterpreted as extra macro arguments. - std::vector cols = {{1, PrimitiveType::kFloat32, "x"}}; - std::vector ts; - ts.push_back(100); - ts.push_back(200); - ts.push_back(300); - std::vector vals; - vals.push_back(1.0f); - vals.push_back(2.0f); // one short: 2 values for 3 timestamps - - // PJ_ASSERT behaviour differs by build configuration: - // Debug/ASAN (PJ_ASSERT_THROWS=ON, no NDEBUG): throws std::runtime_error - // RelWithDebInfo (no PJ_ASSERT_THROWS, NDEBUG): assert() compiled away → silent UB - // The assertion is only verifiable in debug builds. 
-#ifdef PJ_ASSERT_THROWS - EXPECT_THROW( - { - TopicChunkBuilder builder(/*topic_id=*/1, /*schema_id=*/1, cols, /*max_rows=*/100); - builder.appendTimestamps(ts); - builder.appendColumn(0, vals); // 2 rows appended, pending=3 - builder.finishBulkAppend(); // PJ_ASSERT fires → throws - }, - std::exception); -#else - // RelWithDebInfo: NDEBUG disables assert(), so the check cannot be observed at - // this build level. Debug builds provide the authoritative verification. - GTEST_SKIP() << "Bug #2 assertion not verifiable in Release (NDEBUG disables assert())"; -#endif -} - -// =========================================================================== -// Bug #3 — commitChunks() includes the rejected topic in the returned -// 'changed' set even when appendSealedChunk fails -// -// engine.cpp:141-145 -// PJ_ASSERT(status.has_value(), ...); // no-op in Release -// (void)status; -// changed.push_back(topic_id); // unconditional! -// -// When an out-of-order chunk is rejected by appendSealedChunk, the chunk is -// discarded but the topic is still added to 'changed'. DerivedEngine then -// marks dependent nodes dirty and schedules a spurious incremental run. -// =========================================================================== - -TEST(RegressionTest, Bug3_CommitChunks_ReportsChangedTopicOnRejectedChunk) { - DataEngine engine; - auto ds = *engine.createDataset(DatasetDescriptor{.source_name = "test", .time_domain_id = 0}); - DataWriter writer = engine.createWriter(); - - auto schema = makePrimitive("v", PrimitiveType::kFloat32); - auto sid = *writer.registerSchema("s", schema); - auto tid = *writer.registerTopic(ds, TopicDescriptor{.name = "t", .schema_id = sid}); - - // First commit: chunk at t=[100, 200]. Accepted. 
- std::vector> batch1; - batch1.emplace_back(tid, makeChunkWithRange(tid, 100, 200, 2)); - auto changed1 = engine.commitChunks(std::move(batch1)); - ASSERT_EQ(changed1.size(), 1u); - - // Second commit: chunk at t=[50, 150] — out of order (t_min=50 < last t_min=100). - // appendSealedChunk rejects it. commitChunks should return an empty changed list - // without throwing. - // - Debug (PJ_ASSERT_THROWS): currently throws std::runtime_error. - // - Release: currently returns {tid} (ASSERT is a no-op; push_back is unconditional). - // Both behaviours are bugs. The correct behaviour is: return empty, no throw. - std::vector> batch2; - batch2.emplace_back(tid, makeChunkWithRange(tid, 50, 150, 2)); - std::vector changed2; - ASSERT_NO_THROW(changed2 = engine.commitChunks(std::move(batch2))); - EXPECT_TRUE(changed2.empty()); -} - -// =========================================================================== -// Bug #4 — appendSealedChunk accepts overlapping time ranges -// -// topic_storage.cpp:15 -// if (!sealed_chunks_.empty() && chunk.stats.t_min < sealed_chunks_.back().stats.t_min) -// -// The guard only checks new.t_min < last.t_min. A chunk whose t_min falls -// inside the previous chunk's [t_min, t_max] passes the check even though -// it creates a temporal overlap. This violates the non-overlapping invariant -// assumed by latestAt() and RangeCursor. -// -// Example: Chunk1=[100,500], Chunk2=[400,600]. 400 >= 100 → accepted (BUG). -// =========================================================================== - -TEST(RegressionTest, Bug4_AppendSealedChunk_AcceptsOverlappingTimeRange) { - TopicDescriptor desc; - desc.name = "t"; - desc.schema_id = 1; - desc.dataset_id = 1; - TopicStorage storage(/*topic_id=*/1, std::move(desc)); - - // Chunk1: t=[100, 500]. - ASSERT_TRUE(storage.appendSealedChunk(makeChunkWithRange(1, 100, 500, 5)).has_value()); - - // Chunk2: t=[400, 600] — overlaps Chunk1 in [400, 500]. Should be rejected. 
- auto result = storage.appendSealedChunk(makeChunkWithRange(1, 400, 600, 3)); - // BUG: currently has_value() == true (overlap silently accepted). - EXPECT_FALSE(result.has_value()); -} - -} // namespace -} // namespace PJ diff --git a/pj_datastore/tests/series_reader_test.cpp b/pj_datastore/tests/series_reader_test.cpp deleted file mode 100644 index be4a1514..00000000 --- a/pj_datastore/tests/series_reader_test.cpp +++ /dev/null @@ -1,216 +0,0 @@ -// Copyright 2026 Davide Faconti -// SPDX-License-Identifier: MPL-2.0 - -#include - -#include -#include -#include - -#include "pj_base/type_tree.hpp" -#include "pj_datastore/engine.hpp" -#include "pj_datastore/reader.hpp" -#include "pj_datastore/writer.hpp" - -namespace PJ { -namespace { - -class SeriesReaderTest : public ::testing::Test { - protected: - void SetUp() override { - auto dataset_or = engine_.createDataset(DatasetDescriptor{.source_name = "series"}); - ASSERT_TRUE(dataset_or.has_value()) << dataset_or.error(); - dataset_id_ = *dataset_or; - - DataWriter writer = engine_.createWriter(); - auto schema_or = writer.registerSchema( - "row", makeStruct( - "row", { - makePrimitive("dense", PrimitiveType::kFloat64), - makePrimitive("sparse", PrimitiveType::kFloat64), - makePrimitive("text", PrimitiveType::kString), - makePrimitive("flag", PrimitiveType::kBool), - makePrimitive("all_null", PrimitiveType::kFloat64), - })); - ASSERT_TRUE(schema_or.has_value()) << schema_or.error(); - - TopicDescriptor descriptor; - descriptor.name = "/topic"; - descriptor.schema_id = *schema_or; - descriptor.max_chunk_rows = 3; - auto topic_or = writer.registerTopic(dataset_id_, descriptor); - ASSERT_TRUE(topic_or.has_value()) << topic_or.error(); - topic_id_ = *topic_or; - - for (int i = 0; i < 6; ++i) { - ASSERT_TRUE(writer.beginRow(topic_id_, static_cast(i * 10)).has_value()); - writer.set(topic_id_, 0, static_cast(i)); - if (i == 0) { - writer.set(topic_id_, 1, 10.0); - } else if (i == 2) { - writer.set(topic_id_, 1, 20.0); - } 
else if (i == 5) { - writer.set(topic_id_, 1, -5.0); - } else { - writer.setNull(topic_id_, 1); - } - writer.set(topic_id_, 2, std::string_view("text")); - if (i == 1 || i == 4) { - writer.set(topic_id_, 3, i == 4); - } else { - writer.setNull(topic_id_, 3); - } - writer.setNull(topic_id_, 4); - ASSERT_TRUE(writer.finishRow(topic_id_).has_value()); - } - - const auto changed = engine_.commitChunks(writer.flushAll()); - ASSERT_FALSE(changed.empty()); - } - - DataEngine engine_; - DatasetId dataset_id_ = 0; - TopicId topic_id_ = 0; -}; - -TEST_F(SeriesReaderTest, SparseSeriesExposesOnlyValueBearingSamples) { - DataReader reader = engine_.createReader(); - auto series_or = reader.series(topic_id_, 1); - ASSERT_TRUE(series_or.has_value()) << series_or.error(); - const SeriesReader series = *series_or; - - EXPECT_EQ(series.size(), 3U); - EXPECT_FALSE(series.empty()); - - const auto first = series.sampleAt(0); - ASSERT_TRUE(first.has_value()); - EXPECT_EQ(first->timestamp, 0); - EXPECT_DOUBLE_EQ(first->value, 10.0); - EXPECT_EQ(first->row_index, 0U); - - const auto second = series.sampleAt(1); - ASSERT_TRUE(second.has_value()); - EXPECT_EQ(second->timestamp, 20); - EXPECT_DOUBLE_EQ(second->value, 20.0); - EXPECT_EQ(second->row_index, 2U); - - const auto third = series.sampleAt(2); - ASSERT_TRUE(third.has_value()); - EXPECT_EQ(third->timestamp, 50); - EXPECT_DOUBLE_EQ(third->value, -5.0); - EXPECT_EQ(third->row_index, 2U); - - EXPECT_FALSE(series.sampleAt(3).has_value()); -} - -TEST_F(SeriesReaderTest, TimeLookupsUseSeriesIndicesAndSkipNullRows) { - DataReader reader = engine_.createReader(); - auto series_or = reader.series(topic_id_, 1); - ASSERT_TRUE(series_or.has_value()) << series_or.error(); - const SeriesReader series = *series_or; - - EXPECT_FALSE(series.indexAtOrBeforeTime(-1).has_value()); - EXPECT_EQ(series.indexAtOrBeforeTime(0), 0U); - EXPECT_EQ(series.indexAtOrBeforeTime(10), 0U); - EXPECT_EQ(series.indexAtOrBeforeTime(20), 1U); - 
EXPECT_EQ(series.indexAtOrBeforeTime(49), 1U); - EXPECT_EQ(series.indexAtOrBeforeTime(50), 2U); - - EXPECT_EQ(series.indexAtOrAfterTime(1), 1U); - EXPECT_EQ(series.indexAtOrAfterTime(21), 2U); - EXPECT_FALSE(series.indexAtOrAfterTime(51).has_value()); - - const auto before = series.sampleAtOrBeforeTime(49); - ASSERT_TRUE(before.has_value()); - EXPECT_EQ(before->timestamp, 20); - EXPECT_DOUBLE_EQ(before->value, 20.0); - - const auto after = series.sampleAtOrAfterTime(21); - ASSERT_TRUE(after.has_value()); - EXPECT_EQ(after->timestamp, 50); - EXPECT_DOUBLE_EQ(after->value, -5.0); -} - -TEST_F(SeriesReaderTest, SeriesCursorFiltersByTimeRange) { - DataReader reader = engine_.createReader(); - auto series_or = reader.series(topic_id_, 1); - ASSERT_TRUE(series_or.has_value()) << series_or.error(); - const SeriesReader series = *series_or; - - std::vector timestamps; - std::vector values; - auto cursor = series.samples(Range{.min = 10, .max = 50}); - cursor.forEach([&](const SeriesSample& sample) { - timestamps.push_back(sample.timestamp); - values.push_back(sample.value); - }); - - ASSERT_EQ(timestamps.size(), 2U); - EXPECT_EQ(timestamps[0], 20); - EXPECT_EQ(timestamps[1], 50); - EXPECT_DOUBLE_EQ(values[0], 20.0); - EXPECT_DOUBLE_EQ(values[1], -5.0); -} - -TEST_F(SeriesReaderTest, BoundsUseOnlySeriesSamples) { - DataReader reader = engine_.createReader(); - auto series_or = reader.series(topic_id_, 1); - ASSERT_TRUE(series_or.has_value()) << series_or.error(); - const SeriesReader series = *series_or; - - const auto bounds = series.bounds(); - ASSERT_TRUE(bounds.has_value()); - EXPECT_EQ(bounds->time.min, 0); - EXPECT_EQ(bounds->time.max, 50); - EXPECT_DOUBLE_EQ(bounds->value.min, -5.0); - EXPECT_DOUBLE_EQ(bounds->value.max, 20.0); - EXPECT_EQ(bounds->sample_count, 3U); - - const auto partial = series.bounds(Range{.min = 1, .max = 49}); - ASSERT_TRUE(partial.has_value()); - EXPECT_EQ(partial->time.min, 20); - EXPECT_EQ(partial->time.max, 20); - 
EXPECT_DOUBLE_EQ(partial->value.min, 20.0); - EXPECT_DOUBLE_EQ(partial->value.max, 20.0); - EXPECT_EQ(partial->sample_count, 1U); -} - -TEST_F(SeriesReaderTest, AllNullColumnIsAnEmptySeries) { - DataReader reader = engine_.createReader(); - auto series_or = reader.series(topic_id_, 4); - ASSERT_TRUE(series_or.has_value()) << series_or.error(); - const SeriesReader series = *series_or; - - EXPECT_EQ(series.size(), 0U); - EXPECT_TRUE(series.empty()); - EXPECT_FALSE(series.sampleAt(0).has_value()); - EXPECT_FALSE(series.bounds().has_value()); - - std::size_t count = 0; - series.samples(Range{.min = 0, .max = 50}).forEach([&](const SeriesSample&) { ++count; }); - EXPECT_EQ(count, 0U); -} - -TEST_F(SeriesReaderTest, BoolColumnsAreNumericSeries) { - DataReader reader = engine_.createReader(); - auto series_or = reader.series(topic_id_, 3); - ASSERT_TRUE(series_or.has_value()) << series_or.error(); - const SeriesReader series = *series_or; - - ASSERT_EQ(series.size(), 2U); - ASSERT_TRUE(series.sampleAt(0).has_value()); - ASSERT_TRUE(series.sampleAt(1).has_value()); - EXPECT_DOUBLE_EQ(series.sampleAt(0)->value, 0.0); - EXPECT_DOUBLE_EQ(series.sampleAt(1)->value, 1.0); -} - -TEST_F(SeriesReaderTest, SeriesCreationValidatesTopicColumnAndType) { - DataReader reader = engine_.createReader(); - - EXPECT_FALSE(reader.series(topic_id_ + 9999U, 0).has_value()); - EXPECT_FALSE(reader.series(topic_id_, 99).has_value()); - EXPECT_FALSE(reader.series(topic_id_, 2).has_value()); -} - -} // namespace -} // namespace PJ diff --git a/pj_datastore/tests/topic_storage_test.cpp b/pj_datastore/tests/topic_storage_test.cpp deleted file mode 100644 index 7ada0d01..00000000 --- a/pj_datastore/tests/topic_storage_test.cpp +++ /dev/null @@ -1,264 +0,0 @@ -// Copyright 2026 Davide Faconti -// SPDX-License-Identifier: MPL-2.0 - -#include "pj_datastore/topic_storage.hpp" - -#include - -#include -#include -#include - -#include "pj_base/expected.hpp" -#include "pj_datastore/chunk.hpp" - -namespace PJ { 
-namespace { - -// --------------------------------------------------------------------------- -// Helper: build and seal a test chunk with given time range -// --------------------------------------------------------------------------- - -TopicChunk make_test_chunk(TopicId topic_id, Timestamp t_start, Timestamp t_end, uint32_t num_rows) { - std::vector cols = {{0, PrimitiveType::kFloat32, "value"}}; - TopicChunkBuilder builder(topic_id, /*schema_id=*/1, cols, num_rows); - Timestamp step = (num_rows > 1) ? (t_end - t_start) / static_cast(num_rows - 1) : 0; - for (uint32_t i = 0; i < num_rows; ++i) { - builder.beginRow(t_start + static_cast(i) * step); - builder.set(0, static_cast(i)); - builder.finishRow(); - } - return builder.seal(); -} - -// =========================================================================== -// Test 1: Append chunks -// =========================================================================== - -TEST(TopicStorageTest, AppendChunks) { - TopicDescriptor desc; - desc.name = "test_topic"; - desc.schema_id = 1; - desc.dataset_id = 10; - - TopicStorage storage(/*topic_id=*/1, std::move(desc)); - - ASSERT_TRUE(storage.appendSealedChunk(make_test_chunk(1, 1000, 1900, 10)).has_value()); - ASSERT_TRUE(storage.appendSealedChunk(make_test_chunk(1, 2000, 2900, 10)).has_value()); - ASSERT_TRUE(storage.appendSealedChunk(make_test_chunk(1, 3000, 3900, 10)).has_value()); - - EXPECT_EQ(storage.sealedChunks().size(), 3U); -} - -// =========================================================================== -// Test 2: time_min / time_max -// =========================================================================== - -TEST(TopicStorageTest, TimeMinMax) { - TopicDescriptor desc; - desc.name = "time_range_topic"; - desc.schema_id = 1; - - TopicStorage storage(/*topic_id=*/2, std::move(desc)); - - // Empty storage returns 0 - EXPECT_EQ(storage.time_min(), 0); - EXPECT_EQ(storage.time_max(), 0); - - ASSERT_TRUE(storage.appendSealedChunk(make_test_chunk(2, 
1000, 1900, 10)).has_value()); - ASSERT_TRUE(storage.appendSealedChunk(make_test_chunk(2, 2000, 2900, 10)).has_value()); - ASSERT_TRUE(storage.appendSealedChunk(make_test_chunk(2, 3000, 3900, 10)).has_value()); - - EXPECT_EQ(storage.time_min(), 1000); - EXPECT_EQ(storage.time_max(), 3900); -} - -// =========================================================================== -// Test 3: Evict none -// =========================================================================== - -TEST(TopicStorageTest, EvictNone) { - TopicDescriptor desc; - desc.name = "evict_none_topic"; - desc.schema_id = 1; - - TopicStorage storage(/*topic_id=*/3, std::move(desc)); - - ASSERT_TRUE(storage.appendSealedChunk(make_test_chunk(3, 1000, 1900, 10)).has_value()); - ASSERT_TRUE(storage.appendSealedChunk(make_test_chunk(3, 2000, 2900, 10)).has_value()); - ASSERT_TRUE(storage.appendSealedChunk(make_test_chunk(3, 3000, 3900, 10)).has_value()); - - // Evict before the first chunk's t_min -- nothing should be removed - storage.evictBefore(500); - EXPECT_EQ(storage.sealedChunks().size(), 3U); - - // Evict at exactly the first chunk's t_min -- still nothing removed - // because t_max (1900) is not < 1000 - storage.evictBefore(1000); - EXPECT_EQ(storage.sealedChunks().size(), 3U); -} - -// =========================================================================== -// Test 4: Evict some -// =========================================================================== - -TEST(TopicStorageTest, EvictSome) { - TopicDescriptor desc; - desc.name = "evict_some_topic"; - desc.schema_id = 1; - - TopicStorage storage(/*topic_id=*/4, std::move(desc)); - - ASSERT_TRUE(storage.appendSealedChunk(make_test_chunk(4, 1000, 1900, 10)).has_value()); - ASSERT_TRUE(storage.appendSealedChunk(make_test_chunk(4, 2000, 2900, 10)).has_value()); - ASSERT_TRUE(storage.appendSealedChunk(make_test_chunk(4, 3000, 3900, 10)).has_value()); - - // Evict chunks whose t_max < 2500 - // Chunk 1 (t_max=1900 < 2500) -> evicted - // Chunk 
2 (t_max=2900 >= 2500) -> kept - // Chunk 3 (t_max=3900 >= 2500) -> kept - storage.evictBefore(2500); - EXPECT_EQ(storage.sealedChunks().size(), 2U); - EXPECT_EQ(storage.time_min(), 2000); - EXPECT_EQ(storage.time_max(), 3900); -} - -// =========================================================================== -// Test 5: Evict all -// =========================================================================== - -TEST(TopicStorageTest, EvictAll) { - TopicDescriptor desc; - desc.name = "evict_all_topic"; - desc.schema_id = 1; - - TopicStorage storage(/*topic_id=*/5, std::move(desc)); - - ASSERT_TRUE(storage.appendSealedChunk(make_test_chunk(5, 1000, 1900, 10)).has_value()); - ASSERT_TRUE(storage.appendSealedChunk(make_test_chunk(5, 2000, 2900, 10)).has_value()); - ASSERT_TRUE(storage.appendSealedChunk(make_test_chunk(5, 3000, 3900, 10)).has_value()); - - // Evict with t_keep_min beyond all chunks - storage.evictBefore(5000); - EXPECT_TRUE(storage.empty()); - EXPECT_EQ(storage.sealedChunks().size(), 0U); - EXPECT_EQ(storage.time_min(), 0); - EXPECT_EQ(storage.time_max(), 0); -} - -// =========================================================================== -// Test 6: Metadata -// =========================================================================== - -TEST(TopicStorageTest, Metadata) { - TopicDescriptor desc; - desc.name = "metadata_topic"; - desc.schema_id = 42; - desc.dataset_id = 7; - - TopicStorage storage(/*topic_id=*/6, std::move(desc)); - - ASSERT_TRUE(storage.appendSealedChunk(make_test_chunk(6, 1000, 1900, 10)).has_value()); - ASSERT_TRUE(storage.appendSealedChunk(make_test_chunk(6, 2000, 2900, 10)).has_value()); - ASSERT_TRUE(storage.appendSealedChunk(make_test_chunk(6, 3000, 3900, 10)).has_value()); - - TopicMetadata meta = storage.metadata(); - - EXPECT_EQ(meta.topic_id, 6U); - EXPECT_EQ(meta.name, "metadata_topic"); - EXPECT_EQ(meta.current_schema, 42U); - EXPECT_EQ(meta.dataset_id, 7U); - EXPECT_EQ(meta.time_range_min, 1000); - 
EXPECT_EQ(meta.time_range_max, 3900); - EXPECT_EQ(meta.total_row_count, 30U); - EXPECT_GT(meta.total_byte_size, 0U); -} - -// =========================================================================== -// Test 7: Empty -// =========================================================================== - -TEST(TopicStorageTest, Empty) { - TopicDescriptor desc; - desc.name = "empty_topic"; - desc.schema_id = 1; - - TopicStorage storage(/*topic_id=*/7, std::move(desc)); - - // Initially empty - EXPECT_TRUE(storage.empty()); - - // After appending, not empty - ASSERT_TRUE(storage.appendSealedChunk(make_test_chunk(7, 1000, 1900, 10)).has_value()); - EXPECT_FALSE(storage.empty()); - - // After evicting all, empty again - storage.evictBefore(5000); - EXPECT_TRUE(storage.empty()); -} - -// =========================================================================== -// Test 8: Update schema -// =========================================================================== - -TEST(TopicStorageTest, UpdateSchema) { - TopicDescriptor desc; - desc.name = "schema_topic"; - desc.schema_id = 1; - - TopicStorage storage(/*topic_id=*/8, std::move(desc)); - - EXPECT_EQ(storage.descriptor().schema_id, 1U); - - storage.updateSchema(42); - EXPECT_EQ(storage.descriptor().schema_id, 42U); - - // Metadata should reflect the updated schema - TopicMetadata meta = storage.metadata(); - EXPECT_EQ(meta.current_schema, 42U); -} - -// =========================================================================== -// Test 9: Reject out-of-order chunk -// =========================================================================== - -TEST(TopicStorageTest, RejectOutOfOrderChunk) { - TopicDescriptor desc; - desc.name = "order_topic"; - desc.schema_id = 1; - - TopicStorage storage(/*topic_id=*/9, std::move(desc)); - - ASSERT_TRUE(storage.appendSealedChunk(make_test_chunk(9, 2000, 2900, 10)).has_value()); - - // Append a chunk with t_min < previous chunk's t_min — should fail - auto status = 
storage.appendSealedChunk(make_test_chunk(9, 1000, 1900, 10)); - EXPECT_FALSE(status.has_value()); - - // Only the first chunk should be stored - EXPECT_EQ(storage.sealedChunks().size(), 1U); -} - -// =========================================================================== -// Test 10: Equal t_min chunks are allowed -// =========================================================================== - -TEST(TopicStorageTest, OverlappingChunkRejected_SameTMin) { - // A chunk whose t_min falls inside the previous chunk's [t_min, t_max] is an - // overlap and must be rejected. This includes the case where t_min is equal - // (Chunk2.t_min=1000 < Chunk1.t_max=1900 → rejected). - TopicDescriptor desc; - desc.name = "overlap_tmin_topic"; - desc.schema_id = 1; - - TopicStorage storage(/*topic_id=*/10, std::move(desc)); - - ASSERT_TRUE(storage.appendSealedChunk(make_test_chunk(10, 1000, 1900, 10)).has_value()); - // Same t_min — overlaps Chunk1 in [1000, 1900]: must be rejected. - EXPECT_FALSE(storage.appendSealedChunk(make_test_chunk(10, 1000, 1500, 5)).has_value()); - - EXPECT_EQ(storage.sealedChunks().size(), 1U); -} - -} // namespace -} // namespace PJ diff --git a/pj_datastore/tests/type_registry_test.cpp b/pj_datastore/tests/type_registry_test.cpp deleted file mode 100644 index d49c8810..00000000 --- a/pj_datastore/tests/type_registry_test.cpp +++ /dev/null @@ -1,243 +0,0 @@ -// Copyright 2026 Davide Faconti -// SPDX-License-Identifier: MPL-2.0 - -#include "pj_datastore/type_registry.hpp" - -#include - -#include -#include - -#include "pj_base/expected.hpp" -#include "pj_base/type_tree.hpp" -#include "pj_base/types.hpp" - -namespace PJ { -namespace { - -// Helper: build a simple struct with two float64 fields (x, y) -std::shared_ptr make_point_schema() { - return makeStruct( - "Point", { - makePrimitive("x", PrimitiveType::kFloat64), - makePrimitive("y", PrimitiveType::kFloat64), - }); -} - -// Helper: build a struct with three float64 fields (x, y, z) -std::shared_ptr 
make_point3d_schema() { - return makeStruct( - "Point3D", { - makePrimitive("x", PrimitiveType::kFloat64), - makePrimitive("y", PrimitiveType::kFloat64), - makePrimitive("z", PrimitiveType::kFloat64), - }); -} - -// 1. Register a schema, lookup by ID: returns correct tree -TEST(TypeRegistryTest, RegisterAndLookupById) { - TypeRegistry registry; - auto tree = make_point_schema(); - auto* raw_ptr = tree.get(); - - auto result = registry.registerSchema("Point", tree); - ASSERT_TRUE(result.has_value()) << result.error(); - - const TypeTreeNode* looked_up = registry.lookup(*result); - ASSERT_NE(looked_up, nullptr); - EXPECT_EQ(looked_up, raw_ptr); - EXPECT_EQ(looked_up->name, "Point"); - EXPECT_EQ(looked_up->kind, TypeKind::kStruct); - EXPECT_EQ(looked_up->children.size(), 2); -} - -// 2. Register a schema, find by name: returns correct ID -TEST(TypeRegistryTest, RegisterAndFindByName) { - TypeRegistry registry; - auto tree = make_point_schema(); - - auto result = registry.registerSchema("Point", tree); - ASSERT_TRUE(result.has_value()) << result.error(); - - auto found = registry.findByName("Point"); - ASSERT_TRUE(found.has_value()); - EXPECT_EQ(*found, *result); -} - -// 3. Register duplicate name: returns AlreadyExistsError -TEST(TypeRegistryTest, RegisterDuplicateNameFails) { - TypeRegistry registry; - - auto result1 = registry.registerSchema("Point", make_point_schema()); - ASSERT_TRUE(result1.has_value()) << result1.error(); - - auto result2 = registry.registerSchema("Point", make_point_schema()); - ASSERT_FALSE(result2.has_value()); -} - -// 4. 
register_or_get with new name: registers and returns ID -TEST(TypeRegistryTest, RegisterOrGetNewName) { - TypeRegistry registry; - - auto result = registry.registerOrGet("Point", make_point_schema()); - ASSERT_TRUE(result.has_value()) << result.error(); - - // Verify it was actually registered - auto found = registry.findByName("Point"); - ASSERT_TRUE(found.has_value()); - EXPECT_EQ(*found, *result); - - const TypeTreeNode* looked_up = registry.lookup(*result); - ASSERT_NE(looked_up, nullptr); - EXPECT_EQ(looked_up->name, "Point"); -} - -// 5. register_or_get with existing name: returns existing ID -TEST(TypeRegistryTest, RegisterOrGetExistingName) { - TypeRegistry registry; - - auto result1 = registry.registerSchema("Point", make_point_schema()); - ASSERT_TRUE(result1.has_value()) << result1.error(); - - // register_or_get should return the same ID, ignoring the new tree - auto result2 = registry.registerOrGet("Point", make_point3d_schema()); - ASSERT_TRUE(result2.has_value()) << result2.error(); - EXPECT_EQ(*result1, *result2); - - // The original tree should still be the one stored (2 fields, not 3) - const TypeTreeNode* looked_up = registry.lookup(*result2); - ASSERT_NE(looked_up, nullptr); - EXPECT_EQ(looked_up->children.size(), 2); -} - -// 6. lookup with unknown ID: returns nullptr -TEST(TypeRegistryTest, LookupUnknownIdReturnsNullptr) { - TypeRegistry registry; - EXPECT_EQ(registry.lookup(999), nullptr); -} - -// 7. find_by_name with unknown name: returns nullopt -TEST(TypeRegistryTest, FindByNameUnknownReturnsNullopt) { - TypeRegistry registry; - auto found = registry.findByName("NonExistent"); - EXPECT_FALSE(found.has_value()); -} - -// 8. 
evolve_schema with additive change: succeeds, lookup returns new tree -TEST(TypeRegistryTest, EvolveSchemaAdditiveChange) { - TypeRegistry registry; - - auto original = make_point_schema(); - auto result = registry.registerSchema("Point", original); - ASSERT_TRUE(result.has_value()) << result.error(); - SchemaId id = *result; - - // Evolve: add a z field - auto evolved = make_point3d_schema(); - auto* evolved_ptr = evolved.get(); - PJ::Status status = registry.evolveSchema(id, evolved); - ASSERT_TRUE(status.has_value()) << status.error(); - - // lookup should now return the evolved tree - const TypeTreeNode* looked_up = registry.lookup(id); - ASSERT_NE(looked_up, nullptr); - EXPECT_EQ(looked_up, evolved_ptr); - EXPECT_EQ(looked_up->children.size(), 3); -} - -// 9. evolve_schema with removed field: returns InvalidArgumentError -TEST(TypeRegistryTest, EvolveSchemaRemovedFieldFails) { - TypeRegistry registry; - - // Start with 3 fields - auto original = make_point3d_schema(); - auto result = registry.registerSchema("Point3D", original); - ASSERT_TRUE(result.has_value()) << result.error(); - - // Try to evolve to 2 fields (removing z) - auto reduced = make_point_schema(); - PJ::Status status = registry.evolveSchema(*result, reduced); - ASSERT_FALSE(status.has_value()); -} - -// 10. evolve_schema with type change on existing field: returns InvalidArgumentError -TEST(TypeRegistryTest, EvolveSchemaTypeChangeFails) { - TypeRegistry registry; - - auto original = make_point_schema(); // x: float64, y: float64 - auto result = registry.registerSchema("Point", original); - ASSERT_TRUE(result.has_value()) << result.error(); - - // Try to evolve: change x from float64 to int32 - auto changed = makeStruct( - "Point", { - makePrimitive("x", PrimitiveType::kInt32), // type changed! 
-          makePrimitive("y", PrimitiveType::kFloat64),
-          makePrimitive("z", PrimitiveType::kFloat64),
-      });
-  PJ::Status status = registry.evolveSchema(*result, changed);
-  ASSERT_FALSE(status.has_value());
-}
-
-// 11. evolve_schema with unknown ID: returns NotFoundError
-TEST(TypeRegistryTest, EvolveSchemaUnknownIdFails) {
-  TypeRegistry registry;
-
-  PJ::Status status = registry.evolveSchema(999, make_point_schema());
-  ASSERT_FALSE(status.has_value());
-}
-
-// 12. Multiple schemas: register 3, verify each has unique ID and correct tree
-TEST(TypeRegistryTest, MultipleSchemas) {
-  TypeRegistry registry;
-
-  auto tree_a = makePrimitive("temp", PrimitiveType::kFloat32);
-  auto tree_b = make_point_schema();
-  auto tree_c = makeStruct(
-      "Pose", {
-          makePrimitive("frame", PrimitiveType::kString),
-          makeStruct(
-              "position",
-              {
-                  makePrimitive("x", PrimitiveType::kFloat64),
-                  makePrimitive("y", PrimitiveType::kFloat64),
-                  makePrimitive("z", PrimitiveType::kFloat64),
-              }),
-      });
-
-  auto* raw_a = tree_a.get();
-  auto* raw_b = tree_b.get();
-  auto* raw_c = tree_c.get();
-
-  auto id_a = registry.registerSchema("Temperature", tree_a);
-  auto id_b = registry.registerSchema("Point", tree_b);
-  auto id_c = registry.registerSchema("Pose", tree_c);
-
-  ASSERT_TRUE(id_a.has_value()) << id_a.error();
-  ASSERT_TRUE(id_b.has_value()) << id_b.error();
-  ASSERT_TRUE(id_c.has_value()) << id_c.error();
-
-  // All IDs are unique
-  EXPECT_NE(*id_a, *id_b);
-  EXPECT_NE(*id_a, *id_c);
-  EXPECT_NE(*id_b, *id_c);
-
-  // Each lookup returns the correct tree
-  EXPECT_EQ(registry.lookup(*id_a), raw_a);
-  EXPECT_EQ(registry.lookup(*id_b), raw_b);
-  EXPECT_EQ(registry.lookup(*id_c), raw_c);
-
-  // find_by_name works for all
-  auto found_a = registry.findByName("Temperature");
-  auto found_b = registry.findByName("Point");
-  auto found_c = registry.findByName("Pose");
-  ASSERT_TRUE(found_a.has_value());
-  ASSERT_TRUE(found_b.has_value());
-  ASSERT_TRUE(found_c.has_value());
-  EXPECT_EQ(*found_a, *id_a);
-  EXPECT_EQ(*found_b, *id_b);
-  EXPECT_EQ(*found_c, *id_c);
-}
-
-}  // namespace
-}  // namespace PJ
diff --git a/pj_plugins/CLAUDE.md b/pj_plugins/CLAUDE.md
index 71f38da4..105bf3d3 100644
--- a/pj_plugins/CLAUDE.md
+++ b/pj_plugins/CLAUDE.md
@@ -6,10 +6,11 @@ plugin DSOs. Owns **four plugin families** — DataSource, MessageParser, Toolbo
 Dialog. Plugins depend only on `pj_base`; this module (the host side) links
 `pj_base` and is consumed by the app. It does **not** own the data-plane bridge
 (that is `pj_datastore`'s `DatastoreSourceWriteHost` / `…ParserWriteHost` /
-`…ToolboxHost`) and links **no Qt** — dialogs are toolkit-neutral (the GUI host
-supplies the renderer). The submodule's read-path is `plotjuggler_core/CLAUDE.md`
-→ this file → `docs/` → headers → code (the PJ4 per-module-CLAUDE contract does
-not govern submodule-internal modules; `pj_base`/`pj_datastore` carry none).
+`…ToolboxHost`, which now lives in the PlotJuggler application repo, not in this
+SDK) and links **no Qt** — dialogs are toolkit-neutral (the GUI host supplies the
+renderer). The submodule's read-path is `plotjuggler_core/CLAUDE.md` → this file
+→ `docs/` → headers → code (the PJ4 per-module-CLAUDE contract does not govern
+submodule-internal modules; `pj_base` carries none).
 
 ## Layout
 - `include/pj_plugins/host/` — host loaders + RAII handles for DataSource /
diff --git a/pj_plugins/docs/ARCHITECTURE.md b/pj_plugins/docs/ARCHITECTURE.md
index cb1ed4f3..2099963e 100644
--- a/pj_plugins/docs/ARCHITECTURE.md
+++ b/pj_plugins/docs/ARCHITECTURE.md
@@ -252,6 +252,7 @@ pj_plugins/
     message_parser_library.cpp
     toolbox_library.cpp
 
+(PlotJuggler application repo — not part of this SDK submodule)
 pj_datastore/
   include/pj_datastore/
     plugin_data_host.hpp ← DatastoreSourceWriteHost,
@@ -260,9 +261,10 @@ pj_datastore/
 ```
 
 **Dependency direction:** Plugins depend only on `pj_base`. The host links
-`pj_plugins` (which depends on `pj_base`). `pj_datastore`
-provides the concrete data-host implementations that bridge plugin writes to
-the columnar storage engine.
+`pj_plugins` (which depends on `pj_base`). `pj_datastore` — now a module in the
+PlotJuggler application repo, not part of this SDK — provides the concrete
+data-host implementations that bridge plugin writes to the columnar storage
+engine.
 
 ## 3. C ABI Protocols
 
diff --git a/test_sdk_install.sh b/test_sdk_install.sh
index f92eaba2..e719f732 100755
--- a/test_sdk_install.sh
+++ b/test_sdk_install.sh
@@ -72,7 +72,7 @@ cmake --build "$CONSUMER_BUILD_DIR" -j "$(nproc)"
 
 echo ""
 echo "--- Step 4: Smoke-test find_package COMPONENTS ---"
-for comp in base plugin_sdk plugin_host datastore; do
+for comp in base plugin_sdk plugin_host; do
   COMP_DIR="$(mktemp -d)"
   cat > "$COMP_DIR/CMakeLists.txt" <