Skip to content

Staging#1879

Merged
JoshuaSBrown merged 20 commits intodevelfrom
staging
Feb 25, 2026
Merged

Staging#1879
JoshuaSBrown merged 20 commits intodevelfrom
staging

Conversation

@JoshuaSBrown
Copy link
Collaborator

@JoshuaSBrown JoshuaSBrown commented Feb 25, 2026

Ticket

Description

How Has This Been Tested?

Artifacts (if appropriate):

Tasks

  • - A description of the PR has been provided, and a diagram included if it is a new feature.
  • - Formatter has been run
  • - CHANGELOG comment has been added
  • - Labels have been assigned to the pr
  • - A reviwer has been added
  • - A user has been assigned to work on the pr
  • - If new feature a unit test has been added

Summary by Sourcery

Update DataFed core, Foxx services, and web server for the 3.0 release, tightening security around paths and queries, improving logging and error handling, and adding new end-to-end tests for API queries and Playwright-based web UI.

New Features:

  • Add repository resolution from file paths and enforce repo/file consistency in authz checks.
  • Introduce end-to-end query CRUD API test and Playwright-based web UI tests, including scaffolding for additional browser-driven scenarios.

Bug Fixes:

  • Correct multiple logging, error-reporting, and optional-chaining issues across user, schema, repo, metrics, config, note, topic, task, and admin routers.
  • Fix permission checks and tag/dependency handling in data record updates.
  • Ensure saved query params are consistently treated as structured objects, including legacy string-encoded records, and fix parameter validation and deletion of the limit field.
  • Stabilize ZMQ message body size validation by avoiding use-after-close on zmq_msg.

Enhancements:

  • Replace the project search endpoint with a safer, logged project listing implementation and remove the corresponding Foxx tests and core projSearch API.
  • Refine record path validation logic and repository path normalization utilities for safer POSIX path handling.
  • Improve saved query create/update/exec flows, including better logging metadata, mode normalization, and richer protobuf JSON serialization in the core DatabaseAPI.
  • Augment token-related tasks with richer token format data and enhance metrics, ACL, and task routers with clearer result initialization and logging.
  • Update web server version negotiation to align with new version constants and API field names, and remove unused project search API in the web layer.
  • Switch core and mock servers to use a custom proxy implementation via the updated ServerFactory.

Build:

  • Bump release date and major/minor versions across components for the 3.0 release and wire new end-to-end tests into CTest configuration.
  • Tidy CMakeLists formatting and add an option for Playwright-based end-to-end web tests.

Documentation:

  • Expand end-to-end testing documentation with Playwright setup and usage instructions, including Docker-based workflows.

Tests:

  • Remove obsolete Foxx project search tests and add new end-to-end tests for query CRUD operations and web UI password change flow.

Chores:

  • Modernize version field names in the web server to match the updated version.js contract and reply schema.

JoshuaSBrown and others added 20 commits November 25, 2025 06:54
…se. (#1778)

[DAPS-1786] - Web tests, refactor: add test for hitting password reset. (#1787)
… tests (#1779)

* [DAPS-1775] - fix: core, foxx, add missing {}, foxx query_router add params object schema to routes. (#1781)
* [DAPS-1777] - fix: foxx, user_router fix regression in missing response. (#1778)
* [DAPS-1786] - refactor: web tests, add test for hitting password reset. (#1787)
* [DAPS-1277] - fix: mock, core, common, PROXY_BASIC_ZMQ and PROXY_CUSTOM correctly defined
* [DAPS-1790] - fix: common, core, repo, zmq assertion failure during EXCEPT call due to callin zmq_msg with invalid state after closing it.
* [DAPS-1791] - fix: build, python client, requirements.txt was being moved to a folder named requirements.txt during cmake configure script.
* refactor: only print subset of user properties.

* chore: Auto-format JavaScript files with Prettier
* [DAPS-1770] - release: v4.0.0 

* [DAPS-1585] - update: dependencies, upgrade ssl dependency. 3.2.5 (#1646)
* [DAPS-1605] - fix: scripts, install_foxx.sh by splitting ssl_args (#1623)
* [DAPS-1651] - refactor: scripts, compose, univify treatment of env variables in compose env generator (#1656) (#1658)
* [DAPS-1675] - feature: foxx, adding the logger functions for future PR's (#1675)
* [DAPS-1659] - refactor: scripts, remove dependencies install scripts (#1660)
* [DAPS-1670] - feature: common, core, repo, python_client, web, allow passing repo types in protobuf messages (#1670)
* [DAPS-1671] - feature: foxx, add repository and execution strategy types (#1672)
* [DAPS-1661] - refactor: compose, scripts, remove remaining occurences of zeromq system secret. (#1661)
* [DAPS-1522] - refactor: foxx, user router logging improvements, remove non helpful logs from tasks.js (#1629)
* [DAPS-1691] - refactor: foxx, adjust validation.js swap g_lib with error_code require rem… (#1691)
* [DAPS-1692] - tests: ci, End-to-end web tests, fix flaky test (#1693)
* [DAPS-1694] - refactor: foxx, move permissions functions from support.js to lib/permissions (#1695)
* [DAPS-1685] - feature: compose, enable arangodb ssl (#1687)
* [DAPS-1700] - fix: ci, limit arangodb job output to last 3 hours. (#1701)
* [DAPS-1676] - feature: foxx, arango add factory for repositories for metadata and globus (#1697)
* [DAPS-1718] - feature: web, core, python client, Protobuf ExecutionMethod enum, add RepoAllocationCreateResponse (#1719)
* [DAPS-1713] - refactor: core, web, python client, protobuf, allow optional fields when creating repo to support metadat… (#1714)
* [DAPS-1715] - refactor: core, make path, pub_key, address, endpoint optional in repoCreateRequest (#1716)
* [DAPS-1705] - feature: foxx, integrate metadata globus factory repo router create (#1706)
* [DAPS-1688] - update: dependencies, core, repo, authz, gcs, Crypto libssl switched to version 3 globus_sdk version pinned (#1689)
* [DAPS-1729] - fix: ci, downstream datafed dependencies pipelines are building the container image from incorrect sha (#1732)
* [DAPS-1711] - refactor: foxx standardize repo response schema (#1712)
* [DAPS-1725] - refactor: remove confusing apache conf file. (#1728)
* [DAPS-1707] - update: dependencies, web, update web dependencies before install (#1709)
* [DAPS-1522] - refactor: foxx, task router logging improvements (#1648)
* [DAPS-1522] - refactor: foxx, query router logging improvements (#1627)
* [DAPS-1735] - bug: foxx, remove duplicate user_router test (#1736)
* [DAPS-1731] - reature: scripts, compose, add scripts to generate globus credentials for web service (#1731)
* [DAPS-1725] - refactor: tests, mock core server centralized (#1726)
* [DAPS-1741] - update: scripts, native client id in intialize_globus_endpoint and globus_clea… (#1741)
* [DAPS-1745]  - fix: scripts, account for nested client credentials. (#1746)
* [DAPS-1725-2] - fix; tests, centralized mock core service libraries fixed (part 2) (#1747)
* [DAPS-1742] - refactor script replace os.path.join with urllib.parse.urljoin (#1744)
* [DAPS-1749] - refactor: cmake, set cmake policy to silence noisy warning. (#1750)
* [DAPS-1522] - refactor: foxx, feature tag router logging improvements (#1734)
* [DAPS-1378] - fix: web, mapping of multiple globus accounts. (#1753)
* [DAPS-1756] - fix: scripts, foxx, add retries and connection check to install_foxx.sh script. (#1757)
* [DAPS-1522] - refactor: foxx, Version Router Logging Improvements (#1758)
* [DAPS-1737] - refactor: compose, cleanup arango ssl env variables (#1765)
* [DAPS-1766] - fix: ci, python client provisioning job
* [DAPS-1663] - feature: core Service, adding Correlation ID to Logging (#1704)

Co-authored-by: Aaron Perez <perezam@ornl.gov>
Co-authored-by: AronPerez <aperez0295@gmail.com>
Co-authored-by: Blake Nedved <nedvedba@ornl.gov>
Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com>
Co-authored-by: nedvedba <145805866+nedvedba@users.noreply.github.com>
Co-authored-by: Austin Hampton <amh107@latech.edu>
Co-authored-by: Austin Hampton <44103380+megatnt1122@users.noreply.github.com>
Co-authored-by: Blake Nedved <blakeanedved@gmail.com>
Co-authored-by: Polina Shpilker <infinite.loopholes@gmail.com>

* [DAPS-1777] - Foxx, fix: user_router fix regression in missing response. (#1778)

[DAPS-1786] - Web tests, refactor: add test for hitting password reset. (#1787)

* [DAPS-1774] - Core, Python, Database, Foxx, Test add query end to end tests (#1779)

* [DAPS-1775] - fix: core, foxx, add missing {}, foxx query_router add params object schema to routes. (#1781)
* [DAPS-1777] - fix: foxx, user_router fix regression in missing response. (#1778)
* [DAPS-1786] - refactor: web tests, add test for hitting password reset. (#1787)
* [DAPS-1277] - fix: mock, core, common, PROXY_BASIC_ZMQ and PROXY_CUSTOM correctly defined
* [DAPS-1790] - fix: common, core, repo, zmq assertion failure during EXCEPT call due to callin zmq_msg with invalid state after closing it.
* [DAPS-1791] - fix: build, python client, requirements.txt was being moved to a folder named requirements.txt during cmake configure script.

* chore: Auto-format JavaScript files with Prettier

* [DAPS-1774] - continuation of fix (#1798)

* [DAPS-1774] - continuation of fix (#1798)

* release: version 4.0.1 (#1807)

* refactor: only print subset of user properties. (#1804)

* refactor: only print subset of user properties.

* chore: Auto-format JavaScript files with Prettier

* [DAPS-1806] - refactor: foxx, user tokens expiring route. (#1806)

---------

Co-authored-by: Aaron Perez <perezam@ornl.gov>
Co-authored-by: AronPerez <aperez0295@gmail.com>
Co-authored-by: Blake Nedved <nedvedba@ornl.gov>
Co-authored-by: sourcery-ai[bot] <58596630+sourcery-ai[bot]@users.noreply.github.com>
Co-authored-by: nedvedba <145805866+nedvedba@users.noreply.github.com>
Co-authored-by: Austin Hampton <amh107@latech.edu>
Co-authored-by: Austin Hampton <44103380+megatnt1122@users.noreply.github.com>
Co-authored-by: Blake Nedved <blakeanedved@gmail.com>
Co-authored-by: Polina Shpilker <infinite.loopholes@gmail.com>
#1861) (#1866)

* fix: prevent defaults being set to undefined, and interpret numbers and enums as strings.

* chore: Auto-format JavaScript files with Prettier

Co-authored-by: Joshua S Brown <joshbro42867@yahoo.com>
* fix: prevent defaults being set to undefined, and interpret numbers a… (#1861)

* fix: prevent defaults being set to undefined, and interpret numbers and enums as strings.

* chore: Auto-format JavaScript files with Prettier

* fix: version numbers from proto3 messages follow camel case. (#1868)

---------

Co-authored-by: Joshua S Brown <joshbro42867@yahoo.com>
@sourcery-ai
Copy link
Contributor

sourcery-ai bot commented Feb 25, 2026

Reviewer's Guide

This PR delivers a staging release that upgrades protocol and component versions, tightens security and logging across Foxx APIs, refactors query and record path handling, removes the project AQL search surface, and adds end‑to‑end tests (including query CRUD and Playwright web tests) along with minor web/server wiring adjustments.

Sequence diagram for saved query update and execution across Web, Core, and Foxx

sequenceDiagram
    actor User
    participant WebServer
    participant CoreServer
    participant FoxxQueryRouter
    participant ArangoDB

    User->>WebServer: POST /api/query/update?id=Q123&replaceQuery=true
    Note over WebServer: HTTP body contains updated query JSON (params as object)

    WebServer->>CoreServer: QueryUpdateRequest(id=Q123, replace_query=true, query)

    CoreServer->>CoreServer: queryUpdate(request)
    CoreServer->>CoreServer: parseSearchRequest(query, out qry_begin, qry_end, qry_filter, params)
    CoreServer->>CoreServer: MessageToJsonString(query, options)
    CoreServer->>FoxxQueryRouter: POST /qry/update
    Note over CoreServer,FoxxQueryRouter: payload.qry_begin, payload.qry_end, payload.qry_filter, payload.params (JSON object), payload.limit

    FoxxQueryRouter->>ArangoDB: _executeTransaction(read:["u","uuid","accn","admin"], write:["q"])
    FoxxQueryRouter->>ArangoDB: q.document(req.body.id)
    FoxxQueryRouter->>ArangoDB: _update(q._id, merged_query, {mergeObjects:false, returnNew:true})
    FoxxQueryRouter->>FoxxQueryRouter: delete qry.qry_begin/qry_end/qry_filter/params/limit
    FoxxQueryRouter->>CoreServer: reply with updated query metadata

    CoreServer->>CoreServer: setQueryData(a_reply, result)
    CoreServer->>WebServer: QueryDataReply
    WebServer->>User: HTTP 200 with updated query metadata

    User->>WebServer: GET /api/query/exec?id=Q123
    WebServer->>CoreServer: QueryExecRequest(id=Q123)

    CoreServer->>FoxxQueryRouter: GET /qry/exec?id=Q123
    FoxxQueryRouter->>ArangoDB: q.document(id)
    FoxxQueryRouter->>FoxxQueryRouter: if typeof params == "string" then JSON.parse(params)
    FoxxQueryRouter->>FoxxQueryRouter: execQuery(client, mode, published, query)
    FoxxQueryRouter->>ArangoDB: _query(built_AQL, query.params)
    FoxxQueryRouter->>CoreServer: results[]

    CoreServer->>WebServer: ExecReply(results)
    WebServer->>User: HTTP 200 with results[]
Loading

Updated class diagram for Record and Repo path handling and AuthZ integration

classDiagram
    class Record {
        -string #key
        -object #loc
        -object #repo
        -number #error
        -string #err_msg
        +boolean isManaged()
        +boolean isPathConsistent(a_path string)
        -string _pathToRecord(uid string, basePath string)
    }

    class Repo {
        -string id_value
        -object doc
        +Repo(id string)
        +string id()
        +PathType pathType(file_path string)
        +static Repo resolveFromPath(file_path string)
    }

    class PathType {
        <<enumeration>>
        UNKNOWN
        REPO
        USER
        PROJECT
    }

    class PosixPath {
        <<module>>
        +string normalizePOSIXPath(a_posix_path string)
        +string joinPOSIX(...segments)
        +string dirnamePOSIX(a_posix_path string)
        +string basenamePOSIX(a_posix_path string)
        +string~[]~ splitPOSIX(a_posix_path string)
    }

    class AuthzRouter {
        <<module>>
        +get_file_access(client string, repo string, file string)
    }

    Record --> "1" Repo : uses
    Repo --> PathType : returns
    AuthzRouter --> Repo : resolveFromPath(file)
    AuthzRouter --> PosixPath : normalizePOSIXPath(file)
    PosixPath ..> Record : shared path normalization convention
Loading

File-Level Changes

Change Details Files
Harden and refactor project listing, remove raw AQL project search, and fix related logging and permission handling.
  • Make proj_router logging for project update and list more defensive with optional chaining and consistent success/failure calls
  • Reimplement the /prj/list endpoint inside the previous /prj/search handler with try/catch, full logging, and subject permission checks
  • Ensure subject-based listing uses permissions.ensureAdminPermUser and the subject id for queries
  • Remove the /prj/search AQL passthrough endpoint and its Foxx tests, and drop the unused core DatabaseAPI::projSearch function
  • Fix minor bugs such as undeclared tot in logging and variable scoping in proj delete task init
core/database/foxx/api/proj_router.js
core/database/foxx/tests/proj_router.test.js
core/server/DatabaseAPI.cpp
core/server/DatabaseAPI.hpp
web/datafed-ws.js
Improve record path computation and consistency validation to correctly handle repos, in-flight moves, and malformed input.
  • Change Record._pathToRecord to accept a uid string instead of a loc object and produce clearer error messages without referencing undefined variables
  • Rework Record.isPathConsistent to normalize paths, handle in-flight new_repo/new_owner correctly, validate against repo documents, and set detailed error messages on mismatch
  • Remove unused comments and the old path comparison logic, simplifying flow into explicit success/failure branches
core/database/foxx/api/record.js
Tighten saved query CRUD/exec behavior, including schema validation, logging, and support for partial vs full query updates end-to-end.
  • Move Create/Update saved query logger.logRequestStarted out of transaction action closures and standardize logging metadata
  • Fix typos deleting qry.limit instead of qry.lmit and require params to be objects (not arbitrary/strings) in Joi schemas
  • Normalize execQuery to accept mode names as strings, enforce that params.cols is a non-empty array, and enforce paging limits consistently
  • Handle legacy string-encoded params on query exec by JSON parsing, and change /qry/exec POST body to pass params as an object, updating Joi to require that
  • Update DatabaseAPI.generalSearch and parseSearchRequest so params is passed as JSON (not wrapped in extra braces), and use JSON print options with enums as strings and primitive fields always printed for query create/update
  • Implement queryUpdate merging semantics: if replace_query is true replace the whole query; otherwise load existing query JSON, merge incoming SearchRequest onto it (clearing repeated fields when provided), regenerate AQL, and persist the combined query
  • Wire web /api/query/update to send replaceQuery=true, add an end-to-end test test_api_query.py that exercises query create/exec/delete, and register it in CTest with fixtures
  • Adjust logging extras for query exec to avoid logging large result payloads and instead log counts and query_id
core/database/foxx/api/query_router.js
core/server/DatabaseAPI.cpp
core/server/DatabaseAPI.hpp
web/datafed-ws.js
tests/end-to-end/test_api_query.py
tests/end-to-end/CMakeLists.txt
Harden authz and repository path handling, including POSIX path normalization and resolving repo from file path.
  • Introduce Repo.resolveFromPath to normalize POSIX file paths, reject paths that change under normalization, find the best matching repo.path prefix, and wrap it in a Repo instance or throw if none match
  • Expose posix_path.normalizePOSIXPath with validation and use it in Repo.resolveFromPath to avoid directory traversal issues
  • Change authz_router to resolve repo by file path, verify the resolved repo id matches the request repo id, then compute path type; throw if the path does not belong to that repo
  • Guard repo_router logging extras with optional chaining, ensure proper success vs failure logging in alloc stats, and fix subject existence check when subject is a project id
core/database/foxx/api/repo.js
core/database/foxx/api/posix_path.js
core/database/foxx/api/authz_router.js
core/database/foxx/api/repo_router.js
Improve data, user, task, topic, metrics, ACL, and note routers for better logging, correctness, and robustness.
  • In data_router, use permissions.hasPermissions instead of g_lib.hasPermissions for record updates, fix tag diff logic to use array inclusion checks, align dependency add/remove validation to dep_add/dep_rem, fix logging variable name in batch update, initialize dep graph and path locals safely, and add robust logging around delete (including retries, summarized IDs, and retry metadata on failure)
  • In user_router, make logging extras safe with optional chaining, ensure client/user variables are declared outside transactions for logging, send responses before logging success in several endpoints, and improve token/keys/expiring token handling with clearer extra metadata and counts
  • In task_router, send empty responses after successful delete and sched/rebuild operations, fix which client id is logged for list and reload, and ensure handleException is called after logging in error paths
  • In topic_router, normalize use of local client variables, initialize results arrays, and keep consistent logging of search phrases
  • In metrics_router, initialize result/counters to safe defaults before queries
  • In acl_router, initialize rules/result arrays to avoid null assumptions
  • In note_router, declare doc separately from note to avoid accidental const reuse and log the updated note object instead of note.new on errors
  • In group_router, avoid sending a partial response in the error path for member listing
core/database/foxx/api/data_router.js
core/database/foxx/api/user_router.js
core/database/foxx/api/task_router.js
core/database/foxx/api/topic_router.js
core/database/foxx/api/metrics_router.js
core/database/foxx/api/acl_router.js
core/database/foxx/api/note_router.js
core/database/foxx/api/group_router.js
Align versioning and protocol behavior across web, core, and build, and swap the proxy implementation to the new ProxyBasicZMQ.
  • Update cmake/Version.cmake to bump release timestamp and set all component and protocol versions to the new 3.x/5.x series
  • Update web/datafed-ws.js to use new version.js constant names (RELEASE_YEAR, MAJOR, etc.), track API version (MAJOR/MINOR/PATCH), and compare against reply.apiMajor/apiMinor; update release comparison fields to camelCase names from the core
  • Convert the CoreServer and MockCoreServer to construct ServerType::PROXY_CUSTOM instead of PROXY_BASIC_ZMQ, and change ServerFactory::create to instantiate ProxyBasicZMQ for PROXY_CUSTOM
  • Improve ZeroMQCommunicator receiveBody frame-size error reporting by storing zmq_msg_size in a local variable before logging
cmake/Version.cmake
web/datafed-ws.js
core/server/CoreServer.cpp
tests/mock_core/MockCoreServer.cpp
common/source/ServerFactory.cpp
common/source/communicators/ZeroMQCommunicator.cpp
CMakeLists.txt
Enhance tests and documentation, especially for end-to-end and Playwright-based web UI testing.
  • Add Playwright usage instructions, including recommended host/Windows setup, sample Dockerfile, and config hints for headed vs headless runs to tests/end-to-end/README.md
  • Add a skeleton playwright spec testingUserPasswordChange.spec.js and clean up the default Playwright projects list in playwright.config.js by keeping only chromium
  • Add an end-to-end query API test test_api_query.py and wire it into CTest with fixtures so it runs after record tests
  • Emit a diagnostic CMake message in web UI tests CMakeLists and enable the web tests option flag in the top-level CMakeLists
  • Remove unused or obsolete CI helper configuration from .gitlab/common.yml
tests/end-to-end/README.md
tests/end-to-end/test_api_query.py
tests/end-to-end/CMakeLists.txt
tests/end-to-end/web-UI/playwright.config.js
tests/end-to-end/web-UI/CMakeLists.txt
.gitlab/common.yml
Miscellaneous fixes and small behavior tweaks across routers and utilities.
  • Guard logger extras with optional chaining in schema_router and config_router; ensure failure logs don’t dereference undefined objects
  • Fix admin_router logging to use real client id and avoid bogus execution_time_seconds in failure, defaulting to 0
  • Adjust coll_router collection create to reuse owner_id variable consistently instead of shadowing it
  • Improve JSON change detection in web/static/util.js by treating missing/falsey old values as change when new JSON is provided
  • Fix a test fixture repo id in record.test.js from repo/orange-at-com to repo/orange-at-org
  • Tidy various small logging field names and initializations (e.g., using arrays instead of null for results/rules)
core/database/foxx/api/schema_router.js
core/database/foxx/api/config_router.js
core/database/foxx/api/admin_router.js
core/database/foxx/api/coll_router.js
web/static/util.js
core/database/foxx/tests/record.test.js
core/database/foxx/api/tag_router.js

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

@JoshuaSBrown JoshuaSBrown merged commit 874878c into devel Feb 25, 2026
12 of 14 checks passed
Copy link
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 2 security issues, 5 other issues, and left some high level feedback:

Security issues:

  • Detected user input used to manually construct a SQL string. This is usually bad practice because manual construction could accidentally result in a SQL injection. An attacker could use a SQL injection to steal or modify contents of the database. Instead, use a parameterized query which is available by default in most database engines. Alternatively, consider using an object-relational mapper (ORM) such as Sequelize which will protect your queries. (link)
  • Detected user input used to manually construct a SQL string. This is usually bad practice because manual construction could accidentally result in a SQL injection. An attacker could use a SQL injection to steal or modify contents of the database. Instead, use a parameterized query which is available by default in most database engines. Alternatively, consider using an object-relational mapper (ORM) such as Sequelize which will protect your queries. (link)

General comments:

  • In execQuery (query_router.js), the change from if (!query.params.cols) to if (!query.params.cols.length) will throw if query.params.cols is undefined; consider guarding with !query.params.cols || !query.params.cols.length (or defaulting it to an empty array) before accessing .length.
  • In data_router.js's /update/batch error logging, the description template uses displayedIds while the variable constructed earlier is displayIds, which will cause a ReferenceError at runtime; these names should be made consistent.
  • The message("ENABLE END TO END!!!!!!!") line in tests/end-to-end/web-UI/CMakeLists.txt looks like leftover debug output and will fire on every configure; consider removing or gating it behind a developer-only option.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- In `execQuery` (query_router.js), the change from `if (!query.params.cols)` to `if (!query.params.cols.length)` will throw if `query.params.cols` is undefined; consider guarding with `!query.params.cols || !query.params.cols.length` (or defaulting it to an empty array) before accessing `.length`.
- In `data_router.js`'s `/update/batch` error logging, the description template uses `displayedIds` while the variable constructed earlier is `displayIds`, which will cause a `ReferenceError` at runtime; these names should be made consistent.
- The `message("ENABLE END TO END!!!!!!!")` line in `tests/end-to-end/web-UI/CMakeLists.txt` looks like leftover debug output and will fire on every configure; consider removing or gating it behind a developer-only option.

## Individual Comments

### Comment 1
<location path="core/database/foxx/api/data_router.js" line_range="711-712" />
<code_context>
     }

-    if (record.deps != undefined && (record.deps_add != undefined || record.deps_rem != undefined))
+    if (record.deps != undefined && (record.dep_add != undefined || record.dep_rem != undefined))
         throw [error.ERR_INVALID_PARAM, "Cannot use both dependency set and add/remove."];

     var dep,
</code_context>
<issue_to_address>
**issue (bug_risk):** Inconsistent dependency field names likely break validation of mixed dep operations.

Elsewhere in the codebase these fields are consistently named `deps_add` / `deps_rem`, but this guard uses `dep_add` / `dep_rem`. As a result, requests using `deps` with `deps_add` / `deps_rem` will skip this check instead of throwing. Please either align the field names to `deps_add` / `deps_rem` here, or explicitly handle both name variants if legacy payloads must be supported.
</issue_to_address>

### Comment 2
<location path="core/database/foxx/api/data_router.js" line_range="1052-1053" />
<code_context>
                 httpVerb: "POST",
                 routePath: basePath + "/update/batch",
                 status: "Failure",
-                description: `Update a batch of existing data record. RecordIDs: ${displayIds}`,
+                description: `Update a batch of existing data record. RecordIDs: ${displayedIds}`,
                 extra: {
                     count: totalCount,
</code_context>
<issue_to_address>
**issue (bug_risk):** `displayedIds` is undefined in the error logger and will mask the original error.

The template literal now references `displayedIds`, but the variable defined earlier is `displayIds`. This will throw a `ReferenceError` when logging failures and hide the original exception. Please update the description to use the correct variable name (or rename the variable consistently).
</issue_to_address>

### Comment 3
<location path="core/database/foxx/api/group_router.js" line_range="442-443" />
<code_context>
             });
         } catch (e) {
             logger.logRequestFailure({
-                client: "N/A",
</code_context>
<issue_to_address>
**issue (bug_risk):** Error path in group listing no longer sends a response on failure.

Previously this catch block returned `groups` to the client; now no response is sent at all, so any exception will leave the HTTP request hanging until it times out. If you no longer want to return partial data, please still send an error response here (e.g., via `g_lib.handleException(e, res)` or another appropriate error handler).
</issue_to_address>

### Comment 4
<location path="core/database/foxx/api/query_router.js" line_range="669-670" />
<code_context>
+            // (joi.any()) accepted both. New documents are stored as objects
+            // (joi.object()), but old records remain until migrated.
+            // TODO: Remove after backfilling existing queries in ArangoDB.
+            if (typeof qry.params === "string") {
+                qry.params = JSON.parse(qry.params);
+            }
+
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Bare `JSON.parse` on legacy `params` string can throw and may deserve guarding.

For legacy documents you’re now calling `JSON.parse(qry.params)` without handling failures. If any stored `params` string is malformed (e.g. corrupted or from an older buggy version), this will throw a `SyntaxError` and surface as a generic error. To be more robust with historical data, consider wrapping this in a try/catch, logging the offending value (or its ID), and returning a controlled `ERR_INVALID_PARAM` (or similar) instead of letting it crash execution.

Suggested implementation:

```javascript
            // Legacy query documents may have `params` stored as a JSON string
            // rather than an object, because the original schema validation
            // (joi.any()) accepted both. New documents are stored as objects
            // (joi.object()), but old records remain until migrated.
            // TODO: Remove after backfilling existing queries in ArangoDB.
            if (typeof qry.params === "string") {
                try {
                    qry.params = JSON.parse(qry.params);
                } catch (e) {
                    // Guard against malformed legacy JSON in stored queries.
                    // Log enough context to investigate without leaking full params.
                    const queryId = req && req.queryParams && req.queryParams.id;
                    const paramsSample =
                        typeof qry.params === "string"
                            ? qry.params.slice(0, 200)
                            : undefined;

                    console.warn("Failed to parse legacy query params JSON", {
                        queryId,
                        paramsSample,
                        error: String(e),
                    });

                    const err = new Error("ERR_INVALID_PARAM: invalid legacy query params JSON");
                    err.errorNum = g_lib.ERR_INVALID_PARAM;
                    err.code = "ERR_INVALID_PARAM";
                    throw err;
                }
            }

```

If `g_lib.ERR_INVALID_PARAM` or the `err.errorNum`/`err.code` convention does not exist in your codebase, you should adapt the thrown error to your existing error-handling pattern. For example, if you have a helper like `g_lib.createError(code, message)` or a custom error class, replace the manual `Error` construction with that helper so the error surfaces as a standard validation/parameter error in your API responses.
</issue_to_address>

### Comment 5
<location path="CMakeLists.txt" line_range="37" />
<code_context>
 libraries must be shared libraries for DataFed to be interoperable. If this
 setting is turned on DataFed will build it's libraries as shared and try to
 link to shared libraries." OFF)
-OPTION(ENABLE_END_TO_END_API_TESTS "Enable end-to-end API testing" FALSE) 
</code_context>
<issue_to_address>
**issue (typo):** Use "its" (possessive) instead of "it's" in this description.

In this line, replace "it's" with the possessive "its" ("DataFed will build its libraries as shared...").

```suggestion
setting is turned on DataFed will build its libraries as shared and try to
```
</issue_to_address>

### Comment 6
<location path="core/database/foxx/api/data_router.js" line_range="2400" />
<code_context>
            description: `Attempting to delete a total of: ${req.body.ids.length}`,
</code_context>
<issue_to_address>
**security (javascript.express.security.injection.tainted-sql-string):** Detected user input used to manually construct a SQL string. This is usually bad practice because manual construction could accidentally result in a SQL injection. An attacker could use a SQL injection to steal or modify contents of the database. Instead, use a parameterized query which is available by default in most database engines. Alternatively, consider using an object-relational mapper (ORM) such as Sequelize which will protect your queries.

*Source: opengrep*
</issue_to_address>

### Comment 7
<location path="core/database/foxx/api/data_router.js" line_range="2464" />
<code_context>
                    description: `Attempting to delete a total of: ${req.body.ids.length}`,
</code_context>
<issue_to_address>
**security (javascript.express.security.injection.tainted-sql-string):** Detected user input used to manually construct a SQL string. This is usually bad practice because manual construction could accidentally result in a SQL injection. An attacker could use a SQL injection to steal or modify contents of the database. Instead, use a parameterized query which is available by default in most database engines. Alternatively, consider using an object-relational mapper (ORM) such as Sequelize which will protect your queries.

*Source: opengrep*
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment on lines +711 to 712
if (record.deps != undefined && (record.dep_add != undefined || record.dep_rem != undefined))
throw [error.ERR_INVALID_PARAM, "Cannot use both dependency set and add/remove."];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (bug_risk): Inconsistent dependency field names likely break validation of mixed dep operations.

Elsewhere in the codebase these fields are consistently named deps_add / deps_rem, but this guard uses dep_add / dep_rem. As a result, requests using deps with deps_add / deps_rem will skip this check instead of throwing. Please either align the field names to deps_add / deps_rem here, or explicitly handle both name variants if legacy payloads must be supported.

Comment on lines -1052 to 1053
description: `Update a batch of existing data record. RecordIDs: ${displayIds}`,
description: `Update a batch of existing data record. RecordIDs: ${displayedIds}`,
extra: {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (bug_risk): displayedIds is undefined in the error logger and will mask the original error.

The template literal now references displayedIds, but the variable defined earlier is displayIds. This will throw a ReferenceError when logging failures and hide the original exception. Please update the description to use the correct variable name (or rename the variable consistently).

Comment on lines 442 to -443
} catch (e) {
res.send(groups);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (bug_risk): Error path in group listing no longer sends a response on failure.

Previously this catch block returned groups to the client; now no response is sent at all, so any exception will leave the HTTP request hanging until it times out. If you no longer want to return partial data, please still send an error response here (e.g., via g_lib.handleException(e, res) or another appropriate error handler).

Comment on lines +669 to +670
if (typeof qry.params === "string") {
qry.params = JSON.parse(qry.params);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

suggestion (bug_risk): Bare JSON.parse on legacy params string can throw and may deserve guarding.

For legacy documents you’re now calling JSON.parse(qry.params) without handling failures. If any stored params string is malformed (e.g. corrupted or from an older buggy version), this will throw a SyntaxError and surface as a generic error. To be more robust with historical data, consider wrapping this in a try/catch, logging the offending value (or its ID), and returning a controlled ERR_INVALID_PARAM (or similar) instead of letting it crash execution.

Suggested implementation:

            // Legacy query documents may have `params` stored as a JSON string
            // rather than an object, because the original schema validation
            // (joi.any()) accepted both. New documents are stored as objects
            // (joi.object()), but old records remain until migrated.
            // TODO: Remove after backfilling existing queries in ArangoDB.
            if (typeof qry.params === "string") {
                try {
                    qry.params = JSON.parse(qry.params);
                } catch (e) {
                    // Guard against malformed legacy JSON in stored queries.
                    // Log enough context to investigate without leaking full params.
                    const queryId = req && req.queryParams && req.queryParams.id;
                    const paramsSample =
                        typeof qry.params === "string"
                            ? qry.params.slice(0, 200)
                            : undefined;

                    console.warn("Failed to parse legacy query params JSON", {
                        queryId,
                        paramsSample,
                        error: String(e),
                    });

                    const err = new Error("ERR_INVALID_PARAM: invalid legacy query params JSON");
                    err.errorNum = g_lib.ERR_INVALID_PARAM;
                    err.code = "ERR_INVALID_PARAM";
                    throw err;
                }
            }

If g_lib.ERR_INVALID_PARAM or the err.errorNum/err.code convention does not exist in your codebase, you should adapt the thrown error to your existing error-handling pattern. For example, if you have a helper like g_lib.createError(code, message) or a custom error class, replace the manual Error construction with that helper so the error surfaces as a standard validation/parameter error in your API responses.

@@ -36,7 +36,7 @@ building with static depencies is not completely possible because some system
libraries must be shared libraries for DataFed to be interoperable. If this
setting is turned on DataFed will build it's libraries as shared and try to
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

issue (typo): Use "its" (possessive) instead of "it's" in this description.

In this line, replace "it's" with the possessive "its" ("DataFed will build its libraries as shared...").

Suggested change
setting is turned on DataFed will build it's libraries as shared and try to
setting is turned on DataFed will build its libraries as shared and try to

httpVerb: "POST",
routePath: basePath + "/delete",
status: "Started",
description: `Attempting to delete a total of: ${req.body.ids.length}`,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security (javascript.express.security.injection.tainted-sql-string): Detected user input used to manually construct a SQL string. This is usually bad practice because manual construction could accidentally result in a SQL injection. An attacker could use a SQL injection to steal or modify contents of the database. Instead, use a parameterized query which is available by default in most database engines. Alternatively, consider using an object-relational mapper (ORM) such as Sequelize which will protect your queries.

Source: opengrep

httpVerb: "POST",
routePath: basePath + "/delete",
status: "Failure",
description: `Attempting to delete a total of: ${req.body.ids.length}`,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

security (javascript.express.security.injection.tainted-sql-string): Detected user input used to manually construct a SQL string. This is usually bad practice because manual construction could accidentally result in a SQL injection. An attacker could use a SQL injection to steal or modify contents of the database. Instead, use a parameterized query which is available by default in most database engines. Alternatively, consider using an object-relational mapper (ORM) such as Sequelize which will protect your queries.

Source: opengrep

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants