[ENH]: Add API endpoints for Task management #5579

tanujnay112 · 2025-10-08T19:15:14Z

Description of changes

Summarize the changes made by this PR.
This is a duplicate of PR 5547 as that branch got corrupted.

Improvements & Bug fixes
- Added HTTP routes and client changes to add and remove tasks.
New functionality
- ...

Test plan

How are these changes tested?

Tests pass locally with pytest for python, yarn test for js, cargo test for rust

Migration plan

Are there any migrations, or any forwards/backwards compatibility changes needed in order to make sure this change deploys reliably?

Observability plan

What is the plan to instrument and monitor this change?

Documentation Changes

Are all docstrings for user-facing APIs updated if required? Do we need to make documentation changes in the docs section?


propel-code-bot · 2025-10-08T19:18:04Z

Add API Endpoints for Task Management (Create/Remove Task)

This pull request introduces new API endpoints and supporting infrastructure for registering and removing asynchronous tasks that process collections in Chroma. It defines the client, HTTP, Rust, and Go backend interfaces for task creation and task deletion, including support in the protobuf definitions and coordinator layer. The implementation makes these endpoints available in both the Rust frontend and Python/JS clients, adds validation, and ensures correct propagation of errors. Extensive integration and property-based Python tests demonstrate correct task registration, error handling (e.g., duplicate tasks, bad operators), and removal logic.

Key Changes

• Added new REST endpoints and handlers on /api/v2/.../collections/{collection_id}/tasks/{create,delete} for task creation and task removal in rust/frontend/src/server.rs.
• Introduced CreateTaskRequest, RemoveTaskRequest, CreateTaskResponse, RemoveTaskResponse, and matching Rust, Python, and TypeScript models.
• Extended the OpenAPI/TypeScript client with new task-related endpoints and type definitions.
• Updated Python client and Collection object to provide create_task and remove_task methods; documented and validated all arguments.
• Implemented sysdb gRPC, Go backend logic, and end-to-end database persistence for task metadata, parameters (as JSON/struct), and error propagation.
• Round-tripped operator parameters using protobuf Struct for operator-specific configuration, with validation and error handling.
• Added property-based and distributed integration tests to validate task creation, duplicate task prevention, operator existence checks, multiple task logic, and edge cases.
• Bumped Rust, Go, and protobuf dependencies where needed and updated codegen.
• Added a config option to control the minimum record threshold for task triggering.

Affected Areas

• idl/chromadb/proto/coordinator.proto (proto/model updates)
• rust/frontend/src/server.rs (routing, handler, OpenAPI)
• rust/frontend/src/impls/service_based_frontend.rs (endpoint logic, sysdb bridge)
• rust/sysdb/src/sysdb.rs and related code (task persistence, gRPC handling)
• chromadb/api/models/Collection.py and chromadb/api/fastapi.py, chromadb/api/__init__.py (Python client/task API)
• chromadb/test/distributed/test_task_api.py (tests)
• clients/new-js/packages/chromadb/src/api/* (JS SDK/OpenAPI updates)
• go/pkg/sysdb/coordinator/task.go and related Go code (backend logic)

This summary was automatically generated by @propel-code-bot

rescrv · 2025-10-10T16:04:02Z

chromadb/api/models/Collection.py

                Search, Key, K, Knn, Val
            )
-            
+


Are these from your editor? Or did you run the python formatter? I'm always suspicious of whitespace changes that would imply the formatter has changed or was not run.

Editor. I left them in because I thought it was good to get rid of trailing whitespace.

rescrv · 2025-10-10T16:04:44Z

chromadb/api/models/Collection.py

+            task_name: Unique name for this task instance
+            operator_name: Built-in operator name (e.g., "record_counter")
+            output_collection_name: Name of the collection where task output will be stored
+            params: Optional dictionary with operator-specific parameters


Is this blob just passed in to the operator as e.g. a JSON value? How does the operator get these?

They are stored in the task definition as a JSON string that a TaskRunner receives and passes into the operator it executes for that Task.

rescrv · 2025-10-10T16:06:02Z

chromadb/api/fastapi.py

+        """Register a recurring task on a collection."""
+        import json
+
+        params_str = json.dumps(params) if params is not None else None


Why nested JSON? An alternative that seems cleaner (but may have a con you've thought of) would be to make params a JSON object (it has to be anyway to round trip through JSON).
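To make the trade-off concrete, here is a minimal sketch using only Python's standard json module. The two build_payload_* helpers and the payload shape are hypothetical illustrations, not the Chroma client's actual request format.

```python
import json
from typing import Any, Dict, Optional


def build_payload_nested(params: Optional[Dict[str, Any]]) -> str:
    """Hypothetical: encode params as a JSON string inside the JSON body."""
    params_str = json.dumps(params) if params is not None else None
    return json.dumps({"task_name": "count", "params": params_str})


def build_payload_object(params: Optional[Dict[str, Any]]) -> str:
    """Hypothetical: embed params directly as a JSON object."""
    return json.dumps({"task_name": "count", "params": params})


params = {"threshold": 10}

nested = json.loads(build_payload_nested(params))
flat = json.loads(build_payload_object(params))

# Nested form: the receiver gets a string and has to parse a second time.
nested_params = json.loads(nested["params"])

# Object form: the receiver gets the dictionary after a single parse.
flat_params = flat["params"]
```

Both forms carry the same information. The nested form adds an extra encode/decode step on each side and hides the parameter structure from anything that inspects the outer body.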

rescrv · 2025-10-10T16:09:11Z

chromadb/test/distributed/test_task_api.py

+from chromadb.errors import ChromaError, NotFoundError
+
+
+def test_task_create_and_remove(basic_http_client: System) -> None:


This test would pass if I changed the impls of create task and remove task to simply return static objects, right? Can we assert anything about the task existing and then not existing?
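One way to observe existence without a dedicated lookup call is through the duplicate check: while a task exists, a second create with the same name should fail, and after removal the name should be free again. The sketch below runs that check against an in-memory stand-in. FakeCollection, TaskAlreadyExists, and the boolean return of remove_task are assumptions made for illustration, not the real client API.

```python
from typing import Any, Dict, Optional


class TaskAlreadyExists(Exception):
    """Hypothetical stand-in for the error a duplicate task name raises."""


class FakeCollection:
    """In-memory stand-in for a collection; not the real Chroma client."""

    def __init__(self) -> None:
        self._tasks: Dict[str, Dict[str, Any]] = {}

    def create_task(
        self,
        task_name: str,
        operator_name: str,
        output_collection_name: str,
        params: Optional[Dict[str, Any]] = None,
    ) -> None:
        if task_name in self._tasks:
            raise TaskAlreadyExists(task_name)
        self._tasks[task_name] = {
            "operator_name": operator_name,
            "output_collection_name": output_collection_name,
            "params": params,
        }

    def remove_task(self, task_name: str) -> bool:
        # Report whether a task was actually removed.
        return self._tasks.pop(task_name, None) is not None


def check_create_and_remove(collection: FakeCollection) -> None:
    args = {
        "task_name": "t1",
        "operator_name": "record_counter",
        "output_collection_name": "out",
    }
    collection.create_task(**args)

    # The task exists: a second create with the same name must be rejected.
    try:
        collection.create_task(**args)
    except TaskAlreadyExists:
        pass
    else:
        raise AssertionError("duplicate create succeeded, so nothing was stored")

    # The task is gone: removal reports success and the name is free again.
    assert collection.remove_task("t1")
    collection.create_task(**args)


check_create_and_remove(FakeCollection())
```

An implementation that returned static objects would fail the duplicate step, which is the property the comment asks for.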

rescrv · 2025-10-10T16:10:12Z

chromadb/test/distributed/test_task_api.py

+        )
+
+
+def test_task_multiple_collections(basic_http_client: System) -> None:


What about the adjacent case: a test_task_multiple_tasks that tries to register multiple tasks on the same collection?

rescrv · 2025-10-10T16:16:45Z

rust/frontend/src/impls/service_based_frontend.rs

+        }: CreateTaskRequest,
+    ) -> Result<CreateTaskResponse, AddTaskError> {
+        // TODO: Make min_records_for_task configurable
+        const DEFAULT_MIN_RECORDS_FOR_TASK: u64 = 100;


Can we do this in this PR? It's just a few lines.

rescrv · 2025-10-10T16:23:03Z

rust/sysdb/src/sysdb.rs

+        // This is a client-side parsing issue, not a creation failure.
+        let task_id =
+            chroma_types::TaskUuid(uuid::Uuid::parse_str(&response.task_id).map_err(|e| {
+                CreateTaskError::FailedToCreateTask(tonic::Status::internal(format!(


It feels very awkward to me that we're returning a TypedError that throws away all typing to wrap a tonic::Status that formats a string.

What about:

enum CreateTaskError {
    NaturalLanguageReasonForFailure {
        task_id: Uuid,
    }
}

As written it's a create task error (so it failed) where the reason it failed is because it failed.

rescrv · 2025-10-10T16:26:15Z

rust/types/src/api_types.rs

+    pub task_name: String,
+    pub operator_name: String,
+    pub output_collection_name: String,
+    pub params: Option<String>,


Following up on my other point: What if this were a serde_json::Value rather than a string for the user to parse?

propel-code-bot · 2025-10-11T00:48:44Z

rust/sysdb/src/sysdb.rs

+        let response = self.client.create_task(req).await?.into_inner();
+
+        // Parse the returned task_id - this should always succeed since the server generated it
+        // If this fails, it indicates a serious server bug or protocol corruption
+        let task_id = chroma_types::TaskUuid(
+            uuid::Uuid::parse_str(&response.task_id).map_err(|e| {
+                tracing::error!(
+                    task_id = %response.task_id,
+                    error = %e,
+                    "Server returned invalid task_id UUID - task was created but response is corrupt"
+                );
+                CreateTaskError::ServerReturnedInvalidData
+            })?,
+        );
+
+        Ok(task_id)
+    }


[BestPractice]

The current error handling for create_task doesn't correctly propagate the AlreadyExists error code from the gRPC service. The ? operator on line 1682 will convert any gRPC error status, including AlreadyExists, into a generic CreateTaskError::FailedToCreateTask. This causes the ServiceBasedFrontend to misinterpret it as an internal error instead of a specific AlreadyExists condition.

This is inconsistent with how get_task_by_name and delete_task_by_name handle specific error codes like NotFound. To fix this, you should match on the result of the gRPC call and explicitly handle the AlreadyExists status code.

Suggested change

-        let response = self.client.create_task(req).await?.into_inner();
-
-        // Parse the returned task_id - this should always succeed since the server generated it
-        // If this fails, it indicates a serious server bug or protocol corruption
-        let task_id = chroma_types::TaskUuid(
-            uuid::Uuid::parse_str(&response.task_id).map_err(|e| {
-                tracing::error!(
-                    task_id = %response.task_id,
-                    error = %e,
-                    "Server returned invalid task_id UUID - task was created but response is corrupt"
-                );
-                CreateTaskError::ServerReturnedInvalidData
-            })?,
-        );
-
-        Ok(task_id)
-    }
+        let response = self.client.create_task(req).await;
+        match response {
+            Ok(resp) => {
+                let inner = resp.into_inner();
+                // Parse the returned task_id - this should always succeed since the server generated it
+                // If this fails, it indicates a serious server bug or protocol corruption
+                let task_id = chroma_types::TaskUuid(
+                    uuid::Uuid::parse_str(&inner.task_id).map_err(|e| {
+                        tracing::error!(
+                            task_id = %inner.task_id,
+                            error = %e,
+                            "Server returned invalid task_id UUID - task was created but response is corrupt"
+                        );
+                        CreateTaskError::ServerReturnedInvalidData
+                    })?,
+                );
+                Ok(task_id)
+            }
+            Err(status) => {
+                if status.code() == tonic::Code::AlreadyExists {
+                    Err(CreateTaskError::AlreadyExists)
+                } else {
+                    Err(CreateTaskError::FailedToCreateTask(status))
+                }
+            }
+        }
+    }


propel-code-bot · 2025-10-11T08:54:30Z

rust/sysdb/src/sysdb.rs

+        serde_json::Value::Null => Kind::NullValue(0),
+        serde_json::Value::Bool(b) => Kind::BoolValue(b),
+        serde_json::Value::Number(n) => {
+            if let Some(f) = n.as_f64() {
+                Kind::NumberValue(f)
+            } else {


[CriticalError]

Critical Bug: Potential task creation with corrupted parameters

The JSON-to-protobuf conversion in json_to_prost_value() has a silent failure case:

serde_json::Value::Number(n) => {
    if let Some(f) = n.as_f64() {
        Kind::NumberValue(f)
    } else {
        Kind::NullValue(0) // Silent data corruption!
    }
}

If a JSON number can't be converted to f64 (e.g., very large integers), it silently becomes null instead of failing. This could cause tasks to be created with corrupted parameters, leading to runtime failures.

Fix: Return an error or use a more robust conversion:

serde_json::Value::Number(n) => {
    Kind::NumberValue(n.as_f64().ok_or_else(|| ConversionError::InvalidNumber(n.to_string()))?)
}
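A related hazard holds even when the conversion succeeds: the number field of a protobuf Value is a double, so integers above 2^53 do not always survive the trip exactly. The sketch below illustrates this with plain Python floats. It does not exercise the Rust code in this PR, and round_trips_through_double is a hypothetical helper.

```python
def round_trips_through_double(n: int) -> bool:
    """Return True if n survives conversion to a 64-bit float and back."""
    try:
        return int(float(n)) == n
    except OverflowError:
        # The integer is too large to be represented as a float at all.
        return False


exact = round_trips_through_double(2**53)       # representable exactly
lossy = round_trips_through_double(2**53 + 1)   # rounds to a neighbouring value
too_big = round_trips_through_double(10**400)   # exceeds the float range
```

A parameter validator along these lines could reject integer values that would be silently altered, rather than storing a different number than the caller supplied.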


propel-code-bot · 2025-10-11T08:54:31Z

go/pkg/sysdb/coordinator/task.go

 		if existingTask != nil {
-			log.Info("CreateTask: task already exists, returning existing")
-			taskID = existingTask.ID
-			return nil
+			log.Error("CreateTask: task already exists", zap.String("task_name", req.Name))
+			return common.ErrTaskAlreadyExists


[BestPractice]

Logic Bug: Incorrect parameter handling

The CreateTask function has a logic error where it changes from returning existing tasks to rejecting duplicates, but the error handling pattern suggests it should still handle the case where a task might legitimately already exist:

if existingTask != nil {
	log.Error("CreateTask: task already exists", zap.String("task_name", req.Name))
	return common.ErrTaskAlreadyExists // Always errors now
}

This change breaks idempotency. If this is intentional, the API should be documented as non-idempotent. If not, the original logic should be preserved for idempotent task creation.
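The difference between the two behaviours can be sketched with an in-memory registry. This is written in Python for brevity, while the code under review is Go; TaskRegistry, its method names, and TaskAlreadyExistsError are hypothetical and only loosely mirror common.ErrTaskAlreadyExists.

```python
import uuid
from typing import Dict


class TaskAlreadyExistsError(Exception):
    """Hypothetical analogue of common.ErrTaskAlreadyExists."""


class TaskRegistry:
    """In-memory illustration of the two duplicate-handling policies."""

    def __init__(self) -> None:
        self._ids: Dict[str, str] = {}

    def create_idempotent(self, name: str) -> str:
        # Previous behaviour: a repeated create returns the existing task's ID.
        if name not in self._ids:
            self._ids[name] = str(uuid.uuid4())
        return self._ids[name]

    def create_strict(self, name: str) -> str:
        # New behaviour: a repeated create is rejected with an error.
        if name in self._ids:
            raise TaskAlreadyExistsError(name)
        self._ids[name] = str(uuid.uuid4())
        return self._ids[name]


registry = TaskRegistry()
first_id = registry.create_idempotent("nightly")
second_id = registry.create_idempotent("nightly")  # same ID, no error
```

Under the strict policy, a client that retries after a timeout cannot tell from the error alone whether its own earlier attempt succeeded or another caller already holds the name, so callers would need to treat the duplicate error as a possible success.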


propel-code-bot · 2025-10-11T08:54:33Z

go/pkg/sysdb/grpc/task_service.go


 	res, err := s.coordinator.CreateTask(ctx, req)
 	if err != nil {
 		log.Error("CreateTask failed", zap.Error(err))


[BestPractice]

For improved debuggability, consider adding more context to the error logs in this file. Including the task name when an operation fails can be very helpful. This pattern could be applied to GetTaskByName and DeleteTask as well.

For example:


tanujnay112 changed the title ~~[ENH]: Add API endpoints for create/remove tasks~~ [ENH]: Add API endpoints for Task management Oct 8, 2025

tanujnay112 marked this pull request as ready for review October 8, 2025 19:17