Conversation

@c-gamble c-gamble commented Oct 9, 2025

Adds the Qwen model family hosted on the Chroma embedding service to the known embedding functions in the Python and JS/TS SDKs.

Tested by pointing chromadb to my local OSS dir + running add operations against staging with the real hosted embedding service.

c-gamble commented Oct 9, 2025

github-actions bot commented Oct 9, 2025

Reviewer Checklist

Please leverage this checklist to ensure your code review is thorough before approving

Testing, Bugs, Errors, Logs, Documentation

  • Can you think of any use case in which the code does not behave as intended? Have they been tested?
  • Can you think of any inputs or external events that could break the code? Is user input validated and safe? Have they been tested?
  • If appropriate, are there adequate property based tests?
  • If appropriate, are there adequate unit tests?
  • Should any logging, debugging, tracing information be added or removed?
  • Are error messages user-friendly?
  • Have all documentation changes needed been made?
  • Have all non-obvious changes been commented?

System Compatibility

  • Are there any potential impacts on other parts of the system or backward compatibility?
  • Does this change intersect with any items on our roadmap, and if so, is there a plan for fitting them together?

Quality

  • Is this code of an unexpectedly high quality (Readability, Modularity, Intuitiveness)?

@c-gamble c-gamble marked this pull request as ready for review October 9, 2025 22:36
propel-code-bot bot commented Oct 9, 2025

Add Chroma Hosted Qwen Embedding Function to SDKs

This PR introduces ChromaCloudQwenEmbeddingFunction, which supports the Qwen model family on the Chroma hosted embedding service. The embedding function is added to both the Python and JS/TS SDKs, with its own configuration schema, runtime validation, and documentation. Build and test tooling and the registration mechanisms are updated to match. The API key is read only from an environment variable; the model, task, and instructions are configurable; and tests cover the new code paths. The SDK registries, package definitions, and shared schema utilities now include and recognize this embedding function.

Key Changes

• Introduced ChromaCloudQwenEmbeddingFunction in chromadb/utils/embedding_functions/chroma_cloud_qwen_embedding_function.py (Python) and clients/new-js/packages/ai-embeddings/chroma-cloud-qwen/src/index.ts (TypeScript).
• Added schema file schemas/embedding_functions/chroma-cloud-qwen.json for config validation.
• Updated Python SDK: registration in chromadb/utils/embedding_functions/__init__.py, tests in chromadb/test/ef/test_ef.py, and schema tests.
• Updated JS/TS SDK: new package folder (chroma-cloud-qwen), registration in shared embedders, code, README, TypeScript config, build scripts, and tests.
• Extended all-in-one @chroma-core/all package and pnpm-lock.yaml to include new embedding function.
• Enhanced schema loader/util tooling to cover the new schema and enable runtime type safety.
• Environment variable-only API key (no config fallback). Enforced via runtime checks in both Python and JS/TS.
• Tested round-trip config serialization, error paths, API call wiring, and instructions mapping.
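
The environment-variable-only key handling and config round trip listed above might look roughly like the following sketch. It is not the PR's implementation: `SketchQwenEF`, its validation rules, and the method bodies are hypothetical stand-ins. Only the parameter names and defaults (`model_id`, `task`, `api_key_env_var`) are taken from the reviewed diff.

```python
import os
from typing import Any, Dict


class SketchQwenEF:
    """Hypothetical stand-in for the embedding function; not the SDK class."""

    def __init__(
        self,
        model_id: str = "Qwen/Qwen3-Embedding-0.6B",
        task: str = "code",
        api_key_env_var: str = "CHROMA_API_KEY",
    ) -> None:
        # The key is only ever read from the environment; no constructor
        # argument or config field carries the secret itself.
        api_key = os.environ.get(api_key_env_var)
        if not api_key:
            raise ValueError(
                f"The {api_key_env_var} environment variable is not set."
            )
        self._api_key = api_key
        self.model_id = model_id
        self.task = task
        self.api_key_env_var = api_key_env_var

    def get_config(self) -> Dict[str, Any]:
        # Serialize the *name* of the env var, never the key value.
        return {
            "model_id": self.model_id,
            "task": self.task,
            "api_key_env_var": self.api_key_env_var,
        }

    @classmethod
    def build_from_config(cls, config: Dict[str, Any]) -> "SketchQwenEF":
        # Assumed validation rule: all three fields are required strings.
        for field in ("model_id", "task", "api_key_env_var"):
            if not isinstance(config.get(field), str):
                raise ValueError(f"Invalid or missing config field: {field}")
        return cls(
            model_id=config["model_id"],
            task=config["task"],
            api_key_env_var=config["api_key_env_var"],
        )
```

Under these assumptions, `SketchQwenEF.build_from_config(ef.get_config())` rebuilds an equivalent object as long as the named variable is still set, and the serialized config never contains the key.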

Affected Areas

• chromadb/utils/embedding_functions/__init__.py
• chromadb/utils/embedding_functions/chroma_cloud_qwen_embedding_function.py
• schemas/embedding_functions/chroma-cloud-qwen.json
• clients/new-js/packages/ai-embeddings/chroma-cloud-qwen/ (all new files, including source, tests, schema utils, README, build/test configs)
• clients/new-js/packages/ai-embeddings/common/src/schema-utils.ts
• clients/new-js/packages/ai-embeddings/all/ (package.json, src/index.ts)
• clients/new-js/pnpm-lock.yaml
• Shared tests for config/schema compliance

This summary was automatically generated by @propel-code-bot

@c-gamble c-gamble force-pushed the cooper/_enh_add_chroma_hosted_embedding_function branch 3 times, most recently from ec6cf27 to 8af298b Compare October 9, 2025 22:55
@blacksmith-sh blacksmith-sh bot deleted a comment from c-gamble Oct 9, 2025
@c-gamble c-gamble force-pushed the cooper/_enh_add_chroma_hosted_embedding_function branch from 8af298b to d51c0c8 Compare October 9, 2025 22:58
@c-gamble c-gamble force-pushed the cooper/_enh_add_chroma_hosted_embedding_function branch from d51c0c8 to d29b480 Compare October 9, 2025 23:12
model_id: str = "Qwen/Qwen3-Embedding-0.6B",
task: str = "code",
api_key_env_var: str = "CHROMA_API_KEY",
):
Contributor

why not add a query config?

Contributor

ref jina embedding function for that

Contributor Author

it's not needed in this case

Contributor

there will only ever be 2 tasks, one for docs one for queries? just trying to future proof it

Contributor Author

@c-gamble c-gamble Oct 13, 2025

no, those are targets. tasks are things like NL_TO_CODE vs CODE_TO_CODE vs CODE_TO_NL. tasks will presumably be extended in the future
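
The distinction drawn in this thread (a task such as NL_TO_CODE versus a target such as documents or queries) can be pictured as a two-level lookup. This is a sketch under assumptions, not the PR's code: the `Task` and `Target` enums, the `INSTRUCTIONS` table, and every instruction string are invented for illustration. Only the task names NL_TO_CODE, CODE_TO_CODE, and CODE_TO_NL come from the comment above.

```python
from enum import Enum
from typing import Dict


class Task(str, Enum):
    NL_TO_CODE = "nl_to_code"
    CODE_TO_CODE = "code_to_code"
    CODE_TO_NL = "code_to_nl"


class Target(str, Enum):
    DOCUMENTS = "documents"
    QUERIES = "queries"


# Placeholder instruction text; the real strings are not shown in this thread.
INSTRUCTIONS: Dict[Task, Dict[Target, str]] = {
    Task.NL_TO_CODE: {
        Target.DOCUMENTS: "",
        Target.QUERIES: "Given a natural-language question, retrieve relevant code.",
    },
    Task.CODE_TO_CODE: {
        Target.DOCUMENTS: "",
        Target.QUERIES: "Given a code snippet, retrieve similar code.",
    },
    Task.CODE_TO_NL: {
        Target.DOCUMENTS: "",
        Target.QUERIES: "Given a code snippet, retrieve relevant prose.",
    },
}


def instruction_for(task: Task, target: Target) -> str:
    """Look up the instruction for one (task, target) pair."""
    try:
        return INSTRUCTIONS[task][target]
    except KeyError:
        raise ValueError(
            f"No instruction for task={task.value}, target={target.value}"
        ) from None
```

In a layout like this, extending the set of tasks means adding one enum member and one table entry, while the set of targets stays fixed.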

@c-gamble c-gamble changed the base branch from main to graphite-base/5585 October 10, 2025 19:18
@c-gamble c-gamble force-pushed the cooper/_enh_add_chroma_hosted_embedding_function branch from d29b480 to 7237177 Compare October 10, 2025 19:18
@c-gamble c-gamble changed the base branch from graphite-base/5585 to cooper/is-query October 10, 2025 19:19
@kylediaz
Contributor

In the PDR, it said that CloudEmbeddingFunction should be able to get its API key from the client using it. It doesn't look like your implementation has this feature.

@c-gamble c-gamble changed the base branch from cooper/is-query to graphite-base/5585 October 13, 2025 16:10
@blacksmith-sh blacksmith-sh bot deleted a comment from c-gamble Oct 13, 2025
@c-gamble c-gamble force-pushed the cooper/_enh_add_chroma_hosted_embedding_function branch from 4752abf to d866593 Compare October 13, 2025 17:11
@c-gamble c-gamble force-pushed the cooper/_enh_add_chroma_hosted_embedding_function branch from d866593 to 76cfae0 Compare October 13, 2025 17:22
@blacksmith-sh blacksmith-sh bot deleted a comment from c-gamble Oct 13, 2025
@c-gamble c-gamble force-pushed the cooper/_enh_add_chroma_hosted_embedding_function branch 2 times, most recently from 51a61ae to bc74f5c Compare October 13, 2025 17:35
@c-gamble c-gamble force-pushed the cooper/_enh_add_chroma_hosted_embedding_function branch from bc74f5c to 852e13d Compare October 13, 2025 17:55
@c-gamble c-gamble requested a review from drewkim October 13, 2025 19:56
@c-gamble
Contributor Author

> In the PDR, it said that CloudEmbeddingFunction should be able to get its API key from the client using it. It doesn't look like your implementation has this feature.

imo this is quite a bit of magic and we don't have precedent for anything similar. also, the API key is only available if the user is using the cloud client so we wouldn't really have a way to resolve it for local client users.

i think we can do it as a fast follow if users request it, but it seems fine to rely on a single env var for both your CloudClient and EF, since technically the client and EF are separate entities and you don't need a client to create an EF.
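
The argument above, that the client and the embedding function are separate entities that can each read the same environment variable, can be sketched as follows. Both classes are hypothetical stand-ins rather than chromadb's `CloudClient` or the new embedding function, and `resolve_api_key` is an invented helper. The only detail taken from the PR is the default variable name `CHROMA_API_KEY`.

```python
import os
from typing import Optional


def resolve_api_key(env_var: str = "CHROMA_API_KEY") -> str:
    """Read the key from the environment, failing loudly if it is absent."""
    key: Optional[str] = os.environ.get(env_var)
    if not key:
        raise ValueError(f"Set the {env_var} environment variable.")
    return key


class SketchCloudClient:
    """Hypothetical client; it knows nothing about embedding functions."""

    def __init__(self) -> None:
        self.api_key = resolve_api_key()


class SketchEmbeddingFunction:
    """Hypothetical EF; it can be created without any client instance."""

    def __init__(self, api_key_env_var: str = "CHROMA_API_KEY") -> None:
        self.api_key = resolve_api_key(api_key_env_var)
```

Because neither object holds a reference to the other in this sketch, the embedding function can be constructed on its own whenever the variable is set, which is the property the reply relies on.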

@c-gamble c-gamble force-pushed the cooper/_enh_add_chroma_hosted_embedding_function branch from 852e13d to 419aad4 Compare October 13, 2025 21:20
@blacksmith-sh blacksmith-sh bot deleted a comment from c-gamble Oct 13, 2025
@c-gamble c-gamble force-pushed the cooper/_enh_add_chroma_hosted_embedding_function branch from 419aad4 to c9fb7da Compare October 13, 2025 22:56
@blacksmith-sh blacksmith-sh bot deleted a comment from c-gamble Oct 13, 2025
…dex.ts

Co-authored-by: propel-code-bot[bot] <203372662+propel-code-bot[bot]@users.noreply.github.com>
@c-gamble c-gamble merged commit 87f3683 into main Oct 14, 2025
59 checks passed
@c-gamble c-gamble deleted the cooper/_enh_add_chroma_hosted_embedding_function branch October 14, 2025 00:12
4 participants