
@fede-kamel (Contributor) commented Jan 29, 2026

Extends OCIGenAIEmbeddings with image embedding support. Multimodal models like cohere.embed-v4.0 embed text and images into the same vector space, enabling cross-modal retrieval (search images with text, find text from images).

Builds on the vision utilities from #99 and #104 (data URI convention, vision.py module).

What changed:

  • vision.py gains to_data_uri() as the core encoding primitive, plus an IMAGE_EMBEDDING_MODELS registry. load_image() and encode_image() now delegate to to_data_uri() internally.
  • A new embeddings/image.py module provides ImageEmbeddingMixin, which adds embed_image()/embed_images() without bloating oci_generative_ai.py.
  • oci_generative_ai.py gains input_type and output_dimensions fields and stays focused on the OCI client and text embedding.
  • embed_image() / embed_images() accept file paths, raw bytes, or data URIs, and automatically set input_type=IMAGE.
  • output_dimensions configures the vector size (256, 512, 1024, or 1536) for embed-v4.0 and later models.
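For readers unfamiliar with the data URI convention, here is a minimal sketch of what a "normalize anything to a data URI" primitive can look like. It is a hypothetical stand-in, not the PR's to_data_uri(): the name to_data_uri_sketch, the image/png default, and the extension-based MIME guessing are all assumptions made for illustration.

```python
import base64
import mimetypes
from pathlib import Path
from typing import Union


def to_data_uri_sketch(image: Union[str, Path, bytes], mime_type: str = "image/png") -> str:
    """Hypothetical helper: normalize a file path, raw bytes, or a data URI to a data URI.

    Not the PR's to_data_uri(); its real signature and MIME handling may differ.
    """
    # An existing data URI is passed through unchanged.
    if isinstance(image, str) and image.startswith("data:"):
        return image
    if isinstance(image, (bytes, bytearray)):
        # Raw bytes carry no filename, so the caller-supplied MIME type is used.
        payload = bytes(image)
    else:
        path = Path(image)
        payload = path.read_bytes()
        # Guess the MIME type from the file extension, falling back to the default.
        guessed, _ = mimetypes.guess_type(path.name)
        mime_type = guessed or mime_type
    encoded = base64.b64encode(payload).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"
```

Routing every input through one function like this keeps the format-sniffing logic in a single place, which matches the PR's stated design of having load_image() and encode_image() delegate to one primitive.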

Quick example:

from langchain_oci import OCIGenAIEmbeddings

emb = OCIGenAIEmbeddings(
    model_id="cohere.embed-v4.0",
    compartment_id="ocid1.compartment.oc1..xxx",
    service_endpoint="https://inference.generativeai.us-chicago-1.oci.oraclecloud.com",
)
text_vec = emb.embed_query("a photo of a sunset")
image_vec = emb.embed_image("sunset.jpg")  # file path; raw bytes or a data URI also work
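Because text and image vectors share one space, cross-modal retrieval reduces to a nearest-neighbour search over those vectors. The following is a minimal pure-Python sketch using cosine similarity; cosine_similarity and rank_images are hypothetical helpers, not part of langchain_oci, and a real deployment would more likely hand the vectors to a vector store.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimensionality")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cannot compare a zero vector")
    return dot / (norm_a * norm_b)


def rank_images(
    query_vec: Sequence[float], image_vecs: Dict[str, Sequence[float]]
) -> List[Tuple[str, float]]:
    """Rank images by similarity to a text query vector, best match first."""
    scores = [(name, cosine_similarity(query_vec, vec)) for name, vec in image_vecs.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)
```

With the quick example above, rank_images(text_vec, {"sunset.jpg": image_vec}) would score the sunset image against the text query; adding more embedded images to the dict orders them by relevance.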

Tested against all 6 available OCI embedding models (embed-v4.0, embed-english-v3.0, embed-english-light-v3.0, embed-multilingual-v3.0, embed-multilingual-light-v3.0, embed-multilingual-image-v3.0). Includes 38 unit tests and 43 integration tests; ruff reports no issues.

The oracle-contributor-agreement bot added the "OCA Verified" label (all contributors have signed the Oracle Contributor Agreement) on Jan 29, 2026.
@fede-kamel force-pushed the add-multimodal-image-embedding branch 4 times, most recently from 54cd3c0 to b283161 (January 29, 2026 19:53).
@fede-kamel (Contributor Author)

Integration Testing Update

Ran the full integration test suite for the image embedding feature: all 43 tests pass.

Models tested

Text embedding models (5):

  • cohere.embed-v4.0 (1536 dims)
  • cohere.embed-english-v3.0 (1024 dims)
  • cohere.embed-english-light-v3.0 (384 dims)
  • cohere.embed-multilingual-v3.0 (1024 dims)
  • cohere.embed-multilingual-light-v3.0 (384 dims)

Image embedding models (2):

  • cohere.embed-v4.0 (1536 dims)
  • cohere.embed-multilingual-image-v3.0 (1024 dims)
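The model lists above map naturally onto a small lookup table that can reject text-only models before an image request is sent. This is a hypothetical illustration, not the PR's code: IMAGE_EMBEDDING_MODELS_SKETCH, REPORTED_DIMS, and the helper functions are names invented here, and the real IMAGE_EMBEDDING_MODELS registry in vision.py may have a different shape. The dimensions are the default sizes reported in this comment (with output_dimensions unset).

```python
from typing import Dict, Sequence

# Hypothetical shape: the PR's IMAGE_EMBEDDING_MODELS may be a list, set, or dict.
# Model IDs are the image-capable models listed above.
IMAGE_EMBEDDING_MODELS_SKETCH = frozenset(
    {"cohere.embed-v4.0", "cohere.embed-multilingual-image-v3.0"}
)

# Default vector sizes as reported in this test run (output_dimensions unset).
REPORTED_DIMS: Dict[str, int] = {
    "cohere.embed-v4.0": 1536,
    "cohere.embed-english-v3.0": 1024,
    "cohere.embed-english-light-v3.0": 384,
    "cohere.embed-multilingual-v3.0": 1024,
    "cohere.embed-multilingual-light-v3.0": 384,
    "cohere.embed-multilingual-image-v3.0": 1024,
}


def supports_image_embedding(model_id: str) -> bool:
    """Return True if model_id is one of the image-capable models."""
    return model_id in IMAGE_EMBEDDING_MODELS_SKETCH


def require_image_model(model_id: str) -> None:
    """Fail fast before sending an image to a text-only model."""
    if not supports_image_embedding(model_id):
        supported = ", ".join(sorted(IMAGE_EMBEDDING_MODELS_SKETCH))
        raise ValueError(
            f"{model_id!r} does not support image embeddings; use one of: {supported}"
        )


def has_reported_dims(model_id: str, vector: Sequence[float]) -> bool:
    """Check a returned vector against the default size reported for model_id."""
    return len(vector) == REPORTED_DIMS[model_id]
```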

Test coverage

| Test suite | Tests | Status |
|---|---|---|
| Image embedding (bytes, data URI, file path, batch) | 5 | |
| Cross-modal similarity (text vs image) | 3 | |
| Output dimensions (256, 512, 1024) | 3 | |
| Input type override (search_document, search_query, classification, clustering) | 4 | |
| Multi-model text embedding (dims, batch, similarity) | 15 | |
| Multi-model image embedding (single, batch, cross-modal) | 10 | |
| Image embedding models registry | 3 | |
| Total | 43 | ✅ All pass |

Notes

  • cohere.embed-v4.0 is not yet available in eu-frankfurt-1 (requests return 404), so all tests ran against us-chicago-1.
  • Unit tests also pass (38/38).

Add to_data_uri() as core primitive for converting file paths, bytes,
or data URIs to data URI strings. Refactor load_image/encode_image to
use it. Move IMAGE_EMBEDDING_MODELS registry here alongside VISION_MODELS.
New embed_image/embed_images methods via ImageEmbeddingMixin in
embeddings/image.py. Supports file paths, raw bytes, and data URIs.
Add input_type and output_dimensions fields. Unit tests included.
- Rename embed_images() to embed_image_batch() for clarity
- Add image embedding examples to README.md (section 3b)
- Update all tests to use new method name
@fede-kamel force-pushed the add-multimodal-image-embedding branch from b283161 to 4be66e6 (February 6, 2026 01:35).
@fede-kamel (Contributor Author)

@paxiaatucsdedu Addressed your review feedback:

  1. Renamed method: embed_images() → embed_image_batch() for clarity.
  2. Added README examples: a new section 3b with image embedding examples.

Changes rebased onto latest main.

@fede-kamel (Contributor Author) commented Feb 6, 2026

Integration tests passed (43/43)

Tested against the live OCI GenAI API in the Frankfurt region:

Image Embedding Models (multimodal):

  • cohere.embed-v4.0
  • cohere.embed-multilingual-image-v3.0

Text Embedding Models:

  • cohere.embed-v4.0
  • cohere.embed-english-v3.0
  • cohere.embed-english-light-v3.0
  • cohere.embed-multilingual-v3.0
  • cohere.embed-multilingual-light-v3.0

Confirmed working:

  • embed_image(): a single image from bytes, a file path, or a data URI
  • embed_image_batch(): multiple images in one batch
  • Cross-modal similarity (image ↔ text vectors in same space)
  • Output dimensions: 256, 512, 1024, 1536
  • Input types: SEARCH_DOCUMENT, SEARCH_QUERY, CLASSIFICATION, CLUSTERING
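The accepted values listed above can also be checked client-side before any request is sent. This is a hypothetical pre-flight validator, not something the PR is confirmed to contain: the function and constant names are invented here, normalizing lowercase spellings (as used in the test table) to uppercase is an assumption, and so is restricting output_dimensions to cohere.embed-v4.0 (the PR only says "embed-v4.0+").

```python
from typing import Optional

# Values as listed in this PR; IMAGE is set automatically by embed_image().
INPUT_TYPES = frozenset(
    {"SEARCH_DOCUMENT", "SEARCH_QUERY", "CLASSIFICATION", "CLUSTERING", "IMAGE"}
)
OUTPUT_DIMENSIONS = frozenset({256, 512, 1024, 1536})
# Assumption: among the tested models, only cohere.embed-v4.0 accepts output_dimensions.
CONFIGURABLE_DIMS_MODELS = frozenset({"cohere.embed-v4.0"})


def validate_embedding_options(
    model_id: str,
    input_type: Optional[str] = None,
    output_dimensions: Optional[int] = None,
) -> Optional[str]:
    """Hypothetical pre-flight check; returns the normalized input_type (or None)."""
    normalized = None
    if input_type is not None:
        # Assumption: lowercase spellings map onto the uppercase API values.
        normalized = input_type.upper()
        if normalized not in INPUT_TYPES:
            raise ValueError(f"unsupported input_type {input_type!r}")
    if output_dimensions is not None:
        if model_id not in CONFIGURABLE_DIMS_MODELS:
            raise ValueError(f"{model_id!r} does not accept output_dimensions")
        if output_dimensions not in OUTPUT_DIMENSIONS:
            raise ValueError(
                f"output_dimensions must be one of {sorted(OUTPUT_DIMENSIONS)}"
            )
    return normalized
```

Validating locally gives a clear error message instead of a service-side rejection, at the cost of having to keep these sets in sync with what OCI actually supports.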
