@abdurrahmanbutler

Proposed Changes

Feature: Add Isaacus embedders to Haystack core integrations.

  • New integration package at integrations/isaacus/:
    • src/haystack_integrations/components/embedders/isaacus/__init__.py
    • src/haystack_integrations/components/embedders/isaacus/text_embedder.py
      • IsaacusTextEmbedder (query embeddings; returns {"embedding": List[float]})
    • src/haystack_integrations/components/embedders/isaacus/document_embedder.py
      • IsaacusDocumentEmbedder (document embeddings; writes to document.embedding, returns {"documents": List[Document]})
    • src/haystack_integrations/components/embedders/isaacus/utils.py (minimal HTTP client)
    • tests/test_isaacus_embedder.py
    • README.md, CHANGELOG.md, pyproject.toml

Highlights

  • Configurable model (defaults to "kanon-2-embedder"), dimensions, overflow_strategy, and batch_size.
  • Follows the Haystack 2.0 component contract (@component, run, @component.output_types).
  • Scope limited to integrations/isaacus/**; no other files touched.
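
To illustrate how a batch_size setting can shape requests, here is a minimal sketch that groups document texts into batches, embeds each batch, and writes the vectors back onto the documents. It is not the integration's actual code: FakeDoc, embed_in_batches, and the embed_batch callable are hypothetical stand-ins, and the real embedder calls the Isaacus API rather than a local function.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass
class FakeDoc:
    """Hypothetical stand-in for a Haystack Document (content and embedding only)."""
    content: str
    embedding: Optional[List[float]] = None


def embed_in_batches(
    docs: Sequence[FakeDoc],
    embed_batch: Callable[[List[str]], List[List[float]]],
    batch_size: int = 32,
) -> dict:
    """Embed docs in chunks of batch_size and attach each vector to its doc."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(docs), batch_size):
        batch = docs[start:start + batch_size]
        # One backend call per batch; embed_batch is an assumed callable.
        vectors = embed_batch([d.content for d in batch])
        if len(vectors) != len(batch):
            raise ValueError("embedding backend returned a wrong number of vectors")
        for doc, vec in zip(batch, vectors):
            doc.embedding = vec
    # Same output shape as the document embedder: {"documents": [...]}.
    return {"documents": list(docs)}
```

With three documents and batch_size=2, the assumed backend is called twice (two texts, then one), and the documents come back in their original order.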

How did you test it?

Unit tests (HTTP mocked)

  • From integrations/isaacus/: hatch run test:all (all tests pass).
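
As a rough idea of what a test with mocked HTTP can look like, the sketch below injects a mock in place of the network call and asserts on both the parsed vectors and the outgoing request. Everything in it is an assumption for illustration: embed_texts is a hypothetical helper, the URL is a placeholder, and the payload and response shapes are made up rather than taken from the Isaacus API or from tests/test_isaacus_embedder.py.

```python
from typing import Callable, List
from unittest.mock import Mock

# Hypothetical placeholder URL; this is not the real Isaacus endpoint.
PLACEHOLDER_URL = "https://example.invalid/embeddings"


def embed_texts(texts: List[str], post: Callable[..., dict], model: str) -> List[List[float]]:
    """Hypothetical client helper: send texts through `post` and return vectors.

    The request payload and response layout are assumptions for illustration.
    """
    response = post(PLACEHOLDER_URL, json={"model": model, "texts": texts})
    return [item["embedding"] for item in response["embeddings"]]


def test_embed_texts_uses_mocked_http() -> None:
    # The mock replaces the network call, so no API key or connectivity is needed.
    fake_post = Mock(return_value={"embeddings": [{"embedding": [0.1, 0.2]}]})
    vectors = embed_texts(["a contract clause"], post=fake_post, model="kanon-2-embedder")
    assert vectors == [[0.1, 0.2]]
    # Verify the request that would have been sent.
    fake_post.assert_called_once_with(
        PLACEHOLDER_URL,
        json={"model": "kanon-2-embedder", "texts": ["a contract clause"]},
    )
```

Passing the HTTP function in as an argument keeps the sketch free of patching; a test against a real client module would more likely patch the client's request function with unittest.mock.patch.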

Manual verification (live API)

  • Installed editable from repo root: pip install -e integrations/isaacus
  • Ran a smoke test:
    • Embedded two Documents with IsaacusDocumentEmbedder(model="kanon-2-embedder")
    • Queried with IsaacusTextEmbedder + InMemoryEmbeddingRetriever
    • Observed non-zero scores and expected embedding dimension.
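
The two checks in the last step can be expressed in a few lines. The sketch below is a hypothetical helper (smoke_check and dot are illustrative names, not part of the integration) that verifies every vector has the expected dimension and that dot-product scores against the query are not all zero. It works on plain lists of floats, so it makes no assumptions about the API itself.

```python
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum(x * y for x, y in zip(a, b))


def smoke_check(
    query: Sequence[float],
    doc_vectors: List[Sequence[float]],
    expected_dim: int,
) -> List[float]:
    """Return dot-product scores after checking dimensions and non-zero scores."""
    for vec in [query, *doc_vectors]:
        if len(vec) != expected_dim:
            raise ValueError(f"expected {expected_dim} dimensions, got {len(vec)}")
    scores = [dot(query, vec) for vec in doc_vectors]
    if not any(score != 0.0 for score in scores):
        raise ValueError("all similarity scores are zero")
    return scores


# Toy vectors, not real embeddings.
print(smoke_check([1.0, 0.0], [[0.5, 0.5], [0.0, 1.0]], expected_dim=2))  # [0.5, 0.0]
```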

Notes for reviewer

Isaacus recently launched the Kanon 2 Embedder, an English-language embedding model specialised for the legal domain and the best-performing model on the Massive Legal Embedding Benchmark.

We are submitting this pull request to add Haystack support for the model through an integration with our API.

Usage Example

from haystack import Pipeline, Document
from haystack.document_stores.in_memory import InMemoryDocumentStore
from haystack.components.retrievers.in_memory import InMemoryEmbeddingRetriever
from haystack.utils import Secret
from haystack_integrations.components.embedders.isaacus import (IsaacusTextEmbedder, IsaacusDocumentEmbedder)

store = InMemoryDocumentStore(embedding_similarity_function="dot_product")
embedder = IsaacusDocumentEmbedder(
    api_key=Secret.from_env_var("ISAACUS_API_KEY"),
    model="kanon-2-embedder",          # choose any supported Isaacus embedding model
    # dimensions=1792,                 # optionally set to match your vector DB
)

raw_docs = [Document(content="Isaacus releases Kanon 2 Embedder: the world's best legal embedding model.")]
# Embed the documents, then write the embedded copies to the store.
store.write_documents(embedder.run(documents=raw_docs)["documents"])

pipe = Pipeline()
pipe.add_component("q", IsaacusTextEmbedder(
    api_key=Secret.from_env_var("ISAACUS_API_KEY"),
    model="kanon-2-embedder",
))
pipe.add_component("ret", InMemoryEmbeddingRetriever(document_store=store))
pipe.connect("q.embedding", "ret.query_embedding")

print(pipe.run({"q": {"text": "Who built Kanon 2 Embedder?"}}))

@abdurrahmanbutler abdurrahmanbutler requested a review from a team as a code owner October 21, 2025 06:18
@abdurrahmanbutler abdurrahmanbutler requested review from Amnah199 and removed request for a team October 21, 2025 06:18
@CLAassistant

CLAassistant commented Oct 21, 2025

CLA assistant check
All committers have signed the CLA.

@github-actions github-actions bot added the type:documentation Improvements or additions to documentation label Oct 21, 2025
@sjrl sjrl requested review from sjrl and removed request for Amnah199 October 21, 2025 06:20
@sjrl sjrl self-assigned this Oct 21, 2025
@sjrl
Contributor

sjrl commented Oct 21, 2025

Hi @abdurrahmanbutler, thanks a lot for your interest in integrating Isaacus with Haystack! It’s great to see your focus on retrieval for the legal domain, which is also very relevant to our users. For example, we’ve collaborated with MANZ on a similar use case: https://www.deepset.ai/case-studies/manz.

For the integration, we’d suggest hosting it in your own repository. This way, you’ll fully own the code and can move at your own speed without depending on our review and release cycles. We’ll still be happy to provide feedback, and we’ll list your integration on our integrations page (https://haystack.deepset.ai/integrations) to ensure it gets visibility within the community.

Let us know if you’d like any guidance or examples on how to set this up.

@abdurrahmanbutler
Author

Thanks for the feedback! I've migrated the integration to https://github.com/isaacus-dev/isaacus-haystack and published it to PyPI. Do I need to open a PR against the Haystack integrations page to add a README and metadata, or is the repo sufficient?

@sjrl
Contributor

sjrl commented Oct 22, 2025

@abdurrahmanbutler that's great!

I'd recommend opening a PR in https://haystack.deepset.ai/integrations following the contributions guide here https://github.com/deepset-ai/haystack-integrations?tab=readme-ov-file#how-to-contribute

@abdurrahmanbutler
Author

I've opened a PR against the integrations repo! Thanks for the advice.

Link below:
deepset-ai/haystack-integrations#366

@sjrl
Contributor

sjrl commented Oct 23, 2025

@abdurrahmanbutler great! I'll go ahead and close this PR then, and someone from the team should be able to review your other PR soon.

@sjrl sjrl closed this Oct 23, 2025