
Add default Rhesis embedding model and standardize terminology #1355

Draft
EmanueleDeRossi1 wants to merge 23 commits into main from feature/user-default-embedding-model

Conversation

@EmanueleDeRossi1 (Collaborator) commented Feb 16, 2026

Summary

This PR standardizes the terminology across the codebase from inconsistent "LLM/llm" naming to clear "language model" and "embedding model" terminology, and adds support for a default Rhesis-hosted embedding model.

Key Changes

1. Terminology Standardization

  • Renamed get_model() → get_language_model() in SDK for clarity
  • Changed model_type enum value from llm → language
  • Updated variable names, e.g. DEFAULT_GENERATION_MODEL → DEFAULT_LANGUAGE_MODEL_PROVIDER
  • Consistent use of "language model" and "embedding model" throughout codebase
  • Updated 73 files across backend, frontend, SDK, and tests

2. Default Rhesis Embedding Model

  • Added RhesisEmbedder class in SDK with full implementation
  • New endpoint: /generate/embedding (same behavior as /generate/content)
  • Registered "rhesis" provider in embedding model factory
  • Changed default embedding provider from OpenAI to Rhesis
  • Backend now creates a "Rhesis Default Embedding" model during organization initialization
  • Stores both language_model_id and embedding_model_id in user settings
  • Uses RHESIS_API_KEY and RHESIS_BASE_URL environment variables
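A minimal sketch of what a Rhesis-backed embedder along these lines could look like. Only the class name, the /generate/embedding path, and the RHESIS_API_KEY / RHESIS_BASE_URL variables come from this PR; the JSON request and response shapes, the Bearer auth header, and the injectable `transport` parameter are assumptions made so the example runs without network access.

```python
import json
import os
import urllib.request
from typing import Callable, Dict, List, Optional

# Hypothetical transport signature: (url, headers, payload) -> decoded JSON reply.
Transport = Callable[[str, Dict[str, str], dict], dict]


def _urllib_transport(url: str, headers: Dict[str, str], payload: dict) -> dict:
    """POST a JSON body with the standard library and decode the JSON reply."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={**headers, "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


class RhesisEmbedder:
    """Sketch of an embedder backed by a Rhesis /generate/embedding endpoint."""

    def __init__(
        self,
        api_key: Optional[str] = None,
        base_url: Optional[str] = None,
        transport: Transport = _urllib_transport,
    ) -> None:
        # Fall back to the environment variables named in the PR description.
        self.api_key = api_key or os.environ.get("RHESIS_API_KEY")
        self.base_url = (base_url or os.environ.get("RHESIS_BASE_URL", "")).rstrip("/")
        if not self.api_key or not self.base_url:
            raise ValueError("RHESIS_API_KEY and RHESIS_BASE_URL must be set")
        self._transport = transport

    def generate(self, text: str) -> List[float]:
        # One text in, one vector out; delegates to the batch call.
        return self.generate_batch([text])[0]

    def generate_batch(self, texts: List[str]) -> List[List[float]]:
        # ASSUMPTION: body {"texts": [...]}, reply {"embeddings": [[...], ...]}.
        response = self._transport(
            f"{self.base_url}/generate/embedding",
            {"Authorization": f"Bearer {self.api_key}"},  # assumed auth scheme
            {"texts": texts},
        )
        return response["embeddings"]
```

In a unit test, pass a fake `transport` that returns a canned `{"embeddings": ...}` payload; in real use the default `urllib` transport sends the request.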

3. Database Migration

  • Added migration to update existing model_type values from "llm" to "language"
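The migration is presumably an Alembic revision against the production database; the sketch below shows only the data rewrite and its reverse, run against an in-memory SQLite table so it is self-contained. The table name `model` and the plain text column are assumptions.

```python
import sqlite3

# ASSUMPTION: model_type is a plain text column on a table called "model".
UPGRADE_SQL = "UPDATE model SET model_type = 'language' WHERE model_type = 'llm'"
DOWNGRADE_SQL = "UPDATE model SET model_type = 'llm' WHERE model_type = 'language'"


def upgrade(conn: sqlite3.Connection) -> None:
    """Rewrite legacy 'llm' rows to the new 'language' value."""
    conn.execute(UPGRADE_SQL)


def downgrade(conn: sqlite3.Connection) -> None:
    """Reverse the rename so code expecting 'llm' can read the rows again."""
    conn.execute(DOWNGRADE_SQL)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE model (id INTEGER PRIMARY KEY, model_type TEXT)")
    conn.executemany("INSERT INTO model (model_type) VALUES (?)", [("llm",), ("embedding",)])
    upgrade(conn)
    print(conn.execute("SELECT model_type FROM model ORDER BY id").fetchall())
    # [('language',), ('embedding',)]
```

If `model_type` is backed by a PostgreSQL enum type, as the review's mention of a "DB enum" suggests, the enum label itself would need renaming instead (for example with `ALTER TYPE ... RENAME VALUE 'llm' TO 'language'`, available since PostgreSQL 10), because a row UPDATE to 'language' is rejected while that label does not yet exist.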

Breaking Changes

Renamed environment variables:

  • DEFAULT_GENERATION_MODEL → DEFAULT_LANGUAGE_MODEL_PROVIDER
  • DEFAULT_MODEL_NAME → DEFAULT_LANGUAGE_MODEL_NAME

New variables to add:

  • DEFAULT_EMBEDDING_MODEL_PROVIDER
  • DEFAULT_EMBEDDING_MODEL_NAME
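Because these renames are breaking, a deployment check like the following hypothetical helper could flag environments that still set only the old names or lack the new embedding variables. It is not part of the PR; the variable names are taken from the lists above.

```python
import os
from typing import List, Mapping, Optional

# Renamed variables from the list above: old name -> new name.
RENAMED_VARS = {
    "DEFAULT_GENERATION_MODEL": "DEFAULT_LANGUAGE_MODEL_PROVIDER",
    "DEFAULT_MODEL_NAME": "DEFAULT_LANGUAGE_MODEL_NAME",
}

# Variables this PR introduces.
NEW_VARS = ("DEFAULT_EMBEDDING_MODEL_PROVIDER", "DEFAULT_EMBEDDING_MODEL_NAME")


def check_environment(environ: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return human-readable problems with an environment after this PR."""
    env = os.environ if environ is None else environ
    problems = []
    for old, new in RENAMED_VARS.items():
        # Only the old spelling is set, so the new code would not pick it up.
        if old in env and new not in env:
            problems.append(f"{old} was renamed to {new}")
    for name in NEW_VARS:
        if name not in env:
            problems.append(f"{name} is not set")
    return problems
```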

EmanueleDeRossi1 changed the title from "refactor: standardize model terminology and add default Rhesis embedding model" to "Add default Rhesis embedding model and standardize terminology" on Feb 16, 2026
peqy bot left a comment

Critical issues to address before merge

  1. Backend model_type values are inconsistent ("language" vs "language_model") across DB enum/migration, API schemas, and connection testing. This will cause 422s and/or make model connection tests fail.

  2. SDK factory removed DEFAULT_PROVIDER but tests (and likely external users) still import it.

  3. SDK embedding factory default provider changed to rhesis, but tests still assert OpenAI is used on the default path.

Found 6 issues (6 critical).
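One way to keep the "language" vs "language_model" drift from recurring is a single enum that the migration, API schemas, and connection service all import. The `ModelType` enum, `parse_model_type` helper, and the `"embedding"` member value below are assumptions; the review only confirms `"language"` as the intended value.

```python
from enum import Enum


class ModelType(str, Enum):
    """Single source of truth for model_type values (hypothetical module)."""

    LANGUAGE = "language"
    EMBEDDING = "embedding"  # ASSUMPTION: value not confirmed by the PR


def parse_model_type(raw: str) -> ModelType:
    """Validate an incoming model_type, failing loudly on stale spellings."""
    try:
        return ModelType(raw)
    except ValueError:
        allowed = ", ".join(member.value for member in ModelType)
        raise ValueError(f"invalid model type {raw!r}; expected one of: {allowed}") from None
```

Assuming the backend validates requests with FastAPI and Pydantic, using such an enum as a schema field type turns an unknown value into a 422 response, which matches the symptom described above.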

EmanueleDeRossi1 marked this pull request as draft on February 16, 2026 11:30

peqy bot commented Feb 16, 2026

@EmanueleDeRossi1 yep, confirmed — the backend API schema/routers now consistently use model_type="language" (instead of language_model), which should unblock the earlier 422/“invalid model type” issues.

Remaining blockers from my earlier notes still look outstanding in this draft:

  • sdk.models.factory still doesn’t define DEFAULT_PROVIDER (tests/imports still reference it)
  • get_embedding_model() now defaults to rhesis, but the default-path unit tests still patch/assert OpenAIEmbedder (and will fail unless updated/mocked)


peqy bot commented Feb 16, 2026

@EmanueleDeRossi1 confirmed — the backend is now consistently using model_type="language" (schema + router fallback + connection service), so that earlier mismatch should be resolved.

The other two blockers I flagged still remain (missing DEFAULT_PROVIDER backward-compat alias in sdk.models.factory, and the embedding-factory tests still expecting OpenAI on the default path).


peqy bot commented Feb 16, 2026

@EmanueleDeRossi1 confirmed — with 561d187 the model_type handling in ModelConnectionService is now consistently "language", so the earlier “invalid model type” path should be resolved.

The two remaining draft blockers from my earlier notes still apply: sdk.models.factory missing the DEFAULT_PROVIDER backward-compat alias, and the embedding factory tests still expecting OpenAI on the default path even though get_embedding_model() now defaults to rhesis.


peqy bot commented Feb 16, 2026

@EmanueleDeRossi1 confirmed — after 46fed56 I don’t see any remaining DEFAULT_PROVIDER imports/usages in the repo, so this is resolved on our side.

One related follow-up: get_embedding_model() still documents the default as OpenAI and tests/sdk/models/test_embedder_factory.py::test_get_embedding_model_minimal_defaults still patches OpenAIEmbedder, but the default provider is now rhesis (DEFAULT_EMBEDDING_MODEL_PROVIDER). Likely needs aligning to avoid CI failures.
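A sketch of how the minimal-defaults test could be realigned: patch the provider the default path now resolves to, rather than OpenAIEmbedder. The stand-in registry and factory here are hypothetical; the real test in tests/sdk/models/test_embedder_factory.py may patch classes by import path instead, in which case the same idea applies to the Rhesis class.

```python
import unittest
from unittest import mock

# Minimal stand-ins for the SDK pieces named in the comment (hypothetical bodies).
DEFAULT_EMBEDDING_MODEL_PROVIDER = "rhesis"


class OpenAIEmbedder:
    pass


class RhesisEmbedder:
    pass


EMBEDDING_MODEL_REGISTRY = {"openai": OpenAIEmbedder, "rhesis": RhesisEmbedder}


def get_embedding_model(provider=None):
    """Build an embedder, defaulting to DEFAULT_EMBEDDING_MODEL_PROVIDER."""
    return EMBEDDING_MODEL_REGISTRY[provider or DEFAULT_EMBEDDING_MODEL_PROVIDER]()


class TestEmbedderFactoryDefaults(unittest.TestCase):
    def test_get_embedding_model_minimal_defaults(self):
        # Patch the provider the default path actually resolves to, not OpenAI.
        fake_rhesis = mock.MagicMock(name="RhesisEmbedder")
        with mock.patch.dict(EMBEDDING_MODEL_REGISTRY, {"rhesis": fake_rhesis}):
            model = get_embedding_model()
        fake_rhesis.assert_called_once_with()
        self.assertIs(model, fake_rhesis.return_value)
```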

- Rename get_model() → get_language_model() and get_embedder() → get_embedding_model()
- Rename ModelConfig → LanguageModelConfig and EmbedderConfig → EmbeddingModelConfig
- Keep deprecated aliases for backward compatibility
- Renamed get_model() to get_language_model() across SDK
- Renamed DEFAULT_MODEL_NAME to DEFAULT_LANGUAGE_MODEL_NAME in all providers
- Renamed PROVIDER_REGISTRY to LANGUAGE_MODEL_PROVIDER_REGISTRY
Rename DEFAULT_GENERATION_MODEL → DEFAULT_LANGUAGE_MODEL_PROVIDER and DEFAULT_MODEL_NAME → DEFAULT_LANGUAGE_MODEL_NAME across all services
- Rename model_type to purpose in _get_user_model and related functions, to avoid confusion with model_type (which refers to whether a model is a language or an embedding model)
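The commit notes above mention keeping deprecated aliases for the old names. A minimal sketch of that pattern, assuming a simple wrapper that emits a `DeprecationWarning`; the `LanguageModelConfig` fields, the `get_language_model` body, and the `_deprecated_alias` helper are hypothetical placeholders, not the SDK's real implementation.

```python
import functools
import warnings
from dataclasses import dataclass
from typing import Optional


@dataclass
class LanguageModelConfig:
    # Hypothetical fields; the real class's attributes may differ.
    provider: Optional[str] = None
    model_name: Optional[str] = None


def get_language_model(provider: Optional[str] = None, model_name: Optional[str] = None):
    # Placeholder body: the real function returns a model instance.
    return LanguageModelConfig(provider=provider, model_name=model_name)


def _deprecated_alias(new_func, old_name: str):
    """Wrap new_func so calls via old_name still work but emit a warning."""

    @functools.wraps(new_func)
    def wrapper(*args, **kwargs):
        warnings.warn(
            f"{old_name}() is deprecated; use {new_func.__name__}() instead",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller, not this wrapper
        )
        return new_func(*args, **kwargs)

    wrapper.__name__ = old_name
    return wrapper


# Old names kept for backward compatibility.
get_model = _deprecated_alias(get_language_model, "get_model")
ModelConfig = LanguageModelConfig
```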
Add Rhesis as the default embedding model provider, following the same pattern as the language model:

Backend changes:
- Update constants to use consistent naming (DEFAULT_EMBEDDING_MODEL_PROVIDER)
- Create default Rhesis embedding model during organization initialization
- Store both language_model_id and embedding_model_id in user settings
- Update generate/embedding endpoint to use new constants
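A hypothetical sketch of the organization-initialization step described above: create one default language model and one default embedding model, then return both IDs for the user settings. The `Model` record, the language model's display name, the "embedding" type value, and the flat settings dictionary are illustrative assumptions.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Model:
    # Hypothetical record shape; the real ORM model has more columns.
    name: str
    provider: str
    model_type: str  # "language" or (assumed) "embedding"
    id: str = field(default_factory=lambda: str(uuid.uuid4()))


def initialize_organization_models(
    language_provider: str, embedding_provider: str, store: List[Model]
) -> Dict[str, str]:
    """Create both default models and return the user-settings fragment."""
    language = Model("Rhesis Default", language_provider, "language")  # hypothetical name
    embedding = Model("Rhesis Default Embedding", embedding_provider, "embedding")
    store.extend([language, embedding])
    # Both IDs are stored in user settings, as the commit message describes.
    return {"language_model_id": language.id, "embedding_model_id": embedding.id}
```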

SDK changes:
- Implement complete RhesisEmbedder class with generate() and generate_batch()
- Add factory function for Rhesis embedding model
- Register "rhesis" provider in EMBEDDING_MODEL_REGISTRY
- Update DEFAULT_EMBEDDING_MODEL_PROVIDER from "openai" to "rhesis"

This enables users to use Rhesis-hosted embeddings by default while still allowing custom embedding model configuration.
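A minimal sketch of the factory change: a provider registry with "rhesis" registered alongside "openai", and a default that resolves to "rhesis" when no provider is given. The embedder classes are empty stand-ins, and the `ValueError` for unknown providers is an assumption rather than the SDK's documented behavior.

```python
from typing import Callable, Dict, Optional


class OpenAIEmbedder:
    def __init__(self, model_name: Optional[str] = None) -> None:
        self.model_name = model_name


class RhesisEmbedder:
    def __init__(self, model_name: Optional[str] = None) -> None:
        self.model_name = model_name


# Per the commit message, the default flips from "openai" to "rhesis".
DEFAULT_EMBEDDING_MODEL_PROVIDER = "rhesis"

# Provider name -> factory; "rhesis" is the newly registered entry.
EMBEDDING_MODEL_REGISTRY: Dict[str, Callable[..., object]] = {
    "openai": OpenAIEmbedder,
    "rhesis": RhesisEmbedder,
}


def get_embedding_model(provider: Optional[str] = None, model_name: Optional[str] = None):
    """Resolve a provider name (or the default) to an embedder instance."""
    name = provider or DEFAULT_EMBEDDING_MODEL_PROVIDER
    try:
        factory = EMBEDDING_MODEL_REGISTRY[name]
    except KeyError:
        known = ", ".join(sorted(EMBEDDING_MODEL_REGISTRY))
        raise ValueError(f"unknown embedding provider {name!r}; known: {known}") from None
    return factory(model_name=model_name)
```

Keeping the default in a single constant means callers that never pass a provider switch to Rhesis automatically, while an explicit `provider="openai"` still works.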
- use correct import (DEFAULT_LANGUAGE_MODEL_PROVIDER) in tests
- remove unused aliases (DEFAULT_MODELS, DEFAULT_PROVIDER)
EmanueleDeRossi1 force-pushed the feature/user-default-embedding-model branch from e87e8d9 to d410840 on February 16, 2026 12:47

Labels

None yet

Projects

None yet


1 participant