Add default Rhesis embedding model and standardize terminology #1355
EmanueleDeRossi1 wants to merge 23 commits into main
Critical issues to address before merge
- Backend `model_type` values are inconsistent (`"language"` vs `"language_model"`) across DB enum/migration, API schemas, and connection testing. This will cause 422s and/or make model connection tests fail.
- SDK factory removed `DEFAULT_PROVIDER` but tests (and likely external users) still import it.
- SDK embedding factory default provider changed to `rhesis`, but tests still assert OpenAI is used on the default path.
Found 6 issues (6 critical).
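To illustrate the first issue, here is a minimal sketch of why a mismatch between `"language"` and `"language_model"` gets rejected at validation time. It uses only the standard-library `Enum`; the `ModelType` class, the `"embedding"` value, and the `parse_model_type` helper are assumptions made for illustration and are not taken from the backend code. In an API layer that validates requests against such an enum, a failure like this would typically surface as a 422 response.

```python
from enum import Enum


class ModelType(str, Enum):
    # Hypothetical enum; only "language" is confirmed by the PR discussion.
    LANGUAGE = "language"
    EMBEDDING = "embedding"  # assumed value, for illustration only


def parse_model_type(raw: str) -> ModelType:
    """Convert a raw string into ModelType, rejecting unknown values."""
    try:
        return ModelType(raw)
    except ValueError:
        allowed = [m.value for m in ModelType]
        raise ValueError(
            f"invalid model_type {raw!r}; expected one of {allowed}"
        ) from None
```

A caller that still sends `"language_model"` fails here even though `"language"` is accepted, which is why every layer (DB enum, schema, connection test) needs to agree on one spelling.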
apps/backend/src/rhesis/backend/app/services/model_connection.py
@EmanueleDeRossi1 yep, confirmed: the backend API schema/routers now consistently use Remaining blockers from my earlier notes still look outstanding in this draft:
@EmanueleDeRossi1 confirmed: the backend is now consistently using The other two blockers I flagged still remain (missing
@EmanueleDeRossi1 confirmed: with 561d187 the The two remaining draft blockers from my earlier notes still apply:
@EmanueleDeRossi1 confirmed: after 46fed56 I don’t see any remaining One related follow-up:
- Rename get_model() → get_language_model() and get_embedder() → get_embedding_model()
- Rename ModelConfig → LanguageModelConfig and EmbedderConfig → EmbeddingModelConfig
- Keep deprecated aliases for backward compatibility
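A deprecated alias of the kind mentioned in this commit can be sketched as below. This is an assumption about the general pattern, not the SDK's actual implementation: the body of `get_language_model()` is a stand-in that returns a string, and the warning text is invented for the example.

```python
import warnings


def get_language_model(name: str = "default") -> str:
    """Stand-in for the renamed factory; returns a label, not a real model."""
    return f"language-model:{name}"


def get_model(*args, **kwargs):
    """Deprecated alias kept so existing callers keep working."""
    warnings.warn(
        "get_model() is deprecated; use get_language_model() instead",
        DeprecationWarning,
        stacklevel=2,  # point the warning at the caller, not this wrapper
    )
    return get_language_model(*args, **kwargs)
```

Old call sites keep working and get a `DeprecationWarning`, which gives external users time to migrate before the alias is removed.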
- Renamed get_model() to get_language_model() across SDK
- Renamed DEFAULT_MODEL_NAME to DEFAULT_LANGUAGE_MODEL_NAME in all providers
- Renamed PROVIDER_REGISTRY to LANGUAGE_MODEL_PROVIDER_REGISTRY
Rename DEFAULT_GENERATION_MODEL → DEFAULT_LANGUAGE_MODEL_PROVIDER and DEFAULT_MODEL_NAME → DEFAULT_LANGUAGE_MODEL_NAME across all services
- Rename model_type to purpose in _get_user_model and related functions, to avoid confusion with the model_type terminology (which indicates whether a model is a language model or an embedding model)
Add Rhesis as the default embedding model provider, following the same pattern as the language model:

Backend changes:
- Update constants to use consistent naming (DEFAULT_EMBEDDING_MODEL_PROVIDER)
- Create default Rhesis embedding model during organization initialization
- Store both language_model_id and embedding_model_id in user settings
- Update generate/embedding endpoint to use new constants

SDK changes:
- Implement complete RhesisEmbedder class with generate() and generate_batch()
- Add factory function for Rhesis embedding model
- Register "rhesis" provider in EMBEDDING_MODEL_REGISTRY
- Update DEFAULT_EMBEDDING_MODEL_PROVIDER from "openai" to "rhesis"

This enables users to use Rhesis-hosted embeddings by default while still allowing custom embedding model configuration.
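The registry-plus-default pattern described in this commit might look roughly like the sketch below. Only the names `EMBEDDING_MODEL_REGISTRY`, `DEFAULT_EMBEDDING_MODEL_PROVIDER`, `get_embedding_model`, `generate()` and `generate_batch()` come from the PR text. The `FakeRhesisEmbedder` class, its toy vectors, and the factory signature are hypothetical; the real `RhesisEmbedder` presumably calls a hosted API, which is not modelled here.

```python
from typing import Callable, Dict, List, Optional

DEFAULT_EMBEDDING_MODEL_PROVIDER = "rhesis"


class FakeRhesisEmbedder:
    """Hypothetical stand-in that returns toy vectors instead of calling an API."""

    def generate(self, text: str) -> List[float]:
        # Toy embedding: [character count, vowel count], for illustration only.
        vowels = sum(c in "aeiou" for c in text.lower())
        return [float(len(text)), float(vowels)]

    def generate_batch(self, texts: List[str]) -> List[List[float]]:
        # Embed each text independently, preserving input order.
        return [self.generate(t) for t in texts]


# Maps a provider name to a zero-argument factory (assumed shape).
EMBEDDING_MODEL_REGISTRY: Dict[str, Callable[[], FakeRhesisEmbedder]] = {
    "rhesis": FakeRhesisEmbedder,
}


def get_embedding_model(provider: Optional[str] = None) -> FakeRhesisEmbedder:
    """Return an embedder for `provider`, falling back to the default provider."""
    name = provider or DEFAULT_EMBEDDING_MODEL_PROVIDER
    try:
        factory = EMBEDDING_MODEL_REGISTRY[name]
    except KeyError:
        raise ValueError(f"unknown embedding provider: {name!r}") from None
    return factory()
```

With a layout like this, changing the default from "openai" to "rhesis" is a one-constant change, which is also why tests that assert on the default path need updating in the same PR.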
- use correct import (DEFAULT_LANGUAGE_MODEL_PROVIDER) in tests
- remove unused aliases (DEFAULT_MODELS, DEFAULT_PROVIDER)
e87e8d9 to d410840
…_NAME in infrastructure, docs and github workflow files
…xisting organizations
Summary
This PR standardizes the terminology across the codebase from inconsistent "LLM/llm" naming to clear "language model" and "embedding model" terminology, and adds support for a default Rhesis-hosted embedding model.
Key Changes
1. Terminology Standardization
- `get_model()` → `get_language_model()` in SDK for clarity
- `model_type` enum value from `llm` → `language`
- `DEFAULT_GENERATION_MODEL` → `DEFAULT_LANGUAGE_MODEL_PROVIDER`

2. Default Rhesis Embedding Model
- `RhesisEmbedder` class in SDK with full implementation
- `language_model_id` and `embedding_model_id` in user settings
- `RHESIS_API_KEY` and `RHESIS_BASE_URL` environment variables

3. Database Migration
- `model_type` values from "llm" to "language"

❗ Breaking Changes
Environment variable names change:
New variables to add: