This repository was archived by the owner on Jul 22, 2025. It is now read-only.

Commit 6c17d42

Add support for mxbai-embed-xsmall-v1 embedding model
1 parent 0abd4b1 commit 6c17d42

5 files changed, +30764 -0 lines changed

config/settings.yml

Lines changed: 1 addition & 0 deletions
@@ -308,6 +308,7 @@ discourse_ai:
       - text-embedding-3-small
       - text-embedding-3-large
       - multilingual-e5-large
+      - mxbai-embed-xsmall-v1
       - bge-large-en
       - gemini
       - bge-m3

lib/embeddings/vector_representations/base.rb

Lines changed: 1 addition & 0 deletions
@@ -14,6 +14,7 @@ def find_representation(model_name)
             DiscourseAi::Embeddings::VectorRepresentations::BgeM3,
             DiscourseAi::Embeddings::VectorRepresentations::Gemini,
             DiscourseAi::Embeddings::VectorRepresentations::MultilingualE5Large,
+            DiscourseAi::Embeddings::VectorRepresentations::MxbaiEmbedXsmallV1,
             DiscourseAi::Embeddings::VectorRepresentations::TextEmbedding3Large,
             DiscourseAi::Embeddings::VectorRepresentations::TextEmbedding3Small,
             DiscourseAi::Embeddings::VectorRepresentations::TextEmbeddingAda002,
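
The hunk above registers the new class in the list that `find_representation` searches. The body of that method is not shown in this diff, so the following is only a minimal sketch of how a name-based lookup over such a list could work. The `Sketch` module, the `REPRESENTATIONS` constant, and the `find`-by-name logic are assumptions for illustration, not the plugin's actual code.

```ruby
# Hypothetical stand-ins; the real classes live under
# DiscourseAi::Embeddings::VectorRepresentations.
module Sketch
  class Base
    # Subclasses return the model name used in site settings.
    def self.name
      raise NotImplementedError
    end
  end

  class MultilingualE5Large < Base
    def self.name
      "multilingual-e5-large"
    end
  end

  class MxbaiEmbedXsmallV1 < Base
    def self.name
      "mxbai-embed-xsmall-v1"
    end
  end

  # Assumed registry: a class must be listed here to be discoverable.
  REPRESENTATIONS = [MultilingualE5Large, MxbaiEmbedXsmallV1].freeze

  # Returns the class whose .name matches, or nil for an unknown model.
  def self.find_representation(model_name)
    REPRESENTATIONS.find { |klass| klass.name == model_name }
  end
end
```

Under these assumptions, `Sketch.find_representation("mxbai-embed-xsmall-v1")` returns the `Sketch::MxbaiEmbedXsmallV1` class, and an unregistered name returns `nil`, which is why the commit has to touch both `settings.yml` and this list.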
Lines changed: 64 additions & 0 deletions
@@ -0,0 +1,64 @@
+# frozen_string_literal: true
+
+module DiscourseAi
+  module Embeddings
+    module VectorRepresentations
+      class MxbaiEmbedXsmallV1 < Base
+        class << self
+          def name
+            "mxbai-embed-xsmall-v1"
+          end
+
+          def correctly_configured?
+            SiteSetting.ai_embeddings_discourse_service_api_endpoint_srv.present? ||
+              SiteSetting.ai_embeddings_discourse_service_api_endpoint.present?
+          end
+
+          def dependant_setting_names
+            %w[
+              ai_embeddings_discourse_service_api_key
+              ai_embeddings_discourse_service_api_endpoint_srv
+              ai_embeddings_discourse_service_api_endpoint
+            ]
+          end
+        end
+
+        def vector_from(text, asymetric: false)
+          inference_client.perform!(text)
+        end
+
+        def dimensions
+          384
+        end
+
+        def max_sequence_length
+          512
+        end
+
+        def id
+          1
+        end
+
+        def version
+          1
+        end
+
+        def pg_function
+          "<#>"
+        end
+
+        def pg_index_type
+          "halfvec_ip_ops"
+        end
+
+        def tokenizer
+          DiscourseAi::Tokenizer::MxbaiEmbedXsmallV1Tokenizer
+        end
+
+        def inference_client
+          DiscourseAi::Inference::DiscourseClassifier.instance(self.class.name)
+        end
+      end
+    end
+  end
+end
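
The class above sets `pg_function` to `<#>` and `pg_index_type` to `halfvec_ip_ops`, which in pgvector are the negative inner product operator and its matching half-precision operator class. As a rough illustration of what ordering by that operator means, here is a small pure-Ruby sketch. The helper name and the sample vectors are hypothetical, and the real comparison runs inside PostgreSQL on 384-dimensional vectors.

```ruby
# Negative inner product, mirroring what pgvector's <#> operator computes.
# The value is negated so that ascending order puts the best match first.
def negative_inner_product(a, b)
  raise ArgumentError, "dimension mismatch" unless a.size == b.size

  dot = a.zip(b).sum { |x, y| x * y }
  -dot
end

# Hypothetical 3-dimensional vectors standing in for real embeddings.
query = [1.0, 0.0, 0.0]
candidates = {
  "close" => [0.9, 0.1, 0.0],
  "far" => [0.0, 1.0, 0.0],
}

# Sort ascending by negative inner product, as ORDER BY embedding <#> query would.
ranked = candidates.sort_by { |_, vec| negative_inner_product(query, vec) }.map(&:first)
```

Here `ranked` places `"close"` before `"far"`, because the larger inner product becomes the smaller (more negative) sort key.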
Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
+# frozen_string_literal: true
+
+module DiscourseAi
+  module Tokenizer
+    class MxbaiEmbedXsmallV1Tokenizer < BasicTokenizer
+      def self.tokenizer
+        @@tokenizer ||=
+          Tokenizers.from_file("./plugins/discourse-ai/tokenizers/mxbai-embed-xsmall-v1.json")
+      end
+    end
+  end
+end
