
Commit 5e4e2ed

Add encodings folder to azure-ai-resources (#34228)
* add encodings
* include encodings folder
* use local embeddings
* use local cache dirs
* ignore unknown words in encodings files
1 parent 84f4f2e commit 5e4e2ed

8 files changed (+150268, -4 lines)

sdk/ai/azure-ai-generative/azure/ai/generative/index/_embeddings/openai.py

Lines changed: 3 additions & 1 deletion
```diff
@@ -235,9 +235,11 @@ def _embed(self, texts: List[str]) -> List[List[float]]:
         """Embed the given texts."""
         import numpy as np
         import tiktoken
+        from azure.ai.generative.index._utils.tokens import tiktoken_cache_dir
 
         try:
-            encoding = tiktoken.encoding_for_model(self.model)
+            with tiktoken_cache_dir():
+                encoding = tiktoken.encoding_for_model(self.model)
         except KeyError:
             logger.warning("Warning: model not found. Using cl100k_base encoding.")
             model = "cl100k_base"
```
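The diff imports `tiktoken_cache_dir` from `_utils/tokens.py`, but this commit view does not show its body. The following is a minimal sketch of what such a context manager could look like. It assumes the helper works by temporarily pointing tiktoken's `TIKTOKEN_CACHE_DIR` environment variable at the bundled `encodings` folder, so that `encoding_for_model` reads the local file instead of downloading it. The `encodings_dir` parameter and the default path are hypothetical.

```python
import os
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator, Optional

# Environment variable tiktoken consults for its on-disk cache location.
_ENV_VAR = "TIKTOKEN_CACHE_DIR"


@contextmanager
def tiktoken_cache_dir(encodings_dir: Optional[str] = None) -> Iterator[str]:
    """Hypothetical sketch: point tiktoken at a bundled cache dir, then restore the old value."""
    if encodings_dir is None:
        # Assumption: the encodings folder sits next to this module.
        encodings_dir = str(Path(__file__).resolve().parent / "encodings")
    previous = os.environ.get(_ENV_VAR)
    os.environ[_ENV_VAR] = encodings_dir
    try:
        yield encodings_dir
    finally:
        # Restore the caller's environment even if the body raised.
        if previous is None:
            os.environ.pop(_ENV_VAR, None)
        else:
            os.environ[_ENV_VAR] = previous
```

Restoring the previous value in `finally` keeps the override scoped to the `with` block, so a `TIKTOKEN_CACHE_DIR` the user set outside it is left alone. Because `os.environ` is process-wide, this approach is not thread-safe.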

sdk/ai/azure-ai-resources/MANIFEST.in

Lines changed: 1 addition & 0 deletions
```diff
@@ -4,4 +4,5 @@ include *.md
 include azure/__init__.py
 include azure/ai/__init__.py
 include azure/ai/resources/py.typed
+include azure/ai/resources/_index/_utils/encodings/*
 include azure/ai/common/operations/component-configs/*
```
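The new `include` line adds every file directly inside `encodings/` to the source distribution. The sketch below is a rough local check that approximates what such a pattern selects. It uses `pathlib.Path.glob`, where `*` does not cross directory separators, and keeps regular files only. This is an approximation; setuptools' own MANIFEST.in processing remains authoritative. Whether the files also reach built wheels depends on packaging settings such as `include_package_data`, which this commit view does not show. The helper name is hypothetical.

```python
from pathlib import Path
from typing import List


def manifest_include_matches(project_root: str, pattern: str) -> List[str]:
    """Approximate which files a MANIFEST.in ``include`` pattern selects (hypothetical helper)."""
    root = Path(project_root)
    # Path.glob's "*" stays within a single path segment; directories are skipped.
    return sorted(
        p.relative_to(root).as_posix() for p in root.glob(pattern) if p.is_file()
    )
```

For example, running `manifest_include_matches(".", "azure/ai/resources/_index/_utils/encodings/*")` from the package root would list the bundled encoding files.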

sdk/ai/azure-ai-resources/azure/ai/resources/_index/_embeddings/openai.py

Lines changed: 3 additions & 1 deletion
```diff
@@ -222,9 +222,11 @@ def _embed(self, texts: List[str]) -> List[List[float]]:
         """Embed the given texts."""
         import numpy as np
         import tiktoken
+        from azure.ai.resources._index._utils.tokens import tiktoken_cache_dir
 
         try:
-            encoding = tiktoken.encoding_for_model(self.model)
+            with tiktoken_cache_dir():
+                encoding = tiktoken.encoding_for_model(self.model)
         except KeyError:
             logger.warning("Warning: model not found. Using cl100k_base encoding.")
             model = "cl100k_base"
```
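The `try`/`except KeyError` around `encoding_for_model` falls back to `cl100k_base` when tiktoken does not recognize the model name. The hunk cuts off right after `model = "cl100k_base"`, so the rest of that branch is not shown. The sketch below shows the same fallback pattern. The lookup functions are passed in so it can be exercised without tiktoken installed. `load_encoding` and its parameters are hypothetical; in real use you would pass `tiktoken.encoding_for_model` and `tiktoken.get_encoding`.

```python
import logging
from typing import Callable, TypeVar

logger = logging.getLogger(__name__)

E = TypeVar("E")  # the encoding type, e.g. tiktoken.Encoding


def load_encoding(
    model: str,
    encoding_for_model: Callable[[str], E],
    get_encoding: Callable[[str], E],
    fallback: str = "cl100k_base",
) -> E:
    """Hypothetical helper: resolve a model's encoding, falling back for unknown models."""
    try:
        return encoding_for_model(model)
    except KeyError:
        # Unknown model name: log it and load the default encoding by name instead.
        logger.warning("Model %r not found. Using %s encoding.", model, fallback)
        return get_encoding(fallback)
```

Calling `load_encoding(self.model, tiktoken.encoding_for_model, tiktoken.get_encoding)` inside `with tiktoken_cache_dir():` would make the fallback load honor the local cache directory as well, not just the primary lookup.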

sdk/ai/azure-ai-resources/azure/ai/resources/_index/_utils/encodings/6c7ea1a7e38e3a7f062df639a5b80947f075ffe6

Lines changed: 1 addition & 0 deletions
Large diffs are not rendered by default.
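The added file's name is a 40-character hex string, which is the shape of a SHA-1 digest. This sketch assumes the file follows tiktoken's cache convention, where the loader stores each downloaded encoding file under the SHA-1 hex digest of its source URL. That convention would let tiktoken find the file once `TIKTOKEN_CACHE_DIR` points at this folder. The commit does not say which encoding or URL this file corresponds to, so the helper name and any URL you pass to it are placeholders.

```python
import hashlib


def tiktoken_cache_key(blob_url: str) -> str:
    """Hypothetical helper: SHA-1 hex digest of a URL (assumed tiktoken cache file name)."""
    return hashlib.sha1(blob_url.encode()).hexdigest()
```

Comparing `tiktoken_cache_key(url)` against the file name is one way to check which source URL a cached file was built from.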
