You need to mount your local model files via a volume mount so Docker can see them.
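For example, assuming the model files live in ./models/m3e-base on the host (the host path and port here are illustrative, not taken from the thread), the container could be started along these lines:

```sh
# Mount the host's ./models directory to /models inside the container,
# which is where the log below shows LocalAI looking for the model.
docker run -p 8080:8080 --gpus all \
  -v $PWD/models:/models \
  localai/localai:latest-aio-gpu-nvidia-cuda-12
```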
I pre-downloaded an embedding model named "moka-ai/m3e-base" from https://huggingface.co/moka-ai/m3e-base. The tree looks like this:

The m3e-base.yaml content is:

I used the latest-aio-gpu-nvidia-cuda-12 Docker image to start the service and tried to test this embedding model, but it failed. The debug info is as follows:
11:06AM DBG Request received: {"model":"m3e-base","language":"","n":0,"top_p":null,"top_k":null,"temperature":null,"max_tokens":null,"echo":false,"batch":0,"ignore_eos":false,"repeat_penalty":0,"n_keep":0,"frequency_penalty":0,"presence_penalty":0,"tfz":null,"typical_p":null,"seed":null,"negative_prompt":"","rope_freq_base":0,"rope_freq_scale":0,"negative_prompt_scale":0,"use_fast_tokenizer":false,"clip_skip":0,"tokenizer":"","file":"","response_format":{},"size":"","prompt":null,"instruction":"","input":"Your text string goes here","stop":null,"messages":null,"functions":null,"function_call":null,"stream":false,"mode":0,"step":0,"grammar":"","grammar_json_functions":null,"grammar_json_name":null,"backend":"","model_base_name":""}
11:06AM DBG Parameter Config: &{PredictionOptions:{Model:m3e-base Language: N:0 TopP:0xc0009a1b80 TopK:0xc0009a1b88 Temperature:0xc0009a1b90 Maxtokens:0xc0009a1bc0 Echo:false Batch:0 IgnoreEOS:false RepeatPenalty:0 Keep:0 FrequencyPenalty:0 PresencePenalty:0 TFZ:0xc0009a1bb8 TypicalP:0xc0009a1bb0 Seed:0xc0009a1bd8 NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0 UseFastTokenizer:false ClipSkip:0 Tokenizer:} Name:m3e-base F16:0xc0009a1b78 Threads:0xc0009a1b70 Debug:0xc0005b04a0 Roles:map[] Embeddings:true Backend:sentencetransformers TemplateConfig:{Chat: ChatMessage: Completion: Edit: Functions: UseTokenizerTemplate:false JoinChatMessagesByCharacter:<nil>} PromptStrings:[] InputStrings:[Your text string goes here] InputToken:[] functionCallString: functionCallNameString: FunctionsConfig:{DisableNoAction:false GrammarConfig:{ParallelCalls:false DisableParallelNewLines:false MixedMode:false NoMixedFreeString:false NoGrammar:false Prefix:} NoActionFunctionName: NoActionDescriptionName: ResponseRegex: JSONRegexMatch:[] ReplaceFunctionResults:[] ReplaceLLMResult:[] FunctionName:false} FeatureFlag:map[] LLMConfig:{SystemPrompt: TensorSplit: MainGPU: RMSNormEps:0 NGQA:0 PromptCachePath: PromptCacheAll:false PromptCacheRO:false MirostatETA:0xc0009a1ba8 MirostatTAU:0xc0009a1ba0 Mirostat:0xc0009a1b98 NGPULayers:0xc0009a1bc8 MMap:0xc0009a1bd0 MMlock:0xc0009a1bd1 LowVRAM:0xc0009a1bd1 Grammar: StopWords:[] Cutstrings:[] TrimSpace:[] TrimSuffix:[] ContextSize:0xc0009a1b68 NUMA:false LoraAdapter: LoraBase: LoraScale:0 NoMulMatQ:false DraftModel: NDraft:0 Quantization: GPUMemoryUtilization:0 TrustRemoteCode:false EnforceEager:false SwapSpace:0 MaxModelLen:0 TensorParallelSize:0 MMProj: FlashAttention:false NoKVOffloading:false RopeScaling: ModelType: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0} AutoGPTQ:{ModelBaseName: Device: Triton:false UseFastTokenizer:false} Diffusers:{CUDA:false PipelineType: SchedulerType: EnableParameters: CFGScale:0 IMG2IMG:false ClipSkip:0 ClipModel: ClipSubFolder: ControlNet:} Step:0 GRPC:{Attempts:0 AttemptsSleepTime:0} VallE:{AudioPath:} CUDA:false DownloadFiles:[] Description: Usage:}
11:06AM INF Loading model 'm3e-base' with backend sentencetransformers
11:06AM DBG Loading model in memory from file: /models/m3e-base
11:06AM DBG Loading Model m3e-base with gRPC (file: /models/m3e-base) (backend: sentencetransformers): {backendString:sentencetransformers model:m3e-base threads:4 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc000238488 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh coqui:/build/backend/python/coqui/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama:/build/backend/python/exllama/run.sh exllama2:/build/backend/python/exllama2/run.sh huggingface-embeddings:/build/backend/python/sentencetransformers/run.sh mamba:/build/backend/python/mamba/run.sh openvoice:/build/backend/python/openvoice/run.sh parler-tts:/build/backend/python/parler-tts/run.sh petals:/build/backend/python/petals/run.sh rerankers:/build/backend/python/rerankers/run.sh sentencetransformers:/build/backend/python/sentencetransformers/run.sh transformers:/build/backend/python/transformers/run.sh transformers-musicgen:/build/backend/python/transformers-musicgen/run.sh vall-e-x:/build/backend/python/vall-e-x/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 singleActiveBackend:false parallelRequests:false}
11:06AM DBG Loading external backend: /build/backend/python/sentencetransformers/run.sh
11:06AM DBG Loading GRPC Process: /build/backend/python/sentencetransformers/run.sh
11:06AM DBG GRPC Service for m3e-base will be running at: '127.0.0.1:43023'
11:06AM DBG GRPC Service state dir: /tmp/go-processmanager2532182622
11:06AM DBG GRPC Service Started
11:06AM DBG GRPC(m3e-base-127.0.0.1:43023): stdout Initializing libbackend for build
11:06AM DBG GRPC(m3e-base-127.0.0.1:43023): stdout virtualenv activated
11:06AM DBG GRPC(m3e-base-127.0.0.1:43023): stdout activated virtualenv has been ensured
11:06AM DBG GRPC(m3e-base-127.0.0.1:43023): stderr /build/backend/python/sentencetransformers/venv/lib/python3.10/site-packages/transformers/utils/hub.py:124: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
11:06AM DBG GRPC(m3e-base-127.0.0.1:43023): stderr warnings.warn(
11:06AM DBG GRPC(m3e-base-127.0.0.1:43023): stderr Server started. Listening on: 127.0.0.1:43023
11:06AM DBG GRPC Service Ready
11:06AM DBG GRPC: Loading model with options: {state:{NoUnkeyedLiterals:{} DoNotCompare:[] DoNotCopy:[] atomicMessageInfo:<nil>} sizeCache:0 unknownFields:[] Model:m3e-base ContextSize:512 Seed:1507385333 NBatch:512 F16Memory:false MLock:false MMap:true VocabOnly:false LowVRAM:false Embeddings:true NUMA:false NGPULayers:99999999 MainGPU: TensorSplit: Threads:4 LibrarySearchPath: RopeFreqBase:0 RopeFreqScale:0 RMSNormEps:0 NGQA:0 ModelFile:/models/m3e-base Device: UseTriton:false ModelBaseName: UseFastTokenizer:false PipelineType: SchedulerType: CUDA:false CFGScale:0 IMG2IMG:false CLIPModel: CLIPSubfolder: CLIPSkip:0 ControlNet: Tokenizer: LoraBase: LoraAdapter: LoraScale:0 NoMulMatQ:false DraftModel: AudioPath: Quantization: GPUMemoryUtilization:0 TrustRemoteCode:false EnforceEager:false SwapSpace:0 MaxModelLen:0 TensorParallelSize:0 MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0 Type: FlashAttention:false NoKVOffload:false}
11:06AM INF Success ip=127.0.0.1 latency="38.789µs" method=GET status=200 url=/readyz
11:07AM DBG GRPC(m3e-base-127.0.0.1:43023): stderr No sentence-transformers model found with name sentence-transformers/m3e-base. Creating a new one with MEAN pooling.
11:07AM DBG GRPC(m3e-base-127.0.0.1:43023): stderr /build/backend/python/sentencetransformers/venv/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
11:07AM DBG GRPC(m3e-base-127.0.0.1:43023): stderr warnings.warn(
11:07AM ERR Server error error="could not load model (no success): Unexpected err=OSError(\"We couldn't connect to 'https://huggingface.co' to load this file, couldn't find it in the cached files and it looks like sentence-transformers/m3e-base is not the path to a directory containing a file named config.json.\nCheckout your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/transformers/installation#offline-mode'.\"), type(err)=<class 'OSError'>"

How do I configure sentencetransformers to use these offline model files?
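For reference, judging from the Parameter Config line in the log (Name:m3e-base, Backend:sentencetransformers, Embeddings:true), the m3e-base.yaml is presumably along these lines; the actual file is not shown in the thread, so treat this as a sketch:

```yaml
# Sketch reconstructed from the debug log, not the author's exact file.
name: m3e-base
backend: sentencetransformers
embeddings: true
parameters:
  model: m3e-base
```

The error message hints at why this fails offline: the backend hands the bare name "m3e-base" to SentenceTransformer(), which resolves it as the hub repo sentence-transformers/m3e-base and tries to download it. SentenceTransformer() also accepts a local directory path, so one option worth trying is pointing parameters.model at the mounted directory (e.g. /models/m3e-base). The request shown in the log can then be replayed with the standard OpenAI-style embeddings call, assuming the default port:

```sh
# The model name and input string match the "Request received" log line above.
curl http://localhost:8080/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model": "m3e-base", "input": "Your text string goes here"}'
```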