You need to mount your local model files via a volume mount so Docker can see them.
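For example, assuming the model files live in ./models/m3e-base on the host (the host path and port here are illustrative, not taken from the thread), the container could be started along these lines:

```sh
# Mount the host's ./models directory to /models inside the container,
# which is where the log below shows LocalAI looking for the model.
docker run -p 8080:8080 --gpus all \
  -v $PWD/models:/models \
  localai/localai:latest-aio-gpu-nvidia-cuda-12
```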
I pre-downloaded an embedding model named "moka-ai/m3e-base" from https://huggingface.co/moka-ai/m3e-base. The tree looks like this:

The m3e-base.yaml content is:

I used the latest-aio-gpu-nvidia-cuda-12 Docker image to start the service and tried to test this embedding model, but it failed. The debug info is as follows:
11:06AM DBG Request received: {"model":"m3e-base","language":"","n":0,"top_p":null,"top_k":null,"temperature":null,"max_tokens":null,"echo":false,"batch":0,"ignore_eos":false,"repeat_penalty":0,"n_keep":0,"frequency_penalty":0,"presence_penalty":0,"tfz":null,"typical_p":null,"seed":null,"negative_prompt":"","rope_freq_base":0,"rope_freq_scale":0,"negative_prompt_scale":0,"use_fast_tokenizer":false,"clip_skip":0,"tokenizer":"","file":"","response_format":{},"size":"","prompt":null,"instruction":"","input":"Your text string goes here","stop":null,"messages":null,"functions":null,"function_call":null,"stream":false,"mode":0,"step":0,"grammar":"","grammar_json_functions":null,"grammar_json_name":null,"backend":"","model_base_name":""}
11:06AM DBG Parameter Config: &{PredictionOptions:{Model:m3e-base Language: N:0 TopP:0xc0009a1b80 TopK:0xc0009a1b88 Temperature:0xc0009a1b90 Maxtokens:0xc0009a1bc0 Echo:false Batch:0 IgnoreEOS:false RepeatPenalty:0 Keep:0 FrequencyPenalty:0 PresencePenalty:0 TFZ:0xc0009a1bb8 TypicalP:0xc0009a1bb0 Seed:0xc0009a1bd8 NegativePrompt: RopeFreqBase:0 RopeFreqScale:0 NegativePromptScale:0 UseFastTokenizer:false ClipSkip:0 Tokenizer:} Name:m3e-base F16:0xc0009a1b78 Threads:0xc0009a1b70 Debug:0xc0005b04a0 Roles:map[] Embeddings:true Backend:sentencetransformers TemplateConfig:{Chat: ChatMessage: Completion: Edit: Functions: UseTokenizerTemplate:false JoinChatMessagesByCharacter:<nil>} PromptStrings:[] InputStrings:[Your text string goes here] InputToken:[] functionCallString: functionCallNameString: FunctionsConfig:{DisableNoAction:false GrammarConfig:{ParallelCalls:false DisableParallelNewLines:false MixedMode:false NoMixedFreeString:false NoGrammar:false Prefix:} NoActionFunctionName: NoActionDescriptionName: ResponseRegex: JSONRegexMatch:[] ReplaceFunctionResults:[] ReplaceLLMResult:[] FunctionName:false} FeatureFlag:map[] LLMConfig:{SystemPrompt: TensorSplit: MainGPU: RMSNormEps:0 NGQA:0 PromptCachePath: PromptCacheAll:false PromptCacheRO:false MirostatETA:0xc0009a1ba8 MirostatTAU:0xc0009a1ba0 Mirostat:0xc0009a1b98 NGPULayers:0xc0009a1bc8 MMap:0xc0009a1bd0 MMlock:0xc0009a1bd1 LowVRAM:0xc0009a1bd1 Grammar: StopWords:[] Cutstrings:[] TrimSpace:[] TrimSuffix:[] ContextSize:0xc0009a1b68 NUMA:false LoraAdapter: LoraBase: LoraScale:0 NoMulMatQ:false DraftModel: NDraft:0 Quantization: GPUMemoryUtilization:0 TrustRemoteCode:false EnforceEager:false SwapSpace:0 MaxModelLen:0 TensorParallelSize:0 MMProj: FlashAttention:false NoKVOffloading:false RopeScaling: ModelType: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0} AutoGPTQ:{ModelBaseName: Device: Triton:false UseFastTokenizer:false} Diffusers:{CUDA:false PipelineType: SchedulerType: EnableParameters: CFGScale:0 IMG2IMG:false ClipSkip:0 ClipModel: ClipSubFolder: ControlNet:} Step:0 GRPC:{Attempts:0 AttemptsSleepTime:0} VallE:{AudioPath:} CUDA:false DownloadFiles:[] Description: Usage:}
11:06AM INF Loading model 'm3e-base' with backend sentencetransformers
11:06AM DBG Loading model in memory from file: /models/m3e-base
11:06AM DBG Loading Model m3e-base with gRPC (file: /models/m3e-base) (backend: sentencetransformers): {backendString:sentencetransformers model:m3e-base threads:4 assetDir:/tmp/localai/backend_data context:{emptyCtx:{}} gRPCOptions:0xc000238488 externalBackends:map[autogptq:/build/backend/python/autogptq/run.sh bark:/build/backend/python/bark/run.sh coqui:/build/backend/python/coqui/run.sh diffusers:/build/backend/python/diffusers/run.sh exllama:/build/backend/python/exllama/run.sh exllama2:/build/backend/python/exllama2/run.sh huggingface-embeddings:/build/backend/python/sentencetransformers/run.sh mamba:/build/backend/python/mamba/run.sh openvoice:/build/backend/python/openvoice/run.sh parler-tts:/build/backend/python/parler-tts/run.sh petals:/build/backend/python/petals/run.sh rerankers:/build/backend/python/rerankers/run.sh sentencetransformers:/build/backend/python/sentencetransformers/run.sh transformers:/build/backend/python/transformers/run.sh transformers-musicgen:/build/backend/python/transformers-musicgen/run.sh vall-e-x:/build/backend/python/vall-e-x/run.sh vllm:/build/backend/python/vllm/run.sh] grpcAttempts:20 grpcAttemptsDelay:2 singleActiveBackend:false parallelRequests:false}
11:06AM DBG Loading external backend: /build/backend/python/sentencetransformers/run.sh
11:06AM DBG Loading GRPC Process: /build/backend/python/sentencetransformers/run.sh
11:06AM DBG GRPC Service for m3e-base will be running at: '127.0.0.1:43023'
11:06AM DBG GRPC Service state dir: /tmp/go-processmanager2532182622
11:06AM DBG GRPC Service Started
11:06AM DBG GRPC(m3e-base-127.0.0.1:43023): stdout Initializing libbackend for build
11:06AM DBG GRPC(m3e-base-127.0.0.1:43023): stdout virtualenv activated
11:06AM DBG GRPC(m3e-base-127.0.0.1:43023): stdout activated virtualenv has been ensured
11:06AM DBG GRPC(m3e-base-127.0.0.1:43023): stderr /build/backend/python/sentencetransformers/venv/lib/python3.10/site-packages/transformers/utils/hub.py:124: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
11:06AM DBG GRPC(m3e-base-127.0.0.1:43023): stderr warnings.warn(
11:06AM DBG GRPC(m3e-base-127.0.0.1:43023): stderr Server started. Listening on: 127.0.0.1:43023
11:06AM DBG GRPC Service Ready
11:06AM DBG GRPC: Loading model with options: {state:{NoUnkeyedLiterals:{} DoNotCompare:[] DoNotCopy:[] atomicMessageInfo:<nil>} sizeCache:0 unknownFields:[] Model:m3e-base ContextSize:512 Seed:1507385333 NBatch:512 F16Memory:false MLock:false MMap:true VocabOnly:false LowVRAM:false Embeddings:true NUMA:false NGPULayers:99999999 MainGPU: TensorSplit: Threads:4 LibrarySearchPath: RopeFreqBase:0 RopeFreqScale:0 RMSNormEps:0 NGQA:0 ModelFile:/models/m3e-base Device: UseTriton:false ModelBaseName: UseFastTokenizer:false PipelineType: SchedulerType: CUDA:false CFGScale:0 IMG2IMG:false CLIPModel: CLIPSubfolder: CLIPSkip:0 ControlNet: Tokenizer: LoraBase: LoraAdapter: LoraScale:0 NoMulMatQ:false DraftModel: AudioPath: Quantization: GPUMemoryUtilization:0 TrustRemoteCode:false EnforceEager:false SwapSpace:0 MaxModelLen:0 TensorParallelSize:0 MMProj: RopeScaling: YarnExtFactor:0 YarnAttnFactor:0 YarnBetaFast:0 YarnBetaSlow:0 Type: FlashAttention:false NoKVOffload:false}
11:06AM INF Success ip=127.0.0.1 latency="38.789µs" method=GET status=200 url=/readyz
11:07AM DBG GRPC(m3e-base-127.0.0.1:43023): stderr No sentence-transformers model found with name sentence-transformers/m3e-base. Creating a new one with MEAN pooling.
11:07AM DBG GRPC(m3e-base-127.0.0.1:43023): stderr /build/backend/python/sentencetransformers/venv/lib/python3.10/site-packages/huggingface_hub/file_download.py:1132: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
11:07AM DBG GRPC(m3e-base-127.0.0.1:43023): stderr warnings.warn(
11:07AM ERR Server error error="could not load model (no success): Unexpected err=OSError(\"We couldn't connect to 'https://huggingface.co' to load this file, couldn't find it in the cached files and it looks like sentence-transformers/m3e-base is not the path to a directory containing a file named config.json.\nCheckout your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/transformers/installation#offline-mode'.\"), type(err)=<class 'OSError'>"

How do I configure sentencetransformers to use these offline model files?
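For reference, judging from the Parameter Config line in the log (Name:m3e-base, Backend:sentencetransformers, Embeddings:true), the m3e-base.yaml is presumably along these lines; the actual file is not shown in the thread, so treat this as a sketch:

```yaml
# Sketch reconstructed from the debug log, not the author's exact file.
name: m3e-base
backend: sentencetransformers
embeddings: true
parameters:
  model: m3e-base
```

The error message hints at why this fails offline: the backend hands the bare name "m3e-base" to SentenceTransformer(), which resolves it as the hub repo sentence-transformers/m3e-base and tries to download it. SentenceTransformer() also accepts a local directory path, so one option worth trying is pointing parameters.model at the mounted directory (e.g. /models/m3e-base). The request shown in the log can then be replayed with the standard OpenAI-style embeddings call, assuming the default port:

```sh
# The model name and input string match the "Request received" log line above.
curl http://localhost:8080/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model": "m3e-base", "input": "Your text string goes here"}'
```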