Llama-3.1 405b with litellm proxy has issues #3880
Replies: 3 comments
-
Check your terminal logs and/or the logs directory at the project root.
-
Hi, I'm running litellm locally with detailed logging enabled. This is what we get in the LibreChat logs:

2024-09-03 22:11:53 error: [handleAbortError] AI response error; aborting request: litellm.APIConnectionError: Error parsing chunk: Expecting property name enclosed in double quotes: line 1 column 3 (char 2), During handling of the above exception, another exception occurred: Traceback (most recent call last):

And this is what the logs on the litellm side give:

22:11:42 - LiteLLM Proxy:DEBUG: parallel_request_limiter.py:28 - Inside Max Parallel Request Pre-Call Hook
22:11:42 - LiteLLM:DEBUG: utils.py:247 - Request to litellm:
22:11:42 - LiteLLM:DEBUG: utils.py:247 - ASYNC kwargs[caching]: False; litellm.cache: None; kwargs.get('cache'): None
POST Request Sent from LiteLLM:
22:11:50 - LiteLLM:DEBUG: main.py:5133 - makes async anthropic streaming POST request
22:11:52 - LiteLLM:DEBUG: utils.py:247 - Logging Details LiteLLM-Success Call: Cache_hit=False
22:11:53 - LiteLLM:DEBUG: utils.py:247 - Token Counter - using generic token counter, for model=meta/llama3-405b-instruct-maas
22:11:53 - LiteLLM:DEBUG: caching.py:33 - InMemoryCache: set_cache. current size= 1
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
22:11:53 - LiteLLM Proxy:DEBUG: proxy_server.py:2596 - An error occurred: litellm.APIConnectionError: Error parsing chunk: Expecting property name enclosed in double quotes: line 1 column 3 (char 2), During handling of the above exception, another exception occurred: Traceback (most recent call last):
Debug this by setting
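For anyone trying to reproduce this, a minimal litellm proxy config for the model named in the logs might look like the sketch below. Only the model id meta/llama3-405b-instruct-maas comes from this thread; the vertex_ai/ prefix, the model_name alias, and the project/location values are assumptions based on litellm's documented Vertex AI partner-model syntax, not from this discussion.

```yaml
# config.yaml — minimal sketch, assuming litellm's Vertex AI partner-model syntax.
# Only "meta/llama3-405b-instruct-maas" is taken from the logs above; every
# other value is a placeholder.
model_list:
  - model_name: llama-3.1-405b                          # alias the client requests
    litellm_params:
      model: vertex_ai/meta/llama3-405b-instruct-maas   # model id seen in the logs
      vertex_project: my-gcp-project                    # assumption: your GCP project id
      vertex_location: us-central1                      # assumption: region of the MaaS endpoint
```

Starting the proxy with `litellm --config config.yaml --detailed_debug` should reproduce the kind of DEBUG lines quoted above.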
-
Hi @ss-gonda, thanks for using LiteLLM. Any chance we can hop on a call to learn how we can improve LiteLLM Proxy for you? We're planning the roadmap and I'd love to get your feedback. Here's my cal for your convenience: https://calendly.com/d/4mp-gd3-k5k/litellm-1-1-onboarding-chat
-
What happened?
Steps to Reproduce
What browsers are you seeing the problem on?
No response
Relevant log output
Based on log analysis of litellm and LibreChat, I can say that the latest litellm complains about unsupported params: presence_penalty and frequency_penalty. I was able to fix this with some (not well structured) code changes that remove these params. After that the chat works fine, but it no longer generates chat titles, again because of the parameters above. Also, after all the fixes it keeps emitting "assistant" randomly at the start of responses, but that looks like an issue with the deployed Llama model.
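If the goal is only to stop litellm from rejecting presence_penalty and frequency_penalty, there are documented knobs that avoid patching code. A sketch of both sides follows, assuming litellm's drop_params setting and LibreChat's dropParams option for custom endpoints; the endpoint name, URL, and key below are placeholders, not values from this thread.

```yaml
# litellm config.yaml — ask the proxy to drop params the model doesn't support
litellm_settings:
  drop_params: true
```

```yaml
# librechat.yaml — alternatively, strip the params before they reach litellm
endpoints:
  custom:
    - name: "litellm"                     # placeholder endpoint name
      baseURL: "http://localhost:4000/v1" # placeholder proxy URL
      apiKey: "sk-1234"                   # placeholder key
      models:
        default: ["llama-3.1-405b"]
      dropParams: ["presence_penalty", "frequency_penalty"]
```

Dropping the params at either layer should also bring chat titles back, since title generation sends the same parameters.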
Screenshots
No response
Code of Conduct