Conversation
…hub.com:chrisalexiuk-nvidia/llama_index into add_feature_nvidia_api_playground_connector_llm
| "playground_mixtral_8x7b": 32_000, | ||
| "playground_llama2_code_34b": 100_000, | ||
| "playground_steerlm_llama_70b": 3072, | ||
| } |
models to add -
- mistralai/mistral-7b-instruct-v0.2
- mistralai/mixtral-8x7b-instruct-v0.1
- google/gemma-7b
- meta/codellama-70b
- meta/llama2-70b
- cohere/aya-101
- cohere/command-r
@chrisalexiuk-nvidia the cohere ones aren't actually part of api catalog, let's remove them
```python
from openai.types.chat import ChatCompletionMessageParam
from openai.types.chat.chat_completion_message import ChatCompletionMessage

AI_PLAYGROUND_MODELS: Dict[str, int] = {
```
```python
from llama_index.llms.nvidia_ai_playground.base import NvidiaAIPlayground

__all__ = ["NvidiaAIPlayground"]
```
let's tentatively go with NVIDIA instead of NvidiaAIPlayground
```python
DEFAULT_PLAYGROUND_MAX_TOKENS = 512


class NvidiaAIPlayground(LLM):
```
chat completion models also support params -
- frequency_penalty: float -2..2
- presence_penalty: float -2..2
- seed
- stop: str | list[str]
include them as properties or require passing as kwargs
in either case, make sure there's test coverage
Passing as kwargs tested as working, added example to "test" notebook.
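The kwargs approach discussed above could be sketched roughly as follows. This is an illustrative stand-in, not the connector's actual code: the helper name `build_completion_kwargs` is hypothetical, and only the parameter names and ranges (`frequency_penalty`/`presence_penalty` in -2..2, `seed`, `stop` as `str | list[str]`) come from the review comment.

```python
from typing import Any, Dict, List, Optional, Union


def build_completion_kwargs(
    frequency_penalty: Optional[float] = None,
    presence_penalty: Optional[float] = None,
    seed: Optional[int] = None,
    stop: Optional[Union[str, List[str]]] = None,
) -> Dict[str, Any]:
    """Collect optional sampling params, range-checking the penalties.

    The resulting dict would be merged into the chat.completions.create
    call as **kwargs (hypothetical helper, for illustration only).
    """
    kwargs: Dict[str, Any] = {}
    for name, value in (
        ("frequency_penalty", frequency_penalty),
        ("presence_penalty", presence_penalty),
    ):
        if value is not None:
            if not -2.0 <= value <= 2.0:
                raise ValueError(f"{name} must be in [-2, 2], got {value}")
            kwargs[name] = value
    if seed is not None:
        kwargs["seed"] = seed
    if stop is not None:
        # normalize a single stop string into a list
        kwargs["stop"] = [stop] if isinstance(stop, str) else stop
    return kwargs
```

Range-checking up front gives users an immediate, local error instead of a round-trip failure from the API.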
```python
response = self._client.chat.completions.create(
    messages=message_dicts, stream=True, **all_kwargs
)

def gen() -> ChatResponseGen:
```
why do you create gen() and return gen() instead of yielding from the loop?
This is just the pattern found in LlamaIndex - they use this ChatResponseGen later.
(Sorry for double ping, was on my personal acct.)
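The pattern in question can be sketched in miniature. The names below (`stream_chat_sketch`, the plain-string stream) are illustrative, not LlamaIndex's internals: the point is that the streaming call happens eagerly in the outer function, while an inner `gen()` is returned so the caller receives a typed generator object.

```python
from typing import Generator, Iterable


def stream_chat_sketch(chunks: Iterable[str]) -> Generator[str, None, None]:
    # In the real connector, `chunks` would be the streaming response
    # from client.chat.completions.create(..., stream=True), and the
    # return type would be ChatResponseGen rather than a str generator.
    def gen() -> Generator[str, None, None]:
        content = ""
        for delta in chunks:
            content += delta
            yield content  # each item carries the accumulated text so far
    return gen()
```

One practical effect of this shape: any setup before `def gen()` runs immediately when `stream_chat_sketch` is called, whereas a function that yields directly would defer everything until first iteration.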
mattf left a comment
how should this be extended to work with a downloaded NIM?
Review threads:
- llama-index-integrations/llms/llama-index-llms-nvidia/llama_index/llms/nvidia/base.py (outdated, resolved)
- llama-index-integrations/llms/llama-index-llms-nvidia/llama_index/llms/nvidia/base.py (outdated, resolved)
- llama-index-integrations/llms/llama-index-llms-nvidia/llama_index/llms/nvidia/utils.py (resolved)
@mattf Added pytest tests covering basic functionality.
I'm still not sure `.mode()` works well with LlamaIndex - but if that's the best way to do it, in your view, I'll try to whip it up today.
```python
def test_validates_api_key_is_present() -> None:
    with CachedNVIDIApiKeys(set_fake_key=True):
```
NVIDIA() should work with no api_key argument and no NVIDIA_API_KEY env var, to support users doing NVIDIA().mode("nim", base_url=...) without a key. Please add a test for that case.
mattf left a comment
need to add support and tests for mode switching w/ NVIDIA().mode("nim", base_url=...)
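The requested mode-switching behavior might be modeled as below. This is a hedged sketch only: `NVIDIASketch` and its attributes are hypothetical stand-ins, grounded just in what the review asks for - construction must succeed without a key, `"nim"` mode requires a `base_url`, and catalog use without any key should fail.

```python
import os
from typing import Optional


class NVIDIASketch:
    """Toy model of NVIDIA().mode("nim", base_url=...) key handling."""

    def __init__(self, api_key: Optional[str] = None) -> None:
        # Deliberately no error here even without a key, so keyless
        # NIM usage (the case the review asks to test) can construct.
        self.api_key = api_key or os.environ.get("NVIDIA_API_KEY")
        self.base_url: Optional[str] = None
        self.mode_name = "catalog"

    def mode(self, name: str, base_url: Optional[str] = None) -> "NVIDIASketch":
        if name == "nim":
            if base_url is None:
                raise ValueError("nim mode requires base_url")
            self.base_url = base_url
        elif name == "catalog" and self.api_key is None:
            # only catalog mode actually needs credentials
            raise ValueError("catalog mode requires an API key")
        self.mode_name = name
        return self
```

Deferring the key check from `__init__` to the point where catalog mode is actually selected is what makes the keyless `NVIDIA().mode("nim", base_url=...)` flow testable.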
Added NVIDIA Catalog Connector