
Commit 23e48ef

committed
initial implementation

- allow loading any model from HF and any LoRA adapter
- support caching
- support batches
- support optimized attention
- add dynamic temperature
1 parent 476719c commit 23e48ef
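
The headline feature, loading an arbitrary Hugging Face model together with a LoRA adapter, usually follows a pattern like the transformers + peft sketch below. The model and adapter ids are hypothetical placeholders, and the commit's actual loader (presumably the new optillm.inference module imported in the diff further down) may differ in detail:

```python
# Minimal sketch of "load any model from HF and any LoRA adapter";
# the ids below are illustrative placeholders, not taken from this commit.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Llama-3.2-1B-Instruct"  # any HF model id
adapter_id = "some-user/some-lora-adapter"    # any LoRA adapter id

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16)
# peft attaches the LoRA weights on top of the frozen base model
model = PeftModel.from_pretrained(model, adapter_id)
```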

File tree

2 files changed (+1159 −1 lines)


optillm.py

Lines changed: 6 additions & 1 deletion
```diff
@@ -43,8 +43,13 @@
 
 def get_config():
     API_KEY = None
+    if os.environ.get("OPTILLM_API_KEY"):
+        # Use local inference engine
+        from optillm.inference import create_inference_client
+        API_KEY = os.environ.get("OPTILLM_API_KEY")
+        default_client = create_inference_client()
     # OpenAI, Azure, or LiteLLM API configuration
-    if os.environ.get("OPENAI_API_KEY"):
+    elif os.environ.get("OPENAI_API_KEY"):
         API_KEY = os.environ.get("OPENAI_API_KEY")
         base_url = server_config['base_url']
         if base_url != "":
```
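
Per this diff, any non-empty OPTILLM_API_KEY makes get_config() route requests through the bundled inference engine rather than OpenAI, Azure, or LiteLLM. A minimal sketch of exercising that branch follows; the specific value "optillm" and the assumption that the returned client is OpenAI-compatible are illustrative, not confirmed by this commit:

```python
import os

# Any non-empty value selects the local-inference branch in get_config();
# "optillm" is an arbitrary illustrative value, not a real API key.
os.environ["OPTILLM_API_KEY"] = "optillm"

from optillm.inference import create_inference_client

# Mirrors the new branch in get_config(): the local engine becomes the
# default client in place of an external API client.
client = create_inference_client()
```

Note that the OPTILLM_API_KEY check comes before the OPENAI_API_KEY check (the old `if` becomes an `elif`), so the local engine takes precedence when both variables are set.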
