
Commit f4e2870

Merge pull request #14532 from timelfrink/feat/issue-14476-compactifai-provider
Add CompactifAI provider support
2 parents e368a08 + afd720a commit f4e2870

File tree

12 files changed: +708 −0 lines changed

README.md

Lines changed: 1 addition & 0 deletions

@@ -316,6 +316,7 @@ curl 'http://0.0.0.0:4000/key/generate' \
 | [google AI Studio - gemini](https://docs.litellm.ai/docs/providers/gemini) ||||| | |
 | [mistral ai api](https://docs.litellm.ai/docs/providers/mistral) |||||| |
 | [cloudflare AI Workers](https://docs.litellm.ai/docs/providers/cloudflare_workers) ||||| | |
+| [CompactifAI](https://docs.litellm.ai/docs/providers/compactifai) ||||| | |
 | [cohere](https://docs.litellm.ai/docs/providers/cohere) |||||| |
 | [anthropic](https://docs.litellm.ai/docs/providers/anthropic) ||||| | |
 | [empower](https://docs.litellm.ai/docs/providers/empower) |||||
docs/my-website/docs/providers/compactifai.md

Lines changed: 223 additions & 0 deletions

@@ -0,0 +1,223 @@
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

# CompactifAI

https://docs.compactif.ai/

CompactifAI offers highly compressed versions of leading language models, delivering up to **70% lower inference costs**, **4x throughput gains**, and **low-latency inference** with minimal quality loss (<5%). CompactifAI's OpenAI-compatible API makes integration straightforward, enabling developers to build ultra-efficient, scalable AI applications with superior concurrency and resource efficiency.

| Property | Details |
|-------|-------|
| Description | CompactifAI offers compressed versions of leading language models with up to 70% cost reduction and 4x throughput gains |
| Provider Route on LiteLLM | `compactifai/` (add this prefix to the model name - e.g. `compactifai/cai-llama-3-1-8b-slim`) |
| Provider Doc | [CompactifAI ↗](https://docs.compactif.ai/) |
| API Endpoint for Provider | https://api.compactif.ai/v1 |
| Supported Endpoints | `/chat/completions`, `/completions` |

## Supported OpenAI Parameters

CompactifAI is fully OpenAI-compatible and supports the following parameters:

```
"stream",
"stop",
"temperature",
"top_p",
"max_tokens",
"presence_penalty",
"frequency_penalty",
"logit_bias",
"user",
"response_format",
"seed",
"tools",
"tool_choice",
"parallel_tool_calls",
"extra_headers"
```
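
For instance, structured output can be requested through `response_format`; a minimal sketch, assuming the chosen model honors JSON mode:

```python
from litellm import completion

response = completion(
    model="compactifai/cai-llama-3-1-8b-slim",
    messages=[{"role": "user", "content": "List three colors as a JSON object"}],
    response_format={"type": "json_object"},  # supported parameter from the list above
    seed=42,  # supported parameter; reproducible sampling where the backend honors it
)
print(response.choices[0].message.content)
```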
## API Key Setup

CompactifAI API keys are available through an AWS Marketplace subscription:

1. Subscribe via [AWS Marketplace](https://aws.amazon.com/marketplace)
2. Complete subscription verification (24-hour review process)
3. Access the MultiverseIAM dashboard with the provided credentials
4. Retrieve your API key from the dashboard

```python
import os

os.environ["COMPACTIFAI_API_KEY"] = "your-api-key"
```

## Usage

<Tabs>
<TabItem value="sdk" label="SDK">

```python
from litellm import completion
import os

os.environ['COMPACTIFAI_API_KEY'] = "your-api-key"

response = completion(
    model="compactifai/cai-llama-3-1-8b-slim",
    messages=[
        {"role": "user", "content": "Hello from LiteLLM!"}
    ],
)
print(response)
```

</TabItem>
<TabItem value="proxy" label="Proxy">

```yaml
model_list:
  - model_name: llama-2-compressed
    litellm_params:
      model: compactifai/cai-llama-3-1-8b-slim
      api_key: os.environ/COMPACTIFAI_API_KEY
```
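
Once the proxy is started with this config, it can be called like any OpenAI endpoint. A minimal sketch, assuming the proxy runs locally on port 4000 with a hypothetical key `sk-1234`:

```python
import openai

client = openai.OpenAI(
    api_key="sk-1234",               # hypothetical LiteLLM proxy key
    base_url="http://0.0.0.0:4000",  # assumed local proxy address
)

response = client.chat.completions.create(
    model="llama-2-compressed",  # the model_name from the config above
    messages=[{"role": "user", "content": "Hello from LiteLLM!"}],
)
print(response)
```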
</TabItem>
</Tabs>
## Streaming

```python
from litellm import completion
import os

os.environ['COMPACTIFAI_API_KEY'] = "your-api-key"

response = completion(
    model="compactifai/cai-llama-3-1-8b-slim",
    messages=[
        {"role": "user", "content": "Write a short story"}
    ],
    stream=True
)

for chunk in response:
    print(chunk)
```

## Advanced Usage

### Custom Parameters

```python
from litellm import completion

response = completion(
    model="compactifai/cai-llama-3-1-8b-slim",
    messages=[{"role": "user", "content": "Explain quantum computing"}],
    temperature=0.7,
    max_tokens=500,
    top_p=0.9,
    stop=["Human:", "AI:"]
)
```

### Function Calling

CompactifAI supports OpenAI-compatible function calling:

```python
from litellm import completion

functions = [
    {
        "name": "get_weather",
        "description": "Get current weather information",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "The city and state"
                }
            },
            "required": ["location"]
        }
    }
]

response = completion(
    model="compactifai/cai-llama-3-1-8b-slim",
    messages=[{"role": "user", "content": "What's the weather in San Francisco?"}],
    tools=[{"type": "function", "function": f} for f in functions],
    tool_choice="auto"
)
```

### Async Usage

```python
import asyncio
from litellm import acompletion

async def async_call():
    response = await acompletion(
        model="compactifai/cai-llama-3-1-8b-slim",
        messages=[{"role": "user", "content": "Hello async world!"}]
    )
    return response

# Run async function
response = asyncio.run(async_call())
print(response)
```
## Available Models

CompactifAI offers compressed versions of popular models. Use the `/models` endpoint to get the latest list:

```python
import os

import httpx

headers = {"Authorization": f"Bearer {os.environ['COMPACTIFAI_API_KEY']}"}
response = httpx.get("https://api.compactif.ai/v1/models", headers=headers)
models = response.json()
```
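
Assuming the endpoint follows the OpenAI-style `/models` response shape (an object with a `data` list), the model ids can then be printed directly:

```python
# Hypothetical listing, assuming an OpenAI-style {"object": "list", "data": [...]} payload
for m in models.get("data", []):
    print(m["id"])
```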
Common model formats:
- `compactifai/cai-llama-3-1-8b-slim`
- `compactifai/mistral-7b-compressed`
- `compactifai/codellama-7b-compressed`

## Benefits

- **Cost Efficient**: Up to 70% lower inference costs compared to standard models
- **High Performance**: 4x throughput gains with minimal quality loss (<5%)
- **Low Latency**: Optimized for fast response times
- **Drop-in Replacement**: Full OpenAI API compatibility
- **Scalable**: Superior concurrency and resource efficiency

## Error Handling

CompactifAI returns standard OpenAI-compatible error responses:

```python
from litellm import completion
from litellm.exceptions import AuthenticationError, RateLimitError

try:
    response = completion(
        model="compactifai/cai-llama-3-1-8b-slim",
        messages=[{"role": "user", "content": "Hello"}]
    )
except AuthenticationError:
    print("Invalid API key")
except RateLimitError:
    print("Rate limit exceeded")
```

## Support

- Documentation: https://docs.compactif.ai/
- LinkedIn: [MultiverseComputing](https://www.linkedin.com/company/multiversecomputing)
- Analysis: [Artificial Analysis Provider Comparison](https://artificialanalysis.ai/providers/compactifai)

docs/my-website/sidebars.js

Lines changed: 1 addition & 0 deletions

@@ -453,6 +453,7 @@ const sidebars = {
 "providers/elevenlabs",
 "providers/fireworks_ai",
 "providers/clarifai",
+"providers/compactifai",
 "providers/vllm",
 "providers/llamafile",
 "providers/infinity",

litellm/__init__.py

Lines changed: 1 addition & 0 deletions

@@ -1023,6 +1023,7 @@ def add_known_models():
 from .llms.aiohttp_openai.chat.transformation import AiohttpOpenAIChatConfig
 from .llms.galadriel.chat.transformation import GaladrielChatConfig
 from .llms.github.chat.transformation import GithubChatConfig
+from .llms.compactifai.chat.transformation import CompactifAIChatConfig
 from .llms.empower.chat.transformation import EmpowerChatConfig
 from .llms.huggingface.chat.transformation import HuggingFaceChatConfig
 from .llms.huggingface.embedding.transformation import HuggingFaceEmbeddingConfig

litellm/litellm_core_utils/get_llm_provider_logic.py

Lines changed: 2 additions & 0 deletions

@@ -372,6 +372,8 @@ def get_llm_provider(  # noqa: PLR0915
         custom_llm_provider = "cometapi"
     elif model.startswith("oci/"):
         custom_llm_provider = "oci"
+    elif model.startswith("compactifai/"):
+        custom_llm_provider = "compactifai"
     elif model.startswith("ovhcloud/"):
         custom_llm_provider = "ovhcloud"
     if not custom_llm_provider:
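
As a sanity check, the new prefix routing can be exercised through litellm's public `get_llm_provider` helper; a minimal sketch, assuming this branch is installed:

```python
from litellm import get_llm_provider

# The "compactifai/" prefix added above should route to the new provider
model, provider, dynamic_api_key, api_base = get_llm_provider(
    model="compactifai/cai-llama-3-1-8b-slim"
)
print(provider)  # expected: "compactifai"
```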
litellm/llms/compactifai/__init__.py

Lines changed: 1 addition & 0 deletions

@@ -0,0 +1 @@
+# CompactifAI provider for LiteLLM
litellm/llms/compactifai/chat/__init__.py

Lines changed: 1 addition & 0 deletions

@@ -0,0 +1 @@
+# CompactifAI chat completions
litellm/llms/compactifai/chat/transformation.py

Lines changed: 100 additions & 0 deletions

@@ -0,0 +1,100 @@
"""
CompactifAI chat completion transformation
"""

from typing import TYPE_CHECKING, Any, List, Optional, Tuple, Union

import httpx

from litellm.secret_managers.main import get_secret_str
from litellm.types.utils import ModelResponse
from litellm.llms.openai.common_utils import OpenAIError
from litellm.llms.base_llm.chat.transformation import BaseLLMException

from ...openai.chat.gpt_transformation import OpenAIGPTConfig

if TYPE_CHECKING:
    from litellm.litellm_core_utils.litellm_logging import Logging as _LiteLLMLoggingObj

    LiteLLMLoggingObj = _LiteLLMLoggingObj
else:
    LiteLLMLoggingObj = Any


class CompactifAIChatConfig(OpenAIGPTConfig):
    """
    Configuration class for CompactifAI chat completions.
    Since CompactifAI is OpenAI-compatible, we extend OpenAIGPTConfig.
    """

    def _get_openai_compatible_provider_info(
        self,
        api_base: Optional[str],
        api_key: Optional[str],
    ) -> Tuple[Optional[str], Optional[str]]:
        """
        Get API base and key for CompactifAI provider.
        """
        api_base = api_base or "https://api.compactif.ai/v1"
        dynamic_api_key = api_key or get_secret_str("COMPACTIFAI_API_KEY") or ""
        return api_base, dynamic_api_key

    def transform_response(
        self,
        model: str,
        raw_response: httpx.Response,
        model_response: ModelResponse,
        logging_obj: LiteLLMLoggingObj,
        request_data: dict,
        messages: List,
        optional_params: dict,
        litellm_params: dict,
        encoding: Any,
        api_key: Optional[str] = None,
        json_mode: Optional[bool] = None,
    ) -> ModelResponse:
        """
        Transform CompactifAI response to LiteLLM format.
        Since CompactifAI is OpenAI-compatible, we can use the standard OpenAI transformation.
        """
        ## LOGGING
        logging_obj.post_call(
            input=messages,
            api_key=api_key,
            original_response=raw_response.text,
            additional_args={"complete_input_dict": request_data},
        )

        ## RESPONSE OBJECT
        response_json = raw_response.json()

        # Handle JSON mode if needed
        if json_mode:
            for choice in response_json["choices"]:
                message = choice.get("message")
                if message and message.get("tool_calls"):
                    # Convert tool calls to content for JSON mode
                    tool_calls = message.get("tool_calls", [])
                    if len(tool_calls) == 1:
                        message["content"] = tool_calls[0]["function"].get("arguments", "")
                        message["tool_calls"] = None

        returned_response = ModelResponse(**response_json)

        # Set model name with provider prefix
        returned_response.model = f"compactifai/{model}"

        return returned_response

    def get_error_class(
        self, error_message: str, status_code: int, headers: Union[dict, httpx.Headers]
    ) -> BaseLLMException:
        """
        Get the appropriate error class for CompactifAI errors.
        Since CompactifAI is OpenAI-compatible, we use OpenAI error handling.
        """
        return OpenAIError(
            status_code=status_code,
            message=error_message,
            headers=headers,
        )
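
A quick way to sanity-check the config above is to resolve the provider info directly; a minimal sketch, assuming `COMPACTIFAI_API_KEY` is set in the environment:

```python
import os

from litellm.llms.compactifai.chat.transformation import CompactifAIChatConfig

os.environ["COMPACTIFAI_API_KEY"] = "your-api-key"

config = CompactifAIChatConfig()
# With no overrides, this falls back to the hosted endpoint and the env var
api_base, api_key = config._get_openai_compatible_provider_info(
    api_base=None, api_key=None
)
print(api_base)  # expected: https://api.compactif.ai/v1
```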
