You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
An **Mixture-of-Models** (MoM) router that intelligently directs OpenAI API requests to the most suitable models from a defined pool based on **Semantic Understanding** of the request's intent (Complexity, Task, Tools).
23
25
@@ -33,31 +35,49 @@ The screenshot below shows the LLM Router dashboard in Grafana.
33
35
34
36
The router is implemented in two ways: Golang (with Rust FFI based on Candle) and Python. Benchmarking will be conducted to determine the best implementation.
35
37
36
-
### Auto-Selection of Tools
38
+
####Auto-Selection of Tools
37
39
38
40
Select the tools to use based on the prompt, avoiding the use of tools that are not relevant to the prompt so as to reduce the number of prompt tokens and improve tool selection accuracy by the LLM.
39
41
40
-
### PII detection
42
+
### Enterprise Security π
43
+
44
+
#### PII detection
41
45
42
46
Detect PII in the prompt, avoiding sending PII to the LLM so as to protect the privacy of the user.
43
47
44
-
### Prompt guard
48
+
####Prompt guard
45
49
46
50
Detect if the prompt is a jailbreak prompt, avoiding sending jailbreak prompts to the LLM so as to prevent the LLM from misbehaving.
47
51
48
-
### Semantic Caching
52
+
### Similarity Caching β‘οΈ
49
53
50
54
Cache the semantic representation of the prompt so as to reduce the number of prompt tokens and improve the overall inference latency.
51
55
52
-
## π Documentation
56
+
## Documentation π
53
57
54
58
For comprehensive documentation including detailed setup instructions, architecture guides, and API references, visit:
55
59
56
-
**π [Complete Documentation at Read the Docs](https://llm-semantic-router.readthedocs.io/en/latest/)**
60
+
**π [Complete Documentation at Read the Docs](https://vllm-semantic-router.com/)**
0 commit comments