Introduces the DeepConf plugin for confidence-aware reasoning with early termination for local models, based on the 'Deep Think with Confidence' paper. Adds core modules for confidence calculation, threshold calibration, and online processing with consensus-based stopping and weighted majority voting. Integrates DeepConf decoding into the inference pipeline and provides a test suite for validation.
DeepConf is a confidence-aware reasoning approach for large language models that uses model-internal confidence signals to dynamically filter out low-quality reasoning traces during generation, improving both efficiency and accuracy.
## Overview

Based on the paper "Deep Think with Confidence" by Fu et al. (2024), DeepConf implements:

- **Token-level confidence scoring** using entropy and log-probability metrics
- **Online mode with early termination** to save computational resources
- **Warmup phase for threshold calibration**
- **Consensus-based stopping** when high agreement is reached
- **Weighted majority voting** for final answer selection

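As a rough sketch of the scoring idea (not the plugin's actual implementation; the function names and window size below are illustrative), token-level confidence can be computed from the top-k log-probabilities the model returns for each generated token, then aggregated over a sliding window to score a reasoning trace:

```python
def token_confidence(logprobs):
    """Confidence of one generated token, taken as the negative mean of the
    top-k log-probabilities returned with it (higher = more confident, since
    a peaked distribution makes the non-top candidates very improbable)."""
    return -sum(logprobs) / len(logprobs)

def group_confidence(token_confs, window=128):
    """Sliding-window ("group") confidence over a trace's token scores.
    The lowest-scoring window gates early termination, so a single bad
    stretch of reasoning is enough to stop a trace."""
    if len(token_confs) <= window:
        return sum(token_confs) / max(len(token_confs), 1)
    windows = [
        sum(token_confs[i:i + window]) / window
        for i in range(len(token_confs) - window + 1)
    ]
    return min(windows)
```

In practice these scores come from the `logprobs` the local inference engine already produces, so no extra forward passes are needed.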
## Features

- ✅ **Local models only** - Works with OptILLM's local inference engine
- ✅ **Two variants**: `low` (aggressive, top 10%) and `high` (conservative, top 90%)
- ✅ **Configurable parameters** for different use cases
- ✅ **Early termination** to reduce token usage by 50-70%
- ✅ **Automatic quality control** without external evaluation

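A minimal sketch of how consensus-based stopping and confidence-weighted voting fit together (hypothetical helper names; `traces` holds `(answer, confidence)` pairs extracted from completed reasoning traces):

```python
from collections import defaultdict

def weighted_vote(traces):
    """Pick the answer holding the most total confidence mass.
    traces: list of (answer, confidence) pairs.
    Returns (winning_answer, its share of total confidence)."""
    weights = defaultdict(float)
    for answer, conf in traces:
        weights[answer] += conf
    total = sum(weights.values())
    best = max(weights, key=weights.get)
    return best, weights[best] / total

def should_stop(traces, consensus_threshold=0.95):
    """Stop sampling new traces once the leading answer accounts for at
    least `consensus_threshold` of the confidence mass."""
    if not traces:
        return False
    _, ratio = weighted_vote(traces)
    return ratio >= consensus_threshold
```

Stopping on consensus rather than a fixed trace budget is what drives the token savings: easy questions converge after a handful of traces, while hard ones keep sampling up to `max_traces`.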
## Usage

### Basic Usage

Set up OptILLM for local inference:

```bash
export OPTILLM_API_KEY=optillm
python optillm.py --model your-local-model
```

Then make a request with DeepConf decoding:

```python
import openai

client = openai.OpenAI(
    api_key="optillm",
    base_url="http://localhost:8000/v1"
)

response = client.chat.completions.create(
    model="your-model",
    messages=[
        {"role": "user", "content": "Solve this math problem: What is the derivative of x^3 + 2x^2 - 5x + 1?"}
    ],
    extra_body={
        "decoding": "deepconf",
        "variant": "low",            # "low" or "high"
        "warmup_samples": 16,        # Number of calibration traces
        "max_traces": 64,            # Maximum total traces
        "consensus_threshold": 0.95  # Stop when consensus reached
    }
)
```
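For intuition on how `warmup_samples` is used: the warmup traces calibrate the confidence cutoff that the two variants apply. The sketch below is illustrative, not the plugin's exact code; the percentile logic and function name are assumptions based on the `low` (top 10%) / `high` (top 90%) descriptions above:

```python
def calibrate_threshold(warmup_confs, variant="low"):
    """Derive a confidence cutoff from warmup-trace confidence scores.
    'low' keeps roughly the top 10% of traces (aggressive filtering);
    'high' keeps roughly the top 90% (conservative filtering)."""
    keep_frac = 0.10 if variant == "low" else 0.90
    ranked = sorted(warmup_confs, reverse=True)
    cut = max(int(len(ranked) * keep_frac) - 1, 0)
    return ranked[cut]  # traces scoring below this are terminated early
```

With 16 warmup traces, `low` would keep only the single most confident trace's score as the bar, which is why it is the more aggressive (and cheaper) variant.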