Commit 09cac43

update docs after changes in profiles
1 parent e3558a6 commit 09cac43

3 files changed, 68 additions and 14 deletions

docs/content/api-reference/overview.md

Lines changed: 4 additions & 4 deletions
@@ -98,10 +98,10 @@ When available, provider-specific performance metrics are extracted from respons
 
 | Metric | Description | Providers |
 |--------|-------------|-----------|
-| `provider_total_ms` | Total processing time | Ollama, LM Studio |
-| `provider_prompt_tokens` | Tokens in prompt | All |
-| `provider_completion_tokens` | Tokens generated | All |
-| `provider_tokens_per_second` | Generation speed | Ollama, LM Studio |
+| `provider_total_ms` | Total processing time (ms) | Ollama, LM Studio |
+| `provider_prompt_tokens` | Tokens in prompt (count) | All |
+| `provider_completion_tokens` | Tokens generated (count) | All |
+| `provider_tokens_per_second` | Generation speed (tokens/s) | Ollama, LM Studio |
 | `provider_model` | Actual model used | All |
 
 See [Provider Metrics](../concepts/provider-metrics.md) for detailed information.
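
Reviewer note: the units added to the table above can be illustrated with a small sketch. This is not Olla's implementation. The struct names, the `deriveMetrics` helper, and both formulas are assumptions made for illustration; the JSON field names follow Ollama's response format, which reports durations in nanoseconds.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ollamaTail is a hypothetical view of the final Ollama response object.
// Only the fields needed for the table's metrics are declared.
type ollamaTail struct {
	Model           string `json:"model"`
	PromptEvalCount int    `json:"prompt_eval_count"`
	EvalCount       int    `json:"eval_count"`
	TotalDuration   int64  `json:"total_duration"` // nanoseconds
	EvalDuration    int64  `json:"eval_duration"`  // nanoseconds
}

// providerMetrics mirrors the documented metric names and units.
type providerMetrics struct {
	TotalMs          float64 // provider_total_ms (ms)
	PromptTokens     int     // provider_prompt_tokens (count)
	CompletionTokens int     // provider_completion_tokens (count)
	TokensPerSecond  float64 // provider_tokens_per_second (tokens/s)
	Model            string  // provider_model
}

// deriveMetrics converts raw provider fields into the documented units.
// The formulas are assumed, not taken from Olla's profiles.
func deriveMetrics(raw []byte) (providerMetrics, error) {
	var t ollamaTail
	if err := json.Unmarshal(raw, &t); err != nil {
		return providerMetrics{}, err
	}
	m := providerMetrics{
		TotalMs:          float64(t.TotalDuration) / 1e6, // ns to ms
		PromptTokens:     t.PromptEvalCount,
		CompletionTokens: t.EvalCount,
		Model:            t.Model,
	}
	if t.EvalDuration > 0 {
		// tokens generated divided by generation time in seconds
		m.TokensPerSecond = float64(t.EvalCount) / (float64(t.EvalDuration) / 1e9)
	}
	return m, nil
}

func main() {
	raw := []byte(`{"model":"llama3","prompt_eval_count":12,"eval_count":100,
		"total_duration":2000000000,"eval_duration":1000000000}`)
	m, err := deriveMetrics(raw)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("%+v\n", m)
}
```

The zero check on `EvalDuration` keeps the rate at 0 instead of dividing by zero when a provider omits timing data.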

docs/content/concepts/provider-metrics.md

Lines changed: 16 additions & 6 deletions
@@ -124,7 +124,9 @@ metrics:
 
 ### Key Components
 
-1. **`paths`**: Maps field names to JSONPath expressions for extracting values from the provider's response
+1. **`paths`**: Maps field names to JSON path expressions for extracting values from the provider's response
+   - Supports both JSONPath notation (`$.field.subfield`) and gjson notation (`field.subfield`)
+   - JSONPath prefixes are automatically normalized: `$.` is trimmed, `$` becomes empty string
 2. **`calculations`**: Defines derived metrics using mathematical expressions that reference extracted fields
 3. **Expression variables**: Any field defined in `paths` can be used as a variable in `calculations`
 4. **Pre-compilation**: Expressions are compiled at startup for performance
@@ -146,12 +148,13 @@ metrics:
     format: "json" # Expected format
 
     # JSONPath expressions for extracting values from provider response
+    # Note: Both JSONPath ($.field) and gjson (field) notation are supported
     paths:
-      model: "$.model"
-      done: "$.done"
+      model: "$.model" # JSONPath notation (normalized to "model")
+      is_complete: "done" # gjson notation (used as-is)
       # Token counts
       input_tokens: "$.prompt_eval_count"
-      output_tokens: "$.eval_count"
+      output_tokens: "eval_count" # Both formats work identically
       # Timing data (in nanoseconds from Ollama)
       total_duration_ns: "$.total_duration"
       load_duration_ns: "$.load_duration"
@@ -260,9 +263,16 @@ Response includes provider metrics when available:
 
 ### Extraction Implementation
 Olla uses high-performance libraries for metrics extraction:
-- **[gjson](https://github.com/tidwall/gjson)**: For JSONPath parsing (7.6x faster than encoding/json)
+- **[gjson](https://github.com/tidwall/gjson)**: For JSON path parsing (7.6x faster than encoding/json)
 - **[expr](https://github.com/expr-lang/expr)**: For pre-compiled mathematical expressions
 
+**JSONPath Normalization**: Olla automatically normalizes JSONPath-style prefixes for gjson compatibility:
+- `$.foo.bar` → `foo.bar` (leading `$.` is trimmed)
+- `$` → `` (root selector is converted to empty string)
+- `foo.bar` → `foo.bar` (already normalized paths are unchanged)
+
+This means you can use either JSONPath notation (`$.model`) or gjson notation (`model`) in your configurations - both work identically.
+
 ### Extraction Overhead
 - Metrics extraction runs with a 10ms timeout to prevent blocking
 - Extraction is best-effort - failures don't affect request processing
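
Reviewer note: the three normalization rules added in this hunk can be sketched in a few lines. `normalizePath` is a hypothetical helper written for illustration, not the function Olla actually uses; it only encodes the behaviour the documentation describes.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizePath converts a JSONPath-style path into gjson-style notation
// following the documented rules. Hypothetical helper, not Olla's code.
func normalizePath(path string) string {
	if path == "$" {
		return "" // root selector becomes the empty string
	}
	// Trim a leading "$."; paths already in gjson notation pass through unchanged.
	return strings.TrimPrefix(path, "$.")
}

func main() {
	for _, p := range []string{"$.foo.bar", "$", "foo.bar"} {
		fmt.Printf("%q -> %q\n", p, normalizePath(p))
	}
}
```

Running the normalization once at configuration load time, rather than per request, would fit the pre-compilation approach the docs describe, though that placement is an assumption here.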
@@ -316,7 +326,7 @@ alerts:
 
 ### Metrics Not Appearing
 1. Check provider supports metrics in responses
-2. Verify profile configuration includes `metrics_extraction`
+2. Verify profile configuration includes `metrics.extraction` section
 3. Enable debug logging to see extraction attempts
 4. Ensure response format matches expected structure
 
docs/content/development/architecture.md

Lines changed: 48 additions & 4 deletions
@@ -210,7 +210,25 @@ func (s *SherpaProxy) ProxyRequest(ctx context.Context,
     }
     defer resp.Body.Close()
 
-    return s.streamResponse(w, resp)
+    // Stream response with new signature
+    // Returns:
+    // - bytesWritten: total bytes successfully written to client
+    // - lastChunk: final bytes of response (up to 8KB) for metrics extraction
+    // - err: streaming error if any
+    buffer := make([]byte, 32*1024)
+    bytesWritten, lastChunk, err := s.streamResponse(ctx, ctx, w, resp, buffer, logger)
+    if err != nil {
+        return fmt.Errorf("streaming failed after %d bytes: %w", bytesWritten, err)
+    }
+
+    // lastChunk contains the tail of the response (for extracting provider metrics)
+    // This avoids buffering the entire response while still capturing completion stats
+    if len(lastChunk) > 0 {
+        // Extract provider metrics from the last chunk of response
+        s.extractMetrics(lastChunk, stats)
+    }
+
+    return nil
 }
 ```
 
@@ -226,11 +244,37 @@ type OllaProxy struct {
 func (o *OllaProxy) ProxyRequest(ctx context.Context,
     w http.ResponseWriter, r *http.Request,
     stats *RequestStats, logger StyledLogger) error {
+    endpoint := o.selectEndpoint()
     pool := o.getPool(endpoint)
-    conn := pool.Get()
-    defer pool.Put(conn)
 
-    return o.streamWithBackpressure(w, conn)
+    resp, err := pool.RoundTrip(r)
+    if err != nil {
+        return err
+    }
+    defer resp.Body.Close()
+
+    // Get buffer from pool for zero-allocation streaming
+    buffer := o.bufferPool.Get()
+    defer o.bufferPool.Put(buffer)
+
+    // Stream with optimized backpressure handling
+    // Returns: bytes written, last chunk for metrics, error
+    bytesWritten, lastChunk, err := o.streamResponse(
+        r.Context(),            // client context
+        resp.Request.Context(), // upstream context
+        w, resp, *buffer, logger)
+
+    if err != nil && !errors.Is(err, context.Canceled) {
+        return fmt.Errorf("stream failed: %w", err)
+    }
+
+    // Extract metrics from last chunk (Olla buffers only final bytes)
+    if len(lastChunk) > 0 {
+        o.extractProviderMetrics(lastChunk, endpoint, stats)
+    }
+
+    stats.TotalBytes = bytesWritten
+    return nil
 }
 ```