docs: add native model client dev note #465
Greptile Summary
This PR adds a new long-form developer note,
| Filename | Overview |
|---|---|
| docs/devnotes/posts/owning-the-model-stack.md | New long-form dev note covering native HTTP client architecture, AIMD adaptive throttling, ceiling stabilization, cascade dampening, two-level keying, and retry boundary semantics — technically accurate and well-structured |
| docs/devnotes/.authors.yml | Adds author entry for nmulepati following the established YAML format |
| mkdocs.yml | Adds nav entry for the new dev note in the correct "most recent first" position |
Sequence Diagram
sequenceDiagram
participant CG as Column Generator
participant MF as ModelFacade
participant TC as ThrottledModelClient
participant TM as ThrottleManager
participant HC as HttpModelClient
participant PA as Provider Adapter
participant API as Provider HTTP API
CG->>MF: generate(request)
MF->>TC: chat_completion(request)
TC->>TM: acquire_permit(domain_key)
alt Permit granted
TM-->>TC: permit
TC->>HC: execute(request)
HC->>PA: translate & send
PA->>API: HTTP POST
alt 200 OK
API-->>PA: response
PA-->>HC: canonical response
HC-->>TC: success
TC->>TM: release_success()
TM->>TM: increment success_streak
Note over TM: streak >= success_window → concurrency +1
else 429 Rate Limited
API-->>PA: 429
PA-->>HC: ProviderError(429)
HC-->>TC: 429 (bypasses transport retry)
TC->>TM: release_rate_limited()
TM->>TM: concurrency × reduce_factor
TM->>TM: update ceiling, start cooldown
TC->>TC: re-enter throttle acquire path
else 502/503/504
API-->>PA: 5xx
PA-->>HC: transient error
HC->>HC: RetryTransport (exp backoff + jitter)
HC-->>TC: response (after retry)
end
TC-->>MF: response
else Blocked (cooldown)
TM-->>TC: wait for cooldown
end
MF-->>CG: result
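The throttle loop in the diagram above can be sketched in a few lines of Python. This is a minimal illustration, not the code from this PR: `AimdState`, `handled_by_transport_retry`, and every default value are hypothetical, and only the names `reduce_factor`, `success_window`, `success_streak`, `release_success`, and `release_rate_limited` are taken from the diagram. Ceiling tracking and the cooldown timer are deliberately left out because the diagram does not specify how they behave.

```python
from dataclasses import dataclass

# Hypothetical sketch: the real ThrottleManager is not shown in this PR page.
# Status codes the diagram routes through the transport-level retry.
TRANSPORT_RETRY_STATUSES = frozenset({502, 503, 504})


def handled_by_transport_retry(status: int) -> bool:
    """True for transient 5xx codes retried inside the HTTP client.

    A 429 returns False: per the diagram it bypasses the transport retry
    and is handed to the throttle layer instead.
    """
    return status in TRANSPORT_RETRY_STATUSES


@dataclass
class AimdState:
    max_concurrency: int = 8    # assumed hard upper bound
    reduce_factor: float = 0.5  # multiplicative decrease applied on a 429
    success_window: int = 3     # successes required before adding one slot
    concurrency: int = 8
    success_streak: int = 0

    def release_success(self) -> None:
        self.success_streak += 1
        if self.success_streak >= self.success_window:
            # Additive increase: one more slot, never past the hard bound.
            self.concurrency = min(self.concurrency + 1, self.max_concurrency)
            self.success_streak = 0

    def release_rate_limited(self) -> None:
        # Multiplicative decrease, keeping at least one request in flight.
        self.concurrency = max(1, int(self.concurrency * self.reduce_factor))
        self.success_streak = 0
```

With these assumed defaults, one 429 halves the limit from 8 to 4, and three consecutive successes then raise it to 5. Keeping the 429 path out of the transport retry means a rate-limited request always passes back through the throttle before it is sent again.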
Reviews (11): Last reviewed commit: "Merge branch 'main' into nmulepati/docs/..."
johnnygreco
left a comment
This is a really great writeup! Nice work @nabinchha 🙌
📋 Summary
Adds a new dev note covering the native model client layer and its adaptive throttling system (AIMD-based concurrency control).
🔄 Changes
✨ Added
owning-the-model-stack.md: covers the native HTTP client architecture, AIMD adaptive throttling, ceiling stabilization, cascade dampening, two-level throttle keying, and the retry boundary design
docs/devnotes/posts/assets/owning-the-model-stack/ (hero image, layer diagram, AIMD concurrency chart, throttle keying diagram, retry boundary diagram)
nmulepati in .authors.yml
mkdocs.yml
🔍 Attention Areas
docs/devnotes/posts/owning-the-model-stack.md: new long-form technical content; review for accuracy on AIMD behavior, retry boundary semantics, and configuration parameter descriptions
🤖 Generated with AI
Made with Cursor