docs: add native model client dev note #465

Merged
nabinchha merged 20 commits into main from nmulepati/docs/native-model-client-dev-notes
Mar 31, 2026

@nabinchha

📋 Summary

Adds a new dev note covering the native model client layer and its adaptive throttling system (AIMD-based concurrency control).

🔄 Changes

✨ Added

  • New dev note: owning-the-model-stack.md — covers the native HTTP client architecture, AIMD adaptive throttling, ceiling stabilization, cascade dampening, two-level throttle keying, and the retry boundary design
  • Architecture diagrams in docs/devnotes/posts/assets/owning-the-model-stack/ (hero image, layer diagram, AIMD concurrency chart, throttle keying diagram, retry boundary diagram)
  • Author entry for nmulepati in .authors.yml
  • Nav entry in mkdocs.yml

🤖 Generated with AI

Made with Cursor

@nabinchha requested a review from a team as a code owner on March 25, 2026, 17:48
@greptile-apps

greptile-apps bot commented Mar 25, 2026

Greptile Summary

This PR adds a new long-form developer note, owning-the-model-stack.md, documenting the native HTTP model client layer that replaced LiteLLM in Data Designer. It covers the six-layer client architecture, AIMD adaptive concurrency control, ceiling stabilization, cascade dampening, two-level throttle keying, the retry boundary design, and a ThrottleConfig reference. Supporting PNG assets and authorship/nav metadata are included.

  • The AIMD math is consistent throughout: a 25% cut on each 429 (reduce_factor=0.75), an additive increase of 1 after every streak of 25 successes (success_window=25), and a ceiling probe band of 10% (ceiling_overshoot=0.10). The log-example arithmetic (20→15→11) checks out against the stated defaults: 20 × 0.75 = 15, and 15 × 0.75 = 11.25, which the log shows as 11.
  • The retry boundary design (429 excluded from transport retries, flows to ThrottledModelClient for AIMD feedback) is clearly explained and architecturally coherent.
  • The sync vs. async mode caveat is appropriately called out as temporary.
  • Both issues flagged in the prior review (inconsistent model name prefix in log examples, duplicate closing phrase) were resolved in follow-up commits and are no longer present.
  • The ThrottleConfig code sample omits additive_increase (it uses the default of 1), which is intentional and consistent with the accompanying parameter table.
  • Nav entry is correctly placed first in the Dev Notes list following the "most recent first" ordering convention.
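
The arithmetic in the first bullet can be checked with a small sketch. The class below is hypothetical and is not the actual ThrottleManager: the field names mirror the parameters quoted above, `min_limit` is an invented floor, and rounding the multiplicative decrease down is an assumption chosen because it reproduces the 20→15→11 sequence. Ceiling stabilization and cooldown are left out.

```python
import math
from dataclasses import dataclass


@dataclass
class AimdSketch:
    """Hypothetical AIMD limiter for checking the numbers; not the real ThrottleManager."""

    limit: int = 20              # starting concurrency, taken from the log example
    reduce_factor: float = 0.75  # multiplicative decrease applied on a 429
    success_window: int = 25     # length of the success streak required before growing
    additive_increase: int = 1   # slots added once the streak is reached
    min_limit: int = 1           # assumed floor so the limit never drops to zero
    success_streak: int = 0

    def on_rate_limited(self) -> int:
        # Assumption: the product is rounded down (15 * 0.75 = 11.25 -> 11).
        self.limit = max(self.min_limit, math.floor(self.limit * self.reduce_factor))
        self.success_streak = 0
        return self.limit

    def on_success(self) -> int:
        self.success_streak += 1
        if self.success_streak >= self.success_window:
            self.limit += self.additive_increase
            self.success_streak = 0
        return self.limit


sketch = AimdSketch()
after_first_429 = sketch.on_rate_limited()   # 20 -> 15
after_second_429 = sketch.on_rate_limited()  # 15 -> 11
for _ in range(sketch.success_window):
    after_window = sketch.on_success()       # 25 successes in a row add one slot
```

With the quoted defaults, two consecutive 429s take the limit from 20 to 11, and it then takes 25 uninterrupted successes to climb back by a single slot, which is the asymmetry AIMD is designed around.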

Confidence Score: 5/5

Documentation-only PR with no code changes; safe to merge.

No code is modified — only docs, assets, and config metadata. The technical content has been verified for internal consistency (AIMD math, log sequence arithmetic, parameter defaults). Prior review issues are resolved. No P0 or P1 findings remain.

No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| docs/devnotes/posts/owning-the-model-stack.md | New long-form dev note covering native HTTP client architecture, AIMD adaptive throttling, ceiling stabilization, cascade dampening, two-level keying, and retry boundary semantics; technically accurate and well-structured |
| docs/devnotes/.authors.yml | Adds author entry for nmulepati following the established YAML format |
| mkdocs.yml | Adds nav entry for the new dev note in the correct "most recent first" position |

Sequence Diagram

sequenceDiagram
    participant CG as Column Generator
    participant MF as ModelFacade
    participant TC as ThrottledModelClient
    participant TM as ThrottleManager
    participant HC as HttpModelClient
    participant PA as Provider Adapter
    participant API as Provider HTTP API

    CG->>MF: generate(request)
    MF->>TC: chat_completion(request)
    TC->>TM: acquire_permit(domain_key)
    alt Permit granted
        TM-->>TC: permit
        TC->>HC: execute(request)
        HC->>PA: translate & send
        PA->>API: HTTP POST
        alt 200 OK
            API-->>PA: response
            PA-->>HC: canonical response
            HC-->>TC: success
            TC->>TM: release_success()
            TM->>TM: increment success_streak
            Note over TM: streak >= success_window → concurrency +1
        else 429 Rate Limited
            API-->>PA: 429
            PA-->>HC: ProviderError(429)
            HC-->>TC: 429 (bypasses transport retry)
            TC->>TM: release_rate_limited()
            TM->>TM: concurrency × reduce_factor
            TM->>TM: update ceiling, start cooldown
            TC->>TC: re-enter throttle acquire path
        else 502/503/504
            API-->>PA: 5xx
            PA-->>HC: transient error
            HC->>HC: RetryTransport (exp backoff + jitter)
            HC-->>TC: response (after retry)
        end
        TC-->>MF: response
    else Blocked (cooldown)
        TM-->>TC: wait for cooldown
    end
    MF-->>CG: result
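
The retry boundary in the diagram can be sketched as a status-code router. This is a hedged illustration only: `route_status` and `backoff_delay` are hypothetical names, the `base` and `cap` values are invented, and full jitter is one assumed way to realize the "exp backoff + jitter" step. The diagram does not say which jitter strategy RetryTransport uses.

```python
import random
from typing import Optional

# Status codes taken from the diagram's branches.
TRANSPORT_RETRY_STATUSES = frozenset({502, 503, 504})
RATE_LIMIT_STATUS = 429


def route_status(status: int) -> str:
    """Decide which layer handles a response status (hypothetical helper)."""
    if status == RATE_LIMIT_STATUS:
        # 429 skips transport retries so the throttle layer sees the signal.
        return "throttle"
    if status in TRANSPORT_RETRY_STATUSES:
        # Transient gateway errors are retried inside the transport.
        return "transport_retry"
    if 200 <= status < 300:
        return "success"
    # Anything else is surfaced to the caller unchanged (assumption).
    return "raise"


def backoff_delay(
    attempt: int,
    base: float = 0.5,   # assumed base delay in seconds
    cap: float = 8.0,    # assumed upper bound in seconds
    rng: Optional[random.Random] = None,
) -> float:
    """Full-jitter exponential backoff: uniform in [0, min(cap, base * 2**attempt)]."""
    rng = rng or random.Random()
    return rng.uniform(0.0, min(cap, base * 2 ** attempt))
```

Keeping 429 out of the transport retry set matters because a silently retried 429 would never reach the AIMD controller, so the concurrency limit would not shrink when the provider pushes back.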

Reviews (11): Last reviewed commit: "Merge branch 'main' into nmulepati/docs/..."

johnnygreco previously approved these changes Mar 31, 2026

@johnnygreco left a comment

This is a really great writeup! Nice work @nabinchha 🙌

@nabinchha requested a review from johnnygreco on March 31, 2026, 21:41
@nabinchha merged commit a1eb244 into main on Mar 31, 2026 (47 checks passed)
@nabinchha deleted the nmulepati/docs/native-model-client-dev-notes branch on March 31, 2026, 21:45