docs: add native model client dev note by nabinchha · Pull Request #465 · NVIDIA-NeMo/DataDesigner

nabinchha · 2026-03-25T17:48:45Z

📋 Summary

Adds a new dev note covering the native model client layer and its adaptive throttling system (AIMD-based concurrency control).

🔄 Changes

✨ Added

New dev note: owning-the-model-stack.md — covers the native HTTP client architecture, AIMD adaptive throttling, ceiling stabilization, cascade dampening, two-level throttle keying, and the retry boundary design
Architecture diagrams in docs/devnotes/posts/assets/owning-the-model-stack/ (hero image, layer diagram, AIMD concurrency chart, throttle keying diagram, retry boundary diagram)
Author entry for nmulepati in .authors.yml
Nav entry in mkdocs.yml

🔍 Attention Areas

⚠️ Reviewers: Please pay special attention to the following:

docs/devnotes/posts/owning-the-model-stack.md — new long-form technical content; review for accuracy on AIMD behavior, retry boundary semantics, and configuration parameter descriptions

🤖 Generated with AI

Made with Cursor

greptile-apps · 2026-03-25T17:51:54Z

Greptile Summary

This PR adds a new long-form developer note, owning-the-model-stack.md, documenting the native HTTP model client layer that replaced LiteLLM in Data Designer. It covers the six-layer client architecture, AIMD adaptive concurrency control, ceiling stabilization, cascade dampening, two-level throttle keying, the retry boundary design, and a ThrottleConfig reference. Supporting PNG assets and authorship/nav metadata are included.

The AIMD math is consistent throughout: a 25% cut on each 429 (reduce_factor=0.75), an additive increase of 1 after each success window of 25 requests (success_window=25), and a ceiling probe band of 10% (ceiling_overshoot=0.10). The log-example arithmetic (20→15→11) checks out against the stated defaults: 20 × 0.75 = 15, and 15 × 0.75 = 11.25, which becomes 11 as an integer limit.
The retry boundary design (429 excluded from transport retries, flows to ThrottledModelClient for AIMD feedback) is clearly explained and architecturally coherent.
The sync vs. async mode caveat is appropriately called out as temporary.
Both issues flagged in the prior review (inconsistent model name prefix in log examples, duplicate closing phrase) were resolved in follow-up commits and are no longer present.
The ThrottleConfig code sample omits additive_increase (it uses the default of 1), which is intentional and consistent with the accompanying parameter table.
Nav entry is correctly placed first in the Dev Notes list following the "most recent first" ordering convention.
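
To make the AIMD arithmetic in this summary easy to re-check, here is a minimal sketch of the decrease/increase rule using the stated defaults. The class and method names (`AimdState`, `on_rate_limited`, `on_success`) are hypothetical and are not taken from the Data Designer codebase. Flooring to an integer is an assumption chosen to match the 20→15→11 log sequence, and ceiling stabilization and cooldown are deliberately left out.

```python
import math
from dataclasses import dataclass


@dataclass
class AimdState:
    """Hypothetical AIMD limit tracker; not the real ThrottleManager."""

    limit: int                   # current concurrency limit
    max_limit: int               # configured upper bound
    reduce_factor: float = 0.75  # multiplicative decrease applied on a 429
    additive_increase: int = 1   # step added after a full success window
    success_window: int = 25     # successes required before increasing
    success_streak: int = 0      # successes since the last change

    def on_rate_limited(self) -> None:
        # Multiplicative decrease: cut by 25%, floor to an int (assumption),
        # and never drop below a single in-flight request.
        self.limit = max(1, math.floor(self.limit * self.reduce_factor))
        self.success_streak = 0

    def on_success(self) -> None:
        # Additive increase: one step per completed success window.
        self.success_streak += 1
        if self.success_streak >= self.success_window:
            self.limit = min(self.max_limit, self.limit + self.additive_increase)
            self.success_streak = 0


state = AimdState(limit=20, max_limit=20)
state.on_rate_limited()   # 20 * 0.75 = 15
state.on_rate_limited()   # 15 * 0.75 = 11.25, floored to 11
for _ in range(25):       # one full success window
    state.on_success()
print(state.limit)        # 12
```

Two consecutive 429s take the limit from 20 to 11, while recovery needs 25 successes per step, which is the asymmetry that makes AIMD back off quickly and probe upward slowly.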

Confidence Score: 5/5

Documentation-only PR with no code changes; safe to merge.

No code is modified — only docs, assets, and config metadata. The technical content has been verified for internal consistency (AIMD math, log sequence arithmetic, parameter defaults). Prior review issues are resolved. No P0 or P1 findings remain.

No files require special attention.

Important Files Changed

| Filename | Overview |
| --- | --- |
| docs/devnotes/posts/owning-the-model-stack.md | New long-form dev note covering native HTTP client architecture, AIMD adaptive throttling, ceiling stabilization, cascade dampening, two-level keying, and retry boundary semantics; technically accurate and well-structured |
| docs/devnotes/.authors.yml | Adds author entry for nmulepati following the established YAML format |
| mkdocs.yml | Adds nav entry for the new dev note in the correct "most recent first" position |

Sequence Diagram

sequenceDiagram
    participant CG as Column Generator
    participant MF as ModelFacade
    participant TC as ThrottledModelClient
    participant TM as ThrottleManager
    participant HC as HttpModelClient
    participant PA as Provider Adapter
    participant API as Provider HTTP API

    CG->>MF: generate(request)
    MF->>TC: chat_completion(request)
    TC->>TM: acquire_permit(domain_key)
    alt Permit granted
        TM-->>TC: permit
        TC->>HC: execute(request)
        HC->>PA: translate & send
        PA->>API: HTTP POST
        alt 200 OK
            API-->>PA: response
            PA-->>HC: canonical response
            HC-->>TC: success
            TC->>TM: release_success()
            TM->>TM: increment success_streak
            Note over TM: streak >= success_window → concurrency +1
        else 429 Rate Limited
            API-->>PA: 429
            PA-->>HC: ProviderError(429)
            HC-->>TC: 429 (bypasses transport retry)
            TC->>TM: release_rate_limited()
            TM->>TM: concurrency × reduce_factor
            TM->>TM: update ceiling, start cooldown
            TC->>TC: re-enter throttle acquire path
        else 502/503/504
            API-->>PA: 5xx
            PA-->>HC: transient error
            HC->>HC: RetryTransport (exp backoff + jitter)
            HC-->>TC: response (after retry)
        end
        TC-->>MF: response
    else Blocked (cooldown)
        TM-->>TC: wait for cooldown
    end
    MF-->>CG: result
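
The retry boundary shown in the diagram can be sketched as a small routing function: 429 goes to the throttle layer for AIMD feedback, while 502/503/504 stay inside the transport retry loop. This is an illustration only. The names `route_status` and `backoff_delay`, the base and cap values, and the choice of "full jitter" are assumptions, not the actual RetryTransport implementation.

```python
import random
from typing import Callable

# Statuses retried at the transport layer, per the diagram (502/503/504).
TRANSPORT_RETRY_STATUSES = frozenset({502, 503, 504})


def route_status(status: int) -> str:
    """Decide which layer handles a response status (hypothetical helper)."""
    if status == 429:
        # Bypasses transport retries so the throttle layer sees the signal.
        return "throttle"
    if status in TRANSPORT_RETRY_STATUSES:
        return "transport_retry"
    return "success" if 200 <= status < 300 else "error"


def backoff_delay(
    attempt: int,
    base: float = 0.5,
    cap: float = 8.0,
    rng: Callable[[], float] = random.random,
) -> float:
    """Exponential backoff with full jitter; base and cap are assumed values."""
    # The ceiling doubles per attempt up to `cap`; the delay is a random
    # fraction of that ceiling so concurrent clients do not retry in lockstep.
    return min(cap, base * (2 ** attempt)) * rng()


print(route_status(429))  # throttle
print(route_status(503))  # transport_retry
```

Keeping 429 out of the transport retry set matters because a silent retry would hide the rate-limit signal from the component that adjusts concurrency.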


docs/devnotes/posts/owning-the-model-stack.md

johnnygreco

This is a really great writeup! Nice work @nabinchha 🙌

nabinchha added 4 commits March 24, 2026 17:37

add images

8002c6f

Merge branch 'main' into nmulepati/docs/native-model-client-dev-notes

ac8a1c4

re-ran slopguard

8536e2b

update dev notes

8fc7a0a

nabinchha requested a review from a team as a code owner March 25, 2026 17:48

Merge branch 'main' into nmulepati/docs/native-model-client-dev-notes

efc8e5e

greptile-apps bot reviewed Mar 25, 2026

nabinchha added 6 commits March 25, 2026 11:59

address greptile comments

4d841cb

Merge branch 'main' into nmulepati/docs/native-model-client-dev-notes

75bf989

update example model name

12549fb

Merge branch 'main' into nmulepati/docs/native-model-client-dev-notes

a37961c

add info on throttlemanager

e44de01

Merge branch 'main' into nmulepati/docs/native-model-client-dev-notes

3bd879c