1 parent 2b8fe03 commit 0881524
protobuf/model_config.proto
@@ -1662,8 +1662,8 @@ message ModelEnsembling
//@@ .. cpp:var:: uint32 max_inflight_requests
//@@
- //@@ The maximum number of concurrent inflight requests at each ensemble
- //@@ step.
+ //@@ The maximum number of concurrent inflight requests allowed at each ensemble
+ //@@ step per inference request.
//@@ This limit prevents unbounded memory growth when decoupled models
//@@ produce responses faster than downstream models can consume them.
//@@ Default value is 0, which indicates that no limit is enforced.
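
For illustration only, here is a rough sketch of how the field documented above might appear in a model's config.pbtxt. It assumes that `max_inflight_requests` is set directly inside the `ensemble_scheduling` block (the `ModelEnsembling` message). The model names, tensor names, and the limit of 4 are hypothetical and are not taken from this commit.

```
# Hypothetical ensemble: a decoupled producer feeding a slower consumer.
ensemble_scheduling {
  # Assumed placement: at most 4 concurrent inflight requests at each
  # step, counted per inference request. 0 (the default) means no limit.
  max_inflight_requests: 4
  step [
    {
      model_name: "producer_model"   # hypothetical decoupled model
      model_version: -1
      input_map { key: "IN" value: "ENSEMBLE_IN" }
      output_map { key: "OUT" value: "intermediate" }
    },
    {
      model_name: "consumer_model"   # hypothetical downstream model
      model_version: -1
      input_map { key: "IN" value: "intermediate" }
      output_map { key: "OUT" value: "ENSEMBLE_OUT" }
    }
  ]
}
```

With a nonzero limit, the producer step should be held back once that many requests are outstanding downstream, which is the unbounded memory growth case the comment describes.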