Merged
Changes from 2 commits
12 changes: 11 additions & 1 deletion protobuf/model_config.proto
@@ -1,4 +1,4 @@
-// Copyright 2018-2024, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+// Copyright 2018-2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
//
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions
@@ -1659,6 +1659,16 @@ message ModelEnsembling
//@@ The models and the input / output mappings used within the ensemble.
//@@
repeated Step step = 1;

+//@@ .. cpp:var:: uint32 max_inflight_responses
+//@@
+//@@ The maximum number of concurrent inflight responses from ensemble
+//@@ steps to downstream consumers. This limit prevents unbounded memory
+//@@ growth when decoupled models produce responses faster than downstream
+//@@ models can consume them. Default value is 0, which indicates that no
+//@@ limit is enforced (unlimited).
+//@@
+uint32 max_inflight_responses = 2;
}

//@@
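
For illustration, here is a minimal sketch of how the new field might be set in a model's config.pbtxt. It assumes `ModelEnsembling` is reached through the `ensemble_scheduling` field of the model configuration, as in existing ensemble configs. The model names, tensor names, and the limit of 8 are hypothetical, and how the limit is enforced at runtime is not shown in this diff.

```protobuf
# config.pbtxt for a hypothetical ensemble; all names below are illustrative.
name: "example_ensemble"
platform: "ensemble"
ensemble_scheduling {
  # Cap on concurrent inflight responses passed from steps to downstream
  # consumers. 0 (the default) means no limit is enforced.
  max_inflight_responses: 8
  step [
    {
      # Hypothetical decoupled model that may emit many responses per request.
      model_name: "decoupled_producer"
      model_version: -1
      input_map { key: "INPUT" value: "ENSEMBLE_INPUT" }
      output_map { key: "OUTPUT" value: "intermediate" }
    },
    {
      # Hypothetical downstream model that consumes responses more slowly.
      model_name: "slow_consumer"
      model_version: -1
      input_map { key: "INPUT" value: "intermediate" }
      output_map { key: "OUTPUT" value: "ENSEMBLE_OUTPUT" }
    }
  ]
}
```

Leaving the field out, or setting it to 0, keeps the previous unlimited behavior, so existing ensemble configs should not need changes.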