@@ -1463,25 +1463,21 @@ message ModelSequenceBatching
 //@@ Should the dynamic batcher preserve the ordering of responses to
 //@@ match the order of requests received by the scheduler. Default is
 //@@ false. If true, the responses will be returned in the same order
-// as
-//@@ the order of requests sent to the scheduler. If false, the
-// responses
-//@@ may be returned in arbitrary order. This option is specifically
-//@@ needed when a sequence of related inference requests (i.e.
-// inference
-//@@ requests with the same correlation ID) are sent to the dynamic
-//@@ batcher to ensure that the sequence responses are in the correct
-//@@ order.
+//@@ as the order of requests sent to the scheduler. If false, the
+//@@ responses may be returned in arbitrary order. This option is
+//@@ specifically needed when a sequence of related inference requests
+//@@ (i.e. inference requests with the same correlation ID) are sent
+//@@ to the dynamic batcher to ensure that the sequence responses are
+//@@ in the correct order.
 //@@
 //@@ When using decoupled models, setting this to true may block the
 //@@ responses from independent sequences from being returned to the
 //@@ client until the previous request completes, hurting overall
 //@@ performance. If using GRPC streaming protocol, the stream
-// ordering
-//@@ guarantee may be sufficient alone to ensure the responses for
-// each
-//@@ sequence are returned in sequence-order without blocking based on
-//@@ independent requests, depending on the use case.
+//@@ ordering guarantee may be sufficient alone to ensure the
+//@@ responses for each sequence are returned in sequence-order
+//@@ without blocking based on independent requests, depending on the
+//@@ use case.
 //@@
 bool preserve_ordering = 4;
 }
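
For illustration only: assuming this preserve_ordering option belongs to the Oldest strategy nested in sequence_batching (the comment above refers to the strategy's internal dynamic batcher), a model's config.pbtxt could enable it roughly as in the sketch below. The numeric values are placeholders, not defaults.

    sequence_batching {
      oldest {
        # Illustrative tuning values; preserve_ordering is the point here.
        max_candidate_sequences: 4
        max_queue_delay_microseconds: 100
        # Return responses in the order the requests were received.
        preserve_ordering: true
      }
      max_sequence_idle_microseconds: 5000000
    }
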
@@ -1537,6 +1533,14 @@ message ModelSequenceBatching
 //@@ in the sequence contains garbage data.
 //@@
 repeated State state = 5;
+
+//@@ .. cpp:var:: bool generative_sequence
+//@@
+//@@ The sequence batcher expects the sequence to be generative. A
+//@@ generative sequence is initiated by a single request; the sequence
+//@@ batcher expects the same request to be "rescheduled" by the model
+//@@ if the sequence is continuing.
+bool generative_sequence = 6;
 }

 //@@
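
As a rough sketch of how the new field could be used: a model that continues a generative sequence by "rescheduling" its own request might opt in through config.pbtxt as below. The timeout value is a placeholder, and pairing with a decoupled transaction policy is an assumption, not something this change requires.

    sequence_batching {
      # The sequence is started by a single request; the model is expected
      # to reschedule that same request while the sequence is continuing.
      generative_sequence: true
      max_sequence_idle_microseconds: 10000000
    }
    model_transaction_policy {
      decoupled: true
    }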