
Commit adef772

Add new sequence batcher parameter for generative sequence (#102)
1 parent cf617c9 commit adef772

1 file changed: +18 -14 lines changed


protobuf/model_config.proto

Lines changed: 18 additions & 14 deletions
@@ -1463,25 +1463,21 @@ message ModelSequenceBatching
 //@@ Should the dynamic batcher preserve the ordering of responses to
 //@@ match the order of requests received by the scheduler. Default is
 //@@ false. If true, the responses will be returned in the same order
-// as
-//@@ the order of requests sent to the scheduler. If false, the
-// responses
-//@@ may be returned in arbitrary order. This option is specifically
-//@@ needed when a sequence of related inference requests (i.e.
-// inference
-//@@ requests with the same correlation ID) are sent to the dynamic
-//@@ batcher to ensure that the sequence responses are in the correct
-//@@ order.
+//@@ as the order of requests sent to the scheduler. If false, the
+//@@ responses may be returned in arbitrary order. This option is
+//@@ specifically needed when a sequence of related inference requests
+//@@ (i.e. inference requests with the same correlation ID) are sent
+//@@ to the dynamic batcher to ensure that the sequence responses are
+//@@ in the correct order.
 //@@
 //@@ When using decoupled models, setting this to true may block the
 //@@ responses from independent sequences from being returned to the
 //@@ client until the previous request completes, hurting overall
 //@@ performance. If using GRPC streaming protocol, the stream
-// ordering
-//@@ guarantee may be sufficient alone to ensure the responses for
-// each
-//@@ sequence are returned in sequence-order without blocking based on
-//@@ independent requests, depending on the use case.
+//@@ ordering guarantee may be sufficient alone to ensure the
+//@@ responses for each sequence are returned in sequence-order
+//@@ without blocking based on independent requests, depending on the
+//@@ use case.
 //@@
 bool preserve_ordering = 4;
 }
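
For context, the preserve_ordering option documented in the hunk above is set through a model's config.pbtxt. A minimal sketch, assuming the field sits under the sequence batcher's oldest-strategy settings as the surrounding hunk suggests; the other values are illustrative only and not part of this commit:

sequence_batching {
  oldest {
    max_candidate_sequences: 4   # illustrative value, not from this commit
    preserve_ordering: true      # return sequence responses in request order
  }
}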
@@ -1537,6 +1533,14 @@ message ModelSequenceBatching
 //@@ in the sequence contains garbage data.
 //@@
 repeated State state = 5;
+
+//@@ .. cpp:var:: bool generative_sequence
+//@@
+//@@ The sequence batcher is expecting the sequence to be generative. A
+//@@ generative sequence is initiated by single request, the sequence
+//@@ batcher expects the same request to be "rescheduled" by the model if
+//@@ the sequence is continuing.
+bool generative_sequence = 6;
 }

 //@@
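
As a usage sketch (not part of the commit), a model might opt into the new behavior in its config.pbtxt roughly as follows; any field value other than generative_sequence itself is an illustrative assumption:

sequence_batching {
  max_sequence_idle_microseconds: 60000000   # illustrative idle timeout
  generative_sequence: true                  # new field added in this commit
}

With this set, the sequence batcher treats a sequence as initiated by a single request and expects the model to "reschedule" that same request while the sequence is still continuing, as described in the new comment above.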
