Alternative implementation of thinking mode #1723
base: main
Conversation
def get_json_schema_logits_processor(
    backend_name: str | None,
    model: SteerableModel,
I'd prefer a separate function that calls get_json_schema_logits_processor instead of the current branching logic.
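For illustration, a minimal sketch of what that separation could look like. The wrapper name and the json_schema/end_thinking_tag/thinking_max_tokens parameters are assumptions about the surrounding signature, not the PR's actual API:

```python
# Hypothetical sketch: the wrapper name and parameter list are assumptions.
def get_thinking_json_schema_logits_processor(
    backend_name: str | None,
    model: SteerableModel,
    json_schema: dict,
    end_thinking_tag: str,
    thinking_max_tokens: int | None,
):
    # Build the plain backend processor first, then wrap it for thinking mode.
    backend_logits_processor = get_json_schema_logits_processor(
        backend_name, model, json_schema
    )
    end_thinking_token_id = _get_end_thinking_token_id(end_thinking_tag, model)
    return ThinkingLogitsProcessor(
        end_thinking_token_id, thinking_max_tokens, backend_logits_processor
    )
```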
Also, is there a clean way to get the JsonSchema, Regex and CFG objects up to this point? That would allow us to have a single function get_thinking_logits_processor that dispatches depending on the type.
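A rough sketch of that dispatch, assuming the JsonSchema, Regex and CFG types are available here and that each per-type builder accepts the output type as its last argument (both assumptions):

```python
# Hypothetical sketch: assumes JsonSchema, Regex and CFG are in scope and that
# the per-type builders accept the output type as their last argument.
def get_thinking_logits_processor(
    output_type: JsonSchema | Regex | CFG,
    backend_name: str | None,
    model: SteerableModel,
    end_thinking_tag: str,
    thinking_max_tokens: int | None,
):
    # Dispatch on the type of the constraint object.
    if isinstance(output_type, JsonSchema):
        backend_logits_processor = get_json_schema_logits_processor(
            backend_name, model, output_type
        )
    elif isinstance(output_type, Regex):
        backend_logits_processor = get_regex_logits_processor(
            backend_name, model, output_type
        )
    else:
        backend_logits_processor = get_cfg_logits_processor(
            backend_name, model, output_type
        )
    end_thinking_token_id = _get_end_thinking_token_id(end_thinking_tag, model)
    return ThinkingLogitsProcessor(
        end_thinking_token_id, thinking_max_tokens, backend_logits_processor
    )
```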
    if end_thinking_tag is not None:
        end_thinking_token_id = _get_end_thinking_token_id(end_thinking_tag, model)
        return ThinkingLogitsProcessor(end_thinking_token_id, thinking_max_tokens, backend_logits_processor)
    return backend_logits_processor
def get_regex_logits_processor(
Idem
    if end_thinking_tag is not None:
        end_thinking_token_id = _get_end_thinking_token_id(end_thinking_tag, model)
        return ThinkingLogitsProcessor(end_thinking_token_id, thinking_max_tokens, backend_logits_processor)
    return backend_logits_processor
def get_cfg_logits_processor(
Idem
@@ -90,7 +90,7 @@ def _setup(self, batch_size: int, vocab_size: int) -> None:
         ]

     def _bias_logits_mlx(  # pragma: no cover
-        self, batch_size: int, logits: TensorType
+        self, batch_size: int, logits: TensorType, skip: list[bool]
If we go with this design, I would consider a different name like passthrough.
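For illustration, a sketch of the renamed signature; the loop body and the _bias_single_sequence helper are assumptions about how the per-sequence flag would be consumed, not the PR's actual implementation:

```python
# Hypothetical sketch of the rename; _bias_single_sequence is an assumed helper.
def _bias_logits_mlx(  # pragma: no cover
    self, batch_size: int, logits: TensorType, passthrough: list[bool]
) -> TensorType:
    for i in range(batch_size):
        if passthrough[i]:
            # Sequence i is still in its thinking phase: leave logits untouched.
            continue
        # Apply the backend bias only to sequences past the thinking phase.
        logits[i] = self._bias_single_sequence(i, logits[i])
    return logits
```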
        if all(self._is_thinking):
            return logits

        return self.logits_processor.process_logits(input_ids, logits)
I'm wondering if we could transform all this into operations on arrays so we don't have to call process_logits for the sequences where the end-of-think token has not been generated. It would go as follows (see the sketch below):
- Extract the sequences where end-of-think is present
- Run process_logits on them
- Re-build the logits array with all the sequences

What do you think?
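A minimal sketch of that approach with NumPy-style boolean indexing; it assumes process_logits is a method of the ThinkingLogitsProcessor shown above and that input_ids and logits are arrays whose first axis is the batch:

```python
import numpy as np

# Hypothetical sketch: assumes NumPy-style arrays with the batch on axis 0.
def process_logits(self, input_ids, logits):
    thinking = np.asarray(self._is_thinking)
    if thinking.all():
        return logits

    # 1. Extract the sequences where the end-of-think token was generated.
    done = ~thinking
    # 2. Run the downstream processor only on those sequences.
    processed = self.logits_processor.process_logits(
        input_ids[done], logits[done]
    )
    # 3. Re-build the full logits array with all sequences.
    logits[done] = processed
    return logits
```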
That would be best, although it means the downstream logits processor needs to be able to handle tensors of different batch sizes, and not always in the same order. I'm going to look into how constraining that is.
No description provided.