Commit 527a069
fix: remove max_tokens cap to support thinking models (Kimi-K2.5) (#869)
* fix: remove max_tokens cap to support thinking models (Kimi-K2.5)

  Thinking/reasoning models like Kimi-K2.5 use output tokens for internal chain-of-thought before generating the visible response. When max_tokens was set (500 or 2048), the thinking budget consumed all available tokens, leaving an empty response, which caused TreeSummarize to return '' and crashed the topic detection retry workflow.

  Set the max_tokens default to None so the model controls its own output budget, allowing thinking models to complete both reasoning and response.

  Also fix the process.py CLI tool to import the Celery worker app before dispatching tasks, ensuring the Redis broker config is used instead of Celery's default AMQP transport.

* fix: remove max_tokens=200 cap from final title processor

  Same thinking-model issue: 200 tokens is especially tight and would be entirely consumed by chain-of-thought reasoning, producing an empty title.

* Update server/reflector/tools/process.py

  Co-authored-by: pr-agent-monadical[bot] <198624643+pr-agent-monadical[bot]@users.noreply.github.com>

* fix: remove max_tokens=500 cap from topic detector processor

  Same thinking-model fix: this is the original callsite that was failing with Kimi-K2.5, producing empty TreeSummarize responses.

--------

Co-authored-by: pr-agent-monadical[bot] <198624643+pr-agent-monadical[bot]@users.noreply.github.com>
1 parent d4cc6be commit 527a069

File tree

5 files changed: +9 −4 lines

server/reflector/hatchet/workflows/topic_chunk_processing.py (1 addition, 1 deletion)

@@ -71,7 +71,7 @@ async def detect_chunk_topic(input: TopicChunkInput, ctx: Context) -> TopicChunk
     from reflector.settings import settings  # noqa: PLC0415
     from reflector.utils.text import clean_title  # noqa: PLC0415

-    llm = LLM(settings=settings, temperature=0.9, max_tokens=500)
+    llm = LLM(settings=settings, temperature=0.9)

     prompt = TOPIC_PROMPT.format(text=input.chunk_text)
     response = await llm.get_structured_response(

server/reflector/llm.py (3 additions, 1 deletion)

@@ -202,7 +202,9 @@ def _format_error(self, error: Exception, raw_output: str) -> str:


 class LLM:
-    def __init__(self, settings, temperature: float = 0.4, max_tokens: int = 2048):
+    def __init__(
+        self, settings, temperature: float = 0.4, max_tokens: int | None = None
+    ):
         self.settings_obj = settings
         self.model_name = settings.LLM_MODEL
         self.url = settings.LLM_URL

server/reflector/processors/transcript_final_title.py (1 addition, 1 deletion)

@@ -39,7 +39,7 @@ class TranscriptFinalTitleProcessor(Processor):
     def __init__(self, **kwargs):
         super().__init__(**kwargs)
         self.chunks: list[TitleSummary] = []
-        self.llm = LLM(settings=settings, temperature=0.5, max_tokens=200)
+        self.llm = LLM(settings=settings, temperature=0.5)

     async def _push(self, data: TitleSummary):
         self.chunks.append(data)

server/reflector/processors/transcript_topic_detector.py (1 addition, 1 deletion)

@@ -35,7 +35,7 @@ def __init__(
         super().__init__(**kwargs)
         self.transcript = None
         self.min_transcript_length = min_transcript_length
-        self.llm = LLM(settings=settings, temperature=0.9, max_tokens=500)
+        self.llm = LLM(settings=settings, temperature=0.9)

     async def _push(self, data: Transcript):
         if self.transcript is None:

server/reflector/tools/process.py (3 additions, 0 deletions)

@@ -24,6 +24,9 @@
     pipeline_process as live_pipeline_process,
 )
 from reflector.storage import Storage
+from reflector.worker.app import (
+    app as celery_app,  # noqa: F401 - ensure Celery uses Redis broker
+)


 def validate_s3_bucket_name(bucket: str) -> None:
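The process.py change relies on Python's import-for-side-effect pattern: importing reflector.worker.app constructs the configured Celery app, so later task dispatch uses its Redis broker rather than Celery's AMQP default. A minimal stand-in of that pattern, in plain Python with no Celery dependency (all names below are hypothetical):

```python
# Library default, analogous to Celery falling back to its AMQP transport
# when no configured app module has been imported yet.
_config: dict[str, str] = {"broker_url": "amqp://guest@localhost//"}


def _configure_worker_app() -> None:
    """Stand-in for the module body of reflector.worker.app: importing
    that module runs its configuration as a side effect."""
    _config["broker_url"] = "redis://localhost:6379/0"


def current_broker() -> str:
    """What a dispatch call would see at send time."""
    return _config["broker_url"]


# A CLI tool that dispatches tasks must trigger the configuring import first,
# mirroring `from reflector.worker.app import app as celery_app  # noqa: F401`.
_configure_worker_app()
```

The `# noqa: F401` in the real diff marks the import as intentionally "unused": its purpose is the side effect, not the bound name.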
