Commit 10f72fc

Fix flaky testPhi4 and testVoxtral by setting temperature=0 (#16517)
Summary: Both tests were flaky because LLM output is non-deterministic at the default temperature of 0.8, which uses RNG-based sampling with a time-based seed. Setting temperature=0 switches to greedy argmax decoding, which eliminates the randomness and makes assertions on the generated text reliable. This matches how other LLM tests and production runners in the codebase handle determinism (e.g., test_text_decoder_runner.cpp, test_sampler.cpp, and the QNN/QAI Hub runners).

This fixes 5 flaky tests. {F1984490494}

Reviewed By: shoumikhin

Differential Revision: D90361187
1 parent f0edae2 commit 10f72fc
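
The mechanism the summary describes can be shown with a small, self-contained sketch. This is not the ExecuTorch sampler; sampleNextToken and the example logits below are hypothetical, and the sketch only illustrates why temperature = 0 is deterministic while the default 0.8 depends on RNG state:

import Foundation

// Hypothetical sketch (not the ExecuTorch sampler): pick the next token
// from raw logits. With temperature == 0, take a plain argmax, which is
// deterministic. Otherwise scale the logits by 1/temperature, softmax
// them, and draw from the result, which depends on the RNG state.
func sampleNextToken(logits: [Double], temperature: Double) -> Int {
  if temperature == 0 {
    // Greedy decoding: always return the highest-scoring token.
    return logits.indices.max { logits[$0] < logits[$1] }!
  }
  // Temperature sampling: softmax over scaled logits, then draw.
  let scaled = logits.map { $0 / temperature }
  let maxLogit = scaled.max()!
  let weights = scaled.map { exp($0 - maxLogit) }  // subtract max for numerical stability
  var draw = Double.random(in: 0..<weights.reduce(0, +))
  for (index, weight) in weights.enumerated() {
    draw -= weight
    if draw < 0 { return index }
  }
  return weights.count - 1  // guard against floating-point rounding
}

let logits = [1.2, 3.4, 0.5]
print(sampleNextToken(logits: logits, temperature: 0))    // always 1
print(sampleNextToken(logits: logits, temperature: 0.8))  // can vary per run

Because greedy decoding always returns the same token for the same logits, the full generated string is reproducible run to run, which is what lets the tests below assert on it.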

2 files changed: +3 additions, -0 deletions

extension/llm/apple/ExecuTorchLLM/__tests__/MultimodalRunnerTest.swift (1 addition, 0 deletions)

@@ -238,6 +238,7 @@ class MultimodalRunnerTest: XCTestCase {
         MultimodalInput(String(format: chatTemplate, userPrompt)),
       ], Config {
         $0.maximumNewTokens = 256
+        $0.temperature = 0
       }) { token in
         text += token
       }

extension/llm/apple/ExecuTorchLLM/__tests__/TextRunnerTest.swift (2 additions, 0 deletions)

@@ -87,6 +87,7 @@ class TextRunnerTest: XCTestCase {
     do {
       try runner.generate(userPrompt, Config {
         $0.sequenceLength = sequenceLength
+        $0.temperature = 0
       }) { token in
         text += token
       }
@@ -100,6 +101,7 @@ class TextRunnerTest: XCTestCase {
     do {
       try runner.generate(userPrompt, Config {
         $0.sequenceLength = sequenceLength
+        $0.temperature = 0
       }) { token in
         text += token
       }
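
For context, the call shape these hunks modify, reduced to a standalone sketch; runner, userPrompt, sequenceLength, and the closing assertion are hypothetical stand-ins for the surrounding test fixture, with only the Config builder usage taken from the diff:

// Condensed from the TextRunnerTest hunks above; everything outside the
// Config closure is a stand-in for the real test code.
var text = ""
try runner.generate(userPrompt, Config {
  $0.sequenceLength = sequenceLength
  $0.temperature = 0  // greedy decoding keeps `text` stable across runs
}) { token in
  text += token
}
XCTAssertTrue(text.contains(expectedAnswer))  // hypothetical assertion

With temperature pinned to 0, the substring assertion no longer depends on the sampler's RNG, which is the whole fix.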
