[Bug] ProgramOfThought and CodeAct fail using Evaluate with num_threads > 1 #9082

@adaminsky

What happened?

Evaluating the ProgramOfThought module with dspy.Evaluate and num_threads > 1 fails with "I/O operation on closed file" errors instead of running to completion. This concurrency issue currently makes both ProgramOfThought and CodeAct unusable for me.

Steps to reproduce

The following simple evaluation of ProgramOfThought on AIME-2025 triggers the error. Replacing ProgramOfThought with CodeAct produces the same error.

import dspy
from datasets import load_dataset

def metric(example, prediction, trace=None, pred_name=None, pred_trace=None):
    correct_answer = int(example['answer'])
    try:
        llm_answer = int(prediction.answer.replace('[','').replace(']','').strip())
    except ValueError:
        return 0
    return int(correct_answer == llm_answer)

def load_aime25():
    dataset = load_dataset("yentinglin/aime_2025", "default", split="train")
    return [
        dspy.Example(question=sample["problem"], answer=str(sample["answer"]))
        .with_inputs("question")
        for sample in dataset
    ]

devset = load_aime25()
lm = dspy.LM(
    "openai/gpt-4.1-mini",
    api_key=API_KEY,  # API_KEY must be defined beforehand, e.g. loaded from an environment variable
    temperature=1.0,
    max_tokens=32000,
)
dspy.configure(lm=lm)

evaluator = dspy.Evaluate(
    devset=devset,
    metric=metric,
    num_threads=10,
    display_progress=True,
    display_table=True,
    max_errors=50,
    provide_traceback=True,
)
signature = "question -> answer"
agent = dspy.ProgramOfThought(signature, max_iters=10)
evaluation = evaluator(agent)

The cause appears to be that both ProgramOfThought and CodeAct create a single interpreter in their constructor and shut it down at the end of the forward function. With num_threads > 1, I believe Evaluate calls the same module instance from several threads, so they all share that one interpreter. When one thread finishes and shuts the interpreter down, the threads still using it fail.

One possible fix: keep creating a single interpreter object in the constructor, but have the forward function of CodeAct and ProgramOfThought copy it before use, so that each call runs on, and shuts down, its own copy.
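
To make the suspected failure mode and the copy-before-use idea concrete, here is a minimal sketch that does not use DSPy at all. ToyInterpreter, SharedModule, PerCallModule, and run are hypothetical names invented for this illustration, and the toy interpreter never restarts after shutdown, which is a simplification of the real timing-dependent race. Whether DSPy's actual interpreter can be safely copied this way is an assumption I have not verified.

```python
import copy
from concurrent.futures import ThreadPoolExecutor


class ToyInterpreter:
    """Hypothetical stand-in for a sandboxed interpreter (not DSPy's class)."""

    def __init__(self):
        self.closed = False

    def execute(self, code):
        if self.closed:
            raise ValueError("I/O operation on closed file")
        # Evaluate a plain arithmetic expression with builtins disabled.
        return eval(code, {"__builtins__": {}})

    def shutdown(self):
        self.closed = True


class SharedModule:
    """Mimics the reported pattern: one interpreter, shut down in forward."""

    def __init__(self):
        self.interpreter = ToyInterpreter()

    def forward(self, code):
        try:
            return self.interpreter.execute(code)
        finally:
            # Closes the interpreter that every other thread is also using.
            self.interpreter.shutdown()


class PerCallModule:
    """Proposed pattern: keep a template, copy it for each forward call."""

    def __init__(self):
        self.interpreter = ToyInterpreter()  # template, never executed directly

    def forward(self, code):
        interpreter = copy.deepcopy(self.interpreter)
        try:
            return interpreter.execute(code)
        finally:
            # Only this call's private copy is shut down.
            interpreter.shutdown()


def run(module, jobs, num_threads=4):
    """Call module.forward on every job from a thread pool, capturing errors."""

    def safe(code):
        try:
            return module.forward(code)
        except ValueError as exc:
            return f"error: {exc}"

    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        return list(pool.map(safe, jobs))


jobs = ["1 + 1", "2 * 3", "10 - 4", "7 * 6"] * 2

shared_results = run(SharedModule(), jobs)  # contains "error: ..." entries
fixed_results = run(PerCallModule(), jobs)  # [2, 6, 6, 42, 2, 6, 6, 42]
print(shared_results)
print(fixed_results)
```

In the shared variant, every job that starts after the first shutdown reports the closed-file error, while the per-call variant completes all jobs. Copying on each call trades a small per-call cost for isolation between threads.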

DSPy version

3.0.4 and HEAD of main

Labels: bug
