
gpt-5 model fails to complete my tasks while gpt-4.1 completes #1409

@behnamkhorsandian

Description

Switching the model from gpt-4.1-nano to gpt-5-nano causes unusually high local CPU and RAM usage during agent runs (see the attached image: an 8-core container jumps to ~800% CPU with gpt-5-nano, while gpt-4.1-nano stays comparatively low).

With gpt-4.1-nano, CPU usage is modest and stable. With gpt-5-nano, CPU spikes to around 800% (sometimes 1200–1400%; Docker Desktop reports 8 CPUs available), along with a noticeable memory surge. This happens in an agent-based, tool-using workflow. The only change is the model name; there are no other code, config, or process changes.

Example code to run:

import os
from dotenv import load_dotenv
from agents import Agent, function_tool, Runner, trace, AgentOutputSchema, TResponseInputItem, ModelSettings
from datetime import datetime
from typing import Any, Optional, List
from pydantic import BaseModel, Field
from scrapling.fetchers import StealthyFetcher
import requests

fetcher = StealthyFetcher()

load_dotenv()

LLM_MODEL = 'gpt-5-nano'

@function_tool
def internet_search(query: str, url: Optional[str]=None, location: Optional[str]=None, language: Optional[str]=None):
    """
    Perform a Google search with optional site, location, and language parameters.
    Args:
        query (str): Search term.
        url (str, optional): Restrict results to a specific site (e.g., 'bbc.com').
        location (str, optional): Country code (e.g., 'US', 'GB').
        language (str, optional): Language code (e.g., 'en', 'de').
    Returns:
        List[dict]: Search results.
    """
    if url:
        query = f"site:{url} {query}"
    params = {
        'q': query,
        'cx': os.getenv('GOOGLE_SEARCH_ENGINE_ID'),
        'key': os.getenv('GOOGLE_SEARCH_API_KEY'),
        'num': 10,
    }
    if language:
        params['lr'] = f'lang_{language}'
    if location:
        params['gl'] = location

    endpoint = 'https://www.googleapis.com/customsearch/v1'
    resp = requests.get(endpoint, params=params)
    resp.raise_for_status()
    data = resp.json()
    results = []
    for item in data.get('items', []):
        results.append({
            'title': item.get('title'),
            'link': item.get('link'),
            'snippet': item.get('snippet'),
        })
    return results


@function_tool
async def scrape_url(url: str) -> str:
    """Scrape the content of a URL and return it as Markdown."""
    page = await fetcher.async_fetch(url, headless=True, network_idle=True)
    text = page.get_all_text()
    return text.strip()


# ---------------- AGENTS ----------------
# WEB RESEARCH AGENT
class WebResearch(BaseModel):
    summary: str = Field(..., description="A brief summary of the research findings.")
    urls: List[str] = Field(..., description="List of urls used in the research.")

web_research_agent = Agent(
    name="Web Research Agent",
    model=LLM_MODEL,
    model_settings=ModelSettings(max_tokens=32000),
    instructions=(
        f"Today is {datetime.now().strftime('%Y-%m-%d %H:%M:%S')} and you are the Web Research Agent.\n"
        "You are tasked to search the web for the topic provided by the user, summarize the findings, and return a TLDR with sources.\n"
        "First, search for the topic using the internet search tool.\n"
        "Then, scrape any potential URLs for additional content.\n"
        "Repeat this process until you have a comprehensive understanding of the topic.\n"
    ),
    tools=[internet_search, scrape_url],
    output_type=AgentOutputSchema(WebResearch, strict_json_schema=False)
)

# MAIN AGENT (Orchestrator)
class Findings(BaseModel):
    web_findings: WebResearch = Field(..., description="Web research findings")

main_agent = Agent(
    name="Main Agent",
    model=LLM_MODEL,
    model_settings=ModelSettings(max_tokens=32000),
    instructions=(
        f"Today is {datetime.now().strftime('%Y-%m-%d %H:%M:%S')} and you are the Main Agent.\n"
        "Your task is to generate a digest for the topic provided by the user.\n"
        "You will coordinate with the Web Research agent to gather comprehensive information.\n"
        "You will instruct the Web Research Agent to search the web for the topic. The goal is to find Names, Websites and other resources to dig deeper into the topic.\n"
        "After gathering all findings, you will compile them into a structured format and return the results.\n"
        "Make sure to use the tools provided to gather current information and do not rely solely on your own knowledge.\n"
        "If any findings are missing details, ambiguous, or not actionable, instruct the respective agents to refine or expand their search and use tools again.\n"
    ),
    tools=[
        web_research_agent.as_tool(
            tool_name="web_research_agent",
            tool_description="Search the web for up-to-date information"
        )
    ],
    output_type=AgentOutputSchema(Findings, strict_json_schema=False)
)

# SUMMARIZER AGENT
class Digest(BaseModel):
    summary: str = Field(..., description="A concise summary of the findings")
    urls: List[str] = Field(..., description="List of URLs used in the research")

summarizer_agent = Agent(
    name="Summarizer Agent",
    model=LLM_MODEL,
    model_settings=ModelSettings(max_tokens=32000),
    instructions=(
        f"Today is {datetime.now().strftime('%Y-%m-%d %H:%M:%S')} and you are the Summarizer Agent.\n"
        "Your task is to create a digest from the findings provided by the Main Agent.\n"
        "You will summarize the findings into a concise digest and return it with the sources used.\n"
    ),
    tools=[],
    output_type=AgentOutputSchema(Digest, strict_json_schema=False),
)

# ---------------- RUN ----------------
async def generate_digest(topic: str) -> Any:
    input_items: List[TResponseInputItem] = [{"content": topic, "role": "user"}]

    with trace(workflow_name="COOKBOOK", group_id="Digest"):
        research_results = await Runner.run(main_agent, input_items)
        thread = research_results.to_input_list() + [{"role": "user", "content": f"Please summarize these findings for this topic:\n\n{topic}"}]
        digest = await Runner.run(summarizer_agent, thread)

        digest_output = digest.final_output
        return digest_output.model_dump()

await generate_digest("Which company has the best performing LLM right now?")

You can run this either in Docker or in a notebook and monitor the CPU usage.
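As an aside, the trailing top-level await only works in a notebook. To run the same file as a plain script inside the container, a minimal wrapper like this (plain asyncio, nothing SDK-specific) should do:

import asyncio

if __name__ == "__main__":
    # asyncio.run provides the event loop that a notebook supplies implicitly
    # for the top-level await above.
    output = asyncio.run(generate_digest("Which company has the best performing LLM right now?"))
    print(output)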
My assumption is that the model makes so many tool calls that it overwhelms the CPU, and that many of them then fail due to timeouts (judging by the traces).

P.S. I'm not sure the "bug" label I used is the right one (maybe there are ways to work around this behavior), but since it causes failures I thought it needed more attention.
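One workaround that might be worth trying (a sketch only, untested; it assumes ModelSettings.parallel_tool_calls and the max_turns argument of Runner.run behave as documented for this SDK) is forcing sequential tool calls and capping the agent loop, to see whether the CPU spike tracks the number of concurrent tool calls:

# Reuses the agents defined in the script above; run this before calling generate_digest.
sequential_settings = ModelSettings(max_tokens=32000, parallel_tool_calls=False)

# Ask the model for at most one tool call per turn instead of a parallel batch.
web_research_agent.model_settings = sequential_settings
main_agent.model_settings = sequential_settings

# Inside generate_digest, capping the loop would look like:
#     research_results = await Runner.run(main_agent, input_items, max_turns=6)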

(Attached screenshot: container CPU usage with gpt-5-nano vs. gpt-4.1-nano.)
