To better debug the agent’s thinking, we should log intermediate reasoning steps before emitting the final output.
Could be as simple as logging:
- Raw LLM chain response
- Which tool/agent was selected and why
- Confidence or token count
Optional: attach these as metadata on the final output, or write them to a separate log file.
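A minimal sketch of what this could look like, using the stdlib `logging` and `dataclasses` modules. All names here (`ReasoningStep`, `Trace`, `weather_api`, etc.) are hypothetical placeholders, not our actual agent API:

```python
import json
import logging
from dataclasses import dataclass, asdict, field

logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger("agent.trace")

@dataclass
class ReasoningStep:
    # One intermediate step: raw chain output, tool choice, and why.
    raw_response: str
    selected_tool: str
    selection_reason: str
    token_count: int

@dataclass
class Trace:
    steps: list = field(default_factory=list)

    def record(self, step: ReasoningStep) -> None:
        # Log at DEBUG so traces can be silenced in production.
        logger.debug("step: %s", asdict(step))
        self.steps.append(step)

    def as_metadata(self) -> dict:
        # Plain JSON-serializable dict, ready to attach to the final output.
        return {"reasoning_steps": [asdict(s) for s in self.steps]}

# Hypothetical usage: record one step, then attach the trace as metadata.
trace = Trace()
trace.record(ReasoningStep(
    raw_response="The user asks about weather; a lookup tool fits best.",
    selected_tool="weather_api",
    selection_reason="query mentions current conditions",
    token_count=17,
))
final_output = {"answer": "It is sunny.", **trace.as_metadata()}
print(json.dumps(final_output, indent=2))
```

Writing to a separate log file instead would just mean pointing a `logging.FileHandler` at the `agent.trace` logger rather than calling `as_metadata()`.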