Can we add integration test cases that actually call live APIs with fake content? I believe we could have keys for openai/anthropic/hugging_face and run cheap calls against test profiles, which would serve as a strong safety net for breakages that mocked tests can't catch.
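A rough sketch of how this could look with the stdlib `unittest`, so the live tests just skip (rather than fail) wherever a key isn't configured, e.g. on local machines or fork PRs. Assumptions: the env var names are the defaults the official SDKs read (`OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, `HF_TOKEN`), but our config may differ; `run_test_profile` is a hypothetical hook that would need to be wired to whatever client code we actually have.

```python
import functools
import os
import unittest

# Assumed env var names per provider; adjust to the project's real config.
PROVIDER_KEYS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "hugging_face": "HF_TOKEN",
}


def requires_key(provider):
    """Skip the decorated test at run time unless the provider's key is set."""
    var = PROVIDER_KEYS[provider]

    def decorator(test_func):
        @functools.wraps(test_func)
        def wrapper(self, *args, **kwargs):
            # Checked when the test runs, not at import time, so CI can
            # inject secrets after the module is loaded.
            if not os.environ.get(var):
                self.skipTest(f"{var} not set; skipping live {provider} test")
            return test_func(self, *args, **kwargs)

        return wrapper

    return decorator


def run_test_profile(provider, prompt):
    # Hypothetical hook: replace with a call to the real client using a cheap
    # test profile (smallest model, short fake prompt, low max tokens).
    raise NotImplementedError(f"wire up the real {provider} client here")


class LiveProviderSmokeTests(unittest.TestCase):
    # Fake, deterministic-ish content keeps each call cheap and easy to check.
    FAKE_PROMPT = "Reply with the single word OK."

    @requires_key("openai")
    def test_openai(self):
        self.assertTrue(run_test_profile("openai", self.FAKE_PROMPT).strip())

    @requires_key("anthropic")
    def test_anthropic(self):
        self.assertTrue(run_test_profile("anthropic", self.FAKE_PROMPT).strip())

    @requires_key("hugging_face")
    def test_hugging_face(self):
        self.assertTrue(run_test_profile("hugging_face", self.FAKE_PROMPT).strip())


if __name__ == "__main__":
    unittest.main()
```

Keeping the assertions loose (non-empty reply) rather than matching exact text should make the suite less flaky, since model output can vary between calls; we'd mainly be checking that auth, request shape, and response parsing still work end to end.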