-
Notifications
You must be signed in to change notification settings - Fork 110
Description
This issue is for a: (mark with an x)
- [x] bug report -> please search issues before submitting
- [ ] feature request
- [ ] documentation issue or request
- [ ] regression (a behavior that used to work and stopped in a new release)
Minimal steps to reproduce
- Clone the
ai-rag-chat-evaluatorrepository. - Follow the setup instructions to install dependencies and configure the environment.
- Run the evaluation script with the provided command:
python -m scripts.evaluate --config=config.json
Any log messages given by the failure
@Myname ➜ /workspaces/ai-rag-chat-evaluator (main) $ python -m scripts evaluate --config=config.json
[17:57:29] INFO Running evaluation from config /workspaces/ai-rag-chat-evaluator/config.json
INFO Replaced results_dir in config with timestamp
INFO Using Azure OpenAI Service with Azure Developer CLI Credential
INFO Running evaluation using data from /workspaces/ai-rag-chat-evaluator/example_input/qa.jsonl
INFO Sending a test question to the target to ensure it is running...
ERROR Failed to send a test question to the target due to error:
Response from target https://MYBACKEND.azurewebsites.net/chat is not valid JSON:
Make sure that your configuration points at a chat endpoint that returns a single JSON object.
ERROR Evaluation was terminated early due to an error ⬆
Expected/desired behavior
The evaluation script should successfully communicate with the chat endpoint, and the evaluation should proceed without errors.
OS and Version?
Ubuntu 20.04 LTS
Versions
Python: 3.10
Scripts version: Latest from the main branch as of 17.07.2024
Mention any other details that might be useful
The target_url in my config.json points to https://MYBACKEND.azurewebsites.net/chat.
The chat endpoint should return a single JSON object but seems not to be in the expected format.
The application is deployed as per the instructions in the repository documentation.
The app at https://MYBACKEND.azurewebsites.net has enabled authentication which might be affecting the evaluation script.