Make llm-complete-guide work again
#164
Introduces a new function `run_llm_judged_tests` to perform end-to-end tests on RAG systems using LLM evaluation. The implementation includes:
- Parallel processing of test cases
- Scoring for toxicity, faithfulness, helpfulness, and relevance
- Retry logic for robust test execution
- Detailed logging of test results
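A minimal sketch of the "parallel processing plus retry" pattern described above, assuming the answer generator and the LLM judge can be passed in as plain callables. `run_judged_tests_sketch`, `with_retries`, and `JudgedResult` are hypothetical names for illustration, not the PR's actual code, and the retry here is a hand-rolled stdlib loop rather than Tenacity.

```python
import concurrent.futures
import time
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class JudgedResult:
    """Scores for one test case (hypothetical structure)."""
    question: str
    scores: Dict[str, float]


def with_retries(fn: Callable[[], Dict[str, float]], attempts: int = 3, delay: float = 0.0) -> Dict[str, float]:
    """Call fn, retrying on any exception up to `attempts` times."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except Exception:
            if attempt == attempts:
                raise  # give up and surface the last error
            time.sleep(delay * attempt)  # linear backoff between attempts
    raise ValueError("attempts must be >= 1")


def run_judged_tests_sketch(
    questions: List[str],
    answer_fn: Callable[[str], str],
    judge_fn: Callable[[str, str], Dict[str, float]],
    max_workers: int = 4,
) -> List[JudgedResult]:
    """Answer each question, then have a (hypothetical) LLM judge score it, in parallel."""

    def evaluate(question: str) -> JudgedResult:
        answer = answer_fn(question)
        # Only the judge call is retried; judge calls are the flaky network step here.
        scores = with_retries(lambda: judge_fn(question, answer))
        return JudgedResult(question, scores)

    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order even though cases run concurrently.
        return list(pool.map(evaluate, questions))
```

Threads are a reasonable fit here because each test case spends most of its time waiting on LLM API calls rather than using the CPU.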
Enhance the evaluation visualization step by logging detailed metrics to ZenML, including:
- Retrieval performance metrics
- Generation failure rates
- Quality scores (toxicity, faithfulness, helpfulness, relevance)
- Composite scores for overall quality and retrieval effectiveness
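One way composite scores like these could be derived, sketched under explicit assumptions: quality scores are averages on a 1 to 5 scale, toxicity is "lower is better", and retrieval results arrive as failure-rate percentages. The equal weighting and the `composite_scores` name are hypothetical; the PR's actual formula may differ.

```python
from statistics import mean
from typing import Dict


def composite_scores(
    quality: Dict[str, float],
    retrieval_failure_rates: Dict[str, float],
    scale_max: float = 5.0,
) -> Dict[str, float]:
    """Combine per-metric averages into two headline numbers (hypothetical weighting)."""
    # Toxicity is "lower is better", so flip it onto the same 1..scale_max direction.
    inverted_toxicity = scale_max + 1 - quality["toxicity"]
    overall_quality = mean(
        [quality["faithfulness"], quality["helpfulness"], quality["relevance"], inverted_toxicity]
    ) / scale_max
    # Assumed: failure rates are percentages; convert each to a success fraction in [0, 1].
    retrieval_effectiveness = mean(1 - rate / 100 for rate in retrieval_failure_rates.values())
    return {
        "composite_overall_quality": round(overall_quality, 4),
        "composite_retrieval_effectiveness": round(retrieval_effectiveness, 4),
    }
```

The resulting dictionary is the kind of flat key/value payload that could then be attached to the step as ZenML metadata.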
Refactor import statements in eval_retrieval.py and eval_visualisation.py to:
- Remove unused imports
- Organize imports consistently
- Simplify import statements
Simplify the dev/rag.yaml configuration by removing the commented "environment configuration" line, keeping the configuration clean and concise.
Modify the default temperature parameter in get_completion_from_messages() from 0.4 to 0, ensuring more deterministic and focused model responses.
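To show why a temperature of 0 gives more deterministic output, here is a generic sketch of temperature-scaled token sampling. It illustrates the general mechanism only; it is not how any particular provider implements decoding, and hosted APIs may still show small run-to-run variations at temperature 0. `sample_token` is a hypothetical helper.

```python
import math
import random
from typing import List, Optional


def sample_token(logits: List[float], temperature: float, rng: Optional[random.Random] = None) -> int:
    """Pick a token index from raw logits using temperature scaling."""
    if temperature == 0:
        # Temperature 0 is conventionally treated as greedy decoding: always take the argmax.
        return max(range(len(logits)), key=lambda i: logits[i])
    rng = rng or random.Random()
    # Dividing by a small temperature sharpens the distribution; a large one flattens it.
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max before exponentiating for numerical stability
    weights = [math.exp(x - m) for x in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]
```

At the old default of 0.4 the highest-scoring token is strongly favoured but not guaranteed; at 0 the same logits always produce the same choice.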
Modify the Hugging Face Space deployment so that ZenML store secrets are converted to strings before they are added to the Space, preventing type-related errors during deployment.
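A minimal sketch of the string-coercion step, assuming the secrets arrive as a dictionary whose values may not all be strings. `add_space_secrets` is a hypothetical helper, and skipping `None` values is an assumption of this sketch, not necessarily what the PR does. The uploader is injected as a callable so the logic can be shown without network access.

```python
from typing import Any, Callable, Dict


def add_space_secrets(
    secrets: Dict[str, Any],
    add_secret: Callable[[str, str], None],
) -> Dict[str, str]:
    """Coerce every secret value to str before handing it to `add_secret`."""
    added: Dict[str, str] = {}
    for key, value in secrets.items():
        if value is None:
            # Assumption: skip unset values rather than uploading the literal string "None".
            continue
        added[key] = str(value)
        add_secret(key, added[key])
    return added
```

In a real deployment, `add_secret` would likely wrap `huggingface_hub.HfApi().add_space_secret(repo_id=..., key=k, value=v)`, which expects a string value.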
htahir1
left a comment
Postgres is a bit hard to run locally. Is there a way we can change this to a leaner DB? Even something like SQLite?
I'll see what we can do. The problem is that SQLite doesn't really support vector search and similar features the way Postgres does. But maybe I can refactor that out in a separate PR?
I guess I could try https://alexgarcia.xyz/sqlite-vec/, but I'd like to do it as a completely self-contained PR, not in this one, please. WDYT @htahir1?
Update project dependencies to include:
- Elasticsearch for potential search and indexing functionality
- Tenacity for improved retry handling in various components
Yes @strickvl, let's just merge this one so it's fixed on main.
- Add explicit constants for the ZenML chatbot model name and version
- Enhance the find_vectorstore_name() function with error handling and a fallback mechanism
- Improve logging for vector store metadata retrieval
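The error-handling-plus-fallback idea can be sketched as below. Everything here is hypothetical: the metadata key `"vector_store"`, the default `"pgvector"`, and the injected `fetch_metadata` callable stand in for however the real find_vectorstore_name() reads the model version's metadata.

```python
import logging
from typing import Callable, Mapping

logger = logging.getLogger(__name__)

DEFAULT_VECTORSTORE = "pgvector"  # hypothetical fallback value


def find_vectorstore_name_sketch(
    fetch_metadata: Callable[[], Mapping[str, object]],
    default: str = DEFAULT_VECTORSTORE,
) -> str:
    """Return the vector store name from metadata, falling back to `default`."""
    try:
        metadata = fetch_metadata()
    except Exception as exc:
        # Any lookup failure (missing model, network error) degrades to the default.
        logger.warning("Could not load model metadata (%s); falling back to %r", exc, default)
        return default
    name = metadata.get("vector_store")  # hypothetical metadata key
    if not name:
        logger.warning("No vector store recorded in metadata; falling back to %r", default)
        return default
    logger.info("Using vector store %r from model metadata", name)
    return str(name)
```

Logging at each branch makes it visible which store was actually selected, which is the part the commit message says was improved.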
Quite a few small fixes.
Also made PostgreSQL the default DB again, and parallelised the evals.