
Commit 2b068fa

Update README.md
1 parent 008df4d commit 2b068fa

File tree

1 file changed (+3, −3 lines)


README.md

Lines changed: 3 additions & 3 deletions
@@ -4,7 +4,7 @@ The single biggest challenge in scaling any GenAI solution (such as RAG, multi-t
 
 JudgeIt is an automated evaluation framework built to accurately and efficiently assess various Generative AI pipelines, including RAG, multi-turn query rewriting (conversation memory), text-to-SQL conversion, and more. This service allows users to conduct batch evaluations across these different Generative AI pipelines. Users can input datasets containing generated text along with corresponding golden text. JudgeIt then employs an LLM-as-a-judge to perform similarity evaluations between these inputs, mimicking human evaluation and providing an accurate assessment of the GenAI pipeline's performance.
 
-his results in saving 30 times the time spent on manual testing for each RAG pipeline version, allowing AI engineers to run 10 times more experiments and achieve the desired accuracy much faster.
+This results in saving 30 times the time spent on manual testing for each RAG pipeline version, allowing AI engineers to run 10 times more experiments and achieve the desired accuracy much faster.
 
 
 <!-- ![JudgeIt Flow](/images/flow-diagram.png) -->
@@ -102,8 +102,8 @@ Using JudgeIt framework is simple, just pick what is the task you want to evalua
 - [ ] Mixtral-Large as Judge Model
 - [ ] Text2Sql Task support
 - [ ] Liberal vs Conservative Judge Options for verbose vs crisp RAG comparison
-- [ ] Query-Rewrite support for More-than 2 turn
-- [ ] Specific support for differnt LLM generated text formats (Like Anthropic etc.)
+- [ ] Query-Rewrite support for More-than 2-turn
+- [ ] Specific support for multiple LLM generated text formats (Like Anthropic etc.)
 
 **Known-Limitation** :
 1. Verbose vs Crisp RAG Answers - We found that the framework sometimes becomes extremely conservative when there is a large gap between the size of the golden text and the generated text. For RAG comparisons, if the golden text is one page long and the generated text is just a few lines, it tends to treat them as dissimilar. We are currently working on a more 'liberal' version of this judge, which will be released soon.
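The README text in this diff describes an LLM-as-a-judge flow: pairs of generated and golden text are scored for similarity in batch. A minimal sketch of that idea is below, assuming a pluggable `judge` callable; the prompt wording and function names are illustrative, not JudgeIt's actual API.

```python
# Hypothetical sketch of an LLM-as-a-judge batch similarity evaluation.
# The judge callable stands in for a real LLM call (e.g. to a hosted model);
# it receives a prompt and returns a similarity score between 0 and 1.
from typing import Callable, List, Tuple

# Illustrative prompt template, not JudgeIt's real one.
JUDGE_PROMPT = (
    "You are an impartial judge. Rate whether the GENERATED answer conveys "
    "the same meaning as the GOLDEN answer. Reply with a number from 0 to 1.\n"
    "GOLDEN: {golden}\nGENERATED: {generated}\nScore:"
)

def batch_evaluate(
    pairs: List[Tuple[str, str]],
    judge: Callable[[str], float],
) -> dict:
    """Score each (golden, generated) pair with the judge and aggregate."""
    scores = [
        judge(JUDGE_PROMPT.format(golden=golden, generated=generated))
        for golden, generated in pairs
    ]
    mean = sum(scores) / len(scores) if scores else 0.0
    return {"scores": scores, "mean": mean}
```

In a real setup the `judge` would wrap an API call to the judge model (the roadmap above mentions Mixtral-Large); keeping it injectable makes the batch loop testable without network access.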
