Commit 008df4d

Update README.md
1 parent 9cf93d3 commit 008df4d

1 file changed: +4 -1 lines changed

README.md

@@ -103,8 +103,11 @@ Using JudgeIt framework is simple, just pick what is the task you want to evalua
 - [ ] Text2Sql Task support
 - [ ] Liberal vs Conservative Judge Options for verbose vs crisp RAG comparison
 - [ ] Query-Rewrite support for more than 2 turns
+- [ ] Specific support for different LLM-generated text formats (like Anthropic etc.)

-**Known-Limitation** : We found that the framework sometimes becomes extremely conservative when there is a large gap between the size of the golden text and the generated text. For RAG comparisons, if the golden text is one page long and the generated text is just a few lines, it tends to treat them as dissimilar. We are currently working on a more 'liberal' version of this judge, which will be released soon.
+**Known-Limitation** :
+1. Verbose vs Crisp RAG Answers - We found that the framework sometimes becomes extremely conservative when there is a large gap between the size of the golden text and the generated text. For RAG comparisons, if the golden text is one page long and the generated text is just a few lines, it tends to treat them as dissimilar. We are currently working on a more 'liberal' version of this judge, which will be released soon.
+2. Comparing outputs for two LLMs - We found that some LLMs preface their answers with some text, which can throw the judge off. When comparing two LLMs, try to avoid standard text like "I am just an average Language Model...etc". We are working on the next version, which will do this automatically.

 ## SuperKnowa
 JudgeIt is the latest framework from the SuperKnowa project. Do check out our other repos for building RAG pipelines, text2sql, etc. here.
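
Until the automatic handling mentioned in known limitation 2 is released, one possible workaround is to strip boilerplate prefaces from each LLM output before passing it to the judge. The following is only a minimal sketch: the `strip_preface` helper and the `PREFACE_PATTERNS` list are hypothetical names chosen for illustration and are not part of JudgeIt, and the patterns would need tuning for the models being compared.

```python
import re

# Hypothetical preface patterns (assumptions for illustration, not part of JudgeIt).
# Each pattern matches one boilerplate opening sentence or line at the start of the text.
PREFACE_PATTERNS = [
    r"^\s*as an ai language model[^.\n]*[.\n]\s*",
    r"^\s*i am just an? [^.\n]*language model[^.\n]*[.\n]\s*",
    r"^\s*(sure|certainly|of course)[,!.][^\n]*\n\s*",
]


def strip_preface(text: str) -> str:
    """Remove a single leading boilerplate sentence if it matches a known pattern."""
    for pattern in PREFACE_PATTERNS:
        stripped = re.sub(pattern, "", text, count=1, flags=re.IGNORECASE)
        if stripped != text:
            return stripped.strip()
    # No pattern matched: return the text with surrounding whitespace trimmed.
    return text.strip()


if __name__ == "__main__":
    raw = "I am just an average Language Model. Paris is the capital of France."
    print(strip_preface(raw))
```

Applying the same cleanup to both outputs before the comparison keeps the preface text from influencing the judge's similarity decision.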
