/**
 * Implementation of {@link Evaluator} used to evaluate the factual accuracy of Large
 * Language Model (LLM) responses against provided context.
 * <p>
 * This evaluator addresses a specific type of potential error in LLM outputs known as
 * "hallucination" in the context of grounded factuality. It verifies whether a given
 * statement (the "claim") is logically supported by a provided context (the "document").
 * <p>
 * Key concepts:
 * <ul>
 * <li>Document: the context or grounding information against which the claim is checked.</li>
 * <li>Claim: the statement to be verified against the document.</li>
 * </ul>
 * <p>
 * The evaluator uses a prompt-based approach with a separate, typically smaller and more
 * efficient LLM to perform the fact-checking. This design choice allows for
 * cost-effective and rapid verification, which is crucial when evaluating longer LLM
 * outputs that may require multiple verification steps.
 * <p>
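The prompt-based check described above can be sketched in plain Java. The prompt wording and the document/claim layout below are illustrative assumptions mirroring the general pattern used by grounded factuality checkers, not the exact template used by this evaluator:

```java
// Illustrative sketch only: renders a document and a claim into a single
// fact-checking prompt for a small verification model. The exact wording
// an evaluator uses is an implementation detail; this shows the shape.
public class FactCheckPromptSketch {

    /** Builds the checking prompt from grounding text and a claim. */
    static String buildPrompt(String document, String claim) {
        return "Document: " + document + "\n"
                + "Claim: " + claim + "\n"
                + "Is the claim supported by the document? Answer yes or no.";
    }

    public static void main(String[] args) {
        // Example: a claim that is directly supported by the document.
        System.out.println(buildPrompt(
                "The Eiffel Tower is located in Paris, France.",
                "The Eiffel Tower is in Paris."));
    }
}
```

In a real pipeline this prompt would be sent to the separate checking model described above, once per statement being verified.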
 * Implementation note: For efficient and accurate fact-checking, consider using
 * specialized models like Bespoke-Minicheck, a grounded factuality checking model
 * developed by Bespoke Labs and available in Ollama. Such models are specifically
|
 * Hallucinations with Bespoke-Minicheck</a> and the research paper:
 * <a href="https://arxiv.org/pdf/2404.10774v1">MiniCheck: An Efficient Method for LLM
 * Hallucination Detection</a>
 * <p>
 * Note: This evaluator is specifically designed to fact-check statements against given
 * information. It is not meant for other types of accuracy tests, such as quizzing an AI
 * on obscure facts without giving it any reference material to work with (so-called
 * "closed book" scenarios).
 * <p>
 * The evaluation process aims to determine if the claim is supported by the document,
 * returning a boolean result indicating whether the fact-check passed or failed.
 *
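The boolean outcome described above reduces to interpreting the checking model's short reply as pass or fail. A minimal sketch, assuming the model answers with a leading "yes" or "no" (the reply format is an assumption for illustration):

```java
// Illustrative sketch only: maps a checking model's raw reply to the
// boolean pass/fail result the evaluator reports. Assumes a yes/no
// answer protocol, which is not necessarily what this class uses.
public class VerdictSketch {

    /** Returns true when the reply indicates the claim is supported. */
    static boolean isSupported(String modelReply) {
        return modelReply.trim().toLowerCase().startsWith("yes");
    }

    public static void main(String[] args) {
        System.out.println(isSupported("Yes"));                      // supported claim
        System.out.println(isSupported("No, it is not supported.")); // unsupported claim
    }
}
```

A production evaluator would typically also handle replies that match neither form, for example by treating them as a failed check.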