Description
Self Checks
- I have searched for existing issues, including closed ones.
- I confirm that I am using English to submit this report (Language Policy).
- Non-English title submissions will be closed directly (Language Policy).
- Please do not modify this template :) and fill in all the required fields.
RAGFlow workspace code commit ID
0.24.0
RAGFlow image version
0.24.0
Other environment information
Actual behavior
When the output from the LLM contains LaTeX math, RAGFlow fails to render the math properly. In previous versions, it looked like #8954. In this new 0.24.0 version, the response is cut off in the middle of the math:
Expected behavior
From the log, copy the actual LLM response content.
In a Jupyter Notebook, this piece of code will render most of it correctly:
```python
from IPython.display import display, Markdown

text = r"""ACTUAL CONTENT GOES HERE"""
display(Markdown(text.encode().decode('unicode_escape')))
```

where `text` is the variable containing the response.
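One caveat worth noting (my own observation, not from the RAGFlow code): decoding with `unicode_escape` turns the literal `\n` escapes in the logged `agent_response` back into real newlines, but it also mangles any LaTeX command whose name begins with an escape letter, e.g. `\frac` becomes a form feed followed by `rac`. That is likely why only most of the content renders correctly. A minimal sketch:

```python
# The logged agent_response stores newlines as literal "\n" escapes;
# unicode_escape recovers real newlines from them.
raw = r"Line one\nLine two"          # as stored in the log
decoded = raw.encode().decode("unicode_escape")
assert decoded == "Line one\nLine two"  # real newline now

# But it also eats LaTeX commands starting with an escape letter:
latex = r"\frac{1}{2}"
mangled = latex.encode().decode("unicode_escape")
assert mangled == "\x0crac{1}{2}"    # "\f" became a form feed
```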
Steps to reproduce
1. Parse the paper given below
2. Import the agent given below
3. Ask "In this paper, the author used some really old parsers, while I am going to use modern T5 models. Hence, its formula for calculating Model-based Confidence will not work. What could I do to calculate Model-based Confidence for my work?"
Additional information
This is the log that contains the LLM response; the `agent_response` field contains the actual response content.
Details
ragflow-cpu-1 | 2026-02-11 18:11:06,423 INFO 25 handle_task done for task {"id": "1152ec5207b811f1ac583f92f7ca4b97", "create_time": 1770862257125, "create_date": "2026-02-11 18:10:57", "update_time": 1770862257125, "update_date": "2026-02-11 18:10:57", "doc_id": "e89bb65ae43d11f0a1b4f276361bc6b7", "from_page": 0, "to_page": 100000000, "task_type": "memory", "priority": 0, "begin_at": null, "process_duration": 0.0, "progress": 0.0, "progress_msg": "", "retry_count": 0, "digest": "60", "chunk_ids": "", "memory_id": "e89bb65ae43d11f0a1b4f276361bc6b7", "source_id": 60, "message_dict": {"user_id": "38e8abd861b111f0bd2f0a17e63f93cd", "agent_id": "81688c5083f011f0bf4992dd7cc36ab6", "session_id": "81688c5083f011f0bf4992dd7cc36ab6", "user_input": "Can you repeat the recommended approaches? Your answer is not complete", "agent_response": "# Recommended Approaches for T5-based Model Confidence (Complete Version)\n\nBased on the literature and adapting from Nishida and Matsumoto's framework, here are the complete recommended approaches for calculating model-based confidence with T5 models:\n\n---\n\n## 1. Sequence-Level Log-Probability (Primary Recommendation)\n\nFor T5 models, the most direct analog to model-based confidence is the **generation probability**:\n\n$$\nC_{seq}(y|x) = \\frac{1}{|y|} \\sum_{t=1}^{|y|} \\log P(y_t | y_{<t}, x)\n$$\n\nWhere:\n- *x* = input document/EDUs\n- *y* = predicted output (dependency structure)\n- *|y|* = sequence length (for length normalization)\n- *P(y_t | y_{<t}, x)* = softmax probability of token y_t\n\n**Implementation**: In HuggingFace, use `model.generate()` with `return_dict_in_generate=True` and `output_scores=True` to obtain `sequences_scores` (log-probability).\n\n---\n\n## 2. 
Normalized Perplexity-based Confidence\n\nConvert log-probability to a perplexity-based confidence score:\n\n$$\nPPL(y|x) = \\exp\\left(-\\frac{1}{|y|} \\sum_{t=1}^{|y|} \\log P(y_t | y_{<t}, x)\\right)\n$$\n\n$$\nC_{ppl}(y|x) = \\frac{1}{PPL(y|x)}\n$$\n\n**Lower perplexity \u2192 Higher confidence**. Invert and normalize to get a confidence score in [0,1]:\n\n$$\nC_{norm} = \\frac{C_{ppl} - C_{min}}{C_{max} - C_{min}}\n$$\n\n---\n\n## 3. Document-Level Confidence Aggregation\n\nFollowing Nishida and Matsumoto's approach of document-level confidence, aggregate EDU-pair predictions:\n\n$$\nC_{doc}(D) = \\frac{1}{|E|} \\sum_{(h,d) \\in E} C(h, d)\n$$\n\nWhere:\n- *E* = set of predicted EDU attachments in document *D*\n- *C(h,d)* = confidence for individual head-dependent pair\n\n**Important caveat**: Documents with higher confidence scores are biased to shorter documents [ID:244]. Consider normalizing by document length:\n\n$$\nC_{doc}^{norm}(D) = \\frac{C_{doc}(D)}{\\alpha \\cdot \\log(|EDU|) + \\beta}\n$$\n\nWhere \u03b1 and \u03b2 are tunable hyperparameters.\n\n---\n\n## 4. Entropy-based Uncertainty\n\nMeasure prediction uncertainty using output distribution entropy:\n\n$$\nH(y_t) = -\\sum_{v \\in V} P(v | y_{<t}, x) \\log P(v | y_{<t}, x)\n$$\n\n$$\nC_{entropy}(y|x) = 1 - \\frac{1}{|y|} \\sum_{t=1}^{|y|} \\frac{H(y_t)}{\\log|V|}\n$$\n\nWhere:\n- *V* = vocabulary\n- *H(y_t)* = entropy at decoding step *t*\n- Normalized by *log|V|* to scale between [0,1]\n\n**High entropy = Low confidence** (model is uncertain).\n\n---\n\n## 5. Token-Level Max Probability\n\nFor each generated token, take the maximum softmax probability:\n\n$$\nC_{max}(y|x) = \\frac{1}{|y|} \\sum_{t=1}^{|y|} \\max_{v \\in V} P(v | y_{<t}, x)\n$$\n\nThis measures how confident the model is at each decoding step.\n\n---\n\n## 6. 
Margin-based Confidence\n\nMeasure the gap between the top prediction and the second-best:\n\n$$\nC_{margin}(y|x) = \\frac{1}{|y|} \\sum_{t=1}^{|y|} \\left( P(y_t | y_{<t}, x) - P(y_t^{(2)} | y_{<t}, x) \\right)\n$$\n\nWhere *y_t^{(2)}* is the second most probable token at step *t*.\n\n**Larger margin = Higher confidence**.\n\n---\n\n## 7. Consistency-based Confidence (Multi-sample)\n\nUse multiple decoding strategies and measure agreement:\n\n**Method A: Multiple Sampling**\n- Generate *k* samples using different random seeds or temperature settings\n- Calculate agreement rate:\n\n$$\nC_{consist}(x) = \\frac{1}{\\binom{k}{2}} \\sum_{i < j} \\mathbb{1}[y_i = y_j]\n$$\n\n**Method B: Beam vs. Sampling Comparison**\n- Compare beam search output with nucleus/top-k sampling outputs\n- Higher agreement = Higher confidence\n\n**Method C: MC Dropout**\n- Apply dropout at inference time multiple times\n- Measure variance across predictions:\n\n$$\nC_{MC}(x) = 1 - \\text{Var}(\\{y_1, y_2, ..., y_k\\})\n$$\n\n---\n\n## Summary Table\n\n| **Approach** | **Formula Basis** | **Pros** | **Cons** |\n|--------------|-------------------|----------|----------|\n| Seq-level log-prob | $\\frac{1}{|y|}\\sum \\log P(y_t)$ | Direct, easy to compute | May be overconfident |\n| Perplexity-based | $1/PPL$ | Intuitive interpretation | Requires normalization |\n| Document-level aggregation | $\\frac{1}{|E|}\\sum C(h,d)$ | Matches Nishida's framework | Biased to short docs |\n| Entropy-based | $1 - \\bar{H}/\\log|V|$ | Captures uncertainty | Computationally heavier |\n| Consistency-based | Multi-sample agreement | Robust to noise | Requires multiple passes |\n| Token-level max prob | $\\frac{1}{|y|}\\sum \\max P$ | Simple | Ignores distribution shape |\n| Margin-based | Top-1 minus Top-2 prob | Captures decision boundary | May miss multimodal cases |\n\n---\n\n## Recommended Combination\n\nFor best results, consider combining multiple approaches:\n\n$$\nC_{final} = \\alpha \\cdot C_{seq} + 
\\beta \\cdot C_{entropy} + \\gamma \\cdot C_{consist}\n$$\n\nWith weights tuned on a validation set. Additionally:\n- Apply **length normalization** to avoid bias toward short documents [ID:244]\n- Use **label class balancing** when selecting pseudo-labeled samples [ID:329]\n- Consider **human-in-the-loop** for low-confidence examples [ID:329]"}}
Upon further observation, rendering stops wherever a '<' character appears inside the math.
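A possible workaround on the rendering side (my own sketch, not part of RAGFlow; the function name and regex are hypothetical) is to pad `<` inside `$$...$$` blocks before the markdown renderer sees the text, so that sequences like `y_{<t}` are not parsed as the start of an HTML tag:

```python
import re

def escape_lt_in_math(text: str) -> str:
    """Insert a space after '<' inside $$...$$ math blocks so a
    markdown/HTML parser does not treat '<t' as an HTML tag opener.
    (Hypothetical workaround sketch; handles only $$ delimiters.)"""
    def fix(match: re.Match) -> str:
        body = match.group(1)
        # '<' immediately followed by a letter looks like a tag start
        return "$$" + re.sub(r"<(?=[A-Za-z])", "< ", body) + "$$"
    return re.sub(r"\$\$(.*?)\$\$", fix, text, flags=re.DOTALL)
```

For example, `escape_lt_in_math("$$y_{<t}$$")` yields `"$$y_{< t}$$"`, which KaTeX renders identically while no longer containing an HTML-tag-like `<t`.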
The paper for reproducing the issue:
The agent I use: