Commit e82ceb3

Merge pull request #169 from MigoXLab/main
Sync main to dev
2 parents 1b101c6 + 054b683 commit e82ceb3

4 files changed: 8 additions and 2 deletions

dingo/model/llm/llm_text_3h.py

Lines changed: 1 addition & 1 deletion
@@ -40,7 +40,7 @@ def process_response(cls, response: str) -> ModelRes:
         result = ModelRes()

         # error_status
-        if response_model.score == "1":
+        if response_model.score == 1:
             result.reason = [response_model.reason]
             result.name = cls.prompt.__name__[8:].upper()
         else:
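
A note on why this one-character-class change matters: Python never considers an `int` equal to a `str`, so if the parsed score is an integer, the old comparison against `"1"` is always false and every response falls through to the `else` branch. The sketch below illustrates this under stated assumptions. `ParsedResponse` is a hypothetical stand-in for the parsed response object, which this diff does not show, and the field being an `int` is inferred from the fix rather than confirmed by the source.

```python
from dataclasses import dataclass


@dataclass
class ParsedResponse:
    """Hypothetical stand-in for the parsed LLM response (not the real class)."""
    score: int  # assumption: the score is parsed as an integer
    reason: str = ""


def matches_old_check(parsed: ParsedResponse) -> bool:
    # Pre-fix comparison: int vs. str, which is never equal in Python.
    return parsed.score == "1"


def matches_new_check(parsed: ParsedResponse) -> bool:
    # Post-fix comparison: int vs. int.
    return parsed.score == 1


parsed = ParsedResponse(score=1, reason="response looks fine")
old_result = matches_old_check(parsed)
new_result = matches_new_check(parsed)
print(f"old check: {old_result}, new check: {new_result}")
```

If the upstream parser could ever yield the score as a string again, normalizing with `int(...)` before comparing would be one way to make the check robust to both forms.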

dingo/run/vsl.py

Lines changed: 6 additions & 1 deletion
@@ -169,6 +169,11 @@ def parse_args():
             "app"],
         default="visualization",
         help="Choose the mode: visualization or app")
+    parser.add_argument(
+        "--port",
+        type=int,
+        default=8000,
+        help="Port for local HTTP server in visualization mode (default: 8000)")
     return parser.parse_args()


@@ -195,7 +200,7 @@ def main():
         success, new_html_filename = process_and_inject(args.input)
         if success:
             web_static_dir = os.path.join(os.path.dirname(__file__), "..", "..", "web-static")
-            port = 8000
+            port = args.port
             try:
                 server = start_http_server(web_static_dir, port)
                 url = f"http://localhost:{port}/{new_html_filename}"
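
The effect of the new `--port` option can be sketched in isolation. This is a minimal, self-contained approximation and not the project's actual `parse_args`: `build_parser` and the filename `report.html` are hypothetical names introduced for illustration, and only the `--mode` and `--port` arguments visible in the diff are reproduced.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical helper mirroring only the arguments visible in the diff."""
    parser = argparse.ArgumentParser(description="Sketch of the visualization CLI")
    parser.add_argument(
        "--mode",
        choices=["visualization", "app"],
        default="visualization",
        help="Choose the mode: visualization or app")
    parser.add_argument(
        "--port",
        type=int,
        default=8000,
        help="Port for local HTTP server in visualization mode (default: 8000)")
    return parser


# Without --port, the previous hard-coded value remains the default.
default_args = build_parser().parse_args([])

# With --port, argparse converts the string to int via type=int.
custom_args = build_parser().parse_args(["--port", "9001"])

# "report.html" is a placeholder for the generated HTML filename.
url = f"http://localhost:{custom_args.port}/report.html"
print(default_args.port, custom_args.port, url)
```

Keeping `default=8000` means existing invocations behave as before, while a different port can be chosen when 8000 is already in use.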

docs/assets/wechat.jpg

157 KB

docs/metrics.md

Lines changed: 1 addition & 0 deletions
@@ -16,6 +16,7 @@ This document provides comprehensive information about all quality metrics used

 | Type | Metric | Description | Paper Source | Evaluation Results |
 |------|--------|-------------|--------------|-------------------|
+| `MathCompare` | PromptMathCompare | Compares the effectiveness of two tools in extracting mathematical formulas from HTML to Markdown format by evaluatin... | Internal Implementation | N/A |
 | `QUALITY_BAD_HALLUCINATION` | PromptHallucination | Evaluates whether the response contains factual contradictions or hallucinations against provided context information | [TruthfulQA: Measuring How Models Mimic Human Falsehoods](https://arxiv.org/abs/2109.07958) (Lin et al., 2021) | N/A |
 | `QUALITY_BAD_HALLUCINATION` | RuleHallucinationHHEM | Uses Vectara's HHEM-2.1-Open model for local hallucination detection by evaluating consistency between response and c... | [HHEM-2.1-Open](https://huggingface.co/vectara/hallucination_evaluation_model) (Forrest Bao, Miaoran Li, Rogger Luo, Ofer Mendelevitch) | N/A |
 | `QUALITY_HARMLESS` | PromptTextHarmless | Checks if responses avoid harmful content, discriminatory language, and dangerous assistance | [Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback](https://arxiv.org/pdf/2204.05862) (Bai et al., 2022) | [📊 See Results](eval/prompt/qa_data_evaluated_by_3h.md) |
