Commit d23876a

Merge pull request #134 from seanpjlab/dev

feat: 1. Standardize dataman class name 2. update README.md: delete changelog and add deepwiki badge

2 parents c0411d4 + 2245c1c commit d23876a
File tree

8 files changed (+16 −15 lines)
README.md — 1 addition & 4 deletions

```diff
@@ -14,6 +14,7 @@
     <a href="https://github.com/DataEval/dingo/network/members"><img src="https://img.shields.io/github/forks/DataEval/dingo" alt="GitHub forks"></a>
     <a href="https://github.com/DataEval/dingo/issues"><img src="https://img.shields.io/github/issues/DataEval/dingo" alt="GitHub issues"></a>
     <a href="https://mseep.ai/app/dataeval-dingo"><img src="https://mseep.net/pr/dataeval-dingo-badge.png" alt="MseeP.ai Security Assessment Badge" height="20"></a>
+    <a href="https://deepwiki.com/MigoXLab/dingo"><img src="https://deepwiki.com/badge.svg" alt="Ask DeepWiki"></a>
 </p>
 
 </div>
@@ -33,10 +34,6 @@
 </p>
 
 
-# Changelog
-
-- 2024/12/27: Project Initialization
-
 # Introduction
 
 Dingo is a data quality evaluation tool that helps you automatically detect data quality issues in your datasets. Dingo provides a variety of built-in rules and model evaluation methods, and also supports custom evaluation methods. Dingo supports commonly used text datasets and multimodal datasets, including pre-training datasets, fine-tuning datasets, and evaluation datasets. In addition, Dingo supports multiple usage methods, including local CLI and SDK, making it easy to integrate into various evaluation platforms, such as [OpenCompass](https://github.com/open-compass/opencompass).
```

README_ja.md — 1 addition & 4 deletions

```diff
@@ -14,6 +14,7 @@
     <a href="https://github.com/DataEval/dingo/network/members"><img src="https://img.shields.io/github/forks/DataEval/dingo" alt="GitHub forks"></a>
     <a href="https://github.com/DataEval/dingo/issues"><img src="https://img.shields.io/github/issues/DataEval/dingo" alt="GitHub issues"></a>
     <a href="https://mseep.ai/app/dataeval-dingo"><img src="https://mseep.net/pr/dataeval-dingo-badge.png" alt="MseeP.ai Security Assessment Badge" height="20"></a>
+    <a href="https://deepwiki.com/MigoXLab/dingo"><img src="https://deepwiki.com/badge.svg" alt="Ask DeepWiki"></a>
 </p>
 
 </div>
@@ -33,10 +34,6 @@
 </p>
 
 
-# 更新履歴
-
-- 2024/12/27: プロジェクト初期化
-
 # はじめに
 
 Dingoは、データセット内のデータ品質問題を自動的に検出するデータ品質評価ツールです。Dingoは様々な組み込みルールとモデル評価手法を提供し、カスタム評価手法もサポートしています。Dingoは一般的に使用されるテキストデータセットとマルチモーダルデータセット(事前学習データセット、ファインチューニングデータセット、評価データセットを含む)をサポートしています。さらに、DingoはローカルCLIやSDKなど複数の使用方法をサポートし、[OpenCompass](https://github.com/open-compass/opencompass)などの様々な評価プラットフォームに簡単に統合できます。
```

README_zh-CN.md — 1 addition & 3 deletions

```diff
@@ -14,6 +14,7 @@
     <a href="https://github.com/DataEval/dingo/network/members"><img src="https://img.shields.io/github/forks/DataEval/dingo" alt="GitHub 分支"></a>
     <a href="https://github.com/DataEval/dingo/issues"><img src="https://img.shields.io/github/issues/DataEval/dingo" alt="GitHub 问题"></a>
     <a href="https://mseep.ai/app/dataeval-dingo"><img src="https://mseep.net/pr/dataeval-dingo-badge.png" alt="MseeP.ai 安全评估徽章" height="20"></a>
+    <a href="https://deepwiki.com/MigoXLab/dingo"><img src="https://deepwiki.com/badge.svg" alt="Ask DeepWiki"></a>
 </p>
 
 
@@ -30,9 +31,6 @@
 
 </div>
 
-# Changelog
-
-- 2024/12/27: Project Initialization
 
 # 介绍
 
```
app_gradio/app.py — 1 addition & 0 deletions

```diff
@@ -236,6 +236,7 @@ def get_data_column_mapping():
     'LLMText3HHonest': ['content'],
     'LLMClassifyTopic': ['content'],
     'LLMClassifyQR': ['content'],
+    'LLMDatamanAssessment': ['content'],
     'VLMImageRelevant': ['prompt', 'content'],
 
     # rule
```
Lines changed: 4 additions & 2 deletions

```diff
@@ -3,17 +3,19 @@
 from dingo.model import Model
 from dingo.model.llm.base_openai import BaseOpenAI
 from dingo.model.modelres import ModelRes
+from dingo.model.prompt.prompt_dataman_assessment import PromptDataManAssessment
 from dingo.model.response.response_class import ResponseScoreTypeNameReason
 from dingo.utils import log
 from dingo.utils.exception import ConvertJsonError
 
 
-@Model.llm_register("dataman_assessment")
-class DatamanAssessment(BaseOpenAI):
+@Model.llm_register("LLMDatamanAssessment")
+class LLMDatamanAssessment(BaseOpenAI):
     """
     Implementation of DataMan assessment using OpenAI API.
     Evaluates text based on 14 quality standards and assigns a domain type.
     """
+    prompt = PromptDataManAssessment
 
     @classmethod
     def process_response(cls, response: str) -> ModelRes:
```
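The rename matters because Dingo looks up evaluators by the string key passed to `@Model.llm_register`, so the key must change everywhere the old name appeared (the Gradio column mapping, `prompt_register`, and the example config). A minimal sketch of how such a string-keyed registry decorator typically works — this is an illustration of the pattern, not Dingo's actual implementation:

```python
# Hypothetical sketch of a string-keyed class registry, in the spirit
# of Model.llm_register (not Dingo's actual code).
class Model:
    llm_models = {}  # registered name -> evaluator class

    @classmethod
    def llm_register(cls, name):
        def decorator(model_cls):
            cls.llm_models[name] = model_cls  # store class under the key
            return model_cls                  # class is returned unchanged
        return decorator


@Model.llm_register("LLMDatamanAssessment")
class LLMDatamanAssessment:
    pass


# Lookup is by the registered string, which is why renaming
# "dataman_assessment" -> "LLMDatamanAssessment" must be applied
# consistently across every file that references the key.
print(Model.llm_models["LLMDatamanAssessment"] is LLMDatamanAssessment)  # → True
```

Registering under the class's own name (rather than a lowercase alias) is what "standardize dataman class name" in the commit title refers to: the registry key, class name, and config key now all match.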

dingo/model/prompt/prompt_dataman_assessment.py — 1 addition & 1 deletion

```diff
@@ -82,7 +82,7 @@
 """
 
 
-@Model.prompt_register("DATAMAN_ASSESSMENT", [], ['dataman_assessment'])
+@Model.prompt_register("DATAMAN_ASSESSMENT", [], ['LLMDatamanAssessment'])
 class PromptDataManAssessment(BasePrompt):
 
     # Metadata for documentation generation
```

docs/metrics.md — 6 additions & 0 deletions

```diff
@@ -55,3 +55,9 @@ This document provides comprehensive information about all quality metrics used
 | `QUALITY_BAD_IMG_RELEVANCE` | RuleImageTextSimilarity | Evaluates semantic similarity between image and text content using CLIP model | [Learning Transferable Visual Representations with Natural Language Supervision](https://arxiv.org/abs/2103.00020) (Radford et al., 2021) | N/A |
 | `QUALITY_BAD_IMG_SIMILARITY` | RuleImageRepeat | Detects duplicate images using PHash and CNN methods to ensure data diversity | [ImageNet Classification with Deep Convolutional Neural Networks](https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf) (Krizhevsky et al., 2012) | N/A |
 
+### Text Generation
+
+| Type | Metric | Description | Paper Source | Evaluation Results |
+|------|--------|-------------|--------------|-------------------|
+| `PromptLongVideoQa` | PromptLongVideoQa | Generate video-related question-answer pairs based on the summarized information of the input long video. | [VRBench: A Benchmark for Multi-Step Reasoning in Long Narrative Videos](https://arxiv.org/abs/2506.108572) (Jiashuo Yu et al., 2025) | N/A |
+
```
examples/dataman/dataman.py — 1 addition & 1 deletion

```diff
@@ -21,7 +21,7 @@
     },
     "evaluator": {
         "llm_config": {
-            "dataman_assessment": {
+            "LLMDatamanAssessment": {
                 "key": "enter your key, such as:EMPTY",
                 "api_url": "enter your local llm api url, such as:http://127.0.0.1:8080/v1",
             }
```
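The key under `llm_config` has to match the name registered with `Model.llm_register`, otherwise the evaluator is never resolved — which is exactly why this example needed updating alongside the class rename. A hedged sketch of that constraint (the `registered` set and validation loop are illustrative, not Dingo's API):

```python
# Illustrative check: every evaluator named in llm_config must be a
# registered name. "registered" stands in for Dingo's real registry.
config = {
    "evaluator": {
        "llm_config": {
            "LLMDatamanAssessment": {            # must equal the registered key
                "key": "EMPTY",                  # placeholder API key
                "api_url": "http://127.0.0.1:8080/v1",
            }
        }
    }
}

registered = {"LLMDatamanAssessment"}  # names known to the registry

for name in config["evaluator"]["llm_config"]:
    # With the old key "dataman_assessment" this lookup would fail.
    assert name in registered, f"unknown evaluator: {name}"
```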
