MigoXLab · e06084 · Feb 28, 2025 · Dec 31, 2024
diff --git a/.github/workflows/IntegrationTest.yml b/.github/workflows/IntegrationTest.yml
@@ -7,10 +7,10 @@ on:
   push:
     branches: [ "main", "dev" ]
   pull_request:
-    branches: [ "main" ]
+    branches: [ "main", "dev" ]
   workflow_dispatch:
 
-    
+
 jobs:
   build:
 

diff --git a/.github/workflows/lint.yml b/.github/workflows/lint.yml
@@ -0,0 +1,28 @@
+name: lint
+
+on: [push, pull_request]
+
+concurrency:
+  group: ${{ github.workflow }}-${{ github.ref }}
+  cancel-in-progress: true
+
+jobs:
+  lint:
+    runs-on: ubuntu-latest
+    strategy:
+      matrix:
+        python-version: [3.10.15]
+    steps:
+      - uses: actions/checkout@v3
+      - name: Set up Python ${{ matrix.python-version }}
+        uses: actions/setup-python@v4
+        with:
+          python-version: ${{ matrix.python-version }}
+      - name: Install pre-commit hook
+        run: |
+          pip install pre-commit==3.8.0
+          pre-commit install
+      - name: Linting
+        run: |
+          pre-commit sample-config > .pre-commit-config.yaml
+          pre-commit run --all-files
diff --git a/.gitignore b/.gitignore
@@ -0,0 +1,2 @@
+__pycache__/
+*.egg-info/
diff --git a/.owners.yml b/.owners.yml
@@ -0,0 +1,9 @@
+assign:
+  strategy:
+    # random
+    daily-shift-based
+  schedule:
+    '*/1 * * * *'
+  assignees:
+    - e06084
+    - shijinpjlab
diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
@@ -0,0 +1,14 @@
+# See https://pre-commit.com for more information
+# See https://pre-commit.com/hooks.html for more hooks
+repos:
+-   repo: https://github.com/pre-commit/pre-commit-hooks
+    rev: v5.0.0
+    hooks:
+    -   id: trailing-whitespace
+    -   id: end-of-file-fixer
+    -   id: check-yaml
+    -   id: check-added-large-files
+-   repo: https://github.com/PyCQA/isort
+    rev: 6.0.0
+    hooks:
+    -   id: isort
diff --git a/LICENSE b/LICENSE
@@ -198,4 +198,4 @@
    distributed under the License is distributed on an "AS IS" BASIS,
    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    See the License for the specific language governing permissions and
-   limitations under the License.
+   limitations under the License.
diff --git a/README.md b/README.md
@@ -9,7 +9,17 @@
 
 </div>
 
-[English](README.md) | [简体中文](README_CN.md)
+[English](README.md) | [简体中文](README_zh-CN.md)
+
+<div align="center">
+  <a href="https://discord.gg/Jhgb2eKWh8" style="text-decoration:none;">
+    <img src="https://user-images.githubusercontent.com/25839884/218347213-c080267f-cbb6-443e-8532-8e1ed9a58ea9.png" width="3%" alt="" /></a>
+  <img src="https://user-images.githubusercontent.com/25839884/218346358-56cc8e2f-a2b8-487f-9088-32480cceabcf.png" width="3%" alt="" />
+  <a href="https://huggingface.co/spaces/DataEval/dingo" style="text-decoration:none;">
+    <img src="https://huggingface.co/datasets/huggingface/brand-assets/resolve/main/hf-logo.png" width="3%" alt="Hugging Face" /></a>
+  <img src="https://user-images.githubusercontent.com/25839884/218346358-56cc8e2f-a2b8-487f-9088-32480cceabcf.png" width="3%" alt="" />
+</div>
+
 
 # Changelog
 
@@ -83,7 +93,7 @@ $ cat test/data/config_gpt.json
   "llm_config": {
     "openai": {
       "model": "gpt-4o",
-      "key": "xxxx", 
+      "key": "xxxx",
       "api_url": "https://api.openai.com/v1/chat/completions"
     }
   }
@@ -99,7 +109,10 @@ If the user wants to manually start a frontend page, you need to enter the follo
 python -m dingo.run.vsl --input xxx
 ```
 
-The input followed is the directory of the quality inspection results. Users need to ensure that there is a summary.json file when the directory is opened.
+The input followed is the directory of the quality inspection results. Users need to ensure that there is a summary.json file when the directory is opened. Frontend page of output looks like:![GUI output](docs/assets/dingo_gui.png)
+
+## Online Demo
+Try dingo on our online demo: [(Hugging Face)🤗](https://huggingface.co/spaces/DataEval/dingo)
 
 # Feature List
 
@@ -153,17 +166,17 @@ then you can refer to: [Install Dependencies](requirements)
 
 ## Register Rules/Prompts/Models
 
-If the heuristic rules inside the project do not meet the user's quality inspection requirements, users can also customize rules or models.  
+If the heuristic rules inside the project do not meet the user's quality inspection requirements, users can also customize rules or models.
 
 ### Register Rules
 
-If the user wants to create a new rule `CommonPatternDemo`, then the first step is to add a decorator to the rule to inject the rule into the project.  
-Secondly, the `metric_type` type, such as `QUALITY_BAD_RELEVANCE`, needs to be set for the rule, and `group` does not need to be set.  
-Then the user needs to define the `DynamicRuleConfig` object, so that the properties of the rule can be configured dynamically.  
-In addition, the method name of the rule must be `eval` and it needs to be a class method.  
-The return value of the last step should be a `ModelRes` object.  
+If the user wants to create a new rule `CommonPatternDemo`, then the first step is to add a decorator to the rule to inject the rule into the project.
+Secondly, the `metric_type` type, such as `QUALITY_BAD_RELEVANCE`, needs to be set for the rule, and `group` does not need to be set.
+Then the user needs to define the `DynamicRuleConfig` object, so that the properties of the rule can be configured dynamically.
+In addition, the method name of the rule must be `eval` and it needs to be a class method.
+The return value of the last step should be a `ModelRes` object.
 
-For example: [Register Rules](examples/register/sdk_register_rule.py) 
+For example: [Register Rules](examples/register/sdk_register_rule.py)
 
 ### Register Prompts
 
@@ -173,8 +186,8 @@ For example: [Register Prompts](examples/register/sdk_register_prompt.py)
 
 ### Register Models
 
-The way to register models is slightly different, users need to implement a call_api method, accept MetaData type parameters, and return ModelRes type results.  
-There are already implemented basic model classes [BaseOpenAI](dingo/model/llm/base_openai.py) in the project, users can directly inherit.  
+The way to register models is slightly different, users need to implement a call_api method, accept MetaData type parameters, and return ModelRes type results.
+There are already implemented basic model classes [BaseOpenAI](dingo/model/llm/base_openai.py) in the project, users can directly inherit.
 If the user has special functions to implement, then you can rewrite the corresponding methods.
 
 For example: [Register Models](examples/register/sdk_register_llm.py)
@@ -185,7 +198,7 @@ For example: [Register Models](examples/register/sdk_register_llm.py)
 
 ## Execution Engine
 
-`Dingo` can run locally or on a spark cluster.  
+`Dingo` can run locally or on a spark cluster.
 Regardless of the choice of engine, the executor supports some common methods:
 
 | function name      | description              |
@@ -203,9 +216,9 @@ When choosing the spark engine, users can freely choose rules, models for qualit
 
 ### Spark Mode
 
-When choosing the spark engine, users can only choose rules for quality inspection, and models cannot be used.  
-And only `eval_group`,`save_data`,`save_correct`,`custom_config` in `InputArgs` are still valid.  
-Therefore, the user needs to input `spark_session` to initialize spark, and input `spark_rdd` (composed of `MetaData` structure) as data for quality inspection.  
+When choosing the spark engine, users can only choose rules for quality inspection, and models cannot be used.
+And only `eval_group`,`save_data`,`save_correct`,`custom_config` in `InputArgs` are still valid.
+Therefore, the user needs to input `spark_session` to initialize spark, and input `spark_rdd` (composed of `MetaData` structure) as data for quality inspection.
 It should be noted that if `save_data` is `False`, then the data in memory will be cleared immediately after the quality inspection is completed, and `spark_session` will also stop immediately.
 
 [Spark Example](examples/spark/sdk_spark.py)
@@ -275,7 +288,8 @@ If you find this project useful, please consider citing our tool:
 ```
 @misc{dingo,
   title={Dingo: A Comprehensive Data Quality Evaluation Tool for Large Models},
+  author={Dingo Contributors},
   howpublished={\url{https://github.com/DataEval/dingo}},
   year={2024}
 }
-```
+```
diff --git a/README_CN.md b/README_zh-CN.md
@@ -82,7 +82,7 @@ $ cat test/data/config_gpt.json
   "llm_config": {
     "openai": {
       "model": "gpt-4o",
-      "key": "xxxx", 
+      "key": "xxxx",
       "api_url": "https://api.openai.com/v1/chat/completions"
     }
   }
@@ -98,7 +98,12 @@ $ cat test/data/config_gpt.json
 python -m dingo.run.vsl --input xxx
 ```
 
-input之后跟随的是质检结果的目录，用户需要确保目录打开后其中有summary.json文件
+input之后跟随的是质检结果的目录，用户需要确保目录打开后其中有summary.json文件。
+前端页面输出效果如下：![GUI output](docs/assets/dingo_gui.png)
+
+## 5.在线demo
+
+尝试使用我们的在线demo: [(Hugging Face)🤗](https://huggingface.co/spaces/DataEval/dingo)
 
 # 三、功能列表
 
@@ -152,17 +157,17 @@ Dingo 支持输出7个Quality Metrics概况报告和异常数据追溯详情报
 
 ## 2.注册规则/prompt/模型
 
-如果项目内部的启发式规则不满足用户的质检需求，用户还可以自定义规则或者模型。  
+如果项目内部的启发式规则不满足用户的质检需求，用户还可以自定义规则或者模型。
 
 ### 2.1 注册规则
 
-如果用户想要创建一个新规则 `CommonPatternDemo`，那么首先要为规则添加装饰器，将规则注入项目中。  
-其次还需要为规则设置 `metric_type` 类型，比如 `QUALITY_BAD_RELEVANCE`， `group` 可以不用设置。  
-然后用户需要定义 `DynamicRuleConfig` 对象，这样可以动态的配置规则的属性。  
-除此之外，规则的方法名称必须是 `eval` 且需要是类方法。  
-最后一步的返回值应该是 `ModelRes` 对象。  
+如果用户想要创建一个新规则 `CommonPatternDemo`，那么首先要为规则添加装饰器，将规则注入项目中。
+其次还需要为规则设置 `metric_type` 类型，比如 `QUALITY_BAD_RELEVANCE`， `group` 可以不用设置。
+然后用户需要定义 `DynamicRuleConfig` 对象，这样可以动态的配置规则的属性。
+除此之外，规则的方法名称必须是 `eval` 且需要是类方法。
+最后一步的返回值应该是 `ModelRes` 对象。
 
-例如：[注册规则](examples/register/sdk_register_rule.py) 
+例如：[注册规则](examples/register/sdk_register_rule.py)
 
 ### 2.2 注册prompt
 
@@ -172,8 +177,8 @@ Dingo 支持输出7个Quality Metrics概况报告和异常数据追溯详情报
 
 ### 2.3 注册模型
 
-注册模型的方式略有不同，用户需要实现一个call_api方法，接受MetaData类型参数，返回ModelRes类型结果。  
-项目中有已经实现好的基础模型类[BaseOpenAI](dingo/model/llm/base_openai.py)，用户可以直接继承。  
+注册模型的方式略有不同，用户需要实现一个call_api方法，接受MetaData类型参数，返回ModelRes类型结果。
+项目中有已经实现好的基础模型类[BaseOpenAI](dingo/model/llm/base_openai.py)，用户可以直接继承。
 如果用户有特殊的功能要实现，那么就可以重写对应的方法。
 
 例如：[注册模型](examples/register/sdk_register_llm.py)
@@ -184,7 +189,7 @@ Dingo 支持输出7个Quality Metrics概况报告和异常数据追溯详情报
 
 ## 4.执行引擎
 
-`Dingo` 可以在本地运行，也可以在spark集群上运行。  
+`Dingo` 可以在本地运行，也可以在spark集群上运行。
 无论选择何种引擎，executor都支持一些公共方法：
 
 | function name      | description              |
@@ -202,9 +207,9 @@ Dingo 支持输出7个Quality Metrics概况报告和异常数据追溯详情报
 
 ### 4.2 Spark Mode
 
-选择spark引擎时，用户只能选择规则进行质检，模型无法使用。  
-而且`InputArgs`中仅有`eval_group`,`save_data`,`save_correct`,`custom_config`依旧有效。  
-因此，用户需要输入`spark_session`用来初始化spark，输入`spark_rdd`（由`MetaData`结构组成）作为数据用来质检。  
+选择spark引擎时，用户只能选择规则进行质检，模型无法使用。
+而且`InputArgs`中仅有`eval_group`,`save_data`,`save_correct`,`custom_config`依旧有效。
+因此，用户需要输入`spark_session`用来初始化spark，输入`spark_rdd`（由`MetaData`结构组成）作为数据用来质检。
 需要注意，`save_data`如果为`False`，那么质检完成后会立刻清除内存中的数据，`spark_session`也立即停止。
 
 [spark示例](examples/spark/sdk_spark.py)
@@ -274,6 +279,7 @@ If you find this project useful, please consider citing our tool:
 ```
 @misc{dingo,
   title={Dingo: A Comprehensive Data Quality Evaluation Tool for Large Models},
+  author={Dingo Contributors},
   howpublished={\url{https://github.com/DataEval/dingo}},
   year={2024}
 }

diff --git a/Todo.json b/Todo.json
@@ -1 +1 @@
-{"verion":"0.0.1","entries":[]}
+{"verion":"0.0.1","entries":[]}
diff --git a/app/.editorconfig b/app/.editorconfig
@@ -6,4 +6,4 @@ indent_style = space
 indent_size = 2
 end_of_line = lf
 insert_final_newline = true
-trim_trailing_whitespace = true
+trim_trailing_whitespace = true
diff --git a/app/app-static.py b/app/app-static.py
@@ -1,11 +1,12 @@
 #!/usr/bin/env python
 # -*- coding: utf-8 -*-
 
-import os
-import json
-import re
 import argparse
 import base64
+import json
+import os
+import re
+
 
 def get_folder_structure(root_path):
     structure = []

diff --git a/app/app.py b/app/app.py
@@ -1,6 +1,7 @@
-import sys
-import subprocess
 import argparse
+import subprocess
+import sys
+
 
 def run_electron_app():
     parser = argparse.ArgumentParser(description="Run Electron app with optional input path")

diff --git a/app/package.json b/app/package.json
@@ -80,4 +80,4 @@
     "typescript": "^5.5.2",
     "vite": "^5.3.1"
   }
-}
+}