Thanks for your interest in contributing to Dingo! All kinds of contributions are welcome, including but not limited to the following:
- Fix typos or bugs
- Add documentation or translate the documentation into other languages
- Add new features and components
- Add new evaluation rules, prompts, or models
- Improve test coverage and performance
PR is short for pull request. GitHub's official documentation defines it as follows:
Pull requests let you tell others about changes you have pushed to a branch in a repository on GitHub. Once a pull request is opened, you can discuss and review the potential changes with collaborators and add follow-up commits before your changes are merged into the base branch.
- Get the most recent codebase
- Check out a new branch from the `dev` branch
- Commit your changes (don't forget to use pre-commit hooks!)
- Push your changes and create a PR
- Discuss and review your code
- Merge your branch into the `dev` branch

- When you work on your first PR

  Fork the Dingo repository: click the Fork button at the top right corner of the GitHub page.

  Clone the forked repository to your local machine:

  ```bash
  git clone git@github.com:XXX/dingo.git
  ```

  Add the source repository as the upstream remote:

  ```bash
  git remote add upstream git@github.com:MigoXLab/dingo.git
  ```

- After your first PR

  Check out the `dev` branch of your local repository and pull the latest changes from the source repository:

  ```bash
  git checkout dev
  git pull upstream dev
  ```

  Then create a new branch from `dev`:

  ```bash
  git checkout -b branchname dev
  ```

- If you are a first-time contributor, install and initialize the pre-commit hooks from the repository root directory first:

  ```bash
  pip install -U pre-commit
  pre-commit install
  ```

- Commit your changes as usual. The pre-commit hooks run before each commit and format your code.

  ```bash
  # coding
  git add [files]
  git commit -m 'messages'
  ```

  Note: Pre-commit hooks sometimes modify your files. When that happens, re-stage the modified files and commit again.

- Push the branch to your forked remote repository:

  ```bash
  git push origin branchname
  ```

- Create a PR

  Go to your forked repository on GitHub and click "New pull request".

- Fill in the PR message template to describe your motivation and the changes made in this PR. You can also manually link related issues in the PR message.

- You can also ask a specific person to review the changes you've proposed.
- Modify your code according to the reviewers' suggestions, then push your changes.
- After a maintainer merges the PR, you can delete the branch you created in your forked repository:

  ```bash
  git branch -d branchname  # delete local branch
  git push origin --delete branchname  # delete remote branch
  ```

- Clone the repository

  ```bash
  git clone https://github.com/MigoXLab/dingo.git
  cd dingo
  ```

- Create a virtual environment (recommended)

  ```bash
  python -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
  ```

- Install dependencies

  ```bash
  pip install -e .
  pip install -r requirements/runtime.txt
  ```

- Install pre-commit hooks

  ```bash
  pre-commit install
  ```

```bash
# Run all tests
python -m pytest test/

# Run specific test file
python -m pytest test/scripts/data/dataset/test_hf_dataset.py

# Run with coverage (requires the pytest-cov plugin)
python -m pytest --cov=dingo test/
```

```bash
# Test CLI functionality
python -m dingo.run.cli --input test/env/local_plaintext.json

# Start local demo
cd app_gradio
python app.py
```

We adopt PEP8 as the preferred code style.
We use the following tools for linting and formatting:
- flake8: A wrapper around some linter tools
- isort: A Python utility to sort imports
- black: A formatter for Python files
- pre-commit: Git hooks for code quality
Style configurations can be found in setup.cfg and .pre-commit-config.yaml.
- Follow PEP8 for Python code style
- Use type hints where appropriate
- Write docstrings for all public functions and classes
- Keep functions small and focused on a single responsibility
- Use meaningful variable names
- Add comments for complex logic

```python
from dingo.io.input import Data
from dingo.io.output.eval_detail import EvalDetail


class ExampleRule:
    """Example rule for demonstration purposes.

    This rule checks for specific patterns in text data.

    Args:
        pattern: Regular expression pattern to match
        threshold: Minimum threshold for rule activation
    """

    def __init__(self, pattern: str, threshold: float = 0.5) -> None:
        self.pattern = pattern
        self.threshold = threshold

    def eval(self, input_data: Data) -> EvalDetail:
        """Evaluate input data against the rule.

        Args:
            input_data: Input data to evaluate

        Returns:
            EvalDetail: Evaluation result
        """
        res = EvalDetail()
        # Implementation here
        return res
```

- Create an issue first to discuss the feature
- Follow the existing architecture patterns
- Add comprehensive tests for new functionality
- Update documentation as needed
- Ensure backward compatibility when possible

- Inherit from the appropriate base class (`BaseRule` for rule-based evaluation)
- Register your rule using the `@Model.rule_register` decorator
- Add comprehensive tests in the `test/` directory
- Document the rule with clear docstrings and examples

Example:

```python
from dingo.model import Model
from dingo.model.rule.base import BaseRule
from dingo.config.input_args import EvaluatorRuleArgs
from dingo.io import Data
from dingo.io.output.eval_detail import EvalDetail


@Model.rule_register('QUALITY_BAD_CUSTOM', ['default'])
class CustomRule(BaseRule):
    """Custom rule for specific quality check."""

    dynamic_config = EvaluatorRuleArgs(pattern=r'custom_pattern')

    @classmethod
    def eval(cls, input_data: Data) -> EvalDetail:
        res = EvalDetail()
        # Implementation
        return res
```

- Inherit from the appropriate base class (`BaseOpenAI` for OpenAI-compatible APIs)
- Register your model using the `@Model.llm_register` decorator
- Handle API keys and configuration properly
- Add error handling for API failures
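
The last two points can be illustrated with a small, self-contained sketch. It is not Dingo's implementation: `load_api_key`, `call_with_retry`, the `DINGO_API_KEY` variable name, and the retry parameters are all hypothetical, chosen only to show reading a key from the environment rather than hard-coding it, and retrying a failing call with exponential backoff.

```python
import os
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def load_api_key(env_var: str = "DINGO_API_KEY") -> str:
    """Read the API key from the environment (the variable name is an assumption)."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return key


def call_with_retry(
    request: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `request`, retrying with exponential backoff when it raises."""
    for attempt in range(1, max_attempts + 1):
        try:
            return request()
        except Exception:
            if attempt == max_attempts:
                raise  # Out of attempts: surface the original error.
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("max_attempts must be at least 1")
```

In real code it is usually better to catch only the client's transient error types (timeouts, rate limits) instead of `Exception`. The `sleep` parameter exists so that tests can replace the real delay with a no-op.
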

- Follow the existing prompt structure in `dingo/model/prompt/`
- Use clear and specific prompt templates
- Test prompts with different models
- Document prompt purpose and expected outputs
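
As a rough illustration of a prompt that documents its purpose and expected output, the sketch below pairs a template with a parser that checks replies against the documented format. The names `QUALITY_PROMPT`, `build_prompt`, and `parse_response`, and the `score`/`reason` reply format, are hypothetical and do not reflect the actual layout of `dingo/model/prompt/`.

```python
import json
from string import Template

# Purpose: ask a model to judge whether a text is of acceptable quality.
# Expected output: a JSON object such as {"score": 1, "reason": "..."}.
QUALITY_PROMPT = Template(
    "Rate the quality of the text below.\n"
    'Reply with JSON only: {"score": 0 or 1, "reason": "<one sentence>"}.\n'
    "Text:\n$content"
)


def build_prompt(content: str) -> str:
    """Fill the template with the text to evaluate."""
    return QUALITY_PROMPT.substitute(content=content)


def parse_response(raw: str) -> dict:
    """Validate a model reply against the documented output format."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Reply is not valid JSON: {raw!r}") from exc
    if not isinstance(data, dict) or data.get("score") not in (0, 1):
        raise ValueError(f"Unexpected reply shape: {raw!r}")
    if not isinstance(data.get("reason"), str):
        raise ValueError(f"Missing 'reason' string: {raw!r}")
    return data
```

Keeping the parser next to the template makes it straightforward to run the same prompt against several models and see which replies deviate from the documented format.
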
- Update README.md if adding major features
- Add docstrings to all public functions and classes
- Create examples in the `examples/` directory
- Update configuration documentation in `docs/config.md`

- Use pytest for all tests
- Create test data in the `test/data/` directory
- Mock external dependencies (APIs, file systems)
- Test edge cases and error conditions
- Maintain high test coverage
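
The guidelines on mocking and edge cases might look like this in practice. This is a minimal sketch: `fetch_score` and its `client` argument are hypothetical stand-ins for code that calls an external API, not part of Dingo.

```python
from unittest.mock import Mock

import pytest


def fetch_score(client, text: str) -> float:
    """Hypothetical helper that asks a remote service to score `text`."""
    if not text:
        raise ValueError("text must not be empty")
    response = client.score(text)
    return float(response["score"])


def test_fetch_score_uses_client():
    # The mock replaces the real API client, so no network access is needed.
    client = Mock()
    client.score.return_value = {"score": "0.8"}

    assert fetch_score(client, "some text") == 0.8
    client.score.assert_called_once_with("some text")


def test_fetch_score_rejects_empty_text():
    # Edge case: empty input must fail before the client is ever called.
    client = Mock()

    with pytest.raises(ValueError):
        fetch_score(client, "")
    client.score.assert_not_called()
```

Passing the client in as an argument keeps the function easy to test, because a `Mock` can be supplied without patching module globals.
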

```
test/
├── data/                 # Test data files
├── scripts/              # Test scripts
├── test_rules.py         # Rule tests
├── test_models.py        # Model tests
└── test_integration.py   # Integration tests
```

```python
import pytest

from dingo.io.input import Data
from dingo.model.rule.rule_common import RuleContentNull


class TestRuleContentNull:
    """Test cases for RuleContentNull."""

    def test_null_content(self):
        """Test rule with null content."""
        data = Data(data_id='test', content='')
        result = RuleContentNull().eval(data)
        assert result.is_bad is True

    def test_valid_content(self):
        """Test rule with valid content."""
        data = Data(data_id='test', content='Valid content')
        result = RuleContentNull().eval(data)
        assert result.is_bad is False
```

- Implement logic for automatic dataset downloading in the code, or describe in the PR how to obtain the dataset
- If the dataset is not yet public, please indicate so
- Ensure datasets comply with licensing requirements
- Provide a README in the same directory as the data configuration
- The README should include:
  - A brief description of the dataset
  - The official link to the dataset
  - Some test examples from the dataset
  - Evaluation results of the dataset on relevant models
  - Citation of the dataset
- Add dataset configuration to appropriate rule groups
- Test dataset with existing evaluation rules
- Document any special requirements or preprocessing steps
We follow semantic versioning (SemVer):
- Major version (X.0.0): Breaking changes
- Minor version (X.Y.0): New features, backward compatible
- Patch version (X.Y.Z): Bug fixes, backward compatible
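
The three rules above can be sketched as a small helper. `bump_version` is a hypothetical function written for illustration, not part of Dingo's release tooling, and it only handles plain `X.Y.Z` versions (no pre-release or build suffixes).

```python
def bump_version(version: str, change: str) -> str:
    """Return the next version for a 'major', 'minor', or 'patch' change."""
    major, minor, patch = (int(part) for part in version.split("."))
    if change == "major":
        # Breaking change: bump X and reset Y and Z.
        return f"{major + 1}.0.0"
    if change == "minor":
        # Backward-compatible feature: bump Y and reset Z.
        return f"{major}.{minor + 1}.0"
    if change == "patch":
        # Backward-compatible bug fix: bump Z only.
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"Unknown change type: {change!r}")


print(bump_version("1.4.2", "major"))  # 2.0.0
print(bump_version("1.4.2", "minor"))  # 1.5.0
print(bump_version("1.4.2", "patch"))  # 1.4.3
```
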

- Update version in `setup.py`
- Update `CHANGELOG.md` with new features and fixes
- Run full test suite
- Update documentation
- Create release PR to `main` branch
- Tag release after merge

- GitHub Issues: For bugs and feature requests
- Discord: For real-time discussion and support
- WeChat: For Chinese community support
- Be respectful and inclusive
- Search existing issues before creating new ones
- Provide clear descriptions and reproduction steps
- Use appropriate labels for issues and PRs
By contributing to Dingo, you agree that your contributions will be licensed under the Apache 2.0 License.
We appreciate all contributors who help make Dingo better! Your contributions, whether code, documentation, or feedback, are valuable to the community.
For more detailed information about specific components, please refer to: