Contributing to Dingo

Thanks for your interest in contributing to Dingo! All kinds of contributions are welcome, including but not limited to the following:

Fix typos or bugs
Add documentation or translate the documentation into other languages
Add new features and components
Add new evaluation rules, prompts, or models
Improve test coverage and performance

What is a PR?

PR is short for Pull Request. Here is the definition of a PR from the official GitHub documentation:

Pull requests let you tell others about changes you have pushed to a branch in a repository on GitHub. Once a pull request is opened, you can discuss and review the potential changes with collaborators and add follow-up commits before your changes are merged into the base branch.

Basic Workflow

Get the most recent codebase
Check out a new branch from the dev branch
Commit your changes (don't forget to use pre-commit hooks!)
Push your changes and create a PR
Discuss and review your code
Merge your branch into the dev branch

Procedures in Detail

1. Get the Most Recent Codebase

When you work on your first PR

Fork the Dingo repository: click the Fork button at the top right corner of the GitHub page

Clone the forked repository to your local machine (replace XXX with your GitHub username)
```
git clone git@github.com:XXX/dingo.git
```
Add the source repository as the upstream remote
```
git remote add upstream git@github.com:MigoXLab/dingo.git
```
After your first PR

Check out the local dev branch and pull the latest changes from the dev branch of the source repository.
```
git checkout dev
git pull upstream dev
```

2. Check Out a New Branch from the `dev` Branch

git checkout -b branchname dev

3. Commit Your Changes

If you are a first-time contributor, please install and initialize pre-commit hooks from the repository root directory first.
```
pip install -U pre-commit
pre-commit install
```
Commit your changes as usual. Pre-commit hooks run before each commit to lint and format your code.
```
# coding
git add [files]
git commit -m 'messages'
```
Note: Sometimes your code may be changed by pre-commit hooks. In this case, please remember to re-stage the modified files and commit again.

4. Push Your Changes to the Forked Repository and Create a PR

Push the branch to your forked remote repository
```
git push origin branchname
```
Create a PR

Go to your forked repository on GitHub and click "New pull request"
Fill in the PR message template to describe your motivation and the modifications made in this PR. You can also link the related issue to the PR manually in the PR message.
You can also ask a specific person to review the changes you've proposed.

5. Discuss and Review Your Code

Modify your code according to the reviewers' suggestions, then push your changes.

6. Merge Your Branch to `dev` Branch and Delete the Branch

After the PR is merged by the maintainer, you can delete the branch you created in your forked repository.

git branch -d branchname # delete local branch
git push origin --delete branchname # delete remote branch

Development Setup

Environment Setup

Clone the repository

git clone https://github.com/MigoXLab/dingo.git
cd dingo

Create a virtual environment (recommended)

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

Install dependencies

pip install -e .
pip install -r requirements/runtime.txt

Install pre-commit hooks
```
pre-commit install
```

Running Tests

# Run all tests
python -m pytest test/

# Run specific test file
python -m pytest test/scripts/data/dataset/test_hf_dataset.py

# Run with coverage
python -m pytest --cov=dingo test/

Running Examples

# Test CLI functionality
python -m dingo.run.cli --input test/env/local_plaintext.json

# Start local demo
cd app_gradio
python app.py

Code Style

We adopt PEP8 as the preferred code style.

We use the following tools for linting and formatting:

flake8: A wrapper around some linter tools
isort: A Python utility to sort imports
black: A formatter for Python files
pre-commit: Git hooks for code quality

Style configurations can be found in setup.cfg and .pre-commit-config.yaml.

Code Quality Guidelines

Follow PEP8 for Python code style
Use type hints where appropriate
Write docstrings for all public functions and classes
Keep functions small and focused on a single responsibility
Use meaningful variable names
Add comments for complex logic

Example Code Style

from dingo.io.input import Data
from dingo.io.output.eval_detail import EvalDetail


class ExampleRule:
    """Example rule for demonstration purposes.

    This rule checks for specific patterns in text data.

    Args:
        pattern: Regular expression pattern to match
        threshold: Minimum threshold for rule activation
    """

    def __init__(self, pattern: str, threshold: float = 0.5) -> None:
        self.pattern = pattern
        self.threshold = threshold

    def eval(self, input_data: Data) -> EvalDetail:
        """Evaluate input data against the rule.

        Args:
            input_data: Input data to evaluate

        Returns:
            EvalDetail: Evaluation result
        """
        res = EvalDetail()
        # Implementation here
        return res

Contributing Guidelines

Adding New Features

Create an issue first to discuss the feature
Follow the existing architecture patterns
Add comprehensive tests for new functionality
Update documentation as needed
Ensure backward compatibility when possible

Adding New Evaluation Rules

Inherit from the appropriate base class (BaseRule for rule-based evaluation)
Register your rule using the @Model.rule_register decorator
Add comprehensive tests in the test/ directory
Document the rule with clear docstrings and examples

Example:

from dingo.model import Model
from dingo.model.rule.base import BaseRule
from dingo.config.input_args import EvaluatorRuleArgs
from dingo.io import Data
from dingo.io.output.eval_detail import EvalDetail


@Model.rule_register('QUALITY_BAD_CUSTOM', ['default'])
class CustomRule(BaseRule):
    """Custom rule for specific quality check."""

    dynamic_config = EvaluatorRuleArgs(pattern=r'custom_pattern')

    @classmethod
    def eval(cls, input_data: Data) -> EvalDetail:
        res = EvalDetail()
        # Implementation
        return res

Adding New LLM Models

Inherit from the appropriate base class (BaseOpenAI for OpenAI-compatible APIs)
Register your model using the @Model.llm_register decorator
Handle API keys and configuration properly
Add error handling for API failures
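
The last two points can be sketched with the standard library alone. This is a minimal illustration, not Dingo's actual implementation: `load_api_key`, `call_with_retry`, `MissingApiKeyError`, and the `EXAMPLE_API_KEY` variable name are all hypothetical. The idea is to read the key from the environment rather than from source code, and to retry transient failures with exponential backoff before giving up.

```python
import os
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class MissingApiKeyError(RuntimeError):
    """Raised when the expected environment variable is unset or empty."""


def load_api_key(env_var: str = "EXAMPLE_API_KEY") -> str:
    """Read the key from the environment instead of hard-coding it.

    Hypothetical helper; the variable name is an assumption.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise MissingApiKeyError(f"set the {env_var} environment variable")
    return key


def call_with_retry(
    request: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `request`, retrying on ConnectionError or TimeoutError.

    The delay doubles after each failed attempt (exponential backoff).
    """
    for attempt in range(attempts):
        try:
            return request()
        except (ConnectionError, TimeoutError):
            if attempt == attempts - 1:
                raise  # out of retries: surface the original error
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")
```

Passing `sleep` as a parameter keeps the helper easy to test, because a test can record the delays instead of actually waiting.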

Adding New Prompts

Follow existing prompt structure in dingo/model/prompt/
Use clear and specific prompt templates
Test prompts with different models
Document prompt purpose and expected outputs

Documentation

Update README.md if adding major features
Add docstrings to all public functions and classes
Create examples in the examples/ directory
Update configuration documentation in docs/config.md

Testing Guidelines

Writing Tests

Use pytest for all tests
Create test data in test/data/ directory
Mock external dependencies (APIs, file systems)
Test edge cases and error conditions
Maintain high test coverage
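
As a rough sketch of the mocking and error-condition guidelines above, the tests below replace a remote client with `unittest.mock.Mock` and write files only inside a temporary directory. The functions under test (`fetch_score`, `save_report`) and the `/score` endpoint are hypothetical stand-ins, not Dingo code.

```python
import json
import tempfile
from pathlib import Path
from unittest import mock


def fetch_score(client, text: str) -> float:
    """Hypothetical code under test: ask a remote service to score `text`."""
    response = client.post("/score", json={"text": text})
    return float(response["score"])


def save_report(path: Path, score: float) -> None:
    """Hypothetical code under test: write the score to a JSON file."""
    path.write_text(json.dumps({"score": score}), encoding="utf-8")


def test_fetch_score_uses_mocked_client():
    # The mock stands in for the real API client, so no network is needed.
    client = mock.Mock()
    client.post.return_value = {"score": "0.75"}
    assert fetch_score(client, "hello") == 0.75
    client.post.assert_called_once_with("/score", json={"text": "hello"})


def test_fetch_score_propagates_api_failure():
    # Error condition: the mocked client raises instead of returning.
    client = mock.Mock()
    client.post.side_effect = TimeoutError("service unavailable")
    try:
        fetch_score(client, "hello")
    except TimeoutError:
        pass
    else:
        raise AssertionError("expected TimeoutError")


def test_save_report_writes_to_temporary_directory():
    # Files are created in a throwaway directory, not in the repository.
    with tempfile.TemporaryDirectory() as tmp:
        report = Path(tmp) / "report.json"
        save_report(report, 0.75)
        assert json.loads(report.read_text(encoding="utf-8")) == {"score": 0.75}
```

Under pytest, the `try`/`except` block is usually written as `with pytest.raises(TimeoutError):`, and the built-in `tmp_path` fixture can replace `tempfile.TemporaryDirectory`.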

Test Structure

test/
├── data/                    # Test data files
├── scripts/                 # Test scripts
├── test_rules.py           # Rule tests
├── test_models.py          # Model tests
└── test_integration.py     # Integration tests

Example Test

import pytest
from dingo.io.input import Data
from dingo.model.rule.rule_common import RuleContentNull


class TestRuleContentNull:
    """Test cases for RuleContentNull."""

    def test_null_content(self):
        """Test rule with null content."""
        data = Data(data_id='test', content='')
        result = RuleContentNull().eval(data)
        assert result.is_bad is True

    def test_valid_content(self):
        """Test rule with valid content."""
        data = Data(data_id='test', content='Valid content')
        result = RuleContentNull().eval(data)
        assert result.is_bad is False

About Contributing Test Datasets

Submitting Test Datasets

Please implement automatic dataset downloading in the code, or describe in the PR how to obtain the dataset
If the dataset is not yet public, please indicate so
Ensure datasets comply with licensing requirements
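
One possible shape for automatic downloading is sketched below. It assumes the dataset is a single file reachable by URL; `fetch_dataset` and its parameters are hypothetical and not part of Dingo. The file is downloaded once into a cache directory, optionally verified against a SHA-256 checksum, and reused on later calls.

```python
import hashlib
from pathlib import Path
from typing import Optional
from urllib.parse import urlparse
from urllib.request import urlopen


def fetch_dataset(url: str, cache_dir: Path, sha256: Optional[str] = None) -> Path:
    """Download `url` into `cache_dir` once and return the local path.

    Hypothetical helper for illustration; not part of Dingo.
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Name the cached file after the last component of the URL path.
    target = cache_dir / Path(urlparse(url).path).name
    if not target.exists():
        with urlopen(url) as response:
            payload = response.read()
        # Verify before writing so a corrupted download is never cached.
        if sha256 is not None and hashlib.sha256(payload).hexdigest() != sha256:
            raise ValueError(f"checksum mismatch for {url}")
        target.write_bytes(payload)
    # A file that is already cached is returned without re-checking it.
    return target
```

A test can exercise this helper without network access by pointing it at a local file through a `file://` URL, for example `Path("/abs/path/sample.jsonl").as_uri()`.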

Submitting Data Configuration Files

Provide a README in the same directory as the data configuration
The README should include:
- A brief description of the dataset
- The official link to the dataset
- Some test examples from the dataset
- Evaluation results of the dataset on relevant models
- Citation of the dataset

Dataset Integration

Add dataset configuration to appropriate rule groups
Test dataset with existing evaluation rules
Document any special requirements or preprocessing steps

Release Process

Version Numbering

We follow semantic versioning (SemVer):

Major version (X.0.0): Breaking changes
Minor version (X.Y.0): New features, backward compatible
Patch version (X.Y.Z): Bug fixes, backward compatible
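
To make the rules concrete, here is a minimal sketch of how a version string changes for each kind of release. The `bump` helper is hypothetical and not part of Dingo's tooling; it handles plain `X.Y.Z` strings only, with no pre-release or build metadata.

```python
def bump(version: str, part: str) -> str:
    """Return the next version for a 'major', 'minor', or 'patch' release.

    Hypothetical helper for illustration; not part of Dingo.
    """
    major, minor, patch = (int(piece) for piece in version.split("."))
    if part == "major":
        # Breaking change: reset minor and patch.
        return f"{major + 1}.0.0"
    if part == "minor":
        # New backward-compatible feature: reset patch.
        return f"{major}.{minor + 1}.0"
    if part == "patch":
        # Backward-compatible bug fix.
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown release type: {part!r}")


print(bump("1.4.2", "major"))  # 2.0.0
print(bump("1.4.2", "minor"))  # 1.5.0
print(bump("1.4.2", "patch"))  # 1.4.3
```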

Release Checklist

Update version in setup.py
Update CHANGELOG.md with new features and fixes
Run full test suite
Update documentation
Create release PR to main branch
Tag release after merge

Community

Getting Help

GitHub Issues: For bugs and feature requests
Discord: For real-time discussion and support
WeChat: For Chinese community support

Communication Guidelines

Be respectful and inclusive
Search existing issues before creating new ones
Provide clear descriptions and reproduction steps
Use appropriate labels for issues and PRs

License

By contributing to Dingo, you agree that your contributions will be licensed under the Apache 2.0 License.

Acknowledgments

We appreciate all contributors who help make Dingo better! Your contributions, whether code, documentation, or feedback, are valuable to the community.

For more detailed information about specific components, please refer to:
