Skip to content

Latest commit

 

History

History
387 lines (274 loc) · 10.6 KB

File metadata and controls

387 lines (274 loc) · 10.6 KB

Contributing to Dingo

Thanks for your interest in contributing to Dingo! All kinds of contributions are welcome, including but not limited to the following:

  • Fix typo or bugs
  • Add documentation or translate the documentation into other languages
  • Add new features and components
  • Add new evaluation rules, prompts, or models
  • Improve test coverage and performance

What is PR

PR is the abbreviation of Pull Request. Here's the definition of PR in the official document of Github.

Pull requests let you tell others about changes you have pushed to a branch in a repository on GitHub. Once a pull request is opened, you can discuss and review the potential changes with collaborators and add follow-up commits before your changes are merged into the base branch.

Basic Workflow

  1. Get the most recent codebase
  2. Checkout a new branch from dev branch
  3. Commit your changes (Don't forget to use pre-commit hooks!)
  4. Push your changes and create a PR
  5. Discuss and review your code
  6. Merge your branch to dev branch

Procedures in Detail

1. Get the Most Recent Codebase

  • When you work on your first PR

    Fork the Dingo repository: click the fork button at the top right corner of Github page

    Clone forked repository to local

    git clone git@github.com:XXX/dingo.git

    Add source repository to upstream

    git remote add upstream git@github.com:MigoXLab/dingo.git
  • After your first PR

    Checkout the latest branch of the local repository and pull the latest branch of the source repository.

    git checkout dev
    git pull upstream dev

2. Checkout a New Branch from dev Branch

git checkout dev -b branchname

3. Commit Your Changes

  • If you are a first-time contributor, please install and initialize pre-commit hooks from the repository root directory first.

    pip install -U pre-commit
    pre-commit install
  • Commit your changes as usual. Pre-commit hooks will be triggered to stylize your code before each commit.

    # coding
    git add [files]
    git commit -m 'messages'

    Note: Sometimes your code may be changed by pre-commit hooks. In this case, please remember to re-stage the modified files and commit again.

4. Push Your Changes to the Forked Repository and Create a PR

  • Push the branch to your forked remote repository

    git push origin branchname
  • Create a PR

    Go to your forked repository on GitHub and click "New pull request"

  • Revise PR message template to describe your motivation and modifications made in this PR. You can also link the related issue to the PR manually in the PR message.

  • You can also ask a specific person to review the changes you've proposed.

5. Discuss and Review Your Code

  • Modify your codes according to reviewers' suggestions and then push your changes.

6. Merge Your Branch to dev Branch and Delete the Branch

  • After the PR is merged by the maintainer, you can delete the branch you created in your forked repository.
    git branch -d branchname # delete local branch
    git push origin --delete branchname # delete remote branch

Development Setup

Environment Setup

  1. Clone the repository

    git clone https://github.com/MigoXLab/dingo.git
    cd dingo
  2. Create a virtual environment (recommended)

    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install dependencies

    pip install -e .
    pip install -r requirements/runtime.txt
  4. Install pre-commit hooks

    pre-commit install

Running Tests

# Run all tests
python -m pytest test/

# Run specific test file
python -m pytest test/scripts/data/dataset/test_hf_dataset.py

# Run with coverage
python -m pytest --cov=dingo test/

Running Examples

# Test CLI functionality
python -m dingo.run.cli --input test/env/local_plaintext.json

# Start local demo
cd app_gradio
python app.py

Code Style

We adopt PEP8 as the preferred code style.

We use the following tools for linting and formatting:

  • flake8: A wrapper around some linter tools
  • isort: A Python utility to sort imports
  • black: A formatter for Python files
  • pre-commit: Git hooks for code quality

Style configurations can be found in setup.cfg and .pre-commit-config.yaml.

Code Quality Guidelines

  1. Follow PEP8 for Python code style
  2. Use type hints where appropriate
  3. Write docstrings for all public functions and classes
  4. Keep functions small and focused on a single responsibility
  5. Use meaningful variable names
  6. Add comments for complex logic

Example Code Style

from typing import List, Optional

from dingo.io.input import Data
from dingo.io.output.eval_detail import EvalDetail


class ExampleRule:
  """Example rule for demonstration purposes.

  This rule checks for specific patterns in text data.

  Args:
      pattern: Regular expression pattern to match
      threshold: Minimum threshold for rule activation
  """

  def __init__(self, pattern: str, threshold: float = 0.5) -> None:
    self.pattern = pattern
    self.threshold = threshold

  def eval(self, input_data: Data) -> EvalDetail:
    """Evaluate input data against the rule.

    Args:
        input_data: Input data to evaluate

    Returns:
        EvalDetail: Evaluation result
    """
    res = EvalDetail()
    # Implementation here
    return res

Contributing Guidelines

Adding New Features

  1. Create an issue first to discuss the feature
  2. Follow the existing architecture patterns
  3. Add comprehensive tests for new functionality
  4. Update documentation as needed
  5. Ensure backward compatibility when possible

Adding New Evaluation Rules

  1. Inherit from appropriate base class (BaseRule for rule-based evaluation)
  2. Register your rule using the @Model.rule_register decorator
  3. Add comprehensive tests in the test/ directory
  4. Document the rule with clear docstrings and examples

Example:

from dingo.model import Model
from dingo.model.rule.base import BaseRule
from dingo.config.input_args import EvaluatorRuleArgs
from dingo.io import Data
from dingo.io.output.eval_detail import EvalDetail


@Model.rule_register('QUALITY_BAD_CUSTOM', ['default'])
class CustomRule(BaseRule):
  """Custom rule for specific quality check."""

  dynamic_config = EvaluatorRuleArgs(pattern=r'custom_pattern')

  @classmethod
  def eval(cls, input_data: Data) -> EvalDetail:
    res = EvalDetail()
    # Implementation
    return res

Adding New LLM Models

  1. Inherit from appropriate base class (BaseOpenAI for OpenAI-compatible APIs)
  2. Register your model using the @Model.llm_register decorator
  3. Handle API keys and configuration properly
  4. Add error handling for API failures

Adding New Prompts

  1. Follow existing prompt structure in dingo/model/prompt/
  2. Use clear and specific prompt templates
  3. Test prompts with different models
  4. Document prompt purpose and expected outputs

Documentation

  1. Update README.md if adding major features
  2. Add docstrings to all public functions and classes
  3. Create examples in the examples/ directory
  4. Update configuration documentation in docs/config.md

Testing Guidelines

Writing Tests

  1. Use pytest for all tests
  2. Create test data in test/data/ directory
  3. Mock external dependencies (APIs, file systems)
  4. Test edge cases and error conditions
  5. Maintain high test coverage

Test Structure

test/
├── data/                    # Test data files
├── scripts/                 # Test scripts
├── test_rules.py           # Rule tests
├── test_models.py          # Model tests
└── test_integration.py     # Integration tests

Example Test

import pytest
from dingo.io.input import Data
from dingo.model.rule.rule_common import RuleContentNull


class TestRuleContentNull:
    """Test cases for RuleContentNull."""

    def test_null_content(self):
        """Test rule with null content."""
        data = Data(data_id='test', content='')
        result = RuleContentNull().eval(data)
        assert result.is_bad is True

    def test_valid_content(self):
        """Test rule with valid content."""
        data = Data(data_id='test', content='Valid content')
        result = RuleContentNull().eval(data)
        assert result.is_bad is False

About Contributing Test Datasets

Submitting Test Datasets

  • Please implement logic for automatic dataset downloading in the code; or provide a method for obtaining the dataset in the PR
  • If the dataset is not yet public, please indicate so
  • Ensure datasets comply with licensing requirements

Submitting Data Configuration Files

  • Provide a README in the same directory as the data configuration
  • The README should include:
    • A brief description of the dataset
    • The official link to the dataset
    • Some test examples from the dataset
    • Evaluation results of the dataset on relevant models
    • Citation of the dataset

Dataset Integration

  • Add dataset configuration to appropriate rule groups
  • Test dataset with existing evaluation rules
  • Document any special requirements or preprocessing steps

Release Process

Version Numbering

We follow semantic versioning (SemVer):

  • Major version (X.0.0): Breaking changes
  • Minor version (X.Y.0): New features, backward compatible
  • Patch version (X.Y.Z): Bug fixes, backward compatible

Release Checklist

  1. Update version in setup.py
  2. Update CHANGELOG.md with new features and fixes
  3. Run full test suite
  4. Update documentation
  5. Create release PR to main branch
  6. Tag release after merge

Community

Getting Help

  • GitHub Issues: For bugs and feature requests
  • Discord: For real-time discussion and support
  • WeChat: For Chinese community support

Communication Guidelines

  1. Be respectful and inclusive
  2. Search existing issues before creating new ones
  3. Provide clear descriptions and reproduction steps
  4. Use appropriate labels for issues and PRs

License

By contributing to Dingo, you agree that your contributions will be licensed under the Apache 2.0 License.

Acknowledgments

We appreciate all contributors who help make Dingo better! Your contributions, whether code, documentation, or feedback, are valuable to the community.


For more detailed information about specific components, please refer to: