Skip to content

ByteDance-Seed/DiscoX

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

DISCOX

Project Overview

DISCOX is a benchmark designed to evaluate the performance of LLMs on discourse-level and expert-level translation tasks.
Unlike existing benchmarks that primarily focus on isolated sentences or general-domain texts, DISCOX emphasizes the ability of models to maintain discourse coherence, terminological precision, and domain-specific accuracy across long-form professional content.

This benchmark covers multiple domains (e.g., social sciences, natural sciences, humanities, applied disciplines, news, domain-specific scenarios, and literature & arts) with a wide range of translation challenges. It enables fine-grained evaluation of LLM outputs using criteria such as accuracy, fluency, and appropriateness.

Getting Started

1. Install Dependencies

Make sure you are using Python 3.9+. Then install the required packages:

pip install -r requirements.txt

2. Configure Environment Variables

Set up your API key and endpoint in the .env file:

JUDGE_API_KEY=your_judgemodel_api_key_here
JUDGE_API_BASE=your_judgemodel_api_base_here
CANDIDATE_API_KEY=your_candidatemodel_api_key_here
CANDIDATE_API_BASE=your_candidatemodel_api_base_here

3. Run Evaluation Tasks

You can run tasks by specifying the target model and the judge model. For example:

python3 run_tasks.py --model openai/gpt4o-2024-11-20 --judgemodel azure/gemini-2.5-pro

Example Use Case

  • Model Under Evaluation: openai/gpt4o-2024-11-20
  • Judge Model: azure/gemini-2.5-pro
    This configuration runs translation tasks using GPT-4o and evaluates them with Gemini-2.5-Pro under the Metric-S evaluation framework.

License

This project is released under the Apache 2.0 License. See LICENSE for details.

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 2

  •  
  •  

Languages