DSLighting support for MLE-Bench #141

@luckyfan-cs

Hi!

Thanks for your wonderful work on MLE-Bench. We found it very insightful for evaluating the machine learning engineering capabilities of agents 🙌

We’d like to briefly introduce our project, DSLighting.

About DSLighting

DSLighting is a data science agent harness — an LLM-driven autonomous execution engine that turns task descriptions and datasets into iterative workflows including:

  • Code generation
  • Execution
  • Evaluation
  • Refinement

It is designed to make it easy to build, run, and evaluate data science agents in a reproducible and extensible way.
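
For readers unfamiliar with this kind of harness, the generate, execute, evaluate, refine cycle listed above can be sketched in a few lines of Python. This is only a rough illustration under stated assumptions, not DSLighting's actual implementation: `refine_loop`, `Attempt`, and the `toy_*` functions are hypothetical names, and the toy functions stand in for the LLM call, the sandboxed execution, and the benchmark scorer.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Attempt:
    code: str     # the candidate solution produced by the generator
    score: float  # the score the evaluator assigned to its output


def refine_loop(
    task: str,
    generate: Callable[[str, Optional[Attempt]], str],
    execute: Callable[[str], object],
    evaluate: Callable[[object], float],
    max_iters: int = 5,
    target: float = 1.0,
) -> Attempt:
    """Hypothetical generate -> execute -> evaluate -> refine loop."""
    if max_iters < 1:
        raise ValueError("max_iters must be at least 1")
    best: Optional[Attempt] = None
    for _ in range(max_iters):
        code = generate(task, best)   # refinement: best attempt so far is fed back
        output = execute(code)        # run the candidate
        score = evaluate(output)      # score its output
        if best is None or score > best.score:
            best = Attempt(code, score)
        if best.score >= target:      # stop early once the target is reached
            break
    return best


# Toy stand-ins (assumptions for illustration only).
def toy_generate(task: str, previous: Optional[Attempt]) -> str:
    # A real harness would prompt an LLM here; this just increments a guess.
    return "0" if previous is None else str(int(previous.code) + 1)


def toy_execute(code: str) -> int:
    # A real harness would run generated code in a sandbox.
    return int(code)


def toy_evaluate(output: int) -> float:
    # Score is 1.0 when the output equals 3 and decays with distance from 3.
    return 1.0 / (1 + abs(3 - output))


result = refine_loop("reach 3", toy_generate, toy_execute, toy_evaluate)
print(result)
```

Passing the three stages in as plain callables keeps the loop independent of any particular model, executor, or metric, which is the property a harness needs in order to swap workflows and benchmarks.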

Support for MLE-Bench

We’ve recently added support for running MLE-Bench within DSLighting. With just a few lines of code, users can easily run the benchmark:

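# Load environment variables (such as API keys) from a local .env file
# before importing dslighting.
from dotenv import load_dotenv
load_dotenv()

from dslighting.api import DSBenchmark
from dslighting.core import ConfigBuilder

# Choose the agent workflow and the model that drives it.
config = ConfigBuilder().build_config(
    workflow="aide",
    model="gpt-4o",
)

# Point the benchmark at the local MLE-Bench data directory and run it.
benchmark = DSBenchmark("mlebench", data_dir="/path/to/mlebench")
result = benchmark.run(config=config)

# Paths to the results and metadata for this run.
print(result.results_path)
print(result.metadata_path)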

Why this might be useful

  • Minimal setup to run MLE-Bench
  • Unified interface across multiple benchmarks
  • Supports iterative agent workflows
  • Easy to configure for different models and workflows

Other supported benchmarks

DSLighting currently also supports:

  • DACode (EMNLP 2024)
  • DABench (ICML 2024)
  • MoSciBench (ICLR 2026)
  • ScienceAgentBench (ICLR 2025)

We hope this can help make MLE-Bench easier to run and extend in agent-based workflows.

Happy to hear your thoughts, and we’d love to explore potential collaboration!

Thanks again for your great work 🙌
