Commit 51bfb07

fix: change name in README
1 parent 8fd29ee commit 51bfb07

File tree: 1 file changed, +10 -13 lines changed

README.md

Lines changed: 10 additions & 13 deletions
@@ -1,10 +1,10 @@
-# BigEval
+# BigCodeBench
 
 > [!WARNING]
 > The project is under active development. Please check back later for more updates.
 
 > [!WARNING]
-> Please use WildCode with caution. Different from [EvalPlus](https://github.com/evalplus/evalplus), WildCode has a much less constrained execution environment to support tasks with diverse library dependencies. This may lead to security risks. We recommend using a sandbox such as [Docker](https://docs.docker.com/get-docker/) to run the evaluation.
+> Please use BigCodeBench with caution. Different from [EvalPlus](https://github.com/evalplus/evalplus), BigCodeBench has a much less constrained execution environment to support tasks with diverse library dependencies. This may lead to security risks. We recommend using a sandbox such as [Docker](https://docs.docker.com/get-docker/) to run the evaluation.
 
 <p align="center">
     <a href="https://pypi.org/project/bigcodebench/"><img src="https://img.shields.io/pypi/v/bigcodebench?color=g"></a>
@@ -27,29 +27,26 @@
 ### BigCodeBench
 
 BigCodeBench is a rigorous benchmark for code generation with realistic constraints in the wild. It aims to evaluate the true programming capabilities of large language models (LLMs) in a more realistic setting. The benchmark is designed for HumanEval-like function-level code generation tasks, but with much more fine-grained descriptions and diverse tool use.
-
-### WildCode
-
 To facilitate the evaluation of LLMs on BigCodeBench, we provide a Python package `bigcodebench` that includes the dataset, generation scripts, and evaluation scripts. The package is built on top of the [EvalPlus](https://github.com/evalplus/evalplus) framework, which is a flexible and extensible evaluation framework for code generation tasks.
 
-### Why WildCode?
+### Why BigCodeBench?
 
-WildCode is a rigorous evaluation framework for LLM4Code, with:
+BigCodeBench focuses on the evaluation of LLM4Code with *diverse function calls* and *complex instruction*, with:
 
 * **Precise evaluation & ranking**: See [our leaderboard](https://bigcodebench.github.io/leaderboard.html) for latest LLM rankings before & after rigorous evaluation.
-* **Pre-generated samples**: WildCode accelerates code intelligence research by open-sourcing [LLM-generated samples](#-LLM-generated-code) for various models -- no need to re-run the expensive benchmarks!
+* **Pre-generated samples**: BigCodeBench accelerates code intelligence research by open-sourcing [LLM-generated samples](#-LLM-generated-code) for various models -- no need to re-run the expensive benchmarks!
 
 ### Main Differences from EvalPlus
 
-We inherit the design of the EvalPlus framework, which is a flexible and extensible evaluation framework for code generation tasks. However, WildCode has the following differences:
-* Execution Environment: The execution environment in WildCode is less bounded than EvalPlus to support tasks with diverse library dependencies.
-* Test Evaluation: WildCode relies on `unittest` for evaluating the generated code, which is more suitable for the test harness in BigCodeBench.
+We inherit the design of the EvalPlus framework, which is a flexible and extensible evaluation framework for code generation tasks. However, BigCodeBench has the following differences:
+* Execution Environment: The execution environment in BigCodeBench is less bounded than EvalPlus to support tasks with diverse library dependencies.
+* Test Evaluation: BigCodeBench relies on `unittest` for evaluating the generated code, which is more suitable for the test harness in BigCodeBench.
 
 ## 🔥 Quick Start
 
 > [!Tip]
 >
-> WildCode ❤️ [bigcode-evaluation-harness](https://github.com/bigcode-project/bigcode-evaluation-harness)!
+> BigCodeBench ❤️ [bigcode-evaluation-harness](https://github.com/bigcode-project/bigcode-evaluation-harness)!
 > BigCodeBench will be integrated to bigcode-evaluation-harness, and you can also run it there!
 
 To get started, please first set up the environment:
@@ -68,7 +65,7 @@ pip install "git+https://github.com/bigcode-project/bigcodebench.git" --upgrade
 </div>
 </details>
 
-<details><summary>⏬ Using WildCode as a local repo? <i>:: click to expand ::</i></summary>
+<details><summary>⏬ Using BigCodeBench as a local repo? <i>:: click to expand ::</i></summary>
 <div>
 
 ```shell
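
For reference, the Quick Start section touched by this diff sets up the environment before anything else. A minimal setup sketch based on the install command visible in the last hunk header above; the PyPI variant is an assumption inferred from the README's pypi.org/project/bigcodebench badge, not something shown in this commit:

```shell
# Install the development version straight from the repository,
# exactly as shown in the hunk header above.
pip install "git+https://github.com/bigcode-project/bigcodebench.git" --upgrade

# Assumed alternative: install the released package from PyPI
# (inferred from the PyPI badge in the README header; not part of this diff).
pip install bigcodebench --upgrade
```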

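The renamed warning in the first hunk recommends running the evaluation inside a sandbox such as Docker, since BigCodeBench executes generated code in a loosely constrained environment. A rough sketch of that workflow, assuming a hypothetical `bigcodebench-sandbox` image and a `samples.jsonl` file of generated code (both placeholders, not part of this commit):

```shell
# Build a disposable image that has Python and the bigcodebench package installed
# (the Dockerfile itself is assumed and not part of this commit).
docker build -t bigcodebench-sandbox .

# Run the evaluation inside the container so untrusted generated code stays
# isolated from the host; only the working directory with samples.jsonl is mounted.
docker run --rm -it \
  -v "$(pwd):/workspace" \
  -w /workspace \
  bigcodebench-sandbox \
  bash
```

Dropping into `bash` here is deliberate: the actual evaluation command is not shown in this diff, so it is left to be run manually inside the container.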