Commit ad69a25

docs: fix typos in README files (set up, evaluated, Together AI, spacing)
Parent: 89ae0a6

2 files changed: +6 -6 lines changed

README.md

Lines changed: 3 additions & 3 deletions
@@ -29,7 +29,7 @@ SciCode is a challenging benchmark designed to evaluate the capabilities of lang
 
 
 ## Dataset Creation
-SciCode sources challenging and realistic research-level coding problems across 6 natural science disciplines, covering a total of 16 subfields. Scicode mainly focuses on 1. Numerical methods 2.Simulation of systems 3. Scientific calculation. These are the tasks we believe require intense scientific knowledge and reasoning to optimally test LM’s science capability.
+SciCode sources challenging and realistic research-level coding problems across 6 natural science disciplines, covering a total of 16 subfields. SciCode mainly focuses on 1. Numerical methods 2. Simulation of systems 3. Scientific calculation. These are the tasks we believe require intense scientific knowledge and reasoning to optimally test LM’s science capability.
 
 ## 🏆 Leaderboard
 

@@ -59,12 +59,12 @@ SciCode sources challenging and realistic research-level coding problems across
 ## Instructions to evaluate a new model using `inspect_ai` (recommended)
 
 
-Scicode has been integrated with `inspect_ai` for easier and faster model evaluation. You need to run the following steps to run:
+SciCode has been integrated with `inspect_ai` for easier and faster model evaluation. You need to run the following steps to run:
 
 1. Clone this repository `git clone git@github.com:scicode-bench/SciCode.git`
 2. Install the `scicode` package with `pip install -e .`
 3. Download the [numeric test results](https://drive.google.com/drive/folders/1W5GZW6_bdiDAiipuFMqdUhvUaHIj6-pR?usp=drive_link) and save them as `./eval/data/test_data.h5`
-4. Go to the `eval/inspect_ai` directory, setup correspoinding API key, and run the following command:
+4. Go to the `eval/inspect_ai` directory, set up the corresponding API key, and run the following command:
 
 ```bash
 cd eval/inspect_ai
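
For orientation, the four numbered steps in the hunk above amount to roughly the following shell session (a sketch only: the `OPENAI_API_KEY` variable is an assumed example for an OpenAI-hosted model, and the final `inspect eval` command that step 4 refers to appears in the `eval/inspect_ai/README.md` diff below).

```bash
# Rough sketch of steps 1-4 above; the API key variable is an illustrative
# assumption, so export the key for whichever provider you plan to evaluate.
git clone git@github.com:scicode-bench/SciCode.git
cd SciCode
pip install -e .
# step 3: save the downloaded numeric test results as ./eval/data/test_data.h5
cd eval/inspect_ai
export OPENAI_API_KEY=<YOUR_API_KEY>
```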

eval/inspect_ai/README.md

Lines changed: 3 additions & 3 deletions
@@ -25,7 +25,7 @@ However, there are some additional command line arguments that could be useful a
 - `gold` mode can only be used on the validation set which loads the gold answer
 - `dummy` mode does not call any real LLMs and generates some dummy outputs
 
-For example, user can run five samples on the validation set with background as
+For example, users can run five samples on the validation set with background as
 
 ```bash
 inspect eval scicode.py \
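
The bash block truncated at the end of this hunk continues in the next hunk with `-T mode=normal`. A sketch of such an invocation might look like the following; the `--model` and `--limit` values are illustrative assumptions (standard `inspect_ai` CLI options), and the task arguments that select the validation split and background prompts are elided by the diff and not reproduced here.

```bash
# Illustrative sketch only: --model and --limit are assumed values for the
# "five samples" example; the remaining -T task arguments from the README are
# elided by the diff excerpt and omitted here.
inspect eval scicode.py \
    --model openai/gpt-4o \
    --limit 5 \
    -T mode=normal
```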
@@ -38,7 +38,7 @@ inspect eval scicode.py \
 -T mode=normal
 ```
 
-User can run the evaluation on `Deepseek-v3` using together ai via the following command:
+Users can run the evaluation on `Deepseek-v3` using Together AI via the following command:
 
 ```bash
 export TOGETHER_API_KEY=<YOUR_API_KEY>
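
The command that follows `export TOGETHER_API_KEY=<YOUR_API_KEY>` is cut off by the hunk boundary. A hypothetical completion, assuming `inspect_ai`'s `together/` provider prefix and Together AI's `deepseek-ai/DeepSeek-V3` model identifier (neither of which is shown in this diff), would be:

```bash
# Hypothetical completion of the truncated block above; the --model string for
# Deepseek-v3 on Together AI is an assumption, not taken from this diff.
export TOGETHER_API_KEY=<YOUR_API_KEY>
inspect eval scicode.py \
    --model together/deepseek-ai/DeepSeek-V3
```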
@@ -55,7 +55,7 @@ For more information regarding `inspect_ai`, we refer users to its [official doc
 
 ### Extra: How SciCode are Evaluated Under the Hood?
 
-During the evaluation, the sub-steps of each main problem of SciCode are passed in order to the evalauted LLM with necessary prompts and LLM responses for previous sub-steps. The generated Python code from LLM will be parsed and saved to disk, which will be used to run on test cases to determine the pass or fail for the sub-steps. The main problem will be considered as solved if the LLM can pass all sub-steps of the main problem.
+During the evaluation, the sub-steps of each main problem of SciCode are passed in order to the evaluated LLM with necessary prompts and LLM responses for previous sub-steps. The generated Python code from LLM will be parsed and saved to disk, which will be used to run on test cases to determine the pass or fail for the sub-steps. The main problem will be considered as solved if the LLM can pass all sub-steps of the main problem.
 
 ### Extra: Reproducibility of `inspect_ai` Integration
 We use the SciCode `inspect_ai` integration to evaluate OpenAI's GPT-4o, and we compare it with the original way of evaluation. Below shows the comparison of two ways of the evaluations.
