Commit 9c09ce2

EYH0602 and Copilot authored
release 0.1.0: NeurIPS 2025 (#69)
* resturcture project
* fix: ollama new type api
* fix lint
* add ruff
* add ruff to ci
* refactor: llm api (#60)
  New LLM generation workflow.
* add an empty .env
* refactor OpenAI util class
* use new OpenAI client in main
* assume .env unchanged
* fix: response processing
* use new Gemini client in main
* enable reasoning effort from cli
* document why two gemini wrapper
* add Claude API
* add claude models to supported list
* handle UnionType for Literal ReasoningEffort
* add vLLM support and use it as default option
* fix: use vLLM chat interface instead of gen
* env add vllm api key
* add VLLM_HOST and VLLM_PORT
* add vllm server mode
* add vLLM in dependencies
* doc: instruct to run vllm from uv
* make deprecated ollama a standalone script
* doc: revise ollama
* use 3.12
* add Ollama models
* fix: ollama model name
* fix: ollama model name
* fix: Gemini use its own EFFORT_TOKEN_MAP
* remove unused imports
* fix: google-genai version
* fix: ci with uv run
* feat: load TF-Bench from HuggingFace by default (#61)
* update tfb to huggingface with base and pure splits
* feat: load tfbench from huggingface
* remove mandatory path
* avoid loading vLLM for now
* remove vLLM option in main
* feat: update response processing inside tfbench package (#62)
* answer cannot be None from LM
* move evaluation logic inside the tfbench package
* fix: orjson writes binary, error is not an option
* fix: use pure as parameter in main._eval
* feat: script to analysis saved generation results
* use orjsonl in main for consistancy
* feat: evaluation prove type equiv using TypeOperators (#64)
* fix: allow generation to fail
* remove unnecessary imports
* fix: OpenAI response add reasoning summary
* fix: load_gen_results_json type
* fix: analysis_saved script
* fix: evaluation benchmark name
* fix: OpenAI response API add summary
* use pydantic-v2
* extract incorrect task-answer pairs
* fix: groundtruth error (#63)
* fix: missing type class and typevar in benchmark
* fix: order of tasks in tfb
* fix: allow load_gen_results to load error
* remove error_cls unused imports
* extract type variables from source code
* add GHC type check by proving type equiv
* fix: cp -> process
* fix: API change for AST
* feat: type prover support new type definition
* test: ghc and type_util
* feat: use prover_evaluate for base split
* test: add real tfbench test cases, which the deprecated evaluation failed
* alt error to syntax parsing error
* feat: typeclass constrains reorder
* fix: AST.get_all_nodes_of_type ignores the root itself
* reorder_constraints using compiler frontend static analysis
* feat: add type definitions for pure tasks
* test: check type equivalence prover after rewriting mono types
* fix: handle type classes alone when ading new definitions
* feat: define new types automatically for pure tasks
* ghc prover remove standalone type class
* doc: detaile docstring for prover_evaluate
* script: analysis_saved run both split
* fix: experiment use prover_evaluate
* feat: error analysis with reasoning steps (#65)
* error analysis use prover
* error analysis script
* feat: record model name when doing error analysis
* add plot script for error analysis
* adjust row and column spacing
* update color map
* revise error_analysis default path
* test: list constructor
* remove tmp file
* fix: main missing pure parameter to
* error analysis only output category
* default error analysis model to gpt-5-mini
* adjust fontsize for 5 pies in a row
* doc: require GHC >= 9.2.1 for ImpredicativeTypes
* feat: add default option using transformers (#67)
* add transformers generation as default
* remove None option for router
* remove vllm option for ease of dependency
* Update src/tfbench/lm/_hf.py
  Co-authored-by: Copilot <[email protected]>
* Update src/tfbench/lm/_hf.py
  Co-authored-by: Copilot <[email protected]>
* remove unnecessary imports
---------
Co-authored-by: Copilot <[email protected]>
* doc: make readme and export clearer (#68)
* add transformers generation as default
* Update src/tfbench/lm/_hf.py
  Co-authored-by: Copilot <[email protected]>
* Update src/tfbench/lm/_hf.py
  Co-authored-by: Copilot <[email protected]>
* remove unnecessary imports
* doc: improve instructions
* fix: unused parameter and import
* enable github actions on main commits
* doc: add badges and images
---------
Co-authored-by: Copilot <[email protected]>
---------
Co-authored-by: Copilot <[email protected]>
1 parent 8c99702 commit 9c09ce2

72 files changed (+5024, -2160 lines changed)

.envrc

Lines changed: 0 additions & 1 deletion
This file was deleted.

.github/workflows/mypy.yml

Lines changed: 7 additions & 2 deletions
@@ -1,6 +1,9 @@
 name: MyPy Type Checking

-on: [pull_request]
+on:
+  push:
+    branches: [main]
+  pull_request:

 jobs:
   type-check:
@@ -17,6 +20,8 @@ jobs:
       - name: Set up Python
         run: uv sync

+      - name: install mypy
+        run: uv pip install mypy
+
       - name: Type Check Source Code
         run: uv run mypy src
-

.github/workflows/pylint.yml

Lines changed: 8 additions & 2 deletions
@@ -1,6 +1,9 @@
 name: Pylint Linting

-on: [pull_request]
+on:
+  push:
+    branches: [main]
+  pull_request:

 jobs:
   linting:
@@ -16,6 +19,9 @@ jobs:

       - name: Set up Python
         run: uv sync
+
+      - name: Install Pylint
+        run: uv pip install pylint

       - name: Lint Source Code
-        run: uv run pylint src
+        run: uv run pylint src

.github/workflows/ruff.yml

Lines changed: 27 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,27 @@
1+
name: Ruff Linting
2+
3+
on:
4+
push:
5+
branches: [main]
6+
pull_request:
7+
8+
jobs:
9+
linting:
10+
runs-on: ubuntu-latest
11+
12+
steps:
13+
- uses: actions/checkout@v2
14+
with:
15+
fetch-depth: 1
16+
17+
- name: Install uv
18+
uses: astral-sh/setup-uv@v5
19+
20+
- name: Set up Python
21+
run: uv sync
22+
23+
- name: install ruff
24+
run: uv tool install ruff@latest
25+
26+
- name: Lint Source Code
27+
run: uvx ruff check

.github/workflows/unitttest.yml

Lines changed: 5 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -1,6 +1,9 @@
11
name: Unit Testing
22

3-
on: [pull_request]
3+
on:
4+
push:
5+
branches: [main]
6+
pull_request:
47

58
jobs:
69
unittest:
@@ -18,4 +21,4 @@ jobs:
1821
run: uv sync
1922

2023
- name: Run Unit Tests
21-
run: uv run pytest
24+
run: uv run pytest -n auto
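To reproduce the four workflows locally before pushing, the `run:` steps from the diffs above can be replayed in order. The script below is a rough sketch under stated assumptions: the names `CI_COMMANDS` and `run_checks` are hypothetical and not part of the repository, and it assumes `uv` is installed and the script is run from the project root. Note that pytest's `-n auto` option comes from the pytest-xdist plugin, so the new test command presumably relies on that plugin being among the project's dependencies.

```python
import subprocess

# Commands copied from the `run:` steps of the workflow diffs in this commit.
CI_COMMANDS: list[list[str]] = [
    ["uv", "sync"],
    ["uv", "pip", "install", "mypy"],
    ["uv", "run", "mypy", "src"],
    ["uv", "pip", "install", "pylint"],
    ["uv", "run", "pylint", "src"],
    ["uv", "tool", "install", "ruff@latest"],
    ["uvx", "ruff", "check"],
    ["uv", "run", "pytest", "-n", "auto"],
]


def run_checks(runner=subprocess.run) -> list[list[str]]:
    """Run every CI command in order and return the ones that failed.

    `runner` is injectable so the function can be exercised without
    actually invoking uv (hypothetical design choice for this sketch).
    """
    failed: list[list[str]] = []
    for cmd in CI_COMMANDS:
        # check=False: keep going after a failure so all checks are reported.
        result = runner(cmd, check=False)
        if result.returncode != 0:
            failed.append(cmd)
    return failed
```

Calling `run_checks()` with no arguments executes the real commands; an empty return value means every step exited with status 0.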
