Commit 9c09ce2
release 0.1.0: NeurIPS 2025 (#69)
* restructure project
* fix: ollama new type api
* fix lint
* add ruff
* add ruff to ci
* refactor: llm api (#60)
New LLM generation workflow.
* add an empty .env
* refactor OpenAI util class
* use new OpenAI client in main
* assume .env unchanged
* fix: response processing
* use new Gemini client in main
* enable reasoning effort from cli
* document why there are two Gemini wrappers
* add Claude API
* add claude models to supported list
* handle UnionType for Literal ReasoningEffort
* add vLLM support and use it as default option
* fix: use vLLM chat interface instead of gen
* env add vllm api key
* add VLLM_HOST and VLLM_PORT
* add vllm server mode
* add vLLM in dependencies
* doc: instruct to run vllm from uv
* make deprecated ollama a standalone script
* doc: revise ollama
* use 3.12
* add Ollama models
* fix: ollama model name
* fix: ollama model name
* fix: Gemini use its own EFFORT_TOKEN_MAP
* remove unused imports
* fix: google-genai version
* fix: ci with uv run
* feat: load TF-Bench from HuggingFace by default (#61)
* update tfb to huggingface with base and pure splits
* feat: load tfbench from huggingface
* remove mandatory path
* avoid loading vLLM for now
* remove vLLM option in main
* feat: update response processing inside tfbench package (#62)
* answer cannot be None from LM
* move evaluation logic inside the tfbench package
* fix: orjson writes binary, error is not an option
* fix: use pure as parameter in main._eval
* feat: script to analyze saved generation results
* use orjsonl in main for consistency
* feat: evaluation prove type equiv using TypeOperators (#64)
* fix: allow generation to fail
* remove unnecessary imports
* fix: OpenAI response add reasoning summary
* fix: load_gen_results_json type
* fix: analysis_saved script
* fix: evaluation benchmark name
* fix: OpenAI response API add summary
* use pydantic-v2
* extract incorrect task-answer pairs
* fix: groundtruth error (#63)
* fix: missing type class and typevar in benchmark
* fix: order of tasks in tfb
* fix: allow load_gen_results to load error
* remove error_cls unused imports
* extract type variables from source code
* add GHC type check by proving type equiv
* fix: cp -> process
* fix: API change for AST
* feat: type prover support new type definition
* test: ghc and type_util
* feat: use prover_evaluate for base split
* test: add real tfbench test cases, which the deprecated evaluation failed
* alt error to syntax parsing error
* feat: typeclass constraints reorder
* fix: AST.get_all_nodes_of_type ignores the root itself
* reorder_constraints using compiler frontend static analysis
* feat: add type definitions for pure tasks
* test: check type equivalence prover after rewriting mono types
* fix: handle type classes alone when adding new definitions
* feat: define new types automatically for pure tasks
* ghc prover remove standalone type class
* doc: detailed docstring for prover_evaluate
* script: analysis_saved run both split
* fix: experiment use prover_evaluate
* feat: error analysis with reasoning steps (#65)
* error analysis use prover
* error analysis script
* feat: record model name when doing error analysis
* add plot script for error analysis
* adjust row and column spacing
* update color map
* revise error_analysis default path
* test: list constructor
* remove tmp file
* fix: main missing pure parameter to
* error analysis only output category
* default error analysis model to gpt-5-mini
* adjust fontsize for 5 pies in a row
* doc: require GHC >= 9.2.1 for ImpredicativeTypes
* feat: add default option using transformers (#67)
* add transformers generation as default
* remove None option for router
* remove vllm option for ease of dependency
* Update src/tfbench/lm/_hf.py
Co-authored-by: Copilot <[email protected]>
* Update src/tfbench/lm/_hf.py
Co-authored-by: Copilot <[email protected]>
* remove unnecessary imports
---------
Co-authored-by: Copilot <[email protected]>
* doc: make readme and export clearer (#68)
* add transformers generation as default
* Update src/tfbench/lm/_hf.py
Co-authored-by: Copilot <[email protected]>
* Update src/tfbench/lm/_hf.py
Co-authored-by: Copilot <[email protected]>
* remove unnecessary imports
* doc: improve instructions
* fix: unused parameter and import
* enable github actions on main commits
* doc: add badges and images
---------
Co-authored-by: Copilot <[email protected]>
---------
Co-authored-by: Copilot <[email protected]>
1 parent 8c99702 commit 9c09ce2
File tree
72 files changed, +5024 -2160 lines changed
- .github/workflows
- benchmark
- imgs
- scripts
- src
- hs_parser
- tfbench
- hs_parser
- lm
- tests