pip install -r requirements.txt
pip install -e .
```

### Requirements
Running the full evaluation scripts and the functionalities implemented here requires access to a commercial formal verification tool (Cadence Jasper).
The code assumes that PATH is set such that the Jasper binary can be invoked with the command `jg`:
``` {bash}
# check that Cadence Jasper is accessible globally
jg -no_gui
```
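For scripting around the flows, a quick programmatic version of the same check can be useful. The helper below is a hypothetical sketch (not part of this repository), using Python's standard PATH resolution:

```python
# Hypothetical helper (not part of this repo): verify that the Jasper
# binary 'jg' is reachable on PATH before launching any formal flow.
import shutil
from typing import Optional

def check_jasper(binary: str = "jg") -> Optional[str]:
    """Return the resolved path to the Jasper binary, or None if it is not on PATH."""
    return shutil.which(binary)

if __name__ == "__main__":
    path = check_jasper()
    if path is None:
        print("warning: 'jg' not found on PATH; evaluation steps that call Jasper will fail")
    else:
        print(f"jg found at {path}")
```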

## Running Benchmark Data Generation and Preprocessing

``` {bash}
# Run the following commands in the conda environment created above
# (1) run the full flow for NL2SVA-Human
bash scripts/run_nl2sva_human.sh {k: number of ICL examples to use}

# (2) run the full flow for NL2SVA-Machine
bash scripts/run_nl2sva_machine.sh {k: number of ICL examples to use}

# (3) run the full flow for Design2SVA
bash scripts/run_design2sva.sh {n: number of outputs to sample, with which pass@k (k<=n) is evaluated}
```
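The pass@k metric mentioned above is conventionally computed with the unbiased estimator from the Codex paper (Chen et al., 2021). A minimal sketch, not necessarily this repository's exact implementation: given n sampled outputs per problem of which c pass, pass@k = 1 - C(n-c, k) / C(n, k).

```python
# Unbiased pass@k estimator: probability that at least one of k samples
# drawn (without replacement) from n generations, c of which pass, passes.
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    if n - c < k:
        # fewer than k failures exist, so any k draws must include a success
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

print(pass_at_k(10, 3, 1))  # equals c/n = 0.3
```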

## Running Full Evaluation Suite for Each Sub-Benchmark
``` {bash}
# Run the following commands in the conda environment created above
# You can supply a list of models to test with the --models flag, with model names ;-separated
# Run with --debug to print all inputs to and outputs from the LM
# Change the LLM decoding temperature with the --temperature flag
# You can also see the flag options available for each run script by passing the '-h' flag

# Running LM inference on the NL2SVA-Machine (assertion generation from directed NL instructions) benchmark:
python run_nl2sva.py --mode machine --models "gpt-4;gpt-3.5-turbo" --num_icl {k: number of ICL examples to use}

# Running LM inference on the NL2SVA-Human (assertion generation from testbench and high-level instructions) benchmark:
python run_nl2sva.py --mode human --models "gpt-4;gpt-3.5-turbo" --num_icl {k: number of ICL examples to use}

# Running LM inference on the Design2SVA (SV testbench generation) benchmark:
python run_design2sva.py --models "mixtral-8x22b"
```
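The `--models` flag above takes a single `;`-separated string rather than repeated flags. A hypothetical sketch (not the repository's actual CLI code) of how such flags are typically parsed:

```python
# Hypothetical sketch of CLI parsing for --models (';'-separated names),
# --temperature, and --debug; not the repo's actual implementation.
import argparse

def parse_args(argv):
    parser = argparse.ArgumentParser(description="Run LM inference on a sub-benchmark")
    parser.add_argument("--models", type=str, default="gpt-4",
                        help="';'-separated list of model names to evaluate")
    parser.add_argument("--temperature", type=float, default=0.0,
                        help="LLM decoding temperature")
    parser.add_argument("--debug", action="store_true",
                        help="print all inputs to and outputs from the LM")
    args = parser.parse_args(argv)
    # split the ';'-separated string into individual model names
    args.model_list = [m.strip() for m in args.models.split(";") if m.strip()]
    return args

args = parse_args(["--models", "gpt-4;gpt-3.5-turbo"])
print(args.model_list)  # ['gpt-4', 'gpt-3.5-turbo']
```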

## Running LLM Generation Only on Each Sub-Benchmark
``` {bash}
# Run the following commands in the conda environment created above
# You can supply a list of models to test with the --models flag, with model names ;-separated
# Run with --debug to print all inputs to and outputs from the LM
# Change the LLM decoding temperature with the --temperature flag
# You can also see the flag options available for each run script by passing the '-h' flag

# Running LM inference on the NL2SVA-Machine (assertion generation from directed NL instructions) task:
python run_nl2sva.py --mode machine --models "gpt-4;gpt-3.5-turbo" --num_icl 3

# Running LM inference on the NL2SVA-Human (assertion generation from testbench and high-level instructions) task:
python run_nl2sva.py --mode human --models "gpt-4;gpt-3.5-turbo" --num_icl 3

# Running LM inference on the Design2SVA (SV testbench generation) task:
python run_design2sva.py --models "mixtral-8x22b"
```


## Repository Structure
Overview of the repository:
5774```
5875fv_eval/
5976├── fv_eval/
6077│ ├── benchmark_launcher.py (methods for consuming input bmark data and run LM inference)
6178│ ├── evaluation.py (methods for LM response evaluation)
62- │ ├── fv_tool_execution.py (methods for launching FV tools, i.e. JasperGold )
79+ │ ├── fv_tool_execution.py (methods for launching FV tools, i.e. Cadence Jasper )
6380│ ├── data.py (definitions for input/output data)
6481│ ├── prompts_*.py (default prompts for each subtask)
6582│ ├── utils.py (misc. util functions)
6683|
├── data_design2sva/
|   |── data/
│   |── generate_pipelines_design2sva.py
│   |── generate_fsm_design2sva.py
|
├── data_nl2sva/
│   |── annotated_instructions_with_signals/
│   |── annotated_tb/
│   |── data/
│   |── machine_tb/
│   |── generate_nl2sva_human.py
│   |── generate_nl2sva_machine.py
|
├── tool_scripts/
|   ├── pec/ (property equivalence check script)
|   |   |── pec.tcl
│   ├── run_jg_design2sva.tcl
│   ├── run_jg_nl2sva_human.tcl
│   ├── run_jg_nl2sva_machine.tcl
|
├── run_evaluation.py
├── run_design2sva.py
├── run_nl2sva.py
|
├── setup.py
└── README.md