pip install -r requirements.txt
pip install -e .
```

### Requirements
Running the full evaluation scripts and the functionalities implemented here requires access to a commercial formal verification tool (Cadence Jasper).
The code assumes that PATH is set such that the Jasper binary can be invoked with the command `jg`:
``` {bash}
# check that Cadence Jasper is accessible globally
jg -no_gui
```
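For scripting around the flows, a quick programmatic version of the same check can be useful. The helper below is a hypothetical sketch (not part of this repository), using Python's standard PATH resolution:

```python
# Hypothetical helper (not part of this repo): verify that the Jasper
# binary 'jg' is reachable on PATH before launching any formal flow.
import shutil
from typing import Optional

def check_jasper(binary: str = "jg") -> Optional[str]:
    """Return the resolved path to the Jasper binary, or None if it is not on PATH."""
    return shutil.which(binary)

if __name__ == "__main__":
    path = check_jasper()
    if path is None:
        print("warning: 'jg' not found on PATH; evaluation steps that call Jasper will fail")
    else:
        print(f"jg found at {path}")
```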

## Running Benchmark Data Generation and Preprocessing

``` {bash}
# Run the following commands in the conda environment created above
# (1) run the full flow for NL2SVA-Human
bash scripts/run_nl2sva_human.sh {k: number of ICL examples to use}

# (2) run the full flow for NL2SVA-Machine
bash scripts/run_nl2sva_machine.sh {k: number of ICL examples to use}

# (3) run the full flow for Design2SVA
bash scripts/run_design2sva.sh {n: number of outputs to sample, with which pass@k (k<=n) is evaluated}
```
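The pass@k metric mentioned above is conventionally computed with the unbiased estimator from the Codex paper (Chen et al., 2021). A minimal sketch, not necessarily this repository's exact implementation: given n sampled outputs per problem of which c pass, pass@k = 1 - C(n-c, k) / C(n, k).

```python
# Unbiased pass@k estimator: probability that at least one of k samples
# drawn (without replacement) from n generations, c of which pass, passes.
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    if n - c < k:
        # fewer than k failures exist, so any k draws must include a success
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

print(pass_at_k(10, 3, 1))  # equals c/n = 0.3
```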

## Running Full Evaluation Suite for Each Sub-Benchmark
``` {bash}
# Run the following commands in the conda environment created above
# You can supply a list of models to test with the --models flag, with model names ;-separated
# Run with --debug to print all inputs to and outputs from the LM
# Change the LLM decoding temperature with the --temperature flag
# You can also see the flag options available for each run script by passing the '-h' flag

# Running LM inference on the NL2SVA-Machine (assertion generation from directed NL instructions) benchmark:
python run_nl2sva.py --mode machine --models "gpt-4;gpt-3.5-turbo" --num_icl {k: number of ICL examples to use}

# Running LM inference on the NL2SVA-Human (assertion generation from testbench and high-level instructions) benchmark:
python run_nl2sva.py --mode human --models "gpt-4;gpt-3.5-turbo" --num_icl {k: number of ICL examples to use}

# Running LM inference on the Design2SVA (SV testbench generation) benchmark:
python run_design2sva.py --models "mixtral-8x22b"
```
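The `--models` flag above takes a single `;`-separated string rather than repeated flags. A hypothetical sketch (not the repository's actual CLI code) of how such flags are typically parsed:

```python
# Hypothetical sketch of CLI parsing for --models (';'-separated names),
# --temperature, and --debug; not the repo's actual implementation.
import argparse

def parse_args(argv):
    parser = argparse.ArgumentParser(description="Run LM inference on a sub-benchmark")
    parser.add_argument("--models", type=str, default="gpt-4",
                        help="';'-separated list of model names to evaluate")
    parser.add_argument("--temperature", type=float, default=0.0,
                        help="LLM decoding temperature")
    parser.add_argument("--debug", action="store_true",
                        help="print all inputs to and outputs from the LM")
    args = parser.parse_args(argv)
    # split the ';'-separated string into individual model names
    args.model_list = [m.strip() for m in args.models.split(";") if m.strip()]
    return args

args = parse_args(["--models", "gpt-4;gpt-3.5-turbo"])
print(args.model_list)  # ['gpt-4', 'gpt-3.5-turbo']
```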

## Running LLM Generation Only on Each Sub-Benchmark
``` {bash}
# Run the following commands in the conda environment created above
# You can supply a list of models to test with the --models flag, with model names ;-separated
# Run with --debug to print all inputs to and outputs from the LM
# Change the LLM decoding temperature with the --temperature flag
# You can also see the flag options available for each run script by passing the '-h' flag

# Running LM inference on the NL2SVA-Machine (assertion generation from directed NL instructions) task:
python run_nl2sva.py --mode machine --models "gpt-4;gpt-3.5-turbo" --num_icl 3

# Running LM inference on the NL2SVA-Human (assertion generation from testbench and high-level instructions) task:
python run_nl2sva.py --mode human --models "gpt-4;gpt-3.5-turbo" --num_icl 3

# Running LM inference on the Design2SVA (SV testbench generation) task:
python run_design2sva.py --models "mixtral-8x22b"
```


## Repository Structure
Overview of the repository:
5774```
5875fv_eval/
5976├── fv_eval/
6077│ ├── benchmark_launcher.py (methods for consuming input bmark data and run LM inference)
6178│ ├── evaluation.py (methods for LM response evaluation)
62- │ ├── fv_tool_execution.py (methods for launching FV tools, i.e. JasperGold )
79+ │ ├── fv_tool_execution.py (methods for launching FV tools, i.e. Cadence Jasper )
6380│ ├── data.py (definitions for input/output data)
6481│ ├── prompts_*.py (default prompts for each subtask)
6582│ ├── utils.py (misc. util functions)
6683|
├── data_design2sva/
|   |── data/
│   |── generate_pipelines_design2sva.py
│   |── generate_fsm_design2sva.py
|
├── data_nl2sva/
│   |── annotated_instructions_with_signals/
│   |── annotated_tb/
│   |── data/
│   |── machine_tb/
│   |── generate_nl2sva_human.py
│   |── generate_nl2sva_machine.py
|
├── tool_scripts/
|   ├── pec/ (property equivalence check script)
|   |   |── pec.tcl
│   ├── run_jg_design2sva.tcl
│   ├── run_jg_nl2sva_human.tcl
│   ├── run_jg_nl2sva_machine.tcl
|
├── run_evaluation.py
├── run_design2sva.py
├── run_nl2sva.py
|
├── setup.py
└── README.md