Merge pull request #8 from EngineeringSoftware/readme-update

pengyunie · web-flow · commit 2990f7bdef88 · 2020-11-23T10:18:33.000-06:00
Readme update
diff --git a/README.md b/README.md
@@ -8,6 +8,7 @@ section below.
 
 ## Requirements
 
+- Linux or macOS
 - [OCaml 4.07.1](https://ocaml.org)
 - [SerAPI 0.7.1](https://github.com/ejgallego/coq-serapi)
 - [Coq 8.10.2](https://coq.inria.fr/download)
@@ -16,7 +17,19 @@ section below.
 
 ## Installation
 
-### Installation of OCaml, Coq, and SerAPI
+You can install Roosterize from source code by cloning this GitHub
+repository and setting up the dependencies following steps 1 & 2.
+(Alternatively, you can download the a [binary
+distribution](https://github.com/EngineeringSoftware/roosterize/releases)
+which already contains the Python dependencies, and then you only need
+step 1.)
+
+```
+git clone https://github.com/EngineeringSoftware/roosterize.git
+cd roosterize
+```
+
+### 1. Installation of OCaml, Coq, and SerAPI
 
 We strongly recommend installing the required versions of OCaml, Coq,
 and SerAPI via the [OPAM package manager](https://opam.ocaml.org),
@@ -35,93 +48,94 @@ opam pin add coq 8.10.2
 opam pin add coq-serapi 8.10.0+0.7.1
 ```
 
-### Installation of PyTorch and Python libraries
+### 2. Installation of PyTorch and Python libraries
+
+We strongly recommend installing the required versions of Python and
+PyTorch using [Conda](https://docs.conda.io/en/latest/miniconda.html).
 
-We strongly recommend installing the required versions of Python and PyTorch
-using [Conda](https://docs.conda.io/en/latest/miniconda.html).
+To set up the Conda environment, use one of the following command
+suitable for your operating system and whether you want to use it on a
+CPU or GPU.
 
-First, create a Python 3.7 environment using Conda:
+- Linux, CPU:
 ```
-conda create --name roosterize python=3.7 pip
-conda activate roosterize
+conda env create --name roosterize --file conda-envs/cpu.yml
 ```
 
-Then, install PyTorch 1.1.0 by following the
-[official instructions](https://pytorch.org/get-started/previous-versions/#conda-5),
-using the command suitable for your operating system and whether you want to use
-it on a CPU or GPU. For example, the following command installs PyTorch for
-CPU-only use on Linux or Windows:
+- Linux, GPU w/ CUDA 10.0:
 ```
-conda install pytorch-cpu==1.1.0 torchvision-cpu==0.3.0 cpuonly -c pytorch
+conda env create --name roosterize --file conda-envs/gpu-cuda10.yml
 ```
-Finally, install the required Python libraries:
+
+- Linux, GPU w/ CUDA 9.0:
 ```
-conda install configargparse nltk future seutil==0.4.12 six torchtext==0.4.0 tqdm==4.30.*
+conda env create --name roosterize --file conda-envs/gpu-cuda9.yml
 ```
 
-### Installation of Roosterize and trained models
+- Mac, CPU:
+```
+conda env create --name roosterize --file conda-envs/mac-cpu.yml
+```
 
-To install Roosterize itself, clone the GitHub repository and enter the root directory:
+Finally, activate the Conda environment before using Roosterize:
 ```
-git clone https://github.com/EngineeringSoftware/roosterize.git
-cd roosterize
+conda activate roosterize
 ```
 
-Next, you need to obtain pre-trained models that capture
-naming conventions. We have trained several models using our
-[corpus][math-comp-corpus], which follows the conventions
-used in the [Mathematical Components][math-comp-website]
-family of projects. These models are available for separate
-[download][models-link].
+### Installation of trained models
+
+Next, you need to obtain a pre-trained model that capture naming
+conventions.  The default pre-trained model, which was trained using
+our [corpus][math-comp-corpus] and follows the conventions used in the
+[Mathematical Components][math-comp-website] family of projects, can
+be obtained by running the command:
 
-To use our pre-trained models, go to the Roosterize
-root directory, and download and unpack the archive (`models.tgz`):
 ```
-wget https://github.com/EngineeringSoftware/roosterize/releases/download/v8.10.0/models.tgz
-tar xzf models.tgz
+./bin/roosterize download_global_model
 ```
 
-[models-link]: https://github.com/EngineeringSoftware/roosterize/releases/download/v8.10.0/models.tgz
+The model will be downloaded to `$HOME/.roosterize/`. To use a
+different model (that we [released][latest-release] or you trained),
+simply put it in `$HOME/.roosterize/`.
 
 ## Usage
 
 To use Roosterize on a Coq verification project, you first need to
-build the Coq project using a command provided by the project
-(usually `make`). Then, at the root directory of
-the Roosterize project repository, run the command
-```
-python -m roosterize.main suggest_lemmas \
- --project=$PATH_TO_PROJECT \
- --serapi-options=$SERAPI_OPTIONS \
- --model-dir=./models/roosterize-ta \
- --output=./output
-```
-where `$PATH_TO_PROJECT` should be replaced with the path to the
-Coq project, and `$SERAPI_OPTIONS` should be replaced with the SerAPI
-command line options for mapping logical paths to directories
-(see [SerAPI's documentation][serapi-faq-link]). For example,
-if the logical path (inside Coq) for the project is `Verified`,
-you should set `SERAPI_OPTIONS="-R $PATH_TO_PROJECT,Verified"`.
-
-The above command extracts all lemmas from the project, uses Roosterize's
-pre-trained model (at `./models/roosterize-ta`) to predict a lemma name
-for each lemma, and finally prints the lemma name update suggestions,
-i.e., the predicted lemma names that are different from to the existing ones.
-Below is an example of printed suggestions:
-```
->>>>> Suggestions:
-infotheo/ecc_classic/bch.v: infotheo.ecc_classic.bch.BCH.BCH_PCM_altP1 -> inde_F2
-infotheo/ecc_classic/bch.v: infotheo.ecc_classic.bch.BCH.BCH_PCM_altP2 -> inde_mul
-infotheo/ecc_classic/bch.v: infotheo.ecc_classic.bch.BCH.PCM_altP -> F2_eq0
-...
+build the Coq project using a command provided by the project (usually
+`make`).  Then, run this command to get the lemma name suggestions for
+the lemmas in a Coq document (.v file):
+
+```
+python -m roosterize.main suggest_naming --file=PATH_TO_FILE
 ```
 
+Roosterize automatically infers the root directory of the project by
+finding `_CoqProject`, and reads `_CoqProject` to infer the SerAPI
+command line options (for mapping logical paths to directories, see
+[SerAPI's documentation][serapi-faq-link]).  If you don't have a
+`_CoqProject` file, you need to provide an additional argument
+`--project_root=PATH_TO_PROJECT_ROOT`.
+
+<!-- where `$PATH_TO_PROJECT` should be replaced with the path to the -->
+<!-- Coq project, and `$SERAPI_OPTIONS` should be replaced with the SerAPI -->
+<!-- command line options for mapping logical paths to directories -->
+<!-- (see [SerAPI's documentation][serapi-faq-link]). For example, -->
+<!-- if the logical path (inside Coq) for the project is `Verified`, -->
+<!-- you should set `SERAPI_OPTIONS="-R $PATH_TO_PROJECT,Verified"`. -->
+
+The above command extracts all lemmas from the file, uses the
+pre-trained model to generate likely lemma names for each lemma, and
+finally prints the lemma name update suggestions, i.e., the generated
+lemma names that are different from to the existing ones.  See an
+[example suggestion report](./docs/example-suggestion.txt).
+
 For other usages and command line interfaces of Roosterize, please
 check the help:
 ```
 python -m roosterize.main help
 ```
 
+[latest-release]: https://github.com/EngineeringSoftware/roosterize/releases/latest
 [serapi-faq-link]: https://github.com/ejgallego/coq-serapi/blob/v8.10/FAQ.md#does-serapi-support-coqs-command-line-flags
 
 ## Technique
diff --git a/docs/example-suggestion.txt b/docs/example-suggestion.txt
@@ -0,0 +1,108 @@
+$ ~/projects/roosterize/bin/roosterize suggest_naming --file=$PWD/finmap/finmap.v
+== Analyzed 110 lemma names, 8 (7.3%) conform to the learned naming conventions.
+==========
+== 21 can be improved and here are Roosterize's suggestions:
+Line 851: fcatsK => eq_fcat (likelihood: 0.45)
+Line 822: fcatC => eq_fcat (likelihood: 0.44)
+Line 862: fcatKs => eq_fcat (likelihood: 0.43)
+Line 1118: map_key_zip' => eq_zip (likelihood: 0.31)
+Line 1178: zip_supp' => eq_zip (likelihood: 0.31)
+Line 1258: zunit_fcat => zip_fcat (likelihood: 0.30)
+Line 769: disjC => eq_disj (likelihood: 0.30)
+Line 962: mapf_disj => eq_map (likelihood: 0.29)
+Line 526: fcats0 => fcat_nil (likelihood: 0.28)
+Line 1273: zunit_disj => disj_zip (likelihood: 0.27)
+Line 1186: zip_supp => eq_zip (likelihood: 0.27)
+Line 937: mapf_ins => map_ins (likelihood: 0.26)
+Line 525: fcat0s => fcat_nil (likelihood: 0.25)
+Line 443: seqof_ins => path_ordP (likelihood: 0.24)
+Line 945: mapf_fcat => map_fcat (likelihood: 0.23)
+Line 1139: zip_sorted' => map_zip (likelihood: 0.23)
+Line 1006: path_mapk => path_ord_supp (likelihood: 0.23)
+Line 953: mapf_disjL => eq_map (likelihood: 0.23)
+Line 158: fmapP => eq_fnd (likelihood: 0.22)
+Line 1075: zipC' => eq_zip (likelihood: 0.21)
+Line 982: foldf_ins => path_foldfmap (likelihood: 0.20)
+==========
+== 81 can be improved but Roosterize cannot provide good suggestion:
+Line 1082: zipA' (best guess: zip_bind; likelihood: 0.20)
+Line 716: kfilter_pred0 (best guess: kfilter0s; likelihood: 0.19)
+Line 462: path_supp_ins_inv (best guess: path_ord_supp; likelihood: 0.19)
+Line 719: kfilter_predT (best guess: kfilterTE; likelihood: 0.18)
+Line 1017: mapk_ins (best guess: path_ord; likelihood: 0.17)
+Line 528: fcat_inss (best guess: eq_fcat; likelihood: 0.17)
+Line 1194: zip_filter' (best guess: filter_zip; likelihood: 0.16)
+Line 447: path_supp_ins (best guess: path_ord1; likelihood: 0.16)
+Line 532: fcat_sins (best guess: eq_fcat; likelihood: 0.15)
+Line 209: suppE (best guess: eq_suppP; likelihood: 0.15)
+Line 1215: zip_fnd (best guess: eq_zip; likelihood: 0.14)
+Line 979: foldf_nil (best guess: foldfmap_nil; likelihood: 0.14)
+Line 606: supp_eq_ins (best guess: eq_supp; likelihood: 0.13)
+Line 549: fcat_rems (best guess: eq_fcat; likelihood: 0.13)
+Line 97: sorted_filter' (best guess: map_filter; likelihood: 0.13)
+Line 650: sorted_kfilter (best guess: map_kfilter; likelihood: 0.13)
+Line 665: kfilt_nil (best guess: nil_kfilter; likelihood: 0.12)
+Line 120: last_ins' (best guess: path_ord; likelihood: 0.12)
+Line 319: ins_ins (best guess: eq_ins; likelihood: 0.12)
+Line 86: sorted_ins' (best guess: map_ins; likelihood: 0.12)
+Line 657: supp_kfilt (best guess: supp_filter; likelihood: 0.12)
+Line 922: map_key_mapf (best guess: map_inj; likelihood: 0.11)
+Line 134: notin_path (best guess: path_ord; likelihood: 0.11)
+Line 903: feqP (best guess: feqmP; likelihood: 0.11)
+Line 1164: zip_unitE (best guess: existsP; likelihood: 0.11)
+Line 358: rem_empty (best guess: nil_rem; likelihood: 0.11)
+Line 233: supp_nilE (best guess: nilP; likelihood: 0.10)
+Line 518: fcat_nil' (best guess: fcatP; likelihood: 0.09)
+Line 253: fnd_empty (best guess: fnd_eq; likelihood: 0.09)
+Line 262: fnd_ins (best guess: eq_fnd; likelihood: 0.09)
+Line 830: fcatA (best guess: eq_fcat; likelihood: 0.09)
+Line 1153: zip_unit_sorted' (best guess: zip_unit; likelihood: 0.09)
+Line 709: kfilt_fcat (best guess: fcat_kfilter; likelihood: 0.08)
+Line 668: fnd_kfilt (best guess: fndP; likelihood: 0.08)
+Line 844: fcatCA (best guess: eq_fcat; likelihood: 0.08)
+Line 558: fcat_srem (best guess: eq_fcat; likelihood: 0.08)
+Line 621: fmap_ind2 (best guess: eq_supp; likelihood: 0.08)
+Line 415: cancel_ins (best guess: eq_ins; likelihood: 0.08)
+Line 1250: zunit_ins (best guess: zip_unit; likelihood: 0.07)
+Line 76: path_ins' (best guess: map_ord; likelihood: 0.07)
+Line 510: fcat_ins' (best guess: eq_ins'; likelihood: 0.07)
+Line 63: sorted_nil (best guess: ord_sorted; likelihood: 0.07)
+Line 776: disj_nil (best guess: disj_fin; likelihood: 0.07)
+Line 255: fnd_rem (best guess: optio_fnd; likelihood: 0.07)
+Line 142: path_supp_ord (best guess: path_ord; likelihood: 0.07)
+Line 56: fmapE (best guess: prodP; likelihood: 0.07)
+Line 1001: sorted_mapk (best guess: ord_supp; likelihood: 0.06)
+Line 839: fcatAC (best guess: eq_fcat; likelihood: 0.06)
+Line 793: disj_rem (best guess: eqseq_disj; likelihood: 0.05)
+Line 760: disjP (best guess: eq_spec; likelihood: 0.05)
+Line 722: kfilter_predI (best guess: kfilterI1; likelihood: 0.05)
+Line 925: sorted_map (best guess: map_sorted; likelihood: 0.05)
+Line 680: kfilt_ins (best guess: ins_kfilter; likelihood: 0.04)
+Line 726: kfilter_predU (best guess: kfilter_fcat; likelihood: 0.04)
+Line 1206: zip_rem (best guess: fin_zip; likelihood: 0.04)
+Line 586: supp_fcat (best guess: eq_fcat; likelihood: 0.04)
+Line 1115: zip_unitL' (best guess: zip_unit; likelihood: 0.04)
+Line 236: supp_rem (best guess: meC_supp1; likelihood: 0.04)
+Line 277: ins_rem (best guess: eq_ins; likelihood: 0.04)
+Line 1028: map_id (best guess: fin_mapk; likelihood: 0.04)
+Line 698: kfilt_rem (best guess: rem_kfilter; likelihood: 0.03)
+Line 1126: zip_unitE' (best guess: eq_zipP; likelihood: 0.03)
+Line 370: rem_ins (best guess: eq_rem; likelihood: 0.03)
+Line 1033: map_comp (best guess: mapkC_comp; likelihood: 0.03)
+Line 800: disj_remE (best guess: eq_disj; likelihood: 0.03)
+Line 870: disj_kfilt (best guess: eqseq_kfilter; likelihood: 0.03)
+Line 1267: zunit_supp (best guess: zip_unit; likelihood: 0.03)
+Line 243: supp_ins (best guess: mort_supp; likelihood: 0.03)
+Line 689: rem_kfilt (best guess: rem_kfilter; likelihood: 0.03)
+Line 779: disj_ins (best guess: eqseq_disj; likelihood: 0.03)
+Line 471: fmap_ind' (best guess: fin_supp; likelihood: 0.03)
+Line 150: notin_filter (best guess: filter_predk; likelihood: 0.03)
+Line 411: fnd_supp_in (best guess: mex_exists; likelihood: 0.03)
+Line 191: suppP (best guess: supp_spec; likelihood: 0.03)
+Line 361: rem_rem (best guess: eqseq_rem; likelihood: 0.02)
+Line 1247: zunit0 (best guess: zip_comp; likelihood: 0.02)
+Line 124: first_ins' (best guess: map_ord; likelihood: 0.02)
+Line 181: predkN (best guess: prod0_predk; likelihood: 0.02)
+Line 398: rem_supp (best guess: min_rem; likelihood: 0.02)
+Line 483: fmap_ind'' (best guess: min_ord; likelihood: 0.01)
+Line 881: in_disj_kfilt (best guess: in_supp; likelihood: 0.01)
diff --git a/roosterize/interface/CommandLineInterface.py b/roosterize/interface/CommandLineInterface.py
@@ -61,7 +61,7 @@ def __init__(self):
         self.exclude_files = None
         self.exclude_pattern = None
         self.serapi_options = None
-        self.model_url = "https://github.com/EngineeringSoftware/roosterize/releases/download/v1.1.0+8.10.0/roosterize-model-t1.tgz"
+        self.model_url = "https://github.com/EngineeringSoftware/roosterize/releases/download/v1.1.0+8.10.2-beta/roosterize-model-t1.tgz"
         self.compile_cmd = None
         self.loaded_config_prj: Path = None
 
@@ -241,15 +241,15 @@ def report_predictions(self, data: ProcessedFile, candidates_logprobs: List[List
         # Print suggestions
         total = len(good_names) + len(bad_names_and_suggestions) + len(bad_names_no_suggestion)
         print(f"== Analyzed {total} lemma names, "
-              f"{len(good_names)} ({len(good_names)/total:.1%}) look good.")
+              f"{len(good_names)} ({len(good_names)/total:.1%}) conform to the learned naming conventions.")
         if len(bad_names_and_suggestions) > 0:
             print(f"==========")
             print(f"== {len(bad_names_and_suggestions)} can be improved and here are Roosterize's suggestions:")
             for lemma, suggestion, score in sorted(bad_names_and_suggestions, key=lambda x: x[2], reverse=True):
                 print(f"Line {lemma.vernac_command[0].lineno}: {lemma.name} => {suggestion} (likelihood: {score:.2f})")
         if len(bad_names_no_suggestion) > 0:
             print(f"==========")
-            print(f"== {len(bad_names_no_suggestion)} can be improved but Roosterize cannot provide good suggestion (please consider improve_project_model):")
+            print(f"== {len(bad_names_no_suggestion)} can be improved but Roosterize cannot provide good suggestion:")
             for lemma, suggestion, score in sorted(bad_names_no_suggestion, key=lambda x: x[2], reverse=True):
                 print(f"Line {lemma.vernac_command[0].lineno}: {lemma.name} (best guess: {suggestion}; likelihood: {score:.2f})")
 
diff --git a/roosterize/interface/VSCodeInterface.py b/roosterize/interface/VSCodeInterface.py
@@ -56,7 +56,7 @@ def report_predictions(self, data: ProcessedFile, candidates_logprobs: List[List
 
         total = len(good_names) + len(bad_names_and_suggestions) + len(bad_names_no_suggestion)
         self.show_message(f"{data.path}: Analyzed {total} lemma names, "
-                          f"{len(good_names)} ({len(good_names)/total:.1%}) look good. "
+                          f"{len(good_names)} ({len(good_names)/total:.1%}) conform to the learned naming conventions. "
                           f"Roosterize made {len(bad_names_and_suggestions)} suggestions.")
 
         # Publish suggestions as diagnostics