diff --git a/doc/book/PoP-in-FStar/LICENSE b/doc/book/PoP-in-FStar/LICENSE new file mode 100644 index 00000000000..261eeb9e9f8 --- /dev/null +++ b/doc/book/PoP-in-FStar/LICENSE @@ -0,0 +1,201 @@ + Apache License + Version 2.0, January 2004 + http://www.apache.org/licenses/ + + TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION + + 1. Definitions. + + "License" shall mean the terms and conditions for use, reproduction, + and distribution as defined by Sections 1 through 9 of this document. + + "Licensor" shall mean the copyright owner or entity authorized by + the copyright owner that is granting the License. + + "Legal Entity" shall mean the union of the acting entity and all + other entities that control, are controlled by, or are under common + control with that entity. For the purposes of this definition, + "control" means (i) the power, direct or indirect, to cause the + direction or management of such entity, whether by contract or + otherwise, or (ii) ownership of fifty percent (50%) or more of the + outstanding shares, or (iii) beneficial ownership of such entity. + + "You" (or "Your") shall mean an individual or Legal Entity + exercising permissions granted by this License. + + "Source" form shall mean the preferred form for making modifications, + including but not limited to software source code, documentation + source, and configuration files. + + "Object" form shall mean any form resulting from mechanical + transformation or translation of a Source form, including but + not limited to compiled object code, generated documentation, + and conversions to other media types. + + "Work" shall mean the work of authorship, whether in Source or + Object form, made available under the License, as indicated by a + copyright notice that is included in or attached to the work + (an example is provided in the Appendix below). 
+ + "Derivative Works" shall mean any work, whether in Source or Object + form, that is based on (or derived from) the Work and for which the + editorial revisions, annotations, elaborations, or other modifications + represent, as a whole, an original work of authorship. For the purposes + of this License, Derivative Works shall not include works that remain + separable from, or merely link (or bind by name) to the interfaces of, + the Work and Derivative Works thereof. + + "Contribution" shall mean any work of authorship, including + the original version of the Work and any modifications or additions + to that Work or Derivative Works thereof, that is intentionally + submitted to Licensor for inclusion in the Work by the copyright owner + or by an individual or Legal Entity authorized to submit on behalf of + the copyright owner. For the purposes of this definition, "submitted" + means any form of electronic, verbal, or written communication sent + to the Licensor or its representatives, including but not limited to + communication on electronic mailing lists, source code control systems, + and issue tracking systems that are managed by, or on behalf of, the + Licensor for the purpose of discussing and improving the Work, but + excluding communication that is conspicuously marked or otherwise + designated in writing by the copyright owner as "Not a Contribution." + + "Contributor" shall mean Licensor and any individual or Legal Entity + on behalf of whom a Contribution has been received by Licensor and + subsequently incorporated within the Work. + + 2. Grant of Copyright License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + copyright license to reproduce, prepare Derivative Works of, + publicly display, publicly perform, sublicense, and distribute the + Work and such Derivative Works in Source or Object form. + + 3. 
Grant of Patent License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + (except as stated in this section) patent license to make, have made, + use, offer to sell, sell, import, and otherwise transfer the Work, + where such license applies only to those patent claims licensable + by such Contributor that are necessarily infringed by their + Contribution(s) alone or by combination of their Contribution(s) + with the Work to which such Contribution(s) was submitted. If You + institute patent litigation against any entity (including a + cross-claim or counterclaim in a lawsuit) alleging that the Work + or a Contribution incorporated within the Work constitutes direct + or contributory patent infringement, then any patent licenses + granted to You under this License for that Work shall terminate + as of the date such litigation is filed. + + 4. Redistribution. You may reproduce and distribute copies of the + Work or Derivative Works thereof in any medium, with or without + modifications, and in Source or Object form, provided that You + meet the following conditions: + + (a) You must give any other recipients of the Work or + Derivative Works a copy of this License; and + + (b) You must cause any modified files to carry prominent notices + stating that You changed the files; and + + (c) You must retain, in the Source form of any Derivative Works + that You distribute, all copyright, patent, trademark, and + attribution notices from the Source form of the Work, + excluding those notices that do not pertain to any part of + the Derivative Works; and + + (d) If the Work includes a "NOTICE" text file as part of its + distribution, then any Derivative Works that You distribute must + include a readable copy of the attribution notices contained + within such NOTICE file, excluding those notices that do not + pertain to any part of the Derivative 
Works, in at least one + of the following places: within a NOTICE text file distributed + as part of the Derivative Works; within the Source form or + documentation, if provided along with the Derivative Works; or, + within a display generated by the Derivative Works, if and + wherever such third-party notices normally appear. The contents + of the NOTICE file are for informational purposes only and + do not modify the License. You may add Your own attribution + notices within Derivative Works that You distribute, alongside + or as an addendum to the NOTICE text from the Work, provided + that such additional attribution notices cannot be construed + as modifying the License. + + You may add Your own copyright statement to Your modifications and + may provide additional or different license terms and conditions + for use, reproduction, or distribution of Your modifications, or + for any such Derivative Works as a whole, provided Your use, + reproduction, and distribution of the Work otherwise complies with + the conditions stated in this License. + + 5. Submission of Contributions. Unless You explicitly state otherwise, + any Contribution intentionally submitted for inclusion in the Work + by You to the Licensor shall be under the terms and conditions of + this License, without any additional terms or conditions. + Notwithstanding the above, nothing herein shall supersede or modify + the terms of any separate license agreement you may have executed + with Licensor regarding such Contributions. + + 6. Trademarks. This License does not grant permission to use the trade + names, trademarks, service marks, or product names of the Licensor, + except as required for reasonable and customary use in describing the + origin of the Work and reproducing the content of the NOTICE file. + + 7. Disclaimer of Warranty. 
Unless required by applicable law or + agreed to in writing, Licensor provides the Work (and each + Contributor provides its Contributions) on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or + implied, including, without limitation, any warranties or conditions + of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A + PARTICULAR PURPOSE. You are solely responsible for determining the + appropriateness of using or redistributing the Work and assume any + risks associated with Your exercise of permissions under this License. + + 8. Limitation of Liability. In no event and under no legal theory, + whether in tort (including negligence), contract, or otherwise, + unless required by applicable law (such as deliberate and grossly + negligent acts) or agreed to in writing, shall any Contributor be + liable to You for damages, including any direct, indirect, special, + incidental, or consequential damages of any character arising as a + result of this License or out of the use or inability to use the + Work (including but not limited to damages for loss of goodwill, + work stoppage, computer failure or malfunction, or any and all + other commercial damages or losses), even if such Contributor + has been advised of the possibility of such damages. + + 9. Accepting Warranty or Additional Liability. While redistributing + the Work or Derivative Works thereof, You may choose to offer, + and charge a fee for, acceptance of support, warranty, indemnity, + or other liability obligations and/or rights consistent with this + License. However, in accepting such obligations, You may act only + on Your own behalf and on Your sole responsibility, not on behalf + of any other Contributor, and only if You agree to indemnify, + defend, and hold each Contributor harmless for any liability + incurred by, or claims asserted against, such Contributor by reason + of your accepting any such warranty or additional liability. 
+ + END OF TERMS AND CONDITIONS + + APPENDIX: How to apply the Apache License to your work. + + To apply the Apache License to your work, attach the following + boilerplate notice, with the fields enclosed by brackets "[]" + replaced with your own identifying information. (Don't include + the brackets!) The text should be enclosed in the appropriate + comment syntax for the file format. We also recommend that a + file or class name and description of purpose be included on the + same "printed page" as the copyright notice for easier + identification within third-party archives. + + Copyright [yyyy] [name of copyright owner] + + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. diff --git a/doc/book/PoP-in-FStar/README.md b/doc/book/PoP-in-FStar/README.md new file mode 100644 index 00000000000..104fc4696c3 --- /dev/null +++ b/doc/book/PoP-in-FStar/README.md @@ -0,0 +1,33 @@ +# PoP-in-FStar +The Proof-oriented Programming in F* Book + + +# Build + +To set up an environment to build the book, you will need python3, sphinx, and the sphinx_rtd_theme. 
+ +Once you have python3, on Ubuntu-24, install using: + +* sudo apt install python3-sphinx +* sudo apt install python3-sphinx-rtd-theme + +Then, see book/Makefile: + +- set FSTAR_HOME, if the default is not correct +- likewise, set PULSE_HOME + +Then, + +* make -C book prep #to copy source files from F* and Pulse into the book build folders +* make -C book html +* make -C book pdf + +You should have book/_build/html/index.html + + +# Deploy + + +Set FSTARLANG_ORG_ROOT in book/Makefile + +* make deploy diff --git a/doc/book/PoP-in-FStar/book/.gitignore b/doc/book/PoP-in-FStar/book/.gitignore new file mode 100644 index 00000000000..b567567d98d --- /dev/null +++ b/doc/book/PoP-in-FStar/book/.gitignore @@ -0,0 +1,4 @@ +__pycache__ +code +_build +*~ \ No newline at end of file diff --git a/doc/book/PoP-in-FStar/book/IncrPair.fst b/doc/book/PoP-in-FStar/book/IncrPair.fst new file mode 100644 index 00000000000..629175ce07c --- /dev/null +++ b/doc/book/PoP-in-FStar/book/IncrPair.fst @@ -0,0 +1,38 @@ +module IncrPair +open Steel.Memory +open Steel.Effect +open Steel.Reference +open Steel.FractionalPermission +open Steel.Effect.Atomic +open FStar.Ghost +assume +val pts_to (#a:Type u#0) + (r:ref a) + ([@@@ smt_fallback] v:a) + : slprop u#1 + +assume +val read (#a:Type) (#v:Ghost.erased a) (r:ref a) + : Steel a (pts_to r v) (fun x -> pts_to r x) + (requires fun _ -> True) + (ensures fun _ x _ -> x == Ghost.reveal v) + +assume +val write (#a:Type0) (#v:Ghost.erased a) (r:ref a) (x:a) + : SteelT unit (pts_to r v) + (fun _ -> pts_to r x) + +// +let incr (#v:Ghost.erased int) (r:ref int) () + : SteelT unit (pts_to r v) + (fun _ -> pts_to r (v + 1)) + = let x = read r in + write #_ #(Ghost.hide x) r (x + 1); + change_slprop (pts_to r (x + 1)) (pts_to r (v + 1)) (fun _ -> ()) + +//SNIPPET_START: par_incr +let par_incr (#v0 #v1:erased int) (r0 r1:ref int) + : SteelT _ (pts_to r0 v0 `star` pts_to r1 v1) + (fun _ -> pts_to r0 (v0 + 1) `star` pts_to r1 (v1 + 1)) + = par (incr r0) (incr 
r1) +//SNIPPET_END: par_incr diff --git a/doc/book/PoP-in-FStar/book/Makefile b/doc/book/PoP-in-FStar/book/Makefile new file mode 100644 index 00000000000..e60ff8a33c8 --- /dev/null +++ b/doc/book/PoP-in-FStar/book/Makefile @@ -0,0 +1,56 @@ +# Minimal makefile for Sphinx documentation +# + +# This book lives at doc/book/PoP-in-FStar/book/ within the FStar repo +export FSTAR_HOME?=$(realpath ../../../..) +export PULSE_HOME?=$(FSTAR_HOME)/pulse + +FSTARLANG_ORG_ROOT?=www + +# You can set these variables from the command line. +SPHINXOPTS = +SPHINXBUILD = sphinx-build +SPHINXPROJ = FStarBook +SOURCEDIR = . +BUILDDIR = _build + +prep: + mkdir -p code + ln -sfn $(realpath $(FSTAR_HOME)/doc/book/code)/* code/ + ln -sfn $(realpath $(PULSE_HOME)/share/pulse/examples/by-example) code/pulse + + +html: Makefile + @$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) + find _build -name "*.html" | xargs sed -i 's/_static/static/g' + rm -rf _build/html/static + mv _build/html/_static _build/html/static + +LATEXFILE=proof-orientedprogramminginf.tex + +pdf: + @$(SPHINXBUILD) -M latex "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) + sed -i -e 's|\\chapter|\\part|g' $(BUILDDIR)/latex/$(LATEXFILE) + sed -i -e 's|\\section|\\chapter|g' $(BUILDDIR)/latex/$(LATEXFILE) + sed -i -e 's|\\subsection|\\section|g' $(BUILDDIR)/latex/$(LATEXFILE) + sed -i -e 's|\\subsubsection|\\subsection|g' $(BUILDDIR)/latex/$(LATEXFILE) + sed -i -e 's|\\sphinxhref{../code/|\\sphinxhref{https://fstar-lang.org/tutorial/book/code/|g' $(BUILDDIR)/latex/$(LATEXFILE) + sed -i -e 's|\\part{Structure of this book}|\\begin{center}\\bigskip{\\Large \\textbf{Structure of this book}}\\bigskip\\end{center}|g' $(BUILDDIR)/latex/$(LATEXFILE) + $(MAKE) -C $(BUILDDIR)/latex + +# Put it first so that "make" without argument is like "make help". 
+help: + @$(SPHINXBUILD) -M help "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) + +deploy: + rm -rf $(FSTARLANG_ORG_ROOT)/tutorial/book/ + cp -R _build/html $(FSTARLANG_ORG_ROOT)/tutorial/book/ + cp -R code $(FSTARLANG_ORG_ROOT)/tutorial/book/ + cp _build/latex/proof-orientedprogramminginf.pdf $(FSTARLANG_ORG_ROOT)/tutorial/proof-oriented-programming-in-fstar.pdf + +.PHONY: help Makefile + +# Catch-all target: route all unknown targets to Sphinx using the new +# "make mode" option. $(O) is meant as a shortcut for $(SPHINXOPTS). +%: Makefile + @$(SPHINXBUILD) -M $@ "$(SOURCEDIR)" "$(BUILDDIR)" $(SPHINXOPTS) $(O) diff --git a/doc/book/PoP-in-FStar/book/MemCpy.c b/doc/book/PoP-in-FStar/book/MemCpy.c new file mode 100644 index 00000000000..194dab7a576 --- /dev/null +++ b/doc/book/PoP-in-FStar/book/MemCpy.c @@ -0,0 +1,3 @@ +void alloc_copy_free() { + //TODO +} diff --git a/doc/book/PoP-in-FStar/book/MemCpy.fst b/doc/book/PoP-in-FStar/book/MemCpy.fst new file mode 100644 index 00000000000..b32bf92abec --- /dev/null +++ b/doc/book/PoP-in-FStar/book/MemCpy.fst @@ -0,0 +1,66 @@ +(* + Copyright 2008-2018 Microsoft Research + + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. 
+*) +module MemCpy +open Demo.Deps + +(** + A demo of F*, Low* and KaRaMeL + Copying a buffer of bytes +**) + + +// SNIPPET_START: memcpy +(* + ``memcpy len cur src dest`` is an imperative procedure + to copy the contents of the ``src`` array into ``dest`` +*) +let rec memcpy + (len:uint32) //a 32-bit unsigned integer + (cur:uint32{cur <= len}) //current position cur is no more than len + (src dest:larray len uint8) //two arrays whose length is len + : ST unit //A stateful computation, that can read, write, allocate or free memory + (requires fun h -> //with a precondition on the initial state h + live h src /\ //expects src to be live + live h dest /\ //expects dest to be live + disjoint src dest /\ //and the two do not overlap in memory + prefix_equal h src dest cur) //their contents are initially equal up to cur + (ensures fun h0 _ h1 -> //and a postcondition relating their initial and final states + modifies1 dest h0 h1 /\ //modifies only the dest array + prefix_equal h1 src dest len) //and src and dest are equal up to len + = (* The implementation of the function begins here *) + if cur < len + then ( + dest.(cur) <- src.(cur); //copy the cur byte + memcpy len (cur + 1ul) src dest //recurse + ) +// SNIPPET_END: memcpy + + +// SNIPPET_START: alloc_copy_free +let alloc_copy_free + (len:uint32) + (src:lbuffer len uint8) + : ST (lbuffer len uint8) + (requires fun h -> + live h src) + (ensures fun h0 dest h1 -> + live h1 dest /\ + equal h0 src h1 dest) + = let dest = alloc len 0uy in + memcpy len 0ul src dest; + free src; + dest +// SNIPPET_END: alloc_copy_free diff --git a/doc/book/PoP-in-FStar/book/_templates/layout.html b/doc/book/PoP-in-FStar/book/_templates/layout.html new file mode 100644 index 00000000000..00b05a6de76 --- /dev/null +++ b/doc/book/PoP-in-FStar/book/_templates/layout.html @@ -0,0 +1,7 @@ +{% extends '!layout.html' %} +{% block document %} +{{super()}} + + + +{% endblock %} diff --git a/doc/book/PoP-in-FStar/book/_templates/page.html 
b/doc/book/PoP-in-FStar/book/_templates/page.html new file mode 100644 index 00000000000..0707f2a5eb5 --- /dev/null +++ b/doc/book/PoP-in-FStar/book/_templates/page.html @@ -0,0 +1,14 @@ +{% extends "!page.html" %} + +{% block footer %} + +{% endblock %} diff --git a/doc/book/PoP-in-FStar/book/conf.py b/doc/book/PoP-in-FStar/book/conf.py new file mode 100644 index 00000000000..99819275de8 --- /dev/null +++ b/doc/book/PoP-in-FStar/book/conf.py @@ -0,0 +1,137 @@ +# -*- coding: utf-8 -*- +# +# Configuration file for the Sphinx documentation builder. +# +# This file does only contain a selection of the most common options. For a +# full list see the documentation: +# http://www.sphinx-doc.org/en/stable/config + +# -- Path setup -------------------------------------------------------------- + +# If extensions (or modules to document with autodoc) are in another directory, +# add these directories to sys.path here. If the directory is relative to the +# documentation root, use os.path.abspath to make it absolute, like shown here. +# +import os +import sys + +sys.path.insert(0, os.path.abspath('.')) +import fstar_pygments +import smt2_pygments +from sphinx.highlighting import lexers + +lexers['fstar'] = fstar_pygments.CustomLexer() +lexers['smt2'] = smt2_pygments.CustomLexer() +lexers['pulse'] = fstar_pygments.PulseLexer() + +def setup(app): + app.add_css_file('custom.css') + +# -- Project information ----------------------------------------------------- + +project = u'Proof-Oriented Programming in F*' +copyright = u'2020, Microsoft Research' +author = u'Nikhil Swamy, Guido Martínez, and Aseem Rastogi' + +# The short X.Y version +version = u'' +# The full version, including alpha/beta/rc tags +release = u'' + + +# -- General configuration --------------------------------------------------- + +# If your documentation needs a minimal Sphinx version, state it here. +# +# needs_sphinx = '1.0' + +# Add any Sphinx extension module names here, as strings. 
They can be +# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom +# ones. +extensions = [ + 'sphinx.ext.mathjax', +] + +# Add any paths that contain templates here, relative to this directory. +templates_path = ['_templates'] + +# The suffix(es) of source filenames. +# You can specify multiple suffix as a list of string: +# +# source_suffix = ['.rst', '.md'] +source_suffix = '.rst' + +# The master toctree document. +master_doc = 'index' + +# The language for content autogenerated by Sphinx. Refer to documentation +# for a list of supported languages. +# +# This is also used if you do content translation via gettext catalogs. +# Usually you set "language" from the command line for these cases. +language = 'en' + +# List of patterns, relative to source directory, that match files and +# directories to ignore when looking for source files. +# This pattern also affects html_static_path and html_extra_path . +exclude_patterns = [u'_build', 'Thumbs.db', '.DS_Store'] + +# The name of the Pygments (syntax highlighting) style to use. +pygments_style = 'sphinx' + + +# -- Options for HTML output ------------------------------------------------- + +# The theme to use for HTML and HTML Help pages. See the documentation for +# a list of builtin themes. +# +html_theme = 'sphinx_rtd_theme' + +# Theme options are theme-specific and customize the look and feel of a theme +# further. For a list of options available for each theme, see the +# documentation. +# +# html_theme_options = {} + +# Add any paths that contain custom static files (such as style sheets) here, +# relative to this directory. They are copied after the builtin static files, +# so a file named "default.css" will overwrite the builtin "default.css". +html_static_path = ['static'] + +# Custom sidebar templates, must be a dictionary that maps document names +# to template names. +# +# The default sidebars (for documents that don't match any pattern) are +# defined by theme itself. 
Builtin themes are using these templates by +# default: ``['localtoc.html', 'relations.html', 'sourcelink.html', +# 'searchbox.html']``. +# +# html_sidebars = {} + +html_show_sourcelink = False + +# -- Options for HTMLHelp output --------------------------------------------- + +# Output file base name for HTML help builder. +htmlhelp_basename = 'FStarDoc' + + +# -- Options for LaTeX output ------------------------------------------------ + +latex_elements = { + # The paper size ('letterpaper' or 'a4paper'). + # + # 'papersize': 'letterpaper', + + # The font size ('10pt', '11pt' or '12pt'). + # + # 'pointsize': '10pt', + + # Additional stuff for the LaTeX preamble. + # + # 'preamble': '', + + # Latex figure (float) alignment + # + # 'figure_align': 'htbp', +} diff --git a/doc/book/PoP-in-FStar/book/fstar_pygments.py b/doc/book/PoP-in-FStar/book/fstar_pygments.py new file mode 100644 index 00000000000..e248f5dd411 --- /dev/null +++ b/doc/book/PoP-in-FStar/book/fstar_pygments.py @@ -0,0 +1,131 @@ +from pygments.lexer import RegexLexer, words +from pygments.token import * + +fstar_keywords = ( + 'attributes' , + 'noeq' , + 'unopteq' , + 'and' , + 'assert' , + 'assert_norm' , + 'assume' , + 'begin' , + 'by' , + 'calc' , + 'class' , + 'decreases' , + 'Dv' , + 'effect' , + 'eliminate' , + 'else' , + 'end' , + 'ensures' , + 'exception' , + 'exists' , + 'false' , + 'friend' , + 'forall' , + 'fun' , + 'function' , + 'GTot' , + 'if' , + 'in' , + 'include' , + 'inline' , + 'inline_for_extraction' , + 'instance' , + 'introduce' , + 'irreducible', + 'let' , + 'logic' , + 'match' , + 'module' , + 'new' , + 'new_effect' , + 'layered_effect' , + 'polymonadic_bind' , + 'polymonadic_subcomp' , + 'SMTPat' , + 'noextract', + 'of' , + 'open' , + 'opaque' , + 'private' , + 'range_of' , + 'rec' , + 'reifiable' , + 'reify' , + 'reflectable', + 'requires' , + 'returns' , + 'set_range_of', + 'sub_effect' , + 'synth' , + 'then' , + 'total' , + 'Tot' , + 'true' , + 'try' , + 'type' , + 
'unfold' , + 'unfoldable' , + 'val' , + 'when' , + 'with' , + '_' , + 'Lemma' , +) + +# very rough lexer; not 100% precise +class CustomLexer(RegexLexer): + name = 'FStar' + aliases = ['fstar'] + filenames = ['*.fst', '*.fsti'] + tokens = { + 'root': [ + (r' ', Text), + (r'\n', Text), + (r'\r', Text), + (r'//.*\n', Comment), + (r'\([*]([^*]|[*]+[^)])*[*]+\)', Comment), + (words(fstar_keywords, suffix=r'\b'), Keyword), + (r'[a-zA-Z_0-9]+', Text), + (r'.', Text), + ] + } + +pulse_keywords = ( + "fn", + "fold", + "rewrite", + "each", + "mut", + "ghost", + "atomic", + "show_proof_state", + "while", + "invariant", + "with_invariants", + "opens", + "parallel" +) + +class PulseLexer(RegexLexer): + name = 'Pulse' + aliases = ['pulse'] + filenames = ['*.fst', '*.fsti'] + tokens = { + 'root': [ + (r' ', Text), + (r'\n', Text), + (r'\r', Text), + (r'//.*\n', Comment), + (r'\([*]([^*]|[*]+[^)])*[*]+\)', Comment), + (words(fstar_keywords, suffix=r'\b'), Keyword), + (words(pulse_keywords, suffix=r'\b'), Keyword), + (r'[a-zA-Z_]+', Text), + (r'.', Text), + ] + } + +#class CustomFormatter: diff --git a/doc/book/PoP-in-FStar/book/index.rst b/doc/book/PoP-in-FStar/book/index.rst new file mode 100644 index 00000000000..8a52ce31395 --- /dev/null +++ b/doc/book/PoP-in-FStar/book/index.rst @@ -0,0 +1,58 @@ +.. The main file for the F* manual + You can adapt this file completely to your liking, but it should at least + contain the root `toctree` directive. + +.. + developed at `Microsoft Research + `_, `MSR-Inria + `_, and `Inria `_. + + +.. role:: smt2(code) + :language: smt2 + +Proof-oriented Programming in F* +================================ + + +F* is a dependently typed programming language and proof +assistant. 
This book describes how to use F* for *proof-oriented +programming*, a paradigm in which one co-designs programs and proofs +to provide mathematical guarantees about various aspects of a +program's behavior, including properties like functional correctness +(precisely characterizing the input/output behavior of a program), +security properties (e.g., ensuring that a program never leaks certain +secrets), and bounds on resource usage. + +Although a functional programming language at its core, F* promotes +programming in a variety of paradigms, including programming with +pure, total functions, low-level programming in imperative languages +like C and assembly, concurrent programming with shared memory and +message-passing, and distributed programming. Because all of these +paradigms are built on top of F*'s expressive, dependently typed core +logic, no matter which one you choose, proof-oriented programming in +F* enables constructing programs with proofs that they behave as +intended. + +**A note on authorship**: Many people have contributed to the +development of F* over the past decade. Many parts of this book, too, +are based on research papers, libraries, code samples, and language +features co-authored with several other people. However, the +presentation here, including especially any errors or oversights, is +due to the authors. That said, contributions are most welcome and we +hope this book will soon include chapters authored by others. + + + +.. 
toctree:: + :maxdepth: 2 + :caption: Contents: + + structure + intro + part1/part1 + part2/part2 + part3/part3 + part4/part4 + part5/part5 + pulse/pulse + under_the_hood/under_the_hood diff --git a/doc/book/PoP-in-FStar/book/intro.rst b/doc/book/PoP-in-FStar/book/intro.rst new file mode 100644 index 00000000000..d2443c3a567 --- /dev/null +++ b/doc/book/PoP-in-FStar/book/intro.rst @@ -0,0 +1,491 @@ +############ +Introduction +############ + +A Capsule Summary of F* +----------------------- + +F* is a dependently typed programming language that aims to play +several roles: + +* A general-purpose programming language, which encourages + higher-order functional programming with effects, in the tradition + of the ML family of languages. + +* A compiler, which translates F* programs to OCaml or F#, and even C + or Wasm, for execution. + +* A proof assistant, in which to state and prove properties of + programs. + +* A program verification engine, leveraging SMT solvers to partially + automate proofs of programs. + +* A metaprogramming system, supporting the programmatic construction + of F* programs and proof automation procedures. + +To achieve these goals, the design of F* revolves around a few key +elements, described below. Not all of this may make sense to +you---that's okay, you'll learn about it as we go. + +* A core language of total functions with full dependent types, + including an extensional form of type conversion, indexed inductive + types, and pattern matching, recursive functions with semantic + termination checking, dependent refinement types and subtyping, and + polymorphism over a predicative hierarchy of universes. 
+ +* A system of user-defined indexed effects, for modeling, + encapsulating, and statically reasoning about various forms of + computational effects, including a primitive notion of general + recursion and divergence, as well as an open system of user-defined + effects, with examples including state, exceptions, concurrency, + algebraic effects, and several others. + +* A built-in encoding of a classical fragment of F*'s logic into the + first-order logic of an SMT solver, allowing many proofs to be + automatically discharged. + +* A reflection within F* of the syntax and proof state of F*, enabling + Meta-F* programs to manipulate F* syntax and proof goals and for + users to build proofs interactively with tactics. + + +DSLs Embedded in F* +~~~~~~~~~~~~~~~~~~~ + +In practice, rather than a single language, the F* ecosystem is a +collection of domain-specific languages (DSLs). A common use of F* is +to embed within it programming languages at different levels of +abstraction or for specific programming tasks, and for the embedded +language to be engineered with domain-specific reasoning, proof +automation, and compilation backends. Some examples include: + +* Low*, a shallowly embedded DSL for sequential programming against a + C-like memory model including explicit memory management on the + stack and heap; a Hoare logic for partial correctness based on + implicit dynamic frames; and a custom backend (Karamel) to compile + Low* programs to C for further compilation by off-the-shelf C + compilers. + +* EverParse, a shallow embedding of a DSL (layered on top of the Low* + DSL) of parser and serializer combinators, for low-level binary + formats. + +* Vale, a deeply embedded DSL for structured programming in a + user-defined assembly language, with a Hoare logic for total + correctness, and a printer to emit verified programs in an assembly + syntax compatible with various standard assemblers. 
+ +* Steel, a shallow embedding of concurrency as an effect in F*, with + an extensible concurrent separation logic for partial correctness as + a core program logic, and proof automation built using a combination + of Meta-F* tactics, higher-order unification, and SMT. + +* Pulse, a successor of Steel, a DSL with a custom syntax and + typechecking algorithm, providing proofs in a small but highly + expressive core logic for mutable state and concurrency called + PulseCore, formalized entirely in terms of pure and ghost functions + in F*. + +.. _Intro_Vec: + +To get a taste of F*, let's dive right in with some examples. At this +stage, we don't expect you to understand these examples in detail, +though they should give you a flavor of what is possible with F*. + +F* is a dependently typed language +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Dependently typed programming enables one to more precisely capture +properties and invariants of a program using types. Here's a classic +example: the type ``vec a n`` represents an ``n``-dimensional vector +of ``a``-typed elements; or, more simply, a list of ``n`` values each +of type ``a``. Like other dependently typed languages, F* supports +inductive definitions of types. + +.. literalinclude:: code/Vec.fst + :language: fstar + :start-after: SNIPPET_START: vec + :end-before: SNIPPET_END: vec + +Operations on vectors can be given types that describe their +behavior in terms of vector lengths. + +For example, here's a recursive function ``append`` to concatenate two +vectors. Its type shows that the resulting vector has a length that is +the sum of the lengths of the input vectors. + +.. literalinclude:: code/Vec.fst + :language: fstar + :start-after: SNIPPET_START: append + :end-before: SNIPPET_END: append + +Of course, once a function like ``append`` is defined, it can be used +to define other operations, and its type helps in proving further +properties. 
For example, it's easy to show that reversing a vector
+does not change its length.
+
+.. literalinclude:: code/Vec.fst
+   :language: fstar
+   :start-after: SNIPPET_START: reverse
+   :end-before: SNIPPET_END: reverse
+
+Finally, to get an element from a vector, one can program a selector
+whose type also includes a *refinement type* to specify that the index
+``i`` is less than the length of the vector.
+
+.. literalinclude:: code/Vec.fst
+   :language: fstar
+   :start-after: SNIPPET_START: get
+   :end-before: SNIPPET_END: get
+
+While examples like this can be programmed in other dependently typed
+languages, they can often be tedious, due to various technical
+restrictions. F* provides a core logic with a more flexible notion of
+equality to make programming and proving easier. For now, a takeaway
+is that dependently typed programming patterns that are `quite
+technical in other languages
+`_ are often
+fairly natural in F*. You'll learn more about this in :ref:`a later
+chapter `.
+
+
+F* supports user-defined effectful programming
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+While functional programming is at the heart of the language, F* is
+about more than just pure functions. In fact, F* is a Turing-complete
+language. That this is even worth mentioning may come as a surprise to
+readers with a background in general-purpose programming languages
+like C# or Scala, but not all dependently typed languages are Turing
+complete, since nontermination can break soundness. However, F*
+supports general recursive functions and nontermination in a safe
+manner, without compromising soundness.
+
+Beyond nontermination, F* supports a system of user-defined
+computational effects which can be used to model a variety of
+programming idioms, including things like mutable state, exceptions,
+concurrency, IO, etc.
+
+Below is some code in an F* dialect called Low*
+which provides a sequential, imperative C-like programming model with
+mutable memory.
The function ``malloc_copy_free`` allocates an array
+``dest``, copies the contents of an array of bytes ``src`` into
+``dest``, deallocates ``src``, and returns ``dest``.
+
+.. literalinclude:: code/MemCpy.fst
+   :language: fstar
+   :start-after: SNIPPET_START: malloc_copy_free
+   :end-before: SNIPPET_END: malloc_copy_free
+
+It'll take us until much later to explain this code in
+full detail, but here are two main points to take away:
+
+  * The type signature of the procedure claims that under specific
+    constraints on a caller, ``malloc_copy_free`` is *safe* to execute
+    (e.g., it does not read outside the bounds of allocated memory)
+    and that it is *correct* (i.e., that it successfully copies
+    ``src`` to ``dest`` without modifying any other memory).
+
+  * Given the implementation of a procedure, F* actually builds a
+    mathematical proof that it is safe and correct with respect to its
+    signature.
+
+While other program verifiers offer features similar to what we've
+used here, a notable thing about F* is that the semantics of programs
+with side effects (like reading and writing memory) is entirely
+encoded within F*'s logic using a system of user-defined effects.
+
+Whereas ``malloc_copy_free`` is programmed in Low* and specified using
+a particular kind of `Floyd-Hoare logic
+`_, there's nothing really
+special about it in F*.
+
+Here, for example, is a concurrent program in another user-defined F*
+dialect called Steel. It increments two heap-allocated
+references in parallel and is specified for safety and correctness in
+`concurrent separation logic
+`_, a different kind
+of Floyd-Hoare logic than the one we used for ``malloc_copy_free``.
+
+.. literalinclude:: IncrPair.fst
+   :language: fstar
+   :start-after: SNIPPET_START: par_incr
+   :end-before: SNIPPET_END: par_incr
+
+As an F* user, you can choose a programming model and a suite of
+program proof abstractions to match your needs.
You'll learn more
+about this in the section on :ref:`user-defined effects `.
+
+.. _Part1_symbolic_computation:
+
+F* proofs use SMT solving, symbolic computation and tactics
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Stating a theorem or lemma in F* amounts to declaring a type
+signature, and doing a proof corresponds to providing an
+implementation of that signature. Proving theorems can take a fair
+bit of work by a human and F* seeks to reduce that burden, using a
+variety of techniques.
+
+**SMT Solving**
+
+Proving even a simple program often involves proving dozens or
+hundreds of small facts, e.g., proving that bounded arithmetic doesn't
+overflow, or that ill-defined operations like division by zero never
+occur. All these little proofs can quickly overwhelm a user.
+
+The main workhorse for proofs in F* is an automated theorem prover,
+known as a *Satisfiability Modulo Theories*, or SMT, solver. The F*
+toolchain integrates the `Z3 SMT Solver
+`_.
+
+By default, the F* typechecker collects all the facts that must be
+proven in a program and encodes them to the SMT solver, an engine that
+is capable of solving problems in various combinations of mathematical
+logics---F* encodes problems to Z3 in a combination of first-order
+logic, with uninterpreted functions and integer arithmetic.
+
+Z3 is remarkably effective at solving the kinds of problems that F*
+generates for it. The result is that some F* programs enjoy a high
+level of automation, e.g., in ``memcpy``, we specified a pre- and
+postcondition and a loop invariant, and the system took care of all
+the remaining proofs.
+
+You'll learn more about how to leverage Z3 to prove theorems in F*
+in :ref:`this chapter `.
+
+That said, Z3 cannot solve all problems that F* feeds to it. As such,
+F* offers several other mechanisms with varying levels of user
+control.
+
+
+**Symbolic computation**
+
+SMT solvers are great at proofs that involve equational rewriting, but
+many proofs can be done simply by computation. In fact, proofs by
+computation are a distinctive feature of many dependently typed
+languages and F* is no exception.
+
+As a very simple example, consider proving that ``pow2 12 == 4096``,
+where ``pow2`` is the recursive function shown below.
+
+.. literalinclude:: code/Vec.fst
+   :language: fstar
+   :start-after: SNIPPET_START: norm_spec
+   :end-before: SNIPPET_END: norm_spec
+
+An easy way to convince F* of this fact is to ask it (using
+``normalize_term_spec``) to simply compute the result of ``pow2 12``
+on an interpreter that's part of the F* toolchain, which it can do
+instantly, rather than relying on an SMT solver's expensive equational
+machinery to encode the reduction of a recursive function.
+
+This reduction machinery (called the *normalizer*) is capable not only
+of fully computing terms like ``pow2 12`` to a result, but it can also
+partially reduce symbolic F* terms, as shown in the proof below.
+
+.. literalinclude:: code/Vec.fst
+   :language: fstar
+   :start-after: SNIPPET_START: trefl
+   :end-before: SNIPPET_END: trefl
+
+The proof invokes the F* normalizer from a tactic called ``T.trefl``,
+another F* feature that we'll review quickly, next.
+
+**Tactics and Metaprogramming**
+
+Finally, for complete control over a proof, F* includes a powerful
+tactic and metaprogramming system.
+
+Here's an example of an interactive proof of a simple fact about
+propositions using F* tactics.
+
+.. literalinclude:: code/Vec.fst
+   :language: fstar
+   :start-after: SNIPPET_START: tac
+   :end-before: SNIPPET_END: tac
+
+This style of proof is similar to what you might find in systems like
+Coq or Lean. An F* tactic is just an F* program that can manipulate F*
+proof states.
In this case, to prove the theorem
+``a ==> b ==> (b /\ a)``, we issue commands that transform the proof
+state by applying the rules of propositional logic, building a
+proof of the theorem.
+
+Tactics are an instance of a more general metaprogramming system in
+F*, which allows an F* program to generate other F* programs.
+
+
+F* programs compile to OCaml, F#, C and Wasm
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Of course, you'll want a way to actually execute the programs you
+write. For this, F* provides several ways to compile a program to
+other languages for execution, including support to compile programs
+to OCaml, F#, C and Wasm.
+
+As such, a common way to use F* is to develop critical components of
+larger software systems in it, use its proof-oriented facilities to
+obtain assurances about those components, and then to integrate those
+formally proven components into a larger system by compiling the F*
+program to C, OCaml, or F# and linking the pieces together.
+
+In this case, using a tool called `KaRaMeL
+`_, a compiler used with F*, we
+can produce the following C code for ``memcpy``.
+
+.. literalinclude:: code/out/MemCpy.c
+   :language: c
+   :start-after: SNIPPET_START: malloc_copy_free
+   :end-before: SNIPPET_END: malloc_copy_free
+
+Notice that the code we get contains no additional runtime checks: the
+detailed requires and ensures clauses are all gone and what's left is
+just plain C code. Later we'll see how to actually write loops, so
+that you're not left with recursive functions in C. The point is that
+all the proof and specification effort is done before the program is
+compiled, imposing no runtime overhead at all.
+
+To F*, or not to F*?
+~~~~~~~~~~~~~~~~~~~~
+
+We've quickly seen a bit of what F* has to offer---that may have been
+a bit overwhelming if you're new to program proofs. So, you may be
+wondering now about whether it's worth learning F* or not. Here are
+some things to consider.
+
+If you like programming and want to get better at it, no matter what
+your level is, learning about program proofs will help. Proving a
+program, or even just writing down a specification for it, forces you
+to think about aspects of your program that you may never have
+considered before. There are many excellent resources available to
+learn about program proofs, using a variety of other tools, including
+some of the following:
+
+  * `Software Foundations
+    `_: A comprehensive
+    overview of programming language semantics and formal proofs in
+    the Coq proof assistant.
+
+  * `A Proof Assistant for Higher-Order Logic
+    `_: A tutorial on the
+    Isabelle/HOL proof assistant.
+
+  * `Certified Programming with Dependent Types
+    `_: Provides an introduction to
+    proof engineering in Coq.
+
+  * `Type-driven Development
+    `_:
+    Introduces the use of dependent types to develop programs
+    correctly in Idris.
+
+  * `Theorem Proving in Lean
+    `_: This is
+    the standard reference for learning about the Lean theorem prover,
+    though there are several other `resources
+    `_ too.
+
+  * `Dafny resources
+    `_: A different
+    flavor than all of the above, Dafny is an SMT-powered program
+    verifier for imperative programs.
+
+  * `Liquid Haskell
+    `_: This
+    tutorial showcases proving programs with refinement types.
+
+All of these are excellent resources and each tool has unique
+offerings. This book about F* offers a few unique things too. We
+discuss a few pros and cons, next.
+
+**Dependent Types and Extensionality**
+
+F*'s dependent types are similar in expressiveness to Coq, Lean, Agda,
+or Idris, i.e., the expressive power allows formalizing nearly all
+kinds of mathematics. What sets F* apart from these other languages
+(and makes it more like Nuprl) is its extensional notion of type
+equality, making many programming patterns significantly smoother in
+F* (cf. the :ref:`vector ` example). However, this design also makes
+typechecking in F* undecidable.
The practical consequence of this is
+that the F* typechecker can time out and refuse to accept your
+program. Other dependently typed languages have decidable
+typechecking, though they can, in principle, take arbitrarily long to
+decide whether or not your program is type correct.
+
+**A Variety of Proof Automation Tools**
+
+F*'s use of an SMT solver for proof automation is unique among
+languages with dependent types, though in return, one needs to also
+trust the combination of F* and Z3 to believe in the validity of an F*
+proof. Isabelle/HOL provides similar SMT-assisted automation (in its
+Sledgehammer tool), for the weaker logic provided by HOL, though
+Sledgehammer's design ensures that the SMT solver need not be
+trusted. F*'s use of SMT is also similar to what program verifiers
+like Dafny and Liquid Haskell offer. However, unlike their SMT-only
+proof strategies, F*, like Coq and Lean, also provides symbolic
+reduction, tactics, and metaprogramming. That said, F*'s tactic and
+metaprogramming engines are less mature than those of other systems
+where tactics are the primary way of conducting proofs.
+
+**A Focus on Programming**
+
+Other dependently typed languages shine in their usage in formalizing
+mathematics---Lean's `mathlib
+`_ and Coq's
+`Mathematical Components `_ are two
+great examples. In comparison, to date, relatively little pure
+mathematics has been formalized in F*. Rather, F*, with its focus on
+effectful programming and compilation to mainstream languages like C,
+has been used to produce industrial-grade high-assurance software,
+deployed in settings like the `Windows
+`_
+and `Linux `_ kernels, among `many
+others `_.
+
+**Maturity and Community**
+
+Isabelle/HOL and Coq are mature tools that have been developed and
+maintained for many decades, have strong user communities in academia,
+and many sources of documentation. Lean's community is growing fast
+and also has excellent tools and documentation.
F* is less mature: its
+design has been the subject of several research papers, making it
+somewhat more experimental. The F* community is also smaller, its
+documentation is more sparse, and F* users are usually in relatively
+close proximity to the F* development team. However, F* developments
+also have a good and growing track record of industrial adoption.
+
+
+A Bit of F* History
+~~~~~~~~~~~~~~~~~~~
+
+F* is an open source project on `GitHub
+`_, developed by researchers at a number of
+institutions, including `Microsoft Research
+`_, `MSR-Inria
+`_, `Inria `_,
+`Rosario `_, and `Carnegie-Mellon `_.
+
+**The name** The F in F* is a homage to System F
+(https://en.wikipedia.org/wiki/System_F), which was the base calculus
+of an early version of F*. We've moved beyond it for some years now,
+however. The F part of the name is also derived from several prior
+languages that many authors of F* worked on, including `Fable
+`_, `F7
+`_,
+`F9
+`_,
+`F5
+`_,
+`FX
+`_,
+and even `F# `_.
+
+The "\*" was meant as a kind of fixpoint operator, and F* was meant to
+be a sort of fixpoint of all those languages. The first version of F*
+also had affine types, and part of the intention then was to use
+affine types to encode separation logic---so the "\*" was also meant
+to evoke the separation logic "\*". But the early affine versions of
+F* never really did have separation logic. It took almost a decade to
+have a separation logic embedded in F* (see Steel),
+though without relying on affine types.
diff --git a/doc/book/PoP-in-FStar/book/notes b/doc/book/PoP-in-FStar/book/notes
new file mode 100644
index 00000000000..f97b42411e8
--- /dev/null
+++ b/doc/book/PoP-in-FStar/book/notes
@@ -0,0 +1,438 @@
+Comparing F* to other program verifiers
+.......................................
+
+If you're coming to F* having learned about other SMT-backed
+verification-oriented languages like `Dafny `_ or `Vcc `_,
+you might be wondering if F* is really any different.
Here are some
+points of similarity and contrast.
+
+**User-defined language abstractions**
+
+Perhaps the biggest difference with other program verifiers is that,
+rather than offering a fixed set of constructs in which to specify and
+verify a program, F* offers a framework in which users can design
+their own abstractions, often at the level of a domain-specific
+language, in which to build their programs and proofs.
+
+More concretely, ``memcpy`` is programmed in a user-defined language
+embedded in F* called :ref:`Low* `, which targets a sequential,
+imperative C-like programming model with mutable heap- and
+stack-allocated memory.
+
+There's nothing particularly special about Low*.
+
+
+The signature of ``memcpy``
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The type signature of ``memcpy`` is very detailed, specifying many
+properties about the safety and correctness of ``memcpy``, much more
+than in most other languages.
+
+**The arguments of ``memcpy``**
+
+  * ``len``, a 32 bit unsigned integer, or a ``uint32``
+
+  * ``cur``, also a ``uint32``, representing the current iteration
+    index, but with a constraint requiring it to be bounded by ``len``.
+
+  * ``src`` and ``dest``, pointers to arrays of bytes (``uint8``),
+    both with length at least ``len``.
+
+**The return type and effect**
+
+The next line ``ST unit`` states that ``memcpy`` is a function that
+may, as a side effect, read, write, allocate or deallocate memory, and
+returns a value of type ``unit``---if the ``unit`` type is unfamiliar
+to you, from a C or Java programmer's perspective, think of it as
+returning ``void``.
+
+**The precondition**
+
+Now we get to the really interesting part, the ``requires`` and
+``ensures`` clauses, describing the pre- and postconditions of
+``memcpy``.
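+
+Putting the clauses described below together, the overall shape of
+such a signature might look like the following sketch. The names
+``live``, ``disjoint``, ``prefix_equal``, and ``modifies1`` are taken
+from the prose, but ``lbuffer`` and the precise library types and
+operators are assumptions here, not the actual API:
+
+.. code-block:: fstar
+
+   val memcpy (len:uint32) (cur:uint32{cur <=^ len})
+              (src:lbuffer uint8 len) (dest:lbuffer uint8 len)
+     : ST unit
+       (requires fun h ->
+          live h src /\ live h dest /\
+          disjoint src dest /\
+          prefix_equal h src dest cur)
+       (ensures fun h0 _ h1 ->
+          modifies1 dest h0 h1 /\
+          prefix_equal h1 src dest len)
+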
In order to safely invoke ``memcpy``, a caller must prove
+the following properties when the current state of the program is
+``h``:
+
+  * ``live h src``: The ``src`` array has been allocated in memory and
+    not deallocated yet. This is to ensure that ``memcpy`` does not
+    attempt to read memory that is not currently allocated, protecting
+    against common violations of `memory safety
+    `_, like
+    `use-after-free bugs
+    `_.
+
+  * ``live h dest``: Likewise, the ``dest`` array is also allocated
+    and not deallocated yet.
+
+  * ``disjoint src dest``: The ``src`` and ``dest`` arrays should
+    point to non-overlapping arrays in memory---if they did not, then
+    writing to the ``dest`` array could overwrite the contents of the
+    ``src`` array.
+
+  * ``prefix_equal h src dest cur``: The contents of the ``src`` and
+    ``dest`` arrays are equal up to the current iteration index
+    ``cur``.
+
+**The postcondition**
+
+Finally, the ``ensures`` clause describes what ``memcpy`` guarantees,
+by relating the contents of the memory state ``h0`` when ``memcpy``
+was called to the memory state ``h1`` at the time ``memcpy`` returns.
+
+  * ``modifies1 dest h0 h1`` guarantees that ``memcpy`` only modified
+    the ``dest`` array
+
+  * ``prefix_equal h1 src dest len`` guarantees that the ``src`` and
+    ``dest`` arrays have equal contents all the way up to ``len``
+
+The implementation and proof of ``memcpy``
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The signature of ``memcpy`` is significantly longer than its
+implementation---that's because many aspects of safety are left
+implicit in the implementation and, in this simple case, the
+implementation of ``memcpy`` is really quite simple. It just checks if
+the ``cur`` index is still less than the length of the array, and, if
+so, copies one byte over and recurses while advancing the ``cur``
+position.
+
+What's left implicit here is a proof that ``memcpy`` actually does
+obey its signature.
F* actually builds a mathematical proof behind the
+scenes that ``memcpy`` is safe and correct with respect to its
+specification. In this case, that proof is done by F*'s typechecker,
+which makes use of an automated theorem prover called `Z3
+`_.
+As such, if you're willing to trust the
+implementations of F* and Z3, you can be confident that ``memcpy``
+does exactly what its specification states, i.e., that the signature
+of ``memcpy`` is a *mathematical theorem* about all its executions.
+
+Compiling ``memcpy`` for execution
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+F* provides several ways to compile a program to other languages for
+execution, including support to compile programs to OCaml, F#, C and Wasm.
+
+In this case, using a tool called `KaRaMeL
+`_, a compiler used with F*, we
+get the following C code for ``memcpy``.
+
+.. literalinclude:: MemCpy.c
+   :language: c
+
+Notice that the code we get contains no additional runtime checks: the
+detailed requires and ensures clauses are all gone and what's left is
+just plain C code. Later we'll see how to actually write loops, so
+that you're not left with recursive functions in C. The point is that
+all the proof and specification effort is done before the program is
+compiled, imposing no runtime overhead at all.
+
+
+..
+
+   It is closely related to several other languages, including other
+   dependently typed languages like `Coq `_, `Agda
+   `_, `Idris `_, and `Lean `_, and other SMT-based program
+   verification engines, like `Dafny `_.
+
+What makes F* unique is a combination of several elements.
+
+* Unlike most other dependently typed languages, F* is Turing complete
+  and has a notion of user-defined effects. It encourages higher-order
+  functional programming, including general recursion as well as a
+  user-extensible system of computational effects powerful enough to
+  express mutable state, exceptions, continuations, algebraic effects,
+  etc.
+
+* A proof assistant, in which to state and prove properties of
+  programs.
+
+* A program verification engine, leveraging SMT solvers to partially
+  automate proofs of programs.
+
+* A framework within which to embed programming languages, developing
+  their semantics in a manner suitable for formal proof and enabling
+  their compilation to a variety of backends, including OCaml, F\#, C,
+  assembly, Wasm, etc.
+
+* A metaprogramming system, supporting the programmatic construction
+  of programs, interactive proofs, and proof automation procedures.
+
+
+Why not F*?
+...........
+
+
+
+
+To achieve these goals, the design of F* revolves around a few key
+elements.
+
+* A core language of total functions with full dependent types,
+  including an extensional form of type conversion, indexed inductive
+  types, and pattern matching, recursive functions with semantic
+  termination checking, dependent refinement types and subtyping, and
+  polymorphism over a predicative hierarchy of universes.
+
+* A system of user-defined indexed effects, for modeling,
+  encapsulating, and statically reasoning about various forms of
+  computational effects, including a primitive notion of general
+  recursion and divergence, as well as an open system of user-defined
+  effects, with examples including state, exceptions, concurrency,
+  algebraic effects, and several others.
+
+* A built-in encoding of a classical fragment of F*'s logic into the
+  first-order logic of an SMT solver, allowing many proofs to be
+  automatically discharged.
+
+* A reflection within F* of the syntax and proof state of F*, enabling
+  Meta-F* programs to manipulate F* syntax and proof goals.
+
+
+Many other programming languages are closely related to F* and address
+some of these goals, including other dependently typed languages like
+Coq, Agda, Idris and Lean.
In comparison with these languages, the +distinctive features of F* include its extensional type conversion and +SMT-based proof automation (both of which make typechecking more +flexible but also undecidable); the use of refinement types (enabling +a concise form of lightweight specification); and its user-defined +effect system. + +This tutorial provides a first taste of +verified programming in F\*. More information about F\*, including +papers and technical reports, can be found on the `F\* website +`_. + +It will help if you are already familiar with functional programming +languages in the ML family (e.g., [OCaml], [F#], [Standard ML]), or +with [Haskell]---we provide a quick review of some basic concepts if +you're a little rusty, but if you feel you need more background, there +are many useful resources freely available on the web, e.g., [Learn +F#], [F# for fun and profit], [Introduction to Caml], the [Real World +OCaml] book, or the [OCaml MOOC]. + +[OCaml]: https://ocaml.org +[F#]: http://fsharp.org/ +[Standard ML]: http://sml-family.org/ +[Haskell]: https://www.haskell.org + +[Learn F#]: https://fsharp.org/learn.html +[F# for fun and profit]: http://fsharpforfunandprofit.com/ +[Introduction to Caml]: https://pl.cs.jhu.edu/pl/lectures/caml-intro.html +[Real World OCaml]: https://realworldocaml.org/ +[OCaml MOOC]: https://www.fun-mooc.fr/courses/course-v1:parisdiderot+56002+session03/about + +~KK +Without any experience in ML or Ocaml my experience has been that the later +exercises are very hard to solve, as some of the notation was not obvious to me. Even +knowing the correct solution, I had to either infer syntax from the exercise +code (which is fine) or go by trial and error (which was frustrating at times). +I will leave comments on the exercises detailing what kind of notation I was (or +am still) missing. Is there a resource (maybe in the wiki) that we can point +readers to, that includes examples for most of the concepts? 
Something like the +[F# reference] would be really helpful. Also, it might help to specify the +audience of this tutorial a bit more. As a programmer with slightly faded memory +of how inductive proofs work, the lemmas are not very straight forward. As +someone who has never seen ML or has never done any functional programming, the +syntax and some of the patterns are hard to grasp, I feel. +~ + +[F# reference]: https://docs.microsoft.com/en-us/dotnet/articles/fsharp/language-reference/ + +The easiest way to try F\* and solve the verification exercises in this tutorial is +directly in your browser by using the [online F\* editor]. You can +load the boilerplate code needed for each exercise into the online +editor by clicking the "Load in editor" link in the body of each +exercise. Your progress on each exercise will be stored in browser +local storage, so you can return to your code later (e.g. if your +browser crashes, etc). + +[online F\* editor]: https://fstar-lang.org/run.php + +You can also do this tutorial by installing F\* locally on your +machine. F\* is open source and cross-platform, and you can get +[binary packages] for Windows, Linux, and MacOS X or compile F\* from +the [source code on github] using these [instructions]. + +[binary packages]: https://github.com/FStarLang/FStar/releases +[source code on github]: http://github.com/FStarLang/FStar +[instructions]: https://github.com/FStarLang/FStar/blob/master/INSTALL.md + +You can edit F\* code using your favorite text editor, but for Emacs +the community maintains [fstar-mode.el], a sophisticated extension that adds special +support for F\*, including syntax highlighting, completion, type +hints, navigation, documentation queries, and interactive development +(in the style of CoqIDE or ProofGeneral). +You can find more details about [editor support] on the [F\* wiki]. + +The code for the exercises in this tutorial and their solutions are in the [F\* +repository] on Github. 
For some exercises, you have to include +additional libraries, as done by the provided Makefiles. +To include libraries for the Emacs interactive mode follow the +[instructions here](https://github.com/FStarLang/fstar-mode.el#including-non-standard-libraries-when-using-fstar-mode). + +~KK +The code available on the tutorial page and on github differs +quite a bit (as does the F\* version I guess). In my case, this lead to some +unexpected errors when copying code from the online-editor to Emacs. It would be +nice to have a pointer to the actual file and maybe the proper parameters to +verify it, in case someone prefers emacs over the online editor. +~ + +[fstar-mode.el]: https://github.com/FStarLang/fstar-mode.el +[Atom]: https://github.com/FStarLang/fstar-interactive +[Vim]: https://github.com/FStarLang/VimFStar +[editor support]: https://github.com/FStarLang/FStar/wiki/Editor-support-for-F* +[F\* wiki]: https://github.com/FStarLang/FStar/wiki +[F\* repository]: https://github.com/FStarLang/FStar/tree/master/doc/tutorial/code/ + +By default F\* only verifies the input code, **it does not execute it**. +To execute F\* code one needs to extract it to OCaml or F# and then +compile it using the OCaml or F# compiler. More details on +[executing F\* code] on the [F\* wiki]. + +[executing F\* code]: https://github.com/FStarLang/FStar/wiki/Executing-F*-code + + + + + + + + +F* is a programming language and proof assistant. + +Part 1: F* Manual + + + +1. F* Quick Start: Online tutorial (chapters 1--6, ported here) + + + +2. The Design of F* + + A Verification-oriented Programming Language and Proof Assistant + (general context from mumon paper) + + * Types + * Dependent refinement types + * Intensional vs extensional, undecidability etc. 
+
+    * Equality: Definitional and provable equality
+    * Subtyping
+    * Proof irrelevance
+
+  * Effects
+    * Indexed Monads and Effects
+    * Subsumption and sub-effecting
+    * Effect abstraction, reification and reflection
+
+  * Modules and Interfaces
+
+  * A Mental Model of the F* Typechecker
+    * Type inference based on higher order unification
+    * Normalization and proofs by reflection
+    * SMT Solving
+
+  * Extraction
+    * Computational irrelevance and erasure (Ghost)
+    * Normalization for extraction
+    * inlining, pure subterms, postprocess (fwd to meta)
+
+.. _meta-fstar:
+
+  * Scripting F* with Metaprogramming
+    * Proof states
+    * Reflecting on syntax
+    * Quotation
+    * Scripting extraction
+    * Hooks
+
+4. Core libraries
+
+.. _corelib_Prims:
+
+Part 2: F* in Action
+
+.. _TypeConversion:
+
+1. Foundations of Programming Languages
+
+   a. Simply Typed Lambda Calculus: Syntactic Type Safety
+
+   b. Normalization for STLC
+      - Hereditary Substitutions: A Syntactic Normalization Technique
+      - Logical Relations
+
+   c. Semantics of an Imperative Language
+
+   d. Floyd-Hoare Logic
+      - Hoare triples
+      - Weakest Preconditions
+
+2. User-defined Effects
+
+   a. A language with an ML-style heap
+
+   b. Monotonic State
+
+   c. Exceptions
+
+   d. Algebraic Effects
+
+   e. Concurrency
+
+
+.. _LowStar:
+
+3. Low*: An Embedded DSL and Hoare Logic for Programming
+
+   - Building on 4a and 4b.
+
+   - Integrating / referencing the Karamel tutorial
+
+4. A Verified Embedding of a Verified Assembly Language
+
+5. Modeling and Proving Cryptography Security
+
+   - UF-CMA MAC
+
+   - IND-CPA Encryption
+
+   - Authenticated Encryption
+
+6. Verified Cryptographic Implementations
+
+.. _Steel:
+
+7. Steel: An Extensible Concurrent Separation Logic
+
+8. EverParse?
+   * Miniparse
+
+9.
An End-to-end Verified Low-level App
+
+Part 3: Other resources
+
+  * Recorded coding session
+  * Talks, presentations
+  * Exemplary code
diff --git a/doc/book/PoP-in-FStar/book/part1/part1.rst b/doc/book/PoP-in-FStar/book/part1/part1.rst
new file mode 100644
index 00000000000..64d7340f4e8
--- /dev/null
+++ b/doc/book/PoP-in-FStar/book/part1/part1.rst
@@ -0,0 +1,54 @@
+.. _Part1:
+
+############################################
+Programming and Proving with Total Functions
+############################################
+
+
+The core design philosophy of F* is that the type of a term (a program
+fragment) is a specification of its runtime behavior. We write ``e :
+t`` to mean that a term ``e`` has type ``t``. Many terms can have the
+same type and the same term can have many types.
+
+One (naive but useful) mental model is to think of a type as
+describing a set of values. For instance, the type ``int`` describes
+the set of terms that compute integer results, i.e., when you have
+``e : int``, then when ``e`` is reduced fully it produces a value in
+the set ``{..., -2, -1, 0, 1, 2, ...}``. Similarly, the type ``bool``
+is the type of terms that compute or evaluate to one of the values in
+the set ``{true,false}``. Unlike many other languages, F* allows
+defining types that describe arbitrary sets of values, e.g., the type
+that contains only the number ``17``, or the type of functions that
+factor a number into its primes.
+
+When proving a program ``e`` correct, one starts by specifying the
+properties one is interested in as a type ``t`` and then tries to
+convince F* that ``e`` has type ``t``, i.e., deriving ``e : t``.
+
+The idea of using a type to specify properties of a program has deep
+roots in the connections between logic and computation. You may find
+it interesting to read about `propositions as types
+`_,
+a concept with many deep mathematical and philosophical
+implications.
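+
+For instance, the type that contains only the number ``17`` can be
+written down directly as a *refinement type*. The following is just a
+sketch to fix intuitions; refinement types are introduced properly in
+a later chapter:
+
+.. code-block:: fstar
+
+   //A type describing exactly one value: the integer 17
+   let seventeen = x:int{x == 17}
+
+   let ok : seventeen = 17
+   //let bad : seventeen = 18 -- rejected: F* cannot prove 18 == 17
+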
For now, it suffices to think of a type ``t`` as a
+specification, or a statement of a theorem, and ``e : t`` as a
+computer-checkable claim that the term ``e`` is a proof of the theorem
+``t``.
+
+In the next few chapters we'll learn how to program total
+functions and prove them correct.
+
+.. toctree::
+   :maxdepth: 1
+   :caption: Contents:
+
+   part1_getting_off_the_ground
+   part1_polymorphism
+   part1_equality
+   part1_prop_assertions
+   part1_inductives
+   part1_termination
+   part1_lemmas
+   part1_quicksort
+   part1_execution
+   part1_wrap
diff --git a/doc/book/PoP-in-FStar/book/part1/part1_equality.rst b/doc/book/PoP-in-FStar/book/part1/part1_equality.rst
new file mode 100644
index 00000000000..dd131823a5d
--- /dev/null
+++ b/doc/book/PoP-in-FStar/book/part1/part1_equality.rst
@@ -0,0 +1,77 @@
+.. _Part1_equality:
+
+Equality
+========
+
+Equality is a subtle issue that pervades the design of all dependent
+type theories, and F* is no exception. In this first chapter, we
+briefly touch upon two different kinds of equality in F*, providing
+some basic information sufficient for the simplest usages. In a
+:ref:`subsequent chapter `, we'll cover equality in
+much greater depth.
+
+Decidable equality and ``eqtype``
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+We've implicitly used the equality operator ``=`` already (e.g., when
+defining ``factorial``). This is the *boolean* equality
+operator. Given two terms ``e₁ : t`` and ``e₂ : t``, so long as ``t``
+supports a notion of decidable equality, ``(e₁ = e₂) : bool``.
+
+To see why not all types support decidable equality, consider ``t`` to
+be a function type, like ``int -> int``. To decide if two functions
+``f₁, f₂ : int -> int`` are equal, we'd have to apply them to all the
+infinitely many integers and compare their results—clearly, this is
+not decidable.
+
+The type ``eqtype`` is the type of types that support decidable
+equality. That is, given ``e₁ : t`` and ``e₂ : t``, it is only
+permissible to compare ``e₁ = e₂`` if ``t : eqtype``.
+
+For any type definition, F* automatically computes whether or not that
+type is an ``eqtype``. We'll explain later exactly how F* decides
+whether or not a type is an ``eqtype``. Roughly, F* has built-in
+knowledge that various primitive types like integers and booleans
+support decidable equality. When defining a new type, F* checks
+that all values of the new type are composed structurally of terms
+that support decidable equality. In particular, if an ``e : t`` may
+contain a sub-term that is a function, then ``t`` cannot be an
+``eqtype``.
+
+As such, the type of the decidable equality operator is
+
+.. code-block:: fstar
+
+   val ( = ) (#a:eqtype) (x:a) (y:a) : bool
+
+That is, ``x = y`` is well-typed only when ``x : a`` and ``y : a`` and
+``a : eqtype``.
+
+.. note::
+
+   We see here a bit of F* syntax for defining infix operators. Rather
+   than only using the ``val`` or ``let`` notation with alphanumeric
+   identifiers, the notation ``( = )`` introduces an infix operator
+   defined with non-alphanumeric symbols. You can read more about this
+   `here
+   `_.
+
+
+
+Propositional equality
+^^^^^^^^^^^^^^^^^^^^^^
+
+F* offers another notion of equality, propositional equality, written
+``==``. For *any type* ``t``, given terms ``e₁, e₂ : t``, the
+proposition ``e₁ == e₂`` asserts the (possibly undecidable) equality
+of ``e₁`` and ``e₂``. The type of the propositional equality operator
+is shown below:
+
+.. code-block:: fstar
+
+   val ( == ) (#a:Type) (x:a) (y:a) : prop
+
+Unlike decidable equality ``(=)``, propositional equality is defined
+for all types. The result type of ``(==)`` is ``prop``, the type of
+propositions. We'll learn more about that in the :ref:`next chapter
+`.
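+
+To summarize, the two operators can be contrasted in a small sketch
+(these definitions are illustrative and not part of the book's
+accompanying code):
+
+.. code-block:: fstar
+
+   //Decidable equality: int is an eqtype, so (=) computes a bool
+   let same (x y : int) : bool = x = y
+
+   //Propositional equality can state (possibly undecidable)
+   //equalities, even between functions, as a prop
+   let f_eq_g : prop = (fun (x:int) -> x + 1) == (fun (x:int) -> 1 + x)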
diff --git a/doc/book/PoP-in-FStar/book/part1/part1_execution.rst b/doc/book/PoP-in-FStar/book/part1/part1_execution.rst
new file mode 100644
index 00000000000..d5997682089
--- /dev/null
+++ b/doc/book/PoP-in-FStar/book/part1/part1_execution.rst
@@ -0,0 +1,232 @@
+.. _Part1_execution:
+
+Executing programs
+==================
+
+We've been through several chapters already, having learned many core
+concepts of F*, but we have yet to see how to compile and execute a
+program, since our focus so far has been on F*'s logic and how to
+prove properties about programs.
+
+F* offers several choices for executing a program, which we cover
+briefly here, using `Quicksort <../code/Part1.Quicksort.Generic.fst>`_
+as a running example.
+
+
+Interpreting F* programs
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+As mentioned in the :ref:`capsule summary
+`, F* includes an engine (called the
+*normalizer*) that can symbolically reduce F* computations. We'll see
+many more uses of F*'s normalizer as we go, but one basic use is to
+interpret programs.
+
+Invoking the interpreter is easy using :ref:`fstar-mode.el
+` in emacs. In emacs, go to the F* menu, then to
+*Interactive queries*, then choose *Evaluate an expression* (or type
+C-c C-s C-e): this prompts you to enter an expression that you want to
+evaluate: type ``sort ( <= ) [4;3;2;1]`` and then press "Enter". You
+should see the following output: ``sort ( <= ) [4;3;2;1]``
+:math:`\downarrow\beta\delta\iota\zeta` ``[1; 2; 3; 4] <: Prims.Tot
+(list int)``, saying that the input term reduced to ``[1; 2; 3; 4]``
+of type ``Tot (list int)``.
+
+The :math:`\downarrow\beta\delta\iota\zeta` may seem a bit arcane, but
+it describes the reduction strategy that F* used to interpret the term:
+
+ * :math:`\beta` means that functions were applied
+ * :math:`\delta` means that definitions were unfolded
+ * :math:`\iota` means that patterns were matched
+ * :math:`\zeta` means that recursive functions were unrolled
+
+We'll revisit what these reduction steps mean in a later chapter,
+including how to customize them for your needs.
+
+Compiling to OCaml
+^^^^^^^^^^^^^^^^^^
+
+The main way to execute F* programs is by compiling, or *extracting*,
+them to OCaml and then using OCaml's build system and runtime to
+produce an executable and run it.
+
+.. note::
+
+   The method that we show here works for simple projects with just a few
+   files. For larger projects, F* offers a dependence analysis that can
+   produce dependences for use in a Makefile. F* also offers separate
+   compilation which allows a project to be checked one file at a time,
+   and for the results to be cached and reused. For documentation and
+   examples of how to use these features and structure the build for
+   larger projects see these resources:
+
+   * `Dealing with dependences `_
+   * `Caching verified modules `_
+   * `A multifile project `_
+
+
+Producing an OCaml library
+..........................
+
+To extract OCaml code from an F* program, use command-line options as
+shown below:
+
+.. code-block::
+
+   fstar --codegen OCaml --extract Part1.Quicksort --odir out Part1.Quicksort.Generic.fst
+
+* The ``--codegen`` option tells F* to produce OCaml code
+
+* The ``--extract`` option restricts extraction to modules in the given namespace, i.e., in this case, all modules in ``Part1.Quicksort``
+
+* The ``--odir`` option tells F* to put all the generated files into the specified directory; in this case ``out``
+
+* The last argument is the source file to be checked and extracted
+
+The resulting OCaml code is in the file
+``Part1_Quicksort_Generic.ml``, where the F* ``dot``-separated name is
+transformed to OCaml's naming convention for modules. The generated code is `here <../code/out/Part1_Quicksort_Generic.ml>`_.
+
+Some points to note about the extracted code:
+
+* F* extracts only those definitions that correspond to executable
+  code. Lemmas and other proof-only aspects are erased. We'll learn
+  more about erasure in a later chapter.
+
+* The F* types are translated to OCaml types. Since F* types are more
+  precise than OCaml types, this translation process necessarily
+  involves a loss in precision. For example, the type of total orders
+  in ``Part1.Quicksort.Generic.fst`` is:
+
+  .. code-block:: fstar
+
+     let total_order_t (a:Type) = f:(a -> a -> bool) { total_order f }
+
+  Whereas in OCaml it becomes
+
+  .. code-block::
+
+     type 'a total_order_t = 'a -> 'a -> Prims.bool
+
+
+  This means that you need to be careful when calling your extracted
+  F* program from unverified OCaml code, since the OCaml compiler will
+  not complain if you pass in some function that is not a total order
+  where the F* code expects a total order.
+
+Compiling an OCaml library
+..........................
+
+Our extracted code provides several top-level functions (e.g.,
+``sort``) but not ``main``. So, we can only compile it as a library.
+
+For simple uses, one can compile the generated code into an OCaml
+native code library (a cmxa file) with ``ocamlbuild``, as shown below
+
+.. code-block::
+
+   OCAMLPATH=$FSTAR_HOME/lib ocamlbuild -use-ocamlfind -pkg batteries -pkg fstar.lib Part1_Quicksort_Generic.cmxa
+
+Some points to note:
+
+ * You need to specify the variable ``OCAMLPATH`` which OCaml uses to
+   find required libraries. For F* projects, the ``OCAMLPATH`` should
+   include the ``lib`` directory of the FStar release bundle.
+
+ * The ``-use-ocamlfind`` option enables a utility to find OCaml libraries.
+
+ * Extracted F* programs rely on two libraries: ``batteries`` and
+   ``fstar.lib``, which is what the ``-pkg`` options say.
+
+ * Finally, ``Part1_Quicksort_Generic.cmxa`` references the name of
+   the corresponding ``.ml`` file, but with the ``.cmxa`` extension
+   to indicate that we want to compile it as a library.
+
+You can use the resulting .cmxa file in your other OCaml projects.
+
+
+Adding a 'main'
+...............
+
+We have focused so far on programming and proving *total*
+functions. Total functions have no side effects, e.g., they cannot
+read or write state, they cannot print output etc. This makes total
+functions suitable for use in libraries, but to write a top-level
+driver program that can print some output (i.e., a ``main``), we need
+to write functions that actually have some effects.
+
+We'll learn a lot more about F*'s support for effectful programming in
+a later section, but for now we'll just provide a glimpse of it by
+showing (below) a ``main`` program that calls into our Quicksort
+library.
+
+.. literalinclude:: ../code/Part1.Quicksort.Main.fst
+   :language: fstar
+
+There are a few things to note here:
+
+ * This time, rather than calling ``Q.sort`` from unverified OCaml
+   code, we are calling it from F*, which requires us to prove all
+   its preconditions, e.g., that the comparison function ``( <= )``
+   that we are passing in really is a total order---F* does that
+   automatically.
+
+ * ``FStar.IO.print_string`` is a library function that prints a
+   string to ``stdout``. Its type is ``string -> ML unit``, a type
+   that we'll look at in detail when we learn more about effects. For
+   now, keep in mind that functions with the ``ML`` label in their
+   type may have observable side effects, like IO, raising
+   exceptions, etc.
+
+ * The end of the file contains ``let _ = main ()``, a top-level term
+   that has a side-effect (printing to ``stdout``) when executed. In
+   a scenario where we have multiple modules, the runtime behavior of
+   a program with such top-level side-effects depends on the order in
+   which modules are loaded. When F* detects this, it raises the
+   warning ``272``. In this case, we intend for this program to have
+   a top-level effect, so we suppress the warning using the
+   ``--warn_error -272`` option.
+
+To compile this code to OCaml, along with its dependence on
+``Part1.Quicksort.Generic``, one can invoke:
+
+.. code-block::
+
+   fstar --codegen OCaml --extract Part1.Quicksort --odir out Part1.Quicksort.Main.fst
+
+This time, F* extracts both ``Part1.Quicksort.Generic.fst`` (as
+before) and ``Part1.Quicksort.Main.fst`` to OCaml, producing
+`Part1_Quicksort_Main.ml <../code/out/Part1_Quicksort_Main.ml>`_.
+
+You can compile this code in OCaml to a native executable by doing:
+
+.. code-block::
+
+   OCAMLPATH=$FSTAR_HOME/lib ocamlbuild -use-ocamlfind -pkg batteries -pkg fstar.lib Part1_Quicksort_Main.native
+
+And, finally, you can execute Part1_Quicksort_Main.native to see the
+following output:
+
+.. code-block::
+
+   $ ./Part1_Quicksort_Main.native
+   Original list = [42; 17; 256; 94]
+   Sorted list = [17; 42; 94; 256]
+
+Compiling to other languages
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+F* also supports compiling programs to F# and, for a subset of the
+language, supports compilation to C.
+
+For the F# extraction, use the ``--codegen FSharp`` option. However,
+it is more typical to structure an F* project for use with F# using
+Visual Studio project and solution files. Here are some examples:
+
+ * `A simple example `_
+
+ * `A more advanced example mixing F* and F# code `_
+
+For extraction to C, please see the `tutorial on Low* `_.
+
diff --git a/doc/book/PoP-in-FStar/book/part1/part1_getting_off_the_ground.rst b/doc/book/PoP-in-FStar/book/part1/part1_getting_off_the_ground.rst
new file mode 100644
index 00000000000..8be615b8121
--- /dev/null
+++ b/doc/book/PoP-in-FStar/book/part1/part1_getting_off_the_ground.rst
@@ -0,0 +1,727 @@
+.. _Part1_ch1:
+
+Getting off the ground
+======================
+
+To start writing some F* programs, we'll need to learn some basics
+about the syntax of the language and some core concepts of types and
+functions.
+
+.. _Part1_editors:
+
+Text Editors
+^^^^^^^^^^^^
+
+F* can be used as a command line tool with any text editor. If you're
+viewing this in the interactive online tutorial, you can use the
+`Ace-based `_ text editor alongside, which
+provides some basic conveniences like syntax highlighting. However,
+beyond casual use, most users of F* rely on one of the following IDE
+plugins.
+
+ * `fstar-mode.el `_,
+   which provides several utilities for interactively editing and
+   checking F* files in emacs.
+
+ * `fstar-vscode-assistant
+   `_, which
+   also provides interactive editing and checking support in VS Code.
+
+The main benefit to using these IDE plugins is that they allow you to
+incrementally check just the changing suffix of an F* file, rather
+than rechecking the entire file in batch mode.
They also provide
+standard conveniences like jumping to definitions and displaying the
+type of a symbol.
+
+Both these plugins rely on a generic but custom interaction protocol
+implemented by the F* compiler. It should be possible to implement IDE
+support similar to fstar-mode.el or fstar-vscode-assistant in your
+favorite plugin-capable editor.
+
+
+Basic syntactic structure
+^^^^^^^^^^^^^^^^^^^^^^^^^
+
+An F* program is a collection of modules, with each
+module represented by a single file with the filename extension
+``.fst``. Later, we'll see that a module's interface is in a separate
+``.fsti`` file and allows hiding details of a module's implementation
+from a client module.
+
+A module begins with the module's name (which must match the name of
+its file, i.e., ``module A`` is in ``A.fst``) and contains a sequence
+of top-level signatures and definitions. Module names always begin
+with a capital letter.
+
+* Signatures ascribe a type to a definition, e.g., ``val f : t``.
+
+Definitions come in several flavors: the two main forms we'll focus on
+when programming with total functions are
+
+* possibly recursive definitions (let bindings, ``let [rec] f = e``)
+* and, inductive type definitions (datatypes, ``type t = | D1 : t1 | ... | Dn : tn``)
+
+In later sections, we'll see two other kinds of definition:
+user-defined indexed effects and sub-effects.
+
+Comments
+^^^^^^^^
+
+Block comments are delimited by ``(*`` and ``*)``. Line comments begin
+with ``//``.
+
+.. code-block:: fstar
+
+   (* this is a
+      block comment *)
+
+
+   //This is a line comment
+
+
+Primitives
+^^^^^^^^^^
+
+Every F* program is checked in the context of some ambient primitive
+definitions taken from the core F* module ``Prims``.
+
+False
+.....
+
+The type ``False`` has no elements. Since there are no terms that
+satisfy ``e : False``, the type ``False`` is the type of unprovable
+propositions.
+
+Unit
+....
+
+The type ``unit`` has a single element denoted ``()``, i.e., ``() :
+unit``.
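+
+The syntactic forms described so far (a module header, a signature,
+``let`` definitions, a datatype, and the primitive types above) can
+be combined into a minimal module. Here is a sketch, with all names
+illustrative rather than taken from the book's accompanying code:
+
+.. code-block:: fstar
+
+   module Sketch
+
+   (* a signature, followed by a definition with that type *)
+   val three : int
+   let three = 1 + 2
+
+   //a datatype with two constructors
+   type color =
+     | Red : color
+     | Blue : color
+
+   //the sole value of type unit
+   let u : unit = ()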
+
+Booleans
+........
+
+The type ``bool`` has two elements, ``true`` and ``false``. Note, the
+lowercase ``false`` is a boolean constant, distinct from the uppercase
+``False`` type.
+
+The following primitive boolean operators are available, in decreasing
+order of precedence.
+
+* ``not``: Boolean negation (unary, prefix)
+* ``&&``: Boolean conjunction (binary, infix)
+* ``||``: Boolean disjunction (binary, infix)
+
+Conditionals
+############
+
+You can, of course, branch on a boolean with ``if/then/else``
+
+.. code-block:: fstar
+
+   if b then 1 else 0
+
+   if b1 && b2 || b3
+   then 17
+   else 42
+
+
+Integers
+........
+
+The type ``int`` represents unbounded, primitive mathematical
+integers. Its elements are formed from the literals ``0, 1, 2, ...``,
+and the following primitive operators, in decreasing order of
+precedence.
+
+* ``-``: Unary negation (prefix)
+* ``op_Multiply``: Unfortunately, the traditional multiplication
+  symbol ``*`` is reserved by default for the tuple type
+  constructor. Use the module ``FStar.Mul`` to treat ``*`` as integer
+  multiplication.
+* ``/``: Euclidean division (infix)
+* ``%``: Euclidean modulus (infix)
+* ``+``: Addition (infix)
+* ``-``: Subtraction (infix)
+* ``<`` : Less than (infix)
+* ``<=``: Less than or equal (infix)
+* ``>`` : Greater than (infix)
+* ``>=``: Greater than or equal (infix)
+
+.. note::
+
+   F* follows the OCaml convention of having no negative integer
+   literals; instead, negate a positive integer, as in ``(- 1)``.
+
+.. _Part1_ch1_boolean_refinements:
+
+Boolean refinement types
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+The F* core library, ``Prims``, defines the type of
+natural numbers as follows
+
+.. code-block:: fstar
+
+   let nat = x:int{x >= 0}
+
+This is an instance of a boolean refinement type, whose general form
+is ``x:t { e }`` where ``t`` is a type, and ``e`` is a ``bool``-typed term
+that may refer to the ``t``-typed bound variable ``x``.
The term ``e`` +*refines* the type ``t``, in the sense that the set ``S`` denoted by ``t`` +is restricted to those elements ``x`` :math:`\in` ``S`` for which ``e`` evaluates to +``true``. + +That is, the type ``nat`` describes the set of terms that evaluate to an +element of the set ``{0, 1, 2, 3, ...}``. + +But, there's nothing particularly special about ``nat``. You can define +arbitrary refinements of your choosing, e.g., + +.. code-block:: fstar + + let empty = x:int { false } //the empty set + let zero = x:int{ x = 0 } //the type containing one element `0` + let pos = x:int { x > 0 } //the positive numbers + let neg = x:int { x < 0 } //the negative numbers + let even = x:int { x % 2 = 0 } //the even numbers + let odd = x:int { x % 2 = 1 } //the odd numbers + +If you're coming from a language like C or Java where a type primarily +describes some properties about the representation of data in memory, +this view of types as describing arbitrary sets of values may feel a +bit alien. But, let it sink in a bit---types that carve out precise +sets of values will let you state and check invariants about your +programs that may otherwise have only been implicit in your code. + +.. note:: + + Refinement types in F* trace their lineage to `F7 + `_, + a language developed at Microsoft Research c. 2007 -- 2011. `Liquid + Haskell `_ is + another language with refinement types. Those languages provide + additional background and resources for learning about refinement + types. + + Boolean refinements are a special case of a more powerful form of + propositional refinement type in F*. Refinement types, in + conjunction with dependent function types, are, in principle, + sufficient to encode many kinds of logics for program + correctness. However, refinement types are just one among several + tools in F* for program specification and proof. + + +Refinement subtyping +.................... + +We have seen so far how to define a new refinement type, like ``nat`` or +``even``. 
However, to make use of refinement types we need rules that +allow us to: + +1. check that a program term has a given refinement type, e.g., to + check that ``0`` has type ``nat``. This is sometimes called + *introducing* a refinement type. + +2. make use of a term that has a refinement type, e.g., given ``x : + even`` we would like to be able to write ``x + 1``, treating ``x`` as an + ``int`` to add ``1`` to it. This is sometimes called *eliminating* + a refinement type. + +The technical mechanism in F* that supports both these features is +called *refinement subtyping*. + +If you're used to a language like Java, C# or some other +object-oriented language, you're familiar with the idea of +subtyping. A type ``t`` is a subtype of ``s`` whenever a program term +of type ``t`` can be safely treated as an ``s``. For example, in Java, +all object types are subtypes of the type ``Object``, the base class +of all objects. + +For boolean refinement types, the subtyping rules are as follows: + +* The type ``x:t { p }`` is a subtype of ``t``. That is, given ``e : + (x:t{p})``, it is always safe to *eliminate* the refinement and + consider ``e`` to also have type ``t``. + +* For a term ``e`` of type ``t`` (i.e., ``e : t``), ``t`` is a subtype + of the boolean refinement type ``x:t { p }`` whenever ``p[e / x]`` + (``p[e/x]`` is notation for the term ``p`` with the variable ``x`` + replaced by ``e``), is provably equal to ``true``. In other words, + to *introduce* ``e : t`` at the boolean refinement type ``x:t{ p + }``, it suffices to prove that the term ``p`` with ``e`` substituted + for bound variable ``x``, evaluates to ``true``. + +The elimination rule for refinement types (i.e., the first part above) +is simple---with our intuition of types as sets, the refinement type +``x:t{ p }`` *refines* the set corresponding to ``t`` by the predicate +``p``, i.e., the ``x:t{ p }`` denotes a subset of ``t``, so, of course +``x:t{ p }`` is a subtype of ``t``. 
+
+The other direction is a bit more subtle: a term ``e`` of type ``t``
+can be given the refinement type ``x:t{ p }`` only when ``e``
+validates ``p``. You're probably also wondering about how to prove
+that ``p[e/x]`` evaluates to ``true``---we will look at this in
+detail later. But, the short version is that F*, by default, uses an
+SMT solver to prove such facts, though you can also use tactics and
+other techniques to do so.
+
+An example
+..........
+
+Given ``x:even``, consider proving ``x + 1 : odd``; it takes a few
+steps:
+
+1. The operator ``+`` is defined in F*'s library. It expects both its
+   arguments to have type ``int`` and returns an ``int``.
+
+2. To prove that the first argument ``x:even`` is a valid argument for
+   ``+``, we use refinement subtyping to eliminate the refinement and
+   obtain ``x:int``. The second argument ``1:int`` already has the
+   required type. Thus, ``x + 1 : int``.
+
+3. To conclude that ``x + 1 : odd``, we need to introduce a refinement
+   type, by proving that the refinement predicate of ``odd`` evaluates
+   to true, i.e., ``(x + 1) % 2 = 1``. This is provable by SMT, since
+   we started with the knowledge that ``x`` is even.
+
+As such, F* applies subtyping repeatedly to introduce and eliminate
+refinement types, applying it multiple times even to check a simple
+term like ``x + 1 : odd``.
+
+
+Functions
+^^^^^^^^^
+
+We need a way to define functions to start writing interesting
+programs. In the core of F*, functions behave like functions in
+maths. In other words, they are defined on their entire domain (i.e.,
+they are total functions and always return a result) and their only
+observable behavior is the result they return (i.e., they don't have
+any side effect, like looping forever, or printing a message etc.).
+
+Functions are first-class values in F*, e.g., they can be passed as
+arguments to other functions and returned as results.
While F*
+provides several ways to define functions, the most basic form is the
+:math:`\lambda` term, also called a function literal, an anonymous
+function, or simply a *lambda*. The syntax is largely inherited from
+OCaml, and this `OCaml tutorial
+`_
+provides more details for those unfamiliar with the language. We'll
+assume a basic familiarity with OCaml-like syntax.
+
+Lambda terms
+............
+
+The term ``fun (x:int) -> x + 1`` defines a function,
+a lambda term, which adds 1 to its integer-typed parameter ``x``. You
+can also let F* infer the type of the parameter and write ``fun x ->
+x + 1`` instead.
+
+.. _Part1_ch1_named_function:
+
+Named functions
+...............
+
+Any term in F\* can be given a name using a ``let`` binding. We'll
+often want this, to define a function once and call it many times. For
+example, all of the following are synonyms and bind the lambda term
+``fun x -> x + 1`` to the name ``incr``
+
+.. code-block:: fstar
+
+   let incr = fun (x:int) -> x + 1
+   let incr (x:int) = x + 1
+   let incr x = x + 1
+
+Functions can take several arguments and the result type of a function
+can also be annotated, if desired
+
+.. code-block:: fstar
+
+   let incr (x:int) : int = x + 1
+   let more_than_twice (x:int) (y:int) : bool = x > y + y
+
+It's considered good practice to annotate all the parameters and
+result type of a named function definition.
+
+.. note::
+
+   In addition to decorating the types of parameters and the results
+   of functions, F* allows annotating any term ``e`` with its expected
+   type ``t`` by writing ``e <: t``. This is called a *type
+   ascription*. An ascription instructs F* to check that the
+   term ``e`` has the type ``t``. For example, we could have written
+
+.. code-block:: fstar
+
+   let incr = fun (x:int) -> (x + 1 <: int)
+
+
+Recursive functions
+...................
+
+Recursive functions in F* are always named. To define them, one uses
+the ``let rec`` syntax, as shown below.
+
+.. literalinclude:: ../code/Part1.GettingOffTheGround.fst
+   :language: fstar
+   :start-after: SNIPPET_START: factorial
+   :end-before: SNIPPET_END: factorial
+
+This syntax defines a function named ``factorial`` with a single
+parameter ``n:nat``, returning a ``nat``. The definition of factorial
+is allowed to use ``factorial`` recursively—as we'll see in a
+later chapter, ensuring that the recursion is well-founded (i.e., all
+recursive calls terminate) is key to F*'s soundness. However, in this
+case, the proof of termination is automatic.
+
+.. note::
+
+   Notice the use of ``open FStar.Mul`` in the example above. This
+   brings the module ``FStar.Mul`` into scope and resolves the symbol
+   ``*`` to integer multiplication.
+
+F* also supports mutual recursion. We'll see that later.
+
+.. _Part1_ch1_arrows:
+
+Arrow types
+^^^^^^^^^^^
+
+Functions are the main abstraction facility of any functional language
+and their types are pervasive in F*. In its most basic form, function
+types, or arrows, have the shape::
+
+  x:t0 -> t1
+
+This is the type of a function that
+
+1. receives an argument ``e`` of type ``t0``, and
+
+2. always returns a value of type ``t1[e / x]``, i.e., the type of the
+   returned value depends on the argument ``e``.
+
+It's worth emphasizing how this differs from function types in other
+languages.
+
+* F*'s arrows are dependent---the type of the result depends on the
+  argument. For example, we can write a function that returns a
+  ``bool`` when applied to an even number and returns a ``string``
+  when applied to an odd number. Or, more commonly, a function
+  whose result is one greater than its argument.
+
+* In F*'s core language, all functions are total, i.e., a function
+  call always terminates after consuming a finite but unbounded amount
+  of resources.
+
+.. note::
+
+   That said, on any given computer, it is possible for a function
+   call to fail to return due to resource exhaustion, e.g., running
+   out of memory. 
Later, as we look at :ref:`effects `, we + will see that F* also supports writing non-terminating functions. + + +.. _Part1_ch1_arrow_notations: + + +Some examples and common notation +................................. + +1. Functions are *curried*. Functions that take multiple arguments are + written as functions that take the first argument and return a + function that takes the next argument and so on. For instance, the + type of integer addition is:: + + val (+) : x:int -> y:int -> int + +2. Not all functions are dependent and the name of the argument can be + omitted when it is not needed. For example, here's a more concise + way to write the type of ``(+)``:: + + val (+) : int -> int -> int + +3. Function types can be mixed with refinement types. For instance, + here's the type of integer division---the refinement on the divisor + forbids division-by-zero errors:: + + val (/) : int -> (divisor:int { divisor <> 0 }) -> int + +4. Dependence between the arguments and the result type can be used to + state relationships among them. For instance, there are several + types for the function ``let incr = (fun (x:int) -> x + 1)``:: + + val incr : int -> int + val incr : x:int -> y:int{y > x} + val incr : x:int -> y:int{y = x + 1} + + The first type ``int -> int`` is its traditional type in languages + like OCaml. + + The second type ``x:int -> y:int{y > x}`` states that the returned + value ``y`` is greater than the argument ``x``. + + The third type is the most precise: ``x:int -> y:int{y = x + 1}`` + states that the result ``y`` is exactly the increment of the + argument ``x``. + +5. It's often convenient to add refinements on arguments in a + dependent function type. 
For instance:: + + val f : x:(x:int{ x >= 1 }) -> y:(y:int{ y > x }) -> z:int{ z > x + y } + + Since this style is so common, and it is inconvenient to have to + bind two names for the parameters ``x`` and ``y``, F* allows (and + encourages) you to write:: + + val f : x:int{ x >= 1 } -> y:int{ y > x } -> z:int{ z > x + y } + +6. To emphasize that functions in F*'s core are total functions (i.e., + they always return a result), we sometimes annotate the result type + with the effect label "``Tot``". This label is optional, but + especially as we learn about :ref:`effects `, emphasizing + that some functions have no effects via the ``Tot`` label is + useful. For example, one might sometimes write:: + + val f : x:int{ x >= 1 } -> y:int{ y > x } -> Tot (z:int{ z > x + y }) + + adding a ``Tot`` annotation on the last arrow, to indicate that the + function has no side effects. One could also write:: + + val f : x:int{ x >= 1 } -> Tot (y:int{ y > x } -> Tot (z:int{ z > x + y })) + + adding an annotation on the intermediate arrow, though this is not + customary. + +Exercises +^^^^^^^^^ + +This first example is just to show you how to run the tool and +interpret its output. + +.. literalinclude:: ../code/Part1.GettingOffTheGround.fst + :language: fstar + :start-after: SNIPPET_START: sample + :end-before: SNIPPET_END: sample + +Notice that the program begins with a ``module`` declaration. It +contains a single definition named ``incr``. Definitions that appear +at the scope of a module are called "top-level" definitions. + +You have several options to try out these examples. + +**F\* online** + +To get started and for trying small exercises, the easiest way is via +the `online tutorial `_. If that's +where you're reading this, you can just use the in-browser editor +alongside which communicates with an F* instance running in the +cloud. Just click `on this link +<../code/exercises/Part1.GettingOffTheGround.fst>`_ to load the +code of an exercise in the editor. 
+ +That said, the online mode can be a bit slow, depending on the load at +the server, and the editor is very minimalistic. + +For anything more than small exercises, you should have a working +local installation of the F* toolchain, as described next. + +**F\* in batch mode** + +You can download pre-built F* binaries `from here +`_. + +Once you have a local installation, to check a program you can run the +``fstar`` at the command line, like so:: + + $ fstar Sample.fst + +In response ``fstar`` should output:: + + Verified module: Sample + All verification conditions discharged successfully + +This means that F* attempted to verify the module named ``Sample``. In +doing so, it generated some "verification conditions", or proof +obligations, necessary to prove that the module is type correct, and +it discharged, or proved, all of them successfully. + +**F\* in emacs** + +Rather than running ``fstar`` in batch mode from the command line, F* +programmers using the `emacs `_ +editor often use `fstar-mode.el +`_, an editor plugin that +allows interactively checking an F* program. If you plan to use F* in +any serious way, this is strongly recommended. + +Many types for ``incr`` +....................... + +Here are some types for ``incr``, including some types that are valid +and some others that are not. + +This type claims that ``incr`` result is +greater than its argument and F* agrees—remember, the ``int`` type is +unbounded, so there's no danger of the addition overflowing. + +.. literalinclude:: ../code/Part1.GettingOffTheGround.fst + :language: fstar + :start-after: SNIPPET_START: ex1.1 + :end-before: SNIPPET_END: ex1.1 + +This type claims that ``incr`` always returns a natural number, but it +isn't true, since incrementing a negative number doesn't always +produce a non-negative number. + +.. 
literalinclude:: ../code/Part1.GettingOffTheGround.fst
+   :language: fstar
+   :start-after: SNIPPET_START: ex1.2
+   :end-before: SNIPPET_END: ex1.2
+
+F* produces the following error message::
+
+   Sample.fst(11,26-11,31): (Error 19) Subtyping check failed; expected type
+   Prims.nat; got type Prims.int; The SMT solver could not prove the query, try to
+   spell your proof in more detail or increase fuel/ifuel (see also prims.fst(626,
+   18-626,24))
+   Verified module: Sample
+   1 error was reported (see above)
+
+**Source location**
+
+The error message points to ``Sample.fst(11,26-11,31)``, a source
+range mentioning the file name, a starting position (line, column), and
+an ending position (line, column). In this case, it highlights the
+``x + 1`` term.
+
+**Severity and error code**
+
+The ``(Error 19)`` mentions a severity (i.e., ``Error``, as opposed
+to, say, ``Warning``), and an error code (``19``).
+
+**Error message**
+
+The first part of the message states what you might expect::
+
+   Subtyping check failed; expected type Prims.nat; got type Prims.int
+
+The rest of the message provides more details, which we'll ignore for
+now, until we've had a chance to explain more about how F* interacts
+with the SMT solver. However, one part of the error message is worth
+pointing out now::
+
+   (see also prims.fst(626,18-626,24))
+
+Error messages sometimes mention an auxiliary source location in a
+"see also" parenthetical. This source location can provide some more
+information about why F* rejected a program—in this case, it points to
+the constraint ``x>=0`` in the definition of ``nat`` in ``prims.fst``,
+i.e., this is the particular constraint that F* was not able to prove.
+
+So, let's try again. Here's another type for ``incr``, claiming that
+if its argument is a natural number then so is its result. This time
+F* is happy.
+
+.. 
literalinclude:: ../code/Part1.GettingOffTheGround.fst + :language: fstar + :start-after: SNIPPET_START: ex1.3 + :end-before: SNIPPET_END: ex1.3 + +Sometimes, it is convenient to provide a type signature independently +of a definition. Below, the ``val incr4`` provides only the signature +and the subsequent ``let incr4`` provides the definition—F* checks +that the definition is compatible with the signature. + +.. literalinclude:: ../code/Part1.GettingOffTheGround.fst + :language: fstar + :start-after: SNIPPET_START: ex1.4 + :end-before: SNIPPET_END: ex1.4 + +Try writing some more types for ``incr``. +(`Load exercise <../code/exercises/Part1.GettingOffTheGround.fst>`_.) + +.. container:: toggle + + .. container:: header + + **Some answers** + + .. literalinclude:: ../code/Part1.GettingOffTheGround.fst + :language: fstar + :start-after: SNIPPET_START: incr_types + :end-before: SNIPPET_END: incr_types + + +Computing the maximum of two integers +..................................... + +Provide an implementation of the following signature:: + + val max (x:int) (y:int) : int + +There are many possible implementations that satisfy this signature, +including trivial ones like:: + + let max x y = 0 + +Provide an implementation of ``max`` coupled with a type that is +precise enough to rule out definitions that do not correctly return +the maximum of ``x`` and ``y``. + +.. container:: toggle + + .. container:: header + + **Some answers** + + .. literalinclude:: ../code/Part1.GettingOffTheGround.fst + :language: fstar + :start-after: SNIPPET_START: max + :end-before: SNIPPET_END: max + + +More types for factorial +........................ + +Recall the definition of ``factorial`` from earlier. + +.. literalinclude:: ../code/Part1.GettingOffTheGround.fst + :language: fstar + :start-after: SNIPPET_START: factorial + :end-before: SNIPPET_END: factorial + +Can you write down some more types for factorial? + +.. container:: toggle + + .. 
container:: header + + **Some answers** + + .. literalinclude:: ../code/Part1.GettingOffTheGround.fst + :language: fstar + :start-after: SNIPPET_START: factorial_answers + :end-before: SNIPPET_END: factorial_answers + +Fibonacci +......... + +Here's a doubly recursive function: + +.. literalinclude:: ../code/Part1.GettingOffTheGround.fst + :language: fstar + :start-after: SNIPPET_START: fibonacci + :end-before: SNIPPET_END: fibonacci + +What other types can you give to it? + +.. container:: toggle + + .. container:: header + + **Some answers** + + .. literalinclude:: ../code/Part1.GettingOffTheGround.fst + :language: fstar + :start-after: SNIPPET_START: fibonacci_answers + :end-before: SNIPPET_END: fibonacci_answers diff --git a/doc/book/PoP-in-FStar/book/part1/part1_inductives.rst b/doc/book/PoP-in-FStar/book/part1/part1_inductives.rst new file mode 100644 index 00000000000..928df8fbaf4 --- /dev/null +++ b/doc/book/PoP-in-FStar/book/part1/part1_inductives.rst @@ -0,0 +1,377 @@ +.. _Part1_ch3: + +Inductive types and pattern matching +==================================== + +In this chapter, you'll learn how to define new types in F*. These +types are called *inductive types*, or, more informally, +datatypes. We'll also learn how to define functions over these +inductive types by pattern matching and to prove properties about +them. + +.. note:: + + We'll only cover the most basic forms of inductive types here. In + particular, the types we show here will not make use of indexing or + any other form of dependent types---we'll leave that for a later + chapter. + +Enumerations +^^^^^^^^^^^^ + +We've seen that ``unit`` is the type with just one element ``()`` and +that ``bool`` is the type with two elements, ``true`` and ``false``. + +You can define your own types with an enumeration of elements, like so. + +.. 
literalinclude:: ../code/Part1.Inductives.fst + :language: fstar + :start-after: //SNIPPET_START: three + :end-before: //SNIPPET_END: three + +This introduces a new type ``three : Type``, and three *distinct* +constants ``One_of_three : three``, ``Two_of_three : three``, +``Three_of_three : three``. These constants are also called +"constructors" or "data constructors". The name of a constructor must +begin with an uppercase letter. + +.. note:: + + In this case, it may seem redundant to have to write the type of + each constructor repeatedly—of course they're all just constructors + of the type ``three``. In this case, F* will allow you to just + write + + .. code-block:: fstar + + type three = + | One_of_three + | Two_of_three + | Three_of_three + + As we start to use indexed types, each constructor can build a + different instance of the defined type, so it will be important to + have a way to specify the result type of each constructor. For + uniformity, throughout this book, we'll always annotate the types + of constructors, even when not strictly necessary. + +F* can prove that they are distinct and that these are the only terms +of type ``three``. + +.. literalinclude:: ../code/Part1.Inductives.fst + :language: fstar + :start-after: //SNIPPET_START: assert + :end-before: //SNIPPET_END: assert + +To write functions that can analyze these new types, one uses the +``match`` construct. The syntax of ``match`` in F* is very similar to +OCaml or F#. We'll assume that you're familiar with its basics. As we +go, we'll learn about more advanced ways to use ``match``. + +Here are some examples. + +.. literalinclude:: ../code/Part1.Inductives.fst + :language: fstar + :start-after: //SNIPPET_START: disc_handrolled + :end-before: //SNIPPET_END: disc_handrolled + +Discriminators +.............. + +These functions test whether ``x : three`` matches a given +constructor, returning a ``bool`` in each case. 
Since it's so common +to write functions that test whether a value of an inductive type +matches one of its constructors, F* automatically generates these +functions for you. For example, instead of writing + +.. literalinclude:: ../code/Part1.Inductives.fst + :language: fstar + :start-after: //SNIPPET_START: three_as_int + :end-before: //SNIPPET_END: three_as_int + +One can write: + +.. literalinclude:: ../code/Part1.Inductives.fst + :language: fstar + :start-after: //SNIPPET_START: three_as_int' + :end-before: //SNIPPET_END: three_as_int' + +In other words, for every constructor ``T`` of an inductive type +``t``, F* generates a function named ``T?`` (called a "discriminator") +which tests if a ``v:t`` matches ``T``. + +Exhaustiveness +.............. + +Of course, an even more direct way of writing ``three_as_int`` is + +.. literalinclude:: ../code/Part1.Inductives.fst + :language: fstar + :start-after: //SNIPPET_START: three_as_int'' + :end-before: //SNIPPET_END: three_as_int'' + +Every time you use a ``match``, F* will make sure to prove that you +are handling all possible cases. Try omitting one of the cases in +``three_as_int`` above and see what happens. + +Exhaustiveness checking in F* is a semantic check and can use the SMT +solver to prove that all cases are handled appropriately. For example, +you can write this: + +.. code-block:: fstar + + let only_two_as_int (x:three { not (Three_of_three? x) }) + : int + = match x with + | One_of_three -> 1 + | Two_of_three -> 2 + +The refinement on the argument allows F* to prove that the +``Three_of_three`` case in the pattern is not required, since that +branch would be unreachable anyway. + +.. _Part1_tuples: + +Tuples +^^^^^^ + +The next step from enumerations is to define composite types, e.g., +types that are made from pairs, triples, quadruples, etc. of other +types. Here's how + +.. 
literalinclude:: ../code/Part1.Inductives.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: tup
+   :end-before: //SNIPPET_END: tup
+
+The type definition for ``tup2 a b`` states that for any types ``a :
+Type`` and ``b : Type``, ``Tup2 : a -> b -> tup2 a b``. That is,
+``Tup2`` is a constructor of ``tup2``, such that given ``x:a`` and
+``y:b``, ``Tup2 x y : tup2 a b``.
+
+The other types ``tup3`` and ``tup4`` are similar---the type
+annotations on the bound variables can be inferred.
+
+These are inductive types with just one case---so the discriminators
+``Tup2?``, ``Tup3?``, and ``Tup4?`` aren't particularly useful. But,
+we need a way to extract, or *project*, the components of a tuple. You
+can do that with a ``match``.
+
+.. literalinclude:: ../code/Part1.Inductives.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: proj_handrolled
+   :end-before: //SNIPPET_END: proj_handrolled
+
+Projectors
+..........
+
+These projectors are common enough that F* auto-generates them for
+you. In particular, for any data constructor ``T`` of type
+``x1:t1 -> ... -> xn:tn -> t``, F* auto-generates the following function:
+
+  * ``T?.xi : y:t{T? y} -> ti``
+
+That is, ``T?.xi`` is a function which, when applied to a ``y:t`` in
+case ``T? y``, returns the ``xi`` component of ``T x1 ... xn``.
+
+In the case of our ``tup2`` and ``tup3`` types, we have
+
+  * ``Tup2?.fst``, ``Tup2?.snd``
+  * ``Tup3?.fst``, ``Tup3?.snd``, ``Tup3?.thd``
+
+Syntax for tuples
+.................
+
+Since tuples are so common, the module ``FStar.Pervasives.Native.fst``
+defines tuple types up to arity 14. So, you shouldn't have to define
+``tup2`` and ``tup3`` etc. by yourself.
+
+The tuple types in ``FStar.Pervasives.Native`` come with syntactic
+sugar.
+
+* You can write ``a & b`` instead of ``tup2 a b``; ``a & b & c``
+  instead of ``tup3 a b c``; and so on, up to arity 14.
+
+* You can write ``x, y`` instead of ``Tup2 x y``; ``x, y, z`` instead
+  of ``Tup3 x y z``; and so on, up to arity 14.
+
+* You can write ``x._1``, ``x._2``, ``x._3``, etc. to project the
+  field ``i`` of a tuple whose arity is at least ``i``.
+
+That said, if you're using tuples beyond arity 4 or 5, it's probably a
+good idea to define a *record*, as we'll see next—since it can be hard
+to remember what the components of a large tuple represent.
+
+.. _Part1_records:
+
+Records
+.......
+
+A record is just a tuple with user-chosen names for its fields and
+with special syntax for constructing them and projecting their
+fields. Here's an example.
+
+.. literalinclude:: ../code/Part1.Inductives.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: point
+   :end-before: //SNIPPET_END: point
+
+* A record type is defined using curly braces ``{}``. See ``type
+  point3D``.
+
+* A record value is also constructed using curly braces, with an
+  assignment for each field of the record. The fields need not be
+  given in order. See ``origin``.
+
+* To access the fields of a record, you can use the dot notation
+  ``p.x``; see ``dot``, which computes a dot product using dot
+  notation.
+
+* Records also support the ``with`` notation to construct a new record
+  whose fields are the same as the old record, except for those fields
+  mentioned after the ``with``. That is, ``translate_X p shift``
+  returns ``{ x = p.x + shift; y = p.y; z = p.z }``.
+
+* Records can also be used to pattern match a value. For example, in
+  ``is_origin``, we match the fields of the record (in any order)
+  against some patterns.
+
+Options
+^^^^^^^
+
+Another common type from F*'s standard library is the ``option`` type,
+which is useful to represent a possibly missing value.
+
+.. code-block:: fstar
+
+   type option a =
+     | None : option a
+     | Some : a -> option a
+
+Consider implementing a function to divide ``x / y``, for two integers
+``x`` and ``y``. This function cannot be defined when ``y`` is zero,
+but it can be defined partially, by excluding the case where ``y =
+0``, as shown below. 
(Of course, one can also refine the domain of the +function to forbid ``y = 0``, but we're just trying to illustrate the +``option`` type here.) + +.. literalinclude:: ../code/Part1.Inductives.fst + :language: fstar + :start-after: //SNIPPET_START: option + :end-before: //SNIPPET_END: option + +Like most other functional languages, F* does not have a ``null`` +value. Whenever a value may possibly be ``null``, one typically uses +the ``option`` type, using ``None`` to signify null and ``Some v`` for +the non-null case. + +Unions, or the ``either`` type +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +``FStar.Pervasives`` also defines the ``either`` type, shown below. + +.. code-block:: fstar + + type either a b = + | Inl : v: a -> either a b + | Inr : v: b -> either a b + +The type ``either a b`` represents a value that could either be ``Inl +v`` with ``v:a``, or ``Inr v`` with ``v:b``. That is, ``either a b`` +is a tagged union of the ``a`` and ``b``. It's easy to write functions +to analyze the tag ``Inl`` (meaning it's "in the left case") or +``Inr`` ("in the right case") and compute with the underlying +values. Here's an example: + +.. literalinclude:: ../code/Part1.Inductives.fst + :language: fstar + :start-after: //SNIPPET_START: either + :end-before: //SNIPPET_END: either + +The ``same_case x y`` function decides if the two unions are both +simultaneously in the left or right case. + +Then, in ``sum x y``, with a refinement that ``x`` and ``y`` are in +the same case, we can handle just two cases (when they are both in +left, or both in right) and F* can prove that the case analysis is +exhaustive. In the left case, the underlying values are boolean, so we +combine them with ``||``; in the right case, the underlying values are +integers, so we combine them with ``+``; and return them with the +appropriate tag. The type of the result ``z:either bool int{ Inl? z <==> +Inl? x}`` shows that the result has the same case as ``x`` (and hence +also ``y``). 
We could have written the result type as ``z:either bool
+int { same_case z x }``.
+
+.. _Part1_inductives_list:
+
+Lists
+^^^^^
+
+All the types we've seen so far have been inductive only in a degenerate
+sense—the constructors do not refer to the types they construct. Now,
+for our first truly inductive type, a list.
+
+Here's the definition of ``list`` from ``Prims``:
+
+.. code-block:: fstar
+
+   type list a =
+     | Nil : list a
+     | Cons : hd:a -> tl:list a -> list a
+
+The ``list`` type is available implicitly in all F* programs and we
+have special (but standard) syntax for the list constructors:
+
+* ``[]`` is ``Nil``
+* ``[v1; ...; vn]`` is ``Cons v1 ... (Cons vn Nil)``
+* ``hd :: tl`` is ``Cons hd tl``.
+
+You can always just write out the constructors like ``Nil`` and ``Cons``
+explicitly, if you find that useful (e.g., to partially apply ``Cons
+hd : list a -> list a``).
+
+.. _Part1_inductives_length:
+
+Length of a list
+................
+
+Let's write some simple functions on lists, starting with computing
+the length of a list.
+
+.. literalinclude:: ../code/Part1.Inductives.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: length
+   :end-before: //SNIPPET_END: length
+
+The ``length`` function is recursive and implicitly polymorphic in a
+type ``a``. For any list ``l : list a``, ``length l`` returns a
+``nat``. The definition pattern matches on the list and calls
+``length`` recursively on the tail of the list, until the ``[]`` case
+is reached.
+
+.. _Part1_inductives_append:
+
+Exercises
+^^^^^^^^^
+
+`Click here <../code/exercises/Part1.Inductives.fst>`_ for the exercise file.
+
+Here's the definition of ``append``, a function that concatenates two
+lists. Can you give it a type that proves it always returns a list
+whose length is the sum of the lengths of its arguments?
+
+.. literalinclude:: ../code/Part1.Inductives.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: def append
+   :end-before: //SNIPPET_END: def append
+
+.. 
container:: toggle
+
+   .. container:: header
+
+      **Answer**
+
+   .. literalinclude:: ../code/Part1.Inductives.fst
+      :language: fstar
+      :start-after: SNIPPET_START: sig append
+      :end-before: SNIPPET_END: sig append
diff --git a/doc/book/PoP-in-FStar/book/part1/part1_lemmas.rst b/doc/book/PoP-in-FStar/book/part1/part1_lemmas.rst
new file mode 100644
index 00000000000..2642eddbc76
--- /dev/null
+++ b/doc/book/PoP-in-FStar/book/part1/part1_lemmas.rst
@@ -0,0 +1,577 @@
+.. _Part1_lemmas:
+
+Lemmas and proofs by induction
+==============================
+
+Let's say you wrote the ``factorial`` function and gave it the type
+``nat -> nat``. Later, you care about some other property of
+``factorial``, e.g., that if ``x > 2`` then ``factorial x > x``. One
+option is to revise the type you wrote for ``factorial`` and get F\*
+to reprove that it has this type. But this isn't always feasible. What
+if you also wanted to prove that if ``x > 3`` then ``factorial x > 2 *
+x``? Clearly, polluting the type of ``factorial`` with all these
+properties that you may or may not care about is impractical.
+
+You could write assertions to ask F* to check these properties, e.g.,
+
+.. code-block:: fstar
+
+   let _ = assert (forall (x:nat). x > 2 ==> factorial x > x)
+
+But, F* complains, saying that it couldn't prove this fact. That's not
+because the fact isn't true—recall, checking the validity of
+assertions in F* is undecidable. So, there are facts that are true
+that F* may not be able to prove, at least not without some help.
+
+In this case, proving this property about ``factorial`` requires a
+proof by induction. F* and Z3 cannot do proofs by induction
+automatically—you will have to help F* here by writing a *lemma*.
+
+
+Introducing lemmas
+^^^^^^^^^^^^^^^^^^
+
+A lemma is a function in F* that always returns the ``():unit``
+value. However, the type of a lemma carries useful information about
+which facts are provable.
+
+Here's our first lemma:
+
+.. 
literalinclude:: ../code/Part1.Lemmas.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: factorial_is_positive
+   :end-before: //SNIPPET_END: factorial_is_positive
+
+There's a lot of information condensed in that definition. Let's spell
+it out in detail:
+
+* ``factorial_is_positive`` is a recursive function with a parameter ``x:nat``
+
+* The return type of ``factorial_is_positive`` is a refinement of
+  unit, namely ``u:unit{factorial x > 0}``. That says that the
+  function always returns ``()``, but, additionally, when
+  ``factorial_is_positive x`` returns (which it always does, since it
+  is a total function) it is safe to conclude that ``factorial x >
+  0``.
+
+* The next three lines prove the lemma using a proof by induction on
+  ``x``. The basic concept here is that by programming total
+  functions, we can write proofs about other pure expressions. We'll
+  discuss such proofs in detail in the remainder of this section.
+
+.. _Part1_lemma_syntax:
+
+Some syntactic shorthands for Lemmas
+....................................
+
+Lemmas are so common in F* that it's convenient to have special syntax
+for them. Here's another take at our proof that ``factorial x > 0``.
+
+.. literalinclude:: ../code/Part1.Lemmas.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: factorial_is_positive_lemma
+   :end-before: //SNIPPET_END: factorial_is_positive_lemma
+
+The type ``x:t -> Lemma (requires pre) (ensures post)`` is the type of
+a function
+
+* that can be called with an argument ``v:t``
+* whose argument must satisfy the precondition ``pre[v/x]``
+* that always returns a ``unit``
+* and that ensures the postcondition ``post[v/x]`` is valid
+
+The type is equivalent to ``x:t{pre} -> u:unit{post}``.
+
+When the precondition ``pre`` is trivial, it can be omitted. One can
+just write:
+
+.. code-block:: fstar
+
+   Lemma (ensures post)
+
+or even
+
+.. 
code-block:: fstar
+
+   Lemma post
+
+
+A proof by induction, explained in detail
+.........................................
+
+Let's look at this lemma in detail again—why does it convince F* that
+``factorial x > 0``?
+
+.. literalinclude:: ../code/Part1.Lemmas.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: factorial_is_positive_lemma
+   :end-before: //SNIPPET_END: factorial_is_positive_lemma
+
+* It is a proof by induction on ``x``. Proofs by induction in F* are
+  represented by total recursive functions. The fact that it is total
+  is extremely important—it ensures that the inductive argument is
+  well-founded, i.e., that the induction hypothesis is only applied
+  on strictly smaller arguments.
+
+* The base case of the induction is when ``x=0``. In this case, F* +
+  Z3 can easily prove that ``factorial 0 > 0``, since this just
+  requires computing ``factorial 0`` to ``1`` and checking ``1 > 0``.
+
+* What remains is the case where ``x > 0``.
+
+* In the inductive case, the type of the recursively bound
+  ``factorial_is_pos`` represents the induction hypothesis. In this
+  case, its type is
+
+  .. code-block:: fstar
+
+     y:int {y < x} -> Lemma (requires y >= 0) (ensures factorial y > 0)
+
+  In other words, the type of the recursive function tells us that for
+  all ``y`` that are smaller than the current argument ``x`` and
+  non-negative, it is safe to assume that ``factorial y > 0``.
+
+* By making a recursive call on ``x-1``, F* can conclude that
+  ``factorial (x - 1) > 0``.
+
+* Finally, to prove that ``factorial x > 0``, the solver figures out
+  that ``factorial x = x * factorial (x - 1)``. From the recursive
+  lemma invocation, we know that ``factorial (x - 1) > 0``, and since
+  we're in the case where ``x > 0``, the solver can prove that the
+  product of two positive numbers must be positive.
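+
+Putting these pieces together, the entire proof fits in a few
+lines. The following is our own sketch of how such a lemma can be
+written (the exact shape of the snippet in the repository may differ):
+
+.. code-block:: fstar
+
+   let rec factorial_is_pos (x:int)
+     : Lemma (requires x >= 0)
+             (ensures factorial x > 0)
+   = if x = 0
+     then ()                        //base case: factorial 0 computes to 1
+     else factorial_is_pos (x - 1) //use the induction hypothesis at x - 1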
+ +Exercises: Lemmas about integer functions +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +`Click here <../code/exercises/Part1.Lemmas.fst>`_ for the exercise file. + +Exercise 1 +.......... + +Try proving the following lemmas about ``factorial``: + +.. code-block:: fstar + + val factorial_is_greater_than_arg (x:int) + : Lemma (requires x > 2) + (ensures factorial x > x) + +.. container:: toggle + + .. container:: header + + **Answer** + + .. literalinclude:: ../code/Part1.Lemmas.fst + :language: fstar + :start-after: SNIPPET_START: factorial_is_greater_than_arg + :end-before: SNIPPET_END: factorial_is_greater_than_arg + + +Exercise 2 +.......... + +Try proving the following lemmas about ``fibonacci``: + +.. literalinclude:: ../code/Part1.Lemmas.fst + :language: fstar + :start-after: SNIPPET_START: fibonacci_question + :end-before: SNIPPET_END: fibonacci_question + +.. container:: toggle + + .. container:: header + + **Answer** (Includes two proofs and detailed explanations) + + .. literalinclude:: ../code/Part1.Lemmas.fst + :language: fstar + :start-after: SNIPPET_START: fibonacci_answer + :end-before: SNIPPET_END: fibonacci_answer + + + Let's have a look at that proof in some detail. It's much like the + proof by induction we discussed in detail earlier, except now we + have two uses of the induction hypothesis. + + * It's a proof by induction on ``n:nat{n >= 2}``, as you can tell from the + ``let rec``. + + * The base cases are when ``n = 2`` and ``n = 3``. In both these + cases, the solver can simply compute ``fibonacci n`` and check + that it is greater than ``n``. 
+ + * Otherwise, in the inductive case, we have ``n >= 4`` and the + induction hypothesis is the type of the recursive function:: + + m:nat{m >= 2 /\ m < n} -> Lemma (fibonacci m >= m) + + * We call the induction hypothesis twice and get:: + + fibonacci (n - 1) >= n - 1 + fibonacci (n - 2) >= n - 2 + + * To conclude, we show:: + + fibonacci n = //by definition + fibonacci (n - 1) + fibonacci (n - 2) >= //from the facts above + (n - 1) + (n - 2) = //rearrange + 2*n - 3 >= //when n >= 4 + n + + As you can see, once you set up the induction, the SMT solver does + a lot of the work. + + Sometimes, the SMT solver can even find proofs that you might not + write yourself. Consider this alternative proof of ``fibonacci n + >= n``. + + .. literalinclude:: ../code/Part1.Lemmas.fst + :language: fstar + :start-after: SNIPPET_START: fibonacci_answer_alt + :end-before: SNIPPET_END: fibonacci_answer_alt + + This proof works with just a single use of the induction + hypothesis. How come? Let's look at it in detail. + + 1. It's a proof by induction on ``n:nat{n >= 2}``. + + 2. The base case is when ``n=2``. It's easy to compute ``fibonacci 2`` + and check that it's greater than or equal to 2. + + 3. In the inductive case, we have:: + + n >= 3 + + 4. The induction hypothesis is:: + + m:nat{m >= 2 /\ m < n} -> Lemma (fibonacci m >= m) + + 5. We apply the induction hypothesis to ``n - 1`` and get :: + + fibonacci (n - 1) >= n - 1 + + 6. We have:: + + fibonacci n = //definition + fibonacci (n - 1) + fibonacci (n - 2) >= //from 5 + (n - 1) + fibonacci (n - 2) + + 7. So, our goal is now:: + + (n - 1) + fibonacci (n - 2) >= n + + 8. It suffices if we can show ``fibonacci (n - 2) >= 1`` + + 9. From (2) and the definition of ``fibonacci`` we have:: + + fibonacci (n - 1) = //definition + fibonacci (n - 2) + fibonacci (n - 3) >= //from 5 + n - 1 >= // from 3 + 2 + + + 10. Now, suppose for contradiction, that ``fibonacci (n - 2) = 0``. + + 10.1. 
Then, from step 9, we have ``fibonacci (n-3) >= 2``
+
+   10.2. If ``n=3``, then ``fibonacci 0 = 1``, so we have a contradiction.
+
+   10.3. If ``n > 3``, then
+
+      10.3.1. ``fibonacci (n-2) = fibonacci (n-3) + fibonacci (n-4)``, by definition
+
+      10.3.2. ``fibonacci (n-3) + fibonacci (n-4) >= fibonacci (n-3)``, since ``fibonacci (n-4) : nat``.
+
+      10.3.3. ``fibonacci (n-2) >= fibonacci (n-3)``, using 10.3.1 and 10.3.2
+
+      10.3.4. ``fibonacci (n-2) >= 2``, using 10.1
+
+      10.3.5. But, 10.3.4 contradicts 10; so the proof is complete.
+
+   You probably wouldn't have come up with this proof yourself, and
+   indeed, it took us some puzzling to figure out how the SMT solver
+   was able to prove this lemma with just one use of the induction
+   hypothesis. But, there you have it. All of which is to say that
+   the SMT solver is quite powerful!
+
+Exercise: A lemma about append
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+:ref:`Earlier `, we saw a definition of
+``append`` with the following type:
+
+.. code-block:: fstar
+
+   val append (#a:Type) (l1 l2:list a)
+     : l:list a{length l = length l1 + length l2}
+
+Now, suppose we were to define ``app``, a version of ``append`` with a
+weaker type, as shown below.
+
+.. literalinclude:: ../code/Part1.Lemmas.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: def append alt
+   :end-before: //SNIPPET_END: def append alt
+
+Can you prove the following lemma?
+
+.. literalinclude:: ../code/Part1.Lemmas.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: sig app_length
+   :end-before: //SNIPPET_END: sig app_length
+
+.. container:: toggle
+
+   .. container:: header
+
+      **Answer**
+
+   .. literalinclude:: ../code/Part1.Lemmas.fst
+      :language: fstar
+      :start-after: SNIPPET_START: def app_length
+      :end-before: SNIPPET_END: def app_length
+
+.. 
_Part1_intrinsic_extrinsic:
+
+Intrinsic vs extrinsic proofs
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+As the previous exercise illustrates, you can prove properties either
+by enriching the type of a function or by writing a separate lemma
+about it---we call these the 'intrinsic' and 'extrinsic' styles,
+respectively. Which style to prefer is a matter of taste and
+convenience: generally useful properties are often good candidates for
+intrinsic specification (e.g., that ``length`` returns a ``nat``); more
+specific properties are better stated and proven as lemmas. However,
+in some cases, as in the following example, it may be impossible to
+prove a property of a function directly in its type---you must resort
+to a lemma.
+
+.. literalinclude:: ../code/Part1.Lemmas.fst
+   :language: fstar
+   :start-after: SNIPPET_START: reverse
+   :end-before: SNIPPET_END: reverse
+
+Let's try proving that reversing a list twice is the identity
+function. It's possible to *specify* this property in the type of
+``reverse`` using a refinement type.
+
+.. code-block:: fstar
+
+   val reverse (#a:Type) : f:(list a -> list a){forall l. l == f (f l)}
+
+.. note::
+
+   A subtle point: the refinement on ``reverse`` above uses a
+   :ref:`propositional equality
+   `. That's because equality on
+   lists of arbitrary types is not decidable, e.g., consider ``list
+   (int -> int)``. All the proofs below will rely on propositional
+   equality.
+
+However, F* refuses to accept this as a valid type for ``reverse``:
+proving this property requires two separate inductions, neither of
+which F* can perform automatically.
+
+Instead, one can use two lemmas to prove the property we care
+about. Here it is:
+
+.. 
literalinclude:: ../code/Part1.Lemmas.fst
+   :language: fstar
+   :start-after: SNIPPET_START: reverse_involutive
+   :end-before: SNIPPET_END: reverse_involutive
+
+In the ``hd :: tl`` case of ``rev_involutive``, we explicitly apply
+not just the induction hypothesis but also the auxiliary lemma
+``snoc_cons``, which is also proven there.
+
+Exercise: Reverse is injective
+...............................
+
+`Click here <../code/exercises/Part1.Lemmas.fst>`_ for the exercise file.
+
+Prove that reverse is injective, i.e., prove the following lemma.
+
+.. literalinclude:: ../code/Part1.Lemmas.fst
+   :language: fstar
+   :start-after: SNIPPET_START: sig rev_injective
+   :end-before: SNIPPET_END: sig rev_injective
+
+.. container:: toggle
+
+   .. container:: header
+
+      **Answer**
+
+   .. literalinclude:: ../code/Part1.Lemmas.fst
+      :language: fstar
+      :start-after: SNIPPET_START: def rev_injective
+      :end-before: SNIPPET_END: def rev_injective
+
+   That's quite a tedious proof, isn't it? Here's a simpler proof.
+
+   .. literalinclude:: ../code/Part1.Lemmas.fst
+      :language: fstar
+      :start-after: SNIPPET_START: rev_injective_alt
+      :end-before: SNIPPET_END: rev_injective_alt
+
+   The ``rev_injective_alt`` proof is based on the idea that every
+   invertible function is injective. We've already proven that
+   ``reverse`` is involutive, i.e., it is its own inverse. So, we
+   invoke our lemma, once for ``l1`` and once for ``l2``. This gives
+   the SMT solver the information that ``reverse (reverse l1) =
+   l1`` and ``reverse (reverse l2) = l2``, which suffices to complete
+   the proof. As usual, when structuring proofs, lemmas are your
+   friends!
+
+Exercise: Optimizing reverse
+............................
+
+Earlier, we saw how to implement :ref:`a tail-recursive variant
+` of ``reverse``.
+
+.. 
literalinclude:: ../code/Part1.Termination.fst + :language: fstar + :start-after: SNIPPET_START: rev + :end-before: SNIPPET_END: rev + +Prove the following lemma to show that it is equivalent to the +previous non-tail-recursive implementation, i.e., + +.. code-block:: fstar + + val rev_is_ok (#a:_) (l:list a) : Lemma (rev [] l == reverse l) + +.. container:: toggle + + .. container:: header + + **Answer** + + .. literalinclude:: ../code/Part1.Lemmas.fst + :language: fstar + :start-after: SNIPPET_START: rev_is_ok + :end-before: SNIPPET_END: rev_is_ok + +Exercise: Optimizing Fibonacci +.............................. + + +Earlier, we saw how to implement :ref:`a tail-recursive variant +` of ``fibonacci``---we show it again below. + +.. literalinclude:: ../code/Part1.Lemmas.fst + :language: fstar + :start-after: SNIPPET_START: fib_tail$ + :end-before: SNIPPET_END: fib_tail$ + +Prove the following lemma to show that it is equivalent to the +non-tail-recursive implementation, i.e., + +.. literalinclude:: ../code/Part1.Lemmas.fst + :language: fstar + :start-after: SNIPPET_START: val fib_tail_is_ok$ + :end-before: SNIPPET_END: val fib_tail_is_ok$ + +.. container:: toggle + + .. container:: header + + **Answer** + + .. literalinclude:: ../code/Part1.Lemmas.fst + :language: fstar + :start-after: SNIPPET_START: fib_is_ok$ + :end-before: SNIPPET_END: fib_is_ok$ + +.. _Part1_higher_order_functions: + +Higher-order functions +^^^^^^^^^^^^^^^^^^^^^^^ + +Functions are first-class values—they can be passed to other functions +and returned as results. We've already seen some examples in the +section on :ref:`polymorphism +`. Here are some more, starting with +the ``map`` function on lists. + +.. literalinclude:: ../code/Part1.Lemmas.fst + :language: fstar + :start-after: SNIPPET_START: map + :end-before: SNIPPET_END: map + +It takes a function ``f`` and a list ``l`` and it applies ``f`` to +each element in ``l`` producing a new list. 
More precisely ``map f +[v1; ...; vn]`` produces the list ``[f v1; ...; f vn]``. For example: + +.. code-block:: fstar + + map (fun x -> x + 1) [0; 1; 2] = [1; 2; 3] + + +Exercise: Finding a list element +................................ + +Here's a function called ``find`` that given a boolean function ``f`` +and a list ``l`` returns the first element in ``l`` for which ``f`` +holds. If no element is found ``find`` returns ``None``. + +.. literalinclude:: ../code/Part1.Lemmas.fst + :language: fstar + :start-after: SNIPPET_START: find + :end-before: SNIPPET_END: find + +Prove that if ``find`` returns ``Some x`` then ``f x = true``. Is it +better to do this intrinsically or extrinsically? Do it both ways. + +.. container:: toggle + + .. container:: header + + **Answer** + + .. literalinclude:: ../code/Part1.Lemmas.fst + :language: fstar + :start-after: SNIPPET_START: sig find + :end-before: SNIPPET_END: sig find + + .. literalinclude:: ../code/Part1.Lemmas.fst + :language: fstar + :start-after: SNIPPET_START: find_alt + :end-before: SNIPPET_END: find_alt + +Exercise: fold_left +................... + +Here is a function ``fold_left``, where:: + + fold_left f [b1; ...; bn] a = f (bn, ... (f b2 (f b1 a))) + +.. literalinclude:: ../code/Part1.Lemmas.fst + :language: fstar + :start-after: SNIPPET_START: def fold_left + :end-before: SNIPPET_END: def fold_left + +Prove the following lemma: + +.. literalinclude:: ../code/Part1.Lemmas.fst + :language: fstar + :start-after: SNIPPET_START: sig fold_left_Cons_is_rev + :end-before: SNIPPET_END: sig fold_left_Cons_is_rev + +.. container:: toggle + + .. container:: header + + Hint: This proof is a level harder from what we've done so far. + You will need to strengthen the induction hypothesis, and + possibly to prove that ``append`` is associative and that + ``append l [] == l``. + + **Answer** + + .. 
literalinclude:: ../code/Part1.Lemmas.fst + :language: fstar + :start-after: SNIPPET_START: fold_left_Cons_is_rev + :end-before: SNIPPET_END: fold_left_Cons_is_rev diff --git a/doc/book/PoP-in-FStar/book/part1/part1_polymorphism.rst b/doc/book/PoP-in-FStar/book/part1/part1_polymorphism.rst new file mode 100644 index 00000000000..497bfe735f4 --- /dev/null +++ b/doc/book/PoP-in-FStar/book/part1/part1_polymorphism.rst @@ -0,0 +1,207 @@ +.. _Part1_polymorphism_and_inference: + +Polymorphism and type inference +=============================== + +In this chapter, we'll learn about defining type polymorphic +functions, or how to work with generic types. + +.. _Part1_type_of_types: + +Type: The type of types +^^^^^^^^^^^^^^^^^^^^^^^ + +One characteristic of F* (and many other dependently typed languages) +is that it treats programs and their types uniformly, all within a +single syntactic class. A type system in this style is sometimes +called a *Pure Type System* or `PTS +`_. + +In F* (as in other PTSs) types have types too, functions can take +types as arguments and return types as results, etc. In particular, +the type of a type is ``Type``, e.g., ``bool : Type``, ``int : Type``, +``int -> int : Type`` etc. In fact, even ``Type`` has a type---as +we'll see when we learn about *universes*. + +Parametric polymorphism or generics +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Most modern typed languages provide a way to write programs with +generic types. For instance, C# and Java provide generics, C++ has +templates, and languages like OCaml and Haskell have several kinds of +polymorphic types. + +In F*, writing functions that are generic or polymorphic in types +arises naturally as a special case of the :ref:`arrow types +` that we have already learned about. For example, +here's a polymorphic identity function:: + + let id : a:Type -> a -> a = fun a x -> x + +There are several things to note here: + +* The type of ``id`` is an arrow type, with two arguments. 
The first + argument is ``a : Type``; the second argument is a term of type + ``a``; and the result also has the same type ``a``. + +* The definition of ``id`` is a lambda term with two arguments ``a : + Type`` (corresponding to the first argument type) and ``x : a``. The + function returns ``x``---it's an identity function on the second + argument. + +Just as with any function, you can write it instead like this: + +.. literalinclude:: ../code/Part1.Poly.fst + :language: fstar + :start-after: //SNIPPET_START: id + :end-before: //SNIPPET_END: id + +To call ``id``, one can apply it first to a type and then to a value of that type, as shown below. + +.. literalinclude:: ../code/Part1.Poly.fst + :language: fstar + :start-after: //SNIPPET_START: id applications + :end-before: //SNIPPET_END: id applications + +We've defined a function that can be applied to a value ``x:a`` for +any type ``a``. The last line there maybe requires a second read: we +instantiated ``id`` to ``int -> int`` and then applied it to ``id`` +instantiated to ``int``. + +Exercises +^^^^^^^^^ + +Let's try a few simple exercises. `Click here +<../code/exercises/Part1.Poly.fst>`_ for the exercise file. + + +Try defining functions with the following signatures: + +.. literalinclude:: ../code/Part1.Poly.fst + :language: fstar + :start-after: //SNIPPET_START: sig apply_and_compose + :end-before: //SNIPPET_END: sig apply_and_compose + +.. container:: toggle + + .. container:: header + + **Answer** + + .. literalinclude:: ../code/Part1.Poly.fst + :language: fstar + :start-after: //SNIPPET_START: apply_and_compose + :end-before: //SNIPPET_END: apply_and_compose + +How about writing down a signature for ``twice``: + +.. literalinclude:: ../code/Part1.Poly.fst + :language: fstar + :start-after: //SNIPPET_START: def twice + :end-before: //SNIPPET_END: def twice + +.. container:: toggle + + .. container:: header + + **Answer** + + .. 
literalinclude:: ../code/Part1.Poly.fst + :language: fstar + :start-after: SNIPPET_START: sig twice + :end-before: SNIPPET_END: sig twice + +It's quite tedious to have to explicitly provide that first type +argument to ``id``. Implicit arguments and type inference will help, +as we'll see, next. + + +Type inference: Basics +^^^^^^^^^^^^^^^^^^^^^^ +.. _inference: + +Like many other languages in the tradition of +`Milner's ML `_, +type inference is a central component in F*'s design. + +You may be used to type inference in other languages, where one can +leave out type annotations (e.g., on variables, or when using +type-polymorphic (aka generic) functions) and the compiler determines +an appropriate type based on the surrounding program context. F*'s +type inference includes such a feature, but is considerably more +powerful. Like in other dependently typed languages, F*'s inference +engine is based on `higher-order unification +`_ +and can be used to infer arbitrary fragments of program text, not just +type annotations on variables. + +Let's consider our simple example of the definition and use of the +identity function again + +.. literalinclude:: ../code/Part1.Poly.fst + :language: fstar + :start-after: //SNIPPET_START: id + :end-before: //SNIPPET_END: id + +.. literalinclude:: ../code/Part1.Poly.fst + :language: fstar + :start-after: //SNIPPET_START: id applications + :end-before: //SNIPPET_END: id applications + +Instead of explicitly providing that first type argument when applying +``id``, one could write it as follows, replacing the type arguments +with an underscore ``_``. + +.. literalinclude:: ../code/Part1.Poly.fst + :language: fstar + :start-after: //SNIPPET_START: implicit id applications + :end-before: //SNIPPET_END: implicit id applications + +The underscore symbol is a wildcard, or a hole in program, and it's +the job of the F* typechecker to fill in the hole. + +.. 
note:: + + Program holes are a very powerful concept and form the basis of + Meta-F*, the metaprogramming and tactics framework embedded in + F*---we'll see more about holes in a later section. + +Implicit arguments +^^^^^^^^^^^^^^^^^^ + +Since it's tedious to write an ``_`` everywhere, F* has a notion of +*implicit arguments*. That is, when defining a function, one can add +annotations to indicate that certain arguments can be omitted at call +sites and left for the typechecker to infer automatically. + +For example, one could write + +.. literalinclude:: ../code/Part1.Poly2.fst + :language: fstar + :start-after: //SNIPPET_START: id + :end-before: //SNIPPET_END: id + +decorating the first argument ``a`` with a ``#``, to indicate that it is +an implicit argument. Then at call sites, one can simply write: + +.. literalinclude:: ../code/Part1.Poly2.fst + :language: fstar + :start-after: //SNIPPET_START: id applications + :end-before: //SNIPPET_END: id applications + + +And F* will figure out instantiations for the missing first argument +to ``id``. + +In some cases, it may be useful to actually provide an implicit +argument explicitly, rather than relying on the F* to pick one. For +example, one could write the following: + +.. literalinclude:: ../code/Part1.Poly2.fst + :language: fstar + :start-after: //SNIPPET_START: explicit id applications + :end-before: //SNIPPET_END: explicit id applications + +In each case, we provide the first argument of ``id`` explicitly, by +preceding it with a ``#`` sign, which instructs F* to take the user's +term rather than generating a hole and trying to fill it. diff --git a/doc/book/PoP-in-FStar/book/part1/part1_prop_assertions.rst b/doc/book/PoP-in-FStar/book/part1/part1_prop_assertions.rst new file mode 100644 index 00000000000..456d9ef2be2 --- /dev/null +++ b/doc/book/PoP-in-FStar/book/part1/part1_prop_assertions.rst @@ -0,0 +1,392 @@ +.. 
_Part1_prop_assertions: + +Interfacing with an SMT solver +============================== + +As mentioned :ref:`at the start of this section `, a type ``t`` +represents a proposition and a term ``e : t`` is a proof of ``t``. In +many other dependently typed languages, exhibiting a term ``e : t`` is +the only way to prove that ``t`` is valid. In F*, while one can do +such proofs, it is not the only way to prove a theorem. + +By way of illustration, let's think about :ref:`Boolean refinement +types `. As we've seen already, it is +easy to prove ``17 : x:int{x >= 0}`` in F*. Under the covers, F* +proves that ``(x >= 0) [17/x]`` reduces to ``true``, yet no explicit +term is given to prove this fact. Instead, F* encodes facts about a +program (including things like the semantics of arithmetic operators +like ``>=``) in the classical logic of an SMT solver and asks it (Z3 +typically) to prove whether the formula ``17 >= 0`` is valid in a +context including all encoded facts about a program. If Z3 is able to +prove it valid, F* accepts the formula as true, without ever +constructing a term representing a proof of ``17 >= 0``. + +This design has many important consequences, including, briefly: + +* Trust: F* implicitly trusts its encoding to SMT logic and the + correctness of the Z3 solver. + +* Proof irrelevance: Since no proof term is constructed for proofs + done by SMT, a program cannot distinguish between different proofs + of a fact proven by SMT. + +* Subtyping: Since no proof term is constructed, a term like ``17`` + can have many types, ``int``, ``nat``, ``x:int{x = 17}``, etc. As + mentioned :ref:`earlier `, F* + leverages this to support refinement subtyping. + +* Undecidability: Since Z3 can check the validity of formulas in the + entirety of its logic, including things like quantifying universally + and existentially over infinite ranges, F* does not restrict the + formulas checked for validity by Z3 to be boolean, or even + decidable. 
Yes, typechecking in F* is undecidable. + +In this chapter, we'll learn about the classical logic parts of +F*, i.e., the parts that allow it to interface with an SMT solver. + +.. note:: + + The beginning of this chapter is a little technical, even though + we're not telling the full story behind F*'s classical logic + yet. If parts of it are hard to understand right now, here's what + you need to know before you :ref:`jump ahead + `. + + F* lets you write quantified formulas, called propositions, like + so + + .. code-block:: fstar + + forall (x1:t1) ... (xn:tn). p + exists (x1:t1) ... (xn:tn). p + + You can build propositions from booleans and conjunctions, + disjunctions, negations, implications, and bi-implications: + + .. code-block:: fstar + + p /\ q //conjunction + p \/ q //disjunction + ~p //negation + p ==> q //implication + p <==> q //bi-implication + + For example, one can say (as shown below) that for all natural + numbers ``x`` and ``y``, if the modulus ``x % y`` is ``0``, then + there exists a natural number ``z`` such that ``x`` is ``z * y``. + + .. code-block:: fstar + + forall (x:nat) (y:nat). x % y = 0 ==> (exists (z:nat). x = z * y) + + F* also has a notion of propositional equality, written ``==``, + that can be used to state that two terms of any type are equal. In + contrast, the boolean equality ``=`` can only be used on types that + support decidable equality. For instance, for ``f1, f2 : int -> + int``, you can write ``f1 == f2`` but you cannot write ``f1 = f2``, + since two functions cannot be decidably compared for equality. + +.. _Part1_prop: + +Propositions +^^^^^^^^^^^^ + +The type ``prop`` defined in ``Prims`` is F*'s type of +proof-irrelevant propositions. More informally, ``prop`` is the type +given to facts that are provable using the SMT solver's classical +logic. + +Propositions defined in ``prop`` need not be decidable. 
For example, +for a Turing machine ``tm``, the fact ``halts tm`` can be defined as a +``prop``, although it is impossible to decide for an arbitrary ``tm`` +whether ``tm`` halts on all inputs. This is contrast with ``bool``, +the type of booleans ``{true, false}``. Clearly, one could not define +``halts tm`` as a ``bool``, since one would be claiming that for +``halts`` is function that for any ``tm`` can decide (by returning +true or false) whether or not ``tm`` halts on all inputs. + +F* will implicitly convert a ``bool`` to a ``prop`` when needed, since +a decidable fact can be turned into a fact that may be +undecidable. But, when using propositions, one can define things that +cannot be defined in ``bool``, including quantified formulae, as we'll +see next. + +.. _Part1_prop_connectives: + +Propositional connectives +^^^^^^^^^^^^^^^^^^^^^^^^^ + +Consider stating that ``factorial n`` always returns a positive +number, when ``n:nat``. In the :ref:`previous section ` we +learned that one way to do this is to give ``factorial`` a type like so. + +.. code-block:: fstar + + val factorial (n:nat) : x:nat{x > 0} + +Here's another way to state it: + +.. code-block:: fstar + + forall (n:nat). factorial n > 0 + +What about stating that ``factorial n`` can sometimes return a value +that's greater than ``n * n``? + +.. code-block:: fstar + + exists (n:nat). factorial n > n * n + +We've just seen our first use of universal and existential +quantifiers. + +Quantifiers +........... + +A universal quantifier is constructed using the ``forall`` keyword. Its +syntax has the following shape. + +.. code-block:: fstar + + forall (x1:t1) ... (xn:tn) . p + +The ``x1 ... xn`` are bound variables and signify the domain over +which one the proposition ``p`` is quantified. That is, ``forall +(x:t). p`` is valid when for all ``v : t`` the proposition ``p[v/x]`` +is valid. + +And existential quantifier has similar syntax, using the ``exists`` +keyword. + +.. 
code-block:: fstar + + exists (x1:t1) ... (xn:tn) . p + +In this case, ``exists (x:t). p`` is valid when for some ``v : t`` the +proposition ``p[v/x]`` is valid. + +The scope of a quantifier extends as far to the right as possible. + +As usual in F*, the types on the bound variables can be omitted and F* +will infer them. However, in the case of quantified formulas, it's a +good idea to write down the types, since the meaning of the quantifier +can change significantly depending on the type of the variable. Consider +the two propositions below. + +.. code-block:: fstar + + exists (x:int). x < 0 + exists (x:nat). x < 0 + +The first formula is valid by considering ``x = -1``, while the second +one is not—there is no natural number less than zero. + +It is possible to quantify over any F* type. This makes the +quantifiers higher order and dependent. For example, one can write + +.. code-block:: fstar + + forall (n:nat) (p: (x:nat{x >= n} -> prop)). p n + +.. note:: + + The SMT solver uses a number of heuristics to determine if a + quantified proposition is valid. As you start writing more + substantial F* programs and proofs, it will become important to + learn a bit about these heuristics. We'll cover this in a later + chapter. If you're impatient, you can also read about in on the `F* + wiki + `_. + + +Conjunction, Disjunction, Negation, Implication +............................................... + +In addition to the quantifiers, you can build propositions by +combining them with other propositions, using the operators below, in +decreasing order of precedence. + +**Negation** + +The proposition ``~p`` is valid if the negation of ``p`` is +valid. This is similar to the boolean operator ``not``, but applies to +propositions rather than just booleans. + +**Conjunction** + +The proposition ``p /\ q`` is valid if both ``p`` and ``q`` are +valid. This is similar to the boolean operator ``&&``, but applies to +propositions rather than just booleans. 
+ +**Disjunction** + +The proposition ``p \/ q`` is valid if at least one of ``p`` and ``q`` +are valid. This is similar to the boolean operator ``||``, but applies +to propositions rather than just booleans. + +**Implication** + +The proposition ``p ==> q`` is valid if whenever ``p`` is valid, ``q`` +is also valid. + +**Double Implication** + +The proposition ``p <==> q`` is valid if ``p`` and ``q`` are +equivalent. + +.. note:: + + This may come as a surprise, but these precedence rules mean that + ``p /\ q ==> r`` is parsed as ``(p /\ q) ==> r`` rather than + ``p /\ (q ==> r)``. When in doubt, use parentheses. + + +Atomic propositions +^^^^^^^^^^^^^^^^^^^ + +We've shown you how to form new propositions by building them from +existing propositions using the connectives. But, what about the basic +propositions themselves? + + +Falsehood +......... + +The proposition ``False`` is always invalid. + +Truth +..... + +The proposition ``True`` is always valid. + +.. _Part1_ch2_propositional_equality: + +Propositional equality +...................... + +We learned in the previous chapter about the :ref:`two different forms +of equality `. The type of propositional equality is + +.. code-block:: fstar + + val ( == ) (#a:Type) (x:a) (y:a) : prop + +Unlike decidable equality ``(=)``, propositional equality is defined +for all types. The result type of ``(==)`` is ``prop``, the type of +propositions, meaning that ``x == y`` is a proof-irrelevant +proposition. + + +**Turning a Boolean into a proposition** + +Propositional equality provides a convenient way to turn a boolean +into a proposition. For any boolean ``b``, then term ``b == true`` is +a ``prop``. One seldom needs to write this manually (although it +does come up occasionally), since F* will automatically insert a +``b==true`` if you're using a ``b:bool`` in a context where a ``prop`` +was expected. + +``Type`` vs. ``prop`` +..................... + +This next bit is quite technical. 
Don't worry if you didn't understand +it at first. It's enough to know at this stage that, just like +automatically converting a boolean to `prop`, F* automatically +converts any type to ``prop``, when needed. So, you can form new +atomic propositions out of types. + +Every well-typed term in F* has a type. Even types have types, e.g., +the type of ``int`` is ``Type``, i.e., ``int : Type``, ``bool : +Type``, and even ``prop : Type``. We'll have to leave a full +description of this to a later section, but, for now, we'll just +remark that another way to form an atomic proposition is to convert a +type to a proposition. + +For any type ``t : Type``, the type ``_:unit { t } : prop``. We call +this "squashing" a type. This is so common, that F* provides two +mechanisms to support this: + +1. All the propositional connectives, like ``p /\ q`` are designed so + that both ``p`` and ``q`` can be types (i.e., ``p,q : Type``), + rather than propositions, and they implicitly squash their types. + +2. The standard library, ``FStar.Squash``, provides several utilities + for manipulating squashed types. + +.. _Part1_ch2_assertions: + +Assertions +^^^^^^^^^^ + +Now that we have a way to write down propositions, how can we ask F* +to check if those propositions are valid? There are several ways, the +most common of which is an *assertion*. Here's an example: + +.. code-block:: fstar + + let sqr_is_nat (x:int) : unit = assert (x * x >= 0) + +This defines a function ``sqr_is_nat : int -> unit``—meaning it takes +a ``nat`` and always returns ``()``. So, it's not very interesting as +a function. + +However, it's body contains an assertion that ``x * x >= 0``. Now, +many programming languages support runtime assertions—code to check +some property of program when it executes. But, assertions in F* are +different—they are checked by the F* compiler *before* your program is +executed. 
+ +In this case, the ``assert`` instructs F* to encode the program to SMT +and to ask Z3 if ``x * x >= 0`` is valid for an arbitrary integer +``x:int``. If Z3 can confirm this fact (which it can), then F* accepts +the program and no trace of the assertion is left in your program when +it executes. Otherwise the program is rejected at compile time. For +example, if we were to write + +.. code-block:: fstar + + let sqr_is_pos (x:int) : unit = assert (x * x > 0) + +Then, F* complains with the following message:: + + Ch2.fst(5,39-5,50): (Error 19) assertion failed; The SMT solver could not prove the query, try to spell your proof in more detail or increase fuel/ifuel + +You can use an assertion with any proposition, as shown below. + +.. literalinclude:: ../code/Part1.Assertions.fst + :language: fstar + :start-after: //SNIPPET_START: max + :end-before: //SNIPPET_END: max + +Assumptions +^^^^^^^^^^^ + +The dual of an assertion is an assumption. Rather than asking F* and +Z3 to prove a fact, an assumption allows one to tell F* and Z3 to +accept that some proposition is valid. You should use assumptions with +care—it's easy to make a mistake and assume a fact that isn't actually +true. + +The syntax of an assumption is similar to an assertion. Here, below, +we write ``assume (x <> 0)`` to tell F* to assume ``x`` is non-zero in +the rest of the function. That allows F* to prove that the assertion +that follows is valid. + +.. code-block:: fstar + + let sqr_is_pos (x:int) = assume (x <> 0); assert (x * x > 0) + +Of course, the assertion is not valid for all ``x``—it's only valid +for those ``x`` that also validate the preceding assumption. + +Just like an ``assert``, the type of ``assume p`` is ``unit``. + +There's a more powerful form of assumption, called an ``admit``. The +term ``admit()`` can be given any type you like. For example, + +.. 
code-block:: fstar + + let sqr_is_pos (x:int) : y:nat{y > 0} = admit() + +Both ``assume`` and ``admit`` can be helpful when you're working +through a proof, but a proof isn't done until it's free of them. diff --git a/doc/book/PoP-in-FStar/book/part1/part1_quicksort.rst b/doc/book/PoP-in-FStar/book/part1/part1_quicksort.rst new file mode 100644 index 00000000000..a8b59454e4c --- /dev/null +++ b/doc/book/PoP-in-FStar/book/part1/part1_quicksort.rst @@ -0,0 +1,410 @@ +.. _Part1_quicksort: + +Case Study: Quicksort +===================== + +We'll now put together what we've learned about defining recursive +functions and proving lemmas about them to prove the correctness of +`Quicksort `_, a classic +sorting algorithm. + +We'll start with lists of integers and describe some properties that +we'd like to hold true of a sorting algorithm, starting with a +function ``sorted``, which decides when a list of integers is sorted +in increasing order, and ``mem``, which decides if a given element is +in a list. Notice that ``mem`` uses an ``eqtype``, :ref:`the type of +types that support decidable equality `. + +.. literalinclude:: ../code/Part1.Quicksort.fst + :language: fstar + :start-after: SNIPPET_START: sorted mem + :end-before: SNIPPET_END: sorted mem + +Given a sorting algorithm ``sort``, we would like to prove the +following property, meaning that for all input list ``l``, the +resulting list ``sort l`` is sorted and has all the elements that +``l`` does. + +.. code-block:: fstar + + forall l. sorted (sort l) /\ (forall i. mem i l <==> mem i (sort l)) + +This specification is intentionally a bit weak, e.g., in case there +are multiple identical elements in ``l``, this specification does not +prevent ``sort`` from retaining only one of them. + +We will see how to improve this specification below, as part of an +exercise. + +If you're unfamiliar with the algorithm, you can `read more about it +here `_. 
We'll describe +several slightly different implementations and proofs of Quicksort in +detail—you may find it useful to follow along interactively with the +`entire code development <../code/Part1.Quicksort.fst>`_ of this +sequence. + +Implementing ``sort`` +^^^^^^^^^^^^^^^^^^^^^ + +Our implementation of Quicksort is pretty simple-minded. It always +picks the first element of the list as the pivot; partitions the rest +of the list into those elements greater than or equal to the pivot, +and the rest; recursively sorts the partitions; and slots the pivot in +the middle before returning. Here it is: + +.. literalinclude:: ../code/Part1.Quicksort.fst + :language: fstar + :start-after: SNIPPET_START: sort-impl + :end-before: SNIPPET_END: sort-impl + +There are a few points worth discussing in detail: + +1. The notation ``((<=) pivot)`` may require some explanation: it is + the *partial application* of the ``<=`` operator to just one + argument, ``pivot``. It is equivalent to ``fun x -> pivot <= x``. + +2. We have to prove that ``sort`` terminates. The measure we've + provided is ``length l``, meaning that at each recursive call, + we're claiming that the length of input list is strictly + decreasing. + +3. Why is this true? Well, informally, the recursive calls ``sort lo`` + and ``sort hi`` are partitions of the ``tl`` of the list, which is + strictly shorter than ``l``, since we've removed the ``pivot`` + element. We'll have to convince F* of this fact by giving + ``partition`` an interesting type that we'll see below. + +Implementing ``partition`` +^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Here's an implementation of ``partition``. It's a :ref:`higher-order +function `, where ``partition f l`` +returns a pair of lists ``l₁`` and ``l₂``, a partitioning of the +elements in ``l`` such that the every element in ``l₁`` satisfies +``f`` and the elements in ``l₂`` do not. + +.. 
literalinclude:: ../code/Part1.Quicksort.fst + :language: fstar + :start-after: SNIPPET_START: partition + :end-before: SNIPPET_END: partition + +The specification we've given ``partition`` is only partial—we do not +say, for instance, that all the elements in ``l₁`` satisfy ``f``. We +only say that the sum of the lengths of the ``l₁`` and ``l₂`` are +equal to the length of ``l``. That's because that's the only property +we need (so far) about ``partition``—this property about the lengths +is what we need to prove that on the recursive calls ``sort lo`` and +``sort hi``, the arguments ``lo`` and ``hi`` are strictly shorter than +the input list. + +This style of partial specification should give you a sense of the art +of program proof and the design choices between :ref:`intrinsic and +extrinsic proof `. One tends to specify +only what one needs, rather than specifying all properties one can +imagine right up front. + +Proving ``sort`` correct +^^^^^^^^^^^^^^^^^^^^^^^^ + +Now that we have our definition of ``sort``, we still have to prove it +correct. Here's a proof—it requires three auxiliary lemmas and we'll +discuss it in detail. + +Our first lemma relates ``partition`` to ``mem``: it proves what we +left out in the intrinsic specification of ``partition``, i.e., that +all the elements in ``l₁`` satisfy ``f``, the elements in ``l₂`` do +not, and every element in ``l`` appears in either ``l₁`` or ``l₂``. + +.. literalinclude:: ../code/Part1.Quicksort.fst + :language: fstar + :start-after: SNIPPET_START: partition_mem + :end-before: SNIPPET_END: partition_mem + +Our next lemma is very specific to Quicksort. If ``l₁`` and ``l₂`` are +already sorted, and partitioned by ``pivot``, then slotting ``pivot`` +in the middle of ``l₁`` and ``l₂`` produces a sorted list. The +specification of ``sorted_concat`` uses a mixture of refinement types +(e.g., ``l1:list int{sorted l1}``) and ``requires`` / ``ensures`` +specifications–this is just a matter of taste. + +.. 
literalinclude:: ../code/Part1.Quicksort.fst
+   :language: fstar
+   :start-after: SNIPPET_START: sorted_concat
+   :end-before: SNIPPET_END: sorted_concat
+
+Our third lemma is a simple property about ``append`` and ``mem``.
+
+.. literalinclude:: ../code/Part1.Quicksort.fst
+   :language: fstar
+   :start-after: SNIPPET_START: append_mem
+   :end-before: SNIPPET_END: append_mem
+
+Finally, we can put the pieces together for our top-level statement
+about the correctness of ``sort``.
+
+.. literalinclude:: ../code/Part1.Quicksort.fst
+   :language: fstar
+   :start-after: SNIPPET_START: sort_correct
+   :end-before: SNIPPET_END: sort_correct
+
+The structure of the lemma mirrors the structure of ``sort``
+itself.
+
+* In the base case, the proof is automatic.
+
+* In the inductive case, we partition the tail of the list and
+  recursively call the lemma on the ``hi`` and ``lo`` components,
+  just like ``sort`` itself. The intrinsic type of ``partition`` is
+  also helpful here, using the ``length`` measure on the list to prove
+  that the induction here is well-founded.
+
+  - To prove the ``ensures`` postcondition, we apply our three
+    auxiliary lemmas.
+
+    + ``partition_mem ((<=) pivot) tl`` gives us the precondition
+      needed to satisfy the ``requires`` clause of
+      ``sorted_concat``.
+
+    + We also need to prove the ``sorted`` refinements on ``sort lo``
+      and ``sort hi`` in order to call ``sorted_concat``, but the
+      recursive calls of the lemma give us those properties.
+
+    + After calling ``sorted_concat``, we have proven that the
+      resulting list is sorted. What's left is to prove that all the
+      elements of the input list are in the result, and ``append_mem``
+      does that, using the postcondition of ``partition_mem`` and the
+      induction hypothesis to relate the elements of ``append (sort
+      lo) (pivot :: sort hi)`` to the input list ``l``.
+
+Here's another version of the ``sort_correct`` lemma, this time
+annotated with lots of intermediate assertions.
+
+.. literalinclude:: ../code/Part1.Quicksort.fst
+   :language: fstar
+   :start-after: SNIPPET_START: sort_correct_annotated
+   :end-before: SNIPPET_END: sort_correct_annotated
+
+This is an extreme example, annotating with assertions at almost every
+step of the proof. However, it is indicative of a style that one often
+uses to interact with F* when doing SMT-assisted proofs. At each point
+in your program or proof, you can use ``assert`` to check what the
+prover "knows" at that point. See what happens if you move the
+assertions around, e.g., if you move ``assert (sort_ok lo)`` before
+calling ``sort_correct_annotated lo``, F* will complain that it is not
+provable.
+
+Limitations of SMT-based proofs at higher order
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+You may be wondering why we used ``(<=) pivot`` instead of ``fun x ->
+pivot <= x`` in our code. Arguably, the latter is more readable,
+particularly to those not already familiar with functional programming
+languages. Well, the answer is quite technical.
+
+We could indeed have written ``sort`` like this,
+
+.. literalinclude:: ../code/Part1.Quicksort.fst
+   :language: fstar
+   :start-after: SNIPPET_START: sort_alt
+   :end-before: SNIPPET_END: sort_alt
+
+And we could have tried to write our main lemma this way:
+
+.. literalinclude:: ../code/Part1.Quicksort.fst
+   :language: fstar
+   :start-after: SNIPPET_START: sort_alt_correct
+   :end-before: SNIPPET_END: sort_alt_correct
+
+However, without further assistance, F*+SMT is unable to prove the
+line at which the ``assume`` appears. It turns out that this is due to
+a fundamental limitation in how F* encodes its higher-order logic into
+the SMT solver's first-order logic. This encoding comes with some loss
+in precision, particularly for lambda terms. In this case, the SMT
+solver is unable to prove that the occurrence of ``fun x -> pivot <=
+x`` that appears in the proof of ``sort_alt_correct_annotated`` is
+identical to the occurrence of the same lambda term in ``sort_alt``,
+and so it cannot conclude that ``sort_alt l`` is really equal to
+``append (sort_alt lo) (pivot :: sort_alt hi)``.
+
+This is unfortunate and can lead to some nasty surprises when trying
+to do proofs about higher order terms. Here are some ways to avoid
+such pitfalls:
+
+* Try to use named functions at higher order, rather than lambda
+  literals. Named functions do not suffer a loss in precision when
+  encoded to SMT. This is the reason why ``(<=) pivot`` worked out
+  better than the lambda term here—the ``(<=)`` is a name that
+  syntactically appears in both the definition of ``sort`` and the
+  proof of ``sort_alt_correct``, and the SMT solver can easily see
+  that the two occurrences are identical.
+
+* If you must use lambda terms, sometimes an intrinsic proof style can
+  help, as we'll see below.
+
+* If you must use lambda terms with extrinsic proofs, you can still
+  complete your proof, but you will have to help F* along with tactics
+  or proofs by normalization, more advanced topics that we'll cover in
+  later sections.
+
+* Even more forward looking, recent `higher-order variants of SMT
+  solvers `_ are promising and
+  may help address some of these limitations.
+
+An intrinsic proof of ``sort``
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+As we observed earlier, our proof of ``sort_correct`` had essentially
+the same structure as the definition of ``sort`` itself—it's tempting
+to fuse the definition of ``sort`` with ``sort_correct``, so that we
+avoid the duplication and get a proof of correctness of ``sort``
+built into its definition.
+
+So, here it is, a more compact proof of ``sort``, this time done
+intrinsically, i.e., by enriching the type of ``sort`` to capture the
+properties we want.
+
+.. literalinclude:: ../code/Part1.Quicksort.fst
+   :language: fstar
+   :start-after: SNIPPET_START: sort_intrinsic
+   :end-before: SNIPPET_END: sort_intrinsic
+
+We still use the same three auxiliary lemmas to prove the properties
+we want, but this time the recursive calls to sort the partitioned
+sub-lists also serve as calls to the induction hypothesis for the
+correctness property we're after.
+
+Notice also that in this style, the use of a lambda literal isn't
+problematic—when operating within the same scope, F*'s encoding to SMT
+is sufficiently smart to treat the multiple occurrences of ``fun x ->
+pivot <= x`` as identical functions.
+
+Runtime cost?
+.............
+
+You may be concerned that we have just polluted the definition of
+``sort_intrinsic`` with calls to three additional recursive
+functions–will this introduce any runtime overhead when executing
+``sort_intrinsic``? Thankfully, the answer to that is "no".
+
+As we'll learn in the section on :ref:`effects `, F* supports
+a notion of *erasure*—terms that can be proven to not contribute to
+the observable behavior of a computation will be erased by the
+compiler before execution. In this case, the three lemma invocations
+are total functions returning unit, i.e., these are functions that
+always return in a finite amount of time with the constant value
+``()``, with no other observable side effect. So, there is no point in
+keeping those function calls around—we may as well just optimize them
+away to their result ``()``.
+
+Indeed, if you ask F* to extract the program to OCaml (using
+``fstar --codegen OCaml``), here's what you get:
+
+.. code-block:: ocaml
+
+   let rec (sort_intrinsic : Prims.int Prims.list -> Prims.int Prims.list) =
+     fun l ->
+       match l with
+       | [] -> []
+       | pivot::tl ->
+           let uu___ = partition (fun x -> pivot <= x) tl in
+           (match uu___ with
+            | (hi, lo) ->
+                append (sort_intrinsic lo) (pivot :: (sort_intrinsic hi)))
+
+The calls to the lemmas have disappeared.
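+
+To see erasure at work on a smaller scale, consider the following
+sketch (the names ``add_comm`` and ``sum`` are hypothetical, not part
+of the Quicksort development): a total, unit-returning lemma sequenced
+before an ordinary computation.
+
+.. code-block:: fstar
+
+   let add_comm (x y:int)
+     : Lemma (x + y == y + x)
+     = ()
+
+   let sum (x y:int) : int =
+     add_comm x y; (* total and unit-returning, so erased at extraction *)
+     x + y
+
+Extracting ``sum`` to OCaml should yield just ``let sum x y = x + y``,
+with the lemma invocation erased.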
+
+Exercises
+^^^^^^^^^
+
+Generic sorting
+...............
+
+Here's `a file with the scaffolding for this exercise
+<../code/exercises/Part1.Quicksort.Generic.fst>`_.
+
+The point of this exercise is to define a generic version of ``sort``
+that is parameterized by any total order over the list elements,
+rather than specializing ``sort`` to work on integer lists only. Of
+course, we want to prove our implementations correct. So, let's do it
+in two ways, both intrinsically and extrinsically. Your goal is to
+remove all the occurrences of ``admit`` in the development below.
+
+.. literalinclude:: ../code/exercises/Part1.Quicksort.Generic.fst
+   :language: fstar
+
+.. container:: toggle
+
+   .. container:: header
+
+      **Answer**
+
+   .. literalinclude:: ../code/Part1.Quicksort.Generic.fst
+      :language: fstar
+
+
+Proving that ``sort`` is a permutation
+......................................
+
+We promised at the beginning of this section that we'd eventually give
+a better specification for ``sort``, one that proves that it doesn't
+drop duplicate elements in the list. That's the goal of the exercise
+in this section—we'll prove that our generic Quicksort returns a
+permutation of the input list.
+
+Let's start by defining what it means for lists to be permutations of
+each other—we'll do this using occurrence counts.
+
+.. literalinclude:: ../code/exercises/Part1.Quicksort.Permutation.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: count permutation
+   :end-before: //SNIPPET_END: count permutation
+
+The definitions should be self-explanatory. We include one key lemma,
+``append_count``, to relate occurrence counts to list concatenation.
+
+The next key lemma to prove is ``partition_mem_permutation``.
+
+.. code-block:: fstar
+
+   val partition_mem_permutation (#a:eqtype)
+                                 (f:(a -> bool))
+                                 (l:list a)
+     : Lemma (let l1, l2 = partition f l in
+              (forall x. mem x l1 ==> f x) /\
+              (forall x. mem x l2 ==> not (f x)) /\
+              (is_permutation l (append l1 l2)))
+
+You will also need a lemma similar to the following:
+
+.. code-block:: fstar
+
+   val permutation_app_lemma (#a:eqtype) (hd:a) (tl l1 l2:list a)
+     : Lemma (requires (is_permutation tl (append l1 l2)))
+             (ensures (is_permutation (hd::tl) (append l1 (hd::l2))))
+
+Using these, and adaptations of our previous lemmas, prove:
+
+.. code-block:: fstar
+
+   val sort_correct (#a:eqtype) (f:total_order_t a) (l:list a)
+     : Lemma (ensures
+               sorted f (sort f l) /\
+               is_permutation l (sort f l))
+
+Load the `exercise script
+<../code/exercises/Part1.Quicksort.Permutation.fst>`_ and give it a
+try.
+
+
+.. container:: toggle
+
+   .. container:: header
+
+      **Answer**
+
+   .. literalinclude:: ../code/Part1.Quicksort.Permutation.fst
+      :language: fstar
+
+
+
+
diff --git a/doc/book/PoP-in-FStar/book/part1/part1_termination.rst b/doc/book/PoP-in-FStar/book/part1/part1_termination.rst
new file mode 100644
index 00000000000..02cdf5a5929
--- /dev/null
+++ b/doc/book/PoP-in-FStar/book/part1/part1_termination.rst
@@ -0,0 +1,431 @@
+.. _Part1_termination:
+
+Proofs of termination
+=====================
+
+It's absolutely crucial to the soundness of F*'s core logic that all
+functions terminate. Otherwise, one could write non-terminating
+functions like this::
+
+    let rec loop (x:unit) : False = loop x
+
+and show that ``loop () : False``, i.e., we'd have a proof term for
+``False`` and the logic would collapse.
+
+In the previous chapter, we just saw how to define recursive functions
+to :ref:`compute the length of a list ` and to
+:ref:`append two lists `. We also said
+:ref:`earlier ` that all functions in F*'s core are
+*total*, i.e., they always return in a finite amount of time. So, you
+may be wondering, what is it that guarantees that recursive functions
+like ``length`` and ``append`` actually terminate on all inputs?
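+
+For instance, a recursive definition whose argument never gets any
+smaller is rejected by the termination check. Here's a quick sketch
+(``count_up`` is a hypothetical example; the ``expect_failure``
+attribute marks a definition that we expect F* to reject):
+
+.. code-block:: fstar
+
+   [@@expect_failure]
+   let rec count_up (n:nat) : nat = count_up (n + 1)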
+
+The full details of how F* ensures termination of all functions in its
+core involve several elements, including positivity restrictions on
+datatype definitions and universe constraints. However, the main thing
+that you'll need to understand at this stage is that F* includes a
+termination check that applies to the recursive definitions of total
+functions. The check is a semantic one, rather than the syntactic
+criterion used in some other dependently typed languages.
+
+We quickly sketch the basic structure of the F\* termination check on
+recursive functions---you'll need to understand a bit of this in order
+to write more interesting programs.
+
+A well-founded partial order on terms
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+In order to prove a function terminating in F\*, one provides a
+*measure*: a pure expression depending on the function's
+arguments. F\* checks that this measure strictly decreases on each
+recursive call. The measure of the arguments at each recursive call is
+compared to the measure of the current arguments according to a
+well-founded partial order on F\* terms. We write ``v1 << v2`` when
+``v1`` precedes ``v2`` in this order.
+
+.. note::
+
+   A relation ``R`` is a well-founded partial order on a set ``S`` if,
+   and only if, ``R`` is a partial order on ``S`` and there are no
+   infinite descending chains in ``S`` related by ``R``. For example,
+   taking ``S`` to be ``nat``, the set of natural numbers, the integer
+   ordering ``<`` is a well-founded partial order (in fact, it is a
+   total order).
+
+Since the measure strictly decreases on each recursive call, and there
+are no infinite descending chains, this guarantees that the function
+eventually stops making recursive calls, i.e., it terminates.
+
+.. _Part1_precedes_relation:
+
+The precedes relation
+.....................
+
+Given two terms ``v1:t1`` and ``v2:t2``, we can prove ``v1 << v2``
+if any of the following are true:
+
+1. **The ordering on integers**:
+
+   ``t1 = nat`` and ``t2 = nat`` and ``v1 < v2``
+
+   Negative integers are not related by the ``<<`` relation, which is
+   only a *partial* order.
+
+2. **The sub-term ordering on inductive types**
+
+   If ``v2 = D u1 ... un``, where ``D`` is a constructor of an
+   inductive type fully applied to arguments ``u1`` to ``un``, then
+   ``v1 << v2`` if either
+
+   * ``v1 = ui`` for some ``i``, i.e., ``v1`` is a sub-term of ``v2``
+
+   * ``v1 = ui x`` for some ``i`` and ``x``, i.e., ``v1`` is the
+     result of applying a sub-term of ``v2`` to some argument ``x``.
+
+
+.. _Part1_why_length_terminates:
+
+
+Why ``length`` terminates
+^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Let's look again at the definition of ``length`` and see how F* checks
+that it terminates.
+
+.. literalinclude:: ../code/Part1.Inductives.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: length
+   :end-before: //SNIPPET_END: length
+
+First off, the definition of ``length`` above makes use of various
+syntactic shorthands to hide some details. If we were to write it out
+fully, it would be as shown below:
+
+.. code-block:: fstar
+
+   let rec length #a (l:list a)
+     : Tot nat (decreases l)
+     = match l with
+       | [] -> 0
+       | _ :: tl -> 1 + length tl
+
+The main difference is on the second line. As opposed to just writing
+the result type of ``length``, in full detail, we write
+``Tot nat (decreases l)``. This states two things:
+
+* The ``Tot nat`` part states that ``length`` is a total function
+  returning a ``nat``, just as the ``nat`` did before.
+
+* The additional ``(decreases l)`` specifies a *measure*, i.e., the
+  quantity that must decrease at each recursive call according to the
+  well-founded relation ``<<``.
+
+To check the definition, F* gives the recursively bound name
+(``length`` in this case) a type that's guarded by the measure. I.e.,
+for the body of the function, ``length`` has the following type:
+
+.. code-block:: fstar
+
+   #a:Type -> m:list a{ m << l } -> nat
+
+This is to say that when using ``length`` to make a recursive call, we
+can only apply it to an argument ``m << l``, i.e., the recursive call
+can only be made on an argument ``m`` that precedes the current
+argument ``l``. This is enough to ensure that the recursive calls will
+eventually bottom out, since there are no infinite descending chains
+related by ``<<``.
+
+In the case of ``length``, we need to prove at the recursive call
+``length tl`` that ``tl : (m : list a { m << l })``, or, equivalently,
+that ``tl << l`` is valid. But, from the sub-term ordering on
+inductive types, ``l = Cons _ tl``, so ``tl << l`` is indeed provable
+and everything checks out.
+
+.. _Part1_lexicographic_orderings:
+
+Lexicographic orderings
+^^^^^^^^^^^^^^^^^^^^^^^
+
+F* also provides a convenience to enhance the well-founded ordering
+``<<`` to lexicographic combinations of ``<<``. That is, given two
+lists of terms ``v1, ..., vn`` and ``u1, ..., un``, F* accepts that
+the following lexicographic ordering::
+
+    v1 << u1 \/ (v1 == u1 /\ (v2 << u2 \/ (v2 == u2 /\ ( ... vn << un))))
+
+is also well-founded. In fact, it is possible to prove in F* that this
+ordering is well-founded, provided ``<<`` is itself well-founded.
+
+Lexicographic orderings are common enough that F* provides special
+support to make it convenient to use them. In particular, the
+notation::
+
+    %[v1; v2; ...; vn] << %[u1; u2; ...; un]
+
+is shorthand for::
+
+    v1 << u1 \/ (v1 == u1 /\ (v2 << u2 \/ (v2 == u2 /\ ( ... vn << un))))
+
+Let's have a look at lexicographic orderings at work in proving that
+the classic ``ackermann`` function terminates on all inputs.
+
+.. literalinclude:: ../code/Part1.Termination.fst
+   :language: fstar
+   :start-after: SNIPPET_START: ackermann
+   :end-before: SNIPPET_END: ackermann
+
+The ``decreases %[m;n]`` syntax tells F* to use the lexicographic
+ordering on the pair of arguments ``m, n`` as the measure to prove
+this function terminating.
+
+When defining ``ackermann m n``, for each recursive call of the form
+``ackermann m' n'``, F* checks that ``%[m';n'] << %[m;n]``, i.e., F*
+checks that either
+
+* ``m' << m``, or
+* ``m' = m`` and ``n' << n``
+
+There are three recursive calls to consider:
+
+1. ``ackermann (m - 1) 1``: In this case, since we know that ``m >
+   0``, we have ``m - 1 << m``, due to the ordering on natural
+   numbers. Since the ordering is lexicographic, the second argument
+   is irrelevant for termination.
+
+2. ``ackermann m (n - 1)``: In this case, the first argument remains
+   the same (i.e., it's still ``m``), but we know that ``n > 0``, so
+   ``n - 1 << n`` by the natural number ordering.
+
+3. ``ackermann (m - 1) (ackermann m (n - 1))``: Again, like in the
+   first case, the first argument ``m - 1 << m``, and the second is
+   irrelevant for termination.
+
+.. _Part1_termination_default_measures:
+
+Default measures
+^^^^^^^^^^^^^^^^
+
+As we saw earlier, F* allows you to write the following code, with no
+``decreases`` clause, and it still accepts it.
+
+.. literalinclude:: ../code/Part1.Inductives.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: length
+   :end-before: //SNIPPET_END: length
+
+For that matter, you can leave out the ``decreases`` clause in
+``ackermann`` and F* is okay with it.
+
+.. code-block:: fstar
+
+   let rec ackermann (m n:nat)
+     : nat
+     = if m = 0 then n + 1
+       else if n = 0 then ackermann (m - 1) 1
+       else ackermann (m - 1) (ackermann m (n - 1))
+
+This is because F* uses a simple heuristic to choose the decreases
+clause, if the user didn't provide one.
+
+The *default* decreases clause for a total, recursive function is the
+lexicographic ordering of all the non-function-typed arguments, taken
+in order from left to right.
+
+That is, the default decreases clause for ``ackermann`` is exactly
+``decreases %[m; n]``; and the default for ``length`` is just
+``decreases %[a; l]`` (which is equivalent to ``decreases l``). So, you
+needn't write it.
+
+On the other hand, if you were to flip the order of arguments to
+``ackermann``, then the default choice of the measure would not be
+correct—so, you'll have to write it explicitly, as shown below.
+
+.. literalinclude:: ../code/Part1.Termination.fst
+   :language: fstar
+   :start-after: SNIPPET_START: ackermann_flip
+   :end-before: SNIPPET_END: ackermann_flip
+
+.. _Part1_mutual_recursion:
+
+Mutual recursion
+^^^^^^^^^^^^^^^^
+
+F* also supports mutual recursion, and the same check applies: a
+measure of the arguments must decrease on each (mutually) recursive
+call.
+
+For example, one can define a binary ``tree`` that stores an integer
+at each internal node—the keyword ``and`` allows defining several
+types that depend mutually on each other.
+
+To increment all the integers in the tree, we can write the mutually
+recursive functions, again using ``and`` to define ``incr_tree`` and
+``incr_node`` to depend mutually on each other. F* is able to prove
+that these functions terminate, just by using the default measure as
+usual.
+
+.. literalinclude:: ../code/Part1.Termination.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: incr_tree
+   :end-before: //SNIPPET_END: incr_tree
+
+.. note::
+
+   Sometimes, a little trick with lexicographic orderings can help
+   prove mutually recursive functions correct. We include it here as a
+   tip; you can probably skip it on a first read.
+
+   .. literalinclude:: ../code/Part1.Termination.fst
+      :language: fstar
+      :start-after: SNIPPET_START: foo_bar
+      :end-before: SNIPPET_END: foo_bar
+
+   What's happening here is that when ``foo l`` calls ``bar``, the
+   argument ``xs`` is legitimately a sub-term of ``l``. However, ``bar
+   l`` simply calls back ``foo l``, without decreasing the
+   argument. The reason this terminates, however, is that ``bar`` can
+   freely call back ``foo``, since ``foo`` will only ever call ``bar``
+   again with a smaller argument. You can convince F* of this by
+   writing the decreases clauses shown, i.e., when ``bar`` calls
+   ``foo``, ``l`` doesn't change, but the second component of the
+   lexicographic ordering does decrease, i.e., ``0 << 1``.
+
+
+The termination check, precisely
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Having seen a few examples at work, we can now describe how the
+termination check works in general.
+
+
+.. note::
+
+   We use a slightly more mathematical notation here, so that we can
+   be precise. If it feels unfamiliar, you needn't understand this
+   completely at first. Continue with the examples and refer back to
+   this section, if and when you feel like a precise description would
+   be helpful.
+
+When defining a recursive function
+
+.. math::
+
+   \mathsf{f~(\overline{x:t})~:~Tot~r~(decreases~m)~=~e}
+
+i.e., :math:`\mathsf{f}` is a function with several arguments
+:math:`\mathsf{x_1:t_1}, ..., \mathsf{x_n:t_n}`, returning
+:math:`\mathsf{r}` with measure :math:`\mathsf{m}`, defined mutually
+recursively with other functions of several arguments at the types:
+
+.. math::

+   \mathsf{f_1~(\overline{x_1:t_1})~:~Tot~r_1~(decreases~m_1)} \\
+   \ldots \\
+   \mathsf{f_n~(\overline{x_n:t_n})~:~Tot~r_n~(decreases~m_n)} \\
+
+we check the definition of the function body of :math:`\mathsf{f}`
+(i.e., :math:`\mathsf{e}`) with all the mutually recursive functions
+in scope, but at types that restrict their domain, in the following
+sense:
+
+.. math::
+
+   \mathsf{f~:~(\overline{y:t}\{~m[\overline{y}/\overline{x}]~<<~m~\}~\rightarrow~r[\overline{y}/\overline{x}])} \\
+   \mathsf{f_1~:~(\overline{x_1:t_1}\{~m_1~<<~m~\}~\rightarrow~r_1)} \\
+   \ldots \\
+   \mathsf{f_n~:~(\overline{x_n:t_n}\{~m_n~<<~m~\}~\rightarrow~r_n)} \\
+
+That is, each function in the mutually recursive group can only be
+applied to arguments that precede the current formal parameters of
+:math:`\mathsf{f}` according to the annotated measures of each
+function.
+
+.. _Part1_termination_fibonacci:
+
+Exercise: Fibonacci in linear time
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+`Click here <../code/exercises/Part1.Termination.fst>`_ for the exercise file.
+
+Here's a function to compute the :math:`n`-th Fibonacci number.
+
+.. code-block:: fstar
+
+   let rec fibonacci (n:nat)
+     : nat
+     = if n <= 1
+       then 1
+       else fibonacci (n - 1) + fibonacci (n - 2)
+
+Here's a more efficient, tail-recursive, linear-time variant.
+
+.. code-block:: fstar
+
+   let rec fib a b n =
+     match n with
+     | 0 -> a
+     | _ -> fib b (a+b) (n-1)
+
+   let fibonacci n = fib 1 1 n
+
+Add annotations to the functions to get F* to accept them, in
+particular, proving that ``fib`` terminates.
+
+.. container:: toggle
+
+   .. container:: header
+
+      **Answer**
+
+   .. literalinclude:: ../code/Part1.Termination.fst
+      :language: fstar
+      :start-after: SNIPPET_START: fib
+      :end-before: SNIPPET_END: fib
+
+
+.. _Part1_termination_reverse:
+
+Exercise: Tail-recursive reversal
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+`Click here <../code/exercises/Part1.Termination.fst>`_ for the exercise file.
+
+Here is a function to reverse a list:
+
+.. code-block:: fstar
+
+   let rec rev #a (l:list a)
+     : list a
+     = match l with
+       | [] -> []
+       | hd::tl -> append (rev tl) [hd]
+
+But, it is not very efficient, since it is not tail recursive and,
+worse, it is quadratic: it traverses the reversed tail of the list
+each time to add the first element to the end of it.
+ +This version is more efficient, because it is tail recursive and +linear. + +.. code-block:: fstar + + let rec rev_aux l1 l2 = + match l2 with + | [] -> l1 + | hd :: tl -> rev_aux (hd :: l1) tl + + let rev l = rev_aux [] l + +Add type annotations to ``rev_aux`` and ``rev``, proving, in +particular, that ``rev_aux`` terminates. + +.. container:: toggle + + .. container:: header + + **Answer** + + .. literalinclude:: ../code/Part1.Termination.fst + :language: fstar + :start-after: SNIPPET_START: rev + :end-before: SNIPPET_END: rev diff --git a/doc/book/PoP-in-FStar/book/part1/part1_wrap.rst b/doc/book/PoP-in-FStar/book/part1/part1_wrap.rst new file mode 100644 index 00000000000..39c31a73938 --- /dev/null +++ b/doc/book/PoP-in-FStar/book/part1/part1_wrap.rst @@ -0,0 +1,35 @@ +.. _Part1_wrap: + +Wrapping up +=========== + +Congratulations! You've reached the end of an introduction to basic F*. + +You should have learned the following main concepts: + +* Basic functional programming +* Using types to write precise specifications +* Writing proofs as total functions +* Defining and working with new inductive types +* Lemmas and proofs by induction + +Throughout, we saw how F*'s use of an SMT solver can reduce the +overhead of producing proofs, and you should know enough now to +be productive in small but non-trivial F* developments. + +However, it would be wrong to conclude that SMT-backed proofs in F* +are all plain sailing. And there's a lot more to F* than SMT +proofs---so read on through the rest of this book. + +But, if you do plan to forge ahead with mainly SMT-backed proofs, you +should keep the following in mind before attempting more challenging +projects. + +It'll serve you well to learn a bit more about how an SMT solver works +and how F* interfaces with it---this is covered in a few upcoming +sections, including a section on :ref:`classical proofs +` and in :ref:`understanding how F* uses Z3 +`. 
Additionally, if you're interested in doing proofs about +arithmetic, particularly nonlinear arithmetic, before diving in, you +would do well to read more about the F* library ``FStar.Math.Lemmas`` +and F* arithmetic settings. diff --git a/doc/book/PoP-in-FStar/book/part2/part2.rst b/doc/book/PoP-in-FStar/book/part2/part2.rst new file mode 100644 index 00000000000..133cf34a997 --- /dev/null +++ b/doc/book/PoP-in-FStar/book/part2/part2.rst @@ -0,0 +1,172 @@ +.. _Part2: + +################################################################ +Representing Data, Proofs, and Computations with Inductive Types +################################################################ + + +.. + In this second part of the book, we'll dive deeper into F*, focusing + on *inductive definitions*, the main mechanism in F* for the user to + define new types. + +Earlier, we learned about :ref:`defining new data types ` +in F*. For example, here's the type of lists parameterized by a type +``a`` of the list elements. + +.. code-block:: fstar + + type list a = + | Nil : list a + | Cons : hd:a -> tl:list a -> list a + +We also saw that it was easy to define basic functions over these +types, using pattern matching and recursion. For example, here's +a function to compute the length of a list. + +.. literalinclude:: ../code/Part1.Inductives.fst + :language: fstar + :start-after: //SNIPPET_START: length + :end-before: //SNIPPET_END: length + +The function ``length`` defines some property of a ``list`` (its +length) separately from the definition of the ``list`` type itself. +Sometimes, however, it can be convenient to define a property of a +type together with the type itself. For example, in some situations, +it may be natural to define the length of the list together with the +definition of the list type itself, so that every list is structurally +equipped with a notion of its length. Here's how: + +.. 
literalinclude:: ../code/Vec.fst
+   :language: fstar
+   :start-after: SNIPPET_START: vec
+   :end-before: SNIPPET_END: vec
+
+What we have here is our first indexed type, ``vec a n``. One way to
+understand this definition is that ``vec a : nat -> Type`` describes a
+family of types, ``vec a 0``, ``vec a 1``, ... etc., all representing
+lists of ``a``-typed elements, but where the *index* ``n`` describes
+the length of the list. With this definition of ``vec``, the function
+``length`` is redundant: given a ``v : vec a n``, we know that
+``length v`` is ``n``, without having to recompute it.
+
+This style of enriching a type definition with indexes to state
+properties of the type is reminiscent of what we learned earlier about
+:ref:`intrinsic versus extrinsic proofs
+`. Rather than defining a single type
+``list a`` for all lists and then separately (i.e., extrinsically)
+defining a function ``length`` to compute the length of a list, with
+``vec`` we've enriched the type of the list intrinsically, so that the
+type of a ``vec`` immediately tells you its length.
+
+Now, you may have seen examples like this length-indexed ``vec`` type
+before---it comes up often in tutorials about dependently typed
+programming. But, indexed types can do a lot more. In this section, we
+learn about indexed inductive types from three related perspectives:
+
+  * Representing data: Inductive types allow us to build new data
+    types, including lists, vectors, trees, etc., in several flavors.
+    We present two case studies: :ref:`vectors ` and
+    :ref:`Merkle trees `, a binary tree data structure
+    equipped with cryptographic proofs.
+
+  * Representing proofs: The core logic of F* rests upon several
+    simple inductive type definitions. We revisit the logical
+    connectives we've seen before (including the :ref:`propositional
+    connectives ` and :ref:`equality
+    `) and show how, rather than being primitive
+    notions in F*, their definitions arise from a few core
+    constructions involving inductive types. Other core notions in the
+    language, including the handling of :ref:`termination proofs
+    `, can also be understood in terms of inductive
+    types that :ref:`model well-founded recursion
+    `.
+
+  * Representing computations: Inductive type definitions allow
+    embedding other programming languages or computational models
+    within F*. We develop three case studies.
+
+    + We develop a :ref:`deep embedding of the simply-typed lambda
+      calculus ` with several reduction strategies, and a
+      proof of its syntactic type soundness. The example showcases the
+      use of several inductive types to represent the syntax of a
+      programming language, a relation describing its type system, and
+      another relation describing its operational semantics.
+
+    + We also show how to use :ref:`higher-order abstract syntax
+      ` to represent well-typed lambda terms, a concise
+      style that illustrates how to use inductive types that store
+      functions.
+
+    + Finally, we look at a :ref:`shallow embedding of an imperative
+      programming language with structured concurrency `,
+      representing computations as infinitely branching inductively
+      defined trees. The example introduces modeling computational
+      effects as monads and showcases the use of inductive types
+      at higher order.
+
+This section is somewhat more advanced than the first. It also
+interleaves some technical material about F*'s core logic with case
+studies showing some of those core concepts at work. You can certainly
+work through the material sequentially, but depending on your
+interests, you may find the following paths through the material to be
+more accessible.
+
+If you're familiar with dependent types but are new to F* and want a
+quick tour, the following path might work for you:
+
+  * :ref:`Length-indexed lists `, F*-specific notations
+
+  * :ref:`Equality `
+
+  * :ref:`Logical connectives `
+
+  * Any of the case studies, depending on your interest.
+
+If you're unfamiliar with dependent types and are more curious to
+learn how to use F* by working through examples, the following path
+might work for you:
+
+  * :ref:`Inductive type definitions `, basic concepts
+
+  * :ref:`Length-indexed lists `, F*-specific notations in the simplest setting
+
+  * :ref:`Merkle trees `, a more interesting example, with applications to cryptographic security
+
+  * :ref:`Logical connectives `, some utilities to manipulate F*'s logical connectives
+
+  * Any of the case studies, depending on your interest, with the :ref:`Simply Typed Lambda Calculus ` perhaps the easiest of them.
+
+But, by the end of this section, through several exercises, we expect
+the reader to be familiar enough with inductive types to define their
+own data structures and inductively defined relations, while also
+gaining a working knowledge of some core parts of F*'s type theory.
+
+
+.. toctree::
+   :maxdepth: 1
+   :caption: Contents:
+
+   part2_inductive_type_families
+   part2_vectors
+   part2_merkle
+   part2_equality
+   part2_logical_connectives
+   part2_stlc
+   part2_phoas
+   part2_well_founded
+   part2_par
+   part2_universes
+
+.. 
+ Vectors for basics + - But vectors are too simple, we can do them with just refined lists + + Merkle trees to capture more interesting invariants of a type + + Higher-order inductive types: infinitely branching trees + - Free monads and computation trees + + Representing proof terms: Simply-typed lambda calculus + + Representing proof terms: Accessibility predicates and termination proofs diff --git a/doc/book/PoP-in-FStar/book/part2/part2_equality.rst b/doc/book/PoP-in-FStar/book/part2/part2_equality.rst new file mode 100644 index 00000000000..60ff65f244a --- /dev/null +++ b/doc/book/PoP-in-FStar/book/part2/part2_equality.rst @@ -0,0 +1,453 @@ +.. _Part2_equality: + +Equality Types +============== + +In an :ref:`early section ` we learned that F* +supports at least two kinds of equality. In this section, we look in +detail at definitional equality, propositional equality, extensional +equality of functions, and decidable equality. These topics are fairly +technical, but are core features of the language and their treatment +in F* makes essential use of an indexed inductive type, ``equals #t x +y``, a proposition asserting the equality of ``x:t`` and ``y:t``. + +Depending on your level of comfort with functional programming and +dependent types, you may want to skip or just skim this chapter on a +first reading, returning to it for reference if something is unclear. + +Definitional Equality +..................... + +One of the main distinctive feature of a type theory like F* (or Coq, +Lean, Agda etc., and in contrast with foundations like set theory) is +that *computation* is a primitive notion within the theory, such that +lambda terms that are related by reduction are considered +identical. For example, there is no way to distinguish within the +theory between :math:`(\lambda x.x) 0` and :math:`0`, since the former +reduces in a single step of computation to the latter. 
Terms that are +related by reduction are called *definitionally equal*, and this is +the most primitive notion of equality in the language. Definitional +equality is a congruence, in the sense that within any context +:math:`T[]`, :math:`T[n]` is definitionally equal to :math:`T[m]`, +when :math:`n` and :math:`m` are definitionally equal. + +Since definitionally equal terms are identical, all type theories, +including F*, will implicit allow treating a term ``v:t`` as if it had +type ``t'``, provided ``t`` and ``t'`` are definitionally equal. + +Let's look at a few examples, starting again with our type of +length-indexed vectors. + +.. literalinclude:: ../code/ProvableEquality.fst + :language: fstar + :start-after: //SNIPPET_START: vec$ + :end-before: //SNIPPET_END: vec$ + +As the two examples below show a ``v:vec a n`` is also has type ``vec +a m`` when ``n`` and ``m`` are definitionally equal. + +.. literalinclude:: ../code/ProvableEquality.fst + :language: fstar + :start-after: //SNIPPET_START: vec_conversions$ + :end-before: //SNIPPET_END: vec_conversions$ + +In the first case, a single step of computation (a function +application, or :math:`\beta`-reduction) suffices; while the second +case requires a :math:`\beta`-reduction followed by a step of integer +arithmetic. In fact, any computational step, including unfolding +defintions, conditionals, fixpoint reduction etc. are all allowed when +deciding if terms are definitionally equivalent---the code below +illustrates how F* implicitly reduces the ``factorial`` function when +deciding if two terms are definitionally equal. + +.. literalinclude:: ../code/ProvableEquality.fst + :language: fstar + :start-after: //SNIPPET_START: vec_conversions_fact$ + :end-before: //SNIPPET_END: vec_conversions_fact$ + +Of course, there is nothing particularly special about the ``vec`` +type or its indices. Definitional equality applies everywhere, as +illustrated below. + +.. 
literalinclude:: ../code/ProvableEquality.fst + :language: fstar + :start-after: //SNIPPET_START: conv_int$ + :end-before: //SNIPPET_END: conv_int$ + +Here, when adding ``1`` to ``x``, F* implicitly converts the type of +``x`` to ``int`` by performing a :math:`\beta`-reduction followed by a +case analysis. + +Propositional Equality +...................... + +Definitional equality is so primitive in the language that there is no +way to even state within the theory that two terms are definitionally +equal, i.e., there is no way to state within the logic that two terms +are related to each other by reduction. The closest one can get to +stating that two terms are equal is through a notion called +*provable equality* or propositional equality. + +In thinking of propositions as types, we mentioned at the :ref:`very +start of the book `, that one can think of a type ``t`` as a +proposition, or a statement of a theorem, and ``e : t`` as a proof of +the theorem ``t``. So, one might ask, what type corresponds to the +equality proposition and how are proofs of equality represented? + +The listing below shows the definition of an inductive type ``equals +#a x y`` representing the equality proposition between ``x:a`` and +``y:a``. Its single constructor ``Reflexivity`` is an equality proof. + +.. literalinclude:: ../code/ProvableEquality.fst + :language: fstar + :start-after: //SNIPPET_START: equals$ + :end-before: //SNIPPET_END: equals$ + +It's easy to construct some simple equality proofs. In the second case, +just as with our vector examples, F* accepts ``Reflexivity #_ #6`` as +having type ``equals (factorial 3) 6``, since ``equals 6 6`` is +definitionally equal to ``equals (factorial 3) 6``. + +..
literalinclude:: ../code/ProvableEquality.fst + :language: fstar + :start-after: //SNIPPET_START: sample_equals_proofs$ + :end-before: //SNIPPET_END: sample_equals_proofs$ + +Although the only constructor of ``equals`` is ``Reflexivity``, as +the following code shows, ``equals`` is actually an equivalence +relation, satisfying (in addition to reflexivity) the laws of symmetry +and transitivity. + +.. literalinclude:: ../code/ProvableEquality.fst + :language: fstar + :start-after: //SNIPPET_START: equivalence_relation$ + :end-before: //SNIPPET_END: equivalence_relation$ + +This might seem like magic: how is it that we can derive symmetry +and transitivity from reflexivity alone? The answer lies in how F* +interprets inductive type definitions. + +In particular, given an inductive type definition of type +:math:`T~\overline{p}`, where :math:`\overline{p}` is a list of +parameters, F* includes an axiom stating that any value :math:`v: +T~\overline{p}` must be an application of one of the constructors of +:math:`T`, :math:`D~\overline{v} : T~\overline{p'}`, such that +:math:`\overline{p} = \overline{p'}`. + +In the case of equality proofs, this allows F* to conclude that every +equality proof is actually an instance of ``Reflexivity``, as shown +below. + +.. literalinclude:: ../code/ProvableEquality.fst + :language: fstar + :start-after: //SNIPPET_START: uip_refl$ + :end-before: //SNIPPET_END: uip_refl$ + +Spend a minute looking at the statement above: the return type is a +statement of equality about equality proofs. Write down a version of +``uip_refl`` making all implicit arguments explicit. + +.. container:: toggle + + .. container:: header + + **Answer** + + ..
literalinclude:: ../code/ProvableEquality.fst + :language: fstar + :start-after: //SNIPPET_START: uip_refl_explicit$ + :end-before: //SNIPPET_END: uip_refl_explicit$ + +-------------------------------------------------------------------------------- + +In fact, from ``uip_refl``, a stronger statement showing that all +equality proofs are equal is also provable. The property below is +known as the *uniqueness of identity proofs* (UIP) and is at the core +of what makes F* an extensional type theory. + +.. literalinclude:: ../code/ProvableEquality.fst + :language: fstar + :start-after: //SNIPPET_START: uip$ + :end-before: //SNIPPET_END: uip$ + +The F* module ``Prims``, the very first module in every program's +dependence graph, defines the ``equals`` type as shown here. The +provable equality predicate ``(==)`` that we've used in several +examples already is just a squashed equality proof, as shown below. + +.. code-block:: fstar + + let ( == ) #a (x y : a) = squash (equals x y) + +In what follows, we'll mostly use squashed equalities, except where we +wish to emphasize the reflexivity proofs. + +Equality Reflection +................... + +What makes F* an *extensional* type theory (unlike the +*intensional* type theories implemented by Coq, Lean, Agda, etc.) is a +feature known as equality reflection. Whereas intensional type +theories treat definitional and provable equalities separately, in F* +terms that are provably equal are also considered definitionally +equal. That is, if in a given context ``x == y`` is derivable, then +``x`` is also definitionally equal to ``y``. This has some +wide-reaching consequences. + +Implicit conversions using provable equalities +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Recall from the start of the chapter that ``v:vec a ((fun x -> x) 0)`` +is implicitly convertible to the type ``vec a 0``, since the two types +are related by congruence and reduction.
However, as the examples +below show, if ``a == b`` is derivable in the context, then +``v:a`` can be implicitly converted to the type ``b``. + +.. literalinclude:: ../code/ProvableEquality.fst + :language: fstar + :start-after: //SNIPPET_START: conversion_with_equality_proofs$ + :end-before: //SNIPPET_END: conversion_with_equality_proofs$ + +We do not require a proof of ``a == b`` to be literally bound in the +context. As the example below shows, the hypothesis ``h`` is used in +conjunction with the control flow of the program to prove that in the +``then`` branch ``aa : int`` and in the ``else`` branch ``bb : int``. + +.. literalinclude:: ../code/ProvableEquality.fst + :language: fstar + :start-after: //SNIPPET_START: conversion_complex$ + :end-before: //SNIPPET_END: conversion_complex$ + +In fact, with our understanding of equality proofs, we can better +explain how case analysis works in F*. In the code above, the +``then``-branch is typechecked in a context including a hypothesis +``h_then: squash (equals (x > 0) true)``, while the ``else`` branch +includes the hypothesis ``h_else: squash (equals (x > 0) false)``. The +presence of these additional control-flow hypotheses, in conjunction +with whatever else is in the context (in particular hypothesis ``h``) +allows us to derive ``(a == int)`` and ``(b == int)`` in the +respective branches and convert the types of ``aa`` and ``bb`` +accordingly. + +Undecidability and Weak Normalization +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Implicit conversions with provable equalities are very convenient---we +have relied on them without noticing in nearly all our examples so far, +starting from the simplest examples about lists to vectors and Merkle +trees, and some might say this is the one key feature that gives F* +its programming-oriented flavor. + +However, as the previous example hinted, it is, in general, +undecidable to determine if ``a == b`` is derivable in a given +context.
In practice, however, through the use of an SMT solver, F* +can often figure out when terms are provably equal and use such +equalities for conversion. But it cannot always do this. In such cases, the F* standard +library offers the following primitive (in ``FStar.Pervasives``), which +allows the user to write ``coerce_eq pf x`` to explicitly coerce the +type of ``x`` using the equality proof ``pf``. + +.. code-block:: fstar + + let coerce_eq (#a #b:Type) (_:squash (a == b)) (x:a) : b = x + +Another consequence of equality reflection is the loss of strong +normalization. Intensional type theories enjoy a nice property +ensuring that every term will reduce to a canonical normal form, no +matter the order of evaluation. F* does not have this property, since +some terms, under certain evaluation orders, can reduce +infinitely. However, metatheory developed for F* proves that closed +terms (terms without free variables) in the ``Tot`` effect do not +reduce infinitely, and as a corollary, there are no closed proofs of +``False``. + +F* includes various heuristics to avoid getting stuck in an infinite +loop when reducing open terms, but one can craft examples to make F*'s +reduction machinery loop forever. As such, deciding if possibly open +terms have the same normal form is also undecidable in F*. + +.. _Part2_funext: + +Functional Extensionality +......................... + +Functional extensionality is a principle that asserts the provable +equality of functions that are pointwise equal. That is, for functions +:math:`f` and :math:`g`, :math:`\forall x. f x == g x` implies +:math:`f == g`. + +This principle is provable as a theorem in F*, but only for function +literals, or, equivalently, :math:`\eta`-expanded functions. That is, +the following is a theorem in F*. + +.. literalinclude:: ../code/ProvableEquality.fst + :language: fstar + :start-after: //SNIPPET_START: funext_eta$ + :end-before: //SNIPPET_END: funext_eta$ + +..
note:: + + The proof of this theorem makes use of tactics, a topic we'll + cover in a later chapter. You do not need to understand it in + detail yet. The proof roughly says to descend into every sub-term + of the goal and try to rewrite it using the pointwise equality + hypothesis ``hyp``, and, if that fails, to just rewrite the sub-term to + itself. + +Unfortunately, functional extensionality does not apply to all +functions. That is, the following is not provable in F*, nor is it +sound to assume it as an axiom. + +.. literalinclude:: ../code/ProvableEquality.fst + :language: fstar + :start-after: //SNIPPET_START: funext$ + :end-before: //SNIPPET_END: funext$ + +The problem is illustrated by the following counterexample, which +allows deriving ``False`` in a context where ``funext`` is valid. + +.. literalinclude:: ../code/ProvableEquality.fst + :language: fstar + :start-after: //SNIPPET_START: funext_false$ + :end-before: //SNIPPET_END: funext_false$ + +The proof works by exploiting the interaction with refinement +subtyping. ``f`` and ``g`` are clearly not pointwise equal on the +entire domain of natural numbers, yet they are pointwise equal on the +positive natural numbers. However, from ``ax #pos f g`` we obtain +``f == g``, and in particular that ``f 0 == g 0``, which is false. + +.. note:: + + The trouble arises in part because although ``ax:funext`` proves + ``squash (equals #(pos -> int) f g)``, F*'s encoding of the + equality to the SMT solver (whose equality is untyped) treats the + equality as ``squash (equals #(nat -> int) f g)``, which leads to + the contradiction. + +Further, :math:`\eta`-equivalent functions in F* are not considered +provably equal. Otherwise, in combination with ``funext_on_eta``, an +:math:`\eta`-equivalence principle leads to the same contradiction as +``funext_false``, as shown below. + +..
literalinclude:: ../code/ProvableEquality.fst + :language: fstar + :start-after: //SNIPPET_START: eta_equiv_false$ + :end-before: //SNIPPET_END: eta_equiv_false$ + +The F* standard library module ``FStar.FunctionalExtensionality`` +provides more information and several utilities to work with +functional extensionality on :math:`\eta`-expanded functions. + +Thanks in particular to Aseem Rastogi and Dominique Unruh for many +insights and discussions related to functional extensionality. + +Exercise +........ + +Leibniz equality ``leq x y`` relates two terms ``x:a`` and ``y:a`` if, +for all predicates ``p:a -> Type``, ``p x`` implies ``p y``. That is, +if no predicate can distinguish ``x`` and ``y``, then they must be +equal. + +Define Leibniz equality and prove that it is an equivalence relation. + +Then prove that Leibniz equality and the equality predicate ``equals x +y`` defined above are isomorphic, in the sense that ``leq x y -> +equals x y`` and ``equals x y -> leq x y``. + +`Exercise file <../code/exercises/Part2.Leibniz.fst>`_ + +.. container:: toggle + + .. container:: header + + **Hint** + + The section on Leibniz equality `here + `_ tells you how to do it in + Agda. + + .. literalinclude:: ../code/ProvableEquality.fst + :language: fstar + :start-after: //SNIPPET_START: leibniz$ + :end-before: //SNIPPET_END: leibniz$ + +-------------------------------------------------------------------------------- + +.. _Part2_equality_qualifiers: + +Decidable equality and equality qualifiers +.......................................... + +To end this chapter, we discuss a third kind of equality in F*, the +polymorphic *decidable equality* with the signature shown below, taken +from the F* module ``Prims``. + +.. code-block:: fstar + + val ( = ) (#a:eqtype) (x y:a) : bool + +On ``eqtype``, i.e., ``a:Type{hasEq a}``, decidable equality ``(=)`` +and provable equality coincide, as shown below. + +..
literalinclude:: ../code/ProvableEquality.fst + :language: fstar + :start-after: //SNIPPET_START: dec_equals_dec$ + :end-before: //SNIPPET_END: dec_equals_dec$ + +That is, for the class of ``eqtype``, ``x = y`` returns a boolean +value that decides equality. Decidable equality and ``eqtype`` were +first covered in :ref:`an earlier chapter `, where we +mentioned that several primitive types, like ``int`` and ``bool``, all +validate the ``hasEq`` predicate and are, hence, instances of ``eqtype``. + +When introducing a new inductive type definition, F* tries to +determine whether or not the type supports decidable equality based on +a structural equality of the representation of the values of that +type. If so, the type is considered an ``eqtype`` and uses of the ``( += )`` operator are compiled at runtime to structural comparison of +values provided by the target language chosen, e.g., OCaml, F\#, or C. + +The criterion used to determine whether or not the type supports +decidable equality is the following. + +Given an inductive type definition of :math:`T` with parameters +:math:`\overline{p}` and indexes :math:`\overline{q}`, for each +constructor :math:`D` with arguments :math:`\overline{v:t_v}`, + +1. Assume, for every type parameter :math:`t \in \overline{p}`, :math:`\mathsf{hasEq}~t`. + +2. Assume, for recursive types, for all :math:`\overline{q}`, :math:`\mathsf{hasEq}~(T~\overline{p}~\overline{q})`. + +3. For all arguments :math:`\overline{v:t_v}`, prove :math:`\mathsf{hasEq}~t_v`. + +If the proof in step 3 succeeds for all constructors, then F* +introduces an axiom +:math:`\forall~\overline{p}~\overline{q}. (\forall t \in \overline{p}. \mathsf{hasEq}~t) \Rightarrow \mathsf{hasEq}~(T~\overline{p}~\overline{q})`. + +If the check in step 3 fails for any constructor, F* reports an error +which the user can address by adding one of two qualifiers to the type. + +1.
``noeq``: This qualifier instructs F* to consider that the type + does not support decidable equality, e.g., if one of the + constructors contains a function, as shown below. + + .. literalinclude:: ../code/ProvableEquality.fst + :language: fstar + :start-after: //SNIPPET_START: noeq$ + :end-before: //SNIPPET_END: noeq$ + +2. ``unopteq``: This qualifier instructs F* to determine whether a + given instance of the type supports equality, even when some of its + parameters are not themselves instances of ``eqtype``. This can be + useful in situations such as the following: + + .. literalinclude:: ../code/ProvableEquality.fst + :language: fstar + :start-after: //SNIPPET_START: unopteq$ + :end-before: //SNIPPET_END: unopteq$ + +This `wiki page +`_ +provides more information about equality qualifiers on inductive types. diff --git a/doc/book/PoP-in-FStar/book/part2/part2_inductive_type_families.rst b/doc/book/PoP-in-FStar/book/part2/part2_inductive_type_families.rst new file mode 100644 index 00000000000..6139d7652ab --- /dev/null +++ b/doc/book/PoP-in-FStar/book/part2/part2_inductive_type_families.rst @@ -0,0 +1,253 @@ +.. _Part2_inductives: + +Inductive type definitions +========================== + +An inductive type definition, sometimes called a *datatype*, has the +following general structure. + +.. math:: + + \mathsf{type}~T_1~\overline{(x_1:p_1)} : \overline{y_1:q_1} \rightarrow \mathsf{Type} = \overline{| D_1 : t_1} \\ + \ldots\qquad\qquad\qquad\qquad\\ + \mathsf{and}~T_n~\overline{(x_n:p_n)} : \overline{y_n:q_n} \rightarrow \mathsf{Type} = \overline{| D_n : t_n} \\ + +This defines :math:`n` mutually inductive types, named :math:`T_1 \ldots +T_n`, called the *type constructors*. Each type constructor :math:`T_i` +has a number of *parameters*, the :math:`\overline{x_i : p_i}`, and a +number of *indexes*, the :math:`\overline{y_i:q_i}`. + +Each type constructor :math:`T_i` has zero or more *data constructors* +:math:`\overline{D_i:t_i}`.
For each data constructor :math:`D_{ij}`, its +type :math:`t_{ij}` must be of the form :math:`\overline{z:s} \rightarrow +T_i~\bar{x_i}~\bar{e}`, i.e., it must be a function type returning an +instance of :math:`T_i` with *the same parameters* +:math:`\overline{x_i}` as in the type constructor's signature, but with +any other well-typed terms :math:`\overline{e}` for the index +arguments. This is the main difference between a parameter and an +index—a parameter of a type constructor *cannot* vary in the result +type of the data constructors, while the indexes can. + +Further, in each of the arguments :math:`\overline{z:s}` of the data +constructor, none of the mutually defined type constructors +:math:`\overline{T}` may appear to the left of an arrow. That is, all +occurrences of the type constructors must be *strictly positive*. This +is to ensure that the inductive definitions are well-founded, as +explained below. Without this restriction, it is easy to break +soundness by writing non-terminating functions with ``Tot`` types. + +Also related to ensuring logical consistency is the *universe* level +of an inductive type definition. We'll return to that later, once +we've done a few examples. + +.. _Part2_strict_positivity: + +Strictly positive definitions ++++++++++++++++++++++++++++++ + +As a strawman, consider embedding a small dynamically typed +programming language within F*. All terms in our language have the +same static type ``dyn``, although at runtime values could have +type ``Bool``, or ``Int``, or ``Function``. + +One attempt at representing a language like this using a data type in +F* is as follows: + +.. literalinclude:: ../code/Part2.Positivity.fst + :language: fstar + :start-after: //SNIPPET_START: dyn$ + :end-before: //SNIPPET_END: dyn$ + +The three cases of the data type represent our three kinds of runtime +values: ``Bool b``, ``Int b``, and ``Function f``. 
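Concretely, the elided snippet plausibly looks like the following sketch (the constructor argument names here are our own):

.. code-block:: fstar

   type dyn =
     | Bool     : b:bool -> dyn
     | Int      : i:int -> dyn
     | Function : f:(dyn -> dyn) -> dyn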
The ``Function`` +case is the interesting one: the argument ``f`` is itself a function +from ``dyn -> dyn``, and the constructor ``Function`` allows promoting +a ``dyn -> dyn`` function into the type ``dyn`` itself; e.g., one can +represent the identity function in ``dyn`` as ``Function (fun (x:dyn) +-> x)``. However, the ``Function`` case is also problematic: as we will see +below, it allows circular definitions that enable constructing +instances of ``dyn`` without actually providing any base case. F* +rejects the definition of ``dyn``, saying "Inductive type dyn does not +satisfy the strict positivity condition". + +Consider again the general shape of an inductive type definition: + +.. math:: + + \mathsf{type}~T_1~\overline{(x_1:p_1)} : \overline{y_1:q_1} \rightarrow \mathsf{Type} = \overline{| D_1 : t_1} \\ + \ldots\qquad\qquad\qquad\qquad\\ + \mathsf{and}~T_n~\overline{(x_n:p_n)} : \overline{y_n:q_n} \rightarrow \mathsf{Type} = \overline{| D_n : t_n} \\ + +This definition is strictly positive when + + * for every type constructor :math:`T \in T_1, ..., T_n`, + + * and every data constructor :math:`D : t \in \overline{D_1}, + ... \overline{D_n}`, where `t` is of the form + :math:`x_0:s_0 \rightarrow ... \rightarrow x_n:s_n \rightarrow T_i ...`, + and :math:`s_0, ..., s_n` are the types of the fields of :math:`D`, + + * and for all instantiations :math:`\overline{v}` of the type parameters + :math:`\overline{p}` of the type :math:`T`, + + * :math:`T` does not appear to the left of any arrow in any + :math:`s \in (s_0, ..., s_n)[\overline{v}/\overline{p}]`. + +Our type ``dyn`` violates this condition, since the defined type +``dyn`` appears to the left of an arrow in the ``dyn -> +dyn``-typed field of the ``Function`` constructor. + +To see what goes wrong if F* were to accept this definition, we can +suppress the reported error using the option ``__no_positivity`` +and see what happens. + +..
literalinclude:: ../code/Part2.Positivity.fst + :language: fstar + :start-after: //SNIPPET_START: nopos_dyn$ + :end-before: //SNIPPET_END: nopos_dyn$ + +.. note:: + + F* maintains an internal stack of command line options. The + ``#push-options`` pragma pushes additional options at the top of + the stack, while ``#pop-options`` pops the stack. The pattern used + here instructs F* to typecheck ``dyn`` only with the + ``__no_positivity`` option enabled. As we will see, the + ``__no_positivity`` option can be used to break soundness, so use + it only if you really know what you're doing. + +Now, having declared that ``dyn`` is a well-formed inductive type, +despite not being strictly positive, we can break the soundness of +F*. In particular, we can write terms and claim they are total, when +in fact their execution will loop forever. + +.. literalinclude:: ../code/Part2.Positivity.fst + :language: fstar + :start-after: //SNIPPET_START: nopos_dyn_loop$ + :end-before: //SNIPPET_END: nopos_dyn_loop$ + +Here, the type of ``loop`` claims that it is a term that always +evaluates in a finite number of steps to a value of type ``dyn``. Yet, +reducing it produces an infinite chain of calls to ``loop' +(Function loop')``. Admitting a non-positive definition like ``dyn`` +has allowed us to build a non-terminating loop. + +Such loops can also allow one to prove ``False`` (breaking soundness), +as the next example shows. + +.. literalinclude:: ../code/Part2.Positivity.fst + :language: fstar + :start-after: //SNIPPET_START: non_positive$ + :end-before: //SNIPPET_END: non_positive$ + +This example is very similar to ``dyn``, except ``NP`` stores a +non-positive function that returns ``False``, which allows us to +prove ``ff : False``. That is, in this example, not only does the +violation of strict positivity lead to an infinite loop at runtime; it +also renders the entire proof system of F* useless, since one can +prove ``False``.
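To make the construction concrete, here is a sketch of how such a proof of ``False`` can be assembled from a non-positive type (the names are our own; the elided snippet may differ in its details):

.. code-block:: fstar

   #push-options "--__no_positivity"
   noeq
   type non_pos =
     | NP : (non_pos -> False) -> non_pos
   #pop-options

   //Applying the stored function to the very value that contains it
   //yields a (bogus) total proof of False
   let almost_false (f:non_pos) : False = let NP g = f in g f
   let ff : False = almost_false (NP almost_false)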
+ +Finally, in the example below, although the type ``also_non_pos`` does +not syntactically appear to the left of an arrow in a field of the +``ANP`` constructor, an instantiation of the type parameter ``f`` +(e.g., with the type ``f_false``) does make it appear to the left of +an arrow---so this type too is deemed not strictly positive, and can be used +to prove ``False``. + +.. literalinclude:: ../code/Part2.Positivity.fst + :language: fstar + :start-after: //SNIPPET_START: also_non_positive$ + :end-before: //SNIPPET_END: also_non_positive$ + +We hope you are convinced that non-strictly positive types should not +be admissible in inductive type definitions. In what follows, we will +no longer use the ``__no_positivity`` option. In a later section, once +we've introduced the *effect of divergence*, we will see that +non-positive definitions can safely be used in a context where +programs are not expected to terminate, allowing one to safely model +things like the ``dyn`` type, without compromising the soundness of +F*. + +.. _Part2_strictly_positive_annotations: + +Strictly Positive Annotations +----------------------------- + +Sometimes it is useful to parameterize an inductive definition with a +type function, without introducing a non-positive definition as we did +in ``also_non_pos`` above. + +For example, the definition below introduces a type ``free f a``, a +form of a tree whose leaf nodes contain ``a`` values, and whose +internal nodes branch according to the type function ``f``. + +.. literalinclude:: ../code/Part2.Positivity.fst + :language: fstar + :start-after: //SNIPPET_START: free$ + :end-before: //SNIPPET_END: free$ + +We can instantiate this generic ``free`` to produce various kinds of +trees.
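For instance, one such instance might resemble the following sketch (the name and the pair-based branching are our own, not from the elided snippet):

.. code-block:: fstar

   //Binary branching via pairs: each internal node carries two subtrees,
   //and `fun t -> t & t` is strictly positive in `t`
   let binary_branching (a:Type) = free (fun (t:Type) -> t & t) a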
Note: when instantiating ``free list a`` in +``variable_branching_list`` below, we need to explicitly re-define the +``list`` type with a strict-positivity annotation: F* does not +correctly support rechecking type constructors to prove that they are +strictly positive when they are used at higher order. + +.. literalinclude:: ../code/Part2.Positivity.fst + :language: fstar + :start-after: //SNIPPET_START: free_instances$ + :end-before: //SNIPPET_END: free_instances$ + +However, we should only be allowed to instantiate ``f`` with type +functions that are strictly positive in their argument, since otherwise +we can build a proof of ``False``, as we did with +``also_non_pos``. The ``@@@strictly_positive`` attribute on the +formal parameter of ``f`` enforces this. + +If we were to try to instantiate ``free`` with a non-strictly positive +type function, + +.. literalinclude:: ../code/Part2.Positivity.fst + :language: fstar + :start-after: //SNIPPET_START: free_bad$ + :end-before: //SNIPPET_END: free_bad$ + +then F* raises an error: + +.. code-block:: + + Binder (t: Type) is marked strictly positive, but its use in the definition is not + +Unused Annotations +------------------ + +Sometimes one indexes a type by another type, though the index has no +semantic meaning. For example, in several F* developments that model +mutable state, a heap reference is just a natural number modeling +its address in the heap. However, one might use the type ``let ref +(a:Type) = nat`` to represent the type of a reference, even though the +type ``a`` is not used in the definition. In such cases, it can be +useful to mark the parameter as unused, to inform F*'s positivity +checker that the type index is actually irrelevant. The snippet below +shows an example: + +.. literalinclude:: ../code/Part2.Positivity.fst + :language: fstar + :start-after: //SNIPPET_START: unused$ + :end-before: //SNIPPET_END: unused$ + +Here, we've marked the parameter of ``ref`` with the ``unused`` +attribute.
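The elided snippet plausibly resembles the following sketch (our own rendering of the definitions described in the text):

.. code-block:: fstar

   //The parameter `a` is marked unused, so the positivity checker
   //ignores how `ref` is instantiated; irreducible prevents F* from
   //silently unfolding `ref` to `nat`
   irreducible
   let ref ([@@@ unused] a:Type) : Type = nat

   noeq
   type linked_list (a:Type) =
     | LLNil  : linked_list a
     | LLCons : a -> ref (linked_list a) -> linked_list a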
We've also marked ``ref`` as ``irreducible`` just to +ensure for this example that F* does not silently unfold the +definition of ``ref``. + +Now, knowing that the parameter of ``ref`` is unused, one can define +types like ``linked_list a``, where although ``linked_list a`` appears +as an argument to the ``ref`` type, the positivity checker accepts it, +since the parameter is unused. This is similar to the use of a +``strictly_positive`` annotation on a parameter. + +However, with the ``unused`` attribute, one can go further: e.g., the +type ``neg_unused`` shows that even a negative occurrence of the +defined type is accepted, so long as it appears only as an +instantiation of an unused parameter. diff --git a/doc/book/PoP-in-FStar/book/part2/part2_logical_connectives.rst b/doc/book/PoP-in-FStar/book/part2/part2_logical_connectives.rst new file mode 100644 index 00000000000..78e5fee4746 --- /dev/null +++ b/doc/book/PoP-in-FStar/book/part2/part2_logical_connectives.rst @@ -0,0 +1,623 @@ +.. _Part2_connectives: + +Constructive & Classical Connectives +==================================== + +In :ref:`an earlier chapter `, we learned +about the propositional connectives :math:`\forall, \exists, +\Rightarrow, \iff, \wedge, \vee, \neg`, etc. Whereas in other logical +frameworks these connectives are primitive, in a type theory like F* +these connectives are defined notions, built from inductive type +definitions and function types. In this section, we take a closer look +at these logical connectives, show their definitions, and present some +utilities to manipulate them in proofs. + +Every logical connective comes in two flavors. First, in its most +primitive form, it is defined as an inductive or arrow type, giving a +constructive interpretation to the connective. 
Second, and more +commonly used in F*, is a *squashed*, or proof-irrelevant, variant of +the same connective---the squashed variant is classical rather than +constructive and its proofs are typically derived by writing partial +proof terms with the SMT solver filling in the missing parts. + +Each connective has an *introduction* principle (which describes how +to build proofs of that connective) and an *elimination* principle +(which describes how to use a proof of that connective to build other +proofs). Example uses of introduction and elimination principles for +all the connectives can be found in `ClassicalSugar.fst +`_ + +All these types are defined in ``Prims``, the very first module in all +F* programs. + +Falsehood +......... + +The ``empty`` inductive type is the proposition that has no +proofs. The logical consistency of F* depends on there being no closed +terms whose type is ``empty``. + +.. code-block:: fstar + + type empty = + +This definition might look odd at first: it defines an inductive type +with *zero* constructors. This is perfectly legal in F*, unlike in +languages like OCaml or F#. + +The squashed variant of ``empty`` is called ``False`` and is defined +as shown below: + +.. code-block:: fstar + + let False = squash empty + +Introduction +++++++++++++ + +The ``False`` proposition has no introduction form, since it has no proofs. + +Elimination ++++++++++++ + +From a (hypothetical) proof of ``False``, one can build a proof of any +other type. + +.. literalinclude:: ../code/Connectives.fst + :language: fstar + :start-after: //SNIPPET_START: empty_elim$ + :end-before: //SNIPPET_END: empty_elim$ + +The body of ``elim_false`` is a ``match`` expression with no branches, +which suffices to match all the zero cases of the ``empty`` type. + +``FStar.Pervasives.false_elim`` provides an analogous elimination rule +for ``False``, as shown below, where the termination check for the +recursive call succeeds trivially in a context with ``x:False``. + +..
literalinclude:: ../code/Connectives.fst + :language: fstar + :start-after: //SNIPPET_START: false_elim$ + :end-before: //SNIPPET_END: false_elim$ + +Truth +..... + +The ``trivial`` inductive type has just a single proof, ``T``. + +.. code-block:: + + type trivial = T + +.. note:: + + Although isomorphic to the ``unit`` type with its single element + ``()``, for historic reasons, F* uses the ``trivial`` type to + represent trivial proofs. In the future, it is likely that + ``trivial`` will just be replaced by ``unit``. + +The squashed form of ``trivial`` is written ``True`` and is defined as: + +.. code-block:: + + let True = squash trivial + +Introduction +++++++++++++ + +The introduction forms for both the constructive and squashed variants +are trivial. + +.. code-block:: + + let _ : trivial = T + let _ : True = () + +Elimination ++++++++++++ + +There is no elimination form, since proofs of ``trivial`` are vacuous +and cannot be used to derive any other proofs. + + +Conjunction +........... + +A constructive proof of ``p`` and ``q`` is just a pair containing +proofs of ``p`` and ``q``, respectively. + +.. code-block:: + + type pair (p q:Type) = | Pair : _1:p -> _2:q -> pair p q + +.. note:: + + This type is isomorphic to the tuple type ``p & q`` that we + encountered previously :ref:`here `. F* currently + uses a separate type for pairs used in proofs and those used to + pair data, though there is no fundamental reason for this. In the + future, it is likely that ``pair`` will just be replaced by the + regular tuple type. + +The squashed form of conjunction is written ``/\`` and is defined as +follows: + +.. code-block:: + + let ( /\ ) (p q:Type) = squash (pair p q) + +Introduction +++++++++++++ + +Introducing a conjunction simply involves constructing a pair. + +.. 
literalinclude:: ../code/Connectives.fst + :language: fstar + :start-after: //SNIPPET_START: and_intro$ + :end-before: //SNIPPET_END: and_intro$ + +To introduce the squashed version, there are two options. One can +either rely entirely on the SMT solver to discover a proof of ``p /\ +q`` from proofs of ``p`` and ``q``, which it is usually very capable +of doing. + +.. literalinclude:: ../code/Connectives.fst + :language: fstar + :start-after: //SNIPPET_START: conj_intro$ + :end-before: //SNIPPET_END: conj_intro$ + +Or, if one needs finer control, F* offers specialized syntax +(defined in ``FStar.Classical.Sugar``) to manipulate each of the +non-trivial logical connectives, as shown below. + +.. literalinclude:: ../code/Connectives.fst + :language: fstar + :start-after: //SNIPPET_START: conj_intro_sugar$ + :end-before: //SNIPPET_END: conj_intro_sugar$ + +The sugared introduction form for conjunction is, in general, as +follows: + +.. code-block:: fstar + + introduce p /\ q //Term whose top-level connective is /\ + with proof_of_p //proof_of_p : squash p + and proof_of_q //proof_of_q : squash q + +Elimination ++++++++++++ + +Eliminating a conjunction comes in two forms, corresponding to +projecting each component of the pair. + +.. literalinclude:: ../code/Connectives.fst + :language: fstar + :start-after: //SNIPPET_START: and_elim$ + :end-before: //SNIPPET_END: and_elim$ + +For the squashed version, we again have two styles, the first relying +on the SMT solver. + +.. literalinclude:: ../code/Connectives.fst + :language: fstar + :start-after: //SNIPPET_START: conj_elim$ + :end-before: //SNIPPET_END: conj_elim$ + +And a style using syntactic sugar: + +.. literalinclude:: ../code/Connectives.fst + :language: fstar + :start-after: //SNIPPET_START: conj_elim_sugar$ + :end-before: //SNIPPET_END: conj_elim_sugar$ + +Disjunction +........... + +A constructive proof of ``p`` or ``q`` is represented by the following +inductive type: + +.. 
code-block:: fstar + + type sum (p q:Type) = + | Left : p -> sum p q + | Right : q -> sum p q + +The constructors ``Left`` and ``Right`` inject proofs of ``p`` or +``q`` into a proof of ``sum p q``. + +.. note:: + + Just like before, this type is isomorphic to the type ``either p q`` + from ``FStar.Pervasives``. + +The classical connective ``\/`` described previously is just a +squashed version of ``sum``. + +.. code-block:: fstar + + let ( \/ ) (p q: Type) = squash (sum p q) + +Introduction +++++++++++++ + +As with the other connectives, introducing a constructive disjunction +is just a matter of using the ``Left`` or ``Right`` constructor. + +To introduce the squashed version ``\/``, one can either rely on the +SMT solver, as shown below. + +.. literalinclude:: ../code/Connectives.fst + :language: fstar + :start-after: //SNIPPET_START: or_intro$ + :end-before: //SNIPPET_END: or_intro$ + +Or, using the following syntactic sugar, one can specifically provide +a proof for either the ``Left`` or ``Right`` disjunct. + +.. literalinclude:: ../code/Connectives.fst + :language: fstar + :start-after: //SNIPPET_START: or_intro_sugar$ + :end-before: //SNIPPET_END: or_intro_sugar$ + +Elimination ++++++++++++ + +Eliminating a disjunction requires a *motive*, a goal proposition to +be derived from a proof of ``sum p q`` or ``p \/ q``. + +In constructive style, eliminating ``sum p q`` amounts to just +pattern matching on the cases and constructing a proof of the goal +by applying a suitable goal-producing hypothesis. + +.. literalinclude:: ../code/Connectives.fst + :language: fstar + :start-after: //SNIPPET_START: sum_elim$ + :end-before: //SNIPPET_END: sum_elim$ + +The squashed version is similar, except the case analysis can either +be automated by SMT or explicitly handled using the syntactic +sugar. + +.. literalinclude:: ../code/Connectives.fst + :language: fstar + :start-after: //SNIPPET_START: or_elim$ + :end-before: //SNIPPET_END: or_elim$ + +.. 
literalinclude:: ../code/Connectives.fst + :language: fstar + :start-after: //SNIPPET_START: or_elim_sugar$ + :end-before: //SNIPPET_END: or_elim_sugar$ + +Implication +........... + +One of the elimination principles for disjunction used the implication +connective ``==>``. Its definition is shown below: + +.. code-block:: fstar + + let ( ==> ) (p q : Type) = squash (p -> q) + +That is, ``==>`` is just the squashed version of the non-dependent +arrow type ``->``. + +.. note:: + + In ``Prims``, the definition of ``p ==> q`` is actually ``squash (p + -> GTot q)``, a **ghost** function from ``p`` to ``q``. We'll learn + about this more when we encounter effects. + +Introduction +++++++++++++ + +Introducing a constructive arrow ``p -> q`` just involves constructing +a :math:`\lambda`-literal of the appropriate type. + +One can turn several kinds of arrows into implications, as shown below. + +One option is to directly use a function from the ``FStar.Classical`` +library, as shown below: + +.. code-block:: fstar + + val impl_intro_tot (#p #q: Type) (f: (p -> q)) : (p ==> q) + +However, this form is seldom used in F*. Instead, one often works with +functions between squashed propositions, or Lemmas, turning them into +implications when needed. We show a few styles below. + +.. literalinclude:: ../code/Connectives.fst + :language: fstar + :start-after: //SNIPPET_START: implies_intro$ + :end-before: //SNIPPET_END: implies_intro$ + +Unlike the other connectives, there is no fully automated SMT-enabled +way to turn an arrow type into an implication. Of course, the form +shown above remains just sugar: it may be instructive to look at its +desugaring, shown below. + +.. 
code-block:: fstar + + let implies_intro_1 (#p #q:Type) (pq: (squash p -> squash q)) + : squash (p ==> q) + = FStar.Classical.Sugar.implies_intro + p + (fun (_: squash p) -> q) + (fun (pf_p: squash p) -> pq pf_p) + +``FStar.Squash`` and ``FStar.Classical`` provide the basic building +blocks and the sugar packages it into a more convenient form for use. + +Elimination ++++++++++++ + +Of course, the elimination form for a constructive implication, i.e., +``p -> q`` is just function application. + +.. code-block:: fstar + + let arrow_elim #p #q (f:p -> q) (x:p) : q = f x + +The elimination rule for the squashed form is the classical logical +rule *modus ponens*, which is usually very well automated by SMT, as +shown in ``implies_elim`` below. We also provide syntactic sugar for +it, for completeness, though it is seldom used in practice. + +.. literalinclude:: ../code/Connectives.fst + :language: fstar + :start-after: //SNIPPET_START: implies_elim$ + :end-before: //SNIPPET_END: implies_elim$ + +Negation +........ + +Negation is just a special case of implication. + +In its constructive form, it corresponds to ``p -> empty``. + +In ``Prims``, we define ``~p`` as ``p ==> False``. + +Being just an abbreviation for an implication to ``False``, negation +has no particular introduction or elimination forms of its +own. However, the following forms are easily derivable. + +Introduction (Exercise) ++++++++++++++++++++++++ + +Prove the following introduction rule for negation: + +`Exercise file <../code/exercises/Part2.Connectives.Negation.fst>`__ + +.. code-block:: fstar + + val neg_intro #p (f:squash p -> squash False) + : squash (~p) + +.. container:: toggle + + .. container:: header + + **Answer** + + .. 
literalinclude:: ../code/Connectives.fst + :language: fstar + :start-after: //SNIPPET_START: neg_intro$ + :end-before: //SNIPPET_END: neg_intro$ + +-------------------------------------------------------------------------------- + + +Elimination (Exercise) ++++++++++++++++++++++++ + +Prove the following elimination rule for negation using the sugar +rather than SMT alone. + +.. code-block:: fstar + + val neg_elim #p #q (f:squash (~p)) (x:unit -> Lemma p) + : squash (~q) + +`Exercise file <../code/exercises/Part2.Connectives.Negation.fst>`__ + +.. container:: toggle + + .. container:: header + + **Answer** + + .. literalinclude:: ../code/Connectives.fst + :language: fstar + :start-after: //SNIPPET_START: neg_elim$ + :end-before: //SNIPPET_END: neg_elim$ + +-------------------------------------------------------------------------------- + +Universal Quantification +........................ + +Whereas implication is represented by the non-dependent arrow ``p -> +q``, universal quantification corresponds to the dependent arrow ``x:t +-> q x``. Its classical form is ``forall (x:t). q x``, and it is defined +as shown below: + +.. code-block:: fstar + + let ( forall ) #t (q:t -> Type) = squash (x:t -> q x) + +.. note:: + + As with ``==>``, the definition in ``Prims`` uses ``x:t -> GTot (q x)``, a ghost + arrow, though the difference is not yet significant. + +Introduction +++++++++++++ + +Introducing a dependent function type ``x:t -> p x`` is just like +introducing a non-dependent one: use a lambda literal. + +For the squashed form, F* provides sugar for use with several styles, +where names corresponding to each of the ``forall``-bound variables on +the ``introduce`` line are in scope for the proof term on the ``with`` +line. + +.. 
literalinclude:: ../code/Connectives.fst + :language: fstar + :start-after: //SNIPPET_START: forall_intro$ + :end-before: //SNIPPET_END: forall_intro$ + +Note, as ``forall_intro_3`` shows, the sugar also works for ``forall`` +quantifiers of arities greater than 1. + +Elimination ++++++++++++ + +Eliminating a dependent function corresponds to dependent function +application. + +.. code-block:: fstar + + let dep_arrow_elim #t #q (f:(x:t -> q x)) (x:t) : q x = f x + +For the squashed version, eliminating a ``forall`` quantifier amounts +to instantiating the quantifier for a given term. Automating proofs +that require quantifier instantiation is a large topic in its own +right, which we'll cover in a later section---this `wiki page +`_ +provides some hints. + +Often, eliminating a universal quantifier is automated by the SMT +solver, as shown below, where the SMT solver easily instantiates the +quantified hypothesis ``f`` with ``a``. + +.. literalinclude:: ../code/Connectives.fst + :language: fstar + :start-after: //SNIPPET_START: forall_elim_1$ + :end-before: //SNIPPET_END: forall_elim_1$ + +But F* also provides syntactic sugar to explicitly trigger quantifier +instantiation (as shown below), where the terms provided on the +``with`` line are instantiations for each of the binders on the +``eliminate`` line. + +.. literalinclude:: ../code/Connectives.fst + :language: fstar + :start-after: //SNIPPET_START: forall_elim_sugar$ + :end-before: //SNIPPET_END: forall_elim_sugar$ + +Its desugaring may be illuminating: + +.. literalinclude:: ../code/Connectives.fst + :language: fstar + :start-after: //SNIPPET_START: forall_elim_2_desugar$ + :end-before: //SNIPPET_END: forall_elim_2_desugar$ + +.. _Part2_connectives_exists: + +Existential Quantification +.......................... + +Finally, we come to existential quantification. Its constructive form +is a dependent pair, a dependent version of the pair used to represent +conjunctions. 
The following inductive type is defined in ``Prims``. + +.. code-block:: fstar + + type dtuple2 (a:Type) (b: a -> Type) = + | Mkdtuple2 : x:a -> y:b x -> dtuple2 a b + +As with ``tuple2``, F* offers specialized syntax for ``dtuple2``: + + * Instead of ``dtuple2 a (fun (x:a) -> b x)``, one writes ``x:a & b x``. + + * Instead of writing ``Mkdtuple2 x y``, one writes ``(| x, y |)``. + +The existential quantifier ``exists (x:t). p x`` is a squashed version +of the dependent pair: + +.. code-block:: fstar + + let ( exists ) (#a:Type) (#b:a -> Type) = squash (x:a & b x) + +Introduction +++++++++++++ + +Introducing a constructive proof of ``x:a & b x`` is just a matter +of using the constructor---we show a concrete instance below. + +.. literalinclude:: ../code/Connectives.fst + :language: fstar + :start-after: //SNIPPET_START: dtuple2_intro$ + :end-before: //SNIPPET_END: dtuple2_intro$ + +For the squashed version, introducing an ``exists (x:t). p x`` +automatically using the SMT solver requires finding an instance ``a`` +for the quantifier such that ``p a`` is derivable---this is the dual +problem of the quantifier instantiation mentioned above for universal +quantification. + +In the first example below, the SMT solver finds the instantiation and +proof automatically, while in the latter two, the user picks which +instantiation and proof to provide. + +.. literalinclude:: ../code/Connectives.fst + :language: fstar + :start-after: //SNIPPET_START: exists_intro$ + :end-before: //SNIPPET_END: exists_intro$ + +Elimination ++++++++++++ + +Just as with disjunction and conjunction, eliminating ``dtuple2`` or +``exists`` requires a motive, a goal proposition that *does not +mention* the bound variable of the quantifier. + +For constructive proofs, this is just a pattern match: + +.. 
literalinclude:: ../code/Connectives.fst + :language: fstar + :start-after: //SNIPPET_START: dtuple2_elim$ + :end-before: //SNIPPET_END: dtuple2_elim$ + +For the ``exists``, the following sugar provides an elimination +principle: + +.. literalinclude:: ../code/Connectives.fst + :language: fstar + :start-after: //SNIPPET_START: exists_elim$ + :end-before: //SNIPPET_END: exists_elim$ + +Names corresponding to the binders on the ``eliminate`` line are in +scope in the ``with`` line, which additionally binds a name for a +proof term corresponding to the body of the existential formula. That +is, in the examples above, ``x:t`` is implicitly in scope for the proof +term, while ``pf_p: squash p``. + +Exercise +++++++++ + +In a :ref:`previous exercise `, we defined a +function to insert an element in a Merkle tree and had it return a new +root hash and an updated Merkle tree. Our solution had the following +signature: + +.. literalinclude:: ../code/MerkleTree.fst + :language: fstar + :start-after: //SNIPPET_START: update_hint + :end-before: //SNIPPET_END: update_hint + +Revise the solution so that it instead returns a dependent +pair. ``dtuple2`` is already defined in ``Prims``, so you don't have +to define it again. + +`Exercise file <../code/exercises/Part2.MerkleTreeUpdate.fst>`__ + +.. container:: toggle + + .. container:: header + + **Answer** + + .. literalinclude:: ../code/MerkleTree.fst + :language: fstar + :start-after: //SNIPPET_START: update + :end-before: //SNIPPET_END: update diff --git a/doc/book/PoP-in-FStar/book/part2/part2_merkle.rst b/doc/book/PoP-in-FStar/book/part2/part2_merkle.rst new file mode 100644 index 00000000000..c41b6becd27 --- /dev/null +++ b/doc/book/PoP-in-FStar/book/part2/part2_merkle.rst @@ -0,0 +1,385 @@ +.. 
_Part2_merkle: + +Merkle Trees +============ + +A `Merkle tree `_ is a +cryptographic data structure designed by `Ralph Merkle +`_ in the late 1970s and +has grown dramatically in prominence in the last few years, inasmuch +as variants of Merkle trees are at the core of most `blockchain +systems `_. + +A Merkle tree makes use of cryptographic hashes to enable efficient +cryptographic proofs of the authenticity of data stored in the +tree. In particular, for a Merkle tree containing :math:`2^n` data +items, it only takes :math:`n` hash computations to prove that a +particular item is in the tree. + +In this section, we build a very simple, but canonical, Merkle tree +and prove it correct and cryptographically secure. And we'll use +several indexed inductive types to do it. Thanks to Aseem Rastogi for +this example! + +Setting +....... + +Merkle trees have many applications. To motivate our presentation +here, consider the following simple scenario. + +A content provider (someone like, say, the New York Times) has a large +archive of digital artifacts---documents, multimedia files, etc. These +artifacts are circulated among users, but when receiving an artifact +one may question its authenticity. One way to ensure the authenticity +of received artifacts is for the content provider to use a digital +signature based on a public-key cryptosystem and for users to verify +these signatures upon receiving an artifact. However, signatures can +be quite heavyweight for certain applications. + +Instead, the content provider can organize their archive into a Merkle +tree, a tree of hashes with the artifacts themselves stored at the +leaves, such that a single hash associated with the root node of the +tree authenticates *all* the artifacts in the tree. 
By publishing just +this root hash, and associating with each artifact a path in the tree +from the root to it, a skeptical client can quickly check using a +small number of hash computations (logarithmic in the size of the +entire archive) whether or not a given artifact is authentic (by +recomputing the root hash and checking if it matches the known +published root hash). + + +Intuitions +.......... + +Our Merkle tree will be a full binary tree of height :math:`n` storing +:math:`2^n` data items and their corresponding hashes at the +nodes. The main idea of a Merkle tree is for each internal node to +also maintain a *hash of the hashes* stored at each of its +children. If the hash algorithm being used is cryptographically +secure, in the sense that it is collision resistant (i.e., it is +computationally hard to find two strings that hash to the same value), +then the hash associated with the root node authenticates the content +of the entire tree. + +Informally, a Merkle tree is an authenticated data structure in that +it is computationally hard to tamper with any of the data items in the +tree while still producing the same root hash. Further, to prove that +a particular data item ``d`` is in the tree, it suffices to provide +the hashes associated with the nodes in the path from the root to +the leaf containing that item ``d``, and one can easily check by +comparing hashes that the claimed path is accurate. In fact, we can +prove that if a claimed path through the tree attests to the presence +of some other item ``d' <> d``, then we can construct a collision on +the underlying hash algorithm---this property will be our main +security theorem. + + +Preliminaries +............. + +We'll model the resources and the hashes we store in our tree as +strings of characters. The F* standard library ``FStar.String`` provides +some utilities to work with strings. 
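+For instance, a length-indexed string type can be defined as a
+refinement of ``string``. The following is just a sketch of the idea
+(the actual definitions we use appear in the listing below):
+
+.. code-block:: fstar
+
+   //Strings of length exactly n, as a refinement of the string type
+   let lstring (n:nat) = s:string{FStar.String.length s = n}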
+ +In the code listing below, we define the following + + * ``lstring n``, the type of strings of length ``n``. Like the + ``vec`` type, ``lstring`` is a length-indexed type; unlike + ``vector`` it is defined using a refinement type rather than an + indexed inductive type. Defining indexed types using refinements + is quite common in F*. + + * ``concat``, a utility to concatenate strings, with its type + proving that the resulting string's length is the sum of the lengths + of the input strings. + + * ``hash_size`` and ``hash``, a parameter of our development + describing the length in characters of a ``hash`` function. The F* + keyword ``assume`` allows you to assume the existence of a symbol + at a given type. Use it with care, since you can trivially prove + anything by including an ``assume nonsense : False``. + + * The type of resources we store in the tree will just be + ``resource``, an alias for ``string``. + +.. literalinclude:: ../code/MerkleTree.fst + :language: fstar + :start-after: //SNIPPET_START: preliminaries + :end-before: //SNIPPET_END: preliminaries + + +Defining the Merkle tree +........................ + +The inductive type ``mtree`` below defines our Merkle tree. The type +has *two* indices, such that ``mtree n h`` is the type of a Merkle +tree of height ``n`` whose root node is associated with the hash +``h``. + +Leaves are trees of height ``0`` and are constructed using ``L res``, +where the hash associated with this node is just ``hash res``, the +hash of the resource stored at the leaf. + +Internal nodes of the tree are constructed using ``N left right``, +where both the ``left`` and ``right`` trees have the same height +``n``, producing a tree of height ``n + 1``. More interestingly, the +hash associated with ``N left right`` is ``hash (concat hl hr)``, the +hash of the concatenation of hashes of the left and right subtrees. + + +.. 
literalinclude:: ../code/MerkleTree.fst + :language: fstar + :start-after: //SNIPPET_START: mtree + :end-before: //SNIPPET_END: mtree + +In our previous examples like vectors, the index of the type +abstracts, or summarizes, some property of the type, e.g., the +length. This is also the case with ``mtree``, where the first index is +an abstraction summarizing only the height of the tree; the second +index, being a cryptographic hash, summarizes the entire contents of +the tree. + + +Accessing an element in the tree +................................ + +A resource identifier ``resource_id`` is a path in the tree from the +root to the leaf storing that resource. A path is just a list of +booleans describing whether to descend left or right from a node. + +Just as in a regular binary tree, it's easy to access an element in the +tree by specifying its ``resource_id``. + + +Exercise +^^^^^^^^ + +Implement a function to access an element in an ``mtree`` given a +``rid : list bool``. Figuring out its type, including its decreases +clause, is the most interesting part. The function itself is +straightforward. + +`Exercise file <../code/exercises/Part2.MerkleTreeGet.fst>`__ + +.. container:: toggle + + .. container:: header + + **Answer** + + .. literalinclude:: ../code/MerkleTree.fst + :language: fstar + :start-after: //SNIPPET_START: get + :end-before: //SNIPPET_END: get + +-------------------------------------------------------------------------------- + +The Prover +.......... + +Unlike the ordinary ``get`` function, we can define a function +``get_with_evidence`` that retrieves a resource from the tree together +with some evidence that that resource really is present in the tree. +The evidence contains the resource identifier and the hashes of +sibling nodes along the path from root to that item. 
+ +First, we define ``resource_with_evidence n``, an indexed type that +packages a ``res:resource`` with its ``rid:resource_id`` and +``hashes:list hash_t``---both ``rid`` and ``hashes`` have the same +length, which is the index of the constructed type. + +The function ``get_with_evidence`` is similar to ``get``, except as it +returns from descending into a child node, it adds the hash of the +other child node to the list of hashes. + +.. literalinclude:: ../code/MerkleTree.fst + :language: fstar + :start-after: //SNIPPET_START: prover + :end-before: //SNIPPET_END: prover + +In the cryptographic literature, this function is sometimes called +*the prover*. A ``RES r ri hs`` is a claimed proof of the membership +of ``r`` in the tree at the location specified by ``ri``. + +Going back to our motivating scenario, artifacts distributed by our +content provider would be elements of the type +``resource_with_evidence n``, enabling clients to verify that a given +artifact is authentic, as shown next. + +The Verifier +............ + +Our next step is to build a checker of claimed proofs, sometimes +called *a verifier*. The function ``verify`` below takes a +``p:resource_with_evidence n``, re-computes the root hash from the +evidence presented, and checks that that hash matches the root hash of +a given Merkle tree. Note, the ``tree`` itself is irrelevant: all +that's needed to verify the evidence is *the root hash* of the Merkle +tree. + +.. literalinclude:: ../code/MerkleTree.fst + :language: fstar + :start-after: //SNIPPET_START: verify + :end-before: //SNIPPET_END: verify + +The main work is done by ``compute_root_hash``, shown below. + + * In the first branch, we simply hash the resource itself. + + * In the second branch, we recompute the hash from the tail of the + path, and then based on which direction was taken, we either + concatenate sibling hash on the left or the right, and hash the + result. + +.. 
literalinclude:: ../code/MerkleTree.fst + :language: fstar + :start-after: //SNIPPET_START: compute_root_hash + :end-before: //SNIPPET_END: compute_root_hash + + +Convince yourself of why this is type-correct---refer back to the +description of :ref:`vectors`, if needed. For example, +why is it safe to call ``L.hd`` to access the first element of +``hashes``? + + +Correctness +........... + +Now, we can prove our main correctness theorem, namely that +``get_with_evidence`` returns a resource with verifiable +evidence. + +.. literalinclude:: ../code/MerkleTree.fst + :language: fstar + :start-after: //SNIPPET_START: correctness + :end-before: //SNIPPET_END: correctness + +The proof is a simple induction on the height of the tree, or +equivalently, the length of the resource id. + +In other words, evidence constructed by an honest prover is accepted by +our verifier. + +Security +........ + +The main security theorem associated with this construction is the +following: if the verifier can be convinced to accept a resource with +evidence of the form ``RES r rid hs``, and if the resource in the +Merkle tree associated with ``rid`` is *not* ``r``, then we can easily +construct a collision on the underlying cryptographic hash. Since the +hash is meant to be collision resistant, one should conclude that it +is at least as hard to convince our verifier to accept incorrect +evidence as it is to find collisions on the underlying hash. + +We start by defining the type of a ``hash_collision``, a pair of +distinct strings that hash to the same value. + +.. 
literalinclude:: ../code/MerkleTree.fst + :language: fstar + :start-after: //SNIPPET_START: hash_collision + :end-before: //SNIPPET_END: hash_collision + +The ``security`` theorem shown below takes a ``tree`` and +``p:resource_with_evidence n``, where the refinement on ``p`` states +that the verifier accepts the evidence (``verify p tree``) although +the resource associated with ``p.ri`` is not ``p.res``: in this case, +we can build a function, by induction on the height of the tree, that +returns a hash collision. + +.. literalinclude:: ../code/MerkleTree.fst + :language: fstar + :start-after: //SNIPPET_START: security + :end-before: //SNIPPET_END: security + +We look at its cases in detail: + + * In the base case, it's easy to construct a hash collision + directly from the differing resources. + + * Otherwise, we recompute the hash associated with the current node + from the tail of the evidence presented, and the two cases of the + left and right subtrees are symmetric. + + - If the recomputed hash matches the hash of the node, then we + can generate a collision just by the induction hypothesis on + the left or right subtree. + + - Otherwise, we can build a hash collision, relying on + ``String.concat_injective``, a lemma from the library stating + that the concatenations of two pairs of equal-length strings are + equal only if their components are. Knowing that ``h' <> h1`` + (or, symmetrically, ``h' <> h2``), this allows us to prove that + the concatenations are unequal, although their hashes are, by + assumption, equal. + +.. _Part2_merkle_insert: + +Exercise +........ + +Implement a function to update an ``mtree`` at a given +``rid:resource_id`` with a new resource ``res:resource``. The +resulting tree will have a new root hash, so you will have to return +the new hash along with the updated tree. + +`Exercise file <../code/exercises/Part2.MerkleTreeUpdate_V0.fst>`__ + +.. container:: toggle + + .. 
container:: header + + **Hint** + + One possible type for the ``update`` function is as follows: + + .. literalinclude:: ../code/MerkleTree.fst + :language: fstar + :start-after: //SNIPPET_START: update_hint + :end-before: //SNIPPET_END: update_hint + +-------------------------------------------------------------------------------- + +.. container:: toggle + + .. container:: header + + **Answer** + + .. literalinclude:: ../code/MerkleTree.fst + :language: fstar + :start-after: //SNIPPET_START: update_mtree' + :end-before: //SNIPPET_END: update_mtree' + + One interesting part of our solution is that we never explicitly + construct the hash of the nodes. Instead, we just use ``_`` and + let F* infer the calls to the hash functions. + +-------------------------------------------------------------------------------- + + +Summary and Further Reading +........................... + +In summary, we've built a simple but powerful authenticated data +structure with a proof of its correctness and cryptographic security. + +In practice, Merkle trees can be much more sophisticated than the +basic one shown here. For instance, they can support incremental +updates, contain optimizations for different kinds of workloads, +including sparse trees, and be implemented using high-performance, +mutable structures. + +You can read more about various flavors of Merkle trees implemented in +F* in the following papers. + +* `EverCrypt, Section VII (B) + `_, + describes a high-performance Merkle tree with fast incremental + updates. + +* `FastVer `_ + describes the design and use of hybrid authenticated data structures, + including sparse Merkle trees, for applications such as verifiable + key-value stores. 
diff --git a/doc/book/PoP-in-FStar/book/part2/part2_par.rst b/doc/book/PoP-in-FStar/book/part2/part2_par.rst new file mode 100644 index 00000000000..1c952ba18ce --- /dev/null +++ b/doc/book/PoP-in-FStar/book/part2/part2_par.rst @@ -0,0 +1,686 @@ +.. _Part2_par: + +A First Model of Computational Effects +====================================== + +As a final chapter in this section, we show how inductive types can be +used to model not just data, but also *computations*, including +computations with side effects, like mutable state and shared-memory +concurrency. This is meant to also give a taste of the next section in +this book, which deals with modeling and proving properties of +programs with effects and F*'s user-extensible system of indexed +effects. + +Thanks to Guido Martinez and Danel Ahman for some of the content in +this chapter. + +A First Taste: The State Monad +++++++++++++++++++++++++++++++ + +All the programs we've written so far have been purely +functional. However, one can model programs that manipulate mutable +state within a purely functional language, and one common but powerful +way to do this is with something called a *monad*, an idea that was +introduced to functional programmers in the late 1980s and early 90s +by `Philip Wadler `_ +building on semantic foundations developed by `Eugenio Moggi +`_. If you've been +puzzled about monads before, we'll start from scratch here, and +hopefully this time it will all make sense! + +Consider modeling a program that manipulates a single piece of mutable +state, just a single integer that programs can read and write, and +which returns a result of type ``a``. One way to do this is to +represent such a *stateful computation* as a program whose type is +``st a``: + +.. 
literalinclude:: ../code/Part2.STInt.fst + :language: fstar + :start-after: //SNIPPET_START: st$ + :end-before: //SNIPPET_END: st$ + +A ``(st a)`` computation is a function which, when given an initial +value for the state ``s0``, returns a pair ``(x, s1)`` with the result +of the computation ``x:a`` and a final value for the state ``s1``. + +For example, a computation that reads the state, increments it, and +returns the initial value of the state, can be expressed as shown +below. + +.. literalinclude:: ../code/Part2.STInt.fst + :language: fstar + :start-after: //SNIPPET_START: read_and_increment_v0$ + :end-before: //SNIPPET_END: read_and_increment_v0$ + +This is pretty straightforward, but writing computations in this style +can be quite tedious and error-prone. For example, if one wanted to +read the state and increment it twice, one would write: + +.. literalinclude:: ../code/Part2.STInt.fst + :language: fstar + :start-after: //SNIPPET_START: inc_twice_v0$ + :end-before: //SNIPPET_END: inc_twice_v0$ + +This is quite clumsy, since at each call to ``read_and_increment_v0`` +we had to be careful to pass it "the most recent version" of the +state. For instance, a small typo could easily have caused us to write +the program below, where we pass ``s0`` to the second call of +``read_and_increment``, causing the program to only increment the +state once. + +.. literalinclude:: ../code/Part2.STInt.fst + :language: fstar + :start-after: //SNIPPET_START: inc_twice_buggy$ + :end-before: //SNIPPET_END: inc_twice_buggy$ + +The main idea with the state monad is to structure stateful programs +by abstracting out all the plumbing related to manipulating the state, +eliminating some of the tedium and possibilities for errors. 
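+The interface we are about to build can be summarized by the
+following signatures (just a sketch, assuming the ``st`` type from
+above; the definitions appear in the snippets that follow):
+
+.. code-block:: fstar
+
+   val read : st int                                    //read the current state
+   val write (s1:int) : st unit                         //overwrite the state with s1
+   val bind (#a #b:Type) (f:st a) (g:a -> st b) : st b  //sequence two stateful computations
+   val return (#a:Type) (x:a) : st a                    //promote a pure value, leaving the state unchanged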
+
+The way this works is by defining functions to read and write the
+state, plus a function to return a pure value without reading or
+writing the state (a kind of identity function that's a no-op on the
+state), and a function to sequentially compose a pair of stateful
+computations.
+
+* The function ``read : st int`` below reads the state and returns it,
+  without modifying the state.
+
+.. literalinclude:: ../code/Part2.STInt.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: read$
+   :end-before: //SNIPPET_END: read$
+
+* The function ``write (s1:int) : st unit`` below sets the state to ``s1`` and
+  returns ``() : unit``.
+
+.. literalinclude:: ../code/Part2.STInt.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: write$
+   :end-before: //SNIPPET_END: write$
+
+* The function ``bind`` is perhaps the most interesting. Given a
+  stateful computation ``f: st a`` and another computation ``g : a ->
+  st b``, which depends on the result of ``f`` and may then read or
+  write the state before returning a ``b``, the composition ``bind f g``
+  runs ``f`` and ``g`` sequentially, passing the initial state ``s0``
+  to ``f``, then passing its result ``x`` and next state ``s1`` to ``g``.
+
+.. literalinclude:: ../code/Part2.STInt.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: bind$
+   :end-before: //SNIPPET_END: bind$
+
+* Finally, ``return`` promotes a pure value ``x:a`` into an ``st a``,
+  without touching the state.
+
+.. literalinclude:: ../code/Part2.STInt.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: return$
+   :end-before: //SNIPPET_END: return$
+
+Some stateful programs
+----------------------
+
+With these combinators in hand, we can write stateful programs in a
+more compact style, never directly manipulating the underlying integer
+variable that holds the state.
+
+Here's a second attempt at ``read_and_increment``:
+
+.. literalinclude:: ../code/Part2.STInt.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: read_and_increment_v1$
+   :end-before: //SNIPPET_END: read_and_increment_v1$
+
+Now, you're probably thinking that this version is even worse than
+``read_and_increment_v0``! But the program looks obscure only
+because it's using a convoluted syntax to call ``bind``. Many
+languages, most famously Haskell, provide specialized syntax to
+simplify writing computations that work with APIs like ``bind`` and
+``return``. F* provides some syntactic sugar to handle this too.
+
+Monadic let bindings
+++++++++++++++++++++
+
+The definition below defines a function with a special name
+``let!``. Names of this form (the token ``let`` followed by a sequence
+of one or more operator characters such as ``!``, ``?``, ``@``, ``$``,
+``<``, and ``>``) are monadic let-binding operators.
+
+.. literalinclude:: ../code/Part2.STInt.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: let!$
+   :end-before: //SNIPPET_END: let!$
+
+With ``let!`` in scope, the following syntactic sugar becomes available:
+
+* Instead of writing ``bind f (fun x -> e)`` you can write ``let! x = f in e``.
+
+* Instead of writing ``bind f (fun _ -> e)`` you can write ``f ;!
+  e``, i.e., a semicolon followed by the sequence of operator characters
+  used in the monadic let-binding operator.
+
+* Instead of writing ``bind f (fun x -> match x with ...)``, you can
+  write ``match! f with ...``
+
+* Instead of writing ``bind f (fun x -> if x then ...)``, you can
+  write ``if! f then ...``
+
+See this file `MonadicLetBindings.fst
+`_
+for more details and examples of the syntactic sugar.
+
+Using this syntactic sugar, we come to our final version of
+``read_and_increment``, where now, hopefully, the imperative-looking
+state updates make the intent of our program clear.
+
+.. literalinclude:: ../code/Part2.STInt.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: read_and_increment$
+   :end-before: //SNIPPET_END: read_and_increment$
+
+Having structured our programs with ``return`` and ``bind``, larger
+``st`` computations can be built from smaller ones, without having to
+worry about how to plumb the state through---that's handled once and
+for all by our combinators.
+
+.. literalinclude:: ../code/Part2.STInt.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: inc_twice$
+   :end-before: //SNIPPET_END: inc_twice$
+
+.. _Part2_monad_intro:
+
+``st`` is a monad
+-----------------
+
+It turns out that every API that is structured like our ``st`` is an
+instance of a general pattern called a *monad*, an algebraic
+structure. Specifically, a monad consists of:
+
+ * A type operator ``m : Type -> Type``
+ * A function ``return (#a:Type) (x:a) : m a``
+ * A function ``bind (#a #b:Type) (f:m a) (g: a -> m b) : m b``
+
+which satisfy the following laws, where ``~`` is some suitable
+equivalence relation on ``m a``:
+
+ * Left identity: ``bind (return x) f ~ f x``
+ * Right identity: ``bind f return ~ f``
+ * Associativity: ``bind f1 (fun x -> bind (f2 x) f3) ~ bind (bind f1 f2) f3``
+
+It's easy to prove that ``st``, ``return``, and ``bind`` satisfy these
+laws in F*, where we pick the equivalence relation to equate functions
+that take equal arguments to equal results.
+
+.. literalinclude:: ../code/Part2.STInt.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: monad_laws$
+   :end-before: //SNIPPET_END: monad_laws$
+
+These laws are practically useful in that they can catch bugs in our
+implementations of the combinators. For instance, suppose we were to
+write ``bind_buggy`` below, which, like ``inc_twice_buggy``,
+mistakenly reuses the old state ``s0`` when calling ``g``---in this
+case, the ``right_identity`` law below cannot be proved.
+
+.. literalinclude:: ../code/Part2.STInt.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: bind_buggy$
+   :end-before: //SNIPPET_END: bind_buggy$
+
+We can also prove laws about how the stateful actions, ``read`` and
+``write``, interact with each other in sequential composition.
+
+.. literalinclude:: ../code/Part2.STInt.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: action_laws$
+   :end-before: //SNIPPET_END: action_laws$
+
+That completes our tour of our very first monad, the state monad.
+
+Exercise
+--------
+
+Make the ``st`` type generic, so that instead of the state being fixed
+to a single integer value, it can be used with any type for the
+state. I.e., define ``st (s:Type) (a:Type) : Type``, where ``s`` is
+the type of the state.
+
+Adapt the full development seen above to work with ``st s``, including
+proving the various laws.
+
+`Exercise file <../code/exercises/Part2.ST.fst>`__
+
+.. container:: toggle
+
+   .. container:: header
+
+      **Answer**
+
+   .. literalinclude:: ../code/Part2.ST.fst
+      :language: fstar
+
+--------------------------------------------------------------------------------
+
+Exercise
+--------
+
+Monads can be used to model many computational effects, not just
+mutable state. Another common example is to use monads to model
+computations that may raise runtime errors. Here's an exercise to help
+you see how.
+
+Prove that the ``option`` type can be made into a monad, i.e., define
+``bind`` and ``return`` and prove the monad laws.
+
+`Exercise file <../code/exercises/Part2.Option.fst>`__
+
+--------------------------------------------------------------------------------
+
+Computation Trees, or Monads Generically
+++++++++++++++++++++++++++++++++++++++++
+
+Each time one defines a monad to model a computational effect, one
+usually thinks first of the effectful *actions* involved (e.g.,
+reading and writing the state, or raising an error), then finds a
+way to package those actions into the interface of a monad with
+``return`` and ``bind``, and then, to keep things honest, proves that
+the implementation satisfies the monad laws.
+
+However, a lot of this is boilerplate and can be done once and for all
+by representing effectful computations not just as functions (as we
+did with ``st a = int -> a & int``) but instead as an inductive type
+that models a *computation tree*, with effectful actions made explicit
+at each node in the tree. One can prove that this representation,
+sometimes called a *free monad*, is a monad, and then instantiate it
+repeatedly for the particular kinds of actions that one may want to
+use in a given program.
+
+.. note ::
+
+   In this section, we're scratching the surface of a rich area of
+   research called *algebraic effects*. While what we show here is not
+   a full-blown algebraic effects development (we'll save that for a
+   later chapter), here are some other resources about it.
+
+   * `Alg.fst
+     `_:
+     An F* development with algebraic effects (to be covered in
+     detail later).
+
+   * `Koka `_, a
+     language with algebraic effects at its core
+
+   * A bibliography about `effects
+     `_
+
+We'll start our development of computation trees with a type
+describing the signature of a language of actions.
+
+.. literalinclude:: ../code/Part2.Free.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: action_class$
+   :end-before: //SNIPPET_END: action_class$
+
+This kind of signature is sometimes called a *type class*: a type
+``act:Type``, together with some operations it supports. In this case,
+the operations tell us what kind of input and output a given action
+expects.
+
+.. note::
+
+   F* also provides support for type classes and inference of type
+   class instantiations. This will be described in a later
+   chapter. Meanwhile, you can learn more about type classes in F*
+   `from the wiki
+   `_
+   and from these `examples
+   `_.
+
+For example, if we were interested in just the read/write actions on a
+mutable integer state (as in our ``st a`` example), we could build an
+instance of the ``action_class``, as shown below.
+
+.. literalinclude:: ../code/Part2.Free.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: rw_action_class$
+   :end-before: //SNIPPET_END: rw_action_class$
+
+However, we can define a type ``tree acts a``, the type of a computation
+tree whose effectful actions are from the class ``acts``, completely
+generically in the actions themselves.
+
+.. literalinclude:: ../code/Part2.Free.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: tree$
+   :end-before: //SNIPPET_END: tree$
+
+A ``tree acts a`` has just two cases:
+
+ * Either it is a leaf node, ``Return x``, modeling a computation
+   that immediately returns the value ``x``;
+
+ * Or, we have a node ``DoThen act input k``, modeling a computation
+   that begins with some action ``act``, to which we pass some input,
+   and where ``k`` represents all the possible "continuations" of the
+   action, represented by a ``tree acts a`` for each possible output
+   returned by the action. That is, ``DoThen`` represents a node in
+   the tree with a single action and a possibly infinite number of
+   sub-trees.
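As a sketch of the shape of this type (and of the ``return``, ``bind``, and interpreter the chapter builds for it next), here is a hypothetical Python rendering, specialized to string-tagged read/write actions over an ``int`` state. The tags and names are illustrative assumptions, not the F* development itself:

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Return:
    x: Any                      # leaf: a computation that immediately returns x

@dataclass
class DoThen:
    act: str                    # the action's name; here, "read" or "write"
    inp: Any                    # the input passed to the action
    k: Callable[[Any], Any]     # continuation: the action's output -> sub-tree

def ret(x):
    return Return(x)

def bind(f, g):
    # sequence g after every leaf of f, by structural recursion on the tree
    if isinstance(f, Return):
        return g(f.x)
    return DoThen(f.act, f.inp, lambda o: bind(f.k(o), g))

def interp(t, s0):
    # interpret a tree over read/write actions as a state-passing function
    if isinstance(t, Return):
        return (t.x, s0)
    if t.act == "read":
        return interp(t.k(s0), s0)      # a read's output is the current state
    return interp(t.k(None), t.inp)     # a write's input becomes the new state

# read the state s, write s+1, and return s:
read_and_increment = DoThen("read", None,
    lambda s: DoThen("write", s + 1,
        lambda _: Return(s)))
```

For instance, ``interp(read_and_increment, 7)`` evaluates to ``(7, 8)``, matching the state-monad version from earlier in the chapter.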
+
+With this representation we can define ``return`` and ``bind``, and
+prove the monad laws once and for all.
+
+.. literalinclude:: ../code/Part2.Free.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: return and bind$
+   :end-before: //SNIPPET_END: return and bind$
+
+* The ``return`` combinator is easy, since we already have a
+  ``Return`` leaf node in the tree type.
+
+* The ``bind`` combinator is a little more interesting, involving a
+  structural recursion over the tree, relying (as we did in the
+  previous chapter on well-founded recursion) on the property that all
+  the trees returned by ``k`` are strictly smaller than the original
+  tree ``f``.
+
+To prove the monad laws, we first need to define an equivalence
+relation on trees---this relation is not quite just ``==``, since each
+continuation in the tree is a function which itself returns a tree. So,
+we define ``equiv`` below, relating trees that are both ``Return``
+nodes, or that both begin with the same action and have
+pointwise-equivalent continuations.
+
+.. literalinclude:: ../code/Part2.Free.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: equiv$
+   :end-before: //SNIPPET_END: equiv$
+
+.. note::
+
+   We are specifically avoiding the use of :ref:`functional
+   extensionality ` here, a property which allows
+   equating pointwise equal :math:`\eta`-expanded functions. We show
+   how one can use functional extensionality in this development as an
+   advanced exercise.
+
+To prove that ``equiv`` is an equivalence relation, here are lemmas
+that prove that it is reflexive, symmetric, and transitive---we see
+here a use of the syntactic sugar for logical connectives,
+:ref:`introduced in a previous chapter `.
+
+.. literalinclude:: ../code/Part2.Free.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: equiv is an equivalence$
+   :end-before: //SNIPPET_END: equiv is an equivalence$
+
+Now, we can prove that ``tree`` satisfies the monad laws with respect
+to ``equiv``.
+
+.. literalinclude:: ../code/Part2.Free.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: tree is a monad$
+   :end-before: //SNIPPET_END: tree is a monad$
+
+The associativity law, in particular, should make intuitive sense in
+that a ``tree acts a`` represents a computation in a canonical form,
+i.e., a single action followed by its continuation. As such, no matter
+how you associate computations in ``bind``, the underlying
+representation is always fully right-associated.
+
+Having defined our computation trees generically, we can use them with
+any actions we like. For example, here's our ``read_and_increment``
+re-built using computation trees.
+
+.. literalinclude:: ../code/Part2.Free.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: read_and_increment$
+   :end-before: //SNIPPET_END: read_and_increment$
+
+Finally, given a computation tree we can "run" it, by interpreting it
+as a state-passing function.
+
+.. literalinclude:: ../code/Part2.Free.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: interp$
+   :end-before: //SNIPPET_END: interp$
+
+.. note::
+
+   A main difference between what we've shown here with ``interp`` and
+   a general treatment of algebraic effects is that rather than
+   "baking in" the interpretation of the individual actions in
+   ``interp``, we can also abstract the semantics of the actions using
+   an idea similar to exception handling, allowing the context to
+   customize the semantics of the actions simply by providing a
+   different handler.
+
+Exercise
+--------
+
+Prove that the ``interp`` function interprets equivalent trees ``f``
+and ``g`` to pointwise equivalent functions.
+
+`Exercise File <../code/exercises/Part2.ComputationTreeEquiv.fst>`__
+
+.. container:: toggle
+
+   .. container:: header
+
+      **Answer**
+
+   .. literalinclude:: ../code/Part2.Free.fst
+      :language: fstar
+      :start-after: //SNIPPET_START: interp_equiv$
+      :end-before: //SNIPPET_END: interp_equiv$
+
+--------------------------------------------------------------------------------
+
+Exercise
+--------
+
+Instead of proving each time that a function like ``interp`` produces
+equivalent results when applied to equivalent trees, we can use
+functional extensionality to prove that equivalent trees are actually
+provably equal, i.e., ``equiv x y ==> x == y``.
+
+This is a little technical, since although functional extensionality
+is a theorem in F*, it is only true of :math:`\eta`-expanded functions.
+
+Try to use ``FStar.FunctionalExtensionality.fsti`` to adapt the
+definitions shown above so that we can prove the lemma ``equiv x y ==>
+x == y``.
+
+.. container:: toggle
+
+   .. container:: header
+
+      **Answer**
+
+   .. literalinclude:: ../code/Part2.FreeFunExt.fst
+      :language: fstar
+
+--------------------------------------------------------------------------------
+
+Manipulating Computation Trees: Nondeterminism and Concurrency
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
+
+As a final topic, we show that representing computations as trees is
+useful not just for genericity and code reuse.
+Computation trees expose the structure of a computation in a way that
+allows us to manipulate it, e.g., interpreting actions in an
+alternative semantics.
+
+In this section, we enhance our computation trees to support
+non-deterministic choice, i.e., given a pair of computations ``l, r:tree
+acts a``, we can non-deterministically choose to evaluate ``l`` or
+``r``. With this capability, we can also express some models of
+concurrency, e.g., a semantics that interleaves imperative actions
+from several threads.
+
+Let's start by enhancing our ``tree`` type to include a node ``Or
+l r``, representing non-deterministic choice between ``l`` and ``r``.
+
+.. literalinclude:: ../code/Part2.Par.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: tree$
+   :end-before: //SNIPPET_END: tree$
+
+As before, we can define ``return`` and ``bind``; this time, in
+``bind``, we sequence ``g`` after both choices in ``Or``.
+
+.. literalinclude:: ../code/Part2.Par.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: return and bind$
+   :end-before: //SNIPPET_END: return and bind$
+
+What's more interesting is that, in addition to sequential
+composition, we can also define parallel composition of a pair of
+computations using ``par f g``, as shown below.
+
+.. literalinclude:: ../code/Part2.Par.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: par$
+   :end-before: //SNIPPET_END: par$
+
+There's quite a lot going on here, so let's break it down a bit:
+
+ * The functions ``l_par f g`` and ``r_par f g`` are mutually
+   recursive and define an interleaving semantics of the actions in
+   ``f`` and ``g``.
+
+ * ``l_par f g`` is left-biased: picking an action from ``f`` to
+   execute first (if any are left); while ``r_par f g`` is
+   right-biased, picking an action from ``g`` to execute first.
+
+ * Consider the ``DoThen`` case in ``l_par``: it picks the head
+   action from ``f`` and then recurses in the continuation with
+   ``r_par (k x) g``, so as to prefer executing an action from ``g``
+   next, rather than one from ``k x``. The ``DoThen`` case of ``r_par``
+   is symmetric.
+
+ * For ``l_par``, in the non-deterministic choice case (``Or``), we
+   interleave either choice of ``f`` with ``g``, and ``r_par`` is
+   symmetric.
+
+ * Finally, we define parallel composition ``par f g`` as the
+   non-deterministic choice of either the left-biased or right-biased
+   interleaving of the actions of ``f`` and ``g``. This fixes the
+   semantics of parallel composition to a round-robin scheduling of
+   the actions between the threads, but one could imagine
+   implementing other kinds of schedulers too.
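To preview why this scheduling loses an update, here is a hypothetical Python sketch. Rather than transcribing ``l_par`` and ``r_par``, we hand-build the round-robin interleaving that ``par`` produces for two read/write increments, and interpret trees with an explicit boolean stream resolving each ``Or`` choice (the string-tagged actions and all names here are illustrative assumptions):

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Return:
    x: Any

@dataclass
class DoThen:
    act: str                    # "read" or "write"
    inp: Any
    k: Callable[[Any], Any]

@dataclass
class Or:                       # non-deterministic choice between two trees
    l: Any
    r: Any

def interp(t, rand, s0):
    # rand is an iterator of booleans, consulted once per Or node
    if isinstance(t, Return):
        return (t.x, s0)
    if isinstance(t, Or):
        return interp(t.l if next(rand) else t.r, rand, s0)
    if t.act == "read":
        return interp(t.k(s0), rand, s0)
    return interp(t.k(None), rand, t.inp)   # "write"

# The round-robin interleaving of two read;write increments:
# read1; read2; write1; write2 -- the second write clobbers the first.
lost_update = DoThen("read", None, lambda s1:
              DoThen("read", None, lambda s2:
              DoThen("write", s1 + 1, lambda _:
              DoThen("write", s2 + 1, lambda _:
              Return(None)))))
```

Starting from state ``0``, ``interp(lost_update, iter([]), 0)`` ends in state ``1``, not ``2``: both reads saw ``0``, so one increment is lost.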
+
+As before, we can now instantiate our tree with read/write actions and
+write some simple programs, including ``par_inc``, a computation that
+tries to increment the counter twice in parallel.
+
+.. literalinclude:: ../code/Part2.Par.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: sample program$
+   :end-before: //SNIPPET_END: sample program$
+
+However, there's trouble ahead---because of the interleaving
+semantics, we don't actually increment the state twice.
+
+To check, let's define an interpretation function to run our
+computations. Since we need to resolve the non-deterministic choice in
+the ``Or`` nodes, we'll parameterize our interpreter by a source of
+"randomness", an infinite stream of booleans.
+
+.. literalinclude:: ../code/Part2.Par.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: interp$
+   :end-before: //SNIPPET_END: interp$
+
+This interpreter is very similar to our prior interpreter, except that
+in the ``Or`` case, we read a boolean from ``rand``, our randomness
+stream, and choose the left or right branch accordingly.
+
+We can run our program on this interpreter and check what it
+returns. One way to do this is to make use of F*'s normalizer, the
+abstract machine that F* uses to reduce computations during
+type-checking. The ``assert_norm p`` feature used below instructs F*
+to symbolically reduce the term ``p`` as much as possible and then
+check that the result is equivalent to ``True``.
+
+.. note::
+
+   F*'s emacs mode ``fstar-mode.el`` provides some utilities for
+   reducing terms on F*'s abstract machine and showing the results to
+   the user. F*'s tactics also allow evaluating terms and viewing
+   the results---we leave further discussion of these features to a
+   future chapter.
+
+
+.. literalinclude:: ../code/Part2.Par.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: test_par_inc$
+   :end-before: //SNIPPET_END: test_par_inc$
+
+In this case, we ask F* to interpret ``par_inc`` on the interpreter we
+just defined. And, indeed, F* confirms that in the final state, we
+have incremented the state only once. Due to the round-robin
+scheduling of actions, the interpreter has executed both reads
+before both writes, making one of the reads and one of the writes
+redundant.
+
+Exercise
+--------
+
+Define an action class that includes an increment operation, in
+addition to reads and writes. Adapt the interpreter shown above to
+work with this action class and prove (using ``assert_norm``) that a
+program that runs two atomic increments in parallel increments the
+state twice.
+
+`Exercise File <../code/exercises/Part2.AtomicIncrement.fst>`__
+
+.. container:: toggle
+
+   .. container:: header
+
+      **Answer**
+
+   .. literalinclude:: ../code/Part2.Par.fst
+      :language: fstar
+      :start-after: //SNIPPET_START: atomic increment$
+      :end-before: //SNIPPET_END: atomic increment$
+
+--------------------------------------------------------------------------------
+
+Looking ahead
++++++++++++++
+
+Writing correct programs with side-effects is hard, particularly when
+those effects include features like mutable state and concurrency!
+
+What we've seen here is that although we've been able to model the
+semantics of these programs, proving that they work correctly is
+non-trivial. Further, while we have defined interpreters for our
+programs, these interpreters are far from efficient. In practice, one
+usually resorts to things like shared-memory concurrency to gain
+performance, and our interpreters, though mathematically precise, are
+horribly slow.
+
+Addressing these two topics is the main purpose of F*'s user-defined
+effect system, a big part of the language which we'll cover in a
+subsequent section. The effect system aims to address two main needs:
+
+ * Proofs of effectful programs: The effect system enables developing
+   effectful programs coupled with *program logics* that enable
+   specifying and proving program properties. We'll learn about many
+   different kinds of logics that F* libraries provide, including
+   classical Floyd-Hoare logics for sequential programs, relational
+   logics for program equivalence, weakest-precondition calculi, and
+   separation logics for concurrent and distributed programs.
+
+ * Effect abstraction: Although programs can be specified and proven
+   against a clean mathematical semantics, for efficient execution,
+   F* provides mechanisms to hide the representation of an effect so
+   that effectful programs can be compiled efficiently to run with
+   native support for effects like state, exceptions, concurrency,
+   and IO.
diff --git a/doc/book/PoP-in-FStar/book/part2/part2_phoas.rst b/doc/book/PoP-in-FStar/book/part2/part2_phoas.rst
new file mode 100644
index 00000000000..03451482f86
--- /dev/null
+++ b/doc/book/PoP-in-FStar/book/part2/part2_phoas.rst
@@ -0,0 +1,257 @@
+.. _Part2_phoas:
+
+Higher-order Abstract Syntax
+============================
+
+In the previous chapter, we looked at a *deep embedding* of the simply
+typed lambda calculus (STLC). The encoding is "deep" in the sense that
+we used an inductive type to represent the *syntax* of the lambda
+calculus in F*, and then defined its semantics mathematically in F*
+and proved some of its properties.
+
+Another way to embed a language like the STLC in F* is a *shallow
+embedding*. F* is itself a functional programming language, and it has
+a type system that is certainly powerful enough to represent simply
+typed terms, so why not use lambda terms in F* itself to represent
+STLC, rather than merely encoding STLC's abstract syntax in F*? This
+kind of encoding is called a shallow embedding, where we use semantic
+constructs in the host (or meta) language (F*) to represent analogous
+features of the embedded (or object) language (STLC, in our example).
+
+In this chapter, we look at a particularly elegant technique for doing
+this called *higher-order abstract syntax* (or HOAS). For more
+background about this, a `2008 paper by Adam Chlipala
+`_ is a
+good resource, though it develops a more sophisticated, parametric
+version.
+
+Our small case study in HOAS is meant to illustrate the use of
+inductive types with non-trivial indexes that also contain
+strictly positive functions as arguments, along with a bit of type-level
+computation.
+
+Roadmap
++++++++
+
+The type ``typ`` below represents the types we'll use in our STLC
+object language, i.e., the base types ``Bool`` and ``Int``, and
+function types ``Arrow t1 t2``.
+
+.. literalinclude:: ../code/Part2.HOAS.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: typ$
+   :end-before: //SNIPPET_END: typ$
+
+This is analogous to our representation of STLC types in the deep
+embedding of the previous chapter.
+
+Where things get interesting is in the representation of STLC terms
+and their semantics. To set the goal posts, we want to:
+
+ 1. Give an interpretation of STLC types into F* types, by defining a
+    function ``denote_typ : typ -> Type``
+
+ 2. Define a type ``term t``, to represent well-typed STLC
+    terms whose type is ``t:typ``
+
+ 3. Give an interpretation of STLC terms into F* terms of the
+    corresponding type, i.e., define a function ``denote_term (#t:typ)
+    (e:term t) : denote_typ t``, proving that every well-typed STLC
+    term at type ``t`` can be represented in F* as a value of type
+    ``denote_typ t``.
+
+Such a result would encompass the type soundness results we proved in
+the previous chapter (proving that well-typed programs can always make
+progress), but would go substantially further to show that the
+reduction of all such terms always terminates, producing F* values
+that model their semantics.
+
+.. _Part2_phoas_denotation:
+
+Denotation of types
++++++++++++++++++++
+
+Step 1 in our roadmap is to give an interpretation of STLC types
+``typ`` into F* types. This is easily done, as shown below.
+
+.. literalinclude:: ../code/Part2.HOAS.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: denote_typ$
+   :end-before: //SNIPPET_END: denote_typ$
+
+We have here a recursive function that computes a *Type* from its
+argument. This may seem odd at first, but it's perfectly legal in a
+dependently typed language like F*.
+
+The function ``denote_typ`` interprets ``Bool`` and ``Int``
+as the F* types ``bool`` and ``int``, while the interpretation of
+``Arrow`` is an F* arrow. Note that the function terminates because the
+two recursive calls are on strict sub-terms of the argument.
+
+
+Term representation
+++++++++++++++++++++
+
+The main difficulty in representing a language like STLC (or any
+language with lambda-like variable-binding structure) is the question
+of how to represent variables and their binders.
+
+In the deep embedding of the previous chapter, our answer to this
+question was very syntactic---variables are de Bruijn indexes, where,
+at each occurrence, the index used for a variable counts the number of
+lambdas to traverse to reach the binder for that variable.
+
+The HOAS approach to answering these questions is very different. The
+idea is to use the binding constructs and variables already available
+in the host language (i.e., lambda terms in F*) to represent binders
+and variables in the object language (STLC).
+
+The main type in our representation of terms is the ``term`` defined
+below. There are several clever subtleties here, which we'll try to
+explain briefly.
+
+.. literalinclude:: ../code/Part2.HOAS.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: term$
+   :end-before: //SNIPPET_END: term$
+
+First, the ``term`` type represents both the abstract syntax of the
+STLC and its typing rules. We'll see in detail how this works,
+but notice already that ``term`` is indexed by a ``t:typ``, which
+describes the type of the encoded STLC term. The indexing structure
+encodes the typing rules of STLC, ensuring that only well-typed STLC
+terms can be constructed.
+
+The second interesting part is the use of ``denote_typ`` within the
+syntax---variables and binders at a given STLC type ``t`` will be
+represented by F* variables and binders of the corresponding F* type
+``denote_typ t``.
+
+ * ``Var`` : Variables are represented as ``Var #t n : term t``,
+   where ``n`` is a term of type ``denote_typ t``.
+
+ * ``TT`` and ``FF``: The two boolean constants are represented by
+   these constructors, both of type ``term Bool``, where the index
+   indicates that they have type ``Bool``.
+
+ * ``I``: STLC integers are represented by tagged F* integers.
+
+ * ``App``: To apply ``e1`` to ``e2`` in a well-typed way, we must
+   prove that ``e1`` has an arrow type ``Arrow t1 t2``, while ``e2``
+   has type ``t1``, and the resulting term ``App e1 e2`` has type
+   ``t2``. Notice how the indexing structure of the ``App``
+   constructor precisely captures this typing rule.
+
+ * ``Lam``: Finally, and crucially, we represent STLC lambda terms
+   using F* functions, i.e., ``Lam f`` has type ``term (Arrow t1 t2)``,
+   when ``f`` is an F* function from arguments of type
+   ``denote_typ t1`` to terms of type ``term t2``. The ``Lam`` case
+   includes a function-typed field, but the type of that field,
+   ``denote_typ t1 -> term t2``, is strictly positive---unlike the
+   ``dyn`` type, :ref:`shown earlier `.
+
+
+Denotation of terms
++++++++++++++++++++
+
+Finally, we come to Step 3 (below), where we give an interpretation to
+``term t`` as an F* term of type ``denote_typ t``. The trickiest part
+of such an interpretation is to handle functions and variables, but
+this part is already done in the representation, since these are
+already represented by the appropriate F* terms.
+
+.. literalinclude:: ../code/Part2.HOAS.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: denote_term$
+   :end-before: //SNIPPET_END: denote_term$
+
+Let's look at each of the cases:
+
+ * The ``Var`` case is easy, since the variable ``x`` is already
+   interpreted into an element of the appropriate F* type.
+
+ * The constants ``TT``, ``FF``, and ``I`` are easy too, since we can
+   just interpret them as the suitable boolean or integer constants.
+
+ * For the ``App #t1 #t2 f a`` case, we recursively interpret ``f``
+   and ``a``. The type indices tell us that ``f`` must be interpreted
+   into an F* function of type ``denote_typ t1 -> denote_typ t2`` and
+   that ``denote_term a`` has type ``denote_typ t1``. So, we can
+   simply apply the denotation of ``f`` to the denotation of ``a`` to
+   get a term of type ``denote_typ t2``. In other words, function
+   application in STLC is represented semantically by function
+   application in F*.
+
+ * Finally, in the ``Lam #t1 #t2 f`` case, we need to produce a term
+   of type ``denote_typ t1 -> denote_typ t2``. So, we build an F*
+   lambda term (where the argument ``x`` has type ``denote_typ t1``),
+   and in the body we apply ``f`` to ``x`` and recursively call
+   ``denote_term`` on ``f x``.
+
+If that felt a little bit magical, it's because it almost is! We've
+defined the syntax, typing rules, and an interpreter that doubles as a
+denotational semantics for the STLC, and proved everything sound in
+around 30 lines of code and proof. By picking the right
+representation, everything just follows very smoothly.
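The same recipe can be sketched in Python, though without the ``typ`` indexing: Python's dynamic typing cannot enforce the well-typedness invariant, so this untyped rendering conveys only the shape of the HOAS encoding and its denotation (the class and function names are illustrative assumptions):

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Var:
    x: Any                      # a host-language value standing for the variable

@dataclass
class TT:                       # the boolean constant true
    pass

@dataclass
class I:
    n: int                      # an integer constant

@dataclass
class App:
    f: Any                      # function term
    a: Any                      # argument term

@dataclass
class Lam:
    f: Callable[[Any], Any]     # denoted argument value -> body term

def denote_term(e):
    if isinstance(e, Var):
        return e.x
    if isinstance(e, TT):
        return True
    if isinstance(e, I):
        return e.n
    if isinstance(e, App):
        # object-language application is host-language application
        return denote_term(e.f)(denote_term(e.a))
    # Lam: object-language abstraction becomes a host-language lambda
    return lambda v: denote_term(e.f(v))

# (\x. x) 42 : the object-language identity applied to 42
ex = App(Lam(lambda x: Var(x)), I(42))
```

Here ``denote_term(ex)`` evaluates to ``42``: binders and variables of the object language are carried by native lambdas and variables of the host, exactly the HOAS idea.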
+
+Termination
+-----------
+
+You may be wondering why F* accepts that ``denote_term e``
+terminates. There are three recursive calls to consider:
+
+ * The two calls in the ``App`` case are easy: The recursive calls
+   are on strict sub-terms of ``App f a``.
+
+ * In the ``Lam f`` case, we have one recursive call ``denote_term
+   (f x)``, and F* accepts that ``f x`` is strictly smaller than ``Lam
+   f``. This is an instance of the sub-term ordering on inductive
+   types, part of F*'s ``precedes`` relation, :ref:`explained
+   earlier `.
+
+For a bit of intuition, one way to understand the type ``term`` is by
+thinking of it as a tree of finite depth, but possibly infinite width:
+
+ * The leaves of the tree are the ``Var``, ``TT``, ``FF``, and ``I``
+   cases.
+
+ * The internal node ``App e0 e1`` composes two sub-trees, ``e0`` and
+   ``e1``.
+
+ * The internal node ``Lam #t1 #t2 f`` composes a variable number of
+   sub-trees, where the number of sub-trees depends on the parameter
+   ``t1``. For example:
+
+   - If ``t1 = Unit``, then ``f : unit -> term t``, i.e., there is
+     only one child node, ``f ()``.
+
+   - If ``t1 = Bool``, then ``f : bool -> term t``, i.e., there are
+     two children, ``f true`` and ``f false``.
+
+   - If ``t1 = Int``, then ``f : int -> term t``, i.e., there are
+     infinitely many children, ``..., f (-1), f 0, f 1, ...``.
+
+With this intuition, informally, it is safe to recursively call
+``denote_term e`` on any of the children of ``e``, since the depth of
+the tree will decrease on each recursive call. Hence the call
+``denote_term (f x)`` terminates.
+
+We'll revisit termination arguments for recursive functions more
+formally in a subsequent chapter on :ref:`well-founded recursion
+`.
+
+Exercises
++++++++++
+
+Giving a semantics to STLC is just the tip of the iceberg. There's a
+lot more one can do with HOAS, and Chlipala's paper gives lots of
+examples and sample code in Coq.
+ +For several more advanced exercises, based on the definitions shown +below, try reconstructing other examples from Chlipala's paper, +including a proof of correctness of a compiler implementing a +continuation-passing style (CPS) transformation of STLC. + +`Exercise file <../code/Part2.PHOAS.fst>`_ + +.. literalinclude:: ../code/Part2.PHOAS.fst + :language: fstar diff --git a/doc/book/PoP-in-FStar/book/part2/part2_stlc.rst b/doc/book/PoP-in-FStar/book/part2/part2_stlc.rst new file mode 100644 index 00000000000..6b105c852d6 --- /dev/null +++ b/doc/book/PoP-in-FStar/book/part2/part2_stlc.rst @@ -0,0 +1,675 @@ +.. _Part2_stlc: + +Simply Typed Lambda Calculus +============================ + +In this chapter, we look at how inductively defined types can be used +to represent raw data, inductively defined relations, and proofs +relating the two. + +By way of illustration, we develop a case study in the simply typed +lambda calculus (STLC), a very simple programming language which is +often studied in introductory courses on the semantics of programming +languages. Its syntax, type system, and runtime behavior can be +described in just a few lines. The main result we're interested in +proving is the soundness of the type system, i.e., that if a program +type checks then it can be executed safely without a certain class of +runtime errors. + +If you haven't seen the STLC before, there are several good resources +for it available on the web, including the `Software Foundations book +`_, +though we'll try to keep the presentation here as self-contained as +possible. Thanks to Simon Forest, Catalin Hritcu, and Simon Schaffer +for contributing parts of this case study. + +Syntax +++++++ + +The syntax of programs :math:`e` is defined by the context-free +grammar shown below. + +.. math:: + + e~::=~()~|~x~|~\lambda x:t. 
e_0~|~e_0~e_1 + +This can be read as follows: a program :math:`e` is either + + * the unit value :math:`()`; + + * a variable :math:`x`; + + * a lambda term :math:`\lambda x:t. e_0` associating a variable + :math:`x` to a type :math:`t` and a sub-program :math:`e_0`; + + * or, the application of the sub-program :math:`e_0` to another + sub-program :math:`e_1`. + +The syntax of the type annotation :math:`t` is also very simple: + +.. math:: + + t~::=~\mathsf{unit}~|~t_0 \rightarrow t_1 + +A type :math:`t` is either + + * the :math:`\mathsf{unit}` type constant; + + * or, an arrow type :math:`t_0 \rightarrow t_1` formed from two smaller types + :math:`t_0` and :math:`t_1`. + +This language is very minimalistic, but it can be easily extended with +some other forms, e.g., one could add a type of integers, integer +constants, and operators like addition and subtraction. We'll look at +that as part of some exercises. + +We'll define the syntax of types and programs formally in F* as a pair +of simple inductive datatypes ``typ`` (for types) and ``exp`` (for +programs or expressions) with a constructor for each of the cases +above. + +The main subtlety is in the representation of variables. For example, +ignoring the type annotations, in the term +:math:`\lambda x. (\lambda x. x)` the inner lambda binds *a different* +:math:`x` than the outer one, i.e., the term is equivalent to +:math:`\lambda x. (\lambda y. y)` and our representation of programs +must respect this convention. We'll use a technique called de +Bruijn indices, where the names of the variables are no longer +significant and instead each variable is represented by a natural +number describing the number of :math:`\lambda` binders that one must +cross when traversing a term from the occurrence of the variable to +that variable's :math:`\lambda` binder. + +For example, the terms :math:`\lambda x. (\lambda x. x)` and +:math:`\lambda x. (\lambda y. y)` are both represented as +:math:`\lambda _. (\lambda _. 
0)`, since the inner occurrence of +:math:`x` is associated with the inner :math:`\lambda`; while +:math:`\lambda x. (\lambda y. (\lambda z. x))` is represented as +:math:`\lambda _. (\lambda _. (\lambda _. 2))`, since from the inner +occurrence of :math:`x` one must skip past :math:`2` :math:`\lambda`'s +to reach the :math:`\lambda` associated with :math:`x`. Note that +variable names are no longer significant in de Bruijn's notation. + +Representing types +------------------ + +The inductive type ``typ`` defined below is our representation of +types. + +.. literalinclude:: ../code/Part2.STLC.fst + :language: fstar + :start-after: //SNIPPET_START: typ$ + :end-before: //SNIPPET_END: typ$ + +This is entirely straightforward: a constructor for each case in our +type grammar, as described above. + +Representing programs +--------------------- + +The representation of program expressions is shown below: + +.. literalinclude:: ../code/Part2.STLC.fst + :language: fstar + :start-after: //SNIPPET_START: exp$ + :end-before: //SNIPPET_END: exp$ + +This too is straightforward: a constructor for each case in our +program grammar, as described above. We use a ``nat`` to represent +variables (the type ``var``), and ``ELam`` represents an annotated lambda term of +the form :math:`\lambda _:t. e`, where the name of the binder is +omitted, since we're using de Bruijn's representation. + +Runtime semantics ++++++++++++++++++ + +STLC has just one main computation rule to execute a program---the +function application rule, or :math:`\beta` reduction, as shown below: + +.. math:: + + (\lambda x:t. e_0)~e_1 \longrightarrow e_0 [x \mapsto e_1] + +This says that when a :math:`\lambda` literal is applied to an +argument :math:`e_1`, the program takes a single step of computation to +the body of the lambda literal :math:`e_0` with every occurrence of +the bound variable :math:`x` replaced by the argument :math:`e_1`. 
The +substitution has to be careful to avoid "name capture", i.e., +substituting a term in a context that re-binds its free variables. For +example, when substituting :math:`y \mapsto x` in +:math:`\lambda x. y`, one must make sure that the resulting term is +**not** :math:`\lambda x. x`. Using de Bruijn notation will help us +make this precise and avoid name capture. + +The other computation rules in the language are inductively defined, +e.g., :math:`e_0~e_1` can take a step to :math:`e_0'~e_1` if +:math:`e_0 \longrightarrow e_0'`, and similarly for :math:`e_1`. + +By choosing these other rules in different ways one obtains different +reduction strategies, e.g., call-by-value or call-by-name. We'll +leave the choice of reduction strategy non-deterministic and represent +the computation rules of the STLC as an indexed inductive type, ``step +e e'``, encoding a single step of computation. + +Formalizing an Operational Semantics +------------------------------------ + +The inductive type ``step`` below describes a single step of +computation in what is known as a "small-step operational +semantics". The type ``step e e'`` is a relation between an initial +program ``e`` and a program ``e'`` that results after taking one step +of computation on some sub-term of ``e``. + +.. literalinclude:: ../code/Part2.STLC.fst + :language: fstar + :start-after: //SNIPPET_START: step$ + :end-before: //SNIPPET_END: step$ + +-------------------------------------------------------------------------------- + + * The constructor ``Beta`` represents the rule for :math:`\beta` + reduction. The most subtle part of the development is defining + ``subst`` and ``sub_beta``---we'll return to that in detail + shortly. + + * ``AppLeft`` and ``AppRight`` allow reducing either the left- or + right-subterm of ``EApp e1 e2``. + + +Exercise +^^^^^^^^ + +Define an inductive relation ``steps : exp -> exp -> Type`` for the +transitive closure of ``step``, representing multiple steps of +computation. 
+ +Use this `exercise file <../code/exercises/Part2.STLC.fst>`_ for all +the exercises that follow. + +.. container:: toggle + + .. container:: header + + **Answer** + + .. literalinclude:: ../code/Part2.STLC.fst + :language: fstar + :start-after: //SNIPPET_START: steps$ + :end-before: //SNIPPET_END: steps$ + +-------------------------------------------------------------------------------- + +Substitution: Failed Attempt +---------------------------- + +Defining substitution is the trickiest part of the system. Our first +attempt will convey the main intuitions, but F* will refuse to accept +it as well-founded. We'll then enrich our definitions to prove that +substitution terminates. + +We'll define a substitution as a total function from variables ``var`` +to expressions ``exp``. + +.. literalinclude:: ../code/Part2.STLC.fst + :language: fstar + :start-after: //SNIPPET_START: sub0$ + :end-before: //SNIPPET_END: sub0$ + +These kinds of substitutions are sometimes called "parallel +substitutions"---each variable is substituted independently of the +others. + +When doing a :math:`\beta` reduction, we want to substitute the +variable associated with de Bruijn index ``0`` in the body of the +function with the argument ``e`` and then remove the :math:`\lambda` +binder---``sub_beta0`` does just that, replacing variable ``0`` with +``e`` and shifting other variables down by ``1``, since the +:math:`\lambda` binder of the function is removed. + +.. literalinclude:: ../code/Part2.STLC.fst + :language: fstar + :start-after: //SNIPPET_START: sub_beta0$ + :end-before: //SNIPPET_END: sub_beta0$ + +The function ``subst s e`` applies the substitution ``s`` to ``e``: + +.. literalinclude:: ../code/Part2.STLC.fst + :language: fstar + :start-after: //SNIPPET_START: subst0$ + :end-before: //SNIPPET_END: subst0$ + +* The ``EUnit`` case is trivial---there are no variables to substitute. + +* In the variable case, ``subst0 s (EVar x)`` just applies ``s`` to ``x``. 
+ +* In the ``EApp`` case, we apply the substitution to each sub-term. + +* The ``ELam`` case is the most interesting. To apply the substitution + ``s`` to the body ``e1``, we have to traverse a binder. The mutually + recursive function ``sub_elam0 s`` adjusts ``s`` to account for this + new binder, which has de Bruijn index ``0`` in the body ``e1`` (at + least until another binder is encountered). + + - In ``sub_elam0``, if we are applying ``s`` to the newly bound + variable at index ``0``, then we leave that variable unchanged, + since ``s`` cannot affect it. + + - Otherwise, we have a variable with index at least ``1``, + referencing a binder that is bound in an outer scope; so, we shift + it down and apply ``s`` to it, and then increment all the + variables in the resulting term (using ``sub_inc0``) to avoid capture. + +This definition of substitution is correct, but F* refuses to accept +it since we have not convinced the typechecker that ``subst0`` and +``sub_elam0`` actually terminate. In fact, F* complains in two +locations about a failed termination check. + +.. note:: + + This definition is expected to fail, so the ``[@@expect_failure + [19;19]]`` attribute on the definition asks F* to check that the + definition raises Error 19 twice. We'll look in detail at why it + fails, next. + +Substitution, Proven Total +-------------------------- + +Informally, let's try to convince ourselves why ``subst0`` and +``sub_elam0`` actually terminate. + +* The recursive calls in the ``EApp`` case are applied to strictly + smaller sub-terms (``e0`` and ``e1``) of the original term ``e``. + +* In the ``ELam`` case, we apply ``subst0`` to a smaller sub-term + ``e1``, but we make a mutually recursive call to ``sub_elam0 s`` + first---so we need to check that that call terminates. This is the + first place where F* complains. 
+ +* The function ``sub_elam0`` itself calls back to ``subst0`` on a + completely unrelated term ``s (y - 1)``, and F* complains that this + may not terminate. But, thankfully, this call makes use only of the + ``sub_inc0`` substitution, which is just a renaming substitution and + which does not make any further recursive calls. Somehow, we have to + convince F* that a recursive call with a renaming substitution is + fine. + +To distinguish renamings from general substitutions, we'll use an +indexed type ``sub r``, shown below. + +.. literalinclude:: ../code/Part2.STLC.fst + :language: fstar + :start-after: //SNIPPET_START: sub$ + :end-before: //SNIPPET_END: sub$ + +* ``sub true`` is the type of renamings, substitutions that map + variables to variables. + +* ``sub false`` is the type of substitutions that map at least one variable to a + non-variable. + +It's easy to prove that ``sub_inc`` is a renaming: + +.. literalinclude:: ../code/Part2.STLC.fst + :language: fstar + :start-after: //SNIPPET_START: sub_inc$ + :end-before: //SNIPPET_END: sub_inc$ + +The function ``sub_beta`` shown below is the analog of ``sub_beta0``, +but with a type that tracks whether it is a renaming or not. + +.. literalinclude:: ../code/Part2.STLC.fst + :language: fstar + :start-after: //SNIPPET_START: sub_beta$ + :end-before: //SNIPPET_END: sub_beta$ + +* The type says that ``sub_beta e`` is a renaming if and only if ``e`` + is itself a variable. + +* Proving this type, particularly in the case where ``e`` is not a + variable, requires proving an existentially quantified formula, i.e., + ``exists x. ~(EVar? (sub_beta e x))``. As mentioned + :ref:`previously `, the SMT solver cannot + always automatically instantiate existential quantifiers in the + goal. So, we introduce the existential quantifier explicitly, + providing the witness ``0``, and then the SMT solver can easily + prove ``~(EVar? (sub_beta e 0))``. 
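As an informal aid, the whole algorithm can be sketched in Python, where no termination check gets in the way. The names below merely mirror ``sub_inc``, ``sub_beta``, ``subst``, and ``sub_elam``; everything else is an illustrative assumption, not the F* development itself.

```python
from dataclasses import dataclass
from typing import Callable, Union

# De Bruijn expressions: a variable is an index counting enclosing binders.
@dataclass
class EUnit: pass
@dataclass
class EVar:
    x: int
@dataclass
class ELam:
    body: "Exp"              # binds index 0 within `body`
@dataclass
class EApp:
    e1: "Exp"
    e2: "Exp"

Exp = Union[EUnit, EVar, ELam, EApp]
Sub = Callable[[int], Exp]   # a parallel substitution

# A renaming: shift every variable up by one.
sub_inc: Sub = lambda x: EVar(x + 1)

def sub_beta(e: Exp) -> Sub:
    # Replace variable 0 by `e` and shift the rest down by one,
    # since the enclosing lambda binder is being removed.
    return lambda x: e if x == 0 else EVar(x - 1)

def subst(s: Sub, e: Exp) -> Exp:
    if isinstance(e, EUnit): return e
    if isinstance(e, EVar):  return s(e.x)
    if isinstance(e, EApp):  return EApp(subst(s, e.e1), subst(s, e.e2))
    if isinstance(e, ELam):  return ELam(subst(sub_elam(s), e.body))
    raise TypeError(e)

def sub_elam(s: Sub) -> Sub:
    # Push `s` under one binder: index 0 is the new bound variable, which
    # `s` cannot affect; whatever `s` produces for an outer variable must
    # be shifted past the new binder. Note that the inner call to `subst`
    # only ever uses the renaming `sub_inc`---the fact that the indexed
    # type `sub r` makes explicit.
    return lambda y: EVar(0) if y == 0 else subst(sub_inc, s(y - 1))

# Beta-reduce (\. \. 1) (): the outer bound variable, which occurs under
# the inner binder as index 1, is replaced by the unit argument.
print(subst(sub_beta(EUnit()), ELam(EVar(1))))   # ELam(body=EUnit())
```

Python happily runs this mutual recursion; the point of the next section is precisely what extra typing structure F* needs before it will believe the same recursion is well-founded.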
+ +Finally, we show the definitions of ``subst`` and ``sub_elam`` +below---identical to ``subst0`` and ``sub_elam0``, but enriched with +types that allow expressing a termination argument to F* using a +4-ary lexicographic ordering. + +.. literalinclude:: ../code/Part2.STLC.fst + :language: fstar + :start-after: //SNIPPET_START: subst$ + :end-before: //SNIPPET_END: subst$ + +Let's analyze the recursive calls of ``subst`` and ``sub_elam`` to +see why this order works. + +* Cases of ``subst``: + + - The ``EUnit`` and ``EVar`` cases are trivial, as before. + + - In ``EApp``, ``e`` is definitely not a variable, so ``bool_order + (EVar? e)`` is ``1``. If ``e1`` (respectively ``e2``) is a + variable, then the recursive call terminates, since the lexicographic + tuple ``(0, _, _, _) << (1, _, _, _)``, regardless of the other + values. Otherwise, the last component of the tuple decreases + (since ``e1`` and ``e2`` are proper sub-terms of ``e``), while + none of the other components of the tuple change. + + - The call to ``sub_elam s`` in ``ELam`` terminates because the + third component of the tuple decreases from ``1`` to ``0``, while + the first two do not change. + + - The final recursive call to ``subst`` terminates for similar + reasons to the recursive calls in the ``EApp`` case, since the + type of ``sub_elam`` guarantees that ``sub_elam s`` is a renaming if + and only if ``s`` is (so the ``r`` bit does not change). + +* Cases of ``sub_elam``: in the recursive call to ``subst sub_inc (s + (y - 1))``, we have already proven that ``sub_inc`` is a + renaming. So, we have two cases to consider: + + - If ``s (y - 1)`` is a variable, then ``bool_order (EVar? e)``, the + first component of the decreases clause of ``subst``, is ``0``, + which clearly precedes ``1``, the first component of the decreases + clause of ``sub_elam``. + + - Otherwise, ``s (y - 1)`` is not a variable, so ``s`` is + definitely not a renaming while ``sub_inc`` is. 
So, the +second component of the decreases clause decreases while the first +component is unchanged. + +Finally, we need to prove that ``sub_elam s`` is a renaming if and +only if ``s`` is. For this, we need two things: + +* First, strengthen the type of ``subst s`` to show that it maps + variables to variables if and only if ``s`` is a renaming, + +* Second, we need to instantiate an existential quantifier in + ``sub_elam``, to show that if ``s`` is not a renaming, then it must + map some ``x`` to a non-variable and, hence, ``sub_elam s (x + 1)`` + is a non-variable too. One way to do this is by asserting this fact, + which is a sufficient hint to the SMT solver to find the + instantiation needed. Another way is to explicitly introduce the + existential, as in the exercise below. + +In summary, using indexed types combined with well-founded recursion +on lexicographic orderings, we were able to prove our definitions +total. That said, coming up with such orderings is non-trivial and +requires some ingenuity, but once you do, it allows for relatively +compact definitions that handle both substitutions and renamings. + +Exercise +^^^^^^^^ + +Remove the first component of the decreases clause of both definitions +and revise the definitions to make F* accept them. + +Your solution should have the signature + +.. code-block:: fstar + + let rec subst1 (#r:bool) + (s:sub r) + (e:exp) + : Tot (e':exp { r ==> (EVar? e <==> EVar? e') }) + (decreases %[bool_order r; + 1; + e]) + ... + + and sub_elam1 (#r:bool) (s:sub r) + : Tot (sub r) + (decreases %[bool_order r; + 0; + EVar 0]) + + +.. container:: toggle + + .. container:: header + + **Hint** + + Inline a case of ``subst`` in ``sub_elam``. + The answer is included with the next problem below. + +-------------------------------------------------------------------------------- + +Replace the assertion in ``sub_elam`` with a proof that explicitly +introduces the existential quantifier. + +.. container:: toggle + + .. 
container:: header + + **Answer** + + .. literalinclude:: ../code/Part2.STLC.fst + :language: fstar + :start-after: //SNIPPET_START: subst1$ + :end-before: //SNIPPET_END: subst1$ + +-------------------------------------------------------------------------------- + +Type system ++++++++++++ + +If, when running a program, one ends up with a term like +:math:`()~e` (i.e., some non-function term like :math:`()` being used +as if it were a function), then a runtime error has occurred and the +program crashes. A type system for the simply-typed lambda calculus is +designed to prevent this kind of runtime error. + +The type system is an inductively defined relation ``typing g e t`` +between a + + * typing environment ``g:env``, a partial map from variable indexes + in a particular scope to their annotated types; + + * a program expression ``e:exp``; + + * and its type ``t:typ``. + +Environments +------------ + +The code below shows our representation of typing environments +``env``, a total function from variable indexes ``var`` to ``Some t`` +or ``None``. + +.. literalinclude:: ../code/Part2.STLC.fst + :language: fstar + :start-after: //SNIPPET_START: env$ + :end-before: //SNIPPET_END: env$ + +* The ``empty`` environment maps all variables to ``None``. + +* Extending an environment ``g``, associating a type ``t`` with a + new variable at index ``0``, involves shifting up the indexes of all + other variables in ``g`` by ``1``. + +.. _Part2_stlc_typing: + +Typing Relation +--------------- + +The type system of STLC is defined by the inductively defined relation +``typing g e t`` shown below. A value of ``typing g e t`` is a +derivation, or a proof, that ``e`` has type ``t`` in the environment +``g``. + +.. 
literalinclude:: ../code/Part2.STLC.fst + :language: fstar + :start-after: //SNIPPET_START: typing$ + :end-before: //SNIPPET_END: typing$ + +* The type does not support decidable equality, since all its + constructors contain a field ``g:env``, a function-typed value + without decidable equality. So, we mark the inductive with the + ``noeq`` qualifier, :ref:`as described previously + `. + +* ``TyUnit`` says that the unit value ``EUnit`` has type ``TUnit`` in + all environments. + +* ``TyVar`` says that a variable ``x`` is well-typed only in an + environment ``g`` that binds its type to ``Some t``, in which case, + the program ``EVar x`` has type ``t``. This rule ensures that no + out-of-scope variables can be used. + +* ``TyLam`` says that a function literal ``ELam t e1`` has type ``TArr + t t'`` in environment ``g``, when the body of the function ``e1`` + has type ``t'`` in an environment that extends ``g`` with a binding + for the new variable at type ``t`` (while shifting and retaining all + other variables). + +* Finally, ``TyApp`` allows applying ``e1`` to ``e2`` only when ``e1`` + has an arrow type and the argument ``e2`` has the type of the formal + parameter of ``e1``---the entire term has the return type of ``e1``. + +Progress +++++++++ + +It's relatively easy to prove that a well-typed term with no free +variables that is neither the unit value nor a lambda literal can +take a single step of +computation. This property is known as *progress*. + + +Exercise +-------- + +State and prove progress. + +.. container:: toggle + + .. container:: header + + **Answer** + + .. literalinclude:: ../code/Part2.STLC.fst + :language: fstar + :start-after: //SNIPPET_START: progress$ + :end-before: //SNIPPET_END: progress$ + +-------------------------------------------------------------------------------- + +Preservation +++++++++++++ + +Given a well-typed term satisfying ``typing g e t`` and ``steps e e'``, +we would like to prove that ``e'`` has the same type as ``e``, i.e., +``typing g e' t``. 
This property is known as *preservation* (or +sometimes *subject reduction*). When taken in combination with +*progress*, this allows us to show that a well-typed term can keep +taking a step until it reaches a value. + +The proof below establishes preservation for a single step. + +.. literalinclude:: ../code/Part2.STLC.fst + :language: fstar + :start-after: //SNIPPET_START: preservation_step$ + :end-before: //SNIPPET_END: preservation_step$ + +* Since we know the computation takes a step, the typing derivation + ``ht`` must be an instance of ``TyApp``. + +* In the ``AppLeft`` and ``AppRight`` cases, we can easily use the + induction hypothesis depending on which side actually stepped. + +* The ``Beta`` case is the most interesting and requires a lemma about + substitutions preserving typing. + +The substitution lemma follows: + +.. literalinclude:: ../code/Part2.STLC.fst + :language: fstar + :start-after: //SNIPPET_START: substitution$ + :end-before: //SNIPPET_END: substitution$ + +It starts with a notion of typability of substitutions, ``subst_typing +s g1 g2``, which says that if a variable ``x`` has type ``g1 x``, then ``s +x`` must have that same type in ``g2``. + +The substitution lemma lifts this notion to expressions, stating that +applying a well-typed substitution ``subst_typing s g1 g2`` to a term +well-typed in ``g1`` produces a term well-typed in ``g2`` with the +same type. + +Exercise +-------- + +Use the substitution lemma to state and prove the +``substitution_beta`` lemma used in the proof of preservation. + +.. container:: toggle + + .. container:: header + + **Answer** + + .. literalinclude:: ../code/Part2.STLC.fst + :language: fstar + :start-after: //SNIPPET_START: substitution_beta$ + :end-before: //SNIPPET_END: substitution_beta$ + +-------------------------------------------------------------------------------- + +Exercise +-------- + +Prove a preservation lemma for multiple steps. + +.. container:: toggle + + .. 
container:: header + + **Answer** + + .. literalinclude:: ../code/Part2.STLC.fst + :language: fstar + :start-after: //SNIPPET_START: preservation$ + :end-before: //SNIPPET_END: preservation$ + +-------------------------------------------------------------------------------- + +Exercise +++++++++ + +Prove a type soundness lemma with the following statement: + +.. literalinclude:: ../code/Part2.STLC.fst + :language: fstar + :start-after: //SNIPPET_START: soundness_stmt$ + :end-before: //SNIPPET_END: soundness_stmt$ + +.. container:: toggle + + .. container:: header + + **Answer** + + .. literalinclude:: ../code/Part2.STLC.fst + :language: fstar + :start-after: //SNIPPET_START: soundness_sol$ + :end-before: //SNIPPET_END: soundness_sol$ + +-------------------------------------------------------------------------------- + +Exercise +++++++++ + +Add a step for reduction underneath a binder and prove the system +sound. + +.. container:: toggle + + .. container:: header + + **Answer** + + .. literalinclude:: ../code/Part2.STLC.Strong.fst + :language: fstar diff --git a/doc/book/PoP-in-FStar/book/part2/part2_universes.rst b/doc/book/PoP-in-FStar/book/part2/part2_universes.rst new file mode 100644 index 00000000000..145bd1e6501 --- /dev/null +++ b/doc/book/PoP-in-FStar/book/part2/part2_universes.rst @@ -0,0 +1,760 @@ +.. _Part2_universes: + + +Universes +========= + +As mentioned :ref:`earlier `, ``Type`` is the +type of types. So, one might ask the question, what is the type of +``Type`` itself? Indeed, one can write the following and it may appear +at first that the type of ``Type`` is ``Type``. + +.. literalinclude:: ../code/Universes.fst + :language: fstar + :start-after: //SNIPPET_START: ty$ + :end-before: //SNIPPET_END: ty$ + +However, behind the scenes, F* actually has a countably infinite +hierarchy of types, ``Type u#0``, ``Type u#1``, ``Type u#2``, ..., and +the type of ``Type u#i`` is actually ``Type u#(i + 1)``. 
The ``u#i`` +suffixes are called *universe levels*, and if you give F* the following +option, it will actually show you the universe levels it inferred when +it prints a term. + +.. code-block:: fstar + + #push-options "--print_universes" + +With this option enabled, in fstar-mode.el, the F* emacs plugin, +hovering on the symbol ``ty`` prints ``Type u#(1 + _)``, i.e., the +type of ``ty`` is in a universe that is one greater than some universe +metavariable ``_``, i.e., ``ty`` is universe *polymorphic*. But, we're +getting a bit ahead of ourselves. + +In this chapter, we'll look at universe levels in detail, including +why they're necessary to avoid paradoxes, and show how to +manipulate definitions that involve universes. For the most part, F* +infers the universe levels of a term and you don't have to think too +much about it---in fact, in all that we've seen so far, F* inferred +universe levels behind the scenes and we haven't had to mention +them. Eventually, though, they do crop up, and understanding what they +mean and how to work with them becomes necessary. + +Other resources to learn about universes: + + * The Agda manual has a nice `chapter on universes + `_, + including universe polymorphism. + + * This chapter from Adam Chlipala's `Certified Programming with + Dependent Types + `_ + describes universes in Coq. While it also provides useful + background, F*'s universe system is more similar to Agda's and + Lean's than Coq's. + + +Basics +------ + +A universe annotation on a term takes the form ``u#l``, where ``l`` is +a universe level. The universe levels are terms from the following +grammar: + +.. code-block:: + + k ::= 0 | 1 | 2 | ... any natural number constant + l ::= k universe constant + | l + k | k + l constant offset from level l + | max l1 l2 maximum of two levels + | a | b | c | ... level variables + + +Let's revisit our first example, this time using explicit universe +annotations to make things clearer. 
+ +We've defined, below, instances of ``Type`` for universe levels ``0, +1, 2`` and we see that each of them has a type at the next level. The +constant ``Type u#0`` is common enough that F* allows you to write +``Type0`` instead. + +.. literalinclude:: ../code/Universes.fst + :language: fstar + :start-after: //SNIPPET_START: ty_constants$ + :end-before: //SNIPPET_END: ty_constants$ + +If you try to define ``ty_bad`` below, F* complains with the following +error: + +.. literalinclude:: ../code/Universes.fst + :language: fstar + :start-after: //SNIPPET_START: ty_bad$ + :end-before: //SNIPPET_END: ty_bad$ + +.. code-block:: + + Expected expression of type "Type0"; got expression "Type0" of type "Type u#1" + +The restriction that prevents a ``Type`` from being an inhabitant of +itself is sometimes called *predicativity*. The opposite, +*impredicativity*, if not suitably restricted, usually leads to +logical inconsistency. F* provides a limited form of impredicativity +through the use of ``squash`` types, which we'll see towards the end +of this chapter. + +.. note:: + + That said, if we didn't turn on the option ``--print_universes``, the + error message you get may be, sadly, a bit baffling: + + .. code-block:: + + Expected expression of type "Type"; got expression "Type" of type "Type" + + Turning on ``--print_universes`` and ``--print_implicits`` is a + good way to make sense of type errors where the expected type and + the type that was computed seem identical. + + +Now, instead of defining several constants like ``ty0, ty1, ty2`` +etc., F* definitions can be *universe polymorphic*. Below, we define +``ty_poly`` as ``Type u#a``, for any universe variable ``a``, and so +``ty_poly`` has type ``Type u#(a + 1)``. + +.. literalinclude:: ../code/Universes.fst + :language: fstar + :start-after: //SNIPPET_START: ty_poly$ + :end-before: //SNIPPET_END: ty_poly$ + +One way to think of ``ty_poly`` is as a "definition template" +parameterized by the universe variable ``a``. 
One can +instantiate ``ty_poly`` with a specific universe level ``l`` (by +writing ``ty_poly u#l``) and obtain a copy of its definition +specialized to level ``l``. F* can prove that instantiations of +``ty_poly`` are equal to the non-polymorphic definitions we had +earlier. As the last example shows, F* can usually infer the universe +instantiation, so you often don't need to write it. + +.. literalinclude:: ../code/Universes.fst + :language: fstar + :start-after: //SNIPPET_START: ty_poly_assert$ + :end-before: //SNIPPET_END: ty_poly_assert$ + +Universe computations for other types +------------------------------------- + +Every type in F* lives in a particular universe. For example, here are +some common types in ``Type u#0``. + +.. literalinclude:: ../code/Universes.fst + :language: fstar + :start-after: //SNIPPET_START: some common types$ + :end-before: //SNIPPET_END: some common types$ + + +**Universe of an arrow type**: In general, the universe of an arrow +type ``x:t -> t'`` is the maximum of the universes of ``t`` and +``t'``. + +This means that functions that are type-polymorphic live in higher +universes. For example, the polymorphic identity function that we saw +in an :ref:`earlier section ` is +actually also polymorphic in the universe level of its type argument. + +.. literalinclude:: ../code/Universes.fst + :language: fstar + :start-after: //SNIPPET_START: poly_id$ + :end-before: //SNIPPET_END: poly_id$ + +That is, the type of the identity function ``id`` is ``id_t`` or +``a:Type u#i -> a -> a``---meaning, for all types ``a`` in +universe ``Type u#i``, ``id a`` is a function of type ``a -> a``. + +Now, ``id_t`` is itself a type in universe ``Type u#(i + 1)``, and +since the ``id`` function can be applied to types in any universe, it +can be applied to ``id_t`` too. 
So, it may look like this allows one +to write functions that can be applied to themselves---which would be +bad, since that would allow one to create infinite loops and break +F*'s logic. + +.. literalinclude:: ../code/Universes.fst + :language: fstar + :start-after: //SNIPPET_START: seemingly_self_application$ + :end-before: //SNIPPET_END: seemingly_self_application$ + +However, if we write out the universe levels explicitly, we see that +actually we aren't really applying the ``id`` function to +itself. Things are actually stratified, so that we are instead applying an +instance of ``id`` at universe ``u#(i + 1)`` to the instance of ``id`` +at universe ``u#i``. + +.. literalinclude:: ../code/Universes.fst + :language: fstar + :start-after: //SNIPPET_START: stratified_application$ + :end-before: //SNIPPET_END: stratified_application$ + + +One intuition for what's happening here is that there are really +infinitely many instances of the F* type system nested within each +other, with each instance forming a universe. Type-polymorphic +functions (like ``id``) live in some universe ``u#(a + 1)`` and are +parameterized over all the types in the immediately preceding universe +``u#a``. The universe levels ensure that an F* function within +universe level ``u#a`` cannot consume or produce terms that live in +some greater universe. + +Universe level of an inductive type definition +.............................................. + +F* computes a universe level for inductive type definitions. To +describe the rules for this in full generality, consider again the +general form of an inductive type definition, shown first :ref:`here +`, but this time with the universe level of each +type constructor made explicit, i.e., :math:`T_i` constructs a type in +universe :math:`\mathsf{Type~u\#l_i}`: + +.. 
math:: + + \mathsf{type}~T_1~\overline{(x_1:p_1)} : \overline{y_1:q_1} \rightarrow \mathsf{Type}~u\#l_1 = \overline{D_1 : t_1} \\ + \ldots \\ + \mathsf{and}~T_n~\overline{(x_n:p_n)} : \overline{y_n:q_n} \rightarrow \mathsf{Type}~u\#l_n = \overline{D_n : t_n} \\ + +Recall that each type constructor :math:`T_i` has zero or more *data +constructors* :math:`\overline{D_i:t_i}` and for each data constructor +:math:`D_{ij}`, its type :math:`t_{ij}` must be of the form +:math:`\overline{z_{ij}:s_{ij}} \rightarrow T_i~\bar{x_i}~\bar{e}`. + +In addition to checking, as usual, that each :math:`t_{ij}` is +well-typed, F* also checks the universe levels according to the +following rule: + + * Assuming that each :math:`T_i` has universe level :math:`l_i`, for + every data constructor :math:`D_{ij}`, and for each of its + arguments :math:`z_{ijk} : s_{ijk}`, check :math:`s_{ijk} : + \mathsf{Type}~u\#l_{ijk}` and :math:`l_{ijk} \leq l_i`. + +In other words, the universe level of each type constructor must not +be less than the universe of any of the fields of its data +constructors. + +In practice, F* infers the universe levels :math:`l_1, \ldots, l_n` by +collecting level-inequality constraints and solving them using the +``max`` operator on universe levels, i.e., :math:`l_i` is set to +:math:`\max_{jk}~l_{ijk}`, the maximum of the universe levels of all +the fields of the constructors :math:`\overline{D_i : t_i}`. Let's +look at some examples. + +The ``list`` type ++++++++++++++++++ + +The ``list`` type below is parameterized by ``a:Type u#a`` and +constructs a type in the same universe. + +.. literalinclude:: ../code/Universes.fst + :language: fstar + :start-after: //SNIPPET_START: list$ + :end-before: //SNIPPET_END: list$ + +* The ``Nil`` constructor has no fields, so it imposes no + constraints on the universe level of ``list a``. + +* The ``Cons`` constructor has two fields.
Its first field ``hd`` + has type ``a``, where ``a : Type u#a``: this gives the constraint ``u#a`` :math:`\leq` ``u#a``; and + the second field, by assumption, has type ``list a : Type u#a``, + which again gives the constraint ``u#a`` :math:`\leq` ``u#a``. + +By default, F* infers the minimum satisfiable universe level +assignment. But there are many solutions to the inequalities, and if +needed, one can use annotations to pick another solution. For example, +one could write this, though it rarely makes sense to pick a universe +for a type higher than necessary (see :ref:`this section ` for an exception). + +.. literalinclude:: ../code/Universes.fst + :language: fstar + :start-after: //SNIPPET_START: list'$ + :end-before: //SNIPPET_END: list'$ + +.. note:: + + Universe level variables are drawn from a different namespace than + other variables. So, one often writes ``a:Type u#a``, where ``a`` + is a regular variable and ``u#a`` is the universe level of the type + of ``a``. + +The ``pair`` type ++++++++++++++++++ + +The ``pair`` type below is parameterized by ``a:Type u#a`` and +``b:Type u#b`` and constructs a type in ``u#(max a b)``. + +.. literalinclude:: ../code/Universes.fst + :language: fstar + :start-after: //SNIPPET_START: pair$ + :end-before: //SNIPPET_END: pair$ + +* The ``fst`` field is in ``u#a`` and so we have ``u#a`` :math:`\leq` ``u#(max a b)``. + +* The ``snd`` field is in ``u#b`` and so we have ``u#b`` :math:`\leq` ``u#(max a b)``. + +The ``top`` type ++++++++++++++++++ + +The ``top`` type below packages a value at any type ``a:Type u#a`` +with its type. + +.. literalinclude:: ../code/Universes.fst + :language: fstar + :start-after: //SNIPPET_START: top$ + :end-before: //SNIPPET_END: top$ + +* The ``a`` field of ``Top`` is in ``u#(a + 1)`` while the ``v`` field + is in ``u#a``. So, ``top`` itself is in ``u#(max (a + 1) a)``, + which simplifies to ``u#(a + 1)``.
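For instance, here is a small sketch of our own (it assumes the ``top`` definition above exposes a type field named ``a`` and a value field named ``v``), showing that the type component of a ``top`` value can be projected back out, landing one universe below ``top`` itself:

.. code-block:: fstar

   //A sketch: from a top value in u#(a + 1), we recover a type in u#a;
   //Top?.a is the projector F* generates for the `a` field of Top
   let project_type (t: top u#a) : Type u#a = Top?.a t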
+ +One intuition for why this is so is that from a value ``t : +top`` one can write a function that computes a value of type ``Type +u#a``, i.e., ``Top?.a t``. So, if instead we have ``top: Type u#a`` +and ``t:top``, then ``Top?.a : top -> Type u#a``, which would break +the stratification of universes, since from a value of type ``top`` in +universe ``u#a``, we would be able to project out a value in +``Type u#(a + 1)``, which leads to a paradox, as we'll see next. + +What follows is quite technical, and you only need to know that the +universe system exists to avoid paradoxes, not how such paradoxes are +constructed. + +Russell's Paradox +----------------- + +Type theory has its roots in Bertrand Russell's `The Principles +of Mathematics +`_, which +explores the logical foundations of mathematics and set theory. In +this work, Russell proposed the paradoxical set :math:`\Delta` whose +elements are exactly all the sets that don't contain themselves and +considered whether or not :math:`\Delta` contained itself. This +self-referential construction is paradoxical since: + + * If :math:`\Delta \in \Delta`, then since the only sets in + :math:`\Delta` are the sets that don't contain themselves, we can + conclude that :math:`\Delta \not\in \Delta`. + + * On the other hand, if :math:`\Delta \not\in \Delta`, then since + :math:`\Delta` contains all sets that don't contain themselves, we + can conclude that :math:`\Delta \in \Delta`. + +To avoid such paradoxes, Russell formulated a stratified system of +types to prevent nonsensical constructions that rely on +self-reference. The universe levels of modern type theories serve much +the same purpose. + +In fact, as the construction below shows, if it were possible to break +the stratification of universes in F*, then one could encode Russell's +:math:`\Delta` set and prove ``False``. This construction is derived +from `Thorsten Altenkirch's Agda code +`_. Liam +O'Connor also provides `some useful context and comparison +`_.
Whereas +the Agda code uses a special compiler pragma to enable unsound +impredicativity, in F* we show how a user-introduced axiom can +simulate impredicativity and enable the same paradox. + + +Breaking the Universe System +............................. + +Consider the following axioms that intentionally break the +stratification of F*'s universe system. We'll need the following +ingredients: + +1. A strictly positive type constructor ``lower`` that takes a type in + any universe ``a:Type u#a`` and returns a type in ``u#0``. Note, + we covered :ref:`strictly positive type functions, previously + `. + +.. literalinclude:: ../code/UnsoundUniverseLowering.fst + :language: fstar + :start-after: //SNIPPET_START: lower$ + :end-before: //SNIPPET_END: lower$ + +2. Assume there is a function called ``inject``, which takes a value + ``x:a`` and returns a value of type ``lower a``. + +.. literalinclude:: ../code/UnsoundUniverseLowering.fst + :language: fstar + :start-after: //SNIPPET_START: inject$ + :end-before: //SNIPPET_END: inject$ + +3. ``lower`` and ``inject`` on their own are benign (e.g., ``let lower + _ = unit`` and ``let inject _ = ()``). But, now if we assume we + have a function ``project`` that is the inverse of ``inject``, then + we've opened the door to paradoxes. + +.. literalinclude:: ../code/UnsoundUniverseLowering.fst + :language: fstar + :start-after: //SNIPPET_START: project$ + :end-before: //SNIPPET_END: project$ + +Encoding Russell's Paradox +........................... + +To show the paradox, we'll define a notion of ``set`` in terms of a +form of set comprehensions ``f: x -> set``, where ``x:Type u#0`` is +the domain of the comprehension, supposedly bounding the cardinality +of the set. We'll subvert the universe system by treating ``set`` as +living in universe ``u#0``, even though its constructor ``Set`` has a +field ``x:Type u#0`` that has universe level ``u#1``. + +..
literalinclude:: ../code/Russell.fst + :language: fstar + :start-after: //SNIPPET_START: set$ + :end-before: //SNIPPET_END: set$ + +This construction allows us to define many useful sets. For example, +the empty set ``zero`` uses the empty type ``False`` as the domain of +its comprehension and so has no elements; or the singleton set ``one`` +whose only element is the empty set; or the set ``two`` that contains +the empty set ``zero`` and the singleton set ``one``. + +.. literalinclude:: ../code/Russell.fst + :language: fstar + :start-after: //SNIPPET_START: zero,one,two$ + :end-before: //SNIPPET_END: zero,one,two$ + +One can also define set membership: A set ``a`` is a member of a set +``b``, if one can exhibit an element ``v`` of the domain type of +``b`` (i.e., ``(project b).x``), such that ``b``'s comprehension +``(project b).f`` applied to ``v`` is ``a``. + +For example, one can prove ``mem zero two`` by picking ``true`` for +``v`` and ``mem one two`` by picking ``false`` for +``v``. Non-membership is just the negation of membership. + +.. literalinclude:: ../code/Russell.fst + :language: fstar + :start-after: //SNIPPET_START: mem$ + :end-before: //SNIPPET_END: mem$ + +Now, we are ready to define Russell's paradoxical set +:math:`\Delta`. First, we define ``delta_big`` in a larger universe +and then use ``inject`` to turn it into a ``set : Type u#0``. The +encoding of ``delta_big`` is fairly direct: Its domain type is the +type of sets ``s`` paired with a proof that ``s`` does not contain +itself; and its comprehension function just returns ``s`` itself. + +.. literalinclude:: ../code/Russell.fst + :language: fstar + :start-after: //SNIPPET_START: delta$ + :end-before: //SNIPPET_END: delta$ + +We can now prove both ``delta `mem` delta`` and ``delta `not_mem` +delta``, using the unsound ``inj_proj`` axiom that breaks the universe +system, and derive ``False``. + +.. 
literalinclude:: ../code/Russell.fst + :language: fstar + :start-after: //SNIPPET_START: proof$ + :end-before: //SNIPPET_END: proof$ + +The proofs are more detailed than they need to be, and if you're +curious, maybe you can follow along by reading the comments. + +The upshot, however, is that without the stratification of universes, +F* would be unsound. + + +Refinement types, FStar.Squash, ``prop``, and Impredicativity +------------------------------------------------------------- + +We've seen how universe levels are computed for arrow types and +inductive type definitions. The other way in which types can be formed +in F* is with refinement types: ``x:t{p}``. As we've seen previously, +a value ``v`` of type ``x:t{p}`` is just a ``v:t`` where ``p[v/x]`` is +derivable in the current scope in F*'s SMT-assisted classical +logic---there is no way to extract a proof of ``p`` from a proof of +``x:t{p}``, i.e., refinement types are F*'s mechanism for proof +irrelevance. + +**Universe of a refinement type**: The universe of a refinement type ``x:t{p}`` is the universe of ``t``. + +Since the universe of a refinement type does not depend on ``p``, it +enables a limited form of impredicativity, and we can define the +following type (summarized here from the F* standard library +``FStar.Squash``): + +.. code-block:: fstar + + let squash (p:Type u#p) : Type u#0 = _:unit { p } + let return_squash (p:Type u#p) (x:p) : squash p = () + +This is a lot like the ``lower`` and ``inject`` assumptions that we +saw in the previous section, but, importantly, there is no ``project`` +operation to invert an ``inject``. In fact, ``FStar.Squash`` proves +that ``squash p`` is proof irrelevant, meaning that all proofs of +``squash p`` are equal. + +..
code-block:: fstar + + val proof_irrelevance (p: Type u#p) (x y: squash p) : squash (x == y) + +``FStar.Squash`` does provide a limited way to manipulate a proof of +``p`` given a ``squash p``, using the combinator ``bind_squash`` shown +below, which states that if ``f`` can build a proof ``squash b`` from any +proof of ``a``, then it can do so from the one and only proof of ``a`` +that is witnessed by ``x:squash a``. + +.. code-block:: fstar + + val bind_squash (#a: Type u#a) (#b: Type u#b) (x: squash a) (f: (a -> squash b)) : squash b + +It is important that ``bind_squash`` return a ``squash b``, +maintaining the proof-irrelevance of the ``squash`` type. Otherwise, +if one could extract a proof of ``a`` from ``squash a``, we would be +perilously close to the unsound ``project`` axiom which enables +paradoxes. + +This restriction is similar to Coq's restriction on its ``Prop`` type, +forbidding functions that match on ``Prop`` from returning results outside +``Prop``. + +The F* type ``prop`` (which we saw first :ref:`here `) is +defined primitively as the type of all squashed types, i.e., the only +types in ``prop`` are types of the form ``squash p``; or, +equivalently, every type ``t : prop`` is a subtype of ``unit``. Being +the type of a class of types, ``prop`` in F* lives in ``u#1``. + +.. literalinclude:: ../code/Universes.fst + :language: fstar + :start-after: //SNIPPET_START: prop$ + :end-before: //SNIPPET_END: prop$ + +However, ``prop`` still offers a form of impredicativity, e.g., you +can quantify over all ``prop`` while remaining in ``prop``. + +.. literalinclude:: ../code/Universes.fst + :language: fstar + :start-after: //SNIPPET_START: prop impredicative$ + :end-before: //SNIPPET_END: prop impredicative$ + +* The first line above shows that, as usual, an arrow type is in a + universe that is the maximum of the universes of its argument and + result types. In this case, since it has an argument ``prop : Type + u#1``, the arrow itself is in ``u#1``.
+ +* The second line shows that by squashing the arrow type, we can bring + it back to ``u#0`` + +* The third line shows the more customary way of doing this in F*, + where ``forall (a:prop). a`` is just syntactic sugar for ``squash + (a:prop -> a)``. Since this is a ``squash`` type, not only does it + live in ``Type u#0``, it is itself a ``prop``. + +* The fourth line shows that the same is true for ``exists``. + +.. _Part2_Universes_raising: + +Raising universes and the lack of cumulativity +---------------------------------------------- + +In some type theories, notably in Coq, the universe system is +*cumulative*, meaning that ``Type u#i : Type u#(max (i + 1) j)``; +or, that ``Type u#i`` inhabits all universes greater than +``i``. In contrast, in F*, as in Agda and Lean, ``Type u#i : Type +u#(i + 1)``, i.e., a type resides only in the universe immediately +above it. + +Cumulativity is a form of subtyping on universe levels, and it can be +quite useful, enabling definitions at higher universes to be re-used +for all lower universes. However, systems that mix universe +polymorphism with cumulativity are quite tricky, and indeed, it was +only recently that Coq offered both universe polymorphism and +cumulativity. + +Lacking cumulativity, F* provides a library ``FStar.Universe`` that +enables lifting a term from one universe to a higher one. We summarize +it here: + +.. code-block:: fstar + + val raise_t ([@@@ strictly_positive] t : Type u#a) : Type u#(max a b) + + val raise_val (#a:Type u#a) (x:a) : raise_t u#a u#b a + + val downgrade_val (#a:Type u#a) (x:raise_t u#a u#b a) : a + + val downgrade_val_raise_val (#a: Type u#a) (x: a) + : Lemma (downgrade_val u#a u#b (raise_val x) == x) + + val raise_val_downgrade_val (#a: Type u#a) (x: raise_t u#a u#b a) + : Lemma (raise_val (downgrade_val x) == x) + +The type ``raise_t t`` is strictly positive in ``t`` and raises ``t`` +from ``u#a`` to ``u#(max a b)``. 
``raise_val`` and +``downgrade_val`` are mutually inverse functions between ``t`` and +``raise_t t``. + +This signature is similar in structure to the unsound signature for +``lower, inject, project`` that we use to exhibit Russell's +paradox. However, crucially, the universe levels in ``raise_t`` ensure +that the universe levels *increase*, preventing any violation of +universe stratification. + +In fact, this signature is readily implemented in F*, as shown below, +where the universe annotation on ``raise_t`` explicitly defines the +type in a higher universe ``u#(max a b)`` rather than in its minimum +universe ``u#a``. + +.. code-block:: fstar + + noeq + type raise_t (a : Type u#a) : Type u#(max a b) = + | Ret : a -> raise_t a + + let raise_val #a x = Ret x + let downgrade_val #a x = match x with Ret x0 -> x0 + let downgrade_val_raise_val #a x = () + let raise_val_downgrade_val #a x = () + +.. _Part2_tips_for_universes: + +Tips for working with universes +------------------------------- + +Whenever you write ``Type`` in F*, you are implicitly writing ``Type +u#?x``, where ``?x`` is a universe *metavariable* left for F* to infer. When +left implicit, this means that F* may sometimes infer universes for +your definition that are not what you expect---they may be too general +or not general enough. We conclude this section with a few tips to +detect and fix such problems. + +* If you see puzzling error messages, enable the following pragma: + + .. code-block:: fstar + + #push-options "--print_implicits --print_universes" + + This will cause F* to print larger terms in error messages, which + you usually do not want, except when you are confronted with error + messages of the form "expected type t; got type t". + +* Aside from the built-in constants ``Type u#a``, the ``->`` type + constructor, and the refinement type former, the only universe + polymorphic F* terms are top-level definitions. 
That is, while you + can define ``i`` at the top-level and use it polymorphically: + + .. literalinclude:: ../code/Universes.fst + :language: fstar + :start-after: //SNIPPET_START: id_top_level$ + :end-before: //SNIPPET_END: id_top_level$ + + You cannot do the same in a non-top-level scope: + + .. literalinclude:: ../code/Universes.fst + :language: fstar + :start-after: //SNIPPET_START: no_local_poly$ + :end-before: //SNIPPET_END: no_local_poly$ + + Of course, non-universe-polymorphic definitions work at all scopes, + e.g., here, the ``i`` is polymorphic in all types at universe + ``u#0``. + + .. literalinclude:: ../code/Universes.fst + :language: fstar + :start-after: //SNIPPET_START: type_poly$ + :end-before: //SNIPPET_END: type_poly$ + +* If you write a ``val f : t`` declaration for ``f``, F* will compute + the most general universe for the type ``t`` independently of the + ``let f = e`` or ``type f =`` definition. + + A simple example of this behavior is the following. Say, you declare + ``tup2`` as below. + + .. literalinclude:: ../code/Universes.fst + :language: fstar + :start-after: //SNIPPET_START: val tup2$ + :end-before: //SNIPPET_END: val tup2$ + + Seeing this declaration F* infers ``val tup2 (a:Type u#a) (b:Type u#b) + : Type u#c``, computing the most general type for ``tup2``. + + If you now try to define ``tup2``, + + .. literalinclude:: ../code/Universes.fst + :language: fstar + :start-after: //SNIPPET_START: let tup2$ + :end-before: //SNIPPET_END: let tup2$ + + F* complains with the following error (with ``--print_universes`` on): + + .. code-block:: + + Type u#(max uu___43588 uu___43589) is not a subtype of the expected type Type u#uu___43590 + + Meaning that the inferred type for the definition of ``tup2 a b`` is + ``Type u#(max a b)``, which is of course not the same as ``Type + u#c``, and, sadly, the auto-generated fresh names in the error + message don't make your life any easier. 
+ + The reason for this is that one can write a ``val f : t`` in a + context where a definition for ``f`` may never appear, in which case + F* has to compute some universes for ``t``---it chooses the most + general universe, though if you do try to implement ``f`` you may + find that the most general universe is too general. + + A good rule of thumb is the following: + + - Do not write a ``val`` declaration for a term, unless you are + writing an :ref:`interface `. Instead, directly + write a ``let`` or ``type`` definition and annotate it with the + type you expect it to have---this will lead to fewer + surprises. For example, instead of separating the ``val tup2`` + from ``let tup2`` just write them together, as shown below, and F* + infers the correct universes. + + .. literalinclude:: ../code/Universes.fst + :language: fstar + :start-after: //SNIPPET_START: tuple2$ + :end-before: //SNIPPET_END: tuple2$ + + - If you must write a ``val f : t``, because, say, the type ``t`` is + huge, or because you are writing an interface, it's a good idea to + be explicit about universes, so that when defining ``f``, you know + exactly how general you have to be in terms of universes; and, + conversely, users of ``f`` know exactly how much universe + polymorphism they are getting. For example: + + .. literalinclude:: ../code/Universes.fst + :language: fstar + :start-after: //SNIPPET_START: tup2_again$ + :end-before: //SNIPPET_END: tup2_again$ + +* When defining an inductive type, prefer using parameters over + indexes, since usually type parameters lead to types in lower + universes. For example, one might think to define lists this way: + + .. literalinclude:: ../code/Universes.fst + :language: fstar + :start-after: //SNIPPET_START: list_alt$ + :end-before: //SNIPPET_END: list_alt$ + + Although semantically equivalent to the standard list + + .. 
literalinclude:: ../code/Universes.fst + :language: fstar + :start-after: //SNIPPET_START: list$ + :end-before: //SNIPPET_END: list$ + + ``list_alt`` produces a type in ``u#(a + 1)``, since both ``NilAlt`` + and ``ConsAlt`` have fields of type ``a:Type u#a``. So, unless the + index of your type varies among the constructors, use a parameter + instead of an index. + + That said, recall that it's the fields of the constructors of the + inductive type that count. You can index your type by a type in any + universe and it doesn't influence the result type. Here's an + artificial example. + + .. literalinclude:: ../code/Universes.fst + :language: fstar + :start-after: //SNIPPET_START: crazy_index$ + :end-before: //SNIPPET_END: crazy_index$ + diff --git a/doc/book/PoP-in-FStar/book/part2/part2_vectors.rst b/doc/book/PoP-in-FStar/book/part2/part2_vectors.rst new file mode 100644 index 00000000000..4bd9a429430 --- /dev/null +++ b/doc/book/PoP-in-FStar/book/part2/part2_vectors.rst @@ -0,0 +1,328 @@ +.. _Part2_vectors: + +Length-indexed Lists +==================== + +To make concrete some aspects of the formal definitions above, we'll +look at several variants of a parameterized list datatype augmented +with indexes that carry information about the list's length. + +Even and Odd-lengthed Lists +........................... + +Our first example is a bit artificial, but helps illustrate a use of +mutually inductive types. + +Here, we're defining two type constructors called ``even`` and +``odd`` (i.e., just :math:`T_1` and :math:`T_2` from our formal +definition), both with a single parameter ``(a:Type)``, for the type +of the lists' elements, and no indexes. + +All lists of type ``even a`` have an even number of elements---zero +elements, using its first constructor ``ENil``, or using ``ECons``, +one more than the number of elements in an ``odd a``, a list with an +odd number of elements.
Elements of the type ``odd a`` are constructed +using the constructor ``OCons``, which adds a single element to an +``even a`` list. The types are mutually inductive since their +definitions reference each other. + + +.. literalinclude:: ../code/Vec.fst + :language: fstar + :start-after: //SNIPPET_START: even_and_odd + :end-before: //SNIPPET_END: even_and_odd + +Although closely related, the types ``even a`` and ``odd a`` are from +distinct inductive types. So, to compute, say, the length of one of +these lists, one generally writes a pair of mutually recursive +functions, like so: + +.. literalinclude:: ../code/Vec.fst + :language: fstar + :start-after: //SNIPPET_START: elength_and_olength + :end-before: //SNIPPET_END: elength_and_olength + +Note, we can prove that the lengths of an ``even a`` and an ``odd a`` are +really even and odd. + +Now, say you wanted to map a function over an ``even a``: you'd have +to write a pair of mutually recursive functions to map simultaneously +over them both. This can get tedious quickly. Instead of rolling out +several mutually inductive but distinct types, one can instead use an +*indexed* type to group related types in the same inductive family of +types. + +The definition of ``even_or_odd_list`` below shows an inductive type +with one parameter ``a``, for the type of the lists' elements, and a single +boolean index, which indicates whether the list is even or odd. Note +how the index varies in the types of the constructors, whereas the +parameter stays the same in all instances. + +.. literalinclude:: ../code/Vec.fst + :language: fstar + :start-after: //SNIPPET_START: even_or_odd_list + :end-before: //SNIPPET_END: even_or_odd_list + +Now, we have a single family of types for both even and odd lists, and +we can write a single function that abstracts over both even and odd +lists, just by abstracting over the boolean index.
For example, +``eo_length`` computes the length of an ``even_or_odd_list``, with its +type showing that it returns an even number when ``b`` is true and an +odd number otherwise. + +.. literalinclude:: ../code/Vec.fst + :language: fstar + :start-after: //SNIPPET_START: eo_length + :end-before: //SNIPPET_END: eo_length + +.. note:: + + Note, in ``eo_length`` we had to explicitly specify a decreases + clause to prove the function terminating. Why? Refer back to the + section on :ref:`default + measures` to recall that the + default measure is the lexicographic ordering of all the arguments in + order. So, without the decreases clause, F* will try to prove that + the index argument ``b`` decreases on the recursive call, which it + does not. + +This is our first type with both parameters and indices. But why +stop at just indexing to distinguish even and odd-lengthed lists? We +can index a list by its length itself. + +Vectors +....... + +Let's look again at the definition of the ``vec`` type, first shown in +:ref:`the introduction`. + +.. literalinclude:: ../code/Vec.fst + :language: fstar + :start-after: //SNIPPET_START: vec + :end-before: //SNIPPET_END: vec + +Here, we're defining just a single type constructor called ``vec`` +(i.e., just :math:`T_1`), which has a single parameter ``(a:Type)`` and one +``nat`` index. + +``vec`` has two data constructors: ``Nil`` builds an instance of ``vec +a 0``, the empty vector; and ``Cons hd tl`` builds an instance of +``vec a (n + 1)`` from a head element ``hd:a`` and a tail ``tl : vec a +n``. That is, the two constructors build different instances of +``vec``---those instances have the same parameter (``a``), but +different indexes (``0`` and ``n + 1``). + +.. note:: + + Datatypes in many languages in the ML family, including OCaml and + F#, have parameters but no indexes. So, all the data constructors + construct the same instance of the type constructor. Further, all + data constructors take at most one argument.
If your datatype + happens to be simple enough to fit these restrictions, you can use + a notation similar to OCaml or F# for those types in F*. For + example, here's the ``option`` type defined in F* using an + OCaml-like notation. + + .. code-block:: fstar + + type option a = + | None + | Some of a + + This is equivalent to + + .. code-block:: fstar + + type option a = + | None : option a + | Some : a -> option a + +Getting an element from a vector +................................ + +With our length-indexed ``vec`` type, one can write functions with +types that make use of the length information to ensure that they are +well-defined. For example, to get the ``i`` th element of a vector, one +can write: + +.. literalinclude:: ../code/Vec.fst + :language: fstar + :start-after: //SNIPPET_START: long_get + :end-before: //SNIPPET_END: long_get + +The type of ``get i v`` says that ``i`` must be less than ``n``, where +``n`` is the length of ``v``, i.e., that ``i`` is within the bounds of the +vector, which is enough to prove that ``get i v`` can always return an +element of type ``a``. Let's look a bit more closely at how this +function is typechecked by F*. + +The first key bit is pattern matching on ``v``: + +.. code-block:: fstar + + match v with + | Nil -> false_elim() + | Cons hd tl -> + +In case ``v`` is ``Nil``, we use the library function +``Prims.false_elim : squash False -> a`` to express that this case is +impossible. Intuitively, since the index ``i`` is a natural number +strictly less than the length of the list, we should be able to +convince F* that ``n <> 0``. + +The way this works is that F* typechecks the branch in a context that +includes an *equation*, namely that ``v : vec a n`` equals the +pattern ``Nil : vec a 0``. With the assumption that ``v == Nil`` in +the context, F* tries to check that ``false_elim`` is well-typed, +which in turn requires ``() : squash False``.
This produces a proof +obligation sent to the SMT solver, which is able to prove ``False`` in +this case, since from ``v = Nil`` we must have that ``n = 0``, which +contradicts ``i < n``. Put another way, the branch where ``v = Nil`` +is unreachable given the precondition ``i < n``. + +.. note:: + + When a branch is unreachable, F* allows you to just omit the branch + altogether, rather than writing it and explicitly calling + ``false_elim``. For example, it is more common to write: + + .. literalinclude:: ../code/Vec.fst + :language: fstar + :start-after: //SNIPPET_START: get + :end-before: //SNIPPET_END: get + + where ``let Cons hd tl = v`` pattern matches ``v`` against just + ``Cons hd tl``. F* automatically proves that the other cases of + the match are unreachable. + +Now, turning to the second case, we have a pattern like this: + +.. code-block:: fstar + + match v with + | Cons hd tl -> + +But, recall that ``Cons`` has an implicit first argument describing +the length of ``tl``. So, more explicitly, our pattern is of the form +below, where ``tl : vec a m``. + +.. code-block:: fstar + + match v with + | Cons #m hd tl -> + +F* typechecks the branch in a context that includes the equation +``v == Cons #m hd tl``, which lets the solver conclude that ``n == m + +1``, from the type of ``Cons``. + +If ``i=0``, we've found the element we want and return it. + +Otherwise, we make a recursive call ``get (i - 1) tl`` and now F* has +to: + + * Instantiate the implicit argument of ``get`` to ``m``, the length + of ``tl``. That is, in explicit form, this recursive call is + really ``get #m (i - 1) tl``. F* does this by relying on a + unification algorithm implemented as part of its type inference + procedure. + + * Prove that ``(i - 1) < m``, which follows from ``i < n`` and ``n + == m + 1``. + + * Prove that the recursive call terminates, by proving that ``m << + n``, or, equivalently, since ``m`` and ``n`` are natural numbers, + ``m < n``.
This is easy, since we have ``n == m + 1``. + +Let's try a few exercises. The main work is to find a type for the +functions in question. Once you do, the rest of the code will "write +itself". + +Exercises +......... + + +Exercise: Concatenating vectors +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +`Click here <../code/exercises/Part2.Vec.fst>`_ for the exercise file. + +Implement a function to concatenate vectors. It should have the +following signature: + +.. code-block:: fstar + + val append (#a:Type) (#n #m:nat) (v1:vec a n) (v2:vec a m) + : vec a (n + m) + +.. container:: toggle + + .. container:: header + + **Answer** + + .. literalinclude:: ../code/Vec.fst + :language: fstar + :start-after: SNIPPET_START: append + :end-before: SNIPPET_END: append + +-------------------------------------------------------------------------------- + +Exercise: Splitting a vector +^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Write a function called ``split_at`` to split a vector ``v : vec a n`` +at index ``i`` into its ``i`` -length prefix from position ``0`` and a +suffix starting at ``i``. + +.. container:: toggle + + .. container:: header + + **Answer** + + .. literalinclude:: ../code/Vec.fst + :language: fstar + :start-after: SNIPPET_START: split_at + :end-before: SNIPPET_END: split_at + +-------------------------------------------------------------------------------- + +Write a tail-recursive version of ``split_at``. You will need a +``reverse`` function as a helper. + +.. container:: toggle + + .. container:: header + + **Answer** + + .. literalinclude:: ../code/Vec.fst + :language: fstar + :start-after: SNIPPET_START: reverse + :end-before: SNIPPET_END: reverse + + .. literalinclude:: ../code/Vec.fst + :language: fstar + :start-after: SNIPPET_START: split_at_tail + :end-before: SNIPPET_END: split_at_tail + +Bonus: Prove ``split_at`` and ``split_at_tail`` are equivalent. 
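For the bonus, one way to state the goal is sketched below (a hypothetical signature of our own; the exact form depends on the types you chose for ``split_at`` and ``split_at_tail`` in your solutions):

.. code-block:: fstar

   //Hypothetical: assumes both functions take an index i <= n and
   //return the same kind of result, e.g., the pair of an i-length
   //prefix and an (n - i)-length suffix
   val split_at_equiv (#a:Type) (#n:nat) (i:nat{i <= n}) (v:vec a n)
     : Lemma (split_at i v == split_at_tail i v)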
+
+--------------------------------------------------------------------------------
+
+Vectors: Probably not worth it
+..............................
+
+Many texts about dependent types showcase length-indexed vectors, much
+as we've done here. Although useful as a simple illustrative example,
+the ``vec`` type we've seen is probably not what you want to use in
+practice. Especially in F*, where regular lists can easily be used
+with refinement types, length-indexed vectors are redundant, because
+one can simply refine list types using a ``length`` function. The code
+below shows how:
+
+  .. literalinclude:: ../code/LList.fst
+     :language: fstar
+
+In the next few sections, we'll see more useful examples of indexed
+inductive types than mere length-indexed vectors.
diff --git a/doc/book/PoP-in-FStar/book/part2/part2_well_founded.rst b/doc/book/PoP-in-FStar/book/part2/part2_well_founded.rst
new file mode 100644
index 00000000000..bed21fd71f5
--- /dev/null
+++ b/doc/book/PoP-in-FStar/book/part2/part2_well_founded.rst
@@ -0,0 +1,243 @@
+.. _Part2_well_founded_recursion:
+
+Well-founded Relations and Termination
+======================================
+
+In an earlier chapter on :ref:`proofs of termination
+`, we learned how F* checks that recursive
+functions terminate. In this chapter, we see how the termination check
+arises from inductive types and structural recursion. Just as with
+:ref:`equality `, termination checking, a core feature
+of F*'s logic and proof system, finds its foundation in inductive
+types.
+
+For more technical background on this topic, the following resources
+may be useful:
+
+  * `Constructing Recursion Operators in Type Theory, L. Paulson, Journal of Symbolic Computation (1986) 2, 325-355 `_
+
+  * `Modeling General Recursion in Type Theory, A. Bove & V. 
Capretta,
+    Mathematical Structures in Computer Science (2005)
+    `_
+
+Thanks to Aseem Rastogi, Chantal Keller, and Catalin Hritcu for
+providing some of the F* libraries presented in this chapter.
+
+  * `FStar.WellFounded.fst `_
+
+  * `FStar.LexicographicOrdering `_
+
+
+Well-founded Relations and Accessibility Predicates
+---------------------------------------------------
+
+A binary relation :math:`R` on elements of type :math:`T` is
+well-founded if there is no infinite sequence :math:`x_0, x_1, x_2,
+...`, such that :math:`x_{i+1}~R~x_i`, for all :math:`i`.
+
+As explained :ref:`earlier `, when typechecking a
+recursive function ``f``, F* requires the user to provide a *measure*,
+some function of the arguments of ``f``, and checks that on a recursive
+call, the measure of the arguments is related to the measure of the
+formal parameters by a built-in well-founded relation on F* terms. Since
+well-founded relations have no infinite descending chains, every chain
+of recursive calls related by such a relation must eventually
+terminate. However, this built-in well-founded relation, written
+``<<`` or ``precedes``, is a derived notion.
+
+In its most primitive form, the well-foundedness of a relation can be
+expressed in terms of an inductive type ``acc`` (short for
+"accessible") shown below.
+
+.. literalinclude:: ../code/Part2.WellFounded.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: acc$
+   :end-before: //SNIPPET_END: acc$
+
+The type ``acc`` is parameterized by a type ``a``; a binary relation
+``r: a -> a -> Type`` on ``a``; and an element ``x:a`` of that
+type. Informally, the relation ``r y x`` is provable when ``y``
+is "smaller" than ``x``.
+
+The ``acc`` type has just one constructor ``AccIntro``, whose only
+argument is a function of type ``y:a -> r y x -> acc r
+y``. 
Intuitively, this says that in order to build an instance of
+``acc r x0``, you have to provide a function which can build a proof
+of ``acc r x1`` for all ``x1:a`` smaller than ``x0``. The only way to
+build such a function, while avoiding infinite regress, is if every
+chain ``x0, x1, x2, ...``, with each element smaller than the one
+before it, eventually terminates in some ``xn`` such that no element
+is smaller than it according to ``r``.
+
+In other words, if one can prove ``acc r x`` for all ``x:a``, then
+this precisely captures the condition that there are no infinite
+descending ``r``-related chains in ``a``, or that ``r`` is
+well-founded. This is exactly what the definition below says, where
+``is_well_founded`` is a classical (SMT-automatable) variant of
+``well_founded``.
+
+.. literalinclude:: ../code/Part2.WellFounded.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: well_founded$
+   :end-before: //SNIPPET_END: well_founded$
+
+Well-founded Recursion
+----------------------
+
+Given a relation ``r`` and a proof ``p : acc r x``, one can define a
+recursive function on ``x`` whose termination can be established
+purely in terms of structural recursion on the proof ``p``, even
+though the function may not itself be structurally recursive on ``x``.
+
+The combinator ``fix_F`` shown below illustrates this at work:
+
+.. literalinclude:: ../code/Part2.WellFounded.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: fix_F$
+   :end-before: //SNIPPET_END: fix_F$
+
+If ``f`` is a function such that every recursive call in the
+definition of ``f x`` is on an argument ``y``, such that ``y`` is
+smaller than ``x`` according to some relation ``r``; and if starting
+from some argument ``x0``, we have a proof of accessibility ``acc r
+x0`` (i.e., no infinite descending ``r``-chains starting at ``x0``),
+then the fixpoint of ``f`` can be defined by structural recursion on
+the proof ``accessible_x0``. 
+
+  * ``fix_F`` is structurally recursive on ``accessible_x0`` since the
+    recursive call is on an element ``h1 y r_yx``, i.e., a child node
+    of the (possibly infinitely branching) tree rooted at ``AccIntro h1``.
+
+A slightly simpler version of ``fix_F`` is derivable if ``r`` is
+well-founded, i.e., if every element ``x:a`` is accessible with
+respect to ``r``.
+
+.. literalinclude:: ../code/Part2.WellFounded.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: fix$
+   :end-before: //SNIPPET_END: fix$
+
+Some Well-founded Relations
+---------------------------
+
+We show how to build some basic well-founded relations here. For
+starters, since F* already internalizes the ``<`` ordering on
+natural numbers as part of its termination check, it is easy to prove
+that ``<`` is well-founded.
+
+.. literalinclude:: ../code/Part2.WellFounded.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: lt_nat$
+   :end-before: //SNIPPET_END: lt_nat$
+
+We can also define combinators to derive well-founded relations from
+other well-founded relations. For example, if a relation ``sub_r`` is
+a *sub-relation* of a well-founded relation ``r`` (meaning we have ``r
+x y`` whenever we have ``sub_r x y``), then ``sub_r`` is well-founded
+too.
+
+.. literalinclude:: ../code/Part2.WellFounded.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: subrel_wf$
+   :end-before: //SNIPPET_END: subrel_wf$
+
+Another useful combinator derives the well-foundedness of a relation
+``r_a: binrel a`` if it can be defined as the inverse image under some
+function ``f: a -> b`` of some other well-founded relation ``r_b:
+binrel b``.
+
+.. literalinclude:: ../code/Part2.WellFounded.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: inverse_image$
+   :end-before: //SNIPPET_END: inverse_image$
+
+For example, the ``>`` ordering on negative numbers can be proven
+well-founded by defining it as the inverse image of the ``<`` ordering
+on natural numbers.
+
+.. 
literalinclude:: ../code/Part2.WellFounded.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: inverse_image_neg$
+   :end-before: //SNIPPET_END: inverse_image_neg$
+
+Termination Checking with Custom Well-founded Relations
+-------------------------------------------------------
+
+In the F* library ``FStar.LexicographicOrdering``, several other
+relations are proven to be well-founded, including the lexicographic
+ordering on dependent pairs.
+
+.. literalinclude:: ../code/Part2.WellFounded.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: lexicographic_order$
+   :end-before: //SNIPPET_END: lexicographic_order$
+
+This order is defined as a ``binrel (x:a & b x)`` and is parameterized
+by a binary relation (``r_a``) on ``a`` and a family of relations
+(``r_b``) on ``b x``, one for each ``x:a``. It has two cases:
+
+  * ``Left_lex``: The first component of the pair decreases by
+    ``r_a``, and the second component is irrelevant.
+
+  * ``Right_lex``: The first component of the pair is invariant, but
+    the second component decreases by ``r_b``.
+
+The proof is a little involved (see
+``FStar.LexicographicOrdering.fst``), but one can prove that this
+ordering is well-founded when ``r_a`` and ``r_b`` are themselves
+well-founded, i.e.,
+
+.. literalinclude:: ../code/Part2.WellFounded.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: lexicographic_order_wf$
+   :end-before: //SNIPPET_END: lexicographic_order_wf$
+
+With this well-foundedness proof in hand, we can define recursive
+functions with our own well-founded orders.
+
+To illustrate, let's define the ``ackermann`` function again (we saw
+it first :ref:`here `), this time using
+accessibility and well-founded relations, rather than the built-in
+``precedes`` relation.
+
+.. 
literalinclude:: ../code/Part2.WellFounded.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: ackermann_manual$
+   :end-before: //SNIPPET_END: ackermann_manual$
+
+This version is much more verbose than the ``ackermann`` we saw
+earlier---but it demonstrates that F*'s built-in support for
+lexicographic orders over the ``precedes`` relation is semantically
+justified by a more primitive model of well-founded relations.
+
+To make user-defined well-founded orderings easier to work with, F*
+provides a variant of the ``decreases`` clause for use with
+well-founded relations. For example, one can use the following syntax
+to benefit from F*'s built-in SMT automation and termination
+checking, with the expressiveness of using one's own well-founded
+relation.
+
+.. literalinclude:: ../code/Part2.WellFounded.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: ackermann_wf$
+   :end-before: //SNIPPET_END: ackermann_wf$
+
+To explain the syntax:
+
+  * ``decreases {:well-founded p x}``: Here, ``p`` is meant to be an
+    instance of
+    ``FStar.LexicographicOrdering.well_founded_relation``, applied to
+    some term ``x`` built from the formal parameters in scope.
+
+  * In this case, we use the combinator ``L.lex`` to build a
+    lexicographic ordering from ``wf_lt_nat`` (coercing it, using a
+    utility ``coerce_wf``, to turn the definitions used in this
+    tutorial chapter into the types expected by the
+    ``FStar.LexicographicOrdering`` library).
+
+We show the coercions below for completeness, though one would not
+necessarily need them outside the context of this tutorial.
+
+.. literalinclude:: ../code/Part2.WellFounded.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: coercions$
+   :end-before: //SNIPPET_END: coercions$
diff --git a/doc/book/PoP-in-FStar/book/part3/part3.rst b/doc/book/PoP-in-FStar/book/part3/part3.rst
new file mode 100644
index 00000000000..4cc349eaa3c
--- /dev/null
+++ b/doc/book/PoP-in-FStar/book/part3/part3.rst
@@ -0,0 +1,58 @@
+.. 
_Part3:
+
+##########################################
+Modularity With Interfaces and Typeclasses
+##########################################
+
+In this section, we'll learn about two abstraction techniques used to
+structure larger F* developments: *interfaces* and *typeclasses*.
+
+**Interfaces**: An F* module ``M`` (in a file ``M.fst``) can be paired
+with an interface (in a file ``M.fsti``). A module's interface is a
+subset of the module's declarations and definitions. Another module
+``Client`` that uses ``M`` can only make use of the part of ``M``
+revealed in its interface---the rest of ``M`` is invisible to
+``Client``. As such, interfaces provide an abstraction mechanism,
+enabling the development of ``Client`` to be independent of any
+interface-respecting implementation of ``M``.
+
+Unlike module systems in other ML-like languages (which provide more
+advanced features like signatures, functors, and first-class modules),
+F*'s module system is relatively simple.
+
+* A module can have at most one interface.
+
+* An interface can have at most one implementation.
+
+* A module lacking an interface reveals all its details to clients.
+
+* An interface lacking an implementation can be seen as an assumption or an axiom.
+
+**Typeclasses**: Typeclasses cater to more complex abstraction
+patterns, e.g., where an interface may have several
+implementations. Many other languages, ranging from Haskell to Rust,
+support typeclasses that are similar in spirit to what F*
+provides.
+
+Typeclasses in F* are actually defined mostly by a "user-space"
+metaprogram (relying on general support for metaprogramming in `Meta-F*
+`_), making them very
+flexible (e.g., multi-parameter classes, overlapping instances,
+etc. are easily supported).
+
+That said, typeclasses are a relatively recent addition to the
+language and most of F*'s standard library does not yet use
+typeclasses. 
As such, they are somewhat less mature than interfaces,
+and some features (e.g., typeclass inheritance) require encodings
+rather than being supported with built-in syntax.
+
+Thanks especially to Guido Martinez, who designed and implemented
+most of F*'s typeclass system.
+
+.. toctree::
+   :maxdepth: 1
+   :caption: Contents:
+
+   part3_interfaces
+   part3_typeclasses
+   part3_alacarte
diff --git a/doc/book/PoP-in-FStar/book/part3/part3_alacarte.rst b/doc/book/PoP-in-FStar/book/part3/part3_alacarte.rst
new file mode 100644
index 00000000000..b350bb10f05
--- /dev/null
+++ b/doc/book/PoP-in-FStar/book/part3/part3_alacarte.rst
@@ -0,0 +1,482 @@
+.. _Part3_alacarte:
+
+Fun with Typeclasses: Datatypes a la Carte
+===========================================
+
+In a classic 1998 post, Phil Wadler describes a difficulty in language and
+library design: how to modularly extend a data type together with the
+operations on that type. Wadler calls this the `Expression Problem
+`_,
+saying:
+
+  The Expression Problem is a new name for an old problem. The goal is to
+  define a datatype by cases, where one can add new cases to the datatype and
+  new functions over the datatype, without recompiling existing code, and while
+  retaining static type safety (e.g., no casts).
+
+There are many solutions to the Expression Problem, though a particularly
+elegant one is Wouter Swierstra's `Data Types a la Carte
+`_.
+Swierstra's paper is a beautiful functional pearl and is highly
+recommended---it's probably useful background to have before diving into this
+chapter, though we'll try to explain everything here as we go. His solution is
+a great illustration of extensibility with typeclasses, so we show how to apply
+his approach using typeclasses in F*. More than anything, it's a really fun
+example to work out.
+
+Swierstra's paper uses Haskell, so he does not prove his functions terminating.
+One could do this in F* too, using the effect of :ref:`divergence `. 
+However, in this chapter, we show how to make it all work with total functions
+and strictly positive inductive definitions. As a bonus, we also show how to do
+proofs of correctness of the various programs that Swierstra develops.
+
+Getting Started
+---------------
+
+To set the stage, consider the following simple type of arithmetic expressions
+and a function ``evaluate`` to evaluate an expression to an integer:
+
+.. literalinclude:: ../code/Part3.DataTypesALaCarte.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: exp$
+   :end-before: //SNIPPET_END: exp$
+
+This is straightforward to define, but it has an extensibility problem.
+
+If one wanted to add another kind of expression, say ``Mul : exp -> exp ->
+exp``, then one would need both to redefine the type ``exp``, adding the new
+case, and to redefine ``evaluate`` to handle that case.
+
+A solution to the Expression Problem would allow one to add cases to the ``exp``
+type and to progressively define functions to handle each case separately.
+
+Swierstra's idea is to define a single generic data type that is parameterized
+by a type constructor, allowing one to express, in general, a tree of finite
+depth, but one whose branching structure and payload are left generic. A first
+attempt at such a definition in F* is shown below:
+
+.. literalinclude:: ../code/Part3.DataTypesALaCarte.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: expr-fail$
+   :end-before: //SNIPPET_END: expr-fail$
+
+Unfortunately, this definition is not accepted by F*, because it is not
+necessarily well-founded. As we saw in a previous section on :ref:`strictly
+positive definitions `, if we're not
+careful, such definitions can allow one to prove ``False``. In particular, we
+need to constrain the type constructor argument ``f`` to be *strictly positive*,
+like so:
+
+.. 
literalinclude:: ../code/Part3.DataTypesALaCarte.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: expr$
+   :end-before: //SNIPPET_END: expr$
+
+This definition may bend your mind a little at first, but it's actually quite
+simple. It may help to consider an example: the type ``expr list`` has values of
+the form ``In of list (expr list)``, i.e., trees of arbitrary depth with a
+variable branching factor, such as in the example shown below.
+
+.. literalinclude:: ../code/Part3.DataTypesALaCarte.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: elist$
+   :end-before: //SNIPPET_END: elist$
+
+Now, given two type constructors ``f`` and ``g``, one can take their *sum* or
+coproduct. This is analogous to the ``either`` type we saw in :ref:`Part 1
+`, but at the level of type constructors rather than types: we write
+``f ++ g`` for ``coprod f g``.
+
+.. literalinclude:: ../code/Part3.DataTypesALaCarte.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: coprod$
+   :end-before: //SNIPPET_END: coprod$
+
+Now, with these abstractions in place, we can define the following, where ``expr
+(value ++ add)`` is isomorphic to the ``exp`` type we started with. Notice that
+we've now defined the cases of our type of arithmetic expressions independently
+and can compose the cases with ``++``.
+
+.. literalinclude:: ../code/Part3.DataTypesALaCarte.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: value add$
+   :end-before: //SNIPPET_END: value add$
+
+Of course, building a value of type ``expr (value ++ add)`` directly is utterly
+horrible: but we'll see next how to make that better using typeclasses.
+
+Smart Constructors with Injections and Projections
+--------------------------------------------------
+
+A data constructor, e.g., ``Inl : a -> either a b``, is an injective function
+from ``a`` to ``either a b``, i.e., each element ``x:a`` is mapped to a unique
+element ``Inl x : either a b``. 
One can also project that ``a`` back out of an
+``either a b``, though this is only a partial function. Abstracting injections
+and projections will give us a generic way to construct values in our extensible
+type of expressions.
+
+First, we define some abbreviations:
+
+.. literalinclude:: ../code/Part3.DataTypesALaCarte.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: inj_proj$
+   :end-before: //SNIPPET_END: inj_proj$
+
+A type constructor ``f`` is less than or equal to another constructor ``g`` if
+there is an injection from ``f a`` to ``g a``. This notion is captured by the
+typeclass below: we have an ``inj`` and a ``proj``, where ``proj`` is an inverse
+of ``inj``, and ``inj`` is a partial inverse of ``proj``.
+
+.. literalinclude:: ../code/Part3.DataTypesALaCarte.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: leq$
+   :end-before: //SNIPPET_END: leq$
+
+We can now define some instances of ``leq``. First, of course, ``leq`` is
+reflexive, and F* can easily prove the inversion lemma with SMT.
+
+.. literalinclude:: ../code/Part3.DataTypesALaCarte.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: leq_refl$
+   :end-before: //SNIPPET_END: leq_refl$
+
+More interestingly, we can prove that ``f`` is less than or equal to the
+extension of ``f`` with ``g`` on the left:
+
+.. literalinclude:: ../code/Part3.DataTypesALaCarte.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: leq_ext_left$
+   :end-before: //SNIPPET_END: leq_ext_left$
+
+We could also prove the analogous ``leq_ext_right``, but we will deliberately
+not give an instance for it since, as we'll see shortly, the instances we give
+are specifically chosen to allow type inference to work well. Additional
+instances would lead to ambiguities and confuse the inference algorithm. 
+ +Instead, we will give a slightly more general form, including a congruence rule +that says that if ``f`` is less than or equal to ``h``, then ``f`` is also less +than or equal to the extension of ``h`` with ``g`` on the right. + +.. literalinclude:: ../code/Part3.DataTypesALaCarte.fst + :language: fstar + :start-after: //SNIPPET_START: leq_cong_right$ + :end-before: //SNIPPET_END: leq_cong_right$ + +Now, for any pair of type constructors that satisfy ``leq f g``, we can lift the +associated injections and projections to our extensible expression datatype and +prove the round-trip lemmas + +.. literalinclude:: ../code/Part3.DataTypesALaCarte.fst + :language: fstar + :start-after: //SNIPPET_START: inject project$ + :end-before: //SNIPPET_END: inject project$ + +Now, with this machinery in place, we get to the fun part. For each of the cases +of the ``expr`` type, we can define a generic smart constructor, allowing one to +lift it to any type more general than the case we're defining. + +For instance, the smart constructor ``v`` lifts the constructor ``Val x`` into +the type ``expr f`` for any type greater than or equal to ``value``. Likewise, +``(+^)`` lifts ``Add x y`` into any type greater than or equal to ``add``. + +.. literalinclude:: ../code/Part3.DataTypesALaCarte.fst + :language: fstar + :start-after: //SNIPPET_START: v and +^$ + :end-before: //SNIPPET_END: v and +^$ + +And now we can write our example value so much more nicely than before: + +.. 
literalinclude:: ../code/Part3.DataTypesALaCarte.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: ex1$
+   :end-before: //SNIPPET_END: ex1$
+
+The type annotation ``ex1 : expr (value ++ add)`` is crucial: it allows the
+type inference algorithm to instantiate the generic parameter ``f`` in each
+``v`` and in ``(+^)`` to ``(value ++ add)``; the search for typeclass
+instances then finds ``value `leq` (value ++ add)`` by using ``leq_cong_right``
+and ``leq_refl``, and ``add `leq` (value ++ add)`` using ``leq_ext_left``.
+
+With this setup, extensibility works out smoothly: we can add a multiplication
+case, define a smart constructor for it, and easily use it to build expressions
+with values, addition, and multiplication.
+
+.. literalinclude:: ../code/Part3.DataTypesALaCarte.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: mul$
+   :end-before: //SNIPPET_END: mul$
+
+Evaluating Expressions
+----------------------
+
+Now that we have a way to construct expressions, let's see how to define an
+interpreter for expressions in an extensible way. An interpreter involves
+traversing the expression tree, applying operations to an accumulated
+result, and returning the final accumulated value. In other words, we need a way
+to *fold* over an expression tree, but to do so in an extensible, generic way.
+
+The path to doing that involves defining a notion of a functor: we saw functors
+briefly in a :ref:`previous section `, and maybe you're
+already familiar with them from Haskell.
+
+Our definition of functor below is slightly different from what one might
+normally see. Usually, a type constructor ``t`` is a functor if it supports an
+operation ``fmap: (a -> b) -> t a -> t b``. 
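+
+For reference, that usual presentation, transcribed into F* typeclass syntax,
+would be something like the sketch below (``usual_functor`` is a hypothetical
+name, not part of this development):
+
+.. code-block:: fstar
+
+   class usual_functor (t:Type -> Type) = {
+     fmap : (#a:Type -> #b:Type -> (a -> b) -> t a -> t b)
+   }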
In our definition below, we flip the
+order of arguments and require ``fmap x f`` to guarantee that it calls ``f``
+only on subterms of ``x``---this will allow us to build functors over
+inductively defined datatypes in an extensible way, while still proving that all
+our functions terminate.
+
+.. literalinclude:: ../code/Part3.DataTypesALaCarte.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: functor$
+   :end-before: //SNIPPET_END: functor$
+
+Functor instances for ``value``, ``add``, and ``mul`` are easy to define:
+
+.. literalinclude:: ../code/Part3.DataTypesALaCarte.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: functor_value$
+   :end-before: //SNIPPET_END: functor_value$
+
+.. literalinclude:: ../code/Part3.DataTypesALaCarte.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: functor_add$
+   :end-before: //SNIPPET_END: functor_add$
+
+.. literalinclude:: ../code/Part3.DataTypesALaCarte.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: functor_mul$
+   :end-before: //SNIPPET_END: functor_mul$
+
+Maybe more interesting is a functor instance for co-products, or sums of
+functors, i.e., if ``f`` and ``g`` are both functors, then so is ``f ++ g``.
+
+.. literalinclude:: ../code/Part3.DataTypesALaCarte.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: functor_coprod$
+   :end-before: //SNIPPET_END: functor_coprod$
+
+With this in place, we can finally define a generic way to fold over an
+expression. Given a function ``alg`` to map an ``f a`` to a result value ``a``,
+``fold_expr`` traverses an ``expr f``, accumulating the results of ``alg``
+applied to each node in the tree. Here we see why it was important to refine the
+type of ``fmap`` with the precondition ``x << t``: the recursive call to
+``fold_expr`` terminates only because the argument ``x`` is guaranteed to
+precede ``t`` in F*'s built-in well-founded order.
+
+.. 
literalinclude:: ../code/Part3.DataTypesALaCarte.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: fold_expr$
+   :end-before: //SNIPPET_END: fold_expr$
+
+Now that we have a general way to fold over our expression trees, we need an
+extensible way to define the evaluators for each type of node in a tree. For
+that, we can define another typeclass, ``eval f``, for an evaluator for nodes of
+type ``f``. It's easy to give instances of ``eval`` for our three types of
+nodes, separately from each other.
+
+.. literalinclude:: ../code/Part3.DataTypesALaCarte.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: eval$
+   :end-before: //SNIPPET_END: eval$
+
+With evaluators for ``f`` and ``g``, one can build an evaluator for ``f++g``.
+
+.. literalinclude:: ../code/Part3.DataTypesALaCarte.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: eval_coprod$
+   :end-before: //SNIPPET_END: eval_coprod$
+
+Finally, we can build a generic evaluator for expressions:
+
+.. literalinclude:: ../code/Part3.DataTypesALaCarte.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: eval_expr$
+   :end-before: //SNIPPET_END: eval_expr$
+
+And, hooray, it works! We can ask F* to normalize and check that the result
+matches what we expect:
+
+.. literalinclude:: ../code/Part3.DataTypesALaCarte.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: eval_test$
+   :end-before: //SNIPPET_END: eval_test$
+
+Provably Correct Optimizations
+------------------------------
+
+Now, let's say we wanted to optimize our expressions, rewriting them by
+appealing to the usual arithmetic rules, e.g., distributing multiplication over
+addition. Swierstra shows how to do that, but in Haskell, there aren't any
+proofs of correctness. In F*, however, we can prove our expression rewrite rules
+correct, in the sense that they preserve the semantics of expression evaluation.
+
+Let's start by defining the type of a rewrite rule and what it means for it to
+be sound:
+
+.. 
literalinclude:: ../code/Part3.DataTypesALaCarte.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: rewrite_rule$
+   :end-before: //SNIPPET_END: rewrite_rule$
+
+A rewrite rule may fail, but if it rewrites ``x`` to ``y``, then both ``x`` and
+``y`` must evaluate to the same result. We can package up a rewrite rule and its
+soundness proof into a record, ``rewrite_t``.
+
+Now, to define some rewrite rules, it's convenient to have a bit of syntax to
+handle potential rewrite failures---we'll use the monadic syntax :ref:`shown
+previously `.
+
+.. literalinclude:: ../code/Part3.DataTypesALaCarte.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: error_monad$
+   :end-before: //SNIPPET_END: error_monad$
+
+Next, in order to define our rewrite rules for each case, we define what we
+expect to be true of the expression evaluator for an expression tree that has
+that case.
+
+For instance, if we're evaluating an ``Add`` node, then we expect the result to
+be the sum of the results of the two subtrees.
+
+.. literalinclude:: ../code/Part3.DataTypesALaCarte.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: expected_semantics$
+   :end-before: //SNIPPET_END: expected_semantics$
+
+We can now define two example rewrite rules. The first rewrites ``(a * (c +
+d))`` to ``(a * c + a * d)``; and the second rewrites ``(c + d) * b`` to ``(c *
+b + d * b)``. Both of these are easily proven sound for any type of expression
+tree whose nodes ``f`` include ``add`` and ``mul``, under the hypothesis that
+the evaluator behaves as expected.
+
+We can generically compose rewrite rules:
+
+.. literalinclude:: ../code/Part3.DataTypesALaCarte.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: compose_rewrites$
+   :end-before: //SNIPPET_END: compose_rewrites$
+
+Then, given any rewrite rule ``l``, we can fold over the expression, applying
+the rewrite rule bottom-up whenever it is applicable.
+
+.. 
literalinclude:: ../code/Part3.DataTypesALaCarte.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: rewrite_expr$
+   :end-before: //SNIPPET_END: rewrite_expr$
+
+As with our evaluator, we can test that it works, by asking F* to evaluate the
+rewrite rules on an example. We first define ``rewrite_distr`` to apply both
+distributivity rewrite rules, and then assert that rewriting ``ex6`` produces
+``ex6'``.
+
+.. literalinclude:: ../code/Part3.DataTypesALaCarte.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: rewrite_test$
+   :end-before: //SNIPPET_END: rewrite_test$
+
+Of course, more than just testing it, we can prove that it is correct. In fact,
+we can prove that applying any rewrite rule over an entire expression tree
+preserves its semantics.
+
+.. literalinclude:: ../code/Part3.DataTypesALaCarte.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: rewrite_soundness$
+   :end-before: //SNIPPET_END: rewrite_soundness$
+
+This is the one part of this development where the definition is not completely
+generic in the type of expression nodes. Instead, this is a proof for the
+specific case of expressions that contain values, additions, and
+multiplications. I haven't found a way to make this more generic. One would
+likely need to define a generic induction principle similar in structure to
+``fold_expr``---but that's for another day's head scratching. If you know an
+easy way, please let me know!
+
+That said, the proof is quite straightforward and pleasant: we simply match on
+the cases, use the induction hypothesis on the subtrees, if any, and then apply
+the soundness lemma of the rewrite rule. F* and Z3 automate much of the
+reasoning, e.g., in the last case, we know we must have a ``Mul`` node, since
+we've already matched the other two cases.
+
+Of course, since rewriting is sound for any rule, it is also sound for rewriting
+with our distributivity rules.
+
+.. 
literalinclude:: ../code/Part3.DataTypesALaCarte.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: rewrite_distr_soundness$
+   :end-before: //SNIPPET_END: rewrite_distr_soundness$
+
+Exercises
+---------
+
+This `file <../code/exercises/Part3.DataTypesALaCarte.fst>`_ provides
+the definitions you need.
+
+Exercise 1
+++++++++++
+
+Write a function ``to_string_specific`` whose type is ``expr (value ++ add ++
+mul) -> string`` to print an expression as a string.
+
+.. container:: toggle
+
+   .. container:: header
+
+      **Answer**
+
+   .. literalinclude:: ../code/Part3.DataTypesALaCarte.fst
+      :language: fstar
+      :start-after: //SNIPPET_START: functor$
+      :end-before: //SNIPPET_END: functor$
+
+Exercise 2
+++++++++++
+
+Next, write a class ``render f`` with a ``to_string`` function to generically
+print any expression of type ``expr f``.
+
+.. container:: toggle
+
+   .. container:: header
+
+      **Answer**
+
+   .. literalinclude:: ../code/Part3.DataTypesALaCarte.fst
+      :language: fstar
+      :start-after: //SNIPPET_START: to_string$
+      :end-before: //SNIPPET_END: to_string$
+
+Exercise 3
+++++++++++
+
+Write a function ``lift`` with the following signature:
+
+.. code-block:: fstar
+
+   let lift #f #g
+     {| ff: functor f |}
+     {| fg: leq f g |}
+     (x: expr f)
+     : expr g
+
+Use it to reuse an expression defined at one type at another, so that the
+assertion below succeeds:
+
+.. code-block:: fstar
+
+   let ex3 : expr (value ++ add ++ mul) = lift addExample *^ v 2
+
+   [@@expect_failure]
+   let test_e3 = assert_norm (eval_expr ex3 == (1337 * 2))
+
+.. container:: toggle
+
+   .. container:: header
+
+      **Answer**
+
+   .. 
literalinclude:: ../code/Part3.DataTypesALaCarte.fst + :language: fstar + :start-after: //SNIPPET_START: lift$ + :end-before: //SNIPPET_END: lift$ + diff --git a/doc/book/PoP-in-FStar/book/part3/part3_interfaces.rst b/doc/book/PoP-in-FStar/book/part3/part3_interfaces.rst new file mode 100644 index 00000000000..0f2eeee29de --- /dev/null +++ b/doc/book/PoP-in-FStar/book/part3/part3_interfaces.rst @@ -0,0 +1,279 @@ +.. _Part3_interfaces: + +Interfaces +========== + +Look through the F* standard library (in the ``ulib`` folder) and +you'll find many files with ``.fsti`` extensions. Each of these is an +interface file that pairs with a module implementation in a +corresponding ``.fst`` file. + +An interface (``.fsti``) is very similar to a module implementation +(``.fst``): it can contain all the elements that a module can, +including inductive ``type`` definitions; ``let`` and ``let rec`` +definitions; ``val`` declarations; etc. However, unlike module +implementations, an interface is allowed to declare a symbol ``val f : +t`` without any corresponding definition of ``f``. This makes ``f`` +abstract to the rest of the interface and all client modules, i.e., +``f`` is simply assumed to have type ``t`` without any definition. The +definition of ``f`` is provided in the ``.fst`` file and checked to +have type ``t``, ensuring that a client module's assumption of ``f:t`` +is backed by a suitable definition. + +To see how interfaces work, we'll look at the design of the **bounded +integer** modules ``FStar.UInt32``, ``FStar.UInt64``, and the like, +building our own simplified versions by way of illustration. + +.. _Machine_integers: + +Bounded Integers +^^^^^^^^^^^^^^^^ + +The F* primitive type ``int`` is an unbounded mathematical +integer. When compiling a program to, say, OCaml, ``int`` is compiled +to a "big integer", implemented by OCaml's `ZArith package +`_. 
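+ +For example, the following small sketch (``factorial`` is not part of the chapter's code) typechecks and evaluates exactly, since ``int`` and ``nat`` never overflow: + +.. code-block:: fstar + +   let rec factorial (n:nat) : nat = +     if n = 0 then 1 else n * factorial (n - 1) + +   //21! exceeds 2^64, yet it is computed exactly on unbounded integers +   let _ = assert_norm (factorial 21 = 51090942171709440000) + +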
However, the ``int`` type +can be inefficient, and in some scenarios (e.g., when compiling F* +to C) one may want to work with bounded integers that can always be +represented in a machine word. ``FStar.UInt32.t`` and ``FStar.UInt64.t`` +are types from the standard library whose values can always be +represented as 32- and 64-bit unsigned integers, respectively. + +Arithmetic operations on 32-bit unsigned integers (like addition) are +interpreted modulo ``2^32``. However, for many applications, one wants +to program in a discipline that ensures that there is no unintentional +arithmetic overflow, i.e., we'd like to use bounded integers for +efficiency, and by proving that their operations don't overflow we can +reason about bounded integer terms without using modular arithmetic. + +.. note:: + + Although we don't discuss them here, F*'s libraries also provide + signed integer types that can be compiled to the corresponding + signed integers in C. Avoiding overflow on signed integer + arithmetic is not just a matter of ease of reasoning, since signed + integer overflow is undefined behavior in C. + +Interface: UInt32.fsti +---------------------- + +The interface ``UInt32`` begins, like any module, with the +module's name. Although this could be inferred from the name of the file +(``UInt32.fsti``, in this case), F* requires the name to be explicit. + +.. literalinclude:: ../code/UInt32.fsti + :language: fstar + :start-after: //SNIPPET_START: t$ + :end-before: //SNIPPET_END: t$ + +``UInt32`` provides one abstract type ``val t : eqtype``, the type of +our bounded integer. Its type says that it supports decidable +equality, but no definition of ``t`` is revealed in the interface. + +The operations on ``t`` are specified in terms of a logical model that +relates ``t`` to bounded mathematical integers, in particular +``u32_nat``, a natural number less than ``pow2 32``. + +.. 
literalinclude:: ../code/UInt32.fsti + :language: fstar + :start-after: //SNIPPET_START: bounds$ + :end-before: //SNIPPET_END: bounds$ + +.. note:: + + Unlike interfaces in languages like OCaml, interfaces in F* *can* + include ``let`` and ``let rec`` definitions. As we see in + ``UInt32``, these definitions are often useful for giving precise + specifications to the other operations whose signatures appear in + the interface. + +To relate our abstract type ``t`` to ``u32_nat``, the interface +provides two coercions ``v`` and ``u`` that go back and forth between +``t`` and ``u32_nat``. The lemma signatures ``vu_inv`` and +``uv_inv`` require ``v`` and ``u`` to be mutually inverse, meaning +that ``t`` and ``u32_nat`` are in 1-1 correspondence. + +.. literalinclude:: ../code/UInt32.fsti + :language: fstar + :start-after: //SNIPPET_START: vu$ + :end-before: //SNIPPET_END: vu$ + +Modular addition and subtraction +................................ + +Addition and subtraction on ``t`` values are defined modulo ``pow2 +32``. This is specified by the signatures of ``add_mod`` and +``sub_mod`` below. + +.. literalinclude:: ../code/UInt32.fsti + :language: fstar + :start-after: //SNIPPET_START: add_mod$ + :end-before: //SNIPPET_END: add_mod$ + +.. literalinclude:: ../code/UInt32.fsti + :language: fstar + :start-after: //SNIPPET_START: sub_mod$ + :end-before: //SNIPPET_END: sub_mod$ + +Bounds-checked addition and subtraction +....................................... + +Although precise, the types of ``add_mod`` and ``sub_mod`` aren't +always easy to reason with. For example, proving that ``add_mod (u 2) +(u 3) == u 5`` requires reasoning about modular arithmetic---for +constants like ``2``, ``3``, and ``5`` this is easy enough, but proofs +about modular arithmetic over symbolic values will, in general, involve +reasoning about non-linear arithmetic, which is difficult to automate +even with an SMT solver. 
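+ +For instance, in hypothetical client code against this interface (depending on how the lemmas are stated, explicit calls to ``vu_inv`` may also be needed), the constant case is within the solver's reach: + +.. code-block:: fstar + +   //reduces to the modular fact (2 + 3) % pow2 32 == 5, +   //easy for constants, hard for symbolic values +   let _ = assert (v (add_mod (u 2) (u 3)) == 5) + +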
Besides, in many safety-critical software +systems, one often prefers to avoid integer overflow altogether. + +So, the ``UInt32`` interface also provides two additional operations, +``add`` and ``sub``, whose specification enables two ``t`` values to be added +(resp. subtracted) only if there is no overflow (or underflow). + +First, we define an auxiliary predicate ``fits`` to state when an +operation does not overflow or underflow. + +.. literalinclude:: ../code/UInt32.fsti + :language: fstar + :start-after: //SNIPPET_START: fits$ + :end-before: //SNIPPET_END: fits$ + +Then, we use ``fits`` to restrict the domains of ``add`` and ``sub``; +their types ensure that the result is the sum (resp. difference) of +the arguments, without the need for any modular arithmetic. + +.. literalinclude:: ../code/UInt32.fsti + :language: fstar + :start-after: //SNIPPET_START: add$ + :end-before: //SNIPPET_END: add$ + +.. literalinclude:: ../code/UInt32.fsti + :language: fstar + :start-after: //SNIPPET_START: sub$ + :end-before: //SNIPPET_END: sub$ + +.. note :: + + Although the addition operator can be used as a first-class + function with the notation ``(+)``, the same does not work for + subtraction, since ``(-)`` resolves to unary integer negation, + rather than subtraction---so, we write ``fun x y -> x - y``. + +Comparison +.......... + +Finally, the interface also provides a comparison operator ``lt``, as +specified below: + +.. literalinclude:: ../code/UInt32.fsti + :language: fstar + :start-after: //SNIPPET_START: lt$ + :end-before: //SNIPPET_END: lt$ + +Implementation: UInt32.fst +-------------------------- + +An implementation of ``UInt32`` must provide definitions for +all the ``val`` declarations in the ``UInt32`` interface, starting +with a representation for the abstract type ``t``. + +There are multiple possible representations for ``t``; however, the +point of the interface is to isolate client modules from these +implementation choices. 
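+ +For example, a client like the following hypothetical module verifies against the interface alone, and so is unaffected by whichever representation the ``.fst`` file picks: + +.. code-block:: fstar + +   module Client +   open UInt32 + +   //the refinement on x ensures the fits precondition of add, +   //so the bounds-checked addition verifies +   let double_small (x:t { v x < 100 }) : y:t { v y == v x + v x } = +     add x x + +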
+ +Perhaps the easiest implementation is to represent ``t`` as a +``u32_nat`` itself. This makes proving the correspondence between +``t`` and its logical model almost trivial. + +.. literalinclude:: ../code/UInt32.fst + :language: fstar + +Another choice may be to represent ``t`` as a 32-bit vector. This is a +bit harder, and proving that it is correct with respect to the +interface requires handling some interactions between Z3's theory of +bit vectors and uninterpreted functions, which we handle with a +tactic. This is quite advanced, and we have yet to cover F*'s support +for tactics, but we show the code below for reference. + +.. literalinclude:: ../code/UInt32BV.fst + :language: fstar + :start-after: //SNIPPET_START: UInt32BV$ + :end-before: //SNIPPET_END: UInt32BV$ + +Although both implementations correctly satisfy the ``UInt32`` +interface, F* requires the user to pick one. Unlike module systems in some +other ML-like languages, where interfaces are first-class entities +that many modules can implement, in F* an interface can have at most +one implementation. To support multiple implementations of the same +signature, one must use typeclasses. + + +Interleaving: A Quirk +--------------------- + +The current F* implementation views an interface and its +implementation as two partially implemented halves of a module. When +checking that an implementation correctly implements an +interface, F* attempts to combine the two halves into a complete +module before typechecking it. It does this by trying to *interleave* +the top-level elements of the interface and implementation, preserving +their relative order. + +This implementation strategy is far from optimal in various ways and +is a relic of a time when F*'s implementation did not support separate +compilation. It is likely to change in the +future (see `this issue +`_ for +details). 
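+ +For instance, a hypothetical interface and implementation pair like the one below interleaves successfully, since the definitions appear in the same relative order as the declarations: + +.. code-block:: fstar + +   //A.fsti +   module A +   val f : int -> int +   val g : int -> int + +.. code-block:: fstar + +   //A.fst: f precedes g, matching the order in the interface +   module A +   let f x = x + 1 +   let g x = f (f x) + +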
+ +Meanwhile, the main thing to keep in mind when implementing interfaces +is the following: + + * The order of definitions in an implementation must match the order + of ``val`` declarations in the interface. E.g., if the interface + contains ``val f : tf`` followed by ``val g : tg``, then the + implementation of ``f`` must precede the implementation of ``g``. + +Also, remember that if you are writing ``val`` declarations in an +interface, it is a good idea to be explicit about universe levels. See +:ref:`here for more discussion `. + +Other issues related to interleaving that may help when debugging compiler +errors with interfaces: + + * `Issue 2020 `_ + * `Issue 1770 `_ + * `Issue 959 `_ + +Comparison with machine integers in the F* library +-------------------------------------------------- + +F*'s standard library includes ``FStar.UInt32``, whose interface is +similar to, though more extensive than, the ``UInt32`` shown in this +chapter. For example, ``FStar.UInt32`` also includes multiplication, +division, modulus, bitwise logical operations, etc. + +The implementation of ``FStar.UInt32`` chooses a representation for +``FStar.UInt32.t`` that is similar to ``u32_nat``, though the F* +compiler has special knowledge of this module: it treats +``FStar.UInt32.t`` as a primitive type, compiling it and its +operations to machine integers in a platform-specific way. The +implementation of ``FStar.UInt32`` serves only to prove that its +interface is logically consistent by providing a model in terms of +bounded natural numbers. + +The library also provides several other unsigned machine integer types +in addition to ``FStar.UInt32``, including ``FStar.UInt8``, +``FStar.UInt16``, and ``FStar.UInt64``. F* also has several signed +machine integer types. + +All of these modules are very similar, but not being first-class +entities in the language, there is no way to define a general +interface that is instantiated by all these modules. 
In fact, all +these variants are generated by a script from a common template. + +Although interfaces are well-suited to simple patterns of information +hiding and modular structure, as we'll learn next, typeclasses are +more powerful and enable more generic solutions, though sometimes +requiring the use of higher-order code. diff --git a/doc/book/PoP-in-FStar/book/part3/part3_typeclasses.rst b/doc/book/PoP-in-FStar/book/part3/part3_typeclasses.rst new file mode 100644 index 00000000000..ed6ea0e0547 --- /dev/null +++ b/doc/book/PoP-in-FStar/book/part3/part3_typeclasses.rst @@ -0,0 +1,691 @@ +.. _Part3_typeclasses: + +Typeclasses +=========== + +Consider writing a program using bounded unsigned integers while being +generic in the actual bounded integer type: e.g., a function that sums +a list of bounded integers while checking for overflow, applicable to +both ``UInt32`` and ``UInt64``. Since F*'s interfaces are not +first-class, one can't easily write a program that abstracts over +those interfaces. Typeclasses can help. + +Some background reading on typeclasses: + + * Phil Wadler and Stephen Blott introduced the idea in 1989 in a + paper titled "`How to make ad hoc polymorphism less ad hoc + `_." Their work, with + many extensions over the years, is the basis of Haskell's + typeclasses. + + * A tutorial on typeclasses in the Coq proof assistant is available + `here + `_. + + * Typeclasses are used heavily in the Lean proof assistant to + structure its `math library + `_. + +Printable +--------- + +A typeclass associates a set of *methods* with a tuple of types, +corresponding to the operations that can be performed using those +types. + +For instance, some types may support an operation that enables them to +be printed as strings. A typeclass ``printable (a:Type)`` represents +the class of all types that support a ``to_string : a -> string`` +operation. + +.. 
literalinclude:: ../code/Typeclasses.fst + :language: fstar + :start-after: //SNIPPET_START: printable$ + :end-before: //SNIPPET_END: printable$ + +The keyword ``class`` introduces a new typeclass, defined as a +:ref:`record type ` with each method represented as a +field of the record. + +To define instances of a typeclass, one uses the ``instance`` keyword, +as shown below. + +.. literalinclude:: ../code/Typeclasses.fst + :language: fstar + :start-after: //SNIPPET_START: printable_bool_and_int$ + :end-before: //SNIPPET_END: printable_bool_and_int$ + +The notation ``instance printable_bool : printable bool = e`` states +that the value ``e`` is a record value of type ``printable bool``, and +just as with a ``let``-binding, the term ``e`` is bound to the +top-level name ``printable_bool``. + +The convenience of typeclasses is that, having defined a class, its +methods are automatically overloaded for all instances of the +class, and the type inference algorithm finds a suitable instance to +use. This is the original motivation of typeclasses---to provide a +principled approach to operator overloading. + +For instance, we can now write ``printb`` and ``printi`` and use +``to_string`` to print both booleans and integers, since we have shown that +they are instances of the class ``printable``. + +.. literalinclude:: ../code/Typeclasses.fst + :language: fstar + :start-after: //SNIPPET_START: printb and printi$ + :end-before: //SNIPPET_END: printb and printi$ + +Instances need not be only for base types. For example, all lists are +printable so long as their elements are, and this is captured by what +follows. + +.. literalinclude:: ../code/Typeclasses.fst + :language: fstar + :start-after: //SNIPPET_START: printable_list$ + :end-before: //SNIPPET_END: printable_list$ + +That is, ``printable_list`` constructs a ``to_string`` method of type +``list a -> string`` by mapping the ``to_string`` method of the +``printable a`` instance over the list. 
And now we can use +``to_string`` with lists of booleans and integers too. + +.. literalinclude:: ../code/Typeclasses.fst + :language: fstar + :start-after: //SNIPPET_START: printis and printbs$ + :end-before: //SNIPPET_END: printis and printbs$ + +There's nothing particularly specific about the ground instances +``printable bool`` and ``printable int``. It's possible to write +programs that are polymorphic in printable types. For example, here's +a function ``print_any_list`` that is explicitly parameterized by a +``printable a``---one can call it by passing in the instance that one +wishes to use explicitly: + +.. literalinclude:: ../code/Typeclasses.fst + :language: fstar + :start-after: //SNIPPET_START: print_any_list_explicit$ + :end-before: //SNIPPET_END: print_any_list_explicit$ + +However, we can do better and have the compiler figure out which +instance we intend to use by using a bit of special syntax for a +typeclass parameter, as shown below. + +.. literalinclude:: ../code/Typeclasses.fst + :language: fstar + :start-after: //SNIPPET_START: print_any_list$ + :end-before: //SNIPPET_END: print_any_list$ + +The parameter ``{| _ : printable a |}`` indicates an implicit argument +that, at each call site, is to be computed by the compiler by finding +a suitable typeclass instance derivable from the instances in +scope. In the first example above, F* figures out that the instance +needed is ``printable_list printable_int : printable (list +int)``. Note that you can always pass the typeclass instance you want +explicitly, if you really want to, as the second example ``_ex2`` +above shows. + +In many cases, the implicit typeclass argument need not be named, in +which case one can just omit the name and write: + +.. literalinclude:: ../code/Typeclasses.fst + :language: fstar + :start-after: //SNIPPET_START: print_any_list_alt$ + :end-before: //SNIPPET_END: print_any_list_alt$ + +Under the hood +.............. 
+ +When defining a ``class``, F* automatically generates generic +functions corresponding to the methods of the class. For instance, in +the case of ``printable``, F* generates: + +.. code-block:: fstar + + let to_string #a {| i : printable a |} (x:a) = i.to_string x + +Having this in scope overloads ``to_string`` for all instances of the +``printable`` class. In the implementation of ``to_string``, we take +the instance ``i`` (just a record, sometimes called a dictionary in +the typeclass literature), project its ``to_string`` field, and +apply it to ``x``. + +Defining an ``instance p x1..xn : t = e`` is just +like an ordinary let binding ``let p x1..xn : t = e``; however, the +``instance`` keyword instructs F*'s type inference algorithm to +consider using ``p`` when trying to instantiate implicit arguments +for typeclass instances. + +For example, at the call site ``to_string (x:bool)``, having unified +the implicit type argument ``a`` with ``bool``, what remains is to +find an instance of ``printable bool``. F* looks through the current +context for all variable bindings in the local scope, and ``instance`` +declarations in the top-level scope, for an instance of ``printable +bool``, taking the first one it is able to construct. + +The resolution procedure for ``to_string [[1;2;3]]`` is a bit more +interesting, since we need to find an instance ``printable (list +int)``, although no such ground instance exists. However, the +typeclass resolution procedure finds the ``printable_list`` instance +function, whose result type ``printable (list a)`` matches the goal +``printable (list int)``, provided ``a = int``. The resolution +procedure then spawns a sub-goal ``printable int``, which it solves +easily and completes the derivation of ``printable (list int)``. + +This backwards, goal-directed search for typeclass resolution is a +kind of logic programming. 
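+ +Schematically, the search just described can be written out as the following derivation (a sketch in comments, not code the compiler emits): + +.. code-block:: fstar + +   //goal: printable (list int) +   //  matches printable_list : printable (list a), with a = int +   //  subgoal: printable int -- solved by the ground instance printable_int +   //the resulting dictionary is the one you could also pass by hand, +   //e.g., to the explicitly parameterized function from above: +   //  print_any_list_explicit (printable_list printable_int) [[1;2;3]] + +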
An interesting implementation detail is +that most of the typeclass machinery is defined as a metaprogram in +``FStar.Tactics.Typeclasses``, outside of the core of F*'s +compiler. As such, the behavior of typeclass resolution is entirely +user-customizable, simply by revising the metaprogram in use. Some +details about how this works can be found in a paper on `Meta F* +`_. + +Exercises +......... + +Define instances of ``printable`` for ``string``, ``a & b``, ``option +a``, and ``either a b``. Check that you can write ``to_string [Inl (0, +1); Inr (Inl (Some true)); Inr (Inr "hello") ]`` and have F* infer the +typeclass instance needed. + +Also write the typeclass instances you need explicitly, just to check +that you understand how things work. This exercise should also +convey that typeclasses do not increase the expressive power in any +way---whatever is expressible with typeclasses is also expressible by +explicitly passing records that contain the operations needed on +specific type parameters. However, explicitly passing these operations +can quickly become overwhelming---typeclass inference keeps this +complexity in check and makes it possible to build programs in a +generic, abstract style without too much pain. + +This `exercise file <../code/exercises/Part3.Typeclasses.fst>`_ provides +the definitions you need. + +.. container:: toggle + + .. container:: header + + **Answer** + + .. literalinclude:: ../code/Typeclasses.fst + :language: fstar + :start-after: //SNIPPET_START: print_answer$ + :end-before: //SNIPPET_END: print_answer$ + +-------------------------------------------------------------------------------- + +Bounded Unsigned Integers +------------------------- + +The ``printable`` typeclass is fairly standard and can be defined in +almost any language that supports typeclasses. 
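+ +Concretely, the explicit-dictionary style from the exercise above might look like this sketch (``print_list_dict`` is a hypothetical name, not from the chapter's code): + +.. code-block:: fstar + +   //a list printer with the dictionary passed by hand, no inference involved +   let print_list_dict #a (d : printable a) (l : list a) : string = +     FStar.String.concat "; " (FStar.List.Tot.map d.to_string l) + +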
We now turn to a +typeclass that leverages F*'s dependent types by generalizing the +interface of bounded unsigned integers that we developed in a +:ref:`previous chapter `. + +A type ``a`` is in the class ``bounded_unsigned_int`` when it +admits: + + * An element ``bound : a``, representing the maximum value + + * A pair of functions ``from_nat`` and ``to_nat`` that form a + bijection between ``a`` and natural numbers less than ``to_nat + bound`` + +This is captured by the ``class`` below: + +.. literalinclude:: ../code/TypeclassesAlt3.fst + :language: fstar + :start-after: //SNIPPET_START: bounded_unsigned_int$ + :end-before: //SNIPPET_END: bounded_unsigned_int$ + +.. note :: + + The attribute ``FStar.Tactics.Typeclasses.no_method`` on the + ``properties`` field instructs F* to not generate a typeclass + method for this field. This is useful here, since we don't really + want to overload the name ``properties`` as an operator over all + instances of the class. It's often convenient to simply ``open + FStar.Tactics.Typeclasses`` when using typeclasses, or to use a + module abbreviation like ``module TC = FStar.Tactics.Typeclasses`` + so that you don't have to use a fully qualified name for + ``no_method``. + +For any ``bounded_unsigned_int``, one can define a generic ``fits`` +predicate, corresponding to the bounds-check condition that we +introduced in the ``UInt32`` interface. + +.. literalinclude:: ../code/TypeclassesAlt3.fst + :language: fstar + :start-after: //SNIPPET_START: fits$ + :end-before: //SNIPPET_END: fits$ + +Likewise, the predicate ``related_ops`` defines when an operation +``bop`` on bounded integers is equivalent to an operation ``iop`` on +mathematical integers. + +.. literalinclude:: ../code/TypeclassesAlt3.fst + :language: fstar + :start-after: //SNIPPET_START: related_ops$ + :end-before: //SNIPPET_END: related_ops$ + +Typeclass Inheritance +..................... 
+ +Our ``bounded_unsigned_int a`` class establishes only that ``a`` is in a +bijection with natural numbers below some bound. Now, we can define a +separate class, extending ``bounded_unsigned_int`` with the operations +we want, like addition, subtraction, etc. + +.. literalinclude:: ../code/TypeclassesAlt3.fst + :language: fstar + :start-after: //SNIPPET_START: bui_ops$ + :end-before: //SNIPPET_END: bui_ops$ + +The class above makes use of *typeclass inheritance*. The ``base`` +field stores an instance of the base class ``bounded_unsigned_int``, +while the remaining fields extend it with: + + * ``add``: a bounded addition operation + * ``sub``: a bounded subtraction operation + * ``lt``: a comparison function + * ``properties``, which show that + + - ``add`` is related to integer addition ``+`` + - ``sub`` is related to integer subtraction ``-`` + - ``lt`` is related to ``<`` + - and that ``sub bound x`` is always safe + +Typeclass inheritance in the form of additional fields like ``base`` +is completely flexible, e.g., multiple inheritance is permissible +(though, as we'll see below, it should be used with care to prevent +surprises). + +Treating an instance of a class as an instance of one of its base classes +is easily coded as an instance-generating function. The code below says +that an instance of ``bounded_unsigned_int a`` can be derived from +an instance of ``d : bounded_unsigned_int_ops a`` just by projecting +its ``base`` field. + +.. literalinclude:: ../code/TypeclassesAlt3.fst + :language: fstar + :start-after: //SNIPPET_START: ops_base$ + :end-before: //SNIPPET_END: ops_base$ + + +Infix Operators +............... + +F* does not allow the fields of a record to be named using infix +operator symbols. This will likely change in the future. For now, +to use custom operations with infix notation for typeclass methods, +one has to define them by hand: + +.. 
literalinclude:: ../code/TypeclassesAlt3.fst + :language: fstar + :start-after: //SNIPPET_START: ops$ + :end-before: //SNIPPET_END: ops$ + +Derived Instances +................. + +We've already seen how typeclass inheritance allows a class to induce +an instance of its base class(es). However, not all derived instances +are due to explicit inheritance---some instances can be *computed* +from others. + +For example, here's a class ``eq`` for types that support decidable +equality. + +.. literalinclude:: ../code/TypeclassesAlt3.fst + :language: fstar + :start-after: //SNIPPET_START: eq$ + :end-before: //SNIPPET_END: eq$ + +We'll write ``x =?= y`` for an equality comparison method from this +class, to not confuse it with F*'s built-in decidable equality ``(=)`` +on ``eqtype``. + +Now, from an instance of ``bounded_unsigned_int_ops a`` we can +compute an instance of ``eq a``, since we have ``<^``, a strict +comparison operator that we know is equivalent to ``<`` on natural +numbers. F*, from all the properties we have on +``bounded_unsigned_int_ops`` and its base class +``bounded_unsigned_int``, can automatically prove that ``not (x <^ y) +&& not (y <^ x)`` is valid if and only if ``x == y``. This instance of +``eq`` now also lets us easily implement a non-strict comparison +operation on bounded unsigned ints. + +.. literalinclude:: ../code/TypeclassesAlt3.fst + :language: fstar + :start-after: //SNIPPET_START: bui_eq$ + :end-before: //SNIPPET_END: bui_eq$ + +Ground Instances +................ + +We can easily provide ground instances of ``bounded_unsigned_int_ops`` +for all the F* bounded unsigned int types---we show instances for +``FStar.UInt32.t`` and ``FStar.UInt64.t``, where the proof of all the +properties needed to construct the instances is automated. + +.. 
literalinclude:: ../code/TypeclassesAlt3.fst + :language: fstar + :start-after: //SNIPPET_START: ground_instances$ + :end-before: //SNIPPET_END: ground_instances$ + +And one can check that typeclass resolution works well on those ground +instances. + +.. literalinclude:: ../code/TypeclassesAlt3.fst + :language: fstar + :start-after: //SNIPPET_START: ground_tests$ + :end-before: //SNIPPET_END: ground_tests$ + +Finally, as promised at the start, we can write functions that are +generic over all bounded unsigned integers, something we couldn't do +with interfaces alone. + +.. literalinclude:: ../code/TypeclassesAlt3.fst + :language: fstar + :start-after: //SNIPPET_START: sum$ + :end-before: //SNIPPET_END: sum$ + +F* can prove that the bounds check in ``sum`` is sufficient to ensure +that the addition does not overflow, and further, that the two tests +return ``Some _`` without failing due to overflow. + +However, proving that ``Some? (sum [0x01ul; 0x02ul; 0x03ul] +0x00ul)`` using the SMT solver alone can be expensive, since it +requires repeated unfolding of the recursive function ``sum``---such +proofs are often more easily done using F*'s normalizer, as shown +below---we saw the ``assert_norm`` construct in a :ref:`previous +section `. + +.. literalinclude:: ../code/TypeclassesAlt3.fst + :language: fstar + :start-after: //SNIPPET_START: testsum32'$ + :end-before: //SNIPPET_END: testsum32'$ + +.. note :: + + That said, by using dependently typed generic programming (which we + saw a bit of :ref:`earlier `), it is + possible to write programs that abstract over all machine integer + types without using typeclasses. The F* library ``FStar.Integers`` + shows how that works. Still, the typeclass approach shown here is + more broadly applicable and extensible. + +Dealing with Diamonds +--------------------- + +One may be tempted to factor our ``bounded_unsigned_int_ops`` +typeclass further, separating out each operation into a separate +class. 
After all, it may be the case that some instances of +``bounded_unsigned_int`` types support only addition while others +support only subtraction. However, when designing typeclass +hierarchies one needs to be careful not to introduce coherence +problems that result from various forms of multiple inheritance. + +Here's a typeclass that captures only the subtraction operation, +inheriting from a base class. + +.. literalinclude:: ../code/TypeclassesAlt2.fst + :language: fstar + :start-after: //SNIPPET_START: subtractable$ + :end-before: //SNIPPET_END: subtractable$ + +And here's another typeclass that provides only the comparison +operation, also inheriting from the base class. + +.. literalinclude:: ../code/TypeclassesAlt2.fst + :language: fstar + :start-after: //SNIPPET_START: comparable$ + :end-before: //SNIPPET_END: comparable$ + +However, now when writing programs that expect both subtractable and +comparable integers, we end up with a coherence problem. + +The ``sub`` operation fails to verify, with F* complaining that it +cannot prove ``fits op_Subtraction bound acc``, i.e., this ``sub`` may +underflow. + +.. literalinclude:: ../code/TypeclassesAlt2.fst + :language: fstar + :start-after: //SNIPPET_START: try_sub_fail$ + :end-before: //SNIPPET_END: try_sub_fail$ + +At first, one may be surprised, since the ``s : +subtractable_bounded_unsigned_int a`` instance tells us that +subtracting from the ``bound`` is always safe. However, the term +``bound`` is an overloaded (nullary) operator and there are two ways +to resolve it, ``s.base.bound`` or ``c.base.bound``, and these two +choices are not equivalent. In particular, from ``s : +subtractable_bounded_unsigned_int a``, we only know that +``s.base.bound `sub` acc`` is safe, not that ``c.base.bound `sub` +acc`` is safe. + +Slicing the typeclass hierarchy too finely can lead to such coherence +problems, which can be hard to diagnose. It's better to avoid them by +construction, if at all possible. 
Alternatively, if such problems do +arise, one can sometimes add additional preconditions to ensure that +the multiple choices are actually equivalent. There are many ways to +do this, ranging from indexing typeclasses by their base classes, to +adding equality hypotheses---the equality hypothesis below is +sufficient. + +.. literalinclude:: ../code/TypeclassesAlt2.fst + :language: fstar + :start-after: //SNIPPET_START: try_sub$ + :end-before: //SNIPPET_END: try_sub$ + +.. _Part3_monadic_syntax: + +Overloading Monadic Syntax +-------------------------- + +We now look at some examples of typeclasses for *type functions*, in +particular, typeclasses for functors and monads. + +.. note :: + + If you're not familiar with monads, referring back to :ref:`A First + Model of Computational Effects ` may help. + +In :ref:`a previous chapter `, we introduced syntactic +sugar for monadic computations. In particular, F*'s syntax supports +the following: + +* Instead of writing ``bind f (fun x -> e)`` you can define a custom + ``let!``-operator and write ``let! x = f in e``. + +* And, instead of writing ``bind f (fun _ -> e)`` you can write + ``f ;! e``. + +Now, if we can overload the symbol ``bind`` to work with any monad, +then the syntactic sugar described above would work for all of +them. This is accomplished as follows. + +We define a typeclass ``monad``, with two methods ``return`` and ``( +let! )``. + +.. literalinclude:: ../code/MonadFunctorInference.fst + :language: fstar + :start-after: //SNIPPET_START: monad$ + :end-before: //SNIPPET_END: monad$ + +Doing so introduces ``return`` and ``( let! )`` into scope at the +following types: + +.. code-block:: fstar + + let return #m {| d : monad m |} #a (x:a) : m a = d.return x + let ( let! ) #m {| d : monad m |} #a #b (f:m a) (g: a -> m b) : m b = d.bind f g + +That is, we now have ``( let! )`` in scope at a type general enough to +use with any monad instance. + +.. 
note:: + + There is nothing specific about ``let!``; F* allows you to add a + suffix of operator characters to the ``let``-token. See this file + for more examples of `monadic let operators + `_ + +The type ``st s`` is a state monad parameterized by the state ``s``, +and ``st s`` is an instance of a ``monad``. + +.. literalinclude:: ../code/MonadFunctorInference.fst + :language: fstar + :start-after: //SNIPPET_START: st$ + :end-before: //SNIPPET_END: st$ + +With some basic actions ``get`` and ``put`` to read and write the +state, we can implement ``st`` computations with a syntax similar to +normal, direct-style code. + +.. literalinclude:: ../code/MonadFunctorInference.fst + :language: fstar + :start-after: //SNIPPET_START: get_inc$ + :end-before: //SNIPPET_END: get_inc$ + +Of course, we can also do proofs about our ``st`` computations: here's +a simple proof that ``get_put`` is ``noop``. + +.. literalinclude:: ../code/MonadFunctorInference.fst + :language: fstar + :start-after: //SNIPPET_START: get_put$ + :end-before: //SNIPPET_END: get_put$ + +Now, the nice thing is that since ``( let! )`` is monad polymorphic, +we can define other monad instances and still use the syntactic sugar +to build computations in those monads. Here's an example with the +``option`` monad, for computations that may fail. + +.. literalinclude:: ../code/MonadFunctorInference.fst + :language: fstar + :start-after: //SNIPPET_START: opt_monad$ + :end-before: //SNIPPET_END: opt_monad$ + +Exercise +........ + +Define a typeclass for functors, type functions ``m: Type -> Type`` +which support the operations ``fmap : (a -> b) -> m a -> m b``. + +Build instances of ``functor`` for a few basic types, e.g., ``list``. + +Derive an instance for functors from a monad, i.e., prove + +``instance monad_functor #m {| monad m |} : functor m = admit()`` + +This `file <../code/exercises/Part3.MonadsAndFunctors.fst>`_ provides +the definitions you need. + +.. container:: toggle + + .. 
container:: header + + **Answer** + + .. literalinclude:: ../code/MonadFunctorInference.fst + :language: fstar + :start-after: //SNIPPET_START: functor$ + :end-before: //SNIPPET_END: functor$ + +-------------------------------------------------------------------------------- + +Beyond Monads with Let Operators +-------------------------------- + +Many monad-like structures have been proposed to structure effectful +computations. Each of these structures can be captured as a typeclass +and used with F*'s syntactic sugar for let operators. + +As an example, we look at *graded monads*, a construction studied by +Shin-Ya Katsumata and others, `in several papers +`_. This +example illustrates the flexibility of typeclasses, including +typeclasses for types that themselves are indexed by other +typeclasses. + +The main idea of a graded monad is to index a monad with a monoid, +where the monoid index characterizes some property of interest of the +monadic computation. + +A monoid is a typeclass for an algebraic structure with a single +associative binary operation and a unit element for that +operation. A simple instance of a monoid is the natural numbers with +addition and the unit being ``0``. + +.. literalinclude:: ../code/GradedMonad.fst + :language: fstar + :start-after: //SNIPPET_START: monoid$ + :end-before: //SNIPPET_END: monoid$ + +A graded monad is a type constructor ``m`` indexed by a monoid as +described by the class below. In other words, ``m`` is equipped with +two operations: + + * a ``return``, similar to the ``return`` of a monad, but whose + index is the unit element of the monoid + + * a ``( let+ )``, similar to the ``( let! )`` of a monad, but whose + action on the indexes corresponds to the binary operator of the + indexing monoid. + +.. 
literalinclude:: ../code/GradedMonad.fst + :language: fstar + :start-after: //SNIPPET_START: graded_monad$ + :end-before: //SNIPPET_END: graded_monad$ + +With this class, we have overloaded ``( let+ )`` to work with all +graded monads. For instance, here's a graded state monad, ``count_st``, +whose index counts the number of ``put`` operations. + +.. literalinclude:: ../code/GradedMonad.fst + :language: fstar + :start-after: //SNIPPET_START: counting$ + :end-before: //SNIPPET_END: counting$ + +We can build computations in our graded ``count_st`` monad relatively +easily. + +.. literalinclude:: ../code/GradedMonad.fst + :language: fstar + :start-after: //SNIPPET_START: test$ + :end-before: //SNIPPET_END: test$ + +F* infers the typeclass instantiations and the type of ``test`` to be +``count_st s (op #monoid_nat_plus 0 1) unit``. + +In ``test2``, F* infers the type ``count_st s (op #monoid_nat_plus 0 +(op #monoid_nat_plus 1 1)) unit``, and then automatically proves that +this type is equivalent to the user annotation ``count_st s 2 unit``, +using the definition of ``monoid_nat_plus``. Note that when one defines +``( let+ )``, one can also use ``e1 ;+ e2`` to sequence computations when +the result type of ``e1`` is ``unit``. + + +Summary +------- + +Typeclasses are a flexible way to structure programs in an abstract +and generic style. Not only can this make program construction more +modular, it can also make proofs and reasoning more abstract, +particularly when typeclasses contain not just methods but also +properties characterizing how those methods ought to behave. Reasoning +abstractly can make proofs simpler: for example, if the monoid-ness of +natural number addition is the only property needed for a proof, it +may be simpler to do a proof generically for all monoids, rather than +reasoning specifically about integer arithmetic. + +The latter part of this chapter presented typeclasses for +computational structures like monads and functors. 
Perhaps conspicuous +in these examples was the lack of algebraic laws that characterize +these structures. Indeed, we focused primarily on programming with +monads and graded monads, rather than reasoning about them. Enhancing +these typeclasses with algebraic laws is a useful, if challenging, +exercise. This also leads naturally to F*'s effect system in the next +section of this book, which is specifically concerned with doing +proofs about programs built using monad-like structures. diff --git a/doc/book/PoP-in-FStar/book/part4/part4.rst b/doc/book/PoP-in-FStar/book/part4/part4.rst new file mode 100644 index 00000000000..a52a764eb17 --- /dev/null +++ b/doc/book/PoP-in-FStar/book/part4/part4.rst @@ -0,0 +1,73 @@ +.. _Part4: + +Computational Effects +===================== + +All the programs we've considered so far have been *pure*, which is to +say that the only thing that can be observed about a program by its +context is the value that the program returns---the inner workings of +the program (e.g., whether it multiplies two numbers by repeated +addition or by using a primitive operation for multiplication) cannot +influence the behavior of its context. That is, pure terms can be +reasoned about like pure mathematical functions. [#]_ + +However, many practical programs exhibit behaviors that are beyond +just their output. For example, they may mutate some global state, or +they may read and write files, or receive or send messages over the +network---such behaviors are often called *side effects*, +*computational effects*, or just *effects*. + +In this section, we look at F*'s effect system, which allows users to + + * Model the semantics of various kinds of effects, e.g., mutable + state, input/output, concurrency, etc. 
+ + * Develop reasoning principles that enable building proofs of + various properties of effectful programs + + * Simplify effectful program construction by encapsulating the + semantic model and reasoning principles within an abstraction that + allows users to write programs in F*'s native syntax while, behind + the scenes, for reasoning and execution purposes, programs are + elaborated into the semantic model + +Aside from user-defined effects, F* also supports the following +*primitive* effects: + + * **Ghost**: An effect which describes which parts of a program have + no observable behavior at all, and do not even influence the + result returned by the program. This allows optimizing a program + by erasing the parts that are computationally + irrelevant. + + * **Divergence**: An effect that encapsulates computations which may + run forever. Potentially divergent computations cannot be used as + proofs (see :ref:`termination`) and the effect + system ensures that this is so. + + * **Partiality**: Partial functions are only defined over a subset + of their domain. F* provides primitive support for partial + functions as an effect. Although currently primitive, in the + future, we hope to remove the special status of partial functions + and make partial functions a user-defined notion too. + + +.. [#] Although pure F* programs are mathematical functions + in the ideal sense, when executing these programs on a + computer, they do exhibit various side effects, including + consuming resources like time, power, and memory. Although + these side effects are clearly observable to an external + observer of a running F* program, the resource-usage side + effects of one component of a pure F* program are not visible + to another component of the same program. + +.. 
toctree:: + :maxdepth: 1 + :caption: Contents: + + part4_background + part4_computation_types_and_tot + part4_ghost + part4_div + part4_pure + diff --git a/doc/book/PoP-in-FStar/book/part4/part4_background.rst b/doc/book/PoP-in-FStar/book/part4/part4_background.rst new file mode 100644 index 00000000000..585cb35bbdd --- /dev/null +++ b/doc/book/PoP-in-FStar/book/part4/part4_background.rst @@ -0,0 +1,93 @@ +.. _Part4_Background: + +Computation Types to Track Dependences +====================================== + +A main goal for F*'s effect system is to *track dependences* among the +various parts of a program. For example, the effect system needs to +ensure that the total part of a program that is proven to always +terminate never calls a function in the divergent fragment (since that +function call may loop forever). Or, that the runtime behavior of a +compiled program does not depend on ghost computations that get erased +by the compiler. + +In a paper from 1999 called `A Core Calculus of Dependency +`_, Abadi et +al. present DCC, a language with a very generic way to track +dependences. DCC's type system includes an indexed, monadic type +:math:`T_l`, where the index :math:`l` ranges over the elements of a +lattice, i.e., the indexes are arranged in a partial order. DCC's type +system ensures that a computation with type :math:`T_l` can depend on +another computation with type :math:`T_m` only if :math:`m \leq l` in +the lattice's partial order. F*'s effect system is inspired by DCC, +and builds on a 2011 paper by Swamy et al. called `Lightweight Monadic +Programming in ML `_ +which develops a DCC-like system for an ML-like programming language. + +At its core, F*'s effect system includes the following three elements: + +**Computation Types**: F*'s type system includes a notion of +*computation type*, a type of the form ``M t`` where ``M`` is an +*effect label* and ``t`` is the return type of the computation. 
A term +``e`` can be given the computation type ``M t`` when executing ``e`` +exhibits *at-most* the effect ``M`` and (possibly) returns a value of +type ``t``. We will refine this intuition as we go along. In contrast +with computation types, the types that we have seen so far (``unit``, +``bool``, ``int``, ``list int``, other inductive types, refinement +types, and arrows) are called *value types*. + +**Partially Ordered Effect Labels**: The effect label of a computation +type is drawn from an open-ended, user-extensible set of labels, where +the labels are organized in a user-chosen partial order. For example, +under certain conditions, one can define the label ``M`` to be a +sub-effect of ``N``, i.e., ``M < N``. For any pair of labels ``M`` +and ``N``, a partial function ``lub M N`` (for least upper bound) +computes the least label greater than both ``M`` and ``N``, if any. + +**Typing Rules to Track Dependences**: The key part of the effect +system is a rule for composing computations sequentially using ``let x += e1 in e2``. Suppose ``e1 : M t1``, and suppose ``e2 : N t2`` +assuming ``x:t1``, then the composition ``let x = e1 in e2`` has type +``L t2``, where ``L = lub M N``---if ``lub M N`` is not defined, then +the ``let``-binding is rejected. Further, a computation with type ``M +t`` can be implicitly given the type ``N t``, when ``M < N``, i.e., +moving up the effect hierarchy is always permitted. The resulting +typing discipline enforces the same dependence-tracking property as +DCC: a computation ``M t`` can only depend on ``N t`` when ``lub M N = +M``. + +In full generality, F*'s computation types are more complex than just +an effect label ``M`` and a result type (i.e., more than just ``M +t``), and relying on F*'s dependent types, computation types do more +than just track dependences, e.g., a computation type in F* can also +provide full, functional correctness specifications. 
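To make the sequencing rule concrete, here is a small Python model of partially ordered effect labels composed with a partial ``lub``; the three labels and their ordering are purely illustrative, not F*'s actual effect hierarchy.

```python
# A toy model of the let-sequencing rule: labels in a partial order,
# composed with a *partial* least upper bound.  The labels and their
# ordering here are illustrative only, not F*'s actual hierarchy.
LABELS = ("Tot", "Ghost", "Dv")
ORDER = {("Tot", "Tot"), ("Ghost", "Ghost"), ("Dv", "Dv"),
         ("Tot", "Ghost"), ("Tot", "Dv")}  # (m, n) in ORDER means m <= n

def leq(m, n):
    return (m, n) in ORDER

def lub(m, n):
    """The least label above both m and n, or None if there is none."""
    ubs = [l for l in LABELS if leq(m, l) and leq(n, l)]
    least = [u for u in ubs if all(leq(u, v) for v in ubs)]
    return least[0] if least else None

def seq(c1, c2):
    """Type a `let x = e1 in e2` from e1 : M t1 and e2 : N t2."""
    (m, _t1), (n, t2) = c1, c2
    l = lub(m, n)
    if l is None:
        raise TypeError(f"cannot compose {m} with {n}")
    return (l, t2)

# Sequencing a "Tot" computation before a "Dv" one lifts the whole
# composition to "Dv", mirroring the rule described above.
print(seq(("Tot", "int"), ("Dv", "bool")))
```

Keeping ``lub`` partial is the point: a composition of labels with no common upper bound (here, ``Ghost`` followed by ``Dv``) is simply rejected, just as an ill-formed ``let``-binding is.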
The papers +referenced below provide some context, and we discuss various elements +of these papers throughout this part of the book. + + + + `Verifying Higher-order Programs with the Dijkstra Monad + `_ introduces the + idea of a Dijkstra monad, a construction to structure the + inference of weakest preconditions of effectful computations. + + + This 2016 paper, + `Dependent Types and Multi-Monadic Effects in F* `_, + has become the canonical reference for F\*. It shows how to combine + multiple Dijkstra monads with a DCC-like system. + + + `Dijkstra Monads for Free + `_ presents an + algorithm to construct Dijkstra monads automatically for a class + of simple monads. + + + `Dijkstra Monads for All + `_ generalizes the + construction, relating any computational monad to a + specificational counterpart, so long as the two are related by a + monad morphism. + + + `Programming and Proving with Indexed Effects + `_ describes + F*'s user-defined effect system in its most general form, allowing + it to be applied not only to Dijkstra + monads, but to several other indexed constructions as well. diff --git a/doc/book/PoP-in-FStar/book/part4/part4_computation_types_and_tot.rst b/doc/book/PoP-in-FStar/book/part4/part4_computation_types_and_tot.rst new file mode 100644 index 00000000000..4a6842c2ae2 --- /dev/null +++ b/doc/book/PoP-in-FStar/book/part4/part4_computation_types_and_tot.rst @@ -0,0 +1,93 @@ +.. _Part4_Computation_Types_And_Tot: + +The Effect of Total Computations +================================ + +At the very bottom of the effect label hierarchy is ``Tot``, used to +describe pure, total functions. Since they are at the bottom of the +hierarchy, ``Tot`` computations can only depend on other ``Tot`` +computations, ensuring that F*'s logical core remains total. + +Every term in F* is typechecked to have a computation type. This +includes the total terms we have been working with. 
Such terms are +classified in the default effect called ``Tot``, the effect of total +computations that do not have any observable effect, aside from the +value they return. Any meaning or intuition that we have ascribed to +typing ``e:t`` extends to ``e:Tot t``. For example, if ``e:Tot t``, +then at runtime, ``e`` terminates and produces a value of type +``t``. In addition, since its effect label is ``Tot``, it has no +side effects. + +In fact, as we have already :ref:`seen `, +notationally ``x:t0 -> t1`` is a shorthand for ``x:t0 -> Tot t1`` +(where ``t1`` could itself be an arrow type). More generally, arrow +types in F* take the form ``x:t0 -> C``, representing a function with +argument type ``t0`` and body computation type ``C`` (which may depend +on ``x``). + +Similarly, the return type annotation that we have seen in the ``let`` +definitions is also a shorthand, e.g., the :ref:`id function +` + +.. code-block:: fstar + + let id (a:Type) (x:a) : a = x + +is a shorthand for + +.. code-block:: fstar + + let id (a:Type) (x:a) : Tot a = x //the return type annotation is a computation type + +and the type of ``id`` is ``a:Type -> a -> Tot a``. More generally, +the return type annotations on ``let`` definitions are computation +types ``C``. + +The :ref:`explicit annotation syntax ` ``e +<: t`` behaves a little differently. F* allows writing ``e <: C``, and +checks that ``e`` indeed has computation type ``C``. But when the +effect label is omitted, ``e <: t``, it is interpreted as ``e <: _ +t``, where the omitted effect label is inferred by F* and does not +default to ``Tot``. + + +.. _Part4_evaluation_order: + +Evaluation order +^^^^^^^^^^^^^^^^ + +For pure functions, the evaluation order is irrelevant. [#]_ F* +provides abstract machines to interpret pure terms using either a lazy +evaluation strategy or a call-by-value strategy (see a +:ref:`forthcoming chapter on F*'s normalizers +`). 
Further, when compiling pure programs to OCaml, F* +inherits OCaml's call-by-value semantics for pure terms. + +When evaluating function calls with effectful arguments, the arguments +are reduced to values first, exhibiting their effects, if any, prior +to the function call. That is, where ``e1`` and ``e2`` may be +effectful, the application ``e1 e2`` is analogous to ``bind f = e1 in +bind x = e2 in f x``: in fact, internally this is what F* elaborates +``e1 e2`` to, when either of them may have non-trivial effects. As +such, F* enforces a left-to-right, `call-by-value +`_ semantics for +effectful terms. + +Since only the value returned by a computation is passed as an +argument to a function, function argument types in F* are always value +types. That is, arrows always have the form ``t -> C``, where ``t`` is +a value type. To pass a +computation as an argument to another function, you must encapsulate +the computation in a function, e.g., in place of ``C -> C'``, one can +write ``(unit -> C) -> C'``. Since functions are first-class values in +F*, including functions whose body may have non-trivial effects, one +can always do this. + +.. [#] Since F* is an extensional type theory, pure F* terms are only + *weakly* normalizing. That is, some evaluation strategies + (e.g., repeatedly reducing a recursive function deep in an + infeasible code path) need not terminate. However, for every + closed, pure term, there is a reduction strategy that will + reduce it fully. As such, the evaluation order for pure + functions is irrelevant, except that some choices of evaluation + order may lead to non-termination. + diff --git a/doc/book/PoP-in-FStar/book/part4/part4_div.rst b/doc/book/PoP-in-FStar/book/part4/part4_div.rst new file mode 100644 index 00000000000..f7e3c038d37 --- /dev/null +++ b/doc/book/PoP-in-FStar/book/part4/part4_div.rst @@ -0,0 +1,547 @@ +.. 
_Part4_Div: + +Divergence, or Non-Termination +============================== + +Most dependently typed languages are not `Turing complete +`_. This is +because, as explained :ref:`earlier `, it is +crucial to the soundness of a type theory to have all functions +terminate. This means that you cannot program, say, an interpreter for +a general-purpose programming language in a language like Coq, since +such an interpreter would not be able to handle programs that +intentionally loop forever. [#]_ + +F*'s logical core of total (and ghost) functions can only express +terminating computations. However, F* also allows expressing +non-terminating or *divergent* computations, relying on the effect +system to isolate divergent computations from the logical core. In +particular, the computation type ``Dv t`` describes a computation that +may loop forever, but if it completes, it returns a value of type +``t``. + +Relying on the effect system as a dependency-tracking mechanism, F* +ensures that ``Tot`` computations cannot rely on ``Dv`` computations +by placing ``Dv`` above ``Tot`` in the effect hierarchy, while, +conversely, a total computation ``Tot t`` can be silently promoted to +``Dv t``, the type of computations that may not terminate, i.e., ``Tot +< Dv`` in the effect partial order. + +Recursive functions that return computations in the ``Dv`` effect are +not checked for termination. As such, using the ``Dv`` effect, one +can write programs such as the one below, which computes `Collatz +sequences +`_---whether or not +this program terminates for all inputs is an open problem. + +.. literalinclude:: ../code/Divergence.fst + :language: fstar + :start-after: //SNIPPET_START: collatz$ + :end-before: //SNIPPET_END: collatz$ + +In this chapter, we'll look in detail at the ``Dv`` effect and how it +interacts with other features of the language, including the other +effects, recursive type definitions, and the styles of programming and +proving it enables. + +.. 
[#] In place of general recursion and potential non-termination, + other dependently typed languages like Coq and Agda offer + features like corecursion and coinduction. Coinduction can be + used to express a class of *productive* non-terminating + programs. For instance, using coinduction, one could program a + web server that loops forever to handle an infinite stream of + requests, while producing a response for each request in a + finite amount of time. Even the ``collatz`` function can be + given a corecursive definition that computes a potentially + infinite stream of numbers. However, not all non-terminating + computations can be implemented with + coinduction/corecursion. F* does not yet support coinduction. + + +The ``Dv`` effect +^^^^^^^^^^^^^^^^^^^ + +The effect ``Dv`` (for divergence) is a primitive effect in F*. +Computations in ``Dv`` may fail to terminate, even with infinite +resources. In other words, computations in the ``Dv`` effect have the +observational behavior of non-termination. For example, the following +``loop`` function has type ``unit -> Dv unit`` and it always diverges +when called: + +.. literalinclude:: ../code/Divergence.fst + :language: fstar + :start-after: //SNIPPET_START: loop$ + :end-before: //SNIPPET_END: loop$ + +If we remove the ``Dv`` effect label annotation, then F* treats the +function as total and will try to prove that the recursive call +terminates, according to its usual termination checking rules, i.e., +F* will attempt to prove ``() << ()``, which fails, as expected. + +Since the ``Dv`` effect admits divergence, F* essentially turns off +the termination checker when typechecking ``Dv`` computations. So the +recursive ``loop ()`` call does not require a decreasing termination +metric. + +Partial correctness semantics of ``Dv`` +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +The ``Tot`` effect in F* has a *total correctness* semantics. 
That is, +if a term ``e`` has type ``Tot t``, then ``e`` terminates and +produces a value of type ``t``. + +Terms with type ``Dv t`` have a *partial correctness* semantics. That +is, a term ``e:Dv t`` may run forever, but if it +terminates then the resulting value has type ``t``. + +Another perspective is that aside from disabling the termination +checking features of F*, all other type-checking constraints are +enforced on ``Dv`` terms. This means that one can still give +interesting, sound specifications to ``Dv`` programs, e.g., the type +below proves that if the Collatz function terminates, then the last +element of the sequence is ``1``. + +.. literalinclude:: ../code/Divergence.fst + :language: fstar + :start-after: //SNIPPET_START: collatz_ends_in_one$ + :end-before: //SNIPPET_END: collatz_ends_in_one$ + +If, for example, in the base case we were to return the empty list +``[]`` rather than ``[n]``, then F* would refuse to accept the +program, since the program could terminate while returning a value +that is not an element of the annotated return type. + +Isolating ``Dv`` from the logical core +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Since ``Dv`` terms need not terminate, a program that always loops +forever can be given any return type. For instance, the program below +has return type ``False``: + +.. literalinclude:: ../code/Divergence.fst + :language: fstar + :start-after: //SNIPPET_START: loop_false$ + :end-before: //SNIPPET_END: loop_false$ + +Importantly, a term of type ``Dv False`` should not be confused with a +*proof* of ``False``, since that would lead immediately to unsoundness +of F*'s logical core. In particular, it should be impossible to turn an +``e:Dv t`` into a term of type ``Tot t``. This is achieved by F*'s +effect system, which treats ``Tot`` as a sub-effect of ``Dv``, i.e., +``Tot < Dv``, in the effect order. 
As explained :ref:`earlier +`, this ensures that no ``Tot`` term can depend on a +``Dv`` term, maintaining soundness of the total correctness +interpretation of ``Tot``. + +As an example, the following attempt to "cast" ``dv_false`` to ``Tot`` +fails, as does trying to use ``dv_false`` to produce incorrect proofs +of other types. + +.. literalinclude:: ../code/Divergence.fst + :language: fstar + :start-after: //SNIPPET_START: loop_false_failures$ + :end-before: //SNIPPET_END: loop_false_failures$ + + +While F* does not allow ``Tot`` computations to depend on ``Dv`` +computations, going the other way is perfectly fine. Intuitively, a +computation that always terminates is, in particular, one that *may* +not terminate. We +can think of this as a *weakening* of the specification: + +.. code-block:: fstar + + let add_one (x:int) : int = x + 1 + let add_one_div (x:int) : Dv int = add_one x + +The effect system of F* automatically *lifts* ``Tot`` computations +into ``Dv``, meaning that ``Tot`` functions can be seamlessly used in +``Dv`` functions. + +The weakening of ``Tot`` terms to other effects is so pervasive in F* +that one hardly even thinks about it, e.g., in the ``collatz`` +program, sub-terms like ``n / 2`` are in ``Tot`` but are easily used +within a computation in the ``Dv`` effect. + +No extrinsic proofs for ``Dv`` computations +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +One important consequence of any effectful code, including ``Dv``, +being outside the logical core of F* is that it is not possible to do +:ref:`extrinsic proofs ` about effectful +code. One cannot even state properties of ``Dv`` computations in +specifications, since even specifications must be total. For example, +even stating the following lemma is illegal: + +.. code-block:: fstar + + [@@expect_failure] + val collatz_property (n:pos) + : Lemma (Cons? (collatz n) /\ last (collatz n) = 1) + +This is nonsensical in F* since writing ``Cons? 
(collatz n)`` supposes +that ``collatz n`` is *defined*, whereas it might actually just loop +forever. + +The only way to state properties about divergent programs is to encode +the property intrinsically in the computation type, as we saw above. + +.. literalinclude:: ../code/Divergence.fst + :language: fstar + :start-after: //SNIPPET_START: val collatz_ends_in_one$ + :end-before: //SNIPPET_END: val collatz_ends_in_one$ + +Exercise +++++++++ + +Define a predicate ``collatz_spec (n:pos) (l:list pos) : bool`` that +decides if ``l`` is a valid Collatz sequence starting at ``n``. + +Implement ``val collatz' (n:pos) : Dv (l:list pos { collatz_spec n l })``. + +What does this type mean? Are there other ways to implement +``collatz'`` with the same type? + + +.. container:: toggle + + .. container:: header + + **Answer** + + .. literalinclude:: ../code/Divergence.fst + :language: fstar + :start-after: //SNIPPET_START: collatz_spec$ + :end-before: //SNIPPET_END: collatz_spec$ + +-------------------------------------------------------------------------------- + +General Recursive Types and Impredicativity with ``Dv`` +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Aside from disabling the decreases metric on recursive functions in +``Dv``, F* also disables two other forms of termination checking on +``Dv`` computations. + +Recall from a :ref:`previous chapter ` that +inductive type definitions are subject to the *strict positivity* +condition, since non-positive definitions allow the definition of +recursive types and non-terminating computations. However, since +computations in the ``Dv`` effect are already allowed to loop forever, +the strict positivity condition can be relaxed when ``Dv`` types are +involved. For example, one can define this: + +.. 
literalinclude:: ../code/Divergence.fst + :language: fstar + :start-after: //SNIPPET_START: nonpos$ + :end-before: //SNIPPET_END: nonpos$ + +The type ``nonpos`` is not strictly positive, since it appears to the +left of an arrow in a field of one of its constructors. Indeed, using +``nonpos`` it is possible to define (without using ``let rec``) an +infinitely looping program ``loop_nonpos()``---however, the type ``Dv +False`` tells us that this program may loop forever, and the infinite +loop is safely isolated from F*'s logical core. + +The other place in F*'s type system where termination checking comes +into play is in the :ref:`universe levels `. As we +learned previously, the logical core of F* is organized into an +infinite hierarchy with copies of the F* type system arranged in a +tower of universes. This stratification is necessary to prevent +inconsistencies within the logical core. However, terms in the ``Dv`` +effect are outside the logical core and, as such, restrictions on the +universe levels no longer apply. As the snippet below shows, a total +function returning a type in universe ``u#a`` resides in universe +``u#(a + 1)``. However, a ``Dv`` function returning a type in ``u#a`` +is just in universe ``0``, since the only way to obtain the type that +``dv_type`` returns is by incurring a ``Dv`` effect and moving outside +F*'s logical core. + +.. literalinclude:: ../code/Divergence.fst + :language: fstar + :start-after: //SNIPPET_START: universe_dv$ + :end-before: //SNIPPET_END: universe_dv$ + +Top-level Effects +^^^^^^^^^^^^^^^^^ + +A top-level F* term is not meant to be effectful. If one defines the +following term, F* accepts the term but raises a warning saying +"Top-level let bindings must be total---this term may have effects". + +.. code-block:: fstar + + let inconsistent : False = loop_nonpos() + +Top-level effects can be problematic for a few reasons: + + 1. 
The order of evaluation of the effects in top-level terms is + undefined for programs with multiple modules---it depends on the + order in which modules are loaded at runtime. + + 2. Top-level effects, particularly when divergence is involved, can + render F*'s typechecking context inconsistent. For example, once + ``inconsistent`` is defined, any other assertion can be + proven. + + .. code-block:: fstar + + let _ = let _ = FStar.Squash.return_squash inconsistent in + assert false + +Nevertheless, when used carefully, top-level effects can be useful, +e.g., to initialize the state of a module, or to start the main +function of a program. So, pay attention to the warning F* raises when +you have a top-level effect and make sure you really know what you're +doing. + +Example: Untyped Lambda Calculus +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +In this section, we put together the various things we've learned +about ``Dv`` computations to define several variants of an untyped +lambda calculus. + +You can refer back to our prior development of the :ref:`simply typed +lambda calculus ` if you need some basic background on the +lambda calculus. + +Interpreting Deeply Embedded Lambda Terms ++++++++++++++++++++++++++++++++++++++++++ + +We start by defining the syntax of untyped lambda terms, below. The +variables use the de Bruijn convention, where the index of a variable +counts the number of lambda-binders to traverse to reach its binding +occurrence. The ``Lam`` case just has the body of the lambda term, +with no type annotation on the binder, and no explicit name for the +variable. + +.. literalinclude:: ../code/Divergence.fst + :language: fstar + :start-after: //SNIPPET_START: deep_embedding_syntax$ + :end-before: //SNIPPET_END: deep_embedding_syntax$ + +As usual, we can define what it means to substitute a variable ``x`` +with a (closed) term ``v`` in ``t``---this is just a regular ``Tot`` +function. + +.. 
literalinclude:: ../code/Divergence.fst + :language: fstar + :start-after: //SNIPPET_START: deep_embedding_subst$ + :end-before: //SNIPPET_END: deep_embedding_subst$ + +Finally, we can define an interpreter for ``term``, which can +(intentionally) loop infinitely, as is clear from the ``Dv`` type +annotation. + +.. literalinclude:: ../code/Divergence.fst + :language: fstar + :start-after: //SNIPPET_START: deep_embedding_interpreter$ + :end-before: //SNIPPET_END: deep_embedding_interpreter$ + +Exercise +........ + +This exercise is designed to show how you can prove non-trivial +properties of ``Dv`` computations by giving them interesting dependent +types. + +The substitution function defined here is only sound when the term +being substituted is closed; otherwise, any free variables it has can +be captured when substituted beneath a lambda. + +A term is closed if it satisfies this definition: + +.. literalinclude:: ../code/Part4.UTLCEx1.fst + :language: fstar + :start-after: //SNIPPET_START: closed$ + :end-before: //SNIPPET_END: closed$ + +Restrict the type of ``subst`` so that its argument is ``v : term { +closed v }``---you will also have to revise the type of its other +argument for the proof to work. + +Next, give the following type to the interpreter itself, proving that +interpreting closed terms produces closed terms, or loops forever. + +.. literalinclude:: ../code/Part4.UTLCEx1.fst + :language: fstar + :start-after: //SNIPPET_START: interpret$ + :end-before: //SNIPPET_END: interpret$ + +.. container:: toggle + + .. container:: header + + **Answer** + + ..
literalinclude:: ../code/Part4.UTLCEx1.fst + :language: fstar + +-------------------------------------------------------------------------------- + + +Denoting Lambda Terms into an F* Recursive Type ++++++++++++++++++++++++++++++++++++++++++++++++ + +We now look at a variation on the interpreter above to illustrate how +(non-positive) recursive types using ``Dv`` can also be used to give a +semantics to untyped lambda terms. + +Consider the type ``dyn`` shown below---it has a non-positive +constructor ``DFun``. We can use this type to interpret untyped lambda +terms into dynamically typed, potentially divergent, F* terms, +showing, in a way, that untyped lambda calculus is no more expressive +than F* with the ``Dv`` effect. + +.. literalinclude:: ../code/Divergence.fst + :language: fstar + :start-after: //SNIPPET_START: dyn$ + :end-before: //SNIPPET_END: dyn$ + +The program ``denote`` shown below gives a semantics to ``term`` using +``dyn``. It is parameterized by a ``ctx : ctx_t``, which interprets +the free variables of the term into ``dyn``. + +.. literalinclude:: ../code/Divergence.fst + :language: fstar + :start-after: //SNIPPET_START: denote$ + :end-before: //SNIPPET_END: denote$ + +We look at the cases in detail: + + * In the ``Var`` case, the interpretation just refers to the context. + + * Integer constants in ``term`` are directly interpreted to + integers in ``dyn``. + + * The case of ``Lam`` is the most interesting: A lambda abstraction + in ``term`` is interpreted as an F* function ``dyn -> Dv dyn``, + recursively calling the denotation function on the body when the + function is applied. Here's where we see the non-positivity of + ``DFun`` at play---it allows us to inject the function into the + ``dyn`` type. + + * Finally, in the application case, we interpret a syntactic + application in ``term`` as function application in F* (if the + head is not a function, we have a type error). + +Exercise +........
+ +This exercise is similar in spirit to the previous one and is designed +to show that you can prove some simple properties of ``denote`` by +enriching its type. + +Can you prove that a closed term can be interpreted in an empty +context? + +First, let's refine the type of contexts so that it provides an +interpretation for only some variables: + +.. literalinclude:: ../code/Part4.UTLCEx2.fst + :language: fstar + :start-after: //SNIPPET_START: ctx_t$ + :end-before: //SNIPPET_END: ctx_t$ + +Next, let's define ``free t`` to compute the greatest index of a free +variable in a term. + +.. literalinclude:: ../code/Part4.UTLCEx2.fst + :language: fstar + :start-after: //SNIPPET_START: free$ + :end-before: //SNIPPET_END: free$ + +Can you give the same ``denote`` function shown earlier the following +type? + +.. code-block:: fstar + + val denote (t:term) (ctx:ctx_t (free t)) + : Dv dyn + +Next, define the empty context as shown below: + +.. literalinclude:: ../code/Part4.UTLCEx2.fst + :language: fstar + :start-after: //SNIPPET_START: empty_context$ + :end-before: //SNIPPET_END: empty_context$ + +Given a closed term ``t : term { closed t }``, where ``closed t = +(free t = -1)``, can you use ``denote`` to give an interpretation to +closed terms in the empty context? + + +.. container:: toggle + + .. container:: header + + **Answer** + + .. literalinclude:: ../code/Part4.UTLCEx2.fst + :language: fstar + +-------------------------------------------------------------------------------- + +Shallowly Embedded Dynamically Typed Programming +++++++++++++++++++++++++++++++++++++++++++++++++ + +In the previous example, we saw how the syntax of untyped lambda terms +can be interpreted into the F* type ``dyn``. In this example, rather +than going via the indirection of the syntax of lambda terms, we show +how the type ``dyn`` can be used directly to embed within F* a small, +Turing-complete, dynamically typed programming language.
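Before looking at the F* combinators, it may help to see the shape of such an embedding in a mainstream setting. The sketch below is a Python rendering (with illustrative names, not taken from the F* development) of ``dyn``-style tagged values, with a lifted integer operation that fails dynamically and a checked application combinator:

```python
from dataclasses import dataclass
from typing import Callable, Union

# A dyn-like tagged union: a value is an integer, a function on Dyn,
# or a dynamic type error. Names here are illustrative.
@dataclass
class DInt:
    value: int

@dataclass
class DFun:
    fn: Callable[["Dyn"], "Dyn"]

@dataclass
class DErr:
    msg: str

Dyn = Union[DInt, DFun, DErr]

def add(x: Dyn, y: Dyn) -> Dyn:
    # Lift integer addition to Dyn; fail dynamically on a type mismatch.
    if isinstance(x, DInt) and isinstance(y, DInt):
        return DInt(x.value + y.value)
    return DErr("expected integers")

def app(f: Dyn, arg: Dyn) -> Dyn:
    # Checked application: a type error unless the head is a function.
    if isinstance(f, DFun):
        return f.fn(arg)
    return DErr("expected a function")
```

For example, ``app(DFun(lambda d: add(d, DInt(1))), DInt(41))`` applies a lifted successor function and yields ``DInt(42)``, while applying an integer as if it were a function yields a ``DErr``.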
+ +We can start by lifting the F* operations on integers and functions to +(possibly failing) operations on ``dyn``. + +.. literalinclude:: ../code/Divergence.fst + :language: fstar + :start-after: //SNIPPET_START: lift_int$ + :end-before: //SNIPPET_END: lift_int$ + +We also provide operations to compare dyn-typed integers and to +branch on them, treating ``0`` as ``false``. + +.. literalinclude:: ../code/Divergence.fst + :language: fstar + :start-after: //SNIPPET_START: branch_eq$ + :end-before: //SNIPPET_END: branch_eq$ + +For functions, we can provide combinators to apply functions and, +importantly, a combinator ``fix`` that provides general recursion. + +.. literalinclude:: ../code/Divergence.fst + :language: fstar + :start-after: //SNIPPET_START: app_fix$ + :end-before: //SNIPPET_END: app_fix$ + +An aside on the arity of recursive functions: You may wonder why +``fix`` is defined as shown, rather than ``fix_alt`` below, which +removes a needless additional abstraction. The reason is that with +``fix_alt``, to instruct F* to disable the termination checker on the +recursive definition, we need an additional ``Dv`` annotation: indeed, +evaluating ``fix_alt f`` in a call-by-value semantics would result, +unconditionally, in an infinite loop, whereas ``fix f`` would +immediately return the lambda term ``fun n -> f (fix f) n``. In other +words, eta reduction (or removing redundant function applications) +does not preserve semantics in the presence of divergence. + +.. literalinclude:: ../code/Divergence.fst + :language: fstar + :start-after: //SNIPPET_START: fix_alt$ + :end-before: //SNIPPET_END: fix_alt$ + +With that, we can program non-trivial dynamically typed, general +recursive programs within F* itself, as seen below. + +..
literalinclude:: ../code/Divergence.fst + :language: fstar + :start-after: //SNIPPET_START: collatz_dyn$ + :end-before: //SNIPPET_END: collatz_dyn$ + +All of which is to illustrate that with general recursion and +non-positive datatypes using ``Dv``, F* is a general-purpose +programming language like ML, Haskell, Lisp, Scheme, or other +functional languages you may be familiar with. + + diff --git a/doc/book/PoP-in-FStar/book/part4/part4_dm4f.rst.outline b/doc/book/PoP-in-FStar/book/part4/part4_dm4f.rst.outline new file mode 100644 index 00000000000..a7aaa8e5366 --- /dev/null +++ b/doc/book/PoP-in-FStar/book/part4/part4_dm4f.rst.outline @@ -0,0 +1,4 @@ +.. _Part4_DM4F: + +Dijkstra Monads for Free +======================== diff --git a/doc/book/PoP-in-FStar/book/part4/part4_ghost.rst b/doc/book/PoP-in-FStar/book/part4/part4_ghost.rst new file mode 100644 index 00000000000..f2e86c8a4e3 --- /dev/null +++ b/doc/book/PoP-in-FStar/book/part4/part4_ghost.rst @@ -0,0 +1,474 @@ +.. _Part4_Ghost: + +Erasure and the Ghost Effect +============================ + +When writing proof-oriented programs, inevitably, some parts of the +program serve only to state and prove properties about the code that +actually executes. Our first non-trivial effect separates the +computationally relevant parts of the program from the computationally +irrelevant (i.e., specificational or *ghost*) parts of a program. This +separation enables the F* compiler to guarantee that all the ghost +parts of a program are optimized away entirely. + +For a glimpse of what all of this is about, let's take a look again at +length-indexed vectors---we saw them first :ref:`here `. + +.. literalinclude:: ../code/Vec.fst + :language: fstar + :start-after: //SNIPPET_START: vec + :end-before: //SNIPPET_END: vec + +and a function to concatenate two vectors: + +..
literalinclude:: ../code/Vec.fst + :language: fstar + :start-after: //SNIPPET_START: append + :end-before: //SNIPPET_END: append + +Compare this with concatenating two lists: + +.. code-block:: fstar + + let rec list_append #a (l1 l2:list a) = + match l1 with + | [] -> l2 + | hd::tl -> hd :: list_append tl l2 + +Superficially, because of the implicit arguments, it may look like +concatenating vectors with ``append`` is just as efficient as +concatenating lists---the length indexes seem to impose no +overhead. But, let's look at the code that F* extracts to OCaml for +length-indexed vectors. + +First, in the definition of the ``vec`` type, since OCaml is not +dependently typed, the ``nat``-index of the F* ``vec`` is replaced by +a ``'dummy`` type argument---that's fine. But, notice that the +``Cons`` constructor contains three fields: a ``Prims.nat`` for the +length of the tail of the list, the head of the list, and then the +tail, i.e., the length of the tail of the list is stored at every +``Cons`` cell, so the ``vec`` type is actually less space efficient +than an ordinary ``list``. + +.. literalinclude:: ../code/Vec.ml + :language: ocaml + :start-after: (* SNIPPET_START: vec *) + :end-before: (* SNIPPET_END: vec *) + +Next, in the OCaml definition of ``append``, we see that it receives +additional arguments ``n`` and ``m`` for the lengths of the vectors, +and worse, in the last case, it incurs an addition to sum ``n' + m`` +when building the result vector. So, ``append`` is also less +time-efficient than ``List.append``. + +.. literalinclude:: ../code/Vec.ml + :language: ocaml + :start-after: (* SNIPPET_START: append *) + :end-before: (* SNIPPET_END: append *) + +This is particularly unfortunate, since the computational behavior of +``append`` doesn't actually depend on the length indexes of the input +vectors.
What we need is a principled way to indicate to the F\* +compiler that some parts of a computation are actually only there for +specification or proof purposes and that they can be removed when +compiling the code, without changing the observable result computed by +the program. This is what *erasure* is about---removing the +computationally irrelevant parts of a term for compilation. + +Here's a revised version of vectors, making use of the ``erased`` type +from the ``FStar.Ghost`` library to indicate to F* which parts must be +erased by the compiler. + +.. literalinclude:: ../code/VecErased.fst + :language: fstar + +We'll look into this in much more detail in what follows, but notice +for now that: + + 1. The first argument of ``Cons`` now has type ``erased nat``. + + 2. The implicit arguments of ``append`` corresponding to the + indexes of the input vectors have type ``erased nat``. + +If we extract this code to OCaml, here's what we get: + + +.. literalinclude:: ../code/VecErased.ml + :language: ocaml + :start-after: (* SNIPPET_START: vec *) + :end-before: (* SNIPPET_END: vec *) + +.. literalinclude:: ../code/VecErased.ml + :language: ocaml + :start-after: (* SNIPPET_START: append *) + :end-before: (* SNIPPET_END: append *) + +Notice that the erased arguments have all been turned into the unit +value ``()``, and the needless addition in ``append`` is gone too. + +Of course, the code would be cleaner if F* had entirely +removed the argument instead of leaving behind a unit term, but we +leave it to the downstream compiler, e.g., OCaml itself, to remove +these needless units. Further, if we're compiling the ML code +extracted from F* to C, then KaRaMeL does remove these additional +units in the C code it produces. + + +Ghost: A Primitive Effect +------------------------- + +The second primitive effect in F*'s effect system is the effect of +*ghost* computations, i.e., computation types whose effect label is +``GTot``.
[#]_ The label ``GTot`` is strictly above ``Tot`` in the +effect hierarchy, i.e., ``Tot < GTot``. This means that a term with +computation type ``GTot t`` cannot influence the behavior of a term +whose type is ``Tot s``. Conversely, every ``Tot`` computation can be +implicitly promoted to a ``GTot`` computation. + +Ghost computations are just as well-behaved as pure, total +terms---they always terminate on all inputs and exhibit no observable +effects, except for the value they return. As such, F*'s logical core +really includes both ``Tot`` and ``GTot`` computations. The +distinction between ``Tot`` and ``GTot`` is only relevant when +considering how programs are compiled. Ghost computations are +guaranteed to be erased by the compiler, while ``Tot`` +computations are retained. + +Since ``Tot`` terms are implicitly promoted to ``GTot``, it is easy to +designate that some piece of code should be erased just by annotating +it with a ``GTot`` effect label. For example, here is a ghost version +of the factorial function: + +.. literalinclude:: ../code/FactorialTailRec.fst + :language: fstar + :start-after: //SNIPPET_START: factorial$ + :end-before: //SNIPPET_END: factorial$ + +Its definition is identical to the corresponding total function that +we saw earlier, except that we have annotated the return computation +type of the function as ``GTot nat``. This indicates to F* that +``factorial`` is to be erased during compilation, and the F* +type-and-effect checker ensures that a ``Tot`` computation cannot depend +on an application of ``factorial n``. + +.. [#] The name ``GTot`` is meant to stand for "Ghost and Total" + computations, and is pronounced "gee tote". However, it's a + poor name and is far from self-explanatory. We plan to change + the name of this effect in the future (e.g., to something like + ``Spec``, ``Ghost``, or ``Erased``), though this is a breaking + change to a large amount of existing F* code.
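Python has no effect system, but the discipline that ``GTot`` enforces can be approximated by convention: a ghost definition may appear only inside assertions, which a deployment build strips (e.g., ``python -O``). A minimal sketch, with illustrative names not taken from the F* development:

```python
def spec_factorial(n: int) -> int:
    # "Ghost" reference definition: by convention, used only in assertions.
    return 1 if n == 0 else n * spec_factorial(n - 1)

def factorial_impl(n: int) -> int:
    # Computationally relevant code: an iterative loop.
    out = 1
    for i in range(2, n + 1):
        out *= i
        # Invariant stated against the ghost reference; stripping this
        # assert must not change the result the function computes.
        assert out == spec_factorial(i)
    return out
```

Unlike F*, nothing here stops ``factorial_impl`` from using ``spec_factorial`` outside an assertion; the point of the ``GTot`` effect label is precisely that the type-and-effect checker enforces this separation mechanically.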
+ +Ghost Computations as Specifications +------------------------------------ + +A ghost function like ``factorial`` can be used in specifications, +e.g., in a proof that a tail recursion optimization ``factorial_tail`` +is equivalent to ``factorial``. + +.. literalinclude:: ../code/FactorialTailRec.fst + :language: fstar + :start-after: //SNIPPET_START: factorial_tail$ + :end-before: //SNIPPET_END: factorial_tail$ + +This type allows a client to use the more efficient ``fact``, but for +reasoning purposes, one can use the more canonical ``factorial``, +proven equivalent to ``fact``. + +In contrast, if we try to implement the same specification by +directly using the ghost ``factorial`` function, F* complains with an +effect incompatibility error. + +.. literalinclude:: ../code/FactorialTailRec.fst + :language: fstar + :start-after: //SNIPPET_START: factorial_bad$ + :end-before: //SNIPPET_END: factorial_bad$ + +The error is: + +.. code-block:: none + + Computed type "r: nat{r == out * factorial n}" and + effect "GTot" is not compatible with the annotated + type "r: nat{r == out * factorial n}" effect "Tot" + +So, while F* forbids using ghost computations in ``Tot`` contexts, it +seems to be fine with accepting a use of factorial in specifications, +e.g., in the type ``r:nat { r == out * factorial n }``. We'll see in a +moment why this is permitted. + +Erasable and Non-informative Types +---------------------------------- + +In addition to using the ``GTot`` effect to classify computations +that must be erased, F* also provides a way to mark certain *value +types* as erasable. + +Consider introducing an inductive type definition that is meant to +describe a proof term only and for that proof term to introduce no +runtime overhead. In a system like Coq, the type of Coq propositions +``Prop`` serves this purpose, but ``prop`` in F* is quite +different. Instead, F* allows an inductive type definition to be +marked as ``erasable``.
+ +For example, when we looked at the :ref:`simply typed lambda calculus +(STLC) `, we introduced the inductive type below +to represent a typing derivation for an STLC term. One could define a +typechecker for STLC and give it the type shown below to prove it +correct: + +.. code-block:: fstar + + val check (g:env) (e:exp) : (t : typ & typing g e t) + +However, this function returns both the type ``t:typ`` computed for +``e``, as well as the typing derivation. Although the typing +derivation may be useful in some cases, often returning the whole +derivation is unnecessary. By marking the definition of the ``typing`` +inductive as shown below (and keeping the rest of the definition the +same), F* guarantees that the compiler will extract ``typing g e t`` +to the ``unit`` type and, correspondingly, all values of ``typing g e +t`` will be erased to the unit value ``()``. + +.. code-block:: fstar + + [@@erasable] + noeq + type typing : env -> exp -> typ -> Type = ... + +Marking a type with the ``erasable`` attribute and having it be erased +to ``unit`` is safe because F* restricts how ``erasable`` types can be +used. In particular, no ``Tot`` computations should be able to extract +information from a value of an erasable type. + +Closely related to erasable types is a class of types called +*non-informative*, defined inductively as follows: + + 1. The type ``Type`` is non-informative + + 2. The type ``prop`` is non-informative (i.e., unit and all its + subtypes) + + 3. An erasable type is non-informative + + 4. A function type ``x:t -> Tot s`` is non-informative, if ``s`` is + non-informative + + 5. A ghost function type ``x:t -> GTot s`` is non-informative + + 6. A function type ``x:t -> C``, with user-defined computation type + ``C``, is non-informative if the effect label of ``C`` has the + erasable attribute. + +Intuitively, a non-informative type is a type that cannot be +case-analyzed in a ``Tot`` context.
+ +With this notion of non-informative types, we can now define the +restrictions on an ``erasable`` type: + + 1. Any computation that pattern matches on an erasable type must + return a non-informative type. + + 2. Inductive types with the ``erasable`` attribute do not support + built-in decidable equality and must also be marked ``noeq``. + + +The `erased` type, `reveal`, and `hide` +--------------------------------------- + +The ``erasable`` attribute can only be added to new inductive type +definitions and every instance of that type becomes erasable. If you +have a type like ``nat``, which is not erasable, but some occurrences +of it (e.g., in the arguments to ``Vector.append``) need to be erased, +the F* standard library ``FStar.Ghost.fsti`` offers the following: + +.. code-block:: fstar + + (** [erased t] is the computationally irrelevant counterpart of [t] *) + [@@ erasable] + val erased (t:Type u#a) : Type u#a + +``FStar.Ghost`` also offers a pair of functions, ``reveal`` and +``hide``, that form a bijection between ``a`` and ``erased a``. + +.. code-block:: fstar + + val reveal (#a: Type u#a) (v:erased a) : GTot a + + val hide (#a: Type u#a) (v:a) : Tot (erased a) + + val hide_reveal (#a: Type) (x: erased a) + : Lemma (ensures (hide (reveal x) == x)) + [SMTPat (reveal x)] + + val reveal_hide (#a: Type) (x: a) + : Lemma (ensures (reveal (hide x) == x)) + [SMTPat (hide x)] + +Importantly, ``reveal v`` breaks the abstraction of ``v:erased a``, +returning just an ``a``, but doing so incurs a ``GTot`` effect---so, +``reveal`` cannot be used in arbitrary ``Tot`` contexts. + +Dually, ``hide v`` can be used to erase ``v:a``, since a ``Tot`` +context cannot depend on the value of an ``erased a``. + +The SMT patterns on the two lemmas allow F* and Z3 to automatically +instantiate the lemmas to relate a value and its hidden +counterpart---:ref:`this chapter ` provides more details on +how SMT patterns work.
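To make the bijection concrete, here is a loose Python model (an analogy only: Python cannot enforce the ``GTot`` restriction on ``reveal``, so the discipline is by convention). ``erased`` values are opaque wrappers, ``hide`` puts a value into the wrapper, and ``reveal`` takes it back out, satisfying the two round-trip laws:

```python
from dataclasses import dataclass
from typing import Generic, TypeVar

A = TypeVar("A")

@dataclass(frozen=True)
class Erased(Generic[A]):
    # Model of FStar.Ghost.erased: the payload is computationally
    # irrelevant; ordinary ("Tot") code should not inspect it.
    _ghost: A

def hide(v: A) -> Erased[A]:
    # Total: any value may be put into the erased box.
    return Erased(v)

def reveal(e: Erased[A]) -> A:
    # In F* this has a GTot type; here the restriction to
    # specification-only contexts is purely by convention.
    return e._ghost

# The two lemmas, hide_reveal and reveal_hide, as runtime checks:
x = hide(42)
assert hide(reveal(x)) == x
assert reveal(hide(42)) == 42
```

In F*, unlike in this sketch, the pair really is information-hiding: because ``reveal`` is ``GTot``, no compiled code path can ever observe the payload, which is what licenses the compiler to erase it.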
+ +**Implicit coercions** + +``FStar.Ghost.erased``, ``reveal``, and ``hide`` are so commonly used +in F* that the compiler provides some special support for them. In +particular, when a term ``v:t`` is used in a context that expects an +``erased t``, F* implicitly coerces ``v`` to ``hide v``. Likewise, when +the context expects a ``t`` where ``v:erased t`` is provided, F* +implicitly coerces ``v`` to ``reveal v``. + +The following examples illustrate a few usages and limitations. You +can ask F* to print the code with implicits enabled by using +``--dump_module RevealHideCoercions --print_implicits``. + +.. literalinclude:: ../code/RevealHideCoercions.fst + :language: fstar + +A few comments on these examples: + +* The first two functions illustrate how a ``nat`` is coerced + implicitly to ``erased nat``. Note that the effect of ``auto_reveal`` is + ``GTot``. + +* ``auto_reveal_2`` fails, since the annotation claims, + incorrectly, that the effect label is ``Tot``. + +* ``incr`` is just a ``nat -> nat`` function. + +* ``incr_e`` is interesting because it calls ``incr`` with an + ``erased nat`` and the annotation expects an ``erased nat`` + too. The body of ``incr_e`` is implicitly coerced to ``hide (incr + (reveal x))``. + +* ``incr'`` is interesting, since it calls ``incr_e``: its body + is implicitly coerced to ``reveal (incr_e (hide x))``. + +* Finally, ``poly`` shows the limitations of implicit coercion: F* + only inserts coercions when the expected type of the term in a + context and the type of the term differ by an ``erased`` + constructor. In ``poly``, since ``==`` is polymorphic, the expected + type of the context is just an unresolved unification variable and, + so, no coercion is inserted. Instead, F* complains that ``y`` has + type ``erased nat`` when the type ``nat`` was expected. + + ..
_Ghost_in_total_contexts: + +Using Ghost Computations in Total Contexts +------------------------------------------ + +We have already noted that ``Tot < GTot``, enabling ``Tot`` +computations to be re-used in ``GTot`` contexts. For erasure to be +sound, it is crucial that ``GTot`` terms cannot be used in ``Tot`` +contexts, and indeed, F* forbids this in general. +However, there is one exception where +we can directly invoke a ``GTot`` computation in a ``Tot`` context +without wrapping the result in ``Ghost.erased``. + + +Effect Promotion for Non-informative Types +.......................................... + +Consider a term ``f`` with type ``GTot s``, where ``s`` is a +non-informative type. Since ``s`` is non-informative, no total context +can extract any information from ``f``. As such, F* allows implicitly +promoting ``GTot s`` to ``Tot s``, when ``s`` is a non-informative +type. + +For instance, the following is derivable: +``hide (factorial 0) : Tot (erased nat)``. Let's work through it in detail. + +1. We know that ``factorial 0 : GTot nat``. + +2. Recall from the discussion on :ref:`evaluation order + ` and the application of functions to + effectful arguments that ``hide (factorial 0)`` is equivalent to ``let + x = factorial 0 in hide x``, where ``x:nat`` and + ``hide x : Tot (erased nat)``. + +3. From the rule for sequential composition of effectful terms, the + type of ``let x = factorial 0 in hide x`` should be ``GTot (erased + nat)``, since ``GTot = lub GTot Tot``. + +4. Since ``erased nat`` is a non-informative type, ``GTot (erased + nat)`` is promoted to ``Tot (erased nat)``, which is then the type + of ``hide (factorial 0)``. + + +Effect promotion for ghost functions returning non-informative types +is very useful. It allows one to mix ghost computations with total +computations, so long as the result of the ghost sub-computation is +hidden with an erased type.
For instance, in the code below, we use +``hide (factorial (n - 1))`` and use the result ``f_n_1`` in an +assertion or some other proof step, all within a function that is +in the ``Tot`` effect. + +.. literalinclude:: ../code/FactorialTailRec.fst + :language: fstar + :start-after: //SNIPPET_START: factorial_tail_alt$ + :end-before: //SNIPPET_END: factorial_tail_alt$ + + +Revisiting Vector Concatenation +------------------------------- + +We now have all the ingredients to understand how the vector append +example shown at the start of this chapter works. Below is a +version of the same code with all the implicit arguments and +reveal/hide operations made explicit. + +.. literalinclude:: ../code/VecErasedExplicit.fst + :language: fstar + +**Definition of vec** + +In the definition of the inductive type ``vec a``, we have two +occurrences of ``reveal``. Consider ``vec a (reveal n)``, the type of +the ``tl`` of the vector. ``reveal n`` is a ghost computation of type +``GTot nat``, so ``vec a (reveal n) : GTot Type``. But, since ``Type`` +is non-informative, ``GTot Type`` is promoted to ``Tot Type``. The +promotion from ``GTot Type`` to ``Tot Type`` is pervasive in F* and +enables ghost computations to be freely used in types and other +specifications. + +The ``vec a (reveal n + 1)`` in the result type of ``Cons`` is +similar. Here ``reveal n + 1`` has type ``GTot nat``, but applying it +to ``vec a`` produces a ``GTot Type``, which is promoted to ``Tot +Type``. + +**Type of append** + +The type of ``append`` has four occurrences of ``reveal``. Three of +them, in the types of ``v0``, ``v1``, and the return type, behave the +same as the typing of the fields of ``Cons``: the ``GTot Type`` is +promoted to ``Tot Type``. + +One additional wrinkle is in the decreases clause, where we have an +explicit ``reveal n``, since what decreases on each recursive call is +the ``nat`` that's in bijection with the parameter ``n``, rather than +``n`` itself.
When F* infers a decreases clause for a function, any +erased terms in the clause are automatically revealed. + +**Definition of append** + +The recursive call instantiates the index parameters to ``n_tl`` and +``m``, which are both erased. + +When constructing the ``Cons`` node, its index argument is +instantiated to ``hide (reveal n_tl + reveal m)``. The needless +addition is marked with a ``hide`` enabling the F* compiler to erase +it. As we saw before in ``factorial_tail_alt``, using ``hide`` allows +one to mingle ghost computations (like ``(reveal n - 1)``) with total +computations, as needed for specifications and proofs. + +All of this is painfully explicit, but the implicit reveal/hide +coercions inserted by F* go a long way towards making things relatively +smooth. diff --git a/doc/book/PoP-in-FStar/book/part4/part4_pure.rst b/doc/book/PoP-in-FStar/book/part4/part4_pure.rst new file mode 100644 index 00000000000..132b75b0274 --- /dev/null +++ b/doc/book/PoP-in-FStar/book/part4/part4_pure.rst @@ -0,0 +1,795 @@ +.. _Part4_Pure: + +Primitive Effect Refinements +============================== + +.. note:: + + This chapter provides some background on Floyd-Hoare logic and + weakest-precondition-based verification condition generation. This + is necessary if you want to understand a bit about how F* infers + the logical constraints needed to prove the correctness of a + program. It is also useful background for more advanced material in + subsequent chapters about defining custom effects in F*, e.g., + effects to model state, exceptions, or concurrency. + +Refinement types ``x:t{p}`` *refine* value types ``t`` and allow us to +make more precise assertions about the values in the program. For +example, when we have ``v : x:int{x >= 0}``, then we know not only +that ``v`` is an ``int``, but also that ``v >= 0``. + +In a similar manner, F* allows refining computation types with +specifications that describe some aspects of a program's computational +behavior.
These *effect refinements* can, in general, be defined by +the user in a reasoning system of their choosing, e.g., the +refinements may use separation logic, or they may count computation +steps. + +However, F* has built-in support for refining the specification of +pure programs with effect refinements that encode the standard +reasoning principles of Floyd-Hoare Logic and weakest +precondition-based calculi. Foreshadowing what's about to come in this +chapter, we can write the following specification for the +``factorial`` function: + +.. literalinclude:: ../code/Pure.fst + :language: fstar + :start-after: //SNIPPET_START: factorial$ + :end-before: //SNIPPET_END: factorial$ + +Intuitively, this type states that ``factorial x`` is a computation +that is defined only when ``x >= 0`` and that always terminates, returning a value +``r >= 1``. In a way, this type is closely related to other, more +familiar, types we have given to ``factorial`` so far, e.g., ``nat -> +pos``, and, indeed, ``factorial`` can be used at this type. + +.. literalinclude:: ../code/Pure.fst + :language: fstar + :start-after: //SNIPPET_START: fact$ + :end-before: //SNIPPET_END: fact$ + +Actually, in all the code we've seen so far, what's happening under +the covers is that F* infers a type for a pure program similar to +``Pure t pre post`` and then checks that that type can be subsumed to +a user-provided specification of the form ``Tot t'``. + +In this chapter, we look into how these ``Pure`` specifications work, +starting with a primer on Floyd-Hoare Logic and weakest precondition +calculi. If you are familiar with these, you may safely skip +the next subsections, though even if you are an expert, it may be of +interest to see how such program logics can be formalized in F*. + +..
_Part4_Floyd_Hoare: + +A Primer on Floyd-Hoare Logic and Weakest Preconditions +------------------------------------------------------- + +Floyd-Hoare Logic is a system of specifications and rules to reason +about the logical properties of programs, introduced by Robert Floyd +in a paper titled `Assigning Meaning to Programs +`_ and +by Tony Hoare in `An axiomatic basis for computer programming +`_. The notation used in +most modern presentations (called Hoare triples) is due to Hoare. An +algorithm to compute Hoare triples was developed by Edsger Dijkstra, +`presented first in this paper +`_, using a +technique called *weakest preconditions*. All of them received Turing +Awards for their work on these and other related topics. + +For an introduction to these ideas, we'll develop a small imperative +language with global variables, presenting: + +* An operational semantics for the language, formalized as an + interpreter. + +* A Floyd-Hoare program logic proven sound with respect to the + operational semantics. + +* And, finally, an algorithm to compute weakest preconditions proved + sound against the Floyd-Hoare logic. + +Our language has the following abstract syntax: + + +.. literalinclude:: ../code/Imp.fst + :language: fstar + :start-after: //SNIPPET_START: syntax$ + :end-before: //SNIPPET_END: syntax$ + +Expressions include integer constants, global variables (represented +just as natural numbers), and some other forms, e.g., arithmetic +expressions like addition. + +A program includes: + +* Assignments, ``EAssign x e``, representing the assignment of the + result of an expression ``e`` to a global variable ``x``, i.e., ``x := e`` + +* ``Seq``, to compose programs sequentially + +* ``If`` to compose programs conditionally + +* And ``Repeat n p``, which represents a construct similar to a + ``for``-loop (or primitive recursion), in which the program ``p`` is + repeated ``n`` times, where ``n`` evaluates to a non-negative + integer.
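The F* declaration of this syntax is included from ``Imp.fst`` above; as a rough cross-reference (an illustration, not a transcription of the elided F* code), the same abstract syntax can be rendered in Python as:

```python
from dataclasses import dataclass
from typing import Union

# Expressions: constants, global variables (named by naturals), addition.
@dataclass
class EConst:
    value: int

@dataclass
class EVar:
    index: int

@dataclass
class EAdd:
    left: "Expr"
    right: "Expr"

Expr = Union[EConst, EVar, EAdd]

# Programs: assignment, sequencing, conditionals, bounded repetition.
@dataclass
class EAssign:
    var: int
    expr: Expr

@dataclass
class Seq:
    first: "Prog"
    second: "Prog"

@dataclass
class If:
    cond: Expr
    then_branch: "Prog"
    else_branch: "Prog"

@dataclass
class Repeat:
    count: Expr
    body: "Prog"

Prog = Union[EAssign, Seq, If, Repeat]
```

For instance, the program ``x0 := 1; repeat 3 { x0 := x0 + x0 }`` is the value ``Seq(EAssign(0, EConst(1)), Repeat(EConst(3), EAssign(0, EAdd(EVar(0), EVar(0)))))``.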
+
+Our language does not have ``while`` loops, whose semantics are a bit
+more subtle to develop. We will look at a semantics for ``while`` in a
+subsequent chapter.
+
+Operational Semantics
+^^^^^^^^^^^^^^^^^^^^^
+
+Our first step in giving a semantics to programs is to define an
+interpreter that runs a program while transforming a memory storing
+the values of the global variables.
+
+To model this memory, we use the type ``state`` shown below:
+
+.. literalinclude:: ../code/Imp.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: state$
+   :end-before: //SNIPPET_END: state$
+
+Writing a small evaluator for expressions is easy:
+
+.. literalinclude:: ../code/Imp.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: eval_expr$
+   :end-before: //SNIPPET_END: eval_expr$
+
+The interpreter for programs itself takes a bit more work, since
+programs can both read and write the state. To structure our
+interpreter, we'll introduce a simple state monad ``st a``. We've seen
+this construction before in :ref:`a previous chapter
+`---so, look there if the state monad is unfamiliar
+to you. Recall that F* has support for monadic let operators: the
+``let!`` operator provides syntactic sugar to conveniently compose
+``st`` terms.
+
+.. literalinclude:: ../code/Imp.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: monad$
+   :end-before: //SNIPPET_END: monad$
+
+Now, the interpreter itself is a total, recursive function ``run`` that
+interprets a program ``p`` as a state-passing function of type ``st
+unit``, or ``state -> unit & state``.
+
+.. literalinclude:: ../code/Imp.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: run$
+   :end-before: //SNIPPET_END: run$
+
+Let's look at its definition in detail:
+
+ * ``Assign x e``: Evaluate ``e`` in the current state and then
+   update the state with a new value of ``x``.
+
+ * ``Seq p1 p2``: Simply run ``p1`` and then run ``p2``, where
+   ``;!`` is syntactic sugar for ``let! _ = run p1 in run p2``.
+
+ * ``If e p1 p2``: Evaluate ``e`` in the current state, branch on
+   its result, and run either ``p1`` or ``p2``.
+
+ * ``Repeat e p``: Evaluate ``e`` to ``n``, and if ``n`` is
+   greater than zero, call the mutually recursive ``run_repeat n
+   p``. Most of the subtlety here is in convincing F* that this
+   mutually recursive function terminates, but this is fairly
+   straightforward once you know how---we discussed
+   :ref:`termination proofs for mutually recursive functions earlier
+   `.
+
+These operational semantics are the ground truth for our programming
+language---they define how programs execute. Now that we have that
+settled, we can look at how a Floyd-Hoare logic makes it possible to
+reason about programs in a structured way.
+
+Floyd-Hoare Logic
+^^^^^^^^^^^^^^^^^
+
+The goal of a Floyd-Hoare logic is to provide a way to reason about a
+program based on the structure of its syntax, rather than reasoning
+directly about its operational semantics. The unit of reasoning is
+called a *Hoare triple*, a predicate of the form ``{P} c {Q}``, where
+``P`` and ``Q`` are predicates about the state of the program, and
+``c`` is the program itself.
+
+We can *define* Hoare triples for our language by interpreting them as
+an assertion about the operational semantics, where ``triple p c q``
+represents, formally, the Hoare triple ``{ p } c { q }``.
+
+.. literalinclude:: ../code/Imp.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: triple$
+   :end-before: //SNIPPET_END: triple$
+
+The predicate ``triple p c q`` is valid if executing ``c`` in a
+state that satisfies ``p`` results in a state that satisfies ``q``.
+The predicates ``p`` and ``q`` are also called the precondition and
+postcondition of ``c``, respectively.
+
+For each syntactic construct of our language, we can prove a lemma
+that shows how to build an instance of the ``triple`` predicate for
+that construct.
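Since ``triple`` quantifies over all input states, its meaning can be checked by brute force at toy scale. The Python sketch below (our own encoding, not the F* development; states range over a small finite set, and ``run`` covers just assignment and sequencing) illustrates what validity of a triple amounts to:

```python
def run(p, s):
    tag = p[0]
    if tag == "assign":   # x := e, with e modeled as a function of the state
        return {**s, p[1]: p[2](s)}
    if tag == "seq":
        return run(p[2], run(p[1], s))
    raise ValueError(p)

# triple pre c post: every state satisfying pre runs to a state satisfying post
def triple(pre, c, post, states):
    return all((not pre(s)) or post(run(c, s)) for s in states)

states = [{"x": n} for n in range(-10, 11)]
incr = ("assign", "x", lambda s: s["x"] + 1)

# { x >= 0 } x := x + 1 { x >= 1 } is valid...
print(triple(lambda s: s["x"] >= 0, incr, lambda s: s["x"] >= 1, states))  # True
# ...but { x >= 0 } x := x + 1 { x >= 2 } is not (take x = 0)
print(triple(lambda s: s["x"] >= 0, incr, lambda s: s["x"] >= 2, states))  # False
```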
Then, to build a proof of a program, one stitches
+together these lemmas to obtain a ``triple p main q``, a statement of
+correctness of the ``main`` program.
+
+Assignment
+++++++++++
+
+Our first rule is for reasoning about variable assignment:
+
+.. literalinclude:: ../code/Imp.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: assignment$
+   :end-before: //SNIPPET_END: assignment$
+
+This lemma says that ``post`` holds after executing ``x := e`` in the
+initial state ``s0``, if ``post`` holds on the initial state updated
+at ``x`` with the value of ``e``.
+
+For example, if we expect the value of ``z`` to be greater than zero
+after executing ``z := y + 1`` in ``s0``, then the
+assignment rule says that ``read s0 y + 1 > 0`` should hold before the
+assignment, which is what we would expect.
+
+Sequence
+++++++++
+
+Our next lemma about triples stitches together triples for two
+programs that are sequentially composed:
+
+.. literalinclude:: ../code/Imp.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: sequence$
+   :end-before: //SNIPPET_END: sequence$
+
+The lemma says that if we can derive the Hoare triples of the two
+statements such that the postcondition of ``p1`` matches the
+precondition of ``p2``, then we can compose them.
+
+
+Conditional
++++++++++++
+
+The lemma for conditionals is next:
+
+.. literalinclude:: ../code/Imp.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: conditional$
+   :end-before: //SNIPPET_END: conditional$
+
+It says that to derive the postcondition ``post`` from ``If e p1
+p2``, we should be able to derive it from each of the branches with the
+same precondition ``pre``. In addition, since we know that ``p1``
+executes only when ``e`` is non-zero, we can add these facts to the
+preconditions of each branch.
+
+Repeat
+++++++
+
+In all the cases so far, the lemmas are proved automatically by F* and
+Z3.
In the case of repeats, however, we need to do a little more work,
+since an inductive argument is involved.
+
+The rule for ``repeat`` requires a *loop invariant* ``inv``. The loop
+invariant is an assertion that holds before the loop starts, is
+maintained by each iteration of the loop, and is provided as the
+postcondition of the loop.
+
+The lemma below states that if we can prove ``triple inv p inv``,
+then we can also prove ``triple inv (Repeat e p) inv``.
+
+.. literalinclude:: ../code/Imp.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: repeat$
+   :end-before: //SNIPPET_END: repeat$
+
+The auxiliary lemma ``repeat_n`` proves that ``run_repeat n p``
+preserves ``inv``, if ``p`` preserves ``inv``.
+
+To call this lemma from the main ``repeat`` lemma, we need to "get
+our hands on" the initial state ``s0``, and the :ref:`syntactic sugar
+to manipulate logical connectives ` makes this
+possible.
+
+
+Consequence
++++++++++++
+
+The final lemma about our Hoare triples is called the rule of
+consequence. It allows strengthening the precondition and weakening
+the postcondition of a triple.
+
+.. literalinclude:: ../code/Imp.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: consequence$
+   :end-before: //SNIPPET_END: consequence$
+
+A precondition of a program is an obligation before the statement
+is executed. So if ``p`` requires ``pre``, we can always strengthen
+the precondition to ``pre'``, provided ``pre' ==> pre``, i.e., it is
+logically valid to require more than necessary in the
+precondition. Similarly, a postcondition is what a statement
+guarantees. So if ``p`` guarantees ``post``, we can always weaken it
+to guarantee less, i.e., some ``post'`` where ``post ==> post'``.
+
+Weakest Preconditions
+^^^^^^^^^^^^^^^^^^^^^
+
+The rules of Floyd-Hoare logic provide an abstract way to reason about
+programs. However, these rules are presented
+declaratively.
For example, to apply the ``sequence`` rule, one has to
+derive triples for each component in a way that they prove exactly the
+same assertion (``pre_mid``) about the intermediate state. There may
+be many ways to do this, e.g., one could apply the rule of
+consequence to weaken the postcondition of the first component, or to
+strengthen the precondition of the second component.
+
+Dijkstra's system of weakest preconditions eliminates such ambiguity
+and provides an *algorithm* for computing valid Hoare triples,
+provided the invariants of all loops are given. This makes weakest
+preconditions the basis of many program proof tools, since given a
+program annotated with loop invariants, one can simply compute a
+logical formula (called a verification condition) whose validity
+implies the correctness of the program.
+
+At the core of the approach is a function ``WP (c, Q)``, which
+computes a unique weakest precondition ``P`` for the program ``c``
+and postcondition ``Q``. The semantics of ``WP`` is that ``WP (c, Q)``
+is the weakest precondition that must hold before executing ``c``
+for the postcondition ``Q`` to be valid after executing ``c``. Thus,
+the function ``WP`` assigns meaning to programs as a transformer of
+postconditions ``Q`` to preconditions ``WP (c, Q)``.
+
+The ``wp`` function for our small imperative language is shown below:
+
+.. literalinclude:: ../code/Imp.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: wp$
+   :end-before: //SNIPPET_END: wp$
+
+* The case of ``Assign`` is identical to the ``assignment`` lemma
+  shown earlier.
+
+* The case of ``Seq`` sequentially composes the wp's. That is, to
+  prove ``post`` after running ``p1 ;; p2`` we need to prove ``wp
+  p2 post`` after running ``p1``. It may be helpful to read this case
+  as the equivalent form ``fun s0 -> wp p1 (fun s1 -> wp p2 post s1)
+  s0``, where ``s0`` is the initial state and ``s1`` is the state that
+  results after running just ``p1``.
+
+* The ``If`` case computes the WPs for each branch and requires them
+  to be proven under the suitable branch condition.
+
+* The ``Repeat`` case is the most interesting: it involves an
+  existentially quantified invariant ``inv``, which is the loop
+  invariant. That is, to reason about ``Repeat e p``, one has to
+  somehow find an invariant ``inv`` that is true initially, and that
+  implies both the WP of the loop body and the final
+  postcondition.
+
+The ``wp`` function is sound in the sense that it computes a
+sufficient precondition, as proven by the following lemma.
+
+.. literalinclude:: ../code/Imp.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: wp_soundness$
+   :end-before: //SNIPPET_END: wp_soundness$
+
+One could also prove that ``wp`` computes the weakest precondition,
+i.e., if ``triple p c q`` then ``forall s. p s ==> wp c q s``, though
+we do not prove that formally here.
+
+A Sample Program Proof
+^^^^^^^^^^^^^^^^^^^^^^
+
+We now illustrate some sample proofs using our Hoare triples and
+``wp`` function. To emphasize that Hoare triples provide an *abstract*
+way of reasoning about the execution of programs, we define
+``hoare p c q``, an alias for ``triple p c q``, marked with an attribute
+to ensure that F* and Z3 cannot reason directly about the underlying
+definition of ``triple``---that would allow Z3 to find proofs by
+reasoning about the operational semantics directly, which we want to
+avoid, since it would not scale to larger programs. For more about
+the ``opaque_to_smt`` and ``reveal_opaque`` constructs, please see
+:ref:`this section on opaque definitions `.
+
+.. literalinclude:: ../code/Imp.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: hoare$
+   :end-before: //SNIPPET_END: hoare$
+
+The lemmas above are just restatements of the ``wp_soundness`` and
+``consequence`` lemmas that we've already proven. These are now the
+only two lemmas we can use to reason about the ``hoare p c q`` predicate.
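It may also help to see the WP computation run concretely. Below is a Python sketch of ``wp`` for the loop-free fragment, with expressions and predicates modeled as functions on states (our own illustration of the idea, not the F* code):

```python
def wp(p, post):
    tag = p[0]
    if tag == "assign":    # wp (x := e) post = fun s -> post (s with x = e s)
        _, x, e = p
        return lambda s: post({**s, x: e(s)})
    if tag == "seq":       # wp (p1 ;; p2) post = wp p1 (wp p2 post)
        _, p1, p2 = p
        return wp(p1, wp(p2, post))
    if tag == "if":        # branch on the guard, as in the If case above
        _, e, p1, p2 = p
        return lambda s: wp(p1, post)(s) if e(s) != 0 else wp(p2, post)(s)
    raise ValueError(p)

# swap x and y through a temporary: t := x ;; x := y ;; y := t
swap = ("seq", ("assign", "t", lambda s: s["x"]),
        ("seq", ("assign", "x", lambda s: s["y"]),
                ("assign", "y", lambda s: s["t"])))

post = lambda s: s["x"] == 2 and s["y"] == 1
pre = wp(swap, post)
print(pre({"x": 1, "y": 2}))  # True: the computed precondition holds here
```

Evaluating the computed precondition in a state is exactly the brute-force counterpart of discharging a verification condition with Z3.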
+
+Next, we define some notation to make it a bit more convenient to
+write programs in our small language.
+
+.. literalinclude:: ../code/Imp.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: notation$
+   :end-before: //SNIPPET_END: notation$
+
+Finally, we can build proofs of some simple, loop-free programs
+automatically by computing their ``wp`` using ``wp_hoare`` and
+applying ``hoare_consequence`` to get F* and Z3 to prove that the
+inferred WP is implied by the annotated precondition.
+
+.. literalinclude:: ../code/Imp.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: swap$
+   :end-before: //SNIPPET_END: swap$
+
+This recipe of computing verification conditions using WPs and then
+checking the computed WP against the annotated specification using a
+solver like Z3 is a very common and powerful pattern. In fact, as
+we'll see below, the methodology that we've developed here for our
+small imperative language is exactly what the F* typechecker does (at
+a larger scale and for the whole F* language) when checking an F*
+program.
+
+
+The ``PURE`` Effect: A Dijkstra Monad for Pure Computations
+------------------------------------------------------------
+
+F* provides a weakest precondition calculus for reasoning about pure
+computations. The calculus is based on a *Dijkstra Monad*, a
+construction first introduced in `this paper
+`_. In
+this chapter, we will learn about the Dijkstra monad and its use in
+specifying and proving pure programs in F*.
+
+The first main difference in adapting the Hoare triples and weakest
+precondition computations that we saw earlier to the setting of F*'s
+functional language is that there are no global variables or mutable
+state (we'll see how to model mutable state in F*'s effect system
+later). Instead, each pure expression in F* *returns* a value and
+the postconditions that we will manipulate are predicates about these
+values, rather than state predicates.
+
+To illustrate, we sketch the definition of pure WPs below.
+
+.. code-block:: none
+
+   WP c Q = Q c
+   WP (let x = e1 in e2) Q = WP e1 (fun x -> WP e2 Q)
+   WP (if e then e1 else e2) Q = (e ==> WP e1 Q) /\ (~e ==> WP e2 Q)
+
+* The WP of a constant ``c`` is just the postcondition ``Q`` applied to ``c``.
+
+* The WP of a ``let`` binding is a sequential composition of WPs,
+  applied to the *values* returned by each sub-expression.
+
+* The WP of a conditional is the WP of each branch, weakened by the
+  suitable branch condition, as before.
+
+The F* type system internalizes and generalizes this WP construction
+to apply it to all F* terms. The form this takes is a computation
+type in F*, ``PURE a wp``, where in ``prims.fst``, ``PURE`` is defined
+as an F* primitive effect with a signature as shown below---we'll see
+much more of the ``new_effect`` syntax as we look at user-defined
+effects in subsequent chapters; for now, just see it as reserved
+syntax in F* to introduce a computation type constructor.
+
+.. code-block:: fstar
+
+   new_effect PURE (a:Type) (w:wp a) { ... }
+
+where
+
+.. literalinclude:: ../code/Pure.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: wp$
+   :end-before: //SNIPPET_END: wp$
+
+A program ``e`` of type ``PURE a wp`` is a computation which
+
+ * Is defined only when ``wp (fun _ -> True)`` is valid
+
+ * If ``wp post`` is valid, then ``e`` terminates without any side
+   effects and returns a value ``v:a`` satisfying ``post v``.
+
+Notice that ``wp a`` is the type of a function transforming
+postconditions (``a -> Type0``) to preconditions (``Type0``). [#]_ The
+``wp`` argument is also called an *index* of the ``PURE`` effect. [#]_
+
+The return operator for ``wp a`` is shown below: it is analogous to
+the ``WP c Q`` and ``WP x Q`` rules for constants and variables that
+we showed earlier:
+
+.. literalinclude:: ../code/Pure.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: return_wp$
+   :end-before: //SNIPPET_END: return_wp$
+
+The bind operator for ``wp a`` is analogous to the rule for sequencing
+WPs, i.e., the rule for ``WP (let x = e1 in e2) Q`` above:
+
+.. literalinclude:: ../code/Pure.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: bind_wp$
+   :end-before: //SNIPPET_END: bind_wp$
+
+Finally, analogous to the WP rule for conditionals, one can write a
+combinator for composing ``wp a`` in a branch:
+
+.. literalinclude:: ../code/Pure.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: if_then_else_wp$
+   :end-before: //SNIPPET_END: if_then_else_wp$
+
+
+This is the essence of the Dijkstra monad construction for pure
+programs: the rule for computing weakest preconditions for a
+computation *returning* a value ``x`` is ``return_wp``; the rule for
+computing the WP of the sequential composition of terms is the
+sequential composition of WPs using ``bind_wp``; the rule for
+computing the WP of a conditional term is the conditional composition
+of WPs using ``if_then_else_wp``.
+
+In fact, if one thinks of pure computations as the identity monad,
+``tot a`` as shown below:
+
+.. literalinclude:: ../code/Pure.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: tot$
+   :end-before: //SNIPPET_END: tot$
+
+then the parallel between the ``tot`` monad and ``wp`` becomes even
+clearer---the WP analog of ``return_tot`` is ``return_wp`` and of
+``bind_tot`` is ``bind_wp``.
+
+It turns out that ``wp a`` (for monotonic weakest preconditions) is
+itself a monad, as shown below by a proof of the monad laws:
+
+.. literalinclude:: ../code/Pure.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: mwp_laws$
+   :end-before: //SNIPPET_END: mwp_laws$
+
+
+.. [#] It is also possible to define ``post a = a -> prop`` and ``pre
+       = prop``. However, the F* libraries for pure WPs use
+       ``Type0`` instead of ``prop``, so we remain faithful to that
+       here.
+
+.. [#] Dijkstra monads are also related to the continuation
+       monad. The continuation monad models `Continuation Passing Style
+       `_
+       programming, where control is passed to the callee
+       explicitly in the form of a continuation. For a result type
+       ``r``, the continuation monad is defined as follows:
+
+       .. literalinclude:: ../code/Pure.fst
+          :language: fstar
+          :start-after: //SNIPPET_START: cont$
+          :end-before: //SNIPPET_END: cont$
+
+       If we squint a bit, we can see that the ``wp`` monad we defined
+       earlier is nothing but a continuation into ``Type0``, i.e.,
+       ``wp a = cont Type0 a`` (or ``cont prop a``, if one prefers to
+       use ``prop``).
+
+``PURE`` and ``Tot``
+---------------------
+
+When typechecking a program, F* computes a weakest precondition that
+characterizes a sufficient condition for the program to satisfy all its
+typing constraints. This computed weakest precondition is usually
+hidden from the programmer, but if you annotate your program suitably,
+you can get access to it, as shown in the code snippet below:
+
+.. literalinclude:: ../code/Pure.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: square$
+   :end-before: //SNIPPET_END: square$
+
+The type says that ``square n`` is a pure function, which for any
+postcondition ``q:nat -> prop``,
+
+ * Is defined only when ``n * n >= 0`` and when ``q (n * n)`` is valid
+
+ * And returns a value ``m:nat`` satisfying ``q m``
+
+Let's look at another example:
+
+.. literalinclude:: ../code/Pure.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: maybe_incr$
+   :end-before: //SNIPPET_END: maybe_incr$
+
+Notice how the ``wp`` index of ``PURE`` mirrors the structure of the
+computation itself---it starts with an ``if_then_else_wp``, then in
+the first branch, uses a ``bind_wp`` followed by a return; and in the
+else branch it returns ``x``.
+
+As such, the wp-index simply "lifts" the computation into a
+specification in a form amenable to logical reasoning, e.g., using the
+SMT solver.
For pure programs this may seem like overkill, since the
+pure term itself can be reasoned about directly, but when the term
+contains non-trivial typing constraints, e.g., such as those that
+arise from refinement type checking, lifting the entire program into a
+single constraint both structures and simplifies logical reasoning.
+
+Of course, one often writes specifications that are more abstract than
+the full logical lifting of the program, as in the example below,
+which says that to prove ``post`` of the return value, the
+precondition is to prove ``post`` on all ``y >= x``. This is a
+valid, although weaker, characterization of the function's return
+value.
+
+.. literalinclude:: ../code/Pure.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: maybe_incr2$
+   :end-before: //SNIPPET_END: maybe_incr2$
+
+The ``PURE`` computation type comes with a built-in weakening rule. In
+particular, if a term is computed to have type ``PURE a wp_a`` and it is
+annotated to have type ``PURE b wp_b``, then F* does the following:
+
+ 1. It computes a constraint ``p : a -> Type0``, which is sufficient
+    to prove that ``a`` is a subtype of ``b``, e.g., if ``a = int``
+    and ``b = nat``, the constraint ``p`` is ``fun (x:int) -> x >=
+    0``.
+
+ 2. Next, it strengthens ``wp_a`` to assert that the returned value
+    satisfies the subtyping constraint ``p x``, i.e., it builds
+    ``assert_wp wp_a p``, where
+
+    .. literalinclude:: ../code/Pure.fst
+       :language: fstar
+       :start-after: //SNIPPET_START: assert_wp$
+       :end-before: //SNIPPET_END: assert_wp$
+
+ 3. Finally, it produces the verification condition ``stronger_wp #b
+    wp_b (assert_wp wp_a p)``, where ``stronger_wp`` is defined as
+    shown below:
+
+    .. literalinclude:: ../code/Pure.fst
+       :language: fstar
+       :start-after: //SNIPPET_START: stronger_wp$
+       :end-before: //SNIPPET_END: stronger_wp$
+
+    That is, for any postcondition ``post``, the precondition ``wp_b
+    post`` implies the original precondition ``wp_a post`` as well as
+    the subtyping constraint ``p x``. This matches the intuition
+    about preconditions that we built earlier: it is always sound to
+    require more in the precondition.
+
+Thus, when we have ``e:PURE a wp`` in F*, the ``wp`` is *a* predicate
+transformer for ``e``, not necessarily the weakest one.
+
+Of course, even ``maybe_incr2`` is not particularly idiomatic in
+F*. One would usually annotate a program with a refinement type, such
+as the one below:
+
+.. literalinclude:: ../code/Pure.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: maybe_incr_tot$
+   :end-before: //SNIPPET_END: maybe_incr_tot$
+
+Internally to the compiler, F* treats ``Tot t`` as the following
+instance of ``PURE``:
+
+.. code-block:: fstar
+
+   Tot t = PURE t (fun post -> forall (x:t). post x)
+
+Once ``Tot t`` is viewed as just an instance of ``PURE``, checking if
+a user annotation ``Tot t`` is stronger than the inferred
+specification of a term ``PURE a wp`` is just as explained before.
+
+``Pure``: Hoare Triples for ``PURE``
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Although specifications are easier to *compute* using WPs, they are
+more natural to read and write when presented as Hoare triples, with a
+clear separation between preconditions and postconditions. Further,
+specifications written as Hoare triples naturally induce
+monotonic WPs.
+
+F* provides an effect abbreviation called ``Pure`` for writing and
+typechecking Hoare-style specifications for pure programs, defined
+as shown below in ``prims.fst``:
+
+.. code-block:: fstar
+
+   effect Pure (a:Type) (req:Type0) (ens:a -> Type0) =
+     PURE a (fun post -> req /\ (forall x. ens x ==> post x))
+
+The signature of ``Pure`` is ``Pure a req ens``, where ``req`` is the
+precondition and ``ens:a -> Type0`` is the postcondition. Using
+``Pure``, we can write the ``factorial`` function we saw at the top of
+this chapter---F* infers a ``PURE a wp`` type for it, and relates it
+to the annotated ``Pure int req ens`` type, proving that the latter
+has a stronger precondition and weaker postcondition.
+
+One may wonder when one should write specifications using the notation
+``x:a -> Pure b req ens`` versus ``x:a{req} -> Tot (y:b { ens y
+})``. The two styles are closely related and choosing between them is
+mostly a matter of taste. As you have seen, until this point in the
+book, we have not used ``Pure a req ens`` at all. However, when a
+function has many pre- and postconditions, it is sometimes more
+convenient to use the ``Pure a req ens`` notation, rather than
+stuffing all the constraints into refinement types.
+
+
+``GHOST`` and ``DIV``
+---------------------
+
+Just as ``PURE`` is a wp-indexed refinement of ``Tot``, F* provides
+two more primitive wp-indexed effects:
+
+ * ``GHOST (a:Type) (w:wp a)`` is a refinement of ``GTot a``
+
+ * ``DIV (a:Type) (w:wp a)`` is a refinement of ``Dv a``
+
+That is, F* uses the ``GHOST`` effect to infer total correctness WPs
+for ghost computations, where, internally, ``GTot a`` is equivalent to
+``GHOST a (fun post -> forall x. post x)``
+
+Likewise, F* uses the ``DIV`` effect to infer *partial correctness*
+WPs for potentially non-terminating computations, where, internally,
+``Dv a`` is equivalent to ``DIV a (fun post -> forall x. post x)``.
+
+As with ``Tot`` and ``PURE``, F* automatically relates ``GTot`` and
+``GHOST`` computations, and ``Dv`` and ``DIV`` computations. Further,
+the effect ordering ``Tot < Dv`` and ``Tot < GTot`` extends to ``PURE
+< DIV`` and ``PURE < GHOST`` as well.
+
+The ``prims.fst`` library also provides Hoare-triple style
+abbreviations for ``GHOST`` and ``DIV``, i.e.,
+
+.. code-block:: fstar
+
+   effect Ghost a req ens = GHOST a (fun post -> req /\ (forall x. ens x ==> post x))
+   effect Div a req ens = DIV a (fun post -> req /\ (forall x. ens x ==> post x))
+
+These Hoare-style abbreviations are more convenient to use than their
+more primitive WP-based counterparts.
+
+The tradeoffs of using ``Ghost`` vs. ``GTot`` or ``Div`` vs. ``Dv``
+are similar to those for ``Pure`` vs. ``Tot``---it's mostly a matter of
+taste. In fact, there are relatively few occurrences of ``Pure``,
+``Ghost``, and ``Div`` in most F* codebases. However, there is one
+important exception: ``Lemma``.
+
+The ``Lemma`` abbreviation
+--------------------------
+
+We can finally unveil the definition of the ``Lemma`` syntax, which we
+introduced as a syntactic shorthand in :ref:`an early chapter
+`. In fact, ``Lemma`` is defined in ``prims.fst``
+as follows:
+
+.. code-block:: fstar
+
+   effect Lemma (a: eqtype_u)
+                (pre: Type)
+                (post: (squash pre -> Type))
+                (smt_pats: list pattern) =
+     Pure a pre (fun r -> post ())
+
+That is, ``Lemma`` is an instance of the Hoare-style refinement of
+pure computations ``Pure a req ens``. So, when you write a proof term
+and annotate it as ``e : Lemma (requires pre) (ensures post)``, F*
+infers a specification ``PURE a wp`` for ``e``, and then, as with all
+``PURE`` computations, F* tries to check that the annotated ``Lemma``
+specification is stronger than the computed
+weakest precondition.
+
+Of course, F* still includes syntactic sugar for ``Lemma``, e.g.,
+``Lemma (requires pre) (ensures post)`` is desugared to ``Lemma unit
+pre (fun _ -> post) []``. The last argument of a lemma, ``smt_pats``,
+is used to introduce lemmas to the SMT solver for proof
+automation---a :ref:`later chapter ` covers that in detail.
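Since ``Lemma`` bottoms out in the ``Pure``-to-``PURE`` encoding, it can be instructive to execute that encoding at toy scale. In the Python sketch below (our own illustration; F* works symbolically, not by enumeration), the quantifier in ``fun post -> req /\ (forall x. ens x ==> post x)`` becomes an enumeration over a finite domain of result values:

```python
# A Hoare-style pair (req, ens) induces a WP:
#   wp post = req and (for every x in the domain, ens x implies post x)
def pure_wp(req, ens, domain):
    return lambda post: req and all((not ens(x)) or post(x) for x in domain)

results = range(-5, 6)
# a factorial-like spec: trivial precondition, result r >= 1
wp_fact = pure_wp(True, lambda r: r >= 1, results)

print(wp_fact(lambda r: r >= 0))  # True: r >= 1 implies r >= 0
print(wp_fact(lambda r: r >= 2))  # False: r could be exactly 1
```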
+
+Finally, notice the type of ``post``, which takes ``squash pre``
+as an argument---this is what allows the ``ensures`` clause of a
+``Lemma`` to assume what was specified in the ``requires``
+clause.
+
diff --git a/doc/book/PoP-in-FStar/book/part4/part4_user_defined_effects.rst.outline b/doc/book/PoP-in-FStar/book/part4/part4_user_defined_effects.rst.outline
new file mode 100644
index 00000000000..b8537122fb3
--- /dev/null
+++ b/doc/book/PoP-in-FStar/book/part4/part4_user_defined_effects.rst.outline
@@ -0,0 +1,53 @@
+.. _Part4_UDFX:
+
+
+User-defined Effects
+====================
+
+General definitions, various classes of effects, substitutive etc.
+
+And then backed by many examples.
+
+A less formal, more code-centric version of the paper.
+
+I would expect this to be quite long, with several sub-sections.
+
+It would be nice for each section to introduce background on basic
+constructions
+
+e.g.,
+
+a section on graded monads
+
+a section on Dijkstra monads generalized from before, as a morphism between computational and spec monad
+
+a section on algebraic effects, with some background on what they're good for and how we model them
+
+etc.
+
+
+
+
+
+The primitive effects in F* provide a fixed specification and
+reasoning mechanism for pure, ghost, and divergent
+computations. :ref:`Earlier ` we also saw that using monads
+we can model different kinds of effects and specify their
+semantics. For reasoning about effectful programs, however, such
+semantic models may not be the right tool. Indeed, several
+monad-like abstractions have been proposed in the literature that are
+suitable for different tasks. With user-defined effects, F* allows
+building such custom abstractions and program logics, seamlessly
+integrated with other features (recursion, inductive types,
+...) and programmability using the same syntax that we have seen so
+far. We turn our attention to user-defined effects next.
+
+
+
+
+
+
+
+
+
diff --git a/doc/book/PoP-in-FStar/book/part5/part5.rst b/doc/book/PoP-in-FStar/book/part5/part5.rst
new file mode 100644
index 00000000000..39393d0f3c8
--- /dev/null
+++ b/doc/book/PoP-in-FStar/book/part5/part5.rst
@@ -0,0 +1,46 @@
+.. _Part5:
+
+########################################
+Tactics and Metaprogramming with Meta-F*
+########################################
+
+**This part of the book is still heavily under construction**
+
+So far, we have mostly relied on the SMT solver to do proofs in F*.
+This works rather well: we got this far, after all! However, sometimes,
+the SMT solver is really not able to complete our proof, or takes too
+long to do so, or is not *robust* (i.e., it succeeds or fails due to
+seemingly insignificant changes).
+
+This is what Meta-F* was originally designed for. It provides the
+programmer with more control over how to break down a proof and guide
+the SMT solver towards finding a proof by using *tactics*. Moreover, a
+proof can be fully completed within Meta-F* without using the SMT
+solver at all. This is the usual approach taken in other proof
+assistants (such as Lean, Coq, or Agda), but it's not the preferred
+route in F*: we will use the SMT solver for the things it can do well,
+and mostly write tactics to "preprocess" obligations and make them
+easier for the solver, thus reducing manual effort.
+
+Meta-F* also allows for *metaprogramming*, i.e., generating programs
+(or types, or proofs, ...) automatically. This should not be
+surprising to anyone already familiar with proof assistants and the
+`Curry-Howard correspondence
+`_. There
+are, however, some slight differences between tactic-based proofs and
+metaprogramming, and more so in F*, so we will first look at
+automating proofs (i.e., tactics), and then turn to metaprogramming
+(though we use the generic name "metaprogram" for tactics as well).
+
+In summary, when the SMT solver "just works" we usually do not bother
+writing tactics, but, if not, we still have the ability to roll up our
+sleeves and write explicit proofs.
+
+Speaking of rolling up our sleeves, let us do just that, and get our
+first taste of tactics.
+
+.. toctree::
+   :maxdepth: 1
+   :caption: Contents:
+
+   part5_meta
diff --git a/doc/book/PoP-in-FStar/book/part5/part5_meta.rst b/doc/book/PoP-in-FStar/book/part5/part5_meta.rst
new file mode 100644
index 00000000000..c4238f778c4
--- /dev/null
+++ b/doc/book/PoP-in-FStar/book/part5/part5_meta.rst
@@ -0,0 +1,523 @@
+.. _MetaFStar_intro:
+
+An Overview of Tactics
+======================
+
+In this chapter, we introduce several of the main concepts
+underlying Meta-F* and its use in writing tactics for proof
+automation. The goal is to get you quickly up to speed on basic uses
+of tactics. Subsequent chapters will revisit the concepts covered here
+in more detail, introduce more advanced aspects of Meta-F*, and show
+them in use in several case studies.
+
+
+Decorating assertions with tactics
+----------------------------------
+
+As you know already, F* verifies programs by computing verification
+conditions (VCs) and calling an SMT solver (Z3) to prove them. Most
+simple proof obligations are handled completely automatically by Z3,
+and for more complex statements we can help the solver find a proof
+via lemma calls and intermediate assertions. Even when using lemma
+calls and assertions, the VC for a definition is sent to Z3 in one
+single piece (though :ref:`SMT queries can be split via an option
+`). This "monolithic" style of proof can rapidly become
+unwieldy, particularly when the solver is being pushed to its
+limits.
+
+The first facility Meta-F* provides is the ability to attach tactics
+to specific assertions.
These tactics operate on the "goal" that we
+want to prove, and can "massage" the assertion by simplifying it,
+splitting it into several sub-goals, tweaking particular SMT options,
+etc.
+
+For instance, let us take the following example, where we want to
+guarantee that ``pow2 x`` is less than one million given that ``x`` is
+at most ``19``. One way of going about this proof is by noting that
+``pow2`` is an increasing function, and that ``pow2 19`` is less than
+one million, so we try to write something like this:
+
+.. literalinclude:: ../code/Part5.Pow2.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: pow2_0
+   :end-before: //SNIPPET_END: pow2_0
+
+Sadly, this doesn't work. First of all, Z3 cannot automatically
+prove that ``pow2`` is increasing, but that is to be expected.
+We could prove this by a straightforward induction. However, we
+only need this fact for ``x`` and ``19``, so we can simply call
+``FStar.Math.Lemmas.pow2_le_compat`` from the library:
+
+.. literalinclude:: ../code/Part5.Pow2.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: pow2_1
+   :end-before: //SNIPPET_END: pow2_1
+
+Now, the second assertion fails. Z3 will not, with the default fuel
+limits, unfold ``pow2`` enough times to compute ``pow2 19``
+precisely. (You can read more about how F* :ref:`uses "fuel" to
+control the SMT solver's ability to unfold recursive definitions
+`.) Here we will use our first call into Meta-F*: via
+the ``by`` keyword, we can attach a tactic to an assertion. In this
+case, we'll ask Meta-F* to ``compute()`` over the goal, simplifying as
+much as it can via F*'s normalizer, like this:
+
+.. literalinclude:: ../code/Part5.Pow2.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: pow2_2
+   :end-before: //SNIPPET_END: pow2_2
+
+Now the lemma verifies! Meta-F* reduced the proof obligation into a
+trivial equality. Crucially, however, the ``pow2 19 == 524288`` shape is
+kept as-is in the postcondition of the assertion, so we can make use of
+it! 
If we were just to rewrite the assertion into ``524288 == 524288``
+that would not be useful at all.
+
+How can we know what Meta-F* is doing? We can use the ``dump`` tactic to
+print the state of the proof after the call to ``compute()``.
+
+.. literalinclude:: ../code/Part5.Pow2.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: pow2_3
+   :end-before: //SNIPPET_END: pow2_3
+
+With this version, you should see something like:
+
+.. code-block:: fstar
+
+   Goal 1/1
+   x: x: nat{x < 20}
+   p: pure_post unit
+   uu___: forall (pure_result: unit). pow2 x < 1000000 ==> p pure_result
+   pure_result: unit
+   uu___'0: pow2 x <= pow2 19
+   --------------------------------------------------------------------------------
+   squash (524288 == 524288)
+   (*?u144*) _
+
+as output from F* (or in the goals buffer if you are using Emacs with fstar-mode.el).
+The ``print`` primitive can also be useful.
+
+A "goal" is some proof obligation that is yet to be solved. Meta-F*
+allows you to capture goals (e.g. via ``assert..by``), modify them (such
+as with ``compute``), and even to completely solve them. In this case, we
+can solve the goal (without Z3) by calling ``trivial()``, a helper
+tactic that discharges trivial goals (such as trivial equalities).
+
+.. literalinclude:: ../code/Part5.Pow2.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: pow2_4
+   :end-before: //SNIPPET_END: pow2_4
+
+If you ``dump`` the state just after the ``trivial()`` call, you should
+see no more goals remain (this is what ``qed()`` checks).
+
+.. note::
+
+   Meta-F* does not yet allow a fully interactive style of proof, and
+   hence we need to re-check the entire proof after every edit. We hope
+   to improve this soon.
+
+There is still the "rest" of the proof, namely that ``pow2 x < 1000000``
+given the hypothesis and the fact that the assertion holds. We call
+this the *skeleton* of the proof, and it is (by default) not handled by
+Meta-F*. 
In general, we only use tactics on those assertions that are
+particularly hard for the SMT solver, but leave all the rest to it.
+
+The ``Tac`` effect
+-------------------
+
+.. note::
+
+   Although we have seen a bit about :ref:`monads and computational
+   effects in a previous chapter `, we have yet to fully
+   describe F*'s effect system. So, some of what follows may be a bit
+   confusing. However, you don't need to fully understand how the
+   ``Tac`` effect is implemented to use tactics. Feel free to skip
+   ahead, if this section doesn't make much sense to you.
+
+What, concretely, are tactics? So far we've written a few simple ones,
+without too much attention to their structure.
+
+Tactics and metaprograms in F* are really just F* terms, but *in a
+particular effect*, namely ``Tac``. To construct interesting
+metaprograms, we have to use the set of *primitives* provided by Meta-F*.
+Their full list is in the ``FStar.Tactics.Builtins`` module.
+So far, we have actually not used any primitive directly, but
+only *derived* metaprograms present in the standard library.
+
+Internally, ``Tac`` is implemented via a combination of (1) a state
+monad over a ``proofstate``, (2) exceptions, and (3) divergence, or
+non-termination. The state monad is used to implicitly carry the
+proofstate, without us manually having to handle all goals
+explicitly. Exceptions are a useful way of doing error handling. Any
+declared exception can be ``raise``'d within a metaprogram, and the
+``try..with`` construct works exactly as for normal programs. There
+are also ``fail``, ``catch`` and ``recover`` primitives.
+
+Metaprograms cannot be run directly. This is needed to retain the
+soundness of pure computations, in the same way that stateful and
+exception-raising computations are isolated from the ``Pure`` fragment
+(and from each other). Metaprograms can only be used where F* expects
+them, such as in an ``assert..by`` construct. 
Here, F* will run the
+metaprogram on an initial proofstate consisting (usually) of a single
+goal, and allow the metaprogram to modify it.
+
+To guarantee soundness, i.e., that metaprograms do not prove false
+things, all of the primitives are designed to perform small and
+correct modifications of the goals. Any metaprogram constructed from
+them cannot do anything to the proofstate (which is abstract) except
+modifying it via the primitives.
+
+Having divergence as part of the ``Tac`` effect may seem a bit odd,
+since allowing for diverging terms usually implies that one can form a
+proof of false, via a non-well-founded recursion. However, we should
+note that this possible divergence happens at the *meta* level. If we
+call a divergent tactic, F* will loop forever waiting for it to finish,
+never actually accepting the assertion being checked.
+
+As you know, F* already has exceptions and divergence. All ``Dv`` and
+``Ex`` functions can readily be used in Meta-F* metaprograms, as well as
+all ``Tot`` and ``Pure`` functions. For instance, you can use all of the
+``FStar.List.Tot`` module if your metaprogram uses lists.
+
+Goals
+-----
+
+A Meta-F* tactic manipulates a *proofstate*, which is
+essentially a set of *goals*. Tactic primitives usually work on the
+goals, for example by simplifying (like ``compute()``) or by breaking
+them down into smaller *sub*-goals.
+
+When proving assertions, all of our goals will be of the shape ``squash
+phi``, where ``phi`` is some logical formula we must prove. One way to
+break down a goal into subparts is by using the ``mapply`` tactic, which
+attempts to prove the goal by instantiating the given lemma or function,
+perhaps adding subgoals for the hypotheses and arguments of the lemma.
+This "working backwards" style is very common in tactics frameworks.
+
+For instance, we could have proved the assertion that ``pow2 x <= pow2 19``
+in the following way:
+
+.. 
code-block:: fstar
+
+   assert (pow2 x <= pow2 19) by (mapply (`FStar.Math.Lemmas.pow2_le_compat));
+
+This reduces the proof of ``pow2 x <= pow2 19`` to ``x <= 19`` (the
+precondition of the lemma), which is trivially provable by Z3 in this
+context. Note that we do not have to provide the arguments to the lemma:
+they are inferred by F* through *unification*. In a nutshell, this means
+F* finds there is an obvious instantiation of the arguments to make the
+postcondition of the lemma and the current assertion coincide. When some
+argument is *not* found via unification, Meta-F* will present a new goal
+for it.
+
+This style of proof is more *surgical* than the one above, since the
+proof that ``pow2 x <= pow2 19`` does not "leak" into the rest of the
+function. If the proof of this assertion required several auxiliary
+lemmas, or a tweak to the solver's options, etc., this kind of style can
+pay off in robustness.
+
+Most tactics work on the *current* goal, which is the first one in
+the proofstate. When a tactic reduces a goal ``g`` into ``g1,...,gn``, the
+new ``g1,...,gn`` will (usually) be added to the beginning of the list of
+goals.
+
+In the following simplified example, we are looking to prove ``s``
+from ``p`` given some lemmas. The first thing we do is apply the
+``qr_s`` lemma, which gives us two subgoals, for ``q`` and ``r``
+respectively. We then need to proceed to solve the first goal for
+``q``. In order to isolate the proofs of both goals, we can ``focus``
+on the current goal, making all others temporarily invisible. To prove
+``q``, we then just use the ``p_q`` lemma and obtain a subgoal for
+``p``. This one we will just leave to the SMT solver, hence we
+call ``smt()`` to move it to the list of SMT goals. We prove ``r``
+similarly, using ``p_r``.
+
+.. 
literalinclude:: ../code/Part5.Mapply.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: mapply
+   :end-before: //SNIPPET_END: mapply
+
+Once this tactic runs, we are left with SMT goals to prove ``p``, which Z3
+discharges immediately.
+
+Note that ``mapply`` works with lemmas that ensure an implication, or
+that have a precondition (``requires``/``ensures``), and even those that take a
+squashed proof as argument. Internally, ``mapply`` is implemented via the
+``apply_lemma`` and ``apply`` primitives, but ideally you should not need to
+use them directly.
+
+Note, also, that the proofs of each part are completely isolated from
+each other. It is also possible to prove the ``p_gives_s`` lemma by calling
+the sublemmas directly, and/or adding SMT patterns. While that style of
+proof works, it can quickly become unwieldy.
+
+Quotations
+----------
+
+In the last few examples, you might have noted the backticks, such as
+in ``(`FStar.Math.Lemmas.pow2_le_compat)``. This is a *quotation*: it
+represents the *syntax* for this lemma instead of the lemma itself. It
+is called a quotation since the idea is analogous to the word "sun"
+being syntax representing the sun.
+
+A quotation always has type ``term``, an abstract type representing
+the AST of F*.
+
+Meta-F* also provides *antiquotations*, which are a convenient way of
+modifying an existing term. For instance, if ``t`` is a term, we can write
+```(1 + `#t)`` to form the syntax of "adding 1" to ``t``. The part inside
+the antiquotation (```#``) can be anything of type ``term``.
+
+Many metaprogramming primitives, however, take a ``term`` as an
+argument in order to use it in a proof, as ``apply_lemma`` does. Such
+primitives will typecheck the term before using it
+(to make sure that the syntax actually corresponds to a meaningful,
+well-typed F* term), while other primitives, such as
+``term_to_string``, won't typecheck anything. 
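+
+As a small, illustrative sketch (these exact definitions are not part
+of the accompanying code files), a metaprogram can combine a quotation
+and an antiquotation to build the syntax of "adding 1" to a given term:
+
+.. code-block:: fstar
+
+   open FStar.Tactics
+
+   (* Build the syntax of [1 + t] from the syntax [t] of some term;
+      the antiquotation `# splices the existing term into the quotation.
+      Note that this manipulates only syntax: no addition is performed
+      unless the resulting term is later typechecked and normalized. *)
+   let add_one (t:term) : Tac term = `(1 + `#t)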
+
+We will see ahead that quotations are just a convenient way of
+constructing syntax, instead of doing it step by step via ``pack``.
+
+Basic logic
+-----------
+
+Meta-F* provides some predefined tactics to handle "logical" goals.
+
+For instance, to prove an implication ``p ==> q``, we can "introduce" the
+hypothesis via ``implies_intro`` to obtain instead a goal for ``q`` in a
+context that assumes ``p``.
+
+For those familiar with Coq and other provers: there, this tactic is
+simply called ``intro`` and creates a lambda abstraction. In F* this is
+slightly more contrived due to squashed types, hence the need for an
+``implies_intro`` distinct from the ``intro`` tactic (explained ahead)
+that introduces a binder.
+
+Other basic logical tactics include:
+
+ - ``forall_intro``: for a goal ``forall x. p``, introduce a fresh ``x`` into
+   the context and present a goal for ``p``.
+
+ - ``l_intros``: introduce both implications and foralls as much as
+   possible.
+
+ - ``split``: split a conjunction (``p /\ q``) into two goals.
+
+ - ``left``/``right``: prove a disjunction ``p \/ q`` by proving ``p`` or ``q``.
+
+ - ``assumption``: prove the goal from a hypothesis in the context.
+
+ - ``pose_lemma``: given a term ``t`` representing a lemma call, add its
+   postcondition to the context. If the lemma has a precondition, it is
+   presented as a separate goal.
+
+See the `FStar.Tactics.Logic
+`_
+module for more.
+
+Normalizing and unfolding
+-------------------------
+
+We have previously seen ``compute()``, which blasts a goal with F*'s
+normalizer to reduce it into a *normal form*. We sometimes need a
+bit more control than that, and hence there are several tactics to
+normalize goals in different ways. Most of them are implemented via a
+few configurable primitives (you can look up their definitions in the
+standard library).
+
+ - ``compute()``: calls the normalizer with almost all steps enabled.
+
+ - ``simpl()``: simplifies logical operations (e.g. reduces ``p /\ True``
+   to ``p``). 
+
+ - ``whnf()`` (short for "weak head normal form"): reduces the goal
+   until its "head" is evident.
+
+ - ``unfold_def `t``: unfolds the definition of the name ``t`` in the goal,
+   fully normalizing its body.
+
+ - ``trivial()``: if the goal is trivial after normalization and simplification,
+   solve it.
+
+The ``norm`` primitive provides fine-grained control. Its type is
+``list norm_step -> Tac unit``. The full list of ``norm_step``\ s can
+be found in the ``FStar.Pervasives`` module, and it is the same one
+available for the ``norm`` marker in ``Pervasives`` (beware of the
+name clash between ``Tactics.norm`` and ``Pervasives.norm``!).
+
+Inspecting and building syntax
+------------------------------
+
+As part of automating proofs, we often need to inspect the syntax of
+the goal and the hypotheses in the context to decide what to do. For
+instance, instead of blindly trying to apply the ``split`` tactic (and
+recovering if it fails), we could instead look at the *shape* of the
+goal and apply ``split`` only if the goal has the shape ``p1 /\ p2``.
+
+Note: inspecting syntax is, perhaps obviously, not something we can
+just do everywhere. If a function was allowed to inspect the syntax of
+its argument, it could behave differently on ``1+2`` and ``3``, which
+is bad, since ``1+2 == 3`` in F*, and functions are expected to map
+equal arguments to the same result. So, for the most part, we cannot
+simply turn a value of type ``a`` into its syntax. Hence, quotations
+are *static*: they simply represent the syntax of a term, and one
+cannot turn values into terms. There is a more powerful mechanism of
+*dynamic quotations* that will be explained later, but suffice it to
+say for now that this can only be done in the ``Tac`` effect.
+
+As an example, the ``cur_goal()`` tactic will return a value of type
+``typ`` (an alias for ``term`` indicating that the term is really the
+representation of an F* type) representing the syntax of the current
+goal. 
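+
+For instance, here is a tiny, illustrative tactic (again, not part of
+the accompanying code files) that merely prints the current goal, which
+can be handy while developing a proof:
+
+.. code-block:: fstar
+
+   open FStar.Tactics
+
+   (* Print the syntax of the current goal without modifying it. *)
+   let show_goal () : Tac unit =
+     let g = cur_goal () in
+     print ("current goal: " ^ term_to_string g)
+
+Unlike ``dump``, which prints the entire proofstate, this prints only
+the goal itself.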
+
+The ``term`` type is *abstract*: it has no observable structure
+itself. Think of it as an opaque "box" containing a term inside. A
+priori, all that can be done with a ``term`` is pass it to primitives
+that expect one, such as ``tc`` to type-check it or ``norm_term`` to
+normalize it. But none of those give us full, programmatic access to
+the structure of the term.
+
+
+That's where the ``term_view`` comes in: following `a classic idea
+introduced by Phil Wadler
+`_, there is a function
+called ``inspect`` that turns a ``term`` into a ``term_view``. The
+``term_view`` type resembles an AST, but crucially it is not
+recursive: its subterms have type ``term`` rather than ``term_view``.
+
+.. code-block::
+   :caption: Part of the ``term_view`` type.
+
+   noeq
+   type term_view =
+     | Tv_FVar : v:fv -> term_view
+     | Tv_App : hd:term -> a:argv -> term_view
+     | Tv_Abs : bv:binder -> body:term -> term_view
+     | Tv_Arrow : bv:binder -> c:comp -> term_view
+     ...
+
+The ``inspect`` primitive "peels away" one level of the abstraction
+layer, giving access to the top-level shape of the term.
+
+The ``Tv_FVar`` node above represents (an occurrence of) a global name.
+The ``fv`` type is also abstract, and can be viewed as a ``name`` (which
+is just ``list string``) via ``inspect_fv``.
+
+For instance, if we were to inspect ```qr_s`` (which we used above)
+we would obtain a ``Tv_FVar v``, where ``inspect_fv v`` is something
+like ``["Path"; "To"; "Module"; "qr_s"]``, that is, an "exploded"
+representation of the fully-qualified name ``Path.To.Module.qr_s``.
+
+Every syntactic construct (terms, free variables, bound variables,
+binders, computation types, etc.) is modeled abstractly like ``term`` and
+``fv``, and has a corresponding inspection function. A list can be found
+in ``FStar.Reflection.Builtins``.
+
+If the inspected term is an application, ``inspect`` will return a
+``Tv_App f a`` node. 
Here ``f`` is a ``term``, so if we want to know its
+structure we must recursively call ``inspect`` on it. The ``a`` part
+is an *argument*, consisting of a ``term`` and an argument qualifier
+(``aqualv``). The qualifier specifies if the application is implicit or
+explicit.
+
+Of course, in the case of a nested application such as ``f x y``, this
+is nested as ``(f x) y``, so inspecting it would return a ``Tv_App``
+node containing ``f x`` and ``y`` (with a ``Q_Explicit`` qualifier).
+There are some helper functions defined to make inspecting applications
+easier, like ``collect_app``, which decomposes a term into its "head" and
+all of the arguments the head is applied to.
+
+
+Now, knowing this, we would then like a function to check if the goal is
+a conjunction.
+Naively, we need to inspect the goal to check that it is of the shape
+``squash ((/\) a1 a2)``, that is, an application with two arguments
+where the head is the symbol for a conjunction, i.e. ``(/\)``. This can
+already be done with the ``term_view``, but it is quite inconvenient due to
+there being *too much* information in it.
+
+Meta-F* therefore provides another type, ``formula``, to represent logical
+formulas more directly. Hence it suffices for us to call ``term_as_formula``
+and match on the result, like so:
+
+.. literalinclude:: ../code/Part5.IsConj.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: isconj_2
+   :end-before: //SNIPPET_END: isconj_2
+
+The ``term_as_formula`` function, and all others that work on syntax,
+are defined in "userspace" (that is, as library tactics/metaprograms) by
+using ``inspect``.
+
+.. code-block::
+   :caption: Part of the ``formula`` type.
+
+   noeq
+   type formula =
+     | True_ : formula
+     | False_ : formula
+     | And : term -> term -> formula
+     | Or : term -> term -> formula
+     | Not : term -> formula
+     | Implies: term -> term -> formula
+     | Forall : bv -> term -> formula
+     ...
+
+.. 
note::
+
+   For experts: F* terms are (internally) represented with a
+   locally-nameless representation, meaning that variables do not have
+   a name under binders, but a de Bruijn index instead. While this has
+   many advantages, it is likely to be counterproductive when doing
+   tactics and metaprogramming, hence ``inspect`` *opens* variables when
+   it traverses a binder, transforming the term into a fully-named
+   representation. This is why ``inspect`` is effectful: it requires
+   freshness to avoid name clashes. If you prefer to work with a
+   locally-nameless representation, and avoid the effect label, you can
+   use ``inspect_ln`` instead (which will return ``Tv_BVar`` nodes instead
+   of ``Tv_Var`` ones).
+
+Dually, a ``term_view`` can be transformed into a ``term`` via the
+``pack`` primitive, in order to build the syntax of any term. However,
+it is usually more comfortable to use antiquotations (see above) for
+building terms.
+
+Usual gotchas
+-------------
+
+- The ``smt`` tactic does *not* immediately call the SMT solver. It merely
+  places the current goal into the "SMT Goal" list, all of which are sent
+  to the solver when the tactic invocation finishes. If any of these fail,
+  there is currently no way to "try again".
+
+- If a tactic is natively compiled and loaded as a plugin, editing its
+  source file may not have any effect (it depends on the build
+  system). You should recompile the tactic, delete its object
+  file, or use the F* option ``--no_plugins`` to temporarily run it via
+  the interpreter.
+
+- When proving a lemma, we cannot just use ``_ by ...`` since the expected
+  type is just ``unit``. Workaround: assert the postcondition again,
+  or start without any binder. 
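+
+As an illustration of the last point, here is a sketch of the
+workaround (reusing the ``pow2`` example from earlier; not part of the
+accompanying code files): instead of ``_ by ...``, we re-assert the
+postcondition inside the body of the lemma and attach the tactic there:
+
+.. code-block:: fstar
+
+   open FStar.Tactics
+
+   (* The tactic runs on the inner assertion; the asserted fact then
+      lets the SMT solver discharge the lemma's postcondition. *)
+   let pow2_19 () : Lemma (pow2 19 == 524288) =
+     assert (pow2 19 == 524288) by (compute (); trivial ())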
+ +Coming soon +----------- + +- Metaprogramming +- Meta arguments and typeclasses +- Plugins (efficient tactics and metaprograms, ``--codegen Plugin`` and ``--load``) +- Tweaking the SMT options +- Automated coercions inspect/pack +- ``e <: C by ...`` +- Tactics can be used as steps of calc proofs. +- Solving implicits (Steel) + +.. + need pack, pack_ln + _ by + pre/post process + %splice + everparse, printers + --tactic_trace diff --git a/doc/book/PoP-in-FStar/book/pulse/img/create.png b/doc/book/PoP-in-FStar/book/pulse/img/create.png new file mode 100644 index 00000000000..ec6557de07a Binary files /dev/null and b/doc/book/PoP-in-FStar/book/pulse/img/create.png differ diff --git a/doc/book/PoP-in-FStar/book/pulse/img/local-open.png b/doc/book/PoP-in-FStar/book/pulse/img/local-open.png new file mode 100644 index 00000000000..fc7d2b3b6d6 Binary files /dev/null and b/doc/book/PoP-in-FStar/book/pulse/img/local-open.png differ diff --git a/doc/book/PoP-in-FStar/book/pulse/img/starting.png b/doc/book/PoP-in-FStar/book/pulse/img/starting.png new file mode 100644 index 00000000000..d41c262a0a2 Binary files /dev/null and b/doc/book/PoP-in-FStar/book/pulse/img/starting.png differ diff --git a/doc/book/PoP-in-FStar/book/pulse/img/vscode.png b/doc/book/PoP-in-FStar/book/pulse/img/vscode.png new file mode 100644 index 00000000000..ba5f104d959 Binary files /dev/null and b/doc/book/PoP-in-FStar/book/pulse/img/vscode.png differ diff --git a/doc/book/PoP-in-FStar/book/pulse/pulse.rst b/doc/book/PoP-in-FStar/book/pulse/pulse.rst new file mode 100644 index 00000000000..2b84a6f4047 --- /dev/null +++ b/doc/book/PoP-in-FStar/book/pulse/pulse.rst @@ -0,0 +1,95 @@ +.. 
_PartPulse: + +################################################################ +Pulse: Proof-oriented Programming in Concurrent Separation Logic +################################################################ + +Many F* projects involve building domain-specific languages with +specialized programming and proving support. For example, `Vale +`_ supports program proofs +for a structured assembly language; `Low* +`_ provides effectful +programming in F* with a C-like memory model; `EverParse +`_ is a DSL for writing +low-level parsers and serializers. Recently, F* has gained new +features for building DSLs embedded in F* with customized syntax, type +checker plugins, extraction support, etc., with *Pulse* as a showcase +example of such a DSL. + +Pulse is a new programming language embedded in F*, inheriting many of +its features (notably, it is higher order and has dependent types), +but with built-in support for programming with mutable state and +concurrency, with specifications and proofs in `Concurrent Separation +Logic `_. + +As a first taste of Pulse, here's a function to increment a mutable +integer reference. + +.. literalinclude:: ../code/pulse/PulseTutorial.Intro.fst + :language: pulse + :start-after: //incr + :end-before: //end incr + +And here's a function to increment two references in parallel. + +.. literalinclude:: ../code/pulse/PulseTutorial.Intro.fst + :language: pulse + :start-after: //par_incr + :end-before: //end par_incr + +You may not have heard about separation logic before---but perhaps +these specifications already make intuitive sense to you. The type of +``incr`` says that if "x points to 'i" initially, then when ``incr`` +returns, "x points to 'i + 1"; while ``par_incr`` increments the +contents of ``x`` and ``y`` in parallel by using the ``par`` +combinator. + +Concurrent separation logic is an active research area and there are many such +logics to use, all with different tradeoffs. 
The state of the art in concurrent
+separation logic is `Iris `_, a higher-order,
+impredicative separation logic. Pulse's logic draws inspiration from Iris and is
+similar to it in many ways, but is based on a logic called PulseCore,
+formalized entirely within F*---you can find the formalization `here
+`_. Proofs of
+programs in Pulse's surface language correspond to proofs of correctness in the
+PulseCore program logic. But, you should not need to know much about how the
+logic is formalized to use Pulse effectively. We'll start from the basics and
+explain what you need to know about concurrent separation logic to start
+programming and proving in Pulse. Additionally, Pulse is an extension of F*, so
+all you've learned about F*, lemmas, dependent types, refinement types, etc.
+will be of use again.
+
+
+.. note::
+
+   Why is it called Pulse? Because it grew from a prior logic called
+   `Steel `_, and one of the
+   authors and his daughter are big fans of a classic reggae band
+   called `Steel Pulse `_. We wanted a name
+   that was softer than Steel, and, well, a bit playful. So, Pulse!
+
+
+
+.. .. image:: pulse_arch.png
+
+
+.. toctree::
+   :maxdepth: 1
+   :caption: Contents:
+
+   pulse_getting_started
+   pulse_ch1
+   pulse_ch2
+   pulse_existentials
+   pulse_user_defined_predicates
+   pulse_conditionals
+   pulse_loops
+   pulse_arrays
+   pulse_ghost
+   pulse_higher_order
+   pulse_implication_and_forall
+   pulse_linked_list
+   pulse_atomics_and_invariants
+   pulse_spin_lock
+   pulse_parallel_increment
+   pulse_extraction
diff --git a/doc/book/PoP-in-FStar/book/pulse/pulse_arrays.rst b/doc/book/PoP-in-FStar/book/pulse/pulse_arrays.rst new file mode 100644 index 00000000000..e0557f429f2 --- /dev/null +++ b/doc/book/PoP-in-FStar/book/pulse/pulse_arrays.rst @@ -0,0 +1,212 @@
+.. _Pulse_Arrays:
+
+Mutable Arrays
+===============
+
+In this chapter, we will learn about mutable arrays in Pulse. An array
+is a contiguous collection of values of the same type. 
Similar to ``ref``,
+arrays in Pulse can be allocated in the stack frame of the current function
+or in the heap---while the stack allocated arrays are reclaimed automatically
+(e.g., when the function returns), heap allocated arrays are explicitly managed
+by the programmer.
+
+Pulse provides two array types: ``Pulse.Lib.Array.array t`` as the basic array type
+and ``Pulse.Lib.Vec.vec t`` for heap allocated arrays. To promote code reuse, functions
+that may operate over both stack and heap allocated arrays can be written using
+``Pulse.Lib.Array.array t``---the ``Pulse.Lib.Vec`` library provides back-and-forth coercions
+between ``vec t`` and ``array t``.
+
+``array t``
+^^^^^^^^^^^^
+
+We illustrate the basics of ``array t`` with the help of the following example
+that reads an array:
+
+.. literalinclude:: ../code/pulse/PulseTutorial.Array.fst
+   :language: pulse
+   :start-after: //readi$
+   :end-before: //end readi$
+
+The library provides a points-to predicate ``pts_to arr #p s`` with
+the interpretation that in the current memory, the contents of ``arr``
+are the same as the (functional) sequence ``s:FStar.Seq.seq t``. Like the
+``pts_to`` predicate on references, it is also indexed by an implicit
+fractional permission ``p``, which distinguishes shared, read-only
+access from exclusive read/write access.
+
+In the arguments of ``read_i``, the argument ``'s`` is erased, since
+it is for specification only.
+
+Arrays can be read and written-to using indexes of type
+``FStar.SizeT.t``, a model of C ``size_t`` [#]_ in F*, provided that
+the index is within the array bounds---the refinement ``SZ.v i <
+Seq.length s`` enforces that the index is in bounds, where ``module SZ
+= FStar.SizeT``. The function returns the ``i``-th element of the
+array, as asserted by the postcondition slprop ``pure (x == Seq.index
+s (SZ.v i))``. The body of the function uses the array read operator
+``arr.(i)``.
+
+As another example, let's write to the ``i``-th element of an array:
+
+.. 
literalinclude:: ../code/pulse/PulseTutorial.Array.fst
+   :language: pulse
+   :start-after: //writei$
+   :end-before: //end writei$
+
+The function uses the array write operator ``arr.(i) <- x`` and the postcondition
+asserts that in the state when the function returns, the contents of the array
+are the same as the sequence ``s`` updated at the index ``i``.
+
+While any permission suffices for reading, writing requires
+``1.0R``. For example, implementing ``write_i`` without
+``1.0R`` is rejected, as shown below.
+
+.. literalinclude:: ../code/pulse/PulseTutorial.Array.fst
+   :language: pulse
+   :start-after: //writeipbegin$
+   :end-before: //writeipend$
+
+The library contains ``share`` and ``gather`` functions, similar to
+those for references, to divide and combine permissions on arrays.
+
+We now look at a couple of examples that use arrays with conditionals,
+loops, existentials, and invariants, using many of the Pulse
+constructs we have seen so far.
+
+.. [#] ``size_t`` in C is an unsigned integer type that is at least
+   ``16`` bits wide. The upper bound of ``size_t`` is platform
+   dependent. ``FStar.SizeT.size_t`` models this type and is
+   extracted to the primitive ``size_t`` type in C, similar to the
+   other :ref:`bounded integer types ` discussed
+   previously.
+
+Compare
+........
+
+Let's implement a function that compares two arrays for equality:
+
+.. literalinclude:: ../code/pulse/PulseTutorial.Array.fst
+   :language: pulse
+   :start-after: //comparesigbegin$
+   :end-before: //comparesigend$
+
+The function takes two arrays ``a1`` and ``a2`` as input, and returns a boolean.
+The postcondition ``pure (res <==> Seq.equal 's1 's2)``
+specifies that the boolean is true if and only if the sequence representations of the
+two arrays are equal. Since the function only reads the arrays, it is parametric in the
+permissions ``p1`` and ``p2`` on the two arrays. 
Note that the type parameter ``t`` has
+type :ref:`eqtype`, requiring that values of type ``t`` support
+decidable equality.
+
+One way to implement ``compare`` is to use a ``while`` loop, reading the two arrays
+using a mutable counter and checking that the corresponding elements are equal.
+
+.. literalinclude:: ../code/pulse/PulseTutorial.Array.fst
+   :language: pulse
+   :start-after: //compareimplbegin$
+   :end-before: //compareimplend$
+
+The loop invariant states that (a) the arrays are pointwise equal up to the current value
+of the counter, and (b) the boolean ``b`` is true if and only if the current value
+of the counter is less than the length of the arrays and the arrays are equal at that index.
+While (a) helps prove the final postcondition of ``compare``, (b) is required to maintain the
+invariant after the counter is incremented in the loop body.
+
+Copy
+.....
+
+As our next example, let's implement a ``copy`` function that copies the contents
+of the array ``a2`` to ``a1``.
+
+.. literalinclude:: ../code/pulse/PulseTutorial.Array.fst
+   :language: pulse
+   :start-after: //copy$
+   :end-before: //end copy$
+
+The loop invariant existentially abstracts over the contents of ``a1``, and maintains
+that up to the current loop counter, the contents of the two arrays are equal. The rest of
+the code is straightforward: the loop condition checks that the loop counter is less
+than the array lengths and the loop body copies one element at a time.
+
+The reader will notice that the postcondition of ``copy`` is a little convoluted.
+A better signature would be the following, where we directly state that the
+contents of ``a1`` are the same as ``'s2``:
+
+.. literalinclude:: ../code/pulse/PulseTutorial.Array.fst
+   :language: pulse
+   :start-after: //copy2sigbegin$
+   :end-before: //copy2sigend$
+
+We can implement this signature, but it requires one step of rewriting at the end
+after the ``while`` loop to get the postcondition in this exact shape:
+
+.. 
literalinclude:: ../code/pulse/PulseTutorial.Array.fst
+   :language: pulse
+   :start-after: //copy2rewriting$
+   :end-before: //copy2rewritingend$
+
+We could also rewrite the predicates explicitly, as we saw in a
+:ref:`previous chapter `.
+
+
+Stack allocated arrays
+^^^^^^^^^^^^^^^^^^^^^^^
+
+Stack arrays can be allocated using the expression ``[| v; n |]``. It
+allocates an array of size ``n``, with all the array elements
+initialized to ``v``. The size ``n`` must be a compile-time constant.
+It provides the postcondition that the newly created array points to a
+length ``n`` sequence of ``v``. The following example allocates two
+arrays on the stack and compares them using the ``compare`` function
+above.
+
+.. literalinclude:: ../code/pulse/PulseTutorial.Array.fst
+   :language: pulse
+   :start-after: //compare_stack_arrays$
+   :end-before: //end compare_stack_arrays$
+
+As with stack references, stack arrays don't need to be deallocated or
+dropped; they are reclaimed automatically when the function returns. As a result,
+returning them from the function is not allowed:
+
+.. literalinclude:: ../code/pulse/PulseTutorial.Array.fst
+   :language: pulse
+   :start-after: //ret_stack_array$
+   :end-before: //ret_stack_array_end$
+
+Heap allocated arrays
+^^^^^^^^^^^^^^^^^^^^^^
+
+The library ``Pulse.Lib.Vec`` provides the type ``vec t``, for
+heap-allocated arrays: ``vec`` is to ``array`` as ``box`` is to
+``ref``.
+
+Similar to ``array``, ``vec`` is accompanied by a ``pts_to``
+assertion with support for fractional permissions, ``share`` and
+``gather`` for dividing and combining permissions, and read and write
+functions. However, unlike ``array``, the ``Vec`` library provides
+allocation and free functions.
+
+.. literalinclude:: ../code/pulse/PulseTutorial.Array.fst
+   :language: pulse
+   :start-after: //heaparray$
+   :end-before: //heaparrayend$
+
+As with heap references, heap allocated arrays can be coerced to ``array`` using the coercion
+``vec_to_array``. 
To use the coercion, one often needs to convert back and forth between
+``Vec.pts_to`` and ``Array.pts_to``; the library provides the ``to_array_pts_to``
+and ``to_vec_pts_to`` lemmas for this purpose.
+
+The following example illustrates the pattern. It copies the contents of a stack array into a heap array,
+using the ``copy2`` function we wrote above.
+
+.. literalinclude:: ../code/pulse/PulseTutorial.Array.fst
+   :language: pulse
+   :start-after: //copyuse$
+   :end-before: //end copyuse$
+
+Note how the assertion for ``v`` transforms from ``V.pts_to`` to ``pts_to`` (the points-to assertion
+for arrays) and back. This means that array algorithms and routines can be implemented once against
+the ``array t`` type, and then reused for both stack- and heap-allocated arrays.
+
+Finally, though the name ``vec a`` evokes the Rust ``std::Vec`` library, we don't yet support automatic
+resizing.
diff --git a/doc/book/PoP-in-FStar/book/pulse/pulse_atomics_and_invariants.rst b/doc/book/PoP-in-FStar/book/pulse/pulse_atomics_and_invariants.rst
new file mode 100755
index 00000000000..a9c86cd4311
--- /dev/null
+++ b/doc/book/PoP-in-FStar/book/pulse/pulse_atomics_and_invariants.rst
@@ -0,0 +1,538 @@
+.. _Pulse_atomics_and_invariants:
+
+Atomic Operations and Invariants
+================================
+
+In this section, we finally come to some concurrency-related
+constructs.
+
+Concurrency in Pulse is built around two concepts:
+
+  * **Atomic operations**: operations that are guaranteed to be
+    executed in a single step of computation without interruption by
+    other threads.
+
+  * **Invariants**: named predicates that are enforced to be true at
+    all times. Atomic operations can make use of invariants, assuming
+    them to hold in the current state, so long as they are
+    re-established once the atomic step concludes.
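+
+To preview the shape this takes in code, here is a sketch adapted from
+an example developed later in this chapter---it is only a preview, not
+a checked program; the ``with_invariants``, ``later_*``, and
+``fold``/``unfold`` constructs, and the ``owns`` predicate (full
+permission on a reference ``r``), are all introduced below. An
+invariant ``i`` protecting ``owns r`` is opened around a single atomic
+write, and ``owns r`` is restored before the block ends:
+
+.. code-block:: pulse
+
+   with_invariants i {
+     later_elim _;         //ghost step: recover owns r from the invariant
+     unfold owns;          //ghost step: expose pts_to r _
+     write_atomic r 0ul;   //the single atomic operation
+     fold owns;            //ghost steps: restore owns r ...
+     later_intro (owns r)  //... before the block ends
+   }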
+ +Based on this, and in conjunction with all the other separation logic +constructs that we've learned about so far, notably the use of ghost +state, Pulse enables proofs of concurrent programs. + +Atomic Operations +................. + +We've learned so far about :ref:`two kinds of Pulse computations +`: + + * General purpose, partially correct computations, with the ``stt`` + computation type + + * Ghost computations, proven totally correct, and enforced to be + computationally irrelevant with the ``stt_ghost`` computation + type. + +Pulse offers a third kind of computation, *atomic* computations, with +the ``stt_atomic`` computation type. Here is the signature of +``read_atomic`` and ``write_atomic`` from ``Pulse.Lib.Reference``: + +.. code-block:: pulse + + atomic + fn read_atomic (r:ref U32.t) (#n:erased U32.t) (#p:perm) + requires pts_to r #p n + returns x:U32.t + ensures pts_to r #p n ** pure (reveal n == x) + +.. code-block:: pulse + + atomic + fn write_atomic (r:ref U32.t) (x:U32.t) (#n:erased U32.t) + requires pts_to r n + ensures pts_to r x + +The ``atomic`` annotation on these functions claims that reading and +writing 32-bit integers can be done in a single atomic step of +computation. + +This is an assumption about the target architecture on which a Pulse +program is executed. It may be that on some machines, 32-bit values +cannot be read or written atomically. So, when using atomic +operations, you should be careful to check that it is safe to assume +that these operations truly are atomic. + +Pulse also provides a way for you to declare that other operations are +atomic, e.g., maybe your machine supports 64-bit or 128-bit atomic +operations---you can program the semantics of these operations in F* +and add them to Pulse, marking them as atomic. + +Sometimes, particularly at higher order, you will see atomic +computations described by the computation type below: + +.. 
code-block:: fstar
+
+   val stt_atomic (t:Type) (i:inames) (pre:slprop) (post:t -> slprop)
+     : Type u#4
+
+Like ``stt_ghost``, atomic computations are total and live in universe
+``u#4``. As such, you cannot store an atomic function in the state,
+i.e., ``ref (unit -> stt_atomic t i p q)`` is not a well-formed type.
+
+Atomic computations and ghost computations are also indexed by
+``i:inames``, where ``inames`` is a set of invariant names. We'll
+learn about these next.
+
+Invariants
+..........
+
+In ``Pulse.Lib.Core``, we have the following types:
+
+.. code-block:: fstar
+
+   [@@erasable]
+   val iref : Type0
+   val inv (i:iref) (p:slprop) : slprop
+
+Think of ``inv i p`` as a predicate asserting that ``p`` is true in
+the current state and all future states of the program. Every
+invariant has a name, ``i:iref``, though the name is only relevant in
+specifications, i.e., it is erasable.
+
+A closely related type is ``iname``:
+
+.. code-block:: fstar
+
+   val iname : eqtype
+   let inames = erased (FStar.Set.set iname)
+
+Every ``iref`` can be turned into an ``iname``, with the function
+``iname_of (i:iref): GTot iname``.
+
+Invariants are duplicable, i.e., from ``inv i p`` one can prove ``inv
+i p ** inv i p``, as shown by the type of ``Pulse.Lib.Core.dup_inv``
+below:
+
+.. code-block:: pulse
+
+   ghost fn dup_inv (i:iref) (p:slprop)
+   requires inv i p
+   ensures inv i p ** inv i p
+
+Creating an invariant
++++++++++++++++++++++
+
+Let's start by looking at how to create an invariant.
+
+First, let's define a predicate ``owns x``, to mean that we hold
+full permission on ``x``.
+
+.. literalinclude:: ../code/pulse/PulseTutorial.AtomicsAndInvariants.fst
+   :language: fstar
+   :start-after: //owns$
+   :end-before: //end owns$
+
+
+Now, if we can currently prove ``pts_to r x`` then we can turn it into
+an invariant ``inv i (owns r)``, as shown below.
+
+.. 
literalinclude:: ../code/pulse/PulseTutorial.AtomicsAndInvariants.fst
+   :language: pulse
+   :start-after: //create_invariant$
+   :end-before: //end create_invariant$
+
+Importantly, when we turn ``pts_to r x`` into ``inv i (owns r)``, **we
+lose** ownership of ``pts_to r x``. Remember, once we have ``inv i
+(owns r)``, Pulse's logic aims to prove that ``owns r`` remains true
+always. If we were allowed to retain ``pts_to r x`` while also
+creating an ``inv i (owns r)``, we could clearly break the invariant,
+e.g., by freeing ``r``.
+
+.. note::
+
+   A tip: When using an ``inv i p``, it's a good idea to make sure
+   that ``p`` is a user-defined predicate. For example, one might
+   think to just write ``inv i (exists* v. pts_to x v)`` instead of
+   defining an auxiliary predicate and writing ``inv i (owns r)``.
+   However, some of the proof obligations produced by the Pulse
+   checker are harder for the SMT solver to prove if you don't use the
+   auxiliary predicate and you may start to see odd failures. This is
+   something we're working to improve. In the meantime, use an
+   auxiliary predicate.
+
+Impredicativity and the ``later`` modality
++++++++++++++++++++++++++++++++++++++++++++
+
+Pulse allows *any* predicate ``p:slprop`` to be turned into an invariant ``inv i
+p : slprop``. Importantly, ``inv i p`` is itself an ``slprop``, so one can even
+turn an invariant into another invariant, ``inv i (inv j p)``, etc. This ability
+to turn any predicate into an invariant, including invariants themselves, makes
+Pulse an *impredicative* separation logic.
+
+Impredicativity turns out to be useful for a number of reasons, e.g., one could
+create a lock to protect access to a data structure that may itself contain
+further locks. However, soundly implementing impredicativity in a separation
+logic is challenging, since it involves resolving a kind of circularity in the
+definitions of heaps and heap predicates. 
PulseCore resolves this circularity
+using something called *indirection theory*, using it to provide a foundational
+model for impredicative invariants, together with all the constructs of Pulse.
+The details of this construction are out of scope here, but one doesn't really
+need to know how the construction of the model works to use the resulting logic.
+
+We provide a bit of intuition about the model below, but for now, just keep in
+mind that Pulse includes the following abstract predicates:
+
+.. code-block:: fstar
+
+   val later (p:slprop) : slprop
+   val later_credit (i:nat) : slprop
+
+with the following forms to introduce and eliminate them:
+
+.. code-block:: pulse
+
+   ghost fn later_intro (p: slprop)
+   requires p
+   ensures later p
+
+   ghost fn later_elim (p: slprop)
+   requires later p ** later_credit 1
+   ensures p
+
+   fn later_credit_buy (n:nat)
+   requires emp
+   ensures later_credit n
+
+Opening Invariants
+++++++++++++++++++++++++++++++++++++++++++++++
+
+Once we've allocated an invariant, ``inv i (owns r)``, what can we do with it?
+As we said earlier, one can make use of the ``owns r`` in an atomic computation,
+so long as we restore it at the end of the atomic step.
+
+The ``with_invariants`` construct gives us access to the invariant
+within the scope of at most one atomic step, preceded or succeeded by
+as many ghost or unobservable steps as needed.
+
+The general form of ``with_invariants`` is as follows, to "open"
+invariants ``i_1`` to ``i_k`` in the scope of ``e``.
+
+.. code-block:: pulse
+
+   with_invariants i_1 ... i_k
+   returns x:t
+   ensures post
+   { e }
+
+In many cases, the ``returns`` and ``ensures`` annotations are
+omitted, since they can be inferred.
+
+This is syntactic sugar for the following nest:
+
+.. code-block:: pulse
+
+   with_invariants i_1 {
+     ...
+     with_invariants i_k
+     returns x:t
+     ensures post
+     { e }
+     ...
+   }
+
+Here's the rule for opening a single invariant ``inv i p`` using
+``with_invariants i { e }``:
+
+* ``i`` must have type ``iref`` and ``inv i p`` must be provable in
+  the current context, for some ``p:slprop``
+
+* ``e`` must have the type ``stt_atomic t j (later p ** r) (fun x -> later p **
+  s x)``. [#]_ That is, ``e`` requires and restores ``later p``, while also
+  transforming ``r`` to ``s x``, all in at most one atomic step. Further,
+  ``iname_of i`` must not be in the set ``j``.
+
+* ``with_invariants i { e }`` has type ``stt_atomic t (add_inv i j)
+  (inv i p ** r) (fun x -> inv i p ** s x)``. That is, ``e`` gets to
+  use ``p`` for a step, and from the caller's perspective, the context
+  was transformed from ``r`` to ``s``, while the use of ``p`` is
+  hidden.
+
+* Pay attention to the ``add_inv i j`` index on ``with_invariants``:
+  an ``stt_atomic`` (or ``stt_ghost``) computation is indexed by
+  the names of all the invariants that it may open.
+
+
+Let's look at a few examples to see how ``with_invariants`` works.
+
+.. [#]
+
+   Alternatively ``e`` may have type ``stt_ghost t j (later p ** r) (fun x ->
+   later p ** s x)``, in which case the entire ``with_invariants i { e }``
+   block has type ``stt_ghost t (add_inv i j) (inv i p ** r) (fun x -> inv i p
+   ** s x)``, i.e., one can open an invariant and use it in either an atomic or
+   ghost context.
+
+
+Updating a reference
+~~~~~~~~~~~~~~~~~~~~
+
+Let's try to update a reference, given ``inv i (owns r)``. Our first attempt is
+shown below:
+
+.. literalinclude:: ../code/pulse/PulseTutorial.AtomicsAndInvariants.fst
+   :language: pulse
+   :start-after: //update_ref_atomic0$
+   :end-before: //end update_ref_atomic0$
+
+We use ``with_invariants i { ... }`` to open the invariant, and in the scope of
+the block, we have ``later (owns r)``. Now, we're stuck: we need ``owns r``,
+but we only have ``later (owns r)``. 
In order to eliminate the later, we +can use the ``later_elim`` combinator shown earlier, but to call it, we need to +also have a ``later_credit 1``. + +So, let's try again: + +.. literalinclude:: ../code/pulse/PulseTutorial.AtomicsAndInvariants.fst + :language: pulse + :start-after: //update_ref_atomic$ + :end-before: //end update_ref_atomic$ + +* The precondition of the function also includes a ``later_credit 1``. + +* At the start of the ``with_invariants`` scope, we have ``later (owns r)`` in + the context. + +* The ghost step ``later_elim _`` uses up the later credit and eliminates + ``later (owns r)`` into ``owns r``. + +* The ghost step ``unfold owns`` unfolds it to its definition. + +* Then, we do a single atomic action, ``write_atomic``. + +* And follow it up with a ``fold owns``, another ghost step. + +* To finish the block, we need to restore ``later (owns r)``, but we have ``owns + r``, so the ghost step ``later_intro`` does the job. + +* The block within ``with_invariants i`` has type ``stt_atomic unit + emp_inames (later (owns r) ** later_credit 1) (fun _ -> later (owns r) ** emp)`` + +* Since we opened the invariant ``i``, the type of ``update_ref_atomic`` records + this in the ``opens (singleton i)`` annotation; equivalently, the type is + ``stt_atomic unit (singleton i) (inv i (owns r) ** later_credit 1) (fun _ -> + inv i (owns r))``. When the ``opens`` annotation is omitted, it defaults to + ``emp_inames``, the empty set of invariant names. + +Finally, to call ``update_ref_atomic``, we need to buy a later credit first. +This is easily done before we call the atomic computation, as shown below: + + +.. literalinclude:: ../code/pulse/PulseTutorial.AtomicsAndInvariants.fst + :language: pulse + :start-after: //update_ref$ + :end-before: //end update_ref$ + +The later modality and later credits +++++++++++++++++++++++++++++++++++++ + +Having seen an example with later modality at work, we provide a bit of +intuition for the underlying model. 
+
+The semantics of PulseCore is defined with respect to memory with an abstract
+notion of a "ticker", a natural number counter, initialized at the start of a
+program's execution. In other logics, this is sometimes called a "step index",
+but in PulseCore, the ticker is unrelated to the number of actual steps a
+computation takes. Instead, at specific points in the program, the programmer
+can issue a *ghost* instruction to "tick" the ticker, decreasing its
+value by one unit. The decreasing counter provides a way to define an
+approximate fixed point between the otherwise-circular heaps and heap
+predicates. The logic is defined in such a way that it is always possible to
+pick a high enough initial value for the ticker so that any finite number of
+program steps can be executed before the ticker is exhausted.
+
+Now, rather than explicitly working with the ticker, PulseCore encapsulates all
+reasoning about the ticker using two logical constructs: the *later* modality
+and *later credits*, features found in Iris and other separation logics that
+feature impredicativity.
+
+The Later Modality and Later Credits
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The predicate ``later p`` states that ``p:slprop`` is true after one tick.
+
+.. code-block:: pulse
+
+   val later (p: slprop) : slprop
+
+All predicates ``p:slprop`` are "hereditary", meaning that if they are true in a
+given memory, then they are also true after that memory is ticked. The ghost
+function ``later_intro`` embodies this principle: from ``p`` one can prove
+``later p``.
+
+.. code-block:: pulse
+
+   ghost fn later_intro (p: slprop)
+   requires p
+   ensures later p
+
+Given a ``later p``, one can prove ``p`` by using ``later_elim``. 
This ghost
+function effectively "ticks" the memory (since ``later p`` says that ``p`` is
+true after a tick), but in order to do so, it needs a precondition that the
+ticker has not already reached zero: ``later_credit 1`` says just that, i.e.,
+that the memory can be ticked at least once.
+
+.. code-block:: pulse
+
+   ghost fn later_elim (p: slprop)
+   requires later p ** later_credit 1
+   ensures p
+
+The only way to get a ``later_credit 1`` is to *buy* a credit with the operation
+below---this is a concrete operation that ensures that the memory can be ticked
+at least ``n`` times.
+
+.. code-block:: pulse
+
+   fn later_credit_buy (n:nat)
+   requires emp
+   ensures later_credit n
+
+At an abstract level, if the ticker cannot be ticked further, the program loops
+indefinitely---programs that use later credits (and, more generally, programs in
+step-indexed logics) are inherently proven only partially correct and are
+allowed to loop infinitely. At a meta-level, we show that one can always set the
+initial ticker value high enough that ``later_credit_buy`` will never actually
+loop indefinitely. In fact, when compiling a program, Pulse extracts
+``later_credit_buy n`` to a no-op ``()``.
+
+Note that later credits can also be split and combined additively:
+
+.. code-block:: fstar
+
+   val later_credit_zero ()
+     : Lemma (later_credit 0 == emp)
+
+   val later_credit_add (a b: nat)
+     : Lemma (later_credit (a + b) == later_credit a ** later_credit b)
+
+Timeless Predicates
+~~~~~~~~~~~~~~~~~~~
+
+All predicates ``p:slprop`` are hereditary, meaning that ``p`` implies ``later
+p``. Some predicates, including many common predicates like ``pts_to``, are also
+**timeless**, meaning that ``later p`` implies ``p``. Combining timeless
+predicates with ``**`` or existentially quantifying over timeless predicates
+yields a timeless predicate.
+
+All of the following are available in Pulse.Lib.Core:
+
+.. 
code-block:: fstar + + val timeless (p: slprop) : prop + let timeless_slprop = v:slprop { timeless v } + val timeless_emp : squash (timeless emp) + val timeless_pure (p:prop) : Lemma (timeless (pure p)) + val timeless_star (p q : slprop) : Lemma + (requires timeless p /\ timeless q) + (ensures timeless (p ** q)) + val timeless_exists (#a:Type u#a) (p: a -> slprop) : Lemma + (requires forall x. timeless (p x)) + (ensures timeless (op_exists_Star p)) + +And in Pulse.Lib.Reference, we have: + +.. code-block:: fstar + + val pts_to_timeless (#a:Type) (r:ref a) (p:perm) (x:a) + : Lemma (timeless (pts_to r #p x)) + [SMTPat (timeless (pts_to r #p x))] + +For timeless predicates, the ``later`` modality can be eliminated trivially +without requiring a credit. + +.. code-block:: pulse + + ghost fn later_elim_timeless (p: timeless_slprop) + requires later p + ensures p + +Updating a reference, with timeless predicates +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Since ``pts_to`` is timeless, we can actually eliminate ``later (owns r)`` +without a later credit, as shown below. + +First, we prove that ``owns`` is timeless: + + +.. literalinclude:: ../code/pulse/PulseTutorial.AtomicsAndInvariants.fst + :language: pulse + :start-after: //owns_timeless$ + :end-before: //end owns_timeless$ + +.. note:: + + It's usually easier to prove a predicate timeless by just annotating its + definition, rather than writing an explicit lemma. For example, + this would have worked: + + .. code-block:: fstar + + let owns (x:ref U32.t) : timeless_slprop = exists* v. pts_to x v + +Next, we can revise ``update_ref_atomic`` to use ``later_elim_timeless``, rather +than requiring a later credit. + +.. 
literalinclude:: ../code/pulse/PulseTutorial.AtomicsAndInvariants.fst
+   :language: pulse
+   :start-after: //update_ref_atomic_alt$
+   :end-before: //end update_ref_atomic_alt$
+
+
+Double opening is unsound
+++++++++++++++++++++++++++
+
+To see why we have to track the names of the opened invariants,
+consider the example below. If we opened the same invariant twice
+within the same scope, then it's easy to prove ``False``:
+
+.. literalinclude:: ../code/pulse/PulseTutorial.AtomicsAndInvariants.fst
+   :language: pulse
+   :start-after: //double_open_bad$
+   :end-before: //end double_open_bad$
+
+Here, we open the invariant ``i`` twice and get ``owns r ** owns r``,
+or more than full permission to ``r``---from this, it is easy to build
+a contradiction.
+
+
+Subsuming atomic computations
+++++++++++++++++++++++++++++++
+
+Atomic computations can be silently converted to regular ``stt``
+computations, while forgetting which invariants they opened. For
+example, ``update_ref`` below is not marked atomic, so its type
+doesn't record which invariants were opened internally.
+
+.. literalinclude:: ../code/pulse/PulseTutorial.AtomicsAndInvariants.fst
+   :language: pulse
+   :start-after: //update_ref$
+   :end-before: //end update_ref$
+
+This is okay, since a non-atomic computation can never appear within a
+``with_invariants`` block---so, there's no fear of an ``stt``
+computation causing an unsound double opening. Attempting to use a
+non-atomic computation in a ``with_invariants`` block produces an
+error, as shown below.
+
+
+.. literalinclude:: ../code/pulse/PulseTutorial.AtomicsAndInvariants.fst
+   :language: pulse
+   :start-after: //update_ref_fail$
+   :end-before: //end update_ref_fail$
+
+.. code-block::
+
+   - This computation is not atomic nor ghost. `with_invariants`
+     blocks can only contain atomic computations.
\ No newline at end of file
diff --git a/doc/book/PoP-in-FStar/book/pulse/pulse_ch1.rst b/doc/book/PoP-in-FStar/book/pulse/pulse_ch1.rst
new file mode 100644
index 00000000000..c6d1aef922e
--- /dev/null
+++ b/doc/book/PoP-in-FStar/book/pulse/pulse_ch1.rst
@@ -0,0 +1,223 @@
+.. _Pulse_Basics:
+
+Pulse Basics
+============
+
+A Pulse program is embedded in an F* program by including the
+directive ``#lang-pulse`` in an F* file. The rest of the file can
+then use a mixture of Pulse and F* syntax, as shown below.
+
+.. literalinclude:: ../code/pulse/PulseByExample.fst
+   :language: pulse
+   :start-after: //SNIPPET_START: five
+   :end-before: //SNIPPET_END
+
+This program starts with a bit of regular F* defining ``fstar_five``,
+followed by a Pulse function ``five`` that references that F*
+definition and proves that it always returns the constant
+``5``. Finally, we have a bit of regular F* referencing the ``five``
+defined in Pulse. This is a really simple program, but it already
+illustrates how Pulse and F* interact in both directions.
+
+In what follows, unless we really want to emphasize that a fragment of code is
+Pulse embedded in a larger F* context, we just assume that we're working in a
+context where ``#lang-pulse`` is enabled.
+
+A Separation Logic Primer
+^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Separation Logic was invented by John Reynolds, Peter O'Hearn, and
+others in the late 1990s as a way to reason about imperative programs
+that use shared, mutable data structures, e.g., linked lists and
+graphs---see this paper for `an introduction to separation logic
+`_. In the subsequent
+decades, several innovations were added to separation logic by many
+people, generalizing it beyond just sequential heap-manipulating
+programs to distributed programs, concurrent programs, asynchronous
+programs, etc., that manipulate abstract resources of various kinds,
+including time and space, messages sent over communication channels,
+etc.
+
+Much like other Hoare Logics, which we reviewed in :ref:`an earlier
+section `, separation logic comes in two parts.
+
+**Separation Logic Propositions** First, we have a language of propositions that
+describe properties of program resources, e.g., the heap. These propositions
+have the type ``slprop`` in Pulse, and, under the covers in the PulseCore
+semantics of Pulse, a ``slprop = state -> prop``, where ``state`` represents the
+state of a program, e.g., the contents of memory. It is useful (at least at
+first) to think of a ``slprop`` as a memory property, though we will eventually
+treat it more abstractly and use it to model many other kinds of resources.
+
+.. I'm calling it a hmem to not confuse things with heap vs stack
+   later.
+
+**Separation Logic Hoare Triples** To connect ``slprop``'s to programs,
+separation logics use Hoare triples to describe the action of a
+program on its state. For example, the Hoare triple ``{ p } c { n. q
+}`` describes a program ``c`` which, when run in an initial state
+``s0`` satisfying ``p s0`` (i.e., ``p`` is a precondition), returns a
+value ``n`` while transforming the state to ``s1`` satisfying ``q n
+s1`` (i.e., ``q`` is a postcondition). Pulse's program logic is a
+partial-correctness logic, meaning that ``c`` may also loop forever,
+deadlock with other threads, etc.
+
+**Some simple slprops and triples**: Here are two of the simplest
+``slprops`` (defined in ``Pulse.Lib.Pervasives``):
+
+  * ``emp``, the trivial proposition (equivalent to ``fun s -> True``).
+
+  * ``pure p``, the heap-independent predicate ``fun s -> p``. ``emp`` is
+    equivalent to ``pure True``.
+
+The type of the program ``five`` illustrates how these ``slprop``'s are
+used in program specifications:
+
+  * It is a function with a single unit argument---Pulse functions use
+    the keyword ``fn``.
+
+  * The precondition is just ``emp``, the trivial assertion in
+    separation logic, i.e., ``five`` can be called in any initial
+    state.
+
+  * The return value is an integer ``n:int``
+
+  * The postcondition may refer to the name of the return value (``n``
+    in this case) and here claims that the final state satisfies the
+    ``pure`` proposition, ``n == 5``.
+
+In other words, the type signature in Pulse is a convenient way to
+write the Hoare triple ``{ emp } five () { n:int. pure (n == 5) }``.
+
+**Ownership** At this point you may wonder if the postcondition of
+``five`` is actually strong enough. We've only said that the return
+value ``n == 5`` but have not said anything about the state that
+results from calling ``five ()``. Perhaps this specification allows
+``five`` to arbitrarily change any memory location in the state, since ``pure
+(5 == 5)`` is true of any state. [#]_ If you're familiar with Low*,
+Dafny, or other languages based on Hoare logic for heaps, you may be
+wondering why we haven't specified a ``modifies``-clause,
+describing exactly which part of the state a function may have
+changed. The nice thing in separation logic is that there is no need
+to describe what parts of the state you may have modified. This is
+because a central idea in separation logic is the concept of
+*ownership*. To a first approximation, a computation can only access
+those resources that it is explicitly granted access to in its
+precondition or those that it creates itself. [#]_ In this case, with
+a precondition of ``emp``, the function ``five`` does not have
+permission to access *any* resources, and so ``five`` simply cannot
+modify the state in any observable way.
+
+
+**Separating Conjunction and the Frame Rule** Let's go back to
+``incr`` and ``par_incr`` that we saw in the previous section and look
+at their types closely. We'll need to introduce two more common
+``slprop``'s, starting with the "points-to" predicate:
+
+  * ``pts_to x v`` asserts that the reference ``x`` points to a cell
+    in the current state that holds the value ``v``.
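+
+For instance---using the Hoare-triple notation above, with a
+hypothetical reference ``x`` currently holding a value ``v``---reading
+``x`` returns the stored value and leaves the permission intact:
+
+.. code-block:: pulse
+
+   { pts_to x v }  !x  { n. pts_to x v ** pure (n == v) }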
+ +``slprop``'s can also be combined in various ways, the most common one +being the "separating conjunction", written ``**`` in Pulse. [#]_ + + * ``p ** q``, means that the state can be split into two *disjoint* + fragments satisfying ``p`` and ``q``, respectively. Alternatively, + one could read ``p ** q`` as meaning that one holds the + permissions associated with both ``p`` and ``q`` separately in a + given state. The ``**`` operator satisfies the following laws: + + - Commutativity: ``p ** q`` is equivalent to ``q ** p`` + + - Associativity: ``p ** (q ** r)`` is equivalent to ``(p ** q) ** r`` + + - Left and right unit: ``p ** emp`` is equivalent to ``p``. Since + ``**`` is commutative, this also means that ``emp ** p`` is + equivalent to ``p`` + +Now, perhaps the defining characteristic of separation logic is how +the ``**`` operator works in the program logic, via a key rule known +as the *frame* rule. The rule says that if you can prove the Hoare +triple ``{ p } c { n. q }``, then, for any other ``f : slprop``, you +can also prove ``{ p ** f } c { n. q ** f }``---``f`` is often called +the "frame". It might take some time to appreciate, but the frame rule +captures the essence of local, modular reasoning. Roughly, it states +that if a program is correct when it only has permission ``p`` on the +input state, then it remains correct when run in a larger state and is +guaranteed to preserve any property (``f``) on the part of the state +that it doesn't touch. + +With this in mind, let's look again at the type of ``incr``, which +requires permission only to ``x`` and increments it: + +.. literalinclude:: ../code/pulse/PulseTutorial.Intro.fst + :language: pulse + :start-after: //incr$ + :end-before: //end incr$ + +Because of the frame rule, we can also call ``incr`` in a context like +``incr_frame`` below, and we can prove without any additional work +that ``y`` is unchanged. + +.. 
literalinclude:: ../code/pulse/PulseTutorial.Intro.fst
+   :language: pulse
+   :start-after: //incr_frame$
+   :end-before: //end incr_frame$
+
+In fact, Pulse lets us use the frame rule with any ``f:slprop``, and we
+get, for free, that ``incr x`` does not disturb ``f``.
+
+.. literalinclude:: ../code/pulse/PulseTutorial.Intro.fst
+   :language: pulse
+   :start-after: //incr_frame_any$
+   :end-before: //end incr_frame_any$
+
+A point about the notation: The variable ``'i`` is an implicitly bound
+logical variable, representing the value held in the ref-cell ``x`` in
+the initial state. In this case, ``'i`` has type ``FStar.Ghost.erased
+int``---we learned about :ref:`erased types in a previous section
+`. One can also bind logical variables explicitly, e.g.,
+this is equivalent:
+
+.. literalinclude:: ../code/pulse/PulseTutorial.Intro.fst
+   :language: pulse
+   :start-after: //incr_explicit_i$
+   :end-before: //end incr_explicit_i$
+
+**Other slprop connectives** In addition to the separating conjunction,
+Pulse, like other separation logics, provides other ways to combine
+``slprops``. We'll look at these in detail in the subsequent chapters,
+but we list the most common other connectives below just to give you a
+taste of the logic.
+
+  * ``exists* (x1:t1) ... (xn:tn). p``: Existential quantification is
+    used extensively in the Pulse libraries, and the language provides
+    many tools to make existentials convenient to use. ``exists x. p``
+    is valid in a state ``s`` if there is a witness ``w`` such that
+    ``p [w/x]`` is valid in ``s``. For experts, existential
+    quantification is impredicative, in the sense that one can quantify
+    over ``slprops`` themselves, i.e., ``exists* (p:slprop). q`` is
+    allowed.
+
+  * ``forall* (x1:t1) ... (xn:tn). p``: Universal quantification is
+    also supported, though less commonly used. ``forall (x:t). p`` is
+    valid in ``s`` if ``p[w/x]`` is valid for all values ``w:t``.
+    Like existential quantification, it is also impredicative.
+
+ * ``p @==> q`` is a form of separating implication similar to an
+   operator called a *magic wand* or a *view shift* in other
+   separation logics.
+
+Pulse does not yet provide libraries for conjunction or
+disjunction. However, since Pulse is embedded in F*, new slprops can
+also be defined by the user, and it is common to do so, e.g., with
+recursively defined predicates or variants of the connectives
+described above.
+
+.. [#] For experts, Pulse's separation logic is *affine*.
+
+.. [#] When we get to things like invariants and locks, we'll see how
+   permissions can be acquired by other means.
+
+.. [#] In the separation logic literature, separating conjunction is
+   written ``p * q``, with just a single star. We use two stars
+   ``**`` to avoid a clash with multiplication.
diff --git a/doc/book/PoP-in-FStar/book/pulse/pulse_ch2.rst b/doc/book/PoP-in-FStar/book/pulse/pulse_ch2.rst
new file mode 100644
index 00000000000..886d81bd9d5
--- /dev/null
+++ b/doc/book/PoP-in-FStar/book/pulse/pulse_ch2.rst
@@ -0,0 +1,414 @@
+.. _Pulse_References:
+
+Mutable References
+==================
+
+Pulse aims to support programming with explicit control over memory
+management and without the need for a garbage collector, similar to
+languages like C or Rust, but, of course, in a proof-oriented style.
+Towards that end, one of the main features it offers (especially in
+comparison to purely functional F*) is support for references to
+mutable memory that can be both allocated and reclaimed.
+
+In this chapter, we'll learn about three kinds of mutable references:
+stack references, heap references (or boxes), and ghost
+references. Stack references point to memory allocated in the stack
+frame of the current function (the memory is reclaimed
+when the function returns). Heap references, or boxes, point to memory
+locations in the heap, and heap memory is explicitly reclaimed by
+calling ``drop`` or ``free``. 
Ghost references are for specification
+and proof purposes only and point to memory locations that do not
+really exist at runtime.
+
+``ref t``: Stack or Heap References
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Most of the operations on mutable references are agnostic to whether
+the memory referenced resides on the stack or the heap---the main
+difference is that stack references are allocated in a scope and
+implicitly reclaimed when they go out of scope, whereas heap
+references are explicitly allocated and deallocated.
+
+The type ``Pulse.Lib.Reference.ref t`` is the basic type of a mutable
+reference. We have already seen ``ref t`` used in the ``incr``
+function of the previous section. We show below another common
+function to swap the contents of two references:
+
+.. literalinclude:: ../code/pulse/PulseTutorial.Ref.fst
+   :language: pulse
+   :start-after: //swap$
+   :end-before: //end swap$
+
+Reading a reference
+...................
+
+Let's start by taking a closer look at how dereferencing works in the
+function ``value_of`` below:
+
+.. literalinclude:: ../code/pulse/PulseTutorial.Ref.fst
+   :language: pulse
+   :start-after: //value_of$
+   :end-before: //end value_of$
+
+A slightly more explicit form of it is shown below, where ``w:erased a``
+is an erased value witnessing the current contents referenced by
+``r``.
+
+.. literalinclude:: ../code/pulse/PulseTutorial.Ref.fst
+   :language: pulse
+   :start-after: //value_of_explicit$
+   :end-before: //end value_of_explicit$
+
+Notice how the precondition requires ``pts_to r w`` while the
+postcondition retains ``pts_to r w``, along with the property that ``v
+== reveal w``, i.e., the type proves that if we read the reference,
+the value we get is equal to the logical witness provided.
+
+
+Erased values are for specification and proof only
+..................................................
+
+The logical witness is an erased value, so one cannot directly use it
+in a non-ghost computation. 
For example, if instead of reading the
+reference, we attempt to just return ``reveal w``, the
+code fails to check with the error shown below.
+
+.. literalinclude:: ../code/pulse/PulseTutorial.Ref.fst
+   :language: pulse
+   :start-after: //value_of_explicit_fail$
+   :end-before: //end value_of_explicit_fail$
+
+.. code-block::
+
+   Expected a Total computation, but got Ghost
+
+Writing through a reference
+...........................
+
+The function ``assign`` below shows how to mutate the contents of a
+reference---the specification shows that when the function returns,
+``r`` points to the assigned value ``v``.
+
+.. literalinclude:: ../code/pulse/PulseTutorial.Ref.fst
+   :language: pulse
+   :start-after: //assign$
+   :end-before: //end assign$
+
+
+Dereferencing is explicit
+.........................
+
+Unlike languages such as C or Rust, which make a distinction between
+l-values and r-values and implicitly read the content of references,
+references in Pulse (as in OCaml) are explicitly dereferenced.
+As the program below illustrates, references themselves can be passed
+to other functions (e.g., as in/out-parameters) while their current
+values must be passed explicitly.
+
+The function ``add`` takes both a reference ``r:ref int`` and a value
+``n:int`` as arguments:
+
+.. literalinclude:: ../code/pulse/PulseTutorial.Ref.fst
+   :language: pulse
+   :start-after: //add$
+   :end-before: //end add$
+
+Meanwhile, the function ``quadruple`` calls ``add`` twice to double
+the value stored in ``r`` each time.
+
+.. literalinclude:: ../code/pulse/PulseTutorial.Ref.fst
+   :language: pulse
+   :start-after: //quadruple$
+   :end-before: //end quadruple$
+
+Inspecting the proof state
+..........................
+
+A Pulse program is checked one stateful operation at a time, "pushing
+through" the ``slprop`` assertions starting with the precondition,
+until the end of the function's body. The inferred ``slprop`` at the exit
+of a function must match the annotated postcondition. 
Along the way,
+the Pulse checker will make several calls to the SMT solver to prove
+that, say, ``pts_to x (v + v)`` is equal to ``pts_to x (2 * v)``.
+
+At each point in the program, the Pulse checker maintains a proof
+state, which has two components:
+
+ * A typing environment, binding variables in scope to their types,
+   including some refinement types that reflect properties about
+   those variables in scope, e.g., ``x:int; y:erased int; _:squash (x == reveal y)``.
+
+ * A separation logic context, called just "the context", or
+   sometimes "the ``slprop`` context". The context contains all known
+   facts about the current state of the program.
+
+Pulse provides a command called ``show_proof_state`` that allows the
+user to inspect the proof state at a particular program point,
+aborting the Pulse checker at that point. It's quite common when
+developing a Pulse program to repeatedly inspect the proof state and
+to advance it by a single step, or just a few steps, at a time. This makes
+the experience of developing a Pulse program quite interactive,
+perhaps similar to writing tactics in F* or other languages---except
+that in Pulse, one incrementally writes an imperative program together
+with its proof of correctness.
+
+Below is the ``quadruple`` program again, with the proof states
+annotated at each point, and a ``show_proof_state`` command in the
+middle.
+
+.. literalinclude:: ../code/pulse/PulseTutorial.Ref.fst
+   :language: pulse
+   :start-after: //quadruple_show_proof_state$
+   :end-before: //end quadruple_show_proof_state$
+
+The output from ``show_proof_state`` is shown below:
+
+.. code-block:: pulse
+
+   - Current context:
+       pts_to r (reveal (hide v1) + v1) **
+       emp
+   - In typing environment:
+     [_#5 : unit,
+      _#4 : squash (reveal 'v == v1),
+      v1#3 : int,
+      'v#2 : erased int,
+      r#1 : ref int]
+
+The comments show how the proof state evolves after each command. 
+
+ * Pulse typechecks each step of a program by checking that the current
+   assumptions in the proof state are sufficient to prove the
+   precondition of that step, ensuring that all unused permissions
+   are retained in the context---using the frame rule, discussed in
+   the previous section. Given a context that is equivalent to ``p **
+   q``, if ``p`` is sufficient to prove ``goal``, then ``p`` is
+   called *the support* for ``goal``, while ``q`` is the *frame*.
+
+ * Like F*, Pulse tries to instantiate implicit arguments
+   automatically, e.g., at the second call to ``add``, Pulse
+   automatically instantiates ``'v`` to ``v2``.
+
+ * Pulse automatically moves any ``pure p`` property in the ``slprop``
+   context to a ``squash p`` hypothesis in the typing
+   environment. Pulse also proves ``pure`` properties automatically,
+   by sending queries to the SMT solver, which can make use only of
+   the hypotheses in the typing environment.
+
+ * Pulse also uses the SMT solver to convert ``pts_to r (v2 + v2)``
+   to ``pts_to r (4 * 'v)``.
+
+Fractional Permissions
+......................
+
+Pulse distinguishes read-only references from read/write
+references. As in languages like Rust, Pulse ensures that there can be
+at most one thread that holds read/write permission to a reference,
+although many threads can share read-only references. This ensures
+that Pulse programs are free of data races. At a more abstract level,
+Pulse's permission system ensures that one can reason locally about
+the contents of memory, since if one holds read/write permission to a
+reference, one can be sure that its contents cannot be changed by some
+other part of the program.
+
+To implement this permission discipline, Pulse uses a system of
+fractional permissions, an idea due to `John Boyland
+`_. In
+particular, the ``pts_to`` predicate that we have been using actually
+has an additional implicit argument that describes how much
+permission one holds on a reference. 
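As an aside, the accounting rules of this discipline can be sketched in ordinary Rust. This is a hypothetical model for illustration only: the names ``PtsTo``, ``share``, and ``gather`` and the fraction representation are our own, and the sketch is neither Pulse's actual semantics nor the output of its Rust extraction. The point it illustrates is that splitting a permission yields shares that can only be read, and recombining all the shares restores full, writable permission.

```rust
// Hypothetical model of fractional-permission accounting (illustration only;
// the names mirror Pulse's `pts_to`, `share`, and `gather`, but this is not
// Pulse's semantics).
#[derive(Clone, Copy)]
struct PtsTo {
    num: u32, // permission = num/den, with 0 < num/den <= 1 (1 is "1.0R")
    den: u32,
    value: i32,
}

impl PtsTo {
    // Allocation grants full (1.0R) permission.
    fn new(value: i32) -> Self {
        PtsTo { num: 1, den: 1, value }
    }

    fn is_full(&self) -> bool {
        self.num == self.den
    }

    // Splitting one permission yields two halves on the same value.
    fn share(self) -> (PtsTo, PtsTo) {
        let half = PtsTo { den: self.den * 2, ..self };
        (half, half)
    }

    // Recombining two permissions; two witnesses of the same reference
    // must agree on the value they point to.
    fn gather(a: PtsTo, b: PtsTo) -> PtsTo {
        assert_eq!(a.value, b.value);
        PtsTo {
            num: a.num * b.den + b.num * a.den,
            den: a.den * b.den,
            value: a.value,
        }
    }

    // Reading works at any fraction; writing requires full permission.
    fn read(&self) -> i32 {
        self.value
    }

    fn write(&mut self, v: i32) -> Result<(), &'static str> {
        if self.is_full() {
            self.value = v;
            Ok(())
        } else {
            Err("insufficient permission")
        }
    }
}
```

In this model, ``share`` followed by ``gather`` restores exactly the permission one started with, which is the invariant Pulse's checker tracks for real references.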
+
+The full type of the ``pts_to`` predicate is shown below:
+
+.. code-block:: fstar
+
+   val pts_to (#a:Type u#0) (r:ref a) (#p:perm) (v:a) : slprop
+
+We have so far been writing ``pts_to r v`` instead of ``pts_to #a r #p
+v``. Usually, one does not need to write the first argument ``#a``
+since it is computed by type inference; the ``#p:perm`` argument is
+more interesting---when omitted, it defaults to the value
+``1.0R``. The type ``perm`` (defined in
+``PulseCore.FractionalPermission``) is the type of real numbers strictly
+greater than ``0.0R`` and less than or equal to ``1.0R``.
+
+The assertion ``pts_to r #1.0R v`` represents exclusive, read/write
+permission on a reference. Revisiting the ``assign`` function from
+before, we can write down the permissions explicitly.
+
+.. literalinclude:: ../code/pulse/PulseTutorial.Ref.fst
+   :language: pulse
+   :start-after: //assign_1.0R$
+   :end-before: //end assign_1.0R$
+
+In contrast, when reading a reference, any permission ``p`` will do,
+as shown below:
+
+.. literalinclude:: ../code/pulse/PulseTutorial.Ref.fst
+   :language: pulse
+   :start-after: //value_of_perm$
+   :end-before: //end value_of_perm$
+
+If we try to write to a reference without holding full permission on
+it, Pulse rejects the program, as shown below.
+
+.. literalinclude:: ../code/pulse/PulseTutorial.Ref.fst
+   :language: pulse
+   :start-after: //assign_perm FAIL$
+   :end-before: //end assign_perm FAIL$
+
+.. code-block:: fstar
+
+   - Cannot prove:
+       pts_to #a r #1.0R (reveal #a _)
+   - In the context:
+       pts_to #a r #p (reveal #a w)
+
+The full error message requires the F* option ``--print_implicits``.
+
+The functions ``share`` and ``gather`` allow one to divide and combine
+permissions on references, as shown below.
+
+.. literalinclude:: ../code/pulse/PulseTutorial.Ref.fst
+   :language: pulse
+   :start-after: //share_ref$
+   :end-before: //end share_ref$
+
+..
literalinclude:: ../code/pulse/PulseTutorial.Ref.fst
+   :language: pulse
+   :start-after: //gather_ref$
+   :end-before: //end gather_ref$
+
+The type of ``gather_ref`` has an additional interesting element: its
+postcondition proves that ``'v0 == 'v1``. That is, since ``x`` can
+point to at most one value, two separate points-to assertions
+about ``x`` allow one to conclude that the pointed-to witnesses are
+identical.
+
+Stack references
+^^^^^^^^^^^^^^^^
+
+``let mut`` creates a new stack ref
+...................................
+
+To create a new ``ref t``, one uses the ``let mut`` construct of
+Pulse, as shown below.
+
+.. literalinclude:: ../code/pulse/PulseTutorial.Ref.fst
+   :language: pulse
+   :start-after: //one
+   :end-before: //end one
+
+The body of the program is annotated to show program assertions that
+are true after each command.
+
+ * Initially, only the precondition ``emp`` is valid.
+
+ * After ``let mut i = 0``, we have ``i : ref int`` and ``pts_to i
+   0``, meaning that ``i`` points to a stack slot that holds the
+   value ``0``.
+
+ * After calling ``incr i``, we have ``pts_to i (0 + 1)``.
+
+ * Finally, we dereference ``i`` using ``!i`` and return ``v:int``,
+   the current value of ``i``.
+
+ * At the point where the scope of a ``let mut x`` ends, the Pulse
+   checker requires that the context contains ``pts_to x #1.0R
+   _v`` for some value ``_v``. This ensures that the code cannot
+   squirrel away a permission to the soon-to-be out-of-scope
+   reference in some other permission. Once the scope ends, the
+   memory it points to is reclaimed, and the ``pts_to x #1.0R
+   _v`` is consumed.
+
+
+
+A few additional points to note here:
+
+ * Pulse proves ``pure`` properties automatically, by sending queries
+   to the SMT solver.
+
+ * Pulse simplifies ``slprops`` implicitly, e.g., Pulse will
+   automatically rewrite ``emp ** p`` to ``p``. 
+
+ * Like F*, Pulse tries to instantiate implicit arguments
+   automatically, e.g., at the call to ``incr``, Pulse automatically
+   instantiates ``'v`` to ``0`` (actually, to ``hide 0``).
+
+Stack references are scoped and implicitly reclaimed
+....................................................
+
+To emphasize that stack references allocated with ``let mut`` are
+scoped, let's look at the program below that Pulse refuses to check:
+
+.. literalinclude:: ../code/pulse/PulseTutorial.Ref.fst
+   :language: pulse
+   :start-after: //refs_as_scoped FAIL
+   :end-before: //end refs_as_scoped FAIL
+
+The error points to the location of ``s`` with the message below,
+meaning that the current assertion on the heap is only ``emp``, while
+the goal to be proven for the postcondition is ``pts_to s 0``. In
+other words, we no longer have ownership on ``s`` once it goes out of
+scope.
+
+.. code-block:: fstar
+
+   - Cannot prove:
+       pts_to s 0
+   - In the context:
+       emp
+
+
+Heap references
+^^^^^^^^^^^^^^^
+
+The type ``Pulse.Lib.Box.box t`` is the type of heap references---the
+name is meant to evoke Rust's type of heap references, ``Box``. We
+use the module alias ``Box`` in what follows:
+
+.. code-block:: fstar
+
+   module Box = Pulse.Lib.Box
+
+The ``Box`` module provides most of the same predicates and functions
+that we have with regular references, including ``pts_to``, ``(!)``,
+``(:=)``, ``share``, and ``gather``. Additionally, heap references are
+explicitly allocated using ``alloc`` and deallocated using ``free``,
+as shown below.
+
+.. literalinclude:: ../code/pulse/PulseTutorial.Box.fst
+   :language: pulse
+   :start-after: //new_heap_ref$
+   :end-before: //end new_heap_ref$
+
+Note that we can return a freshly allocated heap reference from a
+function, unlike a ``let mut`` scoped, stack-allocated reference.
+
+In the following example, we use ``open Box;`` to open the namespace
+``Box`` for the rest of the scope.
+
+..
literalinclude:: ../code/pulse/PulseTutorial.Box.fst
+   :language: pulse
+   :start-after: //last_value_of$
+   :end-before: //end last_value_of$
+
+``box t`` references can be demoted to regular ``ref t`` references
+for code reuse. For example, in the code below, we increment the
+contents of ``r:box int`` by first calling ``Box.to_ref_pts_to`` to
+convert ``Box.pts_to r 'v`` to a regular ``pts_to (box_to_ref r) 'v``;
+then calling ``incr (box_to_ref r)``; and then converting back to a
+``Box.pts_to``.
+
+.. literalinclude:: ../code/pulse/PulseTutorial.Box.fst
+   :language: pulse
+   :start-after: //incr_box$
+   :end-before: //end incr_box$
+
+Finally, unlike Rust's ``Box`` type, which is always treated
+linearly (i.e., in Rust, one always holds exclusive read/write
+permission on a ``Box``), in Pulse, ``Box.pts_to r #p v`` has an
+implicit fractional permission as with regular references.
+
+Ghost references
+^^^^^^^^^^^^^^^^
diff --git a/doc/book/PoP-in-FStar/book/pulse/pulse_conditionals.rst b/doc/book/PoP-in-FStar/book/pulse/pulse_conditionals.rst
new file mode 100644
index 00000000000..6d8a6acf96b
--- /dev/null
+++ b/doc/book/PoP-in-FStar/book/pulse/pulse_conditionals.rst
@@ -0,0 +1,230 @@
+.. _Pulse_Conditionals:
+
+Conditionals
+============
+
+To start writing interesting programs, we need a few control
+constructs. In this chapter, we'll write some programs with branches
+of two kinds: ``if/else`` and ``match``.
+
+
+A Simple Branching Program: Max
+...............................
+
+Here's a simple program that returns the maximum value stored in two
+references.
+
+.. literalinclude:: ../code/pulse/PulseTutorial.Conditionals.fst
+   :language: pulse
+   :start-after: //max$
+   :end-before: //end max$
+
+This program illustrates a very common specification style.
+
+ * We have a pure F* function, ``max_spec``
+
+ * And a Pulse function working on mutable references, with a
+   specification that relates it to the pure F* spec. 
In this case, + we prove that ``max`` behaves like ``max_spec`` on the logical + values that witness the contents of the two references. + +The implementation of ``max`` uses a Pulse conditional statement. Its +syntax is different from the F* ``if-then-else`` expression: Pulse +uses a more imperative syntax with curly braces, which should be +familiar from languages like C. + +Limitation: Non-tail Conditionals ++++++++++++++++++++++++++++++++++ + +Pulse's inference machinery does not yet support conditionals that +appear in non-tail position. For example, this variant of ``max`` +fails, with the error message shown below. + +.. literalinclude:: ../code/pulse/PulseTutorial.Conditionals.fst + :language: pulse + :start-after: //max_alt_fail$ + :end-before: //end max_alt_fail$ + +.. code-block:: + + Pulse cannot yet infer a postcondition for a non-tail conditional statement; + Either annotate this `if` with `returns` clause; or rewrite your code to use a tail conditional + +Here's an annotated version of ``max_alt`` that succeeds. + +.. literalinclude:: ../code/pulse/PulseTutorial.Conditionals.fst + :language: pulse + :start-after: //max_alt$ + :end-before: //end max_alt$ + +We are working on adding inference for non-tail conditionals. + +Pattern matching with nullable references +......................................... + +To illustrate the use of pattern matching, consider the following +representation of a possibly null reference. + +.. literalinclude:: ../code/pulse/PulseTutorial.Conditionals.fst + :language: fstar + :start-after: //nullable_ref$ + :end-before: //end nullable_ref$ + +Representation predicate +++++++++++++++++++++++++ + +We can represent a nullable ref as just an ``option (ref a)`` coupled +with a representation predicate, ``pts_to_or_null``. 
A few points to
+note:
+
+ * The notation ``(#[default_arg (\`1.0R)] p:perm)`` is F*
+   syntax for an implicit argument which, when omitted, defaults to
+   ``1.0R``---this is exactly how predicates like
+   ``Pulse.Lib.Reference.pts_to`` are defined.
+
+ * The definition is by cases: if the reference ``x`` is ``None``,
+   then the logical witness is ``None`` too.
+
+ * Otherwise, the underlying reference points to some value ``w``,
+   and the logical witness ``v == Some w`` agrees with that value.
+
+Note that one might consider defining it this way:
+
+.. code-block:: fstar
+
+   let pts_to_or_null #a
+         (x:nullable_ref a)
+         (#[default_arg (`1.0R)] p:perm)
+         (v:option a)
+   : slprop
+   = match x with
+     | None -> pure (v == None)
+     | Some x -> pure (Some? v) ** pts_to x #p (Some?.v v)
+
+However, unlike F*'s conjunction ``p /\ q`` where the well-typedness
+of ``q`` can rely on ``p``, the ``**`` operator is not left-biased; so
+``(Some?.v v)`` cannot be proven in this context, and the definition is
+rejected.
+
+Another style might be as follows:
+
+.. code-block:: fstar
+
+   let pts_to_or_null #a
+         (x:nullable_ref a)
+         (#[default_arg (`1.0R)] p:perm)
+         (v:option a)
+   : slprop
+   = match x, v with
+     | None, None -> emp
+     | Some x, Some w -> pts_to x #p w
+     | _ -> pure False
+
+This could also work, though it would require handling an additional
+(impossible) case.
+
+Reading a nullable ref
+++++++++++++++++++++++
+
+Let's try our first pattern match in Pulse:
+
+.. literalinclude:: ../code/pulse/PulseTutorial.Conditionals.fst
+   :language: pulse
+   :start-after: //read_nullable$
+   :end-before: //end read_nullable$
+
+The syntax of pattern matching in Pulse is more imperative and
+Rust-like than what F* uses.
+
+ * The entire body of the match is enclosed within braces.
+
+ * Each branch is also enclosed within braces. 
+ + * Pulse (for now) only supports simple patterns with a single top-level + constructor applied to variables, or variable patterns: e.g., you + cannot write ``Some (Some x)`` as a pattern. + +The type of ``read_nullable`` promises to return a value equal to the +logical witness of its representation predicate. + +The code is a little tedious---we'll see how to clean it up a bit +shortly. + +A ``show_proof_state`` in the ``Some x`` branch prints the following: + +.. code-block:: + + - Current context: + pts_to_or_null r (reveal 'v) + - In typing environment: + [branch equality#684 : squash (eq2 r (Some x)), + ... + +The interesting part is the ``branch equality`` hypothesis, meaning +that in this branch, we can assume that ``(r == Some x)``. So, the +first thing we do is to rewrite ``r``; then we ``unfold`` the +representation predicate; read the value ``o`` out of ``x``; fold the +predicate back; rewrite in the other direction; and return ``Some +o``. The ``None`` case is similar. + +Another difference between Pulse and F* matches is that Pulse does not +provide any negated path conditions. For example, in the example +below, the assertion fails, since the pattern is only a wildcard and +the Pulse checker does not prove ``not (Some? x)`` as the path +condition hypothesis for the preceding branches not taken. + +.. literalinclude:: ../code/pulse/PulseTutorial.Conditionals.fst + :language: pulse + :start-after: //read_nullable_alt_fail$ + :end-before: //end read_nullable_alt_fail$ + +We plan to enhance the Pulse checker to also provide these negated +path conditions. + +.. _Pulse_nullable_ref_helpers: + +Helpers ++++++++ + +When a ``slprop`` is defined by cases (like ``pts_to_or_null``) it is +very common to have to reason according to those cases when pattern +matching. Instead of rewriting, unfolding, folding, and rewriting +every time, one can define helper functions to handle these cases. + + +.. 
literalinclude:: ../code/pulse/PulseTutorial.Conditionals.fst
+   :language: pulse
+   :start-after: //pts_to_or_null_helpers$
+   :end-before: //end pts_to_or_null_helpers$
+
+These functions are all marked ``ghost``, indicating that they are
+for proof purposes only.
+
+Writing these helpers is often quite mechanical: one could imagine
+that the Pulse checker could automatically generate them from the
+definition of ``pts_to_or_null``. Using F*'s metaprogramming support,
+a user could also auto-generate them in a custom way. For now, we
+write them by hand.
+
+Using the helpers, case-analyzing a nullable reference is somewhat
+easier:
+
+.. literalinclude:: ../code/pulse/PulseTutorial.Conditionals.fst
+   :language: pulse
+   :start-after: //read_nullable_alt$
+   :end-before: //end read_nullable_alt$
+
+
+Writing a nullable reference
+++++++++++++++++++++++++++++
+
+Having defined our helpers, we can use them repeatedly. For example,
+here is a function to write a nullable reference.
+
+.. literalinclude:: ../code/pulse/PulseTutorial.Conditionals.fst
+   :language: pulse
+   :start-after: //write_nullable$
+   :end-before: //end write_nullable$
+
+
+
diff --git a/doc/book/PoP-in-FStar/book/pulse/pulse_existentials.rst b/doc/book/PoP-in-FStar/book/pulse/pulse_existentials.rst
new file mode 100644
index 00000000000..481c4400b0d
--- /dev/null
+++ b/doc/book/PoP-in-FStar/book/pulse/pulse_existentials.rst
@@ -0,0 +1,166 @@
+.. _Pulse_Existentials:
+
+Existential Quantification
+==========================
+
+A very common specification style in Pulse involves the use of the
+existential quantifier. Before we can start to write interesting
+examples, let's take a brief look at how existential quantification
+works.
+
+As mentioned in the :ref:`introduction to Pulse `, one
+of the connectives of Pulse's separation logic is the existential
+quantifier. 
Its syntax is similar to F*'s existential quantifier,
+except it is written ``exists*`` instead of just ``exists``, and its
+body is a ``slprop``, as in the examples shown below.
+
+.. code-block:: pulse
+
+   exists* (v:nat). pts_to x v
+
+   exists* v. pts_to x v
+
+   exists* v1 v2. pts_to x v1 ** pts_to y v2
+
+   ...
+
+
+Some simple examples
+....................
+
+Looking back to the ``assign`` example from the previous chapter
+(shown below), you may have wondered why we bothered to bind a logical
+variable ``'v`` in the precondition of the specification, since it is
+never actually used in any other predicate.
+
+.. literalinclude:: ../code/pulse/PulseTutorial.Ref.fst
+   :language: pulse
+   :start-after: //assign$
+   :end-before: //end assign$
+
+And indeed, another way to write the specification of ``assign``, without
+the logical variable argument, is shown below.
+
+.. literalinclude:: ../code/pulse/PulseTutorial.Existentials.fst
+   :language: pulse
+   :start-after: //assign$
+   :end-before: //end assign$
+
+This time, in the precondition, we use an existential quantifier to
+say that ``assign`` is callable in a context where ``x`` points to any
+value ``w``.
+
+Usually, however, the postcondition of a function *relates* the
+initial state prior to the call to the state after the call, and an
+existentially bound variable is in scope only within its enclosing
+``slprop``, extending as far to the right as possible. So, existential
+quantifiers in the precondition of a function are not so common.
+
+To illustrate, the following attempted specification of ``incr`` does
+not work, since the existentially bound ``w0`` is not in scope for the
+postcondition.
+
+.. literalinclude:: ../code/pulse/PulseTutorial.Existentials.fst
+   :language: pulse
+   :start-after: //incr_fail$
+   :end-before: //end incr_fail$
+
+However, existential quantification often appears in postconditions,
+e.g., in order to abstract the behavior of a function by underspecifying
+it. 
To illustrate, consider the function ``make_even`` below. Its
+type states that it sets the contents of ``x`` to some even number
+``w1``, without specifying ``w1`` exactly. It also uses existential
+quantification in its precondition, since its postcondition does not
+depend on the initial value of ``x``.
+
+.. literalinclude:: ../code/pulse/PulseTutorial.Existentials.fst
+   :language: pulse
+   :start-after: //make_even$
+   :end-before: //end make_even$
+
+Manipulating existentials
+.........................
+
+In a previous chapter on :ref:`handling classical connectives
+`, we saw how F* provides various constructs for
+introducing and eliminating logical connectives, including the
+existential quantifier. Pulse also provides constructs for working
+explicitly with existential quantifiers, though, usually, Pulse
+automation takes care of introducing and eliminating existentials
+behind the scenes. However, the explicit operations are sometimes
+useful, and we show a first example of how they work below.
+
+.. literalinclude:: ../code/pulse/PulseTutorial.Existentials.fst
+   :language: pulse
+   :start-after: //make_even_explicit$
+   :end-before: //end make_even_explicit$
+
+Eliminating existentials
+++++++++++++++++++++++++
+
+The form ``with w0...wn. assert p; rest`` is often used as an
+eliminator for an existential. When the context contains ``exists*
+x0...xn. p``, the ``with`` construct binds ``w0 ... wn`` to the
+existentially bound variables in the remainder of the scope ``rest``.
+
+A ``show_proof_state`` immediately after the ``with w0. assert (pts_to
+x w0)`` prints the following:
+
+.. code-block:: pulse
+
+   - Current context:
+       pts_to x (reveal w0) **
+       emp
+   - In typing environment:
+     [w0#2 : erased int,
+      x#1 : ref int]
+
+That is, we have ``w0:erased int`` in scope, and ``pts_to x (reveal
+w0)`` in context.
+
+Here is another example usage of ``with``, this time with multiple
+binders.
+
+..
literalinclude:: ../code/pulse/PulseTutorial.Existentials.fst
+   :language: pulse
+   :start-after: //make_even_explicit_alt$
+   :end-before: //end make_even_explicit_alt$
+
+When there is a single existential formula in the context, one can
+write ``with x1..xn. _`` to "open" the formula, binding its witnesses
+in scope. A ``show_proof_state`` after the first line prints:
+
+.. code-block:: pulse
+
+   - Current context:
+       pts_to x (reveal wx) **
+       pts_to y (reveal wy) **
+       pure (eq2 (op_Modulus (reveal wx) 2) (op_Modulus (reveal wy) 2)) **
+       emp
+   - In typing environment:
+     [_#5 : squash (eq2 (op_Modulus (reveal wx) 2) (op_Modulus (reveal wy) 2)),
+      wy#4 : erased int,
+      wx#3 : erased int,
+      y#2 : ref int,
+      x#1 : ref int]
+
+
+Introducing existentials
+++++++++++++++++++++++++
+
+The Pulse checker will automatically introduce existential formulas by
+introducing new unification variables for each existentially bound
+variable and then trying to find solutions for those variables by
+matching ``slprops`` in the goal with those in the context.
+
+However, one can also introduce existential formulas explicitly, using
+the ``introduce exists*`` syntax, as seen in the two examples
+above. In general, one can write
+
+.. code-block:: pulse
+
+   introduce exists* x1 .. xn. p
+   with w1...wn
+
+explicitly providing witnesses ``w1..wn`` for each of the
+existentially bound variables ``x1..xn``.
diff --git a/doc/book/PoP-in-FStar/book/pulse/pulse_extraction.rst b/doc/book/PoP-in-FStar/book/pulse/pulse_extraction.rst
new file mode 100644
index 00000000000..469dd18e4d9
--- /dev/null
+++ b/doc/book/PoP-in-FStar/book/pulse/pulse_extraction.rst
@@ -0,0 +1,491 @@
+.. _Pulse_Extraction:
+
+Extraction
+===========
+
+Pulse programs can be extracted to OCaml, C, and Rust. We illustrate the extraction capabilities
+with the help of the `Boyer-Moore majority vote algorithm `_
+implemented in Pulse. 
+
+Boyer-Moore majority vote algorithm
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The algorithm finds the majority vote in an array of votes in linear time
+(2n comparisons, where n is the length of the array) and constant extra memory.
+
+We implement the algorithm in Pulse with the following specification:
+
+.. literalinclude:: ../code/pulse/PulseTutorial.Algorithms.fst
+   :language: pulse
+   :start-after: //majorityspec$
+   :end-before: //majorityspecend$
+
+The precondition ``SZ.fits (2 * SZ.v len)`` ensures safe arithmetic when counting
+votes for the majority.
+
+The algorithm consists of two phases. The first phase, called the pairing phase,
+pairs off disagreeing votes (cancels them) until the remaining votes are all the same.
+The main idea of the algorithm is to do this pairing with n comparisons. After the
+pairing phase, the remaining vote must be the majority, *if the majority exists*.
+The second phase, called the counting phase, checks if the remaining vote is indeed
+in the majority with n more comparisons.
+
+For the first phase, the algorithm maintains three auxiliary variables: ``i`` for the
+loop counter, ``cand`` for the current majority candidate, and a count ``k``. It visits the
+votes in a loop, where for the ``i-th``
+element of the array, if ``k = 0``, the algorithm assigns the ``i-th`` vote as the new
+majority candidate and assigns ``k = 1``. Otherwise, if the ``i-th`` vote is the same as
+``cand``, it increments ``k`` by one; otherwise, it decrements ``k`` by one.
+
+The second phase is then another while loop that counts the number of votes
+for the majority candidate from the first phase.
+
+.. literalinclude:: ../code/pulse/PulseTutorial.Algorithms.fst
+   :language: pulse
+   :start-after: //majorityphase1$
+   :end-before: //majorityphase1end$
+
+The loop invariant for the first phase specifies majority constraints *within* the
+prefix of the array that the loop has visited so far. The second phase loop invariant
+is a simple counting invariant. 
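For readers who want to see the two phases in isolation from the proof, they can be sketched in plain Rust. This is a hand-written illustration for exposition only: the name ``majority``, the slice argument, and the ``Option`` result are our choices, and this is not the code produced by the Pulse-to-Rust extraction.

```rust
// Illustrative sketch of the two-phase Boyer-Moore majority vote
// (hand-written for exposition; not Pulse's extracted output).
fn majority<A: PartialEq + Copy>(votes: &[A]) -> Option<A> {
    let n = votes.len();
    if n == 0 {
        return None;
    }
    // Phase 1 (pairing): `k` counts votes for `cand` not yet cancelled
    // by a disagreeing vote; at most n - 1 comparisons.
    let mut cand = votes[0];
    let mut k = 1usize;
    for &v in &votes[1..] {
        if k == 0 {
            cand = v;
            k = 1;
        } else if v == cand {
            k += 1;
        } else {
            k -= 1;
        }
    }
    // Phase 2 (counting): confirm that `cand` is a strict majority
    // with n more comparisons.
    let count = votes.iter().filter(|&&v| v == cand).count();
    if 2 * count > n { Some(cand) } else { None }
}
```

Note that the candidate surviving phase 1 need not be a majority at all (e.g., when no majority exists), which is exactly why the counting phase is required.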
+
+Pulse proves the program automatically, given one hint: the behavior of the ``count``
+function as we increment the loop counter. The following ``count_until_next`` lemma
+captures this behavior, and we invoke the lemma in both while loops:
+
+.. literalinclude:: ../code/pulse/PulseTutorial.Algorithms.fst
+   :language: pulse
+   :start-after: //countlemma$
+   :end-before: //countlemmaend$
+
+Rust extraction
+^^^^^^^^^^^^^^^^
+
+The Pulse toolchain ships with a tool to extract Pulse programs to Rust.
+The extraction pipeline maps Pulse syntactic constructs such as ``let mut``,
+``while``, ``if-then-else``, etc. to the corresponding Rust constructs. Further,
+Pulse libraries are mapped to their Rust counterparts, e.g., ``Pulse.Lib.Vec`` to
+``std::vec`` and ``Pulse.Lib.Array`` to Rust slices.
+
+To extract a Pulse file to Rust, we first invoke the F* extraction pipeline with
+the command-line option ``--codegen Extension``. This emits a ``.ast`` file containing
+an internal AST representation of the file. We then invoke the Rust extraction tool,
+which takes the ``.ast`` files as input and outputs the extracted Rust code (by default
+the output is written to ``stdout``; if an ``-o <file>`` option is provided to the tool,
+the output is written to ``file``). For example, the first command below produces the ``.ast``
+file from ``PulseTutorial.Algorithms.fst`` (which contains the Boyer-Moore algorithm implementation),
+and the second command extracts the Rust code to ``voting.rs``. (These commands are run in the
+``pulse`` root directory; change the location of ``main.exe`` according to your setup.)
+
+.. code-block:: shell
+
+   $ fstar.exe --include out/lib/pulse/
+     --include share/pulse/examples/by-example/ --include share/pulse/examples/_cache/
+     --cmi --load_cmxs pulse --odir . PulseTutorial.Algorithms.fst
+     --codegen Extension
+
+   $ ./pulse2rust/main.exe PulseTutorial_Algorithms.ast -o voting.rs
+
+The output Rust code is as shown below:
+
+.. 
literalinclude:: ../code/pulse/voting.rs
+   :language: rust
+   :start-after: //majorityrust$
+   :end-before: //majorityrustend$
+
+We can test it by adding the following to ``voting.rs`` and running the tests
+(using ``cargo test``; this requires a ``Cargo.toml`` file, and we provide an example
+in the repo that can be used):
+
+.. literalinclude:: ../code/pulse/voting.rs
+   :language: rust
+   :start-after: //majorityrusttest$
+   :end-before: //majorityrusttestend$
+
+A few notes about the extracted Rust code:
+
+- The Pulse function and the Rust function are generic in the type of the votes. In Rust,
+  the extracted code requires the type argument to implement the ``Clone``, ``Copy``, and
+  ``PartialEq`` traits. Currently, we hardcode these traits; we plan to specify them
+  in Pulse through an attribute mechanism.
+
+- The ghost arguments ``p`` and ``s`` appear in the Rust code as ``unit`` arguments; we plan
+  to make it so that these arguments are completely erased.
+
+- Whereas ``majority`` needs only read permission for the ``votes`` array in the Pulse
+  signature, the extracted Rust code specifies the argument as ``&mut``. The Rust extraction
+  pipeline currently passes all references as ``mut``; we plan to make
+  it more precise by taking into account the permissions from the Pulse signature.
+
+C extraction
+^^^^^^^^^^^^^
+
+Pulse programs can also be extracted to C. The extraction pipeline is based on the
+`Karamel `_ tool. The process to extract Pulse
+programs to C is similar to that of extracting Low* to C, described in
+`this tutorial `_. In summary, we first generate
+``.krml`` files using the F* extraction command-line option ``--codegen krml``, and then
+run the Karamel tool on those files.
+
+One catch with extracting our Boyer-Moore implementation to C is that, due to the lack
+of support for polymorphism in C, Karamel monomorphizes polymorphic functions based on
+their uses. 
So, we write a monomorphic version of the ``majority`` function for ``u32``
+that internally calls the polymorphic ``majority`` function:
+
+.. literalinclude:: ../code/pulse/PulseTutorial.Algorithms.fst
+   :language: pulse
+   :start-after: //majoritymono$
+   :end-before: //majoritymonoend$
+
+Then we extract it to C as follows (the commands are run in the ``pulse`` root directory as before):
+
+.. code-block:: shell
+
+   $ fstar.exe --include out/lib/pulse/
+     --include share/pulse/examples/by-example/ --include share/pulse/examples/_cache/
+     --cmi --load_cmxs pulse --odir . PulseTutorial.Algorithms.fst
+     --extract 'FStar.Pervasives.Native PulseTutorial.Algorithms' --codegen krml
+
+   $ ../karamel/krml -skip-compilation out.krml
+
+This produces ``PulseTutorial_Algorithms.h`` and ``PulseTutorial_Algorithms.c`` files, with the following
+implementation of ``majority``:
+
+.. literalinclude:: ../code/pulse/PulseTutorial_Algorithms.c
+   :language: C
+   :start-after: //majorityc$
+   :end-before: //majoritycend$
+
+We can now test it with a client like:
+
+.. literalinclude:: ../code/pulse/PulseTutorial_Algorithms_Client.c
+   :language: C
+
+
+.. code-block:: shell
+
+   $ gcc PulseTutorial_Algorithms.c PulseTutorial_Algorithms_Client.c -I ../karamel/include/
+     -I ../karamel/krmllib/c -I ../karamel/krmllib/dist/minimal/
+
+   $ ./a.out
+   Majority: 1
+
+   $
+
+OCaml extraction
+^^^^^^^^^^^^^^^^^
+
+As with all F* programs, Pulse programs can be extracted to OCaml. One caveat
+with using the OCaml backend for Pulse programs is that the explicit memory
+management of Pulse programs does not carry over to OCaml. For example, the
+extracted OCaml programs rely on the OCaml garbage collector to reclaim unused
+heap memory, ``let mut`` variables are allocated on the heap, etc.
+
+For the Boyer-Moore example, we can extract the program to OCaml as follows:
+
+.. 
code-block:: shell
+
+   $ fstar.exe --include out/lib/pulse/
+     --include share/pulse/examples/by-example/ --include share/pulse/examples/_cache/
+     --cmi --load_cmxs pulse --odir . PulseTutorial.Algorithms.fst
+     --codegen OCaml
+
+and the extracted ``majority`` function looks like:
+
+.. literalinclude:: ../code/pulse/PulseTutorial_Algorithms.ml
+   :language: ocaml
+   :start-after: //majorityocaml$
+   :end-before: //majorityocamlend$
+
+
+.. Rust extraction
+.. ^^^^^^^^^^^^^^^^
+
+.. .. note::
+.. The Rust extraction pipeline is under heavy development.
+
+.. We illustrate Rust extraction with the
+.. `Boyer-Moore majority vote algorithm `_ implemented
+.. in Pulse. The algorithm finds majority vote in an array of votes in linear time
+.. (2n comparisons, where n is the length of the array) and constant extra memory.
+
+.. The algorithm consists of two phases. The first phase, called the pairing phase,
+.. pairs off disagreeing votes (cancels them) until the remaining votes are all same.
+.. The main idea of the algorithm is to do this pairing with n comparisons. After the
+.. pairing phase, the remaining vote must be the majority, *if the majority exists*.
+.. The second phase, called the counting phase, checks if the remaining vote is indeed
+.. in majority with n more comparisons.
+
+.. We implement the algorithm in Pulse with the following specification:
+
+.. .. literalinclude:: ../code/pulse/PulseTutorial.Algorithms.fst
+.. :language: pulse
+.. :start-after: //majorityspec$
+.. :end-before: //majorityspecend$
+
+.. The precondition ``SZ.fits (2 * SZ.v len)`` ensures safe arithmetic in the counting
+.. phase of the algorithm. The implementation of the function contains two while loops
+.. for the two phases.
+
+.. For the first phase, the algorithm maintains three auxiliary variables, ``i`` for the
+.. loop counter, ``cand`` the current majority candidate, and a count ``k``. For the ``i``-th
+.. 
element of the array, if ``k = 0``, the algorithm assigns the ``i``-th vote as the new +.. majority candidate and assigns ``k = ``. Otherwise, if the ``i``-th vote is same as +.. ``cand``, it increments ``k`` by one, otherwise it decrements ``k`` by one. + + +.. .. literalinclude:: ../code/pulse/PulseTutorial.Algorithms.fst +.. :language: pulse +.. :start-after: //majorityphase1$ +.. :end-before: //majorityphase1end$ + +.. The loop invariant specifies majority constraints *within* the prefix of the array +.. that the loop has visited so far. The second phase after this is a simple counting loop. +.. We refer the reader to the corresponding Pulse file for more details. + +.. To extract ``majority`` to Rust, we first invoke F* extraction pipeline with option +.. ``--codegen Extension``. This emits a ``.ast`` file containing an internal AST +.. representation of the file. Pulse framework is accompanied with a rust extration tool +.. that takes as input the ``.ast`` files and outputs the extracted Rust code (by-default +.. the output is written to ``stdout``, if an ``-o `` option is provided to the tool, +.. the output is written to ``file``). The output of the tool on this example is as shown +.. below: + +.. .. literalinclude:: ../code/pulse/voting.rs +.. :language: pulse +.. :start-after: //majorityrust$ +.. :end-before: //majorityrustend$ + +.. We can output this code in a file, and then test it as follows: + +.. .. literalinclude:: ../code/pulse/voting.rs +.. :language: pulse +.. :start-after: //majorityrusttest$ +.. :end-before: //majorityrusttestend$ + +.. A few notes about the extracted Rust code: + +.. - The Pulse function and the Rust function are generic in the type of the votes. In Rust, +.. the extracted code required the type argument to implement the ``Clone``, ``Copy``, and +.. ``PartialEq`` traits. Currently we hardcode these traits. We plan to specify these traits +.. in Pulse through attribute mechanism + +.. 
- The ghost arguments ``p`` and ``s`` appear in the Rust code as ``unit`` arguments, we plan +.. to make it so that these arguments are completely erased. + +.. - Whereas ``majority`` needs only read permission for the ``votes`` array in the Pulse +.. signature, the extracted Rust code specifies the argument as ``&mut``. The Rust extraction +.. pipeline currently passes all the references as ``mut``, we plan to make +.. it more precise by taking into account the permissions from the Pulse signature. + + +.. .. Mutable Arrays +.. .. =============== + +.. .. In this chapter, we will learn about mutable arrays in Pulse. An array +.. .. is a contiguous collection of values of the same type. Similar to ``ref``, +.. .. arrays in Pulse can be allocated in the stack frame of the current function +.. .. or in the heap---while the stack allocated arrays are reclaimed automatically +.. .. (e.g., when the function returns), heap allocated arrays are explicitly managed +.. .. by the programmer. + +.. .. Pulse provides two array types: ``Pulse.Lib.Array.array t`` as the basic array type +.. .. and ``Pulse.Lib.Vec.vec t`` for heap allocated arrays. To provide code reuse, functions +.. .. that may operate over both stack and heap allocated arrays can be written using +.. .. ``Pulse.Lib.Array.array t``---the ``Pulse.Lib.Vec`` library provides back-and-forth coercions +.. .. between ``vec t`` and ``array t``. + +.. .. ``array t`` +.. .. ^^^^^^^^^^^^ + +.. .. We illustrate the basics of ``array t`` with the help of the following example +.. .. that reads an array: + +.. .. .. literalinclude:: ../code/pulse/PulseTutorial.Array.fst +.. .. :language: pulse +.. .. :start-after: ```pulse //readi$ +.. .. :end-before: ``` + +.. .. The library provides a points-to predicate ``pts_to arr #p s`` with +.. .. the interpretation that in the current memory, the contents of ``arr`` +.. .. are same as the (functional) sequence ``s:FStar.Seq.seq t``. Like the +.. .. 
``pts_to`` predicate on reference, it is also indexed by an implicit +.. .. fractional permission ``p``, which distinguished shared, read-only +.. .. access from exclusive read/write access. + +.. .. In the arguments of ``read_i``, the argument ```s`` is erased, since +.. .. it is for specification only. + +.. .. Arrays can be read and written-to using indexes of type +.. .. ``FStar.SizeT.t``, a model of C ``size_t`` [#]_ in F*, provided that +.. .. the index is within the array bounds---the refinement ``SZ.v i < +.. .. Seq.length s`` enforces that the index is in bounds, where ``module SZ +.. .. = FStar.SizeT``. The function returns the ``i``-th element of the +.. .. array, the asserted by the postcondition slprop ``pure (x == Seq.index +.. .. s (SZ.v i))``. The body of the function uses the array read operator +.. .. ``arr.(i)``. + +.. .. As another example, let's write to the ``i``-th element of an array: + +.. .. .. literalinclude:: ../code/pulse/PulseTutorial.Array.fst +.. .. :language: pulse +.. .. :start-after: ```pulse //writei$ +.. .. :end-before: ``` + +.. .. The function uses the array write operator ``arr(i) <- x`` and the postcondition +.. .. asserts that in the state when the function returns, the contents of the array +.. .. are same as the sequence ``s`` updated at the index ``i``. + +.. .. While any permission suffices for reading, writing requires +.. .. ``1.0R``. For example, implementing ``write_i`` without +.. .. ``1.0R`` is rejected, as shown below. + +.. .. .. literalinclude:: ../code/pulse/PulseTutorial.Array.fst +.. .. :language: pulse +.. .. :start-after: //writeipbegin$ +.. .. :end-before: //writeipend$ + +.. .. The library contains ``share`` and ``gather`` functions, similar to +.. .. those for references, to divide and combine permissions on arrays. + +.. .. We now look at a couple of examples that use arrays with conditionals, +.. .. loops, existentials, and invariants, using many of the Pulse +.. .. constructs we have seen so far. 
+ +.. .. .. [#] ``size_t`` in C is an unsigned integer type that is at least +.. .. ``16`` bits wide. The upper bound of ``size_t`` is platform +.. .. dependent. ``FStar.SizeT.size_t`` models this type and is +.. .. extracted to the primitive ``size_t`` type in C, similar to the +.. .. other :ref:`bounded integer types ` discussed +.. .. previously. + +.. .. Compare +.. .. ........ + +.. .. Let's implement a function that compares two arrays for equality: + +.. .. .. literalinclude:: ../code/pulse/PulseTutorial.Array.fst +.. .. :language: pulse +.. .. :start-after: //comparesigbegin$ +.. .. :end-before: //comparesigend$ + +.. .. The function takes two arrays ``a1`` and ``a2`` as input, and returns a boolean. +.. .. The postcondition ``pure (res <==> Seq.equal 's1 's2)`` +.. .. specifies that the boolean is true if and only if the sequence representations of the +.. .. two arrays are equal. Since the function only reads the arrays, it is parametric in the +.. .. permissions ``p1`` and ``p2`` on the two arrays. Note that the type parameter ``t`` has +.. .. type :ref:`eqtype`, requiring that values of type ``t`` support +.. .. decidable equality. + +.. .. One way to implement ``compare`` is to use a ``while`` loop, reading the two arrays +.. .. using a mutable counter and checking that the corresponding elements are equal. + +.. .. .. literalinclude:: ../code/pulse/PulseTutorial.Array.fst +.. .. :language: pulse +.. .. :start-after: //compareimplbegin$ +.. .. :end-before: //compareimplend$ + +.. .. The loop invariant states that (a) the arrays are pointwise equal up to the current value +.. .. of the counter, and (b) the boolean ``b`` is true if and only if the current value +.. .. of the counter is less than the length of the arrays and the arrays are equal at that index. +.. .. While (a) helps proving the final postcondition of ``compare``, (b) is required to maintain the +.. .. invariant after the counter is incremented in the loop body. + +.. .. Copy +.. .. ..... 
+ +.. .. As our next example, let's implement a ``copy`` function that copies the contents +.. .. of the array ``a2`` to ``a1``. + +.. .. .. literalinclude:: ../code/pulse/PulseTutorial.Array.fst +.. .. :language: pulse +.. .. :start-after: //copy$ +.. .. :end-before: ``` + +.. .. The loop invariant existentially abstracts over the contents of ``a1``, and maintains +.. .. that upto the current loop counter, the contents of the two arrays are equal. Rest of +.. .. the code is straightforward, the loop conditional checks that the loop counter is less +.. .. than the array lengths and the loop body copies one element at a time. + +.. .. The reader will notice that the postcondition of ``copy`` is a little convoluted. +.. .. A better signature would be the following, where we directly state that the +.. .. contents of ``a1`` are same as ``'s2``: + +.. .. .. literalinclude:: ../code/pulse/PulseTutorial.Array.fst +.. .. :language: pulse +.. .. :start-after: //copy2sigbegin$ +.. .. :end-before: //copy2sigend$ + +.. .. We can implement this signature, but it requires one step of rewriting at the end +.. .. after the ``while`` loop to get the postcondition in this exact shape: + +.. .. .. literalinclude:: ../code/pulse/PulseTutorial.Array.fst +.. .. :language: pulse +.. .. :start-after: //copy2rewriting$ +.. .. :end-before: //copy2rewritingend$ + +.. .. We could also rewrite the predicates explicitly, as we saw in a +.. .. :ref:`previous chapter `. + + +.. .. Stack allocated arrays +.. .. ^^^^^^^^^^^^^^^^^^^^^^^ + +.. .. Stack arrays can be allocated using the expression ``[| v; n |]``. It +.. .. allocates an array of size ``n``, with all the array elements +.. .. initialized to ``v``. The size ``n`` must be compile-time constant. +.. .. It provides the postcondition that the newly create array points to a +.. .. length ``n`` sequence of ``v``. The following example allocates two +.. .. arrays on the stack and compares them using the ``compare`` function +.. .. above. + +.. 
.. .. literalinclude:: ../code/pulse/PulseTutorial.Array.fst +.. .. :language: pulse +.. .. :start-after: ```pulse //compare_stack_arrays$ +.. .. :end-before: ``` + +.. .. As with the stack references, stack arrays don't need to be deallocated or +.. .. dropped, they are reclaimed automatically when the function returns. As a result, +.. .. returning them from the function is not allowed: + +.. .. .. literalinclude:: ../code/pulse/PulseTutorial.Array.fst +.. .. :language: pulse +.. .. :start-after: //ret_stack_array$ +.. .. :end-before: //ret_stack_array_end$ + +.. .. Heap allocated arrays +.. .. ^^^^^^^^^^^^^^^^^^^^^^ + +.. .. The library ``Pulse.Lib.Vec`` provides the type ``vec t``, for +.. .. heap-allocated arrays: ``vec`` is to ``array`` as ``box`` is to +.. .. ``ref``. + +.. .. Similar to ``array``, ``vec`` is accompanied with a ``pts_to`` +.. .. assertion with support for fractional permissions, ``share`` and +.. .. ``gather`` for dividing and combining permissions, and read and write +.. .. functions. However, unlike ``array``, the ``Vec`` library provides +.. .. allocation and free functions. + +.. .. .. literalinclude:: ../code/pulse/PulseTutorial.Array.fst +.. .. :language: pulse +.. .. :start-after: //heaparray$ +.. .. :end-before: //heaparrayend$ + +.. .. As with the heap references, heap allocated arrays can be coerced to ``array`` using the coercion +.. .. ``vec_to_array``. To use the coercion, it is often required to convert ``Vec.pts_to`` to ``Array.pts_to`` +.. .. back-and-forth; the library provides ``to_array_pts_to`` and ``to_vec_pts_to`` lemmas for this purpose. + +.. .. The following example illustrates the pattern. It copies the contents of a stack array into a heap array, +.. .. using the ``copy2`` function we wrote above. + +.. .. .. literalinclude:: ../code/pulse/PulseTutorial.Array.fst +.. .. :language: pulse +.. .. :start-after: ```pulse //copyuse$ +.. .. :end-before: ``` + +.. .. 
Note how the assertion for ``v`` transforms from ``V.pts_to`` to ``pts_to`` (the points-to assertion
+.. .. for arrays) and back. It means that array algorithms and routines can be implemented with the
+.. .. ``array t`` type, and then can be reused for both stack- and heap-allocated arrays.
+
+.. .. Finally, though the name ``vec a`` evokes the Rust ``std::Vec`` library, we don't yet support automatic
+.. .. resizing. 
diff --git a/doc/book/PoP-in-FStar/book/pulse/pulse_getting_started.rst b/doc/book/PoP-in-FStar/book/pulse/pulse_getting_started.rst
new file mode 100644
index 00000000000..da4aa5bad09
--- /dev/null
+++ b/doc/book/PoP-in-FStar/book/pulse/pulse_getting_started.rst
@@ -0,0 +1,92 @@
+.. _Pulse_Getting_Started:
+
+Getting up and running with Codespaces
+======================================
+
+There are three main ways of running Pulse, roughly sorted in increasing
+order of difficulty.
+
+The easiest way of using Pulse is with Github Codespaces. With a single
+click, you can get a full-fledged IDE (VS Code) running in your browser,
+already configured with F* and Pulse.
+
+You can also run Pulse inside a container locally, for a similar 1-click setup
+that is independent of Github.
+
+Finally, you can also extract a Pulse release tarball and run
+the binaries directly on your system.
+
+(Building from source is not well-documented yet.)
+
+.. note::
+
+   Unlike the pure F* parts of this tutorial, Pulse code does not yet
+   work in the online playground. Use one of the methods described
+   below to try the examples in this part of the book.
+
+   You can find all the source files associated with each chapter `in
+   this folder
+   `_,
+   in files named ``PulseTutorial.*.fst``.
+
+Creating a Github Codespace
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+To do so, go to `this
+repository `_ and click on the
+'<>Code' button, then select 'Create codespace on main'. 
This will use
+the Dev Container definition in the ``.devcontainer`` directory to set up a
+container where F* and Pulse can run in a reproducible manner.
+
+.. image:: img/create.png
+
+.. note::
+
+   This will consume minutes out of your free Codespaces budget,
+   which is 120 hours a month for free users. If you would like to
+   avoid this, or do not have a Github account, see the next section.
+
+You should be greeted, after a minute or two, by a VS Code instance
+running in your browser displaying this same README.
+
+.. image:: img/starting.png
+
+.. image:: img/vscode.png
+
+All the usual F* navigation commands should work on Pulse files.
+
+If you prefer a local UI instead of a browser tab, you can "open"
+the Codespace from your local VS Code installation like so:
+
+.. image:: img/local-open.png
+
+F* and Pulse are still running on Github's servers, so this usage is
+still metered, but you may find the UI more comfortable.
+
+Running the Dev Container locally
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The Dev Container configuration contains all that is needed to run
+Pulse in an isolated, reproducible manner. If you would like to avoid
+Codespaces and just run locally, VS Code can set up the Dev Container
+locally for you very easily.
+
+Simply open the repository in VS Code. You should see a popup noting
+that the project has a Dev Container. Choose 'Reopen in Dev Container'
+to trigger a build of the container. VS Code will spawn a new window to
+download the base Docker image, set up the extension in it, and open the
+repository again.
+
+This new window should now work as usual.
+
+Using a Pulse release
+^^^^^^^^^^^^^^^^^^^^^
+
+A release of Pulse, including related F* tools, `is available here
+`_. Uncompress
+the archive and follow the instructions in the README.md, notably
+setting the recommended environment variables.
+
+We also recommend installing VS Code and the fstar-vscode-assistant
+from the VS Code marketplace. 
This should pick up the F* and Pulse
+installation from your path. 
diff --git a/doc/book/PoP-in-FStar/book/pulse/pulse_ghost.rst b/doc/book/PoP-in-FStar/book/pulse/pulse_ghost.rst
new file mode 100755
index 00000000000..be9cb45f01a
--- /dev/null
+++ b/doc/book/PoP-in-FStar/book/pulse/pulse_ghost.rst
@@ -0,0 +1,329 @@
+.. _Pulse_Ghost:
+
+Ghost Computations
+==================
+
+Throughout the chapters on pure F*, we made routine use of
+lemmas and ghost functions to prove properties of our
+programs. Lemmas, you will recall, are pure, total functions that
+always return ``unit``, i.e., they have no computational significance
+and are erased by the F* compiler. F* :ref:`ghost functions
+` are also pure, total functions, except that they are
+allowed to inspect erased values in a controlled way---they too are
+erased by the F* compiler.
+
+As we've seen already, F* lemmas and ghost functions can be directly
+used in Pulse code. But, these are only useful for describing
+properties of the pure values in scope. Often, in Pulse, one needs
+to write lemmas that speak about the state, manipulate ``slprops``,
+etc. For this purpose, Pulse provides its own notion of *ghost
+computations* (think of these as the analog of F* lemmas and ghost
+functions, except they are specified using ``slprops``); and *ghost
+state* (think of this as the analog of F* erased types, except ghost
+state is mutable, though still computationally irrelevant). Ghost
+computations are used everywhere in Pulse---we've already seen a few
+examples. Ghost state is especially useful in proofs of concurrent
+programs.
+
+
+Ghost Functions
+...............
+
+Here's a Pulse function that fails to check, with the error message
+below.
+
+.. literalinclude:: ../code/pulse/PulseTutorial.Ghost.fst
+   :language: pulse
+   :start-after: //incr_erased_non_ghost$
+   :end-before: //end incr_erased_non_ghost$
+
+.. 
code-block::
+
+   Cannot bind ghost expression reveal x with ST computation
+
+We should expect this to fail, since the program claims to be able to
+compute an integer ``y`` by incrementing an erased integer ``x``---the
+``x:erased int`` doesn't exist at runtime, so this program cannot be
+compiled.
+
+But, if we tag the function with the ``ghost`` qualifier, then this
+works:
+
+
+.. literalinclude:: ../code/pulse/PulseTutorial.Ghost.fst
+   :language: pulse
+   :start-after: //incr_erased$
+   :end-before: //end incr_erased$
+
+The ``ghost`` qualifier indicates to the Pulse checker that the
+function is to be erased at runtime, so ``ghost`` functions are
+allowed to make use of F* functions with ``GTot`` effect, like
+``FStar.Ghost.reveal``.
+
+However, for this to be sound, no compilable code is allowed to depend
+on the return value of a ``ghost`` function. So, the following code
+fails with the error below:
+
+.. literalinclude:: ../code/pulse/PulseTutorial.Ghost.fst
+   :language: pulse
+   :start-after: //try_use_incr_erased$
+   :end-before: //end try_use_incr_erased$
+
+.. code-block::
+
+   Expected a term with a non-informative (e.g., erased) type; got int
+
+That is, when calling a ``ghost`` function from a non-ghost context,
+the return type of the ghost function must be non-informative, e.g.,
+``erased``, ``unit``, etc. The class of non-informative types and
+the rules for allowing F* :ref:`ghost computations to be used in total
+contexts are described here `, and the same
+rules apply in Pulse.
+
+To use ``incr_erased`` in non-ghost contexts, we have to erase its
+result. There are a few ways of doing this.
+
+Here's a verbose but explicit way, where we define a nested ghost
+function to wrap the call to ``incr_erased``.
+
+.. 
literalinclude:: ../code/pulse/PulseTutorial.Ghost.fst + :language: pulse + :start-after: //use_incr_erased$ + :end-before: //end use_incr_erased$ + +The library also contains ``Pulse.Lib.Pervasives.call_ghost`` that is +a higher-order combinator to erase the result of a ghost call. + +.. literalinclude:: ../code/pulse/PulseTutorial.Ghost.fst + :language: pulse + :start-after: //use_incr_erased_alt$ + :end-before: //end use_incr_erased_alt$ + +The ``call_ghost`` combinator can be used with ghost functions of +different arities, though it requires the applications to be curried +in the following way. + +Suppose we have a binary ghost function, like ``add_erased``: + +.. literalinclude:: ../code/pulse/PulseTutorial.Ghost.fst + :language: pulse + :start-after: //add_erased$ + :end-before: //end add_erased$ + +To call it in a non-ghost context, one can do the following: + +.. literalinclude:: ../code/pulse/PulseTutorial.Ghost.fst + :language: pulse + :start-after: //use_add_erased$ + :end-before: //end use_add_erased$ + +That said, since ``ghost`` functions must have non-informative return +types to be usable in non-ghost contexts, it's usually best to define +them that way to start with, rather than having to wrap them at each +call site, as shown below: + +.. literalinclude:: ../code/pulse/PulseTutorial.Ghost.fst + :language: pulse + :start-after: //add_erased_erased$ + :end-before: //end add_erased_erased$ + + +Some Primitive Ghost Functions +.............................. + +Pulse ghost functions with ``emp`` or ``pure _`` pre and +postconditions are not that interesting---such functions can usually +be written with regular F* ghost functions. + +Ghost functions are often used as proof steps to prove equivalences +among ``slprops``. We saw a few :ref:`examples of ghost functions +before `---they are ghost since their +implementations are compositions of ``ghost`` functions from the Pulse +library. 
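Since ghost code is erased by the compiler, nothing of it survives in extracted programs; in extracted Rust, for instance, ghost arguments show up as ``unit`` arguments. The following hand-written Rust sketch is hypothetical, not actual extractor output (the names merely echo this chapter's examples), but it illustrates what that erasure means operationally:

```rust
// Hypothetical sketch (not actual extractor output) of what erasure
// looks like in extracted code. A ghost computation has no runtime
// content: its erased argument and result degenerate to the unit
// type (), which carries no information and occupies no space.

// The erased counterpart of a ghost increment: computationally, it
// does nothing at all.
pub fn incr_erased_ghost(_x: ()) -> () {}

// Concrete code may call it, but can learn nothing from its result;
// the real computation uses only concrete values.
pub fn use_incr_erased(y: i32) -> i32 {
    let () = incr_erased_ghost(());
    y + 1
}
```

Because ``()`` is zero-sized, an optimizing compiler can remove such calls entirely, which is the operational counterpart of the claim that ghost code has no computational significance.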
+
+The ``rewrite`` primitive that we saw :ref:`previously
+` is in fact a defined function in the Pulse
+library. Its signature looks like this:
+
+.. literalinclude:: ../code/pulse/PulseTutorial.Ghost.fst
+   :language: pulse
+   :start-after: //__rewrite_sig$
+   :end-before: //end __rewrite_sig$
+
+Many of the other primitives, like ``fold``, ``unfold``, etc., are
+defined in terms of ``rewrite`` and are ``ghost`` computations.
+
+Other primitives, like ``introduce exists*``, are also implemented in
+terms of library ``ghost`` functions, with signatures like the one
+below:
+
+.. literalinclude:: ../code/pulse/PulseTutorial.Ghost.fst
+   :language: pulse
+   :start-after: //intro_exists_sig$
+   :end-before: //end intro_exists_sig$
+
+
+.. _Pulse_recursive_predicates:
+
+Recursive Predicates and Ghost Lemmas
+.....................................
+
+We previously saw how to :ref:`define custom predicates
+`, e.g., for representation predicates on data
+structures. Since a ``slprop`` is just a regular type, one can also
+define ``slprops`` by recursion in F*. Working with these recursive
+predicates in Pulse usually involves writing recursive ghost functions
+as lemmas. We'll look at a simple example of this here and revisit it in
+subsequent chapters as we look at programming unbounded structures, like
+linked lists.
+
+Say you have a list of references and want to describe that they all
+contain integers whose value is at most ``n``. The recursive predicate
+``all_at_most l n`` does just that:
+
+.. literalinclude:: ../code/pulse/PulseTutorial.Ghost.fst
+   :language: fstar
+   :start-after: //all_at_most$
+   :end-before: //end all_at_most$
+
+As we did when working with :ref:`nullable references
+`, it's useful to define a few helper
+ghost functions to introduce and eliminate this predicate, for each of
+its cases.
+
+
+Recursive Ghost Lemmas
+++++++++++++++++++++++
+
+Pulse allows writing recursive ghost functions as lemmas for use in
+Pulse code. 
Like F* lemmas, recursive ghost functions must be proven
+to terminate on all inputs---otherwise, they would not be sound.
+
+To see this in action, let's write a ghost function to prove that
+``all_at_most l n`` can be weakened to ``all_at_most l m`` when ``n <=
+m``.
+
+.. literalinclude:: ../code/pulse/PulseTutorial.Ghost.fst
+   :language: pulse
+   :start-after: //weaken_at_most$
+   :end-before: //end weaken_at_most$
+
+A few points to note:
+
+  * Recursive functions in Pulse are defined using ``fn rec``.
+
+  * Ghost recursive functions must also have a ``decreases``
+    annotation---unlike in F*, Pulse does not yet attempt to infer a
+    default decreases annotation. In this case, we are recursing on
+    the list ``l``.
+
+  * List patterns in Pulse do not (yet) have the same syntactic sugar
+    as in F*, i.e., you cannot write ``[]`` and ``hd::tl`` as
+    patterns.
+
+  * The proof itself is fairly straightforward:
+
+    - In the ``Nil`` case, we eliminate the ``all_at_most`` predicate
+      at ``n`` and introduce it at ``m``.
+
+    - In the ``Cons`` case, we eliminate ``all_at_most l n``, use the
+      induction hypothesis to weaken the ``all_at_most`` predicate on
+      the ``tl``, and then introduce it again, packaging it with the
+      assumption on ``hd``.
+
+Mutable Ghost References
+........................
+
+The underlying logic that Pulse is based on actually supports a very
+general form of ghost state based on partial commutative monoids
+(PCMs). Users can define their own ghost state abstractions in F*
+using PCMs and use these in Pulse programs. The library
+``Pulse.Lib.GhostReference`` provides the simplest and most common
+form of ghost state: references to erased values with a
+fractional-permission-based ownership discipline.
+
+We'll use the module abbreviation ``module GR =
+Pulse.Lib.GhostReference`` in what follows. The library is very
+similar to ``Pulse.Lib.Reference``, in that it provides:
+
+  * ``GR.ref a``: The main type of ghost references. 
``GR.ref`` is an
+   erasable type and is hence considered non-informative.
+
+ * ``GR.pts_to (#a:Type0) (r:GR.ref a) (#p:perm) (v:a) : slprop`` is
+   the main predicate provided by the library. Similar to the regular
+   ``pts_to``, the permission index defaults to ``1.0R``.
+
+ * Unlike ``ref a`` (and more like ``box a``), ghost references
+   ``GR.ref a`` are not lexically scoped: they are allocated using
+   ``GR.alloc`` and freed using ``GR.free``. Of course, neither
+   allocation nor freeing has any runtime cost---these are just
+   ghost operations.
+
+ * Reading a ghost reference using ``!r`` returns an ``erased a``,
+   when ``r:GR.ref a``. Likewise, to update ``r``, it is enough to
+   provide a new value ``v:erased a``.
+
+ * Operations to ``share`` and ``gather`` ghost references work just
+   as with ``ref``.
+
+A somewhat contrived example
+++++++++++++++++++++++++++++
+
+Most examples that require ghost state involve stating interesting
+invariants between multiple threads or, in a sequential setting,
+correlating knowledge among different components. We'll see examples
+of that in a later chapter. For now, here's a small example that gives
+a flavor of how ghost state can be used.
+
+Suppose we want to give a function read/write access to a reference,
+but want to ensure that before it returns, it resets the value of the
+reference to its original value. The simplest way to do that would be
+to give the function the following signature:
+
+.. code-block:: pulse
+
+   fn uses_but_resets #a (x:ref a)
+   requires pts_to x 'v
+   ensures pts_to x 'v
+
+Here's another way to do it, this time with ghost references.
+
+First, we define a predicate ``correlated`` that holds full permission
+to a reference and half permission to a ghost reference, forcing them
+to hold the same value.
+
+..
literalinclude:: ../code/pulse/PulseTutorial.Ghost.fst
+   :language: pulse
+   :start-after: //correlated$
+   :end-before: //end correlated$
+
+Now, here's the signature of a function ``use_temp``: at first glance,
+from its signature alone, one might think that the witness ``v0``
+bound in the precondition is unrelated to the ``v1`` bound in the
+postcondition.
+
+.. literalinclude:: ../code/pulse/PulseTutorial.Ghost.fst
+   :language: pulse
+   :start-after: //use_temp_sig$
+   :end-before: //end use_temp_sig$
+
+But, ``use_temp`` only has half-permission to the ghost reference and
+cannot mutate it. So, although it can mutate the reference itself, in
+order to establish its postcondition, it must reset the reference to
+its initial value.
+
+.. literalinclude:: ../code/pulse/PulseTutorial.Ghost.fst
+   :language: pulse
+   :start-after: //use_temp_body$
+   :end-before: //end use_temp_body$
+
+This property can be exploited by a caller to pass a reference to
+``use_temp`` and be assured that the value is unchanged when it
+returns.
+
+.. literalinclude:: ../code/pulse/PulseTutorial.Ghost.fst
+   :language: pulse
+   :start-after: //use_correlated$
+   :end-before: //end use_correlated$
+
diff --git a/doc/book/PoP-in-FStar/book/pulse/pulse_higher_order.rst b/doc/book/PoP-in-FStar/book/pulse/pulse_higher_order.rst
new file mode 100755
index 00000000000..ed3edf5abc2
--- /dev/null
+++ b/doc/book/PoP-in-FStar/book/pulse/pulse_higher_order.rst
@@ -0,0 +1,159 @@
+.. _Pulse_higher_order:
+
+Higher Order Functions
+======================
+
+Like F*, Pulse is higher order. That is, Pulse functions are first
+class and can be passed to other functions, returned as results, and
+some functions can even be stored in the heap.
+
+Pulse Computation Types
+.......................
+
+Here's perhaps the simplest higher-order function: ``apply`` abstracts
+function application.
+
+..
literalinclude:: ../code/pulse/PulseTutorial.HigherOrder.fst
+   :language: pulse
+   :start-after: //apply$
+   :end-before: //end apply$
+
+This function is polymorphic in the argument type, result type,
+pre-condition, and post-condition of a function ``f``, which it
+applies to an argument ``x``. This is the first time we have written
+the type of a Pulse function as an F* type. So far, we have been
+writing *signatures* of Pulse functions, using the
+``fn/requires/ensures`` notation, but here we see that the type of a
+Pulse function is of the form:
+
+.. code-block:: pulse
+
+   x:a -> stt b pre (fun y -> post)
+
+where:
+
+ * like any F* function type, Pulse functions are dependent and the
+   right hand side of the arrow can mention ``x``
+
+ * immediately to the right of the arrow is a Pulse computation type
+   tag, similar to F*'s ``Tot``, ``GTot``, etc.
+
+ * The tag ``stt`` is the most permissive of Pulse computation type
+   tags, allowing the function's body to read and write the state,
+   run forever, etc., but with pre-condition ``pre``, return type
+   ``b``, and post-condition ``fun y -> post``.
+
+Pulse provides several other kinds of computation types. For now, the
+most important is the constructor for ghost computations. We show
+below ``apply_ghost``, the analog of ``apply`` but for ``ghost``
+functions.
+
+.. literalinclude:: ../code/pulse/PulseTutorial.HigherOrder.fst
+   :language: pulse
+   :start-after: //apply_ghost$
+   :end-before: //end apply_ghost$
+
+The type of ``f`` is similar to what we had before, but this time we
+have:
+
+ * computation type tag ``stt_ghost``, indicating that this function
+   reads or writes ghost state only, and always terminates.
+
+ * the return type is ``b x``
+
+ * the next argument is ``emp_inames``, which describes the set of
+   invariants that a computation may open, where ``emp_inames`` means
+   that this computation opens no invariants. For now, let's ignore
+   this.
+
+ * the precondition is ``pre x`` and the postcondition is ``fun y ->
+   post x y``.
+
+Universes
++++++++++
+
+For completeness, the signatures of ``stt`` and ``stt_ghost`` are
+shown below:
+
+.. code-block:: fstar
+
+   val stt (a:Type u#a) (i:inames) (pre:slprop) (post: a -> slprop)
+     : Type u#0
+
+   val stt_ghost (a:Type u#a) (i:inames) (pre:slprop) (post: a -> slprop)
+     : Type u#4
+
+A point to note is that ``stt`` computations live in universe
+``u#0``. This is because ``stt`` computations are allowed to
+infinitely loop, and are built upon :ref:`the effect of divergence
+`, or ``Div``, which, as we learned earlier, lives in
+universe ``u#0``. The universe of ``stt`` means that one can store an
+``stt`` function in a reference, e.g., ``ref (unit -> stt unit p q)``
+is a legal type in Pulse.
+
+In contrast, ``stt_ghost`` functions are total and live in
+universe 4. You cannot store an ``stt_ghost`` function in the state,
+since that would allow writing non-terminating functions in
+``stt_ghost``.
+
+Counters
+........
+
+For a slightly more interesting use of higher order programming, let's
+look at how to program a mutable counter. We'll start by defining the
+type ``ctr`` of a counter.
+
+.. literalinclude:: ../code/pulse/PulseTutorial.HigherOrder.fst
+   :language: fstar
+   :start-after: //ctr$
+   :end-before: //end ctr$
+
+A counter packages the following:
+
+ * A predicate ``inv`` on the state, where ``inv i`` states that the
+   current value of the counter is ``i``, without describing exactly
+   how the counter's state is implemented.
+
+ * A stateful function ``next`` that expects ``inv i``, returns
+   the current value ``i`` of the counter, and provides ``inv (i +
+   1)``.
+
+ * A stateful function ``destroy`` to deallocate the counter.
+
+One way to implement a ``ctr`` is to represent the state with a
+heap-allocated reference. This is what ``new_counter`` does below:
+
+..
literalinclude:: ../code/pulse/PulseTutorial.HigherOrder.fst
+   :language: pulse
+   :start-after: //new_counter$
+   :end-before: //end new_counter$
+
+Here's how it works.
+
+First, we allocate a new heap reference ``x`` initialized to ``0``.
+
+Pulse allows us to define functions within any scope. So, we define
+two local functions ``next`` and ``destroy``, whose implementations
+and specifications are straightforward. The important bit is that they
+capture the reference ``x:box int`` in their closure.
+
+Finally, we package ``next`` and ``destroy`` into a ``c:ctr``,
+instantiating ``inv`` to ``Box.pts_to x``; we then rewrite the context
+assertion to ``c.inv 0`` and return ``c``.
+
+In a caller's context, such as ``test_counter`` below, the fact that
+the counter is implemented using a single mutable heap reference is
+completely hidden.
+
+.. literalinclude:: ../code/pulse/PulseTutorial.HigherOrder.fst
+   :language: pulse
+   :start-after: //test_counter$
+   :end-before: //end test_counter$
+
+
+
+
+
+
+
+
+
diff --git a/doc/book/PoP-in-FStar/book/pulse/pulse_implication_and_forall.rst b/doc/book/PoP-in-FStar/book/pulse/pulse_implication_and_forall.rst
new file mode 100755
index 00000000000..c05e079728a
--- /dev/null
+++ b/doc/book/PoP-in-FStar/book/pulse/pulse_implication_and_forall.rst
@@ -0,0 +1,254 @@
+.. _Pulse_implication_and_forall:
+
+Implication and Universal Quantification
+========================================
+
+In this chapter, we'll learn about two more separation logic
+connectives, ``@==>`` and ``forall*``. We show a few simple
+examples using them, though these will be almost trivial. In the next
+chapter, on linked lists, we'll see more significant uses of these
+connectives.
+
+Trades, or Separating Ghost Implication
+........................................
+
+The library ``module I = Pulse.Lib.Trade.Util`` defines the operator *trade*
+``(@==>)`` and utilities for using it.
In the literature, the operator ``p --*
+q`` is pronounced "p magic-wand q"; ``p @==> q`` is similar, though there are
+some important technical differences, as we'll see. We'll just pronounce it ``p
+for q``, ``p trade q``, or ``p trades for q``. Here's an informal description of
+what ``p @==> q`` means:
+
+  ``p @==> q`` says that if you have ``p`` then you can *trade* it for
+  ``q``. In other words, from ``p ** (p @==> q)``, you can derive
+  ``q``. This step of reasoning is performed using a ghost function
+  ``I.elim`` with the signature below:
+
+  .. code-block:: pulse
+
+     ghost
+     fn I.elim (p q:slprop)
+     requires p ** (p @==> q)
+     ensures q
+
+
+Importantly, if you think of ``p`` as describing permission on a
+resource, ``I.elim`` makes you *give up* the permission ``p`` and
+get ``q`` as a result. Note, during this step, you also lose
+permission on the implication, i.e., ``p @==> q`` lets you trade ``p``
+for ``q`` just once.
+
+But, how do you create a ``p @==> q`` in the first place? That's its
+introduction form, shown below:
+
+  .. code-block:: pulse
+
+     ghost
+     fn I.intro (p q r:slprop)
+                (elim: unit -> stt_ghost unit emp_inames (r ** p) (fun _ -> q))
+     requires r
+     ensures p @==> q
+
+That is, to introduce ``p @==> q``, one has to hold a permission
+``r`` such that a ghost function can transform ``r ** p`` into
+``q``.
+
+Share and Gather
+++++++++++++++++
+
+Here's a small example to see ``p @==> q`` at work.
+
+.. literalinclude:: ../code/pulse/PulseTutorial.ImplicationAndForall.fst
+   :language: fstar
+   :start-after: //regain_half$
+   :end-before: //end regain_half$
+
+The predicate ``regain_half`` says that you can trade a
+half-permission ``pts_to x #one_half v`` for a full permission
+``pts_to x v``. At first, this may seem counter-intuitive: how can you
+gain a full permission from a half permission? The thing to remember
+is that ``p @==> q`` itself holds permissions internally. In
+particular, ``regain_half x v`` holds ``exists* u.
pts_to x #one_half u``
+internally, such that if the context presents the other half, the
+eliminator can combine the two to return the full permission.
+
+Let's look at how to introduce ``regain_half``:
+
+.. literalinclude:: ../code/pulse/PulseTutorial.ImplicationAndForall.fst
+   :language: pulse
+   :start-after: //intro_regain_half$
+   :end-before: //end intro_regain_half$
+
+The specification says that if we start out with ``pts_to x 'v`` then
+we can split it into ``pts_to x #one_half v`` and a ``regain_half x
+'v``. The normal way of splitting permission on a reference would
+split it into two halves---here, we just package the second half in a
+ghost function that allows us to gather the permission back when we
+need it.
+
+In the implementation, we define an auxiliary ghost function that
+corresponds to the eliminator for ``pts_to x #one_half 'v @==> pts_to x
+'v``---it's just a ``gather``. Then, we split ``pts_to x 'v`` into
+halves, call ``I.intro`` passing the eliminator, and then fold it into
+a ``regain_half``. All ``regain_half`` has done is to package the
+ghost function ``aux``, together with the half permission on ``x``,
+and put it into a ``slprop``.
+
+Later on, if we want to use ``regain_half``, we can call its
+eliminator---which, effectively, calls ``aux`` with the needed
+permission, as shown below.
+
+.. literalinclude:: ../code/pulse/PulseTutorial.ImplicationAndForall.fst
+   :language: pulse
+   :start-after: //use_regain_half$
+   :end-before: //end use_regain_half$
+
+At this point, you may be wondering why we bother to use a
+``regain_half x 'v`` in the first place, since one might as well have
+just used ``pts_to x #one_half 'v`` and ``gather``, and you'd be right
+to wonder that! In this simple usage, the ``(@==>)`` hasn't bought us
+much.
+
+Universal Quantification
+........................
+
+Let's look at our ``regain_half`` predicate again:
+
+..
literalinclude:: ../code/pulse/PulseTutorial.ImplicationAndForall.fst
+   :language: fstar
+   :start-after: //regain_half$
+   :end-before: //end regain_half$
+
+This predicate is not as general as it could be: to eliminate it, it
+requires the caller to prove that they hold ``pts_to x #one_half v``,
+for the same ``v`` as was used when the trade was introduced.
+
+One could try to generalize ``regain_half`` a bit by changing it to:
+
+.. code-block:: fstar
+
+   let regain_half #a (x:GR.ref a) (v:a) =
+     (exists* u. pts_to x #one_half u) @==> pts_to x v
+
+This is an improvement, but it is still not general enough, since it
+does not relate ``v`` to the existentially bound ``u``. What we really
+need is a universal quantifier.
+
+Here's the right version of ``regain_half``:
+
+.. literalinclude:: ../code/pulse/PulseTutorial.ImplicationAndForall.fst
+   :language: fstar
+   :start-after: //regain_half_q$
+   :end-before: //end regain_half_q$
+
+This says that no matter what ``pts_to x #one_half u`` the context
+has, they can recover full permission to it, *with the same witness*
+``u``.
+
+The ``forall*`` quantifier and utilities to manipulate it are defined
+in ``Pulse.Lib.Forall.Util``. The introduction and elimination forms
+have a similar shape to what we saw earlier for ``@==>``:
+
+
+  .. code-block:: pulse
+
+     ghost
+     fn FA.elim (#a:Type) (#p:a->slprop) (v:a)
+     requires (forall* x. p x)
+     ensures p v
+
+The eliminator allows a *single* instantiation of the universally
+bound ``x`` to ``v``.
+
+  .. code-block:: pulse
+
+     ghost
+     fn FA.intro (#a:Type) (#p:a->slprop)
+                 (v:slprop)
+                 (f_elim : (x:a -> stt_ghost unit emp_inames v (fun _ -> p x)))
+     requires v
+     ensures (forall* x. p x)
+
+The introduction form requires proving that one holds ``v``, and that
+with ``v`` a ghost function can produce ``p x``, for any ``x``.
+
+Note, it's very common to have universal quantifiers and trades
+together, so the library also provides the following combined forms:
+
+  ..
code-block:: pulse
+
+     ghost
+     fn elim_forall_imp (#a:Type0) (p q: a -> slprop) (x:a)
+     requires (forall* x. p x @==> q x) ** p x
+     ensures q x
+
+and
+
+  .. code-block:: pulse
+
+     ghost
+     fn intro_forall_imp (#a:Type0) (p q: a -> slprop) (r:slprop)
+                         (elim: (u:a -> stt_ghost unit emp_inames
+                                          (r ** p u)
+                                          (fun _ -> q u)))
+     requires r
+     ensures forall* x. p x @==> q x
+
+
+Share and Gather, Again
++++++++++++++++++++++++
+
+Here's how one introduces ``regain_half_q``:
+
+.. literalinclude:: ../code/pulse/PulseTutorial.ImplicationAndForall.fst
+   :language: pulse
+   :start-after: //intro_regain_half_q$
+   :end-before: //end intro_regain_half_q$
+
+Now, when we want to use it, we can trade in any half-permission
+``pts_to x #one_half u`` for a full permission with the same ``u``.
+
+.. literalinclude:: ../code/pulse/PulseTutorial.ImplicationAndForall.fst
+   :language: pulse
+   :start-after: //use_regain_half_q$
+   :end-before: //end use_regain_half_q$
+
+Note that using the eliminator ``FA.elim`` is quite verbose: we need
+to specify the quantifier term again. The way Pulse uses F*'s unifier
+currently does not allow it to properly find solutions to some
+higher-order unification problems. We expect to fix this soon.
+
+
+Trades and Ghost Steps
+......................
+
+As a final example in this section, we show that one can package any
+ghost computation into a trade, including steps that may modify
+the ghost state. In full generality, this makes ``@==>`` behave more
+like a view shift (in Iris terminology) than a wand.
+
+Here's a predicate ``can_update`` which says that one can trade a half
+permission ``pts_to x #one_half u`` for a full permission to a
+*different value* ``pts_to x v``.
+
+.. literalinclude:: ../code/pulse/PulseTutorial.ImplicationAndForall.fst
+   :language: fstar
+   :start-after: //can_update$
+   :end-before: //end can_update$
+
+In ``make_can_update``, we package a ghost-state update function into
+a binary quantifier ``forall* u v. ...``.
+
+.. literalinclude:: ../code/pulse/PulseTutorial.ImplicationAndForall.fst
+   :language: pulse
+   :start-after: //make_can_update$
+   :end-before: //end make_can_update$
+
+And in ``update``, below, we instantiate it to update the reference
+``x`` from ``'u`` to ``k``, and also return back a ``can_update``
+predicate to the caller, for further use.
+
+.. literalinclude:: ../code/pulse/PulseTutorial.ImplicationAndForall.fst
+   :language: pulse
+   :start-after: //update$
+   :end-before: //end update$
+
diff --git a/doc/book/PoP-in-FStar/book/pulse/pulse_linked_list.rst b/doc/book/PoP-in-FStar/book/pulse/pulse_linked_list.rst
new file mode 100755
index 00000000000..20622bbb40c
--- /dev/null
+++ b/doc/book/PoP-in-FStar/book/pulse/pulse_linked_list.rst
@@ -0,0 +1,346 @@
+
+.. _Pulse_linked_list:
+
+Linked Lists
+============
+
+In this chapter, we develop a linked list library. Along the way,
+we'll see uses of recursive predicates, trades, and universal
+quantification.
+
+Representing a Linked List
+..........................
+
+Let's start by defining the type of a singly linked list:
+
+.. literalinclude:: ../code/pulse/PulseTutorial.LinkedList.fst
+   :language: fstar
+   :start-after: //llist$
+   :end-before: //end llist$
+
+A ``node t`` contains a ``head:t`` and a ``tail:llist t``, a nullable
+reference pointing to the rest of the list. Nullable references are
+represented by an option, as :ref:`we saw before
+`.
+
+Next, we need a predicate to relate a linked list to a logical
+representation of the list, for use in specifications.
+
+.. literalinclude:: ../code/pulse/PulseTutorial.LinkedList.fst
+   :language: fstar
+   :start-after: //is_list$
+   :end-before: //end is_list$
+
+The predicate ``is_list x l`` is a recursive predicate:
+
+ * When ``l == []``, the reference ``x`` must be null.
+
+ * Otherwise, when ``l == head :: tl``, ``x`` must contain a valid
+   reference ``p``, where ``p`` points to ``{ head; tail }`` and,
+   recursively, we have ``is_list tail tl``.
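+
+To make the shape of the recursion concrete, here is a sketch of what
+such a definition can look like (illustrative only; the binder names
+may differ from those in the accompanying code):
+
+.. code-block:: fstar
+
+   let rec is_list #t (x:llist t) (l:list t)
+   : Tot slprop (decreases l)
+   = match l with
+     | [] -> pure (x == None)
+     | head::tl ->
+       exists* (p:ref (node t)) (tail:llist t).
+         pure (x == Some p) **
+         pts_to p { head; tail } **
+         is_list tail tl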
+
+
+Boilerplate: Introducing and Eliminating ``is_list``
+....................................................
+
+We've seen :ref:`recursive predicates in a previous chapter
+<Pulse_recursive_predicates>`, and just as we did there, we need some
+helper ghost functions to work with ``is_list``. We expect the Pulse
+checker will automate these boilerplate ghost lemmas in the future,
+but, for now, we are forced to write them by hand.
+
+.. literalinclude:: ../code/pulse/PulseTutorial.LinkedList.fst
+   :language: pulse
+   :start-after: //boilerplate$
+   :end-before: //end boilerplate$
+
+
+
+Case analyzing a nullable pointer
+.................................
+
+When working with a linked list, the first thing we'll do, typically,
+is to check whether a given ``x:llist t`` is null or not. However, the
+``is_list x l`` predicate is defined by case analyzing ``l:list t``
+rather than ``x:llist t``, since that makes it possible to write
+the predicate by recursing on the tail of ``l``. So, below, we have a
+predicate ``is_list_cases x l`` that inverts the ``is_list x l``
+predicate based on whether or not ``x`` is null.
+
+.. literalinclude:: ../code/pulse/PulseTutorial.LinkedList.fst
+   :language: fstar
+   :start-after: //is_list_cases$
+   :end-before: //end is_list_cases$
+
+Next, we define a ghost function to invert ``is_list`` into ``is_list_cases``.
+
+.. literalinclude:: ../code/pulse/PulseTutorial.LinkedList.fst
+   :language: pulse
+   :start-after: //cases_of_is_list$
+   :end-before: //end cases_of_is_list$
+
+We also define two more ghost functions that package up the call to
+``cases_of_is_list``.
+
+.. literalinclude:: ../code/pulse/PulseTutorial.LinkedList.fst
+   :language: pulse
+   :start-after: //is_list_case_none$
+   :end-before: //end is_list_case_none$
+
+.. literalinclude:: ../code/pulse/PulseTutorial.LinkedList.fst
+   :language: pulse
+   :start-after: //is_list_case_some$
+   :end-before: //end is_list_case_some$
+
+Length, Recursively
+...................
+
+With our helper functions in hand, let's get to writing some real
+code, starting with a function to compute the length of an ``llist``.
+
+.. literalinclude:: ../code/pulse/PulseTutorial.LinkedList.fst
+   :language: pulse
+   :start-after: //length$
+   :end-before: //end length$
+
+The ``None`` case is simple.
+
+Some points to note in the ``Some`` case:
+
+ * We use ``with _node _tl. _`` to "get our hands on" the
+   existentially bound witnesses.
+
+ * After reading ``let node = !vl``, we need ``is_list node.tail
+   _tl`` to make the recursive call. But the context contains
+   ``is_list _node.tail _tl`` and ``node == _node``. So, we need a
+   rewrite.
+
+ * We re-introduce the ``is_list`` predicate, and return ``1 +
+   n``. While ``intro_is_list_cons x vl`` is a ghost step and
+   will be erased before execution, the addition is not---so, this
+   function is not tail recursive.
+
+Exercise 1
+..........
+
+Write a tail-recursive version of ``length``.
+
+Exercise 2
+..........
+
+Index the ``is_list`` predicate with a fractional permission. Write
+ghost functions to share and gather fractional ``is_list`` predicates.
+
+Length, Iteratively, with Trades
+................................
+
+What if we wanted to implement ``length`` using a while loop, as is
+more idiomatic in a language like C? It will take us a few steps to
+get there, and we'll use the trade operator (``@==>``) to structure
+our proof.
+
+Trade Tails
++++++++++++
+
+Our first step is to define ``tail_for_cons``, a lemma stating that with
+permission on a node pointer (``pts_to v n``), we can build a trade
+transforming a permission on the tail into a permission for a cons
+cell starting at the given node.
+
+..
literalinclude:: ../code/pulse/PulseTutorial.LinkedList.fst
+   :language: pulse
+   :start-after: //tail_for_cons$
+   :end-before: //end tail_for_cons$
+
+
+Tail of a list
+++++++++++++++
+
+Next, here's a basic operation on a linked list: given a pointer to a
+cons cell, return a pointer to its tail. Here's a small diagram:
+
+
+.. code-block::
+
+     x                tl
+     |                |
+     v                v
+   .---.---.        .---.---.
+   |   | --|------> |   | --|--> ...
+   .---.---.        .---.---.
+
+We're given a pointer ``x`` to the cons cell at the head of a list,
+and we want to return ``tl``, the pointer to the next cell (or
+``None``, if ``x`` is the end of the list). But, if we want to return
+a pointer to ``tl``, we have a permission accounting problem:
+
+ * We cannot return permission to ``x`` to the caller, since then we
+   would have two *aliases* pointing to the next cell in the list:
+   the returned ``tl`` and ``x -> next``.
+
+ * But, we cannot consume the permission to ``x`` either, since we
+   would like to return permission to ``x`` once the returned ``tl``
+   goes out of scope.
+
+The solution here is to use a trade. The type of ``tail`` below says
+that if ``x`` is a non-null pointer satisfying ``is_list x 'l``, then
+``tail`` returns a pointer ``y`` such that ``is_list y tl`` (where
+``tl`` is the tail of ``'l``); and, one can trade ``is_list y tl`` to
+recover permission to ``is_list x 'l``. The trade essentially says
+that you cannot have permission to ``is_list x 'l`` and ``is_list y
+tl`` at the same time, but once you give up permission on ``y``, you
+can get back the original permission on ``x``.
+
+.. literalinclude:: ../code/pulse/PulseTutorial.LinkedList.fst
+   :language: pulse
+   :start-after: //tail$
+   :end-before: //end tail$
+
+``length_iter``
++++++++++++++++
+
+The code below shows our iterative implementation of ``length``. The
+basic idea is simple, though the proof takes a bit of doing. We
+initialize a current pointer ``cur`` to the head of the list; and
+``ctr`` to ``0``.
Then, while ``cur`` is not null, we move ``cur`` to
+the tail and increment ``ctr``. Finally, we return ``!ctr``.
+
+.. literalinclude:: ../code/pulse/PulseTutorial.LinkedList.fst
+   :language: pulse
+   :start-after: //length_iter$
+   :end-before: //end length_iter$
+
+Now, for the proof. The main part is the loop invariant, which says:
+
+ * the current value of the counter is ``n``;
+
+ * ``cur`` holds a list pointer ``ll``, where ``ll`` contains the
+   list represented by ``suffix``;
+
+ * ``n`` is the length of the prefix of the list traversed so far;
+
+ * the loop continues as long as ``b`` is true, i.e., the list
+   pointer ``ll`` is not ``None``;
+
+ * and, the key bit: you can trade ownership on ``ll`` back for
+   ownership on the original list ``x``.
+
+Some parts of this could be simplified, e.g., to avoid some of the
+rewrites.
+
+One way to understand how trades have helped here is to compare
+``length_iter`` to the recursive function ``length``. In ``length``,
+after each recursive call returns, we called a ghost function to
+repackage permission on the cons cell after taking out permission on
+the tail. The recursive function call stack kept track of all these
+pieces of ghost code that had to be executed. In the iterative
+version, we use the trade to package up all the ghost functions that
+need to be run, rather than using the call stack. When the loop
+terminates, we use ``I.elim`` to run all that ghost code at once.
+
+Of course, the recursive ``length`` is much simpler in this case, but
+this pattern of using trades to package up ghost functions is quite
+broadly useful.
+
+Append, Recursively
+...................
+
+Here's another recursive function on linked lists: ``append``
+concatenates ``y`` on to the end of ``x``.
+
+It's fairly straightforward: we recurse until we reach the last node
+of ``x`` (i.e., the ``tail`` field is ``None``), and we set that field
+to point to ``y``.
+
+..
literalinclude:: ../code/pulse/PulseTutorial.LinkedList.fst
+   :language: pulse
+   :start-after: //append$
+   :end-before: //end append$
+
+The code is tail recursive in the ``Some _`` case, but notice that we
+have a ghost function call *after* the recursive call. Like we did for
+``length``, can we implement an iterative version of ``append``,
+factoring this ghost code on the stack into a trade?
+
+Append, Iteratively
+...................
+
+Let's start by defining a more general version of the ``tail``
+function from before. In comparison, the postcondition of ``tail_alt``
+uses a universal quantifier to say, roughly, that whatever list ``tl'``
+the returned ``y`` points to, it can be traded for a pointer to ``x``
+that conses on to ``tl'``. Our previous function ``tail`` can be easily
+recovered by instantiating ``tl'`` to ``tl``.
+
+.. literalinclude:: ../code/pulse/PulseTutorial.LinkedList.fst
+   :language: pulse
+   :start-after: //tail_alt$
+   :end-before: //end tail_alt$
+
+We'll use these quantified trades in our invariant of ``append_iter``,
+shown below. The main idea of the implementation is to use a while
+loop to traverse to the last element of the first list ``x``; and then
+to set ``y`` as the ``next`` pointer of this last element.
+
+.. literalinclude:: ../code/pulse/PulseTutorial.LinkedList.fst
+   :language: pulse
+   :start-after: //append_iter$
+   :end-before: //end append_iter$
+
+There are a few interesting points to note.
+
+ * The main part is the quantified trade in the invariant, which, as
+   we traverse the list, encapsulates the ghost code that we need to
+   run at the end to restore permission to the initial list pointer
+   ``x``.
+
+ * The library function ``FA.trans_compose`` has the following
+   signature:
+
+   .. code-block:: pulse
+
+      ghost
+      fn trans_compose (#a #b #c:Type0)
+                       (p: a -> slprop)
+                       (q: b -> slprop)
+                       (r: c -> slprop)
+                       (f: a -> GTot b)
+                       (g: b -> GTot c)
+      requires
+        (forall* x. p x @==> q (f x)) **
+        (forall* x.
q x @==> r (g x))
+      ensures
+        forall* x. p x @==> r (g (f x))
+
+
+   We use it in the key induction step as we move one step down the
+   list---similar to what we had in ``length_iter``, but this time
+   with a quantifier.
+
+ * Illustrating again that Pulse is a superset of pure F*, we make
+   use of a :ref:`bit of F* sugar ` in the
+   ``introduce forall`` to prove a property needed for a Pulse
+   rewrite.
+
+ * Finally, at the end of the loop, we use ``FA.elim_forall_imp`` to
+   restore permission on ``x``, now pointing to a concatenated list,
+   effectively running all the ghost code we accumulated as we
+   traversed the list.
+
+Perhaps the lesson from all this is that recursive programs are much
+easier to write and prove correct than iterative ones? That's one
+takeaway. But, hopefully, you've seen how trades and quantifiers work
+and can be useful in some proofs---of course, we'll use them not just
+for rewriting recursive code as iterative code.
+
+Exercise 3
+++++++++++
+
+Write a function to insert an element in a list at a specific
+position.
+
+
+Exercise 4
+++++++++++
+
+Write a function to reverse a list.
diff --git a/doc/book/PoP-in-FStar/book/pulse/pulse_loops.rst b/doc/book/PoP-in-FStar/book/pulse/pulse_loops.rst
new file mode 100755
index 00000000000..f8f975eb5be
--- /dev/null
+++ b/doc/book/PoP-in-FStar/book/pulse/pulse_loops.rst
@@ -0,0 +1,226 @@
+.. _Pulse_Loops:
+
+Loops & Recursion
+#################
+
+In this chapter, we'll see how various looping constructs work in
+Pulse, starting with ``while`` loops, and then recursive functions.
+
+By default, Pulse's logic is designed for partial correctness. This
+means that programs are allowed to loop forever. When we say that a
+program returns ``v:t`` satisfying a postcondition ``p``, this should
+be understood to mean that the program could loop forever, but if it
+does return, it is guaranteed to return a ``v:t`` where the state
+satisfies ``p v``.
+
+
+While loops: General form
+.........................
+ +The form of a while loop is: + +.. code-block:: pulse + + while ( guard ) + invariant (b:bool). p + { body } + +Where + + * ``guard`` is a Pulse program that returns a ``b:bool`` + + * ``body`` is a Pulse program that returns ``unit`` + + * ``invariant (b:bool). p`` is an invariant where + + - ``exists* b. p`` must be provable before the loop is entered and as a postcondition of ``body``. + + - ``exists* b. p`` is the precondition of the guard, and ``p b`` + is its postcondition, i.e., the ``guard`` must satisfy: + + .. code-block:: pulse + + requires exists* b. p + returns b:bool + ensures p + + - the postcondition of the entire loop is ``invariant false``. + +One way to understand the invariant is that it describes program +assertions at three different program points. + + * When ``b==true``, the invariant describes the program state at the + start of the loop body; + + * when ``b==false``, the invariant describes the state at the end of + the loop; + + * when ``b`` is undetermined, the invariant describes the property + of the program state just before the guard is (re)executed, i.e., + at the entry to the loop and at the end of loop body. + +Coming up with an invariant to describe a loop often requires some +careful thinking. We'll see many examples in the remaining chapters, +starting with some simple loops here. + +Countdown ++++++++++ + +Here's our first Pulse program with a loop: ``count_down`` repeatedly +decrements a reference until it reaches ``0``. + +.. literalinclude:: ../code/pulse/PulseTutorial.Loops.fst + :language: pulse + :start-after: //count_down$ + :end-before: //end count_down$ + +While loops in Pulse are perhaps a bit more general than in other +languages. The ``guard`` is an arbitrary Pulse program, not just a +program that reads some local variables. For example, here's another +version of ``count_down`` where the ``guard`` does all the work and +the loop body is empty, and we don't need an auxiliary ``keep_going`` +variable. + +.. 
literalinclude:: ../code/pulse/PulseTutorial.Loops.fst + :language: pulse + :start-after: //count_down3$ + :end-before: //end count_down3$ + + +Partial correctness ++++++++++++++++++++ + +The partial correctness interpretation means that the following +infinitely looping variant of our program is also accepted: + +.. literalinclude:: ../code/pulse/PulseTutorial.Loops.fst + :language: pulse + :start-after: //count_down_loopy$ + :end-before: //end count_down_loopy$ + +This program increments instead of decrementing ``x``, but it still +satisfies the same invariant as before, since it always loops forever. + +We do have a fragment of the Pulse logic, notably the logic of +``ghost`` and ``atomic`` computations, which is guaranteed to always +terminate. We plan to also support a version of the Pulse logic for +general-purpose sequential programs (i.e., no concurrency) that is +also terminating. + +Multiply by repeated addition ++++++++++++++++++++++++++++++ + +Our next program with a loop multiplies two natural numbers ``x, y`` +by repeatedly adding ``y`` to an accumulator ``x`` times. This +program has a bit of history: A 1949 paper by Alan Turing titled +`"Checking a Large Routine" +`_ is +often cited as the first paper about proving the correctness of a +computer program. The program that Turing describes is one that +implements multiplication by repeated addition. + +.. literalinclude:: ../code/pulse/PulseTutorial.Loops.fst + :language: pulse + :start-after: //multiply_by_repeated_addition$ + :end-before: //end multiply_by_repeated_addition$ + +A few noteworthy points: + + * Both the counter ``ctr`` and the accumulator ``acc`` are declared + ``nat``, which implicitly, by refinement typing, provides an + invariant that they are both always at least ``0``. This + illustrates how Pulse provides a separation logic on top of F*'s + existing dependent type system. 
+ + * The invariant says that the counter never exceeds ``x``; the + accumulator is always the product of the counter and ``y``; and the + loop continues so long as the counter is strictly less than ``x``. + +Summing the first ``N`` numbers ++++++++++++++++++++++++++++++++ + +This next example shows a Pulse program that sums the first ``n`` +natural numbers. It illustrates how Pulse programs can be developed along +with pure F* specifications and lemmas. + +We start with a specification of ``sum``, a simple recursive function +in F* along with a lemma that proves the well-known identity about +this sum. + +.. literalinclude:: ../code/pulse/PulseTutorial.Loops.fst + :language: fstar + :start-after: //sum$ + :end-before: //end sum$ + +Now, let's say we want to implement ``isum``, an iterative version of +``sum``, and prove that it satisfies the identity proven by +``sum_lemma``. + +.. literalinclude:: ../code/pulse/PulseTutorial.Loops.fst + :language: pulse + :start-after: //isum$ + :end-before: //end isum$ + +This program is quite similar to ``multiply_by_repeated_addition``, +but with a couple of differences: + + * The invariant says that the current value of the accumulator holds + the sum of the first ``c`` numbers, i.e., we prove that the + loop refines the recursive implementation of ``sum``, without + relying on any properties of non-linear arithmetic---notice, we + have disabled non-linear arithmetic in Z3 with a pragma. + + * Finally, to prove the identity we're after, we just call the + already-proven F* ``sum_lemma`` from within Pulse, and + the proof is concluded. + +The program is a bit artificial, but hopefully it illustrates how +Pulse programs can be shown to first refine a pure F* function, and +then to rely on mathematical reasoning on those pure functions to +conclude properties about the Pulse program itself. + +Recursion +......... + +Pulse also supports general recursion, i.e., recursive functions that +may not terminate. 
Here is a simple example---we'll see more examples +later. + +Let's start with a standard F* (doubly) recursive definition that +computes the nth Fibonacci number. + +.. literalinclude:: ../code/pulse/PulseTutorial.Loops.fst + :language: fstar + :start-after: //fib$ + :end-before: //end fib$ + +One can also implement it in Pulse, as ``fib_rec``, using an +out-parameter to hold the values of the last two Fibonacci numbers in +the sequence. + +.. literalinclude:: ../code/pulse/PulseTutorial.Loops.fst + :language: pulse + :start-after: //fib_rec$ + :end-before: //end fib_rec$ + +Some points to note here: + + * Recursive definitions in Pulse are introduced with ``fn rec``. + + * So that we can easily memoize the last two values of ``fib``, we + expect the argument ``n`` to be a positive number, rather than + also allowing ``0``. + + * A quirk shown in the comments: We need an additional type + annotation to properly type ``(1, 1)`` as a pair of nats. + +Of course, one can also define Fibonacci iteratively, with a while +loop, as shown below. + +.. literalinclude:: ../code/pulse/PulseTutorial.Loops.fst + :language: pulse + :start-after: //fib_loop$ + :end-before: //end fib_loop$ + + + diff --git a/doc/book/PoP-in-FStar/book/pulse/pulse_parallel_increment.rst b/doc/book/PoP-in-FStar/book/pulse/pulse_parallel_increment.rst new file mode 100755 index 00000000000..63acc7347f4 --- /dev/null +++ b/doc/book/PoP-in-FStar/book/pulse/pulse_parallel_increment.rst @@ -0,0 +1,481 @@ +.. _Pulse_parallel_increment: + +Parallel Increment +================== + +In this chapter, we look at an example first studied by Susan Owicki +and David Gries, in a classic paper titled `Verifying properties of +parallel programs: an axiomatic approach +`_. The problem involves +proving that a program that atomically increments an integer reference +``r`` twice in parallel correctly adds 2 to ``r``. 
There are many ways +to do this---Owicki & Gries' approach, adapted to a modern separation +logic, involves the use of additional ghost state and offers a modular +way to structure the proof. + +While this is a very simple program, it captures the essence of some +of the reasoning challenges posed by concurrency: two threads interact +with a shared resource, contributing to it in an undetermined order, +and one aims to reason about the overall behavior, ideally without +resorting to directly analyzing each of the possible interleavings. + +Parallel Blocks +............... + +Pulse provides a few primitives for creating new threads. The most +basic one is parallel composition, as shown below: + +.. code-block:: pulse + + parallel + requires p1 and p2 + ensures q1 and q2 + { e1 } + { e2 } + +The typing rule for this construct requires: + +.. code-block:: pulse + + val e1 : stt a p1 q1 + val e2 : stt b p2 q2 + +and the ``parallel`` block then has the type: + +.. code-block:: pulse + + stt (a & b) (p1 ** p2) (fun (x, y) -> q1 x ** q2 y) + +In other words, if the current context can be split into separate +parts ``p1`` and ``p2`` satisfying the preconditions of ``e1`` and +``e2``, then the parallel block executes ``e1`` and ``e2`` in +parallel, waits for both of them to finish, and if they both do, +returns their results as a pair, with their postconditions on each +component. + +Using ``parallel``, one can easily program the ``par`` combinator +below: + +.. literalinclude:: ../code/pulse/PulseTutorial.ParallelIncrement.fst + :language: pulse + :start-after: //par$ + :end-before: //end par$ + +As we saw in the :ref:`introduction to Pulse `, it's easy +to increment two separate references in parallel: + +.. literalinclude:: ../code/pulse/PulseTutorial.Intro.fst + :language: pulse + :start-after: //par_incr$ + :end-before: //end par_incr$ + +But, what if we wanted to increment the same reference in two separate +threads? 
That is, we'd like to program something like this: + +.. code-block:: pulse + + fn add2 (x:ref int) + requires pts_to x 'i + ensures pts_to x ('i + 2) + { + par (fun _ -> incr x) + (fun _ -> incr x) + } + +But, this program doesn't check. The problem is that we have only a +single ``pts_to x 'i``, and we can't split it to share among the +threads, since both threads require full permission to ``x`` to update +it. + +Further, for the program to correctly add ``2`` to ``x``, each +increment operation must take place atomically, e.g., if the two +fragments below were executed in parallel, then they may both read the +initial value of ``x`` first, bind it to ``v``, and then both update +it to ``v + 1``. + +.. code-block:: pulse + + let v = !x; || let v = !x; + x := v + 1; || x := v + 1; + +Worse, without any synchronization, on modern processors with weak +memory models, this program could exhibit a variety of other +behaviors. + +To enforce synchronization, we could use a lock, as shown below: + +.. literalinclude:: ../code/pulse/PulseTutorial.ParallelIncrement.fst + :language: pulse + :start-after: //attempt$ + :end-before: //end attempt$ + +This program is type correct and free from data races. But, since the +lock holds the entire permission on ``x``, there's no way to give this +function a precise postcondition. + +.. note :: + + In this section, we use an implementation of spin locks from the + Pulse library, Pulse.Lib.SpinLock. Unlike the version we developed + in the previous chapter, these locks use a fraction-indexed + permission, ``lock_alive l #f p``. They also provide a predicate, + ``lock_acquired l``, that indicates when the lock has been + taken. With full permission on the lock, and ``lock_acquired l``, + the lock can be freed---reclaiming the underlying + memory. Additionally, the ``lock_acquired`` predicate ensures that + locks cannot be double freed. 
As such, ``Pulse.Lib.SpinLock`` fixes + the problems with the spin locks we introduced in the previous + chapter and also provides a solution to the exercises given there. + + +A First Take, with Locks +........................ + +Owicki and Gries' idea was to augment the program with auxiliary +variables, or ghost state, that are purely for specification +purposes. Each thread gets its own piece of ghost state, and accounts +for how much that thread has contributed to the current value of the +shared variable. Let's see how this works in Pulse. + +The main idea is captured by ``lock_inv``, the type of the predicate +protected by the lock: + +.. literalinclude:: ../code/pulse/PulseTutorial.ParallelIncrement.fst + :language: fstar + :start-after: //lock_inv$ + :end-before: //end lock_inv$ + +Our strawman ``lock`` in the ``attempt`` shown before had type ``lock +(exists* v. pts_to x v)``. This time, we add a conjunct that refines +the value ``v``, i.e., the predicate ``contributions l r init v`` says +that the current value of ``x`` protected by the lock (i.e., ``v``) is +equal to ``init + vl + vr``, where ``init`` is the initial value of +``x``; ``vl`` is the value of the ghost state owned by the "left" +thread; and ``vr`` is the value of the ghost state owned by the +"right" thread. In other words, the predicate ``contributions l r init +v`` shows that ``v`` always reflects the values of the contributions +made by each thread. + +Note, however, that the ``contributions`` predicate only holds +half-permission on the left and right ghost variables. The other half +permission is held outside the lock and allows us to keep track of +each thread's contribution in our specifications. + +Here's the code for the left thread, ``incr_left``: + +.. 
literalinclude:: ../code/pulse/PulseTutorial.ParallelIncrement.fst + :language: pulse + :start-after: //incr_left$ + :end-before: //end incr_left$ + +* Its arguments include ``x`` and the ``lock``, but also both pieces + of ghost state, ``left`` and ``right``, and an erased value ``i`` + for the initial value of ``x``. + +* Its precondition holds half permission on the ghost reference + ``left``. + +* Its postcondition returns half-permission to ``left``, but proves + that it was incremented, i.e., the contribution of the left thread + to the value of ``x`` increased by ``1``. + +Notice that even though we only had half permission to ``left``, the +specification says we have updated ``left``---that's because we can +get the other half permission we need by acquiring the lock. + +* We acquire the lock and increment the value stored in ``x``. + +* And then we follow the increment with several ghost steps: + + - Gain full permission on ``left`` by combining the half permission + from the precondition with the half permission gained from the + lock. + + - Increment ``left``. + + - Share it again, returning half permission to the lock when we + release it. + +* Finally, we have ``GR.pts_to left #one_half ('vl + 1)`` left over to + return to the caller in the postcondition. + +The code of the right thread is symmetrical, but in this, our first +take, we have to essentially repeat the code---we'll see how to remedy +this shortly. + +.. literalinclude:: ../code/pulse/PulseTutorial.ParallelIncrement.fst + :language: pulse + :start-after: //incr_right$ + :end-before: //end incr_right$ + +Finally, we can implement ``add2`` with the specification we want: + +.. literalinclude:: ../code/pulse/PulseTutorial.ParallelIncrement.fst + :language: pulse + :start-after: //add2$ + :end-before: //end add2$ + +* We allocate ``left`` and ``right`` ghost references, initializing + them to ``0``. 
+ +* Then we split them, putting half permission to both in the lock, + retaining the other half. + +* Then we spawn two threads for ``incr_left`` and ``incr_right``, and get + as a postcondition that the contributions of both threads have increased + by one each. + +* Finally, we acquire the lock, get ``pts_to x v``, for some ``v``, + and ``contributions left right i v``. Once we gather up the + permission on ``left`` and ``right``, the ``contributions + left right i v`` predicate tells us that ``v == i + 1 + 1``, which is what we + need to conclude. + +Modularity with higher-order ghost code +....................................... + +Our next attempt aims to write a single function ``incr``, rather than +``incr_left`` and ``incr_right``, and to give ``incr`` a more +abstract, modular specification. The style we use here is based on an +idea proposed by Bart Jacobs and Frank Piessens in a paper titled +`Expressive modular fine-grained concurrency specification +`_. + +The main idea is to observe that ``incr_left`` and ``incr_right`` only +differ by the ghost code that they execute. But, Pulse is higher +order: so, why not parameterize a single ``incr`` function by its ghost code and let +the caller instantiate it twice, with different bits of ghost +code. Also, while we're at it, why not generalize the +specification of ``incr`` so that it works with any user-chosen +abstract predicate, rather than ``contributions`` and ``left/right`` +ghost state. Here's how: + +.. literalinclude:: ../code/pulse/PulseTutorial.ParallelIncrement.fst + :language: pulse + :start-after: //incr$ + :end-before: //end incr$ + +As before, ``incr`` requires ``x`` and the ``lock``, but, this time, +it is parameterized by: + +* A predicate ``refine``, which generalizes the ``contributions`` + predicate from before, and refines the value that ``x`` points to. 
+ +* A predicate ``aspec``, an abstract specification chosen by the + caller, which serves as the main specification for ``incr``: the function + transitions from ``aspec 'i`` to ``aspec ('i + 1)``. + +* And, finally, the ghost function itself, ``ghost_steps``, now + specified abstractly in terms of the relationship between ``refine``, + ``aspec`` and ``pts_to x``---it says, effectively, that once ``x`` + has been updated, the abstract predicates ``refine`` and ``aspec`` + can be updated too. + +Having generalized ``incr``, we've now shifted the work to the +caller. But, ``incr``, now verified once and for all, can be used with +many different callers just by instantiating it differently. For +example, if we wanted to do a three-way parallel increment, we could +reuse our ``incr`` as is, whereas our first take would have to be +completely revised, since ``incr_left`` and ``incr_right`` assume that +there are only two ghost references, not three. + +Here's one way to instantiate ``incr``, proving the same specification +as ``add2``. + +.. literalinclude:: ../code/pulse/PulseTutorial.ParallelIncrement.fst + :language: pulse + :start-after: //add2_v2$ + :end-before: //end add2_v2$ + +The code is just a rearrangement of what we had before, factoring the +ghost code in ``incr_left`` and ``incr_right`` into a ghost function +``step``. When we spawn our threads, we pass in the ghost code to +either update the left or the right contribution. + +This code still has two issues: + +* The ghost ``step`` function is bloated: we have essentially the + same code and proof twice, once in each branch of the + conditional. We can improve this by defining a custom bit of ghost + state using Pulse's support for partial commutative monoids---but + that's for another chapter. + +* We allocate and free memory for a lock, which is inefficient---could + we instead do things with atomic operations? We'll remedy that next. 
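+
+To see the duplication in the first point concretely, the ghost
+``step`` function has roughly the following shape (a hypothetical
+sketch: the signature and the elided bodies are illustrative only,
+not the actual tutorial code):
+
+.. code-block:: pulse
+
+   ghost
+   fn step (b:bool) (* b selects the left or the right contribution *)
+   requires ...
+   ensures ...
+   {
+     if b
+     {
+       (* gather the two halves of ``left``, increment, share again *)
+       ...
+     }
+     else
+     {
+       (* the same gather/increment/share proof, repeated for ``right`` *)
+       ...
+     }
+   }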
+ +Exercise +++++++++ + +Instead of explicitly passing a ghost function, use a quantified trade. + + +A version with invariants +......................... + +As a final example, in this section, we'll see how to program ``add2`` +using invariants and atomic operations, rather than locks. + +Doing this properly will require working with bounded, machine +integers, e.g., ``U32.t``, since these are the only types that support +atomic operations. However, to illustrate the main ideas, we'll assume +two atomic operations on unbounded integers---this will allow us to +not worry about possible integer overflow. We leave as an exercise the +problem of adapting this to ``U32.t``. + +.. literalinclude:: ../code/pulse/PulseTutorial.ParallelIncrement.fst + :language: fstar + :start-after: //atomic_primitives$ + :end-before: //end atomic_primitives$ + +Cancellable Invariants +++++++++++++++++++++++ + +The main idea of the ``add2`` proof is to use an invariant +instead of a lock. Just as in our previous code, ``add2`` starts by +allocating an invariant, putting ``exists* v. pts_to x v ** +contributions left right i v`` in the invariant. Then we call ``incr`` twice +in different threads. Finally, however, to recover ``pts_to x (v + +2)``, where previously we would acquire the lock, with a regular +invariant we're stuck, since the ``pts_to x v`` permission is inside +the invariant and we can't take it out to return to the caller. + +An invariant ``inv i p`` guarantees that the property ``p`` is true +and remains true for the rest of a program's execution. But, what if +we wanted to only enforce ``p`` as an invariant for some finite +duration, and then to cancel it? This is what the library +``Pulse.Lib.CancellableInvariant`` provides. Here's the relevant part +of the API: + +.. code-block:: pulse + + [@@ erasable] + val cinv : Type0 + val iref_of (c:cinv) : GTot iref + +The main type it offers is ``cinv``, the name of a cancellable +invariant. + + +.. 
code-block:: pulse + + ghost + fn new_cancellable_invariant (v:boxable) + requires v + returns c:cinv + ensures inv (iref_of c) (cinv_vp c v) ** active c 1.0R + +Allocating a cancellable invariant is similar to allocating a regular +invariant, except one gets an invariant for an abstract predicate +``cinv_vp c v``, and a fraction-indexed predicate ``active c 1.0R``, +which allows the cancellable invariant to be shared and gathered +between threads. + +The ``cinv_vp c v`` predicate can be used in conjunction with +``active`` to recover the underlying predicate ``v``---but only when +the invariant has not been cancelled yet---this is what +``unpack_cinv_vp``, and its inverse, ``pack_cinv_vp``, allow one to +do. + +.. code-block:: pulse + + ghost + fn unpack_cinv_vp (#p:perm) (#v:slprop) (c:cinv) + requires cinv_vp c v ** active c p + ensures v ** unpacked c ** active c p + + ghost + fn pack_cinv_vp (#v:slprop) (c:cinv) + requires v ** unpacked c + ensures cinv_vp c v + +Finally, if one has full permission to the invariant (``active c +1.0R``) it can be cancelled and the underlying predicate ``v`` can be +obtained as a postcondition. + +.. code-block:: pulse + + ghost + fn cancel (#v:slprop) (c:cinv) + requires inv (iref_of c) (cinv_vp c v) ** active c 1.0R + ensures v + opens add_inv emp_inames (iref_of c) + +An increment operation +++++++++++++++++++++++ + +Our first step is to build an increment operation from an +``atomic_read`` and a ``cas``. Here is its specification: + +.. literalinclude:: ../code/pulse/PulseTutorial.ParallelIncrement.fst + :language: pulse + :start-after: //incr_atomic_spec$ + :end-before: //end incr_atomic_spec$ + +The style of specification is similar to the generic style we used +with ``incr``, except now we use a cancellable invariant instead of a +lock. + +For its implementation, the main idea is to repeatedly read the +current value of ``x``, say ``v``; and then to ``cas`` in ``v+1`` if +the current value is still ``v``. 
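+
+Schematically, this optimistic retry has the following recursive shape
+(a sketch only, using the assumed ``atomic_read`` and ``cas``
+operations, and eliding all invariant and ghost-state manipulation):
+
+.. code-block:: pulse
+
+   fn rec incr_sketch (x:ref int)
+   requires ...
+   ensures ...
+   {
+     let v = atomic_read x;     (* optimistically read the current value *)
+     let ok = cas x v (v + 1);  (* succeeds only if x still holds v *)
+     if ok { () }               (* x moved atomically from v to v + 1 *)
+     else { incr_sketch x }     (* another thread raced us; retry *)
+   }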
+ +The ``read`` function is relatively easy: + +.. literalinclude:: ../code/pulse/PulseTutorial.ParallelIncrement.fst + :language: pulse + :start-after: //incr_atomic_body_read$ + :end-before: //end incr_atomic_body_read$ + +* We open the invariant ``l``; then, knowing that the invariant is + still active, we can unpack it; then read the value ``v``; pack it + back; and return ``v``. + +The main loop of ``incr_atomic`` is next, shown below: + +.. literalinclude:: ../code/pulse/PulseTutorial.ParallelIncrement.fst + :language: pulse + :start-after: //incr_atomic_body_loop$ + :end-before: //end incr_atomic_body_loop$ + +The loop invariant says: + + * the invariant remains active + + * the local variable ``continue`` determines if the loop iteration + continues + + * and, so long as the loop continues, we still have ``aspec 'i``, + but when the loop ends we have ``aspec ('i + 1)`` + +The body of the loop is also interesting and consists of two atomic +operations. We first ``read`` the value of ``x`` into ``v``. Then we +open the invariant again and try to ``cas`` in ``v+1``. If it succeeds, we +return ``false`` from the ``with_invariants`` block; otherwise +``true``. And, finally, outside the ``with_invariants`` block, we set +the ``continue`` variable accordingly. Recall that ``with_invariants`` +allows at most a single atomic operation, so, having done a ``cas``, +we are not allowed to also set ``continue`` inside the +``with_invariants`` block. + +``add2_v3`` ++++++++++++ + +Finally, we implement our parallel increment again, ``add2_v3``, this +time using invariants, though it has the same specification as before. + +.. literalinclude:: ../code/pulse/PulseTutorial.ParallelIncrement.fst + :language: pulse + :start-after: //add2_v3$ + :end-before: //end add2_v3$ + +The code too is very similar to ``add2_v2``, except instead of +allocating a lock, we allocate a cancellable invariant. 
And, at the +end, instead of acquiring, and leaking, the lock, we simply cancel the +invariant and we're done. + +Exercise +........ + +Implement ``add2`` on a ``ref U32.t``. You'll need a precondition that +``'i + 2 < pow2 32`` and also to strengthen the invariant to prove +that each increment doesn't overflow. diff --git a/doc/book/PoP-in-FStar/book/pulse/pulse_spin_lock.rst b/doc/book/PoP-in-FStar/book/pulse/pulse_spin_lock.rst new file mode 100755 index 00000000000..54e24dfa259 --- /dev/null +++ b/doc/book/PoP-in-FStar/book/pulse/pulse_spin_lock.rst @@ -0,0 +1,195 @@ +.. _Pulse_spin_lock: + +Spin Locks +========== + +With atomic operations and invariants, we can build many useful +abstractions for concurrent programming. In this chapter, we'll look +at how to build a spin lock for mutual exclusion. + +Representing a Lock +................... + +The main idea of the implementation is to represent a lock using a +mutable machine word, where the value ``0ul`` signifies that the lock +is currently released; and ``1ul`` signifies that the lock is currently +acquired. To acquire a lock, we'll try to atomically compare-and-swap, +repeating until we succeed in setting a ``1ul`` and acquiring the +lock. Releasing the lock is simpler: we'll just set it to ``0ul`` +(though we'll explore a subtlety on how to handle double releases). + +From a specification perspective, a lock is a lot like an invariant: the +predicate type ``lock_alive l p`` states that the lock protects some +property ``p``. Acquiring the lock provides ``p`` to the caller; while +releasing the lock requires the caller to give up ownership of +``p``. The runtime mutual exclusion is enforced by the acquire +spinning, or looping, until the lock is available. + +We'll represent a lock as a pair of a reference to a ``U32.t`` and an +invariant: + +.. 
literalinclude:: ../code/pulse/PulseTutorial.SpinLock.fst + :language: fstar + :start-after: //lock$ + :end-before: //end lock$ + +The predicate ``lock_inv r p`` states: + + * We hold full permission to the ``r:box U32.t``; and + + * If ``r`` contains ``0ul``, then we also have ``p``. + +The lock itself pairs the concrete mutable state ``box U32.t`` with an +invariant reference ``i:iref``, where the ``lock_alive l p`` predicate +states that ``l.i`` names an invariant for ``lock_inv l.r p``. + +Creating a lock +............... + +To create a lock, we implement ``new_lock`` below. It requires the +caller to provide ``p``, ceding ownership of ``p`` to the newly +allocated ``l:lock``. + +.. literalinclude:: ../code/pulse/PulseTutorial.SpinLock.fst + :language: pulse + :start-after: //new_lock$ + :end-before: //end new_lock$ + + +Some notes on the implementation: + +* We heap allocate a reference using ``Box.alloc``, since clearly, + the lock has to live beyond the scope of this function's activation. + +* We use ``new_invariant`` to create an ``inv i (lock_inv r p)``, and + package it up with the newly allocated reference. + + +Duplicating permission to a lock +................................ + +Locks are useful only if they can be shared between multiple +threads. The predicate ``lock_alive l p`` expresses ownership of a lock---but, +since ``lock_alive`` is just an invariant, we can use ``dup_inv`` to +duplicate ``lock_alive``. + +.. literalinclude:: ../code/pulse/PulseTutorial.SpinLock.fst + :language: pulse + :start-after: //dup_lock_alive$ + :end-before: //end dup_lock_alive$ + + +Acquiring a lock +................ + +The signature of ``acquire`` is shown below: it says that with +``lock_alive l p``, we can get back ``p`` without proving anything, +i.e., the precondition is ``emp``. + +.. literalinclude:: ../code/pulse/PulseTutorial.SpinLock.fst + :language: pulse + :start-after: //acquire_sig$ + :end-before: //end acquire_sig$ + +This may seem surprising at first. 
But, recall that we've stashed +``p`` inside the invariant stored in the lock, and ``acquire`` is +going to keep looping until such time as a CAS on the reference in the +lock succeeds, allowing us to pull out ``p`` and return it to the +caller. + +The type of a compare-and-swap is shown below, from +Pulse.Lib.Reference: + +.. code-block:: fstar + + let cond b (p q:slprop) = if b then p else q + +.. code-block:: pulse + + atomic + fn cas_box (r:Box.box U32.t) (u v:U32.t) (#i:erased U32.t) + requires Box.pts_to r i + returns b:bool + ensures cond b (Box.pts_to r v ** pure (reveal i == u)) + (Box.pts_to r i) + + +The specification of ``cas_box r u v`` says that we can try to atomically +update ``r`` from ``u`` to ``v``, and if the operation succeeds, we +learn that the initial value (``i``) of ``r`` was equal to ``u``. + +Using ``cas_box``, we can implement ``acquire`` using a tail-recursive +function: + +.. literalinclude:: ../code/pulse/PulseTutorial.SpinLock.fst + :language: pulse + :start-after: //acquire_body$ + :end-before: //end acquire_body$ + +The main part of the implementation is the ``with_invariants`` block. + +* Its return type is ``b:bool`` and its postcondition is ``inv l.i (lock_inv + l.r p) ** maybe b p``, signifying that after a single ``cas``, we + may have ``p`` if the ``cas`` succeeded, while maintaining the invariant. + +* We open ``l.i`` to get ``lock_inv``, and then try a ``cas_box l.r 0ul 1ul`` + +* If the ``cas_box`` succeeds, we know that the lock was initially in + the ``0ul`` state. So, from ``lock_inv`` we have ``p``, and we can + "take it out" of the lock and return it out of the block as ``maybe true + p``. And, importantly, we can trivially restore the ``lock_inv``, + since we know its current value is ``1ul``, i.e., ``maybe (1ul = + 0ul) _ == emp``. + +* If the ``cas_box`` fails, we just restore ``lock_inv`` and return + ``false``. 
+ +Outside the ``with_invariants`` block, if the CAS succeeded, then +we're done: we have ``p`` to return to the caller. Otherwise, we +recurse and try again. + +Exercise +........ + +Rewrite the tail-recursive ``acquire`` using a while loop. + +Releasing a lock +................ + +Releasing a lock is somewhat easier, at least for a simple version. +The signature is the dual of ``acquire``: the caller has to give up +``p`` to the lock. + +.. literalinclude:: ../code/pulse/PulseTutorial.SpinLock.fst + :language: pulse + :start-after: //release$ + :end-before: //end release$ + +In this implementation, ``release`` unconditionally sets the reference +to ``0ul`` and reproves the ``lock_inv``, since we have ``p`` in +context. + +However, if the lock was already in the released state, it may already +hold ``p``---releasing an already released lock can allow the caller +to leak resources. + +Exercise +........ + +Rewrite ``release`` to spin until the lock is acquired, before +releasing it. This is not a particularly realistic design for avoiding +a double release, but it's a useful exercise. + +Exercise +........ + +Redesign the lock API to prevent double releases. One way to do this +is, when acquiring the lock, to give out a permission to release it, and +for ``release`` to require and consume that permission. + +Exercise +........ + +Add a liveness predicate, with fractional permissions, to allow a lock +to be allocated, then shared among several threads, then gathered, and +eventually freed. diff --git a/doc/book/PoP-in-FStar/book/pulse/pulse_user_defined_predicates.rst b/doc/book/PoP-in-FStar/book/pulse/pulse_user_defined_predicates.rst new file mode 100644 index 00000000000..3f17b865b14 --- /dev/null +++ b/doc/book/PoP-in-FStar/book/pulse/pulse_user_defined_predicates.rst @@ -0,0 +1,226 @@ +.. 
_Pulse_DefinedVProps: + +User-defined Predicates +======================= + +In addition to the slprop predicates and connectives that the Pulse +libraries provide, users very commonly define their own ``slprops``. We +show a few simple examples here---subsequent examples will make heavy +use of user-defined predicates. For example, see this section for +:ref:`recursively defined predicates `. + + + +Fold and Unfold with Diagonal Pairs +................................... + +A simple example of a user-defined abstraction is shown below. + +.. literalinclude:: ../code/pulse/PulseTutorial.UserDefinedPredicates.fst + :language: pulse + :start-after: //pts_to_diag$ + :end-before: //end pts_to_diag$ + +``pts_to_diag r v`` is a ``slprop`` defined in F* representing a +reference to a pair whose components are equal. + +We can use this abstraction in a Pulse program, though we have to be +explicit about folding and unfolding the predicate. + +.. literalinclude:: ../code/pulse/PulseTutorial.UserDefinedPredicates.fst + :language: pulse + :start-after: //double$ + :end-before: //end double$ + +The ``unfold p`` command checks that ``p`` is provable in the current +context by some term ``q``, and then rewrites the context by replacing +that occurrence of ``q`` with the term that results from unfolding the +head symbol of ``p``. A ``show_proof_state`` after the ``unfold`` +shows that we have a ``pts_to r (reveal 'v, reveal 'v)`` in the +context, exposing the abstraction of the ``pts_to_diag`` predicate. + +At the end of the function, we use the ``fold p`` command: this checks +that the unfolding of ``p`` is provable in the context by some term +``q`` and then replaces ``q`` in the context with ``p``. + +``fold`` and ``unfold`` are currently very manual in Pulse. While in +the general case, including with recursively defined predicates, +automating the placement of folds and unfolds is challenging, many +common cases (such as the ones here) can be easily automated. 
We are
+currently investigating adding support for this.
+
+Some initial support for this is already available, inasmuch as Pulse
+can sometimes figure out the arguments to the slprops that need to be
+folded/unfolded. For instance, in the code below, we just mention the
+name of the predicate to be unfolded/folded, without needing to
+provide all the arguments.
+
+.. literalinclude:: ../code/pulse/PulseTutorial.UserDefinedPredicates.fst
+   :language: pulse
+   :start-after: //double_alt$
+   :end-before: //end double_alt$
+
+Mutable Points
+..............
+
+As a second, perhaps more realistic example of a user-defined
+abstraction, we look at defining a simple mutable data structure: a
+structure with two mutable integer fields, representing a
+2-dimensional point.
+
+.. literalinclude:: ../code/pulse/PulseTutorial.UserDefinedPredicates.fst
+   :language: fstar
+   :start-after: //point$
+   :end-before: //end point$
+
+A ``point`` is just an F* record containing two
+references. Additionally, we define ``is_point``, a ``slprop``,
+sometimes called a "representation predicate", for a
+``point``. ``is_point p xy`` says that ``p`` is a representation of
+the logical point ``xy``, where ``xy`` is a pure, mathematical pair.
+
+We can define a function ``move``, which translates a point by some
+offset ``dx, dy``.
+
+.. literalinclude:: ../code/pulse/PulseTutorial.UserDefinedPredicates.fst
+   :language: pulse
+   :start-after: //move$
+   :end-before: //end move$
+
+Implementing ``move`` is straightforward, but like before, we have to
+``unfold`` the ``is_point`` predicate first, and then fold it back up
+before returning.
+
+Unfortunately, Pulse cannot infer the instantiation of ``is_point``
+when folding it. 
A ``show_proof_state`` prior to the fold should help
+us see why:
+
+  * We have ``pts_to p.x (x + dx) ** pts_to p.y (y + dy)``
+
+  * For ``fold (is_point p ?w)`` to succeed, we rely on F*'s type
+    inference to find a solution for the unsolved witness ``?w`` such
+    that ``fst ?w == (x + dx)`` and ``snd ?w == (y + dy)``. This
+    requires an eta-expansion rule for pairs to solve
+    ``?w := (x + dx, y + dy)``, but F*'s type inference does not
+    support such a rule for pairs.
+
+So, sadly, we have to provide the full instantiation
+``is_point p (x + dx, y + dy)`` to complete the proof.
+
+This pattern is a common problem when working with representation
+predicates that are indexed by complex values, e.g., pairs or
+records. It's common enough that it is usually more convenient to
+define a helper function to fold the predicate, as shown below.
+
+.. literalinclude:: ../code/pulse/PulseTutorial.UserDefinedPredicates.fst
+   :language: pulse
+   :start-after: //fold_is_point$
+   :end-before: //end fold_is_point$
+
+.. note::
+
+   We've marked this helper function ``ghost``. We'll look into
+   ``ghost`` functions in much more detail in a later chapter.
+
+This allows type inference to work better, as shown below.
+
+.. literalinclude:: ../code/pulse/PulseTutorial.UserDefinedPredicates.fst
+   :language: pulse
+   :start-after: //move_alt$
+   :end-before: //end move_alt$
+
+.. _Pulse_rewriting:
+
+Rewriting
+.........
+
+In addition to ``fold`` and ``unfold``, one also often uses the
+``rewrite`` command when working with defined predicates. Its general
+form is:
+
+.. code-block::
+
+   with x1 ... xn. rewrite p as q;
+   rest
+
+Its behavior is to find a substitution ``subst`` that instantiates the
+``x1 ... xn`` as ``v1 ... vn``, such that ``subst(p)`` matches some
+slprop ``c`` in the context. Pulse then proves that
+``subst(p) == subst(q)``, replaces ``c`` in the context with
+``subst(q)``, and proceeds to check ``subst(rest)``.
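+
+As a small illustration of this semantics, suppose the context
+contains ``pts_to x 'vx`` and ``x`` is provably equal to the field
+``p.x``. Then the following rewrite (a hypothetical sketch; the names
+``x``, ``p``, and ``vx`` are just for illustration) transforms the
+context:
+
+.. code-block:: pulse
+
+   with vx. rewrite pts_to x vx as pts_to p.x vx;
+
+Here, ``vx`` is instantiated to ``'vx``; since ``x == p.x``, the
+equality ``pts_to x 'vx == pts_to p.x 'vx`` is provable, and the
+context is left with ``pts_to p.x 'vx``.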
+
+To illustrate this at work, consider the program below:
+
+.. literalinclude:: ../code/pulse/PulseTutorial.UserDefinedPredicates.fst
+   :language: pulse
+   :start-after: //create_and_move$
+   :end-before: //end create_and_move$
+
+We allocate two references and put them in the structure ``p``. Now,
+to call ``fold_is_point``, we need ``pts_to p.x _`` and ``pts_to p.y
+_``, but the context only contains ``pts_to x _`` and ``pts_to y
+_``. The ``rewrite`` command transforms the context as needed.
+
+At the end of the function, we need to restore ``pts_to x _`` and
+``pts_to y _`` as we exit the scope of ``y`` and ``x``, so that they
+can be reclaimed. Using ``rewrite`` in the other direction
+accomplishes this.
+
+This is quite verbose. As with ``fold`` and ``unfold``, fully
+automated ``rewrite`` in the general case is hard, but many common
+cases are easy and we expect to add support for that to the Pulse
+checker.
+
+In the meantime, Pulse provides a shorthand to make some common
+rewrites easier.
+
+The most general form of the ``rewrite each`` command is:
+
+.. code-block:: pulse
+
+   with x1 ... xn. rewrite each e1 as e1', ..., en as en' in goal
+
+This is equivalent to:
+
+.. code-block:: pulse
+
+   with x1 ... xn. assert goal;
+   rewrite each e1 as e1', ..., en as en' in goal
+
+In turn,
+
+.. code-block:: pulse
+
+   rewrite each e1 as e1', ..., en as en' in goal
+
+is equivalent to
+
+.. code-block:: pulse
+
+   rewrite goal as goal'
+
+where ``goal'`` is computed by rewriting, in parallel, every
+occurrence of ``ei`` as ``ei'`` in ``goal``.
+
+Finally, one can also write:
+
+.. code-block:: pulse
+
+   rewrite each e1 as e1', ..., en as en'
+
+omitting the ``goal`` term. In this case, the ``goal`` is taken to be
+the entire current ``slprop`` context.
+
+Using ``rewrite each ...`` makes the code somewhat shorter:
+
+.. literalinclude:: ../code/pulse/PulseTutorial.UserDefinedPredicates.fst
+   :language: pulse
+   :start-after: //create_and_move_alt$
+   :end-before: //end create_and_move_alt$
+
+
+
+
+
+
+
+
+
diff --git a/doc/book/PoP-in-FStar/book/sec2/Design-of-fstar-Intro.rst.notes b/doc/book/PoP-in-FStar/book/sec2/Design-of-fstar-Intro.rst.notes
new file mode 100644
index 00000000000..3cdd557e31b
--- /dev/null
+++ b/doc/book/PoP-in-FStar/book/sec2/Design-of-fstar-Intro.rst.notes
@@ -0,0 +1,790 @@
+.. _Design-of-fstar-Intro:
+
+Elements of F*
+==============
+
+The short tutorial should have provided you
+with a basic feel for F*, in particular for how to use F* and its SMT
+solving backend for programming and proving simple functional
+programs.
+
+In this section, we step back and provide a more comprehensive
+description of F*, starting from a short summary of its design goals
+and main technical features. Not all of these concepts may be familiar
+to you at first, but by the end of this section, you should have
+gained a working knowledge of the core design of F* as well as
+pointers to further resources.
+
+
+Core Language
+-------------
+
+The syntax of F*'s core language is summarized by the simplified
+grammar of terms, representing the core abstract syntax of terms used
+by the F* typechecker. Note that this is not the full concrete syntax
+of F* terms, which includes many additional features, including
+parentheses for enforcing precedence, implicit arguments, n-ary
+functions, various forms of syntactic sugar for common constructs like
+tuples and sequencing, etc. However, the essence of F*'s core language
+is distilled below. Once you understand how all these constructs fit
+together, you should be able to see how the core of F* is
+simultaneously a functional programming language and a logic. The goal
+of this section is to explain all these constructs in detail, down to
+the specifics of the various syntactic conventions used in F*.
+ +Core abstract syntax of F* terms:: + + + Constants c ::= p primitive constant + | D user-defined data constructors + | T user-defined inductive type constructors + + Terms e, t ::= x variables + + | c constants and constructors + + + | fun (x:t) -> t' functions + + | t t' applications + + | match t with [b1 ... bn] pattern matching with zero or more cases + + | let x = t in t' let bindings + + | let rec f1 (x:t1) : t1' = e1 ... + and ... fn (x:tn) : tn' = en mutually recursive function definitions + + | x:t -> t function types (arrows) + + | x:t { t' } refinement types + + | Type u#U Type of types + + | x u#U1 ... u#Un Variable applied to one or more universes + + Case X ::= `|` P -> t Pattern-matching branch + + Pattern P ::= x Variable + | c Constant + | D [P1...Pn] Constructor applied to zero or more patterns + + Universe U ::= x Universe variable + | 0 Universe 0 + | U + 1 Successor universe + | max U U Maximum of universes + +Basic syntactic structure +......................... + +An F* program is a collection of :ref:`modules`, with each +module represented by a single file with the filename extension +``.fst``. Later, we'll see that a module's interface is in a separate +file. + +A module begins with the module's name and contains a sequence of +top-level signatures and definitions. + +* Signatures ascribe a type to a definition, e.g., ``val f : t``. + +Definitions come in several flavors: the two main forms we'll focus on +in this section are + +* possibly recursive definitions (let bindings, ``let [rec] f = e``) +* and, inductive type definitions (datatypes, ``type t = | D1 : t1 | ... | Dn : tn``) + +In later sections, we'll see two other kinds of definition: +user-defined indexed effects and sub-effects. + +Classes of Identifiers +^^^^^^^^^^^^^^^^^^^^^^ + +TODO: + +Comments +^^^^^^^^ + +Block comments are delimited by ``(*`` and ``*)``. Line comments begin +with ``//``. 
:: + + (* this is a + block comment *) + + + //This is a line comment + + +Primitive constants +................... + +Every F* program is checked in the context of some ambient primitive +definitions taken from the core F* module :ref:`Prims`. + +False +^^^^^ + +The type ``False`` has no elements. It represents a logical +falsehood in F*--- + + +Unit +^^^^ + +The type ``unit`` has a single element denoted ``()``. + + +Booleans +^^^^^^^^ + +The primitive type ``bool`` has two elements, ``true`` and +``false``. ``Prims`` also provides the following primitive boolean +operators + +* ``&&``: Boolean conjunction (infix) +* ``||``: Boolean disjunction (infix) +* ``not``: Boolean negation (prefix) + +TODO: Precedence + +Integers +^^^^^^^^ + +The type ``int`` represents unbounded, primitive mathematical +integers. Its elements are formed from the literals ``0, 1, 2, ...``, +and the following primitive operators: + +* ``-``: Unary negation (prefix) +* ``-``: Subtraction (infix) +* ``+``: Addition (infix) +* ``/``: Euclidean division (infix) +* ``%``: Euclidean modulus (infix) +* ``op_Multiply``: Unfortunately, the traditional multiplication symbol + ``*`` is reserved by default for the :ref:`tuple` type + constructor. Use the module ``FStar.Mul`` to treat ``*`` as integer + multiplication---see :ref:`this note`. +* ``<`` : Less than (infix) +* ``<=``: Less than or equal (infix) +* ``>`` : Greater than (infix) +* ``>=``: Greater than or equal (infix) + +TODO: Precedence + +Functions +......... + +F* provides several forms of syntactic sugar to define functions. The +syntax is largely inherited from OCaml, and this +`OCaml tutorial `_ +provides more details for those unfamiliar with the language. 
+
+The following are synonyms::
+
+  let incr = fun (x:int) -> x + 1
+  let incr (x:int) = x + 1
+
+You can also let F* infer the type of the parameter ``x``::
+
+  let incr x = x + 1
+
+Functions can take several arguments and the result type of a function
+can also be annotated, if desired::
+
+  let incr (x:int) : int = x + 1
+  let more_than_twice (x:int) (y:int) : bool = x > y + y
+
+It's considered good practice to annotate all the parameters and
+result type of a top-level definition.
+
+.. note::
+
+   The type of any term in F* can be annotated using a *type
+   ascription*, ``e <: t``. This form instructs F* to check that the
+   term ``e`` has the type ``t``. For example, we could have written::
+
+     let incr = fun (x:int) -> (x + 1 <: int)
+
+   We'll cover more about type ascriptions in this later
+   :ref:`section`.
+
+
+User-defined operators and infix notation
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Most commonly, to call, or "apply", a function, just place the
+arguments to the right of the function. For example::
+
+  incr 0 // calls ``incr`` with the argument 0
+  more_than_twice 17 8 //calls ``more_than_twice`` with ``17`` and ``8``
+
+You can also immediately apply an unnamed function, or lambda term::
+
+  (fun (x:int) -> x + 1) 0
+
+However, a function with two arguments can also be applied in infix
+notation by enclosing the function's name in "backticks", which can
+sometimes make code more readable. For example, one could write::
+
+  17 `more_than_twice` 8
+
+Functions can also be given names using special operator symbols,
+e.g., one could write::
+
+  let (>>) = more_than_twice
+
+And then call the function using::
+
+  17 >> 8
+
+This `wiki page
+`_
+provides more details on defining functions with operator symbols.
+
+Boolean refinement types
+........................
+
+Types are a way to describe collections of terms. 
For instance, the
+type ``int`` describes terms which compute integer results, i.e., when
+an ``int``-typed term is reduced fully it produces a value in the set
+``{..., -2, -1, 0, 1, 2, ...}``. Similarly, the type ``bool`` is the type
+of terms that compute or evaluate to one of the values in the set
+``{true,false}``.
+
+One (naive but useful) mental model is to think of a type as
+describing a set of values. With that in mind, and unlike in other
+mainstream programming languages, one can contemplate defining types
+for arbitrary sets of values. We will see a great many ways to define
+such types, starting with boolean refinement types.
+
+Some simple refinement types
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+:ref:`Prims` defines the type of natural numbers as::
+
+  let nat = x:int{x >= 0}
+
+This is an instance of a boolean refinement type, whose general form
+is ``x:t { e }`` where ``t`` is a type, and ``e`` is a ``bool``-typed term
+that may refer to the ``t``-typed bound variable ``x``. The term ``e``
+*refines* the type ``t``, in the sense that the set ``S`` denoted by ``t``
+is restricted to those elements ``x ∈ S`` for which ``e`` evaluates to
+``true``.
+
+That is, the type ``nat`` describes the set of terms that evaluate to an
+element of the set ``{0, 1, 2, 3, ...}``.
+
+But, there's nothing particularly special about ``nat``. You can define
+arbitrary refinements of your choosing, e.g.::
+
+  let empty = x:int { false } //one type for the empty set
+  let zero = x:int{ x = 0 } //the type containing one element `0`
+  let pos = x:int { x > 0 } //the positive numbers
+  let neg = x:int { x < 0 } //the negative numbers
+  let even = x:int { x % 2 = 0 } //the even numbers
+  let odd = x:int { x % 2 = 1 } //the odd numbers
+
+.. note::
+
+   Refinement types in F* trace their lineage to `F7
+   `_,
+   a language developed at Microsoft Research c. 2007 -- 2011. `Liquid
+   Haskell `_ is
+   another language with refinement types. 
Those languages provide
+   additional background and resources for learning about refinement
+   types.
+
+   Refinement types, in conjunction with dependent function types,
+   are, in principle, sufficient to encode many kinds of logics for
+   program correctness. However, refinement types are just one among
+   several tools in F* for program specification and proof.
+
+Refinement subtyping
+^^^^^^^^^^^^^^^^^^^^
+
+We have seen so far how to define a new refinement type, like ``nat`` or
+``even``. However, to make use of refinement types we need rules that
+allow us to:
+
+1. check that a program term has a given refinement type, e.g., to
+   check that ``0`` has type ``nat``. This is sometimes called
+   *introducing* a refinement type.
+
+2. make use of a term that has a refinement type, e.g., given ``x :
+   even`` we would like to write ``x + 1``, treating ``x`` as an ``int``
+   to add ``1`` to it. This is sometimes called *eliminating* a
+   refinement type.
+
+The technical mechanism in F* that supports both these features is
+called *refinement subtyping*.
+
+If you're used to a language like Java, C# or some other
+object-oriented language, you're familiar with the idea of
+subtyping. A type ``t`` is a subtype of ``s`` whenever a program term of
+type ``t`` can be safely treated as an ``s``. For example, in Java, all
+object types are subtypes of the type ``Object``, the base class of all
+objects.
+
+For boolean refinement types, the subtyping rules are as follows:
+
+* The type ``x:t { p }`` is a subtype of ``t``. That is, given ``e :
+  (x:t{p})``, it is always safe to *eliminate* the refinement and
+  consider ``e`` to also have type ``t``.
+
+* For a term ``e`` of type ``t`` (i.e., ``e : t``), ``t`` is a subtype of the
+  boolean refinement type ``x:t { p }`` whenever ``p[e / x]`` is provably
+  equal to ``true``. 
In other words, to *introduce* ``e : t`` at the
+  boolean refinement type ``x:t{ p }``, it suffices to prove that the
+  term ``p``, with ``e`` substituted for bound variable ``x``, evaluates to
+  ``true``.
+
+The elimination rule for refinement types (i.e., the first part
+above) is simple---with our intuition of types as sets, the refinement
+type ``x:t{ p }`` *refines* the set corresponding to ``t`` by the
+predicate ``p``, i.e., ``x:t{ p }`` denotes a subset of ``t``, so, of
+course, ``x:t{ p }`` is a subtype of ``t``.
+
+The other direction is a bit more subtle: ``t`` is only a subtype of
+``x:t{ p }`` for those terms ``e`` that validate ``p``. You're probably also
+wondering about how to prove that ``p[e/x]`` evaluates to ``true``---this
+:ref:`part of the tutorial` should provide some
+answers. But, the short version is that F*, by default, uses an SMT
+solver to prove such facts, though you can also use tactics and other
+techniques to do so. More information can be found
+:ref:`here`.
+
+An example
+++++++++++
+
+Given ``x:even``, consider typechecking ``x + 1 : odd``; it takes a few
+steps:
+
+1. The operator ``+`` expects both its arguments to have type ``int`` and
+   returns an ``int``.
+
+2. To prove that the first argument ``x:even`` is a valid argument for
+   ``+``, we use refinement subtyping to eliminate the refinement and
+   obtain ``x:int``. The second argument ``1:int`` already has the
+   required type. Thus, ``x + 1 : int``.
+
+3. To conclude that ``x + 1 : odd``, we need to introduce a refinement
+   type, by proving that the refinement predicate of ``odd`` evaluates
+   to true, i.e., ``(x + 1) % 2 = 1``. This is provable by SMT, since we
+   started with the knowledge that ``x`` is even.
+
+In this way, F* applies subtyping repeatedly to introduce and eliminate
+refinement types, applying it multiple times even to check a simple
+term like ``x + 1 : odd``.
+
+
+Function types or arrows
+........................
+
+Functions are the main abstraction facility of any functional language
+and their types are, correspondingly, the main specification
+construct.
+
+Total dependent functions
+^^^^^^^^^^^^^^^^^^^^^^^^^
+
+In their most basic form, function types have the shape::
+
+  x:t0 -> t1
+
+This is the type of a function that
+
+1. receives an argument ``e`` of type ``t0``, and
+
+2. always returns a value of type ``t1[e / x]``, i.e., the type of the
+   returned value depends on the argument ``e``.
+
+It's worth emphasizing how this differs from function types in other
+languages.
+
+* F*'s function types are dependent---the type of the result depends on
+  the argument. For example, we can write a function that returns a
+  ``bool`` when applied to an even number and returns a ``string`` when
+  applied to an odd number.
+
+* In F*'s core language, all functions are total, i.e., a function
+  call always terminates after consuming a finite but unbounded amount
+  of resources.
+
+.. note::
+
+   That said, on any given computer, it is possible for a function
+   call to fail to return due to resource exhaustion, e.g., running
+   out of memory. Later, as we look at :ref:`effects `, we
+   will see that F* also supports writing non-terminating functions.
+
+Some examples and common notation
++++++++++++++++++++++++++++++++++
+
+1. Functions are *curried*. Functions that take multiple arguments are
+   written as functions that take the first argument and return a
+   function that takes the next argument and so on. For instance, the
+   type of integer addition is::
+
+     val (+) : x:int -> y:int -> int
+
+2. Not all functions are dependent and the name of the argument can be
+   omitted when it is not needed. For example, here's a more concise
+   way to write the type of ``(+)``::
+
+     val (+) : int -> int -> int
+
+3. Function types can be mixed with refinement types. 
For instance, + here's the type of integer division---the refinement on the divisor + forbids division-by-zero errors:: + + val (/) : int -> (divisor:int { divisor <> 0 }) -> int + +4. Dependence between the arguments and the result type can be used to + state relationships among them. For instance, there are several + types for the function ``let incr = (fun (x:int) -> x + 1)``:: + + val incr : int -> int + val incr : x:int -> y:int{y > x} + val incr : x:int -> y:int{y = x + 1} + + The first type ``(int -> int)`` is its traditional type in languages + like OCaml. + + The second type ``(x:int -> y:int{y > x})`` states that the returned + value ``y`` is greater than the argument ``x``. + + The third type is the most precise: ``(x:int -> y:int{y = x + 1})`` + states that the result ``y`` is exactly the increment of the argument + ``x``. + +5. It's often convenient to add refinements on arguments in a + dependent function type. For instance:: + + val f : x:(x:int{ x >= 1 }) -> y:(y:int{ y > x }) -> z:int{ z > x + y } + + Since this style is so common, and it is inconvenient to have to + bind two names for the parameters ``x`` and ``y``, F* allows (and + encourages) you to write:: + + val f : x:int{ x >= 1 } -> y:int{ y > x } -> z:int{ z > x + y } + +6. To emphasize that functions in F*'s core are total functions (i.e., + they always return a result), we sometimes annotate the result type + with the effect label "``Tot``". This label is optional, but + especially as we learn about :ref:`effects `, emphasizing + that some functions have no effects via the ``Tot`` label is + useful. For example, one might typically write:: + + val f : x:int{ x >= 1 } -> y:int{ y > x } -> Tot (z:int{ z > x + y }) + + adding a ``Tot`` annotation on the last arrow, to indicate that the + function has no side effects. 
One could also write::
+
+     val f : x:int{ x >= 1 } -> Tot (y:int{ y > x } -> Tot (z:int{ z > x + y }))
+
+   adding an annotation on the intermediate arrow, though this is not
+   customary.
+
+Please refer to the section on :ref:`Implicit Arguments `,
+where we explain the full syntax of binders in function abstractions
+and types.
+
+Type: The type of types
+.........................
+
+One characteristic of F* (and many other dependently typed languages)
+is that it treats programs and their types uniformly, all within a
+single syntactic class. A type system in this style is sometimes
+called a *Pure Type System* or `PTS
+`_.
+
+In F* (as in other PTSs) types have types too, functions can take
+types as arguments and return types as results, etc. In particular,
+the type of a type is ``Type``, e.g., ``bool : Type``, ``int : Type``,
+``int -> int : Type`` etc. In fact, even ``Type`` has a type---as we'll
+see in the subsection on :ref:`universes `.
+
+Parametric polymorphism or generics
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Most modern typed languages provide a way to write programs with
+generic types. For instance, C# and Java provide generics, C++ has
+templates, and languages like OCaml and Haskell have several kinds of
+polymorphic types.
+
+In F*, writing functions that are generic or polymorphic in types
+arises naturally as a special case of dependent function types. For
+example, here's a polymorphic identity function::
+
+  let id : a:Type -> a -> a = fun a x -> x
+
+There are several things to note here:
+
+* The type of ``id`` is a dependent function type, with two
+  arguments. The first argument is ``a : Type``; the second argument is
+  a term of type ``a``; and the result also has the same type ``a``.
+
+* The definition of ``id`` is a lambda term with two arguments ``a :
+  Type`` (corresponding to the first argument type) and ``x : a``. The
+  function returns ``x``---it's an identity function on the second
+  argument.
+
+Here are some equivalent ways to write it::
+
+  let id = fun (a:Type) (x:a) -> x <: a
+  let id (a:Type) (x:a) : a = x
+
+To call ``id``, one can apply it and check its type as shown::
+
+  id bool true : bool
+  id bool false : bool
+  id int (-1) : int
+  id nat 17 : nat
+  id string "hello" : string
+  id (int -> int) (fun x -> x) 0 : int
+
+.. note::
+
+   Exercises
+
+   Try completing the following programs::
+
+     let apply : a:Type -> b:Type -> (a -> b) -> a -> b =
+     let compose : a:Type -> b:Type -> c:Type -> (b -> c) -> (a -> b) -> a -> c =
+     let twice : = fun a f x -> compose a a a f f x
+
+It's quite tedious to have to explicitly provide that first type
+argument to ``id``. Implicit arguments and type inference will help, as
+we'll see in :ref:`a later section `.
+
+
+Type inference: Basics
+......................
+.. _inference:
+
+Like many other languages in the tradition of
+`Milner's ML `_,
+type inference is a central component in F*'s design.
+
+You may be used to type inference in other languages, where one can
+leave out type annotations (e.g., on variables, or when using
+type-polymorphic (aka generic) functions) and the compiler determines
+an appropriate type based on the surrounding program context. F*'s
+type inference certainly includes such a feature, but is considerably
+more powerful. Like in other dependently typed languages, F*'s
+inference engine is based on `higher-order unification
+`_
+and can be used to infer arbitrary fragments of program text, not just
+type annotations on variables.
+
+Let's consider our simple example of the definition and use of the
+identity function again::
+
+  let id (a:Type) (x:a) : a = x
+
+  id bool true : bool
+  id bool false : bool
+  id int (-1) : int
+  id nat 17 : nat
+  id string "hello" : string
+  id (int -> int) (fun x -> x) 0 : int
+
+Instead of explicitly providing that first type argument when applying
+``id``, one could write it as follows, replacing the type arguments with
+an underscore ``_``::
+
+  id _ true : bool
+  id _ false : bool
+  id _ (-1) : int
+  id _ 17 : nat
+  id _ "hello" : string
+  id _ (fun x -> x) 0 : int
+
+The underscore symbol is a wildcard, or a hole in the program, and it's
+the job of the F* typechecker to fill in the hole.
+
+.. note::
+
+   Program holes are a very powerful concept and form the basis of
+   Meta-F*, the metaprogramming and tactics framework embedded in
+   F*---we'll see more about holes in a :ref:`later
+   section`.
+
+
+Implicit arguments
+^^^^^^^^^^^^^^^^^^
+.. _implicits:
+
+Since it's tedious to write an ``_`` everywhere, F* has a notion of
+*implicit arguments*. That is, when defining a function, one can add
+annotations to indicate that certain arguments can be omitted at call
+sites and left for the typechecker to infer automatically.
+
+For example, one could write::
+
+  let id (#a:Type) (x:a) : a = x
+
+decorating the first argument ``a`` with a ``#``, to indicate that it is
+an implicit argument. Then at call sites, one can simply write::
+
+  id true
+  id 0
+  id (fun x -> x) 0
+
+And F* will figure out instantiations for the missing first argument
+to ``id``.
+
+In some cases, it may be useful to actually provide an implicit
+argument explicitly, rather than relying on F* to pick one. For
+example, one could write the following::
+
+  id #nat 0
+  id #(x:int{x == 0}) 0
+  id #(x:int{x <> 1}) 0
+
+In each case, we provide the first argument of ``id`` explicitly, by
+preceding it with a ``#`` sign, which instructs F* to take the user's
+term rather than generating a hole and trying to fill it.
+
+Universes
+.........
+
+.. _universes:
+
+As mentioned before, every well-typed term in F* has a type, and this
+is true of the type ``Type`` itself. In some languages that are
+designed only for programming rather than both programs and proofs,
+the type of ``Type`` is itself ``Type``, a kind of circularity known
+as `impredicativity
+`_. This circularity
+leads to paradoxes and can make a logic inconsistent.
+
+As such, F*, like many other dependently typed systems, employs a
+system of *universes*. The type ``Type`` actually comes in (countably)
+infinite variants, written ``Type u#0``, ``Type u#1``, ``Type u#2``,
+etc. The ``u#i`` annotation following the ``Type`` is called a
+*universe level*, where ``Type u#i`` has type ``Type u#(i + 1)``. One
+way to think of it is that each universe level contains an entire copy
+of F*'s type system, with higher universes being large enough to
+accommodate copies of the systems available at all lower levels.
+
+This may seem a bit mind-bending at first. And, indeed, the universe
+system of F* can often be ignored, since F* will infer universe
+levels, e.g., one can just write ``Type`` instead of picking a
+specific universe level. That said, occasionally, universe constraints
+will make themselves known and prevent you from doing certain things
+that can break consistency. Nevertheless, universes are a crucial
+feature that allow F* programs to abstract over nearly all elements of
+the language (e.g., one can write functions from types to types, or
+store types within data structures) while remaining logically
+consistent.
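+
+For instance, one can ascribe universe levels explicitly. The
+following snippet (the names are arbitrary, for illustration only)
+is well-typed precisely because ``Type u#i`` has type
+``Type u#(i + 1)``::
+
+  let ty0 : Type u#1 = Type u#0
+  let ty1 : Type u#2 = Type u#1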
+
+F*'s type system is universe polymorphic, meaning that, by default, a
+definition is implicitly generalized over the universe levels it
+mentions.
+
+
+
+
+
+
+Syntax of binders
+.................
+
+Having informally introduced implicit arguments, we can now present a
+first take at the syntax of binders in F*.
+
+**Binding occurrences**: A binding occurrence ``b`` introduces a
+variable in a scope and is associated with one of several language
+constructs, including a lambda abstraction, a refinement type, a let
+binding, etc. Each binding occurrence is in one of several forms:
+
+  1. The form ``x:t``, declaring a variable ``x`` at type ``t``
+
+  2. The form ``#x:t``, indicating that the binding is for an implicit
+     argument ``x`` of type ``t``.
+
+In many cases, the type annotation on a binder can be omitted, in
+which case F* will attempt to infer it.
+
+Later, we will see additional forms of binding occurrences, including
+versions that associate attributes with binders and others with
+various forms of type-inference hints.
+
+**Introducing binders**: The syntax ``fun (b1) ... (bn) -> t``
+introduces a lambda abstraction, whereas ``b1 -> .. bn -> t`` is the
+shape of a function type.
+
+
+Decidable equality and `eqtype`
+...............................
+
+
+
+Let bindings
+............
+
+
+Inductive type definitions
+..........................
+
+.. _tuples:
+
+Discriminators
+^^^^^^^^^^^^^^
+
+Projectors
+^^^^^^^^^^
+
+Equality
+^^^^^^^^
+
+Positivity
+^^^^^^^^^^
+
+Universe constraints
+^^^^^^^^^^^^^^^^^^^^
+
+Pattern matching
+................
+
+
+Recursive definitions and termination
+.....................................
+
+
+Refinement Types
+................
+
+
+Proof irrelevance, squash types and classical logic
+...................................................
+
+
+Misc
+....
+
+
+Evaluation strategy
+^^^^^^^^^^^^^^^^^^^
+
+.. _ascriptions:
+
+Effects
+-------
+.. _effects:
+
+
+Modules and Interfaces
+----------------------
+.. _modules:
+
+.. 
toctree:: + :hidden: + :maxdepth: 1 + :caption: Contents: + +A Mental Model of the F* Typechecker +------------------------------------ +.. _mental-model:refinements: + + +Dangling + +.. _tutorial:refinements: diff --git a/doc/book/PoP-in-FStar/book/smt2_pygments.py b/doc/book/PoP-in-FStar/book/smt2_pygments.py new file mode 100644 index 00000000000..e11e8fb2426 --- /dev/null +++ b/doc/book/PoP-in-FStar/book/smt2_pygments.py @@ -0,0 +1,36 @@ +from pygments.lexer import RegexLexer, words +from pygments.token import * + +# very rough lexer; not 100% precise +class CustomLexer(RegexLexer): + name = 'SMT2' + aliases = ['smt2'] + filenames = ['*.smt2'] + keywords = ( + 'assert' , + 'declare-datatypes', + 'declare-fun' , + 'declare-sort', + 'define-fun' , + 'set-option' , + 'pattern' , + 'weight' , + 'qid' , + 'check-sat' , + 'named' , + ) + tokens = { + 'root': [ + (r' ', Text), + (r'\n', Text), + (r'\r', Text), + (r';.*\n', Comment), + (words(keywords, suffix=r'\b'), Keyword), + (r'0x[0-9a-fA-F_]+', Literal.Number), + (r'[0-9_]+', Literal.Number), + (r'[a-zA-Z_]+', Text), + (r'.', Text), + ] + } + +#class CustomFormatter: diff --git a/doc/book/PoP-in-FStar/book/static/custom.css b/doc/book/PoP-in-FStar/book/static/custom.css new file mode 100644 index 00000000000..7292bc09251 --- /dev/null +++ b/doc/book/PoP-in-FStar/book/static/custom.css @@ -0,0 +1,12 @@ +.toggle .header { + display: block; + clear: both; +} + +.toggle .header:after { + content: "Reveal ▶"; +} + +.toggle .header.open:after { + content: "Hide ▼"; +} diff --git a/doc/book/PoP-in-FStar/book/structure.rst b/doc/book/PoP-in-FStar/book/structure.rst new file mode 100644 index 00000000000..d0c5f63bc1d --- /dev/null +++ b/doc/book/PoP-in-FStar/book/structure.rst @@ -0,0 +1,201 @@ + + + +Structure of this book +====================== + +**This book is a work in progress** + +The first four parts of this book explain the main features of the +language using a variety of examples. 
You should read them
sequentially, following along with the associated code samples and
exercises. These first four parts are arranged in increasing order of
complexity---you can stop after any of them and have a working
knowledge of useful fragments of F*.

The remaining parts of the book are more loosely connected and either
provide a reference guide to the compiler and libraries, or develop
case studies that the reader can choose depending on their
interest. Of course, some of those case studies come with
prerequisites, e.g., you must have read about effects before tackling
the case study on parsers and formatters.

* Part 1: Basic Functional Programming and Proofs

The first part of this book provides a basic introduction to
programming with pure total functions, refinement types, and SMT-based
proofs, and shows how to compile and execute your first F*
program. This part of the book revises a previous online tutorial on
F* and is targeted at an audience familiar with programming, though
with no background in formal proofs. Even if you are familiar with
program proofs and dependent types, it will be useful to quickly go
through this part, since some elements are quite specific to F*.

* Part 2: Inductive Types for Data, Proofs, and Computations

We turn next to inductive type definitions, the main mechanism by
which a user can define new data types. F*'s indexed inductive types
allow one to capture useful properties of data structures, and
dependently typed functions over these indexed types can be proven to
respect several kinds of invariants. Beyond their use for data
structures, inductive data types are used at the core of F*'s logic to
model fundamental notions like equality and termination proofs, and
can also be used to model and embed other programming paradigms within
F*.


..
   Part 2: Dependently Typed Functional Programming
   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

   .. _Universes:
   ..
_TypeConversion: + + * Working with indexed data structures + - Vectors + - Red-black trees + - Merkle trees + + * Equality, type conversion, and subtyping + + .. _Classical: + + * Proof irrelevance and classical logic: prop and squash + + * More termination proofs + - Infinitely branching trees and ordinal numbers + - Lexicographic orderings and unification + + * Calculational Proofs + + * Generic programming + - Printf + - Integer overloading + - Codes for types + + * Typeclasses + + * Universes + +* Part 3: Modularity with Interfaces and Typeclasses + + +We discuss two main abstraction techniques, useful in structuring +larger developments: interfaces and typeclasses. Interfaces are a +simple information hiding mechanism built in to F*'s module +system. Typeclasses are suitable for more advanced developments, +providing more flexible abstraction patterns coupled with +custom type inference. + +* Part 4: Computational Effects + +We introduce F*'s effect system, starting with its primitive effects +for total, ghost, and divergent computations. We also provide a brief +primer on Floyd-Hoare logic and weakest precondition calculi, +connecting them to Dijkstra monads, a core concept in the design of +F*'s effect system. + +* Part 5: Tactics and Metaprogramming + +We introduce Meta-F*, the metaprogramming system included in +F*. Meta-F* can be used to automate the construction of proofs as well +as programmatically construct fragments of F* programs. There's a lot +to cover here---the material so far presents the basics of how to get +started with using Meta-F* to target specific assertions in your +program and to have their proofs be solved using a mixture of tactics +and SMT solving. + +* Under the hood: F* & SMT + +In this part of the book, we cover how F* uses the Z3 SMT solver. 
We present a brief overview of F*'s SMT encoding, paying particular
attention to F*'s use of fuel to throttle the SMT solver's unfolding
of recursive functions and inductive type definitions. We also cover a
bit of how quantifier instantiation works, how to profile Z3's
quantifier instantiation, and some strategies for how to control
proofs that are too slow because of excessive quantifier
instantiation.


.. _effects:

* Planned content

The rest of the book is still in the works, but the planned content is
the following:

  + Part 4: User-defined Effects

    - State

    - Exceptions

    - Concurrency

    - Algebraic Effects

  + Part 5: Tactics and Metaprogramming

    - Reflecting on syntax

    - Holes and proof states

    - Builtin tactics

    - Derived tactics

    - Interactive proofs

    - Custom decision procedures

    - Proofs by reflection

    - Synthesizing programs

    - Tactics for program extraction

  + Part 6: F* Libraries

  + Part 7: A User's Guide to Structuring and Maintaining F* Developments

    - The Build System
      -- Dependence Analysis
      -- Checked files
      -- Sample project

    - Using the F* editor

    - Proofs by normalization
      * Normalization steps
      * Call-by-name vs. call-by-value
      * Native execution and plugins

    - Proof Engineering
      * Building, maintaining and debugging stable proofs

    - Extraction
      * OCaml
      * F#
      * KaRaMeL
      * Partial evaluation

    - Command line options

    - A guide to various F* error messages

    - Syntax guide

    - FAQ

  + Part 8: Steel: A Concurrent Separation Logic Embedded in F*

  + Part 9: Application to High-assurance Cryptography

  + Part 10: Application to Parsers and Formatters


diff --git a/doc/book/PoP-in-FStar/book/under_the_hood/under_the_hood.rst b/doc/book/PoP-in-FStar/book/under_the_hood/under_the_hood.rst
new file mode 100644
index 00000000000..8cebff6d102
--- /dev/null
+++ b/doc/book/PoP-in-FStar/book/under_the_hood/under_the_hood.rst
@@ -0,0 +1,17 @@
.. _Under_the_hood:

##############
Under the hood
##############

In this part of the book, we'll look at some of the inner workings of
F*, things that you will eventually need to know to become an expert
user of the system. We'll cover F*'s SMT encoding, its two
normalization engines, its plugin framework, and other topics.


.. toctree::
   :maxdepth: 1
   :caption: Contents:

   uth_smt

diff --git a/doc/book/PoP-in-FStar/book/under_the_hood/uth_smt.rst b/doc/book/PoP-in-FStar/book/under_the_hood/uth_smt.rst
new file mode 100644
index 00000000000..0e39a6ad17c
--- /dev/null
+++ b/doc/book/PoP-in-FStar/book/under_the_hood/uth_smt.rst
@@ -0,0 +1,2020 @@
.. _UTH_smt:

Understanding how F* uses Z3
============================

As we have seen throughout, F* relies heavily on the Z3 SMT
(Satisfiability Modulo Theories) solver for proof automation. Often,
on standalone examples at the scale covered in earlier chapters, the
automation just works out of the box, but as one builds larger
developments, proof automation can become slower or more
unpredictable---at that stage, it becomes important to understand how
F*'s encoding to SMT works to better control proofs.
At the most abstract level, one should realize that finding proofs in
the SMT logic F* uses (first-order logic with uninterpreted functions
and arithmetic) is an undecidable problem. As such, F* and the SMT
solver rely on various heuristics and partial decision procedures,
and a solver like Z3 does a remarkable job of effectively solving the
very large problems that F* presents to it, despite the theoretical
undecidability. That said, the proof search that Z3 uses is
computationally expensive and can be quite sensitive to the choice of
heuristics and syntactic details of problem instances. If one doesn't
choose the heuristics well, a small change in a query presented to Z3
can cause it to take a different search path, perhaps causing a proof
to not be found at all, or to be found after consuming a very
different amount of resources.

Some background and resources:

  * F*'s SMT encoding uses the `SMT-LIB v2
    `_ language. We refer
    to the "SMT-LIB v2" language as SMT2.

  * Alejandro Aguirre wrote a `tech report
    `_
    describing work in progress towards formalizing F*'s SMT encoding.

  * Michal Moskal's `Programming with Triggers
    `_ describes how to pick
    triggers for quantifier instantiation and how to debug and profile
    the SMT solver, in the context of Vcc and the related Hypervisor
    Verification project.

  * Leonardo de Moura and Nikolaj Bjorner `describe how E-matching is
    implemented in Z3
    `_ (at least
    circa 2007).

A Primer on SMT2
----------------

SMT2 is a standardized input language supported by many SMT
solvers. Its syntax is based on `S-expressions
`_, inspired by languages
in the LISP family. We review some basic elements of its syntax here,
particularly the parts that are used by F*'s SMT encoding.
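A useful thing to internalize before looking at the constructs
themselves is that every SMT2 phrase is an S-expression. The Python
sketch below is a toy illustration of this (it is unrelated to the
actual parsers inside F* or Z3): it reads an SMT2-style string into
nested lists.

```python
# A toy reader for SMT2's S-expression syntax -- an illustrative
# sketch only, unrelated to the parsers inside F* or Z3.

def tokenize(text):
    # Drop ';' line comments, then split parentheses off as tokens.
    tokens = []
    for line in text.splitlines():
        line = line.split(';', 1)[0]
        tokens += line.replace('(', ' ( ').replace(')', ' ) ').split()
    return tokens

def parse(tokens):
    # Consume one S-expression from the token stream.
    tok = tokens.pop(0)
    if tok != '(':
        return tok                  # an atom
    node = []
    while tokens[0] != ')':
        node.append(parse(tokens))
    tokens.pop(0)                   # drop the closing ')'
    return node

example = "(assert (forall ((x Int)) (>= (+ x 1) x)))  ; a tautology"
print(parse(tokenize(example)))
# ['assert', ['forall', [['x', 'Int']], ['>=', ['+', 'x', '1'], 'x']]]
```

Keeping this uniform tree shape in mind makes the declarations,
assertions, and annotations below easier to read at a glance.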
+ +* Multi-sorted logic + + The logic provided by the SMT solver is multi-sorted: the sorts + provide a simple type system for the logic, ensuring, e.g., that + terms from two different sorts can never be equal. A user can define + a new sort ``T``, as shown below: + + .. code-block:: smt2 + + (declare-sort T) + + Every sort comes with a built-in notion of equality. Given two terms + ``p`` and ``q`` of the same sort ``T``, ``(= p q)`` is a term of + sort ``Bool`` expressing their equality. + + +* Declaring uninterpreted functions + + A new function symbol ``F``, with arguments in sorts + ``sort_1 .. sort_n`` and returning a result in ``sort`` is declared + as shown below, + + .. code-block:: smt2 + + (declare-fun F (sort_1 ... sort_n) sort) + + The function symbol ``F`` is *uninterpreted*, meaning that the only + information the solver has about ``F`` is that it is a function, + i.e., when applied to equal arguments ``F`` produces equal results. + +* Theory symbols + + Z3 provides support for several *theories*, notably integer and + real arithmetic. For example, on terms ``i`` and ``j`` of ``Int`` + sort, the sort of unbounded integers, the following terms define + the expected arithmetic functions: + + .. code-block:: smt2 + + (+ i j) ; addition + (- i j) ; subtraction + (* i j) ; multiplication + (div i j) ; Euclidean division + (mod i j) ; Euclidean modulus + + +* Logical connectives + + SMT2 provides basic logical connectives as shown below, where ``p`` + and ``q`` are terms of sort ``Bool`` + + .. code-block:: smt2 + + (and p q) ; conjunction + (or p q) ; disjunction + (not p) ; negation + (implies p q) ; implication + (iff p q) ; bi-implication + + + SMT2 also provides support for quantifiers, where the terms below + represent a term ``p`` with the variables ``x1 ... xn`` universally + and existentially quantified, respectively. + + + .. code-block:: smt2 + + (forall ((x1 sort_1) ... (xn sort_n)) p) + (exists ((x1 sort_1) ... 
(xn sort_n)) p)

* Attribute annotations

  A term ``p`` can be decorated with attribute names ``a_1 .. a_n``
  with values ``v_1 .. v_n`` using the following syntax---the ``!`` is
  NOT to be confused with logical negation.

  .. code-block:: smt2

     (! p
        :a_1 v_1
        ...
        :a_n v_n)

  A common usage is with quantifiers, as we'll see below, e.g.,

  .. code-block:: smt2

     (forall ((x Int))
             (! (implies (>= x 0) (f x))
                :qid some_identifier))

* An SMT2 theory and check-sat

  An SMT2 theory is a collection of sort and function symbol
  declarations, and assertions of facts about them. For example,
  here's a simple theory declaring a function symbol ``f`` and an
  assumption that ``(f x y)`` is equivalent to ``(>= y x)``---note,
  unlike in F*, the ``assert`` keyword in SMT2 assumes that a fact is
  true, rather than checking that it is valid, i.e., ``assert`` in
  SMT2 is like ``assume`` in F*.

  .. code-block:: smt2

     (declare-fun f (Int Int) Bool)

     (assert (forall ((x Int) (y Int))
                     (iff (>= y x) (f x y))))

  In the context of this theory, one can ask whether some facts about
  ``f`` are valid. For example, to check if ``f`` is a transitive
  function, one asserts the *negation* of the transitivity
  property for ``f`` and then asks Z3 to check (using the
  ``(check-sat)`` directive) if the resulting theory is satisfiable.

  .. code-block:: smt2

     (assert (not (forall ((x Int) (y Int) (z Int))
                          (implies (and (f x y) (f y z))
                                   (f x z)))))
     (check-sat)

  In this case, Z3 very quickly responds with ``unsat``, meaning that
  there are no models for the theory that contain an interpretation of
  ``f`` compatible with both assertions, or, equivalently, the
  transitivity of ``f`` is true in all models. That is, we expect
  successful queries to return ``unsat``.


A Brief Tour of F*'s SMT Encoding
---------------------------------

Consider the following simple F* code:

.. code-block:: fstar

   let id x = x
   let f (x:int) =
     if x < 0
     then assert (- (id x) >= 0)
     else assert (id x >= 0)

To encode the proof obligation of this program to SMT, F* generates an
SMT2 file with the following rough shape.

.. code-block:: smt2

   ;; Some basic scaffolding

   (declare-sort Term)
   ...

   ;; Encoding of some basic modules

   (declare-fun Prims.bool () Term)
   ...

   ;; Encoding of background facts about the current module

   (declare-fun M.id (Term) Term)
   (assert (forall ((x Term)) (= (M.id x) x)))

   ;; Encoding the query, i.e., negated proof obligation

   (assert (not (forall ((x Term))
                        (and (implies (lt x 0) (geq (minus (M.id x)) 0))
                             (implies (not (lt x 0)) (geq (M.id x) 0))))))

   (check-sat)

   ;; Followed by some instrumentation for error reporting
   ;; in case the check-sat call fails (i.e., does not return unsat)

That was just to give you a rough idea---the details of F*'s actual
SMT encoding are a bit different, as we'll see below.

To inspect F*'s SMT encoding, we'll work through several small
examples and get F* to log the SMT2 theories that it generates. For
this, we'll use the file shown below as a skeleton, starting with the
``#push-options "--log_queries"`` directive, which instructs F* to
print out its encoding to a ``.smt2`` file. The ``force_a_query``
definition at the end ensures that F* actually produces at least one
query---without it, F* sends nothing to Z3 and so prints no output in
the .smt2 file. If you run ``fstar.exe SMTEncoding.fst`` on the
command line, you will find a file ``queries-SMTEncoding.smt2`` in the
current directory.

.. literalinclude:: ../code/SMTEncoding.fst
   :language: fstar

Even for a tiny module like this, you'll see that the .smt2 file is
very large. That's because, by default, F* always includes the modules
``prims.fst``, ``FStar.Pervasives.Native.fst``, and
``FStar.Pervasives.fsti`` as dependencies of other modules. Encoding
these modules consumes about 150,000 lines of SMT2 definitions and
comments.

The encoding of each module is delimited in the .smt2 file by comments
of the following kind:

.. code-block:: smt2

   ;;; Start module Prims
   ...
   ;;; End module Prims (1334 decls; total size 431263)

   ;;; Start module FStar.Pervasives.Native
   ...
   ;;; End module FStar.Pervasives.Native (2643 decls; total size 2546449)

   ;;; Start interface FStar.Pervasives
   ...
   ;;; End interface FStar.Pervasives (2421 decls; total size 1123058)

where each ``End`` line also describes the number of declarations in
the module and its length in characters.


``Term`` sort
.............

Despite SMT2 being a multi-sorted logic, aside from the pervasive use
of the SMT sort ``Bool``, F*'s encoding to SMT (mostly) uses just a
single sort: every pure (or ghost) F* term is encoded to the SMT
solver as an instance of an uninterpreted SMT sort called
``Term``. This allows the encoding to be very general, handling F*'s
much richer type system rather than trying to map it onto the much
simpler type system of SMT sorts.


Booleans
........

One of the most primitive sorts in the SMT solver is ``Bool``, the
sort of Booleans. All the logical connectives in SMT are operations on
the ``Bool`` sort. To encode values of the F* type ``bool`` to SMT, we
use the ``Bool`` sort, but since all F* terms are encoded to the
``Term`` sort, we "box" the ``Bool`` sort to promote it to ``Term``,
using the SMT2 definitions below.

.. code-block:: smt2

   (declare-fun BoxBool (Bool) Term)
   (declare-fun BoxBool_proj_0 (Term) Bool)
   (assert (! (forall ((@u0 Bool))
                      (!
(= (BoxBool_proj_0 (BoxBool @u0)) + @u0) + :pattern ((BoxBool @u0)) + :qid projection_inverse_BoxBool_proj_0)) + :named projection_inverse_BoxBool_proj_0)) + +This declares two uninterpreted functions ``BoxBool`` and +``BoxBool_proj_0`` that go back and forth between the sorts ``Bool`` +and ``Term``. + +The axiom named ``projection_inverse_BoxBool_proj_0`` states that +``BoxBool_proj_0`` is the inverse of ``BoxBool``, or, equivalently, +that ``BoxBool`` is an injective function from ``Bool`` to +``Term``. + + +The ``qid`` is the quantifier identifier, usually equal to or derived +from the name of the assumption that includes it---qids come up when +we look at :ref:`profiling quantifier instantiation `. + +Patterns for quantifier instantiation +..................................... + +The ``projection_inverse_BoxBool_proj_0`` axiom on booleans shows our +first use of a quantified formula with a pattern, i.e., the part that +says ``:pattern ((BoxBool @u0))``. These patterns are the main +heuristic used to control the SMT solver's proof search and will +feature repeatedly in the remainder of this chapter. + +When exploring a theory, the SMT solver has a current partial model +which contains an assignment for some of the variables in a theory to +ground terms. All the terms that appear in this partial model are +called `active` terms and these active terms play a role in quantifier +instantiation. + +Each universally quantified formula in scope is a term of the form below: + +.. code-block:: smt2 + + (forall ((x1 s1) ... (xn sn)) + (! ( body ) + :pattern ((p1) ... (pm)))) + +This quantified formula is inert and only plays a role in the solver's +search once the bound variables ``x1 ... xn`` are instantiated. The +terms ``p1 ... pm`` are called patterns, and collectively, ``p1 +... pm`` must mention *all* the bound variables. 
To instantiate the
quantifier, the solver aims to find active terms ``v1 ... vm`` that
match the patterns ``p1 ... pm``, where a match involves finding a
substitution ``S`` for the bound variables ``x1 ... xn``, such that
the substituted patterns ``S(p1) ... S(pm)`` are equal to the active
terms ``v1 ... vm``. Given such a substitution, the substituted term
``S(body)`` becomes active and may refine the partial model further.

Existentially quantified formulas are dual to universally quantified
formulas. Whereas a universal formula in the *context* (i.e., in
negative position, or as a hypothesis) is inert until its pattern is
instantiated, the same holds of an existential *goal* (i.e., in
positive position). Existential quantifiers can be decorated with
patterns that trigger instantiation when matched with active terms,
just like universal quantifiers.

Returning to ``projection_inverse_BoxBool_proj_0``, what this means is
that once the solver has an active term ``BoxBool b``, it can
instantiate the quantified formula to obtain ``(= (BoxBool_proj_0
(BoxBool b)) b)``.

Integers
........

The encoding of the F* type ``int`` is similar to that of
``bool``---the primitive SMT sort ``Int`` (of unbounded mathematical
integers) is coerced to ``Term`` using the injective function
``BoxInt``.

.. code-block:: smt2

   (declare-fun BoxInt (Int) Term)
   (declare-fun BoxInt_proj_0 (Term) Int)
   (assert (! (forall ((@u0 Int))
                      (! (= (BoxInt_proj_0 (BoxInt @u0))
                            @u0)
                         :pattern ((BoxInt @u0))
                         :qid projection_inverse_BoxInt_proj_0))
              :named projection_inverse_BoxInt_proj_0))

The primitive operations on integers are encoded by unboxing the
arguments and boxing the result. For example, here's the encoding of
``Prims.(+)``, the addition operator on integers.

.. code-block:: smt2

   (declare-fun Prims.op_Addition (Term Term) Term)
   (assert (! (forall ((@x0 Term) (@x1 Term))
                      (! (= (Prims.op_Addition @x0 @x1)
                            (BoxInt (+ (BoxInt_proj_0 @x0)
                                       (BoxInt_proj_0 @x1))))
                         :pattern ((Prims.op_Addition @x0 @x1))
                         :qid primitive_Prims.op_Addition))
              :named primitive_Prims.op_Addition))

This declares an uninterpreted function ``Prims.op_Addition``, a
binary function on ``Term``, and an assumption relating it to the SMT
primitive operator from the integer arithmetic theory ``(+)``. The
pattern allows the SMT solver to instantiate this quantifier for every
active application of ``Prims.op_Addition``.

The additional boxing introduces some overhead, e.g., proving ``x + y
== y + x`` in F* amounts to proving ``Prims.op_Addition x y ==
Prims.op_Addition y x`` in SMT2. This in turn involves instantiating
quantifiers, then reasoning in the theory of linear arithmetic, and
finally using the injectivity of the ``BoxInt`` function to
conclude. However, this overhead is usually not perceptible, and the
uniformity of encoding everything to a single ``Term`` sort simplifies
many other things. Nevertheless, F* provides a few options to control
the way integers are boxed and unboxed, described :ref:`ahead
`.


Functions
.........

Consider the F* function below:

.. code-block:: fstar

   let add3 (x y z:int) : int = x + y + z

Its encoding to SMT has several elements.

First, we have a declaration of an uninterpreted ternary function
on ``Term``.

.. code-block:: smt2

   (declare-fun SMTEncoding.add3 (Term Term Term) Term)

The semantics of ``add3`` is given using the assumption below, which,
because of the pattern on the quantifier, can be interpreted as a
rewrite rule from left to right: every time the solver has
``SMTEncoding.add3 x y z`` as an active term, it can expand it to its
definition.

.. code-block:: smt2

   (assert (! (forall ((@x0 Term) (@x1 Term) (@x2 Term))
                      (! (= (SMTEncoding.add3 @x0 @x1 @x2)
                            (Prims.op_Addition (Prims.op_Addition @x0 @x1)
                                               @x2))
                         :pattern ((SMTEncoding.add3 @x0 @x1 @x2))
                         :qid equation_SMTEncoding.add3))
              :named equation_SMTEncoding.add3))

In addition to its definition, F* encodes *the type of* ``add3`` to the
solver too, as seen by the assumption below. One of the key predicates
of F*'s SMT encoding is ``HasType``, which relates a term to its
type. The assumption ``typing_SMTEncoding.add3`` encodes the typing of
the application based on the typing hypotheses on the arguments.

.. code-block:: smt2

   (assert (! (forall ((@x0 Term) (@x1 Term) (@x2 Term))
                      (! (implies (and (HasType @x0 Prims.int)
                                       (HasType @x1 Prims.int)
                                       (HasType @x2 Prims.int))
                                  (HasType (SMTEncoding.add3 @x0 @x1 @x2)
                                           Prims.int))
                         :pattern ((SMTEncoding.add3 @x0 @x1 @x2))
                         :qid typing_SMTEncoding.add3))
              :named typing_SMTEncoding.add3))

This is all we'd need to encode ``add3`` if it were never used at
higher order. However, F* treats function values just like any other
value and allows them to be passed as arguments to, or returned as
results from, other functions. The SMT logic is, however, a
first-order logic and functions like ``add3`` are not first-class
values. So, F* introduces another layer in the encoding to model
higher-order functions, but we don't cover this here.


.. _UTH_SMT_fuel:

Recursive functions and fuel
............................

Non-recursive functions are similar to macro definitions---F* just
encodes their semantics to the SMT solver as a rewrite rule. However,
recursive functions, since they could be unfolded indefinitely, are
not so simple. Let's look at the encoding of the ``factorial``
function shown below.

.. code-block:: fstar

   open FStar.Mul
   let rec factorial (n:nat) : nat =
     if n = 0 then 1
     else n * factorial (n - 1)

First, we have, as before, an uninterpreted function symbol on ``Term``
and an assumption about its typing.

.. code-block:: smt2

   (declare-fun SMTEncoding.factorial (Term) Term)

   (assert (! (forall ((@x0 Term))
                      (! (implies (HasType @x0 Prims.nat)
                                  (HasType (SMTEncoding.factorial @x0) Prims.nat))
                         :pattern ((SMTEncoding.factorial @x0))
                         :qid typing_SMTEncoding.factorial))
              :named typing_SMTEncoding.factorial))

However, to define the semantics of ``factorial`` we introduce a
second "fuel-instrumented" function symbol with an additional
parameter of ``Fuel`` sort.

.. code-block:: smt2

   (declare-fun SMTEncoding.factorial.fuel_instrumented (Fuel Term) Term)

The ``Fuel`` sort is declared at the very beginning of F*'s SMT
encoding and is a representation of unary integers, with two
constructors ``ZFuel`` (for zero) and ``SFuel f`` (for successor).

The main idea is to encode the definition of ``factorial`` guarded by
patterns that only allow unfolding the definition if the fuel argument
of ``factorial.fuel_instrumented`` is not zero, as shown below.
Further, the assumption defining the semantics of
``factorial.fuel_instrumented`` is guarded by a typing hypothesis on
the argument ``(HasType @x1 Prims.nat)``, since the recursive function
in F* is only well-founded on ``nat``, not on all terms. The
``:weight`` annotation is an SMT2 detail: setting it to zero ensures
that the SMT solver can instantiate this quantifier as often as
needed, so long as the fuel instrumentation argument is non-zero.
Notice that the equation peels off one application of ``SFuel``, so
that the quantifier cannot be instantiated indefinitely.

.. code-block:: smt2

   (assert (! (forall ((@u0 Fuel) (@x1 Term))
                      (!
(implies (HasType @x1 Prims.nat) + (= (SMTEncoding.factorial.fuel_instrumented (SFuel @u0) @x1) + (let ((@lb2 (Prims.op_Equality Prims.int @x1 (BoxInt 0)))) + (ite (= @lb2 (BoxBool true)) + (BoxInt 1) + (Prims.op_Multiply + @x1 + (SMTEncoding.factorial.fuel_instrumented + @u0 + (Prims.op_Subtraction @x1 (BoxInt 1)))))))) + :weight 0 + :pattern ((SMTEncoding.factorial.fuel_instrumented (SFuel @u0) @x1)) + :qid equation_with_fuel_SMTEncoding.factorial.fuel_instrumented)) + :named equation_with_fuel_SMTEncoding.factorial.fuel_instrumented)) + +We also need an assumption that tells the SMT solver that the fuel +argument, aside from controlling the number of unfoldings, is +semantically irrelevant. + +.. code-block:: smt2 + + (assert (! (forall ((@u0 Fuel) (@x1 Term)) + (! (= (SMTEncoding.factorial.fuel_instrumented (SFuel @u0) @x1) + (SMTEncoding.factorial.fuel_instrumented ZFuel @x1)) + :pattern ((SMTEncoding.factorial.fuel_instrumented (SFuel @u0) @x1)) + :qid @fuel_irrelevance_SMTEncoding.factorial.fuel_instrumented)) + :named @fuel_irrelevance_SMTEncoding.factorial.fuel_instrumented)) + +And, finally, we relate the original function to its fuel-instrumented +counterpart. + +.. code-block:: smt2 + + (assert (! (forall ((@x0 Term)) + (! (= (SMTEncoding.factorial @x0) + (SMTEncoding.factorial.fuel_instrumented MaxFuel @x0)) + :pattern ((SMTEncoding.factorial @x0)) + :qid @fuel_correspondence_SMTEncoding.factorial.fuel_instrumented)) + :named @fuel_correspondence_SMTEncoding.factorial.fuel_instrumented)) + +This definition uses the constant ``MaxFuel``. The value of this +constant is determined by the F* options ``--initial_fuel n`` and +``--max_fuel m``. When F* issues a query to Z3, it tries the query +repeatedly with different values of ``MaxFuel`` ranging between ``n`` +and ``m``. Additionally, the option ``--fuel n`` sets both the initial +fuel and max fuel to ``n``. 
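The effect of fuel on unfolding can be mimicked with an ordinary
program. The Python sketch below is a loose model (not F*'s actual
machinery) of ``factorial.fuel_instrumented``: each unfolding consumes
one unit of fuel, and ``None`` stands for an application the solver
cannot reduce further.

```python
# A loose model of F*'s fuel-instrumented encoding of `factorial`.
# Illustrative sketch only: `fuel` plays the role of the Fuel argument,
# and None stands for an application that stays opaque to the solver.

def factorial_fuel(fuel, n):
    if fuel == 0:
        return None                         # ZFuel: no unfolding allowed
    if n == 0:
        return 1
    rest = factorial_fuel(fuel - 1, n - 1)  # peel off one SFuel per unfolding
    return None if rest is None else n * rest

# With fuel 1, `factorial 1` unfolds to `1 * factorial 0`, but
# `factorial 0` cannot be unfolded further, so nothing is concluded.
print(factorial_fuel(1, 1))   # None
print(factorial_fuel(2, 1))   # 1
```

This mirrors why an assertion like ``factorial 1 == 1`` cannot be
discharged at ``--fuel 1``: the proof needs one more unfolding than
the fuel budget allows.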
This single value of ``MaxFuel`` controls the number of unfoldings of
*all* recursive functions in scope. Of course, the patterns are
arranged so that if you have a query involving, say, ``List.map``,
quantified assumptions about an unrelated recursive function like
``factorial`` should never trigger. Nevertheless, large values of
``MaxFuel`` greatly increase the search space for the SMT solver. If
your proof requires a setting greater than ``--fuel 2``, and if it
takes the SMT solver a long time to find the proof, then you should
think about whether things could be done differently.

However, with a low value of ``fuel``, the SMT solver cannot reason
about recursive functions beyond that bound. For instance, the
following fails: the solver can unroll the definition only once to
conclude that ``factorial 1 == 1 * factorial 0``, but, being unable
to unfold ``factorial 0`` further, the proof fails.

.. code-block:: fstar

   #push-options "--fuel 1"
   let _ = assert (factorial 1 == 1) (* fails *)

As with regular functions, the rest of the encoding of recursive
functions has to do with handling higher-order uses.

Inductive datatypes and ifuel
.............................

Inductive datatypes in F* allow defining unbounded structures and,
just like with recursive functions, F* encodes them to SMT by
instrumenting them with fuel, to prevent infinite unfoldings. Let's
look at a very simple example, an F* type of unary natural numbers.

.. code-block:: fstar

   type unat =
     | Z : unat
     | S : (prec:unat) -> unat

Although Z3 offers support for a built-in theory of datatypes, F* does
not use it (aside from ``Fuel``), since F* datatypes are more
complex. Instead, F* rolls its own datatype encoding using
uninterpreted functions, and the encoding of ``unat`` begins by
declaring these functions.

.. code-block:: smt2

   (declare-fun SMTEncoding.unat () Term)
   (declare-fun SMTEncoding.Z () Term)
   (declare-fun SMTEncoding.S (Term) Term)
   (declare-fun SMTEncoding.S_prec (Term) Term)

We have one function for the type ``unat``; one for each constructor
(``Z`` and ``S``); and one "projector" for each argument of each
constructor (here, only ``S_prec``, corresponding to the F* projector
``S?.prec``).

The type ``unat`` has its typing assumption, where ``Tm_type`` is the
SMT encoding of the F* type ``Type``---note that F* does not encode
universe levels to SMT.

.. code-block:: smt2

   (assert (! (HasType SMTEncoding.unat Tm_type)
              :named kinding_SMTEncoding.unat@tok))

The projector ``S_prec`` is assumed to be an inverse of ``S``. If
there were more than one argument to the constructor, each projector
would project out only the corresponding argument, encoding that the
constructor is injective on each of its arguments.

.. code-block:: smt2

   (assert (! (forall ((@x0 Term))
                      (! (= (SMTEncoding.S_prec (SMTEncoding.S @x0)) @x0)
                         :pattern ((SMTEncoding.S @x0))
                         :qid projection_inverse_SMTEncoding.S_prec))
              :named projection_inverse_SMTEncoding.S_prec))

The encoding defines two macros ``is-SMTEncoding.Z`` and
``is-SMTEncoding.S`` that define when the head-constructor of a term
is ``Z`` and ``S`` respectively. These two macros are used in the
definition of the inversion assumption of datatypes, namely that given
a term of type ``unat``, one can conclude that its head constructor
must be either ``Z`` or ``S``. However, since the type ``unat`` is
unbounded, we want to avoid applying this inversion indefinitely, so
it uses a quantifier with a pattern that requires non-zero fuel to
be triggered.

.. code-block:: smt2

   (assert (! (forall ((@u0 Fuel) (@x1 Term))
                      (! (implies (HasTypeFuel (SFuel @u0) @x1 SMTEncoding.unat)
                                  (or (is-SMTEncoding.Z @x1)
                                      (is-SMTEncoding.S @x1)))
                         :pattern ((HasTypeFuel (SFuel @u0) @x1 SMTEncoding.unat))
                         :qid fuel_guarded_inversion_SMTEncoding.unat))
              :named fuel_guarded_inversion_SMTEncoding.unat))

Here, we see a use of ``HasTypeFuel``, a fuel-instrumented version of
the ``HasType`` we've seen earlier. In fact, ``(HasType x t)`` is just
a macro for ``(HasTypeFuel MaxIFuel x t)``, where, much like for
recursive functions and fuel, the constant ``MaxIFuel`` is defined by
the current value of the F* options ``--initial_ifuel``,
``--max_ifuel``, and ``--ifuel`` (where ``ifuel`` stands for
"inversion fuel").

The key bit in ensuring that the inversion assumption above is not
indefinitely applied is in the structure of the typing assumptions for
the data constructors. These typing assumptions come in two forms,
introduction and elimination.

The introduction form for the ``S`` constructor is shown below. This
allows deriving that ``S x`` has type ``unat`` from the fact that
``x`` itself has type ``unat``. The pattern on the quantifier makes
this goal-directed: if ``(HasTypeFuel @u0 (SMTEncoding.S @x1)
SMTEncoding.unat)`` is already an active term, then the quantifier
fires to make ``(HasTypeFuel @u0 @x1 SMTEncoding.unat)`` an active
term, peeling off one application of the ``S`` constructor. If we
were to use ``(HasTypeFuel @u0 @x1 SMTEncoding.unat)`` as the pattern,
this would lead to an infinite quantifier instantiation loop, since
each instantiation would lead to a new, larger active term that
could instantiate the quantifier again. Note, using the introduction
form does not vary the fuel parameter, since the number of
applications of the constructor ``S`` decreases at each instantiation
anyway.

.. code-block:: smt2

   (assert (! (forall ((@u0 Fuel) (@x1 Term))
                      (!
(implies (HasTypeFuel @u0 @x1 SMTEncoding.unat) + (HasTypeFuel @u0 (SMTEncoding.S @x1) SMTEncoding.unat)) + :pattern ((HasTypeFuel @u0 (SMTEncoding.S @x1) SMTEncoding.unat)) + :qid data_typing_intro_SMTEncoding.S@tok)) + :named data_typing_intro_SMTEncoding.S@tok)) + +The elimination form allows concluding that the sub-terms of a +well-typed application of a constructor are well-typed too. This time +note that the conclusion of the rule decreases the fuel parameter by +one. If that were not the case, then we would get a quantifier +matching loop between ``data_elim_SMTEncoding.S`` and +``fuel_guarded_inversion_SMTEncoding.unat``, since each application of +the latter would contribute an active term of the form ``(HasTypeFuel +(SFuel _) (S (S_prec x)) unat)``, allowing the former to be triggered +again. + +.. code-block:: smt2 + + (assert (! (forall ((@u0 Fuel) (@x1 Term)) + (! (implies (HasTypeFuel (SFuel @u0) (SMTEncoding.S @x1) SMTEncoding.unat) + (HasTypeFuel @u0 @x1 SMTEncoding.unat)) + :pattern ((HasTypeFuel (SFuel @u0) (SMTEncoding.S @x1) SMTEncoding.unat)) + :qid data_elim_SMTEncoding.S)) + :named data_elim_SMTEncoding.S)) + +A final important element in the encoding of datatypes has to do with +the well-founded ordering used in termination proofs. The following +states that if ``S x1`` is well-typed (with non-zero fuel) then ``x1`` +precedes ``S x1`` in F*'s built-in sub-term ordering. + +.. code-block:: smt2 + + (assert (! (forall ((@u0 Fuel) (@x1 Term)) + (! (implies (HasTypeFuel (SFuel @u0) + (SMTEncoding.S @x1) + SMTEncoding.unat) + (Valid (Prims.precedes Prims.lex_t Prims.lex_t + @x1 (SMTEncoding.S @x1)))) + :pattern ((HasTypeFuel (SFuel @u0) (SMTEncoding.S @x1) SMTEncoding.unat)) + :qid subterm_ordering_SMTEncoding.S)) + :named subterm_ordering_SMTEncoding.S)) + +Once again, a lot of the rest of the datatype encoding has to do with +handling higher order uses of the constructors. 
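+
+To see the sub-term ordering at work from the F* side, here is a
+small recursive function over ``unat`` (an illustrative sketch; the
+function ``double`` is not part of the development above):
+
+.. code-block:: fstar
+
+   (* In the S branch, x' is a sub-term of the scrutinee S x', so the
+      subterm_ordering assumption above is exactly what lets the
+      solver discharge the termination obligation x' << S x' *)
+   let rec double (x:unat) : unat =
+     match x with
+     | Z -> Z
+     | S x' -> S (S (double x'))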
+
+As with recursive functions, the single value of ``MaxIFuel`` controls
+the number of inversions of all datatypes in scope. It's a good idea
+to use an ``ifuel`` setting that is as low as possible for
+your proofs, e.g., a value less than ``2``, or even ``0``, if
+possible. However, as with fuel, a value of ifuel that is too low will
+cause the solver to be unable to prove some facts. For example,
+without any ``ifuel``, the solver cannot use the inversion assumption
+to prove that the head of ``x`` must be either ``S`` or ``Z``, and F*
+reports the error "Patterns are incomplete".
+
+.. code-block:: fstar
+
+   #push-options "--ifuel 0"
+   let rec as_nat (x:unat) : nat =
+     match x with (* fails exhaustiveness check *)
+     | S x -> 1 + as_nat x (* fails termination check *)
+     | Z -> 0
+
+Sometimes it is useful to let the solver arbitrarily invert an
+inductive type. The library function
+``FStar.Pervasives.allow_inversion`` enables this, as shown below.
+Within that scope, the ifuel guards on the ``unat`` type are no
+longer imposed and SMT can invert ``unat`` freely---F* accepts the
+code below.
+
+.. code-block:: fstar
+
+   #push-options "--ifuel 0"
+   let rec as_nat (x:unat) : nat =
+     allow_inversion unat;
+     match x with
+     | S x -> 1 + as_nat x
+     | Z -> 0
+
+This can sometimes be useful: for instance, one could set the ifuel to
+0 and allow inversion within a scope for only a few selected types,
+such as ``option``. However, it is rarely a good idea to use
+``allow_inversion`` on an unbounded type (e.g., ``list`` or even
+``unat``).
+
+
+Logical Connectives
+....................
+
+The :ref:`logical connectives ` that F* offers are all
+derived forms. Given the encodings of datatypes and functions (and
+arrow types, which we haven't shown), the encodings of all these
+connectives just fall out naturally. 
However, all these connectives
+also have built-in support in the SMT solver as part of its
+propositional core and its support for E-matching-based quantifier
+instantiation. So, rather than leave them as derived forms, a vital
+optimization in F*'s SMT encoding is to recognize these connectives
+and to encode them directly to the corresponding forms in SMT.
+
+The term ``p /\ q`` in F* is encoded to ``(and [[p]] [[q]])`` where
+``[[p]]`` and ``[[q]]`` are the *logical* encodings of ``p`` and ``q``
+respectively. However, the SMT connective ``and`` is a binary function
+on the SMT sort ``Bool``, whereas all we have been describing so far
+is that every F* term ``p`` is encoded to the SMT sort ``Term``. To
+bridge the gap, the logical encoding of a term ``p`` interprets the
+``Term`` sort into ``Bool`` by using a function ``Valid``: ``(Valid p)``
+deems a ``p : Term`` to be valid if it is inhabited, as per the
+definitions below.
+
+.. code-block:: smt2
+
+   (declare-fun Valid (Term) Bool)
+   (assert (forall ((e Term) (t Term))
+            (! (implies (HasType e t) (Valid t))
+               :pattern ((HasType e t) (Valid t))
+               :qid __prelude_valid_intro)))
+
+The connectives ``p \/ q``, ``p ==> q``, ``p <==> q``, and ``~p`` are
+similar.
+
+The quantified forms ``forall`` and ``exists`` are mapped to the
+corresponding quantifiers in SMT. For example,
+
+.. code-block:: fstar
+
+   let fact_positive = forall (x:nat). factorial x >= 1
+
+is encoded to:
+
+.. code-block:: smt2
+
+   (forall ((@x1 Term))
+    (implies (HasType @x1 Prims.nat)
+             (>= (BoxInt_proj_0 (SMTEncoding.factorial @x1))
+                 (BoxInt_proj_0 (BoxInt 1)))))
+
+Note, this quantifier does not have any explicitly annotated
+patterns. In this case, Z3's syntactic trigger selection heuristics
+pick a pattern: it is usually the smallest collection of sub-terms of
+the body of the quantifier that collectively mention all the bound
+variables. 
In this case, the choices for the pattern are
+``(SMTEncoding.factorial @x1)`` and ``(HasType @x1 Prims.nat)``: Z3
+picks both of these as patterns, allowing the quantifier to be
+triggered if an active term matches either one of them.
+
+For small developments, leaving the choice of pattern to Z3 is often
+fine, but as your project scales up, you probably want to be more
+careful about your choice of patterns. F* lets you write the pattern
+explicitly on a quantifier and translates it down to SMT, as shown
+below.
+
+.. code-block:: fstar
+
+   let fact_positive_2 = forall (x:nat).{:pattern (factorial x)} factorial x >= 1
+
+This produces:
+
+.. code-block:: smt2
+
+   (forall ((@x1 Term))
+    (! (implies (HasType @x1 Prims.nat)
+                (>= (BoxInt_proj_0 (SMTEncoding.factorial @x1))
+                    (BoxInt_proj_0 (BoxInt 1))))
+       :pattern ((SMTEncoding.factorial.fuel_instrumented ZFuel @x1))))
+
+
+Note, since ``factorial`` is fuel-instrumented, the pattern is
+translated to an application that requires no fuel, so that the
+property also applies to any partial unrolling of ``factorial``.
+
+Existential formulas are similar. For example, one can write:
+
+.. code-block:: fstar
+
+   let fact_unbounded = forall (n:nat). exists (x:nat). factorial x >= n
+
+And it gets translated to:
+
+.. code-block:: smt2
+
+   (forall ((@x1 Term))
+    (implies (HasType @x1 Prims.nat)
+             (exists ((@x2 Term))
+              (and (HasType @x2 Prims.nat)
+                   (>= (BoxInt_proj_0 (SMTEncoding.factorial @x2))
+                       (BoxInt_proj_0 @x1))))))
+
+.. _z3_and_smtencoding_options:
+
+Options for Z3 and the SMT Encoding
+...................................
+
+F* provides two ways of passing options to Z3.
+
+The option ``--z3cliopt <string>`` causes F* to pass the string as a
+command-line option when starting the Z3 process. A typical usage
+might be ``--z3cliopt 'smt.random_seed=17'``. 
+
+In contrast, ``--z3smtopt <string>`` causes F* to send the string to
+Z3 as part of its SMT2 output, and this option is also reflected in the
+.smt2 file that F* emits with ``--log_queries``. As such, it can be
+more convenient to use this option if you want to debug or profile a
+run of Z3 on an .smt2 file generated by F*. A typical usage would be
+``--z3smtopt '(set-option :smt.random_seed 17)'``. Note, it is
+possible to abuse this option, e.g., one could use ``--z3smtopt
+'(assert false)'`` and all SMT queries would trivially pass. So, use
+it with care.
+
+F*'s SMT encoding also offers a few options.
+
+* ``--smtencoding.l_arith_repr native``
+
+This option requests F* to inline the definitions of the linear
+arithmetic operators (``+`` and ``-``). For example, with this option
+enabled, F* encodes the term ``x + 1 + 2`` as the SMT2 term below.
+
+.. code-block:: smt2
+
+   (BoxInt (+ (BoxInt_proj_0 (BoxInt (+ (BoxInt_proj_0 @x0)
+                                        (BoxInt_proj_0 (BoxInt 1)))))
+              (BoxInt_proj_0 (BoxInt 2))))
+
+* ``--smtencoding.elim_box true``
+
+This option, often useful in combination with
+``smtencoding.l_arith_repr native``, enables an optimization to remove
+redundant adjacent box/unbox pairs. So, adding this option to the
+example above, the encoding of ``x + 1 + 2`` becomes:
+
+.. code-block:: smt2
+
+   (BoxInt (+ (+ (BoxInt_proj_0 @x0) 1) 2))
+
+
+* ``--smtencoding.nl_arith_repr [native|wrapped|boxwrap]``
+
+This option controls the representation of non-linear arithmetic
+functions (``*, /, mod``) in the SMT encoding. The default is
+``boxwrap``, which uses the encoding of ``Prims.op_Multiply,
+Prims.op_Division, Prims.op_Modulus`` analogous to
+``Prims.op_Addition``.
+
+The ``native`` setting is similar to ``smtencoding.l_arith_repr
+native``. When used in conjunction with ``smtencoding.elim_box true``,
+the F* term ``x * 1 * 2`` is encoded to:
+
+.. 
code-block:: smt2
+
+   (BoxInt (* (* (BoxInt_proj_0 @x0) 1) 2))
+
+However, a third setting ``wrapped`` is also available, which provides
+an intermediate level of wrapping. With this setting enabled, the
+encoding of ``x * 1 * 2`` becomes:
+
+.. code-block:: smt2
+
+   (BoxInt (_mul (_mul (BoxInt_proj_0 @x0) 1) 2))
+
+where ``_mul`` is declared as shown below:
+
+.. code-block:: smt2
+
+   (declare-fun _mul (Int Int) Int)
+   (assert (forall ((x Int) (y Int)) (! (= (_mul x y) (* x y)) :pattern ((_mul x y)))))
+
+Now, you may wonder why all these settings are useful. Surely, one
+would think, ``--smtencoding.l_arith_repr native
+--smtencoding.nl_arith_repr native --smtencoding.elim_box true`` is
+the best setting. However, it turns out that the additional layers of
+wrapping and boxing actually enable some proofs to go through, and,
+empirically, no setting strictly dominates all the others.
+
+That said, the following is a good rule of thumb if you are starting a
+new project:
+
+1. Consider using ``--z3smtopt '(set-option :smt.arith.nl
+   false)'``. This entirely disables support for non-linear arithmetic
+   theory reasoning in the SMT solver, which can be very
+   expensive and unpredictable. Instead, if you need to reason about
+   non-linear arithmetic, consider using the lemmas from
+   ``FStar.Math.Lemmas`` to do the non-linear steps in your proof
+   manually. This will be more painstaking, but will lead to more
+   stable proofs.
+
+2. For linear arithmetic, the setting ``--smtencoding.l_arith_repr
+   native --smtencoding.elim_box true`` is a good one to consider, and
+   may yield some proof performance boosts over the default setting.
+
+.. _UTH_smt_patterns:
+
+Designing a Library with SMT Patterns
+-------------------------------------
+
+In this section, we look at the design of ``FStar.Set``, a module in
+the standard library, examining, in particular, its use of SMT
+patterns on lemmas for proof automation. 
The style used here is +representative of the style used in many proof-oriented +libraries---the interface of the module offers an abstract type, with +some constructors and some destructors, and lemmas that relate their +behavior. + +To start with, for our interface, we set the fuel and ifuel both to +zero---we will not need to reason about recursive functions or invert +inductive types here. + +.. literalinclude:: ../code/SimplifiedFStarSet.fsti + :language: fstar + :start-after: //SNIPPET_START: module$ + :end-before: //SNIPPET_END: module$ + +Next, we introduce the signature of the main abstract type of this +module, ``set``: + +.. literalinclude:: ../code/SimplifiedFStarSet.fsti + :language: fstar + :start-after: //SNIPPET_START: set$ + :end-before: //SNIPPET_END: set$ + +Sets offer just a single operation called ``mem`` that allows testing +whether or not a given element is in the set. + +.. literalinclude:: ../code/SimplifiedFStarSet.fsti + :language: fstar + :start-after: //SNIPPET_START: destructor$ + :end-before: //SNIPPET_END: destructor$ + +However, there are several ways to construct sets: + +.. literalinclude:: ../code/SimplifiedFStarSet.fsti + :language: fstar + :start-after: //SNIPPET_START: constructors$ + :end-before: //SNIPPET_END: constructors$ + +Finally, sets are equipped with a custom equivalence relation: + +.. literalinclude:: ../code/SimplifiedFStarSet.fsti + :language: fstar + :start-after: //SNIPPET_START: equal$ + :end-before: //SNIPPET_END: equal$ + +The rest of our module offers lemmas that describe the behavior of +``mem`` when applied to each of the constructors. + +.. literalinclude:: ../code/SimplifiedFStarSet.fsti + :language: fstar + :start-after: //SNIPPET_START: core_properties$ + :end-before: //SNIPPET_END: core_properties$ + +Each of these lemmas should be intuitive and familiar. The extra bit +to pay attention to is the ``SMTPat`` annotations on each of the +lemmas. 
These annotations instruct F*'s SMT encoding to treat the +lemma like a universal quantifier guarded by the user-provided +pattern. For instance, the lemma ``mem_empty`` is encoded to the SMT +solver as shown below. + +.. code-block:: smt2 + + (assert (! (forall ((@x0 Term) (@x1 Term)) + (! (implies (and (HasType @x0 Prims.eqtype) + (HasType @x1 @x0)) + (not (BoxBool_proj_0 + (SimplifiedFStarSet.mem @x0 + @x1 + (SimplifiedFStarSet.empty @x0))))) + :pattern ((SimplifiedFStarSet.mem @x0 + @x1 + (SimplifiedFStarSet.empty @x0))) + :qid lemma_SimplifiedFStarSet.mem_empty)) + :named lemma_SimplifiedFStarSet.mem_empty)) + +That is, from the perspective of the SMT encoding, the statement of +the lemma ``mem_empty`` is analogous to the following assumption: + +.. code-block:: fstar + + forall (a:eqtype) (x:a). {:pattern (mem x empty)} not (mem x empty) + + +As such, lemmas decorated with SMT patterns allow the user to inject +new, quantified hypotheses into the solver's context, where each of +those hypotheses is justified by a proof in F* of the corresponding +lemma. This allows users of the ``FStar.Set`` library to treat ``set`` +almost like a new built-in type, with proof automation to reason about +its operations. However, making this work well requires some careful +design of the patterns. + +Consider ``mem_union``: the pattern chosen above allows the solver to +decompose an active term ``mem x (union s1 s2)`` into ``mem x s1`` and +``mem x s2``, where both terms are smaller than the term we started +with. Suppose instead that we had written: + +.. code-block:: fstar + + val mem_union (#a:eqtype) (x:a) (s1 s2:set a) + : Lemma + (ensures (mem x (union s1 s2) == (mem x s1 || mem x s2))) + [SMTPat (mem x s1); SMTPat (mem x s2)] + +This translates to an SMT quantifier whose patterns are the pair of +terms ``mem x s1`` and ``mem x s2``. 
This choice of pattern would
+allow the solver to instantiate the quantifier with all pairs of
+active terms of the form ``mem x s``, creating more active terms that
+are themselves matching candidates. To be explicit, with a single
+active term ``mem x s``, the solver would derive ``mem x (union s
+s)``, ``mem x (union s (union s s))``, and so on. This is called a
+matching loop and can be disastrous for solver performance. So,
+carefully choosing the patterns on quantifiers and lemmas with
+``SMTPat`` annotations is important.
+
+Finally, to complete our interface, we provide two lemmas to
+characterize ``equal``, the equivalence relation on sets. The first
+says that sets that agree on the ``mem`` function are ``equal``, and
+the second says that ``equal`` sets are provably equal ``(==)``; the
+patterns allow the solver to convert reasoning about equality into
+membership and provable equality.
+
+.. literalinclude:: ../code/SimplifiedFStarSet.fsti
+   :language: fstar
+   :start-after: //SNIPPET_START: equal_intro_elim$
+   :end-before: //SNIPPET_END: equal_intro_elim$
+
+Of course, all these lemmas can be easily proven by F* under a
+suitable representation of the abstract type ``set``, as shown in the
+module implementation below.
+
+.. literalinclude:: ../code/SimplifiedFStarSet.fst
+   :language: fstar
+   :start-after: //SNIPPET_START: SimplifiedFStarSet.Impl$
+   :end-before: //SNIPPET_END: SimplifiedFStarSet.Impl$
+
+Exercise
+........
+
+Extend the set library with another constructor with the signature
+shown below:
+
+.. code-block:: fstar
+
+   val from_fun (#a:eqtype) (f: a -> bool) : Tot (set a)
+
+and prove a lemma that shows that an element ``x`` is in ``from_fun
+f`` if and only if ``f x = true``, decorating the lemma with the
+appropriate SMT pattern.
+
+This `interface file <../code/SimplifiedFStarSet.fsti>`_ and its
+`implementation <../code/SimplifiedFStarSet.fst>`_ provide the
+definitions you need.
+
+.. container:: toggle
+
+   .. 
container:: header + + **Answer** + + Look at `FStar.Set.intension `_ if you get stuck + +-------------------------------------------------------------------------------- + +.. _Profiling_z3: + +Profiling Z3 and Solving Proof Performance Issues +------------------------------------------------- + +At some point, you will write F* programs where proofs start to take +much longer than you'd like: simple proofs fail to go through, or +proofs that were once working start to fail as you make small changes +to your program. Hopefully, you notice this early in your project and +can try to figure out how to make it better before slogging through +slow and unpredictable proofs. Contrary to the wisdom one often +receives in software engineering where early optimization is +discouraged, when developing proof-oriented libraries, it's wise to +pay attention to proof performance issues as soon as they come up, +otherwise you'll find that as you scale up further, proofs become so +slow or brittle that your productivity decreases rapidly. + +Query Statistics +................ + +Your first tool to start diagnosing solver performance is F*'s +``--query_stats`` option. We'll start with some very simple artificial +examples. + +With the options below, F* outputs the following statistics: + + +.. code-block:: fstar + + #push-options "--initial_fuel 0 --max_fuel 4 --ifuel 0 --query_stats" + let _ = assert (factorial 3 == 6) + + +.. 
code-block:: none + + ((20,0-20,49)) Query-stats (SMTEncoding._test_query_stats, 1) failed + {reason-unknown=unknown because (incomplete quantifiers)} in 31 milliseconds + with fuel 0 and ifuel 0 and rlimit 2723280 + statistics={mk-bool-var=7065 del-clause=242 num-checks=3 conflicts=5 + binary-propagations=42 arith-fixed-eqs=4 arith-pseudo-nonlinear=1 + propagations=10287 arith-assert-upper=21 arith-assert-lower=18 + decisions=11 datatype-occurs-check=2 rlimit-count=2084689 + arith-offset-eqs=2 quant-instantiations=208 mk-clause=3786 + minimized-lits=3 memory=21.41 arith-pivots=6 max-generation=5 + arith-conflicts=3 time=0.03 num-allocs=132027456 datatype-accessor-ax=3 + max-memory=21.68 final-checks=2 arith-eq-adapter=15 added-eqs=711} + + ((20,0-20,49)) Query-stats (SMTEncoding._test_query_stats, 1) failed + {reason-unknown=unknown because (incomplete quantifiers)} in 47 milliseconds + with fuel 2 and ifuel 0 and rlimit 2723280 + statistics={mk-bool-var=7354 del-clause=350 arith-max-min=10 interface-eqs=3 + num-checks=4 conflicts=8 binary-propagations=56 arith-fixed-eqs=17 + arith-pseudo-nonlinear=3 arith-bound-prop=2 propagations=13767 + arith-assert-upper=46 arith-assert-lower=40 decisions=25 + datatype-occurs-check=5 rlimit-count=2107946 arith-offset-eqs=6 + quant-instantiations=326 mk-clause=4005 minimized-lits=4 + memory=21.51 arith-pivots=20 max-generation=5 arith-add-rows=34 + arith-conflicts=4 time=0.05 num-allocs=143036410 datatype-accessor-ax=5 + max-memory=21.78 final-checks=6 arith-eq-adapter=31 added-eqs=1053} + + ((20,0-20,49)) Query-stats (SMTEncoding._test_query_stats, 1) succeeded + in 48 milliseconds with fuel 4 and ifuel 0 and rlimit 2723280 + statistics={arith-max-min=26 num-checks=5 binary-propagations=70 arith-fixed-eqs=47 + arith-assert-upper=78 arith-assert-lower=71 decisions=40 + rlimit-count=2130332 max-generation=5 arith-nonlinear-bounds=2 + time=0.05 max-memory=21.78 arith-eq-adapter=53 added-eqs=1517 + mk-bool-var=7805 del-clause=805 
interface-eqs=3 conflicts=16
+            arith-pseudo-nonlinear=6 arith-bound-prop=4 propagations=17271
+            datatype-occurs-check=5 arith-offset-eqs=20 quant-instantiations=481
+            mk-clause=4286 minimized-lits=38 memory=21.23 arith-pivots=65
+            arith-add-rows=114 arith-conflicts=5 num-allocs=149004462
+            datatype-accessor-ax=9 final-checks=7}
+
+There's a lot of information here:
+
+* We see three lines of output, each tagged with a source location and
+  an internal query identifier (``(SMTEncoding._test_query_stats, 1)``,
+  the first query for verifying ``_test_query_stats``).
+
+* The first two attempts at the query failed, with Z3 reporting the
+  reason for failure as ``unknown because (incomplete
+  quantifiers)``. This is a common response from Z3 when it fails to
+  prove a query---since first-order logic is undecidable, when Z3
+  fails to find a proof, it reports "unknown" rather than claiming
+  that the theory is satisfiable. The third attempt succeeded.
+
+* The attempts used ``0``, ``2``, and ``4`` units of fuel. Notice that
+  our query was ``factorial 3 == 6`` and this clearly requires at
+  least 4 units of fuel to succeed. In this case it didn't matter
+  much, since the two failed attempts took only ``31`` and ``47``
+  milliseconds. But, you may sometimes find that there are many
+  attempts at a proof with low fuel settings and finally success with
+  a higher fuel number. In such cases, you may try to find ways to
+  rewrite your proof so that you are not relying on so many unrollings
+  (if possible), or if you decide that you really need that much fuel,
+  then setting the ``--fuel`` option to that value can help avoid
+  several slow failures and retries.
+
+* The rest of the statistics report internal Z3 statistics.
+
+  - The ``rlimit`` value is a logical resource limit that F* sets when
+    calling Z3. Sometimes, as we will see shortly, a proof can be
+    "cancelled" in case Z3 runs past this resource limit. 
You can
+    increase the ``rlimit`` in this case, as we'll see below.
+
+  - Of the remaining statistics, perhaps the main one of interest is
+    ``quant-instantiations``. This records a cumulative total of
+    quantifiers instantiated by Z3 so far in the current
+    session---here, each attempt seems to instantiate around 100--200
+    quantifiers. This is a very low number, since the query is so
+    simple. You may be wondering why it is even as many as that, since
+    4 unfoldings of ``factorial`` suffice, but remember that there are many
+    other quantifiers involved in the encoding, e.g., those that prove
+    that ``BoxBool`` is injective etc. A more typical query will see
+    quantifier instantiations in the few thousands.
+
+
+.. note::
+
+   Note, since the ``quant-instantiations`` metric is cumulative, it
+   is often useful to precede a query with something like the following:
+
+   .. code-block:: fstar
+
+      #push-options "--initial_fuel 0 --max_fuel 4 --ifuel 0 --query_stats"
+      #restart-solver
+      let _dummy = assert (factorial 0 == 1)
+
+      let _test_query_stats = assert (factorial 3 == 6)
+
+   The ``#restart-solver`` creates a fresh Z3 process and the
+   ``_dummy`` query "warms up" the process by feeding it a trivial
+   query, which will run somewhat slowly because of various
+   initialization costs in the solver. Then, the query stats reported
+   for the real test subject start in this fresh, warmed-up session.
+
+Working through a slow proof
+............................
+
+Even a single poorly chosen quantified assumption in the prover's
+context can make an otherwise simple proof take very long. To
+illustrate, consider the following variation on our example above:
+
+.. code-block:: fstar
+
+   assume Factorial_unbounded: forall (x:nat). exists (y:nat). factorial y > x
+
+   #push-options "--fuel 4 --ifuel 0 --query_stats"
+   #restart-solver
+   let _test_query_stats = assert (factorial 3 == 6)
+
+We've now introduced the assumption ``Factorial_unbounded`` into our
+context. 
Recall the SMT encoding of quantified formulas: from the
+SMT solver's perspective, this assumption looks like the following:
+
+.. code-block:: smt2
+
+   (assert (! (forall ((@x0 Term))
+               (! (implies (HasType @x0 Prims.nat)
+                           (exists ((@x1 Term))
+                            (! (and (HasType @x1 Prims.nat)
+                                    (> (BoxInt_proj_0 (SMTEncoding.factorial @x1))
+                                       (BoxInt_proj_0 @x0)))
+                               :qid assumption_SMTEncoding.Factorial_unbounded.1)))
+                  :qid assumption_SMTEncoding.Factorial_unbounded))
+              :named assumption_SMTEncoding.Factorial_unbounded))
+
+
+This quantifier has no explicit patterns, but Z3 picks the term
+``(HasType @x0 Prims.nat)`` as the pattern for the ``forall``
+quantifier. This means that it can instantiate the quantifier for
+active terms of type ``nat``. But a single instantiation of the
+quantifier yields the existentially quantified formula. Existentials
+are immediately `skolemized
+`_ by Z3, i.e., the
+existentially bound variable is replaced by a fresh function symbol
+that depends on all the variables in scope. So, a fresh term ``a @x0``
+corresponding to ``@x1`` is introduced, and immediately, the conjunct
+``HasType (a @x0) Prims.nat`` becomes an active term and can be used to
+instantiate the outer universal quantifier again. This "matching loop"
+sends the solver into a long, fruitless search, and the simple proof
+that ``factorial 3 == 6``, which previously succeeded in a few
+milliseconds, now fails. Here are the query stats:
+
+
+.. code-block:: none
+
+   ((18,0-18,49)) Query-stats (SMTEncoding._test_query_stats, 1) failed
+    {reason-unknown=unknown because canceled} in 5647 milliseconds
+    with fuel 4 and ifuel 0 and rlimit 2723280
+    statistics={ ... quant-instantiations=57046 ... }
+
+A few things to notice:
+
+ * The failure reason is "unknown because canceled". That means the
+   solver reached its resource limit and halted the proof
+   search. Usually, when you see "canceled" as the reason, you could
+   try raising the rlimit, as we'll see shortly. 
+
+ * The failure took 5.6 seconds.
+
+ * There were 57k quantifier instantiations, as compared to just the
+   100 or so we had earlier. We'll soon see how to pinpoint which
+   quantifiers were instantiated too much.
+
+Increasing the rlimit
+~~~~~~~~~~~~~~~~~~~~~
+
+We can first retry the proof by giving Z3 more resources---the
+directive below doubles the resource limit given to Z3.
+
+.. code-block:: fstar
+
+   #push-options "--z3rlimit_factor 2"
+
+This time it took 14 seconds and failed. But if you try the same proof
+a second time, it succeeds. That's not very satisfying.
+
+Repeating Proofs with Quake
+~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Although this is an artificial example, unstable proofs that work and
+then suddenly fail do happen. Z3 does guarantee that it is
+deterministic in a very strict sense, but even the smallest change to
+the input, e.g., a change in variable names, or even asking the same
+query twice in succession in the same Z3 session, can result in
+different answers.
+
+There is often a deeper root cause (in our case, it's the
+``Factorial_unbounded`` assumption, of course), but a first attempt at
+determining whether or not a proof is "flaky" is to use the F* option
+``--quake``.
+
+.. code-block:: fstar
+
+   #push-options "--quake 5/k"
+   let _test_query_stats = assert (factorial 3 == 6)
+
+This tries the query 5 times and reports the number of successes and
+failures.
+
+In this case, F* reports the following:
+
+.. code-block:: none
+
+   Quake: query (SMTEncoding._test_query_stats, 1) succeeded 4/5 times (best fuel=4, best ifuel=0)
+
+If you're working to stabilize a proof, a good criterion is to see if
+you can get the proof to go through with the ``--quake`` option.
+
+You can also try the proof by varying Z3's random seed and
+checking that it works with several choices of the seed.
+
+.. 
code-block:: none
+
+   #push-options "--z3smtopt '(set-option :smt.random_seed 1)'"
+
+
+Profiling Quantifier Instantiation
+..................................
+
+We have a query that's taking much longer than we'd like and from the
+query-stats we see that there are a lot of quantifier instances. Now,
+let's see how to pin down which quantifier is to blame.
+
+
+  1. Get F* to log an .smt2 file, by adding the ``--log_queries``
+     option. It's important to also add a ``#restart-solver``
+     just before the definition that you're interested in
+     profiling.
+
+     .. code-block:: fstar
+
+        #push-options "--fuel 4 --ifuel 0 --query_stats --log_queries --z3rlimit_factor 2"
+        #restart-solver
+        let _test_query_stats = assert (factorial 3 == 6)
+
+     F* reports the name of the file that it wrote as part of the
+     query-stats. For example:
+
+     .. code-block:: none
+
+        ((18,0-18,49)@queries-SMTEncoding-7.smt2) Query-stats ...
+
+  2. Now, from a terminal, run Z3 on this generated .smt2 file,
+     passing it the following option, and save the output to a
+     file.
+
+     .. code-block:: none
+
+        z3 queries-SMTEncoding-7.smt2 smt.qi.profile=true > sample_qiprofile
+
+  3. The output contains several lines that begin with
+     ``[quantifier_instances]``, which is what we're interested in. We
+     can extract these lines and sort them by the number of instances:
+
+     .. code-block:: none
+
+        grep quantifier_instances sample_qiprofile | sort -k 4 -n
+
+     The last few lines of output look like this:
+
+     .. 
code-block:: none
+
+        [quantifier_instances] bool_inversion : 352 : 10 : 11
+        [quantifier_instances] bool_typing : 720 : 10 : 11
+        [quantifier_instances] constructor_distinct_BoxBool : 720 : 10 : 11
+        [quantifier_instances] projection_inverse_BoxBool_proj_0 : 1772 : 10 : 11
+        [quantifier_instances] primitive_Prims.op_Equality : 2873 : 10 : 11
+        [quantifier_instances] int_typing : 3168 : 10 : 11
+        [quantifier_instances] constructor_distinct_BoxInt : 3812 : 10 : 11
+        [quantifier_instances] typing_SMTEncoding.factorial : 5490 : 10 : 11
+        [quantifier_instances] int_inversion : 5506 : 11 : 12
+        [quantifier_instances] @fuel_correspondence_SMTEncoding.factorial.fuel_instrumented : 5746 : 10 : 11
+        [quantifier_instances] Prims_pretyping_ae567c2fb75be05905677af440075565 : 5835 : 11 : 12
+        [quantifier_instances] projection_inverse_BoxInt_proj_0 : 6337 : 10 : 11
+        [quantifier_instances] primitive_Prims.op_Multiply : 6394 : 10 : 11
+        [quantifier_instances] primitive_Prims.op_Subtraction : 6394 : 10 : 11
+        [quantifier_instances] token_correspondence_SMTEncoding.factorial.fuel_instrumented : 7629 : 10 : 11
+        [quantifier_instances] @fuel_irrelevance_SMTEncoding.factorial.fuel_instrumented : 9249 : 10 : 11
+        [quantifier_instances] equation_with_fuel_SMTEncoding.factorial.fuel_instrumented : 13185 : 10 : 10
+        [quantifier_instances] refinement_interpretation_Tm_refine_542f9d4f129664613f2483a6c88bc7c2 : 15346 : 10 : 11
+        [quantifier_instances] assumption_SMTEncoding.Factorial_unbounded : 15890 : 10 : 11
+
+
+     Each line is of the form:
+
+     .. code-block:: none
+
+        qid : number of instances : max generation : max cost
+
+     where,
+
+       * qid is the identifier of the quantifier in the .smt2 file
+
+       * the number of instances is the count of times the quantifier
+         was instantiated, which is the number we're most interested in
+
+       * the generation and cost are other internal measures, which
+         Nikolaj Bjorner explains `here
+         `_
+
+  4. 
Interpreting the results
+
+     Clearly, as expected, ``assumption_SMTEncoding.Factorial_unbounded`` is
+     instantiated the most.
+
+     Next, if you search in the .smt2 file for ":qid
+     refinement_interpretation_Tm_refine_542f9d4f129664613f2483a6c88bc7c2",
+     you'll find the assumption that gives an interpretation to the
+     ``HasType x Prims.nat`` predicate, where each instantiation of
+     ``Factorial_unbounded`` yields another instance of this fact.
+
+     Notice that
+     ``equation_with_fuel_SMTEncoding.factorial.fuel_instrumented``
+     is also instantiated a lot. This is because, aside from the
+     matching loop due to ``HasType x Prims.nat``, each
+     instantiation of ``Factorial_unbounded`` also yields an
+     occurrence of ``factorial`` as a new active term, which the
+     solver then unrolls up to four times.
+
+     We also see instantiations of quantifiers in ``Prims`` and other
+     basic facts like ``int_inversion``, ``bool_typing`` etc.
+     Sometimes, you may even find that these quantifiers fire the
+     most. However, these quantifiers are inherent to F*'s SMT
+     encoding: there's not much you can do about it as a user. They
+     are usually also not to blame for a slow proof---they fire a lot
+     when other terms are instantiated too much. You should try to
+     identify other quantifiers in your code or libraries that fire
+     a lot and try to understand the root cause of that.
+
+Z3 Axiom Profiler
+~~~~~~~~~~~~~~~~~
+
+The `Z3 Axiom Profiler
+`_ can also be used to
+find more detailed information about quantifier instantiation,
+including which terms were used for instantiation, dependences among
+the quantifiers in the form of instantiation chains, etc.
+
+However, there seem to be `some issues
+`_ with
+using it at the moment with Z3 logs generated from F*.
+
+
+.. _Splitting_queries:
+
+Splitting Queries
+.................
+
+In the next two sections, we look at a small example that Alex Rozanov
+reported, shown below. It exhibits similar proof problems to our
+artificial example with factorial. 
Instead of just identifying the +problematic quantifier, we look at how to remedy the performance +problem by revising the proof to be less reliant on Z3 quantifier +instantiation. + +.. literalinclude:: ../code/Alex.fst + :language: fstar + +The hypothesis is that ``unbounded f`` has exactly the same problem as +our unbounded hypothesis on factorial---the ``forall/exists`` +quantifier contains a matching loop. + +The proof of ``find_above_for_g`` succeeds, but it takes a while and +F* reports: + +.. code-block:: none + + (Warning 349) The verification condition succeeded after splitting + it to localize potential errors, although the original non-split + verification condition failed. If you want to rely on splitting + queries for verifying your program please use the '--split_queries + always' option rather than relying on it implicitly. + +By default, F* collects all the proof obligations in a top-level F* +definition and presents them to Z3 in a single query with several +conjuncts. Usually, this allows Z3 to efficiently solve all the +conjuncts together, e.g., the proof search for one conjunct may yield +clauses useful to complete the search for other conjuncts. However, +sometimes, the converse can be true: the proof searches for separate +conjuncts can interfere with each other negatively, causing the +entire proof to fail even when every conjunct may be provable if tried +separately. Additionally, when F* calls Z3, it applies the current +rlimit setting for every query. If a query contains N conjuncts, +splitting it into N separate queries is effectively an +rlimit multiplier, since each query can separately consume as many +resources as the current rlimit allows. + +If the single query with several conjuncts fails without Z3 reporting +any further information that F* can reconstruct into a localized error +message, F* splits the query into its conjuncts and tries each of +them in isolation, so as to isolate the failing conjunct, if +any.
However, sometimes, when tried in this mode, the proof of all +conjuncts can succeed. + +One way to respond to Warning 349 is to follow what it says and enable +``--split_queries always`` explicitly, at least for the program fragment in +question. This can sometimes stabilize a previously unstable +proof. However, it may also end up deferring an underlying +proof-performance problem. Besides, even putting stability aside, +splitting queries into their conjuncts results in somewhat slower +proofs. + +.. _UTH_opaque_to_smt: + +Taking Control of Quantifier Instantiations with Opaque Definitions +................................................................... + +Here is a revision of Alex's program that addresses the quantifier +instantiation problem. There are a few elements to the solution. + +.. literalinclude:: ../code/AlexOpaque.fst + :language: fstar + :start-after: //SNIPPET_START: opaque$ + :end-before: //SNIPPET_END: opaque$ + +1. Marking definitions as opaque + + The attribute ``[@@"opaque_to_smt"]`` on the definition of + ``unbounded`` instructs F* not to encode that definition to the SMT + solver. So, the problematic alternating quantifier is no longer + in the global scope. + +2. Selectively revealing the definition within a scope + + Of course, we still want to reason about the ``unbounded`` + predicate. So, we provide a lemma, ``instantiate_unbounded``, that + allows the caller to explicitly instantiate the assumption + that ``f`` is unbounded at some lower bound ``m``. + + To prove the lemma, we use ``FStar.Pervasives.reveal_opaque``: + its first argument is the name of a symbol that should be + revealed; its second argument is a term in which that definition + should be revealed. In this case, it proves that ``unbounded f`` + is equal to ``forall m. exists n. abs (f n) > m``. + + With this fact available in the local scope, Z3 can prove the + lemma.
You want to use ``reveal_opaque`` carefully, since once + it is revealed, Z3 has the problematic alternating quantifier + in scope and could go into a matching loop. But, here, since the + conclusion of the lemma is exactly the body of the quantifier, Z3 + quickly completes the proof. If even this proves to be + problematic, then you may have to resort to tactics. + +3. Explicitly instantiate where needed + + Now, with our instantiation lemma in hand, we can precisely + instantiate the unboundedness hypothesis on ``f`` as needed. + + In the proof, there are two instantiations, at ``m`` and ``m1``. + + Note, we are still relying on some non-trivial quantifier + instantiation by Z3. Notably, the two assertions are important to + instantiate the existential quantifier in the ``returns`` + clause. We'll look at that in more detail shortly. + + But, by making the problematic definition opaque and instantiating + it explicitly, our performance problem is gone---here's what + ``--query_stats`` shows now. + + .. code-block:: none + + ((18,2-31,5)) Query-stats (AlexOpaque.find_above_for_g, 1) + succeeded in 46 milliseconds + +This `wiki page +`_ +provides more information on selectively revealing opaque definitions. + +Other Ways to Explicitly Trigger Quantifiers +............................................ + +For completeness, we look at some other ways to control quantifier +instantiation. + +.. _Artificial_triggers: + +An Artificial Trigger +~~~~~~~~~~~~~~~~~~~~~ + +Instead of making the definition of ``unbounded`` opaque, we could +protect the universal quantifier with a pattern using some symbol +reserved for this purpose, as shown below. + +.. literalinclude:: ../code/AlexOpaque.fst + :language: fstar + :start-after: //SNIPPET_START: trigger$ + :end-before: //SNIPPET_END: trigger$ + + +1. We define a new function ``trigger x`` that is trivially true. + +2.
In ``unbounded_alt`` we decorate the universal quantifier with an + explicit pattern, ``{:pattern (trigger x)}``. The pattern is not + semantically relevant---it's only there to control how the + quantifier is instantiated. + +3. In ``find_above_for_gg``, whenever we want to instantiate the + quantifier with a particular lower bound ``k``, we assert ``trigger + k``. That gives Z3 an active term that mentions ``trigger``, which + it then uses to instantiate the quantifier with our choice of + ``k``. + +This style is not particularly pleasant, because it involves polluting +our definitions with semantically irrelevant triggers. The style of +selectively revealing opaque definitions is much preferred. However, +artificial triggers can sometimes be useful. + +Existential quantifiers +~~~~~~~~~~~~~~~~~~~~~~~ + +We have an existential formula in the goal ``exists (i:nat). abs(g i) +> m`` and Z3 will try to solve this by finding an active term to +instantiate ``i``. In this case, the patterns Z3 picks are ``(g i)`` as +well as the predicate ``(HasType i Prims.nat)``, which the SMT encoding +introduces. Note, F* does not currently allow the existential +quantifier in a ``returns`` annotation to be decorated with a +pattern---that will likely change in the future. + +Since ``g i`` is one of the patterns, by asserting ``abs (g (n - 1)) > +m`` in one branch, and ``abs (g (n1 - 1)) > m`` in the other, Z3 has +the terms it needs to instantiate the quantifier with ``n - 1`` in one +case, and ``n1 - 1`` in the other case. + +In fact, any assertion that mentions ``g (n - 1)`` and +``g (n1 - 1)`` will do, even trivial ones, as the example below shows. + +.. literalinclude:: ../code/AlexOpaque.fst + :language: fstar + :start-after: //SNIPPET_START: trigger_exists$ + :end-before: //SNIPPET_END: trigger_exists$ + +We assert ``trigger (g (n - 1))`` and ``trigger (g (n1 - 1))``; this +gives Z3 active terms for ``g (n - 1)`` and ``g (n1 - 1)``, which +suffice for the instantiation.
Note, asserting ``trigger (n - 1)`` is +not enough, since that doesn't mention ``g``. + +However, recall that there's a second pattern that's also applicable, +``(HasType i Prims.nat)``: we can get Z3 to instantiate the quantifier +if we can inject the predicate ``(HasType (n - 1) nat)`` into Z3's +context. Using ``trigger_nat``, as shown below, does the trick, +since F* inserts a proof obligation to show that the argument ``x`` in +``trigger_nat x`` validates ``(HasType x Prims.nat)``. + +.. literalinclude:: ../code/AlexOpaque.fst + :language: fstar + :start-after: //SNIPPET_START: trigger_nat$ + :end-before: //SNIPPET_END: trigger_nat$ + +Of course, rather than relying on implicitly chosen triggers for the +existentials, one can be explicit about it and provide the instance +directly, as shown below, where the ``introduce exists ...`` in each +branch directly provides the witness rather than relying on Z3 to find +it. Where possible, this style is much preferred to relying +on various implicitly chosen patterns and artificial triggers. + +.. literalinclude:: ../code/AlexOpaque.fst + :language: fstar + :start-after: //SNIPPET_START: explicit_exists$ + :end-before: //SNIPPET_END: explicit_exists$ + +Here is `a link to the full file <../code/AlexOpaque.fst>`_ with +all the variations we have explored. + +Overhead due to a Large Context +............................... + +Consider the following program: + +.. literalinclude:: ../code/ContextPollution.fst + :language: fstar + :start-after: //SNIPPET_START: context_test1$ + :end-before: //SNIPPET_END: context_test1$ + +The lemma ``test1`` is a simple property about ``FStar.Seq``, but the +lemma occurs in a module that also depends on a large number of other +modules---in this case, about 177 modules from the F* standard +library. All those modules are encoded to the SMT solver, producing +about 11MB of SMT2 definitions with nearly 20,000 assertions for the +solver to process.
This makes for a large search space for the solver +to explore to find a proof. However, most of those assertions are +quantified formulas guarded by patterns, and they remain inert unless +some active term triggers them. Nevertheless, all these definitions +impose a noticeable overhead on the solver. If you turn +``--query_stats`` on (after a single warm-up query), it takes Z3 about +300 milliseconds (and about 3000 quantifier instantiations) to find a +proof for ``test1``. + +You probably won't really notice the overhead of a proof that takes +300 milliseconds---the F* standard library doesn't have many +quantifiers in scope with problematic alternations that +lead to matching loops. However, as your development starts to depend +on an ever larger stack of modules, there's the danger that at some +point, your proofs are impacted by some bad choice of quantifiers in +some module that you have forgotten about. In that case, you may find +that seemingly simple proofs take many seconds to go through. In this +section, we'll look at a few things you can do to diagnose such +problems. + +Filtering the context +~~~~~~~~~~~~~~~~~~~~~ + +The first thing we'll look at is an F* option to remove facts from the +context. + +.. literalinclude:: ../code/ContextPollution.fst + :language: fstar + :start-after: //SNIPPET_START: using_facts$ + :end-before: //SNIPPET_END: using_facts$ + +The ``--using_facts_from`` option retains only facts from modules that +match the namespace-selector string provided. In this case, the +selector shrinks the context from 11MB and 20,000 assertions to around +1MB and 2,000 assertions, and ``--query_stats`` reports that the proof +now goes through in just 15 milliseconds---a sizeable speedup even +though the absolute numbers are still small. + +Of course, deciding which facts to filter from your context is not +easy. For example, if you had only retained ``FStar.Seq`` and forgotten +to include ``Prims``, the proof would have failed.
So, in practice, the +``--using_facts_from`` option is often not very useful. + +Unsat Core and Hints +~~~~~~~~~~~~~~~~~~~~ + +When Z3 finds a proof, it can report which facts from the context were +relevant to the proof. This collection of facts is called the unsat +core, because Z3 has proven that the facts from the context and the +negated goal are unsatisfiable. F* has an option to record and replay +the unsat core for each query, and it refers to the recorded unsat cores +as "hints". + +Here's how to use hints: + + +1. Record hints + + .. code-block:: none + + fstar.exe --record_hints ContextPollution.fst + + This produces a file called ``ContextPollution.fst.hints``. + + The format of a hints file is internal and subject to change, + but it is a textual format and you can roughly see what it + contains. Here's a fragment from it: + + .. code-block:: none + + [ + "ContextPollution.test1", + 1, + 2, + 1, + [ + "@MaxIFuel_assumption", "@query", "equation_Prims.nat", + "int_inversion", "int_typing", "lemma_FStar.Seq.Base.lemma_eq_intro", + "lemma_FStar.Seq.Base.lemma_index_app1", + "lemma_FStar.Seq.Base.lemma_index_app2", + "lemma_FStar.Seq.Base.lemma_len_append", + "primitive_Prims.op_Addition", "primitive_Prims.op_Subtraction", + "projection_inverse_BoxInt_proj_0", + "refinement_interpretation_Tm_refine_542f9d4f129664613f2483a6c88bc7c2", + "refinement_interpretation_Tm_refine_ac201cf927190d39c033967b63cb957b", + "refinement_interpretation_Tm_refine_d83f8da8ef6c1cb9f71d1465c1bb1c55", + "typing_FStar.Seq.Base.append", "typing_FStar.Seq.Base.length" + ], + 0, + "3f144f59e410fbaa970cffb0e20df75d" + ] + + This is the hint entry for the query whose id is + ``(ContextPollution.test1, 1)``. + + The next two fields are the fuel and ifuel used for the query, + ``2`` and ``1`` in this case. + + Then, we have the names of all the facts in the unsat core for this + query: you can see that only about 20 facts were + needed, out of the 20,000 that were originally present.
+ + The second-to-last field is not used---it is always 0. + + And the last field is a hash of the query that was issued. + +2. Replaying hints + + The following command requests F* to search for + ``ContextPollution.fst.hints`` in the include path; when + attempting to prove a query with a given id, F* looks for a hint + for that query in the hints file, uses the fuel and ifuel settings + present in the hints, and prunes the context to include only the + facts present in the unsat core. + + .. code-block:: none + + fstar.exe --use_hints ContextPollution.fst + + Using the hints usually improves verification times substantially, + but in this case, we see that our proof now goes through in + about 130 milliseconds, not nearly as fast as the 15 milliseconds + we saw earlier. That's because when using a hint, each query to Z3 + spawns a new Z3 process initialized with just the facts in the + unsat core, and that incurs some basic start-up time costs. + +Many F* projects use hints as part of their build, including F*'s +standard library. The .hints files are checked in to the repository +and are periodically refreshed as proofs evolve. This helps improve +the stability of proofs: it may take a while for a proof to go +through, but once it does, you can record and replay the unsat core, +and subsequent attempts at the same proof (or even small variations of +it) can go through quickly. + +Other projects do not use hints: some people (perhaps rightfully) see +hints as a way of masking underlying proof-performance problems and +prefer to make proofs work quickly and robustly without hints. If you +can get your project to this state, without relying on hints, then so +much the better for you! + +Differential Profiling with qprofdiff +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +If you have a proof that takes a very long time without hints but goes +through quickly with hints, then the hints might help you diagnose why +the original proof was taking so long.
This wiki page describes how to +`compare two Z3 quantifier instantiation profiles +`_ +with a tool that comes with Z3 called qprofdiff. + + +Hints that fail to replay +^^^^^^^^^^^^^^^^^^^^^^^^^ + +Sometimes, Z3 will report an unsat core, but when F* uses it to try to +replay a proof, Z3 will be unable to find a proof of unsat, and F* +will fall back to trying the proof again in its original context. The +failure to find a proof of unsat from a previously reported unsat core +is not a Z3 unsoundness or bug---it's because, although the reported core +really is logically unsat, finding a proof of unsat may have relied on +quantifier instantiation hints from facts that are not otherwise +semantically relevant. The following example illustrates this. + +.. literalinclude:: ../code/HintReplay.fst + :language: fstar + + +Say you run the following: + +.. code-block:: none + + fstar --record_hints HintReplay.fst + fstar --query_stats --use_hints HintReplay.fst + +You will see the following output from the second run: + +.. code-block:: none + + (HintReplay.fst(15,27-15,39)) Query-stats (HintReplay.test, 1) failed + {reason-unknown=unknown because (incomplete quantifiers)} (with hint) + in 42 milliseconds .. + + (HintReplay.fst(15,27-15,39)) Query-stats (HintReplay.test, 1) succeeded + in 740 milliseconds ... + +The first attempt at the query failed when using the hint, and the +second attempt at the query (without the hint) succeeded. + +To see why, notice that to prove the assertion ``r x`` from the +hypothesis ``q x``, logically, the assumption ``Q_R`` +suffices. Indeed, if you look in the hints file, you will see that it +only mentions ``HintReplay.Q_R`` as part of the logical core. However, +``Q_R`` is guarded by a pattern ``p x``, and in the absence of the +assumption ``P_Q``, there is no way for the solver to derive an active +term ``p x`` to instantiate ``Q_R``---so, with just the unsat core, it +fails to complete the proof.
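+To see concretely how such a core can fail to replay, here is a
+sketch of the shape of this example. This is a hypothetical
+reconstruction from the prose above, with invented signatures; the
+actual ``HintReplay.fst`` may differ in its details.
+
+.. code-block:: fstar
+
+   assume val p : int -> prop
+   assume val q : int -> prop
+   assume val r : int -> prop
+
+   //P_Q: instantiating this from an active `q x` introduces `p x`
+   //as an active term, which can then trigger Q_R below
+   assume P_Q : forall (x:int). {:pattern (q x)} p x ==> q x
+
+   //Q_R: guarded by the unusual pattern `p x`, even though `p x`
+   //does not appear in the body of the implication
+   assume Q_R : forall (x:int). {:pattern (p x)} q x ==> r x
+
+   let test (x:int) : Lemma (requires q x) (ensures r x) = ()
+
+With a context shaped like this, an unsat core that records only
+``Q_R`` gives the solver no way to produce an active ``p x`` term on
+replay, so ``Q_R`` never fires even though it alone logically suffices.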
+ +Failures for hint replay usually point to some unusual quantifier +triggering pattern in your proof. For instance, here we used ``p x`` +as a pattern, even though ``p x`` doesn't appear anywhere in +``Q_R``---that's not usually a good choice, though sometimes, e.g., +when using :ref:`artificial triggers <Artificial_triggers>`, it can +come up. + +This `wiki page on hints +`_ +provides more information about diagnosing hint-replay failures, +particularly in the context of the Low* libraries. +