Update docs and README, add additional info (#57)

wwwil · jetstack-bot · commit 433f53cb7b9a · 2020-01-24T08:52:32.000Z
Signed-off-by: wwwil <wwwil.squires@gmail.com>
diff --git a/README.md b/README.md
@@ -9,7 +9,8 @@
 
 # Jetstack Preflight
 
-Preflight is a tool to automatically perform Kubernetes cluster configuration checks using [Open Policy Agent (OPA)](https://www.openpolicyagent.org/).
+Preflight is a tool to automatically perform Kubernetes cluster
+configuration checks using [Open Policy Agent (OPA)](https://www.openpolicyagent.org/).
 
 <!-- markdown-toc start - Don't edit this section. Run M-x markdown-toc-refresh-toc -->
 **Table of Contents**
@@ -23,11 +24,9 @@ Preflight is a tool to automatically perform Kubernetes cluster configuration ch
 
 <!-- markdown-toc end -->
 
-
 ## Background
 
-Preflight was originally designed to automate Jetstack's
-production readiness assessments.
+Preflight was originally designed to automate Jetstack's production readiness assessments.
 These are consulting sessions in which a Jetstack engineer inspects a customer's
 cluster to suggest improvements and identify configuration issues. 
 The product of this assessment is a report
@@ -39,53 +38,100 @@ Automating the checks allows them to be more comprehensive and much faster.
 
 The automation also allows the checks to be run repeatedly,
 meaning they can be deployed in-cluster to provide continuous configuration checking.
-
 This enables interesting new use cases such as policy compliance audits.
 
+## Preflight Application
+
+The Preflight application uses *data gatherers*
+to collect required data in JSON format.
+Preflight then checks the gathered data against rules specified in
+*Preflight packages* and outputs rule violations with relevant information.
+
+Preflight is designed to run both locally for one-off checking,
+and in-cluster for continuous checking.
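
The following minimal Python sketch makes the gather-then-check flow above concrete. It is a hypothetical model, not Preflight's code. Real Preflight evaluates Rego rules with OPA, but here data gatherers are plain functions returning JSON-compatible data and rules are Python predicates. The names `gather_all` and `check` and the rule IDs are invented for illustration.

```python
import json
from typing import Callable, Dict, List

# Hypothetical stand-ins: a data gatherer returns JSON-compatible data,
# and a rule is a predicate that returns True when the data passes.
Gatherer = Callable[[], dict]
Rule = Callable[[Dict[str, dict]], bool]


def gather_all(gatherers: Dict[str, Gatherer]) -> Dict[str, dict]:
    """Run every gatherer and round-trip its output through JSON."""
    # The JSON round trip mirrors the idea that gathered data is plain JSON.
    return {name: json.loads(json.dumps(g())) for name, g in gatherers.items()}


def check(data: Dict[str, dict], rules: Dict[str, Rule]) -> List[str]:
    """Return the sorted IDs of rules that the gathered data violates."""
    return sorted(rule_id for rule_id, rule in rules.items() if not rule(data))


if __name__ == "__main__":
    gatherers = {"k8s/pods": lambda: {"items": [{"metadata": {"name": "web"}}]}}
    rules = {
        "pods_exist": lambda d: len(d["k8s/pods"]["items"]) > 0,
        "no_pods_named_web": lambda d: all(
            p["metadata"]["name"] != "web" for p in d["k8s/pods"]["items"]
        ),
    }
    print(check(gather_all(gatherers), rules))  # ['no_pods_named_web']
```

In this sketch, keeping the gathered data as plain JSON decouples the gatherers from the rules that check it.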
+
 ## Preflight Packages
 
-Policies for cluster configuration are encoded into "Preflight Packages".
+Policies for cluster configuration are encoded into *Preflight packages*.
+You can find some examples in [./preflight-packages](./preflight-packages).
 
-You can find some examples in [./preflight-packages](./preflight-packages) and you can also [write your own Preflight Packages](./docs/how_to_write_packages.md).
+Each package focuses on a different aspect of the cluster.
+For example, the [`gke_basic`](preflight-packages/examples.jetstack.io/gke_basic)
+package provides rules for the configuration of a GKE cluster,
+and the [`pods`](preflight-packages/jetstack.io/pods) package
+provides rules for the configuration of Kubernetes Pods.
 
-Preflight Packages are a very thin wrapper around OPA's policies. A package is made of [Rego](https://www.openpolicyagent.org/docs/latest/#rego) files (OPA's high-level declarative language) and a *Policy Manifest*.
+A Preflight package consists of a *Policy Manifest* and a
+[Rego](https://www.openpolicyagent.org/docs/latest/#rego) package.
 
-The *Policy Manifest* is a YAML file intended to add metadata to the rules, so the tool can display useful information when a rule doesn't pass.
+The *Policy Manifest* is a YAML file that specifies a package's rules.
+It gives descriptions of the rules and remediation advice,
+so the tool can display useful information when a rule doesn't pass.
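
The following Python sketch shows how manifest metadata might be used to explain a failing rule. It assumes the manifest has already been parsed from YAML into a dict. The field names (`rules`, `id`, `name`, `description`, `remediation`) are illustrative assumptions, not Preflight's documented schema, and `explain_failures` is a hypothetical helper.

```python
from typing import Dict, List

# Hypothetical policy manifest, shown as the dict a YAML parser might
# produce. The field names are illustrative, not Preflight's schema.
MANIFEST = {
    "id": "pods",
    "namespace": "jetstack.io",
    "rules": [
        {
            "id": "liveness_probe_set",
            "name": "Liveness probe set",
            "description": "Containers should define a liveness probe.",
            "remediation": "Add a livenessProbe to each container spec.",
        },
    ],
}


def explain_failures(manifest: dict, failed_ids: List[str]) -> List[str]:
    """Turn failing rule IDs into human-readable messages via the manifest."""
    by_id: Dict[str, dict] = {r["id"]: r for r in manifest["rules"]}
    messages = []
    for rule_id in failed_ids:
        rule = by_id.get(rule_id)
        if rule is None:
            # A rule without manifest metadata can only be reported by its ID.
            messages.append(f"{rule_id}: failed (no description available)")
        else:
            messages.append(
                f"{rule['name']}: {rule['description']} "
                f"Remediation: {rule['remediation']}"
            )
    return messages
```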
 
-Since the logic in these packages is just Rego, you can add tests to your policies and use OPA's command line to run them (see [OPA Policy Testing tutorial](https://www.openpolicyagent.org/docs/latest/policy-testing/)).
+Rego is OPA's high-level declarative language for specifying rules.
+Rego rules can be defined in multiple files grouped into logical Rego packages.
 
-Additionally, Preflight has a built-in linter for packages:
+Anyone can create new Preflight packages to perform their own checks.
+The Preflight docs include a guide on [how to write packages](./docs/how_to_write_packages.md).
 
-```
-preflight package lint <path to package>
-```
+![Preflight package structure diagram](./docs/images/preflight_package.png)
+
+## Get Preflight
+
+### Download
 
-## Install Preflight
+Preflight binaries and *bundles*,
+which include a binary and all the *packages* in this repo,
+can be downloaded from the [releases page](https://github.com/jetstack/preflight/releases).
 
-### Use Preflight locally
+### Build
 
-You can compile Preflight by running `make build`. It will create the binary in `builds/preflight`.
+You can compile Preflight by running `make build`.
+It will create the binary in `builds/preflight`.
 
-Create your `preflight.yaml` configuration file (you can take inspiration from the ones in `./examples`).
+## Use Preflight
 
-Run Preflight (by default it looks for `./preflight.yaml`)
+Create your `preflight.yaml` configuration file.
+There is full [configuration documentation](./docs/configuration.md) available,
+as well as several example files in [`./examples`](./examples).
+
+### Use Preflight Locally
+
+By default Preflight looks for a configuration at `./preflight.yaml`.
+Once this is set up, run a Preflight check like so:
 
 ```
 preflight check
 ```
 
-You can try `./examples/pods.preflight.yaml` without having to change a line, if you have your *kubeconfig* (`~/.kube/config`) pointing to a working cluster.
+You can try the Pods example
+[`./examples/pods.preflight.yaml`](./examples/pods.preflight.yaml)
+without having to change a line,
+if your *kubeconfig* is located at `~/.kube/config` and
+points to a working cluster.
 
 ```
 preflight check --config-file=./examples/pods.preflight.yaml
 ```
 
-You will see a CLI formatted report if everything goes well. Also, you will get a JSON report in `./output`. 
+You will see a CLI-formatted report if everything goes well.
+You will also get a JSON report in `./output`.
+
+### Use Preflight Web UI
 
-If you want to visualice the report in your browser, you can access [preflight.jetstack.io](https://preflight.jetstack.io/) and load the JSON report. **This is a static website. Your report is not being uploaded to any server. Everything happens in your browser.**
+If you want to visualise the report in your browser,
+you can access the [*Preflight Web UI*](https://preflight.jetstack.io/)
+and load the JSON report.
+**This is a static website.**
+**Your report is not being uploaded to any server.**
+**Everything happens in your browser.**
 
-You can give it a try without even running the tool, since we provide some report examples ([gke.json](./examples/reports/gke.json), [pods.json](./examples/reports/pods.json)) ready to be loaded in [preflight.jetstack.io](https://preflight.jetstack.io/).
+You can give it a try without even running the tool,
+since we provide example reports, [gke.json](./examples/reports/gke.json)
+and [pods.json](./examples/reports/pods.json),
+ready to be loaded into the [*Preflight Web UI*](https://preflight.jetstack.io/).
 
-### Preflight In-Cluster with periodic checks
+### Use Preflight In-Cluster
 
-See [Installation Manual: Preflight In-Cluster](./docs/installation_manual_in_cluster.md).
+Preflight can be installed in-cluster to run continuous checks.
+See the [Installation Manual: Preflight In-Cluster](./docs/installation_manual_in_cluster.md).
diff --git a/docs/configuration.md b/docs/configuration.md
@@ -0,0 +1,119 @@
+# Preflight Configuration
+
+Configuration is provided to the Preflight application using a YAML file.
+This specifies what packages to use, how data gatherers are configured,
+and what outputs to produce.
+
+Several example configuration files can be found in [`examples`](../examples).
+
+## Cluster Name
+
+The `cluster-name` field is used as the 'directory' prefix for output.
+The value shouldn't contain spaces or `/`.
+For example:
+
+```
+cluster-name: my-cluster
+```
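
Here is a minimal Python sketch of the naming constraint above. It assumes output files are laid out as `<output path>/<cluster-name>/<file>`. That layout and both helper names are hypothetical. The check rejects any whitespace, which is slightly stricter than "no spaces".

```python
import posixpath


def validate_cluster_name(name: str) -> str:
    """Reject cluster names containing whitespace or '/'."""
    if "/" in name or any(ch.isspace() for ch in name):
        raise ValueError(f"invalid cluster-name {name!r}: no spaces or '/'")
    return name


def output_path(base: str, cluster_name: str, filename: str) -> str:
    """Hypothetical layout: <base>/<cluster-name>/<filename>."""
    return posixpath.join(base, validate_cluster_name(cluster_name), filename)
```

A `/` in the name would silently add an extra directory level to this layout, which is why such names are rejected rather than escaped.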
+
+## Data Gatherers
+
+Data gatherers are specified under the `data-gatherers` field.
+For example:
+
+```
+data-gatherers:
+  gke:
+    project: my-gcp-project
+    location: us-central1-a
+    cluster: my-cluster
+    credentials: /tmp/credentials.json
+  k8s/pods:
+    kubeconfig: ~/.kube/config
+```
+
+Each data gatherer has its own configuration requirements,
+which are documented separately.
+
+The following data gatherers are available:
+
+- [Kubernetes Pods](datagatherers/k8s_pods.md)
+- [Google Kubernetes Engine](datagatherers/gke.md)
+- [Amazon Elastic Kubernetes Service](datagatherers/eks.md)
+- [Microsoft Azure Kubernetes Service](datagatherers/aks.md)
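
Each gatherer has its own requirements, so a config loader has to dispatch on the keys under `data-gatherers`. The Python sketch below shows one way to validate that section. The `REQUIRED_FIELDS` table is an assumption. The `gke` fields follow the GKE data gatherer docs, but treating `k8s/pods` as having no required fields is a guess. `check_data_gatherers` is hypothetical.

```python
from typing import Dict, List

# Hypothetical registry of config keys and the fields each gatherer needs.
# Only `gke` and `k8s/pods` appear in the example config above.
REQUIRED_FIELDS: Dict[str, List[str]] = {
    "gke": ["project", "location", "cluster"],
    "k8s/pods": [],  # assumption: kubeconfig treated as optional here
}


def check_data_gatherers(config: dict) -> List[str]:
    """Return a list of problems found in the `data-gatherers` section."""
    problems = []
    for name, settings in config.get("data-gatherers", {}).items():
        if name not in REQUIRED_FIELDS:
            problems.append(f"unknown data gatherer: {name}")
            continue
        for field in REQUIRED_FIELDS[name]:
            # `settings` may be None if the YAML key has no body.
            if not (settings or {}).get(field):
                problems.append(f"{name}: missing required field '{field}'")
    return problems
```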
+
+## Package Sources
+
+The `package-sources` field is a list of locations
+which Preflight should load packages from.
+For example:
+
+```
+package-sources:
+- type: local
+  dir: ./preflight-packages/
+- type: local
+  dir: /home/user/other-preflight-packages
+```
+
+Each source must have a `type`, though currently the only valid type is `local`.
+Local sources must then specify a directory
+for Preflight to look for packages in using the `dir` field.
+Preflight will search for packages in this directory recursively.
+
+In future, other source types may be added,
+for example to load packages from GCS buckets.
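
Here is a minimal Python sketch of the recursive search for `local` sources. It assumes a directory counts as a package when it contains a file named `policy-manifest.yaml`. That marker name and the `find_packages` helper are assumptions for illustration.

```python
import os
from typing import Dict, List

# Assumed file name marking a package directory (illustrative only).
MANIFEST_NAME = "policy-manifest.yaml"


def find_packages(sources: List[Dict[str, str]]) -> List[str]:
    """Recursively list package directories under each `local` source."""
    found = []
    for source in sources:
        if source.get("type") != "local":
            # `local` is currently the only valid source type.
            raise ValueError(f"unsupported package source type: {source.get('type')!r}")
        for dirpath, _dirnames, filenames in os.walk(source["dir"]):
            if MANIFEST_NAME in filenames:
                found.append(dirpath)
    return sorted(found)
```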
+
+## Enabled Packages
+
+The `enabled-packages` field is a list of packages that Preflight should use.
+For example:
+
+```
+enabled-packages:
+  - "examples.jetstack.io/gke_basic"
+  - "jetstack.io/pods"
+```
+
+This allows `package-sources` to be large collections of packages,
+only some of which will be run depending on user configuration.
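
Selecting the enabled packages from everything discovered can be sketched in Python as below. The sketch assumes packages are keyed by IDs like `jetstack.io/pods`. It also fails loudly when an enabled package is not found in any source. Both are assumptions, and `select_enabled` is hypothetical.

```python
from typing import Dict, List


def select_enabled(discovered: Dict[str, str], enabled: List[str]) -> Dict[str, str]:
    """Keep only the enabled packages.

    `discovered` maps a package ID such as "jetstack.io/pods" to its
    directory; how IDs are derived is an assumption in this sketch.
    """
    missing = [pkg for pkg in enabled if pkg not in discovered]
    if missing:
        raise KeyError(f"enabled packages not found in any source: {missing}")
    return {pkg: discovered[pkg] for pkg in enabled}
```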
+
+## Outputs
+
+The `outputs` field is a list of output formats and locations that Preflight
+will write data to. Multiple outputs can be specified,
+each with their own settings.
+
+```
+outputs:
+- type: local
+  path: ./output
+  format: json
+- type: local
+  path: ./output
+  format: intermediate
+- type: cli
+```
+
+Possible types of output include:
+- `local` for a local file.
+- `gcs` for a Google Cloud Storage bucket.
+- `cli` for command line output.
+
+Most types also require a `format` to be specified.
+Possible formats are:
+- `json` for raw JSON output.
+- `markdown` for a markdown formatted report.
+- `html` for an HTML formatted report.
+- `intermediate` to output the raw JSON fetched by the *data gatherers*.
+
+For the `cli` output type, the format is optional
+and defaults to the `cli` format, which produces a coloured CLI report.
+
+The reports in `markdown`, `html` and `cli` format make use of the
+*policy manifest* to produce a human-readable report describing
+which checks passed and which failed.
+The `json` format is raw output from OPA evaluation.
+
+If no `outputs` are specified, Preflight will output a report
+of the results to the CLI.
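
The defaults described in this section can be summarised as a small normalisation step. This Python sketch is hypothetical, and `normalize_outputs` is not part of Preflight. It reads "most types also require a `format`" as "every type except `cli`".

```python
from typing import Dict, List

TYPES = {"local", "gcs", "cli"}
FORMATS = {"json", "markdown", "html", "intermediate", "cli"}


def normalize_outputs(outputs: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Apply the documented defaults and reject unknown types or formats."""
    if not outputs:
        # No outputs configured: report the results to the CLI.
        return [{"type": "cli", "format": "cli"}]
    result = []
    for out in outputs:
        out = dict(out)  # copy so the caller's config is not mutated
        if out.get("type") not in TYPES:
            raise ValueError(f"unknown output type: {out.get('type')!r}")
        if out["type"] == "cli":
            out.setdefault("format", "cli")
        elif "format" not in out:
            raise ValueError(f"output type {out['type']!r} needs a format")
        if out["format"] not in FORMATS:
            raise ValueError(f"unknown output format: {out['format']!r}")
        result.append(out)
    return result
```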
diff --git a/docs/datagatherers/gke.md b/docs/datagatherers/gke.md
@@ -1,28 +1,72 @@
-# GKE data gatherer
+# GKE Data Gatherer
 
-It pulls information about one cluster from the GKE API.
-
-## Configuration
-
-[Here](../../examples/gke.preflight.yaml) you have a sample configuration file setting up the GKE data gatherer.
-
-You have to set these parameters in the configuration:
-
-- **project:** the ID of your Google Cloud Platform project.
-- **location:** the compute zone or region where your cluster is running.
-- **cluster:** the name of your GKE cluster.
-- **credentials** *optional* **:** path to a file containing valid credentials for your cluster. Useful if you want to configure a separate service account. If not specified, it will attept to use Workload Identity. If you run Preflight locally on your machine, you can just run `gcloud auth application-default login`
+The GKE *data gatherer* fetches information about a cluster
+from the Google Kubernetes Engine API.
 
 ## Data
 
-The output of the GKE data gatherer follows this format:
+The output of the GKE data gatherer follows the format described in the
+[GKE API reference](https://cloud.google.com/kubernetes-engine/docs/reference/rest/v1beta1/projects.locations.clusters#Cluster)
+and the [Go Docs](https://godoc.org/google.golang.org/api/container/v1#Cluster).
+These are useful to check when writing new rules.
+
+The gathered data looks like this:
 
 ```json
 {
   "Cluster": {...}
 }
 ```
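
Rules for this data are written in Rego, but a quick Python sketch makes the shape of the data easier to see. The sample below is made up and heavily trimmed. Its keys follow the JSON field names of the `container/v1` `Cluster` type linked above, for example `name` and `legacyAbac.enabled`. `legacy_abac_enabled` is a hypothetical helper that shows the kind of check a rule might make.

```python
import json

# A made-up, trimmed example of gathered GKE data.
SAMPLE = json.loads("""
{
  "Cluster": {
    "name": "my-gke-cluster",
    "legacyAbac": {"enabled": true}
  }
}
""")


def legacy_abac_enabled(data: dict) -> bool:
    """True if the gathered cluster has legacy ABAC turned on."""
    # The Go client tags these fields `omitempty`, so false or unset values
    # are dropped from the JSON; a missing key therefore means "not enabled".
    abac = data["Cluster"].get("legacyAbac") or {}
    return bool(abac.get("enabled", False))
```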
 
-The `Cluster` property is a JSON representation of [google.golang.org/api/container/v1#Cluster](https://godoc.org/google.golang.org/api/container/v1#Cluster).
+## Configuration
+
+To use the GKE *data gatherer*, add a `gke` section to the
+`data-gatherers` configuration.
+For example:
+
+```
+...
+data-gatherers:
+  gke:
+    project: my-gcp-project
+    location: us-central1-a
+    cluster: my-gke-cluster
+    # Path to a file containing credentials. If unset, Preflight tries Workload Identity or Application Default Credentials (locally, run `gcloud auth application-default login`).
+    # credentials: /tmp/credentials.json
+...
+```
+
+The `gke` configuration contains the following fields:
+
+- `project`: The ID of your Google Cloud Platform project.
+- `location`: The compute zone or region where your cluster is running.
+- `cluster`: The name of your GKE cluster.
+- `credentials` (*optional*): The path to a file containing credentials for your cluster.
+
+An example configuration can be found at
+[`./examples/gke.preflight.yaml`](../../examples/gke.preflight.yaml).
+
+## Permissions
+
+If a `credentials` file is not specified,
+Preflight will attempt to use Workload Identity or Application Default Credentials.
+
+If Preflight is running locally
+and the `gcloud` command is installed and configured,
+just run `gcloud auth application-default login` to set up
+Application Default Credentials.
+
+The `credentials` file is useful if you want to configure
+a separate service account for Preflight to use to fetch GKE data.
+
+Whatever user or service account is used must have the correct
+[IAM Roles](https://cloud.google.com/kubernetes-engine/docs/how-to/iam).
+Specifically, it must have the `container.clusters.get` permission.
+This can be given with the _Kubernetes Engine Cluster Viewer_ role
+(`roles/container.clusterViewer`).
 
-> Tip: Use the 'intermediate' output format to get the raw output from the data gatherer. You can use that try your rego rules.
+A sample Terraform project can be found at
+[`./deployment/terraform/gke-datagatherer/`](../../deployment/terraform/gke-datagatherer).
+This can be used to create a GCP service account called `preflight` which
+is then bound to a custom role of the same name
+with the minimum required permissions.
diff --git a/docs/datagatherers/k8s_pods.md b/docs/datagatherers/k8s_pods.md