|
| 1 | ++++ |
| 2 | +title="Caching Strategies" |
| 3 | +weight=6 |
| 4 | +summary="Learn strategies for caching layers." |
| 5 | ++++ |
| 6 | + |
| 7 | +# Caching |
| 8 | + |
| 9 | +There are three types of layers that can be contributed to an image:
| 10 | + |
| 11 | +* `build` layers -- the directory will be accessible by subsequent buildpacks, |
| 12 | +* `cache` layers -- the directory will be included in the cache, |
| 13 | +* `launch` layers -- the directory will be included in the run image as a single layer.
| 14 | + |
| 15 | +A fourth type of layer |
| 16 | + |
| 17 | +* `ignored` layers |
| 18 | + |
| 19 | +is available to buildpack authors for use as a temporary layer. |
| 20 | + |
| 21 | +In this section we look at caching each layer type. |
| 22 | + |
| 23 | +## Layer Metadata |
| 24 | + |
| 25 | +Buildpacks ensure byte-for-byte reproducibility of layers. File creation time is [normalized to January 1, 1980](https://medium.com/buildpacks/time-travel-with-pack-e0efd8bf05db) to ensure reproducibility. Byte-for-byte reproducibility means previous layers can be reused. However, we want to invalidate previously cached layers if:
| 26 | + |
| 27 | +* the buildpacks API changes, |
| 28 | +* the type of the layer changes. |
| 29 | + |
| 30 | +A layer built using a buildpack at API version `0.7` should be considered invalid if the API version for that buildpack has been updated to `0.8`. Similarly, if a layer is changed from being a cache-only layer to being a cache and launch layer, then the cache should be considered invalid. |
| 31 | + |
| 32 | +In addition to these general cache invalidation conditions, a buildpack should invalidate a previous layer if an important property changes, such as:
| 33 | + |
| 34 | +* the major version of the runtime changes, e.g. Node.js changes from 16 to 18,
| 35 | +* the requested application dependencies change, e.g. a Python application adds a dependency on the `requests` module.
| 36 | + |
| 37 | +Launch layers are exported to an OCI registry and layer metadata is stored with the launch layer. The layer metadata is commonly used when deciding if a launch layer should be re-used from cache. |
| 38 | + |
| 39 | +## Strategies |
| 40 | + |
| 41 | +Caching during the production of an application image is necessarily very flexible. Most buildpacks that wish to contribute a layer to the application image need only to:
| 42 | + |
| 43 | +1. Check that the metadata of the cached layer is current, |
| 44 | +2. Create an empty layer, and |
| 45 | +3. Set `launch = true`. |
| 46 | + |
| 47 | +This guarantees that the previously published application image layer in the registry is re-used if the layer metadata matches the requested metadata. In this most straightforward and most common case, `launch = true` while `build = false` and `cache = false`. It is important to note that the layer is re-used on the OCI registry. This avoids both an expensive rebuild of the layer and an expensive pull of the layer to the host running the build.
| 48 | + |
| 49 | +Setting `build = true` makes a layer available to subsequent buildpacks. Therefore, binaries installed to the `bin` directory on a `build = true` layer are available to subsequent buildpacks during the build phase. Likewise, `lib` directories on a `build = true` layer are added to the `LD_LIBRARY_PATH` during the build phase of subsequent buildpacks. Environment variables defined in a `build = true` layer are similarly available. For any layer where `launch = true` and `build = true`, a launch layer from the OCI registry can no longer be reused. Instead, the layer must be made available locally so that subsequent buildpacks can use it.
| 50 | + |
| 51 | +Setting `cache = true` allows additional fine-grained control over caching. The `cache = true` flag caches a layer and allows a buildpack to make _content_-level decisions about the validity of the cache (as opposed to using the less granular metadata). As an example, suppose a layer where `launch = true` installs a `jq` binary with version `1.5` and sets `version=1.5` in the layer metadata. By default, this layer will not be re-used from the registry when a buildpack requests that `jq` with `version=1.6` be installed. However, setting `cache = true` makes the previously built layer available during the build. A buildpack could then choose to restore `jq` with `version=1.5` instead of performing a potentially expensive download of `jq` with `version=1.6`. The `cache = true` setting allows cache validation decisions to be made at a much finer granularity than layer metadata.
| 52 | + |
| 53 | +Setting `cache = false`, `build = false`, and `launch = true` is the most common configuration. If that combination is not appropriate for your layer, then `cache = true`, `build = true`, and `launch = true` should be the next combination to evaluate:
| 54 | + |
| 55 | +* When `cache = true, build = true, launch = true`, explicitly setting `build = true` makes the layer available to subsequent buildpacks during the build phase. Because `cache = true`, the layer is restored from the local cache before proceeding to the build phase. For example, a Python distribution could be provided in a cached build and launch layer. The build phase could verify that the restored cached version of the Python distribution contains Python 3.10 but disregard the patch number of the Python interpreter.
| 56 | +* `cache = true, build = true, launch = true` is an appropriate setting for a layer providing a distribution or runtime such as a Python interpreter or NodeJS runtime. |
| 57 | + |
| 58 | +Other common configurations include:
| 59 | + |
| 60 | +* `cache = true, build = false, launch = true` Allows the same caching behavior as `cache = true, build = true, launch = true`, but the layer is not available to subsequent buildpacks. For example, the build phase can restore a Python distribution while disregarding the patch component of the `major.minor.patch` version stored in the metadata. Because `build = false`, the Python interpreter is unavailable to subsequent buildpacks.
| 61 | +* `cache = true, build = true, launch = false` This configuration is useful where a build time dependency is provided. For example, a JDK could be provided as a cached build layer that is not added as a launch layer. Instead, a JRE could be provided as a launch layer in the application image. |
| 62 | +* `cache = true, build = false, launch = false` This configuration is particularly useful in layers that download resources. Using a cache-only layer allows a buildpack to re-use cached downloads during installation. For example, pip wheels could be downloaded to a cache-only layer and the same buildpack could install the wheels into a launch layer.
| 63 | + |
| 64 | +There are other boolean combinations of cache, build and launch. These provide significant flexibility in the caching system. Users of less common caching strategies need a good understanding of the [buildpacks specification on Layer Types](https://github.com/buildpacks/spec/blob/main/buildpack.md#layer-types).