Description
The OCI Image config document covers the calculation of the ChainID, but it doesn't go into why this is useful or how best to leverage it.
The best way to view it is as a hash of the ordered sequence of applied layers.
Let's say we have layers A, B, C, ordered from bottom to top, where A is the base and C is the top. Defining | as a binary application operator, the root filesystem may be A|B|C. While it is implied that C is only useful when applied to A|B, the identifier C is insufficient to identify this result, as we'd have the equality C = A|B|C, which isn't true.
The main issue is when we have two definitions of C, C = C and C = A|B|C. If this is true (with some handwaving), then C = x|C must hold for any application x. This means that if an attacker can define x, relying on C provides no guarantee that the layers were applied in any particular order.
The ChainID addresses this problem by being defined as a compound hash. We differentiate the changeset C from the order-dependent application A|B|C by saying that the resulting rootfs is identified by ChainID(A|B|C), which can be calculated from ImageConfig.rootfs.
The definition from the spec is something like this (also, see the base implementation):
ChainID(layer[N]) = SHA256hex(ChainID(layer[N-1]) + " " + DiffID(layer[N])).
(Note that this definition is slightly insufficient, because it implies that layer[N] is layer[0]|...|layer[N-1]|layer[N], which, as indicated above, doesn't quite add up.)
With our expanded example, we can have a symbolic definition of ChainID(C), which is a variation on some function Hchain(A|B|C), with some notation hand-waving.
ChainID(A) = DiffID(A)
ChainID(A|B) = SHA256(ChainID(A) + " " + DiffID(B))
ChainID(A|B|C) = SHA256(ChainID(A|B) + " " + DiffID(C))
(Note that we may be missing the base case, ChainID(A) = DiffID(A), as well)
Let's expand this, for fun:
ChainID(A|B|C) = SHA256(SHA256(DiffID(A) + " " + DiffID(B)) + " " + DiffID(C))
Hopefully, the above is illustrative of the actual contents of the ChainID.
Most importantly, ChainID(C) != ChainID(A|B|C); otherwise, the base case ChainID(C) = DiffID(C) could not be true.
Taking these considerations into account, we can write a new definition in the following form:
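To make the expansion concrete, here is a rough Python sketch that computes ChainID(A|B|C) both step by step and in the fully expanded form, then checks that they agree. The layer contents and the `sha256_digest` helper are made up for illustration, and the `sha256:` prefix on each digest string is an assumption about how digests are serialized, not something stated in the formulas above.

```python
import hashlib


def sha256_digest(text: str) -> str:
    """Hypothetical helper: digest string of the UTF-8 text.

    Assumption: digests are written as "sha256:<hex>".
    """
    return "sha256:" + hashlib.sha256(text.encode("utf-8")).hexdigest()


# Placeholder DiffIDs. Real DiffIDs are digests of the layer changesets;
# these inputs are invented purely for the example.
diff_a = sha256_digest("layer A contents")
diff_b = sha256_digest("layer B contents")
diff_c = sha256_digest("layer C contents")

# Step by step, mirroring the three symbolic definitions.
chain_a = diff_a
chain_ab = sha256_digest(chain_a + " " + diff_b)
chain_abc = sha256_digest(chain_ab + " " + diff_c)

# Fully expanded form.
expanded = sha256_digest(sha256_digest(diff_a + " " + diff_b) + " " + diff_c)

# Both routes give the same identifier.
assert chain_abc == expanded

# C applied on its own is the base case, so its ChainID is just DiffID(C),
# which differs from the ChainID of C applied on top of A|B.
chain_c = diff_c
assert chain_c != chain_abc
```

The last assertion is the property that matters: the identifier of the changeset C alone never doubles as the identifier of the rootfs A|B|C.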
ChainID(L0) = DiffID(L0)
ChainID(L0|...|Ln-1|Ln) = SHA256(ChainID(L0|...|Ln-1) + " " + DiffID(Ln))
While the notation is a little obtuse (suggestions welcome), it better reflects the recursive nature of the algorithm and the fact that the ChainID is not a property of the layer, but a property of the application of layers.
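The recursive definition can be sketched as a simple loop over the ordered DiffIDs. This is only an illustration, not the implementation referenced in this issue: the function names `chain_id` and `chain_ids` are hypothetical, the DiffIDs are placeholders, and the `sha256:` prefix on the result is an assumption about the digest string format.

```python
import hashlib
from typing import List, Sequence


def chain_id(diff_ids: Sequence[str]) -> str:
    """ChainID of applying diff_ids[0] | ... | diff_ids[-1], bottom to top."""
    if not diff_ids:
        raise ValueError("at least one DiffID is required")
    # Base case: ChainID(L0) = DiffID(L0).
    chain = diff_ids[0]
    # Recursive step: hash the previous ChainID, a space, and the next DiffID.
    for diff_id in diff_ids[1:]:
        payload = (chain + " " + diff_id).encode("utf-8")
        # Assumption: digests are serialized as "sha256:<hex>".
        chain = "sha256:" + hashlib.sha256(payload).hexdigest()
    return chain


def chain_ids(diff_ids: Sequence[str]) -> List[str]:
    """ChainID of every prefix: L0, L0|L1, ..., L0|...|Ln."""
    return [chain_id(diff_ids[: i + 1]) for i in range(len(diff_ids))]


# Placeholder DiffIDs, invented for the example.
A = "sha256:" + "a" * 64
B = "sha256:" + "b" * 64
C = "sha256:" + "c" * 64

prefixes = chain_ids([A, B, C])

# The base case holds, and the last prefix is the ChainID of the full stack.
assert prefixes[0] == A
assert prefixes[-1] == chain_id([A, B, C])

# The same top layer over a different ordering yields a different ChainID.
assert chain_id([A, B, C]) != chain_id([B, A, C])
```

Because the function takes the whole ordered list rather than a single layer, it reflects the point that the ChainID belongs to the application of layers. An implementation could use each entry of `chain_ids` as the key for the corresponding unpacked snapshot.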
This has the following implications:
- Update the specification (config: provide more complete explanation of ChainID #586)
- Provide better context on the usage and role of ChainID -> implementations should use it to identify the unpacking result.
- Clarify the recursive nature of this algorithm.
- Provide an implementation of the ChainID function. (identity: add implementation of ChainID #486)