
Clarify role of ChainID #482

@stevvooe


The OCI Image config document covers how the ChainID is calculated, but it doesn't explain why this is useful or how best to leverage it.

The best way to view it is as a hash of the ordered application of layers.

Let's say we have layers A, B and C, ordered from bottom to top, where A is the base and C is the top. Defining | as a binary application operator, the root filesystem can be written as A|B|C. While it is implied that C is only useful when applied to A|B, the identifier C is insufficient to identify this result: using it would imply the equality C = A|B|C, which isn't true.
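To make the | operator concrete, here is a toy model, purely an assumption for illustration: a filesystem is a dict from path to contents, and a layer is a changeset dict in which a value of None stands in for a deletion (roughly what a whiteout does). The helper names `apply` and `apply_all` and the layer contents are hypothetical. The point is that the tree produced by C alone differs from the tree produced by A|B|C, so C by itself cannot name that result.

```python
from functools import reduce
from typing import Optional

# Toy model (assumption): a filesystem maps path -> contents; a changeset maps
# path -> new contents, or None to delete the path (standing in for a whiteout).
FileSystem = dict[str, str]
Changeset = dict[str, Optional[str]]

def apply(fs: FileSystem, layer: Changeset) -> FileSystem:
    """Return fs|layer: a new filesystem with the changeset applied on top."""
    result = dict(fs)
    for path, contents in layer.items():
        if contents is None:
            result.pop(path, None)  # deletion
        else:
            result[path] = contents  # add or overwrite
    return result

def apply_all(layers: list[Changeset]) -> FileSystem:
    """Apply layers bottom to top, starting from an empty filesystem."""
    return reduce(apply, layers, {})

# Hypothetical layers A, B, C.
A: Changeset = {"/bin/sh": "shell", "/etc/os-release": "base"}
B: Changeset = {"/etc/os-release": "patched"}
C: Changeset = {"/app": "binary", "/bin/sh": None}

print(apply_all([A, B, C]))  # {'/etc/os-release': 'patched', '/app': 'binary'}
print(apply_all([C]))        # {'/app': 'binary'}
```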

The main issue arises when we have two definitions of C: C = C and C = A|B|C. If both hold (with some hand-waving), then C = x|C must be true for any application x. This means that if an attacker can define x, relying on C provides no guarantee that the layers were applied in any particular order.

The ChainID addresses this problem by being defined as a compound hash. We differentiate the changeset C from the order-dependent application A|B|C by saying that the resulting rootfs is identified by ChainID(A|B|C), which can be calculated from ImageConfig.rootfs.
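A minimal sketch of why this helps, assuming DiffIDs are OCI-style digest strings ("sha256:<hex>") and that each hash is taken over the UTF-8 string and emitted in the same form. The helpers `digest` and `chain_id` and the placeholder DiffIDs are hypothetical stand-ins, not real layer digests. It compares the intended A|B|C with an attacker-chosen x|C: the top layer C is identical, yet the ChainIDs differ.

```python
import hashlib

def digest(data: str) -> str:
    """OCI-style digest string (assumed encoding): "sha256:" + hex SHA-256 of UTF-8 input."""
    return "sha256:" + hashlib.sha256(data.encode("utf-8")).hexdigest()

def chain_id(diff_ids: list[str]) -> str:
    """ChainID of layers applied in order, bottom (index 0) to top."""
    chain = diff_ids[0]  # base case: ChainID(L0) = DiffID(L0)
    for diff_id in diff_ids[1:]:
        chain = digest(chain + " " + diff_id)
    return chain

# Placeholder DiffIDs (hypothetical): real DiffIDs are digests of uncompressed layer tars.
A, B, C, X = (digest(f"layer {name}") for name in "ABCX")

intended = chain_id([A, B, C])  # A|B|C
tampered = chain_id([X, C])     # x|C, with an attacker-chosen x

print(intended != tampered)  # True: same top layer C, different ChainIDs
```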

The definition from the spec is something like this (also, see the base implementation):

ChainID(layer[N]) = SHA256hex(ChainID(layer[N-1]) + " " + DiffID(layer[N])).
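The recurrence above can be sketched as a single pass that yields the ChainID of every prefix of the layer list. This sketch assumes DiffIDs and ChainIDs are both "sha256:<hex>" digest strings, so the hashed input looks like "sha256:... sha256:..."; the names `digest` and `chain_ids` are hypothetical.

```python
import hashlib

def digest(data: str) -> str:
    """Return "sha256:<hex>" for the UTF-8 bytes of data (assumed encoding)."""
    return "sha256:" + hashlib.sha256(data.encode("utf-8")).hexdigest()

def chain_ids(diff_ids: list[str]) -> list[str]:
    """Return ChainID(layer[N]) for every N, bottom layer first."""
    result: list[str] = []
    for n, diff_id in enumerate(diff_ids):
        if n == 0:
            result.append(diff_id)  # base case: ChainID(layer[0]) = DiffID(layer[0])
        else:
            # ChainID(layer[N]) = SHA256(ChainID(layer[N-1]) + " " + DiffID(layer[N]))
            result.append(digest(result[-1] + " " + diff_id))
    return result

# Usage with placeholder (hypothetical) DiffIDs, ordered bottom to top.
diff_ids = [digest("layer A"), digest("layer B"), digest("layer C")]
for cid in chain_ids(diff_ids):
    print(cid)
```

Returning every prefix, rather than only the last value, matches how a rootfs is built up one layer at a time: each intermediate ChainID names the state after that layer is applied.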

(Note that this definition is slightly insufficient, because it implies that layer[N] is layer[0]|...|layer[N-1]|layer[N], which, as argued above, doesn't quite add up.)

With our expanded example, we can write a symbolic definition of ChainID(C), which is really a variation on some function Hchain(A|B|C), with some notational hand-waving:

ChainID(A) = DiffID(A)
ChainID(A|B) = SHA256(ChainID(A) + " " + DiffID(B))
ChainID(A|B|C) = SHA256(ChainID(A|B) + " " + DiffID(C))
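The three lines above transcribe almost directly into code. This is a sketch only: the DiffIDs are placeholders made by hashing labels, and the "sha256:<hex>" string form is an assumed encoding.

```python
import hashlib

def sha256(data: str) -> str:
    """Digest string in OCI form ("sha256:<hex>"), an assumed encoding."""
    return "sha256:" + hashlib.sha256(data.encode("utf-8")).hexdigest()

# Placeholder DiffIDs for layers A, B and C (hypothetical values).
diff_a = sha256("layer A")
diff_b = sha256("layer B")
diff_c = sha256("layer C")

chain_a = diff_a                             # ChainID(A) = DiffID(A)
chain_ab = sha256(chain_a + " " + diff_b)    # ChainID(A|B)
chain_abc = sha256(chain_ab + " " + diff_c)  # ChainID(A|B|C)

print(chain_abc)
```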

(Note that the definition from the spec may be missing this base case, ChainID(A) = DiffID(A), as well.)

Let's expand this, for fun:

ChainID(A|B|C) = SHA256(SHA256(DiffID(A) + " " + DiffID(B)) + " " + DiffID(C))
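One way to check the expansion is to compare the hand-written nested form with a left fold over the DiffIDs, seeded with the base case. As before, the "sha256:<hex>" encoding and the placeholder DiffIDs are assumptions for illustration.

```python
import hashlib
from functools import reduce

def sha256(data: str) -> str:
    """Digest string in OCI form ("sha256:<hex>"), an assumed encoding."""
    return "sha256:" + hashlib.sha256(data.encode("utf-8")).hexdigest()

# Placeholder DiffIDs (hypothetical).
diff_a, diff_b, diff_c = (sha256(f"layer {name}") for name in "ABC")

# The fully expanded form, written out by hand.
expanded = sha256(sha256(diff_a + " " + diff_b) + " " + diff_c)

# The same value as a left fold over the remaining DiffIDs, starting from DiffID(A).
folded = reduce(lambda chain, diff: sha256(chain + " " + diff), [diff_b, diff_c], diff_a)

print(expanded == folded)  # True
```

The fold makes the structure explicit: each step only needs the previous ChainID and the next DiffID, never the layer contents themselves.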

Hopefully, the above is illustrative of the actual contents of the ChainID.

Most importantly, ChainID(C) != ChainID(A|B|C); otherwise the base case, ChainID(C) = DiffID(C), could not be true.
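A small sketch of that inequality, plus the related fact that reordering the lower layers also changes the result. The helper names and placeholder DiffIDs are hypothetical, and the "sha256:<hex>" form is assumed.

```python
import hashlib

def sha256(data: str) -> str:
    """Digest string in OCI form ("sha256:<hex>"), an assumed encoding."""
    return "sha256:" + hashlib.sha256(data.encode("utf-8")).hexdigest()

def chain_id(diff_ids: list[str]) -> str:
    """ChainID of layers applied bottom (index 0) to top."""
    chain = diff_ids[0]  # base case: ChainID(L0) = DiffID(L0)
    for diff_id in diff_ids[1:]:
        chain = sha256(chain + " " + diff_id)
    return chain

# Placeholder DiffIDs (hypothetical).
diff_a, diff_b, diff_c = (sha256(f"layer {name}") for name in "ABC")

print(chain_id([diff_c]) == diff_c)                                    # True: base case
print(chain_id([diff_c]) != chain_id([diff_a, diff_b, diff_c]))        # True
print(chain_id([diff_a, diff_b, diff_c]) != chain_id([diff_b, diff_a, diff_c]))  # True: order matters
```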

Taking these considerations into account, we can write a new definition in the following form:

ChainID(L0) = DiffID(L0)
ChainID(L0|...|Ln-1|Ln) = SHA256(ChainID(L0|...|Ln-1) + " " + DiffID(Ln))

While the notation is a little obtuse (suggestions welcome), it better reflects the recursive nature of the algorithm and the fact that the ChainID is not a property of the layer, but a property of the application of layers.
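A recursive sketch that mirrors this notation line for line: the argument is the whole ordered sequence L0|...|Ln, not a single layer. It assumes DiffIDs are "sha256:<hex>" strings; `digest` and `chain_id` are hypothetical names.

```python
import hashlib

def digest(data: str) -> str:
    """Return "sha256:<hex>" for the UTF-8 bytes of data (assumed encoding)."""
    return "sha256:" + hashlib.sha256(data.encode("utf-8")).hexdigest()

def chain_id(layers: tuple[str, ...]) -> str:
    """ChainID(L0|...|Ln) for a non-empty tuple of DiffIDs, bottom layer first."""
    if not layers:
        raise ValueError("at least one layer is required")
    if len(layers) == 1:
        return layers[0]  # ChainID(L0) = DiffID(L0)
    # ChainID(L0|...|Ln-1|Ln) = SHA256(ChainID(L0|...|Ln-1) + " " + DiffID(Ln))
    return digest(chain_id(layers[:-1]) + " " + layers[-1])

# Usage with placeholder (hypothetical) DiffIDs.
a, b, c = (digest(f"layer {name}") for name in "ABC")
print(chain_id((a, b, c)))
```

Recursion keeps the code close to the definition; for images with many layers, an iterative loop computes the same value without growing the call stack.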

This has the following implications:
