Commit fc936c7

Merge pull request #695 from stevvooe/ebnf-grammar
spec: describe components of EBNF grammar
2 parents ed8677a + d17a0fa commit fc936c7

3 files changed: +122 -11 lines changed

annotations.md

Lines changed: 5 additions & 5 deletions
@@ -31,12 +31,12 @@ This specification defines the following annotation keys, intended for but not l
 * **org.opencontainers.image.ref.name** Name of the reference for a target (string).
 * SHOULD only be considered valid when on descriptors on `index.json` within [image layout](image-layout.md).
 * Character set of the value SHOULD conform to alphanum of `A-Za-z0-9` and separator set of `-._:@/+`
-* An EBNF'esque grammar + regular expression like:
+* The reference must match the following [grammar](considerations.md#ebnf):
 ```
-ref := component ["/" component]*
-component := alphanum [separator alphanum]*
-alphanum := /[A-Za-z0-9]+/
-separator := /[-._:@+]/ | "--"
+ref ::= component ("/" component)*
+component ::= alphanum (separator alphanum)*
+alphanum ::= [A-Za-z0-9]+
+separator ::= [-._:@+] | "--"
 ```
 * **org.opencontainers.image.title** Human-readable title of the image (string)
 * **org.opencontainers.image.description** Human-readable description of the software packaged in the image (string)
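
Review note: the new `ref` grammar is regular, so it can be checked with one composed regular expression. The following is a minimal sketch in Python, not part of the commit. The helper name `is_valid_ref_name` and the constant names are hypothetical, and each constant is assumed to mirror exactly one rule of the grammar.

```python
import re

# Each constant mirrors one rule of the `ref` grammar (names are illustrative).
ALPHANUM = r"[A-Za-z0-9]+"
SEPARATOR = r"(?:[-._:@+]|--)"
COMPONENT = rf"{ALPHANUM}(?:{SEPARATOR}{ALPHANUM})*"
REF = re.compile(rf"{COMPONENT}(?:/{COMPONENT})*")


def is_valid_ref_name(value: str) -> bool:
    """Return True if the whole string matches the `ref` grammar."""
    # fullmatch anchors at both ends, so partial matches are rejected.
    return REF.fullmatch(value) is not None
```

Under these assumptions, `library/app:latest` and `stable--release` are accepted, while values with a leading, trailing, or tripled separator such as `-a`, `a-`, or `a---b` are rejected.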

considerations.md

Lines changed: 111 additions & 0 deletions
@@ -24,3 +24,114 @@ Implementations:
 [github.com/docker/go]: https://github.com/docker/go/
 [Go]: https://golang.org/
 [JSON]: http://json.org/
+
+# EBNF
+
+For field formats described in this specification, we use a limited subset of [Extended Backus-Naur Form][ebnf], similar to that used by the [XML specification][xmlebnf].
+Grammars present in the OCI specification are regular and can be converted to a single regular expression.
+However, regular expressions are avoided to limit ambiguity between regular expression syntaxes.
+By defining a subset of EBNF used here, the possibility of variation, misunderstanding or ambiguities from linking to a larger specification can be avoided.
+
+Grammars are made up of rules in the following form:
+
+```
+symbol ::= expression
+```
+
+We can say we have the production identified by symbol if the input is matched by the expression.
+Whitespace is completely ignored in rule definitions.
+
+## Expressions
+
+The simplest expression is the literal, surrounded by quotes:
+
+```
+literal ::= "matchthis"
+```
+
+The above expression defines a symbol, "literal", that matches the exact input of "matchthis".
+Character classes are delineated by brackets (`[]`), describing either a set, range or multiple ranges of characters:
+
+```
+set ::= [abc]
+range ::= [A-Z]
+```
+
+The above symbol "set" would match one character of either "a", "b" or "c".
+The symbol "range" would match any character, "A" to "Z", inclusive.
+Currently, only matching for 7-bit ASCII literals and character classes is defined, as that is all that is required by this specification.
+Multiple character ranges and explicit characters can be specified in a single character class, as follows:
+
+```
+multipleranges ::= [a-zA-Z=-]
+```
+
+The above matches the characters in the range `A` to `Z`, `a` to `z` and the individual characters `-` and `=`.
+
+Expressions can be made up of one or more expressions, such that one must be followed by the other.
+This is known as an implicit concatenation operator.
+For example, to satisfy the following rule, both `A` and `B` must be matched:
+
+```
+symbol ::= A B
+```
+
+Each expression must be matched once and only once, `A` followed by `B`.
+To support the description of repetition and optional match criteria, the postfix operators `*` and `+` are defined.
+`*` indicates that the preceding expression can be matched zero or more times.
+`+` indicates that the preceding expression must be matched one or more times.
+These appear in the following form:
+
+```
+zeroormore ::= expression*
+oneormore ::= expression+
+```
+
+Parentheses are used to group expressions into a larger expression:
+
+```
+group ::= (A B)
+```
+
+As with the simpler expressions above, operators can be applied to groups as well.
+To allow for alternates, we also define the infix operator `|`.
+
+```
+oneof ::= A | B
+```
+
+The above indicates that the expression should match one of the expressions, `A` or `B`.
+
+## Precedence
+
+The operator precedence is in the following order:
+
+- Terminals (literals and character classes)
+- Grouping `()`
+- Unary operators `+*`
+- Concatenation
+- Alternates `|`
+
+The precedence can be better described using grouping to show equivalents.
+Concatenation has higher precedence than alternates, such that `A B | C D` is equivalent to `(A B) | (C D)`.
+Unary operators have higher precedence than alternates and concatenation, such that `A+ | B+` is equivalent to `(A+) | (B+)`.
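
Review note: the two precedence equivalences can be checked mechanically. The sketch below is not part of the commit. It assumes Python's `re` module, which orders repetition, concatenation, and alternation the same way, and it uses the single-character literals `a`, `b`, `c`, `d` as stand-ins for the expressions `A`, `B`, `C`, `D`. The helper `accepts` is hypothetical.

```python
import re

# Concatenation binds tighter than alternation: "ab|cd" reads as "(ab)|(cd)".
implicit = re.compile(r"ab|cd")
explicit = re.compile(r"(?:ab)|(?:cd)")
# A different grouping, for contrast: "a", then "b" or "c", then "d".
regrouped = re.compile(r"a(?:b|c)d")

# Unary operators bind tighter than alternation: "a+|b+" reads as "(a+)|(b+)".
unary = re.compile(r"a+|b+")
# A different grouping, for contrast: one or more of "a" or "b".
mixed = re.compile(r"(?:a|b)+")


def accepts(pattern: re.Pattern, value: str) -> bool:
    """Return True if the pattern matches the whole input."""
    return pattern.fullmatch(value) is not None
```

With these patterns, `implicit` and `explicit` accept the same inputs (`ab` and `cd`), while only `regrouped` accepts `abd`. Likewise `unary` rejects `ab`, which `mixed` accepts.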
+
+## Examples
+
+The following combines the previous definitions to match a simple, relative path name, describing the individual components:
+
+```
+path ::= component ("/" component)*
+component ::= [a-z]+
+```
+
+The production "component" is one or more lowercase letters.
+A "path" is then at least one component, possibly followed by zero or more slash-component pairs.
+The above can be converted into the following regular expression:
+
+```
+[a-z]+(?:/[a-z]+)*
+```
+
+[ebnf]: https://en.wikipedia.org/wiki/Extended_Backus%E2%80%93Naur_form
+[xmlebnf]: https://www.w3.org/TR/REC-xml/#sec-notation
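
Review note: the claim that the `path` grammar converts to the regular expression shown in the Examples section can be spot-checked. The sketch below is not part of the commit. It compares the regular expression against a hand-written matcher that follows the two rules directly; the function names and the sample inputs are hypothetical.

```python
import re

# The regular expression given for the "path" grammar.
PATH = re.compile(r"[a-z]+(?:/[a-z]+)*")


def matches_regex(value: str) -> bool:
    """Check the whole input against the regular expression."""
    return PATH.fullmatch(value) is not None


def is_component(value: str) -> bool:
    # component ::= [a-z]+  (one or more ASCII lowercase letters)
    return len(value) > 0 and all("a" <= ch <= "z" for ch in value)


def matches_grammar(value: str) -> bool:
    # path ::= component ("/" component)*
    # Splitting on "/" yields an empty part for a leading, trailing,
    # or doubled slash, and an empty part is not a component.
    return all(is_component(part) for part in value.split("/"))


SAMPLES = ["usr", "usr/local/bin", "", "/usr", "usr/", "usr//bin", "Usr", "usr/b1n"]
for sample in SAMPLES:
    assert matches_regex(sample) == matches_grammar(sample)
```

Agreement on a handful of samples is only a sanity check, not a proof of equivalence. The explicit `"a" <= ch <= "z"` comparison is used instead of `str.islower` so that non-ASCII lowercase letters are rejected, in line with the 7-bit ASCII restriction.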

descriptor.md

Lines changed: 6 additions & 6 deletions
@@ -66,14 +66,14 @@ If the _digest_ can be communicated in a secure manner, one can verify content f
 The value of the `digest` property is a string consisting of an _algorithm_ portion and an _encoded_ portion.
 The _algorithm_ specifies the cryptographic hash function and encoding used for the digest; the _encoded_ portion contains the encoded result of the hash function.
 
-A digest string MUST match the following grammar:
+A digest string MUST match the following [grammar](considerations.md#ebnf):
 
 ```
-digest := algorithm ":" encoded
-algorithm := algorithm-component [algorithm-separator algorithm-component]*
-algorithm-component := /[a-z0-9]+/
-algorithm-separator := /[+._-]/
-encoded := /[a-zA-Z0-9=_-]+/
+digest ::= algorithm ":" encoded
+algorithm ::= algorithm-component (algorithm-separator algorithm-component)*
+algorithm-component ::= [a-z0-9]+
+algorithm-separator ::= [+._-]
+encoded ::= [a-zA-Z0-9=_-]+
 ```
 
 Note that _algorithm_ MAY impose algorithm-specific restriction on the grammar of the _encoded_ portion.
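
Review note: the rewritten `digest` grammar can be composed into one regular expression rule by rule. The sketch below is not part of the commit. The function `split_digest` and the constant names are hypothetical, and it checks only the generic grammar, not any algorithm-specific restriction on the _encoded_ portion.

```python
import re
from typing import Optional, Tuple

# Each constant mirrors one rule of the `digest` grammar (names are illustrative).
ALGORITHM_COMPONENT = r"[a-z0-9]+"
ALGORITHM_SEPARATOR = r"[+._-]"
ALGORITHM = rf"{ALGORITHM_COMPONENT}(?:{ALGORITHM_SEPARATOR}{ALGORITHM_COMPONENT})*"
ENCODED = r"[a-zA-Z0-9=_-]+"
DIGEST = re.compile(rf"(?P<algorithm>{ALGORITHM}):(?P<encoded>{ENCODED})")


def split_digest(value: str) -> Optional[Tuple[str, str]]:
    """Return (algorithm, encoded) if value matches the grammar, else None."""
    match = DIGEST.fullmatch(value)
    if match is None:
        return None
    return match.group("algorithm"), match.group("encoded")
```

Because `:` is in neither the algorithm nor the encoded character classes, the split point is unambiguous. A caller that knows the algorithm would still need to apply its own checks, for example on the length of the encoded portion.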
