A path is a sequence of Unicode characters that encodes a series of segments. Each segment represents an access operation on a Protocol Buffers composite type; executing the segments in order retrieves a specific value. The Protocol Buffers composite types are:
- messages,
- lists (repeated fields),
- maps.
Therefore, each path segment represents either a message field, a list index, or a map key.
Path segments that designate message fields are strings that exactly match one of:
- a field name as specified in the proto definition,
- the JSON variant of a field name from the proto definition,
- a field number from the proto definition.
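The matching rule above can be sketched as follows. This is a minimal Python illustration, not the API of any protobuf library: `FieldInfo` and `find_field` are hypothetical names standing in for a real field descriptor and lookup. The sketch also assumes that the first field matching by name, JSON variant, or number wins, since no precedence is stated above.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class FieldInfo:
    """Hypothetical stand-in for a protobuf field descriptor."""
    name: str        # field name as written in the proto definition
    json_name: str   # JSON variant of the field name
    number: int      # field number


def find_field(fields: Iterable[FieldInfo], segment: str) -> Optional[FieldInfo]:
    """Return the first field whose name, JSON name, or number equals the segment."""
    for field in fields:
        # Exact string comparison: "01" does not match field number 1.
        if segment in (field.name, field.json_name, str(field.number)):
            return field
    return None


fields = [
    FieldInfo("display_name", "displayName", 1),
    FieldInfo("labels", "labels", 2),
]
find_field(fields, "displayName")  # the display_name field
find_field(fields, "2")            # the labels field
```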
Path segments that designate list indexes are strings containing the base-10 offset of an element in the list.
Indexes can be either positive or negative. A positive index is an offset from the beginning of the list. A negative index counts elements from the back of the list.
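As an illustration of the index rules, the following Python sketch converts an index segment into a position in a list of known length. The function name `resolve_list_index` is hypothetical. The sketch assumes that offsets are zero-based and that -1 designates the last element, which the text above does not state explicitly.

```python
import re

# Optional minus sign followed by ASCII decimal digits only.
_INDEX_RE = re.compile(r"-?[0-9]+")


def resolve_list_index(segment: str, length: int) -> int:
    """Map a list-index segment to a zero-based position (assumed semantics)."""
    if not _INDEX_RE.fullmatch(segment):
        raise ValueError(f"not a base-10 list index: {segment!r}")
    index = int(segment)
    # Assumption: -1 is the last element, -2 the one before it, and so on.
    position = index + length if index < 0 else index
    if not 0 <= position < length:
        raise IndexError(f"index {index} is out of range for a list of {length}")
    return position


resolve_list_index("0", 3)   # 0, the first element
resolve_list_index("-1", 3)  # 2, the last element
```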
Path segments that designate map keys are string representations of the given map key.
To form a path, the individual segments need to be encoded.
Individual path segments are separated by a single dot character. To create a path segment that contains a dot character, escape the dot with a backslash. Similarly, to create a path segment that contains a backslash character, escape it with another backslash.
Alternatively, segments can be wrapped in square brackets ('[' and ']'). There must be no extra dot between a preceding segment and a wrapped segment. To use a closing square bracket character inside a wrapped path segment, escape it with a backslash. Similarly, to create a wrapped path segment that contains a backslash, escape it with another backslash.
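The separator and escaping rules can be sketched in Python as below. This is a simplified illustration under stated assumptions: `split_path` and `_read_segment` are hypothetical names, only the backslash escapes for dot, square brackets, and backslash are handled, and a dot placed directly before a wrapped segment is rejected in line with the rule above.

```python
from typing import List, Tuple

# Simplification: only these characters may follow a backslash in this sketch.
ESCAPABLE = {".", "[", "]", "\\"}


def _read_segment(path: str, i: int, wrapped: bool) -> Tuple[str, int]:
    """Read one segment starting at path[i]; return (text, next index)."""
    chars = []
    while i < len(path):
        ch = path[i]
        if ch == "\\":
            if i + 1 >= len(path) or path[i + 1] not in ESCAPABLE:
                raise ValueError(f"unsupported escape at offset {i}")
            chars.append(path[i + 1])
            i += 2
            continue
        if wrapped:
            if ch == "]":
                return "".join(chars), i + 1  # consume the closing bracket
            if ch == "[":
                raise ValueError(f"unescaped '[' at offset {i}")
        else:
            if ch in ".[":
                return "".join(chars), i      # leave the separator for the caller
            if ch == "]":
                raise ValueError(f"unescaped ']' at offset {i}")
        chars.append(ch)
        i += 1
    if wrapped:
        raise ValueError("missing closing ']'")
    return "".join(chars), i


def split_path(path: str) -> List[str]:
    """Split an encoded path into its segments."""
    if path.startswith("["):
        segment, i = _read_segment(path, 1, wrapped=True)
    else:
        segment, i = _read_segment(path, 0, wrapped=False)
    segments = [segment]
    while i < len(path):
        if path[i] == "[":
            segment, i = _read_segment(path, i + 1, wrapped=True)
        elif path[i] == ".":
            if path.startswith("[", i + 1):
                raise ValueError(f"dot before wrapped segment at offset {i}")
            segment, i = _read_segment(path, i + 1, wrapped=False)
        else:
            raise ValueError(f"expected '.' or '[' at offset {i}")
        segments.append(segment)
    return segments


split_path(r"a.b\.c[d.e][f\]g]")  # ['a', 'b.c', 'd.e', 'f]g']
```

Wrapping a segment in brackets avoids escaping every dot in it, which is convenient for map keys such as host names.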
A path is a sequence of Unicode characters. When it is decoded into individual segments, each segment should also be composed of individual Unicode characters.
Paths must conform to the following grammar:
proto_path = segment *following_segment
segment = identifier_segment / ; segment
    wrapped_segment ; [segment]
following_segment = ("." identifier_segment) / ; .segment
    wrapped_segment ; [segment]
identifier_segment = *identifier_char ; segment
identifier_char = ; skip control characters
    %x20-21 /
    ; skip %x22 " double quote
    %x23-26 /
    ; skip %x27 ' single quote
    %x28-2D /
    ; skip %x2E . dot
    %x2F-5A /
    ; skip %x5B [ opening square bracket
    %x5C escapable_char / ; \X escape sequence
    ; skip %x5D ] closing square bracket
    %x5E-10FFFF
wrapped_segment = "[" *wrapped_segment_char "]" ; [segment]
wrapped_segment_char = identifier_char / "."
escapable_char = %x22 / ; " double quote U+0022
    %x27 / ; ' single quote U+0027
    "." / ; . dot U+002E
    "[" / ; [ opening square bracket U+005B
    "]" / ; ] closing square bracket U+005D
    "\" / ; \ backslash U+005C
    "b" / ; b BS backspace U+0008
    "f" / ; f FF form feed U+000C
    "n" / ; n LF line feed U+000A
    "r" / ; r CR carriage return U+000D
    "t" / ; t HT horizontal tab U+0009
    ("u" hexchar_s) / ; uXXXX U+XXXX
    ("U" hexchar_l) ; UXXXXXXXX U+XXXXXXXX
hexchar_s = ((DIGIT / A / B / C / E / F) 3HEXDIGIT) / ; %x0-CFFF and %xE000-FFFF (U+0000-U+CFFF and U+E000-U+FFFF)
    (D ("0" / "1" / "2" / "3" / "4" / "5" / "6" / "7") 2HEXDIGIT) ; %xD000-D7FF (U+D000-U+D7FF)
; UTF-16 surrogates are not allowed and should result in a parsing error or be treated as the Unicode replacement character (U+FFFD)
hexchar_l = (4ZERO hexchar_s) / ; %x0-D7FF and %xE000-FFFF (U+00000000-U+0000D7FF and U+0000E000-U+0000FFFF)
    (3ZERO NON_ZERO_HEXDIGIT 4HEXDIGIT) / ; %x10000-FFFFF (U+00010000-U+000FFFFF)
    (2ZERO ONE ZERO 4HEXDIGIT) ; %x100000-10FFFF (U+00100000-U+0010FFFF)
; code points that are out of range are not allowed and should result in a parsing error or be treated as the Unicode replacement character (U+FFFD)
NON_ZERO_HEXDIGIT = NON_ZERO_DIGIT / A / B / C / D / E / F
HEXDIGIT = DIGIT / A / B / C / D / E / F
F = "f" / "F"
E = "e" / "E"
D = "d" / "D"
C = "c" / "C"
B = "b" / "B"
A = "a" / "A"
NON_ZERO_DIGIT = %x31-39 ; 1-9
DIGIT = %x30-39 ; 0-9
ONE = "1"
ZERO = "0"
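The escapable_char rule can be illustrated with a small Python decoder. This is a sketch under stated assumptions, not a reference implementation: `decode_escape` and `decode_segment` are hypothetical names, and the `strict` flag is an invented switch between the two behaviours the grammar comments allow for surrogates and out-of-range code points (parsing error or U+FFFD). The numeric range check is intended to be equivalent to the hexchar_s and hexchar_l rules.

```python
import string
from typing import Tuple

# Single-character escapes from the escapable_char rule.
SIMPLE_ESCAPES = {
    '"': '"', "'": "'", ".": ".", "[": "[", "]": "]", "\\": "\\",
    "b": "\b", "f": "\f", "n": "\n", "r": "\r", "t": "\t",
}
HEX_WIDTH = {"u": 4, "U": 8}  # \uXXXX and \UXXXXXXXX
REPLACEMENT = "\ufffd"


def decode_escape(text: str, i: int, strict: bool = True) -> Tuple[str, int]:
    """Decode the escape whose backslash is at text[i]; return (char, next index)."""
    if i + 1 >= len(text):
        raise ValueError("dangling backslash")
    kind = text[i + 1]
    if kind in SIMPLE_ESCAPES:
        return SIMPLE_ESCAPES[kind], i + 2
    if kind not in HEX_WIDTH:
        raise ValueError(f"unknown escape \\{kind}")
    end = i + 2 + HEX_WIDTH[kind]
    digits = text[i + 2:end]
    if len(digits) != HEX_WIDTH[kind] or not all(d in string.hexdigits for d in digits):
        raise ValueError(f"malformed \\{kind} escape at offset {i}")
    code_point = int(digits, 16)
    # Surrogates and values above U+10FFFF are excluded by the grammar.
    if 0xD800 <= code_point <= 0xDFFF or code_point > 0x10FFFF:
        if strict:
            raise ValueError(f"invalid code point U+{code_point:04X}")
        return REPLACEMENT, end
    return chr(code_point), end


def decode_segment(raw: str, strict: bool = True) -> str:
    """Replace every escape sequence in a raw segment with its character."""
    out = []
    i = 0
    while i < len(raw):
        if raw[i] == "\\":
            ch, i = decode_escape(raw, i, strict)
        else:
            ch, i = raw[i], i + 1
        out.append(ch)
    return "".join(out)


decode_segment(r"caf\u00e9")              # 'café'
decode_segment(r"\ud800", strict=False)   # '\ufffd'
```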