Specification of path semantics for Protocol Buffers

A path is a sequence of Unicode characters that encodes a series of segments. Each segment represents an access operation on a Protocol Buffers composite type; executing the segments in order retrieves a specific value. Protocol Buffers composite types are:

  • messages,
  • lists (repeated fields),
  • maps.

Therefore, each path segment represents either a message field, a list index, or a map key.

Message field path segments

Path segments that designate message fields are strings that exactly match one of:

  • a field name as specified in the proto definition,
  • the JSON variant of a field name from the proto definition,
  • a field number from the proto definition.
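The matching rule above can be sketched as follows. This is a minimal illustration, not part of the specification: `FieldInfo` and `find_field` are hypothetical names, and the sketch assumes that a field number is written as a plain base 10 string.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class FieldInfo:
    """Hypothetical stand-in for a field descriptor."""
    name: str        # field name as written in the proto definition
    json_name: str   # JSON variant of the field name
    number: int      # field number from the proto definition


def find_field(segment: str, fields: Iterable[FieldInfo]) -> Optional[FieldInfo]:
    """Return the field whose name, JSON name, or number exactly matches segment."""
    for field in fields:
        # Assumption: the field number is compared in its base 10 string form.
        if segment in (field.name, field.json_name, str(field.number)):
            return field
    return None


fields = [FieldInfo("user_id", "userId", 1), FieldInfo("display_name", "displayName", 2)]
print(find_field("user_id", fields))      # matched by proto field name
print(find_field("displayName", fields))  # matched by JSON name
print(find_field("2", fields))            # matched by field number
```

Matching is exact, so a segment such as `User_Id` would not resolve to `user_id` in this sketch.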

List indexes path segments

Path segments that designate list indexes are base 10 string encodings of the offset of an element in the list.

Indexes can be either positive or negative. A positive index indicates the offset from the beginning of the list. A negative index indicates the position of an element counted from the back of the list.
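A minimal sketch of resolving such a segment is shown below. The function name `resolve_index` is hypothetical. The sketch assumes that `-1` designates the last element (the common convention in Python and similar languages), which this specification does not state explicitly, and that out-of-range indexes are reported as errors.

```python
import re

# Optional minus sign followed by ASCII decimal digits only.
_INDEX_PATTERN = re.compile(r"-?[0-9]+")


def resolve_index(segment: str, length: int) -> int:
    """Convert a list index segment into a zero-based offset for a list of the given length."""
    if _INDEX_PATTERN.fullmatch(segment) is None:
        raise ValueError(f"not a base 10 list index: {segment!r}")
    index = int(segment)
    if index < 0:
        # Assumption: -1 is the last element, -2 the one before it, and so on.
        index += length
    if not 0 <= index < length:
        raise IndexError(f"index {segment} is out of range for a list of {length} elements")
    return index


items = ["a", "b", "c"]
print(items[resolve_index("0", len(items))])   # first element
print(items[resolve_index("-1", len(items))])  # last element, under the assumption above
```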

Map keys path segments

Path segments that designate map keys are string representations of the given map key.
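As an illustration only, a lookup by string representation might look like the sketch below. The specification does not define how non-string keys are rendered, so the choices made here are assumptions: integers use their base 10 form and booleans use lowercase `true` and `false`. The names `key_to_segment` and `lookup_map` are hypothetical.

```python
def key_to_segment(key) -> str:
    """Render a map key as a string (assumed convention, not defined by the specification)."""
    if isinstance(key, bool):
        # Assumption: booleans are spelled in lowercase.
        return "true" if key else "false"
    return str(key)


def lookup_map(mapping: dict, segment: str):
    """Return the value whose key has the given string representation."""
    for key, value in mapping.items():
        if key_to_segment(key) == segment:
            return value
    raise KeyError(segment)


print(lookup_map({1: "one", 2: "two"}, "2"))
print(lookup_map({"a.b": "dotted"}, "a.b"))
```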

Path segments encoding

To form a path, the individual segments need to be encoded.

Individual path segments are separated by a single dot character. To create a path segment that contains a dot character, escape the dot with a backslash. Similarly, to create a path segment that contains a backslash character, escape it with another backslash.

Alternatively, segments can be wrapped in square brackets ('[' and ']'). There must be no extra dot between a preceding segment and a wrapped segment. To use a closing square bracket character inside a wrapped path segment, escape it with a backslash. Similarly, to create a wrapped path segment that contains a backslash, escape it with another backslash.
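The encoding rules can be sketched as a small encoder. The function names are hypothetical. Besides the dot, bracket, and backslash escapes described above, the sketch also escapes quotes, the opening square bracket, and control characters, because the grammar in this specification does not allow them to appear unescaped.

```python
# Characters written as a backslash followed by the character itself.
_LITERAL_ESCAPES = set("\"'.[]\\")
# Control characters that have a dedicated escape letter.
_NAMED_ESCAPES = {"\b": "b", "\f": "f", "\n": "n", "\r": "r", "\t": "t"}


def escape_segment(segment: str, wrapped: bool = False) -> str:
    """Escape one decoded segment; dots stay unescaped inside square brackets."""
    out = []
    for ch in segment:
        if ch == "." and wrapped:
            out.append(ch)
        elif ch in _LITERAL_ESCAPES:
            out.append("\\" + ch)
        elif ch in _NAMED_ESCAPES:
            out.append("\\" + _NAMED_ESCAPES[ch])
        elif ord(ch) < 0x20:
            # Remaining control characters use the four-digit \uXXXX form.
            out.append("\\u%04X" % ord(ch))
        else:
            out.append(ch)
    return "".join(out)


def encode_path(segments) -> str:
    """Join segments using the dot-separated form."""
    return ".".join(escape_segment(s) for s in segments)


def wrap_segment(segment: str) -> str:
    """Encode one segment in the square-bracket form."""
    return "[" + escape_segment(segment, wrapped=True) + "]"


print(encode_path(["config", "example.com", "0"]))   # config.example\.com.0
print("config" + wrap_segment("example.com") + wrap_segment("0"))  # config[example.com][0]
```

Both printed paths designate the same three segments; the bracket form avoids escaping the dots in `example.com`.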

Path grammar

A path is a sequence of Unicode characters. When a path is decoded into individual segments, each segment should also be composed of individual Unicode characters.

Paths must conform to the following grammar:

proto_path = segment *following_segment

segment = identifier_segment / ; segment
          wrapped_segment      ; [segment]

following_segment = ("." identifier_segment) / ; .segment
                    wrapped_segment            ; [segment]

identifier_segment = *identifier_char ; segment

identifier_char =   ; skip control characters
                  %x20-21 /
                    ; skip %x22 " double quote
                  %x23-26 /
                    ; skip %x27 ' single quote
                  %x28-2D /
                    ; skip %x2E . dot
                  %x2F-5A /
                    ; skip %x5B [ opening square bracket
                  %x5C escapable_char / ; \X escape sequence
                    ; skip %x5D ] closing square bracket
                  %x5E-10FFFF

wrapped_segment = "[" *wrapped_segment_char "]" ; [segment]

wrapped_segment_char = identifier_char / "."

escapable_char = %x22            / ; " double quote U+0022
                 %x27            / ; ' single quote U+0027
                 "."             / ; . dot U+002E
                 "["             / ; [ opening square bracket U+005B
                 "]"             / ; ] closing square bracket U+005D
                 "\"             / ; \ backslash U+005C
                 "b"             / ; b BS backspace U+0008
                 "f"             / ; f FF form feed U+000C
                 "n"             / ; n LF line feed U+000A
                 "r"             / ; r CR carriage return U+000D
                 "t"             / ; t HT horizontal tab U+0009
                 ("u" hexchar_s) / ; uXXXX U+XXXX
                 ("U" hexchar_l)   ; UXXXXXXXX U+XXXXXXXX

hexchar_s = ((DIGIT / A / B / C / E / F) 3HEXDIGIT)                       / ; %x0-CFFF and %xE000-FFFF (U+0000-U+CFFF and U+E000-U+FFFF)
            (D ("0" / "1" / "2" / "3" / "4" / "5" / "6" / "7") 2HEXDIGIT)   ; %xD000-D7FF              (U+D000-U+D7FF)
                                                                            ; UTF-16 surrogates are not allowed and should result in parsing error or be treated as Unicode replacement character (U+FFFD)

hexchar_l = (4ZERO hexchar_s)                   / ; %x0-D7FF and %xE000-FFFF (U+00000000-U+0000D7FF and U+0000E000-U+0000FFFF)
            (3ZERO NON_ZERO_HEXDIGIT 4HEXDIGIT) / ; %x10000-FFFFF            (U+00010000-U+000FFFFF)
            (2ZERO ONE ZERO 4HEXDIGIT)            ; %x100000-10FFFF          (U+00100000-U+0010FFFF)
                                                  ; code points that are out of range are not allowed and should result in parsing error or be treated as Unicode replacement character (U+FFFD)

NON_ZERO_HEXDIGIT = NON_ZERO_DIGIT / A / B / C / D / E / F

HEXDIGIT = DIGIT / A / B / C / D / E / F

F = "f" / "F"

E = "e" / "E"

D = "d" / "D"

C = "c" / "C"

B = "b" / "B"

A = "a" / "A"

NON_ZERO_DIGIT = %x31-39 ; 1-9

DIGIT = %x30-39 ; 0-9

ONE = "1"

ZERO = "0"
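To make the grammar more concrete, here is a minimal sketch of a decoder that splits a path into decoded segments. It is an illustration under stated assumptions, not a reference implementation: `parse_path` and `PathSyntaxError` are hypothetical names, a leading `[` is always read as a wrapped segment, and surrogate or out-of-range escapes are rejected with an error rather than replaced with U+FFFD (the grammar permits either behavior).

```python
import string

# Escape letters that follow a backslash, mapped to the character they decode to.
SIMPLE_ESCAPES = {
    '"': '"', "'": "'", ".": ".", "[": "[", "]": "]", "\\": "\\",
    "b": "\b", "f": "\f", "n": "\n", "r": "\r", "t": "\t",
}
# Characters that may not appear unescaped in an identifier segment.
RAW_FORBIDDEN = set("\"'.[]")


class PathSyntaxError(ValueError):
    """Raised when a path does not conform to the grammar (hypothetical name)."""


def _read_escape(path, i):
    """Decode the escape sequence whose first character after the backslash is path[i]."""
    if i >= len(path):
        raise PathSyntaxError("dangling backslash at end of path")
    c = path[i]
    if c in SIMPLE_ESCAPES:
        return SIMPLE_ESCAPES[c], i + 1
    if c in ("u", "U"):
        width = 4 if c == "u" else 8
        digits = path[i + 1:i + 1 + width]
        if len(digits) != width or any(d not in string.hexdigits for d in digits):
            raise PathSyntaxError("malformed Unicode escape")
        code_point = int(digits, 16)
        if code_point > 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
            # Assumption: reject instead of substituting U+FFFD.
            raise PathSyntaxError("surrogate or out-of-range code point")
        return chr(code_point), i + 1 + width
    raise PathSyntaxError("unknown escape character: " + c)


def _read_chars(path, i, wrapped):
    """Read segment characters from index i until a character that cannot continue the segment."""
    out = []
    while i < len(path):
        c = path[i]
        if c == "\\":
            decoded, i = _read_escape(path, i + 1)
            out.append(decoded)
        elif c == "." and wrapped:
            out.append(c)  # dots are plain characters inside square brackets
            i += 1
        elif c in RAW_FORBIDDEN or ord(c) < 0x20:
            break
        else:
            out.append(c)
            i += 1
    return "".join(out), i


def _read_wrapped(path, i):
    """Read a wrapped segment whose opening bracket is at path[i]."""
    segment, i = _read_chars(path, i + 1, wrapped=True)
    if i >= len(path) or path[i] != "]":
        raise PathSyntaxError("unterminated wrapped segment")
    return segment, i + 1


def parse_path(path):
    """Split a path into its decoded segments."""
    if path.startswith("["):
        segment, i = _read_wrapped(path, 0)
    else:
        segment, i = _read_chars(path, 0, wrapped=False)
    segments = [segment]
    while i < len(path):
        if path[i] == "[":
            segment, i = _read_wrapped(path, i)
        elif path[i] == ".":
            segment, i = _read_chars(path, i + 1, wrapped=False)
        else:
            raise PathSyntaxError("unexpected character at offset %d" % i)
        segments.append(segment)
    return segments


print(parse_path("items[0].name"))    # ['items', '0', 'name']
print(parse_path(r"hosts.a\.b.port")) # ['hosts', 'a.b', 'port']
```

Because the grammar allows an empty identifier segment, this sketch decodes the empty path as a single empty segment.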