Hardening of codes for labels #1022

lenkan · 2025-06-04T14:25:38Z

lenkan
Jun 4, 2025

Now that Sam is implementing CESR native messages for KERI and ACDC events that include generic field maps, I wonder if it is time to harden the rules for how to serialize the field labels. Unless I am missing something, the CESR spec does not contain the rules implemented here:

keripy/src/keri/core/coring.py

Lines 3210 to 3238 in 95481fb

    
    if label:
        if hasattr(label, "encode"):  # make label bytes
            label = label.encode("utf-8")
        if Reb64.match(label):  # candidate for Base64 compact encoding
            try:
                code = Tagger._codify(tag=label)
                soft = label
            except InvalidSoftError as ex:  # too big
                if label[0] != ord(b'A'):  # use Bexter code
                    code = LabelDex.StrB64_L0
                    raw = Bexter._rawify(label)
                else:  # use Texter code since ambiguity if starts with 'A'
                    code = LabelDex.Bytes_L0
                    raw = label
        else:
            if len(label) == 1:
                code = LabelDex.Label1
            elif len(label) == 2:
                code = LabelDex.Label2
            else:
                code = LabelDex.Bytes_L0
            raw = label

The Labeler class has logic that determines which CESR primitive to use. These rules depend on the length and contents of the label. I think these rules should be in the CESR spec, and/or field map labels should be simplified to just use the Texter class.
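To make the decision tree in the excerpt easier to discuss, here is a minimal standalone sketch of it. It is not keripy code: the function name `choose_label_family`, the returned family names, the regex, and the `max_tag_len` default of 11 are all assumptions made for illustration. In keripy the "too big" case is whatever makes `Tagger._codify` raise `InvalidSoftError`, and the excerpt does not show what happens for an empty label, so the sketch simply rejects it.

```python
import re

# Assumption: "Base64 only" means the URL-safe alphabet A-Z, a-z, 0-9, '-' and '_'.
B64_ONLY = re.compile(rb"[A-Za-z0-9_-]+")


def choose_label_family(label: str, max_tag_len: int = 11) -> str:
    """Return a hypothetical family name for the primitive a label would use.

    max_tag_len is a placeholder for "the longest label a Tag code accepts";
    the value 11 is an assumption, not taken from the CESR spec.
    """
    raw = label.encode("utf-8")
    if not raw:
        # The excerpt only shows the non-empty branch, so the sketch stops here.
        raise ValueError("empty labels are outside this sketch")

    if B64_ONLY.fullmatch(raw):  # candidate for compact Base64 encoding
        if len(raw) <= max_tag_len:
            return "Tag"  # compact tag code, label carried in the soft part
        if raw[0] != ord("A"):
            return "StrB64"  # Bexter-style variable-length Base64 string
        return "Bytes"  # leading 'A' is ambiguous, fall back to Texter-style bytes

    # Not pure Base64: short special codes first, then generic bytes.
    if len(raw) == 1:
        return "Label1"
    if len(raw) == 2:
        return "Label2"
    return "Bytes"


if __name__ == "__main__":
    for example in ["a", "dt", "@", "a b", "x" * 12, "A" + "x" * 11]:
        print(repr(example), "->", choose_label_family(example))
```

Writing the rules down this way shows that the outcome depends on three things: whether the label is pure Base64, how long it is, and whether it starts with 'A'. Those are the rules a spec would need to pin down.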

Any thoughts on this?

My naive suggestion is to encode field labels as strings (4A, 5A, 6A), regardless of length or contents. That way it is easy to reproduce the serialization without custom logic.

SmithSamuelM · 2025-06-04T19:47:00Z

SmithSamuelM
Jun 4, 2025
Maintainer

While I agree that merely encoding labels as variable-length text streams is simpler, it is also much more verbose in general, or on average. For protocols like KERI and ACDC that use compact field labels, Labeler attempts to find the most compact CESR encoding for the specific label. If compactness is never an issue then we could just use JSON everywhere and call it a day. But the reason CBOR exists is that compactness matters, and the reason CESR exists is that the tension between compact binary protocols like CBOR and much less compact text protocols like JSON is a real tension, and CESR provides a solution. It goes even further because it adds the at-scale advantages of pipelineable serialization and deserialization. But if we use a dumb serialization approach, then native CESR field maps in the Text domain start to be less compact than JSON, and native CESR serializations in the binary domain start to be less compact than CBOR. The goal is that CESR should be no worse than JSON on average in the Text domain and no worse than CBOR in the binary domain, so that net over net there is no reason not to use CESR, as its composability and pipelineability make it otherwise advantageous even when its compactness is no worse than CBOR/JSON in their respective domains.

For fixed-field serializations, where the labels are not included, CESR is significantly more compact than JSON in the Text domain, and is more compact in the Text domain than CBOR is in the binary domain. Fixed-field CESR in the binary domain is significantly more compact than CBOR in the binary domain. This advantage is largely due to the overhead of labels. Since one can encode fixed fields with CESR due to its self-framing nature, one should whenever one can. But in those cases where one can't use fixed fields and must include labels, using compact encodings for labels makes a difference.

1 reply

lenkan Jun 5, 2025
Author

If compactness is never an issue then we could just use JSON everywhere and call it a day.

This is where I get hung up. From my point of view, this is preferable. At this stage of development, I think adoptability and maintainability are much more important than compactness. Are there any real cases where the compactness of JSON events is not enough?

SmithSamuelM · 2025-06-04T19:53:16Z

SmithSamuelM
Jun 4, 2025
Maintainer

I do agree that the spec should harden the rules for label serialization, once we work out the kinks.

0 replies

SmithSamuelM · 2025-06-04T20:10:15Z

SmithSamuelM
Jun 4, 2025
Maintainer

For example, let's compare a single-character string label, 'a'.
In JSON there are 3 characters of overhead. "a": these are two quote characters and the colon character.
In CBOR there is a single byte of overhead to encode a single-character string (indeed any string up to 23 bytes long).
So in CESR, if we use a Tag1, we can encode the label "a" with 3 characters of overhead, since the minimum-sized primitive in CESR in the Text domain must align on a 24-bit boundary (aka 4 Base64 characters). These 4 Text domain characters convert to 3 bytes in the binary domain.

So in comparison, the overhead for a single-character label in CESR == JSON in Text, so no worse. The overhead in CESR in the binary domain is 2 bytes, which is 1 more than CBOR. So a little worse.

For two-character labels, the overhead in CESR using the Tag2 code in Text is 2 and the overhead in JSON is still 3, so CESR is better. In CBOR it's 1 byte of overhead and in CESR 1 byte of overhead, so no worse. Indeed, for short labels CBOR is on average just a little better than CESR, but only a little. So the advantages of bulk composability and conversion to Text still make CESR advantageous. And for cryptography-heavy protocols, the crypto-agility encoding advantage of CESR will dominate from a compactness point of view.

If we were to use variable-length text strings, whose minimum size is 8 characters, then for short labels CESR would have significantly more overhead than either JSON or CBOR.
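The arithmetic above can be checked with a small sketch. The helper names are hypothetical, and the sizes come from this comment rather than from the spec: a Tag1 or Tag2 primitive is taken to occupy 4 Text domain characters, and the smallest variable-length string primitive 8. The CBOR helper only covers the one-byte header used for text strings of up to 23 bytes.

```python
def json_label_overhead(label: str) -> int:
    """Characters JSON adds around a label: two quotes plus the colon."""
    return len(f'"{label}":') - len(label)


def cbor_label_overhead(label: str) -> int:
    """Bytes CBOR adds to a short text-string label (one-byte header only)."""
    raw = label.encode("utf-8")
    if len(raw) > 23:
        raise ValueError("sketch only covers strings of up to 23 bytes")
    encoded = bytes([0x60 + len(raw)]) + raw  # major type 3, length in the header
    return len(encoded) - len(raw)


def cesr_label_overhead(label: str, text_size: int) -> tuple:
    """(Text overhead in chars, binary overhead in bytes) for a primitive.

    text_size is the assumed total size of the primitive in Base64 characters.
    Every 4 Base64 characters convert to 3 bytes in the binary domain.
    Overhead is counted as in the comment above: total size minus label length.
    """
    if text_size % 4:
        raise ValueError("CESR primitives align on 4-character boundaries")
    binary_size = text_size * 3 // 4
    return text_size - len(label), binary_size - len(label)


if __name__ == "__main__":
    for label in ["a", "ab"]:
        print(
            label,
            "json:", json_label_overhead(label),
            "cbor:", cbor_label_overhead(label),
            "cesr tag (text, binary):", cesr_label_overhead(label, 4),
            "cesr var string (text, binary):", cesr_label_overhead(label, 8),
        )
```

For 'a' this gives 3 for JSON, 1 for CBOR, (3, 2) for the 4-character tag and (7, 5) for the 8-character variable-length string, which is the gap that motivates picking the most compact code per label.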

3 replies

lenkan Jun 5, 2025
Author

In JSON there are 3 characters of overhead. "a":

Actually, is it not 5 characters? You need the enclosing quotes of the string itself: "a":"<string>". If I understand your argument correctly.

If we were to use variable-length text strings, whose minimum size is 8 characters, then for short labels CESR would have significantly more overhead than either JSON or CBOR.

True. I guess the overhead compared to JSON is not that bad considering the above.

SmithSamuelM Jun 5, 2025
Maintainer

I am only referring to the overhead of the label, not its value.

lenkan Jun 5, 2025
Author

Right. I had the other conversation about field values top of mind when I wrote that.

SmithSamuelM · 2025-06-05T01:45:41Z

SmithSamuelM
Jun 5, 2025
Maintainer

Just pushed a fix for the Mapper class so that it round-trips the special case where a field map value is a string that is also a verbatim qb64 of a special type. When this happens, the Mapper inserts a new special primitive code called Escape to indicate that the following primitive's qb64 is to be round-tripped verbatim and not converted to a special type. The special types are Escape, None, bool, int, float, and string.
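The general idea of an escape marker can be shown with a toy model. This is not the Mapper's actual format: the tokens `ESC`, `NULL`, `YES` and `NO` and the functions `encode` and `decode` are hypothetical stand-ins, and only None, bool and string are modelled. The point is just that a string which collides with a reserved token gets an escape token in front of it, so the decoder takes it verbatim.

```python
# Hypothetical toy tokens standing in for special-type primitives (not CESR codes).
ESC, NULL, YES, NO = "ESC", "NULL", "YES", "NO"
RESERVED = {ESC, NULL, YES, NO}


def encode(values):
    """Serialize a list of None/bool/str values into a flat token list."""
    tokens = []
    for value in values:
        if value is None:
            tokens.append(NULL)
        elif value is True:
            tokens.append(YES)
        elif value is False:
            tokens.append(NO)
        elif isinstance(value, str):
            if value in RESERVED:
                tokens.append(ESC)  # next token must be read verbatim
            tokens.append(value)
        else:
            raise TypeError(f"unsupported value in this sketch: {value!r}")
    return tokens


def decode(tokens):
    """Invert encode(), honouring the escape token."""
    values = []
    stream = iter(tokens)
    for token in stream:
        if token == ESC:
            values.append(next(stream))  # verbatim string, no conversion
        elif token == NULL:
            values.append(None)
        elif token == YES:
            values.append(True)
        elif token == NO:
            values.append(False)
        else:
            values.append(token)
    return values


if __name__ == "__main__":
    original = [None, "NULL", True, "ESC", "plain", False]
    assert decode(encode(original)) == original
    print(encode(original))
```

Without the escape token, the string "NULL" would decode as None. With it, every value in the toy model round-trips, including a string equal to the escape token itself.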

1 reply

lenkan Jun 5, 2025
Author

Great. It seems to solve the issue #1019 at least. Perhaps it can be closed.
