Synchronization between KERI and ACDC message protocol versions and CESR genus versions #1016
-
Relying on inferred information for versioning can lead to potential confusion and errors down the line. Inference rules, no matter how well-defined initially, can become subject to interpretation or misimplementation. As systems evolve, or if new developers come on board, the nuances of these rules might be overlooked, leading to bugs that are hard to diagnose, particularly in multi-vendor environments where messages generated by one vendor's software are parsed by another's. I vouch for Explicit Version Synchronization, as it offers unambiguous clarity.
-
@daidoji If I understand what you are proposing based on the discussion, one way to address the problem is to force gvrsn == pvrsn. This is tight synchronization. So if the genus is KERI/ACDC, then the supported CESR KERI table version must be the same as the version of KERI in the message. Moreover, because ACDC messages share the same genus table, the version of ACDC must also be synchronized. This also means that the stream in which a KERI message is sent must be set to the same version as the version of the messages in the stream. This means that one can't send a version of a KERI message without explicitly putting the genus version in the stream for that message.
This forces the governance of the KERI/ACDC protocols' GENUS table to be synchronized with the KERI and ACDC protocols. This also means that the KERI and ACDC protocol versions must also be synced to the CESR GENUS version, i.e., all three protocols must use the same version. So if I want to send an ACDC version 2.1, then I must use CESR GENUS ACDC/KERI version 2.1. But if I want to send a KERI version 2.1, then I must use GENUS ACDC/KERI 2.1, which forces KERI and ACDC to be lock-step versions. This is doable as long as the governance of both specs is synchronized on versioning, which it currently is.
Otherwise, we could split the GENUS table and have a different GENUS table for ACDC and a different GENUS table for KERI. Then when we send an ACDC message we have to set the stream version to be GENUS ACDC, not GENUS KERI. Currently they use the same genus table. But splitting the GENUS tables means that we can't share code between the two protocols. For example, we would have to have either Counters that are ACDC counters and Counters that are KERI counters, as well as Matter classes, or have the code table lookup include both genus and version for a counter. The same goes for Matter primitives, since a Counter is based on a GENUS table and version.
So when you create a counter you can't just have it look up its codes by version; it also has to look up its codes by genus. So I would rather not split the GENUS table in order to avoid synchronizing KERI and ACDC versions. So we either lock step all three protocols to the same version, which is the implicit synchronization, or we allow for the genus version
A parser of the stream must support the genus version of the stream. A parser of a message in a stream must support both the genus and message protocol version of that message (either implicit or explicit for the genus version of the message). I can go either way. We have three choices
The least work is 1, but it is the least flexible. The most work is 3, but it is more flexible than 1. An intermediate amount of work is 2, and it is also more flexible than 1; it may or may not be less flexible than 3.
-
As discussed on the community call. This doesn't seem like rocket science to me, but I'm surprised it's come up again, as I pointed this out a year and a half ago. No worries, we can revisit. Some of this stuff has already been revisited/changed without comment in the specs already (the preambles changed from
There are three main things that can be versioned in the current construction and interaction between the specs:
There are other things that could/should be versioned, like SAID construction and the semantics of code tables, as many of these elements have implicit dependencies on the particular behaviors of the python libraries that underlie keripy (at least if you want to be interoperable), but these things are all out of scope. We'll review:
Changes in 1) probably should be versioned but weren't. If the, let's say, "cold-start" changes tomorrow, it would be nice to have a way to shift the parsers or at least let them know that they can/can't parse a given stream because of parsing elements or structures contained within that stream. However, another preamble say
insha'Allah 🙏
Changes in 1) are already bound to code tables in 2). The normative requirement in the spec is that future code tables are already bound to a specific CESR version. We know that because of the several normative requirements listed in the spec:
That is, the code tables have the normative requirement to have certain count codes and an implicit dependency on a CESR version. If you wanted to make this more explicit you can change the preamble
Changes in 3) are already bound to code tables via specification and can already be switched with the current protocol/genus preamble. A message in isolation outside of a CESR stream is bound to a particular code table, not a particular CESR version. If anything it should be bound to 2), not 1). (Which is another oddity of the current CESR spec: reading individual TLV primitives out of, say, a JSON map is never described in any of the three specs. This is just folklore based on the reference implementation: which fields are TLV primitives, which are enums, and which can be anything, say
So code tables 2) (of which only KERI/ACDC/TSP is published, but it's trivial to create your own) are orthogonal to CESR versions 1) when viewed in the context of KERI/ACDC/TSP 3) messages. This implicit dependency was always going to exist by dint of the fact that primitives exist in the field map representations. This dependency already always exists, even in CESR native serializations.
So if you're looking to serialize messages from 3) with enough information to reparse in isolation (although why you'd save a serialized representation of a message without its CESR attachment proofs or any information) you should really store the version of the code table in 2), not the CESR version in 1). A change to the CESR version in 1) is already going to maybe or maybe not break changes in 2). The reality is there aren't going to be a lot of code tables in existence, so enumerating them isn't difficult, but they are entirely orthogonal to 1), even though 1) may imply some implicit dependencies on preamble changes and universal codes and the like.
Summary: CESR Versions 1) -> there are very few
-
@daidoji @pfeairheller @m00sey
I guess that is where we differ, since as I understand the spec as written, and as I wrote it, that is not true. And the current reference implementation does not use a one-to-one mapping. That may have been your interpretation, but I think you are overstepping to say that it is already true.
-
@pfeairheller @m00sey @daidoji Also, CESR itself is not versioned per se, only the per-genus code tables. But even then the first normative version of CESR is actually 2.0. What we are calling CESR 1.0 was never a formal specification; it was grandfathered in. What is normative for CESR 2.0 is that there are universal codes. But technically, even those could be code table specific, since the appearance of the genus-version code in the stream allows a parser to switch to different codes for the universal codes, as is the case with the new Keripy parser that supports non-normative "universal" group count tables from CESR 1 that differ from the normative "universal" count codes in CESR 2.0.
So the real question is still valid for the community. How do we want to map a code table genus-version to a given message protocol version? Is it strictly 1 to 1, and even if we decide to make it 1 to 1, i.e., message protocol = protocol genus, do the versions of each stay in lock step or can they vary? From a purely version dependency standpoint, the two versions govern different behavior, so there is a valid technical reason for them not to be lock-step even when 1 to 1 on the protocol == genus.
So instead of asserting requirements that were never specified, please engage in the discussion on its own merits. If you want to hold that it's not a valid question, then it's hard even to have the conversation, since you are already reading into the specs non-normative constraints that are not written as MUST requirements. I just searched the KERI spec and there is no normative statement I can find that specifies a given code table version for the KERI protocol. It is implied that there is some set of CESR codes, but it does not ever refer to a specific genus. Indeed, the word
Which is definitely not 1 to 1 as it merges KERI/ACDC Moreover, the Annex that provides the defined GENUS codes is as follows: Universal Code table genus/version codes
Which is definitely NOT 1 to 1. So I fail to see on what basis you would strongly state:
It is still TBD. This has become an issue because not only do we have two protocols, KERI and ACDC, but a third and maybe a fourth, namely, the SPAC and TSP protocols. So the point of this discussion is to nail down the normative requirements. One of the reasons to implement a draft spec before it becomes hardened is that real implementation issues can re-prioritize specification design features. This sometimes means abandoning some implementation details as real-world experience better educates the specification. The fuzzy parts of the spec are fuzzy because of a lack of open source full implementations to educate. When someone does a proprietary implementation of a draft spec and does not share that back with the community, nor even share back lessons learned in that proprietary implementation, it is not likely to be well received by those doing open source implementations.
-
Implicit normative dependency is not objective; it's subjective. So yes, you could not safely assume that anything implicit is normative. This may be because the spec is incomplete and we need a pull request to harden that, or because it is yet TBD. Sorry if that is confusing. But as I have limited resources, I suffer from not having the time to track down every possible interpretation, and that means it might be problematic for implementers. All I can say is, please contribute, pull requests welcome.
-
Perhaps a dumb question. But if the event messages are encoded as generic field maps That would require a resolution to this issue first though: #1021 - perhaps by encoding all string field values as generic string codes. |
-
Versioning Synchronization
A given message in KERI has a protocol version that governs what fields appear in a message, in what order, and also what those fields mean. For non-native CESR serializations of those messages (namely JSON, CBOR, MGPK), the field label is also determined by the protocol version.
Likewise, a given message in ACDC has a protocol version that governs what fields appear in a message, in what order, and what those fields mean. An ACDC message itself is a field map (not fixed fields). Other message types in the ACDC protocol in CESR native format may have either fixed fields or field maps at the top level.
Field values in both KERI and ACDC messages may use CESR primitives. This means that the CESR genus version is relevant. Native ACDC field maps also use CESR primitives for field labels. Both KERI and ACDC use CESR primitives and groups for attachments, and CESR native KERI and ACDC messages use CESR groups inside the messages.
Thus, the version of CESR used for KERI and ACDC messages may matter in order to correctly generate and parse a message.
The problem arises in that currently the CESR genus version does not appear in the version string for KERI/ACDC non-native messages nor does it appear in the version field (Verser) for native CESR messages.
There are two solutions.
Inferred Version Synchronization
message protocol version (pvrsn)
cesr genus version (gvrsn)
The purpose of inferred synchronization is to avoid including both the pvrsn and gvrsn in the version field of a KERI or ACDC message. Otherwise we would need to put both the pvrsn and gvrsn in the version field value. This means changing the definition of the v version string for JSON, CBOR, MGPK, i.e., non-CESR-native message formats.
For CESR native messages, the v field value is provided by a Verser class instance that accepts one of two codes, a seven-character version value or a ten-character version value. The seven-character value only has the pvrsn; the ten-character value has both the pvrsn and gvrsn.
It becomes problematic if the v string does not have both versions but the Verser version tag field value does, because then there is no way to store the second version in the sad dict in the Serder unless it's in the version string.
Discussion
So if at some point in the future we need both versions because we decide not to
use implicit synchronization, then we can go to a version 3 version string that has both versions,
such as:
V1
KERI10JSON00005a_
V2
KERICAACESRAAAA.
This corresponds to CESR Tag7 YKERICAA
V3
KERICAACAACESRAAAA.
This corresponds to CESR Tag10 0OKERICAACAA
Then versify and deversify, which returns a smellage, would have both versions. Currently Smellage has a gvrsn field but it is set to None.
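To make the three layouts concrete, here is a minimal sketch of a parser that returns both versions when the string carries them. It is not keripy's deversify: the names Smell, Version, b64_to_int and parse_version_string are hypothetical, and the field widths (one Base64 character for major, two for minor, four for size) are read off the example strings above rather than taken from a normative source.

```python
import re
from typing import NamedTuple, Optional

# URL-safe Base64 alphabet, used here as base-64 digits (A=0, B=1, C=2, ...).
B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_"
B64_CLASS = r"[A-Za-z0-9_-]"


def b64_to_int(text: str) -> int:
    """Decode Base64 characters as a big-endian base-64 integer."""
    value = 0
    for ch in text:
        value = value * 64 + B64.index(ch)
    return value


class Version(NamedTuple):
    major: int
    minor: int


class Smell(NamedTuple):  # hypothetical stand-in for keripy's Smellage
    proto: str
    pvrsn: Version
    kind: str
    size: int
    gvrsn: Optional[Version]  # None when the string carries only the pvrsn


# V1: hex major, hex minor, 6 hex size chars, '_' terminator.
V1 = re.compile(r"([A-Z]{4})([0-9a-f])([0-9a-f])([A-Z]{4})([0-9a-f]{6})_")
# V2: 3 Base64 chars of pvrsn, 4 Base64 size chars, '.' terminator.
V2 = re.compile(rf"([A-Z]{{4}})({B64_CLASS}{{3}})([A-Z]{{4}})({B64_CLASS}{{4}})\.")
# V3 (proposed above): pvrsn followed by gvrsn, then kind and size.
V3 = re.compile(
    rf"([A-Z]{{4}})({B64_CLASS}{{3}})({B64_CLASS}{{3}})([A-Z]{{4}})({B64_CLASS}{{4}})\."
)


def b64_version(text: str) -> Version:
    """First char is the major version, remaining two are the minor."""
    return Version(b64_to_int(text[0]), b64_to_int(text[1:]))


def parse_version_string(vs: str) -> Smell:
    if m := V1.fullmatch(vs):
        proto, major, minor, kind, size = m.groups()
        return Smell(proto, Version(int(major, 16), int(minor, 16)), kind, int(size, 16), None)
    if m := V2.fullmatch(vs):
        proto, pv, kind, size = m.groups()
        return Smell(proto, b64_version(pv), kind, b64_to_int(size), None)
    if m := V3.fullmatch(vs):
        proto, pv, gv, kind, size = m.groups()
        return Smell(proto, b64_version(pv), kind, b64_to_int(size), b64_version(gv))
    raise ValueError(f"unrecognized version string: {vs!r}")
```

Because the three layouts have different fixed lengths and terminators, a full match against each pattern is unambiguous, and only the V3 layout yields a non-None gvrsn.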
Synchronization Rules
CESR native Serders at version 2 synchronize the major version to be the same,
i.e., assume genus and protocol are synced so that only one version, the protocol
version, is needed, as a given genus is for the whole family of message protocol
types KERI, ACDC, SPAC, etc.
We assume that any minor version for a given major version of the CESR genus version
is always backwards compatible with previous minor versions for that major version,
but not forwards compatible. Typically minor version changes accommodate new
primitive codes. So if the stream parser's supported minor version is earlier
than the provided serder minor version, then the message is dropped and the stream
errors out.
To reiterate, should a native serder use a new primitive not supported by
the current parser minor version, then it won't parse. But all earlier versions
of primitives should parse with any parser whose supported minor version is
the same as or later than the message version. This means that message protocol
versions should not advance faster than CESR genus versions.
So with only one version in a serder, which is the pvrsn, the supported gvrsn must
advance so that pvrsn.minor <= gvrsn.minor for a given pvrsn.major == gvrsn.major.
Otherwise all messages will be dropped.
To reiterate, when group code semantics change, this forces the gvrsn.major
to change. This is a backward breaking change. So a parser's incoming message
stream, at its supported gvrsn.major, has to be explicitly switched back to an
earlier genus version to support an earlier native message that uses groups that
are no longer backwards compatible.
Since non-native (JSON, CBOR, MGPK) messages don't use CESR group codes they
don't have to worry about backward breaking gvrsn major changes. They would
only have to worry about backward breaking primitive code table changes which
as a policy we strive to avoid.
So if CESR native messages (serders) do not have gvrsn codes, then should it ever be the
case that there is a backwards breaking group/count code change that induces
a new major gvrsn, the pvrsn of the serder native message must be synchronized
to the gvrsn of CESR used for its group codes (assuming primitive changes are never
backward breaking).
So for either native messages or non-native messages in streams with group codes
For native messages:
pvrsn.major == gvrsn.major
pvrsn.minor <= gvrsn.minor
Can't send a stream to a stream parser that does not support an older major
gvrsn. Can't guarantee that any gvrsn.major != pvrsn.major will parse.
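The inferred-synchronization rule stated earlier in this section (equal major versions, and pvrsn.minor <= gvrsn.minor) can be sketched as a single predicate. The Version type and the can_parse name are hypothetical and are not part of keripy.

```python
from typing import NamedTuple


class Version(NamedTuple):
    major: int
    minor: int


def can_parse(pvrsn: Version, supported_gvrsn: Version) -> bool:
    """Inferred synchronization: a message that carries only its pvrsn is
    parsable when the parser's supported gvrsn has the same major version
    and a minor version that is not earlier than the message's."""
    return (
        pvrsn.major == supported_gvrsn.major
        and pvrsn.minor <= supported_gvrsn.minor
    )
```

For example, a parser supporting gvrsn 2.3 would accept a 2.1 message and drop a 2.4 or a 1.0 message.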
Explicit dependency
An alternative approach to implicit synchronization rules is to publish the explicit dependency of any pvrsn of KERI or ACDC on a specific gvrsn for CESR. This means that any parser must store the dependency table and look it up when parsing. For example,
Let's say Version 2.11 of KERI is released. It would have to publish that its dependency is CESR version 2.01 or 2.31 or whatever is the minimum version of CESR that is required, or something like that, >=2.31 < 2.4. This means that for every message received by a parser, it has to do the version dependency mapping math to decide if it can parse the message.
Including both versions in the version string means the parser doesn't have to do version dependency mapping math. It merely has to compare which versions of both it supports, and if it supports both versions, then it selects those and parses; otherwise it drops.
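A rough sketch of the difference between the two approaches, under stated assumptions: the DEPENDS table, its single entry, and both function names are hypothetical, and the version bounds are illustrative values rather than published dependencies.

```python
from typing import Dict, NamedTuple, Set, Tuple


class Version(NamedTuple):
    major: int
    minor: int


# Hypothetical published dependency table:
# (protocol, pvrsn) -> (minimum gvrsn inclusive, maximum gvrsn exclusive)
DEPENDS: Dict[Tuple[str, Version], Tuple[Version, Version]] = {
    ("KERI", Version(2, 11)): (Version(2, 31), Version(3, 0)),
}


def parsable_by_lookup(proto: str, pvrsn: Version, supported_gvrsns: Set[Version]) -> bool:
    """Explicit dependency: the message carries only its pvrsn, so the parser
    must look up the required gvrsn range and check it against what it supports."""
    bounds = DEPENDS.get((proto, pvrsn))
    if bounds is None:
        return False  # unknown protocol version, nothing to map
    low, high = bounds
    # NamedTuples compare lexicographically: major first, then minor.
    return any(low <= g < high for g in supported_gvrsns)


def parsable_by_both(
    pvrsn: Version,
    gvrsn: Version,
    supported_pvrsns: Set[Version],
    supported_gvrsns: Set[Version],
) -> bool:
    """Both versions in the version string: no mapping math, only two
    membership checks against what the parser supports."""
    return pvrsn in supported_pvrsns and gvrsn in supported_gvrsns
```

The lookup variant needs the table to be distributed and kept current in every parser, whereas the second variant needs nothing beyond the parser's own lists of supported versions.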
Inferred Dependency with Required CESR stream version
Currently a given parser may start with a default CESR stream version (gvrsn) for group codes in the stream. Top-level non-native messages (JSON, CBOR, MGPK) are not affected by this stream version. Another way to infer the gvrsn inside of a message is to force the stream to always have a Genus Version Group code, so that any message's internal gvrsn is given by the stream gvrsn code, not an explicit gvrsn provided inside the message. This changes the constraint on the generator of a message to always transmit the message in a stream with a Genus Version Code. This may be problematic because the gvrsn for that message is not self-contained and must be tracked elsewhere for replay. Because messages are signed, an internal version string that includes the gvrsn becomes non-repudiable to the generator of the message. It can't be spoofed, whereas an externally tracked gvrsn that must be inserted in the stream via a genus version group code can be spoofed.
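As an illustration of the stream-level approach, here is a minimal sketch in which each message inherits its gvrsn from the most recent genus version code in the stream. GenusVersion, Message and assign_gvrsn are hypothetical names for already-decoded stream items; real CESR stream parsing is not modeled.

```python
from typing import Iterable, List, NamedTuple, Optional, Tuple, Union


class Version(NamedTuple):
    major: int
    minor: int


class GenusVersion(NamedTuple):  # hypothetical decoded genus version code
    gvrsn: Version


class Message(NamedTuple):  # hypothetical decoded message carrying only its pvrsn
    pvrsn: Version
    body: str


def assign_gvrsn(
    stream: Iterable[Union[GenusVersion, Message]],
) -> List[Tuple[Message, Version]]:
    """Pair every message with the gvrsn of the latest genus version code
    that preceded it; reject a message that arrives before any such code."""
    current: Optional[Version] = None
    paired: List[Tuple[Message, Version]] = []
    for item in stream:
        if isinstance(item, GenusVersion):
            current = item.gvrsn  # stream switches genus version from here on
        elif current is None:
            raise ValueError("message received before any genus version code")
        else:
            paired.append((item, current))
    return paired
```

The gvrsn in each returned pair lives outside the signed message, so a replay has to store it alongside the message, which is the self-containment concern raised above.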