Replies: 2 comments 4 replies
-
Very much so. Basically... understanding the difference between "interesting" and "uninteresting" requires a very high degree of both semantic and contextual information in the general case.
-
I agree, but I tend to think about this from the opposite side, in terms of 'necessary understanding'. Noting byte-level divergence requires little to no 'understanding'. 'Less divergent' (in your parlance) approaches require much more understanding, and are thus much easier to get wrong. Your 'protobuf generates Go code' case is a really good example. But consider Go's `stringer` as another: it generates code that carries out various 'niceties' for enum-like things in Go. If I change the name of a const, is that 'interesting' or not? You could probably make a case for 'not', but it requires fairly deep understanding of what is going on to make that determination one way or the other.
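For concreteness, `stringer` output for an enum-like type looks roughly like the sketch below (simplified, not verbatim output from the tool). Renaming a const only touches the packed name table and index array, not the method logic, so whether that byte-level diff is 'interesting' depends entirely on whether anything downstream actually consumes those strings.

```go
// Simplified sketch of what `stringer -type=Pill` emits (not verbatim output).
package painkiller

import "strconv"

type Pill int

const (
	Placebo Pill = iota
	Aspirin
	Ibuprofen
)

// Renaming the const Aspirin only changes these two tables;
// the String method body below is untouched.
const _Pill_name = "PlaceboAspirinIbuprofen"

var _Pill_index = [...]uint8{0, 7, 14, 23}

func (i Pill) String() string {
	if i < 0 || i >= Pill(len(_Pill_index)-1) {
		return "Pill(" + strconv.FormatInt(int64(i), 10) + ")"
	}
	return _Pill_name[_Pill_index[i]:_Pill_index[i+1]]
}
```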
-
"Convergence" and "divergence" are words I've been using to myself to express a property of hash-based indexing such as a merkle graph. I'm wondering if these terms resonate with others, or if not, some other way of expressing this idea.
Basically:
For example, suppose you have an IDL file (e.g. protobuf) which is fed into a code generator that produces source files implementing serialization and deserialization. If you make, say, whitespace-only changes to the IDL input, they will have no effect on the generated output. In effect, the output artifacts end up "converging" with those of all the other dependency graphs which include the same artifact. For something like a caching build system, that means it can skip actually compiling those intermediate generated source files, because it can reuse the results of a previous cached build.
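As a rough sketch of that caching behaviour (illustrative only, not any particular build tool's API): the compile step is keyed by the hash of the generated source, so two IDL inputs that differ only in whitespace converge on the same cache entry.

```go
// Sketch of a content-addressed build cache (hypothetical, not a real tool's API).
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// hashOf returns a content address for an artifact's bytes.
func hashOf(data string) string {
	sum := sha256.Sum256([]byte(data))
	return hex.EncodeToString(sum[:])
}

// generate stands in for the IDL -> source code generator; whitespace-only
// changes in the IDL produce byte-identical output.
func generate(idl string) string {
	return "// generated\n" + strings.Join(strings.Fields(idl), " ") + "\n"
}

func main() {
	cache := map[string]string{} // hash of generated source -> compiled object

	for _, idl := range []string{
		"message Foo { int32 id = 1; }",
		"message Foo {\n  int32 id = 1;\n}", // whitespace-only change
	} {
		src := generate(idl)
		key := hashOf(src)
		if _, ok := cache[key]; ok {
			fmt.Println("cache hit, compile skipped:", key[:12])
			continue
		}
		cache[key] = "object code for " + key[:12]
		fmt.Println("cache miss, compiled:", key[:12])
	}
}
```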
But if you're really concerned about how exactly those source files were generated, the convergence introduces an ambiguity - there are multiple potential build steps which could have generated that same source file. If you embed the input manifest in the generated output sources, you avoid that ambiguity by making every generated artifact distinct. But that has the downside of making the build system do redundant work.
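And a sketch of the 'embed the input manifest' variant: stamping each generated output with a hash of its exact inputs makes otherwise-identical outputs distinct, which removes the ambiguity but also removes the cache hit (a hypothetical scheme here, not OmniBOR's actual format).

```go
// Sketch: embedding an input-manifest hash in generated output makes
// otherwise-identical artifacts distinct (hypothetical scheme, not OmniBOR's format).
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

func hashOf(data string) string {
	sum := sha256.Sum256([]byte(data))
	return hex.EncodeToString(sum[:])
}

func main() {
	// Two IDL inputs that differ only in whitespace both produce this
	// byte-identical generated source, so on its own it is ambiguous
	// which input it came from.
	idlA := "message Foo { int32 id = 1; }"
	idlB := "message Foo {\n  int32 id = 1;\n}"
	generated := "// generated serialization code for Foo\n"

	// Stamping each output with a hash of the exact input that produced it
	// resolves the ambiguity, but every artifact (and everything built from
	// it) is now distinct, so the redundant downstream work comes back.
	stampedA := "// inputs: " + hashOf(idlA) + "\n" + generated
	stampedB := "// inputs: " + hashOf(idlB) + "\n" + generated

	fmt.Println(hashOf(stampedA) == hashOf(stampedB)) // false
}
```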
The key point here is that what counts as convergence or divergence depends heavily on what you consider to be "interesting differences", which in turn is very use-case dependent.
The current OmniBOR design is very much oriented towards "maximal divergence" - any change at the bit level of any artifact is encoded and propagated to every downstream derived artifact. This is conservative in that every change will be caught, so maximal information is conveyed - and specifically, if you see that two graphs share the same nodes, you can be very sure they're exactly the same. But it also means that if there's any difference, you then need to start digging into the actual artifacts to see whether it's a difference you care about.
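A rough model of that 'maximal divergence' property (simplified merkle-style IDs, not the exact OmniBOR gitoid/input-manifest encoding): each artifact's ID covers its own bytes plus the IDs of all its inputs, so a one-byte change in any leaf changes every downstream ID.

```go
// Simplified merkle-style artifact IDs (illustrative only, not the exact
// OmniBOR gitoid/input-manifest encoding).
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
	"strings"
)

type artifact struct {
	content string
	inputs  []*artifact
}

// id hashes the artifact's own bytes together with the IDs of all its
// inputs, so any bit-level change in any input propagates downstream.
func id(a *artifact) string {
	inputIDs := make([]string, 0, len(a.inputs))
	for _, in := range a.inputs {
		inputIDs = append(inputIDs, id(in))
	}
	sort.Strings(inputIDs)
	sum := sha256.Sum256([]byte(a.content + "\n" + strings.Join(inputIDs, "\n")))
	return hex.EncodeToString(sum[:])
}

func main() {
	idl := &artifact{content: "message Foo { int32 id = 1; }"}
	gen := &artifact{content: "// generated source", inputs: []*artifact{idl}}
	bin := &artifact{content: "final binary", inputs: []*artifact{gen}}

	before := id(bin)
	idl.content += " " // a single-byte change at a leaf...
	after := id(bin)
	fmt.Println(before == after) // false: the change reaches every derived ID
}
```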
On the other hand, if you can arrange for the graph to encode the right level of convergence for your use-case, then you can implement it much more efficiently. If you can be sure that "same id = no change for my use-case", then you can answer that question just by looking at the graph itself.
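A sketch of that other direction: if the ID is computed over a canonical form of the artifact - here a hypothetical whitespace-only normalization standing in for "the differences I care about" - then "same ID = no interesting change" holds by construction, and the graph alone answers the question.

```go
// Sketch: computing IDs over a canonical form so that "same ID" means
// "no change I care about" (the canonicalization here is a hypothetical
// whitespace-only normalization).
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// canonicalID hashes a whitespace-normalized view of the source, so
// formatting-only edits converge on the same ID.
func canonicalID(src string) string {
	normalized := strings.Join(strings.Fields(src), " ")
	sum := sha256.Sum256([]byte(normalized))
	return hex.EncodeToString(sum[:])
}

func main() {
	a := "func add(x, y int) int { return x + y }"
	b := "func add(x, y int) int {\n\treturn x + y\n}"
	c := "func add(x, y int) int { return x - y }" // a change that matters

	fmt.Println(canonicalID(a) == canonicalID(b)) // true: formatting-only diff converges
	fmt.Println(canonicalID(a) == canonicalID(c)) // false: semantic diff diverges
}
```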