-
Notifications
You must be signed in to change notification settings - Fork 2
fix: std::vector unpacking to avoid resize if possible
#5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
In principle making any changes to this repo (which I didn't know existed) would make it difficult to move to newer versions/merge upstream changes, so this shouldn't be merged lightly. I'll review in a bit but are you sure this is a bug? Has it been reported or fixed upstream? |
|
@fcarreiro this is the repo used in https://github.com/AztecProtocol/aztec-packages/blob/master/barretenberg/cpp/cmake/msgpack.cmake ; I doubt that we would be moving to a different version in the upstream, given the changes Adam had to do to make this codebase work with Wasm (throwing errors had to be changed into macros doing abort based on compilation flags available in the Barretenberg build). We can still pull updates. I am very surprised by this bug, I have no explanation to why it happens. That piece of code has not changed for the last 10 years. All I know is based on my experiments, that single |
|
I don't think I'm best placed to determine whether this is necessary. I'm tempted to say that it's unlikely that we're the first ones to hit this edgecase but I can't really fault the changes in aztec-packages. |
|
FWIW I added It should be just: |
|
As a quick sanity check, I can decode the msgpack serialization of the noir template project to json with the below CLI command, so it's not that we're generating garbled data on the noir side. $ jq -r '.bytecode' ./target/temp.json | base64 --decode | gunzip | tail -c +2 | msgpack-cli decode
{"functions":[{"assert_messages":[],"current_witness_index":2,"expression_width":{"Bounded":{"width":4}},"opcodes":[{"BrilligCall":{"id":0,"inputs":[{"Single":{"linear_combinations":[["0000000000000000000000000000000000000000000000000000000000000001",0],["30644e72e131a029b85045b68181585d2833e84879b9709143e1f593f0000000",1]],"mul_terms":[],"q_c":"0000000000000000000000000000000000000000000000000000000000000000"}}],"outputs":[{"Simple":2}],"predicate":null}},{"AssertZero":{"linear_combinations":[],"mul_terms":[["0000000000000000000000000000000000000000000000000000000000000001",0,2],["30644e72e131a029b85045b68181585d2833e84879b9709143e1f593f0000000",1,2]],"q_c":"30644e72e131a029b85045b68181585d2833e84879b9709143e1f593f0000000"}}],"private_parameters":[0],"public_parameters":[1],"return_values":[]}],"unconstrained_functions":[{"bytecode":[{"Const":{"bit_size":{"Integer":"U32"},"destination":{"Direct":21},"value":"0000000000000000000000000000000000000000000000000000000000000001"}},{"Const":{"bit_size":{"Integer":"U32"},"destination":{"Direct":20},"value":"0000000000000000000000000000000000000000000000000000000000000000"}},{"CalldataCopy":{"destination_address":{"Direct":0},"offset_address":{"Direct":20},"size_address":{"Direct":21}}},{"Const":{"bit_size":"Field","destination":{"Direct":2},"value":"0000000000000000000000000000000000000000000000000000000000000000"}},{"BinaryFieldOp":{"destination":{"Direct":3},"lhs":{"Direct":0},"op":"Equals","rhs":{"Direct":2}}},{"JumpIf":{"condition":{"Direct":3},"location":8}},{"Const":{"bit_size":"Field","destination":{"Direct":1},"value":"0000000000000000000000000000000000000000000000000000000000000001"}},{"BinaryFieldOp":{"destination":{"Direct":0},"lhs":{"Direct":1},"op":"Div","rhs":{"Direct":0}}},{"Stop":{"return_data":{"pointer":{"Direct":20},"size":{"Direct":21}}}}]}]} |
|
I ran into another one, and this time I'm out of ideas: AztecProtocol/aztec-packages#12841 (comment) |
|
I figured it out, it was a dangling pointer, fixed in AztecProtocol/aztec-packages@5810e3b |
|
Notably the dangling pointer pattern appears in multiple places in |
|
This one (in the docs) I think is fine. Let me explain why.
however, because of the way this is written, the handle only gets deallocated AFTER this whole expression is used. So things are ok. If instead you did then you are in trouble. Because the handle gets deallocated after the PS: I have to check that things work like this, just my intuition says this is the explanation. |
|
Thanks, that sounds reasonable to me 👍 I'd expect the problem would have manifested itself already somehow if it wasn't right. |
|
https://www.programiz.com/online-compiler/493yYnMXkiR55 the last 7 may or may not work |
Adds `msgpack` serialisation to the generated Acir and Witness C++ code. I moved the alterations described in `dsl/README.md` into the code generation itself, so no manual work is required. The PR is running against a feature branch with the same name in Noir, here's the upstream PR: noir-lang/noir#7716 With this PR is merged, `bb` should be able to handle `msgpack` or `bincode`. Once that's released we can switch to using `msgpack` in Noir in both native and wasm by merging noir-lang/noir#7810. And then we can remove the `msgpack` format detection and the fallback to `bincode`. **TODO**: - [x] Get it to compile - [x] Change `nargo` to allow compiling contracts with `msgpack` format: added `NOIR_SERIALIZATION_FORMAT` env var - [x] Add a first byte to the data to say which serialization format it is. There is a chance that it would clash with existing bincode data though, so a fallback would anyway be necessary. (Or we should ascertain that bincode never starts with some specific bit sequence). - [x] ~Change the `bb` code so it tries `bincode`, then falls back to `msgpack` - this way the currently used format stays fast, but we can feed it new data.~ _This looks problematic, as exceptions in the wasm build is disabled in `arch.cmake` and `throw_or_abort` aborts in wasm. Instead we run [msgpack::parse](https://c.msgpack.org/cpp/namespacemsgpack.html#ad844d148ad1ff6c9193b02529fe32968) first to check if the data looks like msgpack; if not, we use bincode._ - [x] Run integration tests with `msgpack` on both sides in #13021 - [x] Ignore the Brillig opcodes in Barretenberg - [x] Change the serialization of `WitnessStack` and `WitnessMap` to use the env var, add fallbacks in `bb` for them - [x] Revert the change to `noir-repo-ref` before merging ### Use of `MSGPACK_FIELDS` The generated code is using `MSGPACK_FIELDS` for structs, to keep it more terse. At some point during debugging the memory issue below I changed it so that I can have more direct control by generating code for individual fields. That needed some helper functions which I looted from the `msgpack-c` library and injected into the namespaces as a `Helpers` struct. This approach might be useful if we wanted to have extra checks, for example rejecting the data if there are extra fields, indicating a type has been extended with things we don't recognise, or if we wanted handle renamed fields. I left it out so there is less code to maintain, but if we need it we can recover it from the [commit history](noir-lang/noir@b0a612d). ### Compile `nargo` with the `msgpack` feature ```bash echo af/msgpack-codegen > noir/noir-repo-ref noir/bootstrap.sh ``` ### Generate and compile C++ code ```bash cd noir/noir-repo && NOIR_CODEGEN_OVERWRITE=1 cargo test -p acir cpp_codegen && cd - cp noir/noir-repo/acvm-repo/acir/codegen/acir.cpp barretenberg/cpp/src/barretenberg/dsl/acir_format/serde/acir.hpp cp noir/noir-repo/acvm-repo/acir/codegen/witness.cpp barretenberg/cpp/src/barretenberg/dsl/acir_format/serde/witness_stack.hpp cd barretenberg/cpp && ./format.sh changed && cd - barretenberg/cpp/bootstrap.sh ``` ### Test `nargo` with `bb` One example of an integration test that uses `bb` and noir in the Noir repo is https://github.com/noir-lang/noir/actions/runs/13631231158/job/38099477964 We can call it like this: ```bash cd noir/noir-repo && cargo install --path tooling/nargo_cli && cd - ./barretenberg/cpp/bootstrap.sh export BACKEND=$(pwd)/barretenberg/cpp/build/bin/bb export NOIR_SERIALIZATION_FORMAT=msgpack noir/noir-repo/examples/prove_and_verify/test.sh ``` If it works, it should print this: ```console % unset NOIR_SERIALIZATION_FORMAT % noir/noir-repo/examples/prove_and_verify/test.sh [hello_world] Circuit witness successfully solved [hello_world] Witness saved to /mnt/user-data/akosh/aztec-packages/noir/noir-repo/examples/prove_and_verify/target/witness.gz Finalized circuit size: 18 Proof saved to "./proofs/proof" Finalized circuit size: 18 VK saved to "./target/vk" Proof verified successfully ``` Whereas if it doesn't: ```console % export NOIR_SERIALIZATION_FORMAT=msgpack % noir/noir-repo/examples/prove_and_verify/test.sh [hello_world] Circuit witness successfully solved [hello_world] Witness saved to /mnt/user-data/akosh/aztec-packages/noir/noir-repo/examples/prove_and_verify/target/witness.gz Length is too large ``` I attached the final artefacts to the PR so it's easier to test with just `bb`. [hello_world.json](https://github.com/user-attachments/files/19391072/hello_world.json) [witness.gz](https://github.com/user-attachments/files/19391074/witness.gz) ### Further testing With the `noir-repo-ref` pointing at the feature `af/msgpack-codegen` feature branch, we can run all the contract compilations and tests with `msgpack` as follows: ```shell export NOIR_SERIALIZATION_FORMAT=msgpack ./bootstrap.sh ci ``` This is tested in #13021 ### Peek into artefacts We can inspect the file in JSON format using [this](https://crates.io/crates/msgpack-cli) msgpack CLI tool. ```shell jq -r '.bytecode' ./target/program.json | base64 --decode | gunzip | tail -c +2 | mpk --to-json | jq ``` Thanks Tom for the [spell](AztecProtocol/msgpack-c#5 (comment)) 🙏 ### False bug At some point I thought had to make some fixes in `msgpack-c` itself to make this work: AztecProtocol/msgpack-c#5 A similar [blocking bug](#12841 (comment)) was encountered when running the entire `ci` build with msgpack format. It turned out it was a [dangling pointer](msgpack/msgpack-c#695 (comment)) issue, fixed in 5810e3b Much of the comments below are related to my struggles that came from this mistake; you can ignore them.
Adds `msgpack` serialisation to the generated Acir and Witness C++ code. I moved the alterations described in `dsl/README.md` into the code generation itself, so no manual work is required. The PR is running against a feature branch with the same name in Noir, here's the upstream PR: noir-lang/noir#7716 With this PR is merged, `bb` should be able to handle `msgpack` or `bincode`. Once that's released we can switch to using `msgpack` in Noir in both native and wasm by merging noir-lang/noir#7810. And then we can remove the `msgpack` format detection and the fallback to `bincode`. **TODO**: - [x] Get it to compile - [x] Change `nargo` to allow compiling contracts with `msgpack` format: added `NOIR_SERIALIZATION_FORMAT` env var - [x] Add a first byte to the data to say which serialization format it is. There is a chance that it would clash with existing bincode data though, so a fallback would anyway be necessary. (Or we should ascertain that bincode never starts with some specific bit sequence). - [x] ~Change the `bb` code so it tries `bincode`, then falls back to `msgpack` - this way the currently used format stays fast, but we can feed it new data.~ _This looks problematic, as exceptions in the wasm build is disabled in `arch.cmake` and `throw_or_abort` aborts in wasm. Instead we run [msgpack::parse](https://c.msgpack.org/cpp/namespacemsgpack.html#ad844d148ad1ff6c9193b02529fe32968) first to check if the data looks like msgpack; if not, we use bincode._ - [x] Run integration tests with `msgpack` on both sides in AztecProtocol/aztec-packages#13021 - [x] Ignore the Brillig opcodes in Barretenberg - [x] Change the serialization of `WitnessStack` and `WitnessMap` to use the env var, add fallbacks in `bb` for them - [x] Revert the change to `noir-repo-ref` before merging ### Use of `MSGPACK_FIELDS` The generated code is using `MSGPACK_FIELDS` for structs, to keep it more terse. At some point during debugging the memory issue below I changed it so that I can have more direct control by generating code for individual fields. That needed some helper functions which I looted from the `msgpack-c` library and injected into the namespaces as a `Helpers` struct. This approach might be useful if we wanted to have extra checks, for example rejecting the data if there are extra fields, indicating a type has been extended with things we don't recognise, or if we wanted handle renamed fields. I left it out so there is less code to maintain, but if we need it we can recover it from the [commit history](noir-lang/noir@b0a612d). ### Compile `nargo` with the `msgpack` feature ```bash echo af/msgpack-codegen > noir/noir-repo-ref noir/bootstrap.sh ``` ### Generate and compile C++ code ```bash cd noir/noir-repo && NOIR_CODEGEN_OVERWRITE=1 cargo test -p acir cpp_codegen && cd - cp noir/noir-repo/acvm-repo/acir/codegen/acir.cpp barretenberg/cpp/src/barretenberg/dsl/acir_format/serde/acir.hpp cp noir/noir-repo/acvm-repo/acir/codegen/witness.cpp barretenberg/cpp/src/barretenberg/dsl/acir_format/serde/witness_stack.hpp cd barretenberg/cpp && ./format.sh changed && cd - barretenberg/cpp/bootstrap.sh ``` ### Test `nargo` with `bb` One example of an integration test that uses `bb` and noir in the Noir repo is https://github.com/noir-lang/noir/actions/runs/13631231158/job/38099477964 We can call it like this: ```bash cd noir/noir-repo && cargo install --path tooling/nargo_cli && cd - ./barretenberg/cpp/bootstrap.sh export BACKEND=$(pwd)/barretenberg/cpp/build/bin/bb export NOIR_SERIALIZATION_FORMAT=msgpack noir/noir-repo/examples/prove_and_verify/test.sh ``` If it works, it should print this: ```console % unset NOIR_SERIALIZATION_FORMAT % noir/noir-repo/examples/prove_and_verify/test.sh [hello_world] Circuit witness successfully solved [hello_world] Witness saved to /mnt/user-data/akosh/aztec-packages/noir/noir-repo/examples/prove_and_verify/target/witness.gz Finalized circuit size: 18 Proof saved to "./proofs/proof" Finalized circuit size: 18 VK saved to "./target/vk" Proof verified successfully ``` Whereas if it doesn't: ```console % export NOIR_SERIALIZATION_FORMAT=msgpack % noir/noir-repo/examples/prove_and_verify/test.sh [hello_world] Circuit witness successfully solved [hello_world] Witness saved to /mnt/user-data/akosh/aztec-packages/noir/noir-repo/examples/prove_and_verify/target/witness.gz Length is too large ``` I attached the final artefacts to the PR so it's easier to test with just `bb`. [hello_world.json](https://github.com/user-attachments/files/19391072/hello_world.json) [witness.gz](https://github.com/user-attachments/files/19391074/witness.gz) ### Further testing With the `noir-repo-ref` pointing at the feature `af/msgpack-codegen` feature branch, we can run all the contract compilations and tests with `msgpack` as follows: ```shell export NOIR_SERIALIZATION_FORMAT=msgpack ./bootstrap.sh ci ``` This is tested in AztecProtocol/aztec-packages#13021 ### Peek into artefacts We can inspect the file in JSON format using [this](https://crates.io/crates/msgpack-cli) msgpack CLI tool. ```shell jq -r '.bytecode' ./target/program.json | base64 --decode | gunzip | tail -c +2 | mpk --to-json | jq ``` Thanks Tom for the [spell](AztecProtocol/msgpack-c#5 (comment)) 🙏 ### False bug At some point I thought had to make some fixes in `msgpack-c` itself to make this work: AztecProtocol/msgpack-c#5 A similar [blocking bug](AztecProtocol/aztec-packages#12841 (comment)) was encountered when running the entire `ci` build with msgpack format. It turned out it was a [dangling pointer](msgpack/msgpack-c#695 (comment)) issue, fixed in AztecProtocol/aztec-packages@5810e3b Much of the comments below are related to my struggles that came from this mistake; you can ignore them.
This fixes in inexplicable bug I found while working on AztecProtocol/aztec-packages#12841
AztecProtocol/aztec-packages#12841 (comment) demonstrated that a vector of enums can be printed in the
convertextension method added tomsgpack::objectto handlestd::vector, showing all items as maps, but during the iteration of the items they change from being MAP to NIL.AztecProtocol/aztec-packages#12841 (comment) narrowed it down to being due to
v.resize(o.via.array.size);, which I could only explain byvoverwriting the memory ofo.via.arrayitself, which makes no sense to me, but what else can I conclude. It's weird because there are other instances ofstd::vectorin the ACIR model of the PR with structs, instead of maps.For example this seems fine:
but this is not:
where
Acir::Opcodeis a single element map keyed by the name of the type, with the value being heterogenous.The fix I found was to use
push_backif possible.I expected that maps would cause a similar problem, which I'll know when I switch
WitnessWitnessto usemsgpackas well, but it didn't.