-
Notifications
You must be signed in to change notification settings - Fork 143
aes: add VAES support #491
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
49a52de
to
a49e9e0
Compare
I had to change the cfg flag feature gating to be opt-in rather than opt-out in order for the CI tests to pass without bumping the MSRV. |
Maybe for cleaner history it's worth to revert #482, rebase this PR to new master, and merge it after v0.9.0 release? It would also allow us to publish a smaller v0.9.0. |
a49e9e0
to
1ba6590
Compare
One reason to consider keeping the original in history is because it serves as an explanation for why par block sizes of 30 for VAES256 and 64 for VAES512 were chosen, i.e., due to the register configuration. This detail is absent from the intrinsics implementation. |
This better be described in a code comment, than to be left for git history. |
True. I can add a comment about that. |
d9e12be
to
230c5f8
Compare
I added these comment about block sizes: #[cfg(all(target_arch = "x86_64", any(aes_avx256, aes_avx512)))]
impl<'a> ParBlocksSizeUser for $name_backend::Vaes256<'a> {
// Block size of 30 is chosen based on AVX2's 16 YMM registers.
//
// * 1 register holds 2 keys per round (loads interleaved with rounds)
// * 15 registers hold 2 data blocks
//
// This gives (16 <total> - 1 <round key>) * 2 <data> = 30 <data>.
type ParBlocksSize = U30;
}
#[cfg(all(target_arch = "x86_64", aes_avx512))]
impl<'a> ParBlocksSizeUser for $name_backend::Vaes512<'a> {
// Block size of 64 is chosen based on AVX512's 32 ZMM registers.
//
// * 11, 13, 15 registers for keys, correspond to AES-128, AES-192, AES-256
// * 11, 13, 15 registers hold 4 keys each (no interleaved loading like VAES256)
// * 16 registers hold 4 data blocks
// * 1-4 registers remain unused (could use them but probably not worth it)
//
// This gives (32 <total> - 15 <AES-256 round keys> - 1 <unused>) * 4 <data> = 64 <data>.
type ParBlocksSize = U64;
} |
@newpavlov what's the path forward here?
|
I would prefer to do the following:
|
Again, we can use |
I would prefer to have a simpler v0.9.0 release. Since v0.9.1 will be probably released shortly after v0.9.0, I think it's fine to postpone VAES support a bit. |
What is the drawback of an off-by-default feature which requires |
5k LoC is not a small amount. It also would require additional work to make it experimental. It will be much easier to revert the old PR and merge this PR with a proper intrinsics-based implementation and autodetection enabled by default. If we are to encounter any issues with the VAES backend we also will be able to quickly yank v0.9.1. |
I'm talking about shipping this PR, |
This reverts commit ad83428. This implementation uses assembly, but the relevant intrinsics will be stable in Rust 1.89, and we have a PR open to use them: #491 For a cleaner history, this reverts the assembly implementation so the intrinsics-based implementation can be cleanly applied to an ASM-free codebase, rather than as a replacement for the ASM.
This reverts commit ad83428. This implementation uses assembly, but the relevant intrinsics will be stable in Rust 1.89, and we have a PR open to use them: #491 For a cleaner history, this reverts the assembly implementation so the intrinsics-based implementation can be cleanly applied to an ASM-free codebase, rather than as a replacement for the ASM.
@silvanshade now that #496 has been merged, can you rebase/merge? |
230c5f8
to
22dd9e3
Compare
@silvanshade it looks like the CI config got lost in the rebase (also as of tomorrow you'll be able to use Rust 1.89) |
e14fdff
to
d33373e
Compare
d33373e
to
f655326
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reapproving with the following notes:
- MSRV is untouched, since the
--cfg aes_avx256
or--cfg aes_avx512
options are required to enable the feature, as we requested - This should likewise have no impact on anyone who does not explicitly enable those
autodetect.rs
is improved/simplified despite a new backend, though the complexity has moved tox86.rs
. I still consider that a general win.- In a future release, we can potentially bump MSRV and enable this by default, but having a
cfg
at first allows an initial set of users to experiment with it and help us gather data if that's a good idea or not. I think at least for now this is a niche feature for users with Intel Xeon servers?
#[allow(unused)] // TODO: remove once cfg flags are removed | ||
pub(crate) features: Features, | ||
pub(crate) keys: &'a Simd128RoundKeys<$rounds>, | ||
pub(crate) simd_256_keys: OnceCell<Simd256RoundKeys<$rounds>>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need to cache broadcasted keys. It makes the struct bigger and it especially will be a problem when the backends will be enabled by default. I think we should just broadcast the keys during encryption/decryption and let the compiler to optimize it out when possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think changes to this behavior should be accompanied by benchmarks.
|
||
let backend = &$name_backend::Vaes256::from(backend); | ||
if backend.features.has_vaes256() { | ||
while rem >= backend.par_blocks() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think VAES256 availability should be guaranteed if VAES512 available, no? If it's not guaranteed for some reason, we can be conservative and check VAES256 support in has_vaes512
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeap. VAES256 intrinsics are gated on the vaes
target feature, while VAES512 intrinsics are gated on vaes + avx512f
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This one probably does make sense to change. I don't technically know if the ISA spec guarantees that VAES256 should be available if VAES512 is available (one would assume so), but especially if the compiler and LLVM believes that to be the case, we might as well follow that convention.
backend.encrypt_par_blocks(blocks); | ||
rem -= backend.par_blocks(); | ||
iptr = unsafe { iptr.add(backend.par_blocks()) }; | ||
optr = unsafe { optr.add(backend.par_blocks()) }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer to write this code using safe code. See the InOutBuf::into_chunks
method.
Regarding the reviews from @newpavlov, I don't have time to work on this again and I'm not sure when I will. But even if I did, I also don't think it's reasonable to add a bunch of new requested changes at this point, especially given the long history of this code. I won't object if you want to make those changes yourself but I don't plan to do it. |
@silvanshade can you address this one in particular? #491 (comment) |
I can push fixes later into this branch (I also have a bunch of other nitpicky changes which I would like to add), but it probably will happen after v0.9.0 release. |
@newpavlov is there a particular reason you want to block merging this on "nitpicky changes"? Those sound like they could all be done in a completely incremental fashion at any point in time, since they aren't externally-facing. |
I did not mean that the merge is blocked on them, but that I plan to work on them together with fixing the comments above. At the very least before merging I would like to see removal of the I don't see why we should rush addition of VAES support into v0.9.0. I think it's fine to add it in v0.9.1 together with other new backends. |
I agree it would be good to get rid of |
I'd also say there appears to be a time-memory tradeoff here. We've had another request to support a lower memory profile at the cost of performance (#191) and I'm wondering if we should perhaps expose a That said, my expectation would be that performance is the top priority for anyone who goes out of their way to enable |
The broadcast should be extremely efficient cycle-wise, cache-friendly, and trivial for the compiler to move outside of the loop. |
@silvanshade oh no, not again! I really intend to ship this and see it through, but please be patient until we can get our initial next release out (which has been dragging on to the point I’m quite frustrated as well) |
@newpavlov can we just merge this please? You are frustrating our contributors and myself by being so stubbornly obstinate, all the time holding up our next release which has been dragging on for years and years. I still don’t buy any of your reasons for not merging this. You have minor complaints which if you want them addressed, you can fix yourself. They are not blockers for merging an off-by-default MVP requiring explicit opt-in |
Good luck on the next release but I am done working on anything related to this project. I'll refer back to my post here]. After your reply, despite my skepticism, I made another attempt to contribute. I still never got a proper review, even when it was merged: #482 (comment) You had the feature merged and then reverted the PR. I've never seen a project do that before and I don't really think the reasoning behind it even made much sense. The history is still there. Setting that aside, this PR should already have been merged. There are many ways it could have been gated if for some reason you don't want to expose it directly. But there's no communication or activity on that front so time to move on. |
@silvanshade I did the revert because I expected to immediately merge this thereafter. But @newpavlov pulled a bait-and-switch on me, I expected to be able to merge this, but then @newpavlov invented some blockers. Otherwise I never would’ve done the revert. @silvanshade with your blessing, if you reopen, I will merge immediately. I am tired of the lack of progress and extremely frustrated myself. |
Thanks for the offer but I'm going to decline because I just don't want to be involved anymore. You are certainly welcome to pull the commits from my repo branch and merge them yourself if you like though. |
@silvanshade I am hesitant to merge the code if you just intend to abandon it. I would really like for you to work with us and I’m sorry it’s been such a shitshow, but I’m here to support you |
I mean, I can't even get the code reviewed, so would it even have mattered if I did try to maintain it versus "abandon" it? Imagine what the process would have looked like for me trying to make some improvements after it had finally been merged at some unknown point in the future. If you want people to contribute to your project and help maintain it, you have to actually enable them to do that. |
@silvanshade it was reviewed with minor nits, none of which should've been blockers for an MVP. I’m ready to merge now if there’s anything I can do to change your mind. But if you want to leave, I understand. You wouldn’t be the first person @newpavlov has driven away from this project. |
Like I said, you have my permission to use the code from my branch if you like. If you are unwilling to use that because I may not make any further changes to it, well, okay that's up to you. I'm not a contributor to this project with commit access and I have no direct stake in the success of this project so I don't think there's ever a guarantee I'd maintain the code regardless of whether it was merged from this PR with my participation or if you pulled from my branch. If the code is merged though, I don't know, anything is possible I guess. |
@silvanshade opened #508 |
This is #491, rebased on `master`. Closes #489 Co-authored-by: silvanshade <[email protected]>
@silvanshade It's indeed unfortunate that this position has effectively blocked this PR for a prolonged time (we clearly should finish the release cycle before the year's end and it should be our highest priority), but I do not understand why it's so important for you to get it merged ASAP. If you want to do some experiments, then git patching should do just fine, having this implementation in pre-releases does not change much. No one expects you to do constant rebasing. Assuming you have allowed maintainer edits and do not want to do it, we can do it ourselves later.
Yes, and no one expects you to. After all, it's a volunteer work on both sides. But this is also why we can not guarantee any timeline for merging or even reviewing PRs. After the merge maintenance burden will be squarely on us. Personally, I view your acts of closing PRs as an attempt to pressure us to review/merge your PRs with some fixed time limits (i.e. "if you do not merge my RPs in 6 months, I will close my PRs and will contribute no more"). Sorry, but I would prefer to lose your contributions, than to lose my agency over this matter. @tarcieri I will not comment on your rude remarks and will message you privately instead. |
This migrates the VAES implementation to the stabilized intrinsics for the upcoming
1.89
release.CC @tarcieri
Related:
Benchmarks:
The numbers are basically on par with #482
VAES512
VAES256
AES-NI