Conversation

Contributor

@Shark64 commented Sep 18, 2025

As we discussed in the other pull request, this is the "lite" version: it only enables macro-fusion for loops and zeroes registers the right way.

Contributor

@pablodelara left a comment

Thanks @Shark64. There is still an issue, could you look into it? Thanks!

xor DWORD(pos), DWORD(pos)

loop128:
mov tmp2, [arg2+tmp*PS] ;Fetch last pointer in array
Contributor

tmp is not set on the first pass here, unlike earlier. xor_gen_test is failing on SSE-only architectures.
You can check this with the Intel SDE tool, using sde -snr -- ./raid/xor_gen_test

Contributor Author

Uh, you're right. I did check with sde for AVX but forgot to check the SSE code path. Sorry about that. I just put the right register there; with that, we don't need the mov at all. Thanks.

Contributor Author

BTW, I've squashed the last commit message into the main one. This should make `check_format.sh` happy (I think).

@pablodelara
Contributor

This is merged now, thanks!
