[CORE] Reduce cost of XferCRC::xferImplementation by 60%#1228
[CORE] Reduce cost of XferCRC::xferImplementation by 60%#1228xezon merged 1 commit intoTheSuperHackers:mainfrom
Conversation
|
Perhaps we could go for even fewer branches by using the data size as a template parameter. Most of the call sites use a size that's known at compile-time. |
|
Replacing the winsock call is good. Removing the hibit branch also makes sense, because the branch predictor likely does not work well there (50% chance). I would like to see a performance comparison when this works without any remaining logical mismatch. |
|
Fixed the logic on validity handling of the leftover bytes. |
9949da4 to
aa01517
Compare
|
rebased with main, just going to make some other tweaks |
639521e to
11539cd
Compare
|
Tested and tweaked with the switch replacing the second loop, it doesn't mismatch with the golden replays and you would expect it to mismatch pretty quckly if it was going to. |
|
Are you seeing any improvements in overall performance? |
I still need to look into performance testing, need to make a game replay under debug, might just do it with some AI since they can really hit different systems hard. |
11539cd to
f3c4219
Compare
|
tweaked the switch to catch when data is null, so we don't have valid data. |
|
Is this meant to be a Draft Review? |
Initially while the implementation was being checked, but i think it is in an okay place now. |
Performance testXferCRC crc1;
XferCRC crc2;
{
std::srand(111);
int64_t begin = GetTickCount64();
UnsignedInt var2 = (UnsignedInt)std::rand();
Short var1 = (Short)var2;
struct S
{
char buf[255];
} var3;
memset(&var3, var2, sizeof(var3));
for (Int i = 0; i < 100000000; ++i)
{
crc1.xferShort_ORIGINAL(&var1);
crc1.xferUnsignedInt_ORIGINAL(&var2);
crc1.xferUser_ORIGINAL(&var3, sizeof(var3));
crc1.xferUser_ORIGINAL(NULL, 1);
crc1.xferUser_ORIGINAL(&var3, 0);
}
int64_t time = GetTickCount64() - begin;
DEBUG_LOG(("time old %lld ms\n", time));
}
{
std::srand(111);
int64_t begin = GetTickCount64();
UnsignedInt var2 = (UnsignedInt)std::rand();
Short var1 = (Short)var2;
struct S
{
char buf[255];
} var3;
memset(&var3, var2, sizeof(var3));
for (Int i = 0; i < 100000000; ++i)
{
crc2.xferShort(&var1);
crc2.xferUnsignedInt(&var2);
crc2.xferUser(&var3, sizeof(var3));
crc2.xferUser(NULL, 1);
crc2.xferUser(&var3, 0);
}
int64_t time = GetTickCount64() - begin;
DEBUG_LOG(("time new %lld ms\n", time));
}
DEBUG_ASSERTCRASH(crc1.getCRC() == crc2.getCRC(), ("ohoh"));ResultNew implementation is about 55% faster. Bravo. |
f3c4219 to
cbb279b
Compare
|
Just a rebase with main first. |
cbb279b to
5b755e5
Compare
|
So people thought this was too complicated, so i took offence to that and made it faster. |
xezon
left a comment
There was a problem hiding this comment.
Code looks better now indeed. There is more we can do I think.
5b755e5 to
153b540
Compare
|
A quick rebase before making other tweaks. |
153b540 to
58d9071
Compare
|
Updated with further simplifications and small refactors. |
xezon
left a comment
There was a problem hiding this comment.
Looks very good to the eye now.
Did you test it again for mismatch or performance?
I tested for mismatches before every push and all versions ran without mismatching. |
|
I can run the benchmark one more time. |
|
New measurement passed with: Around 60% faster now. Code is also reasonably understandable. |
This PR refactors the XferCRC to make the code branchless and to remove the winsock dependency within it.
There may be minimal performance improvement from this as other factors will need changing in the data being passed to the CRC to further improve the performance.
The winsock dependency has been replaced with the endian compat library instead.