Description
This might eventually become a PR by me or someone else who has incentive to tackle this. But I need to archive the backstory and engineering details somewhere.
Research notes to myself about turning on memset/memcpy/memcmp CPU intrinsics feature inside WinPerl built with MSVC.
A naive person would say, are you stupid, it takes 20 seconds to add -Oi to /win32/GNUmakefile and this job/bug/task is done.
-Oi docs: https://learn.microsoft.com/en-us/cpp/build/reference/oi-generate-intrinsic-functions?view=msvc-170
I do not want to hit the -Oi button globally for many technical engineering reasons. If I do it, I know I am invasively altering the entire WinPerl+MSVC ecosystem forever with that button.
Also Perl in XS/C is not written in C89/C99 language. "Perl in XS/C" is written in Peroost Framework (jkjk) which is P5P's clone of https://www.boost.org/libraries/latest/grid/ . Peroost Framework's API Docs are located here -> https://perldoc.perl.org/perlclib .
P5P can modify Peroost, and has, and a lot of time and engineering went into creating Peroost's/XS's .h files. P5P, libperl.so.dll, ./Configure, metaconfig, miniperl.exe, and the general toolchain can do, automate, improve, or correct a lot of things (defects, mousetraps, flaws, poor dev decisions in the C language) that you don't get with a stock CC toolchain.
I'm not going to go into details on why I'm not hitting the -Oi button right now.
Research notes:
A line of code somewhere in mro.xs.
AV* const isa_lin = newAV_alloc_xz(4);
expands to
#define newAV_alloc_xz(size) av_new_alloc(size,1)
expands to
PERL_STATIC_INLINE AV *
Perl_av_new_alloc(pTHX_ SSize_t size, bool zeroflag)
{
AV * const av = newAV();
SV** ary;
PERL_ARGS_ASSERT_AV_NEW_ALLOC;
assert(size > 0);
Newx(ary, size, SV*); /* Newx performs the memwrap check */
AvALLOC(av) = ary;
AvARRAY(av) = ary;
AvMAX(av) = size - 1;
if (zeroflag)
Zero(ary, size, SV*);
return av;
}
Now let's analyze Zero(ary, size, SV*); in detail. After inlining, with size == 4 and 8-byte pointers on x64, it becomes
memset(ary, 0, 0x20);
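To make the 0x20 concrete, here is a minimal sketch of how a Zero()-style macro turns an element count and a type into a constant byte count for memset(). SKETCH_ZERO, sketch_slot and sketch_zero_four are hypothetical names invented for this note, not Perl's real definitions; the real Zero() in the core headers does more (Newx's comment above mentions a memwrap check, and the real macro may have one too), so treat this purely as an illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for Perl's Zero(); the real macro in the core
 * headers does more than this and is not reproduced here. */
#define SKETCH_ZERO(dst, n, type) \
    ((void)memset((char *)(dst), 0, (size_t)(n) * sizeof(type)))

/* Stand-in for SV*; only the pointer size matters for the byte count. */
typedef void *sketch_slot;

/* Zero a fixed block of four pointer slots, like newAV_alloc_xz(4) does. */
static size_t sketch_zero_four(sketch_slot *ary)
{
    assert(ary != NULL);
    SKETCH_ZERO(ary, 4, sketch_slot);
    /* 4 * 8 = 0x20 bytes wherever pointers are 8 bytes wide. */
    return 4 * sizeof(sketch_slot);
}
```

Because both the count and the element size are compile-time constants after inlining, the compiler sees a fixed-length memset(), which is what makes the intrinsic expansion possible at all.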
default MSVC WinPerl for the last 25 years emits
00000001800015EF 33 D2 xor edx, edx ; Val
00000001800015F5 44 8D 42 20 lea r8d, [rdx+20h] ; Size
0000000180001608 48 8B C8 mov rcx, rax ; Dst
000000018000160B FF 15 87 2A 00 00 call cs:__imp_memset
now I hack perl core headers with
#pragma intrinsic(memcpy)
#pragma intrinsic(memset)
#pragma intrinsic(memcmp)
#pragma intrinsic(strcat)
#pragma intrinsic(strcmp)
#pragma intrinsic(strcpy)
#pragma intrinsic(strlen)
and recompile my xs module and I get
0000000180001926 0F 57 C0 xorps xmm0, xmm0
0000000180001945 0F 11 00 movups xmmword ptr [rax], xmm0
0000000180001948 0F 11 40 10 movups xmmword ptr [rax+10h], xmm0
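A sketch of how the pragma hack could be scoped so that only MSVC sees it, instead of flipping -Oi globally. This is an assumption about one possible shape, not what any Perl header actually contains; sketch_clear32 is a hypothetical helper used only to give the compiler a constant-size memset() to work on.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Ask only MSVC for intrinsic forms; other compilers never see the pragma.
 * The pragma goes after <string.h> so the functions are already declared. */
#ifdef _MSC_VER
#  pragma intrinsic(memcpy, memset, memcmp)
#endif

/* Hypothetical helper: a constant-size clear the compiler may expand inline. */
static void sketch_clear32(void *dst)
{
    assert(dst != NULL);
    memset(dst, 0, 32);
}
```

Whether the compiler really expands the call still depends on optimization settings and on the length being a compile-time constant, so the disassembly has to be checked either way.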
let's count the bytes
before 2 + 4 + 3 + 6 = 15
after 3 + 3 + 4 = 10
Result: inlined memset() with basic SSE 1.0 ops wins the game.
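The same tally as a tiny C sketch, in case someone wants to extend it to other call sites. The lengths are read off the opcode bytes in the two listings above; the array and function names are made up for this note.

```c
#include <assert.h>
#include <stddef.h>

/* Encoded instruction lengths in bytes, taken from the listings above. */
static const unsigned call_seq[]   = { 2, 4, 3, 6 }; /* xor, lea, mov, call */
static const unsigned inline_seq[] = { 3, 3, 4 };    /* xorps, movups, movups */

/* Add up the encoded lengths of one instruction sequence. */
static unsigned sketch_sum(const unsigned *lens, size_t count)
{
    unsigned total = 0;
    assert(lens != NULL);
    for (size_t i = 0; i < count; i++)
        total += lens[i];
    return total;
}
```

This only counts the instructions shown in the listings. It says nothing about the code inside memset() itself, which the call version also has to execute.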
It was a win performance-wise, since it skipped the formalities of a C call stack frame, and the large switch tree that lives inside the libc memset() function, which creates a bounds-checked aligned pointer from an unaligned pointer with something like
switch ((uintptr_t)ptr & 0xf) {
case 15:
case 14:
case 13:
case 12:
/* etc */
}
never executed.
It was also a win machine-code-bloat-wise: 5 bytes shorter to do the same thing.
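To illustrate the performance point, here is a rough sketch of the kind of work a general-purpose fill routine has to do when it knows neither the alignment nor the length ahead of time. This is NOT the MSVC CRT's memset(); sketch_generic_zero is a hypothetical function written only to show the head/body/tail structure that a constant-size inlined clear gets to skip.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative only: alignment and length are discovered at run time. */
static void *sketch_generic_zero(void *dst, size_t len)
{
    unsigned char *p = (unsigned char *)dst;
    assert(dst != NULL || len == 0);

    /* Head: byte stores until p reaches a 16-byte boundary. */
    while (len > 0 && ((uintptr_t)p & 0xf) != 0) {
        *p++ = 0;
        len--;
    }
    /* Body: 16 bytes per iteration; a real libc would use wide stores. */
    while (len >= 16) {
        for (int i = 0; i < 16; i++)
            p[i] = 0;
        p += 16;
        len -= 16;
    }
    /* Tail: whatever is left over. */
    while (len > 0) {
        *p++ = 0;
        len--;
    }
    return dst;
}
```

With a known length of 0x20, the inlined version needs none of these branches, only the register clear and the two 16-byte unaligned stores shown in the second listing.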
Steps to Reproduce
Disassemble a WinPerl compiled with MSVC, or in the VS IDE, right click on your src code -> left click "Go To Disassembly", press F11 a couple hundred times, or just press and hold F11 for a while with a podcast playing or a TV in the background.
Expected behavior
Codegen that is more like current LinPerl, or codegen that is more GCC- or LLVM-like than what MSVC currently produces for WinPerl.
Perl configuration
N/A