@khwilliamson khwilliamson commented Jul 3, 2025

Prior to this commit, code in peep.c that checks to see if a PL_check[] function had been overridden would always return true on z/OS. This would turn off optimization for multideref, causing tests to fail that expected it to be on.

One solution would be to just not have multideref on that box, and to skip the failing tests. I'd rather not turn off optimizations unless absolutely necessary.

The check for being overridden simply compared two function pointers for equality. According to information relayed to me on an IBM Discord z/OS chat channel, function pointers never compare equal across different translation units; to get a valid comparison, you must use pointers from the same translation unit. I had first looked at the documentation, but it required more background knowledge of the jargon and of the z/OS design than I was willing to spend the time learning. There may be a way around this, but if so it's complicated. That's when I turned to the chat channel.

The PL_check[] table of the function pointers is declared extern, and when peep.c is compiled, it doesn't have access to that table; it gets linked to later. So these are different translation units, and so the pointers never will be equal. However within the table itself, all the pointers to function X will have the same address, as it is going to be in the same translation unit.

What this commit does is add three extra elements to PL_check[], initialized with the function pointers that peep.c wants to compare against. Those elements are unknown to any other code, so they will never be changed away from pointing to these functions.

And the commit changes peep.c to use the appropriate array entry, which will contain a valid value, instead of using the function pointer directly. This solution works for z/OS and all other current implementations.
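
A minimal sketch of the approach described above, not the actual Perl source. The table gets extra slots that hold reference copies of the default functions, and the consumer compares a live slot against its reference slot instead of naming the function directly. Every identifier here (`check_table`, `CK_REF_RV2AV`, `ck_default_rv2av`, `install_custom_rv2av`, and so on) is hypothetical; in Perl the table is PL_check[] and the consumer is peep.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef int (*check_fn)(int op);

/* Hypothetical stand-ins for the default check functions. */
static int ck_default_rv2av(int op) { return op; }
static int ck_default_rv2hv(int op) { return op + 1; }

/* Hypothetical replacement that a module might install. */
static int ck_custom(int op) { return -op; }

enum {
    OP_RV2AV,                 /* overridable slots, indexed by op type */
    OP_RV2HV,
    OP_MAX,
    CK_REF_RV2AV = OP_MAX,    /* reference slots; no other code writes here */
    CK_REF_RV2HV,
    CK_TABLE_SIZE
};

/* Live and reference slots are initialized in the same translation unit,
 * so each pair starts out holding the identical pointer value. */
check_fn check_table[CK_TABLE_SIZE] = {
    [OP_RV2AV]     = ck_default_rv2av,
    [OP_RV2HV]     = ck_default_rv2hv,
    [CK_REF_RV2AV] = ck_default_rv2av,
    [CK_REF_RV2HV] = ck_default_rv2hv,
};

/* A consumer in another translation unit compares only values read from
 * the table; it never names ck_default_* itself. */
bool check_is_overridden(size_t live_slot, size_t ref_slot)
{
    assert(live_slot < CK_TABLE_SIZE && ref_slot < CK_TABLE_SIZE);
    return check_table[live_slot] != check_table[ref_slot];
}

/* Simulates a module overriding one live slot. */
void install_custom_rv2av(void)
{
    check_table[OP_RV2AV] = ck_custom;
}
```

Because both sides of the comparison are values stored by the same translation unit, the result does not depend on how a given platform represents a function's address when it is named from elsewhere.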

Unfortunately, it isn't a general solution. We would have to play a similar game if other function pointers were compared across translation units, but this appears to be the only current case where that happens. As long as the function pointers to be compared are known in the same translation unit, it works.

This commit builds upon some ideas from Dave Mitchell and Richard Leach. People on the z/OS chat independently came up with the same general solution.

This set of changes does not require a perldelta entry.

@khwilliamson khwilliamson requested a review from iabyn July 3, 2025 21:01

bulk88 commented Jul 11, 2025

I think it's better to figure out which z/OS C compiler or linker flag turns on function pointer equality, and put that command-line flag into Configure or hints.sh once and for all. The z/OS linker has 2 different "ld.so" implementations to choose from in its manual. Or Perl on z/OS simply needs a 1-line macro to look inside the "function descriptor" C struct and do 1 or 2 integer math operations to get the PL_ppaddr/PL_check-style 64-bit address of the function, rather than comparing against the 128-bit integer that typing == Perl_ck_null in source code represents.

This bug is easy to understand for any Linux dev. It's basic ELF interposition at work. The integer address you get by typing Perl_ck_null is libperl.so calling libperl.so without indirection, without the PLT/GOT, and without the tiny jump shim C function that the PLT/GOT scheme requires to get a "const literal" function pointer to a C function whose address WILL NOT BE KNOWN until runtime. AFAIK ELF doesn't allow "re-linking" an SO after it is loaded into the address space, so once libperl.so is mapped, the final address of Perl_ck_null inside libperl.so can never change again. AFAIK if a 2nd libfakeperl.so is loaded into the same address space, it will NOT replace the Perl_ck_null from libperl.so with Perl_ck_null from libfakeperl.so. AFAIK LD_PRELOAD/interposition only works until the 1st invocation (LAZY_BIND), or until the 1st definition of the C symbol Perl_ck_null appears in the process's address space. After that, the ship has sailed for the ELF hooking/interposition/PRELOAD feature.

PL_check[PERL_CK_NULL] is the tiny jump shim C function that the PLT/GOT scheme requires, so that ld.so can interpose/LD_PRELOAD replace Perl_ck_null to point at any machine code address at runtime, yet keep the "address" of the Perl_ck_null function an integer constant that can be duplicated with memcpy() for the purposes of the C abstract machine.

https://docs.oracle.com/cd/E19683-01/816-1386/6m7qcobkv/index.html#indexterm-315

A possible fix for Z/OS could also be if ( NUM2PTR(void*,PL_check[o->op_type]) != NUM2PTR(void*,Perl_ck_null)) {. The void * cast will convert Perl_ck_null from a 128 bit function descriptor, or "position independent relative addressing" thing, into a function pointer to a C static jump shim, like on Linux.
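
A rough sketch of the general idea of comparing converted addresses rather than raw function pointers. It uses a plain cast to `uintptr_t` instead of Perl's NUM2PTR, and the names `check_fn`, `ck_null_like`, `ck_other`, and `same_code_address` are hypothetical. Converting a function pointer to an integer is implementation-defined in ISO C, and nothing in this thread establishes that such a conversion changes the result on z/OS, so treat this only as an illustration of the shape of the proposal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef int (*check_fn)(int op);

/* Hypothetical check functions with distinct bodies. */
int ck_null_like(int op) { return op; }
int ck_other(int op)     { return op + 1; }

/* Compare two function pointers after converting both to integers.
 * Assumption: the platform gives one stable integer per function,
 * which holds on common flat-address-space targets. */
bool same_code_address(check_fn a, check_fn b)
{
    assert(a != NULL && b != NULL);
    return (uintptr_t)a == (uintptr_t)b;
}
```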

One-time hacks like this aren't sustainable, and they won't fix any of the p5p/.git XS mods or any CPAN/PAUSE XS mods either.

For example, this PR didn't catch this line

perl5/pp_hot.c, line 5121 in 9ef5300:

    /* Try to bypass pushing &PL_sv_yes and calling pp_and(); instead
     * jump straight to the AND op's op_other */
    assert(PL_op->op_next->op_type == OP_AND);
    if (PL_op->op_next->op_ppaddr == Perl_pp_and) {
        return cLOGOPx(PL_op->op_next)->op_other;
    }
    else {
        /* An XS module has replaced the op_ppaddr, so fall back to the slow,
         * obvious way. */
        /* pp_enteriter should have pre-extended the stack */

How many more of these lines exist in the Perl ecosystem?
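
The pattern in that snippet can be sketched in isolation as follows. This is a simplified model, not Perl's op structures: `struct op`, `pp_and_default`, `pp_and_custom`, and `after_iter` are hypothetical names. In this single-file sketch both sides of the comparison come from the same translation unit, so the check behaves as intended; the concern raised in this thread is the case where they do not.

```c
#include <assert.h>
#include <stddef.h>

struct op;
typedef struct op *(*pp_fn)(struct op *);

struct op {
    pp_fn      ppaddr;  /* function that executes this op */
    struct op *next;    /* op that follows in execution order */
    struct op *other;   /* branch target of an AND-like op */
};

/* Hypothetical default handler for the AND-like op. */
struct op *pp_and_default(struct op *o) { return o->next; }

/* Hypothetical handler a module might install in its place. */
struct op *pp_and_custom(struct op *o) { return o->other; }

/* Decide where execution continues after an iterator-like op. */
struct op *after_iter(struct op *iter)
{
    assert(iter != NULL && iter->next != NULL);
    if (iter->next->ppaddr == pp_and_default)
        return iter->next->other;  /* fast path: skip the AND op */
    return iter->next;             /* slow path: run whatever was installed */
}
```

If the equality test were to fail spuriously, the code would still be correct but would always take the slow path, which matches the "optimization silently turned off" symptom described for multideref.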

It's pretty well documented that z/OS has plenty of different ABIs inside 1 or more address spaces per process. And the z/OS C stack is a linked list, if I read this correctly.

https://share.confex.com/share/119/webprogram/Handout/Session11408/Save_area_Conventions.pdf
https://www.ibm.com/docs/en/zos/3.1.0?topic=c-metal-mvs-linkage-conventions#mvslnkcnv

A __far pointer is 128 bits long on Z/OS. Perl DOESN'T SUPPORT 128 bit pointers yet!!! Patches are WELCOME!!!

https://www.ibm.com/docs/en/zos/2.4.0?topic=qualifiers-far-type-qualifier-c-only

The upper half of the pointer contains the access-list-entry token (ALET), which identifies the secondary virtual address space you want to access. The lower half of the pointer is the offset within the secondary virtual address space. The size of a __far-qualified pointer is increased to 8 bytes in 31-bit mode and 16 bytes in 64-bit mode. In 31-bit mode, the upper 4 bytes contain the ALET, and the lower 4 bytes are the address within the data space. In 64-bit mode, bytes 0-3 are unused, bytes 4-7 are the ALET, and bytes 8-15 are the address within the data space.

A normal pointer can be converted to a __far pointer explicitly through typecasting or implicitly through assignment. The ALET of the __far pointer is set to zero. A __far pointer can be explicitly converted to a normal pointer through typecasting; the normal pointer keeps the offset of the __far pointer and the ALET is lost. A __far pointer cannot be implicitly converted to a normal pointer.

Pointer arithmetic is supported for __far pointers, with the ALET part being ignored. If the two ALETs are different, the results may have no meaning.

Two __far pointers can be compared for equality and inequality using the == and != operators. The whole pointer is compared. To compare for equality of the offset only, use the built-in function to extract the offset and then compare. To compare for equality of the ALET only, use the built-in function to extract the ALET and then compare. For more information on the set of built-in functions that operate on __far pointers, see z/OS XL C/C++ Programming Guide.

Two __far pointers can be compared using the >, < , >=, and <= relational operators. The ALET parts of the pointers are ignored in this operation. There is no ordering between two __far pointers if their ALETs are different, and between a NULL pointer and any __far pointers. The result is meaningless if they are compared using relational operators.

When a __far pointer and a normal pointer are involved in an operation, the normal pointer is implicitly converted to __far before the operation. There is unspecified behavior if the ALETs are different.
The result of the & (address) operator is a normal pointer, except for the following cases:
If the operand of & is the result of an indirection operator (*), the type of & is the same as the operand of the indirection operator.
If the operand of & is the result of the arrow operator (->, structure member access), the type of & is the same as the left operand of the arrow operator.
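
The 64-bit-mode layout and comparison rules quoted above can be modeled in portable C as a rough illustration. This is not the z/OS `__far` type or its built-in functions; `far_ptr_model` and the helper names are hypothetical, and ignoring the unused bytes in whole-pointer equality is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Model of the quoted 64-bit-mode layout: 16 bytes in total. */
typedef struct {
    uint32_t unused;  /* bytes 0-3 */
    uint32_t alet;    /* bytes 4-7: access-list-entry token */
    uint64_t offset;  /* bytes 8-15: address within the data space */
} far_ptr_model;

static_assert(sizeof(far_ptr_model) == 16, "model should be 16 bytes");

/* Normal -> far conversion: the ALET is set to zero. */
far_ptr_model far_from_normal(uint64_t addr)
{
    far_ptr_model p = { 0, 0, addr };
    return p;
}

/* Far -> normal conversion: the offset is kept and the ALET is lost. */
uint64_t far_to_normal(far_ptr_model p)
{
    return p.offset;
}

/* Models ==: both the ALET and the offset take part in the comparison. */
bool far_equal(far_ptr_model a, far_ptr_model b)
{
    return a.alet == b.alet && a.offset == b.offset;
}

/* Offset-only comparison, as when extracting the offset first. */
bool far_offset_equal(far_ptr_model a, far_ptr_model b)
{
    return a.offset == b.offset;
}
```

The model shows how two pointers with the same offset can still compare unequal when their ALETs differ.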

maybe this option for ZOS?

https://www.ibm.com/docs/en/zos/2.4.0?topic=qualifiers-fdptr-type-qualifier-c-only

You can declare a function pointer with the __fdptr keyword so that this function pointer can point to a Metal C function descriptor, which is an internal control block that encapsulates all the information that a function call needs to access both the function and the application-specific data.

You use a function pointer that points to a Metal C function descriptor to point to and call functions with their own set of associated data for the particular program or invocation.
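
A small model of the function-descriptor idea in the quoted text, offered as a sketch only. The struct and function names (`fn_descriptor`, `call_descriptor`, `add_base`, and the two comparison helpers) are hypothetical and do not reflect the real Metal C control block. It illustrates how, if a "function pointer" is really the address of a descriptor, two pointers that lead to the same code can still compare unequal.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef int (*entry_fn)(void *env, int arg);

/* Hypothetical descriptor: code address plus associated data. */
typedef struct {
    entry_fn entry;  /* where the code lives */
    void    *env;    /* data belonging to this particular use */
} fn_descriptor;

/* Calling through a descriptor passes its data along with the argument. */
int call_descriptor(const fn_descriptor *d, int arg)
{
    assert(d != NULL && d->entry != NULL);
    return d->entry(d->env, arg);
}

/* Comparing descriptor addresses: distinct descriptors are never equal. */
bool descriptor_identity_equal(const fn_descriptor *a, const fn_descriptor *b)
{
    return a == b;
}

/* Comparing the code addresses stored inside the descriptors. */
bool descriptor_entry_equal(const fn_descriptor *a, const fn_descriptor *b)
{
    return a->entry == b->entry;
}

/* Example function that uses its associated data. */
int add_base(void *env, int arg)
{
    return *(int *)env + arg;
}
```

Whether non-Metal z/OS builds behave this way is not established in this thread; the sketch only shows why comparing descriptor identities and comparing entry points can give different answers.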

So yeah, this is strictly the fault of the Perl 5 interpreter for assuming void * is 8 bytes long. A void * is 16 bytes on a z/OS CPU. The z/OS C compiler and linker are applying workaround mechanisms to the Perl 5 interpreter, since Perl 5 is using obsolete/EOL-ed 64-bit, 8-byte-long void *s, and can't handle a 128-bit void * pointer. Have a nice day!

Instead of adding extra entries to a P5P created array stored in .rodata or .data, Z/OS Perl needs the right macro or the right cast so the NUM2PTR(void*,Perl_ck_null) inside if ( NUM2PTR(void*,PL_check[o->op_type]) != NUM2PTR(void*,Perl_ck_null)) { is a 64 bit address to a C struct in the __near address space that libperl.so/Glob.so/Utils.so/B.so all share. Not a Z/OS CPU's native 128 bit memory address.

Or properly declare arrays PL_check/PL_ppaddr with a 128 bit pointer type.


bulk88 commented Aug 19, 2025

+ESsfc Command Line Option Syntax Page 31 HPUX aCC manual

http://www.bitsavers.org/pdf/hp/9000_hpux/1991-200x/HP_aC++_Online_Programmers_Guide.pdf

Apparently Commercial Unix OSes or perhaps all RISC+CISC CPUs except for ARM and X86, really hate end users trying to do function pointer numeric comparisons across .a or .so or .o files with non-static symbols, aka extern "C" things.

All the special OSes/CPUs have a CC or LD cmd line option to turn on the option to make func ptrs == work correctly in C, but they prefer to keep the cmd line option off by default because correct C func ptr == behavior is a slight de-optimization on their CPU archs/OS designs.

Relocation, JIT, mmap, pointer encryption, COW, ASLR, PIC, ShLibs, --X only, not R-X, weak symbols, lazy symbol resolver, etc,

It's also unlikely that the op_type will be any given value, but I
expect the compiler and libc know that.

This stresses that it is unlikely some module will customize the
handling of these checkers.

khwilliamson commented

Please read the revised description.

I found a z/OS chat channel, and asked about this. There is no compiler option to force things to work as we expect. I also got more clarity about how things work. Pointer comparisons of non-functions work as expected. Function pointer comparison works on pointers compiled in the same translation unit; it never works across translation units. That is the situation here, where PL_check[] is extern.

Things like __far are valid only when the program is compiled with the METAL option. That option is similar to the git porcelain options. It gets you closer to the hardware metal, disabling a bunch of useful features. It is designed mainly for IBM system programmers https://www.ibm.com/docs/en/zos/2.4.0?topic=options-metal-nometal-c-only

The z/OSers came up with essentially the same solution as this p.r. originally had. But now that I understand it better, I was able to remove some hedging conditionals.

@khwilliamson khwilliamson changed the title from "Add additional check for custom array/hash access checking" to "Avoid z/OS function pointer comparison undefined behavior" Aug 26, 2025
@khwilliamson khwilliamson merged commit 935fddc into Perl:blead Sep 1, 2025
33 checks passed
@khwilliamson khwilliamson deleted the function_ptrs branch September 1, 2025 14:36