
bulk88 (Contributor) commented Jun 17, 2025

GetFileType() has shortcuts. It first checks the handle against -1, -2 and -3,
then against the PEB's three master IN, OUT and ERR kernel handles, then checks
whether it is tagged/unaligned [an open secret, not frozen API], and only then
makes the NtRequestWaitReplyPort() RPC call to csrss.exe instead of calling
NtQueryVolumeInformationFile(). GetFileType() from kernel32.dll is different
from GetFileType() in kernelbase.dll.

  • This set of changes does not require a perldelta entry.

tonycoz (Contributor) commented Jul 9, 2025

It definitely speeds up the test against file type file handles, from ~1.2µs to 0.39µs per call.

It slows down the test against tty handles a little, from ~23.8-24.5µs to 23.4-25.2µs.

bulk88 (Contributor, Author) commented Jul 10, 2025

Good to know it was a verified benchmark improvement. How did you measure it? Just curious; maybe I can do it routinely.

I didn't benchmark it myself, but single-stepping the Asm code of that RtlNtErr2WinErr() function made me furious. It's a 0-65000 for() loop with zero optimizations. After 500 iterations I stepped out of the loop and started to think about how to patch the WinPerl interpreter to stop calling it.

It's probably worse than 0-65000. It's more like [0x0***-****, 0x4***-****, 0x8***-****, 0xc***-****] x 0-65000, and NO, the [0x0***-****, 0x4***-****, 0x8***-****, 0xc***-****] ranges are not optimized into if() else if() else if() else {}. MS devs assumed that no sane production process would do high-speed, high-frequency failing syscalls as normal runtime behavior, so RtlNtErr2WinErr() isn't considered worth optimizing.
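The cost argument above, a table scan repeated on every failing call, can be illustrated with a small C sketch. It runs a linear scan over a status-to-error table and a binary search over the same sorted table, and counts key comparisons for each. The table, its sample pairs, `NO_MAPPING` and the function names are hypothetical illustrations, not ntdll's actual data or code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical status -> error pair; NOT ntdll's real translation table. */
typedef struct {
    uint32_t status;   /* NT-style status code, used as the sort key */
    uint32_t winerr;   /* Win32-style error code */
} err_map;

#define NO_MAPPING 0u  /* hypothetical "no translation found" result */

/* Tiny sample sorted by status; real translation data is far larger. */
static const err_map table[] = {
    {0x80000005u, 234u},
    {0xC0000008u, 6u},
    {0xC000000Du, 87u},
    {0xC0000010u, 1u},
    {0xC0000022u, 5u},
    {0xC0000034u, 2u},
};
static const size_t table_len = sizeof table / sizeof table[0];

static size_t comparisons;  /* key comparisons made by the last lookups */

/* Linear scan: one comparison per row until a match, on every call. */
static uint32_t lookup_linear(uint32_t status) {
    for (size_t i = 0; i < table_len; i++) {
        comparisons++;
        if (table[i].status == status)
            return table[i].winerr;
    }
    return NO_MAPPING;
}

/* bsearch() comparator: the key comes first, the table element second. */
static int cmp_status(const void *key, const void *elem) {
    uint32_t k = *(const uint32_t *)key;
    uint32_t s = ((const err_map *)elem)->status;

    comparisons++;
    return (k > s) - (k < s);
}

/* Binary search: O(log n) comparisons on the same sorted table. */
static uint32_t lookup_bsearch(uint32_t status) {
    const err_map *hit = bsearch(&status, table, table_len,
                                 sizeof table[0], cmp_status);
    return hit ? hit->winerr : NO_MAPPING;
}
```

With six rows the difference is small. It grows linearly with the table size, though, and it is paid again on every failing syscall.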

I have another patch somewhere that cuts the number of win32_isatty() calls coming from the POSIX-y PerlIO .c code by an order of magnitude (90%). I want to get this patch in first, though, since it makes the win32_isatty() implementation better regardless of how good or bad the calling code is.

The patch that removes isatty() calls from the POSIX-y PerlIO .c code is a cross-platform patch, so it's definitely another PR.

The Win32 Console APIs aren't known for being I/O speed demons. Moving the same number of MBs with a shell-level > to a disk file, or a shell-level | to another process, is something like 10x or 100x faster than writing them to the console.

Plus, waking up the csrss.exe process to force it to search its global handle table for "junk value handles" from perl.exe, made with an "RNG", isn't polite.

I could've done Native API/Asm-style optimizations inside win32_isatty(), but I decided that was a bad idea. In the late Win10/Win11 era MS has done heavy refactoring on cmd.exe, and I don't want to make non-public API assumptions about what a Console I/O handle/opaque integer actually is. So I decided NOT to use the unaligned U32 * test/magic trick for Console I/O handles, or the -1 -2 -3 tricks.

It's safer, easier, less thinking and less work to offload responsibility for all the shortcut tricks to GetFileType()@kernel32.dll.

GetFileType(fh) == FILE_TYPE_CHAR looks like frozen MS public API forever: as long as you have access to the symbol GetFileType()@kernel32.dll, it will work.

Remember, GetFileType()@kernelbase.dll IS NOT identical to GetFileType()@kernel32.dll, but that is irrelevant to WinPerl. None of the three (WinPerl, MinGW GCC and MSVC 2022) knows about or links to kernelbase.dll.

MS specifically says kernelbase.dll isn't part of the public API/ABI, isn't part of any 1337 API/ABI, and can disappear at any time; see https://learn.microsoft.com/en-us/windows/win32/win7appqual/new-low-level-binaries.

Win RT apps/MS mobile App Store walled-garden apps are probably forbidden from C linking against kernel32.dll. kernel32.dll simply doesn't exist anymore for UWP/Win RT App Store processes. They can only link against kernelbase.dll, and GetFileType()@kernelbase.dll has no idea that the Windows OS has user-mode emulated Console I/O handles. Technically, Win RT apps don't link against a file called kernelbase.dll. They use the API Set .dlls in their import tables, and MS can rewire those PE symbol forwarder entries to any .dll at runtime.

I'd have to double-check, but I believe nothing inside the kernelbase.dll file knows that consoles/cmd.exe/STDIN/STDOUT/STDERR exist on the Windows platform, just as ntdll.dll has no clue about STDIN/STDOUT/STDERR. [* (@bulk88 is still wrong >>>) Lies! Lies! And even more lies! Of course ntdll.dll knows what stdin/stdout/stderr and the console are. How else would autochk.exe work before the video card driver is loaded? Perhaps we should say ntdll.dll doesn't know Win32 has a console 😉 ]

MS's API Sets and the kernelbase.dll reorganization make perfect sense to me. Why would smartphone apps need access to STDIN/STDOUT/STDERR? Why would a 48U rack full of Windows Server blade servers need to know what user32.dll is, what a TUI/console/command prompt is, or what a DVI/DP/HDMI cable is?

user32.dll and gdi32.dll are still single-thread-handle, single-TID, semi-single-threaded, synchronous I/O APIs from prehistoric times, just like a pre-pthreads Unix OS or a pre-pthreads Unix libc.so library.

Random facts: ntdll.dll only knows about STDOUT, aka NtDisplayString(). It has no clue what STDIN or STDERR is. But autochk.exe knows how to open an FD to \Device\KeyboardClass, and now you have a fully functioning TUI app!

tonycoz (Contributor) commented Jul 10, 2025

It wasn't anything too rigorous:

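#!perl
use v5.40;
use Time::HiRes "time";

# Time 1,000,000 -t (is-a-tty) tests on STDIN (a console when run
# interactively), then on this script opened as an ordinary disk file.
testit(\*STDIN);
open my $fh, "<", $0 or die "Cannot open $0: $!";
testit($fh);

sub testit ($myfh) {
    my $start = time();

    for (1 .. 1_000_000) {
        my $x = -t $myfh;
    }
    my $end = time();

    # Total seconds for 1e6 calls, which also reads as microseconds per call.
    print $end - $start, "\n";
}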

Tested 3 times each with blead and with your change.

Blead results (Ryzen 7 2700):

C:\Users\Tony\dev\perl\git\perl\win32>..\perl -I..\lib ..\..\23375.pl
23.8609209060669
1.24526906013489

C:\Users\Tony\dev\perl\git\perl\win32>..\perl -I..\lib ..\..\23375.pl
23.7503299713135
1.19657588005066

C:\Users\Tony\dev\perl\git\perl\win32>..\perl -I..\lib ..\..\23375.pl
24.4927198886871
1.2960000038147

With your change:

C:\Users\Tony\dev\perl\git\perl\win32>..\perl -I..\lib ..\..\23375.pl
24.9973509311676
0.387558937072754

C:\Users\Tony\dev\perl\git\perl\win32>..\perl -I..\lib ..\..\23375.pl
25.186952829361
0.397616147994995

C:\Users\Tony\dev\perl\git\perl\win32>..\perl -I..\lib ..\..\23375.pl
23.3968820571899
0.391831159591675

Debian (i7-10700F):

tony@venus:.../git/perl6$ ./perl -Ilib ../23375.pl
0.237051963806152
0.199118852615356
tony@venus:.../git/perl6$ ./perl -Ilib ../23375.pl
0.248272895812988
0.191707134246826
tony@venus:.../git/perl6$ ./perl -Ilib ../23375.pl
0.23987603187561
0.188696146011353

Debian (WSL2, Ryzen 7 2700, same hardware as Windows above):

tony@GANYMEDE:~/dev/perl/git/perl$ ./perl -Ilib ../23375.pl
0.769921064376831
0.469972848892212
tony@GANYMEDE:~/dev/perl/git/perl$ ./perl -Ilib ../23375.pl
0.762050151824951
0.463076114654541
tony@GANYMEDE:~/dev/perl/git/perl$ ./perl -Ilib ../23375.pl
0.762561082839966
0.469331026077271
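Each run makes 1,000,000 calls, so the printed totals in seconds read directly as microseconds per call. The following C sketch averages the three Windows runs for each case, using the figures copied verbatim from the output above. This is the arithmetic behind the per-call numbers quoted in this thread.

```c
#include <assert.h>
#include <stddef.h>

#define CALLS 1000000.0  /* iterations per testit() run in the script */

/* Total seconds per run, copied from the Windows results above. */
static const double blead_tty[]  = {23.8609209060669, 23.7503299713135, 24.4927198886871};
static const double blead_file[] = {1.24526906013489, 1.19657588005066, 1.2960000038147};
static const double patch_tty[]  = {24.9973509311676, 25.186952829361, 23.3968820571899};
static const double patch_file[] = {0.387558937072754, 0.397616147994995, 0.391831159591675};

/* Mean microseconds per call: mean(total seconds) / CALLS * 1e6. */
static double mean_us_per_call(const double *secs, size_t n) {
    double sum = 0.0;

    assert(n > 0);
    for (size_t i = 0; i < n; i++)
        sum += secs[i];
    return sum / (double)n / CALLS * 1e6;
}
```

The means come out at about 1.25µs versus 0.39µs per call on the disk-file handle (roughly 3.2x faster), and about 24.0µs versus 24.5µs per call on the console handle.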

khwilliamson (Contributor) commented:

I would merge this if the commit message is improved enough. But I object to its merging as-is. The code itself looks fine to me.

First, the commit message title is too long. GH was forced to wrap it, using ellipses. And what is there doesn't really help me understand what's happening. It is actually factually wrong. This commit doesn't stop calling any syscall. It inserts another syscall first and avoids calling the original one if that one fails. It also assumes a more intimate knowledge of Windows internals than I possess, and I'm sure I'm not alone. "NT->Winn err conv is slow" is something whose meaning I can guess at, but it shouldn't be in a commit title.

Second, the commit message body is non-existent, and the comments refer the reader to the p.r. for details. The comments should not refer to an unspecified GH p.r. that someone would have to take steps to track down. It is OK to refer to the commit message that created them. But making a later reader go through the extra level of indirection is unacceptable.

Third, the p.r. description isn't very helpful. People reading this want to know what is changing and why. Starting off with a description of the internals of a Windows library function does not meet that need. I myself would not have included it, but if you feel that background is helpful to people more attuned to Windows internals than I and most of the people who will ever read the message, then it should be placed in a separate paragraph later.

The p.r. description should be copied into the commit message in this case. And it is a non sequitur relative to its title. Its first sentence needs to expand on what the title says. It doesn't currently.

What it looks to me is that the commit basically finds many failures using a faster but incomplete syscall before falling back to the slow complete one. But I wouldn't have figured that out from any of your descriptions.

The "Let them read code" attitude is not a principle compatible with this project. (Today I learned that Marie Antoinette did not say the similar phrase attributed to her; that claim was first made 50 years after her execution.)

Writing a good commit message for anything but trivial changes takes some effort. We require that pull requests have had sufficient effort not just in the code but in its comments and descriptions.

Review comment (Contributor) on these lines of the diff:

return 0;
}

if (GetConsoleMode(fh, &mode))

And the description says there is a syscall that mostly fails. There is nothing that explains that statement, so we are left to guess about it.

bulk88 (Contributor, Author) replied Aug 10, 2025

The description is below: GetConsoleMode() only accepts CONSOLE handles. It does not accept file handles, serial ports, TCP/IP sockets, process handles, thread handles or mutexes.

On the NT kernel, this is what Perl calls a Pack::age/HV*/stash, and what every other language calls a class. This is a Pack::age/HV*/stash inside the NT kernel:

https://doxygen.reactos.org/d7/dca/ndk_2obtypes_8h_source.html#l00379

NT has 6 built-in classes that I am calling "I/O" objects; they accept create/read/write/seek/close/delete IOCTLs. Unix calls these things file descriptors.

https://doxygen.reactos.org/da/d7d/iomgr_8c_source.html#l00254

This is unlike Unix, with its unreaped, memory-leaking zombie PID API. There, for some bizarre design reason, you can't select() on an EXT2/EXT3 file on a 5400 RPM disk, and Linux/BSD/(RIP) Solaris still can't figure out how to select() on a disk file after 20 years.

https://www.upwind.io/feed/io_uring-linux-performance-boost-or-security-headache

A meatspace dev friend of mine won an io_uring bug bounty last year, but they are not a public figure and are not googleable.

https://doxygen.reactos.org/df/d04/cmsysini_8c_source.html#l00980 The registry classes register themselves with the HV* and *main::
https://doxygen.reactos.org/d1/d6e/ntoskrnl_2ex_2callback_8c_source.html#l00256 Here comes the APC/C function pointer closure class (POSIX calls them signals).

https://doxygen.reactos.org/d4/deb/ntoskrnl_2ex_2event_8c_source.html#l00039 Now we get the kernel- and user-mode "Event" object, so a Ring 0 or Ring 3 thread can de-schedule itself off the CPU. That is unlike the MS-DOS 8086/286 era, when there was no way on earth to stop the CPU from sucking in machine code and executing it.

https://doxygen.reactos.org/de/d7a/ntoskrnl_2ex_2timer_8c_source.html#l00223 Now we have wall time objects!

https://doxygen.reactos.org/d9/d6e/win32k_8c_source.html#l00259 And welcome to the Windows in Windows: now we have a VGA/DP/DVI port and can show things to humans.

Shells, terminals, TUIs and GUIs are ridiculous concepts to bake into an OS's public API. Windows's late-1980s architecture made them end-user plugins. In the original Windows NT GUI design (NT 3.1-3.51), the VGA driver/mouse/keyboard/screen/GUI lived in a Ring 3 userland process that probably used kernel named pipes to talk to other processes and to the VGA adapter. Its performance was so horrible that win32k.sys was invented in NT 4. It still exists in Win 10, basically unmodified, to paint the GUI pixels. win32k.sys is the ONLY driver/only disk file that is allowed to own a range of the precious syscall numbers that are entered through the x64 syscall instruction or, on i386, through int 2Eh/sysenter.

Every other kernel "class", whether written by MS and burned into the kernel or shipped as a driver file written by the general public, must accept the rule that IRPs are the only way to communicate with userland. I'm calling IRPs asynchronous ioctl packets, or event queue packets. It's very organized. There is a second, discouraged way to talk to userland, and I believe MS's signed kernel driver program banned it forever, probably around the Win 8 or Win 10 era.

Windows Subsystem for Linux was, engineering-wise, technically impossible until every 3rd-party hardware or software vendor writing Windows kernel drivers had left the picture. Some aged out, some got banned by MS, and some left the market (no 64-bit drivers available) or went bankrupt. The rest actually PAID humans to rewrite and recompile their NT kernel drivers for a USB webcam, or for a $1.99 Ethernet card with a Realtek chip from hell that an army of Linux hardware devs haven't been able to get stable in 10 years.

WSL, and unmodified ELF files executing on Windows NT, only became possible once ALL (and I mean ALL) lines of code in the NT kernel, written by anyone, agreed never ever ever again to open an mmap portal into a process and inspect its address space for "known C structs at fixed userland virtual addresses".

This post is too long; I'll let someone else do the talking.
[Screenshots 20250809_194034, 20250809_194141, 20250809_194213 and 20250809_194243 not reproduced]

So let's go back to my Perl scripts on NT and this PERL.EXE program I have.

[Screenshots 20250809_194816, 20250809_194910 and 20250809_194929 not reproduced]

Why would passing any NT handle number other than a console handle number into GetConsoleMode() make GetConsoleMode() return TRUE/SUCCESS?

I can reverse the question: why do tell() and seek() return -1 on my Linux VM for FD 1 between Perl and xterm?

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/tty/tty_io.c

Will the ext3 FS driver give me /usr/bin/perl's baud rate, ECHO setting and vertical tab delay?

Help!!!! I'm a real estate lawyer who accidentally took on a murder trial client. What do I do now?

What is a TERMINAL HANGUP SLAVE in Linux? So not PC source code, lol.

https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/tree/drivers/tty/tty_io.c#n2306

What is the window size of /lib/strict.pm?

LETS FIND OUT!!!!

I'M GOING TO SPEAK PENGUINESE FOR ONCE

Because most of P5P only speaks PENGUIN or TUX, maybe I should explain the problem with WinPerl in PENGUIN.

[Screenshots penguin_good_handle and penguin_bad_handle not reproduced]
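The Linux-side questions above have direct answers. A terminal-only or seek-only request on the wrong kind of descriptor simply fails, much as GetConsoleMode() fails on a non-console handle. The following minimal sketch assumes Linux and glibc. A temporary regular file stands in for /lib/strict.pm and a pipe stands in for a non-seekable stream. The helper names `window_size_errno`, `seek_errno` and `report` are hypothetical.

```c
#define _DEFAULT_SOURCE          /* glibc: expose POSIX/BSD declarations */
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Ask fd for its terminal window size (what `stty size` reports).
   Returns 0 on success, otherwise the errno value left by ioctl(). */
static int window_size_errno(int fd) {
    struct winsize ws;

    if (ioctl(fd, TIOCGWINSZ, &ws) == 0)
        return 0;
    return errno;
}

/* Ask fd for its current offset; pipes and terminals cannot seek.
   Returns 0 on success, otherwise the errno value left by lseek(). */
static int seek_errno(int fd) {
    if (lseek(fd, 0, SEEK_CUR) != (off_t)-1)
        return 0;
    return errno;
}

/* Print both answers for one descriptor. */
static void report(const char *label, int fd) {
    int w = window_size_errno(fd);
    int s = seek_errno(fd);

    printf("%-12s window size: %s; seek: %s\n", label,
           w ? strerror(w) : "ok", s ? strerror(s) : "ok");
}
```

The expected answers are as follows. A regular file has no window size (ENOTTY) but seeks fine. A pipe neither has a window size (ENOTTY) nor seeks (ESPIPE).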

bulk88 (Contributor, Author):

The post above is not technically/engineering-wise accurate. On Windows, console handles/console file descriptors are 100% fake. They are just userland Ring 3 magic tricks, exactly like WinPerl's pseudo-fork is 100% fake userland magic tricks. A Windows console handle is an illegal unaligned pointer ending in 0x01. All Windows kernel handles are aligned U32 * offsets into an array somewhere in Ring 0. So a U32 ending in 0x00, 0x04, 0x08 or 0x0C is a real NT/POSIX kernel handle, and anything ending in 0x01, 0x02 or 0x03 is illegal. Console handles in MS's public API always end in 0x01. Reverse engineering the Windows OS shows that STDIN, STDOUT and STDERR are, in effect, like https://en.wikipedia.org/wiki/WebSocket packets over a TCP/IP socket to another process. That's why it's so slow, and even TonyC proved it with benchmarks. Is a console handle (a U32 int between 0 and 4GB) "real"? What is its "buffer mode" or "code page"? The only way to find out is to ask, over that socket, another process called csrss.exe, which is a daemon with root privileges and dark magic.
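The alignment rule described above can be written as a bit test. The following C sketch models only what this thread describes: pseudo-handles -1, -2 and -3 are checked first (they also have low bits set), then any value with either of the two low bits set is treated as tagged. This is the undocumented trick the PR deliberately does not use. The names `classify_handle` and `handle_kind` are hypothetical, and this is not kernel32's code. Per the author, the encoding is not frozen API and may not hold on newer Windows versions.

```c
#include <assert.h>
#include <stdint.h>

/* Real kernel handles are multiples of 4, so their two low bits are 0. */
#define HANDLE_TAG_MASK ((uintptr_t)0x3)

typedef enum { H_PSEUDO, H_TAGGED, H_KERNEL } handle_kind;

/* Classify a raw handle value using only its bit pattern (hypothetical). */
static handle_kind classify_handle(uintptr_t h) {
    /* -1, -2 and -3 are pseudo-handles.  They also have low bits set,
       so they must be ruled out before the alignment test. */
    if (h == (uintptr_t)-1 || h == (uintptr_t)-2 || h == (uintptr_t)-3)
        return H_PSEUDO;
    /* Either low bit set: not an aligned kernel handle, so a user-mode tag. */
    if (h & HANDLE_TAG_MASK)
        return H_TAGGED;
    return H_KERNEL;
}
```

The ordering matters. Without the pseudo-handle check first, -1, -2 and -3 would all be misreported as tagged handles.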

tonycoz (Contributor) replied Aug 12, 2025

This was excessive.

Your commit message (and, much more so, the brain dump just above) goes into irrelevant technical detail. All it needs to say is that GetConsoleMode() is slow even for non-console handles and that GetFileType() short-cuts that, and maybe give a benchmark.

So your commit message might be something like:

win32_isatty(): only call expensive GetConsoleMode() for character devices

win32_isatty() is called from win32_read(), so it is called often, and often on
non-TTY handles, so performance for non-TTY handles is important.

GetConsoleMode() is expensive even for non-character handles, but we can use
GetFileType() to cheaply distinguish the common case of non-character device
handles from character device handles, since only character devices can be TTYs.

For a rough benchmark, this improved performance from roughly 1.24µs per call
to 0.39µs per call on a non-character handle on a Ryzen 7 2700 Windows 10 system.

Calls for console device handles were slightly slowed, from 24.0µs per call to
24.5µs per call, though these results were fairly noisy.
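The approach in the proposed commit message can be sketched as follows. This is an illustration, not the PR's actual diff; the thread only shows the `if (GetConsoleMode(fh, &mode))` line. On Windows the sketch uses the documented `_get_osfhandle()`, `GetFileType()` and `GetConsoleMode()` calls. Elsewhere it substitutes a POSIX analogue (`fstat()` with `S_ISCHR`, then `isatty()`) so that the same cheap-filter-first shape compiles and runs anywhere. The name `isatty_fast` is hypothetical, and the Windows branch assumes `fd` is a valid CRT descriptor.

```c
#define _POSIX_C_SOURCE 200809L  /* expose fstat()/isatty() on strict compilers */
#include <assert.h>
#include <errno.h>

#ifdef _WIN32
#include <io.h>
#include <windows.h>

/* Cheap check first: only character devices can be consoles, so disk
   files, pipes and sockets never reach the expensive GetConsoleMode(). */
static int isatty_fast(int fd) {
    HANDLE fh = (HANDLE)_get_osfhandle(fd);
    DWORD mode;

    if (fh == INVALID_HANDLE_VALUE) {
        errno = EBADF;
        return 0;
    }
    if (GetFileType(fh) != FILE_TYPE_CHAR) {
        errno = ENOTTY;
        return 0;
    }
    /* Character device (console, NUL, COM port...): only a console
       handle makes GetConsoleMode() succeed. */
    if (GetConsoleMode(fh, &mode))
        return 1;
    errno = ENOTTY;
    return 0;
}
#else
#include <sys/stat.h>
#include <unistd.h>

/* POSIX analogue: fstat() classifies the descriptor, and only character
   devices go on to the terminal-specific isatty() query. */
static int isatty_fast(int fd) {
    struct stat st;

    if (fstat(fd, &st) != 0)
        return 0;              /* errno set by fstat(), e.g. EBADF */
    if (!S_ISCHR(st.st_mode)) {
        errno = ENOTTY;
        return 0;
    }
    return isatty(fd);         /* sets errno to ENOTTY on failure */
}
#endif
```

The slow path is unchanged for real consoles. The common disk-file and pipe case is answered by the cheap classification alone.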

bulk88 (Contributor, Author) commented Aug 26, 2025

Interesting callstack I happened to see.

>	ucrtbase.dll!_isatty()	Unknown
 	ucrtbase.dll!_read_nolock()	Unknown
 	ucrtbase.dll!_read()	Unknown
 	miniperl.exe!win32_read(int fd, void * buf, unsigned int cnt) Line 4144	C
 	miniperl.exe!PerlIOUnix_read(_PerlIO * * f, void * vbuf, unsigned __int64 count) Line 3166	C
 	miniperl.exe!PerlIOBuf_fill(_PerlIO * * f) Line 4471	C
 	miniperl.exe!PerlIOBase_read(_PerlIO * * f, void * vbuf, unsigned __int64 count) Line 2528	C
 	miniperl.exe!PerlIOBuf_read(_PerlIO * * f, void * vbuf, unsigned __int64 count) Line 4495	C
 	[Inline Frame] miniperl.exe!S_PerlIO_getc_x(_PerlIO * *) Line 376	C
 	miniperl.exe!PerlIO_getc(_PerlIO * * f) Line 5389	C
 	miniperl.exe!Perl_sv_gets(sv * const sv, _PerlIO * * const fp, __int64 append) Line 9233	C
 	miniperl.exe!Perl_pp_backtick() Line 311	C
 	miniperl.exe!Perl_runops_standard() Line 41	C
 	miniperl.exe!S_run_body(long oldscope) Line 2886	C
 	miniperl.exe!perl_run(interpreter *) Line 2799	C
 	miniperl.exe!main(int argc, char * * argv, char * * env) Line 137	C
 	[Inline Frame] miniperl.exe!invoke_main() Line 78	C++
 	miniperl.exe!__scrt_common_main_seh() Line 288	C++
 	kernel32.dll!BaseThreadInitThunk()	Unknown
 	ntdll.dll!RtlUserThreadStart()	Unknown

WinPerl's win32_read() calls WinPerl's own DIY win32_isatty(), and then the UCRT's _read() immediately calls the UCRT's own _isatty() implementation.

tonycoz (Contributor) commented Aug 27, 2025

The UCRT just checks a flag (and that flag only indicates a character device from what I can see)
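tonycoz's observation suggests a design in which the descriptor is classified once, when it is opened, and isatty() only tests a stored flag. The following portable C sketch shows that caching idea. All names (`fd_flags`, `FDEV_FLAG`, `register_fd`, `open_registered`, `cached_isatty`) are hypothetical and are not the UCRT's code. `fstat()` with `S_ISCHR` stands in for GetFileType() returning FILE_TYPE_CHAR.

```c
#define _POSIX_C_SOURCE 200809L  /* expose open(), fstat(), pipe() */
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

#define MAX_FDS   64
#define FDEV_FLAG 0x40u  /* hypothetical "is a character device" bit */

/* One flags byte per descriptor, filled in when the descriptor is opened. */
static unsigned char fd_flags[MAX_FDS];

/* Classify the descriptor once, with a system call, at open time. */
static void register_fd(int fd) {
    struct stat st;

    if (fd < 0 || fd >= MAX_FDS)
        return;
    fd_flags[fd] = 0;
    if (fstat(fd, &st) == 0 && S_ISCHR(st.st_mode))
        fd_flags[fd] |= FDEV_FLAG;
}

/* Open a path and classify it immediately, as a CRT open routine might. */
static int open_registered(const char *path) {
    int fd = open(path, O_RDONLY);

    if (fd >= 0)
        register_fd(fd);
    return fd;
}

/* Every later query is a table lookup with no system call.  The flag only
   says "character device", not "interactive terminal". */
static int cached_isatty(int fd) {
    if (fd < 0 || fd >= MAX_FDS)
        return 0;
    return (fd_flags[fd] & FDEV_FLAG) != 0;
}
```

With this check, /dev/null is reported as a "tty" because it is a character device. That is exactly the caveat in tonycoz's parenthetical. On Windows the NUL device behaves the same way, and win32_isatty()'s extra GetConsoleMode() call is what separates a real console from other character devices.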

3 participants