Commit e2062a2
Time::HiRes::bootstrap() use more local vars in registers vs global derefs
-each reference to a global var like qpc_res_ns or tick_frequency costs
7 bytes of machine code, sometimes a couple more. BOOT:{} runs only
once, and the chance of 2 BOOT:{} XSUBs running in parallel in 2
different my_perls is almost zero; even if 2 OS threads do execute it
in parallel, 1 OS thread isn't going to help shave time off the 2nd.
So to reduce the number of 7 byte instructions that read from the
global vars, use C auto vars as much as possible.
QueryPerformanceFrequency() internally on Win7 is around 1-3 ptr derefs
into NT's "VDSO" aka KUSER_SHARED_DATA. On Win2k, QPF() is a ring 0 call.
-slide indent level to the left b/c the Win32 code block is nested too
deep and almost every statement would exceed 80 chars
-cache PL_modglobal in a register; PL_modglobal sits at a big U32 offset
(0x698) into the my_perl struct: "48 8B 9F 98 06 00 00 mov rbx, [rdi+698h]"
1 parent 7ab39d0 commit e2062a2
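The local-variable caching described in the commit message can be sketched roughly as below. This is an illustration only, not the code from this commit: the two function names are hypothetical, and the nanoseconds-per-tick meaning given to qpc_res_ns here is an assumption made for the example. The first function touches the globals on every use; the second does the arithmetic in C auto vars and stores each global exactly once.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for the globals named in the commit message. Their real
 * types and meaning in Time::HiRes may differ (assumption). */
uint64_t tick_frequency; /* ticks per second */
uint64_t qpc_res_ns;     /* assumed here: nanoseconds per tick */

/* Hypothetical "before": every use of a global is a separate memory
 * access, each encoded as its own RIP-relative instruction. */
void init_timer_globals_direct(uint64_t freq)
{
    tick_frequency = freq;
    assert(tick_frequency != 0);
    qpc_res_ns = UINT64_C(1000000000) / tick_frequency;
    if (qpc_res_ns == 0)
        qpc_res_ns = 1;
}

/* Hypothetical "after": work on auto vars, which can live in
 * registers, then write each global once at the end. */
void init_timer_globals_cached(uint64_t freq)
{
    uint64_t res_ns;

    assert(freq != 0);
    res_ns = UINT64_C(1000000000) / freq;
    if (res_ns == 0)
        res_ns = 1;

    tick_frequency = freq;
    qpc_res_ns = res_ns;
}
```

Both functions leave the globals in the same state; only the number of loads and stores against global memory differs. An optimizing compiler can sometimes do this on its own, but not across calls to functions it cannot see into, which is why doing it by hand can still shrink the machine code.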
1 file changed, +26 -21 lines changed
(hunk spans original lines 972-1004, new lines 972-1009)