Commit 0829014

x86 asm: move x87 FPU instructions from x86-assembly-cheat
1 parent f66e777 commit 0829014

12 files changed, +339 -0 lines changed

README.adoc

Lines changed: 52 additions & 0 deletions
@@ -11939,6 +11939,23 @@ Then it is just a huge copy paste of infinite boring details:
 * <<x86-simd>>
 * <<arm-simd>>

+To debug these instructions, you can see the register values in GDB with:
+
+....
+info registers float
+....
+
+or alternatively with register names (here the ARMv8 V0 register):
+
+....
+print $v0
+....
+
+as mentioned at:
+
+* https://stackoverflow.com/questions/5429137/how-to-print-register-values-in-gdb/38036152#38036152
+* https://reverseengineering.stackexchange.com/questions/8992/floating-point-registers-on-arm/20623#20623
+
 Bibliography: https://stackoverflow.com/questions/1389712/getting-started-with-intel-x86-sse-simd-instructions/56409539#56409539

 === User vs system assembly
@@ -11995,6 +12012,7 @@ Examples under `arch/<arch>/c/` directories show to how use inline assembly from
 * x86_64
 ** link:userland/arch/x86_64/inline_asm/inc.c[]
 ** link:userland/arch/x86_64/inline_asm/add.c[]
+** link:userland/arch/x86_64/inline_asm/sqrt_x87.c[] Shows how to use the <<x86-x87-fpu-instructions>> from inline assembly. Bibliography: https://stackoverflow.com/questions/6514537/how-do-i-specify-immediate-floating-point-numbers-with-inline-assembly/52906126#52906126
 * arm
 ** link:userland/arch/arm/inline_asm/inc.c[]
 ** link:userland/arch/arm/inline_asm/inc_memory.c[]
@@ -12395,6 +12413,7 @@ Common combo with idiv 32-bit, which takes the input from `edx:eax`: so you need

 Has some Intel vs AT&T name overload hell:

+* https://stackoverflow.com/questions/6555094/what-does-cltq-do-in-assembly/45386217#45386217
 * https://stackoverflow.com/questions/17170388/trying-to-understand-the-assembly-instruction-cltd-on-x86/50315201#50315201
 * https://sourceware.org/binutils/docs/as/i386_002dMnemonics.html

@@ -12703,6 +12722,39 @@ There is also the `cpuinfo` command line tool that parses the CPUID instruction

 Old floating point unit that you should likely not use anymore, prefer instead the newer <<x86-simd>> instructions.

+* FPU basic examples, start here
+** link:userland/arch/x86_64/fadd.S[] FADD. The x87 FPU works on a stack of floating point numbers.
+** link:userland/arch/x86_64/faddp.S[] FADDP. Instructions with the P suffix also Pop the stack. This is what you want for most computations, where the intermediate results don't matter.
+** link:userland/arch/x86_64/fldl_literal.S[] FLDL literal. It does not seem possible to do either of the following: https://stackoverflow.com/questions/6514537/how-do-i-specify-immediate-floating-point-numbers-with-inline-assembly
+*** load floating point immediates into x86 x87 FPU registers
+*** encode floating point literals in x86 instructions, including MOV
+* Bulk instructions
+** link:userland/arch/x86_64/fabs.S[] FABS: absolute value: `ST0 = |ST0|`
+** link:userland/arch/x86_64/fchs.S[] FCHS: change sign: `ST0 = -ST0`
+** link:userland/arch/x86_64/fild.S[] FILD: Integer Load. Convert integer to float.
+** link:userland/arch/x86_64/fld1.S[] FLD1: Push 1.0 to ST0. CISC!
+** link:userland/arch/x86_64/fldz.S[] FLDZ: Push 0.0 to ST0.
+** link:userland/arch/x86_64/fscale.S[] FSCALE: `ST0 = ST0 * 2 ^ RoundTowardZero(ST1)`
+** link:userland/arch/x86_64/fsqrt.S[] FSQRT: square root
+** link:userland/arch/x86_64/fxch.S[] FXCH: swap ST0 and another register
+
+==== x86 x87 FPU vs SIMD
+
+http://stackoverflow.com/questions/1844669/benefits-of-x87-over-sse
+
+Modern x86 has two main ways of doing floating point operations:
+
+* <<x86-x87-fpu-instructions>>
+* <<x86-simd>>
+
+Advantages of FPU:
+
+* present in older CPUs, while SSE2 is only guaranteed to be present in x86-64
+* contains some instructions not present in SSE, e.g. trigonometric
+* higher precision: the FPU holds values in the 80-bit Intel extended format, while SSE2 floating point operations only go up to 64 bits per element, despite the registers being 128 bits wide
+
+In GCC, you can choose between them with `-mfpmath=`, e.g. `-mfpmath=387` or `-mfpmath=sse`.
+
 === x86 SIMD

 History:
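
The precision argument in the "x86 x87 FPU vs SIMD" section can be checked from C. This is only a minimal sketch, assuming a GCC or Clang x86-64 Linux target where `long double` maps to the 80-bit x87 extended format and `double` arithmetic is done with SSE2. The function names are hypothetical and are not part of this commit.

```c
#include <float.h>

/* Hypothetical helper: significand width of long double in bits.
 * Expected to be 64 on x86-64 Linux (x87 extended format), versus
 * DBL_MANT_DIG == 53 for double. Other ABIs may differ. */
int long_double_significand_bits(void) {
    return LDBL_MANT_DIG;
}

/* Returns 1 if adding 2^-60 to 1.0 is lost once the sum is a double. */
int tiny_addend_lost_in_double(void) {
    volatile double one = 1.0;
    volatile double tiny = 0x1p-60;
    /* The volatile store forces rounding to the 53-bit double format. */
    volatile double sum = one + tiny;
    return sum == 1.0;
}

/* Returns 1 if the same addition is still visible in long double.
 * Representing 1 + 2^-60 exactly needs a 61-bit significand. */
int tiny_addend_kept_in_long_double(void) {
    volatile long double one = 1.0L;
    volatile long double tiny = 0x1p-60L;
    volatile long double sum = one + tiny;
    return sum != 1.0L;
}
```

On a target where `long double` is the x87 format, `tiny_addend_kept_in_long_double()` is expected to return 1 while `tiny_addend_lost_in_double()` also returns 1, which is the extra precision the list above refers to.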

userland/arch/x86_64/fabs.S

Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#x86-x87-fpu-instructions */
+
+#include <lkmc.h>
+
+.data
+    double_1_0: .double 1.0
+    double_minus_1_0: .double -1.0
+LKMC_PROLOGUE
+    /* |-1| == 1 */
+    fldl double_minus_1_0
+    fabs
+    fldl double_1_0
+    fcomip %st(1)
+    LKMC_ASSERT(je)
+    finit
+
+    /* |1| == 1 */
+    fldl double_1_0
+    fabs
+    fldl double_1_0
+    fcomip %st(1)
+    LKMC_ASSERT(je)
+    finit
+LKMC_EPILOGUE
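
For comparison, the same FABS check can be sketched from C with GCC-style inline assembly, in the spirit of the `inline_asm` examples mentioned in the README. This is a sketch under assumptions: `x87_fabs` is a hypothetical name, the `t` constraint (top of the x87 stack) is the one documented by GCC for x86, and the `#else` branch is only a portable fallback so that the file still builds on other targets.

```c
/* Hypothetical C wrapper around the x87 FABS instruction. */
double x87_fabs(double x) {
#if defined(__x86_64__) || defined(__i386__)
    double result;
    /* "0" loads x into ST0 before the asm, "=t" reads ST0 back after it. */
    __asm__ ("fabs" : "=t" (result) : "0" (x));
    return result;
#else
    /* Portable fallback for non-x86 targets. */
    return x < 0 ? -x : x;
#endif
}
```

The compiler takes care of moving the `double` between SSE registers, memory and the x87 stack, so no explicit `fldl` or `finit` is needed here.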

userland/arch/x86_64/fadd.S

Lines changed: 40 additions & 0 deletions
@@ -0,0 +1,40 @@
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#x86-x87-fpu-instructions */
+
+#include <lkmc.h>
+
+.data
+    double_1_5: .double 1.5
+    double_2_5: .double 2.5
+    double_4_0: .double 4.0
+LKMC_PROLOGUE
+    /* Load to the FPU stack.
+     * Push value from memory to the FPU stack. */
+    fldl double_1_5
+    /* FPU stack after operation:
+     * ST0 == 1.5 */
+
+    fldl double_2_5
+    /* FPU stack after operation:
+     * ST0 == 2.5
+     * ST1 == 1.5 */
+
+    /* ST1 = ST1 + ST0: in AT&T syntax the destination is the last operand. */
+    fadd %st, %st(1)
+    /* FPU stack after operation:
+     * ST0 == 2.5
+     * ST1 == 4.0 */
+
+    fldl double_4_0
+    /* FPU stack after operation:
+     * ST0 == 4.0
+     * ST1 == 2.5
+     * ST2 == 4.0 */
+
+    /* Compare ST0 == ST2 */
+    fcomi %st(2)
+    /* FPU stack after operation:
+     * ST0 == 4.0
+     * ST1 == 2.5
+     * ST2 == 4.0 */
+    LKMC_ASSERT(je)
+LKMC_EPILOGUE
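
In AT&T syntax the destination is the last operand, so `fadd %st, %st(1)` accumulates into ST1 and leaves ST0 unchanged. The following sketch checks that from C. It assumes GCC-style inline assembly with the documented x86 `t` (ST0) and `u` (ST1) constraints; `struct x87_pair` and `x87_fadd_st_st1` are hypothetical names, and the `#else` branch only mirrors the expected result on non-x86 targets.

```c
/* Hypothetical container for the two top x87 registers. */
struct x87_pair {
    double st0;
    double st1;
};

/* Runs `fadd %st, %st(1)` with the given ST0 and ST1 values and
 * returns both registers as they are after the instruction. */
struct x87_pair x87_fadd_st_st1(double st0, double st1) {
    struct x87_pair after;
#if defined(__x86_64__) || defined(__i386__)
    double new_st0, new_st1;
    /* Inputs "0" and "1" are tied to the "=t" and "=u" outputs, so
     * nothing is pushed or popped by the asm itself. */
    __asm__ ("fadd %%st, %%st(1)"
             : "=t" (new_st0), "=u" (new_st1)
             : "0" (st0), "1" (st1));
    after.st0 = new_st0;
    after.st1 = new_st1;
#else
    /* Portable model of the same instruction for non-x86 targets. */
    after.st0 = st0;
    after.st1 = st1 + st0;
#endif
    return after;
}
```

With ST0 == 2.5 and ST1 == 1.5 as in `fadd.S`, the call `x87_fadd_st_st1(2.5, 1.5)` is expected to leave 2.5 in `st0` and 4.0 in `st1`. The same state can be inspected in GDB with `info registers float`.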

userland/arch/x86_64/faddp.S

Lines changed: 36 additions & 0 deletions
@@ -0,0 +1,36 @@
+/* https://github.com/cirosantilli/linux-kernel-module-cheat#x86-x87-fpu-instructions */
+
+#include <lkmc.h>
+
+.data
+    double_1_5: .double 1.5
+    double_2_5: .double 2.5
+    double_4_0: .double 4.0
+LKMC_PROLOGUE
+    fldl double_1_5
+    /* FPU stack after operation:
+     * ST0 == 1.5 */
+
+    fldl double_2_5
+    /* FPU stack after operation:
+     * ST0 == 2.5
+     * ST1 == 1.5 */
+
+    /* ST1 = ST1 + ST0
+     * Pop ST0, so the sum becomes the new ST0. */
+    faddp %st, %st(1)
+    /* FPU stack after operation:
+     * ST0 == 4.0 */
+
+    fldl double_4_0
+    /* FPU stack after operation:
+     * ST0 == 4.0
+     * ST1 == 4.0 */
+
+    /* Compare ST0 == ST1
+     * Pop ST0. */
+    fcomip %st(1)
+    /* FPU stack after operation:
+     * ST0 == 4.0 */
+    LKMC_ASSERT(je)
+LKMC_EPILOGUE
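
The popping variant maps naturally to a two-input, one-output inline assembly statement. A minimal sketch, assuming GCC-style inline assembly: because FADDP pops one of its inputs, that input has to be listed as a clobber (`st(1)`), following the pattern GCC documents for x87 instructions that pop. `x87_faddp` is a hypothetical name, and the `#else` branch is only a portable fallback.

```c
/* Hypothetical C wrapper around `faddp %st, %st(1)`. */
double x87_faddp(double a, double b) {
    double sum;
#if defined(__x86_64__) || defined(__i386__)
    /* a starts in ST0 and b in ST1. FADDP stores ST1 + ST0 into ST1 and
     * pops, so the sum is the new ST0. The popped input is declared as
     * a clobber so the compiler can track the stack depth. */
    __asm__ ("faddp %%st, %%st(1)"
             : "=t" (sum)
             : "0" (a), "u" (b)
             : "st(1)");
#else
    /* Portable fallback for non-x86 targets. */
    sum = a + b;
#endif
    return sum;
}
```

For the values used in `faddp.S`, `x87_faddp(1.5, 2.5)` is expected to return 4.0.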

userland/arch/x86_64/fchs.S

Lines changed: 24 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,24 @@
1+
/* https://github.com/cirosantilli/linux-kernel-module-cheat#x86-x87-fpu-instructions */
2+
3+
#include <lkmc.h>
4+
5+
.data
6+
double_1: .double 1.0
7+
double_minus_1: .double -1.0
8+
LKMC_PROLOGUE
9+
/* -(1) == -1 */
10+
fldl double_1
11+
fchs
12+
fldl double_minus_1
13+
fcomip %st(1)
14+
LKMC_ASSERT(je)
15+
finit
16+
17+
/* -(-1) == 1 */
18+
fldl double_minus_1
19+
fchs
20+
fldl double_1
21+
fcomip %st(1)
22+
LKMC_ASSERT(je)
23+
finit
24+
LKMC_EPILOGUE

userland/arch/x86_64/fild.S

Lines changed: 16 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,16 @@
1+
/* https://github.com/cirosantilli/linux-kernel-module-cheat#x86-x87-fpu-instructions */
2+
3+
#include <lkmc.h>
4+
5+
.data
6+
double_10_0: .double 10.0
7+
.bss
8+
double_10_0_2: .skip 8
9+
LKMC_PROLOGUE
10+
movl $10, double_10_0_2
11+
fildl double_10_0_2
12+
fldl double_10_0
13+
fcomip %st(1)
14+
LKMC_ASSERT(je)
15+
finit
16+
LKMC_EPILOGUE
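
FILD reads an integer from memory and pushes it as a floating point value. A small C sketch of the same conversion, assuming GCC-style inline assembly with an `m` (memory) input operand; `x87_fild` is a hypothetical name, and the `#else` branch is a portable fallback that relies on the ordinary C conversion.

```c
/* Hypothetical C wrapper around the 32-bit form of FILD. */
double x87_fild(int n) {
    double result;
#if defined(__x86_64__) || defined(__i386__)
    /* fildl loads a 32-bit signed integer from memory and pushes it,
     * converted to floating point, onto the x87 stack ("=t" is ST0). */
    __asm__ ("fildl %1" : "=t" (result) : "m" (n));
#else
    /* Portable fallback for non-x86 targets. */
    result = (double)n;
#endif
    return result;
}
```

As in `fild.S`, `x87_fild(10)` is expected to compare equal to 10.0.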

userland/arch/x86_64/fld1.S

Lines changed: 12 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,12 @@
1+
/* https://github.com/cirosantilli/linux-kernel-module-cheat#x86-x87-fpu-instructions */
2+
3+
#include <lkmc.h>
4+
5+
.data
6+
double_1_0: .double 1.0
7+
LKMC_PROLOGUE
8+
fld1
9+
fldl double_1_0
10+
fcomip %st(1)
11+
LKMC_ASSERT(je)
12+
LKMC_EPILOGUE

userland/arch/x86_64/fldl_literal.S

Lines changed: 18 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,18 @@
1+
/* https://github.com/cirosantilli/linux-kernel-module-cheat#x86-x87-fpu-instructions */
2+
3+
#include <lkmc.h>
4+
5+
.data
6+
double_1_5: .double 1.5
7+
.bss
8+
double_1_5_2: .skip 8
9+
LKMC_PROLOGUE
10+
#if 0
11+
/* Error: junk `.5' after expression */
12+
movq $1.5, double_1_5_2
13+
fldl double_1_5
14+
fldl double_1_5_2
15+
fcomi %st(1)
16+
LKMC_ASSERT(je)
17+
#endif
18+
LKMC_EPILOGUE
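
Since the literal cannot be encoded as an immediate, the usual workaround is to keep it in memory and load it from there, which is what the working `fldl double_1_5` line does. A minimal C sketch of that workaround, assuming GCC-style inline assembly; `x87_load_1_5` and `one_point_five` are hypothetical names, and the `#else` branch is only a portable fallback.

```c
/* Hypothetical helper: loads the constant 1.5 through the x87 stack. */
double x87_load_1_5(void) {
    /* The literal lives in memory because FLD has no immediate form. */
    static const double one_point_five = 1.5;
    double result;
#if defined(__x86_64__) || defined(__i386__)
    /* fldl pushes the 64-bit double at the given address onto the stack. */
    __asm__ ("fldl %1" : "=t" (result) : "m" (one_point_five));
#else
    /* Portable fallback for non-x86 targets. */
    result = one_point_five;
#endif
    return result;
}
```

The compiler picks the address of `one_point_five` for the `%1` operand, so the assembly template itself never contains a floating point literal.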

userland/arch/x86_64/fldz.S

Lines changed: 12 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,12 @@
1+
/* https://github.com/cirosantilli/linux-kernel-module-cheat#x86-x87-fpu-instructions */
2+
3+
#include <lkmc.h>
4+
5+
.data
6+
double_0_0: .double 0.0
7+
LKMC_PROLOGUE
8+
fldz
9+
fldl double_0_0
10+
fcomip %st(1)
11+
LKMC_ASSERT(je)
12+
LKMC_EPILOGUE
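
FLD1 and FLDZ take no operands and simply push a built-in constant. From C this can be sketched as an inline assembly statement with a single `=t` output and no inputs, which tells the compiler that the instruction pushes one value onto the x87 stack. This assumes GCC-style inline assembly; `x87_fld1` and `x87_fldz` are hypothetical names, and the `#else` branches are portable fallbacks.

```c
/* Hypothetical helper: pushes 1.0 with FLD1 and returns it. */
double x87_fld1(void) {
    double value;
#if defined(__x86_64__) || defined(__i386__)
    /* Output-only "=t": the asm pushes one value, which ends up in ST0. */
    __asm__ ("fld1" : "=t" (value));
#else
    value = 1.0; /* portable fallback for non-x86 targets */
#endif
    return value;
}

/* Hypothetical helper: pushes 0.0 with FLDZ and returns it. */
double x87_fldz(void) {
    double value;
#if defined(__x86_64__) || defined(__i386__)
    __asm__ ("fldz" : "=t" (value));
#else
    value = 0.0; /* portable fallback for non-x86 targets */
#endif
    return value;
}
```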

userland/arch/x86_64/fscale.S

Lines changed: 34 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,34 @@
1+
/* https://github.com/cirosantilli/linux-kernel-module-cheat#x86-x87-fpu-instructions */
2+
3+
#include <lkmc.h>
4+
5+
.data
6+
double_1_0: .double 1.0
7+
double_2_5: .double 2.5
8+
double_4_0: .double 4.0
9+
LKMC_PROLOGUE
10+
fldl double_4_0
11+
# ST0 = 4.0
12+
13+
fldl double_2_5
14+
# ST0 = 2.5
15+
# ST1 = 4.0
16+
17+
fldl double_1_0
18+
# ST0 = 1.0
19+
# ST1 = 2.5
20+
# ST2 = 4.0
21+
22+
# ST0 = 1 * 2 ^ (RoundTowardZero(2.5))
23+
# = 1 * 2 ^ 2
24+
# = 4
25+
fscale
26+
# ST0 = 4.0
27+
# ST1 = 2.5
28+
# ST2 = 4.0
29+
30+
fcomip %st(2)
31+
# ST0 = 4.0
32+
# ST1 = 2.5
33+
LKMC_ASSERT(je)
34+
LKMC_EPILOGUE
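
The rounding toward zero in FSCALE is easiest to see with a negative, non-integer exponent. The sketch below wraps the instruction with GCC-style inline assembly, assuming the documented x86 `t` (ST0) and `u` (ST1) constraints. `x87_fscale` is a hypothetical name, and the `#else` branch is a simplified portable model that only handles small exponents.

```c
/* Hypothetical C wrapper: returns x * 2 ^ RoundTowardZero(exponent). */
double x87_fscale(double x, double exponent) {
#if defined(__x86_64__) || defined(__i386__)
    double result;
    /* x goes to ST0 and exponent to ST1. FSCALE does not pop, so ST1
     * is a plain input and needs no clobber. */
    __asm__ ("fscale" : "=t" (result) : "0" (x), "u" (exponent));
    return result;
#else
    /* Portable model: a C integer cast also truncates toward zero. */
    long n = (long)exponent;
    for (; n > 0; n--) x *= 2.0;
    for (; n < 0; n++) x /= 2.0;
    return x;
#endif
}
```

`x87_fscale(1.0, 2.5)` reproduces the computation in `fscale.S` and is expected to give 4.0, while `x87_fscale(1.0, -1.5)` is expected to give 0.5, because -1.5 is truncated to -1 rather than rounded down to -2.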
