Fast `R_QtangentsToTBN()` variant to speed-up `Tess_SurfaceIQM()` and `Tess_SurfaceMD5()` #1831

illwieckz · 2025-09-29T18:31:19Z

renderer: introduce R_TBNtoQtangentsFast() and floatToSnorm16_fast()
renderer: faster Tess_SurfaceIQM() and Tess_SurfaceMD5() with R_TBNtoQtangentsFast()

I noticed that the Tess_SurfaceIQM() function (used in the CPU model code) was spending significant time in R_TBNtoQtangents(), which in turn was spending significant time in floatToSnorm16(). I then discovered that this function rounds every component of the vector to the closest integer, calling lrintf() on each component one by one. But this is model rendering code: we may not need that much precision, and a basic truncation is likely good enough. Also, this CPU code is a fallback for low-end devices that can't process the current model on the GPU, so the player is already using a low preset with low LOD and already gets worse results than that in other rendered things. Using a basic truncation means the compiler can vectorize, and it does.

I ported the trick to Tess_SurfaceMD5() as well.

Other model code (like the MD3 code) using R_TBNtoQtangents() at load time still uses the properly rounded, lrintf()-based variant. The same goes for the IQM/MD5 loading code for the GPU code path.
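
To make the difference concrete, here is a minimal sketch of the two conversions. The names `packSnorm16Rounded()` and `packSnorm16Truncated()`, their signatures, and the 32767 scale factor are assumptions made for illustration; this is not the engine's actual code.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical stand-ins for floatToSnorm16() and floatToSnorm16_fast().
// Inputs are assumed to be in the [-1, 1] range.

// Round-to-nearest variant: one library rounding call per component.
inline void packSnorm16Rounded( const float in[ 4 ], std::int16_t out[ 4 ] )
{
	for ( int i = 0; i < 4; i++ )
	{
		out[ i ] = static_cast<std::int16_t>( std::lrint( in[ i ] * 32767.0f ) );
	}
}

// Truncating variant: a plain float-to-int cast, which rounds toward zero
// and needs no function call.
inline void packSnorm16Truncated( const float in[ 4 ], std::int16_t out[ 4 ] )
{
	for ( int i = 0; i < 4; i++ )
	{
		out[ i ] = static_cast<std::int16_t>( in[ i ] * 32767.0f );
	}
}
```

For an input such as `{ 0.1f, -0.1f, 1.0f, 0.0f }`, the two variants differ by at most one unit per component: the rounded variant yields 3277 and -3277 for the first two components, while the truncating one yields 3276 and -3276.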

Before, 97fps, with generated code:

		floatToSnorm16( q, resqtangent );
0x5ae46f35db61:	mulss        xmm0, dword ptr [rip + 0x2b6537]
0x5ae46f35db69:	movss        dword ptr [rsp + 8], xmm3
0x5ae46f35db6f:	movss        dword ptr [rsp + 4], xmm2
0x5ae46f35db75:	movss        dword ptr [rsp + 0xc], xmm1
0x5ae46f35db7b:	call         0x5ae46f1c3440 (???)
0x5ae46f35db80:	movss        xmm1, dword ptr [rsp + 0xc]
0x5ae46f35db86:	movss        xmm0, dword ptr [rip + 0x2b6512]
0x5ae46f35db8e:	mov          rbp, rax
0x5ae46f35db91:	mulss        xmm0, xmm1
0x5ae46f35db95:	call         0x5ae46f1c3440 (???)
0x5ae46f35db9a:	movss        xmm2, dword ptr [rsp + 4]
0x5ae46f35dba0:	movss        xmm0, dword ptr [rip + 0x2b64f8]
0x5ae46f35dba8:	mov          r12, rax
0x5ae46f35dbab:	mulss        xmm0, xmm2
0x5ae46f35dbaf:	call         0x5ae46f1c3440 (???)
0x5ae46f35dbb4:	movss        xmm0, dword ptr [rip + 0x2b64e4]
0x5ae46f35dbbc:	movss        xmm3, dword ptr [rsp + 8]
0x5ae46f35dbc2:	mov          r13, rax
0x5ae46f35dbc5:	mulss        xmm0, xmm3
0x5ae46f35dbc9:	call         0x5ae46f1c3440 (???)

After, 103fps, with generated code:

		floatToSnorm16_fast( q, resqtangent );
0x6132db310b42:	mulss        xmm0, dword ptr [rip + 0x2b6556]
	if ( fast )
0x6132db310b4a:	test         bpl, bpl
0x6132db310b4d:	je           0x6132db310ce0
0x6132db310b53:	mulss        xmm3, dword ptr [rip + 0x2b6545]
0x6132db310b5b:	cvttss2si    r14d, xmm0
0x6132db310b60:	mulss        xmm4, dword ptr [rip + 0x2b6538]
0x6132db310b68:	mulss        xmm5, dword ptr [rip + 0x2b6530]
0x6132db310b70:	cvttss2si    r13d, xmm3
0x6132db310b75:	cvttss2si    ebp, xmm4
0x6132db310b79:	cvttss2si    eax, xmm5

renderer: faster Tess_SurfacePolychain() with R_TBNtoQtangentsFast()

I don't know what Tess_SurfacePolychain() is used for, so I don't know if it's safe to use R_TBNtoQtangentsFast() there.
I noticed this is another function that is called at render time, so making it faster can reduce frametime as well.

renderer: faster Tess_SurfacePolychain() with VectorNormalizeFast()
renderer: faster Tess_SurfacePolychain() with VectorNormalizeFast() in R_CalcTangents()
renderer: faster other R_CalcTangents() with VectorNormalizeFast()

It is safe to use VectorNormalizeFast() anytime the return value of VectorNormalize() is not used (no difference in the final computation, just less branching).

I don't know what uses that other R_CalcTangents() variant, but we can blindly use VectorNormalizeFast() when there is no return, so let's do it as well.
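
As an illustration of why the swap does not change the result, here is a minimal sketch of the two kinds of normalization. `normalizeChecked()` and `normalizeUnchecked()` are hypothetical stand-ins written for this example, not the engine's VectorNormalize() and VectorNormalizeFast(), and the sketch assumes the input vector is not zero-length.

```cpp
#include <cmath>

// Hypothetical checked variant: guards against a zero-length vector and
// returns the original length to the caller.
inline float normalizeChecked( float v[ 3 ] )
{
	float lengthSq = v[ 0 ] * v[ 0 ] + v[ 1 ] * v[ 1 ] + v[ 2 ] * v[ 2 ];

	if ( lengthSq == 0.0f )
	{
		return 0.0f;
	}

	float length = std::sqrt( lengthSq );
	float inverse = 1.0f / length;
	v[ 0 ] *= inverse;
	v[ 1 ] *= inverse;
	v[ 2 ] *= inverse;
	return length;
}

// Hypothetical unchecked variant: no branch and no return value.
// A zero-length input would produce non-finite components here.
inline void normalizeUnchecked( float v[ 3 ] )
{
	float lengthSq = v[ 0 ] * v[ 0 ] + v[ 1 ] * v[ 1 ] + v[ 2 ] * v[ 2 ];
	float inverse = 1.0f / std::sqrt( lengthSq );
	v[ 0 ] *= inverse;
	v[ 1 ] *= inverse;
	v[ 2 ] *= inverse;
}
```

When the caller discards the returned length and the input is non-zero, both functions leave the same normalized vector behind; only the branch and the return value go away.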

illwieckz · 2025-09-29T18:37:17Z

I actually wonder whether we really need the lrintf() rounding done by floatToSnorm16() in R_TBNtoQtangents(); if we don't, we can even skip the test. But then, I have no idea what the real drawback of truncating instead of rounding is. I just assume that for rendering a model it can't be that bad.
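
One way to get a rough idea of the drawback is to measure the worst-case quantization error of each approach. The sketch below is only an estimate under stated assumptions: `worstCaseSnorm16Error()` and `QuantError` are hypothetical names, the 32767 scale factor is assumed, and the sweep is done in double precision rather than with the engine's code.

```cpp
#include <algorithm>
#include <cmath>

// Worst-case reconstruction error observed for each conversion.
struct QuantError
{
	double rounded;
	double truncated;
};

// Sweep [-1, 1] in `steps` intervals, quantize each sample to the snorm16
// grid with round-to-nearest and with truncation toward zero, and keep the
// largest absolute difference from the original value.
inline QuantError worstCaseSnorm16Error( int steps )
{
	const double scale = 32767.0; // assumed snorm16 scale factor
	QuantError error = { 0.0, 0.0 };

	for ( int i = 0; i <= steps; i++ )
	{
		double x = -1.0 + 2.0 * i / steps;
		double rounded = std::nearbyint( x * scale ) / scale;
		double truncated = std::trunc( x * scale ) / scale;
		error.rounded = std::max( error.rounded, std::fabs( x - rounded ) );
		error.truncated = std::max( error.truncated, std::fabs( x - truncated ) );
	}

	return error;
}
```

Rounding keeps the error within half a quantization step (0.5 / 32767, about 1.5e-5), while truncation can be off by almost a full step (just under 1 / 32767, about 3.1e-5) and always errs toward zero. Either way the error stays in the fifth decimal place of a unit quaternion component.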

slipher · 2025-09-30T06:16:18Z

LGTM

Screenshots with r_vboModels 0 generally came out pixel-for-pixel identical before/after these changes.

illwieckz · 2025-09-30T15:50:49Z

I looked at Tess_SurfacePolychain(): it is used to set up fog volumes, and it looks like the cgame trail code can also make use of it. So using faster truncating code here instead of precise rounding should be fine as well. It's not as if we were doing entity collisions or things like that.

illwieckz added 6 commits September 27, 2025 05:36

renderer: introduce R_TBNtoQtangentsFast() and floatToSnorm16_fast()

99df5ff

renderer: faster Tess_SurfaceIQM() and Tess_SurfaceMD5() with R_TBNtoQtangentsFast()

1fbadd1

renderer: faster Tess_SurfacePolychain() with R_TBNtoQtangentsFast()

9bada75

renderer: faster Tess_SurfacePolychain() with VectorNormalizeFast()

1e86409

renderer: faster Tess_SurfacePolychain() with VectorNormalizeFast() in R_CalcTangents()

522469f

renderer: faster other R_CalcTangents() with VectorNormalizeFast()

ad8c15c

illwieckz added A-Renderer A-Client labels Sep 29, 2025

illwieckz force-pushed the illwieckz/faster-tbntoqtangents branch from f023987 to ad8c15c Compare September 29, 2025 18:40

illwieckz merged commit 4ff4427 into master Sep 30, 2025
9 checks passed

illwieckz deleted the illwieckz/faster-tbntoqtangents branch September 30, 2025 15:51
