Fast R_QtangentsToTBN() variant to speed-up Tess_SurfaceIQM() and Tess_SurfaceMD5()
#1831
`R_TBNtoQtangentsFast()` and `floatToSnorm16_fast()`

`Tess_SurfaceIQM()` and `Tess_SurfaceMD5()` with `R_TBNtoQtangentsFast()`

I noticed that the `Tess_SurfaceIQM()` function (used in the CPU model code) was spending significant time in `R_QtangentsToTBN()`, which in turn was spending significant time in `floatToSnorm16()`. I then discovered that this function rounds every component of the vector to the closest integer, calling `lrintf()` on each component one by one. But this is model rendering code: we may not need that much precision, and a basic truncation is likely good enough. Also, this CPU code is a fallback for low-end devices that can't process the current model on the GPU, so the player is already using a low preset with low LOD and already gets worse results than that in other rendered things. Using a basic truncation means the compiler can vectorize the conversion, and it does.

I applied the same trick to `Tess_SurfaceMD5()` as well.

Other model code (like the MD3 code) using `R_QtangentsToTBN()` at load time still uses the properly rounded, `lrintf()`-based variant. The same goes for the IQM/MD5 loading code for the GPU code path.

Before, 97 fps, with generated code:

After, 103 fps, with generated code:

`Tess_SurfacePolychain()` with `R_TBNtoQtangentsFast()`

I don't know what `Tess_SurfacePolychain()` is used for, so I don't know if it's safe to use `R_TBNtoQtangentsFast()` there. I noticed this is another function that is called at render time, so making it faster can reduce frame time as well.

`Tess_SurfacePolychain()` with `VectorNormalizeFast()`

`Tess_SurfacePolychain()` with `VectorNormalizeFast()` in `R_CalcTangents()`

`R_CalcTangents()` with `VectorNormalizeFast()`

It is safe to use `VectorNormalizeFast()` whenever the return value of `VectorNormalize()` is not used (no difference in the final computation, just less branching).

I don't know what uses that other `R_CalcTangents()` variant, but we can blindly use `VectorNormalizeFast()` when the return value is not used, so let's do it there as well.