-
Notifications
You must be signed in to change notification settings - Fork 69
Description
This is a question I have in mind for some time, but it is raised again here:
But outside of that specific topic and that PR, this code raises some question.
Here is our current code:
inline float Q_rsqrt_fast( const float n )
{
#if defined(DAEMON_USE_ARCH_INTRINSICS_i686_sse)
float o;
// The SSE rsqrt relative error bound is 3.7×10⁻⁴.
_mm_store_ss( &o, _mm_rsqrt_ss( _mm_load_ss( &n ) ) );
#else
/* Magic constants by Jan Kadlec, with a relative error bound
of 6.50196699×10⁻⁴.
See: http://rrrola.wz.cz/inv_sqrt.html */
constexpr float a = 0.703952253f;
constexpr float b = 2.38924456f;
constexpr uint32_t magic = 0x5f1ffff9ul;
float o = Util::bit_cast<float>( magic - ( Util::bit_cast<uint32_t>( n ) >> 1 ) );
o *= a * ( b - n * o * o );
#endif
return o;
}
inline float Q_rsqrt( const float n )
{
/* When using the magic constants, the relative error bound after the
iteration is expected to be at most 5×10⁻⁶. It was achieved with the
less-good Quake 3 constants with a first iteration having originally
a relative error bound of 1.8×10⁻³.
Since the new magic constants provide a better relative error bound of
6.50196699×10⁻⁴, the relative error bound is now expected to be smaller.
When using the SSE rsqrt, the initial error bound is 3.7×10⁻⁴ so after
the iteration it is also expected to be smaller. */
constexpr float a = 0.5f;
constexpr float b = 3.0f;
float o = Q_rsqrt_fast( n );
// Do an iteration of Newton's method for finding the zero of: f(x) = 1÷x² - n
o *= a * ( b - n * o * o );
return o;
}
inline void VectorNormalizeFast( vec3_t v )
{
vec_t ilength = Q_rsqrt_fast( DotProduct( v, v ) );
VectorScale( v, ilength, v );
}
inline vec_t VectorNormalize( vec3_t v )
{
vec_t length = DotProduct( v, v );
if ( length != 0.0f )
{
vec_t ilength = Q_rsqrt( length );
/* sqrtf(length) = length * (1 / sqrtf(length)) */
length *= ilength;
VectorScale( v, ilength, v );
}
return length;
}In Quake3 VectorNormalizeFast() was:
- not testing for the length being zero,
- using single-iteration
Q_rsqrt(what we now do withQ_rsqrt_fast()).
while in Quake3 VectorNormalize() was:
- testing for length not being zero,
- using
1/sqrt().
Previously in Unvanquished VectorNormalizeFast() was:
- not testing for the length being zero,
- using dual-iteration
Q_rsqrt()
while previously in Unvanquished VectorNormalize() was:
- testing for length not being zero,
- using dual-iteration
Q_rsqrt()
Now in Unvanquished VectorNormalizeFast() is:
- not testing for the length being zero,
- using single-iteration
Q_rsqrt_fast(what quake3 did inQ_rsqrt()).
while now in Unvanquished VectorNormalize() is:
- testing for length not being zero,
- using dual-iteration
Q_rsqrt()
So, at first look, our current code does the same as what Quake3 did in VectorNormalizeFast().
But, I have a question, because in fact, our code only does the same as what Quake3 did in VectorNormalizeFast() ONLY if SSE is not used. When SSE is used, we use _mm_rsqrt_ss().
But then we have a problem, because the original Q_rsqrt() hack cannot return NaN but some other garbage (19391136601038913536), while _mm_rsqrt_ss() returns NaN.
So, my questions:
- Were Quake 3 developers correct to consider it was acceptable to get garbage when the length is zero?
- If they were right to consider it acceptable to get that garbage, is it still correct to get
NaNinstead of that garbage? - I guess that with
-ffast-maththe said NaN will not be processed asNaNbut as another garbage, is it acceptable too?