Skip to content

Some questions about VectorNormalizeFast() #1493

@illwieckz

Description

@illwieckz

This is a question I have in mind for some time, but it is raised again here:

But outside of that specific topic and that PR, this code raises some question.

Here is our current code:

inline float Q_rsqrt_fast( const float n )
{
#if defined(DAEMON_USE_ARCH_INTRINSICS_i686_sse)
	float o;
	// The SSE rsqrt relative error bound is 3.7×10⁻⁴.
	_mm_store_ss( &o, _mm_rsqrt_ss( _mm_load_ss( &n ) ) );
#else
	/* Magic constants by Jan Kadlec, with a relative error bound
	of 6.50196699×10⁻⁴.
	See: http://rrrola.wz.cz/inv_sqrt.html */
	constexpr float a = 0.703952253f;
	constexpr float b = 2.38924456f;
	constexpr uint32_t magic = 0x5f1ffff9ul;
	float o = Util::bit_cast<float>( magic - ( Util::bit_cast<uint32_t>( n ) >> 1 ) );
	o *= a * ( b - n * o * o );
#endif
	return o;
}

inline float Q_rsqrt( const float n )
{
	/* When using the magic constants, the relative error bound after the
	iteration is expected to be at most 5×10⁻⁶. It was achieved with the
	less-good Quake 3 constants with a first iteration having originally
	a relative error bound of 1.8×10⁻³.
	Since the new magic constants provide a better relative error bound of
	6.50196699×10⁻⁴, the relative error bound is now expected to be smaller.
	When using the SSE rsqrt, the initial error bound is 3.7×10⁻⁴ so after
	the iteration it is also expected to be smaller. */
	constexpr float a = 0.5f;
	constexpr float b = 3.0f;
	float o = Q_rsqrt_fast( n );
	// Do an iteration of Newton's method for finding the zero of: f(x) = 1÷x² - n
	o *= a * ( b - n * o * o );
	return o;
}

inline void VectorNormalizeFast( vec3_t v )
{
	vec_t ilength = Q_rsqrt_fast( DotProduct( v, v ) );

	VectorScale( v, ilength, v );
}

inline vec_t VectorNormalize( vec3_t v )
{
	vec_t length = DotProduct( v, v );

	if ( length != 0.0f )
	{
		vec_t ilength = Q_rsqrt( length );
		/* sqrtf(length) = length * (1 / sqrtf(length)) */
		length *= ilength;
		VectorScale( v, ilength, v );
	}

	return length;
}

In Quake3 VectorNormalizeFast() was:

  • not testing for the length being zero,
  • using single-iteration Q_rsqrt (what we now do with Q_rsqrt_fast()).

while in Quake3 VectorNormalize() was:

  • testing for length not being zero,
  • using 1/sqrt().

Previously in Unvanquished VectorNormalizeFast() was:

  • not testing for the length being zero,
  • using dual-iteration Q_rsqrt()

while previously in Unvanquished VectorNormalize() was:

  • testing for length not being zero,
  • using dual-iteration Q_rsqrt()

Now in Unvanquished VectorNormalizeFast() is:

  • not testing for the length being zero,
  • using single-iteration Q_rsqrt_fast (what quake3 did in Q_rsqrt()).

while now in Unvanquished VectorNormalize() is:

  • testing for length not being zero,
  • using dual-iteration Q_rsqrt()

So, at first look, our current code does the same as what Quake3 did in VectorNormalizeFast().

But, I have a question, because in fact, our code only does the same as what Quake3 did in VectorNormalizeFast() ONLY if SSE is not used. When SSE is used, we use _mm_rsqrt_ss().

But then we have a problem, because the original Q_rsqrt() hack cannot return NaN but some other garbage (19391136601038913536), while _mm_rsqrt_ss() returns NaN.

So, my questions:

  • Were Quake 3 developers correct to consider it was acceptable to get garbage when the length is zero?
  • If they were right to consider it acceptable to get that garbage, is it still correct to get NaN instead of that garbage?
  • I guess that with -ffast-math the said NaN will not be processed as NaN but as another garbage, is it acceptable too?

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions