Conversation
|
__builtin_bswapX usage seems justified, and if we ever need to port to a big-endian compiler that doesn't support them it's easy enough to do the ugly bitwise integer operations or add support for that hypothetical compiler's equivalent builtin.
|
|
Why do you sometimes introduce a temporary variable to swap, but other times you swap the value in-place? |
I think this is an N64_BUILD quirk specific to floats, but I didn't want to introduce the N64_BUILD in here. The problem is that doing a bswap on a float causes a floating point exception, the exception could be silenced but would need a libdragon change and would also silence any other float exceptions. |
This is a valid reason, maybe a comment to that effect would be a good idea to prevent overzealous optimization in future? |
| float f = 0; | ||
| sd_peek_buf(reader, (char *)&f, 4); | ||
| #ifdef BIG_ENDIAN_BUILD | ||
| f = __builtin_bswap16(f); | ||
| #endif |
There was a problem hiding this comment.
__builtin_bswap16 does not take an argument of type float, it takes a uint16_t.
The __builtin_bswap16(f) call is equivalent to __builtin_bswap16((uint16_t)f), which is why the N64 build was encountering floating point errors.
Also: bswap16 is the wrong number of bits, should be bswap32.
(same as other review comment: please implement sd_peek_float in terms of sd_peek_dword)
There was a problem hiding this comment.
Well, the exceptions were definitely from doing float f = __buildin_bswap32((float)x);
The only reason this function is wrong is because nothing in the engine calls it, and I just didn't hit the issue.
| #ifdef BIG_ENDIAN_BUILD | ||
| int len = maxlen; | ||
| for(int i = 0; i < len / 4; i += 1) { | ||
| ((int *)buffer)[i] = __builtin_bswap32(((int *)buffer)[i]); |
There was a problem hiding this comment.
char *buffer is insufficiently aligned to be a int*
same comment applies in sd_read_str.
please memcpy to a uint32_t local variable (and uint16_t local var for the final-2-bytes bytes-swap)
| float memread_float(memreader *reader) { | ||
| float r; | ||
| #ifdef BIG_ENDIAN_BUILD | ||
| uint32_t fl; | ||
| memcpy(&fl, reader->buf + reader->pos, sizeof(fl)); | ||
| fl = __builtin_bswap32(fl); | ||
| reader->pos += sizeof(fl); | ||
| #else | ||
| memcpy(&r, reader->buf + reader->pos, sizeof(r)); | ||
| reader->pos += sizeof(r); | ||
| #endif | ||
| return r; | ||
| } |
There was a problem hiding this comment.
I think memread_float (and similar: memwrite_float, sd_read_float, etc..)
should be implemented in terms of their respective integer read-write functions.
for memread_float, this would look like:
uint32_t u = memread_dword(reader);
float f;
memcpy(&f, &u, 4);
return f;this way, they don't need to perform any byte swapping and can be focused on just the uint-to-float bit cast.
There was a problem hiding this comment.
memread_dword performs a byteswap.
memread_float is currently only used to read some tournament mode pilot variables we ignore, unsure what you mean by tested and working
| uint16_t r; | ||
| #ifdef BIG_ENDIAN_BUILD | ||
| r = __builtin_bswap16(*((uint16_t *)(reader->buf + reader->pos))); | ||
| #else | ||
| memcpy(&r, reader->buf + reader->pos, sizeof(r)); | ||
| #endif | ||
| reader->pos += sizeof(r); | ||
| return r; |
There was a problem hiding this comment.
buf+pos is not insufficiently aligned to be a (uint16_t *)
please use the memcpy from the existing code, and keep your #ifdef block as small as possible (your sd_peek_dword is a good example of how it should look).
There was a problem hiding this comment.
I've thought about this, and it hasn't been an issue with the current game assets. I don't want to slow it down more than needed.
There was a problem hiding this comment.
on a platform with "free" unaligned reads (x86, N64?), memcpy'ing the four bytes should compile to equivalent assembly as the hazardous uint16 reads (which are innocuous on x86(/n64?) without ubsan).
Dereferencing an unaligned pointer is undefined behavior, and traps on more-or-less common architectures (the ARM family, which even supported big endian back in armv5).
Have you observed a pessimization when using memcpy in these functions, or a big difference in assembly? I would expect GCC to be very good at optimizing a four byte memcpy.
|
The byteswapping in sd_read_str is pure magic to me, as I'm not familiar with big endian systems. |
The big endian support assumes the GCC compiler and the presence of __builtin_bswapX.