I expected the quad-precision literal NUM_rq to perform the same as real128(NUM.q), but it is much slower. Running perf on some sample code shows that the conversion from string to __float128 does not happen at compile time, as seems to be the case with .q, but at runtime:

When I use real128(NUM.q), there is no function call to strtofloat:

Changing from _rq to .q made my code about 1.5 times faster.
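
To illustrate the kind of difference I suspect, here is a minimal, self-contained sketch. It is not the library's code: it uses double and std::strtod as stand-ins for __float128 and the string conversion, and the names _runtime, _compiled, sum_runtime and sum_compiled are made up for this example. A raw literal operator receives the digits as a C string and typically parses them every time the expression is evaluated, whereas a constexpr operator taking long double gets a value the compiler has already parsed.

```cpp
#include <cstdlib>

namespace sketch {

// Hypothetical stand-in for a runtime-parsed literal (like _rq seems to be):
// the raw literal operator is handed the characters "1.25" and converts
// them with std::strtod on every evaluation.
inline double operator""_runtime(const char* digits) {
    return std::strtod(digits, nullptr);
}

// Hypothetical stand-in for a compiler-parsed literal (like the .q suffix):
// the compiler parses the number itself and passes the value in.
constexpr double operator""_compiled(long double value) {
    return static_cast<double>(value);
}

// Each iteration evaluates the literal expression again, so the runtime
// variant typically pays for one string conversion per iteration.
inline double sum_runtime(int n) {
    double acc = 0.0;
    for (int i = 0; i < n; ++i) {
        acc += 1.25_runtime;
    }
    return acc;
}

// Same loop, but the literal is a constant known at compile time.
inline double sum_compiled(int n) {
    double acc = 0.0;
    for (int i = 0; i < n; ++i) {
        acc += 1.25_compiled;
    }
    return acc;
}

} // namespace sketch
```

Both functions return the same value. The only difference is where the parsing happens, which would match the extra conversion call I see in the profile for _rq.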