|
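Good idea Howard!
I don't know how C++ compilers work these days, but perhaps a block like this:
*output++ = float(*input++);
*output++ = float(*input++);
*output++ = float(*input++);
*output++ = float(*input++);
could be replaced by something like this:
const int16_t *in = input + 4 * m;
float *out = output + 4 * m;
out[0] = float(in[0]);
out[1] = float(in[1]);
out[2] = float(in[2]);
out[3] = float(in[3]);
I don't know whether, from the compiler's point of view, the second option could use SIMD instructions, since there are no output++ and input++ around, but perhaps the compiler figures it out since it is idiomatic.
There's also the volk library (https://www.libvolk.org/) that has a couple of functions called 'volk_16i_s32f_convert_32f' (https://www.libvolk.org/doxygen/volk_16i_s32f_convert_32f.html) and 'volk_16ic_convert_32fc' (https://www.libvolk.org/doxygen/volk_16ic_convert_32fc.html), which are optimized for different types of hardware, but I am not sure if its license (GPL v3) is compatible with this project's license.
Franco |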
|
I got the idea from CMSIS-DSP. Ideally, hand-written SIMD instructions would do a
better job, but they are hard to adapt to different SIMD instruction sets. I will
run more experiments to see how much the two approaches differ.
…On Sat, Jul 6, 2024 at 10:54 PM Franco Venturi ***@***.***> wrote:
Good idea Howard!
I don't know how C++ compilers work these days, but perhaps a block like
this:
*output++ = float(*input++);
*output++ = float(*input++);
*output++ = float(*input++);
*output++ = float(*input++);
could be replaced by something like this:
const int16_t *in = input + 4 * m;
float *out = output + 4 * m;
out[0] = float(in[0]);
out[1] = float(in[1]);
out[2] = float(in[2]);
out[3] = float(in[3]);
I don't know whether, from the compiler's point of view, the second option
could use SIMD instructions, since there are no output++ and input++ around,
but perhaps the compiler figures it out since it is idiomatic.
There's also the volk library (https://www.libvolk.org/) that has a
couple of functions called 'volk_16i_s32f_convert_32f' (
https://www.libvolk.org/doxygen/volk_16i_s32f_convert_32f.html) and
'volk_16ic_convert_32fc' (
https://www.libvolk.org/doxygen/volk_16ic_convert_32fc.html), which are
optimized for different types of hardware, but I am not sure if its license
(GPL v3) is compatible with this project's license.
Franco
--
-Howard
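The two forms compared in Franco's message can be sketched side by side. This is a minimal sketch, not code from the project: the function names convert_i16_to_float and convert_i16_to_float_unrolled are hypothetical, and whether a given compiler emits SIMD instructions for either form depends on the compiler, the target, and the optimization flags, so the generated assembly would have to be inspected to know.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not from the project): plain indexed loop with no
// pointer increments. This is the shape auto-vectorizers are usually shown.
void convert_i16_to_float(const std::int16_t* input, float* output, std::size_t count)
{
    assert(count == 0 || (input != nullptr && output != nullptr));
    for (std::size_t i = 0; i < count; ++i) {
        output[i] = static_cast<float>(input[i]);
    }
}

// Hypothetical helper: the hand-unrolled-by-four form from the message,
// with a scalar tail loop for counts that are not a multiple of four.
void convert_i16_to_float_unrolled(const std::int16_t* input, float* output, std::size_t count)
{
    assert(count == 0 || (input != nullptr && output != nullptr));
    const std::size_t blocks = count / 4;
    for (std::size_t m = 0; m < blocks; ++m) {
        const std::int16_t* in = input + 4 * m;
        float* out = output + 4 * m;
        out[0] = static_cast<float>(in[0]);
        out[1] = static_cast<float>(in[1]);
        out[2] = static_cast<float>(in[2]);
        out[3] = static_cast<float>(in[3]);
    }
    // Remaining 0 to 3 samples.
    for (std::size_t i = 4 * blocks; i < count; ++i) {
        output[i] = static_cast<float>(input[i]);
    }
}
```

Both functions produce identical output, so they can be swapped freely when comparing generated code or CPU usage.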
|
|
What do you think about a fast int16_t to float conversion like this: https://github.com/m-ou-se/floatconv ? |
|
This is an interesting blog post. I will look into it and port i16_to_float over.
…On Sun, Jul 7, 2024 at 5:22 AM Ruslan Migirov ***@***.***> wrote:
https://blog.m-ou.se/floats/
--
-Howard
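For readers unfamiliar with software integer-to-float conversion, one widely known bit trick is sketched below. It is an illustration only: it is not confirmed to be what floatconv's i16_to_float does, the names i16_to_float_bits and check_all_i16_values are hypothetical, and it assumes IEEE 754 binary32 floats with default rounding.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

static_assert(std::numeric_limits<float>::is_iec559 && sizeof(float) == 4,
              "this sketch assumes IEEE 754 binary32 floats");

// Hypothetical helper, not the floatconv API.
// 0x4B400000 is the bit pattern of 12582912.0f (1.5 * 2^23). At that magnitude
// one step of the mantissa equals 1.0, so adding x to the bit pattern yields
// the float 12582912 + x for any x in [-2^22, 2^22), which covers int16_t.
inline float i16_to_float_bits(std::int16_t x)
{
    const std::uint32_t magic_bits = 0x4B400000u;
    const float magic_value = 12582912.0f;
    // Sign-extend to 32 bits, then add with unsigned wrap-around.
    const std::uint32_t bits =
        magic_bits + static_cast<std::uint32_t>(static_cast<std::int32_t>(x));
    float f;
    std::memcpy(&f, &bits, sizeof f);  // reinterpret the bits as a float
    return f - magic_value;            // exact: the result is representable
}

// Exhaustive self-check against the ordinary cast for all 65536 inputs.
inline void check_all_i16_values()
{
    for (int v = -32768; v <= 32767; ++v) {
        assert(i16_to_float_bits(static_cast<std::int16_t>(v)) == static_cast<float>(v));
    }
}
```

On targets with a hardware integer-to-float instruction the ordinary cast may well be as fast or faster, so a trick like this is only worth keeping if measurements show a gain.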
|
|
I tried this optimization and found that only the convert_float one showed an improvement: under one percent better CPU usage on the r2iq thread... |