flat arrays, flat strings, flat variables #1143
Replies: 21 comments
-
Posted at 2018-02-04 by @gfwilliams Nice - are you trying to set up DMA using JavaScript to do SPI sends? There's a tool at https://github.com/espruino/Espruino/blob/master/scripts/build_js_hardware.js that'll generate JS for the registers in bits of hardware so you can access it more easily. Yes, below 23 bytes it's more efficient to not use flat strings to back the arrays, so Espruino does that. Also for Uint8Array it'll try once to make a flat string, but if it doesn't succeed it'll just back it with a sparse one. Have you seen http://www.espruino.com/Reference#l_E_getAddressOf ? You can at least run that with the second argument as true, and it'll return 0 if there isn't a flat array. In the latest builds (post 1v95) of Espruino you'll find
Hope that helps!
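Here is a sketch of the E.getAddressOf check described above. When the second argument is true, it returns the address of the data inside a flat string or flat ArrayBuffer, and 0 otherwise. A DMA driver can therefore reject non-flat buffers up front. The helper names isFlat and flatAddressOf are hypothetical.

```javascript
// Espruino only: E.getAddressOf(v, true) returns the address of the data if v
// is backed by flat storage, otherwise 0.
function isFlat(v) {
  return E.getAddressOf(v, true) !== 0;
}

// Hypothetical helper: return the data address for DMA, or throw if not flat.
function flatAddressOf(v) {
  var addr = E.getAddressOf(v, true);
  if (!addr) throw new Error("Buffer is not flat - cannot use it as a DMA source");
  return addr;
}

// Usage on the device:
// var buf = new Uint8Array(64);
// if (isFlat(buf)) console.log("DMA source at 0x" + flatAddressOf(buf).toString(16));
```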
-
Posted at 2018-02-07 by mrQ Many thanks - nice hack to get a flat Uint8Array ;) I packed it into some handy functions extending your E instance:
At the moment, my DMA extension for SPI supports TX only, but it gives a really nice performance improvement when running at 12.5 MBaud.
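The helper functions from this post are not preserved in the thread. The sketch below shows one way such a helper could work. The name flatUint8Array is hypothetical, and this is not necessarily the hack referred to above. It tries a normal Uint8Array first. If that array is not flat, it copies the array into a flat string with E.toString() and wraps the string with E.toArrayBuffer().

The sketch makes two assumptions, both taken from the Espruino reference. First, E.toString() returns a flat string, or undefined if it cannot allocate one. Second, E.toArrayBuffer() references the string rather than copying it. Check both on your firmware version.

```javascript
// Hypothetical helper: allocate an n-byte Uint8Array backed by flat memory,
// or throw if that is not possible.
function flatUint8Array(n) {
  // Larger typed arrays usually get flat storage anyway
  var a = new Uint8Array(n);
  if (E.getAddressOf(a, true)) return a;
  // Fallback (assumption: E.toString returns a flat string or undefined)
  var s = E.toString(a); // n zero bytes; briefly needs memory for both copies
  if (s === undefined) throw new Error("Could not allocate " + n + " flat bytes");
  // Assumption: E.toArrayBuffer wraps the string by reference, not by copy
  var b = new Uint8Array(E.toArrayBuffer(s));
  if (!E.getAddressOf(b, true)) throw new Error("Result is not flat");
  return b;
}
```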
-
Posted at 2018-02-08 by @gfwilliams That's awesome - and your DMA extension is all written in JS? Have you managed to make a new ILI9341 module that uses it?
-
Posted at 2018-02-08 by mrQ Yes, I had it all written in JS, but then switched to asm because of the high DMA setup cost in JS. In JS the setup took approx. 7 ms, which at an effective 5 MBaud corresponds to 4375 bytes. In other words, transmissions of less than ~4.4 kbyte would be LESS efficient using DMA than the native SPI implementation. With asm I could reduce that time by approx. 80%, so DMA makes sense for any packet >500 bytes. Regarding the ILI9341 module: with the standard module only fillRect benefits from DMA; for ILI9341pal things are a bit better. But in the end I decided to replace both with my own ILI9341 driver, adding some nice features such as smoothed fonts (incl. the font generator needed to build them from any Google Font). If you like, I can provide it to you in the next few days.
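The 4375-byte figure is simply the number of bytes SPI could send during the DMA setup time. Below is a minimal sketch of that arithmetic. The name breakEvenBytes is hypothetical. The model ignores every cost except setup time, so treat it as an estimate rather than a replacement for the measurements.

```javascript
// Bytes the SPI could send during the DMA setup time. Below this size, a plain
// blocking SPI write is done before the DMA transfer would even have started.
function breakEvenBytes(setupMs, baud) {
  // baud bits/s -> baud / 8 bytes/s -> baud / 8000 bytes/ms
  return setupMs * baud / 8000;
}

console.log(breakEvenBytes(7, 5e6)); // 4375 (JS setup time, ~5 MBaud effective)
```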
-
Posted at 2018-02-08 by @gfwilliams Wow, using http://www.espruino.com/Assembler - or actually using compiled-in C code? http://www.espruino.com/Compilation might also be an option, since it's got some shortcuts in there to make peek and poke really quick :) I'd be really interested in seeing what you've done - I can't promise much about pulling it in, but smooth fonts would be really neat.
-
Posted at 2018-02-08 by mrQ I am using E.asm(...). It's a great tool to get things done (even if some Thumb instructions are missing ;). Compilation is fine, but does not give the boost that E.asm does, even when just using peek and poke. I implemented the same operation in 3 different ways, and called each 1000 times:
I took a look at the compiler output; the differences in code between rclr2 and rclr3 speak for themselves:
PS: it seems the unary '~' operator has been forgotten in the compiler.
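The rclr1..rclr3 functions and the compiler output are not preserved here. As a hypothetical illustration of this kind of register operation, you can clear bits with Espruino's peek32/poke32 using '~'. A possible workaround while the compiler lacks unary '~' is to XOR the mask with 0xFFFFFFFF, which gives the same 32-bit result in JavaScript. It has not been tested here whether the compiler handles '^' any better.

```javascript
// Hypothetical helper: clear the bits in `mask` in the 32-bit register at `addr`.
// peek32/poke32 are Espruino built-ins for 32-bit memory reads/writes.
function regClear(addr, mask) {
  poke32(addr, peek32(addr) & ~mask);
}

// Same result without unary '~': in JS bitwise ops 0xFFFFFFFF becomes -1,
// so (mask ^ 0xFFFFFFFF) === ~mask.
function regClearNoTilde(addr, mask) {
  poke32(addr, peek32(addr) & (mask ^ 0xFFFFFFFF));
}
```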
-
Posted at 2018-02-08 by mrQ I published the current status of the SPI DMA driver at https://github.com/andiy/espruino.git - it's important to keep in mind that DMA is only advantageous when sending a minimum amount of data. Below are some benchmarks to give an idea of when DMA may be worthwhile. Times [ms] for sending a data buffer of length N:
Conclusion:
When sending a small buffer of 1, 2 or 4 bytes multiple times, the results are as below.
Conclusion:
Attachments:
-
Posted at 2018-02-09 by @gfwilliams Wow, thanks for this - that's an awesome bit of work! As far as I know you're the first person that's used
Also, I just added
Your initial code:
Should be much better now. However, it's not perfect because the argument still comes in as a
This one's very slightly better, but again not great.
Honestly, if you're happy writing Assembler then that's definitely best :)
-
Posted at 2018-02-09 by mrQ Wow² ;) You are really speedy! You are right, I was missing the half-word operations. E.asm seems to be available in the editor window only; when used in modules, it does not work. Is there an easy way to have E.asm for modules, too? My SPI DMA driver has a very strange open issue (marked as i#2). It applies only to the writeInterlaced(buf, N) call, i.e. when repeating the buffer. In this case, the display shows just random data for the last chunk sent. When adding a dummy chunk (just 1 pixel) at the end, it works fine (but at the price that the function has to wait until everything is sent). The behaviour occurs independent of SPIx, byte count and baud rate. The only thing I saw was that writing to a certain JSVar while the DMA is running in the background seems to change the DMA data, but when I checked, the DMA definitely did not point to that JSVar... very strange... In fact, I could not figure out the real reason. Do you use DMA in the firmware? Do you have any experience with DMA in FIFO mode + repeating source data?
-
Posted at 2018-02-09 by @gfwilliams Just a thought: you could write some assembler code that does basically what the ILI9341pal driver does, but with DMA:
Obviously you've got your current solution with the nice fonts so it's not a big deal, but that could end up being really interesting.
-
Posted at 2018-02-09 by @gfwilliams
I just tested, and you can turn on
I'm not sure about your DMA issues... All I've used it for is the TV output capability, and I'm pretty sure it's not used anywhere else. Not sure what to suggest really - variables don't get moved around in memory during GC so I don't think that could be an issue either.
-
Posted at 2018-02-09 by mrQ I implemented something similar, but with larger chunks (1..10 kBytes); sending just 16 bits with DMA is very slow. It might be even better to write directly to SPI TX? Maybe the DMA double-buffer mode (DBM) helps; I have not tried it yet, and I am not sure whether 16 bits are enough to get rid of the inter-byte gap (caused by the CPU load of polling for DMA ready). I think I will give it a try. I have identified another performance brake: E.mapInPlace. I think its use of JSVars slows down the lookup; using asm-coded specialised functions for 1/2/4 bpp is about 20x faster.
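For reference, this is the lookup that E.mapInPlace and the specialised asm functions perform: each 1, 2 or 4 bpp pixel index is replaced by a 16-bit colour from a palette. The plain-JS loop below only documents the semantics and is not a faster alternative. The name unpalette is hypothetical. MSB-first packing is assumed, so the shift needs flipping for LSB-first buffers.

```javascript
// Hypothetical reference implementation: expand packed 1/2/4 bpp pixel indices
// (MSB-first) in `src` into 16-bit colours in `dst` via `palette`.
function unpalette(src, dst, palette, bpp) {
  var perByte = 8 / bpp;     // pixels per source byte
  var mask = (1 << bpp) - 1; // e.g. 0b1 for 1bpp, 0b1111 for 4bpp
  var n = Math.min(dst.length, src.length * perByte);
  for (var i = 0; i < n; i++) {
    var b = src[(i / perByte) | 0];
    var shift = 8 - bpp * (1 + (i % perByte)); // first pixel sits in the top bits
    dst[i] = palette[(b >> shift) & mask];
  }
  return dst;
}
```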
-
Posted at 2018-02-09 by mrQ Ufff - "compiled" is damn fast now. Same benchmark as before, but now compiled is even 4 ms (~2%) faster than my asm function! Seems you implemented some quantum technology ;) FYI: the
-
Posted at 2018-02-09 by mrQ
Great, but with this option ON, "compiled" produces an error (see attachment). But don't worry, I can live without it (with a bit less comfort), and the new lightning-fast peek/poke compilation already helps a lot. Attachments:
-
Posted at 2018-02-09 by @gfwilliams
Great!
That's interesting... Are you sure it's not just that there was an issue connecting to the board that time? If you disconnect and reconnect via the IDE button then it may work.
-
Posted at 2018-02-09 by mrQ
Nope, even restarting the Web IDE does not help, but disconnecting and reconnecting the Espruino solves it!
-
Posted at 2018-02-09 by mrQ
There seems to be a little bug; this does not work:
This is fine:
This is fine, too:
Another (similar?) flaw I have seen:
generates this output: 40013004 [object Object]; expected output: 40013004 2. To achieve the expected behaviour, I have to either
-
Posted at 2018-02-09 by ClearMemory041063 Which hardware does this work with? Don't the register addresses depend on the hardware platform? It would be cool to use this with an audio codec to record or play waveforms.
-
Posted at 2018-02-10 by mrQ It's tested on the Espruino WiFi, which is based on the STM32F411. Adapting it to another board should be quite easy, as long as it has an ARM processor. I think changing the xxx_BASE constants should be all that needs to be done.
-
Posted at 2018-02-10 by mrQ
As already mentioned, a full DMA setup per 16 bits is much too slow (because of the procedure needed to set up, start and then stop DMA/SPI), so I tried the DMA double-buffer feature. Basically this works nicely, but it has some drawbacks:
The best way of pushing out paletted image data seems to me to be:
Some simple measurements of optimized asm 1/2/4 bpp to 16-bit lookup functions show that unpaletting is typically 3..10 times faster than the net SPI transmission time. E.g. for 10k pixels @1bpp we give >11 ms (12.80 - 1.44) of CPU time back to JS compared to any blocking method.
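The 12.80 ms figure is consistent with the raw SPI time for 10k pixels sent as 16-bit colours at the 12.5 MBaud mentioned earlier in the thread (20,000 bytes × 8 bits / 12.5 Mbit/s). Below is a small sketch of that check. spiTimeMs is a hypothetical helper.

```javascript
// Raw SPI transmission time in ms for `bytes` bytes at `baud` bits per second.
function spiTimeMs(bytes, baud) {
  return bytes * 8 * 1000 / baud;
}

var txMs = spiTimeMs(10000 * 2, 12.5e6); // 10k pixels x 2 bytes each
var unpalMs = 1.44;                       // measured unpaletting time quoted above
console.log(txMs, txMs - unpalMs);        // CPU time handed back vs. blocking send
```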
-
Posted at 2018-02-12 by @gfwilliams It sounds really good - yes, you'd need a biggish buffer to allow things like IRQs to get handled in the background if they need to be. With your compiler issues:
should have given you the error:
That's because defining JS arrays from compiled code unfortunately isn't supported yet. Even if it had worked, your code would have ended up being slow, because it would have defined a JavaScript array type. As for your other problem, that was definitely a compiler issue where it wasn't handling variables correctly when used as function args - that should now be fixed :)
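Here is one way to act on this, sketched under assumptions. Create any lookup table once in normal JS as a typed array, then pass it into the "compiled" function instead of defining an array inside it. The names LUT and expand1bpp are hypothetical. It has not been verified here whether the compiler generates fast code for typed-array indexing written this way. On engines other than Espruino the "compiled" directive is simply ignored, so the logic is easy to test.

```javascript
// Hypothetical 1bpp -> 16-bit palette, created once in normal (interpreted) JS
var LUT = new Uint16Array([0x0000, 0xFFFF]);

// Hypothetical example: the table and pixel count are passed in as arguments,
// so the "compiled" function never has to define a JS array itself.
function expand1bpp(src, dst, lut, n) {
  "compiled";
  for (var i = 0; i < n; i++) {
    // MSB-first: pixel i is bit (7 - i % 8) of byte i / 8
    dst[i] = lut[(src[i >> 3] >> (7 - (i & 7))) & 1];
  }
}

var out = new Uint16Array(16);
expand1bpp(new Uint8Array([0x81, 0x0F]), out, LUT, 16);
```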
-
Posted at 2018-02-03 by mrQ
Flat arrays are mandatory when working with DMA.
But what is the recommended way to reliably create a flat array?
This does not work for n<23, and does not work reliably for n>=23.
This always generates a flat variable, but it's a string and not an ArrayBuffer as needed.
Furthermore, I cannot create an empty buffer just by specifying the length.
Good to know that these two behave like new UintXArray (and not like E.toString).
In fact, this does not work either:
My workaround for the moment: always create a UintXArray of >=23 bytes.
But this is neither very elegant nor guaranteed to work with future firmware versions.
Are there any recommendations on how to create a flat UintXArray, independent of its length?
Thanks!