PIO Interrupts causing Pico to lock up - apparently two are 'colliding' in time #11529
-
I have a project concerned with aligning two devices in time: they communicate via 'LTC', and one syncs with the other. In the interrupt routine I am using

All of the processing of the LTC signal(s) is done in a collection of PIO blocks; it is these which issue

There are two threads: one for coordinating the PIO blocks (filling/emptying FIFOs), and another for printing information to the screen.

Once synchronized, the 'slave' device crashes after a short period of time (say, within 5 minutes). I think this is the device with the (slightly) slower clock/XTAL, though I am not sure of that...

The interrupt routine is:
I recently added the 'enable/disable' functions after reading the following, but that did not fix the issue. [Edit: i.e. the bug existed before they were added]

A single device can measure its own output (literally a loopback, i.e. a cable between pins 13 and 18); however, this does NOT trigger the issue. I think this is because the in/out clocks are EXACTLY the same rate, whereas with two devices they are slightly different (especially considering any start-up variance or fractional clock action).

The project as a whole is complex, but the 'core' file can be run on its own... with just a text output to Thonny.

Instructions to recreate issue:

If anyone has any suggestions on how to fix this, I would happily implement them. :-)
Replies: 3 comments 8 replies
-
Communicating between threads running on two cores has many pitfalls, and you haven't explained how this is being done. I suggest you look at this doc. Don't be put off by the fact that the doc assumes that
-
Can you change the code so that you use different interrupts for the 'master' and 'slave' devices? As an alternative: Interrupts are not very fast and also introduce 'jitter' latency: Still, the interrupt issue seems worth investigating. I have the feeling that interrupts might be disabled twice simultaneously (with a counter) and then re-enabled to the wrong state of the enable/disable counter. Something like that might happen if both ISRs overlap.
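To make the suspected interleaving concrete, here is a toy model in plain Python. It assumes a single shared interrupt-enable flag with save/restore semantics similar to machine.disable_irq() and machine.enable_irq(state). The class and function names are hypothetical, and this is only an illustration of the kind of ordering problem being guessed at: on the RP2040 each core has its own interrupt mask, so the real behaviour may well differ.

```python
class ToyIrqMask:
    """Toy model: ONE shared interrupt-enable flag (an assumption, not real hardware)."""

    def __init__(self):
        self.enabled = True

    def disable_irq(self):
        # Return the previous state and switch interrupts off,
        # mimicking the save/restore style of machine.disable_irq().
        previous = self.enabled
        self.enabled = False
        return previous

    def enable_irq(self, state):
        # Restore whatever state the caller saved earlier.
        self.enabled = state


def properly_nested(mask):
    """Handler B runs entirely inside handler A's critical section."""
    state_a = mask.disable_irq()   # A saves True
    state_b = mask.disable_irq()   # B saves False
    mask.enable_irq(state_b)       # B restores False (A is still protected)
    mask.enable_irq(state_a)       # A restores True
    return mask.enabled


def overlapped(mask):
    """Handlers A and B overlap, and A finishes before B."""
    state_a = mask.disable_irq()   # A saves True
    state_b = mask.disable_irq()   # B saves False
    mask.enable_irq(state_a)       # A restores True while B is still mid-section
    mask.enable_irq(state_b)       # B restores False: interrupts stay off
    return mask.enabled


print("nested     -> interrupts enabled:", properly_nested(ToyIrqMask()))
print("overlapped -> interrupts enabled:", overlapped(ToyIrqMask()))
```

In this model the overlapped ordering leaves interrupts switched off for good, which would look like a lock-up. Whether anything comparable can happen in the real firmware is exactly what would need checking.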
-
For clarity: when describing the single-device test, I wrote that the in/out clocks are EXACTLY the same... Whilst this is true (as best as I can get it with clock granularity) for the LTC streams, the TX and RX interrupts are separated by 1 bit clock (416 us) due to the processing delay in decoding the Differential Manchester Coding. This is likely why the 'one device' test does NOT fail.

So I am theorizing that the slightly different clock/XTAL speeds of the two devices cause them to drift enough that the problem DOES occur. Indeed, when I swapped devices A and B over, the code ran for 5 hours (overnight) before the bug triggered. I had previously noted that it took ~5 hours for the devices to drift apart by 1 frame, meaning that the interrupts would again overlap.

As I mentioned in a reply, removing the setting of a variable in the interrupt did not solve the bug. I have a feeling that the bug may be in code 'above' mine... maybe in MicroPython.

I am new to the Pico, and have not (yet) used the 3-pin debug header. I have started Googling this, but can anyone point me to a good guide on "how to find out where your code is crashing"...?

For reference, I am using the "rp2-pico-20230426-v1.20.0.uf2" MicroPython build.
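As a rough sanity check on the drift figures quoted above, the arithmetic can be sketched as follows. The 30 fps frame rate and 80 bits per LTC frame are assumptions inferred from the 416 us bit clock, not values taken from the project code.

```python
# Assumed LTC parameters (inferred from the 416 us bit clock, not from the project).
BITS_PER_FRAME = 80
FRAME_RATE_HZ = 30

# One bit period in microseconds: 1 / (80 bits * 30 frames per second).
bit_period_us = 1e6 / (BITS_PER_FRAME * FRAME_RATE_HZ)

# One frame period in seconds.
frame_period_s = 1 / FRAME_RATE_HZ

# From the post: roughly 5 hours for the two devices to slip by one frame.
drift_time_s = 5 * 3600

# Relative clock mismatch, in parts per million, that would produce that slip.
clock_mismatch_ppm = frame_period_s / drift_time_s * 1e6

print(round(bit_period_us, 2), "us per bit")
print(round(clock_mismatch_ppm, 2), "ppm clock mismatch")
```

Under those assumptions the two clocks differ by only about 1.85 ppm, so the interrupts of the two devices would slide past each other very slowly. That fits a failure that shows up only occasionally, hours apart.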
@mungewell please have a look at discussion/10638.

My hypothesis, more elaborated, is that the irq_handler(m) is running on both cores in parallel. You may test this by evaluating mem32[0xd0000000], the CPU core id register. Store its value in a preallocated array while in the irq_handler(m) ...

If this hypothesis is true then we have a case against which no precaution has been taken yet. It could be that parallel/overlapping machine.disable_irq()'s and machine.enable_irq(disable)'s are the culprit. You might try to prevent such a race condition by another use of acquire()/release() locking within the ISR, although I have no idea if this is allowable :-) - interesting new territory, it…
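A minimal sketch of the core-id test suggested above, assuming the handler calls a small helper on entry. The names record_core, cores_seen, core_log and LOG_SIZE are hypothetical. The ImportError fallback is only a host-side stand-in so the logic can be dry-run off-device; on the Pico the real machine.mem32 is used.

```python
try:
    from machine import mem32          # real register access on the Pico
except ImportError:
    mem32 = {0xD0000000: 0}            # host-side stand-in (assumption): pretend core 0

SIO_CPUID = 0xD0000000                 # CPU core id register named in the reply

LOG_SIZE = 64                          # hypothetical log length
core_log = bytearray(LOG_SIZE)         # preallocated, so nothing is allocated in the ISR
log_pos = 0


def record_core():
    """Call at the top of irq_handler(m): logs which core is running it."""
    global log_pos
    # Store core id + 1 so that 0 still means "slot never written".
    core_log[log_pos] = (mem32[SIO_CPUID] & 0xFF) + 1
    log_pos = (log_pos + 1) % LOG_SIZE


def cores_seen():
    """Call from the main thread, not from the ISR (it allocates a set)."""
    return {entry - 1 for entry in core_log if entry}


# Host dry run: two simulated interrupts, both on the stand-in core 0.
record_core()
record_core()
print("handler ran on cores:", cores_seen())
```

If cores_seen() ever reports both 0 and 1 on the device, the handler has run on both cores. Note that the log index update is itself unprotected, so entries may be lost or overwritten if two cores really do enter at the same moment; the log is a diagnostic hint rather than an exact trace.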