-
Notifications
You must be signed in to change notification settings - Fork 1.4k
fixes and missing logic in hcd_rp2040.c and rp2040_usb.c #3533
Description
Operating System
Linux
Commit SHA
Board
rp2350
Firmware
RP6502-RIA? I don't understand what goes here.
What happened ?
Data Seq Error, lockups, and other panics.
How to reproduce ?
Do anything interesting with an RP2350 in host mode with a hub.
Debug Log as txt file (LOG/CFG_TUSB_DEBUG=2)
Literally millions of lines of logs were processed to find these bugs. They were discarded as I fixed the bugs.,
Screenshots
Hi. I wrote a custom MSC host that supports CBI (Control/Bulk/Interrupt) transport, LUNs, and removable media for the RP2350-based Picocomputer 6502. Getting that working required me to track down and fix a number of bugs and missing logic in the existing RP2040 host controller driver. I'm submitting those low-level fixes for the benefit of everyone using TinyUSB on RP2040/RP2350.
Much of the heavy lifting was done by AI but I spent roughly 200 hours of my own time getting this to work. It was extremely difficult because of the previously unknown silicon quirk with EPX which I didn't find until a couple days ago.
The new files live here:
https://github.com/picocomputer/rp6502/tree/main/src/tinyusb_rp6502
If there's interest, I'll make a PR with the two RP2040 files. The HCD diff is unreadable, so here's an AI summary of all the changes.
Most Important: Silicon Quirk — SIE Shared Handshake Latches
I don't believe this has been documented anywhere else.
The RP2040 SIE shares a single set of handshake latches (ACK_REC, NAK_REC, STALL_REC, DATA_SEQ_ERROR, RX_TIMEOUT, etc.) across both EPX and the interrupt (async) endpoint transactions. When the SIE finishes an EPX transaction and immediately starts polling an interrupt endpoint, the poll's handshake response overwrites the EPX latches before the IRQ handler has a chance to read them.
The result: a perfectly valid, ACK'd EPX completion appears as RX_TIMEOUT or DATA_SEQ_ERROR, breaking control-transfer enumeration — particularly visible when interrupt endpoints (e.g., HID devices connected through a hub) are configured alongside regular control traffic.
Fix: Disable all interrupt endpoint polling (int_ep_ctrl = 0) immediately before every EPX START_TRANS write, and restore it only after the IRQ handler has finished processing the EPX completion result. New helpers suppress_int_polling() and unsuppress_int_polling() + _int_ep_suppressed flag implement this. The restoration is deferred to the end of the ISR and only fires once epx.active is false (multi-packet double-buffered transfers span several IRQs — re-enabling polling mid-transfer would reintroduce the race).
hcd_rp2040.c Changes
1. busy_wait_at_least_cycles(12) uses the wrong clock
busy_wait_at_least_cycles counts CPU clocks, but the SIE operates on a 48 MHz USB clock. On a 125 MHz RP2040, 12 CPU cycles ≈ 96 ns, which is well short of the 12 SIE clock cycles (≈ 250 ns) required for APB writes to propagate to sie_ctrl, sie_status, buf_status, and buffer_control AVAIL (RP2040 §4.1.2.5.1). Changed to busy_wait_us(1), extracted into a named helper hw_sie_settle() used consistently throughout.
2. STATUS-phase ZLP never completed — control transfers stuck
A STATUS-phase ZLP (zero-length data phase, total_bytes = 0) causes the SIE to raise TRANS_COMPLETE without raising BUFF_STATUS, because BUFF_STATUS only fires on actual DPRAM buffer movement and zero bytes means no movement. The upstream driver discarded this TRANS_COMPLETE with a comment "Don't care. Will handle this in buff status" — this is wrong. Added: if TRANS_COMPLETE fires while epx.active == true and buf_status bits [1:0] are clear, complete the EPX transfer with XFER_RESULT_SUCCESS.
3. DATA_SEQ_ERROR must be handled before BUFF_STATUS
USB_INTS_ERROR_DATA_SEQ_BITS fires in the same snapshot as BUFF_STATUS. If BUFF_STATUS runs first, handle_hwbuf_status() sees the FULL bit set and accepts the wrong-PID data as a successful completion, delivering silently corrupted data to the class driver. The fix processes DATA_SEQ_ERROR first, suppresses the coincident BUFF_STATUS/TRANS_COMPLETE bits, resets EPX via epx_hard_reset(), and reports XFER_RESULT_FAILED. Previously this path called panic().
4. RX_TIMEOUT handling was a silent no-op
The upstream handler cleared USB_SIE_STATUS_RX_TIMEOUT_BITS and did nothing else — no transfer completion, no endpoint reset. This leaves the endpoint stuck with ep->active == true forever. Now: complete the active EPX transfer with XFER_RESULT_FAILED, call epx_hard_reset(), suppress coincident BUFF_STATUS/TRANS_COMPLETE bits.
5. Disconnect race — BUFF_STATUS + TRANS_COMPLETE + disconnect in same snapshot
When a device is unplugged while a transfer is in flight, the hardware can assert BUFF_STATUS and/or TRANS_COMPLETE in the same IRQ snapshot as the disconnect event. If the subsequent if-blocks run after the disconnect is processed, they deliver hcd_event_xfer_complete for a device the stack is already tearing down, corrupting enumeration state for the next device. Fix: on detecting disconnect, immediately zero sie_ctrl, clear int_ep_ctrl, clear all buf_status bits, zero epx buffer_control, and call hw_endpoint_reset_transfer() on every active endpoint without delivering any completions — all before the other if-blocks execute.
6. epx_prepare_for_start() — clear stale FULL/AVAIL bits before every START_TRANS
Stale FULL or AVAIL bits left in buf_status or buffer_control from a previous phase can be misread as a new completion immediately after START_TRANS. Added epx_prepare_for_start() which: clears buf_status[1:0], zeros EPX buffer_control, and clears all transient SIE status latches (ACK_REC, NAK_REC, STALL_REC, DATA_SEQ_ERROR, RX_TIMEOUT, RX_SHORT_PACKET, TRANS_COMPLETE, ENDPOINT_ERROR) with a hw_sie_settle() after to ensure the writes propagate before START_TRANS is written. Called from both hcd_edpt_xfer() and hcd_setup_send() under IRQ save.
7. hcd_edpt_abort_xfer() was a stub returning false
Fully implemented: stops EPX via usb_hw->sie_ctrl = SIE_CTRL_BASE or clears the interrupt endpoint's int_ep_ctrl slot; resets software endpoint state; zeros buf_ctrl; clears pending BUFF_STATUS bits; restores interrupt endpoint polling for EPX aborts.
8. hcd_edpt_clear_stall() was panic("hcd_clear_stall")
Implemented: resets ep->next_pid = 0 inside a USB IRQ critical section (the IRQ reads then XORs next_pid, so the reset must be atomic); also zeros buffer_control and clears buf_status for interrupt endpoints.
9. hcd_edpt_close() was a stub returning false
Implemented: calls new hw_intep_close() helper which disables the interrupt endpoint's int_ep_ctrl slot, zeros int_ep_addr_ctrl, clears ep->configured, and zeroes the hardware control registers — all under IRQ save.
10. EP0 max packet size tracking (_ep0_mps[])
EPX is shared across all devices. Its wMaxPacketSize becomes stale when control traffic interleaves between a low-speed device (MPS=8) and a full-speed device (MPS=64). Added _ep0_mps[CFG_TUH_DEVICE_MAX + CFG_TUH_HUB + 1] to track the correct EP0 MPS per device address; hcd_edpt_xfer() and hcd_setup_send() re-init EPX from this table on every phase rather than using the (possibly stale) live register value.
11. SIE speed-change latch must be cleared before reading dev_speed()
In the connect path, USB_SIE_STATUS_SPEED_BITS must be cleared before calling dev_speed(). If the APB write hasn't propagated when dev_speed() reads back the register, the read returns 0 (Heisenbug: UART logging was inadvertently masking this timing window). Fix: clear the latch first, hw_sie_settle(), then read speed.
12. USB_SIE_STATUS_RX_SHORT_PACKET_BITS and USB_SIE_STATUS_ENDPOINT_ERROR_BITS missing from RP2040 SDK
Both constants exist in the RP2350 SDK but are absent from the RP2040 SDK headers. Added #ifndef guards so the driver compiles and correctly clears those status bits on both silicon revisions.
13. HOST_RESUME interrupt unhandled
Added a handler for USB_INTS_HOST_RESUME_BITS that acknowledges remote wakeup signalling (USB_SIE_STATUS_RESUME_BITS). Without this the interrupt fires continuously after a device signals resume.
14. usb_irq_save() / usb_irq_restore() fine-grained critical sections
Added helpers that disable/restore only USBCTRL_IRQ rather than all interrupts, used for the START_TRANS window and endpoint close/clear-stall paths.
15. USB_HOST_INTERRUPT_ENDPOINTS → PICO_USB_HOST_INTERRUPT_ENDPOINTS
Renamed to match the RP2350 SDK constant name.
rp2040_usb.c Changes
1. busy_wait_at_least_cycles(12) uses the wrong clock
Same issue as in hcd_rp2040.c. Changed to busy_wait_us(1).
2. FIFO copy routines use unaligned_memcpy — wrong for hardware FIFOs
unaligned_memcpy may use word-wide or multi-byte operations that are not safe for memory-mapped hardware FIFOs, which require strict byte-at-a-time volatile accesses. Changed both usb_hw_write_fifo and usb_hw_read_fifo to explicit volatile byte loops: volatile uint8_t *dst8 = (volatile uint8_t *)hwfifo; while (len--) *dst8++ = *src++; (and symmetric for read).
3. Double-buffer abort: removed dead #if 0 block, added chip-version gate
The abort logic was wrapped in #if 0. Restored the abort path with a chip-version guard (rp2040_chip_version() >= 2) because on B1 silicon the abort register has no effect. On B2+ it is used correctly: usb_hw_set->abort, poll abort_done, then usb_hw_clear->abort. Also switches back to single-buffer mode and zeros buffer_control before returning, preventing the SIE from re-arming buf1 from stale DPRAM state.
I have checked existing issues, discussion and documentation
- I confirm I have checked existing issues, discussion and documentation.