Immediate EOI in S-EL2 causes level-triggered IRQ to ping-pong between S-EL2 and S-EL1 #17

@firewof

Hi,
Our SP developers have reported an IRQ-related issue. The problem is as follows:
Hafnium receives a pIRQ and signals it to the S-EL1 UP SP as a vIRQ. The IRQ is level-triggered, and the line is only deasserted by the SP’s handler. However, Hafnium performs the EOI in S-EL2 before switching back to the SP:

if (v_intid != SPURIOUS_INTID_OTHER_WORLD) {
	/*
	 * End the interrupt to drop the running priority. It also
	 * deactivates the physical interrupt. If not, the interrupt
	 * could trigger again after resuming current vCPU.
	 */
	plat_interrupts_end_of_interrupt(intid);
}

so as soon as we resume the SP, the same pIRQ is immediately re-presented and the CPU returns to Hafnium’s irq_lower handler.

With instrumentation, we observe the SP’s PC stuck at the IRQ handler entry point (0x93603280) while the system bounces between Hafnium and the SP without making forward progress:
[3.611642][HF] (0) ERROR: ffa_interrupts_handle_secure_interrupt
[3.617131][HF] (0) ERROR: intid: 681 vm: 0x8001, pc: 0x93600094
[3.622696][HF] (0) ERROR: ffa_interrupts_handle_secure_interrupt
[3.628344][HF] (0) ERROR: intid: 681 vm: 0x8001, pc: 0x93603280
[3.633909][HF] (0) ERROR: ffa_interrupts_handle_secure_interrupt
[3.639558][HF] (0) ERROR: intid: 681 vm: 0x8001, pc: 0x93603280
[3.645122][HF] (0) ERROR: ffa_interrupts_handle_secure_interrupt
[3.650771][HF] (0) ERROR: intid: 681 vm: 0x8001, pc: 0x93603280
[3.656336][HF] (0) ERROR: ffa_interrupts_handle_secure_interrupt
[3.661985][HF] (0) ERROR: intid: 681 vm: 0x8001, pc: 0x93603280
[3.667550][HF] (0) ERROR: ffa_interrupts_handle_secure_interrupt
[3.673198][HF] (0) ERROR: intid: 681 vm: 0x8001, pc: 0x93603280
[3.678764][HF] (0) ERROR: ffa_interrupts_handle_secure_interrupt
[3.684412][HF] (0) ERROR: intid: 681 vm: 0x8001, pc: 0x93603280
Additionally, we found that removing Hafnium’s EOI allows the SP to complete its IRQ handler.

We also noticed the comment stating that delaying the EOI could cause a re-trigger after resuming the current vCPU. For level-triggered IRQs, our observation appears to be the opposite: performing the EOI before the device deasserts the line causes an immediate re-trigger and the ping-pong shown in the log. Do you have any suggestions?
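
To make the suspected sequence concrete, here is a toy model of a single level-sensitive interrupt. It is only a sketch of the behaviour described above, not Hafnium or GIC code: `irq_model`, `sp_handler` and `count_hypervisor_entries` are hypothetical names. It assumes that the interrupt is re-signalled whenever the line is asserted and the interrupt is not active, and that in the early-EOI case the SP is preempted before its handler can touch the device.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one level-sensitive interrupt; not Hafnium or GIC code. */
struct irq_model {
	bool line_asserted; /* The device is still driving the line. */
	bool active;        /* Acknowledged but not yet deactivated. */
};

/*
 * Assumption: a level-sensitive interrupt is signalled to the CPU while
 * the line is asserted and the interrupt is not in the active state.
 */
static bool irq_signalled(const struct irq_model *m)
{
	return m->line_asserted && !m->active;
}

/* Hypothetical SP handler: servicing the device deasserts the line. */
static void sp_handler(struct irq_model *m)
{
	m->line_asserted = false;
}

/*
 * Returns how many times the hypervisor is entered for the same
 * interrupt, giving up after max_entries to bound the livelock case.
 */
static int count_hypervisor_entries(bool eoi_before_resume, int max_entries)
{
	struct irq_model m = { .line_asserted = true, .active = false };
	int entries = 0;

	while (irq_signalled(&m) && entries < max_entries) {
		entries++;
		m.active = true; /* Hypervisor acknowledges the interrupt. */

		if (eoi_before_resume) {
			/*
			 * Early EOI: the interrupt is deactivated while the
			 * line is still asserted, so it is signalled again
			 * before the SP handler gets to run.
			 */
			m.active = false;
			continue;
		}

		/* Deferred: the SP handler runs first, then deactivation. */
		sp_handler(&m);
		m.active = false;
	}

	return entries;
}
```

In this model, `count_hypervisor_entries(true, 10)` hits the cap of 10 entries, which mirrors the repeated `ffa_interrupts_handle_secure_interrupt` lines in the log, while `count_hypervisor_entries(false, 10)` returns after a single entry because the line is already deasserted when the interrupt is deactivated.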
