Skip to content

Commit f181399

Browse files
palibjorn-helgaas
authored andcommitted
PCI: aardvark: Fix kernel panic during PIO transfer
Trying to start a new PIO transfer by writing value 0 in PIO_START register when previous transfer has not yet completed (which is indicated by value 1 in PIO_START) causes an External Abort on CPU, which results in kernel panic: SError Interrupt on CPU0, code 0xbf000002 -- SError Kernel panic - not syncing: Asynchronous SError Interrupt To prevent kernel panic, it is required to reject a new PIO transfer when previous one has not finished yet. If previous PIO transfer is not finished yet, the kernel may issue a new PIO request only if the previous PIO transfer timed out. In the past the root cause of this issue was incorrectly identified (as it often happens during link retraining or after link down event) and special hack was implemented in Trusted Firmware to catch all SError events in EL3, to ignore errors with code 0xbf000002 and not forwarding any other errors to kernel and instead throw panic from EL3 Trusted Firmware handler. Links to discussion and patches about this issue: https://git.trustedfirmware.org/TF-A/trusted-firmware-a.git/commit/?id=3c7dcdac5c50 https://lore.kernel.org/linux-pci/[email protected]/ https://lore.kernel.org/linux-pci/[email protected]/ https://review.trustedfirmware.org/c/TF-A/trusted-firmware-a/+/1541 But the real cause was the fact that during link retraining or after link down event the PIO transfer may take longer time, up to the 1.44s until it times out. This increased probability that a new PIO transfer would be issued by kernel while previous one has not finished yet. After applying this change into the kernel, it is possible to revert the mentioned TF-A hack and SError events do not have to be caught in TF-A EL3. Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Pali Rohár <[email protected]> Signed-off-by: Lorenzo Pieralisi <[email protected]> Signed-off-by: Bjorn Helgaas <[email protected]> Reviewed-by: Marek Behún <[email protected]> Cc: [email protected] # 7fbcb5d ("PCI: aardvark: Don't rely on jiffies while holding spinlock")
1 parent cacf994 commit f181399

File tree

1 file changed

+40
-9
lines changed

1 file changed

+40
-9
lines changed

drivers/pci/controller/pci-aardvark.c

Lines changed: 40 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -514,7 +514,7 @@ static int advk_pcie_wait_pio(struct advk_pcie *pcie)
514514
udelay(PIO_RETRY_DELAY);
515515
}
516516

517-
dev_err(dev, "config read/write timed out\n");
517+
dev_err(dev, "PIO read/write transfer time out\n");
518518
return -ETIMEDOUT;
519519
}
520520

@@ -657,6 +657,35 @@ static bool advk_pcie_valid_device(struct advk_pcie *pcie, struct pci_bus *bus,
657657
return true;
658658
}
659659

660+
static bool advk_pcie_pio_is_running(struct advk_pcie *pcie)
661+
{
662+
struct device *dev = &pcie->pdev->dev;
663+
664+
/*
665+
* Trying to start a new PIO transfer when previous has not completed
666+
* cause External Abort on CPU which results in kernel panic:
667+
*
668+
* SError Interrupt on CPU0, code 0xbf000002 -- SError
669+
* Kernel panic - not syncing: Asynchronous SError Interrupt
670+
*
671+
* Functions advk_pcie_rd_conf() and advk_pcie_wr_conf() are protected
672+
* by raw_spin_lock_irqsave() at pci_lock_config() level to prevent
673+
* concurrent calls at the same time. But because PIO transfer may take
674+
* about 1.5s when link is down or card is disconnected, it means that
675+
* advk_pcie_wait_pio() does not always have to wait for completion.
676+
*
677+
* Some versions of ARM Trusted Firmware handles this External Abort at
678+
* EL3 level and mask it to prevent kernel panic. Relevant TF-A commit:
679+
* https://git.trustedfirmware.org/TF-A/trusted-firmware-a.git/commit/?id=3c7dcdac5c50
680+
*/
681+
if (advk_readl(pcie, PIO_START)) {
682+
dev_err(dev, "Previous PIO read/write transfer is still running\n");
683+
return true;
684+
}
685+
686+
return false;
687+
}
688+
660689
static int advk_pcie_rd_conf(struct pci_bus *bus, u32 devfn,
661690
int where, int size, u32 *val)
662691
{
@@ -673,9 +702,10 @@ static int advk_pcie_rd_conf(struct pci_bus *bus, u32 devfn,
673702
return pci_bridge_emul_conf_read(&pcie->bridge, where,
674703
size, val);
675704

676-
/* Start PIO */
677-
advk_writel(pcie, 0, PIO_START);
678-
advk_writel(pcie, 1, PIO_ISR);
705+
if (advk_pcie_pio_is_running(pcie)) {
706+
*val = 0xffffffff;
707+
return PCIBIOS_SET_FAILED;
708+
}
679709

680710
/* Program the control register */
681711
reg = advk_readl(pcie, PIO_CTRL);
@@ -694,7 +724,8 @@ static int advk_pcie_rd_conf(struct pci_bus *bus, u32 devfn,
694724
/* Program the data strobe */
695725
advk_writel(pcie, 0xf, PIO_WR_DATA_STRB);
696726

697-
/* Start the transfer */
727+
/* Clear PIO DONE ISR and start the transfer */
728+
advk_writel(pcie, 1, PIO_ISR);
698729
advk_writel(pcie, 1, PIO_START);
699730

700731
ret = advk_pcie_wait_pio(pcie);
@@ -734,9 +765,8 @@ static int advk_pcie_wr_conf(struct pci_bus *bus, u32 devfn,
734765
if (where % size)
735766
return PCIBIOS_SET_FAILED;
736767

737-
/* Start PIO */
738-
advk_writel(pcie, 0, PIO_START);
739-
advk_writel(pcie, 1, PIO_ISR);
768+
if (advk_pcie_pio_is_running(pcie))
769+
return PCIBIOS_SET_FAILED;
740770

741771
/* Program the control register */
742772
reg = advk_readl(pcie, PIO_CTRL);
@@ -763,7 +793,8 @@ static int advk_pcie_wr_conf(struct pci_bus *bus, u32 devfn,
763793
/* Program the data strobe */
764794
advk_writel(pcie, data_strobe, PIO_WR_DATA_STRB);
765795

766-
/* Start the transfer */
796+
/* Clear PIO DONE ISR and start the transfer */
797+
advk_writel(pcie, 1, PIO_ISR);
767798
advk_writel(pcie, 1, PIO_START);
768799

769800
ret = advk_pcie_wait_pio(pcie);

0 commit comments

Comments
 (0)