Commit a89c822

maciej-w-rozycki authored and bjorn-helgaas committed
PCI: Work around PCIe link training failures
Attempt to handle cases such as with a downstream port of the ASMedia ASM2824 PCIe switch where link training never completes and the link continues switching between speeds indefinitely with the data link layer never reaching the active state.

It has been observed with a downstream port of the ASMedia ASM2824 Gen 3 switch wired to the upstream port of the Pericom PI7C9X2G304 Gen 2 switch, using a Delock Riser Card PCI Express x1 > 2 x PCIe x1 device, P/N 41433, wired to a SiFive HiFive Unmatched board. In this setup the switches should negotiate a link speed of 5.0GT/s, falling back to 2.5GT/s if necessary.

Instead the link continues oscillating between the two speeds, at the rate of 34-35 times per second, with link training reported repeatedly active ~84% of the time. Limiting the target link speed to 2.5GT/s with the upstream ASM2824 device makes the two switches communicate correctly. Removing the speed restriction afterwards then makes the two devices switch to 5.0GT/s.

Make use of these observations and detect the inability to train the link by checking for the Data Link Layer Link Active status bit being off while the Link Bandwidth Management Status bit indicates that hardware has changed the link speed or width in an attempt to correct unreliable link operation.

Restrict the speed to 2.5GT/s then with the Target Link Speed field, request a retrain and wait 200ms for the data link to go up. If this is successful, lift the restriction, letting the devices negotiate a higher speed.

Also check for a 2.5GT/s speed restriction the firmware may have already arranged and lift it too with ports of devices known to continue working afterwards (currently only ASM2824) that already report their data link being up.

[bhelgaas: reorder and squash stubs from https://lore.kernel.org/r/[email protected] to avoid adding stubs that do nothing]
Link: https://lore.kernel.org/r/[email protected]/
Link: https://source.denx.de/u-boot/u-boot/-/commit/a398a51ccc68
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Maciej W. Rozycki <[email protected]>
Signed-off-by: Bjorn Helgaas <[email protected]>
1 parent 7604bc2 commit a89c822
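
The detection rule in the commit message comes down to two Link Status bits. The following is a minimal userspace sketch, not code from the commit: `classify_link()` and `enum link_diag` are hypothetical names, and the bit values are the ones defined in `include/uapi/linux/pci_regs.h`.

```c
#include <stdint.h>

/* Link Status register bits, values as in include/uapi/linux/pci_regs.h. */
#define PCI_EXP_LNKSTA_DLLLA 0x2000 /* Data Link Layer Link Active */
#define PCI_EXP_LNKSTA_LBMS  0x4000 /* Link Bandwidth Management Status */

/* Hypothetical classification, for illustration only. */
enum link_diag {
	LINK_IDLE,            /* both bits clear, e.g. an unconnected port */
	LINK_ACTIVE,          /* data link layer is up */
	LINK_TRAINING_FAILED  /* LBMS set while the data link layer is down */
};

static inline enum link_diag classify_link(uint16_t lnksta)
{
	if (lnksta & PCI_EXP_LNKSTA_DLLLA)
		return LINK_ACTIVE;
	if (lnksta & PCI_EXP_LNKSTA_LBMS)
		return LINK_TRAINING_FAILED;
	return LINK_IDLE;
}
```

Only the `LINK_TRAINING_FAILED` case corresponds to the condition the workaround acts on; the other two states are left alone by the detection step.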

4 files changed, 102 additions and 0 deletions

drivers/pci/pci.c

Lines changed: 2 additions & 0 deletions
@@ -4951,6 +4951,8 @@ static bool pcie_wait_for_link_delay(struct pci_dev *pdev, bool active,
 	if (active)
 		msleep(20);
 	ret = pcie_wait_for_link_status(pdev, false, active);
+	if (active && !ret)
+		ret = pcie_failed_link_retrain(pdev);
 	if (active && ret)
 		msleep(delay);
 
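
The effect of this hunk is that the workaround runs only when the caller waited for the link to come up and the wait failed. A rough userspace model of that control flow, assuming a hypothetical `struct fake_link` and `fake_*` helpers that merely stand in for the kernel functions named in the hunk:

```c
#include <stdbool.h>

/* Hypothetical model of a link; not a kernel structure. */
struct fake_link {
	bool up;            /* current data link layer state */
	bool retrain_helps; /* whether the workaround can bring the link up */
	int retrain_calls;  /* how many times the workaround was invoked */
};

/* Stand-in for pcie_wait_for_link_status(): true if the requested state was reached. */
static inline bool fake_wait_for_link_status(const struct fake_link *l, bool active)
{
	return l->up == active;
}

/* Stand-in for pcie_failed_link_retrain(): true if the link is up afterwards. */
static inline bool fake_failed_link_retrain(struct fake_link *l)
{
	l->retrain_calls++;
	if (l->retrain_helps)
		l->up = true;
	return l->up;
}

/* Mirrors the patched flow: fall back to the workaround only on a failed "up" wait. */
static inline bool fake_wait_for_link(struct fake_link *l, bool active)
{
	bool ret = fake_wait_for_link_status(l, active);

	if (active && !ret)
		ret = fake_failed_link_retrain(l);
	return ret;
}
```

Waiting for a link to go down (`active == false`) never triggers the fallback in this model, which matches the `active && !ret` condition added by the patch.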

drivers/pci/pci.h

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -543,6 +543,7 @@ void pci_acs_init(struct pci_dev *dev);
 int pci_dev_specific_acs_enabled(struct pci_dev *dev, u16 acs_flags);
 int pci_dev_specific_enable_acs(struct pci_dev *dev);
 int pci_dev_specific_disable_acs_redir(struct pci_dev *dev);
+bool pcie_failed_link_retrain(struct pci_dev *dev);
 #else
 static inline int pci_dev_specific_acs_enabled(struct pci_dev *dev,
 					       u16 acs_flags)
@@ -557,6 +558,10 @@ static inline int pci_dev_specific_disable_acs_redir(struct pci_dev *dev)
 {
 	return -ENOTTY;
 }
+static inline bool pcie_failed_link_retrain(struct pci_dev *dev)
+{
+	return false;
+}
 #endif
 
 /* PCI error reporting and recovery */
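
The header change follows the usual declaration-or-stub pattern: callers can invoke the function unconditionally, and builds without the quirks code get an inline that reports no successful retrain. The guard macro is not visible in this hunk, so the sketch below uses a hypothetical `DEMO_HAVE_QUIRKS` macro and hypothetical `demo_*` names rather than the kernel's own.

```c
#include <stdbool.h>

struct demo_dev; /* opaque, hypothetical device type */

#ifdef DEMO_HAVE_QUIRKS
/* The real implementation would live in another translation unit. */
bool demo_failed_link_retrain(struct demo_dev *dev);
#else
/* Stub: no workaround is built in, so no retrain can have succeeded. */
static inline bool demo_failed_link_retrain(struct demo_dev *dev)
{
	(void)dev;
	return false;
}
#endif

/* A caller compiles unchanged in both configurations. */
static inline bool demo_link_usable(struct demo_dev *dev, bool link_up)
{
	return link_up || demo_failed_link_retrain(dev);
}
```

Returning `false` from the stub keeps the caller's result identical to what it was before the workaround existed.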

drivers/pci/probe.c

Lines changed: 2 additions & 0 deletions
@@ -2550,6 +2550,8 @@ void pci_device_add(struct pci_dev *dev, struct pci_bus *bus)
 	dma_set_max_seg_size(&dev->dev, 65536);
 	dma_set_seg_boundary(&dev->dev, 0xffffffff);
 
+	pcie_failed_link_retrain(dev);
+
 	/* Fix up broken headers */
 	pci_fixup_device(pci_fixup_header, dev);
 

drivers/pci/quirks.c

Lines changed: 93 additions & 0 deletions
@@ -33,6 +33,99 @@
 #include <linux/switchtec.h>
 #include "pci.h"
 
+/*
+ * Retrain the link of a downstream PCIe port by hand if necessary.
+ *
+ * This is needed at least where a downstream port of the ASMedia ASM2824
+ * Gen 3 switch is wired to the upstream port of the Pericom PI7C9X2G304
+ * Gen 2 switch, and observed with the Delock Riser Card PCI Express x1 >
+ * 2 x PCIe x1 device, P/N 41433, plugged into the SiFive HiFive Unmatched
+ * board.
+ *
+ * In such a configuration the switches are supposed to negotiate the link
+ * speed of preferably 5.0GT/s, falling back to 2.5GT/s.  However the link
+ * continues switching between the two speeds indefinitely and the data
+ * link layer never reaches the active state, with link training reported
+ * repeatedly active ~84% of the time.  Forcing the target link speed to
+ * 2.5GT/s with the upstream ASM2824 device makes the two switches talk to
+ * each other correctly however.  And more interestingly retraining with a
+ * higher target link speed afterwards lets the two successfully negotiate
+ * 5.0GT/s.
+ *
+ * With the ASM2824 we can rely on the otherwise optional Data Link Layer
+ * Link Active status bit and in the failed link training scenario it will
+ * be off along with the Link Bandwidth Management Status indicating that
+ * hardware has changed the link speed or width in an attempt to correct
+ * unreliable link operation.  For a port that has been left unconnected
+ * both bits will be clear.  So use this information to detect the problem
+ * rather than polling the Link Training bit and watching out for flips or
+ * at least the active status.
+ *
+ * Since the exact nature of the problem isn't known and in principle this
+ * could trigger where an ASM2824 device is downstream rather than upstream,
+ * apply this erratum workaround to any downstream ports as long as they
+ * support Link Active reporting and have the Link Control 2 register.
+ * Restrict the speed to 2.5GT/s then with the Target Link Speed field,
+ * request a retrain and wait 200ms for the data link to go up.
+ *
+ * If this turns out successful and we know by the Vendor:Device ID it is
+ * safe to do so, then lift the restriction, letting the devices negotiate
+ * a higher speed.  Also check for a similar 2.5GT/s speed restriction the
+ * firmware may have already arranged and lift it with ports that already
+ * report their data link being up.
+ *
+ * Return TRUE if the link has been successfully retrained, otherwise FALSE.
+ */
+bool pcie_failed_link_retrain(struct pci_dev *dev)
+{
+	static const struct pci_device_id ids[] = {
+		{ PCI_VDEVICE(ASMEDIA, 0x2824) }, /* ASMedia ASM2824 */
+		{}
+	};
+	u16 lnksta, lnkctl2;
+
+	if (!pci_is_pcie(dev) || !pcie_downstream_port(dev) ||
+	    !pcie_cap_has_lnkctl2(dev) || !dev->link_active_reporting)
+		return false;
+
+	pcie_capability_read_word(dev, PCI_EXP_LNKCTL2, &lnkctl2);
+	pcie_capability_read_word(dev, PCI_EXP_LNKSTA, &lnksta);
+	if ((lnksta & (PCI_EXP_LNKSTA_LBMS | PCI_EXP_LNKSTA_DLLLA)) ==
+	    PCI_EXP_LNKSTA_LBMS) {
+		pci_info(dev, "broken device, retraining non-functional downstream link at 2.5GT/s\n");
+
+		lnkctl2 &= ~PCI_EXP_LNKCTL2_TLS;
+		lnkctl2 |= PCI_EXP_LNKCTL2_TLS_2_5GT;
+		pcie_capability_write_word(dev, PCI_EXP_LNKCTL2, lnkctl2);
+
+		if (!pcie_retrain_link(dev, false)) {
+			pci_info(dev, "retraining failed\n");
+			return false;
+		}
+
+		pcie_capability_read_word(dev, PCI_EXP_LNKSTA, &lnksta);
+	}
+
+	if ((lnksta & PCI_EXP_LNKSTA_DLLLA) &&
+	    (lnkctl2 & PCI_EXP_LNKCTL2_TLS) == PCI_EXP_LNKCTL2_TLS_2_5GT &&
+	    pci_match_id(ids, dev)) {
+		u32 lnkcap;
+
+		pci_info(dev, "removing 2.5GT/s downstream link speed restriction\n");
+		pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &lnkcap);
+		lnkctl2 &= ~PCI_EXP_LNKCTL2_TLS;
+		lnkctl2 |= lnkcap & PCI_EXP_LNKCAP_SLS;
+		pcie_capability_write_word(dev, PCI_EXP_LNKCTL2, lnkctl2);
+
+		if (!pcie_retrain_link(dev, false)) {
+			pci_info(dev, "retraining failed\n");
+			return false;
+		}
+	}
+
+	return true;
+}
+
 static ktime_t fixup_debug_start(struct pci_dev *dev,
 				 void (*fn)(struct pci_dev *dev))
 {
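
To see how the two steps of `pcie_failed_link_retrain()` interact, it can help to run the same register logic against a toy port. The sketch below is a userspace model under loud assumptions: `struct fake_port`, `fake_retrain()` and `fake_failed_link_retrain()` are hypothetical, the capability checks at the top of the real function are omitted, and the retrain model simply assumes the link comes up when the target speed is 2.5GT/s or when the link was already up. Register bit values are those from `include/uapi/linux/pci_regs.h`.

```c
#include <stdbool.h>
#include <stdint.h>

/* Register bits, values as in include/uapi/linux/pci_regs.h. */
#define PCI_EXP_LNKCAP_SLS        0x0000000f /* Supported Link Speeds */
#define PCI_EXP_LNKSTA_DLLLA      0x2000     /* Data Link Layer Link Active */
#define PCI_EXP_LNKSTA_LBMS       0x4000     /* Link Bandwidth Management Status */
#define PCI_EXP_LNKCTL2_TLS       0x000f     /* Target Link Speed */
#define PCI_EXP_LNKCTL2_TLS_2_5GT 0x0001

/* Hypothetical model of a downstream port; not a kernel structure. */
struct fake_port {
	uint32_t lnkcap;
	uint16_t lnksta;
	uint16_t lnkctl2;
	bool known_safe; /* stands in for pci_match_id(ids, dev) */
};

/*
 * Assumed behaviour, modelled on the observations in the commit message:
 * training succeeds at 2.5GT/s, and an already active link stays active.
 */
static inline bool fake_retrain(struct fake_port *p)
{
	bool at_2_5gt = (p->lnkctl2 & PCI_EXP_LNKCTL2_TLS) ==
			PCI_EXP_LNKCTL2_TLS_2_5GT;

	if (at_2_5gt || (p->lnksta & PCI_EXP_LNKSTA_DLLLA)) {
		p->lnksta |= PCI_EXP_LNKSTA_DLLLA;
		return true;
	}
	return false;
}

static inline bool fake_failed_link_retrain(struct fake_port *p)
{
	uint16_t lnksta = p->lnksta;
	uint16_t lnkctl2 = p->lnkctl2;

	/* Step 1: failed training signature, so clamp the target to 2.5GT/s. */
	if ((lnksta & (PCI_EXP_LNKSTA_LBMS | PCI_EXP_LNKSTA_DLLLA)) ==
	    PCI_EXP_LNKSTA_LBMS) {
		lnkctl2 = (uint16_t)((lnkctl2 & ~PCI_EXP_LNKCTL2_TLS) |
				     PCI_EXP_LNKCTL2_TLS_2_5GT);
		p->lnkctl2 = lnkctl2;
		if (!fake_retrain(p))
			return false;
		lnksta = p->lnksta;
	}

	/* Step 2: link is up but clamped; lift the clamp on known-safe ports. */
	if ((lnksta & PCI_EXP_LNKSTA_DLLLA) &&
	    (lnkctl2 & PCI_EXP_LNKCTL2_TLS) == PCI_EXP_LNKCTL2_TLS_2_5GT &&
	    p->known_safe) {
		lnkctl2 = (uint16_t)((lnkctl2 & ~PCI_EXP_LNKCTL2_TLS) |
				     (p->lnkcap & PCI_EXP_LNKCAP_SLS));
		p->lnkctl2 = lnkctl2;
		if (!fake_retrain(p))
			return false;
	}

	return true;
}
```

In this model a port that starts with LBMS set and DLLLA clear ends with the link active, and its Target Link Speed restored from Link Capabilities only when `known_safe` is set. A port with both bits clear is left untouched, and a port that is already up with a 2.5GT/s target, as firmware may have arranged, only goes through the second step.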
