Commit d6af2ed
whu2014 authored and Lorenzo Pieralisi committed
PCI: hv: Fix a timing issue which causes kdump to fail occasionally
Kdump could fail occasionally on a Hyper-V guest because the retry in hv_pci_enter_d0() releases the child device structures in hv_pci_bus_exit(). Although the host sends a second, asynchronous device relations message, the retry fails if that message reaches the guest after hv_send_resources_allocated() is called.

Fix the problem by moving the retry to hv_pci_probe() and starting it from the hv_pci_query_relations() call. This causes a device relations message to arrive at the guest synchronously, so the guest can rebuild the child device structures before calling hv_send_resources_allocated().

Link: https://lore.kernel.org/r/[email protected]
Fixes: c81992e ("PCI: hv: Retry PCI bus D0 entry on invalid device state")
Signed-off-by: Wei Hu <[email protected]>
[[email protected]: fixed a comment and commit log]
Signed-off-by: Lorenzo Pieralisi <[email protected]>
Reviewed-by: Michael Kelley <[email protected]>
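The ordering described in the commit message can be sketched as a small user-space simulation. This is only an illustration under stated assumptions: `struct sim_bus` and every `sim_*` function are hypothetical stand-ins, not the real driver code, and the model reduces the host to two flags (whether it still holds stale resources, and whether child device structures exist in the guest). It shows why restarting from the query-relations step leaves the child structures in place before resources are reported as allocated.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the bus state; not the real struct hv_pcibus_device. */
struct sim_bus {
	bool children_present;	/* child device structures exist in the guest */
	bool host_holds_stale;	/* host still holds resources from the crashed kernel */
	int query_calls;	/* how often the query-relations step ran */
	int exit_calls;		/* how often the bus-exit step ran */
};

/* Models hv_pci_query_relations(): child structures are (re)built synchronously. */
static int sim_query_relations(struct sim_bus *bus)
{
	bus->query_calls++;
	bus->children_present = true;
	return 0;
}

/* Models hv_pci_enter_d0(): assumed to fail with -EPROTO on invalid device state. */
static int sim_enter_d0(const struct sim_bus *bus)
{
	return bus->host_holds_stale ? -EPROTO : 0;
}

/* Models hv_pci_bus_exit(): host resources are released, child structures dropped. */
static int sim_bus_exit(struct sim_bus *bus)
{
	bus->exit_calls++;
	bus->host_holds_stale = false;
	bus->children_present = false;
	return 0;
}

/* Models hv_send_resources_allocated(): it needs the child structures. */
static int sim_send_resources_allocated(const struct sim_bus *bus)
{
	return bus->children_present ? 0 : -ENODEV;
}

/* Probe flow after the fix: a single retry that restarts at query relations. */
static int sim_probe(struct sim_bus *bus)
{
	bool enter_d0_retry = true;
	int ret;

retry:
	ret = sim_query_relations(bus);
	if (ret)
		return ret;

	ret = sim_enter_d0(bus);
	if (ret == -EPROTO && enter_d0_retry) {
		enter_d0_retry = false;	/* allow only one retry */
		ret = sim_bus_exit(bus);
		if (ret == 0)
			goto retry;
	}
	if (ret)
		return ret;

	return sim_send_resources_allocated(bus);
}

/* Runs the kdump-like scenario: the host starts out holding stale resources. */
int sim_demo(void)
{
	struct sim_bus bus = { .host_holds_stale = true };
	int ret = sim_probe(&bus);

	assert(bus.query_calls == 2);	/* once initially, once after the retry */
	assert(bus.exit_calls == 1);
	return ret;
}
```

Calling `sim_demo()` from a `main()` returns 0 in this model. If the retry jumped back to `sim_enter_d0()` instead of `sim_query_relations()`, `sim_send_resources_allocated()` would find no child structures and return `-ENODEV`, which is the simulated analogue of the failure the commit describes.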
1 parent b3a9e3b commit d6af2ed

1 file changed, 37 additions and 34 deletions

drivers/pci/controller/pci-hyperv.c

@@ -2759,10 +2759,8 @@ static int hv_pci_enter_d0(struct hv_device *hdev)
 	struct pci_bus_d0_entry *d0_entry;
 	struct hv_pci_compl comp_pkt;
 	struct pci_packet *pkt;
-	bool retry = true;
 	int ret;

-enter_d0_retry:
 	/*
 	 * Tell the host that the bus is ready to use, and moved into the
 	 * powered-on state. This includes telling the host which region
@@ -2789,38 +2787,6 @@ static int hv_pci_enter_d0(struct hv_device *hdev)
 	if (ret)
 		goto exit;

-	/*
-	 * In certain case (Kdump) the pci device of interest was
-	 * not cleanly shut down and resource is still held on host
-	 * side, the host could return invalid device status.
-	 * We need to explicitly request host to release the resource
-	 * and try to enter D0 again.
-	 */
-	if (comp_pkt.completion_status < 0 && retry) {
-		retry = false;
-
-		dev_err(&hdev->device, "Retrying D0 Entry\n");
-
-		/*
-		 * Hv_pci_bus_exit() calls hv_send_resource_released()
-		 * to free up resources of its child devices.
-		 * In the kdump kernel we need to set the
-		 * wslot_res_allocated to 255 so it scans all child
-		 * devices to release resources allocated in the
-		 * normal kernel before panic happened.
-		 */
-		hbus->wslot_res_allocated = 255;
-
-		ret = hv_pci_bus_exit(hdev, true);
-
-		if (ret == 0) {
-			kfree(pkt);
-			goto enter_d0_retry;
-		}
-		dev_err(&hdev->device,
-			"Retrying D0 failed with ret %d\n", ret);
-	}
-
 	if (comp_pkt.completion_status < 0) {
 		dev_err(&hdev->device,
 			"PCI Pass-through VSP failed D0 Entry with status %x\n",
@@ -3058,6 +3024,7 @@ static int hv_pci_probe(struct hv_device *hdev,
 	struct hv_pcibus_device *hbus;
 	u16 dom_req, dom;
 	char *name;
+	bool enter_d0_retry = true;
 	int ret;

 	/*
@@ -3178,11 +3145,47 @@ static int hv_pci_probe(struct hv_device *hdev,
 	if (ret)
 		goto free_fwnode;

+retry:
 	ret = hv_pci_query_relations(hdev);
 	if (ret)
 		goto free_irq_domain;

 	ret = hv_pci_enter_d0(hdev);
+	/*
+	 * In certain case (Kdump) the pci device of interest was
+	 * not cleanly shut down and resource is still held on host
+	 * side, the host could return invalid device status.
+	 * We need to explicitly request host to release the resource
+	 * and try to enter D0 again.
+	 * Since the hv_pci_bus_exit() call releases structures
+	 * of all its child devices, we need to start the retry from
+	 * hv_pci_query_relations() call, requesting host to send
+	 * the synchronous child device relations message before this
+	 * information is needed in hv_send_resources_allocated()
+	 * call later.
+	 */
+	if (ret == -EPROTO && enter_d0_retry) {
+		enter_d0_retry = false;
+
+		dev_err(&hdev->device, "Retrying D0 Entry\n");
+
+		/*
+		 * Hv_pci_bus_exit() calls hv_send_resources_released()
+		 * to free up resources of its child devices.
+		 * In the kdump kernel we need to set the
+		 * wslot_res_allocated to 255 so it scans all child
+		 * devices to release resources allocated in the
+		 * normal kernel before panic happened.
+		 */
+		hbus->wslot_res_allocated = 255;
+		ret = hv_pci_bus_exit(hdev, true);
+
+		if (ret == 0)
+			goto retry;
+
+		dev_err(&hdev->device,
+			"Retrying D0 failed with ret %d\n", ret);
+	}
 	if (ret)
 		goto free_irq_domain;
