Skip to content

Commit 897e860

Browse files
halil-pasicPaolo Abeni
authored andcommitted
s390/ism: fix concurrency management in ism_cmd()
The s390x ISM device data sheet clearly states that only one request-response sequence is allowable per ISM function at any point in time. Unfortunately as of today the s390/ism driver in Linux does not honor that requirement. This patch aims to rectify that. This problem was discovered based on Aliaksei's bug report which states that for certain workloads the ISM functions end up entering error state (with PEC 2 as seen from the logs) after a while and as a consequence connections handled by the respective function break, and for future connection requests the ISM device is not considered -- given it is in a dysfunctional state. During further debugging PEC 3A was observed as well. A kernel message like [ 1211.244319] zpci: 061a:00:00.0: Event 0x2 reports an error for PCI function 0x61a is a reliable indicator of the stated function entering error state with PEC 2. Let me also point out that a kernel message like [ 1211.244325] zpci: 061a:00:00.0: The ism driver bound to the device does not support error recovery is a reliable indicator that the ISM function won't be auto-recovered because the ISM driver currently lacks support for it. On a technical level, without this synchronization, commands (inputs to the FW) may be partially or fully overwritten (corrupted) by another CPU trying to issue commands on the same function. There is hard evidence that this can lead to DMB token values being used as DMB IOVAs, leading to PEC 2 PCI events indicating invalid DMA. But this is only one of the failure modes imaginable. In theory even completely losing one command and executing another one twice and then trying to interpret the outputs as if the command we intended to execute was actually executed and not the other one is also possible. Frankly, I don't feel confident about providing an exhaustive list of possible consequences. Fixes: 684b89b ("s390/ism: add device driver for internal shared memory") Reported-by: Aliaksei Makarau <[email protected]> Tested-by: Mahanta Jambigi <[email protected]> Tested-by: Aliaksei Makarau <[email protected]> Signed-off-by: Halil Pasic <[email protected]> Reviewed-by: Alexandra Winter <[email protected]> Signed-off-by: Alexandra Winter <[email protected]> Reviewed-by: Simon Horman <[email protected]> Link: https://patch.msgid.link/[email protected] Signed-off-by: Paolo Abeni <[email protected]>
1 parent 8694138 commit 897e860

File tree

2 files changed

+4
-0
lines changed

2 files changed

+4
-0
lines changed

drivers/s390/net/ism_drv.c

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -130,6 +130,7 @@ static int ism_cmd(struct ism_dev *ism, void *cmd)
130130
struct ism_req_hdr *req = cmd;
131131
struct ism_resp_hdr *resp = cmd;
132132

133+
spin_lock(&ism->cmd_lock);
133134
__ism_write_cmd(ism, req + 1, sizeof(*req), req->len - sizeof(*req));
134135
__ism_write_cmd(ism, req, 0, sizeof(*req));
135136

@@ -143,6 +144,7 @@ static int ism_cmd(struct ism_dev *ism, void *cmd)
143144
}
144145
__ism_read_cmd(ism, resp + 1, sizeof(*resp), resp->len - sizeof(*resp));
145146
out:
147+
spin_unlock(&ism->cmd_lock);
146148
return resp->ret;
147149
}
148150

@@ -606,6 +608,7 @@ static int ism_probe(struct pci_dev *pdev, const struct pci_device_id *id)
606608
return -ENOMEM;
607609

608610
spin_lock_init(&ism->lock);
611+
spin_lock_init(&ism->cmd_lock);
609612
dev_set_drvdata(&pdev->dev, ism);
610613
ism->pdev = pdev;
611614
ism->dev.parent = &pdev->dev;

include/linux/ism.h

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -28,6 +28,7 @@ struct ism_dmb {
2828

2929
struct ism_dev {
3030
spinlock_t lock; /* protects the ism device */
31+
spinlock_t cmd_lock; /* serializes cmds */
3132
struct list_head list;
3233
struct pci_dev *pdev;
3334

0 commit comments

Comments
 (0)