Commit fe6d269

anadav authored and joergroedel committed
iommu/amd: Tailored gather logic for AMD
AMD's IOMMU can flush efficiently (i.e., in a single flush) any range. This is in contrast, for instance, to Intel IOMMUs that have a limit on the number of pages that can be flushed in a single flush. In addition, AMD's IOMMU does not care about the page-size, so changes of the page size do not need to trigger a TLB flush.

So in most cases, a TLB flush due to a disjoint range is not needed for AMD. Yet, vIOMMUs require the hypervisor to synchronize the virtualized IOMMU's PTEs with the physical ones. This process induces overheads, so it is better not to cause unnecessary flushes, i.e., flushes of PTEs that were not modified.

Implement amd_iommu_iotlb_gather_add_page() and use it instead of the generic iommu_iotlb_gather_add_page(). Ignore disjoint regions unless the "non-present cache" feature is reported by the IOMMU capabilities, as this is an indication we are running on a physical IOMMU. A similar indication is used by VT-d (see "caching mode"). The new logic retains the same flushing behavior that we had before the introduction of page-selective IOTLB flushes for AMD.

In virtualized environments, check whether the newly flushed region and the gathered one are disjoint, and flush if they are.

Cc: Joerg Roedel <[email protected]>
Cc: Will Deacon <[email protected]>
Cc: Jiajun Cao <[email protected]>
Cc: Lu Baolu <[email protected]>
Cc: [email protected]
Cc: [email protected]
Reviewed-by: Robin Murphy <[email protected]>
Signed-off-by: Nadav Amit <[email protected]>
Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Joerg Roedel <[email protected]>
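For context, the disjointness test and range accumulation that the new callback relies on come from the generic gather helpers introduced earlier in this series. The sketch below is a reconstruction of those helpers as they appear in include/linux/iommu.h around this series; it is not part of this commit's diff, the explanatory comments are added here, and the in-tree code may differ in detail.

static inline
bool iommu_iotlb_gather_is_disjoint(struct iommu_iotlb_gather *gather,
				    unsigned long iova, size_t size)
{
	unsigned long start = iova, end = start + size - 1;

	/* Disjoint only if something has already been gathered (end != 0)
	 * and the new range neither overlaps nor is adjacent to it. */
	return gather->end != 0 &&
		(end + 1 < gather->start || start > gather->end + 1);
}

static inline
void iommu_iotlb_gather_add_range(struct iommu_iotlb_gather *gather,
				  unsigned long iova, size_t size)
{
	unsigned long end = iova + size - 1;

	/* Grow the gathered range so it covers [iova, iova + size). */
	if (gather->start > iova)
		gather->start = iova;
	if (gather->end < end)
		gather->end = end;
}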
1 parent febb82c commit fe6d269

1 file changed: +22 −1 lines changed

drivers/iommu/amd/iommu.c

Lines changed: 22 additions & 1 deletion
@@ -2054,6 +2054,27 @@ static int amd_iommu_map(struct iommu_domain *dom, unsigned long iova,
 	return ret;
 }
 
+static void amd_iommu_iotlb_gather_add_page(struct iommu_domain *domain,
+					    struct iommu_iotlb_gather *gather,
+					    unsigned long iova, size_t size)
+{
+	/*
+	 * AMD's IOMMU can flush as many pages as necessary in a single flush.
+	 * Unless we run in a virtual machine, which can be inferred according
+	 * to whether "non-present cache" is on, it is probably best to prefer
+	 * (potentially) too extensive TLB flushing (i.e., more misses) over
+	 * mutliple TLB flushes (i.e., more flushes). For virtual machines the
+	 * hypervisor needs to synchronize the host IOMMU PTEs with those of
+	 * the guest, and the trade-off is different: unnecessary TLB flushes
+	 * should be avoided.
+	 */
+	if (amd_iommu_np_cache &&
+	    iommu_iotlb_gather_is_disjoint(gather, iova, size))
+		iommu_iotlb_sync(domain, gather);
+
+	iommu_iotlb_gather_add_range(gather, iova, size);
+}
+
 static size_t amd_iommu_unmap(struct iommu_domain *dom, unsigned long iova,
 			      size_t page_size,
 			      struct iommu_iotlb_gather *gather)
@@ -2068,7 +2089,7 @@ static size_t amd_iommu_unmap(struct iommu_domain *dom, unsigned long iova,
 
 	r = (ops->unmap) ? ops->unmap(ops, iova, page_size, gather) : 0;
 
-	iommu_iotlb_gather_add_page(dom, gather, iova, page_size);
+	amd_iommu_iotlb_gather_add_page(dom, gather, iova, page_size);
 
 	return r;
 }
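To see the effect of the check, here is a small stand-alone model of the decision logic in plain userspace C with made-up addresses. It only illustrates the range math of the patch; it is not kernel code, and the struct and function names are hypothetical.

#include <stdbool.h>
#include <stdio.h>

/* Simplified stand-in for struct iommu_iotlb_gather: just a [start, end] range. */
struct gather { unsigned long start, end; };

static bool disjoint(const struct gather *g, unsigned long iova, unsigned long size)
{
	unsigned long end = iova + size - 1;

	/* Disjoint only if something has been gathered and the new range
	 * neither overlaps nor is adjacent to it. */
	return g->end != 0 && (end + 1 < g->start || iova > g->end + 1);
}

static void add_range(struct gather *g, unsigned long iova, unsigned long size)
{
	unsigned long end = iova + size - 1;

	if (g->end == 0) {		/* empty gather */
		g->start = iova;
		g->end = end;
		return;
	}
	if (iova < g->start)
		g->start = iova;
	if (end > g->end)
		g->end = end;
}

/* Mirrors the shape of amd_iommu_iotlb_gather_add_page(): flush early only
 * when a non-present cache (i.e., likely a vIOMMU) is reported and the new
 * range is disjoint from what has been gathered so far. */
static void gather_add_page(struct gather *g, unsigned long iova,
			    unsigned long size, bool np_cache)
{
	if (np_cache && disjoint(g, iova, size)) {
		printf("sync: flush [0x%lx, 0x%lx]\n", g->start, g->end);
		g->start = g->end = 0;
	}
	add_range(g, iova, size);
}

int main(void)
{
	struct gather g = { 0, 0 };

	/* Two disjoint 4 KiB unmaps. With np_cache == false (bare metal) they
	 * are merged into one wide range; set it to true to see the extra
	 * intermediate flush a vIOMMU setup would get. */
	gather_add_page(&g, 0x10000, 0x1000, false);
	gather_add_page(&g, 0x80000, 0x1000, false);
	printf("final gathered range: [0x%lx, 0x%lx]\n", g.start, g.end);
	return 0;
}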
