
Commit 4dc0d12

nicolinc authored and jgunthorpe committed
iommu/tegra241-cmdqv: Add user-space use support
The CMDQV HW supports user-space use for virtualization cases. It allows the VM to issue guest-level TLBI or ATC_INV commands directly to the queue and executes them without a VMEXIT, as HW will replace the VMID field in a TLBI command and the SID field in an ATC_INV command with the preset VMID and SID.

This is built upon the vIOMMU infrastructure by allowing the VMM to allocate a VINTF (as a vIOMMU object) and assign VCMDQs (HW QUEUE objects) to the VINTF. So, firstly, replace the standard vSMMU model with the VINTF implementation, but reuse the standard cache_invalidate op (for unsupported commands) and the standard alloc_domain_nested op (for the standard nested STE).

Each VINTF has two 64KB MMIO pages (128B per logical VCMDQ):
 - Page0 (directly accessed by the guest) has all the control and status bits.
 - Page1 (trapped by the VMM) has guest-owned queue memory location/size info.

The VMM should trap the emulated VINTF0's page1 of the guest VM for the guest-level VCMDQ location/size info and forward that to the kernel, which translates it to a physical memory location to program the VCMDQ HW during an allocation call. Then, it should mmap the assigned VINTF's page0 to the VINTF0 page0 of the guest VM. This allows the guest OS to read and write the guest-owned VINTF's page0 for direct control of the VCMDQ HW.

For ATC invalidation commands that hold an SID, all devices must register their virtual SIDs to the SID_MATCH registers and their physical SIDs to the pairing SID_REPLACE registers, so that HW can use those as a lookup table to replace the virtual SIDs with the correct physical SIDs. Thus, implement the driver-allocated vDEVICE op with a tegra241_vintf_sid structure to allocate a SID_REPLACE register and program the SIDs accordingly.

This enables the HW-accelerated feature on the NVIDIA Grace CPU.
Compared to the standard SMMUv3 operating in nested translation mode and trapping the CMDQ for TLBI and ATC_INV commands, this gives a huge performance improvement: 70% to 90% reductions in invalidation time were measured by various DMA unmap tests running in a guest OS.

Link: https://patch.msgid.link/r/fb0eab83f529440b6aa181798912a6f0afa21eb0.1752126748.git.nicolinc@nvidia.com
Reviewed-by: Jason Gunthorpe <[email protected]>
Reviewed-by: Pranjal Shrivastava <[email protected]>
Signed-off-by: Nicolin Chen <[email protected]>
Signed-off-by: Jason Gunthorpe <[email protected]>
1 parent 81f81db commit 4dc0d12

File tree: 4 files changed, +466 additions, -6 deletions

drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3-iommufd.c (3 additions, 3 deletions)

@@ -225,7 +225,7 @@ static int arm_smmu_validate_vste(struct iommu_hwpt_arm_smmuv3 *arg,
 	return 0;
 }
 
-static struct iommu_domain *
+struct iommu_domain *
 arm_vsmmu_alloc_domain_nested(struct iommufd_viommu *viommu, u32 flags,
 			      const struct iommu_user_data *user_data)
 {
@@ -336,8 +336,8 @@ static int arm_vsmmu_convert_user_cmd(struct arm_vsmmu *vsmmu,
 	return 0;
 }
 
-static int arm_vsmmu_cache_invalidate(struct iommufd_viommu *viommu,
-				      struct iommu_user_data_array *array)
+int arm_vsmmu_cache_invalidate(struct iommufd_viommu *viommu,
+			       struct iommu_user_data_array *array)
 {
 	struct arm_vsmmu *vsmmu = container_of(viommu, struct arm_vsmmu, core);
 	struct arm_smmu_device *smmu = vsmmu->smmu;

drivers/iommu/arm/arm-smmu-v3/arm-smmu-v3.h (7 additions, 0 deletions)

@@ -1057,10 +1057,17 @@ int arm_smmu_attach_prepare_vmaster(struct arm_smmu_attach_state *state,
 void arm_smmu_attach_commit_vmaster(struct arm_smmu_attach_state *state);
 void arm_smmu_master_clear_vmaster(struct arm_smmu_master *master);
 int arm_vmaster_report_event(struct arm_smmu_vmaster *vmaster, u64 *evt);
+struct iommu_domain *
+arm_vsmmu_alloc_domain_nested(struct iommufd_viommu *viommu, u32 flags,
+			      const struct iommu_user_data *user_data);
+int arm_vsmmu_cache_invalidate(struct iommufd_viommu *viommu,
+			       struct iommu_user_data_array *array);
 #else
 #define arm_smmu_get_viommu_size NULL
 #define arm_smmu_hw_info NULL
 #define arm_vsmmu_init NULL
+#define arm_vsmmu_alloc_domain_nested NULL
+#define arm_vsmmu_cache_invalidate NULL
 
 static inline int
 arm_smmu_attach_prepare_vmaster(struct arm_smmu_attach_state *state,
