Commit ec14325

Authored and committed by Alexei Starovoitov
Merge branch 'xdp-metadata-via-kfuncs-for-ice-vlan-hint'

Larysa Zaremba says:

====================
XDP metadata via kfuncs for ice + VLAN hint

This series introduces XDP hints via kfuncs [0] to the ice driver.

Series brings the following existing hints to the ice driver:
 - HW timestamp
 - RX hash with type

Series also introduces VLAN tag with protocol XDP hint; it can now be
accessed by XDP and userspace (AF_XDP) programs. They can also be checked
with xdp_metadata test and xdp_hw_metadata program.

Impact of these patches on ice performance:
ZC:
* Full hints implementation decreases pps in ZC mode by less than 3%
  (64B, rxdrop)

skb (packets with invalid IP, dropped by stack):
* Overall, patchset improves peak performance in skb mode by about 0.5%

[0] https://patchwork.kernel.org/project/netdevbpf/cover/[email protected]/

v7: https://lore.kernel.org/bpf/[email protected]/
v6: https://lore.kernel.org/bpf/[email protected]/
Intermediate RFC v2: https://lore.kernel.org/bpf/[email protected]/
Intermediate RFC v1: https://lore.kernel.org/bpf/[email protected]/
v5: https://lore.kernel.org/bpf/[email protected]/
v4: https://lore.kernel.org/bpf/[email protected]/
v3: https://lore.kernel.org/bpf/[email protected]/
v2: https://lore.kernel.org/bpf/[email protected]/
v1: https://lore.kernel.org/all/[email protected]/

Changes since v7:
* shorten timestamp assignment in ice
* change first argument of ice_fill_rx_descs back to xsk_buff_pool
* fix kernel-doc for ice_run_xdp_zc
* add missing XSK_CHECK_PRIV_TYPE() in ice
* resolved selftests merge conflicts with TX hints
* AF_INET patch adds new packet generation, not replaces AF_XDP one
* fix destination port in xdp_metadata

Changes since v6:
* add ability to fill cb of all xdp_buffs in xsk_buff_pool
* place just pointer to packet context in ice_xdp_buff
* add const qualifiers in veth implementation
* generate uapi for VLAN hint

Changes since v5:
* drop checksum hint from the patchset entirely
* Alex's patch that lifts the data_meta size limitation is no longer
  required in this patchset, so will be sent separately
* new patch: hide some ice hints code behind a static key
* fix several bugs in ZC mode (ice)
* change argument order in VLAN hint kfunc (tci, proto -> proto, tci)
* cosmetic changes
* analyze performance impact

Changes since v4:
* Drop the concept of partial checksum from the hint design
* Drop the concept of checksum level from the hint design

Changes since v3:
* use XDP_CHECKSUM_VALID_LVL0 + csum_level instead of csum_level + 1
* fix spelling mistakes
* read XDP timestamp unconditionally
* add TO_STR() macro

Changes since v2:
* redesign checksum hint, so now it gives full status
* rename vlan_tag -> vlan_tci, where applicable
* use open_netns() and close_netns() in xdp_metadata
* improve VLAN hint documentation
* replace CFI with DEI
* use VLAN_VID_MASK in xdp_metadata
* make vlan_get_tag() return -ENODATA
* remove unused rx_ptype in ice_xsk.c
* fix ice timestamp code division between patches

Changes since v1:
* directly return RX hash, RX timestamp and RX checksum status
  in skb-common functions
* use intermediate enum value for checksum status in ice
* get rid of ring structure dependency in ice kfunc implementation
* make variables const, when possible, in ice implementation
* use -ENODATA instead of -EOPNOTSUPP for driver implementation
* instead of having 2 separate functions for c-tag and s-tag,
  use 1 function that outputs both VLAN tag and protocol ID
* improve documentation for introduced hints
* update xdp_metadata selftest to test new hints
* implement new hints in veth, so they can be tested in xdp_metadata
* parse VLAN tag in xdp_hw_metadata
====================

Link: https://lore.kernel.org/r/[email protected]
Signed-off-by: Alexei Starovoitov <[email protected]>

2 parents 7337632 + 4c6612f commit ec14325

31 files changed, +850 -309 lines changed

Documentation/netlink/specs/netdev.yaml

Lines changed: 4 additions & 0 deletions
@@ -54,6 +54,10 @@ definitions:
         name: hash
         doc:
           Device is capable of exposing receive packet hash via bpf_xdp_metadata_rx_hash().
+      -
+        name: vlan-tag
+        doc:
+          Device is capable of exposing receive packet VLAN tag via bpf_xdp_metadata_rx_vlan_tag().
       -
         type: flags
         name: xsk-flags
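
The cover letter says the new hint outputs both a VLAN protocol ID and a TCI, and that the selftest uses VLAN_VID_MASK to extract the VLAN ID. The following is a minimal user-space sketch of how a consumer might split a TCI into its fields. It assumes the TCI is already in host byte order and follows the standard 802.1Q layout; the names `vlan_fields`, `vlan_tci_split` and the `TCI_*` macros are hypothetical and are not part of this series.

```c
#include <assert.h>
#include <stdint.h>

/* 802.1Q TCI layout, most significant bit first: PCP(3) | DEI(1) | VID(12). */
#define TCI_VID_MASK  0x0fffu
#define TCI_DEI_MASK  0x1000u
#define TCI_PCP_SHIFT 13

static_assert((TCI_VID_MASK | TCI_DEI_MASK | (0x7u << TCI_PCP_SHIFT)) == 0xffffu,
	      "the three fields must cover all 16 TCI bits");

/* Hypothetical container for the decoded fields. */
struct vlan_fields {
	uint16_t vid;	/* VLAN identifier, 0..4095 */
	uint8_t dei;	/* drop eligible indicator, 0 or 1 */
	uint8_t pcp;	/* priority code point, 0..7 */
};

/* Split a host-byte-order TCI into VID, DEI and PCP. */
static struct vlan_fields vlan_tci_split(uint16_t tci)
{
	struct vlan_fields f;

	f.vid = tci & TCI_VID_MASK;
	f.dei = (tci & TCI_DEI_MASK) ? 1 : 0;
	f.pcp = (uint8_t)(tci >> TCI_PCP_SHIFT);
	return f;
}
```

For example, a TCI of 0xA064 would decode to priority 5, DEI clear, VLAN ID 100.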

Documentation/networking/xdp-rx-metadata.rst

Lines changed: 7 additions & 1 deletion
@@ -20,7 +20,13 @@ Currently, the following kfuncs are supported. In the future, as more
 metadata is supported, this set will grow:

 .. kernel-doc:: net/core/xdp.c
-   :identifiers: bpf_xdp_metadata_rx_timestamp bpf_xdp_metadata_rx_hash
+   :identifiers: bpf_xdp_metadata_rx_timestamp
+
+.. kernel-doc:: net/core/xdp.c
+   :identifiers: bpf_xdp_metadata_rx_hash
+
+.. kernel-doc:: net/core/xdp.c
+   :identifiers: bpf_xdp_metadata_rx_vlan_tag

 An XDP program can use these kfuncs to read the metadata into stack
 variables for its own consumption. Or, to pass the metadata on to other

drivers/net/ethernet/intel/ice/ice.h

Lines changed: 2 additions & 0 deletions
@@ -996,4 +996,6 @@ static inline void ice_clear_rdma_cap(struct ice_pf *pf)
 	set_bit(ICE_FLAG_UNPLUG_AUX_DEV, pf->flags);
 	clear_bit(ICE_FLAG_RDMA_ENA, pf->flags);
 }
+
+extern const struct xdp_metadata_ops ice_xdp_md_ops;
 #endif /* _ICE_H_ */

drivers/net/ethernet/intel/ice/ice_base.c

Lines changed: 15 additions & 0 deletions
@@ -519,6 +519,19 @@ static int ice_setup_rx_ctx(struct ice_rx_ring *ring)
 	return 0;
 }

+static void ice_xsk_pool_fill_cb(struct ice_rx_ring *ring)
+{
+	void *ctx_ptr = &ring->pkt_ctx;
+	struct xsk_cb_desc desc = {};
+
+	XSK_CHECK_PRIV_TYPE(struct ice_xdp_buff);
+	desc.src = &ctx_ptr;
+	desc.off = offsetof(struct ice_xdp_buff, pkt_ctx) -
+		   sizeof(struct xdp_buff);
+	desc.bytes = sizeof(ctx_ptr);
+	xsk_pool_fill_cb(ring->xsk_pool, &desc);
+}
+
 /**
  * ice_vsi_cfg_rxq - Configure an Rx queue
  * @ring: the ring being configured
@@ -553,6 +566,7 @@ int ice_vsi_cfg_rxq(struct ice_rx_ring *ring)
 			if (err)
 				return err;
 			xsk_pool_set_rxq_info(ring->xsk_pool, &ring->xdp_rxq);
+			ice_xsk_pool_fill_cb(ring);

 			dev_info(dev, "Registered XDP mem model MEM_TYPE_XSK_BUFF_POOL on Rx ring %d\n",
 				 ring->q_index);
@@ -575,6 +589,7 @@ int ice_vsi_cfg_rxq(struct ice_rx_ring *ring)

 	xdp_init_buff(&ring->xdp, ice_rx_pg_size(ring) / 2, &ring->xdp_rxq);
 	ring->xdp.data = NULL;
+	ring->xdp_ext.pkt_ctx = &ring->pkt_ctx;
 	err = ice_setup_rx_ctx(ring);
 	if (err) {
 		dev_err(dev, "ice_setup_rx_ctx failed for RxQ %d, err %d\n",
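
The offset arithmetic in ice_xsk_pool_fill_cb() can be easier to follow in isolation. The sketch below is a user-space model, not kernel code: every `mock_*` type and function is a hypothetical stand-in, and it assumes the pool stores a driver-private area ("cb") directly after the generic buffer. It shows why the destination offset is the field offset in the extended buffer minus the size of the generic part.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins; the real kernel structures are larger. */
struct mock_xdp_buff { void *data; void *data_end; };
struct mock_pkt_ctx { uint64_t cached_phctime; uint16_t vlan_proto; };

/* Driver view of a buffer: generic part first, private fields after it. */
struct mock_drv_buff {
	struct mock_xdp_buff xdp_buff;
	const void *eop_desc;
	const struct mock_pkt_ctx *pkt_ctx;
};

/* Pool view of the same memory: generic part plus an opaque cb area. */
struct mock_pool_buff {
	struct mock_xdp_buff xdp;
	unsigned char cb[24];
};

static_assert(offsetof(struct mock_drv_buff, xdp_buff) == 0,
	      "generic part must come first");
static_assert(offsetof(struct mock_pool_buff, cb) == sizeof(struct mock_xdp_buff),
	      "cb must start right after the generic part");
static_assert(sizeof(struct mock_drv_buff) <= sizeof(struct mock_pool_buff),
	      "private fields must fit in cb");

struct mock_cb_desc { const void *src; size_t off; size_t bytes; };

/* Copy the same bytes into the cb area of every buffer in the pool. */
static void mock_pool_fill_cb(struct mock_pool_buff *bufs, size_t n,
			      const struct mock_cb_desc *d)
{
	for (size_t i = 0; i < n; i++)
		memcpy(bufs[i].cb + d->off, d->src, d->bytes);
}

/* Store a pointer to one shared packet context in every buffer. */
static void mock_fill_pkt_ctx(struct mock_pool_buff *bufs, size_t n,
			      const struct mock_pkt_ctx *ctx)
{
	const struct mock_pkt_ctx *ctx_ptr = ctx;
	struct mock_cb_desc d = {
		.src = &ctx_ptr,	/* copy the pointer value itself */
		.off = offsetof(struct mock_drv_buff, pkt_ctx) -
		       sizeof(struct mock_xdp_buff),
		.bytes = sizeof(ctx_ptr),
	};

	mock_pool_fill_cb(bufs, n, &d);
}

/* Read the pointer back at the driver-view offset. */
static const struct mock_pkt_ctx *mock_get_pkt_ctx(const struct mock_pool_buff *b)
{
	const struct mock_pkt_ctx *ctx;

	memcpy(&ctx, (const unsigned char *)b +
		     offsetof(struct mock_drv_buff, pkt_ctx), sizeof(ctx));
	return ctx;
}
```

Storing only a pointer keeps the per-buffer cost at one pointer-sized copy, which matches the v6 changelog entry "place just pointer to packet context in ice_xdp_buff".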

drivers/net/ethernet/intel/ice/ice_lan_tx_rx.h

Lines changed: 208 additions & 204 deletions
Large diffs are not rendered by default.

drivers/net/ethernet/intel/ice/ice_main.c

Lines changed: 21 additions & 0 deletions
@@ -3397,6 +3397,7 @@ static void ice_set_ops(struct ice_vsi *vsi)

 	netdev->netdev_ops = &ice_netdev_ops;
 	netdev->udp_tunnel_nic_info = &pf->hw.udp_tunnel_nic;
+	netdev->xdp_metadata_ops = &ice_xdp_md_ops;
 	ice_set_ethtool_ops(netdev);

 	if (vsi->type != ICE_VSI_PF)
@@ -6042,6 +6043,23 @@ ice_fix_features(struct net_device *netdev, netdev_features_t features)
 	return features;
 }

+/**
+ * ice_set_rx_rings_vlan_proto - update rings with new stripped VLAN proto
+ * @vsi: PF's VSI
+ * @vlan_ethertype: VLAN ethertype (802.1Q or 802.1ad) in network byte order
+ *
+ * Store current stripped VLAN proto in ring packet context,
+ * so it can be accessed more efficiently by packet processing code.
+ */
+static void
+ice_set_rx_rings_vlan_proto(struct ice_vsi *vsi, __be16 vlan_ethertype)
+{
+	u16 i;
+
+	ice_for_each_alloc_rxq(vsi, i)
+		vsi->rx_rings[i]->pkt_ctx.vlan_proto = vlan_ethertype;
+}
+
 /**
  * ice_set_vlan_offload_features - set VLAN offload features for the PF VSI
  * @vsi: PF's VSI
@@ -6084,6 +6102,9 @@ ice_set_vlan_offload_features(struct ice_vsi *vsi, netdev_features_t features)
 	if (strip_err || insert_err)
 		return -EIO;

+	ice_set_rx_rings_vlan_proto(vsi, enable_stripping ?
+				    htons(vlan_ethertype) : 0);
+
 	return 0;
 }
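
The hunk above caches the stripped VLAN protocol per ring in network byte order, or 0 when stripping is disabled. A small user-space sketch of that convention follows. It assumes a reader treats 0 as "no stripped tag"; the function names and the `VLAN_PROTO_*` macros are hypothetical and only illustrate the byte-order handling.

```c
#include <arpa/inet.h>	/* htons(), ntohs() */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VLAN_PROTO_8021Q  0x8100u
#define VLAN_PROTO_8021AD 0x88A8u

/* Value to cache: big-endian ethertype when stripping is on, 0 otherwise. */
static uint16_t cached_vlan_proto(bool stripping, uint16_t ethertype_host)
{
	return stripping ? htons(ethertype_host) : 0;
}

/* Reader side: 0 is taken to mean that no stripped tag is available. */
static bool vlan_proto_known(uint16_t proto_be, uint16_t *ethertype_host)
{
	if (!proto_be)
		return false;
	*ethertype_host = ntohs(proto_be);
	return true;
}

/* Usage example: round-trip both supported ethertypes. */
void vlan_proto_example(void)
{
	uint16_t host = 0;

	assert(vlan_proto_known(cached_vlan_proto(true, VLAN_PROTO_8021Q), &host));
	assert(host == VLAN_PROTO_8021Q);
	assert(!vlan_proto_known(cached_vlan_proto(false, VLAN_PROTO_8021AD), &host));
}
```

Converting once at configuration time means the per-packet path can hand the stored value on without any byte swapping.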

drivers/net/ethernet/intel/ice/ice_ptp.c

Lines changed: 8 additions & 14 deletions
@@ -2127,30 +2127,26 @@ int ice_ptp_set_ts_config(struct ice_pf *pf, struct ifreq *ifr)
 }

 /**
- * ice_ptp_rx_hwtstamp - Check for an Rx timestamp
- * @rx_ring: Ring to get the VSI info
+ * ice_ptp_get_rx_hwts - Get packet Rx timestamp in ns
  * @rx_desc: Receive descriptor
- * @skb: Particular skb to send timestamp with
+ * @pkt_ctx: Packet context to get the cached time
  *
  * The driver receives a notification in the receive descriptor with timestamp.
- * The timestamp is in ns, so we must convert the result first.
  */
-void
-ice_ptp_rx_hwtstamp(struct ice_rx_ring *rx_ring,
-		    union ice_32b_rx_flex_desc *rx_desc, struct sk_buff *skb)
+u64 ice_ptp_get_rx_hwts(const union ice_32b_rx_flex_desc *rx_desc,
+			const struct ice_pkt_ctx *pkt_ctx)
 {
-	struct skb_shared_hwtstamps *hwtstamps;
 	u64 ts_ns, cached_time;
 	u32 ts_high;

 	if (!(rx_desc->wb.time_stamp_low & ICE_PTP_TS_VALID))
-		return;
+		return 0;

-	cached_time = READ_ONCE(rx_ring->cached_phctime);
+	cached_time = READ_ONCE(pkt_ctx->cached_phctime);

 	/* Do not report a timestamp if we don't have a cached PHC time */
 	if (!cached_time)
-		return;
+		return 0;

 	/* Use ice_ptp_extend_32b_ts directly, using the ring-specific cached
 	 * PHC value, rather than accessing the PF. This also allows us to
@@ -2161,9 +2157,7 @@ ice_ptp_rx_hwtstamp(struct ice_rx_ring *rx_ring,
 	ts_high = le32_to_cpu(rx_desc->wb.flex_ts.ts_high);
 	ts_ns = ice_ptp_extend_32b_ts(cached_time, ts_high);

-	hwtstamps = skb_hwtstamps(skb);
-	memset(hwtstamps, 0, sizeof(*hwtstamps));
-	hwtstamps->hwtstamp = ns_to_ktime(ts_ns);
+	return ts_ns;
 }

 /**
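
The diff does not show the body of ice_ptp_extend_32b_ts(), so the following is only a sketch of one common way to extend a 32-bit hardware timestamp with a cached 64-bit clock reading. It assumes the cached reading lies within half of the 32-bit range of the capture time; `extend_32b_ts` and `rx_hwts_ns` are hypothetical user-space names. The second function mirrors the new return convention, where 0 means "no timestamp".

```c
#include <assert.h>
#include <stdint.h>

/* Extend a 32-bit timestamp to 64 bits using a nearby 64-bit reading. */
static uint64_t extend_32b_ts(uint64_t cached_time, uint32_t ts32)
{
	uint32_t cached_lo = (uint32_t)cached_time;
	uint32_t delta = ts32 - cached_lo;	/* wraps modulo 2^32 */

	if (delta > UINT32_MAX / 2) {
		/* The timestamp was captured before the cached reading. */
		delta = cached_lo - ts32;
		return cached_time - delta;
	}
	/* The timestamp was captured at or after the cached reading. */
	return cached_time + delta;
}

/* Return the Rx timestamp in ns, or 0 when none can be reported. */
static uint64_t rx_hwts_ns(int ts_valid, uint64_t cached_time, uint32_t ts_high)
{
	if (!ts_valid || !cached_time)
		return 0;
	return extend_32b_ts(cached_time, ts_high);
}

/* Usage example: one timestamp just after and one just before the cache. */
void rx_hwts_example(void)
{
	assert(rx_hwts_ns(1, 0x100000100ULL, 0x180) == 0x100000180ULL);
	assert(rx_hwts_ns(1, 0x100000100ULL, 0x0F0) == 0x1000000F0ULL);
}
```

Returning the value instead of writing into an skb lets the same helper serve both the skb path and the XDP metadata path, which is the direction the refactoring above takes.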

drivers/net/ethernet/intel/ice/ice_ptp.h

Lines changed: 10 additions & 6 deletions
@@ -298,9 +298,8 @@ void ice_ptp_extts_event(struct ice_pf *pf);
 s8 ice_ptp_request_ts(struct ice_ptp_tx *tx, struct sk_buff *skb);
 enum ice_tx_tstamp_work ice_ptp_process_ts(struct ice_pf *pf);

-void
-ice_ptp_rx_hwtstamp(struct ice_rx_ring *rx_ring,
-		    union ice_32b_rx_flex_desc *rx_desc, struct sk_buff *skb);
+u64 ice_ptp_get_rx_hwts(const union ice_32b_rx_flex_desc *rx_desc,
+			const struct ice_pkt_ctx *pkt_ctx);
 void ice_ptp_reset(struct ice_pf *pf);
 void ice_ptp_prepare_for_reset(struct ice_pf *pf);
 void ice_ptp_init(struct ice_pf *pf);
@@ -329,9 +328,14 @@ static inline bool ice_ptp_process_ts(struct ice_pf *pf)
 {
 	return true;
 }
-static inline void
-ice_ptp_rx_hwtstamp(struct ice_rx_ring *rx_ring,
-		    union ice_32b_rx_flex_desc *rx_desc, struct sk_buff *skb) { }
+
+static inline u64
+ice_ptp_get_rx_hwts(const union ice_32b_rx_flex_desc *rx_desc,
+		    const struct ice_pkt_ctx *pkt_ctx)
+{
+	return 0;
+}
+
 static inline void ice_ptp_reset(struct ice_pf *pf) { }
 static inline void ice_ptp_prepare_for_reset(struct ice_pf *pf) { }
 static inline void ice_ptp_init(struct ice_pf *pf) { }

drivers/net/ethernet/intel/ice/ice_txrx.c

Lines changed: 9 additions & 10 deletions
@@ -557,20 +557,23 @@ ice_rx_frame_truesize(struct ice_rx_ring *rx_ring, const unsigned int size)
  * @xdp_prog: XDP program to run
  * @xdp_ring: ring to be used for XDP_TX action
  * @rx_buf: Rx buffer to store the XDP action
+ * @eop_desc: Last descriptor in packet to read metadata from
  *
  * Returns any of ICE_XDP_{PASS, CONSUMED, TX, REDIR}
  */
 static void
 ice_run_xdp(struct ice_rx_ring *rx_ring, struct xdp_buff *xdp,
 	    struct bpf_prog *xdp_prog, struct ice_tx_ring *xdp_ring,
-	    struct ice_rx_buf *rx_buf)
+	    struct ice_rx_buf *rx_buf, union ice_32b_rx_flex_desc *eop_desc)
 {
 	unsigned int ret = ICE_XDP_PASS;
 	u32 act;

 	if (!xdp_prog)
 		goto exit;

+	ice_xdp_meta_set_desc(xdp, eop_desc);
+
 	act = bpf_prog_run_xdp(xdp_prog, xdp);
 	switch (act) {
 	case XDP_PASS:
@@ -1180,8 +1183,7 @@ int ice_clean_rx_irq(struct ice_rx_ring *rx_ring, int budget)
 		struct sk_buff *skb;
 		unsigned int size;
 		u16 stat_err_bits;
-		u16 vlan_tag = 0;
-		u16 rx_ptype;
+		u16 vlan_tci;

 		/* get the Rx desc from Rx ring based on 'next_to_clean' */
 		rx_desc = ICE_RX_DESC(rx_ring, ntc);
@@ -1241,7 +1243,7 @@ int ice_clean_rx_irq(struct ice_rx_ring *rx_ring, int budget)
 		if (ice_is_non_eop(rx_ring, rx_desc))
 			continue;

-		ice_run_xdp(rx_ring, xdp, xdp_prog, xdp_ring, rx_buf);
+		ice_run_xdp(rx_ring, xdp, xdp_prog, xdp_ring, rx_buf, rx_desc);
 		if (rx_buf->act == ICE_XDP_PASS)
 			goto construct_skb;
 		total_rx_bytes += xdp_get_buff_len(xdp);
@@ -1276,7 +1278,7 @@ int ice_clean_rx_irq(struct ice_rx_ring *rx_ring, int budget)
 			continue;
 		}

-		vlan_tag = ice_get_vlan_tag_from_rx_desc(rx_desc);
+		vlan_tci = ice_get_vlan_tci(rx_desc);

 		/* pad the skb if needed, to make a valid ethernet frame */
 		if (eth_skb_pad(skb))
@@ -1286,14 +1288,11 @@ int ice_clean_rx_irq(struct ice_rx_ring *rx_ring, int budget)
 		total_rx_bytes += skb->len;

 		/* populate checksum, VLAN, and protocol */
-		rx_ptype = le16_to_cpu(rx_desc->wb.ptype_flex_flags0) &
-			ICE_RX_FLEX_DESC_PTYPE_M;
-
-		ice_process_skb_fields(rx_ring, rx_desc, skb, rx_ptype);
+		ice_process_skb_fields(rx_ring, rx_desc, skb);

 		ice_trace(clean_rx_irq_indicate, rx_ring, rx_desc, skb);
 		/* send completed skb up the stack */
-		ice_receive_skb(rx_ring, skb, vlan_tag);
+		ice_receive_skb(rx_ring, skb, vlan_tci);

 		/* update budget accounting */
 		total_rx_pkts++;

drivers/net/ethernet/intel/ice/ice_txrx.h

Lines changed: 28 additions & 4 deletions
@@ -257,6 +257,20 @@ enum ice_rx_dtype {
 	ICE_RX_DTYPE_SPLIT_ALWAYS = 2,
 };

+struct ice_pkt_ctx {
+	u64 cached_phctime;
+	__be16 vlan_proto;
+};
+
+struct ice_xdp_buff {
+	struct xdp_buff xdp_buff;
+	const union ice_32b_rx_flex_desc *eop_desc;
+	const struct ice_pkt_ctx *pkt_ctx;
+};
+
+/* Required for compatibility with xdp_buffs from xsk_pool */
+static_assert(offsetof(struct ice_xdp_buff, xdp_buff) == 0);
+
 /* indices into GLINT_ITR registers */
 #define ICE_RX_ITR ICE_IDX_ITR0
 #define ICE_TX_ITR ICE_IDX_ITR1
@@ -298,7 +312,6 @@ enum ice_dynamic_itr {
 /* descriptor ring, associated with a VSI */
 struct ice_rx_ring {
 	/* CL1 - 1st cacheline starts here */
-	struct ice_rx_ring *next; /* pointer to next ring in q_vector */
 	void *desc; /* Descriptor ring memory */
 	struct device *dev; /* Used for DMA mapping */
 	struct net_device *netdev; /* netdev ring maps to */
@@ -310,13 +323,24 @@ struct ice_rx_ring {
 	u16 count; /* Number of descriptors */
 	u16 reg_idx; /* HW register index of the ring */
 	u16 next_to_alloc;
-	/* CL2 - 2nd cacheline starts here */
+
 	union {
 		struct ice_rx_buf *rx_buf;
 		struct xdp_buff **xdp_buf;
 	};
-	struct xdp_buff xdp;
+	/* CL2 - 2nd cacheline starts here */
+	union {
+		struct ice_xdp_buff xdp_ext;
+		struct xdp_buff xdp;
+	};
 	/* CL3 - 3rd cacheline starts here */
+	union {
+		struct ice_pkt_ctx pkt_ctx;
+		struct {
+			u64 cached_phctime;
+			__be16 vlan_proto;
+		};
+	};
 	struct bpf_prog *xdp_prog;
 	u16 rx_offset;

@@ -332,9 +356,9 @@ struct ice_rx_ring {
 	/* CL4 - 4th cacheline starts here */
 	struct ice_channel *ch;
 	struct ice_tx_ring *xdp_ring;
+	struct ice_rx_ring *next; /* pointer to next ring in q_vector */
 	struct xsk_buff_pool *xsk_pool;
 	dma_addr_t dma; /* physical address of ring */
-	u64 cached_phctime;
 	u16 rx_buf_len;
 	u8 dcb_tc; /* Traffic class of ring */
 	u8 ptp_rx;
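
The ring now wraps the packet context in a union with an anonymous struct that has the same members. A user-space sketch of that pattern is shown below, under the assumption that the intent is to keep existing `ring->cached_phctime` style accesses working while also allowing the context to be passed around as one object. The `mock_*` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct mock_pkt_ctx {
	uint64_t cached_phctime;
	uint16_t vlan_proto;
};

struct mock_ring {
	union {
		/* Named view: can be handed to helpers as a single object. */
		struct mock_pkt_ctx pkt_ctx;
		/* Anonymous view: members are reachable directly on the ring. */
		struct {
			uint64_t cached_phctime;
			uint16_t vlan_proto;
		};
	};
	int other_state;
};

/* Both views must place each member at the same offset. */
static_assert(offsetof(struct mock_ring, cached_phctime) ==
	      offsetof(struct mock_ring, pkt_ctx) +
	      offsetof(struct mock_pkt_ctx, cached_phctime),
	      "cached_phctime views must overlap");
static_assert(offsetof(struct mock_ring, vlan_proto) ==
	      offsetof(struct mock_ring, pkt_ctx) +
	      offsetof(struct mock_pkt_ctx, vlan_proto),
	      "vlan_proto views must overlap");

/* Writer that uses the short, direct member name. */
static void mock_ring_set_time(struct mock_ring *r, uint64_t t)
{
	r->cached_phctime = t;
}

/* Reader that only needs the context, not the whole ring. */
static uint64_t mock_ctx_time(const struct mock_pkt_ctx *ctx)
{
	return ctx->cached_phctime;
}
```

A caller would write through the short name and read through the context, for example `mock_ring_set_time(&ring, t)` followed by `mock_ctx_time(&ring.pkt_ctx)`.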
