Skip to content

Commit 7b589a9

Browse files
dhowellsbrauner
authored andcommitted
netfs: Fix handling of USE_PGPRIV2 and WRITE_TO_CACHE flags
The NETFS_RREQ_USE_PGPRIV2 and NETFS_RREQ_WRITE_TO_CACHE flags aren't used correctly. The problem is that we try to set them up in the request initialisation, but we the cache may be in the process of setting up still, and so the state may not be correct. Further, we secondarily sample the cache state and make contradictory decisions later. The issue arises because we set up the cache resources, which allows the cache's ->prepare_read() to switch on NETFS_SREQ_COPY_TO_CACHE - which triggers cache writing even if we didn't set the flags when allocating. Fix this in the following way: (1) Drop NETFS_ICTX_USE_PGPRIV2 and instead set NETFS_RREQ_USE_PGPRIV2 in ->init_request() rather than trying to juggle that in netfs_alloc_request(). (2) Repurpose NETFS_RREQ_USE_PGPRIV2 to merely indicate that if caching is to be done, then PG_private_2 is to be used rather than only setting it if we decide to cache and then having netfs_rreq_unlock_folios() set the non-PG_private_2 writeback-to-cache if it wasn't set. (3) Split netfs_rreq_unlock_folios() into two functions, one of which contains the deprecated code for using PG_private_2 to avoid accidentally doing the writeback path - and always use it if USE_PGPRIV2 is set. (4) As NETFS_ICTX_USE_PGPRIV2 is removed, make netfs_write_begin() always wait for PG_private_2. This function is deprecated and only used by ceph anyway, and so label it so. (5) Drop the NETFS_RREQ_WRITE_TO_CACHE flag and use fscache_operation_valid() on the cache_resources instead. This has the advantage of picking up the result of netfs_begin_cache_read() and fscache_begin_write_operation() - which are called after the object is initialised and will wait for the cache to come to a usable state. Just reverting ae67831[1] isn't a sufficient fix, so this need to be applied on top of that. Without this as well, things like: rcu: INFO: rcu_sched detected expedited stalls on CPUs/tasks: { and: WARNING: CPU: 13 PID: 3621 at fs/ceph/caps.c:3386 may happen, along with some UAFs due to PG_private_2 not getting used to wait on writeback completion. Fixes: 2ff1e97 ("netfs: Replace PG_fscache by setting folio->private and marking dirty") Reported-by: Max Kellermann <[email protected]> Signed-off-by: David Howells <[email protected]> cc: Ilya Dryomov <[email protected]> cc: Xiubo Li <[email protected]> cc: Hristo Venev <[email protected]> cc: Jeff Layton <[email protected]> cc: Matthew Wilcox <[email protected]> cc: [email protected] cc: [email protected] cc: [email protected] cc: [email protected] Link: https://lore.kernel.org/r/[email protected]/ [1] Link: https://lore.kernel.org/r/[email protected] Signed-off-by: Christian Brauner <[email protected]>
1 parent 8e5ced7 commit 7b589a9

File tree

9 files changed

+116
-36
lines changed

9 files changed

+116
-36
lines changed

fs/ceph/addr.c

Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -424,6 +424,9 @@ static int ceph_init_request(struct netfs_io_request *rreq, struct file *file)
424424
struct ceph_netfs_request_data *priv;
425425
int ret = 0;
426426

427+
/* [DEPRECATED] Use PG_private_2 to mark folio being written to the cache. */
428+
__set_bit(NETFS_RREQ_USE_PGPRIV2, &rreq->flags);
429+
427430
if (rreq->origin != NETFS_READAHEAD)
428431
return 0;
429432

fs/ceph/inode.c

Lines changed: 0 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -577,8 +577,6 @@ struct inode *ceph_alloc_inode(struct super_block *sb)
577577

578578
/* Set parameters for the netfs library */
579579
netfs_inode_init(&ci->netfs, &ceph_netfs_ops, false);
580-
/* [DEPRECATED] Use PG_private_2 to mark folio being written to the cache. */
581-
__set_bit(NETFS_ICTX_USE_PGPRIV2, &ci->netfs.flags);
582580

583581
spin_lock_init(&ci->i_ceph_lock);
584582

fs/netfs/buffered_read.c

Lines changed: 107 additions & 18 deletions
Original file line numberDiff line numberDiff line change
@@ -9,6 +9,97 @@
99
#include <linux/task_io_accounting_ops.h>
1010
#include "internal.h"
1111

12+
/*
13+
* [DEPRECATED] Unlock the folios in a read operation for when the filesystem
14+
* is using PG_private_2 and direct writing to the cache from here rather than
15+
* marking the page for writeback.
16+
*
17+
* Note that we don't touch folio->private in this code.
18+
*/
19+
static void netfs_rreq_unlock_folios_pgpriv2(struct netfs_io_request *rreq,
20+
size_t *account)
21+
{
22+
struct netfs_io_subrequest *subreq;
23+
struct folio *folio;
24+
pgoff_t start_page = rreq->start / PAGE_SIZE;
25+
pgoff_t last_page = ((rreq->start + rreq->len) / PAGE_SIZE) - 1;
26+
bool subreq_failed = false;
27+
28+
XA_STATE(xas, &rreq->mapping->i_pages, start_page);
29+
30+
/* Walk through the pagecache and the I/O request lists simultaneously.
31+
* We may have a mixture of cached and uncached sections and we only
32+
* really want to write out the uncached sections. This is slightly
33+
* complicated by the possibility that we might have huge pages with a
34+
* mixture inside.
35+
*/
36+
subreq = list_first_entry(&rreq->subrequests,
37+
struct netfs_io_subrequest, rreq_link);
38+
subreq_failed = (subreq->error < 0);
39+
40+
trace_netfs_rreq(rreq, netfs_rreq_trace_unlock_pgpriv2);
41+
42+
rcu_read_lock();
43+
xas_for_each(&xas, folio, last_page) {
44+
loff_t pg_end;
45+
bool pg_failed = false;
46+
bool folio_started = false;
47+
48+
if (xas_retry(&xas, folio))
49+
continue;
50+
51+
pg_end = folio_pos(folio) + folio_size(folio) - 1;
52+
53+
for (;;) {
54+
loff_t sreq_end;
55+
56+
if (!subreq) {
57+
pg_failed = true;
58+
break;
59+
}
60+
61+
if (!folio_started &&
62+
test_bit(NETFS_SREQ_COPY_TO_CACHE, &subreq->flags) &&
63+
fscache_operation_valid(&rreq->cache_resources)) {
64+
trace_netfs_folio(folio, netfs_folio_trace_copy_to_cache);
65+
folio_start_private_2(folio);
66+
folio_started = true;
67+
}
68+
69+
pg_failed |= subreq_failed;
70+
sreq_end = subreq->start + subreq->len - 1;
71+
if (pg_end < sreq_end)
72+
break;
73+
74+
*account += subreq->transferred;
75+
if (!list_is_last(&subreq->rreq_link, &rreq->subrequests)) {
76+
subreq = list_next_entry(subreq, rreq_link);
77+
subreq_failed = (subreq->error < 0);
78+
} else {
79+
subreq = NULL;
80+
subreq_failed = false;
81+
}
82+
83+
if (pg_end == sreq_end)
84+
break;
85+
}
86+
87+
if (!pg_failed) {
88+
flush_dcache_folio(folio);
89+
folio_mark_uptodate(folio);
90+
}
91+
92+
if (!test_bit(NETFS_RREQ_DONT_UNLOCK_FOLIOS, &rreq->flags)) {
93+
if (folio->index == rreq->no_unlock_folio &&
94+
test_bit(NETFS_RREQ_NO_UNLOCK_FOLIO, &rreq->flags))
95+
_debug("no unlock");
96+
else
97+
folio_unlock(folio);
98+
}
99+
}
100+
rcu_read_unlock();
101+
}
102+
12103
/*
13104
* Unlock the folios in a read operation. We need to set PG_writeback on any
14105
* folios we're going to write back before we unlock them.
@@ -35,6 +126,12 @@ void netfs_rreq_unlock_folios(struct netfs_io_request *rreq)
35126
}
36127
}
37128

129+
/* Handle deprecated PG_private_2 case. */
130+
if (test_bit(NETFS_RREQ_USE_PGPRIV2, &rreq->flags)) {
131+
netfs_rreq_unlock_folios_pgpriv2(rreq, &account);
132+
goto out;
133+
}
134+
38135
/* Walk through the pagecache and the I/O request lists simultaneously.
39136
* We may have a mixture of cached and uncached sections and we only
40137
* really want to write out the uncached sections. This is slightly
@@ -52,7 +149,6 @@ void netfs_rreq_unlock_folios(struct netfs_io_request *rreq)
52149
loff_t pg_end;
53150
bool pg_failed = false;
54151
bool wback_to_cache = false;
55-
bool folio_started = false;
56152

57153
if (xas_retry(&xas, folio))
58154
continue;
@@ -66,17 +162,8 @@ void netfs_rreq_unlock_folios(struct netfs_io_request *rreq)
66162
pg_failed = true;
67163
break;
68164
}
69-
if (test_bit(NETFS_RREQ_USE_PGPRIV2, &rreq->flags)) {
70-
if (!folio_started && test_bit(NETFS_SREQ_COPY_TO_CACHE,
71-
&subreq->flags)) {
72-
trace_netfs_folio(folio, netfs_folio_trace_copy_to_cache);
73-
folio_start_private_2(folio);
74-
folio_started = true;
75-
}
76-
} else {
77-
wback_to_cache |=
78-
test_bit(NETFS_SREQ_COPY_TO_CACHE, &subreq->flags);
79-
}
165+
166+
wback_to_cache |= test_bit(NETFS_SREQ_COPY_TO_CACHE, &subreq->flags);
80167
pg_failed |= subreq_failed;
81168
sreq_end = subreq->start + subreq->len - 1;
82169
if (pg_end < sreq_end)
@@ -124,6 +211,7 @@ void netfs_rreq_unlock_folios(struct netfs_io_request *rreq)
124211
}
125212
rcu_read_unlock();
126213

214+
out:
127215
task_io_account_read(account);
128216
if (rreq->netfs_ops->done)
129217
rreq->netfs_ops->done(rreq);
@@ -395,7 +483,7 @@ static bool netfs_skip_folio_read(struct folio *folio, loff_t pos, size_t len,
395483
}
396484

397485
/**
398-
* netfs_write_begin - Helper to prepare for writing
486+
* netfs_write_begin - Helper to prepare for writing [DEPRECATED]
399487
* @ctx: The netfs context
400488
* @file: The file to read from
401489
* @mapping: The mapping to read from
@@ -426,6 +514,9 @@ static bool netfs_skip_folio_read(struct folio *folio, loff_t pos, size_t len,
426514
* inode before calling this.
427515
*
428516
* This is usable whether or not caching is enabled.
517+
*
518+
* Note that this should be considered deprecated and netfs_perform_write()
519+
* used instead.
429520
*/
430521
int netfs_write_begin(struct netfs_inode *ctx,
431522
struct file *file, struct address_space *mapping,
@@ -507,11 +598,9 @@ int netfs_write_begin(struct netfs_inode *ctx,
507598
netfs_put_request(rreq, false, netfs_rreq_trace_put_return);
508599

509600
have_folio:
510-
if (test_bit(NETFS_ICTX_USE_PGPRIV2, &ctx->flags)) {
511-
ret = folio_wait_private_2_killable(folio);
512-
if (ret < 0)
513-
goto error;
514-
}
601+
ret = folio_wait_private_2_killable(folio);
602+
if (ret < 0)
603+
goto error;
515604
have_folio_no_wait:
516605
*_folio = folio;
517606
_leave(" = 0");

fs/netfs/objects.c

Lines changed: 0 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -24,10 +24,6 @@ struct netfs_io_request *netfs_alloc_request(struct address_space *mapping,
2424
struct netfs_io_request *rreq;
2525
mempool_t *mempool = ctx->ops->request_pool ?: &netfs_request_pool;
2626
struct kmem_cache *cache = mempool->pool_data;
27-
bool is_unbuffered = (origin == NETFS_UNBUFFERED_WRITE ||
28-
origin == NETFS_DIO_READ ||
29-
origin == NETFS_DIO_WRITE);
30-
bool cached = !is_unbuffered && netfs_is_cache_enabled(ctx);
3127
int ret;
3228

3329
for (;;) {
@@ -56,12 +52,6 @@ struct netfs_io_request *netfs_alloc_request(struct address_space *mapping,
5652
refcount_set(&rreq->ref, 1);
5753

5854
__set_bit(NETFS_RREQ_IN_PROGRESS, &rreq->flags);
59-
if (cached) {
60-
__set_bit(NETFS_RREQ_WRITE_TO_CACHE, &rreq->flags);
61-
if (test_bit(NETFS_ICTX_USE_PGPRIV2, &ctx->flags))
62-
/* Filesystem uses deprecated PG_private_2 marking. */
63-
__set_bit(NETFS_RREQ_USE_PGPRIV2, &rreq->flags);
64-
}
6555
if (file && file->f_flags & O_NONBLOCK)
6656
__set_bit(NETFS_RREQ_NONBLOCK, &rreq->flags);
6757
if (rreq->netfs_ops->init_request) {

fs/netfs/write_issue.c

Lines changed: 3 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -94,6 +94,8 @@ struct netfs_io_request *netfs_create_write_req(struct address_space *mapping,
9494
{
9595
struct netfs_io_request *wreq;
9696
struct netfs_inode *ictx;
97+
bool is_buffered = (origin == NETFS_WRITEBACK ||
98+
origin == NETFS_WRITETHROUGH);
9799

98100
wreq = netfs_alloc_request(mapping, file, start, 0, origin);
99101
if (IS_ERR(wreq))
@@ -102,7 +104,7 @@ struct netfs_io_request *netfs_create_write_req(struct address_space *mapping,
102104
_enter("R=%x", wreq->debug_id);
103105

104106
ictx = netfs_inode(wreq->inode);
105-
if (test_bit(NETFS_RREQ_WRITE_TO_CACHE, &wreq->flags))
107+
if (is_buffered && netfs_is_cache_enabled(ictx))
106108
fscache_begin_write_operation(&wreq->cache_resources, netfs_i_cookie(ictx));
107109

108110
wreq->contiguity = wreq->start;

fs/nfs/fscache.c

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -265,6 +265,8 @@ static int nfs_netfs_init_request(struct netfs_io_request *rreq, struct file *fi
265265
{
266266
rreq->netfs_priv = get_nfs_open_context(nfs_file_open_context(file));
267267
rreq->debug_id = atomic_inc_return(&nfs_netfs_debug_id);
268+
/* [DEPRECATED] Use PG_private_2 to mark folio being written to the cache. */
269+
__set_bit(NETFS_RREQ_USE_PGPRIV2, &rreq->flags);
268270

269271
return 0;
270272
}

fs/nfs/fscache.h

Lines changed: 0 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -81,8 +81,6 @@ static inline void nfs_netfs_put(struct nfs_netfs_io_data *netfs)
8181
static inline void nfs_netfs_inode_init(struct nfs_inode *nfsi)
8282
{
8383
netfs_inode_init(&nfsi->netfs, &nfs_netfs_ops, false);
84-
/* [DEPRECATED] Use PG_private_2 to mark folio being written to the cache. */
85-
__set_bit(NETFS_ICTX_USE_PGPRIV2, &nfsi->netfs.flags);
8684
}
8785
extern void nfs_netfs_initiate_read(struct nfs_pgio_header *hdr);
8886
extern void nfs_netfs_read_completion(struct nfs_pgio_header *hdr);

include/linux/netfs.h

Lines changed: 0 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -73,8 +73,6 @@ struct netfs_inode {
7373
#define NETFS_ICTX_ODIRECT 0 /* The file has DIO in progress */
7474
#define NETFS_ICTX_UNBUFFERED 1 /* I/O should not use the pagecache */
7575
#define NETFS_ICTX_WRITETHROUGH 2 /* Write-through caching */
76-
#define NETFS_ICTX_USE_PGPRIV2 31 /* [DEPRECATED] Use PG_private_2 to mark
77-
* write to cache on read */
7876
};
7977

8078
/*
@@ -269,7 +267,6 @@ struct netfs_io_request {
269267
#define NETFS_RREQ_DONT_UNLOCK_FOLIOS 3 /* Don't unlock the folios on completion */
270268
#define NETFS_RREQ_FAILED 4 /* The request failed */
271269
#define NETFS_RREQ_IN_PROGRESS 5 /* Unlocked when the request completes */
272-
#define NETFS_RREQ_WRITE_TO_CACHE 7 /* Need to write to the cache */
273270
#define NETFS_RREQ_UPLOAD_TO_SERVER 8 /* Need to write to the server */
274271
#define NETFS_RREQ_NONBLOCK 9 /* Don't block if possible (O_NONBLOCK) */
275272
#define NETFS_RREQ_BLOCKED 10 /* We blocked */

include/trace/events/netfs.h

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -51,6 +51,7 @@
5151
EM(netfs_rreq_trace_resubmit, "RESUBMT") \
5252
EM(netfs_rreq_trace_set_pause, "PAUSE ") \
5353
EM(netfs_rreq_trace_unlock, "UNLOCK ") \
54+
EM(netfs_rreq_trace_unlock_pgpriv2, "UNLCK-2") \
5455
EM(netfs_rreq_trace_unmark, "UNMARK ") \
5556
EM(netfs_rreq_trace_wait_ip, "WAIT-IP") \
5657
EM(netfs_rreq_trace_wait_pause, "WT-PAUS") \

0 commit comments

Comments
 (0)