
Commit 700170b

Merge tag 'nfs-for-5.19-1' of git://git.linux-nfs.org/projects/anna/linux-nfs
Pull NFS client updates from Anna Schumaker:
 "New Features:
   - Add support for 'dacl' and 'sacl' attributes

  Bugfixes and Cleanups:
   - Fixes for reporting mapping errors
   - Fixes for memory allocation errors
   - Improve warning message when locks are lost
   - Update documentation for the nfs4_unique_id parameter
   - Add an explanation of NFSv4 client identifiers
   - Ensure the i_size attribute is written to the fscache storage
   - Fix freeing uninitialized nfs4_labels
   - Better handling when xprtrdma bc_serv is NULL
   - Mark qualified async operations as MOVEABLE tasks"

* tag 'nfs-for-5.19-1' of git://git.linux-nfs.org/projects/anna/linux-nfs:
  NFSv4.1 mark qualified async operations as MOVEABLE tasks
  xprtrdma: treat all calls not a bcall when bc_serv is NULL
  NFSv4: Fix free of uninitialized nfs4_label on referral lookup.
  NFS: Pass i_size to fscache_unuse_cookie() when a file is released
  Documentation: Add an explanation of NFSv4 client identifiers
  NFS: update documentation for the nfs4_unique_id parameter
  NFS: Improve warning message when locks are lost.
  NFSv4.1: Enable access to the NFSv4.1 'dacl' and 'sacl' attributes
  NFSv4: Add encoders/decoders for the NFSv4.1 dacl and sacl attributes
  NFSv4: Specify the type of ACL to cache
  NFSv4: Don't hold the layoutget locks across multiple RPC calls
  pNFS/files: Fall back to I/O through the MDS on non-fatal layout errors
  NFS: Further fixes to the writeback error handling
  NFSv4/pNFS: Do not fail I/O when we fail to allocate the pNFS layout
  NFS: Memory allocation failures are not server fatal errors
  NFS: Don't report errors from nfs_pageio_complete() more than once
  NFS: Do not report flush errors in nfs_write_end()
  NFS: Don't report ENOSPC write errors twice
  NFS: fsync() should report filesystem errors over EINTR/ERESTARTSYS
  NFS: Do not report EINTR/ERESTARTSYS as mapping errors
2 parents 1501f70 + 118f09e commit 700170b

19 files changed

+548
-156
lines changed

Documentation/admin-guide/nfs/nfs-client.rst

Lines changed: 9 additions & 6 deletions
@@ -36,10 +36,9 @@ administrative requirements that require particular behavior that does not
 work well as part of an nfs_client_id4 string.

 The nfs.nfs4_unique_id boot parameter specifies a unique string that can be
-used instead of a system's node name when an NFS client identifies itself to
-a server. Thus, if the system's node name is not unique, or it changes, its
-nfs.nfs4_unique_id stays the same, preventing collision with other clients
-or loss of state during NFS reboot recovery or transparent state migration.
+used together with a system's node name when an NFS client identifies itself to
+a server. Thus, if the system's node name is not unique, its
+nfs.nfs4_unique_id can help prevent collisions with other clients.

 The nfs.nfs4_unique_id string is typically a UUID, though it can contain
 anything that is believed to be unique across all NFS clients. An
@@ -53,8 +52,12 @@ outstanding NFSv4 state has expired, to prevent loss of NFSv4 state.

 This string can be stored in an NFS client's grub.conf, or it can be provided
 via a net boot facility such as PXE. It may also be specified as an nfs.ko
-module parameter. Specifying a uniquifier string is not support for NFS
-clients running in containers.
+module parameter.
+
+This uniquifier string will be the same for all NFS clients running in
+containers unless it is overridden by a value written to
+/sys/fs/nfs/net/nfs_client/identifier which will be local to the network
+namespace of the process which writes.


 The DNS resolver
Lines changed: 216 additions & 0 deletions
@@ -0,0 +1,216 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+=======================
+NFSv4 client identifier
+=======================
+
+This document explains how the NFSv4 protocol identifies client
+instances in order to maintain file open and lock state during
+system restarts. A special identifier and principal are maintained
+on each client. These can be set by administrators, scripts
+provided by site administrators, or tools provided by Linux
+distributors.
+
+There are risks if a client's NFSv4 identifier and its principal
+are not chosen carefully.
+
+
+Introduction
+------------
+
+The NFSv4 protocol uses "lease-based file locking". Leases help
+NFSv4 servers provide file lock guarantees and manage their
+resources.
+
+Simply put, an NFSv4 server creates a lease for each NFSv4 client.
+The server collects each client's file open and lock state under
+the lease for that client.
+
+The client is responsible for periodically renewing its leases.
+While a lease remains valid, the server holding that lease
+guarantees the file locks the client has created remain in place.
+
+If a client stops renewing its lease (for example, if it crashes),
+the NFSv4 protocol allows the server to remove the client's open
+and lock state after a certain period of time. When a client
+restarts, it indicates to servers that open and lock state
+associated with its previous leases is no longer valid and can be
+destroyed immediately.
+
+In addition, each NFSv4 server manages a persistent list of client
+leases. When the server restarts and clients attempt to recover
+their state, the server uses this list to distinguish amongst
+clients that held state before the server restarted and clients
+sending fresh OPEN and LOCK requests. This enables file locks to
+persist safely across server restarts.
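Reviewer's note: the lease lifecycle described in the Introduction can be sketched as a small user-space C model. This is an illustration only, not code from the kernel or from any NFS server. `struct lease`, `lease_renew()` and `lease_expired()` are hypothetical names, and times are plain seconds supplied by the caller.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one client's lease as a server might track it. */
struct lease {
	uint64_t last_renewal;	/* time of the most recent renewal, in seconds */
	uint64_t lease_time;	/* how long one renewal keeps the lease valid */
};

/* Record a renewal received from the client at time 'now'. */
static void lease_renew(struct lease *l, uint64_t now)
{
	l->last_renewal = now;
}

/*
 * A lease that has not been renewed within lease_time is expired. At that
 * point the protocol allows the server to remove the client's open and
 * lock state. The first comparison guards against unsigned underflow.
 */
static bool lease_expired(const struct lease *l, uint64_t now)
{
	return now > l->last_renewal &&
	       now - l->last_renewal > l->lease_time;
}
```

A server built this way would check `lease_expired()` before honoring state held under the lease; a crashed client simply stops calling the equivalent of `lease_renew()`.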
+
+NFSv4 client identifiers
+------------------------
+
+Each NFSv4 client presents an identifier to NFSv4 servers so that
+they can associate the client with its lease. Each client's
+identifier consists of two elements:
+
+  - co_ownerid: An arbitrary but fixed string.
+
+  - boot verifier: A 64-bit incarnation verifier that enables a
+    server to distinguish successive boot epochs of the same client.
+
+The NFSv4.0 specification refers to these two items as an
+"nfs_client_id4". The NFSv4.1 specification refers to these two
+items as a "client_owner4".
+
+NFSv4 servers tie this identifier to the principal and security
+flavor that the client used when presenting it. Servers use this
+principal to authorize subsequent lease modification operations
+sent by the client. Effectively this principal is a third element of
+the identifier.
+
+As part of the identity presented to servers, a good
+"co_ownerid" string has several important properties:
+
+  - The "co_ownerid" string identifies the client during reboot
+    recovery, therefore the string is persistent across client
+    reboots.
+  - The "co_ownerid" string helps servers distinguish the client
+    from others, therefore the string is globally unique. Note
+    that there is no central authority that assigns "co_ownerid"
+    strings.
+  - Because it often appears on the network in the clear, the
+    "co_ownerid" string does not reveal private information about
+    the client itself.
+  - The content of the "co_ownerid" string is set and unchanging
+    before the client attempts NFSv4 mounts after a restart.
+  - The NFSv4 protocol places a 1024-byte limit on the size of the
+    "co_ownerid" string.
+
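Reviewer's note: the two-element identifier described in this section (a "co_ownerid" string of at most 1024 bytes plus a 64-bit boot verifier) might be modeled as below. This is a minimal sketch, not the wire format from the RFCs and not the kernel's data structure. `struct client_owner` and the helper names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define OWNERID_MAX	1024	/* co_ownerid size limit stated above */
#define VERIFIER_SIZE	8	/* 64-bit boot verifier */

/* Illustrative layout only; the real encoding is defined by the RFCs. */
struct client_owner {
	uint8_t verifier[VERIFIER_SIZE];
	size_t  ownerid_len;
	uint8_t ownerid[OWNERID_MAX];
};

/* Fill in 'co'; returns false if the co_ownerid exceeds the size limit. */
static bool client_owner_init(struct client_owner *co,
			      const void *ownerid, size_t len,
			      uint64_t boot_verifier)
{
	if (len > OWNERID_MAX)
		return false;
	memcpy(co->ownerid, ownerid, len);
	co->ownerid_len = len;
	memcpy(co->verifier, &boot_verifier, VERIFIER_SIZE);
	return true;
}

/* Byte-for-byte comparison of the co_ownerid strings. */
static bool same_ownerid(const struct client_owner *a,
			 const struct client_owner *b)
{
	return a->ownerid_len == b->ownerid_len &&
	       memcmp(a->ownerid, b->ownerid, a->ownerid_len) == 0;
}

/* Same co_ownerid with a different verifier indicates a new boot epoch. */
static bool is_new_incarnation(const struct client_owner *old,
			       const struct client_owner *cur)
{
	return same_ownerid(old, cur) &&
	       memcmp(old->verifier, cur->verifier, VERIFIER_SIZE) != 0;
}
```

The principal, which the text calls an effective third element, is left out of this sketch on purpose: it comes from the RPC credential rather than from the identifier itself.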
+Protecting NFSv4 lease state
+----------------------------
+
+NFSv4 servers utilize the "client_owner4" as described above to
+assign a unique lease to each client. Under this scheme, there are
+circumstances where clients can interfere with each other. This is
+referred to as "lease stealing".
+
+If distinct clients present the same "co_ownerid" string and use
+the same principal (for example, AUTH_SYS and UID 0), a server is
+unable to tell that the clients are not the same. Each distinct
+client presents a different boot verifier, so it appears to the
+server as if there is one client that is rebooting frequently.
+Neither client can maintain open or lock state in this scenario.
+
+If distinct clients present the same "co_ownerid" string and use
+distinct principals, the server is likely to allow the first client
+to operate normally but reject subsequent clients with the same
+"co_ownerid" string.
+
+If a client's "co_ownerid" string or principal are not stable,
+state recovery after a server or client reboot is not guaranteed.
+If a client unexpectedly restarts but presents a different
+"co_ownerid" string or principal to the server, the server orphans
+the client's previous open and lock state. This blocks access to
+locked files until the server removes the orphaned state.
+
+If the server restarts and a client presents a changed "co_ownerid"
+string or principal to the server, the server will not allow the
+client to reclaim its open and lock state, and may give those locks
+to other clients in the meantime. This is referred to as "lock
+stealing".
+
+Lease stealing and lock stealing increase the potential for denial
+of service and in rare cases even data corruption.
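Reviewer's note: the collision cases in this section can be expressed as a small decision function. The sketch below is a simplified model of the behavior the text describes, not the logic of any particular NFS server. `struct client_record`, `enum lease_action` and `classify()` are hypothetical, and principals are reduced to plain strings.

```c
#include <stdint.h>
#include <string.h>

enum lease_action {
	LEASE_NEW,	/* unknown co_ownerid: create a fresh lease */
	LEASE_CONFIRM,	/* same client instance: keep existing state */
	LEASE_REBOOT,	/* same owner and principal, new verifier: drop old state */
	LEASE_REJECT,	/* same owner, different principal: refuse the newcomer */
};

struct client_record {
	const char *ownerid;	/* co_ownerid string */
	const char *principal;	/* principal used when presenting it */
	uint64_t verifier;	/* boot verifier */
};

/* 'known' is the record the server already holds, or NULL if there is none. */
static enum lease_action classify(const struct client_record *known,
				  const struct client_record *incoming)
{
	if (!known || strcmp(known->ownerid, incoming->ownerid) != 0)
		return LEASE_NEW;
	if (strcmp(known->principal, incoming->principal) != 0)
		return LEASE_REJECT;
	if (known->verifier != incoming->verifier)
		return LEASE_REBOOT;
	return LEASE_CONFIRM;
}
```

Two distinct hosts that share a co_ownerid and principal but carry different verifiers each produce `LEASE_REBOOT` against the other's record, which is the lease stealing scenario: every registration wipes the state of the other host.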
+
+Selecting an appropriate client identifier
+------------------------------------------
+
+By default, the Linux NFSv4 client implementation constructs its
+"co_ownerid" string starting with the words "Linux NFS" followed by
+the client's UTS node name (the same node name, incidentally, that
+is used as the "machine name" in an AUTH_SYS credential). In small
+deployments, this construction is usually adequate. Often, however,
+the node name by itself is not adequately unique, and can change
+unexpectedly. Problematic situations include:
+
+  - NFS-root (diskless) clients, where the local DHCP server (or
+    equivalent) does not provide a unique host name.
+
+  - "Containers" within a single Linux host. If each container has
+    a separate network namespace, but does not use the UTS namespace
+    to provide a unique host name, then there can be multiple NFS
+    client instances with the same host name.
+
+  - Clients across multiple administrative domains that access a
+    common NFS server. If hostnames are not assigned centrally
+    then uniqueness cannot be guaranteed unless a domain name is
+    included in the hostname.
+
+Linux provides two mechanisms to add uniqueness to its "co_ownerid"
+string:
+
+    nfs.nfs4_unique_id
+      This module parameter can set an arbitrary uniquifier string
+      via the kernel command line, or when the "nfs" module is
+      loaded.
+
+    /sys/fs/nfs/net/nfs_client/identifier
+      This virtual file, available since Linux 5.3, is local to the
+      network namespace in which it is accessed and so can provide
+      distinction between network namespaces (containers) when the
+      hostname remains uniform.
+
+Note that this file is empty on name-space creation. If the
+container system has access to some sort of per-container identity
+then that uniquifier can be used. For example, a uniquifier might
+be formed at boot using the container's internal identifier:
+
+    sha256sum /etc/machine-id | awk '{print $1}' \\
+        > /sys/fs/nfs/net/nfs_client/identifier
+
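Reviewer's note: to make the role of the uniquifier concrete, the sketch below assembles an owner string from a node name and an optional uniquifier. The exact string the kernel builds is not shown in this commit, so the `"Linux NFS <nodename>/<uniquifier>"` layout here is an assumption made for illustration, and `build_ownerid()` is a hypothetical helper.

```c
#include <stdio.h>
#include <string.h>

#define OWNERID_MAX 1024	/* co_ownerid size limit from the NFSv4 protocol */

/*
 * Hypothetical format: "Linux NFS <nodename>" or
 * "Linux NFS <nodename>/<uniquifier>". Returns the string length, or -1
 * if the result does not fit in 'len' bytes or exceeds OWNERID_MAX.
 */
static int build_ownerid(char *buf, size_t len,
			 const char *nodename, const char *uniquifier)
{
	int n;

	if (uniquifier && uniquifier[0] != '\0')
		n = snprintf(buf, len, "Linux NFS %s/%s", nodename, uniquifier);
	else
		n = snprintf(buf, len, "Linux NFS %s", nodename);

	if (n < 0 || (size_t)n >= len || n > OWNERID_MAX)
		return -1;
	return n;
}
```

With an empty uniquifier, two containers that share a node name end up with identical strings; giving each a distinct uniquifier separates them, which is the effect the two mechanisms above are meant to achieve.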
+Security considerations
+-----------------------
+
+The use of cryptographic security for lease management operations
+is strongly encouraged.
+
+If NFS with Kerberos is not configured, a Linux NFSv4 client uses
+AUTH_SYS and UID 0 as the principal part of its client identity.
+This configuration is not only insecure, it increases the risk of
+lease and lock stealing. However, it might be the only choice for
+client configurations that have no local persistent storage.
+"co_ownerid" string uniqueness and persistence is critical in this
+case.
+
+When a Kerberos keytab is present on a Linux NFS client, the client
+attempts to use one of the principals in that keytab when
+identifying itself to servers. The "sec=" mount option does not
+control this behavior. Alternately, a single-user client with a
+Kerberos principal can use that principal in place of the client's
+host principal.
+
+Using Kerberos for this purpose enables the client and server to
+use the same lease for operations covered by all "sec=" settings.
+Additionally, the Linux NFS client uses the RPCSEC_GSS security
+flavor with Kerberos and the integrity QOS to prevent in-transit
+modification of lease modification requests.
+
+Additional notes
+----------------
+The Linux NFSv4 client establishes a single lease on each NFSv4
+server it accesses. NFSv4 mounts from a Linux NFSv4 client of a
+particular server then share that lease.
+
+Once a client establishes open and lock state, the NFSv4 protocol
+enables lease state to transition to other servers, following data
+that has been migrated. This hides data migration completely from
+running applications. The Linux NFSv4 client facilitates state
+migration by presenting the same "client_owner4" to all servers it
+encounters.
+
+========
+See Also
+========
+
+- nfs(5)
+- kerberos(7)
+- RFC 7530 for the NFSv4.0 specification
+- RFC 8881 for the NFSv4.1 specification.

Documentation/filesystems/nfs/index.rst

Lines changed: 2 additions & 0 deletions
@@ -6,6 +6,8 @@ NFS
 .. toctree::
    :maxdepth: 1

+   client-identifier
+   exporting
    pnfs
    rpc-cache
    rpc-server-gss
rpc-server-gss

fs/nfs/file.c

Lines changed: 21 additions & 29 deletions
@@ -206,15 +206,16 @@ static int
 nfs_file_fsync_commit(struct file *file, int datasync)
 {
 	struct inode *inode = file_inode(file);
-	int ret;
+	int ret, ret2;

 	dprintk("NFS: fsync file(%pD2) datasync %d\n", file, datasync);

 	nfs_inc_stats(inode, NFSIOS_VFSFSYNC);
 	ret = nfs_commit_inode(inode, FLUSH_SYNC);
-	if (ret < 0)
-		return ret;
-	return file_check_and_advance_wb_err(file);
+	ret2 = file_check_and_advance_wb_err(file);
+	if (ret2 < 0)
+		return ret2;
+	return ret;
 }

 int
int
@@ -387,11 +388,8 @@ static int nfs_write_end(struct file *file, struct address_space *mapping,
387388
return status;
388389
NFS_I(mapping->host)->write_io += copied;
389390

390-
if (nfs_ctx_key_to_expire(ctx, mapping->host)) {
391-
status = nfs_wb_all(mapping->host);
392-
if (status < 0)
393-
return status;
394-
}
391+
if (nfs_ctx_key_to_expire(ctx, mapping->host))
392+
nfs_wb_all(mapping->host);
395393

396394
return copied;
397395
}
@@ -606,18 +604,6 @@ static const struct vm_operations_struct nfs_file_vm_ops = {
606604
.page_mkwrite = nfs_vm_page_mkwrite,
607605
};
608606

609-
static int nfs_need_check_write(struct file *filp, struct inode *inode,
610-
int error)
611-
{
612-
struct nfs_open_context *ctx;
613-
614-
ctx = nfs_file_open_context(filp);
615-
if (nfs_error_is_fatal_on_server(error) ||
616-
nfs_ctx_key_to_expire(ctx, inode))
617-
return 1;
618-
return 0;
619-
}
620-
621607
ssize_t nfs_file_write(struct kiocb *iocb, struct iov_iter *from)
622608
{
623609
struct file *file = iocb->ki_filp;
@@ -645,7 +631,7 @@ ssize_t nfs_file_write(struct kiocb *iocb, struct iov_iter *from)
645631
if (iocb->ki_flags & IOCB_APPEND || iocb->ki_pos > i_size_read(inode)) {
646632
result = nfs_revalidate_file_size(inode, file);
647633
if (result)
648-
goto out;
634+
return result;
649635
}
650636

651637
nfs_clear_invalid_mapping(file->f_mapping);
@@ -664,6 +650,7 @@ ssize_t nfs_file_write(struct kiocb *iocb, struct iov_iter *from)
664650

665651
written = result;
666652
iocb->ki_pos += written;
653+
nfs_add_stats(inode, NFSIOS_NORMALWRITTENBYTES, written);
667654

668655
if (mntflags & NFS_MOUNT_WRITE_EAGER) {
669656
result = filemap_fdatawrite_range(file->f_mapping,
@@ -681,17 +668,22 @@ ssize_t nfs_file_write(struct kiocb *iocb, struct iov_iter *from)
681668
}
682669
result = generic_write_sync(iocb, written);
683670
if (result < 0)
684-
goto out;
671+
return result;
685672

673+
out:
686674
/* Return error values */
687675
error = filemap_check_wb_err(file->f_mapping, since);
688-
if (nfs_need_check_write(file, inode, error)) {
689-
int err = nfs_wb_all(inode);
690-
if (err < 0)
691-
result = err;
676+
switch (error) {
677+
default:
678+
break;
679+
case -EDQUOT:
680+
case -EFBIG:
681+
case -ENOSPC:
682+
nfs_wb_all(inode);
683+
error = file_check_and_advance_wb_err(file);
684+
if (error < 0)
685+
result = error;
692686
}
693-
nfs_add_stats(inode, NFSIOS_NORMALWRITTENBYTES, written);
694-
out:
695687
return result;
696688

697689
out_swapfile:
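Reviewer's note: the final hunk above narrows the set of writeback errors that make `nfs_file_write()` flush synchronously and replace its return value. A user-space model of that decision follows. The function names are hypothetical, and `recheck` stands in for whatever `file_check_and_advance_wb_err()` returns after the flush.

```c
#include <errno.h>
#include <stdbool.h>

/* Only out-of-space style errors trigger the synchronous flush. */
static bool write_error_needs_flush(int error)
{
	switch (error) {
	case -EDQUOT:
	case -EFBIG:
	case -ENOSPC:
		return true;
	default:
		return false;
	}
}

/*
 * 'result' is the value the write path would return so far (normally the
 * number of bytes written). It is overridden only when the error qualifies
 * for a flush and the re-check after the flush still reports a failure.
 */
static long write_return_value(long result, int error, int recheck)
{
	if (write_error_needs_flush(error) && recheck < 0)
		return recheck;
	return result;
}
```

Under this model a pending `-EIO` no longer changes the return value of the write itself; it stays recorded for a later fsync() or close() to report.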

fs/nfs/filelayout/filelayout.c

Lines changed: 6 additions & 1 deletion
@@ -839,7 +839,12 @@ fl_pnfs_update_layout(struct inode *ino,

 	lseg = pnfs_update_layout(ino, ctx, pos, count, iomode, strict_iomode,
 				  gfp_flags);
-	if (IS_ERR_OR_NULL(lseg))
+	if (IS_ERR(lseg)) {
+		/* Fall back to MDS on recoverable errors */
+		if (!nfs_error_is_fatal_on_server(PTR_ERR(lseg)))
+			lseg = NULL;
+		goto out;
+	} else if (!lseg)
 		goto out;

 	lo = NFS_I(ino)->layout;
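Reviewer's note: the fallback in this hunk relies on the kernel convention of encoding an errno in a pointer value. The sketch below reproduces that convention in user space to show the effect of the change: a recoverable error is turned into NULL (meaning "do I/O through the MDS"), while a fatal error is passed on. `err_ptr()`, `ptr_err()` and `is_err()` are stand-ins for the kernel's ERR_PTR(), PTR_ERR() and IS_ERR(), and `fatal_on_server()` is an abbreviated, hypothetical stand-in for `nfs_error_is_fatal_on_server()` that covers only a few error codes.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* User-space stand-ins for the kernel's ERR_PTR()/PTR_ERR()/IS_ERR(). */
static void *err_ptr(long err) { return (void *)(intptr_t)err; }
static long ptr_err(const void *p) { return (long)(intptr_t)p; }
static bool is_err(const void *p)
{
	/* Error values occupy the top MAX_ERRNO addresses. */
	return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

/* Abbreviated, hypothetical classification; the kernel's list is longer. */
static bool fatal_on_server(long err)
{
	switch (err) {
	case -EIO:
	case -ENOSPC:
	case -ESTALE:
		return true;
	default:
		return false;	/* includes 0, -EINTR and -ENOMEM */
	}
}

/*
 * Returns the pointer the caller should continue with: a fatal error is
 * passed through unchanged, a recoverable error becomes NULL.
 */
static void *layout_or_fallback(void *lseg)
{
	if (is_err(lseg) && !fatal_on_server(ptr_err(lseg)))
		return NULL;
	return lseg;
}
```

Before the change, `IS_ERR_OR_NULL()` treated both cases alike and the error pointer was returned as is, so a transient failure such as `-ENOMEM` could fail the I/O instead of falling back.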

fs/nfs/fscache.c

Lines changed: 3 additions & 4 deletions
@@ -231,11 +231,10 @@ void nfs_fscache_release_file(struct inode *inode, struct file *filp)
 {
 	struct nfs_fscache_inode_auxdata auxdata;
 	struct fscache_cookie *cookie = nfs_i_fscache(inode);
+	loff_t i_size = i_size_read(inode);

-	if (fscache_cookie_valid(cookie)) {
-		nfs_fscache_update_auxdata(&auxdata, inode);
-		fscache_unuse_cookie(cookie, &auxdata, NULL);
-	}
+	nfs_fscache_update_auxdata(&auxdata, inode);
+	fscache_unuse_cookie(cookie, &auxdata, &i_size);
 }

 /*

fs/nfs/internal.h

Lines changed: 1 addition & 0 deletions
@@ -841,6 +841,7 @@ static inline bool nfs_error_is_fatal_on_server(int err)
 	case 0:
 	case -ERESTARTSYS:
 	case -EINTR:
+	case -ENOMEM:
 		return false;
 	}
 	return nfs_error_is_fatal(err);
