
Commit 0a790fe

Gabriel Krisman Bertazi authored and tytso committed
docs: ext4.rst: document case-insensitive directories
Introduce the case-insensitive features of ext4 to system administrators. Explain the minimum set of design decisions that are important for sysadmins wanting to enable this feature.

Signed-off-by: Gabriel Krisman Bertazi <[email protected]>
Signed-off-by: Theodore Ts'o <[email protected]>
1 parent b886ee3 commit 0a790fe


1 file changed (+38, -0 lines changed)

Documentation/admin-guide/ext4.rst

Lines changed: 38 additions & 0 deletions
@@ -91,10 +91,48 @@ Currently Available
* large block (up to pagesize) support
* efficient new ordered mode in JBD2 and ext4 (avoid using buffer head to force
the ordering)
* Case-insensitive file name lookups

[1] Filesystems with a block size of 1k may see a limit imposed by the
directory hash tree having a maximum depth of two.

Case-insensitive file name lookups
======================================================

The case-insensitive file name lookup feature is supported on a
per-directory basis, allowing the user to mix case-insensitive and
case-sensitive directories in the same filesystem. It is enabled by
setting the +F inode attribute (for example, with ``chattr +F``) on an
empty directory. A case-insensitive string match is only defined when
we know how text is encoded in a byte sequence. For that reason, in
order to enable case-insensitive directories, the filesystem must have
the casefold feature, which stores the filesystem-wide encoding model
used. By default, the charset adopted is the latest version of Unicode
(12.1.0, at the time of this writing), encoded in the UTF-8 form. The
comparison algorithm is implemented by case folding the strings and
normalizing them to the Canonical Decomposition form (NFD), as defined
by Unicode, followed by a byte-per-byte comparison.
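
As an illustration only, the comparison described above can be approximated in userspace with Python's standard ``unicodedata`` module. This is a sketch, not the kernel's implementation: it assumes that Unicode's canonical caseless match, NFD(casefold(NFD(name))), is a close enough stand-in for the kernel's casefolding, and Python's bundled Unicode version (``unicodedata.unidata_version``) may differ from the one recorded by the filesystem. The helper names ``casefold_key`` and ``names_match`` are hypothetical.

```python
import unicodedata


def casefold_key(name: str) -> bytes:
    """Return the UTF-8 bytes of the canonical caseless form of name.

    Unicode defines a canonical caseless match as NFD(casefold(NFD(x)));
    here it is used as an approximation of the kernel's casefolding.
    """
    decomposed = unicodedata.normalize("NFD", name)
    folded = decomposed.casefold()  # full Unicode case folding
    return unicodedata.normalize("NFD", folded).encode("utf-8")


def names_match(a: str, b: str) -> bool:
    # Byte-per-byte comparison of the normalized, case-folded forms.
    return casefold_key(a) == casefold_key(b)


if __name__ == "__main__":
    print(unicodedata.unidata_version)  # Unicode version Python was built with
    print(names_match("README.txt", "readme.TXT"))  # True
    print(names_match("Caf\u00e9", "CAFE\u0301"))   # True
    print(names_match("report", "rapport"))         # False
```

Besides plain case differences, the precomposed "é" (U+00E9) and "e" followed by U+0301 COMBINING ACUTE ACCENT decompose to the same sequence, so both spellings match.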

The case-awareness is name-preserving on disk, meaning that the file
name provided by userspace is a byte-per-byte match to what is actually
written to the disk. The Unicode normalization format used by the
kernel is thus an internal representation, and is exposed neither to
userspace nor to the disk, with the important exception of disk hashes,
used on large case-insensitive directories with the DX (hashed directory
index) feature. On DX directories, the hash must be calculated using the
casefolded version of the filename, so that names differing only in
case produce the same hash. This means that the normalization format
used actually has an impact on where the directory entry is stored.
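
The following toy model sketches why the hash depends on the casefolded name. Everything in it is hypothetical: ``ToyDxDirectory``, ``bucket_for`` and ``NUM_BUCKETS`` are illustrative names, and CRC32 stands in for ext4's real directory hash functions. It only demonstrates the two properties described above: entries keep the exact bytes userspace supplied, while the bucket is chosen from the casefolded key, so ``Makefile`` and ``MAKEFILE`` land in the same bucket and a lookup needs to search only that one.

```python
from __future__ import annotations

import unicodedata
import zlib
from collections import defaultdict

NUM_BUCKETS = 8  # hypothetical number of leaf blocks in the hash tree


def casefold_key(name: str) -> bytes:
    # Canonical caseless form, NFD(casefold(NFD(name))), as UTF-8 bytes.
    folded = unicodedata.normalize("NFD", name).casefold()
    return unicodedata.normalize("NFD", folded).encode("utf-8")


def bucket_for(name: str) -> int:
    # Stand-in hash: ext4 has its own directory hash functions, not CRC32.
    # What matters here is that the hash input is the casefolded key.
    return zlib.crc32(casefold_key(name)) % NUM_BUCKETS


class ToyDxDirectory:
    """Hypothetical model of a hashed, case-insensitive directory."""

    def __init__(self) -> None:
        # bucket number -> on-disk names, stored exactly as userspace gave them
        self.buckets: dict[int, list[bytes]] = defaultdict(list)

    def lookup(self, name: str) -> bytes | None:
        # Only the bucket selected by the casefolded hash is searched.
        key = casefold_key(name)
        for stored in self.buckets[bucket_for(name)]:
            if casefold_key(stored.decode("utf-8")) == key:
                return stored
        return None

    def create(self, name: str) -> None:
        if self.lookup(name) is not None:
            raise FileExistsError(name)
        # Name-preserving: keep the original bytes, not the folded form.
        self.buckets[bucket_for(name)].append(name.encode("utf-8"))


if __name__ == "__main__":
    d = ToyDxDirectory()
    d.create("Makefile")
    print(bucket_for("Makefile") == bucket_for("MAKEFILE"))  # True
    print(d.lookup("MAKEFILE"))                               # b'Makefile'
```

If the hash were computed over the raw bytes instead, a lookup for ``MAKEFILE`` would generally search a different bucket and miss ``Makefile``.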

When we change from viewing filenames as opaque byte sequences to seeing
them as encoded strings, we need to address what happens when a program
tries to create a file with an invalid name. The Unicode subsystem
within the kernel leaves the decision of what to do in this case to the
filesystem, which selects its preferred behavior by enabling or
disabling strict mode. In strict mode, such names are rejected. When
ext4 encounters one of those strings and the filesystem does not require
strict mode, it falls back to treating the entire string as an opaque
byte sequence, which still allows the user to operate on that file, but
case-insensitive lookups will not work for it.
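
A minimal sketch of the two behaviors, assuming names arrive as raw bytes and that "invalid" means "not well-formed UTF-8". The ``strict`` parameter and the ``ValueError`` are hypothetical stand-ins for the filesystem's strict-mode setting and the kernel refusing the name (at filesystem creation time, ``mke2fs`` exposes strict mode through its ``encoding_flags=strict`` extended option). The folding step approximates the kernel's casefolding with Python's ``unicodedata`` module, and ``lookup_key`` and ``names_match`` are illustrative names.

```python
import unicodedata


def lookup_key(name: bytes, strict: bool = False) -> bytes:
    """Return the bytes used to compare a directory entry name.

    strict=True: a name that is not valid UTF-8 is refused.
    strict=False: such a name falls back to an opaque byte sequence.
    """
    try:
        text = name.decode("utf-8")  # raises on malformed UTF-8
    except UnicodeDecodeError:
        if strict:
            # Hypothetical stand-in for the filesystem rejecting the name.
            raise ValueError(f"invalid UTF-8 file name: {name!r}") from None
        return name  # opaque fallback: no case folding, no normalization
    folded = unicodedata.normalize("NFD", text).casefold()
    return unicodedata.normalize("NFD", folded).encode("utf-8")


def names_match(a: bytes, b: bytes, strict: bool = False) -> bool:
    return lookup_key(a, strict) == lookup_key(b, strict)


if __name__ == "__main__":
    latin1_name = b"caf\xe9"  # "cafe" with e-acute in Latin-1: not valid UTF-8
    print(names_match(b"Notes", b"NOTES"))       # True: valid UTF-8, folded
    print(names_match(latin1_name, b"caf\xe9"))  # True: exact bytes still match
    print(names_match(latin1_name, b"CAF\xe9"))  # False: no case-insensitive match
    try:
        lookup_key(latin1_name, strict=True)
    except ValueError as err:
        print("rejected:", err)
```

In the fallback case the raw bytes are compared exactly, so the file stays reachable under its exact name, but a differently cased spelling no longer matches. Because the folded keys of valid names are always valid UTF-8, they can never collide with the raw bytes of an invalid name.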

Options
=======
