@@ -91,10 +91,48 @@ Currently Available
* large block (up to pagesize) support
* efficient new ordered mode in JBD2 and ext4 (avoid using buffer head to force
the ordering)
+ * Case-insensitive file name lookups

[1] Filesystems with a block size of 1k may see a limit imposed by the
directory hash tree having a maximum depth of two.

+ Case-insensitive file name lookups
+ ==================================
+
+ The case-insensitive file name lookup feature is supported on a
+ per-directory basis, allowing the user to mix case-insensitive and
+ case-sensitive directories in the same filesystem. It is enabled by
+ setting the +F inode attribute on an empty directory. A
+ case-insensitive string match is only defined when we know how text
+ is encoded in a byte sequence. For that reason, in order to enable
+ case-insensitive directories, the filesystem must have the casefold
+ feature, which stores the filesystem-wide encoding model used. By
+ default, the charset adopted is the latest version of Unicode (12.1.0
+ at the time of this writing), encoded in the UTF-8 form. The
+ comparison algorithm normalizes both strings to the Canonical
+ Decomposition form, as defined by Unicode, and then compares them
+ byte by byte.
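The comparison described above can be approximated in userspace. The following is a minimal sketch, not the kernel's implementation: it assumes Python's `unicodedata` tables (whose Unicode version may differ from the one stored in the filesystem), and it adds an explicit case-folding step before normalization, which is an assumption drawn from the document's later mention of the "casefolded version" of a name. The helper names `casefold_key` and `names_match` are hypothetical.

```python
import unicodedata


def casefold_key(name: str) -> bytes:
    """Build an approximate comparison key for a file name.

    Assumption: case-fold first, then normalize to NFD (Canonical
    Decomposition), then encode as UTF-8 for a byte-by-byte comparison.
    """
    folded = name.casefold()
    return unicodedata.normalize("NFD", folded).encode("utf-8")


def names_match(a: str, b: str) -> bool:
    """Return True when two names compare equal under the sketch above."""
    return casefold_key(a) == casefold_key(b)
```

For example, `names_match("README.txt", "readme.TXT")` holds, and a precomposed "é" matches "E" followed by a combining acute accent, because both reduce to the same decomposed, case-folded byte sequence.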
+
+ The case-awareness is name-preserving on the disk, meaning that the
+ file name provided by userspace is a byte-for-byte match to what is
+ actually written to the disk. The Unicode normalization format used by
+ the kernel is thus an internal representation, exposed neither to
+ userspace nor to the disk, with the important exception of disk
+ hashes, used on large case-insensitive directories with the DX
+ feature. On DX directories, the hash must be calculated using the
+ casefolded version of the filename, meaning that the normalization
+ format used actually has an impact on where the directory entry is
+ stored.
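To illustrate why the internal form leaks into the on-disk layout, here is a toy model of a hashed directory. It is only a sketch under stated assumptions: `zlib.crc32` stands in for the real directory hash, `NUM_BUCKETS` is an arbitrary value, and the functions `bucket_for`, `create`, and `lookup` are hypothetical names, not ext4 interfaces. The stored name keeps the caller's exact bytes, while the bucket is chosen from the case-folded, normalized key.

```python
import unicodedata
import zlib

NUM_BUCKETS = 64  # arbitrary size for this toy model


def casefold_key(name: str) -> bytes:
    """Case-fold, NFD-normalize, and UTF-8 encode (an approximation)."""
    return unicodedata.normalize("NFD", name.casefold()).encode("utf-8")


def bucket_for(name: str) -> int:
    """Pick a bucket from the folded key, not from the raw name."""
    return zlib.crc32(casefold_key(name)) % NUM_BUCKETS


def create(directory: dict, name: str) -> None:
    """Store the name exactly as given, in the bucket of its folded key."""
    directory.setdefault(bucket_for(name), []).append(name.encode("utf-8"))


def lookup(directory: dict, name: str):
    """Return the stored bytes of a matching entry, or None."""
    key = casefold_key(name)
    for stored in directory.get(bucket_for(name), []):
        if casefold_key(stored.decode("utf-8")) == key:
            return stored
    return None
```

Because the bucket depends on the folded key, changing the folding or normalization rules would send the same name to a different bucket, which is why the format used affects where an entry is stored.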
+
+ When we change from viewing filenames as opaque byte sequences to
+ seeing them as encoded strings, we need to address what happens when a
+ program tries to create a file with an invalid name. The Unicode
+ subsystem within the kernel leaves the decision of what to do in this
+ case to the filesystem, which selects its preferred behavior by
+ enabling or disabling the strict mode. When Ext4 encounters one of
+ those strings and the filesystem does not require strict mode, it
+ falls back to considering the entire string as an opaque byte
+ sequence, which still allows the user to operate on that file, but
+ case-insensitive lookups will not work for it.
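The fallback can be sketched as follows. This is an illustration only: "invalid" is approximated here as a byte sequence that does not decode as UTF-8, which is narrower than whatever the kernel's Unicode subsystem treats as invalid, and the function name `comparison_key` and its `strict` parameter are hypothetical.

```python
import unicodedata


def comparison_key(raw: bytes, strict: bool) -> bytes:
    """Return the key used to compare a file name.

    Valid names get a case-folded, NFD-normalized key. Invalid names are
    rejected in strict mode, or otherwise compared as opaque bytes.
    """
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        if strict:
            raise ValueError("invalid file name rejected in strict mode")
        # Non-strict fallback: the name stays usable, but only an exact
        # byte match will find it.
        return raw
    return unicodedata.normalize("NFD", text.casefold()).encode("utf-8")
```

In this model, `b"FOO"` and `b"foo"` share a key, whereas `b"\xffA"` and `b"\xffa"` do not, since neither decodes and both are compared byte for byte.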
+
Options
=======