Commit 1ac0ce4

Merge branch 'ls/checkout-encoding'

The new "checkout-encoding" attribute can ask Git to convert the
contents to the specified encoding when checking out to the working
tree (and the other way around when checking in).

* ls/checkout-encoding:
  convert: add round trip check based on 'core.checkRoundtripEncoding'
  convert: add tracing for 'working-tree-encoding' attribute
  convert: check for detectable errors in UTF encodings
  convert: add 'working-tree-encoding' attribute
  utf8: add function to detect a missing UTF-16/32 BOM
  utf8: add function to detect prohibited UTF-16/32 BOM
  utf8: teach same_encoding() alternative UTF encoding names
  strbuf: add a case insensitive starts_with()
  strbuf: add xstrdup_toupper()
  strbuf: remove unnecessary NUL assignment in xstrdup_tolower()

2 parents 7d7d051 + e92d622 commit 1ac0ce4

13 files changed, with 737 additions and 5 deletions


Documentation/config.txt

Lines changed: 6 additions & 0 deletions
@@ -530,6 +530,12 @@ core.autocrlf::
 This variable can be set to 'input',
 in which case no output conversion is performed.
 
+core.checkRoundtripEncoding::
+A comma and/or whitespace separated list of encodings that Git
+performs UTF-8 round trip checks on if they are used in a
+`working-tree-encoding` attribute (see linkgit:gitattributes[5]).
+The default value is `SHIFT-JIS`.
+
 core.symlinks::
 If false, symbolic links are checked out as small plain files that
 contain the link text. linkgit:git-update-index[1] and

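The `core.checkRoundtripEncoding` entry above describes its value as a comma and/or whitespace separated list of encodings. As a rough illustration of that list format only, the following Python sketch tests whether an encoding name appears in such a value. The function name `encoding_in_list` and the case-insensitive, whole-token comparison are assumptions made for this example; it is not Git's C implementation.

```python
import re


def encoding_in_list(encoding: str, check_list: str) -> bool:
    """Return True if `encoding` is one of the names in `check_list`.

    `check_list` is a comma and/or whitespace separated string, as the
    documentation hunk describes. Comparing names case-insensitively is
    an assumption of this sketch.
    """
    # Split on runs of commas and whitespace, then drop empty tokens that
    # appear when the string starts or ends with a separator.
    names = [name for name in re.split(r"[,\s]+", check_list) if name]
    wanted = encoding.casefold()
    return any(name.casefold() == wanted for name in names)
```

For example, `encoding_in_list("shift-jis", "SHIFT-JIS, UTF-16")` would be true under these assumptions, while `encoding_in_list("UTF-8", "SHIFT-JIS")` would not.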
Documentation/gitattributes.txt

Lines changed: 88 additions & 0 deletions
@@ -279,6 +279,94 @@ few exceptions. Even though...
 catch potential problems early, safety triggers.
 
 
+`working-tree-encoding`
+^^^^^^^^^^^^^^^^^^^^^^^
+
+Git recognizes files encoded in ASCII or one of its supersets (e.g.
+UTF-8, ISO-8859-1, ...) as text files. Files encoded in certain other
+encodings (e.g. UTF-16) are interpreted as binary and consequently
+built-in Git text processing tools (e.g. 'git diff') as well as most Git
+web front ends do not visualize the contents of these files by default.
+
+In these cases you can tell Git the encoding of a file in the working
+directory with the `working-tree-encoding` attribute. If a file with this
+attribute is added to Git, then Git reencodes the content from the
+specified encoding to UTF-8. Finally, Git stores the UTF-8 encoded
+content in its internal data structure (called "the index"). On checkout
+the content is reencoded back to the specified encoding.
+
+Please note that using the `working-tree-encoding` attribute may have a
+number of pitfalls:
+
+- Alternative Git implementations (e.g. JGit or libgit2) and older Git
+versions (as of March 2018) do not support the `working-tree-encoding`
+attribute. If you decide to use the `working-tree-encoding` attribute
+in your repository, then it is strongly recommended to ensure that all
+clients working with the repository support it.
+
+For example, Microsoft Visual Studio resource files (`*.rc`) or
+PowerShell script files (`*.ps1`) are sometimes encoded in UTF-16.
+If you declare `*.ps1` files as UTF-16 and you add `foo.ps1` with
+a `working-tree-encoding` enabled Git client, then `foo.ps1` will be
+stored as UTF-8 internally. A client without `working-tree-encoding`
+support will check out `foo.ps1` as a UTF-8 encoded file. This will
+typically cause trouble for the users of this file.
+
+If a Git client that does not support the `working-tree-encoding`
+attribute adds a new file `bar.ps1`, then `bar.ps1` will be
+stored "as-is" internally (in this example probably as UTF-16).
+A client with `working-tree-encoding` support will interpret the
+internal contents as UTF-8 and try to convert it to UTF-16 on checkout.
+That operation will fail and cause an error.
+
+- Reencoding content to non-UTF encodings can cause errors as the
+conversion might not be UTF-8 round trip safe. If you suspect your
+encoding to not be round trip safe, then add it to
+`core.checkRoundtripEncoding` to make Git check the round trip
+encoding (see linkgit:git-config[1]). SHIFT-JIS (Japanese character
+set) is known to have round trip issues with UTF-8 and is checked by
+default.
+
+- Reencoding content requires resources that might slow down certain
+Git operations (e.g. 'git checkout' or 'git add').
+
+Use the `working-tree-encoding` attribute only if you cannot store a file
+in UTF-8 encoding and if you want Git to be able to process the content
+as text.
+
+As an example, use the following attributes if your '*.ps1' files are
+UTF-16 encoded with byte order mark (BOM) and you want Git to perform
+automatic line ending conversion based on your platform.
+
+------------------------
+*.ps1 text working-tree-encoding=UTF-16
+------------------------
+
+Use the following attributes if your '*.ps1' files are UTF-16 little
+endian encoded without BOM and you want Git to use Windows line endings
+in the working directory. Please note, it is highly recommended to
+explicitly define the line endings with `eol` if the `working-tree-encoding`
+attribute is used to avoid ambiguity.
+
+------------------------
+*.ps1 text working-tree-encoding=UTF-16LE eol=crlf
+------------------------
+
+You can get a list of all available encodings on your platform with the
+following command:
+
+------------------------
+iconv --list
+------------------------
+
+If you do not know the encoding of a file, then you can use the `file`
+command to guess the encoding:
+
+------------------------
+file foo.ps1
+------------------------
+
+
 `ident`
 ^^^^^^^

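The two `*.ps1` examples in the hunk above distinguish UTF-16 with a byte order mark from UTF-16LE without one, and the merged series lists helpers that detect a prohibited or a missing UTF-16/32 BOM. The Python sketch below illustrates that distinction. The function names are hypothetical, and the rule it encodes (names with explicit endianness should carry no BOM, plain `UTF-16` and `UTF-32` should carry one) is an assumption read from the examples and commit subjects, not a copy of Git's code.

```python
import codecs

# BOM byte sequences as defined by the Python standard library.
UTF16_BOMS = (codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE)
UTF32_BOMS = (codecs.BOM_UTF32_BE, codecs.BOM_UTF32_LE)


def has_bom(data: bytes, encoding: str) -> bool:
    """Return True if `data` starts with a BOM of the declared UTF family."""
    enc = encoding.upper()
    if enc.startswith("UTF-16"):
        return data.startswith(UTF16_BOMS)
    if enc.startswith("UTF-32"):
        return data.startswith(UTF32_BOMS)
    return False


def bom_is_prohibited(data: bytes, encoding: str) -> bool:
    """Explicit-endianness names (UTF-16LE, UTF-32BE, ...) with a BOM."""
    enc = encoding.upper()
    return enc.endswith(("LE", "BE")) and has_bom(data, enc)


def bom_is_missing(data: bytes, encoding: str) -> bool:
    """Plain UTF-16 or UTF-32 content that does not start with a BOM."""
    enc = encoding.upper()
    return enc in ("UTF-16", "UTF-32") and not has_bom(data, enc)
```

Under these assumptions, content declared as `UTF-16LE` that begins with the bytes `FF FE` would be flagged by `bom_is_prohibited`, and BOM-less content declared as `UTF-16` would be flagged by `bom_is_missing`.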
config.c

Lines changed: 5 additions & 0 deletions
@@ -1239,6 +1239,11 @@ static int git_default_core_config(const char *var, const char *value)
 		return 0;
 	}
 
+	if (!strcmp(var, "core.checkroundtripencoding")) {
+		check_roundtrip_encoding = xstrdup(value);
+		return 0;
+	}
+
 	if (!strcmp(var, "core.notesref")) {
 		notes_ref_name = xstrdup(value);
 		return 0;

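The `check_roundtrip_encoding` value stored above feeds the round trip check named in the commit message. A minimal Python sketch of what such a check could look like follows: it converts working tree bytes to UTF-8, converts them back, and compares the result with the input. The function name `roundtrip_is_lossless` is hypothetical, and Python's codecs stand in for whatever conversion library Git uses, so results for a particular encoding may differ from Git's.

```python
def roundtrip_is_lossless(worktree_bytes: bytes, encoding: str) -> bool:
    """Return True if worktree -> UTF-8 -> worktree reproduces the input."""
    try:
        # "Check in" direction: working tree encoding to UTF-8.
        as_utf8 = worktree_bytes.decode(encoding).encode("utf-8")
        # "Check out" direction: UTF-8 back to the working tree encoding.
        restored = as_utf8.decode("utf-8").encode(encoding)
    except UnicodeError:
        # Bytes that are invalid in the declared encoding cannot round trip.
        return False
    return restored == worktree_bytes
```

A caller could combine this with the configured list, running the check only for encodings named in `core.checkRoundtripEncoding`, which keeps the extra conversion cost away from encodings that are not suspected of being lossy.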