Commit 26969f8

Update paper
1 parent 3625315 commit 26969f8

2 files changed, +74 -3 lines changed

papers/p3904.bs

Lines changed: 45 additions & 1 deletion
@@ -45,6 +45,12 @@ not always contain Unicode text, quoting [[WIN32-FILEIO]]:
 
 > the file system treats path and file names as an opaque sequence of `WCHAR`s
 
+This is also true on POSIX ([[PEP383]]):
+
+> File names, environment variables, and command line arguments are defined as
+> being character data in POSIX; the C APIs however allow passing arbitrary
+> bytes - whether these conform to a certain encoding or not.
+
 Arbitrary paths are formatted on POSIX such that there is no data loss.
 Unfortunately this is not the case on Windows, for example:
 
@@ -113,7 +119,30 @@ std::format("{}\n", std::filesystem::path(L"\xD801"));
113119
</tr>
114120
</table>
115121

116-
TODO
122+
At the same time this will preserve the observable behavior for `std::print`
123+
when printing to a terminal. For example:
124+
125+
```c++
126+
std::print("{}\n", std::filesystem::path(L"\xD800"));
127+
```
128+
129+
will still print
130+
131+
```
132+
133+
```
134+
135+
on implementations that follow the recommended practice from
136+
[[ostream.formatted.print](https://eel.is/c++draft/ostream.formatted.print)]:
137+
138+
> *Recommended practice*: For `vprint_unicode`, if invoking the native Unicode
139+
> API requires transcoding, implementations should substitute invalid code
140+
> units with U+FFFD REPLACEMENT CHARACTER per the Unicode Standard, Chapter 3.9
141+
> U+FFFD Substitution in Conversion.
142+
143+
WTF-8 is used to handle invalid UTF-16 in Rust ([[RUST-OSSTRING]]) and Node.js
144+
libuv ([[LIBUV]]). Python also handles this but with a different mechanism
145+
([[PEP383]]).
117146

118147
<pre class=biblio>
119148
{
@@ -122,6 +151,21 @@ TODO
 "authors": ["Victor Zverovich"],
 "href": "https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2024/p2845r8.html"
 },
+"PEP383": {
+"title": "PEP 383 – Non-decodable Bytes in System Character Interfaces",
+"authors": ["Martin von Löwis"],
+"href": "https://peps.python.org/pep-0383/"
+},
+"RUST-OSSTRING": {
+"title": "OsString Struct. The Rust Standard Library.",
+"authors": ["Rust Project Developers"],
+"href": "https://doc.rust-lang.org/std/ffi/struct.OsString.html"
+},
+"LIBUV": {
+"title": "Miscellaneous utilities. libuv Documentation.",
+"authors": ["libuv contributors"],
+"href": "https://docs.libuv.org/en/v1.x/misc.html"
+},
 "WTF": {
 "title": "The WTF-8 encoding",
 "authors": ["Simon Sapin"],

papers/p3904.html

Lines changed: 29 additions & 2 deletions
@@ -1563,7 +1563,7 @@
 </style>
 <meta content="Bikeshed version 4416b18d5, updated Tue Jan 2 15:52:39 2024 -0800" name="generator">
 <link href="https://isocpp.org/favicon.ico" rel="icon">
-<meta content="13d7c997af9d4ef75ba4d5addd710fe9c58a4268" name="revision">
+<meta content="36253156092910ff21aa2d4b2a5b9a7ec762c0bf" name="revision">
 <style>/* Boilerplate: style-autolinks */
 .css.css, .property.property, .descriptor.descriptor {
 color: var(--a-normal-text);
@@ -2136,6 +2136,12 @@ <h2 class="heading settled" data-level="2" id="motivation"><span class="secno">2
 <blockquote>
 <p>the file system treats path and file names as an opaque sequence of <code class="highlight"><c- n>WCHAR</c-></code>s</p>
 </blockquote>
+<p>This is also true on POSIX (<a data-link-type="biblio" href="#biblio-pep383" title="PEP 383 – Non-decodable Bytes in System Character Interfaces">[PEP383]</a>):</p>
+<blockquote>
+<p>File names, environment variables, and command line arguments are defined as
+being character data in POSIX; the C APIs however allow passing arbitrary
+bytes - whether these conform to a certain encoding or not.</p>
+</blockquote>
 <p>Arbitrary paths are formatted on POSIX such that there is no data loss.
 Unfortunately this is not the case on Windows, for example:</p>
 <pre class="language-c++ highlight"><c- k>auto</c-> <c- n>p1</c-> <c- o>=</c-> <c- n>std</c-><c- o>::</c-><c- n>filesystem</c-><c- o>::</c-><c- n>path</c-><c- p>(</c->L<c- s>"</c-><c- se>\xD800</c-><c- s>"</c-><c- p>);</c-> <c- c1>// a lone surrogate</c->
@@ -2186,7 +2192,22 @@ <h2 class="heading settled" data-level="3" id="proposal"><span class="secno">3.
21862192
<pre class="highlight"><c- s>"</c-><c- se>\xED\xA0\x81</c-><c- s>"</c->
21872193
</pre>
21882194
</table>
2189-
<p>TODO</p>
2195+
<p>At the same time this will preserve the observable behavior for <code class="highlight"><c- n>std</c-><c- o>::</c-><c- n>print</c-></code> when printing to a terminal. For example:</p>
2196+
<pre class="language-c++ highlight"><c- n>std</c-><c- o>::</c-><c- n>print</c-><c- p>(</c-><c- s>"{}</c-><c- se>\n</c-><c- s>"</c-><c- p>,</c-> <c- n>std</c-><c- o>::</c-><c- n>filesystem</c-><c- o>::</c-><c- n>path</c-><c- p>(</c->L<c- s>"</c-><c- se>\xD800</c-><c- s>"</c-><c- p>));</c->
2197+
</pre>
2198+
<p>will still print</p>
2199+
<pre class="highlight">
2200+
</pre>
2201+
<p>on implementations that follow the recommended practice from <a href="https://eel.is/c++draft/ostream.formatted.print">[ostream.formatted.print</a>]:</p>
2202+
<blockquote>
2203+
<p><em>Recommended practice</em>: For <code class="highlight"><c- n>vprint_unicode</c-></code>, if invoking the native Unicode
2204+
API requires transcoding, implementations should substitute invalid code
2205+
units with U+FFFD REPLACEMENT CHARACTER per the Unicode Standard, Chapter 3.9
2206+
U+FFFD Substitution in Conversion.</p>
2207+
</blockquote>
2208+
<p>WTF-8 is used to handle invalid UTF-16 in Rust (<a data-link-type="biblio" href="#biblio-rust-osstring" title="OsString Struct. The Rust Standard Library.">[RUST-OSSTRING]</a>) and Node.js
2209+
libuv (<a data-link-type="biblio" href="#biblio-libuv" title="Miscellaneous utilities. libuv Documentation.">[LIBUV]</a>). Python also handles this but with a different mechanism
2210+
(<a data-link-type="biblio" href="#biblio-pep383" title="PEP 383 – Non-decodable Bytes in System Character Interfaces">[PEP383]</a>).</p>
21902211
</main>
21912212
<script>
21922213
(function() {
@@ -2320,8 +2341,14 @@ <h2 class="heading settled" data-level="3" id="proposal"><span class="secno">3.
 <h2 class="no-num no-ref heading settled" id="references"><span class="content">References</span><a class="self-link" href="#references"></a></h2>
 <h3 class="no-num no-ref heading settled" id="informative"><span class="content">Informative References</span><a class="self-link" href="#informative"></a></h3>
 <dl>
+<dt id="biblio-libuv">[LIBUV]
+<dd>libuv contributors. <a href="https://docs.libuv.org/en/v1.x/misc.html"><cite>Miscellaneous utilities. libuv Documentation.</cite></a>. URL: <a href="https://docs.libuv.org/en/v1.x/misc.html">https://docs.libuv.org/en/v1.x/misc.html</a>
 <dt id="biblio-p2845">[P2845]
 <dd>Victor Zverovich. <a href="https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2024/p2845r8.html"><cite>Formatting of std::filesystem::path</cite></a>. URL: <a href="https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2024/p2845r8.html">https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2024/p2845r8.html</a>
+<dt id="biblio-pep383">[PEP383]
+<dd>Martin von Löwis. <a href="https://peps.python.org/pep-0383/"><cite>PEP 383 – Non-decodable Bytes in System Character Interfaces</cite></a>. URL: <a href="https://peps.python.org/pep-0383/">https://peps.python.org/pep-0383/</a>
+<dt id="biblio-rust-osstring">[RUST-OSSTRING]
+<dd>Rust Project Developers. <a href="https://doc.rust-lang.org/std/ffi/struct.OsString.html"><cite>OsString Struct. The Rust Standard Library.</cite></a>. URL: <a href="https://doc.rust-lang.org/std/ffi/struct.OsString.html">https://doc.rust-lang.org/std/ffi/struct.OsString.html</a>
 <dt id="biblio-win32-fileio">[WIN32-FILEIO]
 <dd>Microsoft Corporation. <a href="https://learn.microsoft.com/en-us/windows/win32/fileio/maximum-file-path-limitation"><cite>Maximum Path Length Limitation – Local file systems</cite></a>. URL: <a href="https://learn.microsoft.com/en-us/windows/win32/fileio/maximum-file-path-limitation">https://learn.microsoft.com/en-us/windows/win32/fileio/maximum-file-path-limitation</a>
 <dt id="biblio-wtf">[WTF]
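
Reading aid for the proposal text in this commit: a minimal sketch of how WTF-8 can encode potentially ill-formed UTF-16. Valid surrogate pairs are combined into one code point, while a lone surrogate is written out in the ordinary three-byte form instead of being replaced, so no data is lost. `to_wtf8` is a hypothetical helper, not something defined by the paper or by any library, and it takes `char16_t` input rather than `wchar_t` (an assumption made so that the sketch behaves the same on platforms where `wchar_t` is 32 bits wide).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>

// Hypothetical helper for illustration only: encodes potentially
// ill-formed UTF-16 as WTF-8 (generalized UTF-8 that keeps lone surrogates).
std::string to_wtf8(std::u16string_view in) {
  std::string out;
  for (std::size_t i = 0; i < in.size(); ++i) {
    std::uint32_t cp = in[i];
    // A high surrogate followed by a low surrogate forms one code point.
    if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size() &&
        in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
      cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
      ++i;  // the low surrogate has been consumed
    }
    if (cp < 0x80) {
      out += static_cast<char>(cp);
    } else if (cp < 0x800) {
      out += static_cast<char>(0xC0 | (cp >> 6));
      out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
      // Lone surrogates (0xD800..0xDFFF) also take this three-byte path.
      out += static_cast<char>(0xE0 | (cp >> 12));
      out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
      out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
      out += static_cast<char>(0xF0 | (cp >> 18));
      out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
      out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
      out += static_cast<char>(0x80 | (cp & 0x3F));
    }
  }
  return out;
}
```

With this helper, `to_wtf8(u"\xD801")` yields the bytes `"\xED\xA0\x81"`, the same value that appears in the proposal's table in the diff above.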
