Commit 9b575f1

Improve fs::PathToString documentation
1 parent 7f0f853 commit 9b575f1

3 files changed

+34
-21
lines changed

doc/developer-notes.md

Lines changed: 6 additions & 0 deletions
@@ -1254,6 +1254,12 @@ A few guidelines for introducing and reviewing new RPC interfaces:
 
 - *Rationale*: User-facing consistency.
 
+- Use `fs::path::u8string()` and `fs::u8path()` functions when converting path
+to JSON strings, not `fs::PathToString` and `fs::PathFromString`
+
+- *Rationale*: JSON strings are Unicode strings, not byte strings, and
+RFC8259 requires JSON to be encoded as UTF-8.
+
 Internal interface guidelines
 -----------------------------
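
The guideline added above can be illustrated with a minimal sketch. It assumes `fs::path` behaves like `std::filesystem::path` for this purpose; `PathToUtf8` and `JsonQuote` are hypothetical helper names used only for illustration and are not part of the commit.

```cpp
#include <cassert>
#include <filesystem>
#include <string>

// Hypothetical helper (not from the commit): return the UTF-8 form of a path
// as a std::string so it can be embedded in a JSON document.
inline std::string PathToUtf8(const std::filesystem::path& p)
{
    // u8string() returns std::string in C++17 and std::u8string in C++20;
    // copying through iterators compiles with either.
    const auto u8 = p.u8string();
    return std::string(u8.begin(), u8.end());
}

// Hypothetical minimal JSON string quoting: escapes the quote, the backslash
// and ASCII control characters, and passes all other bytes through unchanged.
// The input is assumed to already be valid UTF-8.
inline std::string JsonQuote(const std::string& utf8)
{
    static const char* hex = "0123456789abcdef";
    std::string out = "\"";
    for (const unsigned char c : utf8) {
        if (c == '"' || c == '\\') {
            out += '\\';
            out += static_cast<char>(c);
        } else if (c < 0x20) {
            out += "\\u00";
            out += hex[c >> 4];
            out += hex[c & 0xf];
        } else {
            out += static_cast<char>(c);
        }
    }
    out += '"';
    return out;
}
```

A caller would write something like `JsonQuote(PathToUtf8(path))`. The point of the sketch is only that the string handed to the JSON layer comes from `u8string()` rather than from a function whose encoding is unspecified.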

src/dbwrapper.cpp

Lines changed: 4 additions & 0 deletions
@@ -136,6 +136,10 @@ CDBWrapper::CDBWrapper(const fs::path& path, size_t nCacheSize, bool fMemory, bo
 TryCreateDirectories(path);
 LogPrintf("Opening LevelDB in %s\n", fs::PathToString(path));
 }
+// PathToString() return value is safe to pass to leveldb open function,
+// because on POSIX leveldb passes the byte string directly to ::open(), and
+// on Windows it converts from UTF-8 to UTF-16 before calling ::CreateFileW
+// (see env_posix.cc and env_windows.cc).
 leveldb::Status status = leveldb::DB::Open(options, fs::PathToString(path), &pdb);
 dbwrapper_private::HandleError(status);
 LogPrintf("Opened LevelDB successfully\n");

src/fs.h

Lines changed: 24 additions & 21 deletions
@@ -94,31 +94,34 @@ static inline path operator+(path p1, path p2)
 
 /**
 * Convert path object to byte string. On POSIX, paths natively are byte
-* strings so this is trivial. On Windows, paths natively are Unicode, so an
-* encoding step is necessary.
+* strings, so this is trivial. On Windows, paths natively are Unicode, so an
+* encoding step is necessary. The inverse of \ref PathToString is \ref
+* PathFromString. The strings returned and parsed by these functions can be
+* used to call POSIX APIs, and for roundtrip conversion, logging, and
+* debugging.
 *
-* The inverse of \ref PathToString is \ref PathFromString. The strings
-* returned and parsed by these functions can be used to call POSIX APIs, and
-* for roundtrip conversion, logging, and debugging. But they are not
-* guaranteed to be valid UTF-8, and are generally meant to be used internally,
-* not externally. When communicating with external programs and libraries that
-* require UTF-8, fs::path::u8string() and fs::u8path() methods can be used.
-* For other applications, if support for non UTF-8 paths is required, or if
-* higher-level JSON or XML or URI or C-style escapes are preferred, it may be
-* also be appropriate to use different path encoding functions.
-*
-* Implementation note: On Windows, the std::filesystem::path(string)
-* constructor and std::filesystem::path::string() method are not safe to use
-* here, because these methods encode the path using C++'s narrow multibyte
-* encoding, which on Windows corresponds to the current "code page", which is
-* unpredictable and typically not able to represent all valid paths. So
-* std::filesystem::path::u8string() and std::filesystem::u8path() functions
-* are used instead on Windows. On POSIX, u8string/u8path functions are not
-* safe to use because paths are not always valid UTF-8, so plain string
-* methods which do not transform the path there are used.
+* Because \ref PathToString and \ref PathFromString functions don't specify an
+* encoding, they are meant to be used internally, not externally. They are not
+* appropriate to use in applications requiring UTF-8, where
+* fs::path::u8string() and fs::u8path() methods should be used instead. Other
+* applications could require still different encodings. For example, JSON, XML,
+* or URI applications might prefer to use higher level escapes (\uXXXX or
+* &XXXX; or %XX) instead of multibyte encoding. Rust, Python, Java applications
+* may require encoding paths with their respective UTF-8 derivatives WTF-8,
+* PEP-383, and CESU-8 (see https://en.wikipedia.org/wiki/UTF-8#Derivatives).
 */
 static inline std::string PathToString(const path& path)
 {
+// Implementation note: On Windows, the std::filesystem::path(string)
+// constructor and std::filesystem::path::string() method are not safe to
+// use here, because these methods encode the path using C++'s narrow
+// multibyte encoding, which on Windows corresponds to the current "code
+// page", which is unpredictable and typically not able to represent all
+// valid paths. So std::filesystem::path::u8string() and
+// std::filesystem::u8path() functions are used instead on Windows. On
+// POSIX, u8string/u8path functions are not safe to use because paths are
+// not always valid UTF-8, so plain string methods which do not transform
+// the path there are used.
 #ifdef WIN32
 return path.u8string();
 #else
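
The hunk above is cut off at the `#else` branch. The following is a standalone sketch of the platform split that the implementation note describes, not the project's actual code. It assumes plain `std::filesystem::path` instead of the project's `fs::path` wrapper, tests the compiler-defined `_WIN32` macro rather than `WIN32`, and fills in the POSIX branch only from the comment's statement that plain string methods are used there. The `sketch` namespace is hypothetical.

```cpp
#include <cassert>
#include <filesystem>
#include <string>

namespace sketch {

// Convert a path to a byte string without specifying an encoding contract:
// UTF-8 on Windows, the untransformed native bytes elsewhere.
inline std::string PathToString(const std::filesystem::path& p)
{
#ifdef _WIN32
    // Avoid path::string() here: it would use the current code page.
    const auto u8 = p.u8string();
    return std::string(u8.begin(), u8.end());
#else
    // POSIX paths are byte strings already; return them unchanged.
    return p.string();
#endif
}

// Inverse of PathToString under the same assumptions.
inline std::filesystem::path PathFromString(const std::string& s)
{
#ifdef _WIN32
    // u8path() interprets the bytes as UTF-8 (deprecated in C++20 but still
    // available).
    return std::filesystem::u8path(s);
#else
    return std::filesystem::path(s);
#endif
}

} // namespace sketch
```

Under these assumptions the two functions round-trip a path on either platform, which is the property the doc comment relies on for logging, debugging, and passing paths to POSIX APIs.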
