Commit 5451b72

Expand the introduction to the ffi module.
We describe the representation of C strings, and the purpose of OsString/OsStr. Part of #29354
1 parent ee409a4 commit 5451b72

1 file changed

+100
-1
lines changed

src/libstd/ffi/mod.rs

Lines changed: 100 additions & 1 deletion
@@ -8,7 +8,106 @@
8 8
// option. This file may not be copied, modified, or distributed
9 9
// except according to those terms.
10 10

11-
//! Utilities related to FFI bindings.
11+
//! This module provides utilities to handle C-like strings. It is
12+
//! mainly of use for FFI (Foreign Function Interface) bindings and
13+
//! code that needs to exchange C-like strings with other languages.
14+
//!
15+
//! # Overview
16+
//!
17+
//! Rust represents owned strings with the [`String`] type, and
18+
//! borrowed slices of strings with the [`str`] primitive. Both are
19+
//! always in UTF-8 encoding, and may contain nul bytes in the middle,
20+
//! i.e. if you look at the bytes that make up the string, there may
21+
//! be a `0` among them. Both `String` and `str` know their length;
22+
//! there are no nul terminators at the end of strings like in C.
23+
//!
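A minimal sketch of the point above, using only the standard library: a Rust string slice reports its own byte length and may hold a nul byte in the middle. The helper name `describe` is hypothetical and exists only for this illustration.

```rust
/// Hypothetical helper: returns the byte length of `s` and whether any
/// of its bytes is a nul (`0`) byte.
fn describe(s: &str) -> (usize, bool) {
    (s.len(), s.bytes().any(|b| b == 0))
}

fn main() {
    // A valid Rust string with a nul byte in the middle.
    let s = "ab\0cd";
    let (len, has_nul) = describe(s);

    // The length is stored alongside the data, so the embedded nul
    // does not cut the string short.
    println!("len = {}, has_nul = {}", len, has_nul);
}
```

The length comes from the slice itself rather than from scanning for a terminator, which is why the embedded nul is counted like any other byte.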
24+
//! C strings are different from Rust strings:
25+
//!
26+
//! * **Encodings** - C strings may have different encodings. If
27+
//! you are bringing in strings from C APIs, you should check what
28+
//! encoding you are getting. Rust strings are always UTF-8.
29+
//!
30+
//! * **Character width** - C strings may use "normal" or "wide"
31+
//! characters, i.e. `char` or `wchar_t`, respectively. The C
32+
//! standard leaves the actual sizes of those types open to
33+
//! interpretation, but defines different APIs for strings made up of
34+
//! each character type. Rust strings are always UTF-8, so different
35+
//! Unicode characters will be encoded in a variable number of bytes
36+
//! each. The Rust type [`char`] represents a '[Unicode
37+
//! scalar value]', which is similar to, but not the same as, a
38+
//! '[Unicode code point]'.
39+
//!
40+
//! * **Nul terminators and implicit string lengths** - Often, C
41+
//! strings are nul-terminated, i.e. they have a `0` character at the
42+
//! end. The length of a string buffer is not known *a priori*;
43+
//! instead, to compute the length of a string, C code must manually
44+
//! call a function like `strlen()` for `char`-based strings, or
45+
//! `wcslen()` for `wchar_t`-based ones. Those functions return the
46+
//! number of characters in the string excluding the nul terminator,
47+
//! so the buffer length is really `len+1` characters. Rust strings
48+
//! don't have a nul terminator, and they always know their length.
49+
//!
50+
//! * **No nul characters in the middle of the string** - When C
51+
//! strings have a nul terminator character, this usually means that
52+
//! they cannot have nul characters in the middle — a nul character
53+
//! would essentially truncate the string. Rust strings *can* have
54+
//! nul characters in the middle, since they don't use nul
55+
//! terminators.
56+
//!
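The differences listed above can be sketched in plain Rust without calling into C. The function `c_strlen` below is a hypothetical stand-in for the behaviour of C's `strlen()`, not a binding to it, and `byte_and_char_len` is likewise an illustrative name.

```rust
/// Hypothetical stand-in for C's `strlen()`: counts the bytes that come
/// before the first nul byte. Returns `None` if the buffer has no nul
/// terminator at all (real C code would read out of bounds instead).
fn c_strlen(buf: &[u8]) -> Option<usize> {
    buf.iter().position(|&b| b == 0)
}

/// Returns the UTF-8 byte length and the number of `char`s in `s`.
fn byte_and_char_len(s: &str) -> (usize, usize) {
    (s.len(), s.chars().count())
}

fn main() {
    // The buffer holds 6 bytes, but the C-style length is 5.
    let buf = b"hello\0";
    assert_eq!(c_strlen(buf), Some(5));
    assert_eq!(buf.len(), 5 + 1);

    // A nul in the middle truncates the string as C sees it.
    assert_eq!(c_strlen(b"ab\0cd\0"), Some(2));

    // UTF-8 is variable width: U+00E9 takes two bytes.
    assert_eq!(byte_and_char_len("h\u{e9}llo"), (6, 5));
}
```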
57+
//! # Representations of non-Rust strings
58+
//!
59+
//! [`CString`] and [`CStr`] are useful when you need to transfer
60+
//! strings to and from C, respectively:
61+
//!
62+
//! * **From Rust to C:** [`CString`] represents an owned, C-friendly
63+
//! string: it is nul-terminated, and has no
64+
//! nul characters in the middle. Rust code can create a `CString`
65+
//! out of a normal string (provided that the string doesn't have nul
66+
//! characters in the middle), and then use a variety of methods to
67+
//! obtain a raw `*const c_char` or `*mut c_char` that can then be passed as an argument to C
68+
//! functions.
69+
//!
70+
//! * **From C to Rust:** [`CStr`] represents a borrowed C string; it
71+
//! is what you would use to wrap a raw `*const c_char` that you got from
72+
//! a C function. A `CStr` is just guaranteed to be a nul-terminated
73+
//! array of bytes; the UTF-8 validation step only happens when you
74+
//! request to convert it to a `&str`.
75+
//!
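A hedged sketch of both directions, assuming no real C library is available: the pointer obtained from a `CString` is handed straight back to `CStr::from_ptr` to play the role of a pointer received from C. The function name `roundtrip` is hypothetical.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

/// Hypothetical helper: converts `s` to a C string, views it again
/// through a raw pointer as C code would, and converts it back.
/// Returns `None` if `s` contains a nul byte in the middle.
fn roundtrip(s: &str) -> Option<String> {
    // Rust to C: fails if `s` has an interior nul byte.
    let owned = CString::new(s).ok()?;
    let ptr: *const c_char = owned.as_ptr();

    // C to Rust: wrap the raw pointer in a borrowed `CStr`.
    // SAFETY: `ptr` points to a nul-terminated buffer owned by `owned`,
    // which stays alive until the end of this function.
    let borrowed: &CStr = unsafe { CStr::from_ptr(ptr) };

    // UTF-8 validation happens only here, on conversion to `&str`.
    borrowed.to_str().ok().map(str::to_owned)
}

fn main() {
    assert_eq!(roundtrip("hello"), Some(String::from("hello")));
    // An interior nul cannot be represented in a C string.
    assert_eq!(roundtrip("he\0llo"), None);

    // The owned buffer is one byte longer than the text: the terminator.
    let c = CString::new("hello").unwrap();
    assert_eq!(c.as_bytes_with_nul().len(), 5 + 1);
}
```

The `CString` must outlive every use of the pointer taken from it; calling `as_ptr()` on a temporary leaves a dangling pointer.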
76+
//! [`OsString`] and [`OsStr`] are useful when you need to transfer
77+
//! strings to and from operating system calls. If you need Rust
78+
//! strings out of them, they can take care of conversion to and from
79+
//! the operating system's preferred form for strings. Of course, it
80+
//! may not be possible to convert all valid operating system strings
81+
//! into valid UTF-8; the `OsString` and `OsStr` functions let you know
82+
//! when this is the case.
83+
//!
84+
//! * [`OsString`] represents an owned string in whatever
85+
//! representation the operating system prefers. In the Rust standard
86+
//! library, various APIs that transfer strings to/from the operating
87+
//! system use `OsString` instead of plain strings. For example,
88+
//! [`env::var_os()`] is used to query environment variables; it
89+
//! returns an `Option<OsString>`. If the environment variable exists
90+
//! you will get a `Some(os_string)`, which you can *then* try to
91+
//! convert to a Rust string. This yields a [`Result<>`], so that
92+
//! your code can detect errors in case the environment variable did
93+
//! not in fact contain valid Unicode data.
94+
//!
95+
//! * [`OsStr`] represents a borrowed reference to a string in a format that
96+
//! can be passed to the operating system. It can be converted into
97+
//! a UTF-8 Rust string slice in a similar way to `OsString`.
98+
//!
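The `env::var_os()` flow described above might look like the following sketch. The helper name `lookup` and the choice of the `PATH` variable are assumptions made for illustration.

```rust
use std::env;
use std::ffi::{OsStr, OsString};

/// Hypothetical helper: looks up an environment variable and tries to
/// convert it to a `String`.
/// - `None`: the variable is not set.
/// - `Some(Err(raw))`: it is set but is not valid Unicode; the original
///   `OsString` is handed back unchanged.
fn lookup(name: &str) -> Option<Result<String, OsString>> {
    env::var_os(name).map(OsString::into_string)
}

fn main() {
    match lookup("PATH") {
        Some(Ok(s)) => println!("PATH is valid Unicode ({} bytes)", s.len()),
        Some(Err(raw)) => println!("PATH is not valid Unicode: {:?}", raw),
        None => println!("PATH is not set"),
    }

    // A borrowed `OsStr` converts in a similar way, yielding an `Option`.
    let borrowed: &OsStr = OsStr::new("plain.txt");
    assert_eq!(borrowed.to_str(), Some("plain.txt"));
}
```

On failure, `into_string()` returns the original `OsString` rather than discarding it, so the caller can still pass the value back to the operating system.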
99+
//! [`String`]: ../string/struct.String.html
100+
//! [`str`]: ../primitive.str.html
101+
//! [`char`]: ../primitive.char.html
102+
//! [Unicode scalar value]: http://www.unicode.org/glossary/#unicode_scalar_value
103+
//! [Unicode code point]: http://www.unicode.org/glossary/#code_point
104+
//! [`CString`]: struct.CString.html
105+
//! [`CStr`]: struct.CStr.html
106+
//! [`OsString`]: struct.OsString.html
107+
//! [`OsStr`]: struct.OsStr.html
108+
//! [`env::set_var()`]: ../env/fn.set_var.html
109+
//! [`env::var_os()`]: ../env/fn.var_os.html
110+
//! [`Result<>`]: ../result/enum.Result.html
12 111
13 112
#![stable(feature = "rust1", since = "1.0.0")]
14 113
