8 | 8 | // option. This file may not be copied, modified, or distributed
9 | 9 | // except according to those terms.
10 | 10 |
11 | | -//! Utilities related to FFI bindings. |
| 11 | +//! This module provides utilities to handle C-like strings. It is |
| 12 | +//! mainly of use for FFI (Foreign Function Interface) bindings and |
| 13 | +//! code that needs to exchange C-like strings with other languages. |
| 14 | +//! |
| 15 | +//! # Overview |
| 16 | +//! |
| 17 | +//! Rust represents owned strings with the [`String`] type, and |
| 18 | +//! borrowed slices of strings with the [`str`] primitive. Both are |
| 19 | +//! always in UTF-8 encoding, and may contain nul bytes in the middle, |
| 20 | +//! i.e. if you look at the bytes that make up the string, there may |
| 21 | +//! be a `0` among them. Both `String` and `str` know their length; |
| 22 | +//! there are no nul terminators at the end of strings like in C. |
| 23 | +//! |
| 24 | +//! C strings are different from Rust strings: |
| 25 | +//! |
| 26 | +//! * **Encodings** - C strings may have different encodings. If |
| 27 | +//! you are bringing in strings from C APIs, you should check what |
| 28 | +//! encoding you are getting. Rust strings are always UTF-8. |
| 29 | +//! |
| 30 | +//! * **Character width** - C strings may use "normal" or "wide" |
| 31 | +//! characters, i.e. `char` or `wchar_t`, respectively. The C |
| 32 | +//! standard leaves the actual sizes of those types open to |
| 33 | +//! interpretation, but defines different APIs for strings made up of |
| 34 | +//! each character type. Rust strings are always UTF-8, so different |
| 35 | +//! Unicode characters will be encoded in a variable number of bytes |
| 36 | +//! each. The Rust type [`char`] represents a '[Unicode |
| 37 | +//! scalar value]', which is similar to, but not the same as, a |
| 38 | +//! '[Unicode code point]'. |
| 39 | +//! |
| 40 | +//! * **Nul terminators and implicit string lengths** - Often, C |
| 41 | +//! strings are nul-terminated, i.e. they have a `0` character at the |
| 42 | +//! end. The length of a string buffer is not known *a priori*; |
| 43 | +//! instead, to compute the length of a string, C code must manually |
| 44 | +//! call a function like `strlen()` for `char`-based strings, or |
| 45 | +//! `wcslen()` for `wchar_t`-based ones. Those functions return the |
| 46 | +//! number of characters in the string excluding the nul terminator, |
| 47 | +//! so the buffer length is really `len+1` characters. Rust strings |
| 48 | +//! don't have a nul terminator, and they always know their length. |
| 49 | +//! |
| 50 | +//! * **No nul characters in the middle of the string** - When C |
| 51 | +//! strings have a nul terminator character, this usually means that |
| 52 | +//! they cannot have nul characters in the middle — a nul character |
| 53 | +//! would essentially truncate the string. Rust strings *can* have |
| 54 | +//! nul characters in the middle, since they don't use nul |
| 55 | +//! terminators. |
| 56 | +//! |
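The nul-related differences listed above can be checked with a short sketch. The helper names `interior_nul_position` and `c_buffer_lengths` are made up for this illustration; only `CString::new`, `NulError::nul_position`, `as_bytes` and `as_bytes_with_nul` come from the standard library.

```rust
use std::ffi::CString;

/// Returns the position of the first nul byte that prevents `s` from
/// becoming a C string, or `None` if the conversion succeeds.
/// (Hypothetical helper, not part of `std::ffi`.)
fn interior_nul_position(s: &str) -> Option<usize> {
    CString::new(s).err().map(|e| e.nul_position())
}

/// Returns `(len, len + 1)`: the byte length without and with the nul
/// terminator that `CString` appends, or `None` if `s` contains a nul.
/// (Hypothetical helper, not part of `std::ffi`.)
fn c_buffer_lengths(s: &str) -> Option<(usize, usize)> {
    let c = CString::new(s).ok()?;
    Some((c.as_bytes().len(), c.as_bytes_with_nul().len()))
}
```

A Rust string such as `"ab\0cd"` still reports a length of 5 bytes, while `interior_nul_position("ab\0cd")` reports the nul at index 2 because a C string cannot hold it.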
| 57 | +//! # Representations of non-Rust strings |
| 58 | +//! |
| 59 | +//! [`CString`] and [`CStr`] are useful when you need to transfer |
| 60 | +//! UTF-8 strings to and from C, respectively: |
| 61 | +//! |
| 62 | +//! * **From Rust to C:** [`CString`] represents an owned, C-friendly |
| 63 | +//! string: it is nul-terminated and has no nul characters in the |
| 64 | +//! middle. Rust code can create a `CString` out of a normal string |
| 65 | +//! (provided that the string doesn't have nul characters in the |
| 66 | +//! middle), and then use methods such as `as_ptr()` or `into_raw()` |
| 67 | +//! to obtain a raw `*const c_char` or `*mut c_char` that can then be |
| 68 | +//! passed as an argument to C functions. |
| 69 | +//! |
| 70 | +//! * **From C to Rust:** [`CStr`] represents a borrowed C string; it |
| 71 | +//! is what you would use to wrap a raw `*const c_char` that you got |
| 72 | +//! from a C function. A `CStr` is only guaranteed to be a |
| 73 | +//! nul-terminated array of bytes; UTF-8 validation happens only when |
| 74 | +//! you ask to convert it to a `&str`. |
| 75 | +//! |
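The two directions described in the bullets above might look roughly like this. It is a minimal sketch: `with_c_pointer`, `c_pointer_to_string`, `round_trip` and `is_valid_utf8_c_string` are hypothetical helpers, and a Rust closure stands in for a real C function so the example needs no foreign code.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

/// Rust to C: builds an owned, nul-terminated buffer and passes its pointer
/// to `f`, which plays the role of a C function (hypothetical stand-in).
/// Returns `None` if `s` contains a nul byte.
fn with_c_pointer<R>(s: &str, f: impl FnOnce(*const c_char) -> R) -> Option<R> {
    let owned = CString::new(s).ok()?;
    // `owned` stays alive until `f` returns, so the pointer remains valid.
    Some(f(owned.as_ptr()))
}

/// C to Rust: wraps a raw pointer in a borrowed `CStr`; UTF-8 is validated
/// only when `to_str` is called.
///
/// # Safety
/// `ptr` must point to a valid nul-terminated string that outlives the call.
unsafe fn c_pointer_to_string(ptr: *const c_char) -> Option<String> {
    let borrowed = unsafe { CStr::from_ptr(ptr) };
    borrowed.to_str().ok().map(str::to_owned)
}

/// Sends `s` through both conversions.
fn round_trip(s: &str) -> Option<String> {
    with_c_pointer(s, |p| unsafe { c_pointer_to_string(p) }).flatten()
}

/// `Some(true)` if the nul-terminated bytes are valid UTF-8, `Some(false)`
/// if they form a C string that is not UTF-8, `None` if they are not a
/// well-formed C string at all.
fn is_valid_utf8_c_string(bytes_with_nul: &[u8]) -> Option<bool> {
    let c = CStr::from_bytes_with_nul(bytes_with_nul).ok()?;
    Some(c.to_str().is_ok())
}
```

Note that `b"\xff\0"` is accepted as a `CStr` but fails the later `to_str()` check, which is the deferred validation the `CStr` bullet describes.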
| 76 | +//! [`OsString`] and [`OsStr`] are useful when you need to transfer |
| 77 | +//! strings to and from operating system calls. If you need Rust |
| 78 | +//! strings out of them, they can take care of conversion to and from |
| 79 | +//! the operating system's preferred form for strings — of course, it |
| 80 | +//! may not be possible to convert all valid operating system strings |
| 81 | +//! into valid UTF-8; the `OsString` and `OsStr` functions let you know |
| 82 | +//! when this is the case. |
| 83 | +//! |
| 84 | +//! * [`OsString`] represents an owned string in whatever |
| 85 | +//! representation the operating system prefers. In the Rust standard |
| 86 | +//! library, various APIs that transfer strings to/from the operating |
| 87 | +//! system use `OsString` instead of plain strings. For example, |
| 88 | +//! [`env::var_os()`] is used to query environment variables; it |
| 89 | +//! returns an `Option<OsString>`. If the environment variable exists |
| 90 | +//! you will get a `Some(os_string)`, which you can *then* try to |
| 91 | +//! convert to a Rust string. This yields a [`Result<>`], so that |
| 92 | +//! your code can detect errors in case the environment variable did |
| 93 | +//! not in fact contain valid Unicode data. |
| 94 | +//! |
| 95 | +//! * [`OsStr`] represents a borrowed reference to a string in a format that |
| 96 | +//! can be passed to the operating system. It can be converted into |
| 97 | +//! a UTF-8 Rust string slice in a similar way to `OsString`. |
| 98 | +//! |
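The `env::var_os()` flow described above can be sketched as follows. `read_var` and `as_utf8` are hypothetical helper names; the standard library calls used are `env::var_os`, `OsString::into_string` and `OsStr::to_str`.

```rust
use std::env;
use std::ffi::{OsStr, OsString};

/// Looks up an environment variable (hypothetical helper).
/// `Ok(None)`: the variable is not set.
/// `Ok(Some(s))`: the variable is set and holds valid Unicode.
/// `Err(original)`: the variable is set but is not valid Unicode; the
/// untouched `OsString` is handed back to the caller.
fn read_var(name: &str) -> Result<Option<String>, OsString> {
    match env::var_os(name) {
        None => Ok(None),
        Some(os_string) => os_string.into_string().map(Some),
    }
}

/// Borrowed counterpart: `OsStr::to_str` returns `None` rather than an
/// error when the contents are not valid Unicode.
fn as_utf8(s: &OsStr) -> Option<&str> {
    s.to_str()
}
```

Returning the original `OsString` in the error case mirrors what `into_string` itself does, so a caller can still pass the value back to the operating system unchanged.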
| 99 | +//! [`String`]: ../string/struct.String.html |
| 100 | +//! [`str`]: ../primitive.str.html |
| 101 | +//! [`char`]: ../primitive.char.html |
| 102 | +//! [Unicode scalar value]: http://www.unicode.org/glossary/#unicode_scalar_value |
| 103 | +//! [Unicode code point]: http://www.unicode.org/glossary/#code_point |
| 104 | +//! [`CString`]: struct.CString.html |
| 105 | +//! [`CStr`]: struct.CStr.html |
| 106 | +//! [`OsString`]: struct.OsString.html |
| 107 | +//! [`OsStr`]: struct.OsStr.html |
| 108 | +//! [`env::set_var()`]: ../env/fn.set_var.html |
| 109 | +//! [`env::var_os()`]: ../env/fn.var_os.html |
| 110 | +//! [`Result<>`]: ../result/enum.Result.html |
12 | 111 |
13 | 112 | #![stable(feature = "rust1", since = "1.0.0")]
14 | 113 |