Skip to content

Commit 0f91e24

Browse files
author
José Valim
committed
Improve documentation for the String module
1 parent 7e9c5c9 commit 0f91e24

File tree

1 file changed

+32
-8
lines changed

1 file changed

+32
-8
lines changed

lib/elixir/lib/string.ex

Lines changed: 32 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -1,12 +1,34 @@
11
defmodule String do
22
@moduledoc %B"""
3-
A string in Elixir is a UTF-8 encoded binary.
3+
A String in Elixir is a UTF-8 encoded binary.
4+
5+
## String and binary operations
46
57
The functions in this module act according to the
6-
Unicode Standard, version 6.2.0. A codepoint is a
7-
Unicode Character, which may be represented by one
8-
or more bytes. For example, the character "é" is
9-
represented with two bytes:
8+
Unicode Standard, version 6.2.0. For example,
9+
`titlecase`, `downcase`, `strip` are provided by this
10+
module.
11+
12+
Besides this module, Elixir provides more low-level
13+
operations that works directly with binaries. Some
14+
of those can be found in the `Kernel` module, as:
15+
16+
* `binary_part/2` and `binary_part/3` - retrieves part of the binary
17+
* `bit_size/1` and `byte_size/1` - size related functions
18+
* `is_bitstring/1` and `is_binary/1` - type checking function
19+
* Plus a bunch of conversion functions, like `binary_to_atom/2`,
20+
`binary_to_integer/2`, `binary_to_term/1` and their opposite
21+
like `integer_to_binary/2`
22+
23+
Finally, [the `:binary` module](http://erlang.org/doc/man/binary.html)
24+
provides a couple other functions that works on the byte level.
25+
26+
## Codepoints and graphemes
27+
28+
As per the Unicode Standard, a codepoint is an Unicode
29+
Character, which may be represented by one or more bytes.
30+
For example, the character "é" is represented with two
31+
bytes:
1032
1133
string = "é"
1234
#=> "é"
@@ -40,7 +62,7 @@ defmodule String do
4062
## Integer codepoints
4163
4264
Although codepoints could be represented as integers, this
43-
module represents all codepoints as binaries. For example:
65+
module represents all codepoints as strings. For example:
4466
4567
String.codepoints "josé" #=> ["j", "o", "s", "é"]
4668
@@ -72,8 +94,10 @@ defmodule String do
7294
characters. For example, `String.length` is going to return
7395
a correct result even if an invalid codepoint is fed into it.
7496
75-
In the future, bang version of such functions may be
76-
provided which will rather raise on such invalid data.
97+
In other words, this module expects invalid data to be detected
98+
when retrieving data from the external source. For example, a
99+
driver that reads strings from a database will be the one
100+
responsible to check the validity of the encoding.
77101
"""
78102

79103
@type t :: binary

0 commit comments

Comments
 (0)