
Commit e12a0c5

José Valim committed
Backport improvements to <<>>' docs
1 parent 61c3e82 commit e12a0c5


1 file changed, +140 -77 lines changed


lib/elixir/lib/kernel/special_forms.ex

Lines changed: 140 additions & 77 deletions
```diff
@@ -172,10 +172,22 @@ defmodule Kernel.SpecialForms do
       iex> << 1, 2, 3 >>
       << 1, 2, 3 >>
 
-  ## Bitstring types
+  ## Types
 
-  A bitstring is made of many segments. Each segment has a
-  type, which defaults to integer:
+  A bitstring is made of many segments and each segment has a
+  type. There are 9 types used in bitstrings:
+
+  - `integer`
+  - `float`
+  - `bits` (alias for bitstring)
+  - `bitstring`
+  - `binary`
+  - `bytes` (alias for binary)
+  - `utf8`
+  - `utf16`
+  - `utf32`
+
+  When no type is specified, the default is `integer`:
 
       iex> <<1, 2, 3>>
       <<1, 2, 3>>
```
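
The default-type rule in the hunk above can be checked with a short module. This is a minimal sketch that assumes a recent Elixir release. `BitTypesDemo` and its functions are hypothetical names used only for illustration. Passing 256 through an untagged segment (default `integer`, 8 bits) truncates it to its low 8 bits. Floats and codepoints must be tagged explicitly.

```elixir
# Hypothetical demo module, not part of Kernel.SpecialForms.
defmodule BitTypesDemo do
  # Untagged segments default to `integer` with the default size of 8 bits,
  # so a value like 256 is truncated to its low 8 bits (0).
  def untagged(a, b, c), do: <<a, b, c>>

  # A float must be tagged; the default float size is 64 bits (8 bytes).
  def float64(x), do: <<x::float>>

  # An explicit 32-bit float (IEEE 754 binary32) takes 4 bytes.
  def float32(x), do: <<x::float-size(32)>>

  # utf8 encodes a single codepoint as 1 to 4 bytes.
  def encode_utf8(codepoint), do: <<codepoint::utf8>>
end

BitTypesDemo.untagged(1, 2, 256)        # => <<1, 2, 0>>
byte_size(BitTypesDemo.float64(3.14))   # => 8
byte_size(BitTypesDemo.float32(3.14))   # => 4
BitTypesDemo.encode_utf8(233)           # => "é"
```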
```diff
@@ -186,118 +198,169 @@ defmodule Kernel.SpecialForms do
       iex> <<0, "foo">>
       <<0, 102, 111, 111>>
 
-  Any other type needs to be explicitly tagged. For example,
-  in order to store a float type in the binary, one has to do:
-
-      iex> <<3.14 :: float>>
-      <<64, 9, 30, 184, 81, 235, 133, 31>>
-
-  This also means that variables need to be explicitly tagged,
-  otherwise Elixir defaults to integer:
+  Variables or any other type need to be explicitly tagged:
 
       iex> rest = "oo"
       iex> <<102, rest>>
       ** (ArgumentError) argument error
 
   We can solve this by explicitly tagging it as a binary:
 
+      iex> rest = "oo"
+      iex> <<102, rest :: binary>>
+      "foo"
+
+  The utf8, utf16, and utf32 types are for unicode codepoints. They
+  can also be applied to literal strings and char lists:
+
+      iex> <<"foo" :: utf16>>
+      <<0, 102, 0, 111, 0, 111>>
+      iex> <<"foo" :: utf32>>
+      <<0, 0, 0, 102, 0, 0, 0, 111, 0, 0, 0, 111>>
+
```
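
The tagging rule for variables can also be observed at runtime. In this sketch, which uses a hypothetical `TaggingDemo` module, an untagged binary variable makes construction raise `ArgumentError`. The exception is rescued so the result can be compared. The exact error message differs across Elixir versions, so only the exception type is checked.

```elixir
# Hypothetical demo module, not part of Kernel.SpecialForms.
defmodule TaggingDemo do
  # Tagged as binary: the bytes of `rest` are appended after 102 ("f").
  def tagged(rest), do: <<102, rest::binary>>

  # Untagged: `rest` is treated as an integer, which fails for a binary.
  def untagged(rest) do
    try do
      <<102, rest>>
    rescue
      ArgumentError -> :argument_error
    end
  end

  # utf16 encodes each codepoint of a literal string as big-endian 16-bit units.
  def utf16_foo, do: <<"foo"::utf16>>
end

TaggingDemo.tagged("oo")    # => "foo"
TaggingDemo.untagged("oo")  # => :argument_error
TaggingDemo.utf16_foo()     # => <<0, 102, 0, 111, 0, 111>>
```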
```diff
+  ## Options
+
+  Many options can be given by using `-` as separator. Order is
+  arbitrary, so the following are all equivalent:
+
+      <<102 :: integer-native, rest :: binary>>
+      <<102 :: native-integer, rest :: binary>>
+      <<102 :: unsigned-big-integer, rest :: binary>>
+      <<102 :: unsigned-big-integer-size(8), rest :: binary>>
+      <<102 :: unsigned-big-integer-8, rest :: binary>>
+      <<102 :: 8-integer-big-unsigned, rest :: binary>>
       <<102, rest :: binary>>
 
```
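
Because option order is arbitrary, two differently ordered specifications should produce identical bits. The following is a small check under the same assumptions, using the hypothetical `OptionsDemo` module. It uses a 16-bit size because endianness has no visible effect on a single byte.

```elixir
# Hypothetical demo module, not part of Kernel.SpecialForms.
defmodule OptionsDemo do
  # The same options written in two different orders.
  def a(x), do: <<x::unsigned-big-integer-size(16)>>
  def b(x), do: <<x::size(16)-integer-big-unsigned>>

  # Switching to little endian swaps the byte order of the 16-bit segment.
  def encode_little(x), do: <<x::little-integer-size(16)>>
end

OptionsDemo.a(102)             # => <<0, 102>>
OptionsDemo.b(102)             # => <<0, 102>>
OptionsDemo.encode_little(102) # => <<102, 0>>
```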
```diff
-  The type can be integer, float, bitstring/bits, binary/bytes,
-  utf8, utf16 or utf32, e.g.:
+  ### Unit and Size
 
-      <<102 :: float, rest :: binary>>
+  The length of the match is equal to the `unit` (a number of bits) times the
+  `size` (the number of repeated segments of length `unit`).
 
-  An integer can be any arbitrary precision integer. A float is an
-  IEEE 754 binary32 or binary64 floating point number. A bitstring
-  is an arbitrary series of bits. A binary is a special case of
-  bitstring that has a total size divisible by 8.
+  Type      | Default Unit
+  --------- | ------------
+  `integer` | 1 bit
+  `float`   | 1 bit
+  `binary`  | 8 bits
 
-  The utf8, utf16, and utf32 types are for unicode codepoints. They
-  can also be applied to literal strings and char lists:
+  Sizes for types are a bit more nuanced. The default size for integers is 8.
 
-      iex> <<"foo" :: utf16>>
-      <<0,102,0,111,0,111>>
+  The default size for floats is 64, and `size * unit` must result in 32 or 64,
+  corresponding to [IEEE 754](http://en.wikipedia.org/wiki/IEEE_floating_point)
+  binary32 and binary64, respectively.
 
-  The bits type is an alias for bitstring. The bytes type is an
-  alias for binary.
+  For binaries, the default is the size of the binary. Only the last binary in a
+  match can use the default size. All others must have their size specified
+  explicitly, even if the match is unambiguous. For example:
 
-  The signedness can also be given as signed or unsigned. The
-  signedness only matters for matching and relevant only for
-  integers. If unspecified, it defaults to unsigned. Example:
+      iex> <<name::binary-size(5), " the ", species::binary>> = <<"Frank the Walrus">>
+      "Frank the Walrus"
+      iex> {name, species}
+      {"Frank", "Walrus"}
 
-      iex> <<-100 :: signed, _rest :: binary>> = <<-100, "foo">>
-      <<156,102,111,111>>
+  Failing to specify the size for a binary segment that is not the last one causes compilation to fail:
 
-  This match would have failed if we did not specify that the
-  value -100 is signed. If we're matching into a variable instead
-  of a value, the signedness won't be checked; rather, the number
-  will simply be interpreted as having the given (or implied)
-  signedness, e.g.:
+      iex> <<name::binary, " the ", species::binary>> = <<"Frank the Walrus">>
+      ** (CompileError): a binary field without size is only allowed at the end of a binary pattern
 
-      iex> <<val, _rest :: binary>> = <<-100, "foo">>
-      iex> val
-      156
+  #### Shortcut Syntax
+
+  Size and unit can also be specified using a syntax shortcut
+  when passing integer values:
 
-  Here, `val` is interpreted as unsigned.
+      iex> x = 1
+      iex> << x :: 8 >> == << x :: size(8) >>
+      true
+      iex> << x :: 8 * 4 >> == << x :: size(8)-unit(4) >>
+      true
 
-  The endianness of a segment can be big, little or native (the
-  latter meaning it will be resolved at VM load time). Many options
-  can be given by using `-` as separator:
+  This syntax reflects the fact that the effective size is given by
+  multiplying the size by the unit.
 
```
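
The `size * unit` rule can be confirmed with `bit_size/1`. The sketch below uses the hypothetical `SizeUnitDemo` module. It covers an explicit size and unit, the `8*4` shortcut, and a sized binary segment in a function head, where the default binary unit is 8 bits.

```elixir
# Hypothetical demo module, not part of Kernel.SpecialForms.
defmodule SizeUnitDemo do
  # size(2) * unit(8) = 16 bits for an integer segment.
  def sized(x), do: <<x::size(2)-unit(8)>>

  # Shortcut syntax: 8*4 means size(8)-unit(4), i.e. 32 bits.
  def shortcut(x), do: <<x::8*4>>

  # Binary segments default to unit 8, so size(3) means 3 bytes (24 bits).
  def take3(<<head::binary-size(3), rest::binary>>), do: {head, rest}
end

bit_size(SizeUnitDemo.sized(1))    # => 16
bit_size(SizeUnitDemo.shortcut(1)) # => 32
SizeUnitDemo.take3("foobar")       # => {"foo", "bar"}
```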
```diff
-      <<102 :: integer-native, rest :: binary>>
+  ### Modifiers
 
-  Or:
+  Some types have associated modifiers to clear up ambiguity in byte
+  representation.
 
-      <<102 :: unsigned-big-integer, rest :: binary>>
+  Modifier             | Relevant Type(s)
+  -------------------- | ----------------
+  `signed`             | `integer`
+  `unsigned` (default) | `integer`
+  `little`             | `integer`, `float`, `utf16`, `utf32`
+  `big` (default)      | `integer`, `float`, `utf16`, `utf32`
+  `native`             | `integer`, `float`, `utf16`, `utf32`
 
-  And so on.
```
```diff
+  ### Sign
 
-  Endianness only makes sense for integers and some UTF code
-  point types (utf16 and utf32).
+  Integers can be `signed` or `unsigned`, defaulting to `unsigned`.
 
-  Finally, we can also specify size and unit for each segment. The
-  unit is multiplied by the size to give the effective size of
-  the segment in bits. The default unit for integers, floats,
-  and bitstrings is 1. For binaries, it is 8.
+      iex> <<int::integer>> = <<-100>>
+      <<156>>
+      iex> int
+      156
+      iex> <<int::integer-signed>> = <<-100>>
+      <<156>>
+      iex> int
+      -100
 
-  Since integers are default, the default unit is 1. The example below
-  matches because the string "foo" takes 24 bits and we match it
-  against a segment of 24 bits, 8 of which are taken by the integer
-  102 and the remaining 16 bits are specified on the rest.
+  `signed` and `unsigned` only matter when matching binaries (see below) and
+  only apply to integers.
 
-      iex> <<102, _rest :: size(16)>> = "foo"
-      "foo"
+      iex> <<-100 :: signed, _rest :: binary>> = <<-100, "foo">>
+      <<156, 102, 111, 111>>
 
```
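
The sign modifier is easiest to see by decoding the same byte twice. The sketch below uses the hypothetical `SignDemo` module. `signed` reads the bits as two's complement, so `<<200>>` decodes to 200 - 256 = -56.

```elixir
# Hypothetical demo module, not part of Kernel.SpecialForms.
defmodule SignDemo do
  # 8 bits read as unsigned: range 0..255.
  def as_unsigned(<<n::unsigned>>), do: n

  # The same 8 bits read as two's-complement signed: range -128..127.
  def as_signed(<<n::signed>>), do: n

  # A signed 16-bit big-endian integer.
  def as_signed16(<<n::signed-size(16)>>), do: n
end

SignDemo.as_unsigned(<<200>>)       # => 200
SignDemo.as_signed(<<200>>)         # => -56
SignDemo.as_signed16(<<255, 255>>)  # => -1
```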
```diff
-  We can also match by specifying size and unit explicitly:
+  ### Endianness
 
-      iex> <<102, _rest :: size(2)-unit(8)>> = "foo"
-      "foo"
+  Elixir has three options for endianness: `big`, `little`, and `native`.
+  The default is `big`. `native` is determined by the VM at startup; the last example below assumes a little-endian machine.
 
-  However, if we expect a size of 32, it won't match:
+      iex> <<number::little-integer-size(16)>> = <<0, 1>>
+      <<0, 1>>
+      iex> number
+      256
+      iex> <<number::big-integer-size(16)>> = <<0, 1>>
+      <<0, 1>>
+      iex> number
+      1
+      iex> <<number::native-integer-size(16)>> = <<0, 1>>
+      <<0, 1>>
+      iex> number
+      256
 
-      iex> <<102, _rest :: size(32)>> = "foo"
-      ** (MatchError) no match of right hand side value: "foo"
```
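
The endianness examples can be reproduced with the sketch below, which uses the hypothetical `EndianDemo` module. Because `native` depends on the host, the sketch does not hard-code 256. It derives the expected `native` result from `:erlang.system_info(:endian)`.

```elixir
# Hypothetical demo module, not part of Kernel.SpecialForms.
defmodule EndianDemo do
  def decode_big(<<n::big-integer-size(16)>>), do: n
  def decode_little(<<n::little-integer-size(16)>>), do: n
  def decode_native(<<n::native-integer-size(16)>>), do: n

  # The value `native` should produce on this host, based on the VM's endianness.
  def expected_native(bin) do
    case :erlang.system_info(:endian) do
      :big -> decode_big(bin)
      :little -> decode_little(bin)
    end
  end
end

EndianDemo.decode_big(<<1, 2>>)    # => 258
EndianDemo.decode_little(<<1, 2>>) # => 513
```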
```diff
+  ## Binary/Bitstring Matching
 
-  Size and unit are not applicable to utf8, utf16, and utf32.
+  Binary matching is a powerful feature in Elixir, useful both for extracting
+  information from binaries and for pattern matching in function heads.
 
-  The default size for integers is 8. For floats, it is 64. For
-  binaries, it is the size of the binary. Only the last binary
-  in a binary match can use the default size (all others must
-  have their size specified explicitly).
+  Binary matching can be used by itself to extract information from binaries:
 
-  Size and unit can also be specified using a syntax shortcut
-  when passing integer values:
+      iex> <<"Hello, ", place::binary>> = "Hello, World"
+      "Hello, World"
+      iex> place
+      "World"
 
-      << x :: 8 >> == << x :: size(8) >>
-      << x :: 8 * 4 >> == << x :: size(8)-unit(4) >>
-      << x :: _ * 4 >> == << x :: unit(4) >>
+  Or as a part of function definitions to pattern match:
 
-  This syntax reflects the fact the effective size is given by
-  multiplying the size by the unit.
+      defmodule ImageTyper do
+        @png_signature <<137::size(8), 80::size(8), 78::size(8), 71::size(8),
+                         13::size(8), 10::size(8), 26::size(8), 10::size(8)>>
+        @jpg_signature <<255::size(8), 216::size(8)>>
+
+        def type(<<@png_signature, _rest::binary>>), do: :png
+        def type(<<@jpg_signature, _rest::binary>>), do: :jpg
+        def type(_), do: :unknown
+      end
+
```
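
A binary pattern can also use a value matched earlier in the same pattern as the size of a later segment. The sketch below parses a hypothetical length-prefixed frame: one length byte followed by that many payload bytes. The frame layout and the `FrameDemo` name are assumptions for illustration, not part of the documented API.

```elixir
# Hypothetical demo module and frame format, not part of Kernel.SpecialForms.
defmodule FrameDemo do
  # Match one length byte, then exactly `len` bytes of payload, then the rest.
  def parse(<<len, payload::binary-size(len), rest::binary>>), do: {:ok, payload, rest}

  # Not enough bytes for a full frame (or empty input).
  def parse(other) when is_binary(other), do: {:incomplete, other}
end

FrameDemo.parse(<<3, "abcxyz">>) # => {:ok, "abc", "xyz"}
FrameDemo.parse(<<5, "ab">>)     # => {:incomplete, <<5, 97, 98>>}
```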
```diff
+  ### Performance & Optimizations
+
+  The Erlang compiler can provide a number of optimizations on binary creation
+  and matching. To see optimization output, set the `bin_opt_info` compiler
+  option:
+
+      ERL_COMPILER_OPTIONS=bin_opt_info mix compile
 
-  For floats, `size * unit` must result in 32 or 64, corresponding
-  to binary32 and binary64, respectively.
+  To learn more about specific optimizations and performance considerations,
+  check out
+  [Erlang's Efficiency Guide on handling binaries](http://www.erlang.org/doc/efficiency_guide/binaryhandling.html).
   """
   defmacro unquote(:<<>>)(args)
 
```