@@ -172,10 +172,22 @@ defmodule Kernel.SpecialForms do
172
172
iex> << 1, 2, 3 >>
173
173
<< 1, 2, 3 >>
174
174
175
- ## Bitstring types
175
+ ## Types
176
176
177
- A bitstring is made of many segments. Each segment has a
178
- type, which defaults to integer:
177
+ A bitstring is made of many segments and each segment has a
178
+ type. There are 9 types used in bitstrings:
179
+
180
+ - `integer`
181
+ - `float`
182
+ - `bits` (alias for bitstring)
183
+ - `bitstring`
184
+ - `binary`
185
+ - `bytes` (alias for binary)
186
+ - `utf8`
187
+ - `utf16`
188
+ - `utf32`
189
+
190
+ When no type is specified, the default is `integer`:
179
191
180
192
iex> <<1, 2, 3>>
181
193
<<1, 2, 3>>
@@ -186,118 +198,169 @@ defmodule Kernel.SpecialForms do
186
198
iex> <<0, "foo">>
187
199
<<0, 102, 111, 111>>
188
200
189
- Any other type needs to be explicitly tagged. For example,
190
- in order to store a float type in the binary, one has to do:
191
-
192
- iex> <<3.14 :: float>>
193
- <<64, 9, 30, 184, 81, 235, 133, 31>>
194
-
195
- This also means that variables need to be explicitly tagged,
196
- otherwise Elixir defaults to integer:
201
+ Variables or any other type need to be explicitly tagged:
197
202
198
203
iex> rest = "oo"
199
204
iex> <<102, rest>>
200
205
** (ArgumentError) argument error
201
206
202
207
We can solve this by explicitly tagging it as a binary:
203
208
209
+ iex> rest = "oo"
210
+ iex> <<102, rest :: binary>>
211
+ "foo"
212
+
213
+ The utf8, utf16, and utf32 types are for Unicode code points. They
214
+ can also be applied to literal strings and char lists:
215
+
216
+ iex> <<"foo" :: utf16>>
217
+ <<0, 102, 0, 111, 0, 111>>
218
+ iex> <<"foo" :: utf32>>
219
+ <<0, 0, 0, 102, 0, 0, 0, 111, 0, 0, 0, 111>>
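As a small illustration of the UTF types applied to a single code point rather than a literal string, the sketch below encodes U+00E9 in all three encodings and then decodes it again. The variable names are made up for this example.

```elixir
# U+00E9 is LATIN SMALL LETTER E WITH ACUTE; given as a number so the
# example does not depend on how this file itself is encoded.
e_acute = 0x00E9

utf8 = <<e_acute::utf8>>    # two bytes in UTF-8
utf16 = <<e_acute::utf16>>  # one 16-bit unit, big-endian by default
utf32 = <<e_acute::utf32>>  # one 32-bit unit, big-endian by default

IO.inspect({utf8, utf16, utf32}, binaries: :as_binaries)

# The same type works in a match: it consumes exactly one code point.
<<decoded::utf8, rest::binary>> = utf8 <> "!"
IO.inspect({decoded, rest})
```

Note that no size is given for these segments; the number of bytes consumed follows from the code point and the encoding.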
220
+
221
+ ## Options
222
+
223
+ Many options can be given by using `-` as separator. Order is
224
+ arbitrary, so the following are all equivalent:
225
+
226
+ <<102 :: integer-native, rest :: binary>>
227
+ <<102 :: native-integer, rest :: binary>>
228
+ <<102 :: unsigned-big-integer, rest :: binary>>
229
+ <<102 :: unsigned-big-integer-size(8), rest :: binary>>
230
+ <<102 :: unsigned-big-integer-8, rest :: binary>>
231
+ <<102 :: 8-integer-big-unsigned, rest :: binary>>
204
232
<<102, rest :: binary>>
205
233
206
- The type can be integer, float, bitstring/bits, binary/bytes,
207
- utf8, utf16 or utf32, e.g.:
234
+ ### Unit and Size
208
235
209
- <<102 :: float, rest :: binary>>
236
+ The length of the match is equal to the `unit` (a number of bits) times the
237
+ `size` (the number of repeated segments of length `unit`).
210
238
211
- An integer can be any arbitrary precision integer. A float is an
212
- IEEE 754 binary32 or binary64 floating point number. A bitstring
213
- is an arbitrary series of bits. A binary is a special case of
214
- bitstring that has a total size divisible by 8.
239
+ Type | Default Unit
240
+ --------- | ------------
241
+ `integer` | 1 bit
242
+ `float` | 1 bit
243
+ `binary` | 8 bits
215
244
216
- The utf8, utf16, and utf32 types are for unicode codepoints. They
217
- can also be applied to literal strings and char lists:
245
+ Sizes for types are a bit more nuanced. The default size for integers is 8.
218
246
219
- iex> <<"foo" :: utf16>>
220
- <<0,102,0,111,0,111>>
247
+ For floats, it is 64, and `size * unit` must result in 32 or 64,
248
+ corresponding to [IEEE 754](http://en.wikipedia.org/wiki/IEEE_floating_point)
249
+ binary32 and binary64, respectively.
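To make the two float widths concrete, here is a minimal sketch that stores the same literal as binary64 (the default) and as binary32, then reads both back. The variable names are illustrative only.

```elixir
double = <<3.14::float>>           # default size: 64 bits
single = <<3.14::float-size(32)>>  # explicit size: 32 bits

IO.inspect({byte_size(double), byte_size(single)})

# Reading back the 64-bit encoding recovers the original value.
<<x::float>> = double

# The 32-bit encoding loses precision, so y is only close to 3.14.
<<y::float-size(32)>> = single

IO.inspect({x, y})
```

The size given when matching has to agree with the size used when the value was written; otherwise the match fails or reads different bits.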
221
250
222
- The bits type is an alias for bitstring. The bytes type is an
223
- alias for binary.
251
+ For binaries, the default is the size of the binary. Only the last binary in a
252
+ match can use the default size. All others must have their size specified
253
+ explicitly, even if the match is unambiguous. For example:
224
254
225
- The signedness can also be given as signed or unsigned. The
226
- signedness only matters for matching and relevant only for
227
- integers. If unspecified, it defaults to unsigned. Example:
255
+ iex> <<name::binary-size(5), " the ", species::binary>> = <<"Frank the Walrus">>
256
+ "Frank the Walrus"
257
+ iex> {name, species}
258
+ {"Frank", "Walrus"}
228
259
229
- iex> <<-100 :: signed, _rest :: binary>> = <<-100, "foo">>
230
- <<156,102,111,111>>
260
+ Failing to specify the size for a binary that is not the last segment causes compilation to fail:
231
261
232
- This match would have failed if we did not specify that the
233
- value -100 is signed. If we're matching into a variable instead
234
- of a value, the signedness won't be checked; rather, the number
235
- will simply be interpreted as having the given (or implied)
236
- signedness, e.g.:
262
+ <<name::binary, " the ", species::binary>> = <<"Frank the Walrus">>
263
+ ** (CompileError): a binary field without size is only allowed at the end of a binary pattern
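When the length of a leading binary is not known in advance, one common approach is to read it from an earlier segment of the same match. The sketch below assumes a made-up packet layout (one length byte, then the name, then the rest); `packet`, `len`, `name` and `species` are names chosen for this example.

```elixir
# Hypothetical layout: <<length of name, name, everything else>>
packet = <<5, "Frank", "Walrus">>

# `len` is bound by the first segment and then used as the size of the
# second one, so only the final binary relies on the default size.
<<len::size(8), name::binary-size(len), species::binary>> = packet

IO.inspect({len, name, species})
```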
237
264
238
- iex> <<val, _rest :: binary>> = <<-100, "foo">>
239
- iex> val
240
- 156
265
+ #### Shortcut Syntax
266
+
267
+ Size and unit can also be specified using a syntax shortcut
268
+ when passing integer values:
241
269
242
- Here, `val` is interpreted as unsigned.
270
+ iex> x = 1
271
+ iex> << x :: 8 >> == << x :: size(8) >>
272
+ true
273
+ iex> << x :: 8 * 4 >> == << x :: size(8)-unit(4) >>
274
+ true
243
275
244
- The endianness of a segment can be big, little or native (the
245
- latter meaning it will be resolved at VM load time). Many options
246
- can be given by using `-` as separator:
276
+ This syntax reflects the fact that the effective size is given by
277
+ multiplying the size by the unit.
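A quick way to check the size-times-unit rule is to build the same integer with different size and unit pairs whose product is 32 and compare the results. This is only a sketch; the variable names are arbitrary.

```elixir
a = <<1::size(8)-unit(4)>>  # 8 * 4 = 32 bits
b = <<1::size(4)-unit(8)>>  # 4 * 8 = 32 bits
c = <<1::size(32)>>         # integers default to unit 1, so 32 bits

IO.inspect({bit_size(a), bit_size(b), bit_size(c)})
IO.inspect(a == b and b == c)
```

All three segments occupy 32 bits and hold the same big-endian encoding of 1.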
247
278
248
- <<102 :: integer-native, rest :: binary>>
279
+ ### Modifiers
249
280
250
- Or:
281
+ Some types have associated modifiers to clear up ambiguity in byte
282
+ representation.
251
283
252
- <<102 :: unsigned-big-integer, rest :: binary>>
284
+ Modifier | Relevant Type(s)
285
+ -------------------- | ----------------
286
+ `signed` | `integer`
287
+ `unsigned` (default) | `integer`
288
+ `little` | `integer`, `utf16`, `utf32`
289
+ `big` (default) | `integer`, `utf16`, `utf32`
290
+ `native` | `integer`, `utf16`, `utf32`
253
291
254
- And so on.
292
+ ### Sign
255
293
256
- Endianness only makes sense for integers and some UTF code
257
- point types (utf16 and utf32).
294
+ Integers can be `signed` or `unsigned`, defaulting to `unsigned`.
258
295
259
- Finally, we can also specify size and unit for each segment. The
260
- unit is multiplied by the size to give the effective size of
261
- the segment in bits. The default unit for integers, floats,
262
- and bitstrings is 1. For binaries, it is 8.
296
+ iex> <<int::integer>> = <<-100>>
297
+ <<156>>
298
+ iex> int
299
+ 156
300
+ iex> <<int::integer-signed>> = <<-100>>
301
+ <<156>>
302
+ iex> int
303
+ -100
263
304
264
- Since integers are default, the default unit is 1. The example below
265
- matches because the string "foo" takes 24 bits and we match it
266
- against a segment of 24 bits, 8 of which are taken by the integer
267
- 102 and the remaining 16 bits are specified on the rest.
305
+ `signed` and `unsigned` only matter when matching binaries (see below) and
306
+ apply only to integers.
268
307
269
- iex> <<102 , _rest :: size(16) >> = "foo"
270
- "foo"
308
+ iex> <<-100 :: signed , _rest :: binary >> = <<-100, "foo">>
309
+ <<156, 102, 111, 111>>
271
310
272
- We can also match by specifying size and unit explicitly:
311
+ ### Endianness
273
312
274
- iex> <<102, _rest :: size(2)-unit(8)>> = "foo"
275
- "foo"
313
+ Elixir has three options for endianness: `big`, `little`, and `native`.
314
+ The default is `big`. `native` is determined by the VM at startup, so its result depends on the host machine.
276
315
277
- However, if we expect a size of 32, it won't match:
316
+ iex> <<number::little-integer-size(16)>> = <<0, 1>>
317
+ <<0, 1>>
318
+ iex> number
319
+ 256
320
+ iex> <<number::big-integer-size(16)>> = <<0, 1>>
321
+ <<0, 1>>
322
+ iex> number
323
+ 1
324
+ iex> <<number::native-integer-size(16)>> = <<0, 1>>
325
+ <<0, 1>>
326
+ iex> number
327
+ 256
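Endianness also affects construction, not just matching. The sketch below writes the integer 1 into 16 bits with each option and uses `System.endianness/0` to work out which of the two fixed layouts `native` should agree with on the current machine. The variable names are illustrative.

```elixir
big = <<1::big-integer-size(16)>>        # most significant byte first
little = <<1::little-integer-size(16)>>  # least significant byte first
native = <<1::native-integer-size(16)>>  # whichever the host uses

# Pick the layout that `native` is expected to match on this machine.
expected_native =
  case System.endianness() do
    :big -> big
    :little -> little
  end

IO.inspect({big, little, native == expected_native})
```

On a little-endian host `native` behaves like `little`, which is why a `native` match of `<<0, 1>>` yields 256 there; on a big-endian host it would yield 1.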
278
328
279
- iex> <<102, _rest :: size(32)>> = "foo"
280
- ** (MatchError) no match of right hand side value: "foo"
329
+ ## Binary/Bitstring Matching
281
330
282
- Size and unit are not applicable to utf8, utf16, and utf32.
331
+ Binary matching is a powerful feature in Elixir that is useful for extracting
332
+ information from binaries as well as pattern matching.
283
333
284
- The default size for integers is 8. For floats, it is 64. For
285
- binaries, it is the size of the binary. Only the last binary
286
- in a binary match can use the default size (all others must
287
- have their size specified explicitly).
334
+ Binary matching can be used by itself to extract information from binaries:
288
335
289
- Size and unit can also be specified using a syntax shortcut
290
- when passing integer values:
336
+ iex> <<"Hello, ", place::binary>> = "Hello, World"
337
+ "Hello, World"
338
+ iex> place
339
+ "World"
291
340
292
- << x :: 8 >> == << x :: size(8) >>
293
- << x :: 8 * 4 >> == << x :: size(8)-unit(4) >>
294
- << x :: _ * 4 >> == << x :: unit(4) >>
341
+ Or as a part of function definitions to pattern match:
295
342
296
- This syntax reflects the fact the effective size is given by
297
- multiplying the size by the unit.
343
+ defmodule ImageTyper do
344
+ @png_signature <<137::size(8), 80::size(8), 78::size(8), 71::size(8),
345
+ 13::size(8), 10::size(8), 26::size(8), 10::size(8)>>
346
+ @jpg_signature <<255::size(8), 216::size(8)>>
347
+
348
+ def type(<<@png_signature, _rest::binary>>), do: :png
349
+ def type(<<@jpg_signature, _rest::binary>>), do: :jpg
350
+ def type(_), do: :unknown
351
+ end
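For readers who want to try the idea directly, here is a self-contained variant with a few calls. It is a sketch, not the module from the documentation: `ImageTyperSketch` is a hypothetical name, and the signature bytes are inlined in the function heads instead of being stored in module attributes.

```elixir
defmodule ImageTyperSketch do
  # PNG files start with these eight bytes; the rest is ignored.
  def type(<<137, 80, 78, 71, 13, 10, 26, 10, _rest::binary>>), do: :png

  # JPEG files start with 0xFF 0xD8.
  def type(<<255, 216, _rest::binary>>), do: :jpg

  # Anything else, including non-binaries, falls through to here.
  def type(_), do: :unknown
end

IO.inspect(ImageTyperSketch.type(<<137, 80, 78, 71, 13, 10, 26, 10, 0, 0>>))
IO.inspect(ImageTyperSketch.type(<<255, 216, 255, 224>>))
IO.inspect(ImageTyperSketch.type("plain text"))
```

Clauses are tried top to bottom, so the catch-all clause has to come last.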
352
+
353
+ ### Performance & Optimizations
354
+
355
+ The Erlang compiler can provide a number of optimizations on binary creation
356
+ and matching. To see optimization output, set the `bin_opt_info` compiler
357
+ option:
358
+
359
+ ERL_COMPILER_OPTIONS=bin_opt_info mix compile
298
360
299
- For floats, `size * unit` must result in 32 or 64, corresponding
300
- to binary32 and binary64, respectively.
361
+ To learn more about specific optimizations and performance considerations,
362
+ check out
363
+ [Erlang's Efficiency Guide on handling binaries](http://www.erlang.org/doc/efficiency_guide/binaryhandling.html).
301
364
"""
302
365
defmacro unquote(:<<>>)(args)
303
366