@@ -9,7 +9,7 @@ In Chapter 1 of the "Objects & Classes" book of this series, we confronted the c
9
9
10
10
Here, we'll look at the core value types of JS, specifically the non-object types called *primitives*.
11
11
12
-
## Core Values
12
+
## Built-in Values
13
13
14
14
JS provides seven built-in, primitive (non-object) value types:
15
15
@@ -198,11 +198,35 @@ myName = "Kyle";
198
198
199
199
Strings can be delimited by double-quotes (`"`), single-quotes (`'`), or back-ticks (`` ` ``). The ending delimiter must always match the starting delimiter.
200
200
201
-
Strings have an intrinsic length which corresponds to how many code-points they contain. This does not necessarily correspond to the number of visible characters you type between the start and end delimiters (aka, the string literal). It can sometimes be a little confusing to keep straight the difference between a string literal and the underlying string value, so pay close attention.
201
+
Strings have an intrinsic length which corresponds to how many code-units (not necessarily how many code-points) they contain.
202
202
203
-
If `"` or `'` are used to delimit a string literal, the contents are only parsed for *character-escape sequences*: `\` followed by one or more characters that JS recognizes and parses with special meaning. Any other characters in a string that don't parse as escape-sequences (single-character or multi-character), are inserted as-is into the string value.
203
+
```js
204
+
myName = "Kyle";
205
+
206
+
myName.length; // 4
207
+
```
208
+
209
+
This does not necessarily correspond to the number of visible characters you type between the start and end delimiters (aka, the string literal). It can sometimes be a little confusing to keep straight the difference between a string literal and the underlying string value, so pay close attention.
210
+
211
+
#### JS Character Encodings
212
+
213
+
What type of character encoding does JS use for string characters?
204
214
205
-
#### Single-Character Escapes
215
+
One might assume UTF-8 (8-bit) or UTF-16 (16-bit). It's actually more complicated, because you also need to consider UCS-2 (2-byte Universal Character Set), which is similar to UTF-16, but not quite the same. [^UTFUCS]
216
+
217
+
The first 65,536 code points in Unicode (`0` through `65535`, or `FFFF` in hexadecimal) are called the BMP (Basic Multilingual Plane). All the rest of the code points are grouped into 16 so-called "supplementary planes" or "astral planes". When representing Unicode characters from the BMP, it's pretty straightforward: each one fits in a single 16-bit code unit.
218
+
219
+
But when representing extended characters outside the BMP, JS actually represents each character's code-point as a pairing of two separate code units, called *surrogate halves*.
220
+
221
+
For example, the Unicode code point `127878` (hexadecimal `1F386`) is `🎆` (fireworks symbol). JS stores this as two surrogate-half code units, `U+D83C` and `U+DF86`.
222
+
223
+
This has implications for the length of strings, because a single visible character like the `🎆` fireworks symbol, when in a JS string, is counted as 2 characters for the purposes of the string length!
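
As a minimal sketch of that length behavior, the snippet below inspects the same fireworks symbol with a few built-in string methods. The variable name `fireworks` is only illustrative.

```js
// a single visible character from outside the BMP
const fireworks = "🎆";

// length counts 16-bit code units, so the surrogate pair counts as 2
console.log(fireworks.length);                        // 2

// the two surrogate halves, as individual code units
console.log(fireworks.charCodeAt(0).toString(16));    // d83c
console.log(fireworks.charCodeAt(1).toString(16));    // df86

// the full code point, reassembled from the pair
console.log(fireworks.codePointAt(0));                // 127878

// string iteration (used by spread) walks code points, not code units
console.log([ ...fireworks ].length);                 // 1
```

If you need a count that is closer to "visible characters", iterating the string (as the spread does here) is often a better starting point than `length`, though it still won't handle every combined character.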
224
+
225
+
We'll revisit Unicode characters shortly.
226
+
227
+
#### Escape Sequences
228
+
229
+
If `"` or `'` are used to delimit a string literal, the contents are only parsed for *character-escape sequences*: `\` followed by one or more characters that JS recognizes and parses with special meaning. Any other characters in a string that don't parse as escape-sequences (single-character or multi-character), are inserted as-is into the string value.
206
230
207
231
For single-character escape sequences, the following characters are recognized after a `\`: `bfnrtv0'"\`. For example, `\n` (new-line), `\t` (tab), etc.
Hexadecimal escape sequences are used to encode any of the extended ASCII characters (codes 0-255), and look like `\x` followed by exactly two hexadecimal characters (`0-9` and `a-f` / `A-F` -- case insensitive). For example, `A9` or `a9` are decimal value `169`, which corresponds to:
For any normal character that can be typed on a keyboard, such as `"a"`, it's usually most readable to just specify the literal character, as opposed to a more obfuscated hexadecimal representation:
When any character-escape sequence (regardless of length) is recognized, the single character it represents is inserted into the string, rather than the original separate characters. So, in the string `"\u263A"`, there's only one (smiley) character, not six individual characters.
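
A short sketch of these escape rules in action follows. The variable names (`copyright`, `smiley`) are made up for illustration.

```js
// hexadecimal escape: \xA9 is code 169, the copyright symbol
const copyright = "\xA9";
console.log(copyright === "©");            // true
console.log(copyright.length);             // 1

// the hex digits are case insensitive
console.log("\xa9" === "\xA9");            // true

// an escaped "a" produces the same value as the literal character
console.log("\x61" === "a");               // true

// six characters typed in the literal, one character in the value
const smiley = "\u263A";
console.log(smiley.length);                // 1
console.log(smiley.charCodeAt(0));         // 9786 (hexadecimal 263A)
```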
243
281
282
+
Unicode code-points can go well above `65535` (`FFFF` in hexadecimal), up to a maximum of `1114111` (`10FFFF` in hexadecimal). For example, `1F4A9` is decimal code-point `128169`, which corresponds to the funny `💩` (pile of poo) character.
283
+
284
+
But `"\u1F4A9"` wouldn't work as expected, since it would be parsed as the Unicode escape sequence `\u1F4A`, followed by just the `9` literal character. To address this limitation, a variation of Unicode escape sequences was introduced in ES6, to allow an arbitrary number of hexadecimal characters after the `\u`, by surrounding them with `{ .. }` curly braces:
285
+
286
+
```js
287
+
myReaction = "\u{1F4A9}";
288
+
289
+
console.log(myReaction);
290
+
// 💩
291
+
```
292
+
293
+
Recall the earlier discussion of extended (non-BMP) Unicode characters and their *surrogate halves*? The same `💩` could also be defined with the explicit code-units:
294
+
295
+
```js
296
+
myReaction = "\uD83D\uDCA9";
297
+
298
+
console.log(myReaction);
299
+
// 💩
300
+
```
301
+
302
+
All three representations of this same character are stored internally by JS identically and are indistinguishable:
303
+
304
+
```js
305
+
"💩" === "\u{1F4A9}";                // true
306
+
"\u{1F4A9}" === "\uD83D\uDCA9";     // true
307
+
```
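
To see where the two code units come from, here is a rough sketch of the standard UTF-16 surrogate arithmetic. `toSurrogateHalves(..)` is a hypothetical helper written for this illustration, not a built-in, and it assumes its input is a code point above `FFFF`.

```js
// Split a non-BMP code point (above 0xFFFF) into its two surrogate halves.
// NOTE: hypothetical helper, not part of JS itself.
function toSurrogateHalves(codePoint) {
    const offset = codePoint - 0x10000;
    const high = 0xD800 + (offset >> 10);     // top 10 bits
    const low = 0xDC00 + (offset & 0x3FF);    // bottom 10 bits
    return [ high, low ];
}

const [ high, low ] = toSurrogateHalves(0x1F4A9);
console.log(high.toString(16), low.toString(16));   // d83d dca9

// the built-in utilities agree with the manual arithmetic
const viaCodeUnits = String.fromCharCode(high, low);
const viaCodePoint = String.fromCodePoint(0x1F4A9);
console.log(viaCodeUnits === viaCodePoint);         // true
console.log(viaCodeUnits === "\u{1F4A9}");          // true
```

In practice you would rarely compute the halves by hand; `String.fromCodePoint(..)` and the `\u{ .. }` escape handle it for you.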
308
+
309
+
Even though JS doesn't care which way such a character is represented, consider the readability differences carefully when authoring your code.
310
+
244
311
#### Line Continuation
245
312
246
313
The `\` followed by an actual new-line character (not just literal `n`) is a special case, and it creates what's called a line-continuation:
@@ -250,15 +317,48 @@ greeting = "Hello \
250
317
Friends!";
251
318
252
319
console.log(greeting);
253
-
// Hello
254
-
// Friends!
320
+
// Hello Friends!
255
321
```
256
322
257
-
As you can see, the new-line at the end of the `greeting =` line is immediately preceded by a `\`, which allows this string literal to continue onto the subsequent line. Without the escaping `\` before it, a new-line appearing in a `"` or `'` delimited string literal would actually produce a JS syntax parsing error.
323
+
As you can see, the new-line at the end of the `greeting =` line is immediately preceded by a `\`, which allows this string literal to continue onto the subsequent line. Without the escaping `\` before it, a new-line -- the actual new-line, not the `\n` character escape sequence -- appearing in a `"` or `'` delimited string literal would actually produce a JS syntax parsing error.
258
324
259
-
The new-line itself is still in the string value.
325
+
Because the end-of-line `\` turns the new-line character into a line continuation, the new-line character is omitted from the string, as shown by the `console.log(..)` output.
260
326
261
-
// TODO
327
+
| NOTE: |
328
+
| :--- |
329
+
| This line-continuation feature is often referred to as "multi-line strings", but I think that's a confusing label. As you can see, the string value itself doesn't have multiple lines; it was only defined across multiple lines via the line continuations. A multi-line string would actually have multiple lines in the underlying value. |
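
The difference can be sketched by comparing a line-continued literal against one that uses the `\n` escape sequence. The variable names here are illustrative only.

```js
// defined across two lines, but the value has only one line
const continued = "Hello \
Friends!";

// the \n escape sequence puts a real new-line into the value
const multiLine = "Hello\nFriends!";

console.log(continued);                        // Hello Friends!
console.log(continued.includes("\n"));         // false
console.log(continued.length);                 // 14
console.log(multiLine.split("\n").length);     // 2
```

Note that any indentation at the start of the continued line would become part of the string value, which is why `Friends!` starts at the beginning of its line.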
330
+
331
+
#### Template Literals
332
+
333
+
I mentioned earlier that strings can alternately be delimited with `` `..` `` back-ticks:
334
+
335
+
```js
336
+
myName = `Kyle`;
337
+
```
338
+
339
+
All the same rules for character encodings, character escape sequences, and lengths apply to these types of strings.
340
+
341
+
However, the contents of these template (string) literals are additionally parsed for a special delimiter sequence `${ .. }`, which marks an expression to evaluate and interpolate into the string value at that location:
342
+
343
+
```js
344
+
myName = `Kyle`;
345
+
346
+
greeting = `Hello, ${myName}!`;
347
+
348
+
console.log(greeting); // Hello, Kyle!
349
+
```
350
+
351
+
Everything between the `${` and `}` in such a template literal is an arbitrary JS expression. It can be a simple variable, or a complex expression built from function calls and operators, or anything in between.
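
A few illustrative cases, from simple to more involved, might look like the following. The `items` array and the other variable names are invented for this sketch.

```js
const myName = "Kyle";
const items = [ "apples", "pears", "plums" ];

// a simple variable reference
const simple = `Hello, ${myName}!`;
console.log(simple);      // Hello, Kyle!

// a method call and some arithmetic
const computed = `${myName.toUpperCase()} has ${items.length + 1} things`;
console.log(computed);    // KYLE has 4 things

// a conditional (ternary) expression
const conditional = `Cart: ${items.length > 2 ? "full" : "light"}`;
console.log(conditional); // Cart: full

// even another back-tick literal, nested inside the expression
const nested = `List: ${items.map(item => `<${item}>`).join(" ")}`;
console.log(nested);      // List: <apples> <pears> <plums>
```

Just because an expression is allowed doesn't mean it's readable; computing a value into a variable first, then interpolating the variable, is often clearer.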
352
+
353
+
| TIP: |
354
+
| :--- |
355
+
| This feature is commonly called "template literals" or "template strings", but I think that's confusing. "Template" usually refers, in programming contexts, to a reusable definition that can be re-evaluated with different data. For example, *template engines* for pages, email templates for newsletter campaigns, etc. This JS feature is not re-usable. It's a literal, and it produces a single, immediate value (usually a string). You can put such a value in a function, and call the function multiple times. But then the function is acting as the template, not the literal itself. I prefer instead to refer to this feature as *interpolated literals*, or the funny, shortened *interpoliterals*, as I think this name is more accurately descriptive. |
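
To make that distinction concrete, here is a small sketch; `greet(..)`, `who`, and `fixed` are hypothetical names used only for illustration.

```js
// the function is the reusable part; the literal inside it is
// evaluated anew on each call
function greet(name) {
    return `Hello, ${name}!`;
}

console.log(greet("Kyle"));     // Hello, Kyle!
console.log(greet("Sarah"));    // Hello, Sarah!

// by contrast, the literal on its own is evaluated once, immediately
let who = "Kyle";
const fixed = `Hello, ${who}!`;
who = "Sarah";
console.log(fixed);             // Hello, Kyle!
```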
356
+
357
+
Some JS developers believe that this style of string literal is preferable to use for *all* strings, even if you're not doing any expression interpolation. I disagree. I think it should only be used when interpolating, and classic `".."` or `'..'` delimited strings should be used for non-interpolated string definitions.
358
+
359
+
Moreover, there are a few places where `` `..` `` style strings are disallowed. For example, the `"use strict"` pragma cannot use back-ticks, or the pragma will be silently ignored (and thus the program accidentally runs in non-strict mode). Also, this style of strings cannot be used in quoted property names of object literals, or in the ES Module `import .. from ..` module-specifier clause.
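
The silently ignored pragma can be observed with a rough sketch like the one below. It relies on the fact that function bodies created with `new Function(..)` don't inherit strict mode from the surrounding code, so only each body's own first statement decides; that construction is just a convenient way to isolate the two cases here, not something the discussion above prescribes.

```js
// body starts with a genuine "use strict" directive
const withQuotes = new Function(
    '"use strict"; return this === undefined;'
);

// body starts with a back-tick string: just an ignored expression,
// so the function runs in non-strict mode
const withBackticks = new Function(
    "`use strict`; return this === undefined;"
);

// in strict mode, a plain function call leaves `this` undefined
console.log(withQuotes());       // true

// in non-strict mode, `this` falls back to the global object
console.log(withBackticks());    // false
```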
360
+
361
+
My advice: use `` `..` `` delimited strings where allowed, and only when interpolation is needed, but keep using `".."` or `'..'` delimited strings for all other strings.
262
362
263
363
### Number Values
264
364
@@ -560,4 +660,6 @@ Here, the `myAge` and `yourAge` variables each have their own copy of the number
560
660
561
661
// TODO
562
662
563
-
[^IEEE754]: IEEE-754; https://en.wikipedia.org/wiki/IEEE_754; Accessed July 2022
663
+
[^UTFUCS]: "JavaScript’s internal character encoding: UCS-2 or UTF-16?"; Mathias Bynens; January 20 2012; https://mathiasbynens.be/notes/javascript-encoding ; Accessed July 2022
664
+
665
+
[^IEEE754]: "IEEE-754"; https://en.wikipedia.org/wiki/IEEE_754 ; Accessed July 2022