Strings can be delimited by double-quotes (`"`), single-quotes (`'`), or back-ticks (`` ` ``). The ending delimiter must always match the starting delimiter.

Strings have an intrinsic length which corresponds to how many code-points -- actually, code-units, more on that in a moment -- they contain.

```js
myName = "Kyle";

myName.length; // 4
```

This does not necessarily correspond to the number of visible characters present between the start and end delimiters (aka, the string literal). It can sometimes be a little confusing to keep straight the difference between a string literal and the underlying string value, so pay close attention.

#### JS Character Encodings

What type of character encoding does JS use for string characters?

One might assume UTF-8 (8-bit) or UTF-16 (16-bit). It's actually more complicated, because you also need to consider UCS-2 (2-byte Universal Character Set), which is similar to UTF-16, but not quite the same. [^UTFUCS]

The first group of 65,536 code points in Unicode (`U+0000` through `U+FFFF`) is called the BMP (Basic Multilingual Plane). All the rest of the code points are grouped into 16 so-called "supplemental planes" or "astral planes". When representing Unicode characters from the BMP, it's pretty straightforward, as they can *fit* neatly into single JS characters.

But when representing extended characters outside the BMP, JS actually represents each such character's code-point as a pair of two separate code units, called *surrogate halves*.

For example, the Unicode code point `127878` (hexadecimal `1F386`) is `🎆` (fireworks symbol). JS stores this in the string value as two surrogate-half code units: `U+D83C` and `U+DF86`.

This has implications for the length of strings, because a single visible character like the `🎆` fireworks symbol, when in a JS string, is counted as 2 characters for the purposes of the string length!
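Here's a quick sketch you can run to check this yourself. It's in REPL style, with each expression's result shown in a comment. The `toSurrogatePair(..)` helper is a hypothetical name used only for illustration. It applies the standard UTF-16 arithmetic: subtract `0x10000`, then split the remaining 20 bits into a high and a low 10-bit half. It should reproduce the `U+D83C` / `U+DF86` pair mentioned above.

```js
// The fireworks symbol from above, typed directly into a string literal
const fireworks = "🎆";

// .length counts UTF-16 code units, not visible characters
fireworks.length;                          // 2

// Each surrogate half can be inspected as its own code unit
fireworks.charCodeAt(0).toString(16);      // "d83c"
fireworks.charCodeAt(1).toString(16);      // "df86"

// codePointAt(0) recombines the pair into the full code point
fireworks.codePointAt(0);                  // 127878

// Hypothetical helper: split a non-BMP code point into surrogate halves
function toSurrogatePair(codePoint) {
    const offset = codePoint - 0x10000;        // leaves a 20-bit value
    const high = 0xD800 + (offset >> 10);      // top 10 bits
    const low = 0xDC00 + (offset & 0x3FF);     // bottom 10 bits
    return [ high, low ];
}

toSurrogatePair(0x1F386)
    .map(unit => unit.toString(16));       // [ "d83c", "df86" ]

// Spreading a string iterates by code point, so the pair counts once
[ ...fireworks ].length;                   // 1
```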

If `"` or `'` is used to delimit a string literal, the contents are only parsed for *character-escape sequences*: `\` followed by one or more characters that JS recognizes and parses with special meaning. Any other characters in a string that don't parse as escape-sequences (single-character or multi-character) are inserted as-is into the string value.

For single-character escape sequences, the following characters are recognized after a `\`: `b`, `f`, `n`, `r`, `t`, `v`, `0`, `'`, `"`, and `\`. For example, `\n` means new-line, `\t` means tab, etc.

If a `\` is followed by any other character (except `x` and `u`, explained below, and a few special cases such as digits or a line break), for example `\k`, that sequence is interpreted as the `\` being an unnecessary escape, which is thus dropped, leaving just the literal character itself (`k`).
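Here are a few quick checks of these rules. They're in REPL style, with each expression's result shown in a comment, and use only escapes described above:

```js
// Recognized single-character escapes each produce exactly one character
"\n".length;               // 1 (a new-line)
"a\tb".length;             // 3 ("a", a tab, "b")
"\0".charCodeAt(0);        // 0 (the null character)

// An unnecessary escape: the "\" is dropped, leaving just "k"
"\k" === "k";              // true
"\k".length;               // 1
```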

To include a `"` in the middle of a `"`-delimited string literal, use the `\"` escape sequence. Similarly, if you're including a `'` character in the middle of a `'`-delimited string literal, use the `\'` escape sequence. By contrast, a `'` does *not* need to be escaped inside a `"`-delimited string, nor vice versa.

```js
myTitle = "Kyle Simpson (aka, \"getify\"), former O'Reilly author";

console.log(myTitle);
// Kyle Simpson (aka, "getify"), former O'Reilly author
```

In text, the forward slash `/` is most common. But occasionally, you need a backslash `\`. To include a literal `\` backslash character without it acting as the start of a character-escape sequence, use the `\\` (double backslash) escape sequence.

So, then... what would `\\\` (three backslashes) in a string parse as? The first two `\`'s would be a `\\` escape sequence, thereby inserting just a single `\` character in the string value, and the remaining `\` would just escape whatever character comes immediately after it.
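For example, here's what happens with a double-quote right after three backslashes. This is a REPL-style sketch, and `slashQuote` is just an illustrative name:

```js
// "\\" inserts one backslash; the third "\" then escapes the quote after it
const slashQuote = "\\\"";

slashQuote.length;          // 2
slashQuote[0] === "\\";     // true (a single backslash character)
slashQuote[1] === "\"";     // true (a single double-quote character)
```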

One place backslashes commonly show up is in Windows file paths, which use the `\` separator instead of the `/` separator used in Linux/Unix-style paths:

```js
windowsFontsPath =
    "C:\\Windows\\Fonts\\";

console.log(windowsFontsPath);
// C:\Windows\Fonts\
```

| TIP: |
| :--- |
| What about four backslashes `\\\\` in a string literal? Well, that's just two `\\` escape sequences next to each other, so it results in two adjacent backslashes (`\\`) in the underlying string value. You might recognize there's an odd/even rule pattern at play. You should thus be able to decipher any odd (`\\\\\`, `\\\\\\\\\`, etc.) or even (`\\\\\\`, `\\\\\\\\\\`, etc.) number of backslashes in a string literal. |

#### Multi-Character Escapes

Multi-character escape sequences may be hexadecimal or Unicode sequences.

Hexadecimal escape sequences are used to encode any of the characters with codes 0-255 (ASCII, codes 0-127, plus the Latin-1 range, codes 128-255), and look like `\x` followed by exactly two hexadecimal characters (`0-9` and `a-f` / `A-F` -- case insensitive). For example, `A9` (or `a9`) is decimal value `169`, which corresponds to:

When any character-escape sequence (regardless of length) is recognized, the single character it represents is inserted into the string, rather than the original separate characters. So, in the string `"\u263A"`, there's only one (smiley) character, not six individual characters.
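Here's a small REPL-style sketch of that point, covering both the `\u` and `\x` forms. The variable name is illustrative:

```js
// A six-character escape sequence in the literal...
const smiley = "\u263A";

// ...inserts just one character into the string value
smiley.length;                     // 1
smiley.charCodeAt(0) === 0x263A;   // true

// Hexadecimal escapes behave the same way
"\xA9".length;                     // 1
"\xA9" === "\u00A9";               // true (both are the © symbol)
"\xA9" === "\xa9";                 // true (hex digits are case-insensitive)
```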

Unicode code-points can go well above `65535` (`FFFF` in hexadecimal), up to a maximum of `1114111` (`10FFFF` in hexadecimal). For example, `1F4A9` (or `1f4a9`) is decimal code-point `128169`, which corresponds to the funny `💩` (pile of poo) symbol.

But `\u1F4A9` wouldn't work to include this character in a string, since it would be parsed as the Unicode escape sequence `\u1F4A`, followed by a literal `9` character. To address this limitation, a variation of Unicode escape sequences was introduced to allow an arbitrary number of hexadecimal characters after the `\u`, by surrounding them with `{ .. }` curly braces:

```js
myReaction = "\u{1F4A9}";

console.log(myReaction);
// 💩
```
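The unbraced mistake is easy to miss, because both forms yield a `.length` of `2`, for quite different reasons. Here's a REPL-style sketch; the variable names are illustrative:

```js
// Without braces, \u consumes exactly four hex digits...
const notPoo = "\u1F4A9";
notPoo === "\u1F4A" + "9";            // true
notPoo.length;                        // 2 (two unrelated BMP characters)

// ...but with braces, the whole code point is used
const poo = "\u{1F4A9}";
poo.codePointAt(0) === 0x1F4A9;       // true
poo.length;                           // 2 (one code point, two surrogate halves)
```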

Recall the earlier discussion of extended (non-BMP) Unicode characters and *surrogate halves*? The same `💩` could also be defined with the two explicit code-units:

```js
myReaction = "\uD83D\uDCA9";

console.log(myReaction);
// 💩
```

All three representations of this same character are stored internally by JS identically, and are indistinguishable:

```js
"💩" === "\u{1F4A9}";            // true
"\u{1F4A9}" === "\uD83D\uDCA9";  // true
```

Even though JS doesn't care which way such a character is represented in your program, consider the readability differences carefully when authoring your code.

#### Line Continuation

| :--- |
| This feature is commonly called "template literals" or "template strings", but I think that's confusing. In programming contexts, "template" usually refers to a reusable definition that can be re-evaluated with different data: for example, *template engines* for pages, email templates for newsletter campaigns, etc. This JS feature is not reusable. It's a literal, and it produces a single, immediate value (usually a string). You can put such a literal in a function, and call the function multiple times. But then the function is acting as the template, not the literal itself. I prefer instead to refer to this feature as *interpolated literals*, or the funny, shortened *interpoliterals*, as I think this name is more accurately descriptive. |

Template literals usually result in a string value, but not always. A form of template literal that may look kind of strange is called a *tagged template literal*:

```js
price = formatCurrency`The cost is: ${totalCost}`;
```

Here, `formatCurrency` is a tag applied to the template literal, which actually invokes `formatCurrency(..)` as a function, passing it the string literals and the evaluated interpolated expressions parsed from the value. This function can then assemble those in any way it sees fit -- such as formatting a `number` value as currency in the current locale -- and return whatever value, string or otherwise, that it wants. So tagged template literals don't always produce strings, but untagged template literals always do.
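The chapter doesn't show what `formatCurrency(..)` looks like, so here's one hypothetical sketch. It assumes US dollars and the `en-US` locale rather than the current locale, so the output is predictable. It relies on the standard `Intl.NumberFormat` API. The second tag, `countParts(..)`, is also hypothetical, and shows a tag returning a non-string value:

```js
// Hypothetical: one possible formatCurrency(..), assuming USD and "en-US"
const currencyFormatter = new Intl.NumberFormat("en-US", {
    style: "currency",
    currency: "USD",
});

function formatCurrency(strings, ...values) {
    // strings: the literal string pieces around each ${ .. }
    // values: the evaluated ${ .. } expressions, in order
    let result = strings[0];
    for (let i = 0; i < values.length; i++) {
        const value = values[i];
        result += (
            typeof value == "number" ?
                currencyFormatter.format(value) :
                String(value)
        );
        result += strings[i + 1];
    }
    return result;
}

const totalCost = 1234.5;
const price = formatCurrency`The cost is: ${totalCost}`;
// "The cost is: $1,234.50"

// Another hypothetical tag, which returns an object instead of a string
function countParts(strings, ...values) {
    return { strings: strings.length, values: values.length };
}

countParts`a${1}b${2}c`;    // { strings: 3, values: 2 }
```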

Some JS developers believe that untagged template literal strings are preferable to use for *all* strings, even when not doing any expression interpolation. I disagree. I think they should only be used when interpolating, and classic `".."` or `'..'` delimited strings should be used for non-interpolated string definitions.

Moreover, there are a few places where `` `..` `` style strings are disallowed. For example, the `"use strict"` pragma cannot use back-ticks, or the pragma will be silently ignored (and thus the program accidentally runs in non-strict mode). Also, this style of string cannot be used in quoted property names of object literals, or in the ES Module `import .. from ..` module-specifier clause.
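You can observe the first two restrictions directly. This sketch uses `new Function(..)` only as a convenient way to compile small snippets at runtime. `quoted`, `backTicked`, and the `throwsSyntaxError(..)` helper are illustrative names, not part of any API. The `import` restriction can't be shown this way, since `import` declarations aren't allowed inside function bodies.

```js
// Functions created by new Function(..) are non-strict by default, unless
// their body begins with a "use strict" directive
const quoted = new Function('"use strict"; return this === undefined;');
const backTicked = new Function("`use strict`; return this === undefined;");

quoted();          // true  (strict mode: a plain call gets an undefined `this`)
backTicked();      // false (not a directive, so non-strict: `this` is the global object)

// Hypothetical helper: does this source text fail to parse?
function throwsSyntaxError(source) {
    try {
        new Function(source);
        return false;
    }
    catch (err) {
        return err instanceof SyntaxError;
    }
}

throwsSyntaxError("return { 'a': 1 };");    // false
throwsSyntaxError("return { `a`: 1 };");    // true
```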

My advice: use `` `..` `` delimited strings where allowed, but only when interpolation is needed, and keep using `".."` or `'..'` delimited strings for all other strings.