Commit c2d2aef

types-grammar, ch1: clarifications to strings discussion
1 parent 4c2a730 commit c2d2aef

1 file changed

+37
-25
lines changed

types-grammar/ch1.md

Lines changed: 37 additions & 25 deletions
@@ -224,27 +224,27 @@ myName = "Kyle";
224224
225225
Strings can be delimited by double-quotes (`"`), single-quotes (`'`), or back-ticks (`` ` ``). The ending delimiter must always match the starting delimiter.
226226
227-
Strings have an intrinsic length which corresponds to how many code-points they contain.
227+
Strings have an intrinsic length which corresponds to how many code-points -- actually, code-units, more on that in a moment -- they contain.
228228
229229
```js
230230
myName = "Kyle";
231231
232232
myName.length; // 4
233233
```
234234
235-
This does not necessarily correspond to the number of visible characters you type between the start and end delimiters (aka, the string literal). It can sometimes be a little confusing to keep straight the difference between a string literal and the underlying string value, so pay close attention.
235+
This does not necessarily correspond to the number of visible characters present between the start and end delimiters (aka, the string literal). It can sometimes be a little confusing to keep straight the difference between a string literal and the underlying string value, so pay close attention.
236236
237237
#### JS Character Encodings
238238
239239
What type of character encoding does JS use for string characters?
240240
241241
One might assume UTF-8 (8-bit) or UTF-16 (16-bit). It's actually more complicated, because you also need to consider UCS-2 (2-byte Universal Character Set), which is similar to UTF-16, but not quite the same. [^UTFUCS]
242242
243-
The first 65,535 code points in Unicode is called the BMP (Basic Multilingual Plane). All the rest of the code points are grouped into 16 so called "supplemental planes" or "astral planes". When representing Unicode characters from the BMP, it's pretty straightforward.
243+
The first group of 65,536 code points in Unicode is called the BMP (Basic Multilingual Plane). All the rest of the code points are grouped into 16 so-called "supplementary planes" or "astral planes". When representing Unicode characters from the BMP, it's pretty straightforward, as they can *fit* neatly into single JS characters.
244244
245245
But when representing extended characters outside the BMP, JS actually represents these characters' code-points as a pairing of two separate code units, called *surrogate halves*.
246246
247-
For example, the Unicode code point `127878` (hexadecimal `1F386`) is `🎆` (fireworks symbol). JS stores this as two surrogate halve code units, `U+D83C` and `U+DF86`.
247+
For example, the Unicode code point `127878` (hexadecimal `1F386`) is `🎆` (fireworks symbol). JS stores this in the string value as two surrogate-half code units: `U+D83C` and `U+DF86`.
248248
249249
This has implications on the length of strings, because a single visible character like the `🎆` fireworks symbol, when in a JS string, is counted as 2 characters for the purposes of the string length!
250250
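As a rough illustration of the point above, here's a minimal sketch (assuming a modern JS engine such as Node.js; the variable names are mine, not from the chapter) that builds the fireworks symbol from its code point and compares the code-unit count with the code-point count:

```js
// Build the fireworks symbol from its decimal code point (127878).
const fireworks = String.fromCodePoint(127878);

// `length` counts 16-bit code units, so the surrogate pair counts as 2.
const codeUnits = fireworks.length;             // 2

// The string iterator steps by code point, so spreading yields 1 entry.
const codePoints = [...fireworks].length;       // 1

// The two surrogate halves, as uppercase hexadecimal.
const halves = [
    fireworks.charCodeAt(0).toString(16).toUpperCase(),   // "D83C"
    fireworks.charCodeAt(1).toString(16).toUpperCase()    // "DF86"
];

console.log(codeUnits, codePoints, halves);
```

Spreading the string is just one way to count code points; it's shown here only because it makes the difference from `length` easy to see.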
@@ -254,36 +254,40 @@ We'll revisit Unicode characters shortly.
254254
255255
If `"` or `'` are used to delimit a string literal, the contents are only parsed for *character-escape sequences*: `\` followed by one or more characters that JS recognizes and parses with special meaning. Any other characters in a string that don't parse as escape-sequences (single-character or multi-character), are inserted as-is into the string value.
256256

257-
For single-character escape sequences, the following characters are recognized after a `\`: `bfnrtv0'"\`. For example, `\n` (new-line), `\t` (tab), etc.
257+
For single-character escape sequences, the following characters are recognized after a `\`: `b`, `f`, `n`, `r`, `t`, `v`, `0`, `'`, `"`, and `\`. For example, `\n` means new-line, `\t` means tab, etc.
258258
259-
If a `\` is followed by any other character (except `x` and `u` -- explained below), like for example `\g`, such a sequence is parsed as just the literal character itself (`g`), dropping the preceding `\`.
259+
If a `\` is followed by any other character (except `x` and `u` -- explained below), like for example `\k`, that sequence is interpreted as the `\` being an unnecessary escape, which is thus dropped, leaving just the literal character itself (`k`).
260260
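To make that concrete, a small sketch (variable names are illustrative only) comparing a recognized escape with an unnecessary one:

```js
// `\n` is a recognized escape: one new-line character in the value.
const recognized = "\n";

// `\k` is not a recognized escape: the `\` is dropped, leaving just `k`.
const unnecessary = "\k";

console.log(recognized.length);      // 1
console.log(unnecessary.length);     // 1
console.log(unnecessary === "k");    // true
```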
261-
If you want to include a `"` in the middle of a `"`-delimited string literal, use the `\"` escape sequence. Similarly, if you're including a `'` character in the middle of a `'`-delimited string literal, use the `\'` escape sequence. By contrast, a `'` does *not* need to be escaped inside a `"`-delimited string, nor vice versa.
261+
To include a `"` in the middle of a `"`-delimited string literal, use the `\"` escape sequence. Similarly, if you're including a `'` character in the middle of a `'`-delimited string literal, use the `\'` escape sequence. By contrast, a `'` does *not* need to be escaped inside a `"`-delimited string, nor vice versa.
262262

263263
```js
264-
myName = "Kyle Simpson (aka, \"getify\")";
264+
myTitle = "Kyle Simpson (aka, \"getify\"), former O'Reilly author";
265265
266-
console.log(myName);
267-
// Kyle Simpson (aka, "getify")
266+
console.log(myTitle);
267+
// Kyle Simpson (aka, "getify"), former O'Reilly author
268268
```
269269

270-
To include a literal `\` backslash character in a string literal, use the `\\` (two backslashes) character-escape sequence. So, then... what would `\\\` (three backslashes) parse as? The first two `\`'s would be a `\\` escape sequence, thereby inserting just a single `\` character in the string value. The remaining third `\` would just escape whatever character comes immediately after it.
270+
In text, forward slash `/` is most common. But occasionally, you need a backward slash `\`. To include a literal `\` backslash character without it performing as the start of a character-escape sequence, use the `\\` (double backslashes).
271+
272+
So, then... what would `\\\` (three backslashes) in a string parse as? The first two `\`'s would be a `\\` escape sequence, thereby inserting just a single `\` character in the string value, and the remaining `\` would just escape whatever character comes immediately after it.
273+
274+
One place backslashes show up commonly is in Windows file paths, which use the `\` separator instead of the `/` separator used in Linux/Unix-style paths:
271275
272276
```js
273-
windowsDriveLocation =
274-
"C:\\\"Program Files\\Common Files\\\"";
277+
windowsFontsPath =
278+
"C:\\Windows\\Fonts\\";
275279

276-
console.log(windowsDriveLocation);
277-
// C:\"Program Files\Common Files\"
280+
console.log(windowsFontsPath);
281+
// C:\Windows\Fonts\
278282
```
279283
280284
| TIP: |
281285
| :--- |
282-
| What about four backslashes `\\\\` in a string literal? Well, that's just two `\\` escape sequences next to each other, so it results in two adjacent backslashes (`\\`) in the underlying string value. If you're paying attention, you'll see there's an odd/even pattern rule here. You should thus be able to deciper any odd (`\\\\\`, `\\\\\\\\\`, etc) or even (`\\\\\\`, `\\\\\\\\\\`, etc) number of backslashes in a string literal. |
286+
| What about four backslashes `\\\\` in a string literal? Well, that's just two `\\` escape sequences next to each other, so it results in two adjacent backslashes (`\\`) in the underlying string value. You might recognize there's an odd/even rule pattern at play. You should thus be able to decipher any odd (`\\\\\`, `\\\\\\\\\`, etc.) or even (`\\\\\\`, `\\\\\\\\\\`, etc.) number of backslashes in a string literal. |
283287
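The odd/even rule can be checked by inspecting string lengths. This is only a sketch, with variable names of my own choosing:

```js
// Even count: four backslashes in the literal are two `\\` escapes,
// so the value holds two backslash characters.
const four = "\\\\";

// Odd count: three backslashes followed by `n` parse as `\\` then `\n`,
// so the value holds one backslash followed by one new-line.
const threeThenN = "\\\n";

console.log(four.length);                      // 2
console.log(threeThenN.length);                // 2
console.log(threeThenN === "\\" + "\n");       // true
```

In other words, each pair of backslashes contributes one literal backslash, and a leftover odd backslash escapes whatever follows it.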
284288
#### Multi-Character Escapes
285289
286-
Multi-character escape sequences may be hexadecimal or unicode sequences.
290+
Multi-character escape sequences may be hexadecimal or Unicode sequences.
287291
288292
Hexadecimal escape sequences are used to encode any of the extended ASCII characters (codes 0-255), and look like `\x` followed by exactly two hexadecimal characters (`0-9` and `a-f` / `A-F` -- case insensitive). For example, `A9` or `a9` is decimal value `169`, which corresponds to:
289293
@@ -301,13 +305,13 @@ For any normal character that can be typed on a keyboard, such as `"a"`, it's us
301305
302306
##### Unicode
303307
304-
Unicode escape sequences encode any of the characters in the unicode set whose code-point values range from 0-65535, and look like `\u` followed by exactly four hexadecimal characters. For example, the escape-sequence `\u00A9` (or `\u00a9`) corresponds to that same `©` symbol, while `\u263A` (or `\u263a`) corresponds to the unicode character with code-point `9786`: `☺` (smiley face symbol).
308+
Unicode escape sequences encode any of the characters in the Unicode set whose code-point values range from 0-65535. They look like `\u` followed by exactly four hexadecimal characters. For example, the escape-sequence `\u00A9` (or `\u00a9`) corresponds to that same `©` symbol, while `\u263A` (or `\u263a`) corresponds to the Unicode character with code-point `9786`: `☺` (smiley face symbol).
305309
306310
When any character-escape sequence (regardless of length) is recognized, the single character it represents is inserted into the string, rather than the original separate characters. So, in the string `"\u263A"`, there's only one (smiley) character, not six individual characters.
307311
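A quick way to confirm that a recognized escape sequence collapses to a single character is to inspect the resulting string. This is a sketch; the variable names are just for illustration:

```js
// Six characters in the literal, one character in the value.
const smiley = "\u263A";

console.log(smiley.length);             // 1
console.log(smiley.codePointAt(0));     // 9786

// Hexadecimal and Unicode escapes for the same code produce the same value.
const sameCopyright = ("\xA9" === "\u00A9");
console.log(sameCopyright);             // true
```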
308-
Unicode code-points can go well above `65535` (`FFFF` in hexadecimal), up to a maximum of `1114111` (`10FFFF` in hexadecimal). For example, `1F4A9` is decimal code-point `128169`, which corresponds to the funny `💩` (pile of poo) character.
312+
Unicode code-points can go well above `65535` (`FFFF` in hexadecimal), up to a maximum of `1114111` (`10FFFF` in hexadecimal). For example, `1F4A9` (or `1f4a9`) is decimal code-point `128169`, which corresponds to the funny `💩` (pile of poo) symbol.
309313
310-
But `"\u1F4A9"` wouldn't work as expected, since it would be parsed as `\u1F4A` as a unicode escape sequence, followed by just the `9` literal character. To address this limitation, a variation of unicode escape sequences was introduced in ES6, to allow an arbitrary number of hexadecimal characters after the `\u`, by surrounding them with `{ .. }` curly braces:
314+
But `\u1F4A9` wouldn't work to include this character in a string, since it would be parsed as the Unicode escape sequence `\u1F4A`, followed by a literal `9` character. To address this limitation, a variation of Unicode escape sequences was introduced to allow an arbitrary number of hexadecimal characters after the `\u`, by surrounding them with `{ .. }` curly braces:
311315
312316
```js
313317
myReaction = "\u{1F4A9}";
@@ -316,7 +320,7 @@ console.log(myReaction);
316320
// 💩
317321
```
318322
319-
Recall the earlier discussion of extended (non-BMP) Unicode characters, *surrogate halves*? The same `💩` could also be defined with the explicit code-units:
323+
Recall the earlier discussion of extended (non-BMP) Unicode characters and *surrogate halves*? The same `💩` could also be defined with the two explicit code-units:
320324
321325
```js
322326
myReaction = "\uD83D\uDCA9";
@@ -325,14 +329,14 @@ console.log(myReaction);
325329
// 💩
326330
```
327331
328-
All three representations of this same character are stored internally by JS identically and are indistinguishable:
332+
All three representations of this same character are stored internally by JS identically, and are indistinguishable:
329333
330334
```js
331335
"💩" === "\u{1F4A9}"; // true
332336
"\u{1F4A9}" === "\uD83D\uDCA9"; // true
333337
```
334338
335-
Even though JS doesn't care which way such a character is represented, consider the readability differences carefully when authoring your code.
339+
Even though JS doesn't care which way such a character is represented in your program, consider the readability differences carefully when authoring your code.
336340
337341
#### Line Continuation
338342
@@ -380,11 +384,19 @@ Everything between the `{ .. }` in such a template literal is an arbitrary JS ex
380384
| :--- |
381385
| This feature is commonly called "template literals" or "template strings", but I think that's confusing. "Template" is usually referred to in programming contexts as a reusable definition that can be re-evaluated with different data. For example, *template engines* for pages, email templates for newsletter campaigns, etc. This JS feature is not re-usable. It's a literal, and it produces a single, immediate value (usually a string). You can put such a value in a function, and call the function multiple times. But then the function is acting as the template, not the literal itself. I prefer instead to refer to this feature as *interpolated literals*, or the funny, shortened *interpoliterals*, as I think this name is more accurately descriptive. |
382386
383-
Some JS developers believe that this style of string literal is preferable to use for *all* strings, even if you're not doing any expression interpolation. I disagree. I think it should only be used when interpolating, and classic `".."` or `'..'` delimited strings should be used for non-interpolated string definitions.
387+
Template literals usually result in a string value, but not always. A form of template literal that may look kind of strange is called a *tagged template literal*:
388+
389+
```js
390+
price = formatCurrency`The cost is: ${totalCost}`;
391+
```
392+
393+
Here, `formatCurrency` is a tag applied to the template literal value, which actually invokes `formatCurrency(..)` as a function, passing it the string literals and interpolated expressions parsed from the value. This function can then assemble those in any way it sees fit -- such as formatting a `number` value as currency in the current locale -- and return whatever value, string or otherwise, that it wants. So tagged template literals are not always strings. But untagged template literals will always be strings.
394+
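To give a feel for how such a tag might be written, here's a minimal sketch. The body of `formatCurrency(..)` is entirely my assumption (the text only names it), and it pins the locale to `en-US` and the currency to USD so the output is predictable, rather than using the current locale. The `parts(..)` tag is also hypothetical, included only to show a non-string result:

```js
// Hypothetical tag function: receives the literal string pieces,
// followed by the interpolated values as separate arguments.
function formatCurrency(strings, ...values) {
    // Assumption: fixed locale/currency, purely for a stable example.
    const formatter = new Intl.NumberFormat("en-US", {
        style: "currency",
        currency: "USD"
    });

    let result = strings[0];
    values.forEach(function each(value, idx) {
        // Format numbers as currency; stringify anything else as-is.
        const text = (typeof value === "number")
            ? formatter.format(value)
            : String(value);
        result += text + strings[idx + 1];
    });
    return result;
}

// Hypothetical tag that returns an object instead of a string.
function parts(strings, ...values) {
    return { strings: [...strings], values };
}

const totalCost = 12.5;
const price = formatCurrency`The cost is: ${totalCost}`;
const parsed = parts`The cost is: ${totalCost}`;

console.log(price);             // The cost is: $12.50
console.log(typeof parsed);     // object
```

The second tag makes the "not always strings" point: whatever the tag function returns is the value of the whole tagged expression.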
395+
Some JS developers believe that untagged template literal strings are preferable to use for *all* strings, even if not doing any expression interpolation. I disagree. I think they should only be used when interpolating, and classic `".."` or `'..'` delimited strings should be used for non-interpolated string definitions.
384396
385397
Moreover, there are a few places where `` `..` `` style strings are disallowed. For example, the `"use strict"` pragma cannot use back-ticks, or the pragma will be silently ignored (and thus the program accidentally runs in non-strict mode). Also, this style of strings cannot be used in quoted property names of object literals, or in the ES Module `import .. from ..` module-specifier clause.
386398
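The silently-ignored pragma can be observed with a small experiment. This sketch builds two functions with the `Function` constructor (which creates non-strict functions unless their own body opts in), so the result doesn't depend on whether the surrounding code is strict; the function names are mine:

```js
// Quoted pragma: recognized, so the function runs in strict mode,
// where `this` is `undefined` in a plain function call.
const quoted = new Function(
    '"use strict"; return this === undefined;'
);

// Back-tick "pragma": just an unused string expression, so the function
// stays in non-strict mode and `this` defaults to the global object.
const backticked = new Function(
    '`use strict`; return this === undefined;'
);

console.log(quoted());         // true
console.log(backticked());     // false
```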
387-
My advice: use `` `..` `` delimited strings where allowed, and only when interpolation is needed, but keep using `".."` or `'..'` delimited strings for all other strings.
399+
My advice: use `` `..` `` delimited strings where allowed, but only when interpolation is needed, and keep using `".."` or `'..'` delimited strings for all other strings.
388400
389401
### Number Values
390402
