
Commit 23df12f

Variable identifiers can effectively include whitespace!
1 parent 1752173 commit 23df12f


1 file changed: 49 additions, 9 deletions


README.md

@@ -31,7 +31,8 @@ Unicode is Awesome. Prior to Unicode, international communication was grueling-
3131
- [Applied Unicode Encodings](#applied-unicode-encodings)
3232
- [Source Code](#source-code)
3333
- [Awesome Characters List](#awesome-characters-list)
34-
- [Characters](#characters)
34+
- [Special Characters](#special-characters)
35+
- [Variable identifiers can effectively include whitespace!](#user-content-variable-identifiers-can-effectively-include-whitespace)
3536
- [Modifiers](#modifiers)
3637
- [Quirks and Troubleshooting](#quirks-and-troubleshooting)
3738
- [List of Characters with One-To-Many Case Mappings](#list-of-characters-with-one-to-many-case-mappings)
@@ -224,8 +225,16 @@ There are also surrogate code points, private and unassigned codepoints, and con
224225

225226
# Awesome Characters List
226227

228+
229+
230+
231+
<center>
232+
[![](http://imgs.xkcd.com/comics/rtl.png )](https://xkcd.com/1137/)
233+
</center>
234+
227235
## Special Characters
228236

237+
The Unicode Consortium published a [general punctuation chart](http://www.unicode.org/charts/PDF/U2000.pdf) where you can find more details.
229238

230239

231240
| Char | Name | Description |
@@ -241,20 +250,51 @@ There are also surrogate code points, private and unassigned codepoints, and con
241250
| `';'` | U+037E GREEK QUESTION MARK | a look-alike to the semicolon. Also a fun way to annoy developers. |
242251
| `'‭'` | U+202D | change the text direction to Left-to-Right. |
243252
| `'‮'`‭ ‭ | U+202E | change the text direction to Right-to-Left: |
244-
| `` | U+A4F8 LISU LETTER TONE MYA TI |A lookalike for the period character. |
245-
| `` | U+A4F9 LISU LETTER TONE NA PO |A lookalike for the comma character.|
246-
| `` | U+A4FC LISU LETTER TONE MYA NA |A lookalike for the semi-colon character.|
247-
| `` | U+A4FD LISU LETTER TONE MYA JEU|A lookalike for the colon character.|
253+
| `'ꓸ'` | U+A4F8 LISU LETTER TONE MYA TI | A lookalike for the period character. |
254+
| `'ꓹ'` | U+A4F9 LISU LETTER TONE NA PO | A lookalike for the comma character. |
255+
| `'ꓼ'` | U+A4FC LISU LETTER TONE MYA NA | A lookalike for the semicolon character. |
256+
| `'ꓽ'` | U+A4FD LISU LETTER TONE MYA JEU | A lookalike for the colon character. |
248257
| `'︀︁︂︃︄︅︆︇︈︉︊︋︌︍︎️󠄀󠄁󠄂󠄃󠄄󠄅󠄆󠄇󠄈󠄉󠄊󠄋󠄌󠄍󠄎󠄏󠄐󠄑󠄒󠄓󠄔󠄕󠄖󠄗󠄘󠄙󠄚󠄛󠄜󠄝󠄞󠄟󠄠󠄡󠄢󠄣󠄤󠄥󠄦󠄧󠄨󠄩󠄪󠄫󠄬󠄭󠄮󠄯󠄰󠄱󠄲󠄳󠄴󠄵󠄶󠄷󠄸󠄹󠄺󠄻󠄼󠄽󠄾󠄿󠅀󠅁󠅂󠅃󠅄󠅅󠅆󠅇󠅈󠅉󠅊󠅋󠅌󠅍󠅎󠅏󠅐󠅑󠅒󠅓󠅔󠅕󠅖󠅗󠅘󠅙󠅚󠅛󠅜󠅝󠅞󠅟󠅠󠅡󠅢󠅣󠅤󠅥󠅦󠅧󠅨󠅩󠅪󠅫󠅬󠅭󠅮󠅯󠅰󠅱󠅲󠅳󠅴󠅵󠅶󠅷󠅸󠅹󠅺󠅻󠅼󠅽󠅾󠅿󠆀󠆁󠆂󠆃󠆄󠆅󠆆󠆇󠆈󠆉󠆊󠆋󠆌󠆍󠆎󠆏󠆐󠆑󠆒󠆓󠆔󠆕󠆖󠆗󠆘󠆙󠆚󠆛󠆜󠆝󠆞󠆟󠆠󠆡󠆢󠆣󠆤󠆥󠆦󠆧󠆨󠆩󠆪󠆫󠆬󠆭󠆮󠆯󠆰󠆱󠆲󠆳󠆴󠆵󠆶󠆷󠆸󠆹󠆺󠆻󠆼󠆽󠆾󠆿󠇀󠇁󠇂󠇃󠇄󠇅󠇆󠇇󠇈󠇉󠇊󠇋󠇌󠇍󠇎󠇏󠇐󠇑󠇒󠇓󠇔󠇕󠇖󠇗󠇘󠇙󠇚󠇛󠇜󠇝󠇞󠇟󠇠󠇡󠇢󠇣󠇤󠇥󠇦󠇧󠇨󠇩󠇪󠇫󠇬󠇭󠇮󠇯'` | **Variation Selectors** ( U+FE00 to U+FE0F & U+E0100 to U+E01EF ) | a block of 256 zero width characters that posess the ID_Continue proprerty- meaning they can be used in variable names (not the first letter). What makes these special is the fact that mouse cursors pass over them as they are combining characters - unlike most other zero width characters.|
258+
| `'ᅟ'` | **U+115F HANGUL CHOSEONG FILLER** | Usually renders as a space. Rendered as zero width (invisible) if not explicitly supported in rendering. Has the ID_Start property, so it can begin an identifier. |
259+
| `'ᅠ'` | **U+1160 HANGUL JUNGSEONG FILLER** | May render as a space. Rendered as zero width (invisible) if not explicitly supported in rendering. Has the ID_Start property, so it can begin an identifier. |
260+
| `'ㅤ'` | **U+3164 HANGUL FILLER** | Usually renders as a space. Rendered as zero width (invisible) if not explicitly supported in rendering. Has the ID_Start property, so it can begin an identifier. |
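The characters in this table are hard to spot by eye. As a minimal sketch (the helpers `classify` and `findSuspiciousChars` are hypothetical, and the look-alike list holds only the table's own entries, not Unicode's full confusables data), a JavaScript scanner can walk a string code point by code point and report the ones it recognizes. The two range-based groups, bidi controls and variation selectors, are matched with the built-in `\p{Bidi_Control}` and `\p{Variation_Selector}` regex property escapes.

```javascript
// Hypothetical helper: flags the characters from the table above in a string.
// The look-alike list covers only the table's entries, not Unicode's confusables data.
const LOOKALIKES = new Map([
  [0x037e, 'GREEK QUESTION MARK, looks like ;'],
  [0xa4f8, 'LISU LETTER TONE MYA TI, looks like .'],
  [0xa4f9, 'LISU LETTER TONE NA PO, looks like ,'],
  [0xa4fc, 'LISU LETTER TONE MYA NA, looks like ;'],
  [0xa4fd, 'LISU LETTER TONE MYA JEU, looks like :'],
]);
const HANGUL_FILLERS = new Set([0x115f, 0x1160, 0x3164]);

function formatCodePoint(cp) {
  return 'U+' + cp.toString(16).toUpperCase().padStart(4, '0');
}

// Returns a human-readable reason, or null if the code point is not in the table's groups.
function classify(cp) {
  if (LOOKALIKES.has(cp)) return 'look-alike: ' + LOOKALIKES.get(cp);
  if (HANGUL_FILLERS.has(cp)) return 'Hangul filler (ID_Start, may render invisibly)';
  const ch = String.fromCodePoint(cp);
  if (/\p{Bidi_Control}/u.test(ch)) return 'bidi control character';
  if (/\p{Variation_Selector}/u.test(ch)) return 'variation selector (zero width, ID_Continue)';
  return null;
}

function findSuspiciousChars(text) {
  const findings = [];
  let index = 0; // offset in UTF-16 code units, matching normal String indices
  for (const ch of text) { // for...of yields whole code points, so U+E0100 and up stay intact
    const cp = ch.codePointAt(0);
    const reason = classify(cp);
    if (reason !== null) findings.push({ index, codePoint: formatCodePoint(cp), reason });
    index += ch.length;
  }
  return findings;
}

console.log(findSuspiciousChars('let x = 1\u037E'));
```

Here the input ends in a GREEK QUESTION MARK instead of a semicolon, so the scan reports one finding at index 9 for U+037E.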
261+
<br><br>
262+
#### Wait a second... what did I just read?
249263

250264

265+
<br><br>
266+
## Variable identifiers can effectively include whitespace!
251267

252-
The Unicode Consortium published a [general punctuation chart](http://www.unicode.org/charts/PDF/U2000.pdf) where you can find more details.
268+
The **U+3164 HANGUL FILLER** character displays as an advancing whitespace character. If it is not explicitly [supported in rendering](http://unicode.org/faq/unsup_char.html), it is rendered as completely invisible (non-advancing, i.e. "zero width"). That means the ugly replacement character (�) should never be displayed. Because U+3164 also has the ID_Start property, it can be used in variable names, where it looks just like whitespace. Interestingly, U+3164 was added to Unicode in version 1.1 (1993).
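The claim that U+3164 looks like whitespace yet behaves like a letter can be checked from JavaScript itself. The sketch below (the names `describe` and `isValidIdentifier` are illustrative) asks the engine's regex property escapes about U+3164 and, for contrast, about the variation selector U+FE00 from the table, then uses `new Function` purely as a probe to see which names the parser accepts. Exact results depend on the Unicode data bundled with your engine; the comments describe what the properties listed in this README imply.

```javascript
const HANGUL_FILLER = '\u3164';
const VARIATION_SELECTOR_1 = '\uFE00';

// Ask the regex engine which Unicode properties a character has.
function describe(ch) {
  return {
    jsWhitespace: /\s/.test(ch),                 // what the tokenizer would skip
    idStart: /\p{ID_Start}/u.test(ch),           // may begin an identifier
    idContinue: /\p{ID_Continue}/u.test(ch),     // may follow the first character
    defaultIgnorable: /\p{Default_Ignorable_Code_Point}/u.test(ch), // often drawn invisibly
  };
}

// Returns true if `name` parses as a variable name in the current engine.
function isValidIdentifier(name) {
  try {
    new Function(`var ${name};`);
    return true;
  } catch (err) {
    if (err instanceof SyntaxError) return false;
    throw err;
  }
}

console.log(describe(HANGUL_FILLER));
console.log(describe(VARIATION_SELECTOR_1));
console.log(isValidIdentifier(HANGUL_FILLER));              // the filler alone is a usable name
console.log(isValidIdentifier(VARIATION_SELECTOR_1 + 'a')); // a selector cannot come first
console.log(isValidIdentifier('a' + VARIATION_SELECTOR_1)); // but may follow a letter
```

Note that `a` and `a` followed by U+FE00 look identical on screen yet are two different variables, which is the ID_Continue behavior the variation selector row describes.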
269+
270+
```javascript
271+
> var ㅤ= 'foo'; // the variable name is U+3164 HANGUL FILLER
272+
undefined
273+
> ㅤ
274+
'foo'
275+
276+
277+
> var ㅤ= alert; // U+3164 again, now bound to alert
278+
undefined
279+
> var foo = 'bar'
280+
undefined
281+
> if ( foo ===ㅤ`baz` ){} // ㅤ`baz` is a tagged template, so this calls alert
282+
undefined
283+
284+
285+
> var varㅤfooㅤ\u{A60C}ㅤπ = 'bar';
286+
undefined
287+
> varㅤfooㅤꘌㅤπ
288+
'bar'
289+
290+
```
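The third session names a variable with U+3164 fillers and U+A60C (the character that resembles `=`). One practical way to see what such a name really contains is to spell out its non-ASCII code points. In this sketch `revealCodePoints` is a hypothetical helper, and the `new Function` call only repeats the session's point that the `\u{A60C}` escape and the literal `ꘌ` name the same variable.

```javascript
// Hypothetical helper: replace every non-ASCII code point with a visible <U+XXXX> tag.
function revealCodePoints(text) {
  return Array.from(text, (ch) => {
    const cp = ch.codePointAt(0);
    return cp < 0x80 ? ch : `<U+${cp.toString(16).toUpperCase().padStart(4, '0')}>`;
  }).join('');
}

// The identifier from the REPL session: fillers (U+3164) around "foo", then U+A60C and π.
const name = 'var\u3164foo\u3164\uA60C\u3164\u03C0';
console.log(revealCodePoints(name)); // var<U+3164>foo<U+3164><U+A60C><U+3164><U+03C0>

// Declare it using the \u{A60C} escape in the source text, then read it back with the literal character.
const declaredWithEscape = 'var\u3164foo\u3164\\u{A60C}\u3164\u03C0';
const roundTrip = new Function(`var ${declaredWithEscape} = 'bar'; return ${name};`);
console.log(roundTrip()); // bar
```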
291+
<br>
292+
**NOTE:** I've tested U+3164 rendering on Ubuntu and OS X with the following: `node`, `php`, `ruby`, `python3.5`, `scala`, `vim`, `cat`, and `chrome` + `github gist`. Atom is the only tool that fails, (incorrectly) displaying empty boxes. I have yet to test it in Emacs and Sublime. From what I understand, the Unicode Consortium will not reassign or rename characters or code points, but may be convinced to change character properties like ID_Start/ID_Continue.
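Until such property changes happen, a project could check for this in its own tooling. A rough sketch, assuming a plain-text scan is acceptable (it does not skip strings or comments, and `findInvisibleIdentifiers` is a hypothetical name, not an existing lint rule): match identifier-shaped tokens with `\p{ID_Start}`/`\p{ID_Continue}` and report any that contain a `\p{Default_Ignorable_Code_Point}` character. That property covers the variation selectors and, in current Unicode data, should also cover U+3164.

```javascript
// Hypothetical lint check: flag identifier-shaped tokens that contain
// default-ignorable (typically invisible) code points.
// Assumption: a plain scan of the text; strings and comments are NOT skipped.
const IDENTIFIER = /[\p{ID_Start}$_][\p{ID_Continue}$\u200C\u200D]*/gu;
const DEFAULT_IGNORABLE = /\p{Default_Ignorable_Code_Point}/u;

function findInvisibleIdentifiers(source) {
  const hits = [];
  for (const match of source.matchAll(IDENTIFIER)) {
    if (DEFAULT_IGNORABLE.test(match[0])) {
      hits.push({ index: match.index, name: match[0] });
    }
  }
  return hits;
}

// Two different variables that render identically: `a` and `a` + U+FE00.
console.log(findInvisibleIdentifiers('var a\uFE00 = 1; var a = 2;'));
```

Only the first declaration is reported (at index 4), since plain `a` contains no default-ignorable code point.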
293+
294+
295+
<br>
253296

254297

255-
<center>
256-
[![](http://imgs.xkcd.com/comics/rtl.png )](https://xkcd.com/1137/)
257-
</center>
258298

259299
## Modifiers
260300
