@@ -30,6 +30,7 @@ information.
3030- [ C/C++ addons] ( #cc-addons )
3131- [ Directory locations] ( #directory-locations )
3232- [ System configuration] ( #system-configuration )
33+ - [ Character encoding] ( #character-encoding )
3334- [ Newlines] ( #newlines )
3435- [ File paths] ( #file-paths )
3536- [ Filenames] ( #filenames )
@@ -38,7 +39,6 @@ information.
3839- [ Environment variables] ( #environment-variables )
3940- [ Symlinks] ( #symlinks )
4041- [ File metadata] ( #file-metadata )
41- - [ Character encoding] ( #character-encoding )
4242- [ Permissions] ( #permissions )
4343- [ Users] ( #users )
4444- [ Time resolution] ( #time-resolution )
@@ -154,6 +154,79 @@ can be used to access it from Node.
154154This should only be done when accessing OS-specific settings. Otherwise storing
155155configuration as files or remotely is easier and more portable.
156156
157+ # Character encoding
158+
159+ The [ character encoding] ( https://en.wikipedia.org/wiki/Character_encoding ) on
160+ Unix is usually [ UTF-8] ( https://en.wikipedia.org/wiki/UTF-8 ) . However, on Windows
161+ it is usually either [ UTF-16] ( https://en.wikipedia.org/wiki/UTF-16 ) or one of
162+ the [ Windows code pages] ( https://en.wikipedia.org/wiki/Windows_code_page ) .
163+ A few non-[ Unicode] ( https://unicode.org/ ) character encodings are also popular in
164+ some countries. This can result in characters not being printed properly,
165+ especially high
166+ [ Unicode code points] ( https://en.wikipedia.org/wiki/Unicode#Code_point_planes_and_blocks ) and
167+ [ emoji] ( https://en.wikipedia.org/wiki/Emoji ) .
168+
169+ The character encoding can be specified using an ` encoding ` option with most
170+ [ relevant Node.js core methods] ( https://nodejs.org/api/fs.html#fs_fs_writefile_file_data_options_callback ) .
171+ [ UTF-8] ( https://en.wikipedia.org/wiki/UTF-8 ) is always the default value except
172+ for
173+ [ readable streams] ( https://nodejs.org/api/stream.html#stream_readable_streams )
174+ (including
175+ [ ` fs.createReadStream() ` ] ( https://nodejs.org/api/fs.html#fs_fs_createreadstream_path_options ) ),
176+ [ ` fs.readFile() ` ] ( https://nodejs.org/api/fs.html#fs_fs_readfile_path_options_callback ) and
177+ most [ ` crypto ` ] ( https://nodejs.org/api/crypto.html ) methods where ` buffer ` is
178+ the default instead.
179+
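As a minimal sketch of the defaults described above, the snippet below writes a string and reads it back with and without an `encoding` option. The synchronous `fs` variants and the temporary file name are assumptions chosen for brevity; they are not part of the original guide.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical temporary file, used only for this illustration.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'encoding-'));
const file = path.join(dir, 'example.txt');

// Strings are written as UTF-8 unless another `encoding` is given.
fs.writeFileSync(file, 'caf\u00e9');

// Without an `encoding` option, the raw bytes are returned as a Buffer.
const bytes = fs.readFileSync(file);

// With an `encoding` option, the bytes are decoded into a string.
const text = fs.readFileSync(file, { encoding: 'utf8' });

// Clean up the temporary directory.
fs.rmSync(dir, { recursive: true });
```

Here `bytes` holds five bytes because `é` takes two bytes in UTF-8, while `text` is the four-character string.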
180+ To convert between character encodings,
181+ [ ` string_decoder ` ] ( https://nodejs.org/api/string_decoder.html ) (decoding only),
182+ [ ` buffer.transcode() ` ] ( https://nodejs.org/api/buffer.html#buffer_buffer_transcode_source_fromenc_toenc ) ,
183+ [ ` TextDecoder ` ] ( https://nodejs.org/api/util.html#util_class_util_textdecoder )
184+ and
185+ [ ` TextEncoder ` ] ( https://nodejs.org/api/util.html#util_class_util_textencoder )
186+ can be used.
187+
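The conversion utilities listed above could be combined roughly as follows. This is a sketch that assumes a Node.js build with ICU support (which `transcode()` requires); the sample bytes and variable names are illustrative only.

```javascript
const { transcode } = require('buffer');
const { StringDecoder } = require('string_decoder');

// "café" encoded as Latin-1: 0xe9 is "é" in that encoding.
const latin1Bytes = Buffer.from([0x63, 0x61, 0x66, 0xe9]);

// transcode() re-encodes bytes from one character encoding to another.
const utf8Bytes = transcode(latin1Bytes, 'latin1', 'utf8');

// TextDecoder turns bytes into a string; TextEncoder does the reverse (UTF-8 only).
const decoded = new TextDecoder('utf-8').decode(utf8Bytes);
const encoded = new TextEncoder().encode(decoded);

// StringDecoder keeps incomplete multi-byte sequences until the next chunk arrives.
const decoder = new StringDecoder('utf8');
const firstChunk = decoder.write(Buffer.from([0xc3])); // first byte of "é"
const secondChunk = decoder.write(Buffer.from([0xa9])); // second byte of "é"
```

`StringDecoder` is mostly useful with streams, where a multi-byte character may be split across two chunks.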
188+ Node.js
189+ [ supports] ( https://nodejs.org/api/buffer.html#buffer_buffers_and_character_encodings )
190+ [ UTF-8] ( https://en.wikipedia.org/wiki/UTF-8 ) ,
191+ [ UTF-16 little endian] ( https://en.wikipedia.org/wiki/UTF-16 ) ,
192+ [ Latin-1] ( https://en.wikipedia.org/wiki/ISO/IEC_8859-1 ) and
193+ [ ASCII] ( https://en.wikipedia.org/wiki/ASCII ) , except for
194+ [ ` TextDecoder ` ] ( https://nodejs.org/api/util.html#util_class_util_textdecoder )
195+ and
196+ [ ` TextEncoder ` ] ( https://nodejs.org/api/util.html#util_class_util_textencoder )
197+ which support
198+ [ UTF-8] ( https://en.wikipedia.org/wiki/UTF-8 ) ,
199+ [ UTF-16 little endian] ( https://en.wikipedia.org/wiki/UTF-16 ) and
200+ [ UTF-16 big endian] ( https://en.wikipedia.org/wiki/UTF-16 ) by default. If
201+ Node.js is built with
202+ [ full internationalization support] ( https://nodejs.org/api/intl.html#intl_internationalization_support )
203+ or provided with it at runtime,
204+ [ many more character encodings] ( https://nodejs.org/api/util.html#util_encodings_requiring_full_icu_data )
205+ are supported by
206+ [ ` TextDecoder ` ] ( https://nodejs.org/api/util.html#util_class_util_textdecoder )
207+ and
208+ [ ` TextEncoder ` ] ( https://nodejs.org/api/util.html#util_class_util_textencoder ) .
209+ If doing so is inconvenient,
210+ [ iconv-lite] ( https://github.com/ashtuchkin/iconv-lite ) or
211+ [ iconv] ( https://github.com/bnoordhuis/node-iconv ) can be used instead.
212+
213+ It is recommended to always use [ UTF-8] ( https://en.wikipedia.org/wiki/UTF-8 ) .
214+ When reading from a file or terminal, one should either:
215+
216+ - detect the character encoding using
217+ [ ` node-chardet ` ] ( https://github.com/runk/node-chardet ) or
218+ [ ` jschardet ` ] ( https://github.com/aadsm/jschardet ) and convert to
219+ [ UTF-8] ( https://en.wikipedia.org/wiki/UTF-8 ) .
220+ - validate and/or document that the input should be in
221+ [ UTF-8] ( https://en.wikipedia.org/wiki/UTF-8 ) .
222+
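One possible way to validate that input is UTF-8, as suggested in the list above, is a `TextDecoder` created with the `fatal` option, which throws on malformed input. The helper name `isValidUtf8` is hypothetical, and the sketch assumes a Node.js build where `fatal` is supported (it depends on ICU in older versions).

```javascript
// Hypothetical helper: returns true when `bytes` is well-formed UTF-8.
function isValidUtf8(bytes) {
  try {
    // With `fatal: true`, decode() throws a TypeError on invalid sequences
    // instead of silently inserting U+FFFD replacement characters.
    new TextDecoder('utf-8', { fatal: true }).decode(bytes);
    return true;
  } catch (error) {
    return false;
  }
}

// "café" encoded as UTF-8 is valid.
const validInput = Buffer.from('caf\u00e9', 'utf8');
// The same text encoded as Latin-1 ends with a lone 0xe9 byte, which is not valid UTF-8.
const invalidInput = Buffer.from([0x63, 0x61, 0x66, 0xe9]);

const validResult = isValidUtf8(validInput);
const invalidResult = isValidUtf8(invalidInput);
```

Validation only confirms the bytes are well-formed UTF-8. It cannot prove the author intended UTF-8, which is why documenting the expected encoding remains useful.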
223+ When writing to a terminal the character encoding will almost always be
224+ [ UTF-8] ( https://en.wikipedia.org/wiki/UTF-8 ) on Unix. On Windows (` cmd.exe `) it is
225+ usually a locale-dependent code page such as [ CP866] ( https://en.wikipedia.org/wiki/Code_page_866 ) .
226+ [ figures] ( https://github.com/sindresorhus/figures ) and
227+ [ log-symbols] ( https://github.com/sindresorhus/log-symbols ) can be used to
228+ print common symbols consistently across platforms.
229+
157230# Newlines
158231
159232The character representation of a
@@ -454,79 +527,6 @@ only works on Linux.
454527For example the option ` recursive ` does not work on Linux.
455528[ ` chokidar ` ] ( https://github.com/paulmillr/chokidar ) can be used instead.
456529
457- # Character encoding
458-
459- The [ character encoding] ( https://en.wikipedia.org/wiki/Character_encoding ) on
460- Unix is usually [ UTF-8] ( https://en.wikipedia.org/wiki/UTF-8 ) . However on Windows
461- it is usually either [ UTF-16] ( https://en.wikipedia.org/wiki/UTF-16 ) or one of
462- the [ Windows code pages] ( https://en.wikipedia.org/wiki/Windows_code_page ) .
463- Few non-[ Unicode] ( https://unicode.org/ ) character encodings are also popular in
464- some countries. This can result in characters not being printed properly,
465- especially high
466- [ Unicode code points] ( https://en.wikipedia.org/wiki/Unicode#Code_point_planes_and_blocks ) and
467- [ emoji] ( https://en.wikipedia.org/wiki/Emoji ) .
468-
469- The character encoding can be specified using an ` encoding ` option with most
470- [ relevant Node.js core methods] ( https://nodejs.org/api/fs.html#fs_fs_writefile_file_data_options_callback ) .
471- [ UTF-8] ( https://en.wikipedia.org/wiki/UTF-8 ) is always the default value except
472- for
473- [ readable streams] ( https://nodejs.org/api/stream.html#stream_readable_streams )
474- (including
475- [ ` fs.createReadStream() ` ] ( https://nodejs.org/api/fs.html#fs_fs_createreadstream_path_options ) ),
476- [ ` fs.readFile() ` ] ( https://nodejs.org/api/fs.html#fs_fs_readfile_path_options_callback ) and
477- most [ ` crypto ` ] ( https://nodejs.org/api/crypto.html ) methods where ` buffer ` is
478- the default instead.
479-
480- To convert between character encodings
481- [ ` string_encoder ` ] ( https://nodejs.org/api/string_decoder.html ) (decoding only),
482- [ ` Buffer.transcode() ` ] ( https://nodejs.org/api/buffer.html#buffer_buffer_transcode_source_fromenc_toenc ) ,
483- [ ` TextDecoder ` ] ( https://nodejs.org/api/util.html#util_class_util_textdecoder )
484- and
485- [ ` TextEncoder ` ] ( https://nodejs.org/api/util.html#util_class_util_textencoder )
486- can be used.
487-
488- Node.js
489- [ supports] ( https://nodejs.org/api/buffer.html#buffer_buffers_and_character_encodings )
490- [ UTF-8] ( https://en.wikipedia.org/wiki/UTF-8 ) ,
491- [ UTF-16 little endian] ( https://en.wikipedia.org/wiki/UTF-16 ) ,
492- [ Latin-1] ( https://en.wikipedia.org/wiki/ISO/IEC_8859-1 ) and
493- [ ASCII] ( https://en.wikipedia.org/wiki/ASCII ) , except for
494- [ ` TextDecoder ` ] ( https://nodejs.org/api/util.html#util_class_util_textdecoder )
495- and
496- [ ` TextEncoder ` ] ( https://nodejs.org/api/util.html#util_class_util_textencoder )
497- which support
498- [ UTF-8] ( https://en.wikipedia.org/wiki/UTF-8 ) ,
499- [ UTF-16 little endian] ( https://en.wikipedia.org/wiki/UTF-16 ) and
500- [ UTF-16 big endian] ( https://en.wikipedia.org/wiki/UTF-16 ) by default. If
501- Node.js is built with
502- [ full internationalization support] ( https://nodejs.org/api/intl.html#intl_internationalization_support )
503- or provided with it at runtime,
504- [ many more character encodings] ( https://nodejs.org/api/util.html#util_encodings_requiring_full_icu_data )
505- are supported by
506- [ ` TextDecoder ` ] ( https://nodejs.org/api/util.html#util_class_util_textdecoder )
507- and
508- [ ` TextEncoder ` ] ( https://nodejs.org/api/util.html#util_class_util_textencoder ) .
509- If doing so is inconvenient,
510- [ iconv-lite] ( https://github.com/ashtuchkin/iconv-lite ) or
511- [ iconv] ( https://github.com/bnoordhuis/node-iconv ) can be used instead.
512-
513- It is recommended to always use [ UTF-8] ( https://en.wikipedia.org/wiki/UTF-8 ) .
514- When reading from a file or terminal, one should either:
515-
516- - detect the character encoding using
517- [ ` node-chardet ` ] ( https://github.com/runk/node-chardet ) or
518- [ ` jschardet ` ] ( https://github.com/aadsm/jschardet ) and convert to
519- [ UTF-8] ( https://en.wikipedia.org/wiki/UTF-8 ) .
520- - validate and/or document that the input should be in
521- [ UTF-8] ( https://en.wikipedia.org/wiki/UTF-8 ) .
522-
523- When writing to a terminal the character encoding will almost always be
524- [ UTF-8] ( https://en.wikipedia.org/wiki/UTF-8 ) on Unix and
525- [ CP866] ( https://en.wikipedia.org/wiki/Code_page_866 ) on Windows (` cmd.exe ` ).
526- [ figures] ( https://github.com/sindresorhus/figures ) and
527- [ log-symbols] ( https://github.com/sindresorhus/log-symbols ) can be used to
528- print common symbols consistently across platforms.
529-
530530# Permissions
531531
532532Unix uses [ POSIX permissions] ( https://linux.die.net/man/1/chmod ) but Windows is
@@ -796,6 +796,13 @@ retrying few milliseconds later.
796796 [ ` npm install -g windows-build-tools ` ] ( https://github.com/felixrieseberg/windows-build-tools )
797797 on Windows when installing [ C/C++ addons] ( https://nodejs.org/api/addons.html ) .
798798- use [ ` os ` ] ( https://nodejs.org/api/os.html ) Node.js core module when needed.
799+ - use [ ` UTF-8 ` ] ( https://en.wikipedia.org/wiki/UTF-8 ) . File/terminal input
800+ should either be validated or converted to it
801+ ([ ` node-chardet ` ] ( https://github.com/runk/node-chardet ) ).
802+ - avoid printing Unicode characters (including
803+ [ emoji] ( https://en.wikipedia.org/wiki/Emoji ) ) except through projects like
804+ [ figures] ( https://github.com/sindresorhus/figures ) and
805+ [ log-symbols] ( https://github.com/sindresorhus/log-symbols ) .
799806- use [ ` os.EOL ` ] ( https://nodejs.org/api/os.html#os_os_eol ) when reading from or
800807 writing to a file, ` \n ` otherwise.
801808- use
@@ -844,13 +851,6 @@ retrying few milliseconds later.
844851 [ ` getgroups() ` ] ( https://nodejs.org/api/process.html#process_process_getgroups ) ,
845852 [ ` setgroups() ` ] ( https://nodejs.org/api/process.html#process_process_setgroups_groups ) and
846853 [ ` initgroups() ` ] ( https://nodejs.org/api/process.html#process_process_initgroups_user_extragroup ) .
847- - use [ ` UTF-8 ` ] ( https://en.wikipedia.org/wiki/UTF-8 ) . File/terminal input
848- should either be validated or converted to it
849- ([ ` node-chardet ` ] ( https://github.com/runk/node-chardet ) ).
850- - avoid printing Unicode characters (including
851- [ emoji] ( https://en.wikipedia.org/wiki/Emoji ) ) except through projects like
852- [ figures] ( https://github.com/sindresorhus/figures ) and
853- [ log-symbols] ( https://github.com/sindresorhus/log-symbols ) .
854854- do not assume
855855 [ ` process.hrtime() ` ] ( https://nodejs.org/api/process.html#process_process_hrtime_time )
856856 is nanoseconds-precise.