Commit e2f1f7c

[Doc] Encourage use of encoding constants
Lots of documentation examples still use encoding APIs with encoding names rather than encoding constants. I think it would be preferable to direct users toward constants as it can help with auto-completion, static analysis and such.
1 parent d9d0848 commit e2f1f7c
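
To illustrate the motivation, here is a minimal sketch (not part of this commit) showing that an encoding name and its constant resolve to the same Encoding object, and how each spelling fails when mistyped. The helper `failure_class` and the misspelling `US-ASCI` are made up for the example.

```ruby
# Hypothetical helper: runs the block and returns the class of the
# exception it raised, or nil if nothing was raised.
def failure_class
  yield
  nil
rescue StandardError => e
  e.class
end

# A name and the matching constant refer to the very same object.
Encoding.find('US-ASCII').equal?(Encoding::US_ASCII) # => true

# A mistyped name is an ordinary string until the lookup runs.
failure_class { Encoding.find('US-ASCI') } # => ArgumentError

# A mistyped constant raises NameError, and tools that resolve constants
# without running the code have a chance to flag it earlier.
failure_class { Encoding::US_ASCI } # => NameError
```

Both mistakes surface at run time; the difference is that a constant reference is something editors and linters can check and complete, which a string literal is not.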

2 files changed: 51 additions and 51 deletions

doc/encodings.rdoc

Lines changed: 43 additions & 43 deletions
@@ -12,9 +12,9 @@ Some character sets contain only 1-byte characters;
 {US-ASCII}[https://en.wikipedia.org/wiki/ASCII], for example, has 256 1-byte characters.
 This string, encoded in US-ASCII, has six characters that are stored as six bytes:

-s = 'Hello!'.encode('US-ASCII') # => "Hello!"
-s.encoding # => #<Encoding:US-ASCII>
-s.bytes # => [72, 101, 108, 108, 111, 33]
+s = 'Hello!'.encode(Encoding::US_ASCII) # => "Hello!"
+s.encoding # => #<Encoding:US-ASCII>
+s.bytes # => [72, 101, 108, 108, 111, 33]

 Other encodings may involve multi-byte characters.
 {UTF-8}[https://en.wikipedia.org/wiki/UTF-8], for example,
@@ -106,17 +106,17 @@ Method Encoding.default_external returns the default external \Encoding:

 Method Encoding.default_external= sets that value:

-Encoding.default_external = 'US-ASCII' # => "US-ASCII"
-Encoding.default_external # => #<Encoding:US-ASCII>
+Encoding.default_external = Encoding::US_ASCII # => #<Encoding:US-ASCII>
+Encoding.default_external # => #<Encoding:US-ASCII>

 Method Encoding.default_internal returns the default internal \Encoding:

 Encoding.default_internal # => nil

 Method Encoding.default_internal= sets the default internal \Encoding:

-Encoding.default_internal = 'US-ASCII' # => "US-ASCII"
-Encoding.default_internal # => #<Encoding:US-ASCII>
+Encoding.default_internal = Encoding::US_ASCII # => #<Encoding:US-ASCII>
+Encoding.default_internal # => #<Encoding:US-ASCII>

 === Compatible Encodings

@@ -128,9 +128,9 @@ returns the \Encoding of the concatenated string, or +nil+ if incompatible:
 eng = 'text'
 Encoding.compatible?(rus, eng) # => #<Encoding:UTF-8>

-s0 = "\xa1\xa1".force_encoding('iso-8859-1') # => "\xA1\xA1"
-s1 = "\xa1\xa1".force_encoding('euc-jp') # => "\x{A1A1}"
-Encoding.compatible?(s0, s1) # => nil
+s0 = "\xa1\xa1".force_encoding(Encoding::ISO_8859_1) # => "\xA1\xA1"
+s1 = "\xa1\xa1".force_encoding(Encoding::EUCJP) # => "\x{A1A1}"
+Encoding.compatible?(s0, s1) # => nil

 == \String \Encoding

@@ -151,38 +151,38 @@ The default encoding for a string created with method String.new is:

 In either case, any encoding may be specified:

-s = String.new(encoding: 'UTF-8') # => ""
-s.encoding # => #<Encoding:UTF-8>
-s = String.new('foo', encoding: 'ASCII-8BIT') # => "foo"
-s.encoding # => #<Encoding:ASCII-8BIT>
+s = String.new(encoding: Encoding::UTF_8) # => ""
+s.encoding # => #<Encoding:UTF-8>
+s = String.new('foo', encoding: Encoding::BINARY) # => "foo"
+s.encoding # => #<Encoding:BINARY (ASCII-8BIT)>

 The encoding for a string may be changed:

-s = "R\xC3\xA9sum\xC3\xA9" # => "Résumé"
-s.encoding # => #<Encoding:UTF-8>
-s.force_encoding('ISO-8859-1') # => "R\xC3\xA9sum\xC3\xA9"
-s.encoding # => #<Encoding:ISO-8859-1>
+s = "R\xC3\xA9sum\xC3\xA9" # => "Résumé"
+s.encoding # => #<Encoding:UTF-8>
+s.force_encoding(Encoding::ISO_8859_1) # => "R\xC3\xA9sum\xC3\xA9"
+s.encoding # => #<Encoding:ISO-8859-1>

 Changing the assigned encoding does not alter the content of the string;
 it changes only the way the content is to be interpreted:

-s # => "R\xC3\xA9sum\xC3\xA9"
-s.force_encoding('UTF-8') # => "Résumé"
+s # => "R\xC3\xA9sum\xC3\xA9"
+s.force_encoding(Encoding::UTF_8) # => "Résumé"

 The actual content of a string may also be altered;
 see {Transcoding a String}[#label-Transcoding+a+String].

 Here are a couple of useful query methods:

-s = "abc".force_encoding("UTF-8") # => "abc"
-s.ascii_only? # => true
-s = "abc\u{6666}".force_encoding("UTF-8") # => "abc晦"
-s.ascii_only? # => false
+s = "abc".force_encoding(Encoding::UTF_8) # => "abc"
+s.ascii_only? # => true
+s = "abc\u{6666}".force_encoding(Encoding::UTF_8) # => "abc晦"
+s.ascii_only? # => false

-s = "\xc2\xa1".force_encoding("UTF-8") # => "¡"
-s.valid_encoding? # => true
-s = "\xc2".force_encoding("UTF-8") # => "\xC2"
-s.valid_encoding? # => false
+s = "\xc2\xa1".force_encoding(Encoding::UTF_8) # => "¡"
+s.valid_encoding? # => true
+s = "\xc2".force_encoding(Encoding::UTF_8) # => "\xC2"
+s.valid_encoding? # => false

 == \Symbol and \Regexp Encodings

@@ -337,8 +337,8 @@ then reads the file into a new string, encoding it as UTF-8:

 s = "R\u00E9sum\u00E9"
 path = 't.tmp'
-ext_enc = 'ISO-8859-1'
-int_enc = 'UTF-8'
+ext_enc = Encoding::ISO_8859_1
+int_enc = Encoding::UTF_8

 File.write(path, s, external_encoding: ext_enc)
 raw_text = File.binread(path)
@@ -373,8 +373,8 @@ These keyword-value pairs specify encoding options:
 Examples:

 s = "\x80foo\x80"
-s.encode('ISO-8859-3') # Raises Encoding::InvalidByteSequenceError.
-s.encode('ISO-8859-3', invalid: :replace) # => "?foo?"
+s.encode(Encoding::ISO_8859_3) # Raises Encoding::InvalidByteSequenceError.
+s.encode(Encoding::ISO_8859_3, invalid: :replace) # => "?foo?"

 - For an undefined character:

@@ -385,8 +385,8 @@ These keyword-value pairs specify encoding options:
 Examples:

 s = "\x80foo\x80"
-"\x80".encode('UTF-8', 'ASCII-8BIT') # Raises Encoding::UndefinedConversionError.
-s.encode('UTF-8', 'ASCII-8BIT', undef: :replace) # => "�foo�"
+"\x80".encode(Encoding::UTF_8, Encoding::BINARY) # Raises Encoding::UndefinedConversionError.
+s.encode(Encoding::UTF_8, Encoding::BINARY, undef: :replace) # => "�foo�"


 - Replacement string:
@@ -400,7 +400,7 @@ These keyword-value pairs specify encoding options:

 s = "\xA5foo\xA5"
 options = {:undef => :replace, :replace => 'xyzzy'}
-s.encode('UTF-8', 'ISO-8859-3', **options) # => "xyzzyfooxyzzy"
+s.encode(Encoding::UTF_8, Encoding::ISO_8859_3, **options) # => "xyzzyfooxyzzy"

 - Replacement fallback:

@@ -420,12 +420,12 @@ These keyword-value pairs specify encoding options:

 hash = {"\u3042" => 'xyzzy'}
 hash.default = 'XYZZY'
-s.encode('ASCII', fallback: hash) # => "xyzzyfooXYZZY"
+s.encode(Encoding::US_ASCII, fallback: hash) # => "xyzzyfooXYZZY"

 def (fallback = "U+%.4X").escape(x)
 self % x.unpack("U")
 end
-"\u{3042}".encode("US-ASCII", fallback: fallback.method(:escape)) # => "U+3042"
+"\u{3042}".encode(Encoding::US_ASCII, fallback: fallback.method(:escape)) # => "U+3042"

 proc = Proc.new {|x| x == "\u3042" ? 'xyzzy' : 'XYZZY' }
 s.encode('ASCII', fallback: proc) # => "XYZZYfooXYZZY"
@@ -458,8 +458,8 @@ These keyword-value pairs specify encoding options:
 Examples:

 s = 'foo"<&>"bar' + "\u3042"
-s.encode('ASCII', xml: :text) # => "foo\"&lt;&amp;&gt;\"bar&#x3042;"
-s.encode('ASCII', xml: :attr) # => "\"foo&quot;&lt;&amp;&gt;&quot;bar&#x3042;\""
+s.encode(Encoding::US_ASCII, xml: :text) # => "foo\"&lt;&amp;&gt;\"bar&#x3042;"
+s.encode(Encoding::US_ASCII, xml: :attr) # => "\"foo&quot;&lt;&amp;&gt;&quot;bar&#x3042;\""


 - Newlines:
@@ -476,7 +476,7 @@ These keyword-value pairs specify encoding options:

 Examples:

-s = "\n \r \r\n" # => "\n \r \r\n"
-s.encode('ASCII', cr_newline: true) # => "\r \r \r\r"
-s.encode('ASCII', crlf_newline: true) # => "\r\n \r \r\r\n"
-s.encode('ASCII', universal_newline: true) # => "\n \n \n"
+s = "\n \r \r\n" # => "\n \r \r\n"
+s.encode(Encoding::US_ASCII, cr_newline: true) # => "\r \r \r\r"
+s.encode(Encoding::US_ASCII, crlf_newline: true) # => "\r\n \r \r\r\n"
+s.encode(Encoding::US_ASCII, universal_newline: true) # => "\n \n \n"
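
A note on the `Encoding.default_external=` and `Encoding.default_internal=` hunks in this file: the `# =>` comments change from "US-ASCII" to #<Encoding:US-ASCII> because an assignment expression evaluates to its right-hand side, not because the setter behaves differently. The sketch below is only an illustration; the method name `assignment_results` is made up, and it restores the previous default before returning.

```ruby
# Hypothetical helper: records what each assignment expression evaluates to
# and what Encoding.default_external reports afterwards.
def assignment_results
  saved = Encoding.default_external

  from_name = (Encoding.default_external = 'US-ASCII')
  stored_after_name = Encoding.default_external

  from_const = (Encoding.default_external = Encoding::US_ASCII)
  stored_after_const = Encoding.default_external

  {
    from_name: from_name,                   # the String that was assigned
    stored_after_name: stored_after_name,   # an Encoding object
    from_const: from_const,                 # the Encoding that was assigned
    stored_after_const: stored_after_const  # the same Encoding object
  }
ensure
  # Put the process-wide default back the way it was.
  Encoding.default_external = saved
end

results = assignment_results
results[:from_name]          # => "US-ASCII"
results[:stored_after_name]  # => #<Encoding:US-ASCII>
```

Either spelling leaves the same Encoding object as the default; only the value of the assignment expression differs.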

string.c

Lines changed: 8 additions & 8 deletions
@@ -3902,13 +3902,13 @@ rb_str_append_as_bytes(int argc, VALUE *argv, VALUE str)
 * s = 'foo'
 * s.encoding # => <Encoding:UTF-8>
 * s << 0x00110000 # 1114112 out of char range (RangeError)
-* s = 'foo'.encode('EUC-JP')
+* s = 'foo'.encode(Encoding::EUC_JP)
 * s << 0x00800080 # invalid codepoint 0x800080 in EUC-JP (RangeError)
 *
 * If the encoding is US-ASCII and the codepoint is 0..0xff, _string_
 * is automatically promoted to ASCII-8BIT.
 *
-* s = 'foo'.encode('US-ASCII')
+* s = 'foo'.encode(Encoding::US_ASCII)
 * s << 0xff
 * s.encoding # => #<Encoding:BINARY (ASCII-8BIT)>
 *
@@ -4137,7 +4137,7 @@ rb_str_cmp(VALUE str1, VALUE str2)
 * s == 'FOO' # => false
 *
 * Returns +false+ if the two strings' encodings are not compatible:
-* "\u{e4 f6 fc}".encode("ISO-8859-1") == ("\u{c4 d6 dc}") # => false
+* "\u{e4 f6 fc}".encode(Encoding::ISO_8859_1) == ("\u{c4 d6 dc}") # => false
 *
 * If +object+ is not an instance of +String+ but responds to +to_str+, then the
 * two strings are compared using <code>object.==</code>.
@@ -4170,7 +4170,7 @@ rb_str_equal(VALUE str1, VALUE str2)
 *
 * Returns +false+ if the two strings' encodings are not compatible:
 *
-* "\u{e4 f6 fc}".encode("ISO-8859-1").eql?("\u{c4 d6 dc}") # => false
+* "\u{e4 f6 fc}".encode(Encoding::ISO_8859_1).eql?("\u{c4 d6 dc}") # => false
 *
 */

@@ -11412,9 +11412,9 @@ rb_str_b(VALUE str)
 *
 * Returns +true+ if +self+ is encoded correctly, +false+ otherwise:
 *
-* "\xc2\xa1".force_encoding("UTF-8").valid_encoding? # => true
-* "\xc2".force_encoding("UTF-8").valid_encoding? # => false
-* "\x80".force_encoding("UTF-8").valid_encoding? # => false
+* "\xc2\xa1".force_encoding(Encoding::UTF_8).valid_encoding? # => true
+* "\xc2".force_encoding(Encoding::UTF_8).valid_encoding? # => false
+* "\x80".force_encoding(Encoding::UTF_8).valid_encoding? # => false
 */

 static VALUE
@@ -11889,7 +11889,7 @@ rb_str_unicode_normalize_bang(int argc, VALUE *argv, VALUE str)
 *
 * Raises an exception if +self+ is not in a Unicode encoding:
 *
-* s = "\xE0".force_encoding('ISO-8859-1')
+* s = "\xE0".force_encoding(Encoding::ISO_8859_1)
 * s.unicode_normalized? # Raises Encoding::CompatibilityError.
 *
 * Related: String#unicode_normalize, String#unicode_normalize!.
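
The two files spell the same encoding differently (`Encoding::EUCJP` in doc/encodings.rdoc, `Encoding::EUC_JP` in string.c), and `Encoding::BINARY` replaces the old 'ASCII-8BIT' name. A small sketch for checking which constants point at a given encoding; the helper `constant_names_for` is made up for this illustration and is not part of the commit.

```ruby
# Hypothetical helper: lists every constant under Encoding that refers to
# the given Encoding object (identity comparison, not name matching).
def constant_names_for(encoding)
  Encoding.constants
          .select { |name| Encoding.const_get(name).equal?(encoding) }
          .sort
end

# Both spellings used in this commit appear in the result.
euc_jp_names = constant_names_for(Encoding.find('EUC-JP'))
euc_jp_names.include?(:EUCJP)   # => true
euc_jp_names.include?(:EUC_JP)  # => true

# BINARY and ASCII_8BIT are two names for one encoding.
binary_names = constant_names_for(Encoding::BINARY)
binary_names.include?(:ASCII_8BIT) # => true
```

Since the aliases are interchangeable, using one spelling consistently across the documentation is a style choice rather than a correctness fix.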
