Commit bac8f99

Add encoding.md
1 parent 761a7a9 commit bac8f99

1 file changed: +56 -0 lines changed
docs/encoding.md

Lines changed: 56 additions & 0 deletions
@@ -0,0 +1,56 @@
# RBS File Encoding

## Best Practice

**Use UTF-8** for both file encoding and your system locale.

## Supported Encodings

The RBS parser supports ASCII-compatible encodings, similar to Ruby's script encoding support.

**Examples**: UTF-8, US-ASCII, Shift JIS, EUC-JP, ...
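
Ruby reports ASCII compatibility through `Encoding#ascii_compatible?`, so one way to check whether an encoding falls into this category is the sketch below. It only queries Ruby's own encoding objects and does not call the RBS parser; the helper name `ascii_compatible_encoding?` is made up for illustration.

```ruby
# Hypothetical helper (not part of the RBS gem): asks Ruby whether the
# named encoding is ASCII-compatible.
def ascii_compatible_encoding?(name)
  Encoding.find(name).ascii_compatible?
end

%w[UTF-8 US-ASCII Shift_JIS EUC-JP UTF-16LE].each do |name|
  puts "#{name}: #{ascii_compatible_encoding?(name)}"
end
```

The first four names report `true`; UTF-16LE reports `false` because its code units are wider than one byte.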

## Unicode Codepoint Symbols

String literal types in RBS can contain Unicode codepoint escape sequences (`\uXXXX`).

When the file encoding is UTF-8, the parser translates these escape sequences into the corresponding Unicode characters:

```rbs
# In UTF-8 encoded files

type t = "\u0123" # Translated to the actual Unicode character ģ
type s = "\u3042" # Translated to the actual Unicode character あ
```

When the file encoding is not UTF-8, Unicode escape sequences are interpreted literally as the string `\uXXXX`:

```rbs
# In non-UTF-8 encoded files

type t = "\u0123" # Remains as the literal string "\u0123"
```
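
The difference between the two cases can be pictured with plain Ruby strings. This is only an analogy in Ruby and does not exercise the RBS parser: a double-quoted Ruby literal resolves `\u3042` to a single character, while a single-quoted literal keeps the six characters exactly as written.

```ruby
translated = "\u3042"  # double quotes: escape resolved to the character U+3042
literal    = '\u3042'  # single quotes: backslash, "u", and four digits kept as-is

puts translated.length    # 1
puts translated.bytesize  # 3 (UTF-8 encoding of U+3042)
puts translated.encoding  # UTF-8
puts literal.length       # 6
```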

## Implementation

The RBS gem currently does no file-encoding handling of its own. It relies on Ruby's encoding handling, specifically `Encoding.default_external` and `Encoding.default_internal`.

`Encoding.default_external` is the encoding Ruby assumes when it reads external resources such as files. The Ruby interpreter sets it based on the locale. `Encoding.default_internal` is the encoding Ruby converts external resources to. Its default is `nil` (no conversion).

When your locale is set to use `UTF-8` encoding, `default_external` is `Encoding::UTF_8`, so RBS file content read from disk is tagged as UTF-8.
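
As a rough illustration of the behavior described above, the following sketch writes a small temporary file and prints the encoding Ruby assigns to its content. It uses plain `File.read` rather than the RBS gem's own loader, and `encoding_of_file` is a hypothetical helper introduced only for this example.

```ruby
require "tempfile"

# Hypothetical helper (not part of the RBS gem): reads a file with a plain
# File.read and reports the encoding Ruby tagged the content with.
def encoding_of_file(path)
  File.read(path).encoding
end

Tempfile.create(["sample", ".rbs"]) do |file|
  file.write("type t = String\n")
  file.flush

  puts Encoding.default_external          # derived from the locale
  puts Encoding.default_internal.inspect  # nil unless configured otherwise
  puts encoding_of_file(file.path)        # matches default_external when default_internal is nil
end
```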

### Parsing non-UTF-8 RBS source text

If you want to work with another encoding, ensure the source string has an ASCII-compatible encoding.

```ruby
require "rbs"

source = '"日本語"'
RBS::Parser.parse_type(source.encode(Encoding::EUC_JP)) # => Parses successfully
RBS::Parser.parse_type(source.encode(Encoding::UTF_32)) # => Returns `nil` since UTF-32 is not ASCII-compatible
```
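
When the source arrives in an encoding that is not ASCII-compatible, one possible approach is to transcode it to UTF-8 first. The sketch below assumes that transcoding is acceptable for your input; `ascii_compatible_source` is a hypothetical helper, and the parser call itself is left out so the snippet runs without the gem.

```ruby
# Hypothetical helper (not part of the RBS gem): returns the source unchanged
# when its encoding is ASCII-compatible, otherwise transcodes it to UTF-8.
def ascii_compatible_source(source)
  return source if source.encoding.ascii_compatible?

  source.encode(Encoding::UTF_8)
end

utf16 = "\"\u65E5\u672C\u8A9E\"".encode(Encoding::UTF_16LE)
prepared = ascii_compatible_source(utf16)

puts utf16.encoding     # UTF-16LE
puts prepared.encoding  # UTF-8
```

The prepared string could then be handed to `RBS::Parser.parse_type` in the same way as the EUC-JP example.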

### Specifying file encoding

Currently, RBS doesn't support specifying file encoding directly.

You can set `Encoding.default_external` to control the encoding Ruby assumes while the gem loads RBS files from storage.
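
A minimal sketch of that approach follows, assuming it is acceptable to change the process-wide default for the duration of the load. `with_default_external` is a hypothetical helper, and a plain `File.read` stands in for the gem's file loading.

```ruby
require "tempfile"

# Hypothetical helper (not part of the RBS gem): switches
# Encoding.default_external for the duration of the block, then restores it.
def with_default_external(encoding)
  previous = Encoding.default_external
  Encoding.default_external = encoding
  yield
ensure
  Encoding.default_external = previous
end

Tempfile.create(["sample", ".rbs"]) do |file|
  # Write raw EUC-JP bytes so the file on disk is not UTF-8.
  File.binwrite(file.path, "type t = \"\u65E5\u672C\u8A9E\"\n".encode(Encoding::EUC_JP))

  content = with_default_external(Encoding::EUC_JP) { File.read(file.path) }
  puts content.encoding         # EUC-JP when default_internal is nil
  puts content.valid_encoding?  # true
end
```

Because `Encoding.default_external` is a global setting, this affects every file read in the process while the block runs.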
