Commit 489f930

Treat latin1 database charset as actually-UTF-8 (iykyk)
1 parent 9fef677 commit 489f930

5 files changed: +32 -164 lines changed

README.md

Lines changed: 7 additions & 79 deletions
@@ -1,84 +1,12 @@
-# Trilogy
+# trilogy force-latin1-to-utf8
 
-Trilogy is a client library for MySQL-compatible database servers, designed for performance, flexibility, and ease of embedding.
+Got a MySQL database with a latin-1 charset that actually stores utf-8 data?
 
-It's currently in production use on github.com.
+Oops!
 
-## Features
+This fork of the Trilogy MySQL client "fixes" the "glitch."
 
-* Supports the most frequently used parts of the text protocol
-    * Handshake
-    * Password authentication
-    * Query, ping, and quit commands
+Latin-1 database strings are mapped to UTF-8 Ruby strings instead of latin-1,
+and that's it.
 
-* Support prepared statements (binary protocol)
-
-* Low-level protocol API completely decoupled from IO
-
-* Non-blocking client API wrapping the protocol API
-
-* Blocking client API wrapping the non-blocking API
-
-* No dependencies outside of POSIX, the C standard library & OpenSSL
-
-* Minimal dynamic allocation
-
-* MIT licensed
-
-## Limitations
-
-* Only supports the parts of the text protocol that are in common use.
-
-* No support for `LOAD DATA INFILE` on local files
-
-* `trilogy_escape` assumes an ASCII-compatible connection encoding
-
-## Building
-
-`make` - that's it. This will build a static `libtrilogy.a`
-
-Trilogy should build out of the box on most UNIX systems which have OpenSSL installed.
-
-## API Documentation
-
-Documentation for Trilogy's various APIs can be found in these header files:
-
-* `blocking.h`
-
-    The blocking client API. These are simply a set of convenient wrapper functions around the non-blocking client API in `client.h`
-
-* `client.h`
-
-    The non-blocking client API. Every command is split into a `_send` and `_recv` function allowing callers to wait for IO readiness externally to Trilogy
-
-* `builder.h`
-
-    MySQL-compatible packet builder API
-
-* `charset.h`
-
-    Character set and encoding tables
-
-* `error.h`
-
-    Error table. Every Trilogy function returning an `int` uses the error codes defined here
-
-* `packet_parser.h`
-
-    Streaming packet frame parser
-
-* `protocol.h`
-
-    Low-level protocol API. Provides IO-decoupled functions to parse and build packets
-
-* `reader.h`
-
-    Bounds-checked packet reader API
-
-## Bindings
-
-We maintain a [Ruby binding](contrib/ruby) in this repository. This is currently stable and production-ready.
-
-## License
-
-Trilogy is released under the [MIT license](LICENSE).
+Otherwise, this fork is identical to the [mainline Trilogy library](https://github.com/trilogy-libraries/trilogy).
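
The new README says latin-1 database strings are "mapped to UTF-8 Ruby strings instead of latin-1." The sketch below illustrates what that means for a Ruby caller: the bytes do not change, only the encoding label on the string does. It is plain Ruby and does not use Trilogy; the `tag` helper and the variable names are hypothetical, chosen only for illustration.

```ruby
# Hypothetical helper: label raw bytes with an encoding without changing them,
# which is roughly what a client does when it tags a result string.
def tag(bytes, encoding)
  bytes.dup.force_encoding(encoding)
end

raw = "\xC3\xA9".b # the UTF-8 byte sequence for "é"

as_latin1 = tag(raw, Encoding::ISO_8859_1) # label used by the mainline mapping
as_utf8   = tag(raw, Encoding::UTF_8)      # label used by this fork's mapping

puts as_latin1.length                  # each byte counts as one character
puts as_utf8.length                    # both bytes form a single character
puts as_latin1.encode(Encoding::UTF_8) # transcoding the latin-1 label yields mojibake
puts as_utf8 == "\u00E9"               # the UTF-8 label compares equal to "é"
```

With the ISO-8859-1 label, any later transcoding to UTF-8 treats each byte as its own character, which is how the familiar "Ã©" garbling appears. With the UTF-8 label the string is usable as-is.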

contrib/ruby/README.md

Lines changed: 7 additions & 83 deletions
@@ -1,88 +1,12 @@
-# trilogy
+# trilogy force-latin1-to-utf8
 
-Ruby bindings to the Trilogy client library
+Got a MySQL database with a latin-1 charset that actually stores utf-8 data?
 
-## Installation
+Oops!
 
-Add this line to your application's Gemfile:
+This fork of the Trilogy MySQL client "fixes" the "glitch."
 
-``` ruby
-gem 'trilogy'
-```
+Latin-1 database strings are mapped to UTF-8 Ruby strings instead of latin-1,
+and that's it.
 
-And then execute:
-
-```
-$ bundle
-```
-
-Or install it yourself as:
-
-```
-$ gem install trilogy
-```
-
-## Usage
-
-``` ruby
-client = Trilogy.new(host: "127.0.0.1", port: 3306, username: "root", read_timeout: 2)
-if client.ping
-  client.change_db "mydb"
-
-  result = client.query("SELECT id, created_at FROM users LIMIT 10")
-  result.each_hash do |user|
-    p user
-  end
-end
-```
-
-### Processing multiple result sets
-
-In order to send and receive multiple result sets, pass the `multi_statement` option when connecting.
-`Trilogy#more_results_exist?` will return true if more results exist, false if no more results exist, or raise
-an error if the respective query errored. `Trilogy#next_result` will retrieve the next result set, or return nil
-if no more results exist.
-
-``` ruby
-client = Trilogy.new(host: "127.0.0.1", port: 3306, username: "root", read_timeout: 2, multi_statement: true)
-
-results = []
-results << client.query("SELECT name FROM users WHERE id = 1; SELECT name FROM users WHERE id = 2")
-results << client.next_result while client.more_results_exist?
-```
-
-## Building
-You should use the rake commands to build/install/release the gem
-For instance:
-```shell
-bundle exec rake build
-```
-
-## Contributing
-
-The official Ruby bindings are inside of the canonical trilogy repository itself.
-
-1. Fork it ( https://github.com/trilogy-libraries/trilogy/fork )
-2. Create your feature branch (`git checkout -b my-new-feature`)
-3. Commit your changes (`git commit -am 'Add some feature'`)
-4. Push to the branch (`git push origin my-new-feature`)
-5. Create a new Pull Request
-
-## mysql2 gem compatibility
-
-The trilogy API was heavily inspired by the mysql2 gem but has a few notable
-differences:
-
-* The `query_flags` don't inherit from the connection options hash.
-  This means that options like turning on/of casting will need to be set before
-  a query and not passed in at connect time.
-* For performance reasons there is no `application_timezone` query option. If
-  casting is enabled and your database timezone is different than what the
-  application is expecting you'll need to do the conversion yourself later.
-* While we still tag strings with the encoding configured on the field they came
-  from - for performance reasons no automatic transcoding into
-  `Encoding.default_internal` is done. Similarly to not automatically converting
-  Time objects from `database_timezone` into `application_timezone`, we leave
-  the transcoding step up to the caller.
-* There is no `as` query option. Calling `Trilogy::Result#each` will yield an array
-  of row values. If you want a hash you should use `Trilogy::Result#each_hash`.
+Otherwise, this fork is identical to the [mainline Trilogy library](https://github.com/trilogy-libraries/trilogy).

contrib/ruby/ext/trilogy-ruby/cast.c

Lines changed: 2 additions & 1 deletion
@@ -34,7 +34,8 @@ static const char *ruby_encoding_name_map[] = {
     [TRILOGY_ENCODING_KEYBCS2] = NULL,
     [TRILOGY_ENCODING_KOI8R] = "KOI8-R",
     [TRILOGY_ENCODING_KOI8U] = "KOI8-U",
-    [TRILOGY_ENCODING_LATIN1] = "ISO-8859-1",
+    // When life gives you latin1 containing UTF-8 bytes...
+    [TRILOGY_ENCODING_LATIN1] = "UTF-8",
     [TRILOGY_ENCODING_LATIN2] = "ISO-8859-2",
     [TRILOGY_ENCODING_LATIN5] = "ISO-8859-9",
     [TRILOGY_ENCODING_LATIN7] = "ISO-8859-13",
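
Both table changes in this commit (this C map and the hash in `contrib/ruby/lib/trilogy/encoding.rb`) swap a single entry in a charset-name lookup. As a rough sketch of how such a table can resolve to a Ruby encoding, the snippet below uses only the entries visible in the `encoding.rb` hunk. The constant `CHARSET_TO_RUBY` and the helper `ruby_encoding_for` are hypothetical names and are not taken from the Trilogy source.

```ruby
# Hypothetical excerpt of a charset-name table; entries copied from the
# encoding.rb hunk in this commit.
CHARSET_TO_RUBY = {
  "cp850"  => "CP850",
  "hp8"    => nil,          # mapped to nil in the diff
  "koi8r"  => "KOI8-R",
  "latin1" => "UTF-8",      # the fork's override; mainline maps this to "ISO-8859-1"
  "latin2" => "ISO-8859-2",
  "swe7"   => nil,
  "ascii"  => "US-ASCII",
}.freeze

# Hypothetical helper: resolve a MySQL charset name to a Ruby Encoding,
# or nil when the table has no usable entry.
def ruby_encoding_for(charset)
  name = CHARSET_TO_RUBY[charset]
  name && Encoding.find(name)
end

p ruby_encoding_for("latin1") # resolves to the UTF-8 encoding under the fork's table
p ruby_encoding_for("latin2") # unaffected by the commit
p ruby_encoding_for("hp8")    # nil: no mapping
```

Only the `latin1` row differs from mainline, which matches the README's claim that the fork is otherwise identical.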

contrib/ruby/lib/trilogy/encoding.rb

Lines changed: 2 additions & 1 deletion
@@ -8,7 +8,8 @@ module Encoding
       "cp850" => "CP850",
       "hp8" => nil,
       "koi8r" => "KOI8-R",
-      "latin1" => "ISO-8859-1",
+      # When it says latin1 on the tin, but it isn't, that's amore
+      "latin1" => "UTF-8",
       "latin2" => "ISO-8859-2",
       "swe7" => nil,
       "ascii" => "US-ASCII",

contrib/ruby/test/client_test.rb

Lines changed: 14 additions & 0 deletions
@@ -1108,6 +1108,20 @@ def test_character_encoding
     assert_equal expected, client.query("SELECT 'こんにちは'").to_a.first.first
   end
 
+  def test_latin1_results_are_utf8
+    client = new_tcp_client(encoding: "latin1")
+
+    assert_equal "latin1", client.query("SELECT @@character_set_client").first.first
+    assert_equal "latin1", client.query("SELECT @@character_set_results").first.first
+    assert_equal "latin1", client.query("SELECT @@character_set_connection").first.first
+    collation = client.query("SELECT @@collation_connection").first.first
+    assert_includes ["latin1_swedish_ci", "latin1_general_ci"], collation
+
+    result = client.query("SELECT CAST(0xC3A9 AS CHAR CHARACTER SET latin1)").first.first
+    assert_equal Encoding::UTF_8, result.encoding
+    assert_equal "\u00E9", result
+  end
+
   def test_character_encoding_handles_binary_queries
     client = new_tcp_client
     expected = "\xff".b
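
The new test selects the bytes `0xC3A9`, which are valid UTF-8, so the relabelled string is well-formed. The sketch below looks at the opposite case as an aside: a column that holds real ISO-8859-1 bytes. It is plain Ruby, does not touch Trilogy, and the helper `label_as_utf8` is a hypothetical stand-in for the fork's relabelling, not a function from the library.

```ruby
# Hypothetical stand-in for the fork's behaviour: label bytes as UTF-8
# without transcoding them.
def label_as_utf8(bytes)
  bytes.dup.force_encoding(Encoding::UTF_8)
end

utf8_bytes   = "\xC3\xA9".b # "é" as UTF-8, the case covered by the new test
latin1_bytes = "\xE9".b     # "é" as real ISO-8859-1

puts label_as_utf8(utf8_bytes).valid_encoding?   # well-formed UTF-8
puts label_as_utf8(latin1_bytes).valid_encoding? # a lone 0xE9 is not valid UTF-8

# String#scrub replaces invalid byte sequences with U+FFFD.
puts label_as_utf8(latin1_bytes).scrub == "\uFFFD"
```

Under these assumptions the fork suits databases whose latin1 columns consistently hold UTF-8 bytes. Rows containing true Latin-1 bytes above 0x7F would come back as strings that fail `valid_encoding?`.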
