Commit 9291b38

PYTHON-1008
Addresses PYTHON-1008/CASSANDRA-14632.

The CREATE CUSTOM INDEX statement initially generated in `IndexMetadata.as_cql_query` is always a `unicode` object under Python 2, because its components are derived from CQL `text` fields in `system_schema.indexes`, and we deserialize the bytes we get from `text`-type values with `.decode('utf-8')`. The result of `cql_encode_all_types` is always a `six.binary_type`, i.e. a `str` under Python 2.

Notably, Python 2 `str` objects cannot be implicitly coerced to `unicode` objects if they contain non-ASCII bytes, including those used to encode Unicode characters in UTF-8. These coercion errors can occur during attempts to concatenate these objects:

```
>>> u"I'm a `unicode` object" + ", and I am a `str`."
u"I'm a `unicode` object, and I am a `str`."
>>> u"I'm a `unicode` object" + ", and I am a `str` with unicode code points \xf0\x9f\x98\xac"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xf0 in position 45: ordinal not in range(128)
```

So, the error reported in CASSANDRA-14632 happens when we try to concatenate a `str` containing escaped UTF-8 bytes, as generated by `cql_encode_all_types`, onto `ret`, a `unicode`.

Since we know `ret` is always a `six.text_type`, we can check whether the result of `cql_encode_all_types` needs converting, and convert it if so. This should only ever be necessary under Python 2.
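As a rough illustration of the coercion described above, the following sketch emulates Python 2's implicit `unicode + str` behaviour by decoding the byte string as ASCII before concatenating. The helper `concat_like_python2` is hypothetical and not part of the driver. It exists only so the failure can be reproduced on interpreters where mixing text and bytes is rejected outright.

```python
def concat_like_python2(text, raw):
    """Emulate Python 2's implicit coercion: decode bytes as ASCII, then concatenate.

    Hypothetical helper for illustration only; not part of the driver.
    """
    return text + raw.decode("ascii")


ascii_only = b", and I am a `str`."
with_utf8 = b" \xf0\x9f\x98\xac"  # UTF-8 encoding of U+1F62C

# Pure-ASCII bytes decode cleanly, so concatenation succeeds.
ok = concat_like_python2(u"I'm a `unicode` object", ascii_only)

# Bytes above 0x7f cannot be decoded as ASCII, which is the failure mode
# reported in CASSANDRA-14632.
try:
    concat_like_python2(u"I'm a `unicode` object", with_utf8)
    failed = False
except UnicodeDecodeError:
    failed = True
```

Decoding explicitly with the codec the bytes were actually written in (UTF-8 here) avoids the error, which is the approach this commit takes.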
1 parent 0f2b8ea commit 9291b38

2 files changed: +6 -1 lines changed

CHANGELOG.rst

Lines changed: 1 addition & 0 deletions
@@ -4,6 +4,7 @@
 Bug Fixes
 ---------
 * Improve and fix socket error-catching code in nonblocking-socket reactors (PYTHON-1024)
+* Non-ASCII characters in schema break CQL string generation (PYTHON-1008)
 
 Other
 -----

cassandra/metadata.py

Lines changed: 5 additions & 1 deletion
@@ -1435,7 +1435,11 @@ def as_cql_query(self):
                 index_target,
                 class_name)
         if options:
-            ret += " WITH OPTIONS = %s" % Encoder().cql_encode_all_types(options)
+            opts_cql_encoded = Encoder().cql_encode_all_types(options)
+            # PYTHON-1008
+            if isinstance(opts_cql_encoded, six.binary_type):
+                opts_cql_encoded = opts_cql_encoded.decode('utf-8')
+            ret += " WITH OPTIONS = %s" % opts_cql_encoded
         return ret
 
     def export_as_string(self):
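
The guard added in this hunk can be sketched in isolation as follows. This is a minimal, self-contained approximation rather than the driver's code: `append_options` is a hypothetical helper, the index statement and options string are made-up sample values, and the built-in `bytes` stands in for `six.binary_type` so the snippet has no dependencies.

```python
def append_options(ret, opts_cql_encoded):
    """Append a WITH OPTIONS clause, decoding the encoded options if they are bytes.

    Hypothetical stand-in for the patched branch of `as_cql_query`.
    """
    # `bytes` plays the role of six.binary_type in this sketch.
    if isinstance(opts_cql_encoded, bytes):
        opts_cql_encoded = opts_cql_encoded.decode("utf-8")
    return ret + u" WITH OPTIONS = %s" % opts_cql_encoded


# Made-up sample statement and options, for illustration only.
base = u"CREATE CUSTOM INDEX idx ON ks.tbl (col) USING 'org.example.Index'"
opts_text = u"{'key': '\U0001f62c'}"
opts_bytes = opts_text.encode("utf-8")

# The result is the same text whether the encoder returned bytes or text.
from_bytes = append_options(base, opts_bytes)
from_text = append_options(base, opts_text)
```

Checking the type before decoding keeps the branch a no-op whenever the encoder already returns text, so only the byte-string case is affected.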
