Addresses PYTHON-1008/CASSANDRA-14632.
The CREATE CUSTOM INDEX statement initially generated in
`IndexMetadata.as_cql_query` is always a `unicode` object under Python 2,
because its components are derived from CQL `text` fields in
`system_schema.indexes`, and we deserialize the bytes we get from
`text`-type values with `.decode('utf-8')`.
Under Python 2, the result of `cql_encode_all_types` is always a
`six.binary_type`, i.e. a `str`.
Notably, Python 2 `str` objects cannot be implicitly converted to `unicode`
objects if they contain non-ASCII bytes, such as the multi-byte sequences
UTF-8 uses to encode non-ASCII characters, because the implicit conversion
uses the default `ascii` codec. These decoding errors can occur when the
two types are concatenated:
```
>>> u"I'm a `unicode` object" + ", and I am a `str`."
u"I'm a `unicode` object, and I am a `str`."
>>> u"I'm a `unicode` object" + ", and I am a `str` with unicode code points \xf0\x9f\x98\xac"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xf0 in position 45: ordinal not in range(128)
```
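For comparison, the same mixing of text and bytes can be reproduced in a version-agnostic snippet. The strings are illustrative; `\xf0\x9f\x98\xac` is the UTF-8 encoding of U+1F62C (GRIMACING FACE). Python 2 fails with the implicit `ascii` decode shown above, Python 3 raises `TypeError` instead, and decoding the bytes explicitly as UTF-8 works on both.

```python
# UTF-8 bytes for U+1F62C (GRIMACING FACE) inside a byte string, standing in
# for encoder output.
utf8_bytes = b", and I am a byte string ending in \xf0\x9f\x98\xac"
text = u"I'm a text object"

# Concatenating without decoding: Python 2 attempts an implicit ASCII decode
# (UnicodeDecodeError); Python 3 refuses to mix str and bytes (TypeError).
try:
    text + utf8_bytes
    implicit_concat_failed = False
except (TypeError, UnicodeDecodeError):
    implicit_concat_failed = True

# Decoding explicitly with the codec the bytes were encoded with succeeds.
combined = text + utf8_bytes.decode("utf-8")
```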
So the error reported in CASSANDRA-14632 occurs when we concatenate a
`str` containing UTF-8 bytes, as produced by `cql_encode_all_types`, onto
`ret`, which is a `unicode` object.
Since `ret` is always a `six.text_type`, we check whether the result of
`cql_encode_all_types` is a `six.binary_type` and, if so, decode it as
UTF-8 before concatenating. This conversion should only ever be needed
under Python 2.
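The conversion described above can be sketched as follows, assuming the encoded bytes are UTF-8 (the same codec used for `text` values). `ensure_text` and `append_encoded` are hypothetical names for illustration, not functions in the driver, and the built-in `bytes` check stands in for `six.binary_type` so the sketch needs no `six` dependency.

```python
def ensure_text(value, encoding="utf-8"):
    """Return `value` as text, decoding it first if it is a byte string."""
    # On Python 2, `bytes` is an alias of `str`, so this matches the `str`
    # an encoder returns there; on Python 3 it matches `bytes`. This mirrors
    # an `isinstance(value, six.binary_type)` check.
    if isinstance(value, bytes):
        return value.decode(encoding)
    # Values that are already text are returned unchanged.
    return value


def append_encoded(ret, encoded):
    """Hypothetical stand-in for appending encoder output onto `ret`.

    `ret` is assumed to be text; `encoded` may be bytes or text.
    """
    return ret + ensure_text(encoded)


# A quoted CQL literal containing UTF-8 bytes, as a byte-returning encoder
# might produce it.
statement = append_encoded(u"prefix ", b"'\xf0\x9f\x98\xac'")
```

Checking the type, rather than decoding unconditionally, lets text values pass through untouched, so the same code path stays safe wherever the encoder already returns text.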