PYTHON-1008

mambocab · mambocab · commit 9291b38444ee · 2018-10-09T16:15:28.000-04:00
Addresses PYTHON-1008/CASSANDRA-14632.

The CREATE CUSTOM INDEX statement initially generated in
`IndexMetadata.as_cql_query` is always a `unicode` object under Python 2,
because its components are derived from CQL `text` fields in
`system_schema.indexes`, and we deserialize the bytes we get from
`text`-type values with `.decode('utf-8')`.

The result of `cql_encode_all_types` is always a `six.binary_type`, i.e.
a `str` under Python 2.

Notably, Python 2 cannot implicitly coerce a `str` object to `unicode`
if it contains non-ASCII bytes, including those used to encode Unicode
characters in UTF-8, because the implicit conversion uses the ASCII
codec. These coercion errors can occur during attempts to concatenate
the two types:

```
&gt;&gt;&gt; u"I'm a `unicode` object" + ", and I am a `str`."
u"I'm a `unicode` object, and I am a `str`."
&gt;&gt;&gt; u"I'm a `unicode` object" + ", and I am a `str` with unicode code points \xf0\x9f\x98\xac"
Traceback (most recent call last):
  File "&lt;stdin&gt;", line 1, in &lt;module&gt;
UnicodeDecodeError: 'ascii' codec can't decode byte 0xf0 in position 45: ordinal not in range(128)
```
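
For contrast, here is a minimal sketch of the same concatenation with an
explicit decode. The variable names are illustrative only, and the byte
string reuses the UTF-8 sequence from the session above. Under Python 3
the implicit mix raises `TypeError` rather than `UnicodeDecodeError`, so
the sketch accepts either:

```python
# Illustrative names only; nothing here is taken from the driver.
prefix = u"I'm a text object"
suffix = b", and I am bytes holding UTF-8 data \xf0\x9f\x98\xac"

failure = None
try:
    # Implicit mixing: Python 2 tries an ASCII decode and fails on 0xf0;
    # Python 3 refuses to mix text and bytes at all.
    prefix + suffix
except (UnicodeDecodeError, TypeError) as exc:
    failure = type(exc).__name__

# Decoding explicitly with the codec that produced the bytes succeeds
# on both major versions.
joined = prefix + suffix.decode('utf-8')
```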

So, the error reported in CASSANDRA-14632 happens when we try to
concatenate a `str` containing escaped UTF-8 bytes, as generated by
`cql_encode_all_types`, onto `ret`, a `unicode`.

Since we know `ret` is always a `six.text_type`, we can check whether
the result of `cql_encode_all_types` is a `six.binary_type` and, if so,
decode it as UTF-8 before concatenating. The conversion should only ever
be needed under Python 2.
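
The check described above can be sketched in isolation roughly as
follows. This is a dependency-free approximation, not the driver code:
`text_type` and `binary_type` stand in for the `six` names, and
`ensure_text`, `append_options`, and the sample statement and options
are hypothetical.

```python
import sys

PY2 = sys.version_info[0] == 2
# Stand-ins for six.text_type / six.binary_type (assumption: six not needed here).
text_type = unicode if PY2 else str  # noqa: F821 (only evaluated on Python 2)
binary_type = str if PY2 else bytes


def ensure_text(value, encoding='utf-8'):
    """Hypothetical helper: decode bytes to text, pass text through unchanged."""
    if isinstance(value, binary_type):
        return value.decode(encoding)
    return value


def append_options(ret, opts_cql_encoded):
    """Append an options clause to a text statement without implicit coercion."""
    return ret + u" WITH OPTIONS = %s" % ensure_text(opts_cql_encoded)


# Made-up statement and already-encoded options containing UTF-8 bytes.
statement = u"CREATE CUSTOM INDEX idx ON ks.tbl (col) USING 'cls'"
encoded_options = u"{'analyzer': '\U0001F62C'}".encode('utf-8')
result = append_options(statement, encoded_options)
```

Because text passes through `ensure_text` untouched, the same call is
safe whether the encoder hands back bytes or text.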
diff --git a/CHANGELOG.rst b/CHANGELOG.rst
@@ -4,6 +4,7 @@
 Bug Fixes
 ---------
 * Improve and fix socket error-catching code in nonblocking-socket reactors (PYTHON-1024)
+* Non-ASCII characters in schema break CQL string generation (PYTHON-1008)
 
 Other
 -----
diff --git a/cassandra/metadata.py b/cassandra/metadata.py
@@ -1435,7 +1435,11 @@ def as_cql_query(self):
                 index_target,
                 class_name)
             if options:
-                ret += " WITH OPTIONS = %s" % Encoder().cql_encode_all_types(options)
+                opts_cql_encoded = Encoder().cql_encode_all_types(options)
+                # PYTHON-1008
+                if isinstance(opts_cql_encoded, six.binary_type):
+                    opts_cql_encoded = opts_cql_encoded.decode('utf-8')
+                ret += " WITH OPTIONS = %s" % opts_cql_encoded
             return ret
 
     def export_as_string(self):