Commit 75ff3b7

Don't decode YAML responses before parsing them in get_artifact
Taskcluster serves these artifacts without a content-type, which means that requests assumes they're latin1 text. They might not be, since taskgraph uploads its YAML files encoded as UTF-8. Decoding UTF-8 as latin1 gives the right result for plain ASCII, which is why it works most of the time. But if the commit message recorded in a parameters.yml contains a special character, decoding returns a garbled string that the YAML library rejects.

By passing the raw bytes instead of decoding them, we sidestep the problem entirely and delegate the decoding to the YAML library, which supports both UTF-8 and UTF-16 (big- and little-endian). From the documentation for `yaml.load`:

> A byte string or a file must be encoded with utf-8, utf-16-be or
> utf-16-le encoding. yaml.load detects the encoding by checking the BOM
> (byte order mark) sequence at the beginning of the string/file. If no
> BOM is present, the utf-8 encoding is assumed.
1 parent efb7fc0 commit 75ff3b7
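The failure described in the commit message can be reproduced with the standard library alone. This is a minimal sketch under stated assumptions: the sample bytes are the ones the new test uses (U+2043 HYPHEN BULLET), and `NON_PRINTABLE` is a hand-copied approximation of the printable-character check in PyYAML's reader, not something imported from PyYAML. Latin-1 maps every byte to some character, so decoding never raises; the damage only shows up later, when the YAML reader meets the C1 control characters it produced.

```python
import re
from typing import Optional

# Assumption: approximates the YAML 1.1 "printable" set that PyYAML's reader
# enforces. C1 control characters such as U+0081 fall outside it.
NON_PRINTABLE = re.compile(
    r"[^\x09\x0A\x0D\x20-\x7E\x85\xA0-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def first_non_printable(text: str) -> Optional[str]:
    """Return the code point of the first character a YAML reader would reject."""
    match = NON_PRINTABLE.search(text)
    return None if match is None else f"U+{ord(match.group()):04X}"


raw = b"foo: \xe2\x81\x83"  # UTF-8 bytes for "foo: " followed by U+2043

correct = raw.decode("utf-8")    # roughly what the YAML loader does with BOM-less bytes
garbled = raw.decode("latin-1")  # what happens when latin1 is assumed

print(repr(correct), first_non_printable(correct))  # 'foo: ⁃' None
print(repr(garbled), first_non_printable(garbled))  # 'foo: â\x81\x83' U+0081
```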

2 files changed (+16, -1 lines)

src/taskgraph/util/taskcluster.py

Lines changed: 1 addition & 1 deletion
@@ -143,7 +143,7 @@ def _handle_artifact(path, response):
     if path.endswith(".json"):
         return response.json()
     if path.endswith(".yml"):
-        return yaml.load_stream(response.text)
+        return yaml.load_stream(response.content)
     response.raw.read = functools.partial(response.raw.read, decode_content=True)
     return response.raw
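With `response.content`, the bytes reach the YAML library untouched and the library chooses the codec itself. The BOM rule quoted from the `yaml.load` documentation in the commit message can be sketched as follows. `decode_yaml_bytes` is a hypothetical helper written for illustration only; it is not part of taskgraph or PyYAML, and PyYAML's own reader decodes incrementally rather than on a whole buffer.

```python
import codecs


def decode_yaml_bytes(data: bytes) -> str:
    """Hypothetical helper: choose a codec the way the yaml.load docs describe.

    A UTF-16 byte order mark selects UTF-16 in the matching byte order;
    with no BOM, UTF-8 is assumed.
    """
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The "utf-16" codec reads the BOM to pick the byte order and drops it.
        return data.decode("utf-16")
    return data.decode("utf-8")


text = "foo: \u2043"
for encoded in (
    text.encode("utf-8"),
    codecs.BOM_UTF16_LE + text.encode("utf-16-le"),
    codecs.BOM_UTF16_BE + text.encode("utf-16-be"),
):
    assert decode_yaml_bytes(encoded) == text
```

Because the codec is picked from the bytes themselves, the response's Content-Type header no longer influences how `.yml` artifacts are decoded.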

test/test_util_taskcluster.py

Lines changed: 15 additions & 0 deletions
@@ -154,6 +154,21 @@ def test_get_artifact(responses, root_url):
     )
     assert tc.get_artifact(tid, "artifact.yml") == {"foo": "bar"}

+    responses.add(
+        responses.GET,
+        f"{root_url}/api/queue/v1/task/{tid}/artifacts/artifact.yml",
+        body=b"foo: \xe2\x81\x83",
+    )
+    assert tc.get_artifact(tid, "artifact.yml") == {"foo": b"\xe2\x81\x83".decode()}
+
+    responses.add(
+        responses.GET,
+        f"{root_url}/api/queue/v1/task/{tid}/artifacts/artifact.yml",
+        body=b"foo: \xe2\x81\x83".decode().encode("utf-16"),
+        headers={"Content-Type": "text/yaml; charset=utf-16"},
+    )
+    assert tc.get_artifact(tid, "artifact.yml") == {"foo": b"\xe2\x81\x83".decode()}
+

 def test_list_artifact(responses, root_url):
     tid = 123
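The second new test case serves UTF-16 bytes. Since `_handle_artifact` now passes `response.content`, the `charset=utf-16` header does not affect how that `.yml` body is decoded, so the case relies on the payload carrying its own BOM. A quick standard-library check of that assumption, reusing the test's sample string, might look like this:

```python
import codecs

text = b"foo: \xe2\x81\x83".decode()  # "foo: \u2043", as in the test
payload = text.encode("utf-16")       # the body served by the second mocked response

# Python's "utf-16" codec prepends a BOM in the platform's native byte order,
# so the bytes describe their own encoding without any header.
has_bom = payload[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)

# If a loader ignored the BOM and assumed UTF-8, decoding would fail.
try:
    payload.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

print(has_bom, utf8_ok)  # True False
```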
