@DylanWelzel @ctrl-schaff (based on discussion in this Slack thread)
While writing queries and testing x-bte annotation in biothings/biothings_explorer#904, I found a query that returns an error: {"code":500,"success":false,"error":"Internal Server Error","details":"bioactivity"}. Similar queries worked fine (different uniprot ID used in body and jmespath parameter).
Problematic query:
curl --location --globoff 'https://mychem.info/v1/query?size=1000&fields=drugcentral.bioactivity%2Cdrugcentral.xrefs.umlscui%2Cdrugcentral.synonyms&jmespath_exclude_empty=true&always_list=drugcentral.bioactivity&jmespath=drugcentral.bioactivity%7C[%3F!action_type%20%20%26%26%20length(uniprot[%3Funiprot_id%3D%3D%27P29274%27])%20%3E%20%600%60]' \
--header 'Content-Type: application/json' \
--data '{
"q": ["P29274"],
"scopes": "drugcentral.bioactivity.uniprot.uniprot_id"
}'
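The percent-encoded `jmespath` parameter is hard to read in the URL, so here is a small sketch that decodes it with the Python standard library. The split on `|` into a target field and an expression reflects my understanding of the parameter's syntax, not something confirmed in this issue.

```python
from urllib.parse import unquote

# The jmespath parameter exactly as it appears in the query string above.
encoded = (
    "drugcentral.bioactivity%7C[%3F!action_type%20%20%26%26%20"
    "length(uniprot[%3Funiprot_id%3D%3D%27P29274%27])%20%3E%20%600%60]"
)

decoded = unquote(encoded)

# Assumed layout: "<target field>|<JMESPath expression>".
target_field, expression = decoded.split("|", 1)

print(target_field)
print(expression)
```

Read this way, the expression keeps only `bioactivity` entries whose `action_type` is missing or empty and that have at least one `uniprot` entry with `uniprot_id` equal to `P29274`.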
Johnathan confirmed that this error can be reproduced locally, where the more specific error message is "KeyError: 'bioactivity'":
[ERROR tornado.application:1875] Uncaught exception POST /v1/query?size=1000&fields=drugcentral.bioactivity%2Cdrugcentral.xrefs.umlscui%2Cdrugcentral.synonyms&jmespath_exclude_empty=true&always_list=drugcentral.bioactivity&jmespath=drugcentral.bioactivity%7C[%3F!action_type%20%20%26%26%20length(uniprot[%3Funiprot_id%3D%3D%27P29274%27])%20%3E%20%600%60] (127.0.0.1)
HTTPServerRequest(protocol='http', host='localhost:8000', method='POST', uri='/v1/query?size=1000&fields=drugcentral.bioactivity%2Cdrugcentral.xrefs.umlscui%2Cdrugcentral.synonyms&jmespath_exclude_empty=true&always_list=drugcentral.bioactivity&jmespath=drugcentral.bioactivity%7C[%3F!action_type%20%20%26%26%20length(uniprot[%3Funiprot_id%3D%3D%27P29274%27])%20%3E%20%600%60]', version='HTTP/1.1', remote_ip='127.0.0.1')
Traceback (most recent call last):
File "/home/jschaff/workspace/biothings/lib/python3.10/site-packages/tornado/web.py", line 1790, in _execute
result = await result
File "/home/jschaff/workspace/biothings/lib/python3.10/site-packages/biothings/web/handlers/query.py", line 204, in _method
return await coro(*args, **kwargs)
File "/home/jschaff/workspace/biothings/lib/python3.10/site-packages/biothings/web/handlers/query.py", line 264, in post
result = await ensure_awaitable(self.pipeline.search(**self.args))
File "/home/jschaff/workspace/biothings/lib/python3.10/site-packages/biothings/web/handlers/query.py", line 197, in ensure_awaitable
return await obj
File "/home/jschaff/workspace/biothings/lib/python3.10/site-packages/biothings/web/query/pipeline.py", line 103, in _
return await func(*args, **kwargs)
File "/home/jschaff/workspace/biothings/lib/python3.10/site-packages/biothings/web/query/pipeline.py", line 176, in search
result = self.formatter.transform(response, **options)
File "/home/jschaff/workspace/biothings/lib/python3.10/site-packages/biothings/web/query/formatter.py", line 200, in transform
responses = [self.transform(res, **options) for res in response]
File "/home/jschaff/workspace/biothings/lib/python3.10/site-packages/biothings/web/query/formatter.py", line 200, in <listcomp>
responses = [self.transform(res, **options) for res in response]
File "/home/jschaff/workspace/biothings/lib/python3.10/site-packages/biothings/web/query/formatter.py", line 253, in transform
self._transform_hit(hit, options)
File "/home/jschaff/workspace/biothings/lib/python3.10/site-packages/biothings/web/query/formatter.py", line 303, in _transform_hit
self.trasform_jmespath(path, obj, doc, options)
File "/home/jschaff/workspace/biothings/lib/python3.10/site-packages/biothings/web/query/formatter.py", line 469, in trasform_jmespath
idx_to_remove = [i for i, _obj in enumerate(obj) if not _obj[target_field]]
File "/home/jschaff/workspace/biothings/lib/python3.10/site-packages/biothings/web/query/formatter.py", line 469, in <listcomp>
idx_to_remove = [i for i, _obj in enumerate(obj) if not _obj[target_field]]
KeyError: 'bioactivity'
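The failing list comprehension from the traceback can be reproduced outside the server. This is a minimal sketch with made-up data (the `obj` list and both helper functions are hypothetical stand-ins, not code from biothings). The second helper only illustrates one way a missing key could be tolerated; it is not a proposed or confirmed fix.

```python
# Toy stand-in for one hit's `drugcentral` array:
# the second entry has no "bioactivity" key at all.
obj = [
    {"bioactivity": []},
    {"synonyms": ["placeholder"]},
]
target_field = "bioactivity"


def empty_indices_strict(entries, field):
    # Same expression as the line shown in the traceback.
    return [i for i, _obj in enumerate(entries) if not _obj[field]]


def empty_indices_tolerant(entries, field):
    # Hypothetical variant: a missing key is treated like an empty value.
    return [i for i, _obj in enumerate(entries) if not _obj.get(field)]


raised = None
try:
    empty_indices_strict(obj, target_field)
except KeyError as exc:
    raised = exc
    print("KeyError:", exc)

tolerant = empty_indices_tolerant(obj, target_field)
print(tolerant)
```

With this data the strict version fails on the second entry, while the tolerant version marks both entries as empty.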
Based on our digging so far, I think this error occurs when the query meets all of these criteria:
- retrieves a document with a specific structure (we think the problematic query gets stuck on this retrieved doc)
  - `drugcentral`'s value is an array of objects, rather than an object. Johnathan found <50 more documents with this structure.
  - Some of those `drugcentral` objects have the `bioactivity` field and others don't.
- the query has the parameter `jmespath_exclude_empty=true`. If you try taking it out, the query then returns without an error...but will keep the hits that didn't pass the criteria specified in `jmespath` (`bioactivity` is `[]` or `null` after jmespath processing). For BTE/retriever use, it's important that those non-matching hits are removed, and that's what `jmespath_exclude_empty=true` was supposed to do.
  - Notice that in the problematic document `OEYIOHPDSNJKLS-UHFFFAOYSA-N`, `bioactivity` is `[]` or doesn't exist. So both jmespath criteria weren't met in a single bioactivity object, and we want to remove this hit entirely with `jmespath_exclude_empty`.
  - Use `P05186` from extra info 5 as a positive control - the problematic doc will have a bioactivity object that meets the jmespath criteria - so it should be kept in the response.
I think the next steps are to:
- review some more cases where `drugcentral` is an array. Is this data structured correctly (should it all be organized into 1 document)?
- investigate how `jmespath_exclude_empty=true` behavior currently works and perhaps change it.
- is there a way to remove individual `drugcentral` objects that don't have the `bioactivity` field? I don't know how to do this. `filter=_exists_:drugcentral.bioactivity` doesn't work (it removes entire hits only if the entire document lacks the bioactivity field)
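Until the server-side behavior is settled, the last question could be handled client-side. The sketch below assumes the request is sent without `jmespath_exclude_empty` (which returns without the error, as noted above) and then prunes the hits after the fact. `prune_hits` and the sample hits are hypothetical; this is not an existing BioThings option.

```python
def prune_hits(hits, parent="drugcentral", field="bioactivity"):
    """Drop `parent` entries whose `field` is missing or empty,
    then drop hits that have no entries left.

    Note: a single-object `parent` value is normalized to a list.
    """
    kept = []
    for hit in hits:
        value = hit.get(parent)
        if value is None:
            continue
        entries = value if isinstance(value, list) else [value]
        entries = [e for e in entries if e.get(field)]
        if entries:
            kept.append({**hit, parent: entries})
    return kept


# Made-up hits covering the shapes discussed in this issue.
hits = [
    # array where only one entry has a non-empty bioactivity
    {"_id": "doc-a", "drugcentral": [{"bioactivity": [{"uniprot": []}]},
                                     {"synonyms": ["x"]}]},
    # array where bioactivity is empty or missing everywhere
    {"_id": "doc-b", "drugcentral": [{"bioactivity": []},
                                     {"synonyms": ["y"]}]},
    # plain object rather than an array
    {"_id": "doc-c", "drugcentral": {"bioactivity": [{"action_type": None}]}},
]

pruned = prune_hits(hits)
print([hit["_id"] for hit in pruned])
```

Here `doc-b` is removed entirely, and `doc-a` keeps only the one `drugcentral` entry that still has bioactivity data.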