
jmespath: new 500 error scenario?  #368

@colleenXu

@DylanWelzel @ctrl-schaff (based on discussion in this Slack thread)

While writing queries and testing x-bte annotation in biothings/biothings_explorer#904, I found a query that returns an error: {"code":500,"success":false,"error":"Internal Server Error","details":"bioactivity"}. Similar queries, which used a different UniProt ID in both the body and the jmespath parameter, worked fine.

Problematic query:

curl --location --globoff 'https://mychem.info/v1/query?size=1000&fields=drugcentral.bioactivity%2Cdrugcentral.xrefs.umlscui%2Cdrugcentral.synonyms&jmespath_exclude_empty=true&always_list=drugcentral.bioactivity&jmespath=drugcentral.bioactivity%7C[%3F!action_type%20%20%26%26%20length(uniprot[%3Funiprot_id%3D%3D%27P29274%27])%20%3E%20%600%60]' \
--header 'Content-Type: application/json' \
--data '{
    "q": ["P29274"],
    "scopes": "drugcentral.bioactivity.uniprot.uniprot_id"
}'
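
For readability, here is a small sketch that URL-decodes the jmespath parameter from the query above. It only uses the standard library; the variable names are mine.

```python
from urllib.parse import unquote

# The jmespath parameter exactly as it appears in the curl URL above.
encoded = (
    "drugcentral.bioactivity%7C[%3F!action_type%20%20%26%26%20"
    "length(uniprot[%3Funiprot_id%3D%3D%27P29274%27])%20%3E%20%600%60]"
)

# Percent-decoding gives the human-readable expression.
decoded = unquote(encoded)
print(decoded)
```

Decoded, the expression keeps bioactivity objects that have no action_type and have at least one uniprot entry with uniprot_id 'P29274'.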

Johnathan confirmed that this error can be reproduced locally, where the more specific error message is "KeyError: 'bioactivity'":

[ERROR tornado.application:1875] Uncaught exception POST /v1/query?size=1000&fields=drugcentral.bioactivity%2Cdrugcentral.xrefs.umlscui%2Cdrugcentral.synonyms&jmespath_exclude_empty=true&always_list=drugcentral.bioactivity&jmespath=drugcentral.bioactivity%7C[%3F!action_type%20%20%26%26%20length(uniprot[%3Funiprot_id%3D%3D%27P29274%27])%20%3E%20%600%60] (127.0.0.1)
    HTTPServerRequest(protocol='http', host='localhost:8000', method='POST', uri='/v1/query?size=1000&fields=drugcentral.bioactivity%2Cdrugcentral.xrefs.umlscui%2Cdrugcentral.synonyms&jmespath_exclude_empty=true&always_list=drugcentral.bioactivity&jmespath=drugcentral.bioactivity%7C[%3F!action_type%20%20%26%26%20length(uniprot[%3Funiprot_id%3D%3D%27P29274%27])%20%3E%20%600%60]', version='HTTP/1.1', remote_ip='127.0.0.1')
    Traceback (most recent call last):
      File "/home/jschaff/workspace/biothings/lib/python3.10/site-packages/tornado/web.py", line 1790, in _execute
        result = await result
      File "/home/jschaff/workspace/biothings/lib/python3.10/site-packages/biothings/web/handlers/query.py", line 204, in _method
        return await coro(*args, **kwargs)
      File "/home/jschaff/workspace/biothings/lib/python3.10/site-packages/biothings/web/handlers/query.py", line 264, in post
        result = await ensure_awaitable(self.pipeline.search(**self.args))
      File "/home/jschaff/workspace/biothings/lib/python3.10/site-packages/biothings/web/handlers/query.py", line 197, in ensure_awaitable
        return await obj
      File "/home/jschaff/workspace/biothings/lib/python3.10/site-packages/biothings/web/query/pipeline.py", line 103, in _
        return await func(*args, **kwargs)
      File "/home/jschaff/workspace/biothings/lib/python3.10/site-packages/biothings/web/query/pipeline.py", line 176, in search
        result = self.formatter.transform(response, **options)
      File "/home/jschaff/workspace/biothings/lib/python3.10/site-packages/biothings/web/query/formatter.py", line 200, in transform
        responses = [self.transform(res, **options) for res in response]
      File "/home/jschaff/workspace/biothings/lib/python3.10/site-packages/biothings/web/query/formatter.py", line 200, in <listcomp>
        responses = [self.transform(res, **options) for res in response]
      File "/home/jschaff/workspace/biothings/lib/python3.10/site-packages/biothings/web/query/formatter.py", line 253, in transform
        self._transform_hit(hit, options)
      File "/home/jschaff/workspace/biothings/lib/python3.10/site-packages/biothings/web/query/formatter.py", line 303, in _transform_hit
        self.trasform_jmespath(path, obj, doc, options)
      File "/home/jschaff/workspace/biothings/lib/python3.10/site-packages/biothings/web/query/formatter.py", line 469, in trasform_jmespath
        idx_to_remove = [i for i, _obj in enumerate(obj) if not _obj[target_field]]
      File "/home/jschaff/workspace/biothings/lib/python3.10/site-packages/biothings/web/query/formatter.py", line 469, in <listcomp>
        idx_to_remove = [i for i, _obj in enumerate(obj) if not _obj[target_field]]
    KeyError: 'bioactivity'

Based on our digging so far, I think this error occurs when a query meets all of these criteria:

  • it retrieves a document with a specific structure (we think the problematic query gets stuck on this retrieved doc):
    1. drugcentral's value is an array of objects, rather than a single object. Johnathan found <50 more documents with this structure.
    2. Some of those drugcentral objects have the bioactivity field and others don't.
  • it has the parameter jmespath_exclude_empty=true. Without this parameter, the query returns without an error, but it keeps the hits that didn't pass the criteria specified in jmespath (bioactivity is [] or null after jmespath processing). For BTE/retriever use, it's important that those non-matching hits are removed, and that's what jmespath_exclude_empty=true was supposed to do.
    • Notice that in the problematic document OEYIOHPDSNJKLS-UHFFFAOYSA-N, bioactivity is either [] or doesn't exist. No single bioactivity object met both jmespath criteria, so we want jmespath_exclude_empty to remove this hit entirely.
    • Use P05186 from extra info 5 as a positive control: with it, the problematic doc has a bioactivity object that meets the jmespath criteria, so it should be kept in the response.
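
To illustrate the suspected failure mode outside the server, here is a minimal sketch. The data is made up to mimic the document shape described above (drugcentral as a list where only some objects carry bioactivity), and the strict comprehension is copied from formatter.py line 469 in the traceback. The `.get()` variant and both function names are my own assumptions about one way the missing key could be tolerated, not the actual biothings behavior or fix.

```python
# Made-up data shaped like the suspected problematic document:
# "drugcentral" is a list, and only some objects have "bioactivity".
drugcentral = [
    {"bioactivity": []},             # field present but empty after jmespath
    {"synonyms": ["placeholder"]},   # field missing entirely
]
target_field = "bioactivity"


def indices_to_remove_strict(objs, field):
    # Same expression as in the traceback: indexing raises KeyError
    # as soon as one object lacks the field.
    return [i for i, _obj in enumerate(objs) if not _obj[field]]


def indices_to_remove_tolerant(objs, field):
    # Hypothetical alternative: treat a missing field like an empty one.
    return [i for i, _obj in enumerate(objs) if not _obj.get(field)]


error = None
try:
    indices_to_remove_strict(drugcentral, target_field)
except KeyError as exc:
    error = exc

print(repr(error))
print(indices_to_remove_tolerant(drugcentral, target_field))
```

With the tolerant variant, both objects are flagged as empty, which is what we'd want for this hit to be excluded.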

I think the next steps are to:

  • review some more cases where drugcentral is an array. Is this data structured correctly (should it all be organized into 1 document)?
  • investigate how jmespath_exclude_empty=true currently behaves and perhaps change it.
  • find out whether there is a way to remove individual drugcentral objects that don't have the bioactivity field. I don't know how to do this: filter=_exists_:drugcentral.bioactivity doesn't work (it removes a hit only if the entire document lacks the bioactivity field).
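
As a stopgap for the last point, the filtering could be done client-side after the response comes back. This is only a sketch under my own assumptions: `drop_objects_without_field` is a hypothetical helper, not part of the API, and it assumes each hit is a plain dict where drugcentral is either an object or a list of objects.

```python
def drop_objects_without_field(hit, parent="drugcentral", field="bioactivity"):
    """Return a copy of hit keeping only parent objects with a non-empty field.

    Returns None when no object survives, so the caller can drop the hit.
    """
    value = hit.get(parent)
    if isinstance(value, dict):
        # Normalize the single-object case to a list.
        value = [value]
    kept = [obj for obj in (value or []) if obj.get(field)]
    if not kept:
        return None
    return {**hit, parent: kept}


# Made-up hits for illustration only.
hits = [
    {"_id": "A", "drugcentral": [{"bioactivity": [{"x": 1}]}, {"synonyms": ["s"]}]},
    {"_id": "B", "drugcentral": [{"bioactivity": []}, {"synonyms": ["s"]}]},
    {"_id": "C", "drugcentral": {"bioactivity": [{"x": 2}]}},
]

filtered = [h for h in map(drop_objects_without_field, hits) if h is not None]
print([h["_id"] for h in filtered])
```

This doesn't replace a server-side fix, since the 500 error happens before any response is returned when jmespath_exclude_empty=true is set.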
