Adding the possibility to select CIM10 and ATC codes in eds.cim10 and eds.drugs #314
Coverage Report
Files without new missing coverage
264 files skipped due to complete coverage. Coverage success: total of 97.76% is above 97.76% 🎉

  patterns = df.groupby("code")["patterns"].agg(list).to_dict()
+ patterns = {k: v for k, v in patterns.items() if k in cim10} if cim10 else patterns

Why not use the following instead? It would fail for unknown CIM10 codes, but it might be a bit faster.

Suggested change
- patterns = {k: v for k, v in patterns.items() if k in cim10} if cim10 else patterns
+ patterns = {k: patterns[k] for k in codes} if codes else patterns
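The trade-off raised in this comment can be sketched on a toy pattern dictionary (the codes and synonyms below are illustrative, not the real resource). The PR's comprehension scans every key and silently drops unknown codes; the suggested one only looks up the requested codes but raises KeyError on an unknown one. Converting the code list to a set in the lenient version is an addition of this sketch, to keep each membership test O(1).

```python
from typing import Dict, List

# Toy stand-in for the real pattern dictionary (illustrative entries).
patterns: Dict[str, List[str]] = {
    "A00": ["cholera"],
    "B01": ["varicelle"],
    "C34": ["cancer du poumon"],
}


def filter_lenient(patterns: Dict[str, List[str]], codes: List[str]) -> Dict[str, List[str]]:
    """PR version: iterate over every pattern and keep the requested ones.

    Unknown codes are silently ignored. The set makes each `in` test O(1)
    instead of O(len(codes)).
    """
    wanted = set(codes)
    return {k: v for k, v in patterns.items() if k in wanted}


def filter_strict(patterns: Dict[str, List[str]], codes: List[str]) -> Dict[str, List[str]]:
    """Suggested version: only look up the requested codes.

    Work is proportional to len(codes), but an unknown code raises KeyError.
    """
    return {k: patterns[k] for k in codes}


print(filter_lenient(patterns, ["A00", "Z99"]))  # {'A00': ['cholera']}
print(filter_strict(patterns, ["A00"]))          # {'A00': ['cholera']}
try:
    filter_strict(patterns, ["A00", "Z99"])
except KeyError as err:
    print("unknown code:", err)  # unknown code: 'Z99'
```

Which behaviour is preferable depends on whether a typo in a user-supplied code should be reported or ignored.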

- def get_patterns() -> Dict[str, List[str]]:
+ def get_patterns(cim10: List[str] = None) -> Dict[str, List[str]]:

Maybe standardize cim10/atc into a single "codes" parameter?

Suggested change
- def get_patterns(cim10: List[str] = None) -> Dict[str, List[str]]:
+ def get_patterns(codes: List[str] = None) -> Dict[str, List[str]]:
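If cim10 and atc become a single codes parameter, both components could share one filtering helper. A minimal sketch, assuming a hypothetical select_codes helper (not part of edsnlp) and toy dictionaries. It shows two details: an explicit Optional[...] annotation matching the None default (recent mypy releases reject an implicit Optional by default), and a `codes is None` test, so that an empty list selects nothing instead of silently meaning "all codes" as a truthiness test would.

```python
from typing import Dict, List, Optional


# Hypothetical shared helper; the name and behaviour are illustrative only.
def select_codes(
    patterns: Dict[str, List[str]],
    codes: Optional[List[str]] = None,
) -> Dict[str, List[str]]:
    """Return the synonyms of the requested codes.

    codes=None keeps every code (the slow, exhaustive case);
    codes=[] selects nothing, unlike a truthiness test.
    Unknown codes raise KeyError, as in the suggested change.
    """
    if codes is None:
        return patterns
    return {code: patterns[code] for code in codes}


cim10_patterns = {"A00": ["cholera"], "C34": ["cancer du poumon"]}
atc_patterns = {"N02BE01": ["paracetamol", "doliprane"]}

print(select_codes(cim10_patterns, ["C34"]))  # {'C34': ['cancer du poumon']}
print(select_codes(atc_patterns))             # every ATC entry
print(select_codes(cim10_patterns, []))       # {}
```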

  name: str = "cim10",
  *,
  attr: str = "NORM",
+ cim10: List[str] = None,

Suggested change
- cim10: List[str] = None,
+ codes: List[str] = None,

+ cim10 : str
+     List of cim10 to retrieve. If None, all cim10 will be searched,
+     resulting in higher computation time.

Standardize the cim10/atc naming to codes.

Suggested change
- cim10 : str
-     List of cim10 to retrieve. If None, all cim10 will be searched,
-     resulting in higher computation time.
+ codes : List[str]
+     CIM10 codes to retrieve. If None, synonyms for all codes will be searched,
+     resulting in higher computation time.

  name=name,
  regex=dict(),
- terms=get_patterns(),
+ terms=get_patterns(cim10),

Suggested change
- terms=get_patterns(cim10),
+ terms=get_patterns(codes),

  attr : str
      The default attribute to use for matching.
+ atc : str
+     List of atc to retrieve. If None, all atc will be searched,

Suggested change
-     List of atc to retrieve. If None, all atc will be searched,
+ codes : List[str]
+     ATC codes to retrieve. If None, synonyms for all codes will be searched,
+     resulting in higher computation time.

  name=name,
  regex=dict(),
- terms=get_patterns(),
+ terms=get_patterns(atc),

Suggested change
- terms=get_patterns(atc),
+ terms=get_patterns(codes),

- def get_patterns() -> Dict[str, List[str]]:
+ def get_patterns(atc: List[str] = None) -> Dict[str, List[str]]:

Suggested change
- def get_patterns(atc: List[str] = None) -> Dict[str, List[str]]:
+ def get_patterns(codes: List[str] = None) -> Dict[str, List[str]]:

  with open(drugs_file, "r") as f:
-     return json.load(f)
+     patterns = json.load(f)
+ patterns = {k: v for k, v in patterns.items() if k in atc} if atc else patterns

Suggested change
- patterns = {k: v for k, v in patterns.items() if k in atc} if atc else patterns
+ patterns = {k: patterns[k] for k in codes} if codes else patterns
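For eds.drugs the patterns come from a JSON file rather than a DataFrame. A minimal sketch of the suggested logic, assuming a file that maps ATC codes to lists of synonyms; the helper name load_drug_patterns and the sample entries are illustrative. As one possible design (not the PR's), it raises a ValueError listing every unknown code at once instead of the bare KeyError of the one-line version.

```python
import json
import os
import tempfile
from typing import Dict, List, Optional


def load_drug_patterns(
    drugs_file: str, codes: Optional[List[str]] = None
) -> Dict[str, List[str]]:
    """Load ATC code -> synonyms, optionally keeping only `codes`."""
    with open(drugs_file, "r", encoding="utf-8") as f:
        patterns = json.load(f)
    if codes is None:
        return patterns
    unknown = [code for code in codes if code not in patterns]
    if unknown:
        # Report every bad code at once instead of failing on the first one.
        raise ValueError(f"Unknown ATC codes: {unknown}")
    return {code: patterns[code] for code in codes}


# Write a small sample file so the sketch runs on its own.
sample = {"N02BE01": ["paracetamol"], "B01AC06": ["aspirine"]}
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as tmp:
    json.dump(sample, tmp)
path = tmp.name

print(load_drug_patterns(path, ["N02BE01"]))  # {'N02BE01': ['paracetamol']}
try:
    load_drug_patterns(path, ["N02BE01", "XXX"])
except ValueError as err:
    print(err)  # Unknown ATC codes: ['XXX']
os.remove(path)
```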

      doc = matcher(doc)

      assert len(doc.spans["measurements"]) == 0
      assert len(doc.spans["quantities"]) == 0

Add a backward-compatibility test?

Suggested change
-     assert len(doc.spans["quantities"]) == 0
+     assert len(doc.spans["quantities"]) == 0
+
+
+ def test_measurements(blank_nlp):
+     blank_nlp.add_pipe("eds.measurements")
+     assert blank_nlp("La tumeur fait 4 cm").spans["measurements"][0]._.value.mm == 40

Description
Currently, eds.cim10 and eds.drugs do not allow selecting the CIM10 or ATC codes of interest: the synonyms of every code are searched, which results in high computation time.

TODO: see also eds.umls?

Checklist