addok_france/utils.py
Outdated
def clean_query(q):
    q = re.sub(r'([\d]{5})', r' \1 ', q, flags=re.IGNORECASE)
    q = re.sub(r'(^| )(boite postale|b\.?p\.?|cs|tsa|cidex) *(n(o|°|) *|)[\d]+ *', r'\1', q, flags=re.IGNORECASE)
Regexes are quite slow, and I'm not sure we want to go that far in cleaning.
Have you looked at the performance impact? :)
Regexps are not that slow (I measured 3 µs for this one), and it is called just once to clean the query.
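For anyone who wants to re-check that timing, here is a minimal sketch using the standard `timeit` module. The pattern is copied from the diff above; the names `BP_PATTERN`, `strip_postal_box` and `time_per_call`, the sample query, and the choice to precompile the pattern are assumptions made for this illustration, not code from the PR. The measured figure will vary by machine.

```python
import re
import timeit

# Pattern copied from the diff under review. Precompiling it is an
# assumption of this sketch; the PR calls re.sub() with a raw string.
BP_PATTERN = re.compile(
    r'(^| )(boite postale|b\.?p\.?|cs|tsa|cidex) *(n(o|°|) *|)[\d]+ *',
    flags=re.IGNORECASE,
)


def strip_postal_box(q):
    """Remove BP/CS/TSA/CIDEX mentions, keeping the leading separator."""
    return BP_PATTERN.sub(r'\1', q)


def time_per_call(query, number=100_000):
    """Average wall-clock seconds for one substitution on `query`."""
    total = timeit.timeit(lambda: strip_postal_box(query), number=number)
    return total / number


if __name__ == "__main__":
    sample = "12 rue de la paix BP 41 75002 Paris"  # hypothetical query
    print(repr(strip_postal_box(sample)))
    print(f"{time_per_call(sample) * 1e6:.2f} µs per call")
```

Since the cleaning runs once per query, the per-call cost is the number to compare against the overall request time.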
addok_france/utils.py
Outdated
 def fold_ordinal(s):
     """3bis => 3b."""
-    if s[0].isdigit() and not s.isdigit():
+    if s is not None and s !='' and s[0].isdigit() and not s.isdigit():
I feel like this needs to be fixed properly beforehand. I'll have a look.
I can't reproduce the issue from the shell, the pyshell, or the HTTP API.
Can you be a bit more specific about how you hit the issue here? A simple way to reproduce it from the shell or pyshell would help :)
This needs either a reproducible test case (so we can understand the problem) or removal :)
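As a starting point for such a test case, the sketch below isolates only the two conditions from the diff and shows which inputs make the unguarded one raise. The helper names (`unguarded`, `guarded`, `failure_of`) are hypothetical, and this does not show that addok ever passes an empty or `None` token to `fold_ordinal`, which is the open question in this thread.

```python
def unguarded(s):
    # Condition as it stood before the change.
    return s[0].isdigit() and not s.isdigit()


def guarded(s):
    # Condition as proposed in the diff.
    return s is not None and s != '' and s[0].isdigit() and not s.isdigit()


def failure_of(func, value):
    """Return the name of the exception raised by func(value), or None."""
    try:
        func(value)
    except Exception as exc:
        return type(exc).__name__
    return None


if __name__ == "__main__":
    for value in ('3bis', '33', '', None):
        print(repr(value), failure_of(unguarded, value), failure_of(guarded, value))
```

An empty string fails on `s[0]` and `None` fails on indexing, so a real reproduction would need a query that makes one of those values reach the function.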
- Added 22 new street types: aérodrome, déviation, digue, embranchement, jardin, jetée, passerelle, placette, parvis, quartier, ruelle, terrasse, tunnel, viaduc, villa, etc.
- Improved 22 existing patterns to better recognize common abbreviations: bld/blvd, ch/chem, pass, dom, res, prom, ham, fbg, carr, trav, espl, etc.
- Added tests to validate street-number recognition with these new types and abbreviations.
Source: PR #8
More variations for BP/CS/CIDEX/TSA like:
B.P 41
TSA N°41
Cleans phone/fax numbers too...