Fix nondeterministic splitting in Ronin
Ronin may split the same identifier differently on different runs, depending on the order of the terms in the common_terms_with_numbers set.
Reproduction
I added "md5sum" to the common_terms_with_numbers set and then ran ronin.split("md5sum") several times.
The result was sometimes ["md5sum"] and sometimes ["md5", "sum"].
Reason & Solution
I checked the code and found that the heuristic_split function in simple_splitters.py relies on the regular expression _exceptions_re.
_exceptions_re is generated from common_terms_with_numbers without fixing the order of the terms, and a Python set has no guaranteed iteration order.
Because regex alternation tries the alternatives from left to right and takes the first one that matches, the order matters: if "md5" comes before "md5sum" in _exceptions_re, the split result is ["md5", "sum"]; if "md5sum" comes before "md5", the split result is ["md5sum"].
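The effect of alternation order can be seen in isolation with two hand-written patterns. This is only a minimal sketch using the standard re module; the names short_first and long_first are made up for the illustration and do not appear in simple_splitters.py.

```python
import re

# Same two terms, different order inside the alternation.
short_first = re.compile(r'(md5|md5sum)', re.I)
long_first = re.compile(r'(md5sum|md5)', re.I)

# Python's re module takes the first alternative that matches,
# not the longest one.
print(short_first.match("md5sum").group(0))  # md5
print(long_first.match("md5sum").group(0))   # md5sum
```

When the shorter term is listed first, the match stops after "md5" and "sum" is left over, which corresponds to the ["md5", "sum"] result described above.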
Solution: sort the terms by length, longest first, when generating _exceptions_re, so that a longer term is always tried before any shorter term that is its prefix.
_exceptions_re = re.compile(r'(' + '|'.join(sorted(common_terms_with_numbers, key=lambda term: len(term), reverse=True)) + ')', re.I)
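The idea behind the change can be checked with a small standalone sketch. The helper build_exceptions_re and the sample term lists below are hypothetical and exist only for this illustration. The extra alphabetical tie-break is an assumption that goes beyond the one-line change: it does not affect matching, it only makes the generated pattern string identical for any input order.

```python
import re

def build_exceptions_re(terms):
    # Hypothetical helper, not part of Ronin.
    # Longest terms first; alphabetical order breaks ties between
    # terms of equal length so the pattern text is fully deterministic.
    ordered = sorted(terms, key=lambda term: (-len(term), term))
    return re.compile('(' + '|'.join(ordered) + ')', re.I)

# Two orderings of the same sample terms produce the same pattern.
pattern_a = build_exceptions_re(["md5", "sha1", "md5sum"])
pattern_b = build_exceptions_re(["md5sum", "md5", "sha1"])

print(pattern_a.pattern)                    # (md5sum|sha1|md5)
print(pattern_a.pattern == pattern_b.pattern)  # True
print(pattern_a.match("md5sum").group(0))   # md5sum
```

With this ordering, "md5sum" is matched as a whole term no matter how the original collection happened to be iterated.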