Evaluation of Regular Expression Takes Too Long #36

I tried to load the `word2vec` data using the nlpia module, but it took almost forever. I first suspected a memory issue, but my laptop has 16 GB of RAM and memory usage was fine while I ran the code below.
```
from nlpia.data.loaders import get_data
word_vectors = get_data('word2vec')
```
When I traced the source code, I found that evaluating the regular expression in futil.py takes too long:
https://github.com/totalgood/nlpia/blob/master/src/nlpia/futil.py#L275

I timed the evaluation using the test code below and found that it took 6 minutes. This evaluation runs 6 times in the entire code, which means we have to wait at least half an hour to get the result. That seems inefficient. Maybe we should make the regular expression more efficient, or evaluate only the file name rather than the full file path.
```
import re
import timeit

filepath = fplower = '/users/kelly/.pyenv/versions/nlpia/lib/python3.8/site-packages/nlpia/bigdata/googlenews-vectors-negative300.bin.gz'
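ext, newext = '.tgz', '.tar.gz'
# Escape the dot in the extension and anchor it at the end of the string.
r = ext.lower().replace('.', r'\.') + r'$'
# The quantified group inside `*` can split a long run of non-dot characters
# in many different ways, so a failing match backtracks through all of them.
r = r'^[.]?([^.]*)\.([^.]{1,10})*' + r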

start_time = timeit.default_timer()
if re.match(r, fplower) and not fplower.endswith(newext):
    filepath = filepath[:-len(ext)] + newext
elapsed = timeit.default_timer() - start_time
print('elapsed time is: ', elapsed)
```
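One possible way to make the expression cheaper, as a sketch only and not the actual fix in nlpia: the group `([^.]{1,10})*` accepts exactly the same strings as a plain `[^.]*`, so it can be flattened. The name `flat` and the list of sample strings are my own assumptions for illustration. The capture groups differ from the original pattern, which should not matter here because the test above only checks whether `re.match` returns a match.

```python
import re
import timeit

ext = '.tgz'
suffix = ext.lower().replace('.', r'\.') + r'$'

# Pattern from the test above: a quantified group inside `*`.
nested = r'^[.]?([^.]*)\.([^.]{1,10})*' + suffix
# Assumed replacement: `[^.]*` accepts the same runs of non-dot characters,
# but gives the engine only one way to consume each run.
flat = r'^[.]?[^.]*\.[^.]*' + suffix

# Short inputs only, so the nested pattern still finishes quickly.
samples = ['a.b.tgz', '.foo.tgz', 'foo.tgz', 'x.y.z.tgz', 'archive.tar.gz']
agree = all(bool(re.match(nested, s)) == bool(re.match(flat, s)) for s in samples)

fplower = '/users/kelly/.pyenv/versions/nlpia/lib/python3.8/site-packages/nlpia/bigdata/googlenews-vectors-negative300.bin.gz'
start_time = timeit.default_timer()
matched = re.match(flat, fplower)
elapsed = timeit.default_timer() - start_time

print('patterns agree on samples:', agree)
print('flat pattern matched:', bool(matched), 'elapsed time is:', elapsed)
```

Matching only `os.path.basename(fplower)` would shorten the input as well, but the nested pattern would still be exponential for any long run of characters without a dot, so flattening the group seems like the safer change.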

