The Issue
The difference between filter and search is not intuitive, and having two separate functions is unnecessary. The API should also be made clearer and should do more error handling.
The motivation
I just ran this query and had no idea what happened:
In [1]: archives = api.batch_get_archive(api.search(
   ...:     pattern='ACP/climate/smme/HDD-CDD/county/001/annual/rcp85/*/198101-*.nc'))
   ...:
In [2]: len(archives)  # The query took forever, then returned ALL archives
Out[2]: 13375

It turns out that api.search has no pattern argument, but because it accepts **kwargs the unsupported keyword was silently ignored. The correct query is:
In [3]: archives = api.batch_get_archive(api.filter(
   ...:     pattern='ACP/climate/smme/HDD-CDD/county/001/annual/rcp85/*/198101-*.nc'))
   ...:
In [4]: len(archives)
Out[4]: 44

Proposed solution
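The failure described above comes from `**kwargs` absorbing a keyword the function does not support. A minimal sketch of the effect follows; `loose_search`, `strict_search`, and the archive names are hypothetical stand-ins for illustration, not the real API:

```python
ARCHIVES = ['a/one.nc', 'a/two.nc', 'b/three.nc']  # made-up archive names


def loose_search(*query, **kwargs):
    """Hypothetical stand-in: accepts arbitrary keyword arguments."""
    prefix = kwargs.get('prefix')  # any other keyword is silently ignored
    return [a for a in ARCHIVES if prefix is None or a.startswith(prefix)]


def strict_search(*query, prefix=None, pattern=None):
    """Hypothetical stand-in: explicit, keyword-only signature."""
    names = [a for a in ARCHIVES if prefix is None or a.startswith(prefix)]
    if pattern is not None:
        names = [a for a in names if pattern in a]
    return names


# The unsupported keyword is dropped, so every archive comes back.
everything = loose_search(pattern='one')

# With an explicit signature, a misspelled keyword fails immediately.
try:
    strict_search(patern='one')
    typo_caught = False
except TypeError:
    typo_caught = True
```

With named keyword-only parameters, Python itself rejects unknown arguments, so no extra validation code is needed for this case.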
We should merge filter and search into one function:
# Assumes module-level imports of fnmatch, re, and fs.path
def search(self, *query, prefix=None, pattern=None, engine='path'):
    '''
    Search for archives using tags, patterns, and prefixes

    Parameters
    ----------
    query: str
        tags to search for
    prefix: str
        start of archive name. Providing a start string improves search
        speed.
    pattern: str
        string matching the characters within the archive or set of
        archives you are filtering on. Note that authority prefixes, e.g.
        ``local://my/archive.txt`` are not supported in pattern searches.
    engine: str
        string of value 'str', 'path', or 'regex' that indicates the
        type of pattern you are filtering on

    Yields
    ------
    archive_name : str
        names of archives matching the specified criteria
    '''
    if prefix is not None:
        prefix = fs.path.relpath(prefix)
    if pattern is not None:
        pattern = fs.path.relpath(pattern)

    archives = self.manager.search(query, begins_with=prefix)

    if not pattern:
        for archive in archives:
            yield archive
    elif engine == 'str':
        # Plain substring match
        for arch in archives:
            if pattern in arch:
                yield arch
    elif engine == 'path':
        # Change to generator version of fnmatch.filter
        for arch in archives:
            if fnmatch.fnmatch(arch, pattern):
                yield arch
    elif engine == 'regex':
        for arch in archives:
            if re.search(pattern, arch):
                yield arch
    else:
        raise ValueError(
            'search engine "{}" not recognized. '.format(engine) +
            'choose "str", "path", or "regex"')
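
The three engines differ only in how `pattern` is interpreted. The sketch below applies each one to a small, made-up list of archive names. `match_archives` is a hypothetical standalone helper, not the proposed method. It is written as a regular function that returns a generator, on the assumption that a bad `engine` value should fail at call time rather than on first iteration:

```python
import fnmatch
import re


def match_archives(archives, pattern, engine='path'):
    """Hypothetical helper: filter names by substring, glob, or regex."""
    matchers = {
        'str': lambda name: pattern in name,
        'path': lambda name: fnmatch.fnmatch(name, pattern),
        'regex': lambda name: re.search(pattern, name) is not None,
    }
    if engine not in matchers:
        # Raised at call time because this function is not itself a generator.
        raise ValueError(
            'search engine "{}" not recognized. '
            'choose "str", "path", or "regex"'.format(engine))
    return (name for name in archives if matchers[engine](name))


# Made-up archive names for illustration only
names = [
    'county/001/annual/rcp85/model-a/198101-200012.nc',
    'county/001/annual/rcp85/model-b/198101-200012.nc',
    'county/001/annual/rcp45/model-a/198101-200012.nc',
]

by_str = list(match_archives(names, 'rcp85', engine='str'))
by_path = list(match_archives(names, 'county/*/rcp85/*/198101-*.nc'))
by_regex = list(match_archives(names, r'rcp45/model-[ab]/', engine='regex'))
```

Note that `fnmatch` lets `*` match `/`, so a single `*` in a 'path' pattern can span several path components.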