Skip to content

filter and search should be merged #269

@delgadom

Description

@delgadom

The Issue

The difference between filter and search is not intuitive and is unnecessary. Furthermore, the api should be refined to be more clear and provide more error handling

The motivation

I just ran this query and had no idea what happened:

In [1]: archives = api.batch_get_archive(api.search(
  ... :     pattern='ACP/climate/smme/HDD-CDD/county/001/annual/rcp85/*/198101-*.nc')
  ... :

In [2]: len(archives) # The query took forever, then returned ALL archives
Out[2]: 13375

Turns out the problem was that api.search doesn't have a pattern argument, but since it accepts **kwargs it failed to catch this. The correct query is:

In [3]: archives = api.batch_get_archive(api.filter(
  ... :     pattern='ACP/climate/smme/HDD-CDD/county/001/annual/rcp85/*/198101-*.nc')
  ... :

In [4]: len(archives)
Out[4]: 44

Proposed solution

We should merge filter and search into one function:

    def search(self, *query, prefix=None, pattern=None, engine='path'):
        '''
        Search for archives using tags, patterns, and prefixes

        Parameters
        ---------
        query: str
            tags to search for

        prefix: str
            start of archive name. Providing a start string improves search
            speed.

        pattern: str
            string matching the characters within the archive or set of
            archives you are filtering on. Note that authority prefixes, e.g.
            ``local://my/archive.txt`` are not supported in pattern searches.

        engine: str
            string of value 'str', 'path', or 'regex'. That indicates the
            type of pattern you are filtering on

        yields
        ------
        archive_name : str
            names of archives matching the specified criteria
        '''

        if prefix is not None:
            prefix = fs.path.relpath(prefix)

        if pattern is not None:
            pattern = fs.path.relpath(pattern)

        archives = self.manager.search(query, begins_with=prefix, )

        if not pattern:
            for archive in archives:
                yield archive

        elif engine == 'str':
            for arch in archives:
                if pattern in arch:
                    yield arch

        elif engine == 'path':
            # Change to generator version of fnmatch.filter

            for arch in archives:
                if fnmatch.fnmatch(arch, pattern):
                    yield arch

        elif engine == 'regex':
            for arch in archives:
                if re.search(pattern, arch):
                    yield arch

        else:
            raise ValueError(
                'search engine "{}" not recognized. '.format(engine) +
                'choose "str", "fn", or "regex"')

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions