How to handle file exclusions and inclusions in 0.8? #378

This is an issue that has cropped up repeatedly in various guises (e.g., #215, #364, #277, #184, #131, and probably others). The question is how to let users specify explicit inclusion and exclusion paths at `BIDSLayout` initialization. I'm raising it again because, as of version 0.8 (see #369), pybids will no longer depend on grabbit, so we can no longer rely on the inclusion/exclusion behavior implemented there. Since that behavior was never documented in pybids, I think we have a good opportunity to start afresh and hopefully settle on something that works for everyone.

The main constraints I think we should try to respect are:
* We want to exclude a bunch of hard-coded subdirectories by default (e.g., `'code'`, `'stimuli'`, `'sourcedata'`, etc.)
* Users should be able to easily override any of the default exclusions so that those directories are indexed
* Users should be able to specify arbitrary directories anywhere in the file system that should *not* be indexed (in the event they're encountered in any raw or derivatives `BIDSLayout`)

The current approach doesn't allow users to specify explicit exclusions at all (strictly speaking it does, but only through an undocumented grabbit feature). It uses an `include` argument solely to negate the default exclusions: e.g., if you want `stimuli/` to be indexed, you pass `include=['stimuli']`. Beyond this, pybids offers no way to control inclusions or exclusions (aside from specifying derivatives, which is a separate matter that I think we're already handling well). I don't think this is adequate, and many of the open issues reflect that.
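To make the asymmetry concrete, here is a minimal sketch of the current semantics. It is not pybids code: `current_exclusions` is a hypothetical helper, and the default list is assumed to contain only the three directories named in this issue (the real list is longer).

```python
# Hypothetical sketch of the current semantics; not pybids' actual implementation.
# Illustrative subset of the defaults named in this issue; the real list is longer.
DEFAULT_EXCLUSIONS = ("code", "stimuli", "sourcedata")


def current_exclusions(include=()):
    """Return the directories that will be skipped.

    `include` can only remove entries from the default list; anything in it
    that is not a default exclusion is silently ignored.
    """
    return [d for d in DEFAULT_EXCLUSIONS if d not in include]
```

Passing `include=['stimuli']` leaves `['code', 'sourcedata']` excluded, while a value such as `include=['scratch']` has no effect at all, because there is no way to *add* an exclusion through this argument.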

Here are a few proposals (feel free to suggest others):
1) Keep the current approach, where `include` negates values in the default exclusion list, but add an `exclude` argument that causes any matching files/dirs to be skipped during indexing. The main downside I see here is that the behavior is counterintuitive, as `include` and `exclude` act asymmetrically. A potential fix is to give these arguments different names (e.g., `override_exclusions` and `exclude_paths`).

2) Stick with just `exclude`, and have any manually specified value override the default internal list (e.g., if you pass `['code', 'sourcedata']`, then things like `'stimuli'` will now be indexed, and *only* files/dirs that match the elements in your list will be skipped). The downside is that users would need to know what the default exclusions are and reproduce them whenever they want to add one of their own, which will probably get messy.

3) Get rid of the current default exclusion list entirely, and treat `exclude` as a strict list of paths to exclude from indexing. Now that the validator is working properly, directories like 'stimuli' will automatically be skipped if `validate=True`, because files won't pass the validator unless they're explicitly part of the spec. The downside of this option is that it makes it difficult to index selectively—e.g., if you want to index only what's in `'stimuli'`, you need to set `validate=False` and then pass a whole pile of exclusions (i.e., everything that doesn't pass the validator *except* for `'stimuli'`).
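The three proposals can be compared side by side in a small sketch. Everything here is hypothetical: the function names, the assumption that default exclusions match only top-level subdirectories of the dataset root, and the `is_valid` callable, which stands in for the validator. Paths are treated as POSIX strings for simplicity.

```python
from pathlib import PurePosixPath

# Illustrative subset of the defaults named in this issue; the real list is longer.
DEFAULT_EXCLUSIONS = ("code", "stimuli", "sourcedata")


def _under(path, directory):
    """True if `path` is `directory` itself or lies somewhere inside it."""
    return path == directory or directory in path.parents


# Proposal 1: separate, explicitly named arguments.
def excluded_p1(path, root, override_exclusions=(), exclude_paths=()):
    path, root = PurePosixPath(path), PurePosixPath(root)
    # exclude_paths may point anywhere on disk, inside or outside the dataset
    if any(_under(path, PurePosixPath(e)) for e in exclude_paths):
        return True
    # Assumption: defaults apply only to top-level subdirectories of the root
    if root not in path.parents:
        return False
    top = path.relative_to(root).parts[0]
    return top in DEFAULT_EXCLUSIONS and top not in override_exclusions


# Proposal 2: an explicit `exclude` list replaces the defaults entirely.
def exclusions_p2(exclude=None):
    # None means "use the defaults"; any explicit list, even [], replaces them
    return list(DEFAULT_EXCLUSIONS) if exclude is None else list(exclude)


# Proposal 3: no defaults; rely on the validator plus a strict `exclude` list.
def files_p3(paths, is_valid, validate=True, exclude=()):
    excluded = [PurePosixPath(e) for e in exclude]
    kept = []
    for p in map(PurePosixPath, paths):
        if validate and not is_valid(str(p)):
            continue  # not part of the spec, so skipped automatically
        if any(_under(p, e) for e in excluded):
            continue  # explicitly excluded by the user
        kept.append(str(p))
    return kept
```

The sketch shows each downside directly. Under proposal 2, adding a single extra exclusion means restating every default, e.g. `exclusions_p2(list(DEFAULT_EXCLUSIONS) + ['scratch'])`. Under proposal 3, indexing only `stimuli/` requires `validate=False` plus an exclusion for every other top-level directory.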

I lean towards (1) (with more explicit argument names). Thoughts? If I don't get any feedback in the next couple of days, I'll make an executive decision in the interest of getting 0.8 merged, so speak up now if you have an opinion! (Tagging in @effigies @adelavega @yarikoptic @gkiar)
