Skip to content
Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
23 changes: 16 additions & 7 deletions bids/layout/layout.py
Original file line number Diff line number Diff line change
Expand Up @@ -69,7 +69,7 @@ class BIDSLayout(Layout):
was left empty (in which case the root defaults to the file system
root).
derivatives (bool, str, list): Specificies whether and/or which
derivatives to to index. If True, all pipelines found in the
derivatives to index. If True, all pipelines found in the
derivatives/ subdirectory will be indexed. If a str or list, gives
the paths to one or more derivatives directories to index. If False
or None, the derivatives/ directory is ignored during indexing, and
Expand Down Expand Up @@ -114,7 +114,10 @@ def __init__(self, root, validate=True, index_associated=True,
"dataset_description.json." % k)

# Determine which subdirectories to exclude from indexing
excludes = {"code", "stimuli", "sourcedata", "models", "derivatives"}
# A set is extended with non-BIDS-defined common suspects
# which are not expected to carry any data of interest
excludes = {"code", "stimuli", "sourcedata", "models", "derivatives",
".git", ".datalad", ".cache", ".workdir", "workdir"}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than hard-coding these, I think it's reasonable to expect users to maintain a .bidsignore file, as these should also cause validator issues.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

> cat .bidsignore
inputs/
workdir/
eyetrack/
README.md
model-frst_smdl.json
CHANGELOG.md

this is what I have in my .bidsignore, files in workdir were still considered.
@yarikoptic remarks: "Would it ever makes sense to consider .git files?"

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree .git should never be considered. I'm not sure .bidsignore reading is implemented. That would be a valuable contribution. But I also think sensible hardcoded defaults (such as these) make sense.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, please update the docstring

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're going to specify sensible defaults, I'd be inclined to disallow all .* files except .bidsignore.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason these particular values are hardcoded is that they're explicitly mentioned in the spec as places people should put stuff, but we don't want to index them by default. The other directories (e.g., .git) are not part of the spec, so I don't want to give them special handling. I we start adding things like workdir/, there's no limit to what people might want to add.

I agree that .bidsignore support is the right way to handle this. Pybids currently passes (in kwargs) an exclude argument onto grabbit, which one can use to pass arbitrary regexes that (I believe) are matched against every file. So probably the easiest way to implement .bidsignore support would be to add entries in .bidsignore to the exclude list. There may be unintended consequences, however, so this should be investigated.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good to me, but I think excluding .* folders by default is probably a good idea (when would it be a good idea to scan them in?) Other folders can go into a bidsignore

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're going to specify sensible defaults, I'd be inclined to disallow all .* files except .bidsignore.

Just curious -- what for to allow/consider .bidsignore? if any function would be needing to treat this this file, it would just test for its presence and open it. It doesn't need it to be a part of the Layout schema/pool of files, since there is nothing to query for in its name.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other directories (e.g., .git) are not part of the spec, so I don't want to give them special handling. I we start adding things like workdir/, there's no limit to what people might want to add.

I agree that .bidsignore support is the right way to handle this.

I agree too! Support of .bidsignore would allow for all those workdir etc. But also I agree with @adelavega -- I do not think in any BIDS spec we would/should allow/be interested in files/directories starting with . for the purpose of paths matching etc.
As I have pointed above, .bidsignore is a special file which is not part of the path layout template matching and actually not a part of the BIDS specification itself! Its existence was instigated within bids-validator, so it somewhat of a tooling level file. So it should IMHO be ok to ignore all . files/dirs in general for pattern matching. I think it might even be worth to "hardcode" such consideration in BIDS specification itself? e.g.

"Note on BIDS dataset tools support/development: No directories or files in BIDS are generally named with leading . and thus should be ignored for the purpose of discovery of relevant data files. A .bidsignore file can be used by dataset users to inform tools of additional directories/files which should also be ignored by the tools. The format of .bidsignore file is .."

Copy link
Copy Markdown
Collaborator

@tyarkoni tyarkoni Nov 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed—let's add "^\." to the list of default excludes (if I'm not mistaken, they're regexes by default) and remove the other entries (i.e., workdir and anything that starts with .). We can add .bidsignore support in a separate PR, unless someone wants to take a pass at it here. It shouldn't be difficult—unless I'm missing something, I think just adding all the lines in that file to the internally-constructed exclude list that gets passed onto grabbit's Layout initializer should do the trick.

I don't have strong feelings about whether or not the spec should mention . and .bidsignore. I guess I lean towards not for the time being; to my mind the place to put that would be some kind of developer guide or appendix.

if include is not None:
include = listify(include)
if "derivatives" in include:
Expand Down Expand Up @@ -563,18 +566,24 @@ def index_file(self, f, overwrite=False):
an entry already exists.
"""
if isinstance(f, six.string_types):
f = self.layout.get_file(f)
filename = self.layout.get_file(f).path
elif isinstance(f, tuple):
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the motivation for this? For methods like this, I'd prefer to minimize the number of valid types that can be passed as input. It's cleaner if the caller has to first extract a string or BIDSFile, I think. Plus, long term, I'd like to get rid of the namedtuple entirely—I think it mostly just adds confusion.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Motivation was pointed to below this line: http://github.com/grabbles/grabbit/pull/84 . In the effort of using fitlins with current pybids we ended up here with the File tuple which has no .path (see that issue for our past discussion). Since our attempts are largely shots in the darkness trying to maneuver between fluid APIs, I did not want to try to change more than minimally necessary and this change was such a minimal change to stay compatible possibly with older and newer pybids API.

Copy link
Copy Markdown
Collaborator

@tyarkoni tyarkoni Nov 7, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yarikoptic How would you feel about getting rid of the namedtuple entirely, and just returning the File object by default? The main functional difference is that entity names are attributes in the tuple, but not in the File class. But for the sake of backward compatibility, I'd be okay with hacking getattr to check in the .entities dict before raising an exception. That way I think nobody's existing code would break, and we would reduce code complexity. @effigies @adelavega feel free to chime in also.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see no problems with it, as long as its backwards compatible. It does seem a bit redundant and confusing two have two types of objects that can be returned.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've opened #281 to deal with this.

# See http://github.com/grabbles/grabbit/pull/84
# .path is not available within File named tuple
filename = f.filename
else:
filename = f

if f.path in self.file_index and not overwrite:
if filename in self.file_index and not overwrite:
return

md = self._get_metadata(f.path)
md = self._get_metadata(filename)

for md_key, md_val in md.items():
if md_key not in self.key_index:
self.key_index[md_key] = {}
self.key_index[md_key][f.path] = md_val
self.file_index[f.path][md_key] = md_val
self.key_index[md_key][filename] = md_val
self.file_index[filename][md_key] = md_val

def _get_metadata(self, path, **kwargs):
potential_jsons = self.layout._get_nearest_helper(path, '.json',
Expand Down