    atoms = ["." if x == "" else x for x in atoms]

    # Check atoms for spaces
    if any(" " in x for x in atoms):
@tanghaibao are there any cases where a valid gene name might contain a space? Is this too harsh?
this is probably fine. I've never seen a gene name with a space in it.
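If the check ever turns out to be too strict, one option is to report which fields contain spaces and let the caller decide whether to warn or raise. A minimal sketch of that idea follows; the helper name `atoms_with_spaces` is hypothetical and not part of this PR.

```python
def atoms_with_spaces(atoms):
    """Return (index, atom) pairs for fields that contain a space.

    Hypothetical helper for illustration only; not part of the PR.
    """
    return [(i, x) for i, x in enumerate(atoms) if " " in x]


# A row split on tabs that still has a space inside one field
row = "geneA\tgene B\t."
offenders = atoms_with_spaces(row.split("\t"))
if offenders:
    print("Fields containing spaces:", offenders)
```

Returning the offending fields rather than a bare boolean makes it easier to write a message that points at the exact column.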
    # Truncate or pad the atoms to match ncols
    if len(atoms) > ncols:
        logger.warning(
            "Row length exceeds expected columns in blocks file, truncating to {} at line {}:\n {}".format(
                ncols, row_count, row.strip()
            )
        )
        # Truncate to ncols
        atoms = atoms[:ncols]
    elif len(atoms) < ncols:
        logger.warning(
            "Row length less than expected columns in file `{}`, padding to {} cols at line {}:\n{}".format(
                filename, ncols, row_count, row.strip()
            )
        )
        # Pad with "." to match ncols
        atoms = atoms + ["."] * (ncols - len(atoms))
@tanghaibao I've added some warnings around the existing behaviour (pad or truncate).
I think it would be cleaner to raise an error if any line has a column count that differs from the first row (taken as correct number).
thanks @Adamtaranto . I'll leave the choice to you, my preference is to simply warn since I don't feel like breaking ppl's script that's already working for them.
Also we may need some tests around this at some point if we do choose to raise error. No pressure though.
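For discussion, here is a rough sketch of how both behaviours could live behind one switch: warn and pad/truncate by default, and raise only when asked. The function name `fit_to_ncols` and the `strict` flag are assumptions made for illustration, not code from this PR.

```python
import logging

logger = logging.getLogger(__name__)


def fit_to_ncols(atoms, ncols, row_count, strict=False):
    """Pad or truncate `atoms` to `ncols` fields, or raise if `strict`.

    Hypothetical helper; `strict` is an assumed opt-in flag.
    """
    if len(atoms) == ncols:
        return atoms
    msg = "Expected {} columns but found {} at line {}".format(
        ncols, len(atoms), row_count
    )
    if strict:
        raise ValueError(msg)
    logger.warning(msg)
    if len(atoms) > ncols:
        return atoms[:ncols]  # truncate extra fields
    return atoms + ["."] * (ncols - len(atoms))  # pad with "."
```

Keeping the default at `strict=False` would preserve the current pad/truncate behaviour for existing scripts, and a function of this shape is straightforward to cover with a few unit tests.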
See issue #735: reading a layout file stops if the first line is blank. Add a catch for this.
Adds checks for common input file formatting errors in blocks, bed, and layout files.
Fixes #788 (issue where cols are not separated by tabs)
Fixes #720
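As an illustration of the blank-first-line case from #735, a minimal sketch of skipping leading blank lines before parsing is shown below. The helper `first_nonblank` is hypothetical and does not reflect how the layout reader in this PR is actually structured.

```python
def first_nonblank(lines):
    """Return (line_number, text) of the first non-blank line, or None.

    Hypothetical helper for illustration; line numbers are 1-based.
    """
    for lineno, line in enumerate(lines, start=1):
        if line.strip():
            return lineno, line.rstrip("\n")
    return None


# A layout-like input that starts with blank lines
sample = ["\n", "   \n", "a\tb\tc\n"]
print(first_nonblank(sample))
```

Returning the line number alongside the text keeps later warnings and errors pointing at the right line of the original file.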