WIP: Input format checks #790

Draft
Adamtaranto wants to merge 1 commit into main from input_file_checks

@Adamtaranto (Collaborator) commented Jun 28, 2025

Adds checks for common input formatting errors in blocks, bed, and layout files.

Fixes #788 (issue where cols are not separated by tabs)
Fixes #720

atoms = ["." if x == "" else x for x in atoms]

# Check atoms for spaces
if any(" " in x for x in atoms):

Collaborator, Author:

@tanghaibao are there any cases where a valid gene name might contain a space? Is this too harsh?

Owner:

This is probably fine. I've never seen a gene name with a space in it.
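
For illustration, the space check discussed above could be factored into a small helper. This is a minimal sketch, not the code in this PR: the function name `find_atoms_with_spaces` and the sample rows are hypothetical, and it assumes rows are tab-delimited as in a blocks file.

```python
def find_atoms_with_spaces(row):
    """Return the tab-separated fields of `row` that contain a space."""
    atoms = row.rstrip("\n").split("\t")
    # Mirror the snippet under review: empty fields become "."
    atoms = ["." if x == "" else x for x in atoms]
    return [x for x in atoms if " " in x]


# A well-formed, tab-delimited row yields no offending fields
print(find_atoms_with_spaces("geneA\tgeneB\t\n"))  # []
# A row delimited by spaces instead of tabs shows up as one bad field
print(find_atoms_with_spaces("geneA geneB geneC\n"))  # ['geneA geneB geneC']
```

Returning the offending fields, rather than a bare boolean, would let the caller name them in the warning or error message.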

Comment on lines +105 to 121
# Truncate or pad the atoms to match ncols
if len(atoms) > ncols:
    logger.warning(
        "Row length exceeds expected columns in blocks file, truncating to {} at line {}:\n {}".format(
            ncols, row_count, row.strip()
        )
    )
    # Truncate to ncols
    atoms = atoms[:ncols]
elif len(atoms) < ncols:
    logger.warning(
        "Row length less than expected columns in file `{}`, padding to {} cols at line {}:\n{}".format(
            filename, ncols, row_count, row.strip()
        )
    )
    # Pad with "." to match ncols
    atoms = atoms + ["."] * (ncols - len(atoms))

Collaborator, Author:

@tanghaibao I've added some warnings around the existing behaviour (pad or truncate).

I think it would be cleaner to raise an error if any line has a column count that differs from that of the first row (which is taken as the correct number).

Owner:

Thanks @Adamtaranto. I'll leave the choice to you. My preference is to simply warn, since I don't want to break people's scripts that are already working for them.

Also, we may need some tests around this at some point if we do choose to raise an error. No pressure though.
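
One way to reconcile the two positions above (warn by default, raise on request) is an opt-in strict mode. The following is only a sketch under assumptions: the function `normalize_atoms` and its `strict` parameter are hypothetical and do not exist in this PR, and the message text is illustrative.

```python
import logging

logger = logging.getLogger(__name__)


def normalize_atoms(atoms, ncols, row_count, strict=False):
    """Pad or truncate `atoms` to `ncols`; raise instead when `strict` is True."""
    if len(atoms) == ncols:
        return atoms
    msg = "Expected {} columns but found {} at line {}".format(
        ncols, len(atoms), row_count
    )
    if strict:
        # Hypothetical opt-in behaviour: fail fast on malformed rows
        raise ValueError(msg)
    # Default keeps the existing pad-or-truncate behaviour, with a warning
    logger.warning(msg)
    if len(atoms) > ncols:
        return atoms[:ncols]
    return atoms + ["."] * (ncols - len(atoms))


print(normalize_atoms(["a", "b"], 3, row_count=1))  # ['a', 'b', '.']
```

Because the default path is unchanged, existing scripts would keep working, and each branch (pad, truncate, raise) is small enough to cover with a direct unit test.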

@Adamtaranto Adamtaranto marked this pull request as draft June 28, 2025 05:16
@Adamtaranto Adamtaranto self-assigned this Jun 28, 2025
@Adamtaranto (Collaborator, Author):

See issue #735: reading a layout file stops if the first line is blank. Add a catch for this.
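
A possible shape for that catch is to skip blank lines while keeping the original line numbers for error messages. This is a sketch only: `iter_nonblank_lines` is a hypothetical helper, not part of the library, and the sample text below is made up for the demonstration.

```python
import io


def iter_nonblank_lines(handle):
    """Yield (line_number, line) for every non-blank line in `handle`."""
    for line_number, line in enumerate(handle, start=1):
        if not line.strip():
            continue  # skip blank lines instead of stopping at the first one
        yield line_number, line.rstrip("\n")


# Made-up sample input whose first line is blank
sample = io.StringIO("\n# header\nvalue1, value2\n")
print(list(iter_nonblank_lines(sample)))
# [(2, '# header'), (3, 'value1, value2')]
```

Keeping the 1-based line number from the original file means later warnings can still point at the right row even after blank lines are dropped.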

Development

Successfully merging this pull request may close these issues:

jcvi.graphics.karyotype seqids layout: IDs can not found in bed files
Add format tests for synteny Layout
