A genomics data checking plugin
Define the tests in a file within the tests directory (e.g. tests/fasta.py). Describe the tests in a module docstring, for example:
    """
    fasta.py

    This tests for proper formatting of a fasta file.
    See: https://zhanggroup.org/FASTA/

    The tests are:
        File is a text file
        Line length is under 80 char (warning only)
        Only has allowed characters
        Print allowed type
        Ensure the file ends properly
    """
Then write a pytest function for each test listed in the docstring:
    import warnings

    # check_line_length is one of the shared functions kept in
    # src/ensembl/datacheck_functions.
    def test_check_line_length(file_path, max_length=80):
        """Check for lines longer than max_length and emit a warning for each."""
        line_warnings = check_line_length(file_path, max_length)
        if line_warnings:
            for warning in line_warnings:
                warnings.warn(warning, UserWarning)
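The test above relies on a check_line_length helper that is not shown here. A minimal sketch of what such a helper could look like follows, assuming it takes a file path and returns a list of warning strings. The message wording and the use of pathlib are illustrative assumptions, not the project's actual implementation.

```python
from pathlib import Path


def check_line_length(file_path, max_length=80):
    """Return one warning message per line longer than max_length.

    Hypothetical sketch: the real helper in datacheck_functions may differ.
    """
    line_warnings = []
    # expanduser() lets paths such as ~/TEST/2pass.fasta work.
    with Path(file_path).expanduser().open("r", encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            # Do not count the line terminator towards the length.
            length = len(line.rstrip("\r\n"))
            if length > max_length:
                line_warnings.append(
                    f"Line {line_number} has {length} characters "
                    f"(limit {max_length})"
                )
    return line_warnings
```

Returning messages instead of raising keeps the helper generic: the calling test decides whether a long line is a warning or a failure.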
Write the check functions independently of the tests and store them in src/ensembl/datacheck_functions. Keep these functions as generic as is reasonable so that as many tests as possible can reuse them. They are grouped into files by purpose:
- content_checks.py : Data checks within a text file
- db_checks.py : Checks of MySQL databases (not implemented yet)
- file_checks.py : System-level checks of files
- utils.py : Other checks or special commands
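As an illustration of a generic, reusable check, here is a sketch of a content check that reports characters outside an allowed set. The function name find_disallowed_characters and its parameters are hypothetical and do not come from the repository. It accepts any iterable of lines, so it is not tied to one file format.

```python
def find_disallowed_characters(lines, allowed, skip_prefix=None):
    """Return (line_number, character) pairs for characters not in `allowed`.

    Hypothetical sketch of a generic content check.
    Lines starting with `skip_prefix` (e.g. ">" for FASTA headers) are ignored.
    """
    allowed_set = set(allowed)
    problems = []
    for line_number, line in enumerate(lines, start=1):
        text = line.rstrip("\r\n")
        if skip_prefix and text.startswith(skip_prefix):
            continue
        # Report each offending character once per line, in sorted order.
        for char in sorted(set(text) - allowed_set):
            problems.append((line_number, char))
    return problems


# Example: a FASTA-like input with two bad characters on the third line.
print(find_disallowed_characters(
    [">seq1", "ACGTN", "ACXT!"], allowed="ACGTN", skip_prefix=">"
))
# prints [(3, '!'), (3, 'X')]
```

Because the allowed alphabet and the header prefix are parameters, the same function could serve several tests with different settings.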
To run your test, pass the name of its file, without the extension, to --test=. For example:
ensembl-datacheck --test=fasta --file=~/TEST/2pass.fasta
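How the plugin maps the --test value to a file is not shown here. One plausible approach, sketched under the assumption that each test lives at tests/NAME.py, is the lookup below. The function name resolve_test_file is hypothetical.

```python
from pathlib import Path


def resolve_test_file(test_name, tests_dir="tests"):
    """Map a --test value such as 'fasta' to a path such as tests/fasta.py.

    Hypothetical sketch: the actual lookup used by the plugin may differ.
    """
    candidate = Path(tests_dir) / f"{test_name}.py"
    if not candidate.is_file():
        raise FileNotFoundError(f"No test file found at {candidate}")
    return candidate
```

Failing early with a clear message when the file is missing helps users who mistype a test name.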
Clone the repository and install the package (a virtual environment is recommended):
git clone (insert repo here)
pip install ensembl-datacheck-py
To run a datacheck, call the command with a test name and a file:
ensembl-datacheck --test=fasta --file=~/TEST/2pass.fasta
- More tests!
- Confluence Page
- Publish it
- Introduce tests for tests
Contributions are very welcome.
Distributed under the terms of the Apache Software License 2.0, "ensembl-datacheck-py" is free and open source software.
If you encounter any problems, please file an issue along with a detailed description.