ensembl-datacheck-py


A genomics data checking plugin


Features

  • TODO

Creating a new plugin

Define the tests in a file within the tests directory (e.g. tests/fasta.py). Describe the tests in a docstring at the top of the file, for example:

"""
fasta.py

This tests for proper formatting of a fasta file. See: https://zhanggroup.org/FASTA/
The tests are:
File is a text file
Line length is under 80 char (warning only)
Only has allowed characters
Print allowed type
Ensure the file ends properly
"""

Then write a pytest test function for each test:

import warnings

# check_line_length is one of the shared helpers kept in
# src/ensembl/datacheck_functions; import it from the module that defines it.


def test_check_line_length(file_path, max_length=80):
    """Warn about every line longer than max_length."""
    line_warnings = check_line_length(file_path, max_length)
    for warning in line_warnings:
        warnings.warn(warning, UserWarning)

Write the check functions themselves separately and store them in src/ensembl/datacheck_functions. Keep them as generic as is reasonable so that as many tests as possible can reuse them. They are grouped into files by purpose:

content_checks.py : Data checks within a text file

db_checks.py : Checks of MySQL databases (not implemented yet)

file_checks.py : System level checks of files

utils.py : Other checks or special commands.
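
To illustrate the split between tests and shared functions, here is a minimal sketch of what a helper like check_line_length could look like. It is not the repository's implementation: the return type (a list of warning strings) is inferred from how test_check_line_length iterates over the result, and placing the function in content_checks.py is an assumption.

```python
# Hypothetical sketch of a shared helper, e.g. in
# src/ensembl/datacheck_functions/content_checks.py (location assumed).
from pathlib import Path
from typing import List, Union


def check_line_length(file_path: Union[str, Path], max_length: int = 80) -> List[str]:
    """Return one warning message per line longer than max_length."""
    line_warnings = []
    with open(file_path, "r", encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            # Ignore the trailing newline when measuring the line.
            length = len(line.rstrip("\r\n"))
            if length > max_length:
                line_warnings.append(
                    f"Line {line_number} has {length} characters (limit {max_length})"
                )
    return line_warnings
```

Because the helper only returns messages and leaves the decision to warn or fail to the caller, the same function can back a warning-only test or a stricter test that asserts the list is empty.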

To run your test, pass the name of its file, without the extension, to --test. For example:

ensembl-datacheck --test=fasta --file=~/TEST/2pass.fasta
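
One way the --test value could be mapped to a test file is sketched below. This is an assumption about the behaviour described above, not the project's actual command-line code: the function name resolve_arguments and the tests_dir default are hypothetical. The sketch also expands a leading ~ in --file, in case the shell passes it through unexpanded.

```python
# Hypothetical sketch: map --test=<name> to tests/<name>.py.
import argparse
from pathlib import Path
from typing import List, Tuple


def resolve_arguments(argv: List[str], tests_dir: str = "tests") -> Tuple[Path, Path]:
    """Return (test module path, data file path) for the given arguments."""
    parser = argparse.ArgumentParser(prog="ensembl-datacheck")
    parser.add_argument("--test", required=True, help="test file name without extension")
    parser.add_argument("--file", required=True, help="file to check")
    args = parser.parse_args(argv)

    # The test is named after its file, so add the extension back.
    test_module = Path(tests_dir) / f"{args.test}.py"
    # Expand a leading "~" ourselves rather than relying on the shell.
    data_file = Path(args.file).expanduser()
    return test_module, data_file
```

Under these assumptions, --test=fasta resolves to tests/fasta.py, which matches the naming rule described above.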

Installation

Clone the repository and install it (a virtual environment is recommended):

git clone (insert repo here)
pip install ensembl-datacheck-py

Usage

To run a datacheck, name the test and the file to check:

ensembl-datacheck --test=fasta --file=~/TEST/2pass.fasta

To Do

  • More tests!
  • Confluence Page
  • Publish it
  • Introduce tests for tests

Contributing

Contributions are very welcome.

License

Distributed under the terms of the Apache Software License 2.0, "ensembl-datacheck-py" is free and open source software.

Issues

If you encounter any problems, please file an issue along with a detailed description.
