danstaines/Ensembl_thesis

Ensembl_thesis

Contains the new healthchecks in Perl for the Ensembl database.

Healthchecks are grouped in 5 categories:

  1. 'Hard' integrity: entity and referential integrity.
  2. 'Soft' integrity: user-defined integrity.
  3. Schema-related sanity: unique rows, blank & NULL values etc.
  4. Data-related sanity: Data format & values.
  5. Comparison of databases and/or database versions.
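
To make category 1 concrete, here is a minimal sketch of a referential-integrity check. It is not the repository's implementation: it assumes in-memory rows instead of a live database or the Ensembl API, the data is made up, and the helper name `find_orphans` is hypothetical. The `gene` and `transcript` tables and the `gene_id` column are used only as a familiar illustration.

```perl
use strict;
use warnings;

# Hypothetical in-memory stand-ins for two tables; a real healthcheck
# would read these rows from the database instead.
my @gene       = ( { gene_id => 1 }, { gene_id => 2 } );
my @transcript = (
    { transcript_id => 10, gene_id => 1 },
    { transcript_id => 11, gene_id => 2 },
    { transcript_id => 12, gene_id => 3 },    # orphan: no gene with id 3
);

# Return the child rows whose foreign key has no matching parent key.
sub find_orphans {
    my ( $children, $fk, $parents, $pk ) = @_;
    my %known;
    $known{ $_->{$pk} } = 1 for @$parents;
    return grep { !$known{ $_->{$fk} } } @$children;
}

my @orphans = find_orphans( \@transcript, 'gene_id', \@gene, 'gene_id' );
for my $row (@orphans) {
    printf "transcript %d references missing gene %d\n",
        $row->{transcript_id}, $row->{gene_id};
}
# prints: transcript 12 references missing gene 3
```

Against a real database the same idea is usually expressed as a LEFT JOIN that selects child rows with no matching parent.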

A set of 'old' healthchecks will be adapted into Perl using the Ensembl API. For each of the above categories, a number of old healthchecks have been chosen to demonstrate the concept. Further development and optimisation will follow.

Initial set of tests to be developed (listed by their old names; all can be found in https://github.com/Ensembl/ensj-healthcheck/tree/release/83/src/org/ensembl/healthcheck/testcase/generic):

Category 1

  • CoreForeignKeys
  • AncestralSequencesExtraChecks - runs tests on a Compara database; it is unclear why it was in the generic repository.

Category 2

  • AssemblyMapping
  • AssemblyMultipleOverlap
  • FeatureCoords
  • LRG
  • Meta
  • ProjectXrefs
  • SeqRegionCoordSystem
  • SequenceLevel
  • VariationDensity
  • XrefTypes

Category 3

  • AutoIncrement
  • BlanksInsteadOfNulls
  • SchemaType
  • StableID
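
As an illustration of a schema-related sanity check, the sketch below counts columns that hold empty strings where NULL would be expected. It assumes, based only on the test's name, that this is what a blanks-versus-NULLs check looks for. The rows, the column names and the helper `blank_columns` are hypothetical. NULL is represented as `undef`, which is how DBI returns it.

```perl
use strict;
use warnings;

# Hypothetical rows, shaped like the hashrefs DBI returns
# (NULL values arrive as undef).
my @rows = (
    { gene_id => 1, description => 'kinase', display_label => 'ABC1' },
    { gene_id => 2, description => '',       display_label => undef },
    { gene_id => 3, description => '',       display_label => '' },
);

# Count, per column, the values that are defined but empty.
sub blank_columns {
    my (@rows) = @_;
    my %count;
    for my $row (@rows) {
        for my $col ( keys %$row ) {
            my $value = $row->{$col};
            $count{$col}++ if defined $value && $value eq '';
        }
    }
    return %count;
}

my %blank = blank_columns(@rows);
print "$_: $blank{$_} blank value(s)\n" for sort keys %blank;
# prints:
# description: 2 blank value(s)
# display_label: 1 blank value(s)
```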

Category 4

  • AssemblyNameLength
  • DataFiles
  • GeneCount
  • NonGTACNSequence
  • XrefPrefixes
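
A data-related sanity check can often be written as a pure function over a value. The sketch below is in the spirit of the non-GTACN sequence test: it reports every character in a sequence that is not G, T, A, C or N. The exact rule used by the original check is an assumption inferred from its name, and the function name `non_gtacn_positions` and the sample sequence are hypothetical.

```perl
use strict;
use warnings;

# Return a list of [position, character] pairs for every character that
# is not one of G, T, A, C or N (case-insensitive). Positions are 1-based.
sub non_gtacn_positions {
    my ($sequence) = @_;
    my @bad;
    while ( $sequence =~ /([^GTACN])/gi ) {
        # After a one-character match, pos() is the 1-based position.
        push @bad, [ pos($sequence), $1 ];
    }
    return @bad;
}

my @bad = non_gtacn_positions('GATTACANNXGA-T');
print "position $_->[0]: '$_->[1]'\n" for @bad;
# prints:
# position 10: 'X'
# position 13: '-'
```

A real healthcheck would fetch the sequences from the database, and would probably count offending characters per sequence instead of listing each one.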

Category 5

  • ComparePreviousDatabases
  • CoordSystemAcrossSpecies
  • MySQLStorageEngine
  • ProductionMeta (or another Production test)
  • SeqRegionAcrossSpecies & SeqRegionAttribAcrossSpecies

(Non-Ensembl modules used: Moose, Test::CSV, Getopt::Long, File::Spec)
