Healthchecks are grouped in 5 categories:
- 'Hard' integrity: entity and referential integrity.
- 'Soft' integrity: user-defined integrity.
- Schema-related sanity: unique rows, blank & NULL values etc.
- Data-related sanity: Data format & values.
- Comparison of databases and/or database versions.
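To make the 'hard' integrity category concrete, here is a minimal sketch of a referential-integrity check: it counts child rows whose foreign key points at no parent row. It is written in Python against an in-memory SQLite database purely for illustration. It is not the planned Perl/Ensembl API implementation, and the toy `gene`/`transcript` schema and the helper names `count_orphans` and `demo_orphans` are assumptions made for this example.

```python
import sqlite3


def count_orphans(conn, child_table, child_col, parent_table, parent_col):
    """Count rows in child_table whose child_col matches no row in parent_table."""
    # Table and column names are interpolated into the SQL, so they must
    # come from trusted code, never from user input.
    sql = (
        f"SELECT COUNT(*) FROM {child_table} c "
        f"LEFT JOIN {parent_table} p ON c.{child_col} = p.{parent_col} "
        f"WHERE c.{child_col} IS NOT NULL AND p.{parent_col} IS NULL"
    )
    return conn.execute(sql).fetchone()[0]


def demo_orphans():
    """Build a toy schema (hypothetical data) with one dangling foreign key."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE gene (gene_id INTEGER PRIMARY KEY);
        CREATE TABLE transcript (
            transcript_id INTEGER PRIMARY KEY,
            gene_id INTEGER
        );
        INSERT INTO gene VALUES (1), (2);
        -- transcript 12 refers to gene 99, which does not exist
        INSERT INTO transcript VALUES (10, 1), (11, 2), (12, 99);
        """
    )
    return count_orphans(conn, "transcript", "gene_id", "gene", "gene_id")


if __name__ == "__main__":
    print("orphaned transcript rows:", demo_orphans())
```

A healthcheck built this way would pass when the count is zero and fail otherwise, listing the offending table pair in its report.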
A set of 'old' healthchecks will be adapted into Perl using the Ensembl API. For each of the above categories, a number of old healthchecks have been chosen to demonstrate the concept. Further development and optimisation will follow.
Initial set of tests to be developed (listed by their old names; all can be found in https://github.com/Ensembl/ensj-healthcheck/tree/release/83/src/org/ensembl/healthcheck/testcase/generic ):
Category 1
- CoreForeignKeys
- AncestralSequencesExtraChecks (tests on the compara db; what was it doing in the generic repository?)
Category 2
- AssemblyMapping
- AssemblyMultipleOverlap
- FeatureCoords
- LRG
- Meta
- ProjectXrefs
- SeqRegionCoordSystem
- SequenceLevel
- VariationDensity
- XrefTypes
Category 3
- AutoIncrement
- BlanksInsteadOfNulls
- SchemaType
- StableID
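As an illustration of the schema-related sanity category, the sketch below counts empty strings in a column where NULL would be expected, which is what a check named BlanksInsteadOfNulls presumably looks for (judging by its name; the exact behaviour of the old test is not described here). It uses Python with an in-memory SQLite database rather than Perl and the Ensembl API. The toy `gene` table and the helper names `count_blanks` and `demo_blanks` are assumptions made for this example.

```python
import sqlite3


def count_blanks(conn, table, column):
    """Count rows where column holds an empty string rather than NULL."""
    # Identifiers are interpolated into the SQL; pass trusted names only.
    # NULL values are not counted, because NULL = '' is never true in SQL.
    sql = f"SELECT COUNT(*) FROM {table} WHERE {column} = ''"
    return conn.execute(sql).fetchone()[0]


def demo_blanks():
    """Toy table (hypothetical data): two blank descriptions and one NULL."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE gene (gene_id INTEGER PRIMARY KEY, description TEXT);
        INSERT INTO gene VALUES (1, 'kinase'), (2, ''), (3, NULL), (4, '');
        """
    )
    return count_blanks(conn, "gene", "description")


if __name__ == "__main__":
    print("blank descriptions:", demo_blanks())
```

In a real healthcheck the same query would be run for every nullable text column, and any non-zero count would be reported as a failure.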
Category 4
- AssemblyNameLength
- DataFiles
- GeneCount
- NonGTACNSequency
- XrefPrefixes
Category 5
- ComparePreviousDatabases
- CoordSystemAcrossSpecies
- MySQLStorageEngine
- ProductionMeta (or another Production test)
- SeqRegionAcrossSpecies & SeqRegionAttribAcrossSpecies
(Non-ensembl modules used: Moose, Test::CSV, Getopt::Long, File::Spec)