ken farmer edited this page Jul 29, 2015 · 2 revisions

Administrator easily checks new data every day for problems

  • Checks for 10 types of problems:
    • relationship (primary/foreign key)
    • uniqueness (primary keys, unique constraints)
    • simple logic (e.g., end_date < start_date)
    • formatting (e.g., an IP address of 300.300.300.300)
    • consistency between base & aggregate tables
    • consistency between identical tables on two clusters
    • consistency between source & target tables on separate hosts
    • security policy (correct privileges for a table or HDFS folder)
    • data management policy (stats age, table names)
    • ? (tenth type not yet identified)
  • Checks can be written in several ways:
    • as any executable program placed in a folder dedicated to a table
    • in Python, as a module dedicated to a table and stored in that table's folder
    • in Python, as a reusable module that is then referenced in a table's test config
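Several of the problem types listed above come down to simple row-level tests. The following is a minimal sketch in Python of what reusable checks for some of them might look like, assuming rows arrive as a list of dicts. The function names and column names are hypothetical illustrations, not part of this project.

```python
import ipaddress
from collections import Counter


def find_duplicate_keys(rows, key_cols):
    """Uniqueness: return key tuples that appear more than once."""
    counts = Counter(tuple(row[c] for c in key_cols) for row in rows)
    return [key for key, n in counts.items() if n > 1]


def find_orphans(child_rows, fk_col, parent_keys):
    """Relationship: child rows whose foreign key has no matching parent key."""
    parent_keys = set(parent_keys)
    return [row for row in child_rows if row[fk_col] not in parent_keys]


def find_bad_date_ranges(rows, start_col="start_date", end_col="end_date"):
    """Simple logic: rows where the end date falls before the start date.

    Works with datetime.date values or ISO-8601 strings (YYYY-MM-DD),
    since both compare chronologically.
    """
    return [row for row in rows if row[end_col] < row[start_col]]


def find_bad_ips(rows, ip_col):
    """Formatting: rows whose value is not a valid IPv4 or IPv6 address."""
    bad = []
    for row in rows:
        try:
            ipaddress.ip_address(row[ip_col])
        except ValueError:
            # e.g. "300.300.300.300" lands here: octets must be 0-255
            bad.append(row)
    return bad


def compare_base_to_aggregate(base_rows, agg_rows, group_col, value_col, total_col):
    """Base vs. aggregate consistency: each aggregate row's total must equal
    the sum of value_col over the matching base rows.

    Returns {group: (aggregate_total, expected_total)} for mismatches only.
    Groups present in the base table but missing from the aggregate are
    not reported by this simple sketch.
    """
    sums = Counter()
    for row in base_rows:
        sums[row[group_col]] += row[value_col]
    mismatches = {}
    for row in agg_rows:
        expected = sums.get(row[group_col], 0)
        if row[total_col] != expected:
            mismatches[row[group_col]] = (row[total_col], expected)
    return mismatches
```

Each function returns the offending rows or keys rather than a bare pass/fail, so a report can show the administrator exactly which data failed.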

Administrator easily adds new tests to the checks
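If checks can be "any executable program within a folder dedicated to a table", then adding a test could be as simple as dropping a new executable into that folder. The sketch below shows one way a runner might discover and run such checks. The page does not say how a check reports its result, so this assumes a hypothetical convention (exit status 0 means pass, anything else means a problem was found, and stdout carries details); `run_table_checks` and the `<checks_root>/<table>/` layout are likewise illustrative names, not this project's actual API.

```python
import os
import subprocess
from pathlib import Path


def run_table_checks(checks_root, table):
    """Run every executable file in <checks_root>/<table>/ and collect results.

    Hypothetical convention: exit status 0 = check passed, non-zero = problem
    found; the program's stdout is kept as the detail message.
    Returns {check_name: (passed, stdout)}.
    """
    table_dir = Path(checks_root) / table
    results = {}
    for path in sorted(table_dir.iterdir()):
        # Only regular files with an execute bit count as checks, so notes
        # or config files can live in the same folder without being run.
        if path.is_file() and os.access(path, os.X_OK):
            proc = subprocess.run([str(path)], capture_output=True, text=True)
            results[path.name] = (proc.returncode == 0, proc.stdout.strip())
    return results
```

Because discovery is based only on the folder contents, no central registry needs editing when a test is added; the trade-off is that the pass/fail convention must be agreed on by every check author.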

End users see data annotations alongside regular data on charts & graphs

End users investigate data anomalies using Inspector
