medizininformatik-initiative/dqLib

dqLib

dqLib is an R package for data quality assessment and reporting. dqLib provides methods for calculating data quality metrics and generating reports on detected data quality issues, especially in CORD-MI.

Acknowledgement: This work was done within the “Collaboration on Rare Diseases” of the Medical Informatics Initiative (CORD-MI) funded by the German Federal Ministry of Education and Research (BMBF), under grant number: 01ZZ1911R, FKZ-01ZZ1911R

Data Quality Metrics and Reports

  • dqLib provides functions for creating specific reporting scripts that enable users to select the desired data quality dimensions and indicators. The resulting reports provide enough information to locate data quality violations and identify their causes.

  • The following data quality dimensions, indicators and parameters are implemented:

    | Dimension | Data Quality Indicator Name |
    |---|---|
    | completeness | item completeness rate, value completeness rate, orphaCoding completeness rate |
    | plausibility | orphaCoding plausibility rate, range plausibility rate |
    | uniqueness | RD case unambiguity rate, RD case dissimilarity rate |
    | concordance | concordance of RD cases, concordance of tracer cases |

    | No. | Data Quality Parameter Name | Description |
    |---|---|---|
    | P1 | missing data items | number of missing data items per year |
    | P2 | mandatory data items | number of mandatory items per year |
    | P3 | missing data values | number of missing data values per year |
    | P4 | available data values | number of available data values per year |
    | P5 | missing orphacodes | number of missing Orphacodes per year |
    | P6 | tracer diagnoses | number of tracer RD diagnoses per year |
    | P7 | implausible links | number of implausible code links per year |
    | P8 | checked for outliers | number of data values checked for outliers per year |
    | P9 | outliers | number of detected outliers per year |
    | P10 | ambiguous RD cases | number of ambiguous RD cases per year |
    | P11 | RD cases | number of RD cases per year |
    | P13 | duplicated RD cases | number of duplicated RD cases per year |
    | P14 | tracer cases | number of tracer RD cases per year |
    | P15 | inpatient cases | number of inpatient cases per year |
    | P16 | RD cases rel. frequency | relative frequency of inpatient RD cases per year |
    | P17 | tracer cases rel. frequency | relative frequency of inpatient tracer RD cases per year |
    | P18 | available cases | number of available cases per year |
    | P19 | available patients | number of available patients per year |
    | P20 | orphacodes | number of available Orphacodes per year |
    | P21 | orpha-coded cases | number of available orpha-coded cases per year |
    | P22 | unambiguous RD cases | number of unambiguous RD cases per year |
  • The following references are required to assess the quality of RD documentation:


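To illustrate how yearly parameters such as P1 to P4 might feed a completeness indicator, here is a minimal sketch in base R. It does not use dqLib, and the formula is an assumption: each rate is taken as 1 minus the share of missing entries, expressed as a percentage, with "available data values" read as the non-missing values. The helper `dq_rate`, the column names and the counts are hypothetical; dqLib's own definitions may differ.

```r
# Hypothetical helper (not part of dqLib): rate in percent = 100 * (1 - missing / total).
dq_rate <- function(n_missing, n_total) {
  stopifnot(length(n_missing) == length(n_total))
  # Return NA where nothing was expected, instead of dividing by zero.
  ifelse(n_total > 0, round(100 * (1 - n_missing / n_total), 2), NA_real_)
}

# Made-up yearly counts, named after parameters P1 to P4.
params <- data.frame(
  year             = c(2020, 2021),
  missing_items    = c(1, 0),     # P1
  mandatory_items  = c(10, 10),   # P2
  missing_values   = c(50, 0),    # P3
  available_values = c(950, 800)  # P4 (assumed to count non-missing values)
)

# Assumed formula: missing items relative to mandatory items.
params$item_completeness_rate <- dq_rate(
  params$missing_items,
  params$mandatory_items
)

# Assumed formula: missing values relative to all expected values.
params$value_completeness_rate <- dq_rate(
  params$missing_values,
  params$missing_values + params$available_values
)

print(params[, c("year", "item_completeness_rate", "value_completeness_rate")])
```

Returning `NA` for a year with no expected entries keeps an undefined rate from being reported as 0% or 100%.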
Installation

You can install dqLib from a local folder with:

devtools::install_local("./dqLib")

You can also install it directly from GitHub with:

devtools::install_github("https://github.com/medizininformatik-initiative/dqLib")

Example

Here are examples for data quality analysis and reporting using this package

Note

The default data quality dimensions are completeness, plausibility, uniqueness and concordance. However, this framework allows the user to select the desired quality dimensions and indicators and to generate user-defined DQ reports.
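As a rough illustration of what selecting dimensions could look like, the sketch below filters a small indicator catalogue by dimension in base R. The catalogue is transcribed from the dimension and indicator list in this README. The helper `select_indicators` is hypothetical and not part of dqLib, whose actual selection mechanism is not shown here.

```r
# Indicator catalogue transcribed from the dimension/indicator list in this README.
indicators <- data.frame(
  dimension = c(
    rep("completeness", 3), rep("plausibility", 2),
    rep("uniqueness", 2), rep("concordance", 2)
  ),
  indicator = c(
    "item completeness rate", "value completeness rate",
    "orphaCoding completeness rate",
    "orphaCoding plausibility rate", "range plausibility rate",
    "RD case unambiguity rate", "RD case dissimilarity rate",
    "concordance of RD cases", "concordance of tracer cases"
  ),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of dqLib): keep only the requested dimensions.
select_indicators <- function(catalogue, dimensions) {
  unknown <- setdiff(dimensions, unique(catalogue$dimension))
  if (length(unknown) > 0) {
    stop("Unknown dimension(s): ", paste(unknown, collapse = ", "))
  }
  catalogue[catalogue$dimension %in% dimensions, , drop = FALSE]
}

# Example: a report restricted to completeness and uniqueness.
selected <- select_indicators(indicators, c("completeness", "uniqueness"))
print(selected$indicator)
```

Failing on an unknown dimension name, rather than silently returning fewer rows, makes typos in a report configuration visible early.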

To cite dqLib, please use the following BibTeX entry:

@software{Tahar_dqLib,
author = {Tahar, Kais},
title = {{dqLib}},
url = {https://github.com/KaisTahar/dqLib},
year = {2021}
}

See also: CORD-MI
