Valid
cesine edited this page Apr 13, 2011 · 3 revisions
- true positives
- false positives
- true negatives
- false negatives
- actual positives (true positives + false negatives)
- actual negatives (true negatives + false positives)
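The four outcome counts and the two "actual" totals above can be sketched in Python; the labels here are hypothetical and only illustrate how the quantities relate:

```python
# Hypothetical binary outcomes: 1 = condition present / test positive.
actual    = [1, 1, 1, 0, 0, 0, 1, 0]
predicted = [1, 0, 1, 0, 1, 0, 1, 0]

tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)  # true positives
fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)  # false positives
tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)  # true negatives
fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)  # false negatives

actual_positives = tp + fn  # everyone who truly has the condition
actual_negatives = tn + fp  # everyone who truly does not
print(tp, fp, tn, fn, actual_positives, actual_negatives)  # 3 1 3 1 4 4
```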
Medicine is typically more concerned with the accuracy of negative test results, since a false negative can delay treatment.
- Sensitivity refers to the probability that a diagnostic technique will detect a particular disease or condition when it does indeed exist in a patient (National Multiple Sclerosis Society). A measure with high sensitivity is sensitive to the existence of the condition, and will avoid false negatives and avoid delayed treatment.
- Specificity refers to the probability that a diagnostic technique will indicate a negative test result when the condition is absent (true negative). A measure with high specificity can be used to eliminate candidate diseases/conditions.
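Both definitions reduce to simple ratios over the counts above. A minimal sketch (the example counts are invented for illustration):

```python
def sensitivity(tp, fn):
    # True positive rate: P(test positive | condition present).
    # High sensitivity means few false negatives.
    return tp / (tp + fn)

def specificity(tn, fp):
    # True negative rate: P(test negative | condition absent).
    # High specificity means few false positives.
    return tn / (tn + fp)

print(sensitivity(tp=90, fn=10))  # 0.9
print(specificity(tn=80, fp=20))  # 0.8
```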
Information retrieval is more concerned with high accuracy of the returned (positive) results.
- Precision (true positives / returned positives) is defined as the number of relevant documents retrieved by a search divided by the total number of documents retrieved by that search. Precision is a measure of correctness; in the context of a final disease diagnosis or a final search result, few false positives (high precision) are preferred.
- Recall (true positives / actual positives) is defined as the number of relevant documents retrieved by a search divided by the total number of existing relevant documents. Recall is a measure of completeness; in the context of early disease testing or a first-pass search, few false negatives (high recall) are preferred.
- F-score is the harmonic mean of precision and recall, combining the two measures into a single score: 2 * (precision * recall) / (precision + recall).
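The three retrieval measures can be sketched directly from the counts; the example values below are invented:

```python
def precision(tp, fp):
    # Correctness: fraction of returned positives that are truly positive.
    return tp / (tp + fp)

def recall(tp, fn):
    # Completeness: fraction of actual positives that were returned.
    return tp / (tp + fn)

def f_score(p, r):
    # Harmonic mean of precision and recall (the balanced F1 score).
    return 2 * p * r / (p + r)

p = precision(tp=6, fp=2)  # 0.75
r = recall(tp=6, fn=2)     # 0.75
print(f_score(p, r))       # 0.75
```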
- Conjunction: apply the measure in conjunction with known tests or tools (to establish the "actual" values).
- Gold standard/Criterion: Examines the extent to which a measure provides results that are consistent with a gold standard. It is typically divided into concurrent validity and predictive validity.
- Concurrent: To validate a new measure, the results of the measure are compared to the results of the gold standard obtained at approximately the same point in time (concurrently), so they both reflect the same construct. This approach is useful in situations when a new or untested tool is potentially more efficient, easier to administer, more practical, or safer than another more established method and is being proposed as an alternative instrument.
- Convergent: A type of validity that is determined by hypothesizing and examining the overlap between two or more tests that presumably measure the same construct. In other words, convergent validity is used to evaluate the degree to which two or more measures that theoretically should be related to each other are, in fact, observed to be related to each other.
- Predictive: the measure's results predict a future outcome, e.g. length of hospital stay or improvement following rehabilitation.
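One common way to quantify concurrent validity is to correlate scores from the new measure with scores from the gold standard obtained at the same time. A minimal sketch using a hand-rolled Pearson correlation; the score lists are entirely hypothetical:

```python
from statistics import mean

def pearson(xs, ys):
    # Pearson product-moment correlation between two score lists.
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical scores collected concurrently from the same subjects.
gold_standard = [10, 12, 15, 20, 22]
new_measure   = [11, 13, 14, 21, 23]
r = pearson(gold_standard, new_measure)
print(round(r, 3))  # 0.986, high agreement with the gold standard
```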
- Floor effects: too many subjects perform at floor level. A floor effect occurs when scores cannot fall below some minimum value. It therefore represents a subsample for whom clinical decline may not register as a change in score, even if function/behavior worsens, because there are no items or scaling within the test that measure decline below the lowest possible score.
- Ceiling effects: too many subjects perform at ceiling level. A ceiling effect occurs when test items aren't challenging enough for a group of individuals. Thus, the test score will not increase for a subsample of people who may have clinically improved because they have already reached the highest score that can be achieved on that test. In other words, because the test has a limited number of difficult items, the most highly functioning individuals will score at the highest possible score. This becomes a measurement problem when you are trying to identify changes - the person may continue to improve but the test does not capture that improvement.
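A simple screening for these effects is to check what share of subjects sit at the scale's minimum or maximum; the scores and cutoffs below are hypothetical:

```python
def floor_ceiling_share(scores, min_score, max_score):
    # Fraction of subjects at the lowest and highest possible scores.
    # A large share at either extreme suggests a floor/ceiling effect.
    n = len(scores)
    at_floor = sum(1 for s in scores if s == min_score) / n
    at_ceiling = sum(1 for s in scores if s == max_score) / n
    return at_floor, at_ceiling

# Hypothetical results on a 0-30 scale.
scores = [30, 30, 29, 30, 25, 30, 28, 30]
floor_pct, ceiling_pct = floor_ceiling_share(scores, 0, 30)
print(ceiling_pct)  # 0.625: most subjects already at the maximum,
                    # so further improvement cannot be captured
```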
- repeated use of the same screening tool with the same client often reduces its validity
- it is advised that some screening tools not be used repeatedly with the same individual if the time interval between testing is short.