Adjustment of a Prevalence Estimate for Misclassification

Normally, a prevalence (or incidence) estimate consists simply of a proportion: those with the disease in the numerator, and those with and without the disease in the denominator. However, if measurement error is acknowledged, it must be accepted that the subjects in the numerator are in fact a mixture of true positives and false positives with respect to the disease variable.

Denote those who truly have the disease as 'A' and those who truly do not as 'B'. Denote subjects determined by an imperfect measure to have the disease as 'a', and those determined by the same measure to be disease negative as 'b'.

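Based on the above, and writing Se for the sensitivity and Sp for the specificity of the measure, we can say that:

a = TP + FP

or a = (Se)A + (1-Sp)B

and, likewise, that the test negatives are a mixture of false negatives and true negatives:

b = FN + TN

or b = (1-Se)A + (Sp)B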
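As a rough illustration of the two equations above, the following Python sketch computes the expected observed counts from a set of true counts. The function name observed_counts and the numbers used (100 diseased, 900 non-diseased, Se = 0.8, Sp = 0.9) are assumptions chosen for illustration; they do not come from the source.

```python
def observed_counts(A, B, se, sp):
    """Expected test-positive (a) and test-negative (b) counts.

    A, B : true numbers with and without the disease
    se   : sensitivity of the imperfect measure
    sp   : specificity of the imperfect measure
    """
    a = se * A + (1 - sp) * B      # true positives + false positives
    b = (1 - se) * A + sp * B      # false negatives + true negatives
    return a, b


# Hypothetical population: true prevalence is 100 / 1000 = 0.10
a, b = observed_counts(A=100, B=900, se=0.8, sp=0.9)
observed_prevalence = a / (a + b)
```

With these illustrative values the observed prevalence is about 0.17 even though the true prevalence is 0.10, because the 90 expected false positives outnumber the 20 expected false negatives.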

These equations can be re-arranged to calculate an estimate of the true prevalence:

A / (A+B)

from the observed prevalence:

a / (a+b)

and the sensitivity and specificity of the measure (Se and Sp). The adjusted prevalence represents the population value that would produce the observed prevalence, given these misclassification rates.
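One way to carry out the re-arrangement is as follows. Adding the two equations shows that a + b = A + B, so dividing the equation for a by this total gives p = (Se)P + (1-Sp)(1-P), where p = a / (a+b) is the observed prevalence and P = A / (A+B) is the true prevalence (p and P are shorthand introduced here, not notation from the source). Solving for P gives P = (p + Sp - 1) / (Se + Sp - 1). The Python sketch below applies this formula. The function name adjusted_prevalence and the example inputs are hypothetical, the sketch assumes Se + Sp is greater than 1, and it is not necessarily the exact calculation used by the calculator on this page.

```python
def adjusted_prevalence(observed, se, sp):
    """Estimate true prevalence from observed prevalence.

    observed : observed proportion positive, a / (a + b)
    se, sp   : sensitivity and specificity of the measure
    Assumes se + sp > 1 (the measure is better than chance).
    """
    denom = se + sp - 1
    if denom <= 0:
        raise ValueError("Se + Sp must exceed 1 for the adjustment to be meaningful")
    # The result can fall outside [0, 1] when the observed proportion is
    # below 1 - Sp or above Se; it is returned unclipped so that such
    # inconsistencies between the inputs remain visible.
    return (observed + sp - 1) / denom


# Hypothetical example: 17% observed positive, Se = 0.8, Sp = 0.9
estimate = adjusted_prevalence(0.17, se=0.8, sp=0.9)
```

With these illustrative inputs the adjusted estimate is approximately 0.10, noticeably lower than the observed 0.17. With a perfect measure (Se = Sp = 1) the formula returns the observed proportion unchanged.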

For a more detailed discussion, and an example, see: Patten SB. Integrating Data from Clinical and Administrative Databases in Pharmacoepidemiological Research. Canadian Journal of Clinical Pharmacology 1998; 5(2): 92-97.

Explore the Impact of Misclassification Bias by Putting Some Values into this Calculator

Observed Proportion: