Determination of the number of observers needed to evaluate a subjective test and its application in two PD-L1 studies. Statistics in Medicine. 2022;41(8):1361–1375. doi:10.1002/sim.9282
In any area of medicine, but especially in pathology, subjective assays and diagnostic tests can dramatically affect the treatment of cancer. Diagnostic test results may vary between the pathologists or other observers who interpret tissue slides or molecular tests performed on those slides. Even with training, different observers may reach discordant interpretations for certain assays. While there are many statistically rigorous methods for measuring concordance between observers, we are unaware of a method that identifies how many observers are needed to determine whether the test itself is designed in such a way that pathologists can reach a level of concordance acceptable for optimal patient care. In this paper, we develop a statistical framework to assess the performance of a diagnostic test with multiple observers. The proposed method includes (1) an exploratory analysis, (2) a statistical test of whether the observers’ agreement percentage will plateau at a non-zero value, and (3) a statistical model to estimate the agreement percentage and the number of observers needed to reach the plateau. We applied this method to a non-small cell lung cancer example and a triple-negative breast cancer example, using reads of the immunohistochemical test for expression of programmed death-ligand 1 (PD-L1), to determine the number of observers needed to evaluate these subjective tests.
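To make the exploratory step concrete, the following is a minimal sketch of how an agreement percentage might be tabulated against the number of observers. It is not the authors' procedure: the abstract does not define the agreement percentage, so the sketch assumes it is the fraction of cases on which all k observers in a subset give the same binary call, averaged over every subset of size k. The function name `agreement_curve` and the toy data are hypothetical.

```python
from itertools import combinations


def agreement_curve(ratings):
    """Tabulate unanimous agreement as a function of the number of observers.

    ratings: list of cases, each a list of binary calls (one per observer).
    Returns {k: mean fraction of cases on which all k observers agree},
    where the mean is taken over every subset of k observers.
    ASSUMPTION: "agreement" means unanimity within the subset; the paper
    may define it differently.
    """
    n_obs = len(ratings[0])
    n_cases = len(ratings)
    curve = {}
    for k in range(2, n_obs + 1):
        fractions = []
        for subset in combinations(range(n_obs), k):
            # A case counts as agreed when the chosen observers give one distinct call.
            unanimous = sum(
                1 for case in ratings if len({case[j] for j in subset}) == 1
            )
            fractions.append(unanimous / n_cases)
        curve[k] = sum(fractions) / len(fractions)
    return curve


# Hypothetical toy data: 4 cases (rows) read by 4 observers (columns).
toy = [
    [1, 1, 1, 1],
    [0, 0, 0, 0],
    [1, 0, 1, 1],
    [0, 0, 1, 0],
]
curve = agreement_curve(toy)
for k, value in curve.items():
    print(k, round(value, 3))
```

Under this assumed definition the curve cannot increase as observers are added, so a plot of its values against k shows whether agreement levels off above zero or keeps falling. Enumerating every subset is only practical for small panels; with many observers, random subsets would have to be sampled instead.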