Prospective analysis of inter-observer and intra-observer variability in multi ultrasound descriptor assessment of thyroid nodules
Katarzyna Dobruch-Sobczak1,2, Bartosz Migda3, Agnieszka Krauze3, Krzysztof Mlosek3, Rafał Z. Słapa3, Paweł Wareluk3, Elwira Bakuła-Zalewska4, Zbigniew Adamczewski5,6, Andrzej Lewiński5,6, Wiesław Jakubowski3, Marek Dedecjus7
Aim: The aim of this study was to evaluate the inter- and intra-observer variability and accuracy of ultrasound assessment of thyroid nodules using a descriptive lexicon. Materials and methods: A prospective study was performed on complete ultrasound examinations, including sonoelastography and color Doppler ultrasound, of 18 patients with 20 thyroid nodules. A total of 20 records of thyroid nodules from these techniques were duplicated, numbered, and randomly arranged. Five radiologists assessed the recordings independently. Cohen's kappa and Fleiss' kappa statistics were used to determine the degree of intra- and inter-observer agreement. Results: Mean accuracy rates for all radiologists, across all ultrasound features, ranged from 82.7 to 87.8%. For B-mode and strain elastography, accuracies ranged from 65.0 to 100% and from 47.4 to 86.8%, respectively. Concerning intra-observer variability, three radiologists demonstrated almost perfect agreement (κ-values ranged from 0.81 to 0.86), and substantial agreement was noted for the two remaining radiologists. The κ-values for inter-observer agreement ranged from 0.61 for macrocalcifications (substantial agreement) to 0.33 for the Asteria four-point elastography scale criteria (fair agreement). Conclusions: The results suggest relatively good inter-observer and excellent intra-observer agreement in the assessment of thyroid nodules using ultrasound, and fair agreement in the case of strain elastography.
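For readers unfamiliar with the agreement statistics named in the abstract, the following is a minimal sketch of how Cohen's kappa (two sets of ratings, for example one reader's first and repeated reading) and Fleiss' kappa (several readers rating the same items) can be computed. It is not the authors' analysis code. The function names are hypothetical, the ratings in the usage note are invented, and the verbal labels follow the commonly used Landis and Koch bands, which are assumed here because they match the wording in the abstract (0.33 "fair", 0.61 "substantial", 0.81 "almost perfect").

```python
from collections import Counter


def cohen_kappa(first, second):
    """Cohen's kappa for two equally long sequences of categorical ratings."""
    n = len(first)
    if n == 0 or n != len(second):
        raise ValueError("rating sequences must be non-empty and equally long")
    # Observed agreement: share of items given the same label both times.
    p_o = sum(a == b for a, b in zip(first, second)) / n
    # Chance agreement from the marginal label frequencies of each sequence.
    c1, c2 = Counter(first), Counter(second)
    p_e = sum(c1[k] * c2[k] for k in set(c1) | set(c2)) / (n * n)
    if p_e == 1.0:  # both sequences use one identical label throughout
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


def fleiss_kappa(ratings):
    """Fleiss' kappa; `ratings` holds one list of labels per item,
    with the same number of raters for every item."""
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")
    counts = [Counter(item) for item in ratings]
    # Per-item agreement: fraction of rater pairs that chose the same label.
    p_items = [
        (sum(v * v for v in c.values()) - n_raters) / (n_raters * (n_raters - 1))
        for c in counts
    ]
    p_bar = sum(p_items) / n_items
    # Chance agreement from the overall proportion of each label.
    totals = Counter()
    for c in counts:
        totals.update(c)
    p_e = sum((v / (n_items * n_raters)) ** 2 for v in totals.values())
    if p_e == 1.0:  # only one label was ever used
        return 1.0
    return (p_bar - p_e) / (1.0 - p_e)


def landis_koch_label(kappa):
    """Verbal label for a kappa value (Landis and Koch bands, assumed)."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"
```

As a usage illustration with invented data, `cohen_kappa([1, 1, 0, 0], [1, 0, 0, 0])` compares a hypothetical first and second reading of four nodules for one binary feature, and `fleiss_kappa([[1, 1, 0], [0, 0, 1]])` compares three hypothetical readers on two nodules. Passing the resulting value to `landis_koch_label` gives the verbal category.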