External validation of a commercially available deep learning algorithm for fracture detection in children
Objective
The purpose of this study was to conduct an external validation of a fracture assessment deep learning algorithm (Rayvolve®) using digital radiographs from a real-life cohort of children presenting routinely to the emergency room.
Methods
This retrospective study included 2634 radiography sets (5865 images) from 2549 children (1459 boys, 1090 girls; mean age, 8.5 ± 4.5 [SD] years; age range: 0–17 years) referred by the pediatric emergency room for trauma. For each set, the presence of one or more fractures, the number of fractures, and their locations were recorded as determined by the senior radiologists and by the algorithm. Using the senior radiologists' diagnosis as the standard of reference, the diagnostic performance of the deep learning algorithm (Rayvolve®) was calculated using three approaches: a detection approach (presence/absence of a fracture as a binary variable), an enumeration approach (exact number of fractures detected) and a localization approach (focusing on whether the detected fractures were correctly localized). Subgroup analyses were performed according to the presence or absence of a cast, age category (0–4 vs. 5–18 years) and anatomical region.
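As an illustration of the detection approach, the sketch below computes sensitivity, specificity, accuracy and negative predictive value from a 2×2 table in which the senior radiologist reading is the reference. It is a minimal sketch rather than the study's analysis code: the function names and the counts are hypothetical, and the Wilson score interval is an assumption, since the abstract does not state how the 95% confidence intervals were derived.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a proportion (assumed CI method, z=1.96 for 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return centre - half, centre + half


def detection_metrics(tp, fp, tn, fn):
    """Per-set binary metrics; the radiologist reading is the reference standard.

    tp: fracture per radiologist, flagged by the algorithm
    fn: fracture per radiologist, not flagged by the algorithm
    tn: no fracture per radiologist, not flagged by the algorithm
    fp: no fracture per radiologist, flagged by the algorithm
    """
    total = tp + fp + tn + fn
    pairs = {
        "sensitivity": (tp, tp + fn),
        "specificity": (tn, tn + fp),
        "accuracy": (tp + tn, total),
        "npv": (tn, tn + fn),
    }
    # Each entry maps to (point estimate, (lower bound, upper bound)).
    return {name: (k / n, wilson_interval(k, n)) for name, (k, n) in pairs.items()}


# Hypothetical counts for illustration only; these are not the study's data.
metrics = detection_metrics(tp=90, fp=20, tn=180, fn=10)
for name, (value, (low, high)) in metrics.items():
    print(f"{name}: {value:.1%} (95% CI: {low:.1%} to {high:.1%})")
```

The enumeration and localization approaches would reuse the same formulas; only the rule for counting a set as a true or false result changes (exact fracture count or correct location instead of presence/absence).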
Results
For the detection approach, the deep learning algorithm yielded 95.7% sensitivity (95% CI: 94.0–96.9), 91.2% specificity (95% CI: 89.8–92.5) and 92.6% accuracy (95% CI: 91.5–93.6). The enumeration and localization approaches both yielded 94.1% sensitivity (95% CI: 92.1–95.6), 88.8% specificity (95% CI: 87.3–90.2) and 90.4% accuracy (95% CI: 89.2–91.5). In the age-related subgroup analyses, the deep learning algorithm yielded greater sensitivity and negative predictive value in the 5–18-years age group than in the 0–4-years age group for the detection approach (P < 0.001 and P = 0.002) and for the enumeration and localization approaches (P = 0.012 and P = 0.028). The high negative predictive value was robust, persisting in all subgroup analyses except for patients with casts (P = 0.001 for the detection approach and P < 0.001 for the enumeration and localization approaches).
Conclusion
The Rayvolve® deep learning algorithm is very reliable for detecting fractures in children, especially in those older than 4 years and without a cast.
Our Rayvolve® AI Suite
Trauma (CE, FDA)
83% turnaround time reduction
67% false negatives reduction
99.7% negative predictive value
Chest (CE)
36% reading time reduction
11% sensitivity improvement
97.9% negative predictive value
Measures (CE)
1.4° average MAE for angles
1.3 mm average MAE for lengths
Bone age (coming soon)
Based on Greulich & Pyle reference methodology
Statistical comparison with chronological age
Optimize Your Workflow and Improve Quality of Care with AZmed
Discover the power of our AI Suite today!