Deep learning algorithms for identifying developmental dysplasia of the hip based on sonographic images: a retrospective, prospective, multicenter study in China
Author Block: N. XU; Shenzhen/CN
Purpose: This study aims to develop and validate a deep convolutional neural network algorithm, HipSonoNeuNet (HSNN), for identifying developmental dysplasia of the hip (DDH) using multicenter hip ultrasound data.
Methods or Background: This multicenter cross-sectional study combined data from 22 Chinese hospitals and enrolled 3082 participants. A total of 7286 hip ultrasound images (1429 dynamic, 5857 static) were collected and divided into three datasets. The study was conducted in three phases. Phase I trained the models using data from 2431 participants. Phase II compared the diagnostic performance of the model with that of radiologists of varying experience in 500 participants. Phase III prospectively validated the model's generalizability in 151 participants.
Results or Findings: In Phase I, the HSNN yielded an AUC of 0.99 (95% CI: 0.99-1.00), sensitivity of 1.00 (95% CI: 0.99-1.00), specificity of 0.91 (95% CI: 0.88-1.00), and F1 score of 0.90 (95% CI: 0.87-1.00) on the internal test dataset. In Phase II, the HSNN achieved an accuracy of 0.94 (95% CI: 0.88-1.00), AUC of 0.99 (95% CI: 0.99-1.00), sensitivity of 1.00 (95% CI: 0.99-1.00), specificity of 0.94 (95% CI: 0.87-1.00), and F1 score of 0.58 (95% CI: 0.50-0.66), with strong agreement with the expert (κ = 0.77). AI assistance improved the diagnostic performance of all 7 junior radiologists (accuracy from 0.90 to 0.93, AUC from 0.80 to 0.95, sensitivity from 0.69 to 0.97), reduced examination time, and improved interobserver agreement. In Phase III, the model maintained robust performance (accuracy = 0.92, AUC = 0.99, sensitivity = 1.00, κ with experts = 0.76).
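The abstract reports accuracy, sensitivity, specificity, F1 score, AUC and Cohen's κ without stating how they were computed. The sketch below shows the standard definitions of these metrics for a binary DDH/normal task; it is illustrative only and is not the authors' evaluation code. The function names, the label coding (1 = DDH, 0 = normal) and the example counts are hypothetical, and confidence-interval estimation is not shown. The example also illustrates one plausible reason why an F1 score can be low while sensitivity and specificity are high: when DDH cases are rare, a small number of false positives substantially lowers precision, and F1 depends on precision.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Confusion:
    tp: int  # DDH hips predicted as DDH
    fp: int  # normal hips predicted as DDH
    tn: int  # normal hips predicted as normal
    fn: int  # DDH hips predicted as normal


def confusion(y_true: Sequence[int], y_pred: Sequence[int]) -> Confusion:
    """Count binary outcomes. Assumed (hypothetical) coding: 1 = DDH, 0 = normal."""
    pairs = list(zip(y_true, y_pred))
    return Confusion(
        tp=sum(1 for t, p in pairs if t == 1 and p == 1),
        fp=sum(1 for t, p in pairs if t == 0 and p == 1),
        tn=sum(1 for t, p in pairs if t == 0 and p == 0),
        fn=sum(1 for t, p in pairs if t == 1 and p == 0),
    )


def summary(c: Confusion) -> dict:
    """Threshold-based metrics derived from a confusion matrix."""
    n = c.tp + c.fp + c.tn + c.fn
    sensitivity = c.tp / (c.tp + c.fn)  # recall on DDH hips
    specificity = c.tn / (c.tn + c.fp)  # recall on normal hips
    precision = c.tp / (c.tp + c.fp)    # positive predictive value
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": (c.tp + c.tn) / n,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random DDH hip scores higher than a
    random normal hip (Mann-Whitney formulation); ties count as 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def cohen_kappa(a: Sequence[int], b: Sequence[int]) -> float:
    """Cohen's kappa for two raters' binary labels (undefined if p_e == 1)."""
    n = len(a)
    p_o = sum(1 for x, y in zip(a, b) if x == y) / n  # observed agreement
    pa, pb = sum(a) / n, sum(b) / n                    # each rater's DDH rate
    p_e = pa * pb + (1 - pa) * (1 - pb)                # agreement expected by chance
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Hypothetical low-prevalence test set: 5 DDH hips, 95 normal hips.
    # The model flags all 5 DDH hips plus 5 normal hips (false positives).
    reference = [1] * 5 + [0] * 95
    model = [1] * 10 + [0] * 90
    print(summary(confusion(reference, model)))  # high specificity, but F1 = 2/3
    print(cohen_kappa(reference, model))
    print(auc([1, 1, 0, 0], [0.9, 0.4, 0.4, 0.1]))
```

In this hypothetical example, sensitivity is 1.00 and specificity is about 0.95, yet precision is only 0.5, so F1 falls to about 0.67. The pairwise AUC loop is O(n·m) and is meant to make the definition clear rather than to be efficient; a rank-based implementation is preferable for large test sets.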
Conclusion: The HSNN demonstrates accurate, robust, and generalizable performance in DDH detection. It may enhance the diagnostic capabilities of radiologists, particularly in hospitals with varying levels of expertise.
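Limitations: 1. Class imbalance between DDH and normal images may reduce the stability of model predictions.
2. All ultrasound (US) images were acquired in China, which may limit model performance in other regions and ethnicities, because cultural practices, genetics, and healthcare resources differ across populations.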
Funding for this study: 1. Guangdong High-level Hospital Construction Fund (SZGSP012).
2. Shenzhen Clinical Research Center (20220819113341005), “Shenzhen Clinical Research Center for Child Health and Disease” (szcrc2024_005).
3. Guangdong Medical Research Funded Project (A2024019).
4. Shenzhen Science and Technology Innovation Commission General Program for Basic Research (JCYJ20220530160000001).
Has your study been approved by an ethics committee? Yes
Ethics committee - additional information: This study protocol was approved by the Ethics Committee of Shenzhen Children's Hospital (Approval No. 202308602).