Model Generalizability Considerations in Development and Evaluation of SaMD
doi:10.21203/rs.3.rs-3915862/v1 (2024)
Abstract
Software as a Medical Device (SaMD) has been transforming medical practice by improving patient care with more precise and timely information. The key capabilities of SaMD are often enabled by Artificial Intelligence (AI) algorithms. However, notable challenges have been observed in building robust algorithms during SaMD development and evaluation: it is not unusual for an algorithm to show strong performance in the development phase but poor performance when deployed in a pivotal validation study. With rapid developments in medical research, data from new trials or the real world are likely to differ from the legacy data used for training in many important respects. Added caution is thus needed to account for such potential differences. In this paper, we discuss the pitfalls of using conventional cross-validation methods in SaMD algorithm development and the magnitude by which they overestimate model performance when heterogeneity is anticipated. To mitigate this overestimation bias in performance assessment, we propose a leave-one-set-out (LOSO) cross-validation method and discuss best practices for the design of the independent pivotal validation study.
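A minimal sketch of the leave-one-set-out idea follows, assuming tabular data in which each sample carries a label identifying its originating data set (e.g., study or site). It uses scikit-learn's LeaveOneGroupOut splitter, which implements the same one-group-held-out scheme; the data, model choice, and variable names are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

# Hypothetical data: features X, binary labels y, and a per-sample
# "set" identifier (e.g., the source study or site of each record).
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = rng.integers(0, 2, size=300)
sets = np.repeat(np.arange(3), 100)  # three data sets/sites

# LOSO CV: each fold holds out one entire set, so the model is always
# evaluated on data from a set it never saw during training. This
# mimics deployment on a new, heterogeneous population better than
# conventional CV, which mixes all sets in both training and test folds.
loso = LeaveOneGroupOut()
scores = cross_val_score(
    LogisticRegression(), X, y, groups=sets, cv=loso, scoring="roc_auc"
)
print(scores)  # one AUC per held-out set
```

Under heterogeneity across sets, the spread of these per-set scores, rather than a single pooled estimate, is the more honest summary of expected performance in a future pivotal study.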