Background: ComBat is a promising harmonization method for radiomic features, but it cannot harmonize by multiple batch effects simultaneously, and its performance is reduced when feature distributions are bimodal or when clinical/batch variables are unknown. In this study, we develop and evaluate two iterative ComBat approaches (Nested and Nested+GMM ComBat) to address these limitations and improve radiomic feature harmonization.

Methods: In Nested ComBat, radiomic features are harmonized sequentially by multiple batch effects. The order is given by the permutation that yields the smallest number of features with statistically significant differences due to batch effects. In Nested+GMM ComBat, a Gaussian mixture model (GMM) is used to identify, from the observed feature distributions, a scan grouping associated with a latent variable; this grouping is then added to Nested ComBat as an additional batch effect. These approaches were used to harmonize differences associated with contrast enhancement, spatial resolution due to reconstruction kernel, and manufacturer in radiomic datasets generated by using CaPTk and PyRadiomics to extract features from lung CT datasets (Lung3 and Radiogenomics). Differences due to batch effects were assessed in the original data and in data harmonized with standard ComBat, Nested ComBat, and Nested+GMM ComBat.

Results: Nested ComBat performed similarly to or better than standard ComBat; its limited improvement in some cases was likely due to bimodal feature distributions. Nested+GMM ComBat successfully harmonized features with bimodal distributions and, in most cases, showed superior harmonization performance compared with Nested and standard ComBat.

Conclusions: Our findings show that Nested ComBat can harmonize by multiple batch effects and that Nested+GMM ComBat can improve the harmonization of bimodal features.
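The Nested and Nested+GMM procedures described in the Methods can be sketched roughly as follows. This is a minimal Python illustration under several stated assumptions, not the authors' implementation. The per-batch step is a simplified location/scale adjustment in the spirit of ComBat, without empirical Bayes shrinkage or biological covariates. Batch differences are tested with a Kruskal-Wallis test at alpha = 0.05, which is an assumed choice of test. The GMM is a hand-written two-component, one-dimensional mixture fitted by EM to a single caller-chosen feature. All function names and the synthetic data are hypothetical.

```python
from itertools import permutations

import numpy as np
from scipy.stats import kruskal


def location_scale_harmonize(X, batch):
    """Simplified ComBat-style step (no empirical Bayes, no covariates): align each
    batch's per-feature mean and SD with the grand mean and pooled within-batch SD."""
    X, batch = np.asarray(X, dtype=float), np.asarray(batch)
    levels = np.unique(batch)
    grand_mean = X.mean(axis=0)
    resid = np.empty_like(X)
    for b in levels:  # residuals after removing each batch's own mean
        resid[batch == b] = X[batch == b] - X[batch == b].mean(axis=0)
    pooled_sd = np.sqrt((resid ** 2).mean(axis=0))
    out = np.empty_like(X)
    for b in levels:
        idx = batch == b
        sd_b = X[idx].std(axis=0)
        sd_b[sd_b == 0] = 1.0  # constant feature within a batch: avoid divide-by-zero
        out[idx] = (X[idx] - X[idx].mean(axis=0)) / sd_b * pooled_sd + grand_mean
    return out


def significant_mask(X, batch, alpha=0.05):
    """True for features whose distribution differs across batch levels
    (Kruskal-Wallis test; the choice of test is an assumption)."""
    batch = np.asarray(batch)
    levels = np.unique(batch)
    mask = np.zeros(X.shape[1], dtype=bool)
    for j in range(X.shape[1]):
        if len(levels) < 2 or np.ptp(X[:, j]) == 0:
            continue  # kruskal needs >= 2 groups and non-identical values
        mask[j] = kruskal(*[X[batch == b, j] for b in levels]).pvalue < alpha
    return mask


def any_significant(X, batches, alpha=0.05):
    """Number of features significant for at least one batch variable."""
    masks = [significant_mask(X, b, alpha) for b in batches.values()]
    return int(np.logical_or.reduce(masks).sum())


def nested_harmonize(X, batches, alpha=0.05):
    """Harmonize sequentially in every order of the batch variables; keep the order
    leaving the fewest features with a significant batch difference."""
    best = None
    for order in permutations(batches):
        Xh = np.asarray(X, dtype=float)
        for name in order:
            Xh = location_scale_harmonize(Xh, batches[name])
        score = any_significant(Xh, batches, alpha)
        if best is None or score < best[0]:
            best = (score, order, Xh)
    return best


def gmm_two_groups(x, n_iter=200):
    """Two-component 1-D Gaussian mixture fitted by EM; returns a 0/1 group label
    per scan, to be used as an extra (latent) batch variable."""
    x = np.asarray(x, dtype=float)
    mu, var, w = np.percentile(x, [25, 75]), np.full(2, x.var() + 1e-9), np.full(2, 0.5)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each scan
        dens = w * np.exp(-((x[:, None] - mu) ** 2) / (2 * var)) / np.sqrt(2 * np.pi * var)
        resp = dens / (dens.sum(axis=1, keepdims=True) + 1e-300)
        # M-step: update mixture weights, means and variances
        nk = resp.sum(axis=0) + 1e-12
        w = nk / len(x)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-9
    return resp.argmax(axis=1)


# Synthetic demo (made-up data): two known batch variables, plus a latent
# grouping that makes feature 0 bimodal.
rng = np.random.default_rng(0)
n, p = 200, 10
kernel, maker, latent = (rng.integers(0, 2, n) for _ in range(3))
X = rng.normal(size=(n, p)) + 2.0 * kernel[:, None] + 1.0 * maker[:, None]
X[:, 0] += 6.0 * latent

gmm_labels = gmm_two_groups(X[:, 0])  # Nested+GMM: GMM grouping becomes a batch variable
agreement = max(np.mean(gmm_labels == latent), np.mean(gmm_labels != latent))
batches = {"kernel": kernel, "manufacturer": maker, "gmm": gmm_labels}
raw_count = any_significant(X, batches)
best_score, best_order, X_harmonized = nested_harmonize(X, batches)
print(raw_count, best_score, best_order, round(agreement, 3))
```

Exhaustive search over orders costs k! sequential harmonization passes for k batch variables. With three variables, such as contrast enhancement, reconstruction kernel, and manufacturer, that is only six orders, but the cost grows quickly as more batch variables are added.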