Fig. 2

Adjusted Rand Index (ARI) scores of K-means clustering on PCA-reduced scRNA datasets, by imputation method. Improved clustering performance is a strong indicator of improved downstream performance, provided that the imputation does not heavily bias the data. Because PCA is a linear technique, this metric measures how well imputation recovers the linear patterns in the data. ARI values lie in the interval \([-1,1]\); a value near 0 corresponds to chance-level agreement, and higher values indicate better performance. ccImpute performs best on all datasets. scImpute, DCA, and DeepImpute work only with raw, unnormalized data and therefore could not impute the Usoskin dataset. In addition, scImpute and DrImpute timed out on the larger datasets.
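For readers unfamiliar with the metric, the ARI compares a predicted clustering (here, K-means on the PCA-reduced data) with reference cell labels by counting pairs of cells that are grouped together, then correcting for the agreement expected by chance. The sketch below is a minimal, dependency-free illustration of that formula, not the evaluation code used for this figure. The function name `adjusted_rand_index` is hypothetical. Returning 1.0 for degenerate partitions is an assumption chosen to match the convention of scikit-learn's `sklearn.metrics.adjusted_rand_score`, which a real pipeline would more likely call directly.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Hypothetical helper: ARI between two flat clusterings of the same cells."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label lists must have the same length")
    n = len(labels_true)

    # Contingency table: number of cells for each (reference, predicted) cluster pair.
    pair_counts = Counter(zip(labels_true, labels_pred))
    row_sums = Counter(labels_true)   # reference cluster sizes
    col_sums = Counter(labels_pred)   # predicted cluster sizes

    # Pairs of cells placed together in both partitions, and in each partition alone.
    index = sum(comb(c, 2) for c in pair_counts.values())
    sum_rows = sum(comb(c, 2) for c in row_sums.values())
    sum_cols = sum(comb(c, 2) for c in col_sums.values())
    total_pairs = comb(n, 2)

    # Expected index under random labelings with the same cluster sizes.
    expected = sum_rows * sum_cols / total_pairs if total_pairs else 0.0
    max_index = (sum_rows + sum_cols) / 2

    if max_index == expected:
        # Degenerate case (e.g. both partitions are a single cluster):
        # assumed convention, treated as perfect agreement.
        return 1.0
    return (index - expected) / (max_index - expected)


if __name__ == "__main__":
    # Identical partitions with swapped label names still score 1.0.
    print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
    # A partition that disagrees more than chance yields a negative score.
    print(adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1]))
```

The first call returns 1.0 because ARI ignores how clusters are named. The second returns a value close to -0.5, which shows how scores below 0 arise when agreement is worse than chance.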