Fig. 3
From: Leveraging gene correlations in single cell transcriptomic data

Statistical significance of pairwise gene correlations in data from a clonal cell line. A, B Using scRNAseq data from a human melanoma cell line [50] (8640 cells × 14,933 genes), pairwise values of PCC were calculated from normalized data, and PCC′ from raw data. Histograms display the frequency of observed values (the logarithmic axis in B emphasizes low-frequency events). Notice in B how positive skewing, also seen in simulated data (Fig. 2), is less pronounced for PCC′ than for PCC. Dashed lines in B show thresholds at which Fisher formula-derived p values would fall below 1.1 × 10⁻⁴. C, D Scatterplots showing p values assigned by BigSur to pairs of genes within two representative sets of bins of gene expression (for all pairwise combinations see Figs. S1, S2/Additional files 3, 4). The abscissa shows PCC (panel C) and PCC′ (panel D). The ordinate gives the negative log10 of p values determined by BigSur, i.e., larger values mean greater statistical significance. Orange and gray shading indicate gene pairs judged significant by BigSur (FDR < 0.02). Blue and orange show gene pairs that would have been judged statistically significant by applying the Fisher formula to the PCC or PCC′, using the same p value threshold as used by BigSur. The blue region contains gene pairs judged significant by the Fisher formula only, while the unshaded region shows gene pairs not significant by either method. Numbers in the lower right corner are the total numbers of possible correlations (blue), statistically significant correlations according to the Fisher formula (green), and statistically significant correlations according to BigSur (red). E, F ROC curves assessing whether the overall performance of the Fisher formula, applied either to PCC or PCC′, can be adequately improved either by using a more stringent p value cutoff (E) or by limiting pairwise gene correlations to those involving only genes with mean expression above a threshold level, µ (F).
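
To make the dashed-line thresholds in B concrete, the following is a minimal sketch of how a Fisher formula-derived p value and the matching correlation cutoff could be computed. It assumes the standard Fisher z-transformation with a two-sided test and takes the sample size to be the number of cells (8640); the caption does not state these details, so the function names `fisher_p_value` and `fisher_threshold` and the two-sided choice are illustrative assumptions, not the authors' implementation.

```python
import math
from statistics import NormalDist


def fisher_p_value(r: float, n: int) -> float:
    """Two-sided p value for a Pearson correlation r observed in n samples.

    Assumes the standard Fisher z-transformation: atanh(r) * sqrt(n - 3)
    is treated as approximately standard normal under the null hypothesis.
    """
    z = math.atanh(r) * math.sqrt(n - 3)
    # erfc(|z| / sqrt(2)) equals the two-sided standard normal tail probability
    return math.erfc(abs(z) / math.sqrt(2))


def fisher_threshold(alpha: float, n: int) -> float:
    """Smallest |r| whose two-sided Fisher p value equals alpha."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return math.tanh(z_crit / math.sqrt(n - 3))


# Values taken from the caption; treating n as the cell count is an assumption.
n_cells = 8640
alpha = 1.1e-4

r_crit = fisher_threshold(alpha, n_cells)
print(f"|PCC| cutoff for p < {alpha:g} with n = {n_cells}: {r_crit:.4f}")
print(f"p value at the cutoff: {fisher_p_value(r_crit, n_cells):.3g}")
```

Under these assumptions the cutoff is a small correlation magnitude, because the Fisher formula depends only on the correlation value and the number of cells and takes no account of gene expression level.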