Fig. 2
From: Leveraging gene correlations in single cell transcriptomic data

Comparing uncorrected and modified corrected Fano factors and correlation coefficients. Random, independent, uncorrelated gene expression data were generated for 1000 genes in 999 cells, under the assumption that observations are Poisson variates drawn from a per-cell expression level that is itself a log-normal random variate, scaled by a sequencing depth factor that differs for each cell (see methods). (A) Uncorrected (\(\phi\)) or modified corrected (\(\phi^{\prime}\)) Fano factors are plotted as a function of mean expression level for each gene. Uncorrected factors were calculated either without normalization, or with default normalization (scaling observations by sequencing depth factors, estimated by summing the gene expression in each cell). Uncorrected Fano factors were also calculated using SCTransform [48] as an alternative to default normalization. Modified corrected Fano factors were obtained by applying BigSur to unnormalized data, using a coefficient of variation parameter of c = 0.5. (B) Modified corrected Fano factors (\(\phi^{\prime}\)) were calculated as in A, but using different values of c. The data suggest that an optimal choice of c can usually be found by examining a plot of \(\phi^{\prime}\) versus mean expression. (C) Empirical p values associated with uncorrected (PCC) or modified corrected (PCC′) Pearson correlation coefficients were calculated for pairwise combinations of genes in bins of different mean gene expression level (µ); examples are shown for four representative bins (both genes derived from the same bin). With increasing gene expression levels, the p value versus PCC relationship begins to approach the Fisher formula (dashed curve), but it does so at much lower expression levels for PCC′ than for PCC.
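
The simulation and the uncorrected statistics described in this legend can be illustrated with a minimal Python sketch. It is not the authors' code and it does not implement BigSur or SCTransform. Several details are assumptions made only for illustration: the log-uniform range of per-gene means, the log-normal distribution of depth factors (`depth_sigma`), the function names, and the reading of "the Fisher formula" as the standard Fisher z-transformation approximation for the p value of a Pearson correlation. The paper's methods section gives the actual definitions.

```python
import math
import numpy as np


def simulate_counts(n_cells=999, n_genes=1000, c=0.5, depth_sigma=0.5, seed=0):
    """Simulate uncorrelated counts: Poisson draws around log-normal rates."""
    rng = np.random.default_rng(seed)
    # Assumption: per-gene mean expression is log-uniform over 0.01..100.
    gene_means = 10.0 ** rng.uniform(-2, 2, size=n_genes)
    # Assumption: per-cell depth factors are log-normal, rescaled to mean 1.
    depth = rng.lognormal(mean=0.0, sigma=depth_sigma, size=n_cells)
    depth = depth / depth.mean()
    # Log-normal multiplier with mean 1 and coefficient of variation c.
    sigma2 = math.log1p(c ** 2)
    noise = rng.lognormal(mean=-sigma2 / 2, sigma=math.sqrt(sigma2),
                          size=(n_cells, n_genes))
    rates = depth[:, None] * gene_means[None, :] * noise
    return rng.poisson(rates), depth


def fano_factors(x):
    """Per-gene mean and variance/mean ratio; NaN where the mean is zero."""
    x = np.asarray(x, dtype=float)
    mean = x.mean(axis=0)
    var = x.var(axis=0, ddof=1)
    fano = np.full_like(mean, np.nan)
    np.divide(var, mean, out=fano, where=mean > 0)
    return mean, fano


def depth_normalize(counts):
    """Scale each cell by its total count relative to the average total.

    Assumes every cell has at least one count.
    """
    totals = counts.sum(axis=1).astype(float)
    size_factors = totals / totals.mean()
    return counts / size_factors[:, None]


def fisher_p(r, n):
    """Two-sided p value for a Pearson r via the Fisher z-transformation."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))


counts, depth = simulate_counts()
mean_raw, fano_raw = fano_factors(counts)
mean_norm, fano_norm = fano_factors(depth_normalize(counts))
high = mean_raw > 10
print("median Fano, raw, mean > 10:       ", np.nanmedian(fano_raw[high]))
print("median Fano, normalized, mean > 10:", np.nanmedian(fano_norm[high]))
```

Under these assumptions, one would expect the uncorrected Fano factors of highly expressed genes to stay well above 1 even after depth normalization, because the log-normal variation in per-cell expression remains in the data. That residual variation is what the coefficient of variation parameter c is meant to account for.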