A comparison of RNA-Seq data preprocessing pipelines for transcriptomic predictions across independent studies

Van, Richard; Alvarez, Daniel; Mize, Travis; Gannavarapu, Sravani; Chintham Reddy, Lohitha; Nasoz, Fatma; Han, Mira V.

doi:10.1186/s12859-024-05801-x

BMC Bioinformatics

Table 5 Overall performance metrics of the classifier for each data preprocessing combination, evaluated against the ICGC/GEO test set (related to Fig. 3)


Index	Normalization	Batch effect correction	Scaling	Micro-average of AUROC	Weighted F1-score	p Value
1	Unnormalized	No batch correction	Unscaled	0.95 (0.94–0.96)	0.80 (0.76–0.81)	Baseline
2	Quantile normalization	No batch correction	Unscaled	0.95 (0.91–0.97)	0.75 (0.61–0.81)	0.8986
3	Quantile normalization with target	No batch correction	Unscaled	0.96 (0.92–0.97)	0.79 (0.64–0.85)	0.7418
4	Feature specific quantile normalization	No batch correction	Unscaled	0.79 (0.79–0.80)	0.50 (0.47–0.51)	1
5	Unnormalized	Batch correction	Unscaled	0.87 (0.85–0.88)	0.57 (0.55–0.60)	1
6	Quantile normalization	Batch correction	Unscaled	0.86 (0.85–0.87)	0.56 (0.55–0.62)	1
7	Quantile normalization with target	Batch correction	Unscaled	0.87 (0.85–0.87)	0.59 (0.56–0.65)	0.9999
8	Feature specific quantile normalization	Batch correction	Unscaled	0.75 (0.74–0.76)	0.22 (0.20–0.25)	1
9	Unnormalized	No batch correction	Scaled	0.94 (0.92–0.95)	0.65 (0.63–0.66)	0.8986
10	Quantile normalization	No batch correction	Scaled	0.91 (0.87–0.93)	0.62 (0.60–0.64)	1
11	Quantile normalization with target	No batch correction	Scaled	0.90 (0.86–0.93)	0.64 (0.63–0.66)	1
12	Feature specific quantile normalization	No batch correction	Scaled	0.80 (0.79–0.82)	0.53 (0.51–0.55)	1
13	Unnormalized	Batch correction	Scaled	0.84 (0.81–0.87)	0.57 (0.50–0.60)	1
14	Quantile normalization	Batch correction	Scaled	0.86 (0.85–0.87)	0.58 (0.54–0.62)	1
15	Quantile normalization with target	Batch correction	Scaled	0.87 (0.85–0.88)	0.59 (0.54–0.63)	1
16	Feature specific quantile normalization	Batch correction	Scaled	0.77 (0.77–0.77)	0.34 (0.30–0.39)	1

Values are the median of each metric across the five models evaluated on the outer folds of cross-validation; the 95% confidence interval is given in parentheses. Statistical significance was determined with Student's t-test.
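For readers unfamiliar with the two metrics in the table, the following is a minimal sketch of how a micro-averaged AUROC and a weighted F1-score can be computed for a multi-class classifier. It is not the authors' code: the function names, the class labels, and the toy labels and scores are all hypothetical and chosen only for illustration. The AUROC is computed with the rank-based (Mann-Whitney) formulation rather than by tracing a ROC curve.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to each class's support."""
    support = Counter(y_true)
    total = 0.0
    for cls, n_true in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        n_pred = sum(1 for p in y_pred if p == cls)
        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true
        f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
        total += n_true * f1
    return total / len(y_true)


def micro_auroc(y_true, scores, classes):
    """Micro-averaged AUROC: pool every (sample, class) pair into one binary task.

    scores[i][k] is the model's score for sample i belonging to classes[k].
    """
    pos, neg = [], []
    for label, row in zip(y_true, scores):
        for cls, s in zip(classes, row):
            (pos if label == cls else neg).append(s)
    # AUROC equals the probability that a random positive pair outranks a
    # random negative pair, with ties counted as one half.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data (hypothetical, not taken from the study).
classes = ["A", "B", "C"]
y_true = ["A", "A", "B", "C"]
y_pred = ["A", "B", "B", "C"]
scores = [
    [0.7, 0.2, 0.1],
    [0.4, 0.5, 0.1],
    [0.2, 0.6, 0.2],
    [0.1, 0.3, 0.6],
]

print(round(weighted_f1(y_true, y_pred), 4))  # 0.75
print(micro_auroc(y_true, scores, classes))   # 0.96875
```

In a setting like the one the footnote describes, each of the five outer-fold models would yield one value per metric, and the table entry would be the median of those five values.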


ISSN: 1471-2105
