A comparison of RNA-Seq data preprocessing pipelines for transcriptomic predictions across independent studies

Van, Richard; Alvarez, Daniel; Mize, Travis; Gannavarapu, Sravani; Chintham Reddy, Lohitha; Nasoz, Fatma; Han, Mira V.

doi:10.1186/s12859-024-05801-x

BMC Bioinformatics

Table 2 Overall performance metrics of the SVM classifier for each data preprocessing combination, evaluated on the GTEx test set (related to Fig. 2)

Index	Normalization	Batch effect correction	Scaling	Micro-averaged AUROC	Weighted F1-score	p value
1	Unnormalized	No batch correction	Unscaled	0.94 (0.93–0.95)	0.71 (0.66–0.72)	Baseline
2	Quantile normalization	No batch correction	Unscaled	0.93 (0.92–0.94)	0.71 (0.68–0.73)	0.2963
3	Quantile normalization with target	No batch correction	Unscaled	0.93 (0.92–0.94)	0.70 (0.68–0.72)	0.3133
4	Feature specific quantile normalization	No batch correction	Unscaled	0.92 (0.91–0.93)	0.66 (0.63–0.67)	0.9636
5	Unnormalized	Batch correction	Unscaled	0.98 (0.96–0.98)	0.76 (0.74–0.77)	0.0049**
6	Quantile normalization	Batch correction	Unscaled	0.97 (0.96–0.97)	0.75 (0.73–0.76)	0.0089**
7	Quantile normalization with target	Batch correction	Unscaled	0.97 (0.96–0.97)	0.75 (0.74–0.75)	0.0073**
8	Feature specific quantile normalization	Batch correction	Unscaled	0.96 (0.94–0.97)	0.73 (0.72–0.73)	0.0339*
9	Unnormalized	No batch correction	Scaled	0.92 (0.90–0.93)	0.70 (0.67–0.70)	0.6009
10	Quantile normalization	No batch correction	Scaled	0.90 (0.89–0.91)	0.68 (0.67–0.69)	0.7241
11	Quantile normalization with target	No batch correction	Scaled	0.89 (0.87–0.90)	0.68 (0.67–0.69)	0.7298
12	Feature specific quantile normalization	No batch correction	Scaled	0.91 (0.89–0.91)	0.69 (0.64–0.71)	0.3715
13	Unnormalized	Batch correction	Scaled	0.97 (0.96–0.98)	0.76 (0.75–0.77)	0.0026**
14	Quantile normalization	Batch correction	Scaled	0.96 (0.96–0.97)	0.77 (0.76–0.77)	0.0009***
15	Quantile normalization with target	Batch correction	Scaled	0.96 (0.96–0.97)	0.76 (0.75–0.77)	0.0016**
16	Feature specific quantile normalization	Batch correction	Scaled	0.96 (0.95–0.97)	0.73 (0.72–0.74)	0.0305*

Values are the median of each metric across the five models evaluated on the outer folds of cross-validation; values in parentheses give the 95% confidence interval. Statistical significance relative to the baseline (index 1) was determined with Student's t-test. *p < 0.05; **p < 0.01; ***p < 0.001
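
The 16 rows cover every combination of four normalization options, batch correction on or off, and scaling on or off. As a minimal, stdlib-only sketch of the normalization step (not the authors' implementation), the code below performs classic quantile normalization, which gives every sample the same value distribution. Each inner list holds one sample's expression values across the same genes. The table does not define "quantile normalization with target"; the `normalize_to_target` helper assumes it means mapping a sample onto a reference distribution fixed in advance (for example, one computed from the training samples), in the spirit of `normalize.quantiles.use.target` in Bioconductor's preprocessCore. Feature-specific quantile normalization, batch correction and scaling are not sketched, and all function names are hypothetical.

```python
from statistics import mean


def _rank_order(values):
    """Indices of `values` in ascending order (ties keep their original order)."""
    return sorted(range(len(values)), key=lambda i: values[i])


def quantile_target(samples):
    """Reference distribution: mean of the k-th smallest value across all samples."""
    lengths = {len(s) for s in samples}
    if len(lengths) != 1:
        raise ValueError("all samples must have the same number of features")
    sorted_samples = [sorted(s) for s in samples]
    # Element-wise means of ascending lists are themselves ascending.
    return [mean(vals) for vals in zip(*sorted_samples)]


def normalize_to_target(sample, target):
    """Replace the k-th smallest value in `sample` with target[k] (target sorted ascending)."""
    if len(sample) != len(target):
        raise ValueError("sample and target must have the same number of features")
    normalized = [0.0] * len(sample)
    for rank, idx in enumerate(_rank_order(sample)):
        normalized[idx] = target[rank]
    return normalized


def quantile_normalize(samples):
    """Classic quantile normalization: every sample ends up with the same distribution."""
    target = quantile_target(samples)
    return [normalize_to_target(s, target) for s in samples]
```

For two samples `[5, 2, 3]` and `[4, 1, 6]`, the target is `[1.5, 3.5, 5.5]` and the normalized samples are `[5.5, 1.5, 3.5]` and `[3.5, 1.5, 5.5]`. A held-out sample can then be mapped with `normalize_to_target(new_sample, target)` without recomputing the target. Ties are broken by position here, whereas established implementations typically handle tied ranks more carefully.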

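The footnote names Student's t-test but does not say which metric's per-fold values were compared with the baseline (index 1), whether the test was paired, or whether it was one- or two-sided. The stdlib-only sketch below is therefore an assumption rather than the authors' procedure: an unpaired, pooled-variance two-sample t-test, with sidedness left as the `alternative` parameter. The p value comes from the t distribution's survival function, computed through the regularized incomplete beta function; `student_t_test`, `t_sf` and the helpers are hypothetical names.

```python
import math
import statistics


def _betacf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the regularized incomplete beta (modified Lentz's method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def _betainc(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    front = math.exp(log_front)
    # Use I_x(a, b) = 1 - I_{1-x}(b, a) where the fraction converges faster.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def t_sf(t, df):
    """Survival function P(T > t) of Student's t distribution with df degrees of freedom."""
    tail = 0.5 * _betainc(df / 2.0, 0.5, df / (df + t * t))  # P(T > |t|)
    return tail if t >= 0 else 1.0 - tail


def student_t_test(a, b, alternative="two-sided"):
    """Unpaired two-sample Student's t-test with pooled variance.

    alternative: "two-sided", "greater" (mean(a) > mean(b)) or "less".
    Returns (t statistic, p value).
    """
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * statistics.variance(a)
                  + (n2 - 1) * statistics.variance(b)) / df
    se = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))
    if se == 0.0:
        raise ValueError("both samples are constant; t is undefined")
    t = (statistics.mean(a) - statistics.mean(b)) / se
    if alternative == "two-sided":
        p = 2.0 * t_sf(abs(t), df)
    elif alternative == "greater":
        p = t_sf(t, df)
    elif alternative == "less":
        p = t_sf(-t, df)
    else:
        raise ValueError(f"unknown alternative: {alternative!r}")
    return t, p
```

With two lists of five per-fold scores (one per outer fold, matching the five models in the footnote; per-fold values are not reported in this table, so any inputs are your own), `statistics.median(candidate)` gives the kind of summary value shown in the table, and `student_t_test(candidate, baseline, alternative="greater")` tests for an improvement over the baseline. In SciPy, `scipy.stats.ttest_ind(candidate, baseline, alternative="greater")` runs the same pooled-variance test by default.
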
ISSN: 1471-2105