Table 5. Overall performance metrics of the classifier for each data preprocessing combination, evaluated against the ICGC/GEO test set (related to Fig. 3)

From: A comparison of RNA-Seq data preprocessing pipelines for transcriptomic predictions across independent studies

| Index | Normalization | Batch effect correction | Scaling | Micro-average AUROC | Weighted F1-score | p value |
|---|---|---|---|---|---|---|
| 1 | Unnormalized | No batch correction | Unscaled | 0.95 (0.94–0.96) | 0.80 (0.76–0.81) | Baseline |
| 2 | Quantile normalization | No batch correction | Unscaled | 0.95 (0.91–0.97) | 0.75 (0.61–0.81) | 0.8986 |
| 3 | Quantile normalization with target | No batch correction | Unscaled | 0.96 (0.92–0.97) | 0.79 (0.64–0.85) | 0.7418 |
| 4 | Feature specific quantile normalization | No batch correction | Unscaled | 0.79 (0.79–0.80) | 0.50 (0.47–0.51) | 1 |
| 5 | Unnormalized | Batch correction | Unscaled | 0.87 (0.85–0.88) | 0.57 (0.55–0.60) | 1 |
| 6 | Quantile normalization | Batch correction | Unscaled | 0.86 (0.85–0.87) | 0.56 (0.55–0.62) | 1 |
| 7 | Quantile normalization with target | Batch correction | Unscaled | 0.87 (0.85–0.87) | 0.59 (0.56–0.65) | 0.9999 |
| 8 | Feature specific quantile normalization | Batch correction | Unscaled | 0.75 (0.74–0.76) | 0.22 (0.20–0.25) | 1 |
| 9 | Unnormalized | No batch correction | Scaled | 0.94 (0.92–0.95) | 0.65 (0.63–0.66) | 0.8986 |
| 10 | Quantile normalization | No batch correction | Scaled | 0.91 (0.87–0.93) | 0.62 (0.60–0.64) | 1 |
| 11 | Quantile normalization with target | No batch correction | Scaled | 0.90 (0.86–0.93) | 0.64 (0.63–0.66) | 1 |
| 12 | Feature specific quantile normalization | No batch correction | Scaled | 0.80 (0.79–0.82) | 0.53 (0.51–0.55) | 1 |
| 13 | Unnormalized | Batch correction | Scaled | 0.84 (0.81–0.87) | 0.57 (0.50–0.60) | 1 |
| 14 | Quantile normalization | Batch correction | Scaled | 0.86 (0.85–0.87) | 0.58 (0.54–0.62) | 1 |
| 15 | Quantile normalization with target | Batch correction | Scaled | 0.87 (0.85–0.88) | 0.59 (0.54–0.63) | 1 |
| 16 | Feature specific quantile normalization | Batch correction | Scaled | 0.77 (0.77–0.77) | 0.34 (0.30–0.39) | 1 |

1. Values are the median of each metric across the five models evaluated on the outer folds of cross-validation; values in parentheses denote the 95% confidence interval. Statistical significance was determined with Student's t-test.
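The footnote's summary (median of five outer-fold scores, then a Student's t-test) can be sketched as below. This is a minimal, stdlib-only Python sketch, not the authors' code. The per-fold scores are hypothetical placeholders, not values from the study, and the helper names `student_t_two_sided_p` and `students_t_test` are illustrative. The source does not state whether the test was paired or one- or two-sided, nor how the 95% confidence intervals were computed. The sketch therefore assumes an unpaired, two-sided, pooled-variance test that compares one combination against the baseline (index 1, marked "Baseline" in the p value column), and it does not attempt the interval.

```python
import math
import statistics


def student_t_two_sided_p(t: float, df: int, steps: int = 10_000) -> float:
    """Two-sided p-value for Student's t distribution.

    Integrates the t pdf from 0 to |t| with Simpson's rule (steps must be even),
    so the sketch needs no SciPy dependency.
    """
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(x: float) -> float:
        return coef * (1 + x * x / df) ** (-(df + 1) / 2)

    a, b = 0.0, abs(t)
    h = (b - a) / steps
    total = pdf(a) + pdf(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(a + i * h)
    area = total * h / 3  # P(0 <= T <= |t|)
    return max(0.0, 1.0 - 2.0 * area)  # clamp tiny negative values from rounding


def students_t_test(a: list[float], b: list[float]) -> tuple[float, int, float]:
    """Unpaired two-sample Student's t-test with pooled variance (an assumption)."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    # statistics.variance is the sample variance (divides by n - 1)
    pooled = ((na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)) / df
    se = math.sqrt(pooled * (1 / na + 1 / nb))
    t = (statistics.mean(a) - statistics.mean(b)) / se
    return t, df, student_t_two_sided_p(t, df)


# Hypothetical outer-fold scores (five folds each); NOT data from the study.
baseline_folds = [0.94, 0.95, 0.95, 0.96, 0.95]
candidate_folds = [0.78, 0.79, 0.79, 0.80, 0.80]

print("baseline median:", statistics.median(baseline_folds))
print("candidate median:", statistics.median(candidate_folds))
t, df, p = students_t_test(candidate_folds, baseline_folds)
print(f"t = {t:.2f}, df = {df}, two-sided p = {p:.2e}")
```

Where SciPy is available, `scipy.stats.ttest_ind(a, b)` (which assumes equal variances by default) should give the same t statistic. The hand-rolled integration here only keeps the sketch dependency-free.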