Fig. 5

Statistical evaluation of scores from actual versus permuted label groups in real-world biological datasets. A Scores obtained from actual (yellow) and permuted (orange) P/U labels with two scoring methods (EPR, left; MBS, right) under varying numbers of KP samples (NKP) for the WDBC (top), BRAC/LUAC (middle), and Lakhashe et al. (bottom) datasets. The number of samples is indicated in parentheses. Boxplots depict the median (bar), mean (point), and interquartile range (IQR); error bars depict the mean and standard deviation (SD). B AUC for U-set samples, calculated by comparing the class 1 probability from the PU bagging SVM with the ground-truth labels. C Statistical significance from the z-score of the mean score in the actual label group relative to the distribution of scores in the permuted label group. Dashed line: p value = 0.05. D Effect size (Cliff’s Delta) estimated between the score distributions of the actual and permuted label groups.
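
The two statistics named in panels C and D can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `z_score_p_value` and `cliffs_delta` are hypothetical, and the use of the sample SD and a one-sided (upper-tail) normal p value are assumptions, since the caption does not specify them.

```python
import math
import statistics


def z_score_p_value(actual_scores, permuted_scores):
    """z-score of the mean actual score against the permuted-score distribution.

    Assumptions (not stated in the caption): the sample standard deviation
    is used, and the p value is one-sided (upper tail) under a normal
    approximation.
    """
    mu = statistics.mean(permuted_scores)
    sd = statistics.stdev(permuted_scores)  # sample SD (n - 1 denominator)
    z = (statistics.mean(actual_scores) - mu) / sd
    # Upper-tail probability of a standard normal variable exceeding z.
    p = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p


def cliffs_delta(xs, ys):
    """Cliff's Delta: (#pairs with x > y minus #pairs with x < y) / (n_x * n_y)."""
    greater = sum(1 for x in xs for y in ys if x > y)
    less = sum(1 for x in xs for y in ys if x < y)
    return (greater - less) / (len(xs) * len(ys))
```

Cliff's Delta ranges from -1 to 1: a value of 1 means every score in the first group exceeds every score in the second, and 0 means the two groups overlap completely in this pairwise sense.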