Fig. 4 | BMC Bioinformatics

From: Leveraging permutation testing to assess confidence in positive-unlabeled learning applied to high-dimensional biological datasets

Statistical comparison of scores between the actual and permuted label groups in PU simulations on synthetic datasets. Explicit positive recall (EPR, left) and mean bagging score (MBS, right) are shown for varying classification difficulties. Lines are colored according to the underlying ground-truth separation distance between the U and P classes, and the x-axis indicates the composition of the unlabeled class (% true negatives). A Mean and standard deviation of the scores in the actual label group over 30 repetitions. B Cliff's Delta estimate between scores obtained with the actual P/U labels and scores obtained with multiple sets of permuted labels. Error bars represent the 95% confidence interval of the estimate. From bottom (light gray) to top (black), the reference lines mark the boundaries between negligible, small, medium, and large differences between groups, as defined for the Cliff's Delta statistic. C Statistical significance as determined by a one-tailed z-score. The dashed line indicates p = 0.05.
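The three statistics in panels A to C can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Cliff's Delta is computed from all pairwise comparisons between actual-label and permuted-label scores. The negligible/small/medium/large cut-offs (0.147, 0.33, 0.474) are the commonly used Romano et al. thresholds and are assumed here, because the caption does not give numeric values. The 95% interval uses a percentile bootstrap as a stand-in, since the caption does not say how the interval was obtained. The z-score treats the permuted-label scores as a null distribution and reports the upper-tail p value. All function names are hypothetical.

```python
import math
import random
import statistics
from typing import List, Sequence, Tuple


def cliffs_delta(actual: Sequence[float], permuted: Sequence[float]) -> float:
    """Cliff's Delta = P(X > Y) - P(X < Y) over all (actual, permuted) pairs."""
    greater = sum(1 for x in actual for y in permuted if x > y)
    less = sum(1 for x in actual for y in permuted if x < y)
    return (greater - less) / (len(actual) * len(permuted))


def delta_magnitude(delta: float) -> str:
    """Label |delta| with assumed thresholds (Romano et al.), not values from the figure."""
    d = abs(delta)
    if d < 0.147:
        return "negligible"
    if d < 0.33:
        return "small"
    if d < 0.474:
        return "medium"
    return "large"


def bootstrap_ci(actual: Sequence[float], permuted: Sequence[float],
                 n_boot: int = 1000, alpha: float = 0.05,
                 seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap CI for Cliff's Delta (stand-in; the paper's method may differ)."""
    rng = random.Random(seed)
    boots: List[float] = []
    for _ in range(n_boot):
        # Resample each group independently, with replacement.
        a = rng.choices(list(actual), k=len(actual))
        p = rng.choices(list(permuted), k=len(permuted))
        boots.append(cliffs_delta(a, p))
    boots.sort()
    lo = boots[round((alpha / 2) * (n_boot - 1))]
    hi = boots[round((1 - alpha / 2) * (n_boot - 1))]
    return lo, hi


def one_tailed_p(actual_score: float,
                 permuted_scores: Sequence[float]) -> Tuple[float, float]:
    """z-score of the actual score against the permuted null; p = P(Z >= z)."""
    mu = statistics.mean(permuted_scores)
    sd = statistics.stdev(permuted_scores)
    z = (actual_score - mu) / sd
    p = 0.5 * math.erfc(z / math.sqrt(2))  # upper tail of the standard normal
    return z, p
```

For example, if every actual-label score exceeds every permuted-label score, `cliffs_delta` returns 1.0 and `delta_magnitude` labels the difference "large". A p value from `one_tailed_p` below 0.05 corresponds to a point above the dashed line in panel C, assuming that panel plots significance on a scale where larger means more significant.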
