Leveraging permutation testing to assess confidence in positive-unlabeled learning applied to high-dimensional biological datasets

Table 1 Description of the datasets and positive-unlabeled settings evaluated

	Dataset	#Instances	#Attributes	#True negatives (%)	#Known positives (%)
a	Synthetic (ClassSep = 2)	200	200	100 (50)	40 (20)
				60 (30)
				20 (10)
b	Synthetic (ClassSep = 1)	200	200	100 (50)	40 (20)
				60 (30)
				20 (10)
c	Synthetic (ClassSep = 0)	200	200	100 (50)	40 (20)
				60 (30)
				20 (10)
d	Wisconsin Diagnostic Breast Cancer (WDBC) https://doi.org/10.24432/C5DW2B	400	32	200 (50)	80 (20)
				120 (30)
				43 (10.8)
		569	32	212 (37.3)	25 (4.40)
					50 (8.79)
					75 (13.2)
					100 (17.6)
e	TCGA-BRCA/LUAD dbGaP Study Accession: phs000178 https://portal.gdc.cancer.gov/projects/TCGA-LUAD	441	20,531	141 (32.0)	25 (5.67)
					50 (11.3)
					75 (17.0)
					100 (22.7)
f	Lakhashe study dataset	108	195	36 (33.3)	10 (9.30)
					20 (18.5)
					30 (27.8)
					40 (37.0)
g	Synthetic (ClassSep = 1)	100	200	30 (30.0)	20 (20)
g	Synthetic (ClassSep = 1)	100	200	30 (30.0)	40 (40)

ISSN: 1471-2105