Table 1 Description of the datasets and positive-unlabeled settings evaluated

From: Leveraging permutation testing to assess confidence in positive-unlabeled learning applied to high-dimensional biological datasets

 

| | Datasets | #Instances | #Attributes | #True negative (%) | #Known positive (%) |
|---|---|---|---|---|---|
| a | Synthetic (ClassSep = 2) | 200 | 200 | 100 (50) | 40 (20); 60 (30); 20 (10) |
| b | Synthetic (ClassSep = 1) | 200 | 200 | 100 (50) | 40 (20); 60 (30); 20 (10) |
| c | Synthetic (ClassSep = 0) | 200 | 200 | 100 (50) | 40 (20); 60 (30); 20 (10) |
| d | Wisconsin Diagnostic Breast Cancer (WDBC), https://doiorg.publicaciones.saludcastillayleon.es/10.24432/C5DW2B | 400 | 32 | 200 (50) | 80 (20); 120 (30); 43 (10.8) |
| | | 569 | 32 | 212 (37.3) | 25 (4.40); 50 (8.79); 75 (13.2); 100 (17.6) |
| e | TCGA-BRCA/LUAD, dbGaP Study Accession: phs000178, https://portal.gdc.cancer.gov/projects/TCGA-LUAD | 441 | 20,531 | 141 (32.0) | 25 (5.67); 50 (11.3); 75 (17.0); 100 (22.7) |
| f | Lakhashe study dataset | 108 | 195 | 36 (33.3) | 10 (9.30); 20 (18.5); 30 (27.8); 40 (37.0) |
| g | Synthetic (ClassSep = 1) | 100 | 200 | 30 (30.0) | 20 (20); 40 (40) |
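
As an illustration of how a positive-unlabeled setting such as those in rows a to c can be constructed, the following minimal sketch hides all labels except a chosen number of known positives. It rests on assumptions not stated in the table: that every instance that is not a true negative is a true positive, that known positives are drawn uniformly at random from the true positives, and that the reported percentage is taken over all instances. The function name `make_pu_labels` and the `seed` parameter are hypothetical, not the authors' code.

```python
import random


def make_pu_labels(y_true, n_known_positive, seed=0):
    """Return PU labels: 1 for a known positive, 0 for an unlabeled instance.

    Assumption: known positives are sampled uniformly at random
    from the true positives (hypothetical helper, for illustration only).
    """
    positives = [i for i, y in enumerate(y_true) if y == 1]
    if n_known_positive > len(positives):
        raise ValueError("more known positives requested than true positives available")
    rng = random.Random(seed)
    known = set(rng.sample(positives, n_known_positive))
    return [1 if i in known else 0 for i in range(len(y_true))]


# Row a: 200 instances with 100 true negatives (50%),
# so 100 true positives under the assumption stated above.
y_true = [1] * 100 + [0] * 100

# The three known-positive settings listed for rows a to c.
for n_known in (40, 60, 20):
    pu = make_pu_labels(y_true, n_known)
    pct = 100 * sum(pu) / len(pu)
    print(f"{n_known} known positives -> {pct:.0f}% of all instances")
```

The printed percentages (20, 30, and 10) match the values in parentheses for rows a to c, which supports reading the known-positive percentage as a share of all instances rather than of the true positives.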