Table 2 List of benchmark datasets

From: An effective heuristic for developing hybrid feature selection in high dimensional and low sample size datasets

| No | Dataset | Disease | Instances | Features | Dimensionality Index* | Balance Ratio** |
|---|---|---|---|---|---|---|
| 1 | ALLAML | Leukemia | 72 | 7129 | 2.07 | 0.347 |
| 2 | alon | Colon Cancer | 62 | 2000 | 1.84 | 0.355 |
| 3 | borovecki | Huntington | 31 | 22,283 | 2.92 | 0.452 |
| 4 | chiaretti | Leukemia | 128 | 12,625 | 1.95 | 0.422 |
| 5 | chin | Breast Cancer | 118 | 22,215 | 2.1 | 0.364 |
| 6 | chowdary | Breast Cancer | 104 | 22,283 | 2.16 | 0.404 |
| 7 | GLI_85 | Gliomas | 85 | 22,283 | 2.25 | 0.306 |
| 8 | gordon | Lung Cancer | 181 | 12,533 | 1.82 | 0.171 |
| 9 | gravier | Breast Cancer | 168 | 2905 | 1.56 | 0.339 |
| 10 | pomeroy | CNS Tumor | 60 | 7128 | 2.17 | 0.35 |
| 11 | Prostate_GE | Prostate Cancer | 102 | 5966 | 1.88 | 0.49 |
| 12 | shipp | Lymphoma | 77 | 7129 | 2.04 | 0.247 |
| 13 | singh | Prostate Cancer | 102 | 12,600 | 2.04 | 0.49 |
| 14 | SMK_CAN_187 | Lung cancer | 187 | 19,993 | 1.89 | 0.481 |
| 15 | subramanian | N/A | 50 | 10,100 | 2.36 | 0.34 |
| 16 | tian | Myeloma | 173 | 12,625 | 1.83 | 0.208 |
| 17 | west | Breast Cancer | 49 | 7129 | 2.28 | 0.49 |
| 18 | arcene | | 200 | 10,000 | 1.74 | 0.44 |
| 19 | gisette | | 7000 | 5000 | 0.96 | 0.5 |
| 20 | Hill_valley | | 1212 | 100 | 0.65 | 0.495 |
| 21 | ionosphere | | 351 | 33 | 0.6 | 0.359 |
| 22 | madelon | | 2600 | 500 | 0.79 | 0.5 |
| 23 | sonar | | 208 | 60 | 0.77 | 0.466 |
| 24 | wdbc | | 569 | 30 | 0.54 | 0.373 |

  1. * Dimensionality index = log(number of features) / log(number of instances). It measures how high-dimensional a dataset is relative to its sample size; a value above 1 means the dataset has more features than instances.
  2. ** Balance ratio: the proportion of samples in the minority (smaller) class relative to the entire dataset. A value of 0.5 is ideal for a binary-class dataset.
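
The two footnote definitions can be checked with a short script. The following is a minimal Python sketch, not code from the article: the function names `dimensionality_index` and `balance_ratio` are illustrative, and the 47/25 class split used for the 72 ALLAML samples is an assumption made here so that the result can be compared with the table.

```python
import math
from collections import Counter


def dimensionality_index(n_features, n_instances):
    """log(number of features) / log(number of instances), per the table footnote."""
    if n_features < 1 or n_instances < 2:
        # log(1) == 0 would make the denominator zero
        raise ValueError("need n_features >= 1 and n_instances >= 2")
    # The ratio of two logarithms does not depend on the base
    return math.log(n_features) / math.log(n_instances)


def balance_ratio(labels):
    """Share of samples that belong to the smallest class."""
    counts = Counter(labels)
    if not counts:
        raise ValueError("labels must not be empty")
    return min(counts.values()) / sum(counts.values())


# ALLAML row: 72 instances, 7129 features
print(round(dimensionality_index(7129, 72), 2))  # 2.07

# Assumed 47/25 split of the 72 samples (illustrative labels only)
labels = ["a"] * 47 + ["b"] * 25
print(round(balance_ratio(labels), 3))  # 0.347
```

Because the index is a ratio of logarithms, it exceeds 1 exactly when the number of features exceeds the number of instances, which is the case for rows 1 to 18 of the table.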