
Table 1 Calling and QC sub-pipeline report

From: Rare copy number variant analysis in case–control studies using SNP array data: a scalable and automated data analysis pipeline

Module 1: Data Conversion

Generate signal intensity file: 700,079 markers, 6,112 samples

Module 2: Data Calling

Raw data: 6,112 samples, 98,702 calls

Module 3: Data Clean

Filter                                 Initial samples  Final samples  Lost samples  Initial calls  Final calls  Lost calls
Default parameters*                    6,112            6,012          100           98,702         71,010       27,692
Clean immunoglobulin regions           6,012            6,012          0             71,010         70,436       574
Clean centromere and telomere regions  6,012            6,012          0             70,436         63,202       7,234
Merging calls**                        6,012            6,012          0             63,202         60,705       2,497
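The filter rows above combine sample-level QC with region-based removal of calls. The Python sketch below illustrates that logic under stated assumptions; it is not the pipeline's own code. `SampleQC`, `Call`, `passes_qc`, `apply_sample_qc`, `drop_overlapping` and `lost` are hypothetical names. The thresholds come from the table footnote, with NumCNV > 50 read as an exclusion criterion. A call is assumed to be removed if it overlaps an excluded region at all, and the region coordinates are supplied by the caller.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SampleQC:
    # Hypothetical per-sample record holding the QC metrics named in the footnote.
    sample_id: str
    lrr_sd: float
    baf_drift: float
    wf: float
    num_cnv: int


@dataclass(frozen=True)
class Call:
    # Hypothetical CNV call with inclusive start/end coordinates.
    sample_id: str
    chrom: str
    start: int
    end: int


def passes_qc(s: SampleQC) -> bool:
    """Assumed default thresholds: LRR_SD < 0.3, BAF_drift < 0.01, |WF| < 0.05, NumCNV <= 50."""
    return (s.lrr_sd < 0.3 and s.baf_drift < 0.01
            and abs(s.wf) < 0.05 and s.num_cnv <= 50)


def apply_sample_qc(samples: list[SampleQC], calls: list[Call]) -> tuple[set[str], list[Call]]:
    """Keep passing samples and drop every call belonging to a failed sample."""
    kept = {s.sample_id for s in samples if passes_qc(s)}
    return kept, [c for c in calls if c.sample_id in kept]


def drop_overlapping(calls: list[Call], regions: list[tuple[str, int, int]]) -> list[Call]:
    """Remove calls that overlap any excluded (chrom, start, end) region, such as
    immunoglobulin, centromere or telomere intervals (any overlap counts: an assumption)."""
    def hits(c: Call) -> bool:
        return any(c.chrom == chrom and c.start <= end and start <= c.end
                   for chrom, start, end in regions)
    return [c for c in calls if not hits(c)]


def lost(initial: int, final: int) -> int:
    """The 'Lost' columns equal Initial minus Final."""
    return initial - final
```

As a quick consistency check, `lost(98702, 71010)` gives 27,692 and `lost(63202, 60705)` gives 2,497, matching the table. The sketch covers only the thresholds listed in the footnote. The table does not give the interval coordinates used for the region filters.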

  1. The table summarizes the samples and calls retained and excluded at each module of the calling and QC sub-pipeline. After QC, 6,012 samples and 60,705 calls remain.
  2. *Default sample QC thresholds: LRR_SD < 0.3, BAF_drift < 0.01 and |WF| < 0.05; samples with NumCNV > 50 are excluded.
  3. **Merging fraction: 0.5 and 0.4. No calls were lost in this step. The number of calls decreased because two or more calls can be combined into a single call.
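The merging step reduces the call count without discarding any signal, because neighbouring calls are fused into one. The Python sketch below illustrates the idea under an assumed rule. Two adjacent calls with the same sample, chromosome and copy-number state are merged when the gap between them is smaller than `fraction` times the length of the merged span, gap included, measured in base pairs. This rule, the `Call` record and `merge_calls` are hypothetical illustrations, not the pipeline's documented implementation. The footnote lists two fraction values (0.5 and 0.4). Because how they were combined is not specified, the sketch takes a single value per call.

```python
from __future__ import annotations

from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Call:
    # Hypothetical CNV call: inclusive coordinates plus copy-number state.
    sample_id: str
    chrom: str
    start: int
    end: int
    state: int  # e.g. 1 = deletion, 3 = duplication


def merge_calls(calls: list[Call], fraction: float) -> list[Call]:
    """Merge neighbouring calls of the same sample, chromosome and state when
    gap < fraction * merged span (assumed rule, lengths in bp)."""
    ordered = sorted(calls, key=lambda c: (c.sample_id, c.chrom, c.start))
    merged: list[Call] = []
    for call in ordered:
        if merged:
            prev = merged[-1]
            if (prev.sample_id, prev.chrom, prev.state) == (call.sample_id, call.chrom, call.state):
                gap = call.start - prev.end - 1          # bases between the two calls
                span = max(prev.end, call.end) - prev.start + 1  # merged length incl. gap
                if gap < fraction * span:
                    # Extend the previous call instead of adding a new one.
                    merged[-1] = replace(prev, end=max(prev.end, call.end))
                    continue
        merged.append(call)
    return merged
```

For example, two 1,000 bp deletions separated by a 300 bp gap span 2,300 bp. They are merged with `fraction=0.5` (300 < 1,150) but kept apart with `fraction=0.1` (300 ≥ 230), so the output contains fewer calls while covering the same region.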