Fig. 1

Workflow. Black boxes show the components of the overall workflow. DeriveFeatures is a component that uses four bioinformatics tools to derive four feature sets (RAW, MAP, LOC, TSS) from the FASTQ files (.fastq). seqQscorer computes P_low, the probability that a sample is of low quality. We used seqQscorer's generic model, which was derived from 2642 labeled samples and uses a random forest as its classification algorithm. We used the salmon tool to quantify gene expression and DESeq2 for rlog normalization [19, 20].
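To make the classification step more concrete, the following is a minimal Python sketch of how a random forest can turn a sample's features into a low-quality probability P_low. It assumes that the four feature sets are concatenated in a fixed order into one vector and that each fitted tree returns the low-quality class probability stored at the leaf the sample reaches. The names `combine_features`, `p_low`, and the toy trees are hypothetical illustrations. They do not reflect seqQscorer's actual code, API, or trained model.

```python
from statistics import mean
from typing import Callable, Dict, List

# Hypothetical type: a fitted decision tree maps a feature vector to the
# probability of the "low quality" class at the leaf the sample lands in.
Tree = Callable[[List[float]], float]

# Assumed fixed order for concatenating the four feature sets.
FEATURE_SETS = ("RAW", "MAP", "LOC", "TSS")


def combine_features(features: Dict[str, List[float]]) -> List[float]:
    """Concatenate the four feature sets, in FEATURE_SETS order, into one vector."""
    missing = [name for name in FEATURE_SETS if name not in features]
    if missing:
        raise ValueError(f"missing feature sets: {missing}")
    vector: List[float] = []
    for name in FEATURE_SETS:
        vector.extend(features[name])
    return vector


def p_low(forest: List[Tree], vector: List[float]) -> float:
    """Forest-level probability: the mean of the per-tree class probabilities."""
    return mean(tree(vector) for tree in forest)


if __name__ == "__main__":
    # Toy feature values and toy trees, purely illustrative.
    sample = {"RAW": [0.2], "MAP": [0.9], "LOC": [0.1], "TSS": [0.7]}
    toy_forest: List[Tree] = [
        lambda v: 1.0 if v[0] < 0.5 else 0.0,  # splits on the RAW feature
        lambda v: 1.0 if v[1] < 0.5 else 0.0,  # splits on the MAP feature
        lambda v: 0.5,                         # an impure leaf
    ]
    print(p_low(toy_forest, combine_features(sample)))
```

Averaging per-tree class probabilities mirrors how scikit-learn's `RandomForestClassifier.predict_proba` combines its trees. With fully grown trees, each leaf is pure, so this average reduces to the fraction of trees voting "low quality".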