Fig. 2
From: Prediction of virus-host infectious association by supervised learning methods

Scheme of the random forest (RF) method for predicting the hosts of viral contigs of different lengths. For each of the 9 main host genera, we produced 4 training datasets with different sequence lengths by breaking the whole viral genomes into nonoverlapping contigs of lengths 1, 3, and 5 kbp, plus the whole genomes, with 0.05% sequencing errors added. Similarly, we generated 4 testing datasets with different contig lengths. We then evaluated the performance of RF trained on each training dataset of a specific sequence length on all 4 testing datasets.
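
The dataset construction described in this caption can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, and several details are assumptions the caption does not state, namely that trailing fragments shorter than the contig length are dropped, that sequencing errors are modeled as uniform base substitutions, and that errors are applied to the contigs as well as to the whole genomes.

```python
import random

BASES = "ACGT"

def split_into_contigs(genome, length):
    """Break a genome into nonoverlapping contigs of exactly `length` bases.

    Assumption: a trailing fragment shorter than `length` is dropped.
    """
    return [genome[i:i + length]
            for i in range(0, len(genome) - length + 1, length)]

def add_substitution_errors(seq, rate, rng):
    """Replace each base by a different base with probability `rate`.

    Assumption: sequencing errors are modeled as substitutions only.
    """
    out = []
    for base in seq:
        if rng.random() < rate:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)

def build_datasets(genomes, contig_lengths=(1000, 3000, 5000),
                   error_rate=0.0005, seed=0):
    """Build the 4 datasets: 1, 3, 5 kbp contigs and whole genomes.

    error_rate=0.0005 corresponds to the 0.05% stated in the caption.
    """
    rng = random.Random(seed)
    datasets = {}
    for length in contig_lengths:
        datasets[length] = [
            add_substitution_errors(contig, error_rate, rng)
            for genome in genomes
            for contig in split_into_contigs(genome, length)
        ]
    datasets["whole"] = [add_substitution_errors(g, error_rate, rng)
                         for g in genomes]
    return datasets

def evaluation_pairs(train_sets, test_sets):
    """Every (training length, testing length) combination: 4 x 4 = 16."""
    return [(tr, te) for tr in train_sets for te in test_sets]
```

In this sketch, `build_datasets` would be called once on the training genomes and once on the testing genomes of each host genus, and a classifier trained on each training length would then be scored on every testing length listed by `evaluation_pairs`.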