Figure 2
From: Using distances between Top-n-gram and residue pairs for protein remote homology detection

Algorithm for constructing the distance-based Top-1-gram feature vector. The inputs are the Top-1-gram sequence S' and the distance threshold d_MAX; the output is the feature vector of distance-based Top-1-grams. The array Index[] holds the index of each Top-1-gram in the alphabet Ӑ, whose size is 20. For example, index 0 denotes the first Top-1-gram in Ӑ (t_1 = A), and index 19 denotes the last Top-1-gram in Ӑ (t_19 = V).
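
The figure itself is not reproduced here, so the following is only a minimal Python sketch of how such a vector could be built. It rests on several assumptions that the caption does not state: the vector stores single Top-1-gram counts (distance 0) followed by counts of ordered Top-1-gram pairs at each distance d = 1, ..., d_MAX; the alphabet ordering between A and V is the conventional one; and the names ALPHABET, INDEX, and distance_top1gram_vector are hypothetical.

```python
from typing import List

# Assumed ordering of the 20 Top-1-grams. The caption only fixes the first
# (A, index 0) and the last (V, index 19); the rest is an assumption.
ALPHABET = "ARNDCQEGHILKMFPSTWYV"
INDEX = {t: i for i, t in enumerate(ALPHABET)}


def distance_top1gram_vector(seq: str, d_max: int) -> List[int]:
    """Count Top-1-grams and ordered Top-1-gram pairs up to distance d_max.

    Assumed layout: the first 20 entries are single Top-1-gram counts,
    followed by one 20 x 20 block of pair counts per distance d = 1..d_max.
    """
    n = len(ALPHABET)
    vec = [0] * (n + n * n * d_max)
    for i, a in enumerate(seq):
        vec[INDEX[a]] += 1  # distance 0: occurrence of the Top-1-gram itself
        for d in range(1, d_max + 1):
            j = i + d
            if j >= len(seq):
                break  # no partner at this or any larger distance
            offset = n + (d - 1) * n * n  # start of the block for distance d
            vec[offset + INDEX[a] * n + INDEX[seq[j]]] += 1
    return vec
```

For a Top-1-gram sequence such as "AVA" with d_MAX = 2, this sketch yields a vector of length 20 + 2 * 400 = 820 in which entry 0 counts the two occurrences of A and entry 19 counts the single occurrence of V.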