Fig. 4
From: DNASimCLR: a contrastive learning-based deep learning approach for gene sequence data classification

Data processing flow. The original gene sequence data are one-hot encoded and converted into an image form that a convolutional neural network can process (“1,0,0,0” encodes base G, “0,1,0,0” base A, “0,0,1,0” base C, and “0,0,0,1” base T). Then 30% of the encoded bases are randomly selected and masked with “0,0,0,0”.
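
The encoding and masking steps in this caption can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names `one_hot_encode` and `mask_random_positions`, the `seed` parameter, and the choice to mask whole base positions are assumptions, and the reshaping of the encoded matrix into an image is omitted because the caption does not specify it.

```python
import random

# Base-to-vector mapping as stated in the caption.
BASE_TO_ONE_HOT = {
    "G": [1, 0, 0, 0],
    "A": [0, 1, 0, 0],
    "C": [0, 0, 1, 0],
    "T": [0, 0, 0, 1],
}


def one_hot_encode(sequence):
    """Return one 4-element vector per base.

    Symbols other than G, A, C, T raise KeyError; the caption does not
    say how such symbols are handled, so none is assumed here.
    """
    return [list(BASE_TO_ONE_HOT[base]) for base in sequence.upper()]


def mask_random_positions(encoded, mask_fraction=0.3, seed=None):
    """Replace a random fraction of positions with the all-zero vector.

    Assumption: masking is applied per base position, and the number of
    masked positions is the fraction of the length rounded to an integer.
    """
    rng = random.Random(seed)
    n_masked = round(mask_fraction * len(encoded))
    masked_positions = set(rng.sample(range(len(encoded)), n_masked))
    return [
        [0, 0, 0, 0] if index in masked_positions else list(row)
        for index, row in enumerate(encoded)
    ]


if __name__ == "__main__":
    encoded = one_hot_encode("GACTGACTGA")
    masked = mask_random_positions(encoded, mask_fraction=0.3, seed=0)
    for original_row, masked_row in zip(encoded, masked):
        print(original_row, "->", masked_row)
```

Passing a fixed `seed` makes the selection of masked positions reproducible; leaving it as `None` draws a different set of positions on each call.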