Fig. 4
From: Fast noisy long read alignment with multi-level parallelism

Non-equal sequence distribution optimization. a Blocking communication. The CPU processing time (\(T_{cpu}\)) and the accelerator processing time (\(T_{dev}\)) are equal, and the program’s communication and computation cannot overlap. The total runtime is the sum of the communication overhead (\(O_c\)), the communication time (\(T_{send}\) and \(T_{recv}\)), and the CPU processing time, i.e., \(O_c + T_{send} + T_{cpu} + O_c + T_{recv}\). b Non-blocking communication. The CPU and accelerator processing times are equal, and the program’s communication and computation can overlap. After finishing its own processing, the CPU needs to wait for the accelerators to finish. The total runtime is the sum of the communication overhead, the communication time, and the accelerator processing time, i.e., \(O_c + T_{send} + T_{dev} + O_c + T_{recv}\). c Non-blocking communication with the non-equal sequence distribution strategy (NESD-s). The accelerators process shorter sequences, while the CPU processes longer sequences. The CPU processing time equals the sum of the communication overhead, the communication time, and the accelerator processing time, i.e., \(T_{cpu} = T_{send} + T_{dev} + O_c + T_{recv}\). d Non-blocking communication with the non-equal sequence distribution strategy (NESD-l). The accelerators process longer sequences, while the CPU processes shorter sequences. Both communication and computation times increase, resulting in a longer total runtime for the program.
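
The runtime expressions in panels a to c can be evaluated with a short sketch like the one below. It is only an illustration of the caption's formulas, not the authors' implementation: the `Timings` class, the function names, and the sample numbers are hypothetical, and all times are assumed to be in the same arbitrary unit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Timings:
    """Hypothetical container for the symbols used in the caption."""
    o_c: int     # communication overhead O_c (paid once per transfer)
    t_send: int  # time to send sequences to the accelerators, T_send
    t_recv: int  # time to receive results from the accelerators, T_recv
    t_cpu: int   # CPU processing time, T_cpu
    t_dev: int   # accelerator processing time, T_dev


def blocking_runtime(t: Timings) -> int:
    # Panel a: O_c + T_send + T_cpu + O_c + T_recv
    return t.o_c + t.t_send + t.t_cpu + t.o_c + t.t_recv


def nonblocking_runtime(t: Timings) -> int:
    # Panel b: O_c + T_send + T_dev + O_c + T_recv
    return t.o_c + t.t_send + t.t_dev + t.o_c + t.t_recv


def nesd_s_balanced_cpu_time(t: Timings) -> int:
    # Panel c: the CPU time that satisfies
    # T_cpu = T_send + T_dev + O_c + T_recv
    return t.t_send + t.t_dev + t.o_c + t.t_recv


if __name__ == "__main__":
    # Made-up numbers, chosen only to exercise the formulas.
    equal = Timings(o_c=1, t_send=2, t_recv=2, t_cpu=10, t_dev=10)
    print("blocking:", blocking_runtime(equal))
    print("non-blocking:", nonblocking_runtime(equal))

    # With shorter sequences on the accelerators, T_dev shrinks; the CPU
    # workload that matches the condition in panel c follows from it.
    shorter = Timings(o_c=1, t_send=2, t_recv=2, t_cpu=0, t_dev=6)
    print("balanced T_cpu (NESD-s):", nesd_s_balanced_cpu_time(shorter))
```

With \(T_{cpu} = T_{dev}\), the expressions in panels a and b evaluate to the same number; the two panels differ in whether communication and computation can overlap, not in the formula. The panel c helper returns only the CPU time that meets the stated balance condition, since the caption does not give a total-runtime formula for that case.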