GPU Gems 3 - Chapter 39. Parallel Prefix Sum (Scan) with CUDA
39.1.1 Sequential Scan and Work Efficiency

Implementing a sequential version of scan (that could be run in a single thread on a CPU, for example) is trivial. We simply loop over all the elements in the input array and add the value of the previous element...
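A single-threaded loop of this kind might look like the following minimal sketch in C. It assumes an exclusive scan (each output element sums only the inputs before it, so the first output is the identity, 0) over `int` arrays. The function name `sequential_exclusive_scan` is hypothetical. The point to notice is the work count: one addition per element after the first, so n − 1 additions in total, which is the O(n) baseline that a parallel scan must match to be work-efficient.

```c
#include <stddef.h>

/* Hypothetical helper: sequential exclusive scan with the + operator.
   out[0] = 0 (identity); out[j] = in[0] + ... + in[j-1] for j >= 1.
   Performs length - 1 additions, i.e. O(n) work.
   Assumption: out and in do not overlap (in-place use would read
   already-overwritten inputs). */
void sequential_exclusive_scan(int *out, const int *in, size_t length)
{
    if (length == 0)
        return;                          /* nothing to scan */
    out[0] = 0;                          /* identity element for addition */
    for (size_t j = 1; j < length; ++j)
        out[j] = out[j - 1] + in[j - 1]; /* add previous input to running sum */
}
```

For example, scanning the input `{3, 1, 7, 0, 4}` produces `{0, 3, 4, 11, 11}`. Each iteration depends on the result of the previous one, which is why this loop cannot be parallelized directly.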