We consider a vector V of N double-precision numbers. Successive steps of size s = 2, 3, ..., N are carried out. In step s, the maximum of the prefix V[0, s-1] is computed, and each element of that prefix is replaced by the maximum minus its current value: m = max V[0, s-1], then V[i] = m - V[i] for 0 <= i < s. The solution should use CUDA.
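As a correctness baseline, the steps can be written as a minimal serial sketch. The function name `process_serial` is hypothetical. It runs on the host (it compiles unchanged with nvcc) and performs roughly N²/2 element operations in total, since step s touches s elements.

```cuda
#include <cmath>

// Hypothetical serial reference: applies steps s = 2, ..., N to V in place.
// Intended only for validating a parallel (GPU) version.
void process_serial(int N, double *V) {
    for (int s = 2; s <= N; ++s) {
        // m = max V[0, s-1]
        double m = V[0];
        for (int i = 1; i < s; ++i) m = std::fmax(m, V[i]);
        // Replace each element of the prefix by m minus its current value.
        for (int i = 0; i < s; ++i) V[i] = m - V[i];
    }
}
```

For example, with V = {3, 1, 4, 1}: step 2 uses m = 3 and gives {0, 2, 4, 1}; step 3 uses m = 4 and gives {4, 2, 0, 1}; step 4 uses m = 4 and gives {0, 2, 4, 3}.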
Several problems are solved. For each problem, the function to parallelize has:
Input parameters:
-int N: the size of the vector
-double *V: the data array, updated in place
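Because step s reads the values produced by step s-1, the steps themselves are sequential; the parallelism is inside each step (a max-reduction over V[0, s-1], then an independent update of every element). A minimal CUDA sketch under stated assumptions: `process_cuda`, `steps_kernel` and `THREADS` are hypothetical names, V is assumed to be a host pointer, and a single thread block processes the whole vector so that `__syncthreads()` can separate the reduction and update phases of every step.

```cuda
#include <cuda_runtime.h>
#include <cfloat>

// Hypothetical block size; must be a power of two for the tree reduction.
constexpr int THREADS = 256;

// One block runs all steps s = 2, ..., N on the device copy of V.
__global__ void steps_kernel(int N, double *V) {
    __shared__ double red[THREADS];
    const int t = threadIdx.x;
    for (int s = 2; s <= N; ++s) {
        // Phase 1: each thread takes the max of a strided slice of V[0, s-1].
        double local = -DBL_MAX;
        for (int i = t; i < s; i += THREADS) local = fmax(local, V[i]);
        red[t] = local;
        __syncthreads();
        // Tree reduction in shared memory; red[0] ends up holding m.
        for (int stride = THREADS / 2; stride > 0; stride >>= 1) {
            if (t < stride) red[t] = fmax(red[t], red[t + stride]);
            __syncthreads();
        }
        const double m = red[0];
        // Phase 2: V[i] = m - V[i] for every element of the prefix.
        for (int i = t; i < s; i += THREADS) V[i] = m - V[i];
        // Every thread has read red[0] and finished its writes to V
        // before the next step starts overwriting red[].
        __syncthreads();
    }
}

// Hypothetical host entry point with the stated parameters (int N, double *V).
// Copies V to the device, runs all steps, and copies the result back.
cudaError_t process_cuda(int N, double *V) {
    if (N < 2) return cudaSuccess;  // no steps to perform
    const size_t bytes = static_cast<size_t>(N) * sizeof(double);
    double *d_V = nullptr;
    cudaError_t err = cudaMalloc(&d_V, bytes);
    if (err != cudaSuccess) return err;
    err = cudaMemcpy(d_V, V, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        steps_kernel<<<1, THREADS>>>(N, d_V);
        err = cudaGetLastError();  // reports launch-configuration errors
    }
    // This copy waits for the kernel and also surfaces execution errors.
    if (err == cudaSuccess) err = cudaMemcpy(V, d_V, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_V);
    return err;
}
```

Applying the steps by hand to V = {3, 1, 4, 1} gives {0, 2, 4, 3}, which is what this sketch should produce; since a maximum does not depend on evaluation order, results should match a serial run exactly for ordinary (non-NaN) inputs. The single-block design keeps synchronization simple but uses only one multiprocessor; spreading a step over many blocks would need a grid-wide reduction per step, for example one reduction launch and one update launch per step.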
For further instructions, see the general instructions.