Sum of rectangular submatrices - CUDA version

Consider a double-precision matrix A and two integer matrices R and C, all of size NxM. Each element of A at position (i,j) is replaced by the sum of the elements of a rectangular submatrix that starts at this position and has R[i,j]+1 rows (rows i to i+R[i,j]) and C[i,j]+1 columns (columns j to j+C[i,j]). When R[i,j] is negative, the rows considered are i+R[i,j] to i, and similarly for the columns. When an index falls outside the matrix, the data are taken cyclically. The values in matrix R are between -N+1 and N-1, and those in matrix C are between -M+1 and M-1.

For example, for matrices


A:
 2.1   3.2   2.2  -1.3
 5.6  -6.1   2.3  -3.4
-4.1   5.7   2.6   3.3

R:
 1   2   2   0
 2   1   2  -2
 0   2  -1   2

C:
-2   2   0   0
 2   1  -2  -1
 1   0   0  -2

The submatrices corresponding to the four elements of the first row and the first two elements of the second row are (rows and columns numbered from 0, indices taken cyclically):

A[0,0], with R = 1 and C = -2: rows 0 to 1, columns 2, 3, 0
A[0,1], with R = 2 and C = 2: rows 0 to 2, columns 1 to 3
A[0,2], with R = 2 and C = 0: rows 0 to 2, column 2
A[0,3], with R = 0 and C = 0: row 0, column 3
A[1,0], with R = 2 and C = 2: rows 1, 2, 0, columns 0 to 2
A[1,1], with R = 1 and C = 1: rows 1 to 2, columns 1 to 2

Several problems are solved. For each problem, the function to parallelize takes the following parameters:

Input parameters:

-int N: number of rows of the matrices

-int M: number of columns of the matrices

-int *R: matrix of row extents R[i,j], of size NxM

-int *C: matrix of column extents C[i,j], of size NxM

Input/Output parameter:

-double *A: matrix of data, of size NxM
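A sequential reference with this parameter list might look like the sketch below, which can serve to check a parallel version. The function name `sum_submatrices`, the `int` return value, row-major storage and zero-based indices are assumptions. The sketch also assumes that every sum is computed from the original values of A, so it reads from a temporary copy while A is overwritten.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical name and return type; the text only fixes the parameter list.
 * Returns 0 on success, -1 if the temporary copy cannot be allocated.
 */
int sum_submatrices(int N, int M, int *R, int *C, double *A)
{
    assert(N > 0 && M > 0);

    size_t count = (size_t)N * (size_t)M;
    double *old = malloc(count * sizeof *old);
    if (old == NULL)
        return -1;
    memcpy(old, A, count * sizeof *old);   /* sums read only the original data */

    for (int i = 0; i < N; i++) {
        for (int j = 0; j < M; j++) {
            size_t idx = (size_t)i * (size_t)M + (size_t)j;
            int r = R[idx], c = C[idx];
            /* Negative extents move the start of the range before (i, j). */
            int row0 = r >= 0 ? i : i + r;
            int col0 = c >= 0 ? j : j + c;
            int nrows = abs(r) + 1, ncols = abs(c) + 1;

            double sum = 0.0;
            for (int di = 0; di < nrows; di++) {
                int row = ((row0 + di) % N + N) % N;       /* cyclic row index */
                for (int dj = 0; dj < ncols; dj++) {
                    int col = ((col0 + dj) % M + M) % M;   /* cyclic column index */
                    sum += old[(size_t)row * (size_t)M + (size_t)col];
                }
            }
            A[idx] = sum;
        }
    }

    free(old);
    return 0;
}
```

Because each output element depends only on the copy, the iterations of the two outer loops are independent of one another. Under the same assumptions, a CUDA version could assign one thread to each pair (i, j) and run the loop body unchanged.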

For more instructions, see the general instructions.