
Consider an $M\times N$ grid with periodicity in both dimensions upon which I want to do some local stencil-like computations using distributed-memory parallelism (MPI). I have two options for decomposing this problem:

(1) split the grid along one dimension so that each of the $P$ processes gets about $M/P\times N$ or $M\times N/P$ grid points, or

(2) divide the grid in both dimensions so that each process gets a rectangle of roughly $M/\sqrt{P}\times N/\sqrt{P}$ grid points, i.e. $(M\times N)/P$ grid points in total.

Obviously, approach (2) has a much lower communication bound than (1), but for rectangular grids of arbitrary size on arbitrary numbers of processes, finding a partitioning in 2D that balances the load seems difficult. I cannot restrict myself to allowing only $P=4^n$.
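For a rough feel of the communication difference, here is a count of the halo points that one process exchanges per stencil update. It assumes a 5-point stencil (halo width 1, no corner exchange), periodic boundaries, at least two processes along each split direction, and dimensions that divide evenly. The function names are made up for illustration, and latency and message counts are ignored.

```python
def halo_points_strips(M: int, N: int, P: int) -> int:
    """Halo points per process for option (1): P strips of size (M/P) x N.

    Each strip trades one line of N points with each of its two
    neighbors (the grid is periodic, so every strip has two).
    """
    assert P >= 2 and M % P == 0
    return 2 * N


def halo_points_blocks(M: int, N: int, px: int, py: int) -> int:
    """Halo points per process for option (2): a px x py array of blocks.

    Each (M/px) x (N/py) block trades its four edges with its four
    neighbors; corners are ignored, as for a 5-point stencil.
    """
    assert px >= 2 and py >= 2 and M % px == 0 and N % py == 0
    return 2 * (M // px) + 2 * (N // py)


# Square 1024 x 1024 grid on P = 16 processes.
print(halo_points_strips(1024, 1024, 16))    # 2048
print(halo_points_blocks(1024, 1024, 4, 4))  # 1024
```

For $M=N$ and a $\sqrt{P}\times\sqrt{P}$ block layout, the per-process count drops from $2N$ to $4N/\sqrt{P}$, which is where the lower communication bound of option (2) comes from.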

What is the state of the art in terms of these arbitrary 2D decompositions?

coolguy1000000

1 Answer


The basic approach that I am using is weighted orthogonal recursive bisection (weighted ORB).

It is usually applied to unstructured meshes or N-body simulations, but it applies just as naturally to grids in $d$ dimensions.

Say, for example, you have a rectangular grid of $N^x_0\times N^y_0=100\times 45$ points, so that $N_0=N^x_0 N^y_0=4500$, and you want to distribute it among $P=5$ processors. Here, $N_0$ is the total number of grid points at the $0$th partitioning level. The numbers are arbitrary but should be illustrative enough.

Set the initial weight at the $0$th partitioning level to $W_0=P=5$. We keep partitioning until $W_\ell=1$ for every partition.
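A quick way to see that this stopping rule works for any $P$ (not only $P=4^n$) is to run the weight recursion on its own, without any grid: repeatedly splitting $W$ into $\lfloor W/2\rfloor$ and $\lceil W/2\rceil$ always ends in exactly $P$ leaves of weight 1. A minimal Python sketch (the function name is hypothetical):

```python
def leaf_weights(w: int) -> list[int]:
    """Recursively split a weight into floor(w/2) and ceil(w/2) until every
    piece has weight 1; return the leaf weights from left to right."""
    if w == 1:
        return [1]
    left = w // 2        # floor(w / 2)
    right = w - left     # ceil(w / 2)
    return leaf_weights(left) + leaf_weights(right)


# Any P works, not only powers of 2 or 4: there are always exactly P leaves.
for P in (1, 5, 7, 12, 100):
    assert leaf_weights(P) == [1] * P
print(len(leaf_weights(5)))  # 5
```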

First partitioning level ($100\times 45$):

  1. Determine the largest dimension that we are going to cut along: $x$.
  2. Determine weights and number of grid points for the left and right parts.

Left part 00:

$$ W_{00}=\left\lfloor{\frac{W_{0}}{2}}\right\rfloor=\left\lfloor{\frac{5}{2}}\right\rfloor=2\\ \tilde{N}_{00}=N_{0}\frac{W_{00}}{W_{0}}=4500\frac{2}{5}=1800 $$

Right part 01:

$$ W_{01}=\left\lceil{\frac{W_{0}}{2}}\right\rceil=\left\lceil{\frac{5}{2}}\right\rceil=3\\ \tilde{N}_{01}=N_{0}\frac{W_{01}}{W_{0}}=4500\frac{3}{5}=2700 $$

  3. Now cut along the $x$ dimension at $n^x_{01}=1800/45=40$.

This results in partition 00 being $40\times45$ (starting at $n^x_{00}=0$) and partition 01 being $60\times 45$ (starting at $n^x_{01}=40$).

Since both $W_{00}, W_{01}>1$, the partitioning will continue on both left and right sides.
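The three steps of one partitioning level can be sketched as a single function. This is a reconstruction from the description, not the author's code; it assumes the larger dimension is cut (with $x$ chosen on ties) and that the cut position is the floor of $\tilde{N}_{\mathrm{left}}$ divided by the extent of the other dimension, computed here in exact integer arithmetic. `Rect` and `bisect_step` are hypothetical names.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    x0: int  # first grid index along x
    y0: int  # first grid index along y
    nx: int  # number of points along x
    ny: int  # number of points along y


def bisect_step(r: Rect, w: int):
    """One weighted ORB step: split rectangle r (weight w >= 2) into a
    left part of weight floor(w/2) and a right part of weight ceil(w/2)."""
    w_left = w // 2
    w_right = w - w_left
    if r.nx >= r.ny:
        # Cut the x dimension: the left part gets floor(N~_left / ny) columns,
        # where N~_left = nx * ny * w_left / w.
        cut = (r.nx * r.ny * w_left) // (w * r.ny)
        left = Rect(r.x0, r.y0, cut, r.ny)
        right = Rect(r.x0 + cut, r.y0, r.nx - cut, r.ny)
    else:
        # Cut the y dimension: the left part gets floor(N~_left / nx) rows.
        cut = (r.nx * r.ny * w_left) // (w * r.nx)
        left = Rect(r.x0, r.y0, r.nx, cut)
        right = Rect(r.x0, r.y0 + cut, r.nx, r.ny - cut)
    return (left, w_left), (right, w_right)


# First level of the example: 100 x 45 grid, W_0 = 5.
(first, w_first), (second, w_second) = bisect_step(Rect(0, 0, 100, 45), 5)
print(first, w_first)    # Rect(x0=0, y0=0, nx=40, ny=45) 2
print(second, w_second)  # Rect(x0=40, y0=0, nx=60, ny=45) 3
```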

Note that $\tilde{N}_{\ell}$ is the estimated number of grid points in a partition before the cut is made. The actual ${N}_{\ell}$ is very close to $\tilde{N}_{\ell}$, but it depends on the grid shape (and on whether its dimensions are even or odd, since bisection is used).
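How close is "very close"? Suppose the cut position is the floor of $\tilde{N}_{\mathrm{left}}$ divided by $n_\perp$, the extent of the part along the dimension that is not cut ($n_\perp$ is notation introduced here). Since the estimates and the actual counts both add up to the size of the part being split, $\tilde{N}_{\mathrm{left}}+\tilde{N}_{\mathrm{right}}=N_{\mathrm{left}}+N_{\mathrm{right}}$, and so

$$ n=\left\lfloor\frac{\tilde{N}_{\mathrm{left}}}{n_\perp}\right\rfloor,\quad N_{\mathrm{left}}=n\,n_\perp \quad\Longrightarrow\quad \tilde{N}_{\mathrm{left}}-n_\perp < N_{\mathrm{left}} \le \tilde{N}_{\mathrm{left}},\qquad N_{\mathrm{right}}-\tilde{N}_{\mathrm{right}}=\tilde{N}_{\mathrm{left}}-N_{\mathrm{left}}\in[0,\,n_\perp) $$

In other words, each part is off by less than one row or column. For partition 00 ($n_\perp=40$, $\tilde{N}_{000}=\tilde{N}_{001}=900$) this gives $N_{000}=880$ and $N_{001}=920$, each 20 points away from the estimate.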

Second partitioning level for 00 ($40\times45$):

  1. Determine the largest dimension to cut along: now $y$
  2. Determine weights and number of grid points for the left and right parts.

Left part 000:

$$ W_{000}=\left\lfloor{\frac{W_{00}}{2}}\right\rfloor=\left\lfloor{\frac{2}{2}}\right\rfloor=1\\ \tilde{N}_{000}=N_{00}\frac{W_{000}}{W_{00}}=1800\frac{1}{2}=900 $$

Right part 001:

$$ W_{001}=\left\lceil{\frac{W_{00}}{2}}\right\rceil=\left\lceil{\frac{2}{2}}\right\rceil=1\\ \tilde{N}_{001}=N_{00}\frac{W_{001}}{W_{00}}=1800\frac{1}{2}=900 $$

  3. Now cut along the $y$ dimension at $n^y_{001}=\lfloor900/40\rfloor=22$.

This results in partition 000 being $40\times 22$ (starting at $n^y_{000}=0$) and partition 001 being $40\times 23$ (starting at $n^y_{001}=22$).

Since $W_{000}=W_{001}=1$, partitioning on this branch of the tree stops.

Note that in this step the estimate $\tilde{N}_{000}=900$ differs from the actual $N_{000}=880$.

Second partitioning level for 01 ($60\times45$):

  1. Determine the largest dimension to cut along: $x$
  2. Determine weights and number of grid points for the left and right parts.

Left part 010:

$$ W_{010}=\left\lfloor{\frac{W_{01}}{2}}\right\rfloor=\left\lfloor{\frac{3}{2}}\right\rfloor=1\\ \tilde{N}_{010}=N_{01}\frac{W_{010}}{W_{01}}=2700\frac{1}{3}=900 $$

Right part 011:

$$ W_{011}=\left\lceil{\frac{W_{01}}{2}}\right\rceil=\left\lceil{\frac{3}{2}}\right\rceil=2\\ \tilde{N}_{011}=N_{01}\frac{W_{011}}{W_{01}}=2700\frac{2}{3}=1800 $$

  3. Now cut along the $x$ dimension at $n^x_{011}=900/45+40=60$.

This results in partition 010 being $20\times 45$ (starting at $n^x_{010}=n^x_{01}=40$) and partition 011 being $40\times 45$ (starting at $n^x_{011}=60$).

Since $W_{010}=1$, the partitioning of this branch is finished; however, the right branch has to go to the third level as $W_{011}=2>1$.

Third partitioning level for 011 ($40\times45$):

  1. Determine the largest dimension to cut along: $y$
  2. Determine weights and number of grid points for the left and right parts.

Left part 0110:

$$ W_{0110}=\left\lfloor{\frac{W_{011}}{2}}\right\rfloor=\left\lfloor{\frac{2}{2}}\right\rfloor=1\\ \tilde{N}_{0110}=N_{011}\frac{W_{0110}}{W_{011}}=1800\frac{1}{2}=900 $$

Right part 0111:

$$ W_{0111}=\left\lceil{\frac{W_{011}}{2}}\right\rceil=\left\lceil{\frac{2}{2}}\right\rceil=1\\ \tilde{N}_{0111}=N_{011}\frac{W_{0111}}{W_{011}}=1800\frac{1}{2}=900 $$

  3. Now cut along the $y$ dimension at $n^y_{0111}=\lfloor900/40\rfloor=22$.

This results in partition 0110 being $40\times 22$ and partition 0111 being $40\times 23$.

Altogether, we have 5 leaf partitions:

  1. $N_{000}=880$ corresponding to processor with id $000_2=0$ (binary $\to$ dec)
  2. $N_{001}=920$ corresponding to processor with id $001_2=1$
  3. $N_{010}=900$ corresponding to processor with id $010_2=2$
  4. $N_{0110}=880$ corresponding to processor with id $0110_2=6$
  5. $N_{0111}=920$ corresponding to processor with id $0111_2=7$
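Since the comments ask for code, here is a minimal recursive Python sketch of the procedure walked through above. It is a reconstruction from the description, not the author's own implementation. It assumes the larger dimension is cut ($x$ on ties), takes floors for the cut position, and simply refuses grids so small that a part would come out empty. `Leaf` and `weighted_orb` are hypothetical names; the labels follow the binary IDs used above, starting from "0" at the root.

```python
from typing import NamedTuple


class Leaf(NamedTuple):
    label: str  # binary path from the root, e.g. "0110"
    x0: int
    y0: int
    nx: int
    ny: int


def weighted_orb(x0: int, y0: int, nx: int, ny: int, w: int,
                 label: str = "0") -> list[Leaf]:
    """Weighted orthogonal recursive bisection of an nx x ny block starting
    at (x0, y0) into w parts; returns the leaves from left to right."""
    if w == 1:
        return [Leaf(label, x0, y0, nx, ny)]
    w_left = w // 2          # floor(w / 2)
    w_right = w - w_left     # ceil(w / 2)
    n_total = nx * ny
    if nx >= ny:             # cut the largest dimension (x on ties)
        cut = (n_total * w_left) // (w * ny)   # floor(N~_left / ny)
        assert 0 < cut < nx, "grid too small for this many parts"
        return (weighted_orb(x0, y0, cut, ny, w_left, label + "0")
                + weighted_orb(x0 + cut, y0, nx - cut, ny, w_right, label + "1"))
    cut = (n_total * w_left) // (w * nx)       # floor(N~_left / nx)
    assert 0 < cut < ny, "grid too small for this many parts"
    return (weighted_orb(x0, y0, nx, cut, w_left, label + "0")
            + weighted_orb(x0, y0 + cut, nx, ny - cut, w_right, label + "1"))


for leaf in weighted_orb(0, 0, 100, 45, 5):
    print(leaf.label, int(leaf.label, 2), leaf.nx * leaf.ny)
# 000 0 880
# 001 1 920
# 010 2 900
# 0110 6 880
# 0111 7 920
```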

The processor IDs then have to be rearranged (renumbered) to obtain $0,\ldots,P-1$. The final partitioning is visualized below:

[Figure: Partitioning visualization for 5 processors]
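One simple way to renumber is to rank the leaves in left-to-right tree order; this is an assumption, since any bijection onto $0,\ldots,P-1$ works. Because no leaf label is a prefix of another, sorting the labels as strings gives exactly that order. The function name is hypothetical:

```python
def renumber(labels: list[str]) -> dict[str, int]:
    """Map binary leaf labels (e.g. "0110") to contiguous ranks 0..P-1.

    Leaf labels form a prefix-free set, so plain string sorting visits
    them in left-to-right (depth-first) tree order.
    """
    return {label: rank for rank, label in enumerate(sorted(labels))}


print(renumber(["000", "001", "010", "0110", "0111"]))
# {'000': 0, '001': 1, '010': 2, '0110': 3, '0111': 4}
```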

The total length of the internal boundaries between the partitions (not counting the periodic wrap-around) is $40+45+45+40=170$, which is smaller than $4\times45=180$ for the simple partitioning along the largest dimension only.
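The 170 vs. 180 comparison can be checked by brute force: rasterize each rectangle onto an owner array and count neighboring grid points with different owners. A small sketch with a hypothetical function name; like the count above, it leaves out the periodic wrap-around faces.

```python
def internal_boundary(nx: int, ny: int,
                      parts: dict[str, tuple[int, int, int, int]]) -> int:
    """Count pairs of horizontally or vertically adjacent grid points that
    belong to different parts. Periodic wrap-around pairs are NOT counted.
    parts maps a part label to its rectangle (x0, y0, width, height)."""
    owner = [[None] * ny for _ in range(nx)]
    for label, (x0, y0, w, h) in parts.items():
        for x in range(x0, x0 + w):
            for y in range(y0, y0 + h):
                owner[x][y] = label
    length = 0
    for x in range(nx):
        for y in range(ny):
            if x + 1 < nx and owner[x][y] != owner[x + 1][y]:
                length += 1
            if y + 1 < ny and owner[x][y] != owner[x][y + 1]:
                length += 1
    return length


orb = {"000": (0, 0, 40, 22), "001": (0, 22, 40, 23), "010": (40, 0, 20, 45),
       "0110": (60, 0, 40, 22), "0111": (60, 22, 40, 23)}
strips = {str(i): (20 * i, 0, 20, 45) for i in range(5)}
print(internal_boundary(100, 45, orb))     # 170
print(internal_boundary(100, 45, strips))  # 180
```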

One of the references that (as far as I know) presented this method is:

To balance the load even further (920 vs. 880 grid points), one can assign half of the grid points in the row on the 22/23 border to the left part (000) and the other half to the right part (001), which gives 900 points each. Partitions 0110 and 0111 are handled the same way. This introduces slightly more complicated borders (with certain consequences).

[Figure: Partitioning visualization for 5 processors, also using border balancing]
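One way to realize this border balancing is a "staircase" cut, sketched here under my own assumptions with a hypothetical function name: the left part takes as many full rows as fit into its target and then the first few points of the next row. For partition 00 with target $\tilde{N}_{000}=900$, part 000 gets the 22 full rows plus 20 of the 40 points in row 22, so both parts end up with 900 points.

```python
def staircase_split(nx: int, ny: int, n_left: int):
    """Split an nx x ny block along y so that the left (lower-y) part gets
    exactly n_left points: k full rows plus the first r points of row k.
    Returns the sets of local (x, y) coordinates owned by each part."""
    if not 0 <= n_left <= nx * ny:
        raise ValueError("n_left out of range")
    k, r = divmod(n_left, nx)
    left = {(x, y) for y in range(k) for x in range(nx)}
    left |= {(x, k) for x in range(r)}
    right = {(x, y) for y in range(ny) for x in range(nx)} - left
    return left, right


# Partition 00 (40 x 45) with target 900: k = 22 full rows, r = 20.
left, right = staircase_split(40, 45, 900)
print(len(left), len(right))  # 900 900
```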

Anton Menshov
  • Do you have any (pseudo)code which implements this partitioning? – nluigi Jun 24 '18 at 07:39
  • @nluigi, not in the form I can share. That's why I tried to do a very detailed description, which is pretty much an "unwrapped" pseudocode for a representative example. – Anton Menshov Jun 24 '18 at 12:01