The basic approach I am using is weighted orthogonal recursive bisection (weighted ORB). It is usually applied to unstructured meshes or N-body simulations, but it applies just as naturally to grids in $d$ dimensions.
Say, for example, you have a rectangular grid of $N^x_0\times N^y_0=100\times 45$ points, i.e. $N_0=N^x_0 N^y_0=4500$ grid points in total, and you want to distribute it among $P=5$ processors. Here, $N_0$ stands for the total number of grid points at the $0$th partitioning level. The numbers are arbitrary but should be illustrative enough.
Set the initial weight at the $0$th partitioning level to $W_0=P=5$.
We keep partitioning until $W_\ell=1$ for every partition.
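The weight splitting alone already fixes the shape of the partition tree. A minimal sketch, assuming the left child always gets $\lfloor W/2\rfloor$ and label suffix `0`, and the right child gets $\lceil W/2\rceil$ and suffix `1` (the function name `leaf_labels` is hypothetical):

```python
def leaf_labels(weight, label="0"):
    """Return the binary labels of the leaves obtained by repeatedly
    splitting `weight` into floor(w/2) (left) and ceil(w/2) (right)."""
    if weight == 1:
        return [label]              # W = 1: this branch is finished
    left = weight // 2              # floor(W/2)
    right = weight - left           # ceil(W/2)
    return leaf_labels(left, label + "0") + leaf_labels(right, label + "1")

print(leaf_labels(5))  # ['000', '001', '010', '0110', '0111']
```

For $P=5$ this gives five leaves at two different depths, which is why the labels do not all have the same length.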
First partitioning level ($100\times 45$):
- Determine the largest dimension that we are going to cut along: $x$.
- Determine weights and number of grid points for the left and right parts.
Left part 00:
$$
W_{00}=\left\lfloor{\frac{W_{0}}{2}}\right\rfloor=\left\lfloor{\frac{5}{2}}\right\rfloor=2\\
\tilde{N}_{00}=N_{0}\frac{W_{00}}{W_{0}}=4500\frac{2}{5}=1800
$$
Right part 01:
$$
W_{01}=\left\lceil{\frac{W_{0}}{2}}\right\rceil=\left\lceil{\frac{5}{2}}\right\rceil=3\\
\tilde{N}_{01}=N_{0}\frac{W_{01}}{W_{0}}=4500\frac{3}{5}=2700
$$
- Now, cut along the $x$ dimension at $n^x_{01}=1800/45=40$.
This results in partition 00 being $40\times45$ (starting at $n^x_{00}=0$) and partition 01 being $60\times 45$ (starting at $n^x_{01}=40$).
Since both $W_{00}, W_{01}>1$, the partitioning will continue on both left and right sides.
Note that $\tilde{N}_{\ell}$ is the estimated number of grid points in a partition before the cut is made. The actual ${N}_{\ell}$ is very close to $\tilde{N}_{\ell}$, but depends on the grid shape (and on whether its dimensions are even or odd, since bisection is being used).
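A single bisection step can be sketched as follows. This is only an illustration under stated assumptions: the helper name `bisect_block` is hypothetical, ties between equal dimensions are resolved in favor of $x$, and the estimate $\tilde{N}$ is rounded down with integer division.

```python
def bisect_block(nx, ny, weight):
    """One weighted bisection of an nx-by-ny block shared by `weight` processors.

    Returns (axis, cut, w_left, w_right); `cut` is the number of grid lines
    along `axis` assigned to the left part (a local offset within the block).
    """
    w_left = weight // 2                    # floor(W/2)
    w_right = weight - w_left               # ceil(W/2)
    axis = "x" if nx >= ny else "y"         # cut the longer dimension (tie -> x, assumed)
    cross = ny if axis == "x" else nx       # grid points per line perpendicular to the cut axis
    n_est = nx * ny * w_left // weight      # estimated points in the left part
    cut = n_est // cross                    # floor(N~ / cross-section)
    return axis, cut, w_left, w_right

print(bisect_block(100, 45, 5))  # ('x', 40, 2, 3)
print(bisect_block(40, 45, 2))   # ('y', 22, 1, 1)
```

The second call shows where the estimate and the actual count diverge: $900/40$ is not an integer, so the left part gets $22$ full rows, i.e. $880$ points instead of $900$.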
Second partitioning level for 00 ($40\times45$):
- Determine the largest dimension to cut along: now $y$
- Determine weights and number of grid points for the left and right parts.
Left part 000:
$$
W_{000}=\left\lfloor{\frac{W_{00}}{2}}\right\rfloor=\left\lfloor{\frac{2}{2}}\right\rfloor=1\\
\tilde{N}_{000}=N_{00}\frac{W_{000}}{W_{00}}=1800\frac{1}{2}=900
$$
Right part 001:
$$
W_{001}=\left\lceil{\frac{W_{00}}{2}}\right\rceil=\left\lceil{\frac{2}{2}}\right\rceil=1\\
\tilde{N}_{001}=N_{00}\frac{W_{001}}{W_{00}}=1800\frac{1}{2}=900
$$
- Now cut along the $y$ dimension at $n^y_{001}=\lfloor900/40\rfloor=22$
This results in partition 000 being $40\times 22$ (starting at $n^y_{000}=0$) and partition 001 being $40\times 23$ (starting at $n^y_{001}=22$).
Since $W_{000}=W_{001}=1$, the partitioning stops on this branch of the tree.
Note that in this step the estimate $\tilde{N}_{000}=900$ differs from the actual $N_{000}=880$.
Second partitioning level for 01 ($60\times45$):
- Determine the largest dimension to cut along: $x$
- Determine weights and number of grid points for the left and right parts.
Left part 010:
$$
W_{010}=\left\lfloor{\frac{W_{01}}{2}}\right\rfloor=\left\lfloor{\frac{3}{2}}\right\rfloor=1\\
\tilde{N}_{010}=N_{01}\frac{W_{010}}{W_{01}}=2700\frac{1}{3}=900
$$
Right part 011:
$$
W_{011}=\left\lceil{\frac{W_{01}}{2}}\right\rceil=\left\lceil{\frac{3}{2}}\right\rceil=2\\
\tilde{N}_{011}=N_{01}\frac{W_{011}}{W_{01}}=2700\frac{2}{3}=1800
$$
- Now cut along the $x$ dimension at $n^x_{011}=900/45+40=60$
This results in partition 010 being $20\times 45$ (starting at $n^x_{010}=n^x_{01}=40$) and partition 011 being $40\times 45$ (starting at $n^x_{011}=60$).
Since $W_{010}=1$, the partitioning of this branch is finished; however, the right branch has to go to the third level as $W_{011}=2>1$.
Third partitioning level for 011 ($40\times45$):
- Determine the largest dimension to cut along: $y$
- Determine weights and number of grid points for the left and right parts.
Left part 0110:
$$
W_{0110}=\left\lfloor{\frac{W_{011}}{2}}\right\rfloor=\left\lfloor{\frac{2}{2}}\right\rfloor=1\\
\tilde{N}_{0110}=N_{011}\frac{W_{0110}}{W_{011}}=1800\frac{1}{2}=900
$$
Right part 0111:
$$
W_{0111}=\left\lceil{\frac{W_{011}}{2}}\right\rceil=\left\lceil{\frac{2}{2}}\right\rceil=1\\
\tilde{N}_{0111}=N_{011}\frac{W_{0111}}{W_{011}}=1800\frac{1}{2}=900
$$
- Now cut along the $y$ dimension at $n^y_{0111}=\lfloor900/40\rfloor=22$
This results in partition 0110 being $40\times 22$ and partition 0111 being $40\times 23$.
Altogether, we have 5 leaf partitions:
- $N_{000}=880$ corresponding to processor with id $000_2=0$ (binary $\to$ dec)
- $N_{001}=920$ corresponding to processor with id $001_2=1$
- $N_{010}=900$ corresponding to processor with id $010_2=2$
- $N_{0110}=880$ corresponding to processor with id $0110_2=6$
- $N_{0111}=920$ corresponding to processor with id $0111_2=7$
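The whole walk-through can be reproduced with a short recursive sketch. The function name `weighted_orb`, the tie-breaking rule (equal dimensions are cut along $x$) and the ranking by sorted label are assumptions made for illustration; degenerate cases such as a cut position of zero on very small grids are not handled.

```python
def weighted_orb(x0, y0, nx, ny, weight, label="0"):
    """Recursively bisect the block; return {label: (x0, y0, nx, ny)} for all leaves."""
    if weight == 1:
        return {label: (x0, y0, nx, ny)}
    w_left = weight // 2                     # floor(W/2)
    w_right = weight - w_left                # ceil(W/2)
    n_est = nx * ny * w_left // weight       # estimated points in the left part
    leaves = {}
    if nx >= ny:                             # cut along x (ties -> x, assumed)
        cut = n_est // ny
        leaves.update(weighted_orb(x0, y0, cut, ny, w_left, label + "0"))
        leaves.update(weighted_orb(x0 + cut, y0, nx - cut, ny, w_right, label + "1"))
    else:                                    # cut along y
        cut = n_est // nx
        leaves.update(weighted_orb(x0, y0, nx, cut, w_left, label + "0"))
        leaves.update(weighted_orb(x0, y0 + cut, nx, ny - cut, w_right, label + "1"))
    return leaves

leaves = weighted_orb(0, 0, 100, 45, 5)
# One possible renumbering: rank leaves by their sorted binary label.
for rank, (label, (x0, y0, nx, ny)) in enumerate(sorted(leaves.items())):
    print(rank, label, (x0, y0), f"{nx}x{ny}", nx * ny)
```

For the $100\times45$ grid and $P=5$ this yields the same five blocks as above, with sizes 880, 920, 900, 880 and 920.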
The processor IDs then have to be renumbered to obtain $0\ldots P-1$,
with the final partitioning visualized:

The total length of the boundaries between the partitions is $40+45+45+40=170$, which is smaller than the $45\times4=180$ obtained by simply cutting the grid into strips along its largest dimension.
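The boundary length can be checked directly from the block coordinates. The sketch below hardcodes the five leaf blocks from this example as `(x0, y0, nx, ny)` tuples; the helper names `overlap` and `shared_edge` are hypothetical.

```python
from itertools import combinations

# (x0, y0, nx, ny) of the leaves 000, 001, 010, 0110, 0111
parts = [(0, 0, 40, 22), (0, 22, 40, 23), (40, 0, 20, 45),
         (60, 0, 40, 22), (60, 22, 40, 23)]

def overlap(lo1, hi1, lo2, hi2):
    """Length of the intersection of two half-open intervals."""
    return max(0, min(hi1, hi2) - max(lo1, lo2))

def shared_edge(a, b):
    """Length of the common edge of two axis-aligned blocks (0 if not adjacent)."""
    ax0, ay0, ax1, ay1 = a[0], a[1], a[0] + a[2], a[1] + a[3]
    bx0, by0, bx1, by1 = b[0], b[1], b[0] + b[2], b[1] + b[3]
    if ax1 == bx0 or bx1 == ax0:             # side by side in x
        return overlap(ay0, ay1, by0, by1)
    if ay1 == by0 or by1 == ay0:             # stacked in y
        return overlap(ax0, ax1, bx0, bx1)
    return 0

orb_boundary = sum(shared_edge(a, b) for a, b in combinations(parts, 2))
strip_boundary = (5 - 1) * 45                # five 20x45 strips have four internal cuts
print(orb_boundary, strip_boundary)          # 170 180
```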
One of the references that (as far as I know) presented this method is:
To balance the load even further (920 vs 880), one can assign half of the grid points in the row at the 22/23 border to the left part (000) and the other half to the right part (001). The 0110/0111 pair is handled the same way. This makes the borders slightly more complicated (with certain consequences).
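The arithmetic behind that refinement, as a rough sketch: the function name `split_with_shared_row` is hypothetical, and it only counts points; it does not say which points of the shared row go to which side.

```python
def split_with_shared_row(nx, ny):
    """Split an nx-by-ny block into two halves along y, letting the two
    parts share one row if that is needed to balance the point counts."""
    target = nx * ny // 2                    # points wanted in the first part
    full_rows, extra = divmod(target, nx)    # whole rows, plus points taken from the shared row
    first = full_rows * nx + extra
    second = nx * ny - first
    return full_rows, extra, first, second

print(split_with_shared_row(40, 45))  # (22, 20, 900, 900)
```

For the $40\times45$ block, the first part gets 22 full rows plus 20 of the 40 points in the shared row, and both parts end up with exactly 900 points.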
