Elegant way to partition list to be faster than `ParallelTable` (using `ParallelSubmit`), or some alternative?

Question

Suppose you have a function

fun[x_] := (Pause[.05*x]; x^2);

of which you know that its evaluation time increases with its argument - in this case obviously linearly. Consider the following piece of code:

ClearAll["Global`*"];
CloseKernels[];
list = Range[1, 12];
listparallel1 = Partition[list, 3];
listparallel2 = {{1, 5, 12}, {2, 6, 11}, {3, 7, 10}, {4, 8, 9}};
f11 := (Table[fun[i], {i, listparallel1[[1]]}]);
f21 := (Table[fun[i], {i, listparallel1[[2]]}]);
f31 := (Table[fun[i], {i, listparallel1[[3]]}]);
f41 := (Table[fun[i], {i, listparallel1[[4]]}]);
f12 := (Table[fun[i], {i, listparallel2[[1]]}]);
f22 := (Table[fun[i], {i, listparallel2[[2]]}]);
f32 := (Table[fun[i], {i, listparallel2[[3]]}]);
f42 := (Table[fun[i], {i, listparallel2[[4]]}]);

Now compare timings:

LaunchKernels[4];
DistributeDefinitions[f11, f21, f31, f41, f12, f22, f32, f42];
res1 = Table[fun[i], {i, list}]; // AbsoluteTiming
(* 3.905648 *)
res2 = ParallelTable[fun[i], {i, list}]; // AbsoluteTiming
(* 1.670125 *)
AbsoluteTiming[
 res3tmp = {ParallelSubmit[f11], ParallelSubmit[f21], 
   ParallelSubmit[f31], ParallelSubmit[f41]};
 res3 = Flatten@WaitAll[res3tmp];]
(* 1.674126 *)
AbsoluteTiming[
 res4tmp = {ParallelSubmit[f12], ParallelSubmit[f22], 
   ParallelSubmit[f32], ParallelSubmit[f42]};
 res4 = Flatten@WaitAll[res4tmp];]
(* 1.068721 *)

We see, ParallelTable already does a goodish job, but since it does not have the insight into the function as we do, this can be optimized via ParallelSubmit. Since runtime increases with elements in list, it is natural to partition the list as done in listparallel2 and clearly, timings for res4 are much better than for the ParallelTable version res2.

My question is, what is an elegant way to partition list into the form of listparallel2 such that I provide a number of sublists n (here, n=4) and it then fills the first sublist starting with first and last element of list, going on to the second sublist with second and second to last element of list and so on.

Alternatively, how can one determine the optimal distribution of jobs to parallel kernels?

Update

Unfortunately, the first method in the answer by @unlikely leads to kernel crash on my machine for sufficiently large problems (see this question). Since I am using Mathematica 10.0.1 I cannot use the second method in unlikely's answer because RepeatedTiming and EchoFunction are not available. So, even if it is not the most optimal way - I am again interested in a customized Partition like function, that brings list into the form of listparallel2.

RepeatedTiming and EchoFunction are just for benchmarking. You can remove all these calls and the following //Last. You can also use the sub/optimal partitioning as of my second answer. — unlikely, Feb 25 '16 at 16:24
@unlikely Thanks for all your effort! I actually believe that I did that, but it returned an empty list. Will try again later when I have access to my computer again and report back . — Lukas, Feb 25 '16 at 17:05

unlikely · Accepted Answer · 2016-02-24T08:33:51.983

8

It's a common jobs-to-resources assignment problem.

Suppose you have $n$ jobs (long running) to be distributed across $m$ resources working in parrallel and each job $i$ takes a time $c_i$ to be completed. Define the following decision variables:

$x_{i,j} \in \{0,1\}$ and it's $1$ if the job $i$ is assigned to resource $j$
$y_j \ge 0$ is the total time required for the resource $j$ complete his work (the definition/use of these variables is not mandatory)
$z \ge 0$ is the whole time required to complete all jobs

Obviously you have some constraints:

$\sum_{j=1}^m x_{i,j} = 1$ for all $i=1,\ldots,n$ because each job must be assigned to exactly one resource
$y_j = \sum_{i=1}^n c_i x_{i,j}$ for all $j=1,\ldots,m$
$z \ge y_j$ for all $j=1,\ldots,m$ because $z = \max_j y_j$

Now your objective function is $z$ (to be minimized).

This can be done with LinearProgramming:

n = 12;
m = 4;
costs = Range@n/2 // N;

vars = Flatten@{
    Array[x, {n, m}],
    Array[y, m],
    z
    };

constraints = Flatten@{
    Table[Sum[x[i, j], {j, m}] == 1, {i, n}],
    Table[y[j] == Sum[x[i, j] costs[[i]], {i, n}], {j, m}],
    Table[z >= y[j], {j, m}]
    };

bm = CoefficientArrays[Equal @@@ constraints, vars];
solution = LinearProgramming[
   Last@CoefficientArrays[z, vars],
   bm[[2]],
   Transpose@{-bm[[1]], 
     constraints[[All, 0]] /. {Equal -> 0, GreaterEqual -> 1}},
   vars /. {_x -> {0, 1}, (_y | z) -> {0, \[Infinity]}},
   vars /. {_x -> Integers, (_y | z) -> Reals}
   ];


Cases[Pick[vars, solution, 1], x[ij__] :> {ij}];
GroupBy[%, Last -> First] // Values // SortBy[First]

The answer is

{{1, 3, 6, 10}, {2, 4, 12}, {5, 7, 8}, {9, 11}}

that tell you the optimal distribution of the jobs across the resources.

Indeed with this approach the time required to complete the work is:

Pick[solution, vars, z]

{10.}

while with your approach is

listparallel2 = {{1, 5, 12}, {2, 6, 11}, {3, 7, 10}, {4, 8, 9}};
Plus @@@ (listparallel2/2) // Max

21/2

Obviously the whole optimization process takes times, and for short jobs is not necessary.

You can also use the simple Minimize to get the another optimal result

n = 12; m = 4; c = Range@n/2 // N;
solution = Minimize[{z, {
          Table[Sum[x[i, j], {j, m}] == 1, {i, n}],
          Table[Sum[c[[i]] x[i, j], {i, n}] <= z, {j, m}],

          Table[{x[i, j] \[Element] Integers, 0 <= x[i, j] <= 1}, {i, 
            n}, {j, m}]
          }},
        Flatten@{Array[x, {n, m}], z}
        ] // RepeatedTiming // EchoFunction["Timing:", First] // 
     Last // EchoFunction["Optimal value:", First] // Last;
GroupBy[Cases[solution, (x[ij__] -> 1) :> {ij}], 
  Last -> First] // Values

Optimal value: 10.

{{1, 2, 3, 4, 10}, {5, 6, 7}, {8, 12}, {9, 11}}

edited Feb 24 '16 at 08:33

answered Feb 22 '16 at 18:25

unlikely

7,103
20
52

Many thanks! This seems to be a really nice approach. Would you mind elaborating a little bit on what your constraints represent? I would like to fully understand this approach using LinearProgramming – Lukas Feb 22 '16 at 18:54
@Lukas I added an introduction. You can also check my answrs to the follwoing questions: http://mathematica.stackexchange.com/questions/47211/elegant-solution-to-the-zebra-logic-puzzle-einsteins-riddle?s=1|1.0737 and http://mathematica.stackexchange.com/questions/107859/puzzle-with-mathematica because I used LinearProgramming – unlikely Feb 22 '16 at 19:46
Thanks alot! This goes beyond what I imagined and excellently answers my question. Now that you have explained the constraints it is all clear. Also thanks for further references, will take a look. – Lukas Feb 22 '16 at 19:49
@Lukas Thank you, although I do not know if it responds to your initial question as this optimization deserves to be made for long jobs – unlikely Feb 22 '16 at 20:37
I agree, it does not 100% answer the question as it is formulated because I ask for some way to Partition. Yet, my actual problem is some sort of a Table with many elements where timings scale linearly with big slope. Your approach is way better than the approximation I asked for in these cases. I extended the question a little – Lukas Feb 22 '16 at 20:46
Unfortunately, there is an issue. The kernel crashes if n is larger than a certain threshold. I'll put this issue into a new question and update this question accordingly. Sorry for revoking the Accept (temporarily), but until this is solved I would like to leave the question "open". – Lukas Feb 25 '16 at 12:28
I decided to accept this answer again. The kernel crash seems to be unique to some configurations. – Lukas Feb 25 '16 at 21:35
@Lukas, are you sure the Kernel crash is caused by LinearProgramming and not, for example, while evaluating some of its arguments? Try building arguments before maybe – unlikely Feb 25 '16 at 21:39
I will investigate this issue. I'll soon be at work and hope that I can reproduce it there (different system), and if so, I will try what you suggested - otherwise I need to wait until I'm back home – Lukas Feb 26 '16 at 07:10
I am pretty sure it must have to do with LinearProgramming. Everything that is set before the LinearProgramming and everything that is called inside evaluates instantly without any issues (even for very large n). Only when I evaluate the LinearProgramming the kernel crashes after while. On a Linux x64 running 10.0.2, the kernel seems to stay kind of alive but Mathematica does not return a result even after an hour - and it seems to freeze. – Lukas Feb 27 '16 at 11:54

score 1 · Answer 2 · answered Feb 25 '16 at 16:19

Following your today comment, I'll add another answer with another method of partitioning.

If you can accept a sub optimal partitioning, you can also apply the following strategy:

n = 12; m = 4;
j = Range@n;
c = j/2;

jd = Fold[
   MapAt[Append[#2], #, Ordering[Total[#[[All, All, 2]], {2}], 1]] &,
   Table[{{0, 0}}, m],
   Transpose@{j, c} // SortBy[Minus@*Last]
   ][[All, 2 ;; -1]];

jd[[All, All, 1]]
Max@Total[jd[[All, All, 2]], {2}]

{{12, 5, 4}, {11, 6, 3}, {10, 7, 2}, {9, 8, 1}}

21/2

Thanks! Although this is not optimal, it at least yields solutions reliably without crashing kernel... — Lukas, Feb 25 '16 at 21:32

Elegant way to partition list to be faster than `ParallelTable` (using `ParallelSubmit`), or some alternative?

Update

2 Answers2

Linked