Picking highest value non-adjacent groups within a set

Question

Imagine you are given 20 random numbers in a row. The original order must be maintained. From this set you can choose 2 groups of 3 numbers each. The position of these groups must be separated by at least 4 numbers.

The goal is to maximise the sum of values in both groups. How can you ensure that optimal groups are taken?

The main issue is that the rule to choose the first group may worsen the choice of the second group due to the constraint. The optimal solution may exclude the 'best' single group.

What method is there to solve this generally for any size of original set, size of groups, number of groups and length of constraint?

If I ask "exactly four numbers or at least four numbers?" will your answer be "yes"? — k20, Nov 06 '14 at 22:39
Edited to now say at least four numbers as you guessed correctly — KieranPC, Nov 08 '14 at 20:39

score 1 · Answer 1 · answered Nov 07 '14 at 04:49

You can write this as the following integer optimization problem, if I understand your question correctly. Let $i,j\in [1,20]$ be the starting indices of the two groups of three numbers; then you want to solve $$ \begin{aligned} \max_{i,j}\quad & x_i+x_{i+1}+x_{i+2}+x_j+x_{j+1}+x_{j+2} \\ \text{subject to}\quad & i \ge 1, \\ & j-i \ge 7, \\ & j \le 18, \end{aligned} $$ where the bound $j-i \ge 7$ is the group length of 3 plus the gap of at least 4 numbers. Like all integer optimization problems, it is likely difficult to find an exact answer. On the other hand, a greedy algorithm with complexity $O(N)$ is likely to find the correct solution if you allow the size $N$ of your array of numbers to become large.
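To make the caveat from the question concrete (the best single group can block the best pair), here is a minimal sketch that is not part of the original answer. The function names `window_sums`, `greedy_two_groups` and `exhaustive_two_groups`, the 0-based indexing, and the sample array are all assumptions made for illustration. The greedy rule shown is one plausible reading of "greedy" (take the best group, then the best compatible group); other greedy rules are possible.

```python
def window_sums(x, g):
    """Entry s is the sum of the group of g numbers starting at index s."""
    return [sum(x[s:s + g]) for s in range(len(x) - g + 1)]


def greedy_two_groups(x, g, minsep):
    """Take the best single group, then the best group still allowed."""
    w = window_sums(x, g)
    first = max(range(len(w)), key=lambda s: w[s])
    # A second group must start at least g + minsep positions away.
    allowed = [s for s in range(len(w)) if abs(s - first) >= g + minsep]
    if not allowed:
        return None
    second = max(allowed, key=lambda s: w[s])
    return w[first] + w[second], min(first, second), max(first, second)


def exhaustive_two_groups(x, g, minsep):
    """Try every feasible pair of starts (quadratic in len(x))."""
    w = window_sums(x, g)
    return max((w[i] + w[j], i, j)
               for i in range(len(w))
               for j in range(i + g + minsep, len(w)))


# Hypothetical data: a 30-point group in the middle, flanked by two
# 27-point groups that are compatible with each other but not with it.
x = [0, 0, 0, 9, 9, 9, 0, 0, 10, 10, 10, 0, 9, 9, 9, 0, 0, 0, 0, 0]
print(greedy_two_groups(x, 3, 4))      # (39, 1, 8)
print(exhaustive_two_groups(x, 3, 4))  # (54, 3, 12)
```

On this input the greedy rule commits to the group starting at index 8 and can then only add a weak group, while the exhaustive search skips it and takes the two flanking groups.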

Wolfgang Bangerth


As someone who spent a good chunk of their PhD work doing integer programming, I don't think this approach is necessary. While this problem can be posed as an integer program, and this formulation allows for much more complicated sets of constraints, for the problem formulation proposed by the OP, provably polynomial time algorithms are available (easily $O(N^{2})$, likely $O(N)$ as well), whereas any integer programming solver is going to use an exponential time algorithm and rely on heuristics for both cuts and exploration of the B&B tree to solve this problem in a reasonable amount of time. – Geoff Oxberry Nov 07 '14 at 20:20
@GeoffOxberry I posted almost the identical comment as yours, which I've replaced by an answer :) There is indeed an O(N) dynamic programming solution even for the generalization to more than two groups, and the memory and time complexity is (#groups)*(#numbers). – k20 Nov 07 '14 at 21:51
Your comments are of course entirely correct. The optimization problem is obviously $O(N^2)$ just by virtue of this being the size of the space of possible solutions. That also implies that even integer programming solvers will run in at most this complexity simply because the search tree isn't larger. – Wolfgang Bangerth Nov 08 '14 at 04:50
@WolfgangBangerth: A general purpose branch-and-bound code without any preprocessing, cut heuristics, or B&B tree search heuristics will enumerate a tree with $2^{N}$ leaves (the classical formulations are binary), and hopefully, by dumb luck, the tree search will fathom out large parts of it. CPLEX and Gurobi have all sorts of black magic to preprocess integer programs and to use various cuts and search heuristics to accelerate this process, and that might yield a search tree with $N^{2}$ leaves, but the details of both of those solvers are only known to the developers. – Geoff Oxberry Nov 08 '14 at 05:06
@GeoffOxberry: I'm not quite sure I understand. The $2^N$ tree is for a problem with $N$ binary variables. For $N$ variables that can each take on $M$ different values, you'd need a tree with $M^N$ nodes. Here, $M$ equals the number of possible values for the integer variable (which I had denoted by $N$) and $N$ is the number of variables -- namely, two: $i$ and $j$. So that leaves a tree with two levels and $M^2$ nodes. – Wolfgang Bangerth Nov 09 '14 at 19:36
@WolfgangBangerth: Depends how you formulate it. If you use binary variables to denote "this index will contribute a nonzero value in the objective function", you could have a binary variable for each index. There are alternate formulations that could require fewer variables, which probably contributes to the ambiguity, and solver performance depends very strongly on formulation (not just number of variables, but specifically, whether the LP relaxation of the integer feasible set is the convex hull of that feasible set). – Geoff Oxberry Nov 09 '14 at 19:45

k20 · Accepted Answer · 2014-11-07T19:53:56.590

Here's a brute force quadratic solution to the original problem, a linear solution to the original problem, and a brute force and generic linear solution for three groups.


import numpy as np

N = 20
G = 3
MINSEP = 4

def solve_cubic(a):
    # Brute force over all start triples (i, j, k) for three groups.
    r = range(N - G + 1)
    return max(
            (
                (a[i:i+G]+a[j:j+G]+a[k:k+G]).sum(), i, j, k
                ) for i in r for j in r for k in r if (
                    i + G + MINSEP <= j and j + G + MINSEP <= k))

def solve_quadratic(a):
    r = range(N - G + 1)
    return max(
            (a[i:i+G].sum() + a[j:j+G].sum(), i, j) for i in r for j in r if (
                j-i > G+MINSEP-1))

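def solve_linear(a):
    # sumc[k] is the sum of the first k numbers (prefix sums with a leading 0).
    sumc = np.cumsum(a)
    sumc = np.array([0] + sumc.tolist())
    # sumg[s] is the sum of the group starting at index s.
    sumg = sumc[G:] - sumc[:-G]
    # rmax[s] is the best group sum among starts s, s+1, ...
    rmax = np.maximum.accumulate(sumg[::-1])[::-1]
    # scores[i]: group at i plus the best group starting at least G+MINSEP later.
    scores = sumg[:-(MINSEP+G)] + rmax[(MINSEP+G):]
    i = np.argmax(scores)
    j = i + MINSEP + G + np.argmax(sumg[i + MINSEP + G:])
    return scores[i], i, j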

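def solve_linear_generic(numbers, groups):
    # sumg[s] is the sum of the group starting at index s.
    sumc = np.cumsum([0] + list(numbers))
    sumg = sumc[G:] - sumc[:-G]
    # table[a, b]: best total of exactly b groups chosen among the last a
    # candidate starts; trace[a, b] holds the chosen starts.
    # States where b groups cannot fit are marked with -inf, so that they
    # can never be mistaken for a valid partial solution.
    impossible = float('-inf')
    table = {}
    trace = {}
    for a in range(N - G + 2):
        a_prime = a - MINSEP - G
        for b in range(groups + 1):
            if a == 0:
                table[a, b] = 0 if b == 0 else impossible
                trace[a, b] = ()
            else:
                # c0: skip the group that starts at N - G + 1 - a.
                c0 = table[a-1, b]
                tracec0 = trace[a-1, b]
                if b == 0:
                    table[a, b] = c0
                    trace[a, b] = tracec0
                else:
                    # c1: take that group and place the other b-1 groups
                    # among the starts that are far enough to the right.
                    if a_prime >= 0:
                        rest = table[a_prime, b-1]
                        tracerest = trace[a_prime, b-1]
                    else:
                        rest = 0 if b == 1 else impossible
                        tracerest = ()
                    c1 = sumg[-a] + rest
                    tracec1 = (N - G + 1 - a,) + tracerest
                    if c0 < c1:
                        table[a, b] = c1
                        trace[a, b] = tracec1
                    else:
                        table[a, b] = c0
                        trace[a, b] = tracec0
    return (table[N - G + 1, groups],)  + trace[N - G + 1, groups]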

def main():
    a = np.random.randint(100, size=N)
    print(a)
    print('two groups:')
    print(solve_quadratic(a))
    print(solve_linear(a))
    print(solve_linear_generic(a, 2))
    print('three groups:')
    print(solve_cubic(a))
    print(solve_linear_generic(a, 3))

main()
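The functions above read `N`, `G` and `MINSEP` from module-level constants. For the general case asked about in the question (any array length, group size, number of groups and separation), the same dynamic-programming idea can be sketched with explicit parameters. This is an illustrative variant, not the answer's own code: the name `best_groups`, its signature, and the choice to return `None` when the groups do not fit are assumptions.

```python
def best_groups(x, group_size, num_groups, minsep):
    """Return (best_total, starts), or None if the groups cannot fit.

    Runs in O(num_groups * len(x)) time and memory.
    """
    n_starts = max(len(x) - group_size + 1, 0)
    # Prefix sums give every group sum in O(1).
    prefix = [0]
    for v in x:
        prefix.append(prefix[-1] + v)
    w = [prefix[s + group_size] - prefix[s] for s in range(n_starts)]

    step = group_size + minsep          # minimum distance between starts
    neg = float('-inf')                 # marks "cannot fit"
    # best[b][s]: best total of exactly b groups with all starts >= s.
    best = [[0] * (n_starts + 1)]
    best += [[neg] * (n_starts + 1) for _ in range(num_groups)]
    for b in range(1, num_groups + 1):
        for s in range(n_starts - 1, -1, -1):
            skip = best[b][s + 1]
            take = w[s] + best[b - 1][min(s + step, n_starts)]
            best[b][s] = max(skip, take)

    if best[num_groups][0] == neg:
        return None
    # Walk the table forwards to recover one optimal set of starts.
    starts, s = [], 0
    for b in range(num_groups, 0, -1):
        while best[b][s] == best[b][s + 1]:   # skipping s loses nothing
            s += 1
        starts.append(s)
        s = min(s + step, n_starts)
    return best[num_groups][0], starts


print(best_groups(list(range(20)), 3, 2, 4))  # (87, [10, 17])
print(best_groups(list(range(20)), 3, 3, 4))  # (99, [3, 10, 17])
print(best_groups(list(range(20)), 3, 4, 4))  # None: four groups need 24 slots
```

Using `-inf` for infeasible states keeps the sketch valid for negative inputs as well, since an impossible state can never beat a real one.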

I appreciate that, thanks k20 – KieranPC Nov 08 '14 at 20:38
