
I need to combine two long, already-sorted lists into one sorted list. My attempt below exploits the fact that the lists are already sorted, yet it is far slower than simply using Join and Sort, which ignore that fact.

a = Sort[RandomInteger[10^8, 10^6]];
b = Sort[RandomInteger[10^8, 10^7]];
Timing[c1 = Sort[Flatten[{a, b}]];]
Timing[c2 = Sort[Join[a, b]];]

Timing[ 
 aIndex = 1;
 bIndex = 1;
 c3 = ConstantArray[0, Length[a] + Length[b]];
 cIndex = 1; 
 While[aIndex <= Length[a] && bIndex <= Length[b],
  c3[[cIndex++]] = If[a[[aIndex]] <= b[[bIndex]], a[[aIndex++]], b[[bIndex++]]]];

 c3[[cIndex ;; Length[c3]]] = If[aIndex > Length[a], b[[bIndex ;; Length[b]]], a[[aIndex ;; Length[a]]]];]

(* {1.14063, Null} {0.24063, Null} {70.4063, Null} *)

Can we do better?

Mr.Wizard
Jerry Guern
  • Intuitively (although I might be wrong, of course) I think that you won't achieve any substantial improvement compared to Sort@Join. The reason is that Sort is a very basic function that, I believe, draws on the hundreds of known sorting algorithms, implemented in the most efficient way. (If it turns out I'm indeed wrong, let me know so that I can delete this comment.) – corey979 Mar 13 '17 at 18:06
  • Improvements are likely possible with a compiled function. The algorithm is largely correct. – LLlAMnYP Mar 13 '17 at 18:33
  • cf = Compile[{{x, _Real, 1}, {y, _Real, 1}}, Sort@Join[x, y]]; c4 = cf[a, b]; // AbsoluteTiming is 15-17% slower than just Sort@Join[a, b]. Compiling the c3 approach looks inefficient to me. – corey979 Mar 13 '17 at 19:02
  • @corey979 Got a better approach than my c3? – Jerry Guern Mar 13 '17 at 19:58
  • @corey979 I understand that Sort is very basic and optimized, but I don't understand how even the best Sorting algorithm can compete with not having to sort at all. – Jerry Guern Mar 13 '17 at 20:06
  • Closely related to http://mathematica.stackexchange.com/questions/6931/implementing-a-function-which-generalizes-the-merging-step-in-merge-sort – Alan Mar 13 '17 at 20:35
  • @Alan Thanks for reminding me about it. The function mergeList from that link works out of the box. One just needs to use it as mergeList[a, b, Less, CompileToC -> True]; the first run will be slower, since it includes the compilation time. I get 0.3 sec for subsequent runs, vs. 1.7 sec for the first sort and 0.25 sec for the second one, so it is not bad. – Leonid Shifrin Mar 13 '17 at 20:41
  • Sort is much faster on Join[a, b] than on random data of the same length, so to some extent it is taking advantage of the fact that the lists are sorted, e.g. x = Join[a, b]; y = Reverse[x]; z = RandomSample[x]; Timing[Sort[#];] & /@ {x, y, z} – Simon Woods Mar 13 '17 at 22:13
  • You can get a slight (and sometimes a noticeable) improvement by using Timing[c2 = Sort[Join[b, a]];] instead of Timing[c2 = Sort[Join[a, b]];] (this doesn't work 100% of the time, but on average it sorts the list around 80% faster). – AccidentalFourierTransform Mar 13 '17 at 23:19
  • @AccidentalFourierTransform Are you saying to put the longer list first? – Jerry Guern Mar 14 '17 at 03:20
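The argument-order effect discussed in the last few comments is easy to test directly. A minimal sketch, reusing the a and b defined in the question (timings will of course vary by machine and run):

```
(* Compare both argument orders; per the comments above, putting the
   longer list (b) first is sometimes slightly faster. *)
RepeatedTiming[Sort[Join[a, b]];]
RepeatedTiming[Sort[Join[b, a]];]
```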

2 Answers


I rewrote your procedural loop for a compiled function:

mergeSort = Compile[{{a, _Integer, 1}, {b, _Integer, 1}},
  Block[{aIndex = 1, la = Length[a], lb = Length[b], 
    lc = Length[a] + Length[b],
    bIndex = 1, cIndex = 1,
    c3 = Table[0, {Length[a] + Length[b]}]},
   For[cIndex = 1, aIndex <= la && bIndex <= lb, cIndex++, 
    c3[[cIndex]] = 
     If[a[[aIndex]] <= b[[bIndex]], a[[aIndex++]], b[[bIndex++]]]];
   c3[[cIndex ;; lc]] = 
    If[aIndex > la, b[[bIndex ;; lb]], a[[aIndex ;; la]]];
   c3]
  , CompilationTarget -> "C", Parallelization -> True, 
  "RuntimeOptions" -> "Speed"]

I also went with corey's suggestion:

mergeSort2 = Compile[{{a, _Integer, 1}, {b, _Integer, 1}},
  Sort[Join[a, b]]
  , CompilationTarget -> "C", Parallelization -> True, 
  "RuntimeOptions" -> "Speed"]

The timings are very close.

RepeatedTiming[mergeSort[a, b];]
RepeatedTiming[Sort[Join[a, b]];]
RepeatedTiming[mergeSort2[a, b];]

{0.299, Null}
{0.33, Null}
{0.330, Null}

The list order doesn't make a difference:

RepeatedTiming[mergeSort[b, a];]
RepeatedTiming[Sort[Join[b, a]];]
RepeatedTiming[mergeSort2[b, a];]

{0.302, Null}
{0.271, Null}
{0.308, Null}

As you saw in your own tests, Sort@Flatten[{a, b}] is significantly slower. As it turns out, this is unsurprising:

mergeSort3 = Compile[{{a, _Integer, 1}, {b, _Integer, 1}},
  Sort[Flatten[{a, b}]]
  , CompilationTarget -> "C", Parallelization -> True, 
  "RuntimeOptions" -> "Speed"]

Calling mergeSort3[a,b] returns errors about non-tensor objects being generated: indeed, {a, b} is a list of two lists of different lengths.

A concatenation of two sorted lists is likely a very favorable input for a real merge sort (nothing to do with the names of my functions). Sort is almost certainly implemented in low-level code that would be very hard to beat, though I agree that ignoring the fact that the two lists are pre-sorted introduces some overhead. That overhead, however, seems to be on the order of the error bar of the timings.
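Simon Woods' experiment from the comments makes this concrete. A sketch, reusing a and b from the question:

```
x = Join[a, b];       (* two sorted runs *)
y = Reverse[x];       (* reverse-ordered *)
z = RandomSample[x];  (* fully shuffled *)
(* Sort should be fastest on x, showing it exploits partial order. *)
Timing[Sort[#];] & /@ {x, y, z}
```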

Update

At the request of kglr:

RepeatedTiming[#[[Ordering@#]] &@Join[a, b];]
RepeatedTiming[#[[Ordering@#]] &@Join[b, a];]

{0.33, Null}
{0.276, Null}

I've run this a few times, and Join[b, a] seems consistently slightly faster than Join[a, b]; otherwise it's more or less on par with the other functions.

LLlAMnYP

This question is related to: Complement on pre-sorted lists

Giving basically the same answer as I did there: I don't think Sort is being "wasteful" in this application, and I doubt you will be able to improve substantially upon it. Outside of a trivial case, such as every element of one list being larger than every element of the other, some sorting work must by definition still take place.
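That trivial case can be short-circuited explicitly. A minimal sketch with a hypothetical helper name mergePresorted (not a built-in), falling back to Sort when neither list precedes the other wholesale:

```
(* If one sorted list ends before the other begins, the merge is just Join. *)
mergePresorted[a_, b_] /; Last[a] <= First[b] := Join[a, b]
mergePresorted[a_, b_] /; Last[b] <= First[a] := Join[b, a]
mergePresorted[a_, b_] := Sort[Join[a, b]]
```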

As Simon Woods already observed, the Sort algorithm is more efficient on lists that are partially ordered. Although the algorithm is not the same, take as an illustration How can I collect data for visualization of quick sort? Using SeedRandom[0]; s = RandomSample @ Range @ 100; the sort takes 55 steps. However, if we pre-sort each half of the list with s = Join @@ Sort /@ Partition[s, 50]; the remaining sort takes only 22 steps.

Trying this test with Sort itself shows that it may be even more efficient in this regard.

SeedRandom[0];

s = RandomSample @ Range @ 2*^6;
s2 = Join @@ Sort /@ Partition[s, 1*^6];

Sort[s];  // RepeatedTiming // First
Sort[s2]; // RepeatedTiming // First
0.27

0.037

Mr.Wizard
  • I have checked Timing for Sort[s] and Sort[s2] using versions 5.2, 8.0.4, 10.4.1 and 11.0.1 on my dual-core laptop running Win7 x64 and got the following timings, respectively: {0.686, 0.266}, {0.515, 0.047}, {0.641, 0.119}, {0.613, 0.112} (in version 5.2 I generated the initial list as s = Table[Random[Integer, {1, 2*^6}], {2*^6}];). – Alexey Popkov Mar 15 '17 at 18:35
  • Interesting, but I'm really not sure that the step count is accurate -- some steps are not shown in the animation since they don't change anything, but they may consume processing time. MMA could be doing a natural merge sort, perhaps? – LLlAMnYP Mar 16 '17 at 07:56
  • @LLlAMnYP Fair point regarding the step count, but it was only a rough illustration at best, as it isn't working with the built-in sort algorithm. I am not knowledgeable enough about sort algorithms to try to tease out the one used by Sort by probing performance; I mean, I don't know whether a heapsort could also exhibit the timings shown above, for example. – Mr.Wizard Mar 16 '17 at 10:58
  • I don't know about the heapsort; the point about natural merge sort is that the very first thing it does is split the list to be sorted into monotonically increasing runs, which reduces the problem of two presorted sublists to a simple merge, so in this case this is O(2n) complexity. – LLlAMnYP Mar 16 '17 at 11:11
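The run-splitting step described in that last comment can be illustrated with Split. This is only a sketch of the idea, not the built-in implementation:

```
(* A natural merge sort first cuts its input into maximal nondecreasing
   runs; the concatenation of two sorted lists yields at most two such
   runs, leaving a single merge pass. *)
runs = Split[Join[a, b], LessEqual];
Length[runs]  (* at most 2 when a and b are each sorted *)
```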