I rewrote your procedural loop as a compiled function:
mergeSort = Compile[{{a, _Integer, 1}, {b, _Integer, 1}},
  Block[{aIndex = 1, la = Length[a], lb = Length[b],
    lc = Length[a] + Length[b],
    bIndex = 1, cIndex = 1,
    c3 = Table[0, {Length[a] + Length[b]}]},
   (* standard merge step: take the smaller head of a or b until one list is exhausted *)
   For[cIndex = 1, aIndex <= la && bIndex <= lb, cIndex++,
    c3[[cIndex]] =
     If[a[[aIndex]] <= b[[bIndex]], a[[aIndex++]], b[[bIndex++]]]];
   (* copy the remaining tail of whichever input list is left over *)
   c3[[cIndex ;; lc]] =
    If[aIndex > la, b[[bIndex ;; lb]], a[[aIndex ;; la]]];
   c3]
  , CompilationTarget -> "C", Parallelization -> True,
  "RuntimeOptions" -> "Speed"]
I also went with corey979's suggestion:
mergeSort2 = Compile[{{a, _Integer, 1}, {b, _Integer, 1}},
Sort[Join[a, b]]
, CompilationTarget -> "C", Parallelization -> True,
"RuntimeOptions" -> "Speed"]
The timings are very close.
RepeatedTiming[mergeSort[a, b];]
RepeatedTiming[Sort[Join[a, b]];]
RepeatedTiming[mergeSort2[a, b];]
{0.299, Null}
{0.33, Null}
{0.330, Null}
Swapping the order of the two input lists makes no significant difference:
RepeatedTiming[mergeSort[b, a];]
RepeatedTiming[Sort[Join[b, a]];]
RepeatedTiming[mergeSort2[b, a];]
{0.302, Null}
{0.271, Null}
{0.308, Null}
As you have seen in your own tests, Sort@Flatten[{a, b}] is significantly slower. It turns out this is unsurprising:
mergeSort3 = Compile[{{a, _Integer, 1}, {b, _Integer, 1}},
Sort[Flatten[{a, b}]]
, CompilationTarget -> "C", Parallelization -> True,
"RuntimeOptions" -> "Speed"]
Calling mergeSort3[a, b] produces errors about non-tensor objects being generated: indeed, {a, b} is a list of two lists of different lengths, i.e. a ragged structure rather than the rectangular tensor that Compile requires.
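To see the same thing outside Compile, here is a minimal illustration with short made-up lists:

Dimensions[{{1, 2, 3}, {4, 5}}]  (* {2}: a ragged list of lists, not a rank-2 tensor *)
Flatten[{{1, 2, 3}, {4, 5}}]     (* {1, 2, 3, 4, 5}: fine at top level, but Compile needs rectangular tensors *)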
The concatenation of two pre-sorted lists is likely a very favorable input for a real merge sort (by which I mean the algorithm, nothing to do with the names of my functions above). Sort is almost certainly implemented in low-level code that is very hard to beat, although I agree that dropping the assumption that the two lists are pre-sorted introduces some overhead. In these measurements, however, that overhead is on the order of the error bars of the timings.
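One way to check how much Sort benefits from the two pre-sorted runs in Join[a, b] is to time it against the same elements in random order (essentially the check Simon Woods suggests in the comments below; a sketch only, absolute numbers depend on your machine and data):

x = Join[a, b];        (* two pre-sorted runs back to back *)
z = RandomSample[x];   (* the same elements in random order *)
RepeatedTiming[Sort[x];]
RepeatedTiming[Sort[z];]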
Update
At kglr's request:
RepeatedTiming[#[[Ordering@#]] &@Join[a, b];]
RepeatedTiming[#[[Ordering@#]] &@Join[b, a];]
{0.33, Null}
{0.276, Null}
I've run this a few times; Join[b, a] seems consistently slightly faster than Join[a, b]. Otherwise the Ordering-based approach performs more or less the same as the other functions.
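For completeness: #[[Ordering@#]] & is equivalent to Sort here, because Ordering returns the permutation that puts the list in order and Part then applies it. A tiny standalone example:

Ordering[{3, 1, 2}]                 (* {2, 3, 1}: positions that sort the list *)
{3, 1, 2}[[Ordering[{3, 1, 2}]]]    (* {1, 2, 3}, the same as Sort[{3, 1, 2}] *)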
Comments

[…] Sort@Join. The reason is that Sort is a very basic function that, I believe, takes into account the hundreds of known sorting algorithms, which are implemented in the most efficient way. (If it turns out I'm indeed wrong, let me know so that I can delete this comment.) – corey979 Mar 13 '17 at 18:06

cf = Compile[{{x, _Real, 1}, {y, _Real, 1}}, Sort@Join[x, y]]; c4 = cf[a, b]; // AbsoluteTiming is 15-17% slower than just Sort@Join[a, b]. Compiling the c3 approach looks inefficient to me. – corey979 Mar 13 '17 at 19:02

mergeList from that link works out of the box. One just needs to use it as mergeList[a, b, Less, CompileToC -> True], and the first run will be slower, since it includes the compilation time. I get 0.3 sec for subsequent runs, vs. 1.7 sec for the first sort and 0.25 sec for the second one, so it is not bad. – Leonid Shifrin Mar 13 '17 at 20:41

Sort is much faster on Join[a, b] than on random data of the same length, so to some extent it is taking advantage of the fact that the lists are sorted. E.g. x = Join[a, b]; y = Reverse[x]; z = RandomSample[x]; Timing[Sort[#];] & /@ {x, y, z} – Simon Woods Mar 13 '17 at 22:13

[…] Timing[c2 = Sort[Join[b, a]];] instead of Timing[c2 = Sort[Join[a, b]];] (this doesn't work 100% of the time, but on average this sorts the list around 80% faster). – AccidentalFourierTransform Mar 13 '17 at 23:19