You could use slightly adapted version of merge step of merge sort. For lists with lengths $n_1$ and $n_2$, where first is already sorted, sorting second list and merging it with first should run in $O(n_1 + n_2\log n_2)$ time.
Since we'll be compiling two versions of our function for Real and for Complex arguments, and each requires slightly different comparisons let's use simple quoting to simplify code building.
ClearAll[quote, unquote, eval]
SetAttributes[{quote, unquote}, HoldAllComplete]
quote@expr : Except@_Symbol :=
Unevaluated@expr /. {x : _unquote | _quote :> x, s_Symbol -> quote@s}
unquote@args___ := args
eval = # /. HoldPattern@quote@s_ :> s &;
Some type dependent code generating functions:
inline // ClearAll
inline // Attributes = HoldAllComplete;
inline[a_[[i_]] > b_[[j_]], Real] :=
quote[Compile`GetElement[a, i, 1] > Compile`GetElement[b, j, 1]]
inline[a_[[i_]] > b_[[j_]], Complex] := quote@With[
{aEl = Compile`GetElement[a, i, 1], bEl = Compile`GetElement[b, j, 1]},
Re@aEl > Re@bEl || Re@aEl == Re@bEl && Abs@Im@aEl > Abs@Im@bEl
]
inline[a_ != b_, eps_, Real] :=
quote[Abs@Subtract[a, b] > eps Max[Abs@a, Abs@b]]
inline[a_ != b_, eps_, Complex] := quote[
Abs@Subtract[Re@a, Re@b] > eps Max[Abs@Re@a, Abs@Re@b] ||
Abs@Subtract[Im@a, Im@b] > eps Max[Abs@Im@a, Abs@Im@b]
]
inline[a_[[i_]] != b_[[j_]], eps_, m_, type_] :=
quote@Module[{bool = False, aEl, bEl, l},
Do[
aEl = Compile`GetElement[a, i, l];
bEl = Compile`GetElement[b, j, l];
If[unquote@inline[aEl != bEl, eps, type],
bool = True;
Break[]
],
{l, 1, m}
];
bool
]
inline[a_[[i_]] = b_[[j_]], m_] := quote@Module[{l},
Do[a[[i, l]] = Compile`GetElement[b, j, l], {l, 1, m}]
]
In above function "comparison with tolerance" was implemented "manually" as shown by Carl Woll.
Library function performing merging for given types:
compiledUnionJoin // ClearAll
compiledUnionJoin[type : Real | Complex] := compiledUnionJoin@type =
eval@quote@Last@Compile[{{set, _type, 2}, {list, _type, 2}, {eps, _Real}},
Module[{i, j, k, n, nSet, nSorted, m, lastFromSet, sorted, res},
nSorted = Length@list;
sorted = Sort@list;
nSet = Length@set;
n = nSet + nSorted;
lastFromSet = False;
i = j = k = 1;
If[nSet > 0,
m = Length@Compile`GetElement[set, i];
If[nSorted > 0,
m = Min[m, Length@Compile`GetElement[sorted, j]]
]
(* else *),
If[nSorted > 0,
m = Length@Compile`GetElement[sorted, j];
(* else *),
m = 0;
]
];
res = Table[unquote@Replace[type, {Real -> 0., Complex -> I 0.}], {n}, {m}];
If[nSet > 0,
If[nSorted > 0 && unquote@inline[set[[i]] > sorted[[j]], type],
unquote@inline[res[[k]] = sorted[[j]], m];
++j
(* else *),
unquote@inline[res[[k]] = set[[i]], m];
lastFromSet = True;
++i
]
(* else *),
If[nSorted > 0,
unquote@inline[res[[k]] = sorted[[j]], m];
++j
(* else *),
k = 0
]
];
While[i <= nSet && j <= nSorted,
If[unquote@inline[set[[i]] > sorted[[j]], type],
If[unquote@inline[res[[k]] != sorted[[j]], eps, m, type],
++k;
unquote@inline[res[[k]] = sorted[[j]], m];
lastFromSet = False
];
++j
(* else *),
If[lastFromSet || unquote@inline[res[[k]] != set[[i]], eps, m, type],
++k;
unquote@inline[res[[k]] = set[[i]], m];
lastFromSet = True
];
++i
]
];
If[i <= nSet,
If[lastFromSet || unquote@inline[res[[k]] != set[[i]], eps, m, type],
++k;
unquote@inline[res[[k]] = set[[i]], m]
]
];
Do[
++k;
unquote@inline[res[[k]] = set[[l]], m]
,
{l, i + 1, nSet}
];
Do[
If[unquote@inline[res[[k]] != sorted[[l]], eps, m, type],
++k;
unquote@inline[res[[k]] = sorted[[l]], m]
],
{l, j, nSorted}
];
Take[res, k]
],
RuntimeOptions -> "Speed", CompilationTarget -> "C"
]
For comparison let's add compiledUnion from answer by Oleksandr R. adapted to both types of arguments:
compiledUnion // ClearAll
compiledUnion[type : Real | Complex, tol_] := compiledUnion[type, tol] =
Block[{Internal`$EqualTolerance = tol},
Compile[{{r, _type, 2}},
Block[{sorted = Sort@r, output, seen, current},
output = Internal`Bag[seen = First@sorted, 1];
Do[
If[i != seen, Internal`StuffBag[output, seen = i, 1]],
{i, sorted}
];
Partition[Internal`BagPart[output, All], Length@seen]
],
RuntimeOptions -> {"Speed", "CompareWithTolerance" -> True},
CompilationTarget -> "C"
]
]
Let's pre-compile both functions for both argument types. compiledUnion requires comparison tolerance at compile time, we'll use tol = 15:
compiledUnionJoin@Real;
compiledUnionJoin@Complex;
tol = 15;
eps = 10^(tol - $MachinePrecision);
compiledUnion[Real, tol];
compiledUnion[Complex, tol];
Our test data, both real and complex:
SeedRandom@0
n = 2 10^6;
dataR = RandomChoice[RandomReal[1, {n, 2}], n];
dataC = RandomChoice[RandomComplex[1, {n, 2}], n];
Calling compiledUnionJoin with empty list as first argument is equivalent to compiledUnion:
res1 = compiledUnion[Real, tol]@dataR; // AbsoluteTiming // First
res2 = compiledUnionJoin[Real][{}, dataR, eps]; // AbsoluteTiming // First
res1 === res2
res2 // Length
(* 0.780 *)
(* 0.671 *)
(* True *)
(* 1123205 *)
res1 = compiledUnion[Complex, tol]@dataC; // AbsoluteTiming // First
res2 = compiledUnionJoin[Complex][{}, dataC, eps]; // AbsoluteTiming // First
res1 === res2
res2 // Length
(* 0.9354 *)
(* 0.8026 *)
(* True *)
(* 1124283 *)
compiledUnionJoin is slightly faster than compiledUnion when normalizing single list.
Now task from OP, joining two lists from which first is already sorted and free of duplicates.
a = RandomReal[1, {1000, 2}];
a = RandomChoice[a, 1000];
a = compiledUnion[Real, tol][a];
b = RandomReal[1, {10, 2}];
b = RandomChoice[b, 10];
res1 = compiledUnion[Real, tol]@Join[a, b]; // AbsoluteTiming // First
res2 = compiledUnionJoin[Real][{}, Join[a, b], eps]; // AbsoluteTiming // First
res3 = compiledUnionJoin[Real][a, b, eps]; // AbsoluteTiming // First
res1 === res2 === res3
res3 // Length
(* 0.000079 *)
(* 0.0000385 *)
(* 0.0000189 *)
(* True *)
(* 565 *)
For our bigger test data:
a = Take[dataR, Round[.99 n]];
b = Drop[dataR, Round[.99 n]];
a = compiledUnionJoin[Real][{}, a, eps];
Length@a
Length@b
(* 1116705 *)
(* 20000 *)
res1 = compiledUnion[Real, tol]@Join[a, b]; // AbsoluteTiming // First
res2 = compiledUnionJoin[Real][{}, Join[a, b], eps]; // AbsoluteTiming // First
res3 = compiledUnionJoin[Real][a, b, eps]; // AbsoluteTiming // First
res1 === res2 === res3
res3 // Length
(* 0.166 *)
(* 0.0897 *)
(* 0.0425 *)
(* True *)
(* 1122434 *)
a = Take[dataC, Round[.99 n]];
b = Drop[dataC, Round[.99 n]];
a = compiledUnionJoin[Complex][{}, a, eps];
Length@a
Length@b
(* 1117685 *)
(* 20000 *)
res1 = compiledUnion[Complex, tol]@Join[a, b]; // AbsoluteTiming // First
res2 = compiledUnionJoin[Complex][{}, Join[a, b], eps]; // AbsoluteTiming // First
res3 = compiledUnionJoin[Complex][a, b, eps]; // AbsoluteTiming // First
res1 === res2 === res3
res3 // Length
(* 0.230 *)
(* 0.123 *)
(* 0.060 *)
(* True *)
(* 1123507 *)
If non-normalized list is about two orders of magnitude smaller than normalized one, as in OP, merging them using compiledUnionJoin is about four times faster than using compiledUnion on Join[a, b].
compiledUnionJoin[domain_] := compiledUnionJoin[domain] = (*your code with _Real -> _domain, and a choice of Complex/Real modification for the "comparison with tolerance" part*), but this runs nearly twice as slowly. I don't understand why that is, shouldn't this just define two separate compiled functions and choose one as necessary? Do you have a solution for this that doesn't slow the code down? – Jojo Dec 19 '17 at 17:55aandbsets, since comparison of complex numbers is more complicated this slowdown is expected. – jkuczm Dec 20 '17 at 23:56f[x_] := f[x] = (*f here*)When I run your code I get a bunch of compile errors, but perhaps this is because I'm on Mathematica 8? I will try on a newer version also when I can.compiledUnionJoin[Real][{{0.1}}, {{0.101}}, 0.1]for me evaluates toSequence[{{0.1}}, {{0.101}}, 0.1]– Jojo Dec 22 '17 at 15:15SameQis compilable from version 9, so it needed to be changed toEqual.Tablewith number as iterator is allowed since version 10.2, in previous versions iterator always needed to be a list. – jkuczm Dec 22 '17 at 20:26