11

I have an ordered list of $N$ numbers, and I want to find out whether adjacent numbers are SameQ or not, returning a list of 0s and 1s of length $N-1$. One approach is something like:

pairwiseBoole[y_] := Boole[SameQ @@@ Transpose[{Rest[y],Most[y]}]]

but this is extremely slow:

data = Union @ RandomReal[E + {0, 10^-7}, 10^6];
Length[data]

res = pairwiseBoole[data]; //AbsoluteTiming
Count[res, 0]

997747

{0.640326, Null}

993254

If I naively compile this, I can increase the speed by an order of magnitude, but I think it should be possible to write a function that is still faster, since:

Rest[y] - Most[y]; //AbsoluteTiming

{0.009215, Null}

and SameQ is true as long as they only differ in their last binary digit (note that one might need to know about Internal`$SameQTolerance).

How can I perform this computation as quickly as possible?

Update

Based on the answers/comments, I should have provided a better example, one where the fact that SameQ uses a relative and not absolute tolerance plays a more important role. So, the kind of data I'm using is more similar to the following (corrected to output a packed array thanks to @Szabolcs):

data = Developer`ToPackedArray @ Union[
    RandomChoice[100 Sin[Subdivide[0, 7, 40]], 10^6] +
    RandomReal[{0, 10^-7}, 10^6]
];

And, it is vital that the output is the same as what one would get using pairwise SameQ on list elements. Only results that produce the same output as pairwiseBoole are acceptable.

My compiled version

By the way, here is the compiled version that I had come up with:

pairwiseBooleC = Compile[{{x,_Real,1}},
    Table[Boole[SameQ[x[[i]], x[[i+1]]]], {i, Length[x]-1}]
];

and a comparison with my pairwiseBoole:

r1 = pairwiseBoole[data]; //RepeatedTiming
r2 = pairwiseBooleC[data]; //RepeatedTiming

r1 === r2

{0.63, Null}

{0.042, Null}

True

Some background, I am writing a RegionMember type function that can handle multiple regions, returning which region a point belongs to. In order to get edge cases correct it is vital that I use something equivalent to SameQ when comparing adjacent elements.

Carl Woll
  • 130,679
  • 6
  • 243
  • 355
  • 1
    Why not Boole[Developer`PartitionMap[Apply[SameQ], y, 2, 1]]? – J. M.'s missing motivation Sep 04 '17 at 07:46
  • 1
    Maybe something based on 1 - Unitize[Rest[data] - Most[data]]? It would need to be investigated how Unitize checks for zeros (with ==, with === or with a different tolerance than either of these two). How important is it to you to have precisely the same tolerance as SameQ? – Szabolcs Sep 04 '17 at 08:06
  • 1
    Why not start with Differences? – Mr.Wizard Sep 04 '17 at 08:08
  • 1
    @Mr.Wizard It's slower than Rest[data] - Most[data] :-( Probably Differences is implemented naïvely, while Rest[data] - Most[data] makes use of the full power of vector arithmetic though the MKL. – Szabolcs Sep 04 '17 at 08:09
  • @Szabolcs That's a surprise; it didn't use to be the case. – Mr.Wizard Sep 04 '17 at 08:11
  • 5
    With BoolEval, the above could be written more readable as BoolEval[Rest[data] == Most[data]], with minimal performance degradation. Also, I just realize that the difference based method can't easily reproduce the tolerance behaviour of SameQ (because it uses relative, not absolute tolerances). – Szabolcs Sep 04 '17 at 08:13
  • 1
    Split[data] is only 3.5 times faster than pairwiseBoole. Perhaps this is the limit of what can be achieved when using SameQ instead of arithmetic tricks ... (Split does unpack though, which takes significant time.) – Szabolcs Sep 04 '17 at 08:16
  • @Szabolcs excellent point regarding tolerance – Mr.Wizard Sep 04 '17 at 08:18
  • 4
    Systems like MATLAB or numpy are very good at vectorized computation. They can make use of SIMD instructions and multi-threading while comparing two arrays (== or >). Mathematica cannot do this. The algorithmically equivalent constructs (e.g. MapThread[compare, {arr1, arr2}]) are much slower in Mathematica. A workaround is to translate the comparisons into vector arithmetic (Subtract, Unitize, UnitStep, etc.), which is just as fast in Mathematica as in MATLAB/numpy. Unfortunately, such code is both hard to read and write. My BoolEval package tries to fix this ... – Szabolcs Sep 04 '17 at 09:02
  • 4
    ... by providing a convenient notation in terms of ==, >, and translating that into arithmetic transparently. But your example shows that arithmetic isn't really a good replacement for such tasks: it changes the tolerance behaviour. I think the lack of vectorized comparisons is a significant handicap for Mathematica. Years ago I sent suggestions to support to improve this situation, but I don't really have any hope that anything will happen ... – Szabolcs Sep 04 '17 at 09:04
  • 3
    Observations: The new data you provided is not packed, which causes Rest[data] - Most[data]; to be slower than pairwiseBooleC on this. (This really confused me.) This gives a marginal speedup compared to pairwiseBooleC: cf = Compile[{{x, _Real}, {y, _Real}}, Boole[x === y], RuntimeAttributes -> {Listable}, Parallelization -> True]; cf[Rest[data], Most[data]] – Szabolcs Sep 04 '17 at 16:05
  • @Szabolcs Thanks, I forgot about packing. Also, your Compile version is a nice improvement, thanks! – Carl Woll Sep 04 '17 at 16:14

3 Answers3

8

On my computer simple LibraryFunction, that compares subsequent elements in a loop, is fastest.

pairwiseBooleJkuczm = Last@Compile[{{data, _Real, 1}},
  Table[
    Boole[Compile`GetElement[data, i] === Compile`GetElement[data, i + 1]],
    {i, Length@data - 1}
  ],
  CompilationTarget -> "C",
  RuntimeOptions -> {"CatchMachineIntegerOverflow" -> False, "CompareWithTolerance" -> True}
];

SeedRandom@0;
data = Developer`ToPackedArray@
   Union[RandomChoice[100 Sin[Subdivide[0, 7, 40]], 10^6] + 
     RandomReal[{0, 10^-7}, 10^6]];

(res1 = pairwiseBoole@data) // MaxMemoryUsed // RepeatedTiming
(res2 = pairwiseBooleC@data) // MaxMemoryUsed // RepeatedTiming
(res3 = pairwiseBooleJkuczm@data) // MaxMemoryUsed // RepeatedTiming
res1 === res2 === res3
(* {0.563, 175791672} *)
(* {0.044,  15981232} *)
(* {0.013,   7990696} *)
(* True *)
jkuczm
  • 15,078
  • 2
  • 53
  • 84
  • One must use a relative tolerance. Please see my update for a more representative data sample. – Carl Woll Sep 04 '17 at 15:39
  • @CarlWoll I've changed comparison to to SameQ with "CompareWithTolerance" -> True, which, I believe, works the same as uncompiled SameQ on machine reals. – jkuczm Sep 04 '17 at 16:38
  • 1
    Yes, that works, but we lose some of the speed advantage that you had before. Possibly that's related to the RuntimeOptions you needed to include. I came up with an alternate approach in my answer that avoids those RuntimeOptions but perhaps I've introduced a bug by doing so. – Carl Woll Sep 04 '17 at 22:53
7

Here is an approach assuming that the SameQ tolerance is not changed from its default.

\$MachineEpsilon

$MachineEpsilon is the smallest number that when added to 1. produces a different number. This means that 1. and 1.+ $MachineEpsilon will differ by 1 in the last bit of the base 2 representation of the mantissa. Hence the two numbers will satisfy the SameQ predicate. Any larger numbers will not. This suggests the following algorithm for deciding the SameQ predicate:

Algorithm

Let $l$ and $u$ be two numbers to be compared, with $l<u$. Then, the SameQ predicate should be equivalent to:

$$\frac{u - l}{\left|u\right|} \le \epsilon$$

We can rewrite this as:

$$\epsilon \left|u\right| + (l-u) \ge 0$$

Finally, if we use a Heaviside theta function where 0. is mapped to 1, then we can use the following arithmetical expression:

$$\theta(\epsilon \left|u\right| + (l-u))$$

which will return 1 if $l$ and $u$ are SameQ, and 0 otherwise. I'm not sure about the correctness of this algorithm, but it passes all of the tests that I could think of.

Mathematica implementation

Here is the Mathematica implementation:

pairwiseSameQ[data_] := With[{u=Rest[data], l=Most[data]},
    UnitStep[$MachineEpsilon Abs[u] + (l-u)]
]

Tests

Let's compare this function to my slow function:

data = Developer`ToPackedArray @ Union[
    RandomChoice[100 Sin[Subdivide[0,7,40]], 10^6] + 
    RandomReal[{0, 10^-7}, 10^6]
];

r1 = pairwiseBoole[data]; //RepeatedTiming
r2 = pairwiseSameQ[data]; //RepeatedTiming

r1 === r2

{0.651, Null}

{0.0109, Null}

True

We get the same results, and the function is faster than the Compile alternatives in the other answers. What happens if we Compile this approach?

Compile

Following @jkuczm's approach, a compiled version might look like:

pairwiseSameQC = With[{e=$MachineEpsilon},
    Last @ Compile[{{d, _Real, 1}},
        Table[
            Boole[e Abs[Compile`GetElement[d,i+1]]>=(Compile`GetElement[d,i+1]-Compile`GetElement[d,i])],
            {i, Length@d-1}
        ],
        CompilationTarget->"C",
        RuntimeOptions->"Speed"
    ]
];

One final comparison:

r3 = pairwiseSameQC[data]; //RepeatedTiming

r1 === r2 === r3

{0.0016, Null}

True

Now, that's pretty fast!

Carl Woll
  • 130,679
  • 6
  • 243
  • 355
6

Best so far is BoolEval[Rest[y] == Most[y]]

Mathematica graphics


data = Union @ RandomReal[E + {0, 10^-7}, 10^6];

pwb1[y_] := Boole[SameQ @@@ Transpose[{Rest[y], Most[y]}]]

pwb1[data]; // RepeatedTiming
(* {0.730, Null} *)

pwb2[y_] := Inner[Boole @* SameQ, Rest[y], Most[y], List]

pwb2[data]; // RepeatedTiming
(* {0.9117, Null} *)

pwb3[y_] := Boole @ Inner[SameQ, Rest[y], Most[y], List]

pwb3[data]; // RepeatedTiming
(* {0.66, Null} *)

pwb4[y_] := Boole[Developer`PartitionMap[SameQ, y, 2, 1]]

pwb4[data]; // RepeatedTiming
(* {0.55, Null} *)

pwb5[y_] := Unitize @ Differences[y]

pwb5[data]; // RepeatedTiming
(* {0.024, Null} *)

pwb6[y_] := 1 - Unitize @ Differences[y]

pwb6[data]; // RepeatedTiming
(* {0.0269, Null} *)

pwb7[y_] := 1 - Unitize[Rest[y] - Most[y]]

pwb7[data]; // RepeatedTiming
(* {0.0203, Null} *)

pwb8[y_] := 1 - Unitize[Rest[y] - Most[y], 1*^-17]

pwb8[data]; // RepeatedTiming
(* {0.0233, Null} *)

pwb9[y_] := ListConvolve[{1, 1}, y, {-1, 1}, {}, Times, SameQ]

pwb9[data]; // RepeatedTiming
(* {0.712, Null} *)

pwb10[y_] := BoolEval[Rest[y] == Most[y]]

pwb10[data]; // RepeatedTiming
(* {0.00671, Null} *)

pwb11[y_] := Subtract[1, Unitize[Subtract[Rest[y], Most[y]]]]

pwb11[data]; // RepeatedTiming
(* {0.0165, Null} *)
rhermans
  • 36,518
  • 4
  • 57
  • 149
  • 1
    Do you care if they give the same output? I'm finding e.g. pwb1 not equal to the results of 6 or 7. e.g. with resX=pwbX[data]; I'm finding: res6 === res7 === 1 - res5 === 1 - res4 and res1 === res2 === res3, but the two sets of results are not equivalent. – John Joseph M. Carrasco Sep 04 '17 at 09:01
  • @JohnJosephM.Carrasco pwb5 clearly has a reverse logic, and I assume the other difference are about the tolerances, as discussed by Szabolcs and Mr.Wizard. This is a community wiki answer, feel free to edit to improve and clarify. Also add your own benchmarks. – rhermans Sep 04 '17 at 09:14
  • 1
    I don't have anything to add about the performance (all my games were much slower), just suggesting that the tolerance difference in output should be possibly emphasized in answer (obvious to all the cognoscenti , but ...) – John Joseph M. Carrasco Sep 04 '17 at 09:19
  • Notice that Unitize[x,dx] does allow for specific tolerance specifications. – rhermans Sep 04 '17 at 09:37
  • 2
    Specifying tolerance for Unitize seems to add about 25% to computation time (on my system). Perhaps that should be taken into consideration? – aardvark2012 Sep 04 '17 at 09:54
  • Can you include ListConvolve[{1, 1}, y, {-1, 1}, {}, Times, SameQ] as well? – J. M.'s missing motivation Sep 04 '17 at 11:28
  • 4
    Note that a - b is parsed to Plus[a, Times[-1, b]] usually this doesn't matter, but for vectorized operations on large arrays Subtract[a, b] is faster. So Subtract[1, Unitize[Subtract[Rest@data, Most@data]]] should be faster than 1 - Unitize[Rest[data] - Most[data]]. – jkuczm Sep 04 '17 at 12:51
  • That's weird because BoolEval (i.e. pwb10) translates to pwb11. It should be slightly slower than pwb11 (due to extra overhead), and it is slower on my machine. – Szabolcs Sep 04 '17 at 13:16
  • @rhermans decode.m returns 404 :-( I have a half-written blog post on BoolEval which I could not publish because of this performance inconsistency. Could it be affecting you? – Szabolcs Sep 04 '17 at 13:27
  • It is vital that the output is the same as what one gets with pwb1, so only pwb1, pwb2 and pwb3 provide acceptable answers, and I'm hopeful that there is an answer that is quite a bit faster. – Carl Woll Sep 04 '17 at 15:41