This question has been answered adequately for pattern matching, but I have a similar task with floating-point data. I am looking for an efficient way to delete duplicates to within a tolerance (preferably using a conditional test between the elements of the two lists). Any help is appreciated.

2 Answers

master = RandomReal[{0, 1}, 1000];
sub = RandomReal[{0, 1}, 100];

{roundm, rounds} = Round[{master, sub}, 0.01];

s2 = DeleteDuplicates[Join[roundm, {Null}, rounds]] /. {a__, Null, b___} :> {b};

sub2 = Extract[sub, Flatten[Position[rounds, #] & /@ s2, 1]]

sub2 contains every element of sub whose rounded value does not appear among the rounded values of master; the second argument of Round (0.01 here) plays the role of the tolerance.
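
Rounding groups values into bins rather than comparing distances, so two numbers on opposite sides of a bin boundary (0.0149 and 0.0151 with a step of 0.01, say) count as different although they are only 0.0002 apart. If that matters, a direct pairwise test may be preferable. The following is only a sketch: masterDemo, subDemo and tol are illustrative names and values, not part of the code above, and the cost grows with the product of the two list lengths.

masterDemo = {0.10, 0.50, 0.90};   (* illustrative data *)
subDemo = {0.105, 0.30, 0.895, 0.70};
tol = 0.01;                        (* assumed tolerance *)

(* keep x only if it lies farther than tol from every element of masterDemo *)
Select[subDemo, Function[x, Min[Abs[masterDemo - x]] > tol]]

{0.3, 0.7}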

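For a faster version, use makePositionFunction in the last step: it records the positions of every value in a single pass, so each lookup no longer needs its own Position scan.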

makePositionFunction[f_Symbol, data_, level_: {-1}] := Block[{},
  ClearAll[f];
  Reap[
   MapIndexed[Sow[#2, #1] &, data, level, Heads -> True],
   _, (f[#] = #2) &];
  f[other_] := Position[data, other, level]]

makePositionFunction[pos, rounds];

sub2 = Extract[sub, Flatten[pos /@ s2, 1]]
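
In Mathematica 10 and later, the built-in PositionIndex can probably play the same role as makePositionFunction. A minimal sketch with made-up data (roundsDemo, keepDemo and subDemo2 are illustrative stand-ins for rounds, s2 and sub, not values from the code above):

roundsDemo = {0.12, 0.40, 0.12, 0.77};        (* stand-in for rounds *)
keepDemo = {0.12, 0.77};                      (* stand-in for s2 *)
subDemo2 = {0.1234, 0.4011, 0.1198, 0.7702};  (* stand-in for sub *)

index = PositionIndex[roundsDemo];  (* association: value -> list of positions *)
subDemo2[[Flatten[Lookup[index, keepDemo]]]]

{0.1234, 0.1198, 0.7702}
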
Chris Degnen
SeedRandom[1]
ClearAll[list, masterlist]
{list, masterlist} = Round[RandomReal[1, {2, 10}], 0.01];
tolerance = .02;

dist = Nearest[masterlist->"Distance"];

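(* keep the elements of list whose nearest neighbour in masterlist is farther away than tolerance *)
Pick[list, dist[#][[1]] > tolerance & /@ list]
(* or, vectorized; UnitStep[0] is 1, so this variant also keeps a distance exactly equal to tolerance *)
Pick[list, UnitStep[dist[#][[1]] & /@ list - tolerance], 1]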

{0.11, 0.79, 0.07, 0.54, 0.7}
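
Nearest also accepts a radius, which gives a way to phrase the test without extracting the distance explicitly: an element is kept when nothing in the master list lies within the radius. This is a sketch under assumed data (masterDemo2, listDemo and tol2 are illustrative names and values, not from the code above).

masterDemo2 = {0.10, 0.50, 0.90};  (* illustrative data *)
listDemo = {0.105, 0.30, 0.895, 0.70};
tol2 = 0.02;                       (* assumed tolerance *)

nf = Nearest[masterDemo2];
(* nf[x, {1, r}] returns at most one neighbour within distance r, or {} if there is none *)
Select[listDemo, nf[#, {1, tol2}] === {} &]

{0.3, 0.7}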

dist0 = Nearest[masterlist -> {"Element", "Distance"}];
res = {#, dist0[#][[1]]} & /@ list;

Grid[
 Prepend[
  If[#[[2, -1]] > tolerance,
     Flatten@Map[Style[#, Red, Bold] &, #, {-1}],
     Flatten@#] & /@ res,
  {Column[{"element in", "list"}, Alignment -> Center],
   Column[{"Nearest", "element in", "masterlist"}, Alignment -> Center],
   "distance"}],
 Dividers -> All, Alignment -> {Center, Center}]

(Image: the resulting grid, in which rows whose distance exceeds tolerance are shown in bold red.)

kglr