9

I want to use DeleteCases (or any appropriate function) to remove elements of a list, which in turn are lists of fixed size. The rule I wish to apply is that any element of the list which already appear elsewhere in the list, up to any number of -1 is to be removed. But I don't know how to do this kind of tricky pattern matching. For example, I have the following list:

myData = {{h -> 255.155, c -> 0, s -> -10000.},
 {h -> -255.155, c -> 0, s -> 10000.}, 
 {h -> 0, c -> 0, s -> 10000.},
 {h -> 0, c -> 0, s -> -10000.}, 
 {h -> 255.155, c -> 0, s -> 10000.},
 {h -> -255.155, c -> 0, s -> -10000.},
 {h -> -255.155, c -> 1870.83, s -> 3535.53},
 {h -> 255.155, c -> -1870.83, s -> -3535.53},
 {h -> 0, c -> 1870.83, s -> 3535.53}, 
 {h -> 0, c -> -1870.83, s -> -3535.53},
 {h -> 255.155, c -> 1870.83, s -> 3535.53},
 {h -> -255.155, c -> -1870.83, s -> -3535.53},
 {h -> 255.155, c -> -4000., s -> 0},
 {h -> -255.155, c -> 4000., s -> 0},
 {h -> 0, c -> 4000., s -> 0}, 
 {h -> 0, c -> -4000., s -> 0},
 {h -> 255.155, c -> 4000., s -> 0},
 {h -> -255.155, c -> -4000., s -> 0}, 
 {h -> 255.155, c -> 1870.83, s -> -3535.53},
 {h -> -255.155, c -> -1870.83, s -> 3535.53}, 
 {h -> 0, c -> 1870.83, s -> -3535.53},
 {h -> 0, c -> -1870.83, s -> 3535.53},
 {h -> 255.155, c -> -1870.83, s -> 3535.53}, 
 {h -> -255.155, c -> 1870.83, s -> -3535.53},
 {h -> 255.155, c -> 0, s -> 0}, 
 {h -> -255.155, c -> 0, s -> 0},
 {h -> 0, c -> 0, s -> 0}}

As can be seen the first element myData[[1]] (which is a list of three items) is a duplicate (up to the minus signs) myData[[2]] myData[[5]] myData[[6]]. But I don't know how to get Mathematica to remove these.

As an added bonus, it would be nice that, among the ones that are duplicates up to -1, the one with the least number of negatives is kept, and all others are removed. (In the example above, myData[[5]] would be among those that would be kept.)

m_goldberg
  • 107,779
  • 16
  • 103
  • 257
QuantumDot
  • 19,601
  • 7
  • 45
  • 121

5 Answers5

9

If I have understood what you want, here is one approach:

DeleteDuplicates[myData, Abs@#1[[All, 2]] == Abs@#2[[All, 2]] &]

I get:

matrix

Here you don't have the order problem.

Murta
  • 26,275
  • 6
  • 76
  • 166
9

The problem with both Union - based and DeleteDuplicates - based solutions is that both functions have quadratic complexity in the size of the list, for an explicit comparison function. Here is code which should be much faster for larger lists:

Reap[
  Sow[#, Abs[{{h, c, s} /. #}]] & /@ myData, 
  _, 
  First@#2[[Ordering[UnitStep[-{h, c, s} /. #2]]]] &
][[2]]

(*
  {{h->255.155,c->0,s->10000.},{h->0,c->0,s->10000.},
  {h->255.155,c->1870.83,s->3535.53},{h->0,c->1870.83,s->3535.53},
  {h->255.155,c->4000.,s->0},{h->0,c->4000.,s->0},
  {h->255.155,c->0,s->0},{h->0,c->0,s->0}}
*)

It is based on tagging the list entries with some function of the entry that serves as an equivalence tag, and then postprocessing the resulting lists. It should have a near-linear complexity with respect to the length of the list.

Leonid Shifrin
  • 114,335
  • 15
  • 329
  • 420
  • When I think that I know something in Mathematica, always comes Leonid a better replica. :p 1+ – Murta Feb 05 '13 at 21:47
  • @Murta Thanks for the upvote :-) – Leonid Shifrin Feb 05 '13 at 21:56
  • This is what GatherBy does as well ("It is based on tagging the list entries with some function of the entry that serves as an equivalence tag") – Szabolcs Feb 05 '13 at 23:12
  • @Szabolcs Sure. Our solutions are similar. No idea why yours received less votes (I voted for it, of course). Normally GatherBy comes to my mind first, before Reap-Sow, but for some reason here it didn't. GatherBy is actually several times faster, but here it is not seen, since the bottleneck in your code is elsewhere (mapping, rule application). – Leonid Shifrin Feb 05 '13 at 23:22
  • This works marvelously well. I've used Mathematica for six years, and I have never seen # /@ & #2, so I don't even know how your code works... :( – QuantumDot Feb 06 '13 at 04:05
  • @QuantumDot Thanks, good that it works. Re: # /@ & #2 - no, what I had there was just rule application {h, c, s} /. #2, where #2 is a second argument which, in the context of last optional argument of Reap, gives a list of collected results for a given tag. – Leonid Shifrin Feb 06 '13 at 04:10
  • Actually, I am completely unable to understand how your one-liner works. I would like to learn how to pry it apart so that I can adapt it to larger code. How can we talk on the side about this? What's really confusing me is what #2 is referring to. Is it the second argument of myData? – QuantumDot Feb 06 '13 at 22:15
  • @QuantumDot Sorry, don't have much time today. Look at the documentation for Reap, it has the third optional argument, which is a function f[tag,collected]. So, #2 simply means a list of results collected by Reap for a given tag. Since I am not interested in the tag itself, #1 is not present in the code. – Leonid Shifrin Feb 07 '13 at 16:55
7

Here's my solution:

classes = GatherBy[myData, Abs[{h, c, s} /. #] &]
classes[[All, 1]]

To get the ones with the least number of negatives,

MaxBy[list_, fun_] := list[[First@Ordering[fun /@ list, -1]]]
MaxBy[#, Total[Sign /@ ({h, c, s} /. #)] &] & /@ classes

(I use this MaxBy function often, so I tend to think of solutions in terms of it.)

Szabolcs
  • 234,956
  • 30
  • 623
  • 1,263
4

This does the first part of the job

Union[myData, SameTest -> (Abs[#1[[All, 2]]] == Abs[#2[[All, 2]]] &)]

I am looking into the ordering. Unfortunately Mathematica built-in sorting is from the smaller number to the greatest.

Batracos
  • 2,105
  • 14
  • 14
4

You can use DeleteDuplicates. Here is a version that explicitly uses the variable names h, c, s. Therefore the rules will be considered duplicates even if the ordering is different.

DeleteDuplicates[myData, SameQ @@ (Abs@{h, c, s} /. {#1, #2}) &]

{{h -> 255.155, c -> 0, s -> -10000.}, {h -> 0, c -> 0, s -> 10000.}, {h -> -255.155, c -> 1870.83, s -> 3535.53}, {h -> 0, c -> 1870.83, s -> 3535.53}, {h -> 255.155, c -> -4000., s -> 0}, {h -> 0, c -> 4000., s -> 0}, {h -> 255.155, c -> 0, s -> 0}, {h -> 0, c -> 0, s -> 0}}

To keep only the results you want you can Sort the list first, and then Reverse it (compare with the result from above):

DeleteDuplicates[Reverse@Sort@myData, SameQ @@ (Abs@{h, c, s} /. {#1, #2}) &]

{{h -> 255.155, c -> 4000., s -> 0}, {h -> 255.155, c -> 1870.83, s -> 3535.53}, {h -> 255.155, c -> 0, s -> 10000.}, {h -> 255.155, c -> 0, s -> 0}, {h -> 0, c -> 4000., s -> 0}, {h -> 0, c -> 1870.83, s -> 3535.53}, {h -> 0, c -> 0, s -> 10000.}, {h -> 0, c -> 0, s -> 0}}

einbandi
  • 4,024
  • 1
  • 23
  • 39