7

I have a list of lists that looks something like -

  data= {{920.093, 1135., 110.45, 1135., 1135.19, 1138.13, 920.093,850.453, 
  920.093, 921.905, 991.545, 859.285, 983.714, 1133.21, 
  1135.8}, {1509.96, 1907.41, 114.108, 1907.41, 1907.51, 1909.41, 
  1509.96, 1439.94, 1509.96, 1511.06, 1581.07, 1448.6, 1573.03, 
  1906.34, 1907.89}, ...}

I want to select only those lists where 8th element is always smaller than all the other elements of the list. I tried using-

   data1=Select[data,#[[1]]>#[[8]]&&#[[2]]>#[[8]]&&#[[4]]>#[[8]]...&]

But It looks quite stupid. Is there any other minimal way to impose this condition?

solphy101
  • 337
  • 2
  • 12

4 Answers4

7

You could for example use

res1 = Select[
 data,
 Function[{list}, AllTrue[Drop[list, {8}], list[[8]] < # &]]
 ]

but this is not the fastest way to do it in Mathematica. This might be perform better:

{{eigth}, rest} = TakeDrop[Transpose[data], {8}];
sel = Total /@ Transpose@UnitStep[ConstantArray[eigth, Length[rest]] - rest];
res2 = Pick[data, sel, 0];

The result is the same:

res1 == res2

True

C. E.
  • 70,533
  • 6
  • 140
  • 264
  • You may be interested in the tuned version of your second method that I posted below. (+1 of course, as if I need to say it) – Mr.Wizard Apr 05 '17 at 10:12
  • @Mr.Wizard Nice, +1 – C. E. Apr 05 '17 at 11:15
  • @C.E. What can be done if we want to exclude certain elements from the less than condition? For e.g- I want every element to be smaller than 8th element except 2nd one. 2nd element can have any value. – solphy101 Apr 07 '17 at 04:41
  • @solphy Then you use Drop again on # and rest respectively. – C. E. Apr 07 '17 at 07:47
7

A straightforward solution (and correction of J. M.'s comment code):

SeedRandom[0]
data = RandomInteger[9, {100, 10}];

Select[data, #[[8]] < Min @ Drop[#, {8}] &]
{{3, 9, 4, 7, 2, 1, 2, 0, 7, 8},
 {9, 5, 9, 9, 3, 6, 6, 1, 3, 3},
 {3, 5, 2, 2, 6, 8, 9, 1, 3, 7},
 {9, 6, 7, 8, 8, 7, 9, 1, 7, 6}}

This is twice as fast as C. E.'s AllTrue code:

data = RandomInteger[9, {10000, 15}];

Select[data, Function[{list}, AllTrue[Drop[list, {8}], list[[8]] < # &]]] // 
  Length // RepeatedTiming

Select[data, #[[8]] < Min @ Drop[#, {8}]] // Length // RepeatedTiming
{0.045, 273}

{0.021, 273}

It is still and order of magnitude behind his Pick method however. Here is a tuned version of that code that can be more than twice as fast.

Now faster and cleaner after reading LLlAMnYP's answer and recognizing a simplification.

select[data_, n_] := (
   Subtract[data[[All, n]], data]
     // UnitStep
     // Total[#, {2}] &
     // Pick[data, #, 1] &
 )

SeedRandom[0]
data = RandomInteger[9, {1*^6, 15}];

select[data, 8] // Length // RepeatedTiming

(* his code *)  // Length // RepeatedTiming
{0.141, 28205}

{0.302, 28205}

Methods include explicit Subtract; reference:

Mr.Wizard
  • 271,378
  • 34
  • 587
  • 1,371
4

Even though there is already an accepted answer, the problem at hand lends itself well to a compiled approach for performance gains.

compiledSelect =
 Compile[{{a, _Integer, 2}},
  Total[Transpose[UnitStep[-a + a[[1 ;; -1, 8]]]]],
  CompilationTarget -> "C", Parallelization -> True, "RuntimeOptions" -> "Speed"]
selectLL[data_] := Pick[data, compiledSelect[data], 1]

Comparing this to Mr.Wizard's best solution:

data = RandomInteger[9, {1*^6, 15}];
selectLL[data] == select[data, 8]
selectLL[data] // Length // RepeatedTiming
select[data, 8] // Length // RepeatedTiming
True
{0.212, 27950}
{0.220, 27950}

It has a very marginal edge of a few percent in speed. In essence, this is a sort of refactoring of Mr.Wizard's code, minimizing the necessary manipulations, but only Compile lets it be faster.

EDIT
After carefully considering Mr.Wizard's reference I included an explicit Subtract as well:

compiledSelect2 =
 Compile[{{a, _Integer, 2}},
  Total[Transpose[UnitStep[Subtract[a[[1 ;; -1, 8]], a]]]]
  , CompilationTarget -> "C", Parallelization -> True, 
  "RuntimeOptions" -> "Speed"]
selectLL2[data_] := Pick[data, compiledSelect2[data], 1]

Now including performance tests for Mr.Wizard's simplified function (which I call select2 here). I also leave the original function (see his edit history) for comparison purposes.

data = RandomInteger[9, {1*^6, 15}];
selectLL[data] == select[data, 8] == select2[data, 8] == selectLL2[data]
True

Benchmarking with repeatedly new data:

(Table[data = RandomInteger[9, {1*^6, 15}];
 {selectLL[data] // Length // RepeatedTiming // First,
  selectLL2[data] // Length // RepeatedTiming // First,
  select[data, 8] // Length // RepeatedTiming // First,
  select2[data, 8] // Length // RepeatedTiming // First}, {10}]
    // Transpose
    // Map[Append[#, Mean@#] &]
    // Prepend[#, Range[10]~Join~{"Avg."}] & 
    // Transpose 
    // Join[{{"N.", "selectLL", "selectLL2", "select", "select2"}}, #] &
    // Grid

enter image description here

All the functions used in our routines are certainly implemented in low-level code where Compile can hardly give much of an edge. As we see, a compiled //Transpose//Total loses out to the uncompiled Total[..., {2}].

A quick shot at "improving" (maybe in performance, certainly not in readability) Mr.W's code by removing all explicit Functions:

select3[data_, n_] :=
 Pick[data, Total[Subtract[data[[All, n]], data] // UnitStep, {2}], 1]
Table[data = RandomInteger[9, {1*^6, 15}]; 
 select3[data, 8] // Length // RepeatedTiming // First, {10}]
{0.195, 0.194, 0.194, 0.195, 0.194, 0.194, 0.194, 0.195, 0.195, 0.194}

Very marginally better, probably not statistically significant.

TODO:
Were the input transposed, could the compiled function be more efficient?

After some tests, it doesn't look that way.

EDIT:
I managed to find a fully compiled version that performs on par with the other solutions. Still not as fast as select2 though.

compiledSelect3 = Compile[{{a, _Integer, 2}},
  a[[
    Flatten@
      Position[
        Total[
          Transpose[
            UnitStep[Subtract[a[[All, 8]], a]]
          ]
        ], 
        1
      ]
    ]]
  , CompilationTarget -> "C", Parallelization -> True, 
  "RuntimeOptions" -> "Speed"]

Head-to-head with select2:

Table[data = RandomInteger[9, {1*^6, 15}];
  (compiledSelect3[data] // RepeatedTiming // First) -
    (select2[data, 8] // RepeatedTiming // First), {10}]
Mean@%
{0.003, 0.004, 0.008, 0.006, 0.006, 0.006, 0.005, 0.006, 0.006, 0.*10^-3}
0.006

3% slower. Close, but no cigar.

LLlAMnYP
  • 11,486
  • 26
  • 65
  • From this answer I see that all that faff with Drop and Table I used was counterproductive. I shall update my answer with a simpler form. Thank you, and +1 for inspiration. (Please consider updating your timings after my edit.) – Mr.Wizard Apr 05 '17 at 19:59
  • @Mr.W I was really surprised that your approach was actually slowed down when I tried to simplify it to that what I tried. I'll be researching that tomorrow and see what happens to the timings. – LLlAMnYP Apr 05 '17 at 20:10
  • Did you use an explicit (verbatim) Subtract in what you tried? – Mr.Wizard Apr 05 '17 at 20:11
  • @Mr.W No, your answer seemed conceptually similar to mine, so I just copied my part of the code verbatim; unfortunately I didn't have time to dig deeper. – LLlAMnYP Apr 05 '17 at 20:16
  • @Mr.Wizard added the benchmarks, our answers seem to converge to each other. As I mentioned, your previous code had a lot of overhead and still it was competitive. I think there is something to be borrowed from there, and the reference you edited out. – LLlAMnYP Apr 06 '17 at 07:10
  • Thanks. I'm not sure I understand I think there is something to be borrowed from there, and the reference you edited out. Do you feel that I removed something from my answer I should not have? Should I restore the intermediate version and append the last version as select2? – Mr.Wizard Apr 06 '17 at 07:23
0

Another solution

Pick[data, Negative[data[[All, 8]] - Min /@ Drop[data, 0, {8}]]]
sakra
  • 5,120
  • 21
  • 33