
Suppose I have a list:

   data = {{1, 2, 3}, {3, 5, 0}, {8, 9, 3}, {2, 5, 0}};

I want to delete the second and fourth sublists, whose third element is zero. I can do this using

   DeleteCases[data, {_, _, 0}]

How can the same operation be achieved efficiently if the rows contain a large number of elements instead of just 3?
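For rows of arbitrary length, the pattern itself can also be generalized (a sketch; `m` here is an assumed variable holding the position to test):

   DeleteCases[data, {___, 0}]               (* rows of any length ending in 0 *)
   DeleteCases[data, row_ /; row[[m]] == 0]  (* test an arbitrary position m *)

The second form uses a Condition and is typically slower than pure pattern matching, but works for any position.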

Ali Hashmi
solphy101

2 Answers


If your data is packed, then the fastest will probably be

   Pick[data, Unitize@data[[All, m]], 1]

where m is the position at which a 0 is not allowed. If that is the last element of a row, you can use -1 for m.
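Applied to the sample data from the question (a minimal check; m = 3 here, since the tested element is the third):

   data = {{1, 2, 3}, {3, 5, 0}, {8, 9, 3}, {2, 5, 0}};
   Pick[data, Unitize@data[[All, 3]], 1]
   (* {{1, 2, 3}, {8, 9, 3}} *)

Unitize maps the third column {3, 0, 3, 0} to {1, 0, 1, 0}, and Pick keeps the rows aligned with a 1. If the data is not already packed, Developer`ToPackedArray@data will pack it (provided all elements are machine integers or reals).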

Marius Ladegård Meyer

In:

   Clear[unitize, pick, n, data]
   SeedRandom[1];
   n = -1;
   data = RandomChoice[Range[0, 10], {10^8, 3}];

   AbsoluteTiming[Pick[data, Unitize@data[[All, n]], 1] // Length]

   unitize[x_] := unitize[x] = Unitize[x]
   pick[xs_, sel_, patt_] := pick[xs] = Pick[xs, sel, patt]
   AbsoluteTiming[pick[data, unitize@data[[All, n]], 1] // Length]

Out:

   {7.3081, 90913401}
   {5.87919, 90913401}
webcpu
  • -1, this doesn't answer the question. The solution would be fine if instead of AnyTrue you had nthTrue, but nthTrue is not a built-in. – LLlAMnYP May 04 '17 at 09:50
  • Fair enough. I optimised it based on Marius's method. – webcpu May 04 '17 at 12:49
  • un-downvoted, but this is essentially identical to the accepted answer plus some very odd memoization technique, which I'm not sure achieves anything here. You know, your original approach would have easily been doable as Select[data, #[[n]] != 0 &]. As an aside, I see you using memoization in many of your recent answers; why do you utilize it so heavily? – LLlAMnYP May 04 '17 at 13:57
  • In Functional Programming, we call it a Pure Function. It's NOT a pure function in Wolfram Language; pure functions in Wolfram Language are just anonymous functions. When I know that a function has no side effects, such as file system or network access, I implement it as a pure function (in the functional-programming sense). – webcpu May 04 '17 at 14:08
  • The pure function always evaluates the same result value given the same argument value(s). The function result value cannot depend on any hidden information or state that may change while program execution proceeds or between different executions of the program, nor can it depend on any external input from I/O devices (usually—see below). Evaluation of the result does not cause any semantically observable side effect or output, such as mutation of mutable objects or output to I/O devices. https://en.wikipedia.org/wiki/Pure_function – webcpu May 04 '17 at 14:08
  • f[x] := f[x] = (* stuff *) mutates the definition of f. This has nothing to do with pure functions; this is called memoization. But in your code, saying pick[xs_, sel_, patt_] := pick[xs] = Pick[xs, sel, patt] does literally nothing but add an extra level of complexity. With every call you update the value of pick[xs], but it is never called! – LLlAMnYP May 04 '17 at 14:21
  • Besides that, this unitize[x_] := unitize[x] = Unitize[x] is simply harmful, unless we know that unitize will be called many times with exactly the same argument. Now the kernel is wasting time comparing an input to a possibly large list stored in the DownValues instead of just getting on with Unitizeing. – LLlAMnYP May 04 '17 at 14:24
  • The optimised method is faster. The execution time of the unoptimised method is 7.3081; the optimised one's execution time is 5.87919. – webcpu May 04 '17 at 14:25
  • I have seen two AbsoluteTimings in a row return one result, then insert the next immediately before the first. My run of your code had your method almost twice as slow as the built-ins. Split the timings into two input cells and run them separately. – LLlAMnYP May 04 '17 at 14:31
  • This is what I see on my machine: https://i.stack.imgur.com/qPaGP.png – LLlAMnYP May 04 '17 at 14:54
  • I tested it on another machine. The optimised one is still faster. https://i.stack.imgur.com/X3d4k.png – webcpu May 04 '17 at 15:23
  • Could this be related to caching? Perhaps throwing in a ClearSystemCache[] would restore the expected behavior. If not, this is worth a separate question. – LLlAMnYP May 04 '17 at 15:31
  • Function Clear is enough. Anyway, I added ClearSystemCache[] and tested it again. https://i.stack.imgur.com/yOP8h.png – webcpu May 04 '17 at 15:38
  • I honestly have no idea why this happens. Your application of ClearSystemCache[] is not what I had in mind; I thought, perhaps, something was being cached after the first call to Pick, so I tried to put the ClearSystemCache between Pick and pick. No effect. My machine at work has, apparently, fewer cores and less memory than yours, so I tried smaller inputs (between 10^7 and 5*10^7 rows). pick was very slightly, but consistently, faster, much to my surprise. However, today I cannot reproduce this reliably; (see next comment) – LLlAMnYP May 05 '17 at 06:44
  • with this code: $HistoryLength = 0; Table[ Clear[pick, unitize, data]; unitize[x_] := unitize[x] = Unitize[x]; pick[xs_, sel_, patt_] := pick[xs] = Pick[xs, sel, patt]; data = RandomChoice[Range[0, 10], {i*10^7, 3}]; {Pick[data, Unitize@data[[All, -1]], 1]; // AbsoluteTiming // First, pick[data, unitize@data[[All, -1]], 1]; // AbsoluteTiming // First}, {i, 5}] I now get {{0.466588, 0.465504}, {0.924702, 0.929265}, {1.3835, 1.40883}, {1.8459, 1.90577}, {2.30805, 2.44937}} i.e. Pick is consistently faster. – LLlAMnYP May 05 '17 at 06:47
  • Well, now the behavior is back. Please drop by the thread here. – LLlAMnYP May 05 '17 at 07:39