7

In this answer I needed to remove contiguous zero-valued cols and rows from a matrix, leaving only two of them in place, no matter what the original number was.

I made up this code:

m = RandomVariate[BinomialDistribution[1, 10^-3], {400, 400}]; 
rule = {h__, {0 ..}, w : {0 ..}, {0 ..}, t__} -> {h, w, w, t};
mClean = Transpose[Transpose[m //. rule] //. rule];
Dimensions@mClean

But it is way too slow.
I'm pretty sure this code can be enhanced. Any better ideas?

Dr. belisarius
  • 115,881
  • 13
  • 203
  • 453

4 Answers4

7

Linked lists - based solution

The real reason for the slowdown seems to be the same as usual for ReplaceRepeated - multiple copying of large arrays. I can offer a solution which would still be rule-based, but uses linked lists to avoid the mentioned slowdown. Here are auxiliary functions:

zeroVectorQ[x_] := VectorQ[x, IntegerQ] && Total[Unitize[x]] == 0;

toLinkedList[l_List] := Fold[ll[#2, #1] &, ll[], Reverse[l]]

ClearAll[rzvecs];
rzvecs[mat_List] :=  rzvecs[ll[First@#, ll[]], Last@#] &@toLinkedList[mat];

rzvecs[accum_, rest : (ll[] | ll[_, ll[_, ll[]]])] := 
   List @@ Flatten[ll[accum, rest], Infinity, ll];

rzvecs[accum_, ll[head_?zeroVectorQ, ll[_?zeroVectorQ, tail : ll[_?zeroVectorQ, Except[ll[]]]]]] :=
   rzvecs[accum, ll[head, tail]];

rzvecs[accum_, ll[head_?zeroVectorQ, ll[_?zeroVectorQ, tail_]]] :=
   rzvecs[ll[ll[accum, head], head], tail];

rzvecs[accum_, ll[head_, tail_]] := rzvecs[ll[accum, head], tail];

Now the main function:

removeZeroVectors[mat_] := Nest[Transpose[rzvecs[#]] &, mat, 2]

Benchmarks

Now the benchmarks:

m = RandomVariate[BinomialDistribution[1, 10^-3], {600, 600}];
(res = removeZeroVectors[m]); // AbsoluteTiming
(res1 = Transpose[Transpose[m //. rule] //. rule]); // AbsoluteTiming
res == res1

(*
    {0.046875, Null}
    {3.715820, Null}
    True
*)

Remarks

I have been promoting the uses of linked lists for some time now. In my opinion, in Mathematica they allow one to stay of the higher level of abstraction while achieving very decent (for the top-level code) performance. They also allow one to avoid many non-obvious performance-tuning tricks which take time to come up with, and even more time to understand for others. The algorithms expressed with linked lists are usually rather straight-forward and can be directly read off from the code.

Leonid Shifrin
  • 114,335
  • 15
  • 329
  • 420
  • +1, grrreat. I went on the smartless, easily compilable, procedulal road. I really like this one – Rojo Feb 05 '13 at 16:31
  • @Rojo Thanks. I tried to keep the solution essentially rule-based, that's why I went down this road. – Leonid Shifrin Feb 05 '13 at 16:35
  • Should the following return {{0, 0}, {0, 0}, {0, 1}} ? m = {{0, 0}, {0, 0}, {0, 0}, {0, 1}}; – halmir Feb 05 '13 at 17:55
  • @halmir No, it should not - the first and last columns are always kept and do not participate in the comparison. In a sense, the OP's original rule serves as a very clear spec. – Leonid Shifrin Feb 05 '13 at 18:02
  • @LeonidShifrin I see. But OP's description sounds like it should. Probably, OP's rule need to be fixed like the following:rule = {h___, {0 ..}, w : {0 ..}, {0 ..}, t___} -> {h, w, w, t}; – halmir Feb 05 '13 at 18:08
  • @halmir I would rather take the code as a spec and make the description less ambiguous in this respect - presumably the OP looked at the results of his code and was satisfied with it, just not its speed. – Leonid Shifrin Feb 05 '13 at 18:18
  • @halmir Leonid is right, the spec is in my code. – Dr. belisarius Apr 08 '13 at 03:55
7

It's not often I get to say this, but this is faster than Leonid's!

Clear@rZeroVecs
rZeroVecs[mat_, n_] := (Take[#, Min[n, Length@#]] & /@ Split[mat]) ~Flatten~ 1

Here n is the number of consecutive zero columns you wish to keep (sort of a generalized version of the question). Use it as:

m2 = Nest[rZeroVecs[Transpose@#, 2] &, m, 2];
rm -rf
  • 88,781
  • 21
  • 293
  • 472
  • I wasn't going to produce the fastest solution - otherwise I'd just use Compile. I produced the solution which still uses rules and is two orders of magnitude faster than the original. – Leonid Shifrin Feb 05 '13 at 16:32
  • ...What's the n? – Rojo Feb 05 '13 at 16:33
  • @Rojo The number of consecutive zero rows/columns you want to keep (I generalized the objective in the question). I forgot to add in the function call... will do now. – rm -rf Feb 05 '13 at 16:33
  • @LeonidShifrin Yes, yes, I know... but, I'll gloat while I still can :D – rm -rf Feb 05 '13 at 16:34
  • How is z used? – Leonid Shifrin Feb 05 '13 at 16:36
  • You can't any more. This only works because the sample matrix rarely has more than 2 non-zero consecutive rows. But it can be fixed – Rojo Feb 05 '13 at 16:37
  • @LeonidShifrin Good catch! That was left behind from an earlier version which used z as a pattern. It's not needed now – rm -rf Feb 05 '13 at 16:40
  • Ok, I see. This is a nice code, +1. – Leonid Shifrin Feb 05 '13 at 16:40
  • Actually, your code produce different results from mine or @belisarius's, for some benchmarks - your matrices end up smaller. – Leonid Shifrin Feb 05 '13 at 16:45
  • @LeonidShifrin, my last comment here is feeling ignored – Rojo Feb 05 '13 at 16:47
  • @Rojo Sorry, I saw it but did not connect to this. – Leonid Shifrin Feb 05 '13 at 16:48
  • @LeonidShifrin Do you have an example of a test matrix where this gives smaller results, so that I can test it? – rm -rf Feb 05 '13 at 16:51
  • @rm-rf my last before last comment is feeling ignored :P – Rojo Feb 05 '13 at 16:52
  • Help yourself: m = RandomVariate[BinomialDistribution[1, 10^-3], {300, 300}] While[removeZeroVectors[m] == Transpose[Transpose[m //. rule] //. rule] == Nest[rZeroVecs[Transpose@#, 2] &, m, 2], m = RandomVariate[BinomialDistribution[1, 10^-3], {300, 300}]]. – Leonid Shifrin Feb 05 '13 at 16:53
  • Try m={{1}, {1}, {1}} – Rojo Feb 05 '13 at 16:53
  • To fix it, you could do something like this: removeRowZeros[mat_, n_] :=

    Block[{z = ConstantArray[0, Last@Dimensions@mat], remove}, remove[v : {z ..}] := Take[v, Min[n, Length@v]]; remove[z_] := z; (remove /@ Split[mat])~Flatten~1];removeAllZeros[mat_, n_: 2] := Nest[Transpose[removeRowZeros[#, n]] &, mat, 2]

    – halmir Feb 05 '13 at 17:54
  • @halmir Thanks, that's how I actually had it at first, but then removed it because the above "worked" and was slightly faster (I didn't think of the cases Rojo pointed out). Now I see that my original version would've given the correct result in all cases. I'll fix this answer later today when I have some time. – rm -rf Feb 05 '13 at 17:57
6

My contribution:

cleanrows[m_] := 
 Delete[m, 1 + Position[ListConvolve[{1, 1, 1}, Total[Unitize@m, {2}], {3, 1}], 0]]

m2 = Transpose @ cleanrows @ Transpose @ cleanrows @ m;
Simon Woods
  • 84,945
  • 8
  • 175
  • 324
5
cleanAllProc[m_] := Transpose@cleanRowsProc@Transpose@cleanRowsProc@m

cleanRowsProc[m_] :=
 With[{ceroRow = ConstantArray[0, Last@Dimensions@m]},
  Module[{counter = 0, tag},
   Reap[
     Scan[
      (If[# === ceroRow , If[++counter > 2, Continue[Null, Scan]], 
         counter = 0]; Sow[#, tag]) &, m],
     tag
     ][[-1, 1]]
   ]]
Rojo
  • 42,601
  • 7
  • 96
  • 188