12

Given a (large) rectangular array, e.g. (small for example case):

Column[{test, test // MatrixForm}, Left, 2]

(*
{{1, 3, 0, 3, 0}, {1, 3, 0, 3, 3}, {0, 2, 3, 1, 3}, {2, 0, 0, 0, 0},
 {2, 0, 2, 3, 1}, {0, 0, 0, 0, 0}, {0, 0, 1, 0, 0}}
*)

enter image description here

I need to remove any leading/trailing zeroes from rows, dropping row if all zeroes, e.g. with above the result s/b

(* {{1, 3, 0, 3}, {1, 3, 0, 3, 3}, {2, 3, 1, 3}, {2}, {2, 0, 2, 3, 1}, {1}} *)

I'm using

trim0 = With[{t = DeleteCases[#, ConstantArray[0, Length@#[[1]]]]}, 
    MapIndexed[t[[First@#2, #1[[1]] ;; #1[[-1]]]] &, SparseArray[t]["AdjacencyLists"]]] &;

Might there be a faster method?

N.B.: Currently, I'm working with arrays of non-negative integers, so if that can be used to speed things (above is general)...

ciao
  • 25,774
  • 2
  • 58
  • 139
  • Uuuuuhhh. your code is already lightening fast, after 4 hours of working out possible alternatives, I must admit that the penguins cannot keep up with the speed of your method, +1 for your question.... – penguin77 Apr 19 '15 at 00:20
  • @penguin77: Appreciate comment & vote - it's one of those "gut feeling" things (usually right) that there's a better way, but perhaps I've got blinders on from things I've tried, hence the question (hoping the Daniel L/Mr. Wizard/Simon/kguler.. et. al - the performance kings) might have some flash of insight. I could always code up a chunk of C, but this is an interesting MMA puzzle to me. Thanks again! – ciao Apr 19 '15 at 05:25
  • I have noticed that your solution requires significant amount of memory for an array with dimensions larger than {10x10^6, 5}. This may cause a slow down, depending on the machine spec. I have worked out a solution that performs within a few percentage at the same speed for rows of size 5 and far less memory requirements. Depending on hardware, memory consumption of algorithm may have significant impact. I thought that's not worthwhile posting my answer (because only within a few percent of speed of your proposal), however if you wish than let me know and I can post it. – penguin77 Apr 19 '15 at 23:28
  • @penguin77: Ram not an issue (1/2 TB on W/S), but always interested in ideas. I do have a way with small (<=20 or so) rows with small-ish values that's twice as fast, but it has those limitations. – ciao Apr 20 '15 at 05:03

2 Answers2

14
idtz = Internal`DeleteTrailingZeros; 
trim0a = With[{t = Reverse/@ idtz/@DeleteCases[#, ConstantArray[0, Length@#[[1]]]]}, 
               Reverse /@ idtz /@ t] &;

trim0b = With[{t = DeleteCases[#, ConstantArray[0, Length@#[[1]]]]}, 
              Fold[idtz@#2@# &, #, {Reverse, Reverse}] & /@ t] &;

Equal @@ (#@test & /@ {trim0, trim0a, trim0b})
(* True *)

Some timings:

n = 5000; m = 5000; k = 1;
testmat = RandomChoice[{k, 1, 1, 1} -> Range[0, 3], {n, m}];
rslt = {0, 0, 0}; i = 1;
First[AbsoluteTiming[rslt[[i++]] = #@testmat ;]] & /@ {trim0, trim0a,     trim0b}
(* {1.900441, 1.351605, 0.965583} *)
Equal @@ rslt
(* True *)

n = 5000; m = 5000; k = 5;
(* {1.265648, 1.555662, 1.140635} *)
n = 1000000; m = 10; k = 1;
(* {6.119162, 4.407390, 5.105766} *)
n = 1000000; m = 10; k = 5;
(* {5.770310, 4.532018, 5.306773} *)
kglr
  • 394,356
  • 18
  • 477
  • 896
  • Another interesting Internal you've revealed to me. Thanks. Added 50 bounty as thanks for past tips / interactions (hope system auto-rewards it after time-out). Ciao. – ciao Apr 23 '15 at 21:26
  • @ciao, thank you for the accept and kind words. Cool new psuedonym btw:) – kglr Apr 23 '15 at 21:47
  • @ciao and kugler, not cool, he means it! :) – Kuba Apr 24 '15 at 08:16
  • Now I think I have a comparable method that is faster, if you care to look. – Mr.Wizard May 14 '15 at 07:17
3

Edit: revised to perform the correct operation this time!


In many cases this code appears to be faster than either your own SparseArray formulation or kguler's Internal`DeleteTrailingZeros methods:

unPad =
  MinMax /@ SparseArray[#]["AdjacencyLists"] /. pos_ :>
    Take @@@ Pick[{#, pos}\[Transpose], UnitStep @ pos[[All, 1]], 1] &;

A few performance points (10.1.0 under Windows):

fns = {trim0, trim0a, trim0b, unPad};

test = RandomInteger[1, {50000, 50}];
Table[First @ RepeatedTiming @ f[test], {f, fns}]
{0.220, 0.190, 0.222, 0.120}
test = RandomInteger[1, {50, 50000}];
Table[First @ RepeatedTiming @ f[test], {f, fns}]
{0.0533, 0.08662, 0.0667, 0.0507}
test = RandomChoice[{50, 1} -> {0, 1}, {4000, 4000}];
Table[First @ RepeatedTiming @ f[test], {f, fns}]
{0.288, 0.680, 0.575, 0.221}
Mr.Wizard
  • 271,378
  • 34
  • 587
  • 1,371
  • 1
    Umm, am I missing something? What about the leading zeroes? +1 in any case for interesting trailing alternative. – ciao May 13 '15 at 21:13
  • @ciao LOL -- no, I guess I was sleep-coding again. :-p I'll see what I can make of this later tonight. – Mr.Wizard May 14 '15 at 01:38
  • @ciao Function and timings updated; please check my work. – Mr.Wizard May 14 '15 at 07:17
  • Mr.W, i thought i had tried almost all combinations of SparseArray UnitStep, Pick:) Upvoted already. Btw, a variation pF[p_]:= Take @@@ Pick[{#, p}\[Transpose], UnitStep @ p[[All, 1]], 1]&; unPadb = With[{pos= MinMax /@ SparseArray[#]["AdjacencyLists"]}, pF[pos][#] ]&; has similar speed. – kglr May 14 '15 at 10:56