15

Take a ragged 2D array:

rArray={{4, 2, 1, 0}, {2, 2}, {0, 0, 3, 3, 3}, {4, 0}, {3, 4}}

I need to "deduplicate" this, where a duplicate is any element of a sublist where that element exists at the same position in a prior sublist, e.g., on the above example, the result would be

{{4, 2, 1, 0}, {2}, {0, 0, 3, 3, 3}, {}, {3, 4}}

Targets are numeric (integer, in fact).

I'm using the following:

keepFirst[lst_List] := Module[{u = Unique[], t},
  t = Flatten[PadRight[lst, Automatic, u], {{2}, {1}}];
  t = Block[{f}, f[a_] := (f[a] = u; a); Map[f, #]] & /@ t;
  DeleteCases[Flatten[t, {{2}, {1}}], u, {2}]];

Does what I need, reasonably quick, wondering if there's perhaps a cleaner/more efficient means.

(N.b: Above fn is output from a function I use to generate these for differing dimensions/etc. - in the 2D case, Flatten[...,{{2},{1}}] can obviously simplify to the slightly faster Transpose@... with minor speed increase.)

m_goldberg
  • 107,779
  • 16
  • 103
  • 257
ciao
  • 25,774
  • 2
  • 58
  • 139

5 Answers5

11

Another solution using memoization:

keepFirst[lst_List] := Module[{isFirst},
  isFirst[elem_, col_] := (isFirst[elem, col] = False; True);
  Pick[lst, MapIndexed[isFirst[#1, Last[#2]] &, lst, {2}]]
  ]
sakra
  • 5,120
  • 21
  • 33
  • Nice idea- slightly slower in my tests using RandomInteger[100000, #] & /@ RandomInteger[{1, 10}, 100000]. +1 – ciao Aug 09 '15 at 22:14
  • Accepted, fits the "cleaner" part of the OP, quite elegant... – ciao Aug 10 '15 at 23:21
10

This is faster for the given array:

keepFirstK7[lst_List] := 
 Delete[lst, 
  Flatten[(GatherBy[Position[lst, #], Last] & /@ 
      DeleteDuplicates[Flatten@lst])[[All, All, 2 ;;]], 2]]

keepFirstK7@rArray

$\ $ {{4, 2, 1, 0}, {2}, {0, 0, 3, 3, 3}, {}, {3, 4}}


Another approach

keepFirstK[lst_List] := Block[{sowFirst},
  sowFirst[int_, count_] := (sowFirst[int, count] = int; Sow[int]);
  Flatten /@ 
   Part[Reap[Block[{cnt = 1}, sowFirst[#, cnt++] & /@ #]] & /@ lst, 
    All, 2]]

By using Flatten[..., 1] instead of Flatten /@ ... one would get rid of the empty lists.


Or simpler:

keepFirstN[lst_List] := Block[{nftn},
  nftn[int_, count_] := (nftn[int, count] = Nothing; int);
  Block[{cnt = 1}, nftn[#, cnt++] & /@ #] & /@ lst]
Karsten7
  • 27,448
  • 5
  • 73
  • 134
  • Try with RandomInteger[100000, #] & /@ RandomInteger[{1, 10}, 100000]... much slower. +1 though. – ciao Aug 09 '15 at 22:14
  • @ciao I added another approach, that is faster for arrays like RandomInteger[10, RandomInteger[1000]] &~Array~100000 – Karsten7 Aug 10 '15 at 01:39
  • Mobile - will check on return... – ciao Aug 10 '15 at 02:20
  • 2
    I really like keepFirstN, even though it only works in V10.2 or later. But it can made simpler yet: keepFirstN[lst_List] := Block[{nftn}, nftn[val_, indx_] := (nftn[val, indx] = Nothing; val); MapIndexed[nftn[#1, #2] &, #] & /@ lst] – m_goldberg Aug 11 '15 at 00:37
  • @m_goldberg Your keepFirstN is more beautiful than mine, but unfortunately also slower. – Karsten7 Aug 11 '15 at 08:10
  • 1
    No surprise that it is slower. MapIndex is known to be slow. I only brought it to your attention because of its simplicity. – m_goldberg Aug 11 '15 at 13:12
2

sakra made good use of MapIndexed, but Pick is superfluous if we use Sequence directly, as done in the first method in Delete duplicate elements from a list

coldupe[lst_List] :=
  Module[{f},
    mem : f[elem_, _] := (mem = Sequence[]; elem);
    MapIndexed[f[#, Last[#2]] &, lst, {2}]
  ]

coldupe[rArray]
{{4, 2, 1, 0}, {2}, {0, 0, 3, 3, 3}, {}, {3, 4}}

And now I see that m_goldberg posted basically this in a comment two years ago.

Mr.Wizard
  • 271,378
  • 34
  • 587
  • 1,371
2

Uninspiring but thought I'd play:

func[lst_] := 
 Module[{res = Thread[{#, Range[Length@#]}] & /@ lst, f = Unique[]},
  (f[#] = 1) & /@ Flatten[res, 1];
  Cases[#, {x_, 1} :> x] & /@ Map[{#[[1]], f[#]++} &, res, {2}]]

so func[rArray]

yields:

{{4, 2, 1, 0}, {2}, {0, 0, 3, 3, 3}, {}, {3, 4}}

ubpdqn
  • 60,617
  • 3
  • 59
  • 148
1

If one has some prior knowledge about the shape (maxL) and values (nVal) of the possible arrays, a compiled function can be used.

maxL = Max[Length /@ rArray]
nVal = Max@Flatten[rArray] + 1

keepFirstKC = 
 With[{hA = ConstantArray[0, {nVal + 1, maxL}], nothing = nVal + 1}, 
  Compile[{{lst, _Integer, 2}},
   Block[{testA = hA},
     Block[{cnt = 1}, 
        If[testA[[#, cnt]] == 0, testA[[#, cnt++]] = 1; #, 
           nothing] & /@ #] & /@ (lst + 1)] - 1, 
   CompilationTarget -> "C"
   ]]

keepFirstKF[lst_] := 
 DeleteCases[keepFirstKC@PadRight[lst, Automatic, nVal], nVal, {2}]

Depending on how speed is measured, using

deleteCases = With[{dc = nVal},
  Compile[{{intList, _Integer, 1}},
   DeleteCases[intList, dc], CompilationTarget -> "C"]
  ]

keepFirstKF[lst_] := 
 deleteCases /@ keepFirstKC[PadRight[lst, Automatic, nVal]]

can be faster.

Karsten7
  • 27,448
  • 5
  • 73
  • 134