intersection between two 2D arrays with labeled data is slow

Question

Here is how to Generate data

the function is below with two images to generate data

img1

img2

segmentImage[binarizedMask_?ImageQ] := 
Module[{seg, areas, indexMaxarea, maxArea},
seg = MorphologicalComponents@*ColorNegate@Dilation[binarizedMask, 1];
areas = ComponentMeasurements[seg, "Area"];
{indexMaxarea, maxArea} = First@MaximalBy[areas, Last] /. Rule -> List;
If[maxArea > 20000, ArrayComponents[seg, Length@areas, indexMaxarea -> 0], seg] ~
Dilation~1(* 20000 is just a threshold *)
];

generating data

{seg1, seg2} = segmentImage /@ {img1, img2};

Now the question

I have two 2-D arrays with dimensions {651,823}. Each array has labelled components i.e. at each position the value is either zero or an integer label. I have 51 objects in the first seg1 and 50 objects in the second array seg2.

I wish to calculate the number of positions where a given object from the first array intersects the objects in the second array.

I have concocted this strategy but it seems terribly slow:

(pos = Table[
 Length@Intersection[Position[seg1, i], Position[seg2, j]], {i, 
  51}, {j, 50}];) // AbsoluteTiming
(* {143.607, Null} *)

Since these arrays need to be modified and consequently the intersections need to recalculated in an iteration later, I find that this is not the best way to do it.

Any ideas of speeding the process up will be welcomed.

the output i have looks something like this (each row represents an integer label in seg1 and each column represents a label in seg2 and the values represent the number of positions the labelled objects in both arrays intersect):

Note: In the image attached the values 267 and 564 in the first line delineates that the object (label 1) in seg1 intersects an object labeled 5 and 7 in seg2 at 267 and 564 positions respectively

In the image attached the value 267 in the first line delineates that the object (label 1) in seg1 intersects an object labeled 5 in seg2 at 267 positions — Ali Hashmi, May 21 '17 at 22:36
So are the two 2d arrays seg1 and seg2? In your pos printout, if each column reflects a single integer label and each row also reflects a unique integer label, how can you have two nonzero values in the first column, 204 and 118? — MikeY, May 21 '17 at 22:37
@MikeY sorry i read your comment late. The thing is the labeled data in seg1 is 1,2,3....51 and seg2 is labeled with 1,2,3....50 so the entries being non-zero means that the 4th and 6th labels in seg1 intersect with the label 1 in seg2 — Ali Hashmi, May 21 '17 at 22:56
maybe SparseArray[SparseArray@Normal[Rest@Counts[Flatten[(1 - Unitize[seg1 - #]) seg2]]]&/@Range[51]]// Normal? — kglr, May 21 '17 at 23:24
@kglr let me add the raw images and the function from which the arrays are created. because currently your suggestion is giving a few errors. — Ali Hashmi, May 21 '17 at 23:28
@kglr please see the question with the input images and the only function needed to generate the two arrays — Ali Hashmi, May 21 '17 at 23:35
@Ali, could you try Normal[SparseArray[SparseArray[#, {50}]&@DeleteCases[Normal[Counts[Flatten[(1 - Unitize[seg1 - #]) seg2]]],0->_]&/@Range[51]]]? — kglr, May 21 '17 at 23:39
So by intersect, you mean the corresponding positions in seg1 and seg2 are both nonzero, but they need not be the same? — MikeY, May 21 '17 at 23:47
@kglr can you post this as an answer. It works and is blazingly fast :) — Ali Hashmi, May 21 '17 at 23:48
@MikeY the labels can be any number in both the arrays. If the same position in both arrays is occupied then that means an intersection. We are computing the number of positions where objects i = 1,2,...51 in seg1 intersect with objects j = 1,2,3...50 in seg2 — Ali Hashmi, May 21 '17 at 23:57

score 9 · Answer 1 · edited May 22 '17 at 00:00

9

AbsoluteTiming[
 counts = Normal[SparseArray[SparseArray[#, {50}] &@ DeleteCases[
         Normal[Counts[Flatten[(1 - Unitize[seg1 - #]) seg2]]], 0 -> _] & /@ Range[51]]];]

{0.30279, Null}

(pos = Table[Length@Intersection[Position[seg1, i], Position[seg2, j]], {i, 
        51}, {j, 50}];) // AbsoluteTiming

{122.747, Null}

counts == pos

True

edited May 22 '17 at 00:00

Ali Hashmi

8,950
4
22
42

answered May 21 '17 at 23:56

kglr

394,356
18
477
896

this is brilliant stuff ! – Ali Hashmi May 21 '17 at 23:57
2

Thank you @Ali. You might want to wait a while before accepting as questions with accepted answers are less likely to attract attention. – kglr May 22 '17 at 00:28
i wish i could go back and double like this one :) – Ali Hashmi May 22 '17 at 11:30

Carl Woll · Accepted Answer · 2017-05-22T14:33:43.150

7

Something like the following would be pretty fast:

intersections[r1_, r2_] := Module[{t},
  t = DeleteCases[Tally@Flatten[Unitize[r1 r2] (100 r1 + r2)], {0, _}];
  t[[All, 1]] = IntegerDigits[t[[All, 1]], 100];
  Normal[SparseArray[Rule @@@ t]]
]

This assumes that the labels are all less than 100. For the OP example:

res=intersections[seg1, seg2]; //AbsoluteTiming
res[[;;5, ;;5]]

{0.006349, Null}

{{0, 0, 0, 0, 267}, {0, 0, 6888, 0, 0}, {0, 6511, 0, 0, 0}, {204, 0, 0, 7249, 70}, {0, 0, 0, 110, 3971}}

edited May 22 '17 at 14:33

answered May 22 '17 at 01:42

Carl Woll

130,679
6
243
355

@AliHashmi It works fine for me. Perhaps you need to restart the kernel or clear some variables? – Carl Woll May 22 '17 at 14:35
your method works like a charm !! plus one'd it already :) – Ali Hashmi May 22 '17 at 16:27
@AliHashmi IMHO you really should change the Accept to this post. I somehow overlooked it completely before starting on an answer myself, and it seems to be far and away the best method provided. – Mr.Wizard May 22 '17 at 16:32
@CarlWoll there is a slight problem with your approach. Your method here gives a dimension of {50,50} whereas it should have been {51,50} – Ali Hashmi Jan 22 '18 at 11:11
1

@AliHasmi The dimensions are {50, 50} because there is no intersection of the 51 from seg1 with any of the integers from seg2. This could be fixed by adding a dimension specification to the SparseArray code. That is, using SparseArray[Rule @@@ t, {Max[r1], Max[r2]}] instead of SparseArray[Rule @@@ t]. – Carl Woll Jan 22 '18 at 19:19
@CarlWoll thanks checked your comment late. Btw can your approach be modified to work with gaps in labels. lets say the first mask had labels such as {1,2,3 ... 35, 39,40,42} (36, 37, 38 and 41 are not there). This is only an example. The gaps can exist anywhere. The second mask has regular labeling scheme with no gaps {1,2,3.. n} where n is the maximum value. – Ali Hashmi Apr 05 '18 at 16:15
@CarlWoll i modified your function as: overlapMatrix[segPrev_, segCurr_] := Module[{m,p1,p2,val,rules1,rules2}, {p1,p2} = Values@ComponentMeasurements[segPrev,"Label"]&/@{segPrev,segCurr}; val = Max[Max@p1,Max@p2]+1; m = DeleteCases[Tally@Flatten[Unitize[segPrev segCurr] (val segPrev + segCurr)], {0, _}]; m[[All, 1]] = IntegerDigits[m[[All, 1]], val]; m =SortBy[m,First]; {rules1,rules2}=Thread[#-> Range@Length@#]&/@{p1,p2}; m = MapAt[ind\[Function]ind/.rules1,m,{All,1,1}]; m = MapAt[ind\[Function]ind/.rules2,m,{All,1,2}]; Normal@SparseArray[Rule@@@m,{Length@p1,Length@p2}] ]; – Ali Hashmi Apr 05 '18 at 16:16
@CarlWoll this passes in some cases (when i use different masks) and fails in other cases. I am not sure where i am going wrong to compute the overlaps. – Ali Hashmi Apr 05 '18 at 16:17

score 5 · Answer 3 · answered May 22 '17 at 01:29

5

A simpler solution using PositionIndex

a1 = Flatten[seg1] // PositionIndex;
a2 = Flatten[seg2] // PositionIndex;
a3 = Table[Length@Intersection[a1[i], a2[j]], {i, 51}, {j, 50}]; // AbsoluteTiming

{0.930523, Null}

counts == pos == a3

True

Can't beat kglr's solution, though. It takes 0.41 s.

answered May 22 '17 at 01:29

Anjan Kumar

4,979
1
15
28

nice work with PositionIndex. +1 – Ali Hashmi May 22 '17 at 11:08

Mr.Wizard · Answer 4 · 2017-05-22T15:47:12.353

Pick is optimized to work on packed arrays. Let's try it. As a self-contained function:

fn[x_, y_] :=
 Module[{a, b, m, n, toRow},
   {m, n} = Max /@ {x, y};
   {a, b} = Flatten /@ {x, y};
   toRow = SparseArray[Normal @ KeyDrop[0] @ Counts @ #, {n}] &;
   Array[toRow @ Pick[b, a, #] &, m]
 ]

Output matches kglr's:

img1 = Import["https://i.stack.imgur.com/oKpri.png"];
img2 = Import["https://i.stack.imgur.com/yzvfv.png"];

{seg1, seg2} = segmentImage /@ {img1, img2};

fn[seg1, seg2] == counts  (* True *)

Performance is superior by more than an order of magnitude:

fn[seg1, seg2] // RepeatedTiming // First

Normal[SparseArray[
    SparseArray[#, {50}] &@
       DeleteCases[Normal[Counts[Flatten[(1 - Unitize[seg1 - #]) seg2]]], 
        0 -> _] & /@ Range[51]]] // RepeatedTiming // First

0.026
0.488

The code above still scans the data fifty times. That's usually a bad idea. With more segments this might become an issue. As an attempt to avoid the problem I'll use Szabolcs's function from How to efficiently find positions of duplicates? (as pos) as a substitute for Pick:

fn2[x_, y_] :=
  Module[{a, b, n, pos, toRow},
    n = Max[y];
    {a, b} = Flatten /@ {x, y};
    pos[lst_] := GatherBy[Range @ Length @ lst, lst[[#]] &];
    toRow = SparseArray[Normal @ KeyDrop[0] @ Counts @ #, {n}] &;
    toRow @ b[[#]] & /@ Rest @ pos @ a
  ]

Test:

fn[seg1, seg2] == fn2[seg1, seg2]

fn2[seg1, seg2] // RepeatedTiming // First

True
0.024

Only a slight edge over fn here but I expect it to pull ahead as the number of segments rises.

This is still about 3-4 times slower than my method. Also, I would pull the definition of pos outside of fn2, as there is no reason to define pos every time you run fn2. — Carl Woll, May 22 '17 at 16:21
@CarlWoll That's weird, I completely overlooked your post. Sorry! I agree about externalizing pos but I wanted self-contained functions for the sake of this post. — Mr.Wizard, May 22 '17 at 16:29

score 4 · Answer 5 · edited May 22 '17 at 16:13

4

OK, as a learning exercise in associations I worked through this, and ended up with a faster result than the @kglr one. Count me surprised!

RepeatedTiming[
  cr = {seg1 // Flatten, seg2 // Flatten} // Transpose // Counts; 
  sa = SparseArray[ KeySelect[cr, FreeQ[0]] // Normal] // Normal;
]

(* out[11]={0.153, Null} *)

@kglr's

AbsoluteTiming[counts = Normal[SparseArray[SparseArray[#, {50}] &@DeleteCases[
   Normal[Counts[Flatten[(1 - Unitize[seg1 - #]) seg2]]], 0 -> _] & /@ Range[51]]];]

(* out[11]={0.478285, Null} *)

The matrices match (minus the fact that I did not pad the sa matrix... the last row is all zeros)

sa == Counts // Most
(* true *)

edited May 22 '17 at 16:13

Mr.Wizard

271,378
34
587
1,371

answered May 22 '17 at 14:31

MikeY

7,153
18
27

Nice method. I believe I can make your code cleaner while retaining the same performance and output by using KeySelect. May I edit your post? – Mr.Wizard May 22 '17 at 16:09
You bet, it was my first foray into Key[ ] type functions. I tried to use KeySelect[ ], but struggled with keys that are lists. – MikeY May 22 '17 at 16:11
Please check my edit. – Mr.Wizard May 22 '17 at 16:13
Nice simplification, thanks! – MikeY May 22 '17 at 16:15
@MikeY +1 for your fast approach – Ali Hashmi May 22 '17 at 16:31

Ali Hashmi · Answer 6 · 2018-04-06T08:20:50.613

This approach works for the posted question as well as for cases when there are gaps in labels. What i mean by gaps is that the first or the second segmented mask may have a structure as follows: {1,2,3 ... 10,13,14,20} (11,12 as well as 16-19 are the missing labels).

overlapMatrix[seg1_,seg2_]:=Block[{keys1,keys2,mask,map,rules1,rules2},
keys1 = Keys@ComponentMeasurements[seg1, "Label"];
keys2 = Keys@ComponentMeasurements[seg2, "Label"];
mask= Unitize[seg1*seg2];
map = Normal@Counts@Thread[{SparseArray[mask*seg1]["NonzeroValues"],SparseArray[mask*seg2]["NonzeroValues"]}];
{rules1,rules2}= Dispatch@Thread[# -> Range@Length@#]&/@{keys1,keys2};
map[[All,1,1]]=map[[All,1,1]]/.rules1;
map[[All,1,2]]=map[[All,1,2]]/.rules2;
Normal@SparseArray[map,{Length@keys1,Length@keys2}]
];

Applying it over the above images

overlapMatrix[seg1,seg2]

(* {0.167852, Null} *)

score 0 · Answer 7 · 2017-05-24T07:03:57.350

0

Transpose accepts a second parameter, which you can use to shuffle the values around a bit. I have used it to merge the value of the two arrays together in a list.

Normal[SparseArray[
   DeleteCases[Normal[
       Counts[Flatten[Transpose[{seg1, seg2}, {3, 2, 1}], {1, 2}]]],
       Rule[{0,0}|{0,_}|{_,0}, _]
   ], 
   {Max[seg1], Max[seg2]} 
]]

edited May 24 '17 at 07:03

answered May 23 '17 at 21:28

intersection between two 2D arrays with labeled data is slow

7 Answers7