Group a list of pairs based on the path they make

Question

I have a list of pairs of numbers and I want to group these pairs in such a way that in each group, the second element of each item is equal to the first element of the next one. For example, consider this list:

list = {{-1, -2}, {-1, 1.4}, {-2, -2.24}, {-3, -4}, {-5, -6}, {-5, -5.08}, {-6, 1},
{1, 1.4}, {1, 2}, {-2.24, -3.16}, {-3.16, -3.6}, {-4.12, -5.08}, {-4.12, -4.48},
{2.24, 2.84}, {2.24, 3.16}, {2.84, 3.6}, {-3.6, -4.48}, {-5.4, -6.32}, {-5.4, -5.84},
{-6.32, -6.72}, {3.16, 3.6}, {4.24, -5}, {-5.84, -6.72}};

which can be divided into 12 groups like this:

{{ {-1, -2}, {-2, -2.24}, {-2.24, -3.16}, {-3.16, -3.6}, {-3.6, -4.48} },
 { {-1, 1.4} },
 { {-3, -4} },
 { {-5, -6}, {-6, 1}, {1, 1.4} },
 { {4.24, -5}, {-5, -5.08} },
 { {1, 2} },
 { {-4.12, -5.08} },
 { {-4.12, -4.48} },
 { {2.24, 2.84}, {2.84, 3.6} },
 { {2.24, 3.16}, {3.16, 3.6} },
 { {-5.4, -6.32}, {-6.32, -6.72} },
 { {-5.4, -5.84}, {-5.84, -6.72} }}

My current approach involves looping through all elements and removing each one from the main list after Appending it to its designated group, which is a highly inefficient and dirty code. Considering that the list might become too big, is there a fast and efficient way to do this in Mathematica?

Why isn't {{4.24, -5}, {-5, -5.08}} and {{4.24, -5}, {-5, -6}, {-6, 1}, {1, 1.4}} in your list? — Carl Woll, Dec 10 '21 at 18:42
Maybe this is an answer to your question: https://mathematica.stackexchange.com/a/258537 — Henrik Schumacher, Dec 11 '21 at 07:33

Carl Woll · Answer 1 · 2021-12-10T19:22:32.597

6

Maybe you can take advantage of Graph functionality:

findPaths[l_] := With[{g = Graph[DirectedEdge @@@ l]},
    Flatten[
        Outer[
            FindPath[g, ##, Infinity, All]&, 
            Pick[VertexList[g], VertexInDegree[g], 0],
            Pick[VertexList[g], VertexOutDegree[g], 0]
        ],
        2
    ]
]

Then:

findPaths[list]

{{-1, 1.4}, {-1, -2, -2.24, -3.16, -3.6, -4.48}, {-3, -4}, {-4.12, -5.08}, {-4.12, -4.48}, {2.24, 3.16, 3.6}, {2.24, 2.84, 3.6}, {-5.4, -5.84, -6.72}, {-5.4, -6.32, -6.72}, {4.24, -5, -6, 1, 1.4}, {4.24, -5, -5.08}, {4.24, -5, -6, 1, 2}}

edited Dec 10 '21 at 19:22

answered Dec 10 '21 at 19:00

Carl Woll

130,679
6
243
355

This is more or less what I am looking for. But there is one group missing: {{2.24, 3.16}, {3.16, 3.6}} – polfosol Dec 10 '21 at 19:08
@polfosol Fixed – Carl Woll Dec 10 '21 at 19:22
Thanks. I am not trying to be difficult here, but why does {1, 1.4} has a separate group while being included in another one? Edit: got it. my mistake – polfosol Dec 10 '21 at 19:30

score 4 · Answer 2 · answered Dec 10 '21 at 19:16

4

Another method, which may or may not qualify as "dirty code":

Most@NestWhileList[
    Function[i, 
     SelectFirst[
      list, (i[[2]] == #[[1]] &)]], #, (! MissingQ[#] &)] & /@ list

which gives:

{{-1,-2},{-2,-2.24},{-2.24,-3.16},{-3.16,-3.6},{-3.6,-4.48}}
{{-1,1.4}}
{{-2,-2.24},{-2.24,-3.16},{-3.16,-3.6},{-3.6,-4.48}}
{{-3,-4}}
{{-5,-6},{-6,1},{1,1.4}}
{{-5,-5.08}}
{{-6,1},{1,1.4}}
{{1,1.4}}
{{1,2}}
{{-2.24,-3.16},{-3.16,-3.6},{-3.6,-4.48}}
{{-3.16,-3.6},{-3.6,-4.48}}
{{-4.12,-5.08}}
{{-4.12,-4.48}}
{{2.24,2.84},{2.84,3.6}}
{{2.24,3.16},{3.16,3.6}}
{{2.84,3.6}}
{{-3.6,-4.48}}
{{-5.4,-6.32},{-6.32,-6.72}}
{{-5.4,-5.84},{-5.84,-6.72}}
{{-6.32,-6.72}}
{{3.16,3.6}}
{{4.24,-5},{-5,-6},{-6,1},{1,1.4}}
{{-5.84,-6.72}}

answered Dec 10 '21 at 19:16

chuy

11,205
28
48

Why so many groups?! – polfosol Dec 10 '21 at 19:18
2

If you don't want the duplicate subpaths you can use: DeleteDuplicates[paths, ContainsAll] where paths is the result above. – chuy Dec 10 '21 at 19:24
This approach certainly doesn't increase efficiency for big lists – polfosol Dec 10 '21 at 19:40
Yeah probably not. – chuy Dec 10 '21 at 19:43

kglr · Answer 3 · 2021-12-10T21:45:26.473

For some speed-up for large input list, instead of using FindPath on all pairs of source and sink nodes in graph g, we can take pairs {v1,v2} where v1 is a source vertex in graph g and v2 is a sink node which is a descendant of v1:

ClearAll[allPaths]
allPaths[l_] := Module[{g = Graph[DirectedEdge @@@ l], sources, sinks, pairs},
  sources = GraphComputation`SourceVertexList @ g;
  sinks = GraphComputation`SinkVertexList @ g;
  pairs = Join @@ 
     (Thread[{#, Intersection[sinks, VertexOutComponent[g, #]]}] & /@ sources);
  Join @@ (FindPath[g, ##, ∞, All] & @@@ pairs)]

Examples:

Sort[allPaths[list]] // Column

SeedRandom[1]
list2 = DeleteCases[{a_, a_}] @ DeleteDuplicatesBy[Sort] @ RandomInteger[1000, {500, 2}];
allPaths[list2]; // RepeatedTiming // First

0.015

versus findPaths from Carl Woll's answer:

findPaths[list2]; // RepeatedTiming // First

0.53

Sort @ allPaths[list2] == Sort @ findPaths[list2]

True

polfosol · Answer 4 · 2021-12-12T11:41:43.880

I did a benchmark to compare my own (improved) method with the ones proposed by kglr and Carl Woll. My initial thought was to treat this problem in a procedural, old-fashioned way. Here is the code:

_{Edited later: Using some simple tricks, I managed to double-up the speed. These lines are from the last edition:}

ClearAll["Global`*"];
merge = Which[# == {} || #2 == {}, 0,
    Last@# == #2[[1]], {{}, #~Join~Rest@#2},
    #[[1]] == Last@#2, {{}, #2~Join~Rest@#},
    True, 0] &;
pathsFind[li_List] := Module[{a = li, q},
   Do[
    Do[If[ListQ[q = merge @@ a[[{i, j}]]], a[[{i, j}]] = q],
     {j, 2, Length@a}, {i, 1, j - 1}];
    If[FreeQ[a, {}], Break[], a = a~DeleteCases~{}],
    Length@a];
   a];

Nothing fancy. It just loops through the elements multiple times until all paths are found. This code compares allPaths by kglr, findPaths by Carl Woll and pathsFind:

Module[{li, p, rand},
 li = DeleteCases[{a_, a_}]@DeleteDuplicatesBy[Sort]@RandomInteger[1000, {500, 2}];
 First /@ {
   RepeatedTiming[p = findPaths@li;],
   {Length@p},
   RepeatedTiming[p = allPaths@li;],
   {Length@p},
   RepeatedTiming[p = pathsFind@li;],
   {Length@p}
   }]

The result is {0.298115, 362, 0.0115792, 362, 0.907272, 332}. Obviously, allPaths is faster by an order of magnitude. But it seems both faster methods are not so accurate and miss some paths along the way, since the number of groups in the last method is less than the other two. Regarding kglr's comment, I still think lower number of groups means higher accuracy and can't understand what is wrong with my code aside from being slow.

By the way, I appreciate all the efforts and nice ideas proposed here, but I think it's better to stick to my own method for now.

re " both faster methods are not so accurate and miss some paths", looks like the opposite is true: consider l1 = {{1, 2}, {2, 3}, {4, 2}}; {findPaths@l1, allPaths@l1, pathsFind@l1}. Also check the paths in Complement[allPaths @ li, pathsFind @ li](these are legitimate paths in li), and the paths in Complement[pathsFind @ li, allPaths@ li] (these can be extended to longer paths). — kglr, Dec 11 '21 at 22:30

Group a list of pairs based on the path they make

4 Answers4