7

I have a list of pairs of numbers and I want to group these pairs in such a way that in each group, the second element of each item is equal to the first element of the next one. For example, consider this list:

list = {{-1, -2}, {-1, 1.4}, {-2, -2.24}, {-3, -4}, {-5, -6}, {-5, -5.08}, {-6, 1},
{1, 1.4}, {1, 2}, {-2.24, -3.16}, {-3.16, -3.6}, {-4.12, -5.08}, {-4.12, -4.48},
{2.24, 2.84}, {2.24, 3.16}, {2.84, 3.6}, {-3.6, -4.48}, {-5.4, -6.32}, {-5.4, -5.84},
{-6.32, -6.72}, {3.16, 3.6}, {4.24, -5}, {-5.84, -6.72}};

which can be divided into 12 groups like this:

{{ {-1, -2}, {-2, -2.24}, {-2.24, -3.16}, {-3.16, -3.6}, {-3.6, -4.48} },
 { {-1, 1.4} },
 { {-3, -4} },
 { {-5, -6}, {-6, 1}, {1, 1.4} },
 { {4.24, -5}, {-5, -5.08} },
 { {1, 2} },
 { {-4.12, -5.08} },
 { {-4.12, -4.48} },
 { {2.24, 2.84}, {2.84, 3.6} },
 { {2.24, 3.16}, {3.16, 3.6} },
 { {-5.4, -6.32}, {-6.32, -6.72} },
 { {-5.4, -5.84}, {-5.84, -6.72} }}

My current approach involves looping through all elements and removing each one from the main list after Appending it to its designated group, which is a highly inefficient and dirty code. Considering that the list might become too big, is there a fast and efficient way to do this in Mathematica?

polfosol
  • 952
  • 5
  • 20

4 Answers4

6

Maybe you can take advantage of Graph functionality:

findPaths[l_] := With[{g = Graph[DirectedEdge @@@ l]},
    Flatten[
        Outer[
            FindPath[g, ##, Infinity, All]&, 
            Pick[VertexList[g], VertexInDegree[g], 0],
            Pick[VertexList[g], VertexOutDegree[g], 0]
        ],
        2
    ]
]

Then:

findPaths[list]

{{-1, 1.4}, {-1, -2, -2.24, -3.16, -3.6, -4.48}, {-3, -4}, {-4.12, -5.08}, {-4.12, -4.48}, {2.24, 3.16, 3.6}, {2.24, 2.84, 3.6}, {-5.4, -5.84, -6.72}, {-5.4, -6.32, -6.72}, {4.24, -5, -6, 1, 1.4}, {4.24, -5, -5.08}, {4.24, -5, -6, 1, 2}}

Carl Woll
  • 130,679
  • 6
  • 243
  • 355
4

Another method, which may or may not qualify as "dirty code":

Most@NestWhileList[
    Function[i, 
     SelectFirst[
      list, (i[[2]] == #[[1]] &)]], #, (! MissingQ[#] &)] & /@ list

which gives:

{{-1,-2},{-2,-2.24},{-2.24,-3.16},{-3.16,-3.6},{-3.6,-4.48}}
{{-1,1.4}}
{{-2,-2.24},{-2.24,-3.16},{-3.16,-3.6},{-3.6,-4.48}}
{{-3,-4}}
{{-5,-6},{-6,1},{1,1.4}}
{{-5,-5.08}}
{{-6,1},{1,1.4}}
{{1,1.4}}
{{1,2}}
{{-2.24,-3.16},{-3.16,-3.6},{-3.6,-4.48}}
{{-3.16,-3.6},{-3.6,-4.48}}
{{-4.12,-5.08}}
{{-4.12,-4.48}}
{{2.24,2.84},{2.84,3.6}}
{{2.24,3.16},{3.16,3.6}}
{{2.84,3.6}}
{{-3.6,-4.48}}
{{-5.4,-6.32},{-6.32,-6.72}}
{{-5.4,-5.84},{-5.84,-6.72}}
{{-6.32,-6.72}}
{{3.16,3.6}}
{{4.24,-5},{-5,-6},{-6,1},{1,1.4}}
{{-5.84,-6.72}}
chuy
  • 11,205
  • 28
  • 48
4

For some speed-up for large input list, instead of using FindPath on all pairs of source and sink nodes in graph g, we can take pairs {v1,v2} where v1 is a source vertex in graph g and v2 is a sink node which is a descendant of v1:

ClearAll[allPaths]
allPaths[l_] := Module[{g = Graph[DirectedEdge @@@ l], sources, sinks, pairs},
  sources = GraphComputation`SourceVertexList @ g;
  sinks = GraphComputation`SinkVertexList @ g;
  pairs = Join @@ 
     (Thread[{#, Intersection[sinks, VertexOutComponent[g, #]]}] & /@ sources);
  Join @@ (FindPath[g, ##, ∞, All] & @@@ pairs)]

Examples:

Sort[allPaths[list]] // Column

enter image description here

SeedRandom[1]
list2 = DeleteCases[{a_, a_}] @ DeleteDuplicatesBy[Sort] @ RandomInteger[1000, {500, 2}];

allPaths[list2]; // RepeatedTiming // First

0.015

versus findPaths from Carl Woll's answer:

findPaths[list2]; // RepeatedTiming // First
0.53
Sort @ allPaths[list2] == Sort @ findPaths[list2]
True
kglr
  • 394,356
  • 18
  • 477
  • 896
1

I did a benchmark to compare my own (improved) method with the ones proposed by kglr and Carl Woll. My initial thought was to treat this problem in a procedural, old-fashioned way. Here is the code:

Edited later: Using some simple tricks, I managed to double-up the speed. These lines are from the last edition:

ClearAll["Global`*"];
merge = Which[# == {} || #2 == {}, 0,
    Last@# == #2[[1]], {{}, #~Join~Rest@#2},
    #[[1]] == Last@#2, {{}, #2~Join~Rest@#},
    True, 0] &;

pathsFind[li_List] := Module[{a = li, q}, Do[ Do[If[ListQ[q = merge @@ a[[{i, j}]]], a[[{i, j}]] = q], {j, 2, Length@a}, {i, 1, j - 1}]; If[FreeQ[a, {}], Break[], a = a~DeleteCases~{}], Length@a]; a];

Nothing fancy. It just loops through the elements multiple times until all paths are found. This code compares allPaths by kglr, findPaths by Carl Woll and pathsFind:

Module[{li, p, rand},
 li = DeleteCases[{a_, a_}]@DeleteDuplicatesBy[Sort]@RandomInteger[1000, {500, 2}];
 First /@ {
   RepeatedTiming[p = findPaths@li;],
   {Length@p},
   RepeatedTiming[p = allPaths@li;],
   {Length@p},
   RepeatedTiming[p = pathsFind@li;],
   {Length@p}
   }]

The result is {0.298115, 362, 0.0115792, 362, 0.907272, 332}. Obviously, allPaths is faster by an order of magnitude. But it seems both faster methods are not so accurate and miss some paths along the way, since the number of groups in the last method is less than the other two. Regarding kglr's comment, I still think lower number of groups means higher accuracy and can't understand what is wrong with my code aside from being slow.

By the way, I appreciate all the efforts and nice ideas proposed here, but I think it's better to stick to my own method for now.

polfosol
  • 952
  • 5
  • 20
  • re " both faster methods are not so accurate and miss some paths", looks like the opposite is true: consider l1 = {{1, 2}, {2, 3}, {4, 2}}; {findPaths@l1, allPaths@l1, pathsFind@l1}. Also check the paths in Complement[allPaths @ li, pathsFind @ li](these are legitimate paths in li), and the paths in Complement[pathsFind @ li, allPaths@ li] (these can be extended to longer paths). – kglr Dec 11 '21 at 22:30