The function LongestCommonSequence finds a longest common subsequence between 2 lists. Apparently, this built-in function does not accept more than 2 arguments. How can I find a longest common subsequence between 3 or more lists using Mathematica? Or, better yet, all longest common subsequences?


In response to a "close as a duplicate" vote: This is not a duplicate of "Longest common substring for multiple strings?" because that question is concerned with substrings (contiguous subsequences), whereas my question is concerned with arbitrary (not necessarily contiguous) subsequences.

Vladimir Reshetnikov
  • Related: mathematica.stackexchange.com/a/114987/9490 – Jason B. May 13 '16 at 21:32
  • actually this answer http://mathematica.stackexchange.com/a/114987/2079 works by first converting strings to list and so contains exactly an answer to this question. – george2079 May 13 '16 at 21:40
  • @george2079 It is interesting, but seems to be slow as hell. Besides, I have a hunch that the complexity of even the best algorithm here would be proportional to the product of the lists' lengths, or something like that. So it is basically quadratic for 2 similarly sized lists, cubic for 3, and so on. Can't prove it though. – Leonid Shifrin May 13 '16 at 21:48
  • @george2079 No, this is not a duplicate, as I explained in an addendum to the question. – Vladimir Reshetnikov May 13 '16 at 21:54
  • Could you explain what you mean by "not necessarily contiguous", or maybe give an example of what you want to achieve? – xslittlegrass May 13 '16 at 22:30
  • @xslittlegrass The Wikipedia page I linked provides detailed definitions and examples. Let $S$ be a string or a list. A subsequence of $S$ is obtained by removing zero or more elements of $S$ at arbitrary positions (e.g. "tea", "aha" and "etc" are subsequences of "Mathematica"). A substring of $S$ is a prefix of a suffix of $S$ (e.g. "them" is a substring of "Mathematica", because it is a prefix of its suffix "thematica"). Every substring is also a subsequence. – Vladimir Reshetnikov May 13 '16 at 22:46
  • You ask to "generalize" LongestCommonSequence to 3 or more arguments, but you also want to generalize to a different definition of sequence? Please give a clear definition of terms and an example to work with. – george2079 May 14 '16 at 04:06
  • Sorry, I don't see why my definition is different. Different from what? – Vladimir Reshetnikov May 14 '16 at 19:08

1 Answer

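ClearAll[fuzzyLCS];
(* Heuristic ("fuzzy") approach; call as fuzzyLCS[{"s1", "s2", ...}] *)
fuzzyLCS[strings__List] :=
 Module[
  {subsets, aligned, intersections},
  (* all subsets of the input strings with at least two elements *)
  subsets = Subsets[strings, {2, Length@strings}];
  (* align the first two strings of each subset; keep only the matching pieces *)
  aligned = 
   Select[SequenceAlignment[#[[1]], #[[2]]], StringQ[#] &] & /@ 
    subsets;
  (* character subsets of each alignment's matches, intersected across alignments *)
  intersections = 
   Intersection @@ (Subsets[#, {1, 
         Length@#}] & /@ (Flatten[Characters[#]] & /@ aligned));
  (* return the longest surviving character list, joined into a string *)
  StringJoin[SortBy[intersections, Length] // Last]
  ]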

fuzzyLCS[{"theano", "mathematica", "matea"}] // AbsoluteTiming

{0.000150089, "tea"}
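
For comparison, here is a minimal sketch of the textbook dynamic-programming recurrence, generalized from 2 lists to any number of lists. The name multiLCS is hypothetical (it is not a built-in), and the sketch returns only one longest common subsequence, not all of them. It assumes short inputs: the recursion depth grows with the sum of the list lengths (so $RecursionLimit may need raising), and the memo table can hold up to the product of (length + 1) over all lists, which is in line with the complexity guess in the comments.

```mathematica
ClearAll[multiLCS];
multiLCS[lists__List] :=
 Module[{ls = {lists}, lcs},
  (* lcs[pos] is an LCS of the prefixes of length pos[[i]] of each list; *)
  (* results are memoized via lcs[pos] = ... *)
  lcs[pos_List] := lcs[pos] =
    Which[
     (* some prefix is empty: nothing in common *)
     MemberQ[pos, 0], {},
     (* all prefixes end in the same element: take it and shorten every prefix *)
     SameQ @@ MapThread[Part, {ls, pos}],
      Append[lcs[pos - 1], ls[[1, pos[[1]]]]],
     (* otherwise shorten one prefix at a time and keep a longest result *)
     True,
      First@MaximalBy[
        Table[lcs[ReplacePart[pos, i -> pos[[i]] - 1]], {i, Length[pos]}],
        Length]
     ];
  lcs[Length /@ ls]
  ]

(* usage on the example above: convert the strings to character lists first *)
StringJoin[multiLCS @@ (Characters /@ {"theano", "mathematica", "matea"})]
```

For these three strings the only common subsequence of length 3 is "tea", so an exact method and the heuristic above should agree here; they are not guaranteed to agree on other inputs.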

Alexey Golyshev