36

I'm looking for robust code to solve the "Longest Common Substring" problem:

Find the longest string (or strings) that is a substring (or are substrings) of two or more strings.

I can just code it up from that description, but I thought I'd ask here first, in case someone knows of an implementation either distributed with Mathematica or available as open source. I found a hint here that a solution might be part of the (huge) Combinatorica package, but a quick search of the documentation did not disclose it.

Adrian
Reb.Cabin
  • You've seen LongestCommonSubsequence[]? – J. M.'s missing motivation May 28 '12 at 16:51
  • @J.M. Beat me by 10 secs :D – Dr. belisarius May 28 '12 at 16:52
  • As far as I know, LongestCommonSubsequence only returns the first hit. That might not be robust enough as there could be multiple distinct and different hits. – István Zachar May 28 '12 at 17:14
  • @J.M. -- I searched the MMA documentation for "Longest Common Substring" and got tutorial/StringPatterns, tutorial/WorkingWithStringPatterns, and guide/SummaryOfNewFeaturesIn60. I didn't search for "Longest Common Subsequence," which would have found LongestCommonSubsequence[], so thanks for the lesson :) – Reb.Cabin May 28 '12 at 20:20
  • amusingly enough, the wikipedia article on the longest common substring problem says that "it must not be confused with the longest common subsequence problem." IOW, wikipedia will send you to the dark corner of the MMA docs where you will have a hard time finding the function LongestCommonSubsequence, which is, in fact, a solution to Wikipedia's disjoint "longest common substring" problem. – Reb.Cabin May 29 '12 at 23:01

4 Answers

48

Preamble and motivation

While I am very late to the party here, I hope this answer will not be totally useless. This is the first in a series of posts where I will advocate a wider use of Java in our workflow, and present and describe a toolset to reduce the mental overhead of this. So, my motivation here is not to provide a faster or more elegant solution, but to show that we can often mindlessly reuse existing Java code (found on the web or elsewhere), and that the process can be made easy and painless.

Simplistic Java reloader

Here I will present a simplistic Java class reloader, which takes a string of Java code, attempts to compile it and, upon success, loads the resulting class into Mathematica via JLink. Note that I have so far only tested it on Windows, but I hope to test it on other platforms soon (edit by Jacob: There is also a working OSX version, see this comment below).

Note that it is not my intention to present and describe a full workflow involving the reloader, in this post - I will save this for a future one. Here, I just present the code and an example of how it is useful for the case at hand.

Code

BeginPackage["SimpleJavaReloader`", {"JLink`"}];

JCompileLoad::usage = 
"JCompileLoad[javacode_,addToClassPath_] attempts to compile a Java \
class defined by  a string javacode, optionally adding to Java compiler classpath \
files and folders from addToClassPath, and load the resulting class into 
Mathematica";

Begin["`Private`"]

JCompileLoad::dirmakeerr = "Can not create directory `1`";

$stateValid = True;

$tempJavaDirectory =  FileNameJoin[{$UserBaseDirectory, "Temp", "Java"}];
$tempClassDirectory = FileNameJoin[{$tempJavaDirectory, "Classes"}];
$tempJavaLogDirectory = FileNameJoin[{$tempJavaDirectory, "Logs"}];
$tempCompileLogFile =   FileNameJoin[{$tempJavaLogDirectory, "javac.log"}];
$jrePath =   
     FileNameJoin[{$InstallationDirectory, "SystemFiles", "Java", $SystemID}];
$javaPath = FileNameJoin[{$jrePath, "bin"}];
$jlibPath = FileNameJoin[{$jrePath, "lib"}];
$classPath = {$tempClassDirectory, $jlibPath};


Scan[
   If[! FileExistsQ[#] && CreateDirectory[#] === $Failed,
      Message[JCompileLoad::dirmakeerr, #];
      $stateValid = False
   ] &,
   {
     $tempJavaDirectory,
     $tempClassDirectory,
     $tempJavaLogDirectory
   }];



(* determine a short name of the class from code *)
Clear[getClass];
getClass[classCode_String] :=
  With[{cl =
     StringCases[classCode, 
       "public" ~~ Whitespace ~~ "class"|"interface" ~~ Whitespace ~~ 
         name : (WordCharacter ..) :> name
     ]},
    First@cl /; cl =!= {}];

getClass[__] := Throw[$Failed, error[getClass]];

(* Determine the name of the package for the class *) 
Clear[getPackage];
getPackage[classCode_String] :=
  With[{pk = 
      StringCases[classCode, 
          ShortestMatch["package" ~~ Whitespace ~~ p__ ~~ ";"] :> p
      ]},
    First@pk /; pk =!= {}];

getPackage[classCode_String] := None;

getPackage[__] := Throw[$Failed, error[getPackage]];


ClearAll[getFullClass];
getFullClass[classCode_String] :=
   StringJoin[If[# === None, "", # <> "."] &@
      getPackage[classCode], getClass[classCode]];

(* Note: So far, tested on Windows only. Some specifics of quoting are 
   tuned to work around some bugs in Windows command line processor *)
Clear[makeCompileScript];
makeCompileScript[sourceFile_String] :=
  StringJoin[
    "\"",
    "\"", FileNameJoin[{$javaPath, "javac"}] , "\"",
    " -g ", sourceFile,
    " -d ", $tempClassDirectory,
    " -classpath ", "\"", Sequence @@ Riffle[$classPath, ";"], "\"",
    " 2> ", $tempCompileLogFile,
    "\""
  ];

Clear[getSourceFile];
getSourceFile[javacode_String] :=
   FileNameJoin[{$tempClassDirectory, getClass[javacode] <> ".java"}];

Clear[JCompileLoad];

JCompileLoad::invst = "The loader is not in a valid state. Perhaps some temporary \
directories do not exist";

JCompileLoad::cmperr = "The following compilation errors were encountered: `1`";

JCompileLoad[javacode_String, addToClassPath_: {}]/; $stateValid :=
  Module[{sourceFile, fullClassName = getFullClass[javacode]},
     sourceFile = getSourceFile[javacode];
     With[{script =
        Block[{$classPath = Join[$classPath, addToClassPath]},
           makeCompileScript[sourceFile]
        ]},
       Export[sourceFile, javacode, "String"];
       If[Run[script] =!= 0,
         Message[
            JCompileLoad::cmperr, 
            Style[#, Red] &@Import[$tempCompileLogFile, "String"]
         ];
         $Failed,
         (*else*)
         ReinstallJava[];
         AddToClassPath @@ Join[$classPath, addToClassPath];
         LoadJavaClass[fullClassName]
       ]
     ]
  ];

JCompileLoad[_String, addToClassPath_: {}] :=
  (
     Message[JCompileLoad::invst];
     $Failed
  )

End[]

EndPackage[]

Notes

Note that you can either put this into a separate package file, or simply copy and paste into the FrontEnd, and run from there, for a quick test.

The package works by saving the string with your Java code into a temporary file and then invoking the Java compiler, which comes with the JRE bundled with Mathematica, to compile this class. The compiled class is stored in another temporary location, from where it is then loaded by JLink. If compilation errors are encountered, the message generated by the Java compiler is printed, and $Failed is returned.
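
The same save, compile, load cycle can also be sketched in plain Java, without JLink. This is only an illustration under stated assumptions: the class CompileAndLoadSketch and its methods are hypothetical names, not part of the package above, and it relies on the standard javax.tools compiler API, which only yields a compiler when running on a JDK rather than a plain JRE.

```java
import java.io.IOException;
import java.lang.reflect.Method;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

// Hypothetical helper for illustration; not part of SimpleJavaReloader or JLink.
class CompileAndLoadSketch {

    // Writes `code` to <tempDir>/<className>.java, compiles it there, and loads the class.
    static Class<?> compileAndLoad(String className, String code) {
        JavaCompiler javac = ToolProvider.getSystemJavaCompiler();
        if (javac == null) {
            // No compiler is available when running on a plain JRE.
            throw new IllegalStateException("No Java compiler available (JDK required)");
        }
        try {
            Path dir = Files.createTempDirectory("jreload");
            Path source = dir.resolve(className + ".java");
            Files.write(source, code.getBytes(StandardCharsets.UTF_8));
            // Roughly "javac -d <dir> <source>"; null streams mean System.in/out/err.
            int status = javac.run(null, null, null, "-d", dir.toString(), source.toString());
            if (status != 0) {
                throw new IllegalStateException("Compilation failed for " + className);
            }
            // A fresh class loader per call lets a class of the same name be loaded again.
            URLClassLoader loader = URLClassLoader.newInstance(new URL[] {dir.toUri().toURL()});
            return Class.forName(className, true, loader);
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    // Calls a public static no-argument method of a freshly compiled public class.
    static Object callStatic(String className, String code, String method) {
        try {
            Method m = compileAndLoad(className, code).getMethod(method);
            return m.invoke(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Creating a new class loader on every call is one way to get reloading inside a single JVM; the package above instead restarts Java through JLink.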

One important limitation is that ReinstallJava is called every time a class is recompiled or reloaded, which restarts the Java runtime.

The case at hand

We will now apply the above to our case. First, we need the solution for the longest common substring problem in Java.

Stealing code from the web

I won't stay noble and code that myself. The whole point is: why do so if we can steal it from someone :)? I took the one from here (the second one). In the simplified workflow I am presenting now, we need a string of Java code, so we define:

jlcsCode = 

"import java.lang.*;

 public class LCS{  
    public static String longestCommonSubstring(String S1, String S2) {
         int Start = 0;
         int Max = 0;
         for (int i = 0; i < S1.length(); i++){
            for (int j = 0; j < S2.length(); j++){
               int x = 0;
               while (S1.charAt(i + x) == S2.charAt(j + x)){
                  x++;
                  if (((i + x) >= S1.length()) || ((j + x) >= S2.length())) 
                     break;
               }
               if (x > Max) {
                  Max = x;
                  Start = i;
               }
            }
         }
         return S1.substring(Start, (Start + Max));
    }
 }";

Compiling and running

First, we have to compile and load this code:

JCompileLoad[jlcsCode]
 JavaClass[LCS,<>]

We are now ready to use the function, no other preparation needed! For example:

LCS`longestCommonSubstring["AAABBBBCCCCC","CCCBBBAAABABA"]
  "AAAB"

Note that, since the function longestCommonSubstring is a static method of the LCS class (meaning that the method belongs to the class rather than to a specific instance of it), we have to use the syntax className`method[args], and we don't have to create a class instance with JavaNew to use it. The class itself has to be loaded prior to being used, but JCompileLoad does that for us.
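
The snippet above returns only the first longest match it finds. As a minimal sketch of how one could collect every distinct longest common substring instead, here is a dynamic programming variant. The class name LCSAll and its method are hypothetical names, not taken from the borrowed code, and to load it with JCompileLoad the class would have to be declared public, since getClass looks for "public class" in the source.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical class for illustration; not the code borrowed from the web.
class LCSAll {

    // Returns all distinct longest common substrings of s and t,
    // in the order in which they first end in s.
    static List<String> longestCommonSubstrings(String s, String t) {
        // prev[j] = length of the common suffix of s[0..i-1) and t[0..j)
        int[] prev = new int[t.length() + 1];
        int longest = 0;
        Set<String> found = new LinkedHashSet<>();
        for (int i = 1; i <= s.length(); i++) {
            int[] curr = new int[t.length() + 1];
            for (int j = 1; j <= t.length(); j++) {
                if (s.charAt(i - 1) == t.charAt(j - 1)) {
                    curr[j] = prev[j - 1] + 1;
                    if (curr[j] > longest) {
                        // A strictly longer match invalidates everything found so far.
                        longest = curr[j];
                        found.clear();
                    }
                    if (curr[j] == longest) {
                        found.add(s.substring(i - longest, i));
                    }
                }
            }
            prev = curr;
        }
        return new ArrayList<>(found);
    }
}
```

For the two strings used above this gives a single result, "AAAB", while for "abcxyz" and "xyzabc" it gives both "abc" and "xyz".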

Now, some benchmarks:

s = StringJoin@RandomChoice[{"A", "C", "T", "G"}, 10000];
t = StringJoin@RandomChoice[{"A", "C", "T", "G"}, 10000];

Here we use Mathematica's built-in function:

LongestCommonSubsequence[s,t]//AbsoluteTiming
{0.3232421,TCCACACGGGTAG}

Now our function:

LCS`longestCommonSubstring[s,t]//AbsoluteTiming
{1.1269531,TCCACACGGGTAG}

We see that our function is about 3.5 times slower but, given that Mathematica's built-in function was written in C and heavily optimized, while I just picked the first code snippet I found on the web, I think that the pain/gain ratio is pretty good.

Conclusions

I tried here to make a case for using Java in our workflow more frequently. The good thing about Java is that, in contrast to MathLink/LibraryLink, the JLink interface brings us pretty much all the way there, so no preparation at all is necessary. The Java class reloader I presented here is very simplistic, but it nevertheless "closes the circle", and now we can prototype everything exclusively from Mathematica. I will expand on this in some future posts, and illustrate the workflow more fully. Note that I don't consider the reloader to be anything complete; it is rather a proof of concept at this point.

For the case at hand, it took me literally 5 minutes from start to finish to get this working (the Java reloader I already had), and that includes finding the code on the web, pasting to Mathematica, compiling and using it. Given that there are many cases when Mathematica built-in functions are not available, while Java implementations are ubiquitous, I think this option can significantly expand our possibilities. Of course, to use this one needs some knowledge of Java, but let this not put you off: the things about Java you really need to know for such cases can be picked up in a day or two, especially if you have any experience in C/C++ (but even if not).

Leonid Shifrin
  • Wouldn't that make perfect blog material? – celtschk Jun 03 '12 at 17:12
  • @celtschk Yes, I also thought about it, but I did not want to wait until we get a blog. I will in any case be able to refactor some of my posts later into a blog post. – Leonid Shifrin Jun 03 '12 at 17:20
  • @Leonid brilliant! Similar pattern could enable C# interop as easily, and I could imagine JavaScript interop via a service in Node.JS wouldn't be too far off, either. There is an incredible amount of JavaScript boilerplate out there, and much of it is easier on the mind of a Mathematica dev because JavaScript is fundamentally functional as opposed to fundamentally object-oriented like Java and C#. – Reb.Cabin Jun 04 '12 at 17:03
  • @Reb.Cabin Thanks! As to javascript, I did not try node yet (my wish list is too long :)), but I would rather code in Lisp, Scheme or Clojure (not that I know any of them well yet), and then use Clojurescript or Parenscript, which compile to js. The Lisp-family languages have true macros, and a lot of work has been put already into e.g. Clojurescript to compile into efficient js. So far, Mathematica is far behind in this respect, and I would only use it for a learning exercise to, e.g., "port" Clojurescript to Mathematica. – Leonid Shifrin Jun 04 '12 at 17:14
  • @Leonid you are right about using Lisp, Scheme, or Clojure for new code (i've been a Scheme aficionado since ZOMG 1980 or so). I am thinking though about slurping up all the wealth of free algo code that others have given away in JavaScript in exactly the opportunistic spirit you articulated ~"why stay noble and code it ourselves when we can borrow from others"~ and that's my gentle paraphrase :) – Reb.Cabin Jun 04 '12 at 17:41
  • One more thought: I have heard first-hand from Brendan Eich that Scheme was a primary inspiration for JavaScript -- he wanted Scheme proper but knew he had to have curly braces for the masses, then (by his own admission) he fell down the slippery slope of trying to make prototypes resemble classes. In any event, programming JavaScript isn't all that bad if you discipline yourself to stick with the "Scheme" subset. – Reb.Cabin Jun 04 '12 at 17:49
  • @Reb.Cabin I also heard similar things about js, although of course not first-hand :). I've actually written one fairly complex application in js, which is now a part of a production system (basically, a compiler in js which generates html coupled by closures to some js in-memory structures from the trading strategy language, about 9kloc). But I also agree with an opinion that js is an assembly for the web, and, after learning js more properly, I plan to switch to generating js by coding in something like Clojurescript, rather than code in plain js. My two cents :). – Leonid Shifrin Jun 04 '12 at 18:11
  • Wow, this is a great way to experiment with Java interop. This gets me thinking: would a CLoader be useful as well? The string would get compiled using CCompilerDriver and loaded using DefineDLLFunction. – Ajasja Jun 21 '12 at 08:29
  • @Ajasja Yes, I have the C loader in my plans as well :). It is a little harder to make it easy to use, because JLink already generates boilerplate Mathematica code for you - this is why it is so easy to use it, because Java is a higher-level language, and because when we load as a shared library, crashes in our code would crash the kernel, but all these problems are solvable. – Leonid Shifrin Jun 21 '12 at 09:10
  • @LeonidShifrin I tried on Linux, I had to change makeCompileScript[sourceFile_String] := StringJoin[ FileNameJoin[{$javaPath, "javac"}] , " -g ", sourceFile, " -d ", $tempClassDirectory, " -classpath ", "\"", Sequence @@ Riffle[$classPath, ";"], "\"", " 2> ", $tempCompileLogFile ]; (too many "'s). Not sure if this can be useful. – b.gates.you.know.what Jun 24 '12 at 10:27
  • @b.gatessucks Thanks, very useful! I will try on Linux and update. The reason for extra quotes is a bug in Windows shell, which leads to wrong behavior for directories containing spaces. – Leonid Shifrin Jun 24 '12 at 15:25
  • @LeonidShifrin I think I made my statement about Windows ! Thanks very much for your work. – b.gates.you.know.what Jun 24 '12 at 15:51
  • @b.gatessucks Oh yes, you certainly did :) And without stating my expanded view, I will just say that I could not agree more. – Leonid Shifrin Jun 24 '12 at 15:52
  • Leonid, I tried using your reloader for my answer here. The workflow was 1) jcode= java code from my post 2) JCompileLoad[jcode, {FileNameJoin[{NotebookDirectory[],"mail.jar"}]}]. However, this kept giving me a file-not-found error. It does not say which file could not be found and the compilation errors simply says $Failed. Do you know where the problem could be? Does my code compile fine using JCompileLoad on your machine? – rm -rf Jun 27 '12 at 14:04
  • @R.M. Tried on my machine, worked fine as far as compilation is concerned (although, indeed, I found that I should have added a different folder to classpath - will fix now). So, I got it compiled fine once I put mail.jar in some directory and passed it as a second argument of JCompileLoad. Are you sure you had the jar in your NotebookDirectory[]? Also, I got an error when trying your unreadMail with my gmail account: Java::excptn: A Java exception occurred: java.lang.ArrayIndexOutOfBoundsException: 0 at FetchMail.UnreadMail(FetchMail.java:31).. I suspect that the last argument ... – Leonid Shifrin Jun 27 '12 at 21:35
  • @LeonidShifrin Ah, I didn't check to see if there actually are any unread messages... so will need an if there – rm -rf Jun 27 '12 at 21:38
  • @R.M. to unreadMail (from) is not an e-mail address but a name of the sender as it appears in the "from" page. In any case, the bug seems to be related to the case when no new mail was found - you probably get an empty array then, and should check for it before accessing the 0-th element. Also, you don't need JavaBlock, since you are calling a static method. But, this is not to detract from your nice solution, which has my upvote. Returning to the reloader, it would be nice if you make sure you had jar in place, and then, if it still fails, tell me your exact setup. – Leonid Shifrin Jun 27 '12 at 21:38
  • @LeonidShifrin The last argument is the email address, not the name. In the emails[i]=... line I get only the sender's email id, not name and then compare it to the from argument. Thanks for the hint on JavaBlock... this is my first program in java (spent a day to learn it and the API), so wasn't sure of how to use JLink – rm -rf Jun 27 '12 at 21:46
  • @R.M If this is your first Java program, you are a very quick learner. Somehow, when I gave as the last arg the email address of someone, as "someone@someplace.com", it did not find any new mail, even though I checked that I have a few unread messages from that person. But when I removed the filtering in your code, and ran it, it promptly checked a number of unread messages as read (until I stopped it), this is why I suspected that "from" field might be something else (no time now to read javadocs). I have to go now, but please do tell me your setup if JCompileLoad does not work for you. – Leonid Shifrin Jun 27 '12 at 21:53
  • Thanks Leonid, I'll try that again with some more users (it could possibly be due to someone not setting the name field, etc.). I'm trying your JCompileLoad again , starting with a clean slate and I'll write back here as to how it goes and the details of my setup – rm -rf Jun 27 '12 at 21:58
  • Leonid, I tried again to use JCompileLoad with a clean kernel and everything and it still gives me the same error. I've posted some details on my directory setup, version, etc. here since it's too long for a comment. If you want some more info, let me know what output you need. FWIW, I can't compile the LCS code in your answer here either, which doesn't need any modifications to the classpath. I wonder if there's something in my system that's making your reloader give errors. – rm -rf Jun 28 '12 at 14:20
  • @R.M I have the same. Tracked it down to a problem with escaped " in makeCompileScript, but still no clue in how to solve it – Dr. belisarius Aug 21 '12 at 03:37
  • @Verde, R.M. Ok guys, this is now on the top of my todo list. Will stop posting here until I get done with this :). – Leonid Shifrin Aug 21 '12 at 03:51
  • Leonid: I solved the problems in my machine. The first was due to a few escaped " missing in makeCompileScript, and the second was a problem in the classPath (probably unique to me). I am posting in the following comment my code for the compile script, just in case it could be useful for you or somebody else. Thanks for your effort! – Dr. belisarius Aug 21 '12 at 04:33
  • Clear[makeCompileScript]; makeCompileScript[sourceFile_String] := StringJoin["\"", "\"", FileNameJoin[{$javaPath, "javac"}], "\"", " -g ", "\"", sourceFile, "\"", " -d ", "\"", $tempClassDirectory, "\"", " -classpath ", "\"", Sequence @@ Riffle[$classPath, ";"], "\"", " 2> ", "\"", $tempCompileLogFile, "\"", "\"", "\""]; – Dr. belisarius Aug 21 '12 at 04:34
  • @Verde Thanks a bunch, I will use this. What is your platform - is it Mac OSX? – Leonid Shifrin Aug 21 '12 at 11:04
  • Nope, Leonid Win$XP. I also had to add "C:\Docu.. .\Temp\Java\Classes" to the classpath. But I guess that is Too Localized. – Dr. belisarius Aug 21 '12 at 11:23
  • @Verde Thanks for the info anyway. Will try to reproduce this when I get the time to look at it. – Leonid Shifrin Aug 21 '12 at 14:02
  • Has anyone got this working on Mac OS X 10.8.2? – M.R. Sep 27 '12 at 07:00
  • @M.R. I plan to soon test this on Linux and MacOSX, and make it work on both if there are problems. Meanwhile, please report all problems you find, this may help. – Leonid Shifrin Sep 27 '12 at 09:51
  • Waiting for Mac OS X too!.. Great post! Get my up. – Murta Oct 04 '12 at 02:02
  • I just tried this out for the first time and it is great! With Java 7 the stolen code is only twice as slow as LongestCommonSubsequence. :) There is a problem with the makeCompileScript step, though. The "shell bug" actually seems to be caused by the extra quotes added when your string gets converted into its InputForm by Run, and your extra extra quotes simply end up cancelling these out. Ideally I think one should avoid introducing the extra quotes to begin with. Also, each path element should be quoted individually or otherwise it still won't work with paths containing spaces. – Oleksandr R. Feb 02 '13 at 05:45
  • Sorry, just saw the comment of @belisarius above. I agree with it except that he has an unnecessary dangling quote at the end. As I said previously, though, the very first and very last quotes are in principle not necessary if one can avoid adding yet another set of quotes due to InputForm--fixing this would allow it to work on all platforms, I believe. One other very small thing: if you're not using Mathematica's Java, the hard-coded path will be wrong. I think the path to the JRE can be obtained from within Java itself, which may be better. – Oleksandr R. Feb 02 '13 at 06:05
  • @OleksandrR. Thanks for trying it out, and comments! Regarding the quotes, I do have the corrected version, just have to put it here. Also, I really need to put this on GitHub. The real reason why I did not do all this yet is that I was working on a general code-sharing system for us, and it is almost ready, but I had almost zero time to work on it during the last month. Hopefully I will finish it very soon. As to the path to Java, thanks - this is a good suggestion. The problem is that most JREs do not come with javac (compiler) - WRI's JRE is non-standard in this respect. – Leonid Shifrin Feb 02 '13 at 12:43
  • changing $jrePath = "/Developer/Tools" works for me on OSX 10.8.2, except the file paths for using the java reloader to read files weren't relative to Directory[]. – s0rce Mar 06 '13 at 21:16
  • @s0rce Thanks for this input. I really have to set the time to fix this thing on all platforms. Hope to do this very soon. – Leonid Shifrin Mar 06 '13 at 21:55
  • Mac version here: Import["https://gist.github.com/lshifr/7307845/raw/SimpleJavaReloader.m"] tks @LeonidShifrin – Murta Nov 07 '13 at 21:42
  • Is it possible to write a simplistic C# reloader via NETLink and how difficult will that be? Given that the .NET framework is now open source and available on all platforms, I think this will be interesting to have. – RunnyKine Apr 23 '16 at 15:36
  • @RunnyKine I guess it should be possible, and not too hard. I don't know C#, but I might be interested in writing such a reloader. Will add that to my project list. Problem is, I have lots of things there already, so can't say when I get time to look into it. You could try yourself, and actually, I've read somewhere that in C#, you can access compiler via an API, rather than command line, so this can be even simpler. Basically, all you'll need is to find a C# analogue of Java class path, so that you could add compiled C# class(es) and they will then be loaded. – Leonid Shifrin Apr 23 '16 at 15:43
  • Thanks, I will look into it. If I'm successful, I'll share it with the community. – RunnyKine Apr 23 '16 at 15:47
  • @RunnyKine Sounds good! – Leonid Shifrin Apr 23 '16 at 15:50
  • I finally found some time to work on that C# reloader. I'm happy to report that I have it working on Windows, I'll post it as a self answer sometime before Wednesday. Maybe people can chime in and get it working on Mac and Linux, it shouldn't be too hard. – RunnyKine Jan 01 '17 at 14:06
  • Sounds cool! Thanks for the heads up, looking forward. – Leonid Shifrin Jan 01 '17 at 16:41
  • I have posted the code here, if you want to take a look. – RunnyKine Jan 02 '17 at 23:26
  • Thanks. I am not at the Windows machine at the moment, but will look this up as soon as I am on Windows. – Leonid Shifrin Jan 03 '17 at 12:05
28

Mathematica supports two related functions, LongestCommonSequence[] and LongestCommonSubsequence[]. The first one finds the longest (contiguous or non-contiguous) sequence common to the two strings given as arguments to it:

LongestCommonSequence["AAABBBBCCCCC", "CCCBBBAAABABA"]
"AAABB"

while the second function is constrained to give the longest contiguous sequence:

LongestCommonSubsequence["AAABBBBCCCCC", "CCCBBBAAABABA"]
"AAAB"

These functions became available only in version seven; if you need to do this in an earlier version, István's routine is useful.

J. M.'s missing motivation
20

Also, using pattern matching, just in case:

{{a, b, c, d, e, f, g}, {x, a, r, b, c, j}} /. {{___, Longest[y__], ___}, {___, y__, ___}} -> {y}
(*
-> {b, c}
*)

Edit

With this approach you can do one thing that does not seem trivial with the faster LongestCommonSequence[] function: finding the maximal common subsequence among several lists:

{{1, 2, 3, 4, 7, 8}, {1, 2, 3, 5, 7, 8}, {3, 4, 7, 5, 7, 8}} /. 
                {{___, Longest[y__], ___}, {___, y__, ___} ...} -> {y}
(*
->{7, 8}
*)
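
For comparison, the several-sequences case can also be sketched in Java for strings, in the spirit of the Java answer in this thread. This is a brute-force illustration under stated assumptions, not a tuned implementation: MultiLCS and its method are hypothetical names, and the approach simply tries every substring of the shortest input, longest first, and returns the first one contained in all inputs.

```java
import java.util.List;

// Hypothetical class for illustration; a brute-force sketch, not an optimized algorithm.
class MultiLCS {

    // Returns one longest substring common to all given strings ("" if there is none).
    static String longestCommonSubstring(List<String> strings) {
        if (strings.isEmpty()) {
            return "";
        }
        // Any common substring must fit inside the shortest input.
        String shortest = strings.get(0);
        for (String s : strings) {
            if (s.length() < shortest.length()) {
                shortest = s;
            }
        }
        // Try candidate lengths from longest to shortest, left to right.
        for (int len = shortest.length(); len > 0; len--) {
            for (int start = 0; start + len <= shortest.length(); start++) {
                String candidate = shortest.substring(start, start + len);
                boolean inAll = true;
                for (String s : strings) {
                    if (!s.contains(candidate)) {
                        inAll = false;
                        break;
                    }
                }
                if (inAll) {
                    return candidate;
                }
            }
        }
        return "";
    }
}
```

With the three digit lists above written as strings, longestCommonSubstring(Arrays.asList("123478", "123578", "347578")) gives "78", matching {7, 8}.
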
Dr. belisarius
18

I coded these up before Mathematica 7 and the introduction of the built-in function LongestCommonSubsequence. The built-in version is of course faster, but this implementation might still be of interest as it has somewhat wider functionality. Also, with some fine-tuning and compilation the performance can surely be improved.

longestCommonSubsequence[s, t] returns a set of the longest common contiguous subsequences that can be found in lists s and t. It lists all distinct contiguous subsequences of maximal length (one thing the built-in LongestCommonSubsequence is not capable of).

longestCommonSubsequence[s_List, t_List] := Module[
   {m = Length@s, n = Length@t, longest = 0, l, set = {}},
   l = Table[0, {m + 1}, {n + 1}];
   Do[
    If[s[[i]] === t[[j]],
      l[[i + 1, j + 1]] = l[[i, j]] + 1;
      If[l[[i + 1, j + 1]] > longest, longest = l[[i + 1, j + 1]]; 
       set = {}];
      If[l[[i + 1, j + 1]] === longest, 
       set = Union[set, {Take[s, {i - longest + 1, i}]}]];
      ];
    , {i, m}, {j, n}];
   set
   ];

Or you can just calculate the length, which is faster:

longestCommonSubsequenceLength[s, t] returns the length of the longest common contiguous subsequence that can be found in lists s and t.

longestCommonSubsequenceLength[s_List, t_List] := 
  Module[{m = Length@s, n = Length@t, l, longest = 0},
   l = Table[0, {m + 1}, {n + 1}];
   Do[
    If[s[[i]] === t[[j]], l[[i + 1, j + 1]] = l[[i, j]] + 1];
    longest = Max[longest, l[[i + 1, j + 1]]];
    , {i, m}, {j, n}];
   longest
   ];

Example usage:

s = RandomChoice[{"A", "C", "T", "G"}, 200];
t = RandomChoice[{"A", "C", "T", "G"}, 200];

LongestCommonSubsequence[StringJoin@s, StringJoin@t]

"CATATTG"

longestCommonSubsequence[s, t]

{{"C", "A", "T", "A", "T", "T", "G"}, {"G", "T", "C", "A", "A", "T", "G"}}

Note that longestCommonSubsequence has found all instances of common subsequences.

longestCommonSubsequenceLength[s, t]

7

István Zachar