
I want to compare two text files. Suppose file AA contains the following text:

   1.The onunload event occurs once a
   2.page has unloaded (or the browser window has been closed).
   3.onunload occurs when the user
   4.navigates away from the page (by clicking on a link, submitting
   5.text One
   6.text two
   7.text three

and file BB contains the following text:

   1.The onunload event occurs once a
   2.page has unloadd (or the browser window has been closed).
   3.onunload the when  occurs user
   4.navigates away from the page (by clicking on a link, submitting
   5.text two
   6.text One
   7.text three

What I want to detect is:

- Line 1 in AA and line 1 in BB are an exact match.
- Line 2 in AA and line 2 in BB match, but BB contains an error (see the word "unloadd").
- Line 3 in AA and line 3 in BB match, but the words are swapped in BB.
- Line 4 in AA and line 4 in BB are an exact match.
- Line 5 in AA is swapped with line 6 in BB.
- Line 6 in AA is swapped with line 5 in BB.
- Line 7 in AA is an exact match with line 7 in BB.

How can I achieve this? Is there a pattern-matching algorithm for it?

Dhinesh

1 Answer

```mathematica
EditDistance["hello", "hell"]
(* 1 *)

s1 = "The first thing  will do is choose a topic";
s2 = "The first thing you will do is choose  topic";
EditDistance[s1, s2]
(* 4 *)
```

This calculates the Levenshtein distance between two strings. When I google "distance between two strings", the Wikipedia article on Levenshtein distance is the first hit.
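To see what that distance counts, here is a minimal hand-rolled sketch of the standard dynamic-programming recurrence (Wagner-Fischer). The name `levenshtein` is hypothetical and introduced only for illustration; in practice the built-in `EditDistance` is the one to use. Because the core definition works on lists, the same code compares sequences of words as well as sequences of characters.

```mathematica
(* Hypothetical illustration of Levenshtein distance; prefer the built-in
   EditDistance in real code. d[[i+1, j+1]] holds the distance between the
   first i elements of a and the first j elements of b. *)
levenshtein[a_List, b_List] := Module[{m = Length[a], n = Length[b], d},
  d = Table[0, {m + 1}, {n + 1}];
  (* turning a prefix into the empty sequence costs one deletion per element *)
  Do[d[[i + 1, 1]] = i, {i, 0, m}];
  Do[d[[1, j + 1]] = j, {j, 0, n}];
  Do[
   d[[i + 1, j + 1]] = Min[
     d[[i, j + 1]] + 1,                      (* delete a[[i]] *)
     d[[i + 1, j]] + 1,                      (* insert b[[j]] *)
     d[[i, j]] + Boole[a[[i]] =!= b[[j]]]    (* substitute, free if equal *)
     ],
   {i, 1, m}, {j, 1, n}];
  d[[m + 1, n + 1]]
  ]

(* strings are compared character by character *)
levenshtein[a_String, b_String] := levenshtein[Characters[a], Characters[b]]
```

For example, `levenshtein["hello", "hell"]` gives 1, the same as `EditDistance["hello", "hell"]` above, and passing `StringSplit` results instead of strings gives the word-level count.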

`EditDistance` measures the minimum number of single-character insertions, deletions, and substitutions needed to turn one sentence into the other. Anon suggests measuring how many words differ instead, which can be done by breaking the sentences into words (splitting at the spaces):

```mathematica
EditDistance @@ (StringSplit /@ {s1, s2})
(* 2 *)
```
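Applied to the question, the character-level measure (for typos) and the word-level comparison (for reordering) can be combined into a rough line-by-line classifier. This is a sketch under several assumptions, not a complete diff tool: `classifyLine` is a hypothetical helper, the files are typed in as lists of strings (the "1.", "2." prefixes are treated as annotations rather than file content), each line is compared only with the line of the same number plus a verbatim search for moved lines, and the typo threshold of 2 character edits is arbitrary.

```mathematica
(* The question's two files as lists of lines (assumed: no "1." prefixes). *)
aa = {"The onunload event occurs once a",
   "page has unloaded (or the browser window has been closed).",
   "onunload occurs when the user",
   "navigates away from the page (by clicking on a link, submitting",
   "text One", "text two", "text three"};
bb = {"The onunload event occurs once a",
   "page has unloadd (or the browser window has been closed).",
   "onunload the when  occurs user",
   "navigates away from the page (by clicking on a link, submitting",
   "text two", "text One", "text three"};

(* Hypothetical helper: classify line i of a against b.
   maxTypos (default 2) is an arbitrary threshold for "contains an error". *)
classifyLine[i_Integer, a_List, b_List, maxTypos_ : 2] :=
 Module[{x = a[[i]], y = If[i <= Length[b], b[[i]], ""], pos},
  Which[
   (* identical to the line with the same number *)
   x === y, {i, "exact match"},
   (* found verbatim at another position in b *)
   (pos = Position[b, x, {1}]) =!= {},
   {i, "moved to line " <> ToString[pos[[1, 1]]]},
   (* same words in a different order (word-level comparison) *)
   Sort[StringSplit[x]] === Sort[StringSplit[y]], {i, "words swapped"},
   (* only a few character edits apart (character-level comparison) *)
   EditDistance[x, y] <= maxTypos, {i, "match with error"},
   True, {i, "no match"}
   ]
  ]

Table[classifyLine[i, aa, bb], {i, Length[aa]}]
```

With these inputs the result is `{{1, "exact match"}, {2, "match with error"}, {3, "words swapped"}, {4, "exact match"}, {5, "moved to line 6"}, {6, "moved to line 5"}, {7, "exact match"}}`, which matches the list in the question. Note that the word-level check also fires when two lines differ only in whitespace, since `StringSplit` ignores repeated spaces. For real files, `Import["AA.txt", "Lines"]` returns a list of lines that could replace the literal lists (file names assumed).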
bill s
  • Since the atom in a sentence is a word and not a character, I would also consider using EditDistance @@ (StringSplit /@ {"The first thing will do is choose a topic", "The first thing you will do is choose topic"}) and see what gets the best result. – C. E. Oct 23 '13 at 14:00
  • @Anon -- good idea. I've added this to the answer. Thanks – bill s Oct 23 '13 at 15:40