I would like to know how I can remove accents from a string. For example, how can I transform "string test áéíóú" into "string test aeiou"? I have to normalize some text to make comparisons, and this would be very helpful.

J. M.'s missing motivation
Murta

3 Answers

To remove accents from a string I use this function:

removeAccent[string_] := Module[{accentMap,l1,l2},
    l1 = Characters["ŠŽšžŸÀÁÂÃÄÅÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖÙÚÛÜÝàáâãäåçèéêëìíîïðñòóôõöùúûüýÿ"];
    l2 = Characters["SZszYAAAAAACEEEEIIIIDNOOOOOUUUUYaaaaaaceeeeiiiidnooooouuuuyy"];
    accentMap = Thread[l1 -> l2];
    StringReplace[string, accentMap]
]  

So, if you evaluate removeAccent["string test áéíóú"], you get "string test aeiou".
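To see how the rule list is assembled, here is a minimal sketch with a deliberately tiny two-character map. The names from, to and rules are hypothetical and only for illustration. Thread pairs the characters position by position, so the two strings need to have the same length.

```mathematica
(* hypothetical miniature of the accent map used above *)
from = Characters["áé"];
to = Characters["ae"];

(* Thread turns the rule between two lists into a list of rules *)
rules = Thread[from -> to]          (* {"á" -> "a", "é" -> "e"} *)

StringReplace["café olá", rules]    (* "cafe ola" *)
```

Characters that are not in the map pass through unchanged, so the full function only normalizes the letters listed in l1.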

Update

As of version 10.1 there is a built-in function for this, RemoveDiacritics:

RemoveDiacritics["string test áéíóú"] returns "string test aeiou".

Timing comparison using the new RepeatedTiming.

RepeatedTiming[removeAccent["string test áéíóú"]]
RepeatedTiming[RemoveDiacritics["string test áéíóú"]]
> 0.000057
> 0.000015

RemoveDiacritics wins!
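Since the goal in the question is to normalize text for comparisons, a comparison helper might look like the sketch below. It assumes version 10.1 or later for RemoveDiacritics. The name sameTextQ is hypothetical, and folding case with ToLowerCase is an extra assumption about what "equal" should mean here.

```mathematica
(* hypothetical helper: compare two strings ignoring diacritics and case *)
sameTextQ[a_String, b_String] :=
  ToLowerCase[RemoveDiacritics[a]] === ToLowerCase[RemoveDiacritics[b]]

sameTextQ["String Test ÁÉÍÓÚ", "string test aeiou"]   (* True *)
sameTextQ["string test áéíóú", "string text aeiou"]   (* False *)
```

Drop the ToLowerCase calls if the comparison should stay case-sensitive.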

Murta

Your own method, restructured so that the replacement rules are built once, when the function is defined, rather than on every call:

With[{accentMap =
  Characters /@
    Rule["ŠŽšžŸÀÁÂÃÄÅÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖÙÚÛÜÝàáâãäåçèéêëìíîïðñòóôõöùúûüýÿ", 
         "SZszYAAAAAACEEEEIIIIDNOOOOOUUUUYaaaaaaceeeeiiiidnooooouuuuyy"] // Thread},

 removeAccent[string_] := StringReplace[string, accentMap]

]
Mr.Wizard
  • Would a Dispatch[] help matters here? – J. M.'s missing motivation Nov 12 '12 at 09:28
  • I don't think so. I ran some tests with Dispatch, and it gets slower. StringReplace must already do something like it. – Murta Nov 12 '12 at 11:38
  • @J.M. I'll have to test it, but I believe that because the String operations are handled by a separate library, things like Dispatch are not applicable. At least that's what I seem to recall concluding previously. – Mr.Wizard Nov 12 '12 at 16:39
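This version needs no hand-written table: it reads the named form that FullForm gives each special character and keeps the base letter. It works nicely: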

removeAccent[s_String] := Module[{patt = "(Capital)?([A-Z]{1})([A-Z]\\w*)*", del}, 
 del = Select[Characters[s], StringMatchQ[ToString[FullForm[#]],
                                          RegularExpression[".*\\[" <> patt <> "\\].*"]] &];
 StringReplace[s, Thread[del -> Map[First, StringCases[ToString[FullForm[#]] & /@ del, 
               RegularExpression[patt] :> If["$1" === "", ToLowerCase, Identity]["$2"]]]]]]

Test:

removeAccent["string test áéíóú"]
   "string test aeiou"

removeAccent["Çärîñő Ð Štùrm"]
   "Carino Ð Sturm"
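To see what the regular expression is matching, it may help to look at the FullForm text of a few characters. This is only an exploratory sketch, and the sample string "áÇÐ" is chosen here for illustration.

```mathematica
(* each special character prints as a named form inside the string,
   e.g. \[AAcute], \[CapitalCCedilla], \[CapitalEth] *)
ToString[FullForm[#]] & /@ Characters["áÇÐ"]
```

The pattern takes an optional "Capital" prefix, then one capital letter as the base letter, and requires the rest of the name to begin with another capital (Acute, Cedilla, ...). A name such as CapitalEth does not fit that shape, which appears to be why Ð is left unchanged in the test above.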
J. M.'s missing motivation