I would like to know how I can remove accents from a string. For example, how can I transform "string test áéíóú" into "string test aeiou"?
I have to normalize some text to make comparisons, and this would be very helpful.
J. M.'s missing motivation
Murta
It's worth noting that this is a follow-up question to this one. – Sjoerd C. de Vries Nov 11 '12 at 19:33
3 Answers
21
To remove accents from a string I use this function:
removeAccent[string_] := Module[{accentMap,l1,l2},
l1 = Characters["ŠŽšžŸÀÁÂÃÄÅÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖÙÚÛÜÝàáâãäåçèéêëìíîïðñòóôõöùúûüýÿ"];
l2 = Characters["SZszYAAAAAACEEEEIIIIDNOOOOOUUUUYaaaaaaceeeeiiiidnooooouuuuyy"];
accentMap = Thread[l1 -> l2];
StringReplace[string, accentMap]
]
So, if you apply it as removeAccent["string test áéíóú"], you get "string test aeiou".
Update
As of version 10.1 there is a native function for this, RemoveDiacritics:
RemoveDiacritics["string test áéíóú"] returns "string test aeiou".
Timing comparison using the new RepeatedTiming.
RepeatedTiming[removeAccent["string test áéíóú"]]
RepeatedTiming[RemoveDiacritics["string test áéíóú"]]
> 0.000057 (removeAccent)
> 0.000015 (RemoveDiacritics)
RemoveDiacritics wins!
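Since the original goal was to normalize text for comparisons, a minimal sketch of such a comparison might look like the following. The names normalizeText and sameTextQ are hypothetical helpers, not built-ins, and folding case with ToLowerCase is an assumption about what "normalize" should cover here.

```wolfram
(* normalizeText and sameTextQ are hypothetical helper names, not built-in functions. *)
(* Assumes version 10.1 or later, where RemoveDiacritics is available. *)
(* Case folding via ToLowerCase is an added assumption; drop it for case-sensitive comparison. *)
normalizeText[s_String] := ToLowerCase[RemoveDiacritics[s]]

(* Two strings are treated as equal when their normalized forms are identical. *)
sameTextQ[a_String, b_String] := normalizeText[a] === normalizeText[b]

sameTextQ["String Test ÁÉÍÓÚ", "string test aeiou"]
```

Keeping the normalization in one function makes it easy to swap in removeAccent on versions before 10.1.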
Murta
Ð (ð) is not an accented D (d) but the letter eth without any accents. It should not be considered an accented character in this context. – Oleksandr R. Apr 04 '13 at 14:04
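Following the comment above, a variant of the mapping that leaves eth untouched could be sketched as below. The name removeAccentKeepEth is hypothetical; the two character lists are the ones from the answer with Ð/ð and their D/d counterparts dropped, so both lists stay the same length.

```wolfram
(* removeAccentKeepEth is a hypothetical name used only for this illustration. *)
(* The lists are those from the answer, minus Ð/ð on the left and D/d on the right. *)
With[{accentMap = Thread[
     Characters["ŠŽšžŸÀÁÂÃÄÅÇÈÉÊËÌÍÎÏÑÒÓÔÕÖÙÚÛÜÝàáâãäåçèéêëìíîïñòóôõöùúûüýÿ"] ->
      Characters["SZszYAAAAAACEEEEIIIINOOOOOUUUUYaaaaaaceeeeiiiinooooouuuuyy"]]},
 (* With injects the precomputed rule list into the definition, so it is built only once. *)
 removeAccentKeepEth[string_String] := StringReplace[string, accentMap]
]

(* Eth has no replacement rule, so it should pass through unchanged. *)
removeAccentKeepEth["Ð áéíóú"]
```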
10
Your own method without extraneous reevaluation:
With[{accentMap =
Characters /@
Rule["ŠŽšžŸÀÁÂÃÄÅÇÈÉÊËÌÍÎÏÐÑÒÓÔÕÖÙÚÛÜÝàáâãäåçèéêëìíîïðñòóôõöùúûüýÿ",
"SZszYAAAAAACEEEEIIIIDNOOOOOUUUUYaaaaaaceeeeiiiidnooooouuuuyy"] // Thread},
removeAccent[string_] := StringReplace[string, accentMap]
]
Mr.Wizard
I don't think so. I have made some tests with Dispatch, and it gets slower. StringReplace must already do something like it. – Murta Nov 12 '12 at 11:38
@J.M. I'll have to test it, but I believe that because the String operations are handled by a separate library, things like Dispatch are not applicable. At least that's what I seem to recall concluding previously. – Mr.Wizard Nov 12 '12 at 16:39
4
Works nicely:
removeAccent[s_String] := Module[{patt = "(Capital)?([A-Z]{1})([A-Z]\\w*)*", del},
del = Select[Characters[s], StringMatchQ[ToString[FullForm[#]],
RegularExpression[".*\\[" <> patt <> "\\].*"]] &];
StringReplace[s, Thread[del -> Map[First, StringCases[ToString[FullForm[#]] & /@ del,
RegularExpression[patt] :> If["$1" === "", ToLowerCase, Identity]["$2"]]]]]]
Test:
removeAccent["string test áéíóú"]
"string test aeiou"
removeAccent["Çärîñő Ð Štùrm"]
"Carino Ð Sturm"
J. M.'s missing motivation