I tried to improve a function I wrote that converts "castellano" (the proper name for Spanish) into a coherent spelling, that is, one with no exceptions to the phonetic rules. I implemented it with StringReplace and a long list of rules.

euzk[cad_] := 
 StringReplace[
  cad, {"á" -> "a", "é" -> "e", "í" -> "i", "ó" -> "o", "ú" -> "u", 
   "ca" -> "ka", "ce" -> "ze", "ci" -> "zi", "co" -> "ko", 
   "cu" -> "ku", "ch" -> "c", "cl" -> "kl", "cr" -> "kr", 
   "cc" -> "kz", "cg" -> "kg", "cn" -> "kn", "cp" -> "kp", 
   "ct" -> "kt", "ge" -> "je", "gi" -> "ji", "gue" -> "ge", 
   "gui" -> "gi", "h" -> "", "qu" -> "k", "v" -> "b", "w" -> "u", 
   "y" -> "i"}]

After reading about Dispatch and how it can make functions run faster, I defined

eusk = {"á" -> "a", "é" -> "e", "í" -> "i", "ó" -> "o", "ú" -> "u", 
   "ca" -> "ka", "ce" -> "ze", "ci" -> "zi", "co" -> "ko", 
   "cu" -> "ku", "ch" -> "c", "cl" -> "kl", "cr" -> "kr", 
   "cc" -> "kz", "cg" -> "kg", "cn" -> "kn", "cp" -> "kp", 
   "ct" -> "kt", "ge" -> "je", "gi" -> "ji", "gue" -> "ge", 
   "gui" -> "gi", "h" -> "", "qu" -> "k", "v" -> "b", "w" -> "u", 
   "y" -> "i"};

then

deusk = Dispatch[eusk]

and finally

euzk[cad_] := StringReplace[cad, deusk]

But when I try to use it, I receive

StringReplace::srep: Dispatch[Length: 27] is not a valid string replacement rule.

I can of course leave the function as a list of rules, but Dispatch looks so efficient. Can I use it some other way? Any advice?

Alexey Popkov
Anxon Pués

1 Answer

Dispatch is designed to optimize lists of replacement rules for functions such as Replace, ReplaceAll, ReplaceRepeated and ReplaceList. It is not intended to be used with string patterns (StringReplace, StringReplaceList, StringCases). Mathematica translates string patterns into regular expressions, which are then compiled by the PCRE library and cached, and that already gives a huge speed-up.
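To illustrate where Dispatch does apply, here is a minimal sketch that uses a dispatch table with ReplaceAll on a list of characters rather than on a string. The names `rules`, `drules` and the sample word are made up for this example and are not part of the question.

```mathematica
(* Hypothetical two-rule table, only for illustration *)
rules = {"v" -> "b", "c" -> "k"};
drules = Dispatch[rules];

(* ReplaceAll operates on expressions, so Dispatch is accepted here *)
Characters["vaca"] /. drules
(* {"b", "a", "k", "a"} *)
```

This works because each character is a separate expression matched as a whole. It does not handle multi-character substrings such as "ch" or "gue", which is what StringReplace is for.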

In your particular case you should just use your replacement rules "as is" for achieving the best possible performance:

euzk[cad_] := StringReplace[cad, eusk]
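As a quick check, the definition above can be tried on a few words. This is only a sketch: the sample words and the timing call are my own additions, and the timing figure will vary by machine and version.

```mathematica
(* Rule list copied from the question *)
eusk = {"á" -> "a", "é" -> "e", "í" -> "i", "ó" -> "o", "ú" -> "u",
   "ca" -> "ka", "ce" -> "ze", "ci" -> "zi", "co" -> "ko",
   "cu" -> "ku", "ch" -> "c", "cl" -> "kl", "cr" -> "kr",
   "cc" -> "kz", "cg" -> "kg", "cn" -> "kn", "cp" -> "kp",
   "ct" -> "kt", "ge" -> "je", "gi" -> "ji", "gue" -> "ge",
   "gui" -> "gi", "h" -> "", "qu" -> "k", "v" -> "b", "w" -> "u",
   "y" -> "i"};

euzk[cad_] := StringReplace[cad, eusk]

(* Sample words chosen for illustration *)
euzk /@ {"queso", "chico", "hola"}
(* {"keso", "ciko", "ola"} *)

(* Rough timing on a longer input; measure on your own system *)
AbsoluteTiming[euzk[StringJoin[ConstantArray["queso chico hola ", 10000]]];]
```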

In other cases, when you need to replace a string pattern rather than a literal substring, rewriting the pattern (or part of it) in terms of RegularExpression can give a performance gain, but not always.
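A minimal sketch of what such a rewrite looks like, assuming a made-up task (replacing runs of digits) that is not from the question. Both forms give the same result, and which one is faster has to be measured, for example with AbsoluteTiming.

```mathematica
(* Hypothetical sample text *)
text = "guerra 1936, paz 1939";

(* String-pattern form *)
StringReplace[text, DigitCharacter .. -> "#"]
(* "guerra #, paz #" *)

(* Equivalent form written directly as a regular expression *)
StringReplace[text, RegularExpression["\\d+"] -> "#"]
(* "guerra #, paz #" *)
```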

Alexey Popkov