How might I split a word's IPA form at each unique sound? For example:

{"pray", "wade"}
(* {{"p","r","ˈeɪ"}, {"w","ˈeɪ","d"}} *)

ssch answered a question here showing how to add the IPA form to WordData. After adding such values, I realized that what I really want is to split the IPA form between each IPA sound (this probably isn't the correct terminology).

For example, "pray" would return {"p","r","ˈeɪ"}.

"wade" would return {"w","ˈeɪ","d"}

A list of each unique IPA sound would also be useful, something like {"b", "d", "dʒ", "f", "g", "h"....

So how might I get a list of Mathematica's IPA sounds such that I can split a word's IPA form at each unique IPA sound?

William
  • 2
    I really don't see where this question is even related to Mathematica. It's a linguistic problem and when you find the rules of splitting, you can hack it down in every language. – halirutan Oct 20 '13 at 05:33
  • 3
    Furthermore, pray is a single syllable, isn't it? :-/ – Mr.Wizard Oct 20 '13 at 05:43
  • @halirutan I have edited the question. I believe the question is now worded more towards how Mathematica uniquely (or at least differently) represents the IPA sounds of different words. – William Oct 20 '13 at 14:25
  • 1
    @Mr.Wizard I edited the question to clarify. Yes, pray is one syllable, but I hope the edit makes it clear that I am not looking for the syllables but rather for a list of the different IPA sounds Mathematica uses. Szabolcs pointed out here that the way Mathematica represents IPA isn't the typical representation. – William Oct 20 '13 at 14:28

1 Answer

Using the words from here, I manually compiled a list of the different IPA symbols that Mathematica uses, organized as consonants and vowels. Mathematica's IPA representations don't appear to contain any foreign sounds, so I didn't list any. Next to Mathematica's representation I also listed dictionary.com's lazy form of each sound. Ideally I am looking to add hyphens between the different IPA sounds. I will update this answer as my thoughts progress.

The code below prints a word's IPA form and its dictionary.com lazy form, with the sounds separated by hyphens. First, the lookup tables, together with the code for a grid showing how each sound is pronounced:

consonants = {{"_b_oy", "b", "b"}, {"_d_o", "d", "d"}, {"_f_ood",
     "f", "f"}, {"_g_et", "ɡ", "g"}, {"_h_appy", "h", "h"}, {"_j_ump",
     "dʒ", "j"}, {"_c_an", "k", "k"}, {"_l_et", "l", "l"}, {"_m_ake", 
    "m", "m"}, {"_n_o", "n", "n"}, {"si_ng_er", "ŋ", "ng"}, {"_p_ut", 
    "p", "p"}, {"_r_un", "r", "r"}, {"_s_it", "s", "s"}, {"_sh_e", 
    "ʃ", "sh"}, {"_t_op", "t", "t"}, {"_ch_ur_ch_", "tʃ", 
    "ch"}, {"_th_irsty", "\[Theta]", "th"}, {"_th_is", "ð", 
    "th"}, {"_v_ery", "v", "v"}, {"_w_ear", "w", "w"}, {"_wh_ere", 
    "w", "w"}, {"_y_es", "j", "y"}, {"_z_oo", "z", "z"}, {"mea_s_ure",
     "ʒ", "zh"}};
vowels = {{"_a_pple", "æ", "a"}, {"_ai_d", "eɪ", "ey"}, {"_a_rm", "ɒ",
     "ah"}, {"_air_", "ɛr", "air"}, {"_a_ll", "ɔ", "aw"}, {"_e_ver", 
    "ɛ", "e"}, {"_ea_t", "i", "ee"}, {"_ear_", "ɪr", 
    "ear"}, {"teach_er_", "ɝ", "er"}, {"_i_t", "ɪ", "i"}, {"_I_", 
    "aɪ", "ahy"}, {"_o_dd", "ɒ", "o"}, {"_owe_", "oʊ", 
    "oh"}, {"_oo_ze", "u", "oo"}, {"g_oo_d", "ʊ", "oo"}, {"_oi_l", 
    "ɔɪ", "oi"}, {"_ou_t", "aʊ", "ou"}, {"_u_p", "ʌ", 
    "uh"}, {"_a_bout", "ə", "uh"}, {"_ear_ly", "ɝ", "ur"}};
style[str_, color_] := StringReplace[str,
   {"_" ~~ Shortest[x__] ~~ "_" :> 
     "\!\(\*StyleBox[\"" <> x <> "\", FontColor -> " <> 
      ToString[color] <> "]\)"}
   ];
Grid[
 Join[
  Map[{style[#[[1]], Blue], #[[2]], #[[3]]} &, consonants],
  Map[{style[#[[1]], Blue], #[[2]], #[[3]]} &, vowels]
  ]
 ]

and the code

lazyList = 
  Map[#[[2]] -> StringJoin["-", #[[3]]] &, 
   Reverse@SortBy[Join[consonants, vowels], #[[2]] &]];
ipaList = 
  Map[#[[2]] -> StringJoin["-", #[[2]]] &, 
   Reverse@SortBy[Join[consonants, vowels], #[[2]] &]];
word = ToString@WordData[ToLowerCase@"dog", "PhoneticForm"];

(* IPA form, no stress marks *)
StringDrop[StringReplace[
  StringReplace[word, ipaList], "ˈ" -> ""], 1]
(* IPA form, correct stress mark location *)
StringDrop[StringReplace[
  StringReplace[word, ipaList], "-" ~~ x_ ~~ "ˈ" :> "-ˈ" <> x], 1]
(* lazy form, no stress marks *)
StringDrop[StringReplace[
  StringReplace[word, lazyList], "ˈ" -> ""], 1]
(* lazy form, correct stress mark location *)
StringDrop[StringReplace[
  StringReplace[word, lazyList], "-" ~~ x_ ~~ "ˈ" :> "-ˈ" <> x], 1]

returns

"d-ɔ-ɡ"
"ˈd-ɔ-ɡ"
"d-aw-g"
"ˈd-aw-g"

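The same tables can also give the list form asked for in the question. The following is only a minimal sketch, not a tested solution: `table` holds a few sample rows in the same layout as the tables above (in practice it would be `Join[consonants, vowels]`), `splitIPA` is a hypothetical helper name, and the input strings are assumed to be what WordData returns for "pray" and "wade", judging by the expected output in the question. I am also assuming that a stress mark (primary "ˈ" or secondary "ˌ") should stay attached to the sound that follows it.

```mathematica
(* Sample rows in the {example, IPA, lazy} layout used above;
   in practice use Join[consonants, vowels] instead. *)
table = {{"_p_ut", "p", "p"}, {"_r_un", "r", "r"}, {"_w_ear", "w", "w"},
   {"_wh_ere", "w", "w"}, {"_d_o", "d", "d"}, {"_j_ump", "dʒ", "j"},
   {"_ai_d", "eɪ", "ey"}, {"_a_ll", "ɔ", "aw"}, {"_g_et", "ɡ", "g"}};

(* Unique IPA symbols, longest first, so that "dʒ" is tried before "d". *)
sounds = SortBy[DeleteDuplicates[table[[All, 2]]], -StringLength[#] &];

(* Hypothetical helper: split an IPA string into its sounds.
   A stress mark is kept together with the sound that follows it.
   Characters that are not in the inventory are silently skipped. *)
splitIPA[ipa_String] :=
  With[{alts = Alternatives @@ sounds, stress = "ˈ" | "ˌ"},
   StringCases[ipa, (stress ~~ alts) | alts]];

splitIPA["prˈeɪ"]   (* {"p", "r", "ˈeɪ"} *)
splitIPA["wˈeɪd"]   (* {"w", "ˈeɪ", "d"} *)
```

Sorting the symbols longest first matters for the same reason as the Reverse@SortBy in the rule lists above: otherwise "d" would match before "dʒ" gets a chance.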

William