
Is there a way to get a word's linguistic pronunciation given the word as a string?

I would like a function LingusticPronunciation such that, for example, LingusticPronunciation["Dog"] would return "Dawg" or maybe even "dȯg" or "däg", depending on what type of pronunciation dictionary is used.

Ultimately I am trying to find all homophones in a document, and I would prefer not to rely on the internet for homophone or pronunciation information.

VividD
William

1 Answer


WordData can give you the IPA form of a word:

Gather[
 WordData[#, "PhoneticForm"] & /@ {"pray", "prey", "wade", "weighed"}
]
(* {{"pr'ey", "pr'ey"}, {"w'eyd", "w'eyd"}} *)
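Since the end goal is to find homophones in a document, it may be more useful to group the words themselves by pronunciation with GatherBy rather than gathering the pronunciation strings. The following is a minimal sketch, not part of the original answer. findHomophones is a hypothetical helper name, and the tokenization (runs of letters, lowercased) is an assumption, so a contraction such as "don't" is split into two tokens. The pronunciation function is a parameter, so you can pass WordData[#, "IPA"] & instead if you have defined that property.

```mathematica
(* findHomophones is a hypothetical helper, not a built-in.
   pron maps a lowercase word to its pronunciation string;
   by default it uses WordData's "PhoneticForm" property. *)
findHomophones[text_String, pron_: (WordData[#, "PhoneticForm"] &)] :=
 Module[{words, known},
  (* tokenize: runs of letters, lowercased, duplicates removed *)
  words = DeleteDuplicates[ToLowerCase /@ StringCases[text, LetterCharacter ..]];
  (* keep only words whose pronunciation is a string (drops Missing[...] and unevaluated results) *)
  known = Select[words, StringQ[pron[#]] &];
  (* group spellings that share a pronunciation; keep groups with 2 or more spellings *)
  Select[GatherBy[known, pron], Length[#] > 1 &]
 ]

findHomophones["They pray that the wolf will not prey on the sheep."]
```

Each returned group is a list of distinct spellings that WordData reports with the same phonetic form; how many groups you get depends entirely on the pronunciation data.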

EDIT: It seems WordData[word, "PhoneticForm"] no longer returns proper IPA. However, the IPA data is still included in the paclet, so we can define a new WordData property for it (or override the current "PhoneticForm" property).

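The IPA data is stored in a file called "IPAPronunciation.wdx", which contains a dispatch table of "word" -> "ipa" rules. The table has no catch-all rule _ -> Missing["NotAvailable"], so one is appended; without it, a word that is not in the table would be returned unchanged by /. instead of as Missing.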
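To see why the catch-all rule matters, here is a small sketch using made-up toy rules (the names rules, toyRules and toyRulesWithDefault and their entries are illustrative only, not the paclet's contents). ReplaceAll (/.) leaves its input untouched when no rule matches, so only the version with the appended _ -> Missing["NotAvailable"] rule signals an unknown word.

```mathematica
(* Toy word -> IPA rules for illustration only; not read from the paclet *)
rules = {"pray" -> "prˈeɪ", "prey" -> "prˈeɪ"};
toyRules = Dispatch[rules];
(* same rules plus a final catch-all, mirroring the Append call in the answer *)
toyRulesWithDefault = Dispatch[Append[rules, _ -> Missing["NotAvailable"]]];

"pray" /. toyRules              (* a rule matches: "prˈeɪ" *)
"xyzzy" /. toyRules             (* no rule matches, so the input comes back: "xyzzy" *)
"xyzzy" /. toyRulesWithDefault  (* the catch-all fires: Missing["NotAvailable"] *)
```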

Module[{
  (* locate the downloaded IPA paclet data file, if any *)
  ipapath = FileNames@FileNameJoin[
   {$UserBasePacletsDirectory,
    "Repository", "WordData_IPAPronunciation-*", "Data", "IPAPronunciation.wdx"}],
  iparules},
 (* give up if the "IPA" property is already defined or the data file is missing *)
 If[Quiet[Head[WordData["a", "IPA"]] =!= WordData] ||
    ipapath === {} || ! FileExistsQ[Last@ipapath],
    Return[$Failed, Module]];
 (* word -> IPA rules, plus a catch-all for words that are not in the table *)
 iparules = Dispatch[Append[
    Import[Last@ipapath][[2, 1]],
    _ -> Missing["NotAvailable"]]];
 Unprotect[WordData];
 WordData[word_String, "IPA"] := word /. iparules;
 (* move the new definition to the front so it is tried before the existing ones *)
 DownValues[WordData] = RotateRight[DownValues[WordData]];
 Protect[WordData];
]
WordData[#, "IPA"] & /@ {"pray", "prey", "wade", "weighed"}
(* {"prˈeɪ", "prˈeɪ", "wˈeɪd", "wˈeɪd"} *)

Some things used in the code above:

Overloading second argument of CountryData

What can I use as the second argument to Return in my own functions?

If there is a neat way to add a rule to a dispatch table without redoing the Dispatch call, feel free to edit.
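One possible alternative, not part of the original answer and assuming Mathematica 10 or later: keep the word -> IPA rules in an Association instead of a Dispatch table. Lookup accepts a default value as its third argument, so no catch-all rule is needed, and AssociateTo adds an entry in place without rebuilding anything. The name ipaAssoc and its entries are hypothetical and made up for illustration.

```mathematica
(* Illustrative entries only; ipaAssoc is a hypothetical name *)
ipaAssoc = <|"pray" -> "prˈeɪ", "prey" -> "prˈeɪ"|>;

(* The third argument of Lookup is returned when the key is absent *)
Lookup[ipaAssoc, "xyzzy", Missing["NotAvailable"]]  (* Missing["NotAvailable"] *)

(* Add a new word -> IPA entry in place *)
AssociateTo[ipaAssoc, "wade" -> "wˈeɪd"];
Lookup[ipaAssoc, "wade", Missing["NotAvailable"]]   (* "wˈeɪd" *)
```

Whether the paclet's data converts cleanly into an Association has not been checked here; if it does, the property definition could use Lookup in place of word /. iparules.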

ssch
  • Do you know of any way to get this to work, at least approximately, for names such as Smith, Williams, Johnson, or Brees (e.g. WordData["Smith","PhoneticForm"])? – William Sep 11 '13 at 20:06
  • That is a very interesting link. It appears the following works for names WordData[ToLowerCase@"Smith", "PhoneticForm"] although I am not sure how accurate it is. – William Sep 11 '13 at 20:12
  • The documentation says it returns IPA, and the examples in the documentation are in IPA, but the actual values Mathematica returns in practice are not in IPA. The IPA for your words should be /preɪ/ and /weɪd/ instead; see e.g. Wikipedia's IPA key. –  Sep 11 '13 at 21:42
  • @RahulNarain Strange as the data is still there in the paclet, see edit. (@Szabolcs ping) – ssch Sep 11 '13 at 22:45
  • @RahulNarain What version of Mathematica are you using? In v8 I get "prˈeɪ". I believe Mathematica 9, or at least your version, uses this form, which IMO is actually much more readable. – William Sep 11 '13 at 23:06
  • @ssch Not sure if that comment was meant for me, but all I got was an error in Mathematica – William Sep 11 '13 at 23:23
  • @Liam It's not about readability. The ' sign indicates the stress. AFAIK the stress mark is typically put before the syllable, and not before the vowel, so this seems inaccurate. But I may be wrong. Another inaccuracy is that vowel length is not indicated: shoe gives /ʃu/ and not /ʃuː/ as it should. However, this is not a problem in practice because these pronunciations are phonemic transcriptions, not phonetic ones. What that means in plain English is that the information given is sufficient for pronouncing the word ... – Szabolcs Sep 11 '13 at 23:39
  • @Liam ... even if you have never seen that word before, but you need to know "how English pronunciation works" (to put it simply). All these transcriptions work only with the added knowledge that the language being transcribed is English. – Szabolcs Sep 11 '13 at 23:41
  • @ssch I posted a similar question here; a brief glance would be appreciated. – William Oct 20 '13 at 05:27