19

Is there any open database of English words and their pronunciations in IPA? What about other languages (Arabic, French, ...)? Please also point out the license of the database.

Update: I tried to convert some non-IPA schemes to IPA, but the results were not accurate. Also see: http://spirit.blau.in/simon/2012/05/02/schotts-general-american-dictionary-0-2-1-ipa/ , http://students.washington.edu/riebold/files/Arpabet%20Vowel%20Analyzer.praat and http://theaccentlab.com/

Real Dreams
  • For English, also see this question: http://opendata.stackexchange.com/questions/3764/is-there-a-free-list-of-english-word-phonetics/3778#3778 – maj Jun 04 '15 at 16:38

2 Answers

17

You might want to double-check the license, but the baseline standard is the CMU Pronouncing Dictionary (CMUdict), which is freely downloadable and also ships with many NLP libraries, like NLTK (Python).

For out-of-vocabulary words, I've had great success with Sequitur G2P, which is both trainable and under the GPL.

edit: note that CMUdict (like many other speech processing pipelines) represents pronunciation in ARPAbet rather than IPA. I apparently don't have enough points to post more links, but google "FAVE ARPABET" and you'll get a handy cheat sheet.
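For readers who want to work with the raw dictionary file, here is a minimal sketch of reading CMUdict-style entries in plain Python. It assumes the classic text layout: one entry per line, the word first, then space-separated ARPAbet phones, `;;;` for comment lines, and a suffix like `(1)` for alternate pronunciations. The function names and the sample lines are illustrative, not part of any library, and newer releases of the file may differ in small ways.

```python
import re

# Matches "WORD" or "WORD(1)"; the numeric suffix marks an alternate pronunciation.
_HEADWORD = re.compile(r"^(?P<word>.+?)(?:\((?P<variant>\d+)\))?$")


def parse_cmudict_line(line):
    """Return (word, phones) for one entry line, or None for blanks/comments."""
    line = line.strip()
    if not line or line.startswith(";;;"):
        return None
    head, *phones = line.split()
    word = _HEADWORD.match(head).group("word").lower()
    return word, phones


def load_entries(lines):
    """Collect every pronunciation of each word into a dict of lists."""
    entries = {}
    for line in lines:
        parsed = parse_cmudict_line(line)
        if parsed is not None:
            word, phones = parsed
            entries.setdefault(word, []).append(phones)
    return entries


# Hypothetical sample lines in the assumed format (not copied from the real file).
SAMPLE = [
    ";;; comment line",
    "HELLO  HH AH0 L OW1",
    "HELLO(1)  HH EH0 L OW1",
    "CAT  K AE1 T",
]

entries = load_entries(SAMPLE)
```

The digits on the vowels are stress markers (0 = unstressed, 1 = primary, 2 = secondary), which matters if you later convert the phones to IPA.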

edit 2, in response to OP's edit:

  1. Converting from ARPAbet to IPA is deterministic, so again, Wikipedia is your friend, as long as broad transcription is acceptable (see note below).

  2. Depending on the language, you may not need a pronunciation dictionary. German, Japanese and Korean are examples of languages that have a deterministic mapping of grapheme to phoneme. English orthography is a hideous mutt of historical accident, so sometimes there's really no way to tell how a word will be said without just memorizing it. French is horrible, too. I'm not sure about Arabic. I'd ask people who do automatic speech recognition in your target language (googling should bring you to some researchers' homepages).

"note below": 99.99% of the time, in real-world engineering usage, broad transcription is acceptable. IPA transcription can get insanely narrow, describing phonetic attributes like aspiration, specific articulatory gestures, etc. that don't "exist" in a speaker's conscious knowledge of their language because they're not phonemic, meaning that they can't be used to signal the difference between two words with different meanings.
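As a rough illustration of the table-lookup conversion described in item 1, here is a minimal sketch in Python. The mapping follows the commonly cited ARPAbet-to-IPA correspondences for General American; the function name is hypothetical, and the handling of stress is an assumption on my part: stress digits are dropped from the output, except that unstressed `AH0` and `ER0` are written as ə and ɚ. The result is a broad transcription only.

```python
# ARPAbet base symbols mapped to broad IPA (General American conventions).
ARPABET_TO_IPA = {
    # vowels
    "AA": "ɑ", "AE": "æ", "AH": "ʌ", "AO": "ɔ", "AW": "aʊ",
    "AY": "aɪ", "EH": "ɛ", "ER": "ɝ", "EY": "eɪ", "IH": "ɪ",
    "IY": "i", "OW": "oʊ", "OY": "ɔɪ", "UH": "ʊ", "UW": "u",
    # consonants
    "B": "b", "CH": "tʃ", "D": "d", "DH": "ð", "F": "f",
    "G": "ɡ", "HH": "h", "JH": "dʒ", "K": "k", "L": "l",
    "M": "m", "N": "n", "NG": "ŋ", "P": "p", "R": "ɹ",
    "S": "s", "SH": "ʃ", "T": "t", "TH": "θ", "V": "v",
    "W": "w", "Y": "j", "Z": "z", "ZH": "ʒ",
}

# Vowels that are conventionally written differently when unstressed (digit 0).
UNSTRESSED = {"AH": "ə", "ER": "ɚ"}


def arpabet_to_ipa(phones):
    """Convert a list of ARPAbet phones (e.g. ["K", "AE1", "T"]) to an IPA string."""
    out = []
    for phone in phones:
        base = phone.rstrip("012")    # strip the trailing stress digit, if any
        stress = phone[len(base):]    # "", "0", "1" or "2"
        if stress == "0" and base in UNSTRESSED:
            out.append(UNSTRESSED[base])
        else:
            out.append(ARPABET_TO_IPA[base])  # raises KeyError on unknown symbols
    return "".join(out)


print(arpabet_to_ipa("HH AH0 L OW1".split()))  # həloʊ
print(arpabet_to_ipa("K AE1 T".split()))       # kæt
```

If stress marks are needed in the output, the digits 1 and 2 could be turned into ˈ and ˌ, but placing them correctly requires syllabification, which this sketch does not attempt.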

svick
boblannon
  • Thanks. Are you sure French orthography is horrible? From Wikipedia: "... there are rules governing French orthography which allow for a reasonable degree of accuracy when producing French words from their written forms. The reverse operation, producing written forms from a pronunciation, fails with a higher frequency." Perhaps you mean it's not as horrible as English and not as easy as German. – Real Dreams May 24 '13 at 02:53
  • Yeah, that may be true. I haven't done French phonetic transcription before, so I was basing it on my experience reading French. It's totally possible that the ugliness is only in one direction. – boblannon May 29 '13 at 17:20
7

The MRC Psycholinguistic Database is similar to what you're looking for. It is somewhat out of date in terms of design, but it does allow lookup of English words, pronunciation, and a number of other psychologically interesting facts about words.

ted.strauss
  • I can't see any license/copyright information on the reference page at http://websites.psychology.uwa.edu.au/school/MRCDatabase/mrc2.html – ted.strauss May 22 '13 at 15:48