I have a picture containing some text. It was typeset by computer in an alphabet unknown to me. Is there a way to use Mathematica to tell me which alphabet or language was used? Here are two versions of the same text, written in alphabets I don't recognize.
1 Answer
Here's the list of all languages supported by TextRecognize in v12.1.
languages = {"Afrikaans", "Albanian", "Azerbaijani", "Belarusian", "Bosnian",
"Bulgarian", "Catalan", "Cebuano", "ChineseSimplified",
"ChineseTraditional", "Croatian", "Czech", "Danish", "Dutch",
"English", "Esperanto", "Estonian", "Finnish", "French", "Galician",
"Georgian", "German", "Greek", "Haitian", "Hungarian", "Icelandic",
"Indonesian", "Irish", "Italian", "Japanese", "Kazakh", "Kirghiz",
"Korean", "Lao", "Latin", "Lithuanian", "Macedonian", "Malay",
"Norwegian", "Polish", "Portuguese", "Romanian", "Russian",
"Serbian", "Slovak", "Slovenian", "Spanish", "Swahili", "Swedish",
"Tajik", "Turkish", "Ukrainian", "Uzbek", "Vietnamese", "Welsh"};
The first run takes a long time because it has to download all the languages, so I recommend removing any languages from the list that you know aren't relevant. The code below recognizes your text once per language and produces a list of pairs of the form {text, strength}, where strength indicates how good the match is:
img = Import["https://i.stack.imgur.com/j9NXm.jpg"];
{#, TextRecognize[img, "Line", {"Text","Strength"}, Language -> #]}&/@languages;
I slimmed down the list of languages to demonstrate:
results = {#, TextRecognize[img, "Line", {"Text", "Strength"},
Language -> #]} & /@ {"English", "French", "Japanese", "Lao", "Thai"}
(**
English {gyaaniia,0.}
French {NN,0.}
Japanese {ココ!せっ,0.15696}
Lao {ປາງເຄານິວ,0.610667}
Thai {ยาวเกานิว,0.941698}
**)
You could select the best one using: First[MaximalBy[results, #[[2, 2]] &]] which gives you:
{"Thai", {"ยาวเกานิว", 0.941698}}
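Picking only the single maximum hides how close the runner-up was. As a minimal sketch, the results can instead be ranked by strength and filtered with a cutoff. The scores below are copied from the output above rather than recomputed, and minStrength is a hypothetical threshold chosen for illustration, not a value documented for TextRecognize.

```wolfram
(* Scores copied from the output shown above; no OCR is re-run here. *)
results = {
   {"English", {"gyaaniia", 0.}},
   {"French", {"NN", 0.}},
   {"Japanese", {"ココ!せっ", 0.15696}},
   {"Lao", {"ປາງເຄານິວ", 0.610667}},
   {"Thai", {"ยาวเกานิว", 0.941698}}
   };

(* Sort by descending strength; #[[2, 2]] is the strength of each entry. *)
ranked = SortBy[results, -#[[2, 2]] &];

(* minStrength is a hypothetical cutoff, chosen only for this illustration. *)
minStrength = 0.5;
plausible = Select[ranked, #[[2, 2]] >= minStrength &];

(* Names of the languages that pass the cutoff, best first. *)
plausible[[All, 1]]
(* {"Thai", "Lao"} *)
```

Here Lao also passes the cutoff, which suggests that a moderately high strength alone may not rule out a different script that looks similar, so it is worth inspecting the runner-up before trusting the top match.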
flinty


img = Import["https://i.stack.imgur.com/j9NXm.jpg"]; TextRecognize[img, Language -> "Thai"] gives ยาวเกานิว, but this is slightly incorrect. According to Google Translate it should be "ยาวเก้า นิ้ว", which translates to "Nine Inches Long". – flinty Jun 13 '20 at 14:23
… TextRecognize in flinty's answer, so unless there is some special argument or something that cannot be given to Tesseract through the Mathematica interface, TextRecognize should be sufficient. – C. E. Jun 13 '20 at 16:14