
I started a chatbot project for regional languages that are typed in English letters, like the example below. I need a dataset to train my model in TensorFlow. Is there a dataset built into TensorFlow, or any framework available for this? If so, how do I use it? I checked the Awesome Public Datasets list but could not find one, and I also checked TensorFlow Sonnet and related frameworks with no luck. Please help with a solution. I have almost finished the Android app, but it performs very poorly with the dataset I have.

Tamil written in English letters:

"Neenga Epdi Irukenga?" ==> "How are you ?"

More Info

Keyboard support is very limited for most regional languages, so to hold a conversation in a regional language, people type it in English letters the way it is pronounced.

Additional Info

This is similar to transliteration, but transliteration works on individual words; the dataset I need is for meaningful sentences.
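What is being asked for here is a parallel corpus: each romanized Tamil sentence aligned with its English meaning, as in the "Neenga Epdi Irukenga?" example. As a minimal sketch, assuming a hypothetical tab-separated file (an illustrative format, not a real published dataset) with one sentence pair per line, such pairs could be loaded in plain Python like this:

```python
import csv
import io
from typing import List, Tuple

# Hypothetical corpus format (an assumption, not a real published dataset):
# one pair per line, romanized Tamil <TAB> English translation.
SAMPLE_TSV = "Neenga Epdi Irukenga?\tHow are you ?\n\n"


def load_pairs(fileobj) -> List[Tuple[str, str]]:
    """Read (romanized source, English target) sentence pairs from a TSV stream."""
    pairs = []
    reader = csv.reader(fileobj, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) != 2:
            continue  # skip blank or malformed lines
        src, tgt = row[0].strip(), row[1].strip()
        if src and tgt:
            # Lowercase only the romanized side, whose capitalization is informal
            pairs.append((src.lower(), tgt))
    return pairs


pairs = load_pairs(io.StringIO(SAMPLE_TSV))
sources = [src for src, _ in pairs]
targets = [tgt for _, tgt in pairs]
print(sources, targets)
```

The resulting `sources` and `targets` lists could then be handed to TensorFlow, for example with `tf.data.Dataset.from_tensor_slices((sources, targets))`, as input to a sequence-to-sequence model. Lowercasing only the source side is a design choice here, since romanized spellings vary in capitalization while the English side is kept as written.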

albert

1 Answer


If you are indeed looking for transliteration, you may want to try a programming library like Python's Unidecode. It does not produce human translations, but it takes Unicode text and transliterates it to ASCII.


# -*- coding: utf-8 -*-
from unidecode import unidecode

t = u'வணக்கம். எப்படி இருக்கிறீர்கள்?'  # "Hello. How are you?" in Tamil script

print(unidecode(t))  # Python 3 print(); transliterates the text to ASCII

This gives me:

> vnnkkm. epptti irukkirriirkll?

According to Google Translate, it should be something like this:

> Vaṇakkam. Eppaṭi irukkiṟīrkaḷ?


I've found this library to work quite well for most alphabets, but Tamil may not be as well supported.

From the library documentation:

> So a good rule of thumb is that the further the script you are transliterating is from Latin alphabet, the worse the transliteration will be.
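The Unidecode output above is consistent with it looking up each code point on its own: Tamil consonants carry an inherent "a" that is never written, so a per-character table yields "vnnkkm" instead of "vaṇakkam". A minimal sketch of a rule-based alternative follows, assuming only a small subset of Tamil letters and an ISO 15919-style Latin scheme; the tables and the function name `transliterate_tamil` are hypothetical and not part of Unidecode. It handles the inherent vowel, dependent vowel signs, and the pulli (virama) that marks a bare consonant:

```python
# Hypothetical, minimal rule-based Tamil -> Latin transliterator.
# Covers only a subset of Tamil letters; the ISO 15919-style output is an assumption.

VIRAMA = "\u0BCD"  # pulli: marks a consonant with no vowel

CONSONANTS = {
    "\u0B95": "k", "\u0B99": "ṅ", "\u0B9A": "c", "\u0B9E": "ñ",
    "\u0B9F": "ṭ", "\u0BA3": "ṇ", "\u0BA4": "t", "\u0BA8": "n",
    "\u0BAA": "p", "\u0BAE": "m", "\u0BAF": "y", "\u0BB0": "r",
    "\u0BB2": "l", "\u0BB5": "v", "\u0BB4": "ḻ", "\u0BB3": "ḷ",
    "\u0BB1": "ṟ", "\u0BA9": "ṉ",
}

# Dependent vowel signs, written after a consonant
VOWEL_SIGNS = {
    "\u0BBE": "ā", "\u0BBF": "i", "\u0BC0": "ī", "\u0BC1": "u",
    "\u0BC2": "ū", "\u0BC6": "e", "\u0BC7": "ē", "\u0BC8": "ai",
    "\u0BCA": "o", "\u0BCB": "ō", "\u0BCC": "au",
}

# Independent vowels, used where a syllable starts with a vowel
INDEPENDENT_VOWELS = {
    "\u0B85": "a", "\u0B86": "ā", "\u0B87": "i", "\u0B88": "ī",
    "\u0B89": "u", "\u0B8A": "ū", "\u0B8E": "e", "\u0B8F": "ē",
    "\u0B90": "ai", "\u0B92": "o", "\u0B93": "ō", "\u0B94": "au",
}


def transliterate_tamil(text: str) -> str:
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if ch in CONSONANTS:
            out.append(CONSONANTS[ch])
            if nxt == VIRAMA:
                i += 2  # bare consonant: pulli suppresses the vowel
                continue
            if nxt in VOWEL_SIGNS:
                out.append(VOWEL_SIGNS[nxt])
                i += 2  # consonant followed by an explicit vowel sign
                continue
            out.append("a")  # no sign: add the inherent vowel "a"
            i += 1
            continue
        # Independent vowels are mapped; anything else (spaces, punctuation,
        # unmapped letters) is copied through unchanged.
        out.append(INDEPENDENT_VOWELS.get(ch, ch))
        i += 1
    return "".join(out)


print(transliterate_tamil("வணக்கம். எப்படி இருக்கிறீர்கள்?"))
```

For the sample sentence this prints "vaṇakkam. eppaṭi irukkiṟīrkaḷ?", which matches the Google Translate romanization apart from capitalization. It romanizes formal written Tamil only; it does not produce informal spellings such as "Epdi" from the question.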

philshem
Super, really nice, you got the context. Yes, it's similar to transliteration. Is there a good dataset available for building a chatbot application on that (https://arxiv.org/pdf/1610.09565.pdf)? – krishnakumar sekar May 17 '17 at 13:12