
This might be slightly off topic since my main goal is to hyphenate strings in Java, but please bear with me.

So far I've only looked at the TeXHyphenator-J and Hyphenation libraries, and both of them use LaTeX hyphenation patterns.

I would like to hyphenate strings for US English, ideally taking into account medical terms which tend to be lengthy.

I've tried multiple US patterns found on CTAN but noticed the hyphenation isn't quite correct. For example, for an input string like

Deoxyribonucleic acid, sternocleidomastoid, dicotiledoneas,Pneumonoultramicroscopicsilicovolcanoconiosis

I've got results like these:

De-oxyri-bonu-cle-ic acid, ster-n-oclei-do-mas-toid, di-cotile-doneas,P-neu-monoul-tra-mi-cro-scop-ic-sil-i-co-vol-canoco-nio-sis

Deoxy-ri-bo-nucleic a-ci-d, s-tern-oclei-do-mas-to-i-d, di-co-ti-le-do-ne-as,P-neu-mo-noul-tra-mi-cros-co-pic-si-li-co-vol-ca-no-co-nio-sis

Are there any LaTeX patterns for hyphenating medical terms in (US) English? If so, where can I find them?

If not, is it possible to create such patterns? Where would I start?

1 Answer


If you really want to hyphenate medical terminology correctly, it amounts to preparing a new language for TeX. TeX uses Liang's algorithm, and its hyphenation patterns are generated specifically for that algorithm. TeX distributions include a program, patgen, that generates hyphenation patterns from a (large) list of correctly hyphenated words.
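Since the goal is Java, it may help to see how patterns are applied by Liang's algorithm. What follows is only a minimal sketch, not the code of TeXHyphenator-J or any other library: the class name `LiangSketch` and the `leftMin`/`rightMin` parameters are made up for illustration, and it assumes patterns in the usual TeX notation with single-digit levels. Every substring of the dot-delimited word is looked up in the pattern table, the highest level wins at each inter-letter position, and an odd level permits a break.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class; not part of any existing library.
class LiangSketch {
    // Maps the letters of a pattern to its levels (one per gap, so letters + 1 entries).
    private final Map<String, int[]> patterns = new HashMap<>();
    private final int leftMin;   // minimum number of letters before the first break
    private final int rightMin;  // minimum number of letters after the last break

    LiangSketch(List<String> rawPatterns, int leftMin, int rightMin) {
        this.leftMin = leftMin;
        this.rightMin = rightMin;
        for (String p : rawPatterns) {
            addPattern(p);
        }
    }

    // Splits e.g. "hy3ph" into the key "hyph" and the levels [0, 0, 3, 0, 0].
    private void addPattern(String pattern) {
        StringBuilder letters = new StringBuilder();
        List<Integer> levels = new ArrayList<>();
        levels.add(0); // gap before the first letter
        for (char c : pattern.toCharArray()) {
            if (Character.isDigit(c)) {
                levels.set(levels.size() - 1, c - '0');
            } else {
                letters.append(c);
                levels.add(0); // gap after this letter
            }
        }
        int[] arr = new int[levels.size()];
        for (int i = 0; i < arr.length; i++) {
            arr[i] = levels.get(i);
        }
        patterns.put(letters.toString(), arr);
    }

    // Returns the word with '-' inserted at every permitted break.
    String hyphenate(String word) {
        String w = "." + word.toLowerCase() + "."; // dots mark the word boundaries
        int[] scores = new int[w.length() + 1];    // scores[j] is the gap before w[j]
        for (int start = 0; start < w.length(); start++) {
            for (int end = start + 1; end <= w.length(); end++) {
                int[] lv = patterns.get(w.substring(start, end));
                if (lv == null) {
                    continue;
                }
                for (int k = 0; k < lv.length; k++) {
                    scores[start + k] = Math.max(scores[start + k], lv[k]);
                }
            }
        }
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < word.length(); i++) {
            // The gap before word[i] is scores[i + 1] because of the leading dot.
            boolean allowed = i >= leftMin && word.length() - i >= rightMin;
            if (allowed && scores[i + 1] % 2 == 1) {
                out.append('-');
            }
            out.append(word.charAt(i));
        }
        return out.toString();
    }
}
```

With the nine patterns usually quoted for this word (`hy3ph`, `he2n`, `hena4`, `hen5at`, `1na`, `n2at`, `1tio`, `2io`, `o2n`) and minimums of 2 and 3, `new LiangSketch(patterns, 2, 3).hyphenate("hyphenation")` gives `hy-phen-ation`. The quality of the result depends entirely on the pattern list, which is why medical vocabulary needs patterns generated from a medical word list.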