In Polish and Czech typography, short words should never be left at the end of a line; hyphenating a longer word is preferred instead. Unfortunately, typing a non-breaking space (~) after every short word is a lot of work, since such words are very common. A simple general rule would be to automatically convert every space following a word of three or fewer letters into a non-breaking space. (Besides, such a solution simply looks more elegant in any language written in Latin script.)
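As a rough illustration of that rule, here is a minimal Python preprocessing sketch (not a TeX-side solution). It assumes the .tex source is UTF-8 prose with no math, verbatim or macro arguments in the processed text, and it puts LaTeX's ~ after every whitespace-delimited token of one to three letter-like characters. The function name `tie_short_words` is hypothetical, and counting digits as letter-like (so that "5 kg" is tied too) is my own assumption, motivated by the point about numbers below.

```python
import re

# A "short word": 1-3 word characters other than "_" (Python's \w is
# Unicode-aware, so this covers ą, ř, β ... and also digits), starting at the
# beginning of the text or right after whitespace. Requiring whitespace before
# it means the "S" in "Fe-S" or a control sequence like "\in" is not treated
# as a separate word.
SHORT_WORD = re.compile(
    r"(?<!\S)([^\W_]{1,3})"      # the short word itself
    r"(?:[ \t]+\n?|\n)[ \t]*"    # following spaces, crossing at most one newline
    r"(?=\S)"                    # ...and more text after them (not a blank line)
)


def tie_short_words(text: str) -> str:
    """Replace the space after every 1-3 character word with LaTeX's ~."""
    return SHORT_WORD.sub(lambda m: m.group(1) + "~", text)


# tie_short_words("Ala ma kota i psa, a pies waży 5 kg.")
# -> "Ala~ma~kota i~psa, a~pies waży 5~kg."
```

Because a blank line is never crossed, paragraph breaks survive. Note that a pure length rule still misses numbers longer than three characters, such as "1000 mg" or "2.5 mg".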
In fact, for biological or chemical texts the rule should also extend to hyphens: a hyphen immediately following a word of three or fewer letters should be non-breaking, so that, e.g., an element symbol before the name of a chemical compound, or a Greek letter in front of a protein name, is not separated from what follows.
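The hyphen rule can be sketched the same way. This assumes the document loads amsmath, whose `\nobreakdash` command suppresses a line break after the hyphen that follows it, and that Greek letters are typed directly as Unicode characters (β) rather than as `$\beta$`. The function name is again hypothetical.

```python
import re

# A word of 1-3 letter-like characters (Unicode letters or digits) directly
# followed by one hyphen and then another word character, e.g. the "Fe" in
# "Fe-S" or the "β" in "β-keratin". The lookbehind requires the word to start
# the text or follow a non-word character other than a backslash, so control
# sequences such as "\in" are skipped. "--" and "---" never match, because the
# hyphen must be followed by a word character.
SHORT_PREFIX_HYPHEN = re.compile(r"(?<![\w\\])([^\W_]{1,3})-(?=\w)")


def nobreak_short_hyphens(text: str) -> str:
    """Turn "X-" into "X\\nobreakdash-" when X has at most three characters."""
    return SHORT_PREFIX_HYPHEN.sub(lambda m: m.group(1) + r"\nobreakdash-", text)


# nobreak_short_hyphens("β-keratin and Fe-S clusters")
# -> r"β\nobreakdash-keratin and Fe\nobreakdash-S clusters"
```

Chains such as "Na-K-ATPase" are handled too, since each short part is preceded by a hyphen rather than a letter.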
All existing solutions to the problem either rely on a dictionary of words after which a non-breaking space is inserted, or cover only single-letter words. This does not work for many technical texts, because the rule should also take numbers into account (in properly typeset text, for example, a unit symbol is never separated from the number it belongs to). So far I have found no way to extend the code to cover any group of up to three characters.
(No MWE, as IMO the matter is too general for one to be of any use.)
EDIT: On further consideration, I think this question can be split into two parts. The first is defining a list of non-breaking strings, which should include the explicit hyphen (-), the en dash (–), the em dash (—), the colon (:) and the colon surrounded by spaces ( : ), since all of these characters can appear as parts of a word in a chemical context. The second is detecting words made of fewer than four letter-like characters (this must include letters outside the basic Latin script, e.g. Greek, as well as digits) and converting the spaces after them into non-breaking spaces, which I think could be done with a regex as a last resort.
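The first part, the list of non-breaking strings, could be expressed as a small replacement table applied before the short-word rule. This is a sketch under the same assumptions (amsmath loaded, Unicode input, only prose passed through it): hyphens and en/em dashes between two word characters become `\nobreakdash-`, `\nobreakdash--` or `\nobreakdash---`, and a spaced colon becomes `~:~`. A colon with no spaces around it is left untouched, since TeX does not normally offer a line break there. The table and all names are my own choices, not an established convention.

```python
import re

# Hypothetical replacement table: intra-word dash -> amsmath \nobreakdash form.
# \nobreakdash is documented for the ASCII -, -- and --- input forms, so the
# Unicode en/em dashes are rewritten as the -- and --- ligatures.
DASHES = {
    "-": r"\nobreakdash-",
    "--": r"\nobreakdash--",
    "---": r"\nobreakdash---",
    "\u2013": r"\nobreakdash--",   # en dash
    "\u2014": r"\nobreakdash---",  # em dash
}

# A dash counts as part of a word only when it sits between two word
# characters ("β-keratin", "1–10"); "-2pt" or "a - b" are left alone.
INTRA_WORD_DASH = re.compile(r"(?<=\w)(-{1,3}|\u2013|\u2014)(?=\w)")

# A colon with single spaces on both sides, between word characters ("Ca : Mg").
SPACED_COLON = re.compile(r"(?<=\w) : (?=\w)")


def protect_nonbreaking_strings(text: str) -> str:
    """Apply the non-breaking string list to a piece of prose."""
    text = INTRA_WORD_DASH.sub(lambda m: DASHES[m.group(1)], text)
    return SPACED_COLON.sub("~:~", text)


# protect_nonbreaking_strings("the Ca : Mg ratio") -> "the Ca~:~Mg ratio"
```

Restricting the dash rule to positions between word characters is what keeps negative lengths such as `\vspace{-2pt}` intact; anything inside math or macro arguments should still be kept away from this preprocessing.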
the, to, be for example. – David Carlisle Oct 06 '20 at 19:41