
Running latex from the current Debian stable on

\documentclass[british]{article}
\usepackage[british]{babel}
\begin{document}
\showhyphens{memorandum}
\end{document}

yields

mem-o-ran-dum

At the same time, https://www.ushuaia.pl/hyphen/?ln=en, which is, in its own words, “based on TeX system”, says

memor•andum

when choosing English (GB). This was also the hyphenation point in a text of mine as of May 2023 using one of my TeX installations (which is now gone, alas); the hyphenation has apparently changed in some direction.

These two hyphenation patterns have no common hyphenation point; at least one of them is probably very wrong. The 5th, 6th, 7th, and 9th printed editions of Oxford Advanced Learner's Dictionary of Current English by A S Hornby say

memo•ran•dum

The New Oxford Spelling Dictionary, The Writers' and Editors' Guide to Spelling and Word Division by Maurice Waite, 2005, ISBN 0-19-860881-0, ISBN 978-0-19-860881-3 (cf. an online library), says on page 313,

memo|ran¦dum
memo|randa
memo|ran¦dums

The bar | is a primary division point (“at which a word can be divided under almost any circumstances”) and the broken bar ¦ is a secondary division point (“at which a word is best divided only if absolutely necessary […]”); cf. page x. Other words in the dictionary may have more than one primary or secondary division point, so what to do with this distinction in (La)TeX is a matter of taste or convention, and the word list should be interpreted according to the guide at the start of the book anyway.

Being a non-native speaker, I can only suggest that the version from the spelling-and-word-division dictionary is probably authoritative; cf. http://comp.text.tex.narkive.com/9kbsxiTL/british-english-hyphenation#post3.

I have access neither to the most recent edition of the aforementioned dictionary nor to The Oxford minidictionary of spelling and word division. I also don't have access to the data in https://www.oed.com/dictionary/memorandum_int, as they are behind a paywall. If anyone has access and would like to confirm/refute using these sources, please yell.

I kindly ask the British native speakers among you to double-check and, if necessary, adjust or suggest adjusting the UK-hyphenation patterns.

An e-mail concerning the issue has been sent to the maintainers of ushuaia.pl. Hyph-utf8 has received an issue report, too.

  • I doubt most native speakers could say which is "correct". It's not a case of right and wrong, just whatever the human compiler of the dictionary preferred. I am British but think hyphenation after r as found by the UK patterns looks rather strange. (But I usually use the default US patterns.) – David Carlisle Jul 22 '23 at 14:47
  • @DavidCarlisle After I finally installed the right package texlive-lang-english, it'd be still nice to see whether memo•ran•dum (from an old paper dictionary) or memor•andum (from the current TeX) is right(TM) :-). –  Jul 22 '23 at 15:28
  • as I say the oxford version you show looks best so \hyphenation{memo-ran-dum} would over-ride either set of patterns, but asking whether it is right or wrong can't have a simple answer – David Carlisle Jul 22 '23 at 15:32
  • @DavidCarlisle Yes, maybe. On the other hand, it's a pattern from a LEARNER'S (viz., for schoolchildren) dictionary, and I have NOT looked up where they took the hyphenation positions from: I had the dictionary for only about 5 minutes, and now it's no longer at my disposal :-(. –  Jul 22 '23 at 17:13
  • That makes no sense to me, sorry. I spent every year up to age 18 in English schools and I'm sure the word "hyphenation" was never used, and no dictionaries I used showed hyphenation. The whole concept is a typographical feature, not anything for normal use. – David Carlisle Jul 22 '23 at 17:19
  • I've taken the liberty of adding the "hyphenation" tag. Somewhere (in an inaccessible box) I have a copy of the spelling/hyphenation dictionary that was allegedly used to develop the British patterns. When I can dig that out, I want to check this. British hyphenation is supposedly based on etymology rather than syllabification, so one hyphenation point, after "r", is plausible, even if it doesn't look good to me. – barbara beeton Jul 22 '23 at 18:32
  • @barbarabeeton Thx for the hyphenation tag! Would be great if you could double-check the word against your British-English sources. –  Jul 22 '23 at 22:28
  • @barbarabeeton I've just checked the dictionary that seems to be authoritative according to DBW, whoever he may be. –  Jul 24 '23 at 10:43
  • @AlMa0 -- Thanks for that reference; I wasn't aware of it. But I have found odd things in the hyphenation dictionary that I understand to be the one used to develop the UK patterns. See https://tex.stackexchange.com/a/171155 for a scanned page of that reference showing an identified peculiarity. – barbara beeton Jul 24 '23 at 13:49
  • @barbarabeeton I failed to find the contents of “the oxford minidictionary of spelling and word division” online. If you have it, please feel free to look up “memorandum” there. If the minidictionary also happens to say “memo|ran¦dum”, we'd better file a bug report against babel-english. –  Jul 24 '23 at 17:09
  • @AlMa0 -- When/if I manage to unearth it, I expect to find that the listing for "memorandum" agrees with what the patterns produce. The real problem, I think, is that current education doesn't give an adequate grounding in the etymological origins of many words, and the hyphenation on that basis is therefore confusing, or even (unintentionally) misleading, based on current pronunciation. – barbara beeton Jul 24 '23 at 18:17
  • @barbarabeeton Alternate may be a verb, an adjective, or a noun. Therefore, differences in hyphenation are not unexpected; after all, we might be trying to put different concepts under the same roof there (which needn't succeed). On the contrary, memorandum is, AFAIK, always a noun, so differences in hyphenation between the mini and nonmini spelling dictionaries would be more surprising for this word. As for missing education, I agree. I have to admit that I'm missing proper Latin lessons and proper English lessons including etymology. –  Jul 24 '23 at 18:39
  • @DavidCarlisle Or maybe it would really have a good outcome because other texts might automatically get better. As for “not bad”, this is your view, and I get it. No argument here. And for a philologist with a PhD, it might hypothetically be really bad. Some folks can be really nitpicking … –  Jul 24 '23 at 19:53
  • @DavidCarlisle I understand. Not a bad goal by itself. And when I run diffpdf each time after a texlive upgrade on my old documents and their recompiled versions (if they recompile at all), I see that some of my documents come out different. Yes, page breaks move. The goal of keeping the page breaks where they are forever is attainable only at the expense of stopping the progress completely. I'm sure you don't want to do that. –  Jul 24 '23 at 20:14
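
The \hyphenation override suggested in the comments above can be sketched as follows. This is a minimal example, not a recommendation: the division points are the ones quoted in the question from the New Oxford Spelling Dictionary, and treating its primary and secondary points as equally permissible is an assumption made here for illustration.

```latex
\documentclass{article}
\usepackage[british]{babel}
\begin{document}
% Exceptions apply to the language that is active where they are declared
% (here: British English, the document's main language).
% Division points as quoted from the dictionary; merging primary (|) and
% secondary (¦) points into plain hyphens is an illustrative assumption.
\hyphenation{memo-ran-dum memo-ran-dums memo-randa}
\showhyphens{memorandum memorandums memoranda}
\end{document}
```

The exception list takes precedence over whichever patterns are loaded, so the result no longer depends on whether the UK or the US patterns are in use.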

2 Answers


The UK and US patterns have not changed in TeX.

US gives mem-o-ran-dum
UK gives memor-andum

If you get mem-o-ran-dum from the document shown you do not have the UK patterns installed.

On Debian-packaged texlive, ensure that the package texlive-lang-english is installed.
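
One way to check whether the UK patterns are really in use is to show the same word under both language settings in one document. This is only a sketch; it assumes that both `american` and `british` are available as babel options on the installation being tested.

```latex
\documentclass{article}
% The last option (british) becomes the document's main language.
\usepackage[american,british]{babel}
\begin{document}
% Each \showhyphens writes the hyphenated word to the terminal and the log,
% using the patterns of the language selected at that point.
\selectlanguage{american}\showhyphens{memorandum}
\selectlanguage{british}\showhyphens{memorandum}
\end{document}
```

With both pattern sets installed, the two log entries should differ in the way described above. If both show the US result, the British setting is presumably falling back to the US patterns, which is consistent with a missing pattern package.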

David Carlisle
  • Hm. I do have /usr/share/texlive/texmf-dist/tex/generic/babel-english/british.ldf , which loads english.ldf, which also exists as /usr/share/texlive/texmf-dist/tex/generic/babel-english/english.ldf . –  Jul 22 '23 at 14:40
  • Log says `(/usr/share/texlive/texmf-dist/tex/generic/babel-english/british.ldf Language: british 2017/06/06 v3.3r English support from the babel system

    (/usr/share/texlive/texmf-dist/tex/generic/babel-english/english.ldf Language: english 2017/06/06 v3.3r English support from the babel system Package babel Info: Hyphen rules for 'british' set to \l@english (babel) (\language0). Reported on input line 82. Package babel Info: Hyphen rules for 'UKenglish' set to \l@english (babel) (\language0). Reported on input line 83. `

    –  Jul 22 '23 at 14:48
  • Package babel Info: Hyphen rules for 'canadian' set to \l@english (babel) (\language0). Reported on input line 102. Package babel Info: Hyphen rules for 'australian' set to \l@english (babel) (\language0). Reported on input line 105. Package babel Info: Hyphen rules for 'newzealand' set to \l@english (babel) (\language0). Reported on input line 108. ))) (/usr/share/texlive/texmf-dist/tex/generic/babel/locale/en/babel-british.tex –  Jul 22 '23 at 14:49
  • Package babel Info: Importing font and identification data for british (babel) from babel-en-GB.ini. Reported on input line 11. ) –  Jul 22 '23 at 14:49
  • It's strange that all variants default to language0. I thought that \language0 was for US English, not for UK English. –  Jul 22 '23 at 14:49
  • @AlMa0 - Merriam-Webster, which is a solid reference for US English spelling and hyphenation issues, lists mem-o-ran-dum , i.e., three [3] hyphenation points. This is the same as when using babel's [US-]English patterns. – Mico Jul 22 '23 at 14:50
  • On the console output I see (/usr/share/texlive/texmf-dist/tex/generic/babel/locale/en/babel-ukenglish.tex). So probably, the UK patterns at least try to get loaded. –  Jul 22 '23 at 14:52
  • @DavidCarlisle So if \language0 is NOT British, any idea on what might be wrong with my TeX installation? Should I report to the Debian folks? –  Jul 22 '23 at 14:52
  • @AlMa0 - Please compile the following test document: \documentclass[british]{article} \usepackage{babel} \setlength\textwidth{1sp} \setlength\parindent{0sp} \begin{document}\hspace{0pt}memorandum \end{document}. You should get memor-andum, i.e., a single hyphenation point. – Mico Jul 22 '23 at 14:54
  • @Mico Should. I get mem- o- ran- dum in Debian stable texlive :-(. –  Jul 22 '23 at 14:58
  • @DavidCarlisle I've just managed to install the current TeX Live locally. Using this, I get “memor-andum”. So is Debian stable broken in this regard? –  Jul 22 '23 at 15:00
  • @AlMa0 - Something about your TeX distribution is clearly off. Which operating system and which TeX distribution do you employ, and when did you last update your TeX distribution? – Mico Jul 22 '23 at 15:03
  • @Mico Debian 12 stable bookworm. –  Jul 22 '23 at 15:07
  • @DavidCarlisle Indeed. texlive-lang-english solved it. How could I have missed it! I guess, a warning on the console or in the log could have done it. Thank you! –  Jul 22 '23 at 15:26

It might surprise a non-native speaker of English, but hyphenation is not a skill that is normally taught at school! In handwriting one should only break lines at the end of words, and when word processing, you would allow the software to handle line breaks. Hyphenation is part of the typesetter's skill.

Dictionaries sometimes show the division of words into syllables, but these can only be a guide to where a good hyphenation point might be.

In this particular word, you can split at syllables, "mem-o-ran-dum", or (noting that "memo" is a common abbreviation, so it would be nice not to split it) "memo-ran-dum". Or you can look up the Latin word and note that it is formed from the root "memor", via the verb "memorare", with "-andum" being a gerundive verb ending, and split the word following the Latin: "memor-andum".

It's impossible to say which is right; it is a matter of judgement. All three hyphenation patterns are acceptable to me as a (fairly picky) native speaker. And fortunately, native speakers are rather flexible at dealing with a few words that have been hyphenated poorly. Too many hyphens are more of a problem.

Generally then, let the typesetter do its job. It might, on rare occasions, make a mistake when hyphenating, but most native speakers simply won't notice.
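
For the rare case where one particular break does look wrong, a single occurrence can be overridden by hand with the discretionary hyphen `\-`. The sketch below uses the "memo-ran-dum" split purely as an example; the sentence is made up.

```latex
\documentclass{article}
\usepackage[british]{babel}
\begin{document}
% \- marks the only permitted break points in this one occurrence;
% other occurrences of the word still use the loaded patterns.
A short memo\-ran\-dum was circulated.
\end{document}
```

This leaves the patterns untouched and affects only the word in which `\-` appears.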

James K
  • It's worth pointing out that US hyphenation is based on syllabification, while British hyphenation is generally based on etymology. (But consider what happens in the UK when having a need to hyphenate "helicopter".) – barbara beeton Jul 23 '23 at 15:13
  • Maybe not taught at school in the UK, but I've been taught it at school in my native language. Anyhow, I looked up the “authoritative” hyphenation (see the edited question). –  Jul 24 '23 at 10:59
  • That's good, but you do need to trust your typesetter! The creative job is to write the text. Hyphenation isn't a problem that I try to solve myself. The software does it better than the writer. – James K Jul 24 '23 at 12:42
  • @JamesK In the particular case resulting in my original post, I acted as a typesetter. –  Jul 24 '23 at 17:01
  • Indeed, but the whole point of TeX and friends is that you can get a computer to do the typesetting for you. It is possible to write PDF files by hand, but who wants to do that? – James K Jul 24 '23 at 17:24
  • @DavidCarlisle “Authoritative” means “by authority”. Do you doubt that OUP is an authority, at least as far as British English is concerned? This publisher is definitely an authority to me. I understood your point. Still, do you wish to say you know better than them in this particular case? No doubt they have arguments about certain words; in this particular case, they are way above my level, so if all their spelling dictionaries coincide on the hyphenation of “memorandum”, I'd see no point in upholding the old hyphenation. –  Jul 24 '23 at 19:35
  • @DavidCarlisle At least the new documents should be typeset right – meaning, respecting the norm, even if the norm simply means “following OUP”. How to make LaTeX do this is a different matter. –  Jul 24 '23 at 19:36
  • @JamesK Writing a PDF by hand is overkill. The original post resulted from a silently missing package. Had I not paid attention to hyphenation, that would have gone unnoticed. –  Jul 24 '23 at 19:39
  • OUP is an expert, not an authority. But my point is that none of this matters much. Let TeX be your typesetter, it does a pretty good job most of the time. It did a good job this time. The hyphenation patterns found some acceptable points to hyphenate. – James K Jul 24 '23 at 19:45
  • @DavidCarlisle I understand. As for the line breaks, they change all the time anyway. Every time texlive in Debian gets upgraded, line breaks in my documents change for all kinds of reasons. You might think that this is due to, say, \babelprovide[hyphenrules=ngerman-x-latest]{ngerman}, but also pure English texts are affected. NewTX fonts change, mathematics is no longer fragile, PSTricks calculates slightly differently, …. So, from a practical, subjective viewpoint, having a mechanism that divides words according to the current authority (or expert opinion) would be great. –  Jul 24 '23 at 19:57
  • @JamesK If you consider OUP more an expert than an authority, that's fine, too. Discrepancies between the Cambridge and Oxford styles have a long tradition, for example. Philologists, linguists, and typographers from both places are way better than me in these matters. –  Jul 24 '23 at 20:00
  • @DavidCarlisle Hasn't there been a ukhyphen.tex at some time, in analogy to ushyphex.tex, which does get updated? Or an analogon of \babelprovide[hyphenrules=ngerman-x-latest]{ngerman} (which does get updated, too) for British English? –  Jul 24 '23 at 20:38
  • @DavidCarlisle When OUP generously provided Dominik Wujastyk with a private list of allegedly 114925 British-hyphenated words, we were extremely lucky, but such a huge list can hardly be free of errors, and it has to keep up with the evolution of the language. So freezing the patterns and shoving everything else into \hyphenation{…} was, perhaps, not the best technical idea. Unless, of course, the patterns are no longer maintainable, which leaves us with a fixed artefact and \hyphenation{…}, probably mostly for legal reasons (which I can accept), but not for technical reasons. –  Jul 24 '23 at 21:01