24

Now, I know there are hyphenation differences between British and American English, but in no dictionary, British or American, could I find the word "alternate" (to take one example) broken as al·tern·ate instead of al·ter·nate. And yet:

\documentclass[a5paper]{article}
\usepackage[british]{babel}
\begin{document}
alternate alternate alternate alternate alternate alternate alternate alternate alternate alternate alternate alternate alternate alternate alternate alternate alternate alternate
\end{document}

alternate altern-

If you remove the babel line or change it to american, the hyphenation will be correct:

many alternates, all correct

So how does the British hyphenation work? Is it a complete reimplementation that can be buggy? Or is it just a list of exceptions where British syllabification differs from American? If it is the latter, why would a wrong hyphenation of "alternate" be made part of that word list?

Mico
Christian
  • 1
    I cannot reproduce this error. Your example gives the correct hyphenation, and \showhyphens{alternate} returns al-ter-nate in the log file. – Alex Apr 12 '14 at 12:00
  • 3
    Oh, interesting. My above comment is incomplete. With TeX Live 2009 I get al-ter-nate, with TeX Live 2012 it is al-tern-ate. – Alex Apr 12 '14 at 12:10
  • @Alex How odd! A hyphenation regression apparently. I'm using TeX Live 2013 and I do indeed get al-tern-ate with \showhyphens. – Christian Apr 12 '14 at 12:22
  • I can go back to TeX Live 2010 which gives al-tern-ate, with British hyphenation. But it was in 2010 that the hyphenation pattern files were reorganized. – egreg Apr 12 '14 at 12:54
  • The ukhyph.tex file in CTAN is dated 1996/09/10 (revision made in 2005), and doesn't show any difference from the hyph-en-gb.pat.txt file that's used in TeX Live 2013 for building the format, other than that the hyphenation exceptions have been moved to another file. The pattern file contains ltern3, which explains the hyphenation al-tern-ate. The word isn't included in the exception list. – egreg Apr 12 '14 at 13:11
  • @HarishKumar AFAIK, which variant of English you get when you use \usepackage[english]{babel} isn't well defined. It's American on most systems but it doesn't have to be. So don't use it. Ever. – Christian Apr 12 '14 at 13:15
  • @egreg I'm not sure I understand which file is responsible for what and why there's a difference now if the files aren't different ... but do I understand it correctly that the reorganization in 2010 introduced a bug that should be reported? – Christian Apr 12 '14 at 13:20
  • @Christian What I wanted to say is that the hyphenation pattern file has just changed its name and has been unmodified since at least 2005 (but probably the patterns date back to 1996). Unfortunately I can't access an older TeX Live until Monday. – egreg Apr 12 '14 at 13:22
  • I'm really tempted to write a script that automatically checks hyphenations of an English word list with a dictionary but neither do I have the time nor am I sure if it's legal to query dictionary websites in this way. – Christian Apr 12 '14 at 13:46
  • @Christian What I know is that the hyphenation patterns for British English were produced using a big file of hyphenated words generously made available by Oxford University Press. Of course the Liang algorithm is not completely failsafe, in the sense that a compromise has to be made in order to reduce the amount of patterns. Based on the information I got, I'd be surprised if TeX's hyphenation of “alternate” has changed in the last 18 years; but I can't investigate further until I can access the machine where I have several versions of TeX Live. – egreg Apr 12 '14 at 13:51
  • @egreg I don't expect correct hyphenations for names and other weird stuff but common dictionary words should be hyphenated correctly, no matter how many rules and exceptions it takes. Especially in English IMHO because there the hyphenation rules are especially unfit for human usage. I don't know how many times I stumbled on a weird hyphenation, looked it up, only to find that it is indeed correct. This time I was actually surprised that it wasn't. – Christian Apr 12 '14 at 15:07
  • 1
    @Christian I'm not saying al-tern-ate is correct, but that probably it has been like this all the time without anybody noticing it. – egreg Apr 12 '14 at 15:08
  • 1
    FWIW, there's a thread at tex-hyphen about a diverging hyphenation of the word catastrophe with LaTeX using Babel (catas-trophe) and Polyglossia (catastrophe) and plain TeX (catas-tro-phe), all using US hyphenation. None of the examples given shows a hyphenation between t and a, which according to pattern cat1a1s2 from file hyph-en-us.pat.txt should be valid. The discussion ended without results. Note, the script debug_spots.lua mentioned in the initial mail has been renamed to patternize.lua in the repository. – Stephan Hennig Apr 12 '14 at 16:29
  • @egreg This still begs the question how this incorrect hyphenation got into babel in the first place, given that they used a list from Oxford University Press which probably means that it should have the exact same entry as the one I linked to above (Oxford University Press actually releases two separate English dictionaries. The better known OED is so advanced, however, that I'm too stupid to make it tell me the hyphenation of "alternate", even though I can get beyond the paywall. So there is the possibility that the OED entry differs from the one I linked to.). – Christian Apr 12 '14 at 17:00
  • @Christian Hyphenation in TeX doesn't examine a long list of words. Patterns are prepared that should guarantee correct (albeit not full) hyphenation of the most common words and, hopefully, not introduce wrong hyphenation points. So it is really possible that some word slips off. – egreg Apr 12 '14 at 17:04
  • @StephanHennig I get catas-tro-phe with all three engines. If I add variant=usmax with Polyglossia, I get cat-a-stro-phe (which is of course wrong, but is a problem of usmax). – egreg Apr 12 '14 at 17:12
  • @egreg But didn't you say there was an exception list? Would make sense to me to devise rules that cover 99% of cases and put the 1% that slip through in a list. – Christian Apr 12 '14 at 18:08
  • @Christian There is an exception list, but it doesn't contain alternate. You can add it manually. – egreg Apr 12 '14 at 18:14
  • @egreg I honestly don't know why this communication seems to fail so horribly. I did add "alternate" to my hyphenation exceptions but I would expect babel to hyphenate normal English words correctly. Whether it uses a rule or an exception, I as a user don't really care. If they already used the OED or its one-volumed sibling, I don't really understand how there can be dictionary words that aren't covered. And if they deliberately did that to save space or something, I'd expect a list of known exceptions that aren't covered so I could just put them in a package and never have to care again. – Christian Apr 12 '14 at 18:21
  • @egreg I cannot reproduce the bad hyphenation catas-trophe with Babel anymore, either. So either something has been fixed in TeX Live in the meantime or I did something horribly wrong back then. Unfortunately, I don't know anymore why I used variant usmax with Polyglossia instead of us. But there's still something odd. Using Polyglossia with variant us with xelatex shows no valid hyphenations. This is of course even more unrelated to the original question than I originally expected. Sorry for the noise! – Stephan Hennig Apr 12 '14 at 18:48
  • @egreg Remembered. I had chosen variant usmax, because I wanted to use patterns from file hyph-en-us.pat.txt. Which still results in no valid hyphenation here, neither with lualatex nor xelatex with TeX Live 2013. But that is an off-topic issue. – Stephan Hennig Apr 12 '14 at 19:44
  • @Christian I looked on my oldest TeX Live (2007) and the pattern file is exactly the same; I could go back to a gwTeX distribution installed in 2006 and it's again the same. I can't try it, because the binaries were for a different processor, so I'd have to resurrect some old machine. But no change in the pattern file means no change in hyphenation. – egreg Apr 16 '14 at 16:25
  • @egreg Thanks for checking! Of course this makes it even stranger that Alex experienced differences in hyphenation then :/ – Christian Apr 16 '14 at 18:43
  • 1
    @Mico: Why the bounty? I don't see that it didn't get enough attention. And imho Barbara's answer shows that (at least at the time the hyphenation patterns were created) "al-tern-ate" is/was correct. – Ulrike Fischer Apr 07 '16 at 16:26
  • @UlrikeFischer - Please see the earlier comments exchanged between the OP and me, prompted by the puzzling downvote. I decided to offer a bounty as a mild protest, to state that, in my view at least, the downvote wasn't justified. (This comment will self-destruct in an hour...) – Mico Apr 07 '16 at 17:56
  • @Mico: Imho you are misusing the bounty system. I look from time to time at the featured questions and spent quite some time finding out what the problem with this question is, and now I'm feeling quite pissed off. Didn't you think about the side-effects? – Ulrike Fischer Apr 07 '16 at 18:01
  • @UlrikeFischer - I'm very sorry to have ticked you off. I must confess to not having anticipated such a reaction. FWIW, this is the first time I've awarded a bounty for this reason -- and most likely also the last time. – Mico Apr 07 '16 at 18:13
  • 1
    @UlrikeFischer -- please don't assume that i think that "al-tern-ate" is correct -- i don't, and i don't think it ever was! i don't even think the editors of the dictionary thought it was correct. i've considered the possibility that it's a "plant", an intentional error inserted so that someone checking on possible plagiarism can have a better case. if so, then the side effects for the british hyphenation patterns are very unfortunate! – barbara beeton Apr 07 '16 at 20:33
  • 1
    @barbarabeeton: I don't think that is a plant. If you look at your scan you can see quite a number of hyphenations after a consonant, e.g. amat-ory or almand-ine. – Ulrike Fischer Apr 07 '16 at 20:45
  • 1
    @UlrikeFischer -- i quite agree with your observation on what's there. but i'm really quite amazed that different oxford dictionaries have such different information. somehow i thought (hoped?) that syllabification was a more exact endeavour. – barbara beeton Apr 07 '16 at 20:48
  • @barbarabeeton: hyphenation is not only about syllabification but also reflects the history of the word (I was one of the few who knew that one should hyphenate "Ab-itur" (from ab-ire)) and tries to help a reader (which imho explains why English hyphenates leav-ing and not lea-ving.) So it is quite a mess ... – Ulrike Fischer Apr 07 '16 at 20:56
  • 2
    @UlrikeFischer -- definitely a mess! i will merely observe that while the british claim to hyphenate based on etymology, they don't follow their own dictum when it comes to "helicopter". – barbara beeton Apr 07 '16 at 21:01
  • @Christian - Many thanks! :-) We should probably delete the trail of comments, starting with the ones on April 4. I've gone ahead and deleted mine. – Mico Apr 09 '16 at 20:28

6 Answers

30

According to the Oxford dictionary the correct hyphenation in British English is

al-ter-nate

The patterns for British English were prepared in 1996 by Dominik Wujastyk, using a list of hyphenated words made available by Oxford University Press, and are present on CTAN as ukhyph.tex. In 2008, the team in charge of maintaining hyphenation patterns for TeX Live reorganized the material; here's the start of hyph-en-gb.tex:

% This file has been renamed from ukhyphen.tex to hyph-en-gb.tex in June 2008
% for consistency with other files with hyphenation patterns in hyph-utf8 package.
% No other changes made. See http://www.tug.org/tex-hyphen for more details.

% File: ukhyphen.tex
% TeX hyphenation patterns for UK English

Some lines later we can read

%       $Log: ukhyph.tex $
%       Revision 2.0  1996/09/10 15:04:04  ucgadkw
%       o  added list of hyphenation exceptions at the end of this file.
%
%
% Version 1.0a.  Released 18th October 2005/PT.
%
% Created by Dominik Wujastyk and Graham Toal using Frank Liang's PATGEN 1.0.
% Like the US patterns, these UK patterns correctly hyphenate about 90% of
% the words in the input list, and produce no hyphens not in the list
% (see TeXbook pp. 451--2).
%
% These patterns are based on a file of 114925 British-hyphenated words
% generously made available to Dominik Wujastyk by Oxford University Press.
% This list of words is copyright to the OUP and may not be redistributed.
% The hyphenation break points in the words in the abovementioned file is
% also copyright to the OUP.

so I argue that the hyphenation patterns have not changed since 1996, except for the addition of a hyphenation exception list that reads, in the original file,

\hyphenation{ % Do NOT make any alterations to this list! --- DW
uni-ver-sity
uni-ver-sit-ies
how-ever
ma-nu-script
ma-nu-scripts
re-ci-pro-city
through-out
some-thing}

and is exactly the same in the reorganized files.

It is true that alternate hyphenates as

al-tern-ate

as the following file to be run with pdflatex shows:

\makeatletter\language\l@british\showhyphens{alternate}\stop

that prints

Underfull \hbox (badness 10000) detected at line 0
[] \OT1/cmr/m/n/10 al-tern-ate

on the terminal.
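
For a side-by-side comparison, here is a minimal sketch of a complete document. It assumes that both the british and the american patterns are available in the installed format (as they should be in a full TeX Live installation); the second test word, alternation, is only an illustrative choice. The break points are written to the terminal and the log file, not to the PDF.

```latex
\documentclass{article}
% The last option (british) becomes the main language.
\usepackage[american,british]{babel}
\begin{document}
% British patterns are active here.
\showhyphens{alternate alternation}
% Switch to the American patterns and repeat the test.
\selectlanguage{american}
\showhyphens{alternate alternation}
\end{document}
```

Comparing the two Underfull \hbox messages in the log shows where the two pattern sets disagree.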

Hyphenation in TeX doesn't examine a long list of words, but rather uses a method based on patterns, described in Appendix H of the TeXbook. The patgen program distills a set of patterns from a list of hyphenated words, but some compromise has to be made for the efficiency of the algorithm, so it's entirely possible that some word slips through and turns out to be hyphenated incorrectly.
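
As a rough illustration of how a single pattern acts, take the pattern ltern3 that was pointed out in the comments on the question. In Liang's scheme the digits in a pattern are scores for the gaps between letters: all matching patterns are overlaid, the highest score in each gap wins, and a gap is a permitted break when its final score is odd. The sketch below traces only this one pattern by hand. The other patterns that match the word (including whichever ones supply the al-tern break) are not shown, and it is an assumption here that none of them puts a higher even score in the same gap, which is at least consistent with the \showhyphens output.

```latex
% Tracing one pattern by hand (comments only; nothing is typeset).
%
%   pattern:    l t e r n3
%   word:     a l t e r n a t e
%   result:   a l t e r n3a t e
%
% The score 3 lands in the gap between "n" and "a".
% 3 is odd, so that gap is a permitted break:  altern-ate
```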

That's what the hyphenation exception list is for. Until the problem is fixed, either by adding suitable patterns or by adding the word to the exception list, you can add it manually:

\documentclass[a5paper]{article}
\usepackage[british]{babel}

\babelhyphenation[british]{al-ter-nate}

\begin{document}
alternate alternate alternate alternate alternate alternate alternate 
alternate alternate alternate alternate alternate alternate alternate 
alternate alternate alternate alternate
\end{document}


The command \babelhyphenation requires babel version 3.9 or later; with an earlier version one can use

\begin{hyphenrules}{british}
\hyphenation{al-ter-nate}
\end{hyphenrules}

which has the same effect.

egreg
23

this answer will not be as elaborate as the one by egreg, but i have some different information.

essentially everything egreg says is correct, but the clue may lie in exactly which oxford dictionary was the basis for the list of hyphenated words that dominik used.

i have just come into possession of a copy of the dictionary that was purportedly used: "the oxford minidictionary of spelling and word division". in it, the word in question is presented as

al.tern|ate

where the period represents a broken vertical, a "less recommended" place for division.

i agree that i don't find this "attractive", and certainly would question it, but then, i'm from the left side of the pond. (i was offered this dictionary as an aid to my editing of tugboat; since i aim for consistency of style -- either british or u.s. -- i gratefully accepted. i admit to surprise in many instances looking through it, but as i said, i'm from the western side of the atlantic.)

edit: here is a scan of the relevant page of the cited dictionary. in no word beginning "altern" is there a hyphen after the "r"; if there is a hyphen, it's always after the "n" (which i don't understand), but in the case of "alternation", that location is avoided completely, with the primary hyphenation point before the "-tion". a true puzzlement.

scanned page of hyphenation dictionary

note: this image is from "the oxford minidictionary of spelling and word division", copyright by oxford university press, 1986, from a 1992 reprint. (i have neither requested nor received permission for this use.)

Update:
After discussion with a native British speaker, I was coerced into searching for an audio example of the pronunciation. I found a useful example at https://dictionary.cambridge.org/us/pronunciation/english/alternate which gives three forms (verb, adjective, and noun) in both UK and US pronunciations. The UK adjective is pronounced in this example with the stress on the second syllable. But so is the US adjective -- which is just plain wrong in my experience. (The pronunciation given for the US noun is also not what I learned, in any US regional variation.) So I concede that the UK pronunciation of the adjective may differ in the way that makes the hyphenation "al-tern-ate" appropriate. However, since the spelling of the three grammatical forms is uniform, this difference in hyphenation means that the word should be omitted entirely from resolution by the patterns, since no automatic grammatical distinction is possible. A conundrum.

  • 1
    Actually, this makes even less sense to me for British than American English (and I am from the east side but have also lived on the west side). – cfr Apr 12 '14 at 23:23
  • 1
    @cfr -- i sure don't disagree. when i next have access to the means to scan the relevant material, i will try to do so, and add a picture; that won't be until early may. – barbara beeton Apr 13 '14 at 06:09
  • One really has to wonder why they keep reinventing the wheel in Oxford. That would be the third independent English dictionary from the same publisher. Very enlightening answer though! – Christian Apr 13 '14 at 08:24
  • oxford doesn't reinvent the wheel, it merely categorises what it finds. the hyphenations (in the mini-dictionary) are designed to minimise confusion with the split word -- they're based on semantic chunking within the word. it's unfortunate that tex's algorithm can't be doing with these “less good” breakpoints that barbara noted (cf al.tern-ate). (i don't know whence the american patterns come, but one can imagine a similar arrangement being useful there, being as how the two languages are rather closely related.) – wasteofspace May 22 '14 at 13:34
  • 1
    @barbarabeeton I missed your answer; this seems a case where let not thy left hand know what thy right hand doeth applies. ;-) – egreg May 22 '14 at 13:35
  • @egreg -- does your comment refer to my failure to alert you to my answer (which took your name in vain), or to the (what i think is a dreadful) lapse by oxford? – barbara beeton May 22 '14 at 14:25
  • 3
    @barbarabeeton I wanted to say I'm sorry for having missed it when you posted it. The rest of the comment is about Oxford and its departments, which apparently ignore, or maybe fight against, each other. ;-) – egreg May 22 '14 at 14:32
  • I am using british babel option and I got dozens of strange hyphenations, quant-ity, intens-ity, mech-anical. https://www.ushuaia.pl/ (hunspell) agrees, but https://www.hyphenation24.com/ disagrees. Libre Office is also using hunspell and giving the same spelling. What should one do? – Pygmalion Feb 15 '21 at 18:23
  • Etymologically, alternate comes from the Latin alternāre. The divisions altern-ate and alterna-tion seem to be instances of the general rule to break a word before a suffix. Makes sense to me. In case of these words, the mini- and non-mini dictionaries of spelling differ. However, let's admit that for any old and rich language there will be dictionaries that differ on certain words. –  Jul 24 '23 at 16:56
10

Apologies for reviving this thread, but I didn't see this answer in here. I believe al-ter-nate and al-tern-ate to be two different words.

  • al-ter-nate: a verb meaning to take turns performing two different activities, e.g. I alternate between running and walking.
  • al-tern-ate: an adjective meaning every other, e.g. I go running on alternate Sundays.

They are pronounced differently, in the same way the word ate (as in to eat) is pronounced, and based on that it makes sense to hyphenate them in two different ways.
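
If one accepts this distinction, TeX cannot choose between the two forms automatically, since the patterns see only the spelling. A minimal sketch of marking the breaks by hand at each occurrence with \- follows. The break points are the ones proposed in this answer, not ones taken from a dictionary, and the sample sentences are the two given above.

```latex
\documentclass[a5paper]{article}
\usepackage[british]{babel}
\begin{document}
% Verb: breaks marked as al-ter-nate.
I al\-ter\-nate between running and walking.

% Adjective: breaks marked as al-tern-ate.
I go running on al\-tern\-ate Sundays.
\end{document}
```

A word that contains \- is broken only at the marked points, so each occurrence has to be marked individually; this is tedious but avoids any global exception that would be wrong for one of the two senses.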

Martin
  • 2
    Welcome to TeX.SE! I'm not in a position to judge if the proposed hyphenation is appropriate for the word in its adjectival meaning. However, if it's really the case that the word "alternate" has two different hyphenation patterns depending on whether it's a verb or an adjective, I'd say it should be treated, for hyphenation purposes, like the word "record", i.e., TeX should not hyphenate it after either the r or the n. (About "record": depending on its meaning, the correct hyphenation is either re-cord or rec-ord. To avoid ambiguity, hyphenation is disabled for this word.) – Mico Apr 01 '16 at 20:41
  • 1
    Interesting! It hadn't occurred to me that ALternate and alTERnate might be hyphenated differently. I couldn't find anything to back this up but it's at least plausible. – Christian Apr 02 '16 at 08:11
  • 7
    If this is the case, then \babelhyphenation[british]{al-ternate} is the only way out. – egreg Apr 04 '16 at 18:04
4

I believe I can perhaps add a touch of clarity to this discussion. Fowler's Modern English Usage, a standard style guide for [British] English published by the Oxford University Press, is likely to have been the source that informed the hyphenation practice. Fowler says of hyphenation that:

The problems of hyphenation at the line-end are compounded in newspapers by the narrowness of the columns and the customary assumption in most printing that the right-hand margin, like the left-hand one, should be straight (or 'justified'). Who has not encountered bad end-of-line breaks like c-/hanging, mans-/laughter, rear-/ranged? [...]

It is usually best to divide a word after a vowel, taking over the following consonant to the next line. In pres. pples take over -ing, e.g., divid-/ing, sound-/ing; but chuck-/ling, trick-/ling, and similar words. Generally, when two consonants or vowels come together one should divide between them, e.g. splen-/dour, appreci-/ate. Terminations such as -cian, -sion, and -tion should not be divided when forming one sound: divide as Gre-/cian, ascen-/sion, subtrac-/tion. Hyphened words should be divided at the hyphen, and in dictionaries a second hyphen may be used to clarify their spelling. This is not the end of the story: Ronald McIntosh lists thirty-three rules altogether for dividing words at the line-end. [...]

[Regarding printers] Very broadly, British practice has tended to emphasise morphological structure and word origin (as in triumph-/ant), and American practice has tended to give greater weight to the perceived pronunciation (c.f. trium-/phant).

[...] McIntosh warns us that American practice is likely to become more influential in British English as more and more technology for word-setting becomes imported across the Atlantic. We have been warned. [!]

It therefore seems entirely plausible that Babel's rules for British English hyphenation do indeed differ slightly, following the rules above and favouring the etymology of each word over its perceived pronunciation.

Landak
  • 3
    This is very interesting and I especially like the part where we are warned that even in the UK hyphenation might transform from an obscure etymological art to something one can actually perform just by speaking the language ;) But since "alternate" comes from Latin "alternatus", I can't see how altern-ate is necessarily the more etymologically correct hyphenation. Anyway, I think Barbara gave a very good clue by finding an Oxford dictionary that actually differs from the current (?) one. Whatever its reasons may be. – Christian Sep 06 '15 at 21:22
  • 1
    @Christian -- while i agree totally with your association of "alternate" with latin "alternatus", if you consider the syllabification of the latter word and how it would be hyphenated in (modern renditions of) latin, i think you would find that the hyphen there would come between the "r" and the "n", especially since the middle "a" is long. (latin hyphenation patterns are available; just need to be tried.) – barbara beeton Apr 07 '16 at 20:23
1

I wrote a thick book in British English. Using the british option for babel, I also got many "suspicious" hyphenations like phys-ics, introduct-ory, proced-ure, dir-ection, stand-ards, quant-ity, intens-ity, mech-anical (it goes on and on like that).

Then I checked that against two web pages:

  • ushuaia.pl that uses hunspell agrees but
  • hyphenation24.com disagrees.

Since LibreOffice also uses hunspell, I tested it there and got the same results as with babel.

So this hyphenation is obviously used pretty widely.

All these hyphenations sound funny to me, but I am irrelevant as I am obviously not a native speaker. It would be nice to hear an opinion from a native British speaker though.
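
For what it's worth, if these breaks are unwanted, one possible workaround is to keep British as the document language but borrow the American patterns for hyphenation only, using babel's hyphenrules environment. This is only a sketch: it assumes the American patterns are in the installed format, the sample sentence is made up from the words listed above, and the approach trades the British conventions for the American ones wholesale rather than fixing individual words.

```latex
\documentclass{book}
% british is last, so it stays the main language (dates, captions, ...).
\usepackage[american,british]{babel}
\begin{document}
% Only the hyphenation rules change inside this environment.
\begin{hyphenrules}{american}
The quantity and intensity of mechanical procedures in physics \ldots
\end{hyphenrules}
\end{document}
```

Whether the American breaks are acceptable in a British text is an editorial decision that the code cannot settle.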

Pygmalion
  • Since posting my answer with the scanned page, I have retired, and the mini-dictionary is packed in an inaccessible box, so I can't check. But it's my understanding that hunspell is not the same as the patterns used for (La)TeX. The two senses of "alternate" (verb and adjective) are homographs with legitimately different pronunciations (as described in another answer by Martin), so the mini-dictionary is misguided in assigning only one hyphenation pattern. Grammar checking is required here, a capability beyond the scope or design of TeX. – barbara beeton Feb 15 '21 at 20:02
  • @barbarabeeton OK, I did not understand your reasoning, but I do need an answer to a very practical question. I am handing my manuscript for printing today. Is latex british hyphenation so bad I should do manual corrections for 250+ pages before submitting, or this is just fine? – Pygmalion Feb 16 '21 at 07:53
  • @barbarabeeton If hunspell and latex do not use the same patterns but do give exactly the same results - this is interesting. Probably using the same source for hyphenation rules? – Pygmalion Feb 16 '21 at 07:57
  • I can't answer your question about whether to change things; sorry. (I'm firmly on the left side of the Atlantic.) You might, on the tex.sx chat, pose the question to @DavidCarlisle or @JosephWright, who are both English natives. If you do make changes, use the \hyphenation facility in the preamble (if possible; won't work with "alternate" or other homographs) rather than changing word by word. – barbara beeton Feb 16 '21 at 14:21
-1

A word can be hyphenated at any syllable break. The syllables of "alternate" are al-ter-nate, so TeX is acting correctly when it puts the hyphen at whichever of these breaks falls closest to the end of the line. As you have shown it, it is working correctly.

J Hall
  • Hi J Hall, welcome to tex.stackexchange.com! My point was that the British hyphenation is al-tern-ate, which to me sounds very wrong but apparently is correct (see discussion above). That being said, I think your answer would've been better served as a comment either way. – Christian Jul 13 '22 at 13:51
  • Sorry, I thought I was adding a comment : ) <-- Newbie! – J Hall Dec 19 '22 at 18:32