Instead of putting a hyphenated word into a \hbox, which would require actually determining where a word begins and ends, there is another way: Whenever a page is broken at a discretionary (the TeX node which represents e.g. a hyphen), reflow the paragraph with one additional restraint: Do not break at this point. That's easy, because Lua code can change the penalty associated with a linebreak at a discretionary node. Of course, after reflowing the text, the new breakpoint might still be a discretionary. Then we just repeat the procedure until we are happy with the break.
This hasn't been tested with lots of documents, so it might break in more complicated situations.
The code (Some explenations in the comments):
(Replace \includecomment by \excludecomment to disable the code and see the "normal" linebreaking in effect.)
\documentclass{article}
\usepackage[english]{babel}
\usepackage{blindtext,luacode}
\usepackage{comment}
% \excludecomment{nobroken}
\includecomment{nobroken}
\begin{nobroken}
\begin{luacode*}
local discardable_id = {
[node.id'glue'] = true,
[node.id'kern'] = true,
[node.id'penalty'] = true,
}
-- A small helper to use properties. They are used to store the paragraphs without linebreaks.
local function swap_prop(n, new) -- If new is `false`, it is not changed. Use `nil` to clear the prop
local p = node.getproperty(n)
if not p then
p = {}
node.setproperty(n, p)
end
local prop = p.nohyphout
p.nohyphout = new == false and prop or new
return prop
end
-- Actually save the paragraphs
local function pre(n, ctxt)
if ctxt ~= "" or n.id ~= 9 then return true end -- Nothing to see here, but at least unlikely
local discretionaries, m = {}, node.copy_list(n)
swap_prop(n, {unbroken = m, discretionaries = discretionaries})
while n do -- We have to save the correspondance between the original and the copied nodes
if n.id == 7 then -- For performance reasons, we only save the discretionaries. We never change other nodes anyway
discretionaries[node.direct.todirect(n)] = m
end
m, n = m.next, n.next
end
return true
end
-- After linebreaking, we save the node of the last line. This allows deleting "wrong" versions later on.
luatexbase.add_to_callback("pre_linebreak_filter", pre, "save unbroken") -- Of course, saving the unbroken paragraph only works before linebreaking
local post = function(n, ctxt)
if ctxt ~= "" then return true end
while n and n.id ~= 0 do n = n.next end -- There often comes glue/penalties/other stuff in front of the first actual line
local head = n.head
if not head or head.id ~= 9 then return true end -- Nothing to see here, but at least unlikely
local prop = swap_prop(head, false)
if prop and prop.unbroken then
prop.parend = node.tail(n)
end
return true
end
luatexbase.add_to_callback("post_linebreak_filter", post, "save parend")
-- The important functionality: We define a macro \adjustbreaks, which looks at the output box. If the end does not end with hyphen, the first TeX parameter is executed, otherwise a new attempt is made and the second parameter is executed.
local luafunc = luatexbase.new_luafunction"adjustbreaks"
token.set_lua("adjustbreaks", luafunc, "global", "protected")
lua.get_functions_table()[luafunc] = function()
local list = tex.box[255]
local last, lastline
for hlist in node.traverse(list.head) do
if hlist.id == 0 and hlist.head and hlist.head.id == 9 then
if last and last.unbroken then
node.flush_list(last.unbroken)
last = nil
end
local new = swap_prop(hlist.head, nil)
if new and new.unbroken then
last, new.parbegin = new, hlist
end
end
if hlist.id == 0 and last then
if last.parend == hlist then
node.flush_list(last.unbroken)
last = nil
end
lastline = hlist
end
end
if last then -- We reached the end of a page and it ends with a broken paragraph which we *could* change.
local lastnondiscard
for n in node.traverse(lastline.head) do
if not discardable_id[n.id] then
lastnondiscard = n
end
end
if lastnondiscard and lastnondiscard.id == 7 then -- We end with a discretionary. Let's change that!
-- This is the interesting part:
-- We probably never reach this point if the paragraph isn't fully "contributed" to the outer vlist,
-- so last.parend is in the list starting at tex.lists.contrib_head
-- We want to remove the already broken paragraph, so reset the head to the node *after* our last line.
do
local curr, afterpar = tex.lists.contrib_head, last.parend.next
while curr ~= afterpar do curr = node.free(curr) end
tex.lists.contrib_head = curr
end
-- Also forbid the current linebreak, otherwise this wouldn't do anything
last.discretionaries[node.direct.todirect(lastnondiscard)].penalty = 10000
-- We might need multiple passes and after linebreaking the `unbroken` list will no longer be unbroken,
-- So we run our callbacks again to fix everything up. Then run the linebreaking system again.
-- We should probaby set tome parameters, but to get them we have to add a `linebreak_filter` callback.
-- Currently I want to avoid that. FIXME!
pre(last.unbroken, "")
local broken = tex.linebreak(last.unbroken) -- TODO: Get params
post(broken, "")
-- Now broken starts probably with some penalties or glue.
-- But we never deleted the glue in front of our paragraph, so adding it again would double it.
-- Instead just skip it:
while broken and broken.id ~= 0 do broken = node.free(broken) end
last.parbegin.prev.next = broken
node.flush_list(last.parbegin)
token.put_next(token.create'@secondoftwo')
return
else
node.flush_list(last.unbroken)
last = nil
end
end
token.put_next(token.create'@firstoftwo')
end
\end{luacode*}
% Now we still have to change the output routine to actually use \adjustbreaks
% First save the old one
\let\myrealoutput\output
\newtoks\output
\output\expandafter{\the\myrealoutput}
% Now set the new output routine. If \adjustbreaks does not find any problems with the current page, just continue with the regular output routine.
% Otherwise put \outputbox back into the page, then TeX tries again to find a pagebreak.
\myrealoutput{\adjustbreaks{\the\output}{\unvbox255}}
\end{nobroken}
\begin{document}
\hyphenpenalty0 -- This is only for testing, you normally do NOT want \hyphenpenalty0 in actual documents.
\blindtext[5]
The quick brown fox jumps over the lazy dog.
Jackdaws love my big Sphinx of Quartz.
Pack my box with five dozen liquor jugs.
The five boxing wizards jump quickly.
Sympathizing would fix Quaker objectives.
Many-wived Jack laughs at probes of sex quiz.
% Turgid saxophones blew over Mick’s jazzy quaff.
% Playing jazz vibe chords quickly excites my wife.
% A large fawn jumped quickly over white zinc boxes.
% Exquisite farm wench gives body jolt to prize stinker.
According to a list of long words on grammerly.com, one of the best known long words in the english language is a word which the Oxforddictionary describes as ``a nonsense word''. Therefore it fits perfectly into this supercalifragilisticexpialidocious text.
\blindtext
\end{document}
\brokenpenaltyto tell how much you are reluctant to see a page break right after a line ending at a discretionary break (see the last paragraph of the TeXbook p. 104). – frougon May 28 '19 at 11:36