
I'm not optimistic about this since this is a core part of TeX's algorithm, but is there any way to switch to a 'greedy' justification algorithm? That is, instead of working on the paragraph as a whole and optimizing word spacing, fit as much as you can onto one line, hyphenating if necessary, justify, and then move to the next line.
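Here is one way to pin down the requested rule. It is a simplified model, not a description of how any particular program works. Assume word $j$ has natural width $w_j$, every interword space has natural width $s$ and shrinkability $z$, a hyphen has width $w_{\mathrm{hyph}}$, and the line width is $h$ (that is, \hsize). These symbols are introduced only for illustration, and TeX's space factor, kerning and ligatures are ignored. A line starting at word $i$ takes as many whole words as can be squeezed in. Then, if possible, it also takes the longest prefix $p$ of the next word that ends at a permitted hyphenation point:

```tex
% Greedy rule for a line that starts at word i (simplified model)
\[
  k^{*} = \max\Bigl\{\, k \ge i \;:\;
    \sum_{j=i}^{k} w_j + (k-i)\,(s-z) \le h \Bigr\}
\]
% "Hyphenating if necessary": the longest prefix p of word k*+1,
% ending at a permitted hyphenation point, such that
\[
  \sum_{j=i}^{k^{*}} w_j + (k^{*}-i+1)\,(s-z) + w(p) + w_{\mathrm{hyph}} \le h
\]
```

The line is then justified to $h$. The next line starts with the rest of the hyphenated word, or with word $k^{*}+1$ if no prefix fits.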

lockstep
Sean Allred

3 Answers


no :) – David Carlisle

On the grounds of "never say never", this builds the paragraph up line by line (each example shows the normal setting first, for comparison).

Update: fixed a small error in inserting the \parindent.

[Image: typeset output]

\documentclass{article}
\usepackage{kantlipsum}

\begin{document}




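\makeatletter
% \linebyline<word><space>: add one word to the current line (box 2),
% let TeX set "current line + word" as a trial paragraph and, if that
% needs a line break, output the finished line justified to \hsize.
\def\linebyline#1 {%
  \ifx\!#1\else % \! marks the end of the word list
    % trial paragraph: the current line without its \rightskip,
    % \parfillskip and final penalty, then a space (or the
    % indentation, for the first word) and the new word
    \setbox\z@\vbox{%
      \noindent\unhbox\tw@\unskip\unskip\unpenalty\sporindent#1}%
    \let\sporindent\space
    \ifdim\ht\z@>\baselineskip % the trial paragraph has 2+ lines
      \setbox\z@\vbox{%
        \unvbox\z@
        \global\setbox\@ne\lastbox % new unfinished last line
        \unskip\unpenalty\unskip\unpenalty
        \global\setbox\thr@@\lastbox % the line that is now finished
        \unskip\unpenalty\unskip\unpenalty % (fix) clean up above it
      }%
      % (fix) if the word caused more than one break, output the
      % earlier lines as well instead of dropping them
      \ifdim\ht\z@>\z@\box\z@\fi
      % output the finished line, justified to \hsize
      \hbox to \hsize{\unhbox\thr@@\unskip\unskip\unpenalty}%
    \else % the word still fits on the current line
      \setbox\z@\vbox{%
        \unvbox\z@
        \global\setbox\@ne\lastbox
      }%
    \fi
    \setbox\tw@\box\@ne % box 2 = the current unfinished line
    \expandafter\linebyline
  \fi}

% \linebylinepar<text>: set <text> (a macro is expanded once) line by line
\def\linebylinepar#1{{%
  \par
  \finalhyphendemerits\z@ % don't bias the two-line trial paragraphs
  \clubpenalty\z@
  \widowpenalty\z@
  \def\sporindent{\hskip\parindent}%
  \setbox\tw@\hbox{}% the current line starts empty
  \@firstofone{\expandafter\linebyline#1} \! \relax
  \box\tw@ % the last line, as set by TeX
  \par}}
\makeatother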


\kantdef\zz{1}

\zz\bigskip\linebylinepar\zz


\begin{minipage}[t]{.27\textwidth}\zz\end{minipage}\qquad
\begin{minipage}[t]{.27\textwidth}\linebylinepar\zz\end{minipage}


\end{document}
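Here is a usage sketch that is not part of the answer. With \linebyline and \linebylinepar defined as in the code above, the text can come from any parameterless macro (the name \mytext here is hypothetical) or be passed literally. As far as I can tell from the definitions, the text should not end with a space, so keep the closing brace on the same line as the last word. Otherwise \linebyline reads an empty word, the loop stops early, and the \! end marker is not consumed as intended.

```tex
% Document-body fragment; assumes the \linebyline and \linebylinepar
% definitions from the code above have already been read.
\newcommand\mytext{Pack as many words as possible onto the current
  line, justify it, and carry on with the next one; this sentence is
  only sample text.}% no space before the closing brace

\linebylinepar\mytext

\begin{minipage}[t]{.4\textwidth}
  \linebylinepar{Literal text works as well, as long as it does not
  end with a space.}
\end{minipage}
```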
David Carlisle
  • This doesn't do hyphenation, though. – egreg Sep 02 '13 at 22:09
  • @egreg yes it does! see line 3 and 4 of the narrow minipage for example – David Carlisle Sep 02 '13 at 22:13
  • In the normal width example, rep- fits in the first line. – egreg Sep 02 '13 at 22:26
  • @egreg yes, but I think that's not the spec: the idea is not to set every line as tight as possible and hyphenate as much as possible, it's to set each line to the best setting possible with a look-ahead of just one word. In that case it's better to take the whole word over; you don't need the paragraph-level optimiser to decide that. – David Carlisle Sep 02 '13 at 22:37
  • Suggestion for long answer: "nooooooooooooooooooooo". :) – Paulo Cereda Sep 03 '13 at 11:31
  • While I still have no idea how it does it (I'm going with black magic), this is exactly what I was looking for! (Although you're right, the only fair test would be to screencap/PDF-export Word's set and \includegraphics it, but this is exactly what (I think) word does!) – Sean Allred Sep 03 '13 at 19:38
  • Would it be possible to make every line as tight as possible? – pts Feb 17 '14 at 15:28
  • @pts probably:-) – David Carlisle Feb 17 '14 at 16:11
  • I think I've found a bug in this implementation of \linebylinepar: some text disappears, see it here: http://pastebin.com/UNdSgJCa – pts Feb 18 '14 at 20:46
  • @pts bug in my code? surely not :-) It's a design limitation: it doesn't support words longer than \textwidth (it assumes adding one word never makes 2 line breaks). It would be possible to make a loop testing how many lines you have and taking it apart more carefully, or (more easily) to warn if it is dropping text, but is there any real use case for this at all? – David Carlisle Feb 18 '14 at 21:27
  • Yes, I was planning to use this code in production for typesetting narrow, multiline table cells if TeX's line breaking algorithm produces overfull or underfull boxes. A bugfix would be very useful for me. Without the bugfix I can't use it, dropping text is a no-go even with a warning. – pts Feb 18 '14 at 21:44
  • @pts why not use the normal linebreaking? This was just an amusement to answer the OP's question of whether TeX could try to emulate a worse job. I may look, but probably not tonight, or I may not or you could post your code into a new question on site and someone else may want a go:-) – David Carlisle Feb 18 '14 at 21:48
  • There is some input which normal TeX linebreaking can't break without an overfull \hbox, but this code can. Our input is 100000 table cells generated from a database, and we can't afford to manually fix all the failed line breaks, and we need an automated solution. I don't know of anything other than your code. – pts Feb 18 '14 at 21:50
  • @pts that would be accidental. This code uses the same hyphenation points so does not have any additional break points. For some particular text you may get lucky and this finds break points such that every line fits whereas the standard one fails, but if so that is (a) surprising and (b) with another text it could be the other way round. Just use the normal line breaking with a large value of \emergencystretch and you should be fine. – David Carlisle Feb 18 '14 at 21:53
  • I need to gather more data on the success rate of normal line breaking (with a large \emergencystretch) to figure out how desperately we need an alternative. Please note that if we have an alternative, then we can try both and keep the result without an overfull \hbox, or, if neither avoids one, keep the result with fewer lines. – pts Feb 18 '14 at 21:58
  • Please note that http://pastebin.com/UNdSgJCa demonstrating the bug doesn't contain any words that need more than 1 line break, so there may be another bug in your code causing text to disappear in an unanticipated way. – pts Feb 18 '14 at 22:01
  • @pts the last big word causes 2 line breaks, one just before the word and one mid-word; the code only takes the last two lines, so it currently drops the top (3rd) line, which is the line ending in allomas. Compare the paragraph set using the normal settings. – David Carlisle Feb 18 '14 at 22:31
  • @pts still far from convinced you ever need this, but I added two lines to fix the issue, see updated answer – David Carlisle Feb 18 '14 at 22:42
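For narrow table cells like the ones described in these comments, normal line breaking with a large \emergencystretch (as suggested above) could be set up per column roughly as follows. This is a sketch. The array package provides the >{...} syntax, the value 3em is an arbitrary example of a "large" \emergencystretch, and \hspace{0pt} adds glue at the start of each cell so that TeX may also hyphenate the first word.

```tex
\documentclass{article}
\usepackage{array}% for >{...} in the column specification

\begin{document}
% 3em is an arbitrary "large" \emergencystretch; \hspace{0pt} lets TeX
% hyphenate the first word of each cell too
\begin{tabular}{|>{\emergencystretch=3em\hspace{0pt}}p{3cm}|}
\hline
Characteristically, internationalization requires dependable
telecommunications infrastructure. \\
\hline
\end{tabular}
\end{document}
```

Compared with the line-by-line macro, this keeps TeX's own break points and optimiser. The emergency pass is only tried when no line breaking within \tolerance exists.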

You can make TeX less fussy about consecutive visually incompatible lines, make it prefer hyphenation, and make it very tolerant of bad spacing.

\documentclass{article}
\usepackage{kantlipsum}

\begin{document}

\kant[1]

\adjdemerits=-1000000 % demerits for consecutive visually incompatible lines
\hyphenpenalty=-5000  % penalty added for hyphenating
\doublehyphendemerits=-1000000 % demerits for consecutive hyphens
\tolerance=10000 % bad lines are OK

\kant[1]

\end{document}
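You may want these settings to apply only to selected paragraphs. One option is to wrap them in an environment; this is a sketch, and the environment name sloppygreedy is made up. The closing \par matters: these parameters are used when the paragraph is broken into lines, so the paragraph has to end before the environment's group closes and the values revert.

```tex
\documentclass{article}
\usepackage{kantlipsum}

% "sloppygreedy" is a made-up name; the values are the ones above
\newenvironment{sloppygreedy}
  {\par
   \adjdemerits=-1000000
   \hyphenpenalty=-5000
   \doublehyphendemerits=-1000000
   \tolerance=10000\relax}
  {\par}% end the paragraph while the settings are still in force

\begin{document}
\kant[1]

\begin{sloppygreedy}
\kant[1]
\end{sloppygreedy}

\kant[2] % default settings again
\end{document}
```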

[Image: typeset output]


Here is another attempt. The third paragraph has been split manually with the “greedy” approach, taking into account \righthyphenmin=3.

\documentclass{article}
\usepackage{kantlipsum}

\begin{document}

\kant[1]

\adjdemerits=0 % don't be fussy about consecutive visually incompatible lines
\hyphenpenalty=-5000 % prefer hyphenation
\doublehyphendemerits=-1000000 % consecutive hyphens are OK
\tolerance=50 % be strict about spacing
\linepenalty=9999 % as few lines as possible

\kant[1]

\newcommand{\aline}[1]{\hbox to \textwidth{#1}}

\aline{\indent As any dedicated reader can clearly see, the Ideal of practical reason is a rep-}
\aline{resentation of, as far as I know, the things in themselves; as I have shown else-}
\aline{where, the phenomena should only be used as a canon for our understanding.}
\aline{The paralogisms of practical reason are what first give rise to the architectonic of}
\aline{practical reason. As will easily be shown in the next section, reason would}
\aline{thereby be made to contradict, in view of these considerations, the Ideal of prac-}
\aline{tical reason, yet the manifold depends on the phenomena. Necessity depends on,}
\aline{when thus treated as the practical employment of the never-ending regress in the}
\aline{series of empirical conditions, time. Human reason depends on our sense percep-}
\aline{tions, by means of analytic unity. There can be no doubt that the objects in}
\aline{space and time are what first give rise to human reason.\hfill}

\end{document}

[Image: typeset output]

The first four lines of the second and third paragraphs agree; after that, TeX's algorithm takes over.
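A manual split like this can be double-checked against the break points TeX allows with the current \lefthyphenmin and \righthyphenmin. LaTeX's \showhyphens lists them. Here is a minimal sketch using a few words from the third paragraph. It typesets nothing; the result is written to the terminal and the .log file as an intentional underfull box message.

```tex
\documentclass{article}
\begin{document}
% Report the permitted hyphenation points of these words in the log
\showhyphens{representation elsewhere practical perceptions}
\end{document}
```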

egreg
  • This is a great answer, but it's a bit too drastic when working with smaller \linewidths. Consider \parbox{6cm}{\hyphenpenalty=-5000\adjdemerits=-1000000\doublehyphendemerits=-1000000\tolerance=10000\lipsum[1]} (Oddly, the same does not happen for \kant[1].) – Sean Allred Sep 02 '13 at 16:39
  • don't you want the demerits to be 0 so you just tolerate rather than encourage incompatible lines? – David Carlisle Sep 02 '13 at 16:43
  • @DavidCarlisle Setting the demerits to 0 doesn't seem to have quite enough difference—only one hyphenation is added. – Sean Allred Sep 02 '13 at 16:45
  • @SeanAllred but that's part of the problem with the whole approach: rather than comparing TeX to a typical word processor, you have to compare TeX to TeX with settings where TeX is not only not trying to optimise over the whole paragraph but is trying to make things as bad as possible (over the whole paragraph), so it won't surprise anyone to find that the default TeX settings give a better appearance. – David Carlisle Sep 02 '13 at 16:48
  • @DavidCarlisle I agree with you that no setting of the parameters will do what is wanted and that this can only be done manually. – egreg Sep 02 '13 at 16:54
  • @SeanAllred A small line width makes the task worse. – egreg Sep 02 '13 at 16:55

“fit as much as you can onto one line, hyphenating if necessary, justify, and then move to the next line.”

Here is a solution that completely ignores the hyphenation rules and hyphenates as soon as the line is full.

\def\greedybreak#1{#1\ifx#1\blankspace\else\discretionary{-}{-}{}\fi}

\setuppapersize[A5]

\showframe
\starttext

\input ward

\bgroup
\setupalign[normal,verytolerant,stretch]
\handletokens 
The Earth, as a habitat for animal life, is in old age and
has a fatal illness. Several, in fact. It would be happening
whether humans had ever evolved or not. But our presence is
like the effect of an old-age patient who smokes many packs
of cigarettes per day---and we humans are the cigarettes.
\with \greedybreak
\endgraf
\egroup
\stoptext

which gives

[Image: typeset output]

And here is the result with a slightly different test file.

[Image: typeset output]

The algorithm could be made slightly smarter so as not to hyphenate before punctuation :) and so as to use the \discretionary command correctly: \discretionary{-}{-}{} also puts a hyphen at the start of the next line. For now I'll leave the images with the hyphen at the beginning of the line as well.
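Here is a sketch of the \discretionary fix mentioned above; it has not been checked against the images shown. \discretionary takes a pre-break, a post-break and a no-break text. {-}{-}{} therefore puts a hyphen both at the end of the line and at the start of the next one, and leaving the post-break text empty keeps only the first. The rest is the answer's own test file, shortened to two sentences. The punctuation issue is not addressed.

```tex
% ConTeXt sketch: the answer's test file, shortened, with the
% post-break text of \discretionary left empty
\def\greedybreak#1{#1\ifx#1\blankspace\else\discretionary{-}{}{}\fi}

\setuppapersize[A5]

\starttext
\bgroup
\setupalign[normal,verytolerant,stretch]
\handletokens
The Earth, as a habitat for animal life, is in old age and
has a fatal illness. Several, in fact.
\with \greedybreak
\endgraf
\egroup
\stoptext
```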

Aditya
  • Super cool!! Is this answer ConTeXt-specific? – Sean Allred Sep 03 '13 at 19:37
  • You can copy the definition of \handletokens from ConTeXt. The \setupalign part is just a wrapper around internal TeX macros (but much easier to remember!), so it should not be difficult to translate it into plain TeX or LaTeX. – Aditya Sep 03 '13 at 23:35
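A LaTeX alternative that does not copy \handletokens could look like the following. This is a hypothetical plain-LaTeX analogue, and all macro names in it are made up. It splits the text at spaces and puts \discretionary{-}{}{} (a hyphen at the break, nothing at the start of the next line) between every two letters of each word. Like the ConTeXt version, it ignores the hyphenation rules and still lets TeX's paragraph builder choose among the break points. The \tolerance value is only a guess at something comparable to verytolerant. It handles plain words only (no commands with arguments, no babel shorthands or other active characters), may suppress kerns and ligatures between letters, and can still break right before a trailing comma or period.

```tex
\documentclass{article}

\makeatletter
\def\gb@end{\gb@end}   % end-of-text marker (only compared with \ifx)
\def\gb@stop{\gb@stop} % end-of-word marker (only compared with \ifx)
\newcommand\greedytext[1]{\gb@words#1 \gb@end{} }
% Read one space-delimited word at a time.
\def\gb@words#1 {%
  \if\relax\detokenize{#1}\relax
    \expandafter\gb@words % empty item (e.g. a trailing space): skip it
  \else
    \ifx\gb@end#1% reached the end marker: stop
    \else
      \gb@letters#1\gb@stop\space
      \expandafter\expandafter\expandafter\gb@words
    \fi
  \fi}
% First letter as it is, then a break point before each further letter.
\def\gb@letters#1{#1\gb@rest}
\def\gb@rest#1{%
  \ifx\gb@stop#1%
  \else
    \discretionary{-}{}{}#1%
    \expandafter\gb@rest
  \fi}
\makeatother

\begin{document}
\begin{minipage}{6cm}
\tolerance=9999 % assumption: a rough stand-in for ConTeXt's verytolerant
\greedytext{The Earth, as a habitat for animal life, is in old age and
has a fatal illness. Several, in fact.}
\end{minipage}
\end{document}
```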