55

This question led to a new feature in a package:
impnattypo

I come from Poland and I'm writing some text in my native language. One of our typography standards is that we do not leave one-letter words at the end of a line. For example:

Mietek poszedl do sklepu i
kupil jabola

is incorrect; it should be set as follows:

Mietek poszedl do sklepu 
i kupil jabola

I don't know whether US/English typography has such a rule too, but I cannot find any LaTeX setting that fixes such errors. The only solution I've found so far is to add the "~" character, which acts as a space but also prevents a line break between the words it connects. So I have to write:

Mietek poszedl do sklepu i~kupil jabola

Is there any other way to do this? I've tried

\widowpenalty10000  
\clubpenalty10000 

which prevents widows and orphans, but it seems that an 'orphan' means something different to us Poles than to Americans ;) In the US an orphan is the first line of a paragraph left on the last line of a page (at least I think so), but for us an 'orphan' is a single-letter word at the end of a line.

Anke
  • 400
  • 1
  • 14
omnomnom
  • 675
  • 9
    Piotrek: I'm not aware of an existing command or macro in LaTeX that would enforce your specific typographic need. Suppressing "widows" and "orphans" (the latter are called "clubs" in TeX for some reason) is definitely not going to meet your need. I'd say that your approach -- to do a global search and replace of strings of the type " ? ??" (where ? stands for a single alpha character) to " ?~??" -- is the most straightforward one. – Mico Sep 07 '11 at 15:29
  • 3
    There are some emacs tools that can help with inserting nonbreakable spaces in appropriate environments, e.g. see http://www.emacswiki.org/emacs/NonbreakableSpace – mas Sep 07 '11 at 15:58
  • 4
    The encTeX extension has code to deal with such cases for Czech prepositions such as "v". Try texdoc enctex, but it's not easy. The best is to get the habit of inserting ties ~. – egreg Sep 07 '11 at 17:40
  • Mietek poszedl do sklepu i kupil jabola, seriously? – fracz Jan 17 '19 at 08:42

7 Answers

42

This is a LuaLaTeX solution: a function that is called just before TeX breaks the paragraph into lines. After each single-letter word it inserts the equivalent of a tie ~ (only the penalty of 10000; the glue is already there). As far as I can see, the following word can still be hyphenated (see the word after the w in the example below).

[Edit: I have added a check in the code that only letters (L* unicode character class) will be taken into account when preventing a line break after the glyph.]

\documentclass{article}
\usepackage{polski}
\usepackage[polish]{babel}
\usepackage{fontspec}
\usepackage{luatexbase}\usepackage{luacode}

\begin{luacode}

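-- Look the node ids up by name: the numbers (37 and 10 in old LuaTeX
-- versions) are different in current releases.
local GLYPH = node.id("glyph")
local GLUE  = node.id("glue")

local prevent_single_letter = function (head)
  while head do
    if head.id == GLYPH and unicode.utf8.match(unicode.utf8.char(head.char),"%a") then -- a letter
      -- only if we are at a one letter word; prev/next are nil at the ends of the list
      if head.prev and head.prev.id == GLUE and head.next and head.next.id == GLUE then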

        local p = node.new("penalty")
        p.penalty = 10000

        -- This is for debugging only, but then you have to
        -- remove the last node.insert_after line:
        -- local w = node.new("whatsit","pdf_literal")
        -- w.data = "q 1 0 1 RG 1 0 1 rg 0 0 m 0 5 l 2 5 l 2 0 l b Q"
        -- node.insert_after(head,head,w)
        -- node.insert_after(head,w,p)

        node.insert_after(head,head,p)

      end
    end
    head = head.next
  end
  return true
end

luatexbase.add_to_callback("pre_linebreak_filter",prevent_single_letter,"active~")
\end{luacode}


\begin{document}
\hsize 2.7in

Noc była sierpniowa, ciepła i słodka, Księżyc oświecał srebrnem światłem wgłębienie, tak,
że twarze małego rycerza i Basi były skąpane w blasku.
Poniżej, na podwórzu zamkowem, widać było uśpione kupy żołnierzy, a także i ciała zabitych
podczas dziennej strzelaniny, bo nie znaleziono dotąd czasu na ich pogrzebanie.

\end{document}

example output

BTW: the small hyphenation marks are made with the package showhyphens.

doncherry
  • 54,637
topskip
  • 37,020
  • 4
    very nice! :-) Also useful for the english language –  Sep 11 '11 at 16:03
  • This would make a useful package on CTAN. Would you like to package it? – raphink Sep 11 '11 at 16:49
  • @Raphink - I'd rather not. I have no idea how to do that and I have not tested the code yet. – topskip Sep 11 '11 at 16:52
  • @Patrick: Ok. If the packaging part is a problem, I'd be willing to do it whenever the code is ready enough. – raphink Sep 11 '11 at 16:53
  • @Raphink - perhaps this should go into polski.sty? I have never done this before. I don't have a clue about dtx, ins and all that. – topskip Sep 11 '11 at 16:54
  • As a matter of fact, I just committed a package that includes French hyphenation rules, and we happen to have the same rule, which I hadn't implemented yet, so I'd be willing to use it, too. – raphink Sep 11 '11 at 16:56
  • 2
    @Patrick: do you mind if I use this code in my impnattypo package? – raphink Sep 13 '11 at 14:11
  • @Raphink: no, not at all. Consider this as public domain and you don't need to give any credit. (I don't think that this small piece of code could be copyrighted anyway.) – topskip Sep 13 '11 at 14:50
  • Well I'll mention your name in the package anyway, with a link to the question. – raphink Sep 13 '11 at 14:50
  • Note: the current algorithm will also not allow » at the end of a line, which should be correct imo. How could we restrain it to letters only? – raphink Sep 14 '11 at 21:57
  • 2
    @Raphink: see my edit. But this edit is not 100% correct (in a formal sense) because the encoding is font dependent, but nowadays it is mostly unicode encoded, so I'd assume that this solution will work in 99.999999% of the cases. – topskip Sep 15 '11 at 06:08
  • @topskip I think you are mistaken about the percentages. It may be true that most documents these days are encoded in Unicode, but that isn't true for fonts (and that is what LuaTeX sees here). Any document using traditional TeX fonts would not be in Unicode encoding, even though T1 or OT1 is fortunately fairly close to it – Frank Mittelbach Mar 12 '13 at 22:47
30

Since version 0.2, the impnattypo package contains a nosingleletter option which uses Patrick's algorithm:

\documentclass{article}
\usepackage{polski}
\usepackage[polish]{babel}
\usepackage{fontspec}

\usepackage[draft,nosingleletter]{impnattypo}

\begin{document}
\hsize 2.7in

Noc była sierpniowa, ciepła i słodka, Księżyc oświecał srebrnem światłem wgłębienie, tak,
że twarze małego rycerza i Basi były skąpane w blasku.
Poniżej, na podwórzu zamkowem, widać było uśpione kupy żołnierzy, a także i ciała zabitych
podczas dziennej strzelaniny, bo nie znaleziono dotąd czasu na ich pogrzebanie.


\end{document}


raphink
  • 31,894
8

Babel (with LuaTeX) now provides this feature out of the box, by means of the transform oneletter.nobreak (also available for Czech and Slovak):

\documentclass{article}

\usepackage[polish]{babel}
\babelprovide[transforms = oneletter.nobreak]{polish}

\begin{document}

\hsize1pt

Mietek poszedl do sklepu i kupil jabola

\end{document}


Javier Bezos
  • 10,003
5

I created a small package that makes Patrick's code usable with plain LuaTeX as well. After some discussion with other Czech users, I found that the code doesn't work when the single letter comes right after an opening bracket or a similar character. There was also a requirement to process only certain letters, not all of them. So I modified the code slightly:

prevent-single.lua:

-- Module prevent-single
-- code originally created by Patrick Gundlach
-- http://tex.stackexchange.com/q/27780/2891
-- The code was adapted for plain TeX and added some more features
-- 1. It is possible to turn this functionality only for some letters
-- 2. Code now works even for single letters after brackets etc.
--
local M = {}
local utf_match = unicode.utf8.match
local utf_char  = unicode.utf8.char
local match_char = function(x) return utf_match(x,"%a") end
local match_table = function(x, chars)local chars=chars or {}; return chars[x] end
local singlechars = nil-- {a=true,i=true,z=true, v=true, u=true, o = true}
local debug = false
-- Enable processing only for certain letters
-- must be table in the {char = true, char2=true} form
local set_singlechars= function(c)
    --print("Set single chars lua")
    --for k,_ in pairs(c) do print(k) end
    singlechars = c
end

local set_debug= function(x)
    debug = x
end

local prevent_single_letter = function (head)
    local singlechars = singlechars
    local test_fn = singlechars and match_table or match_char
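    local space = true
    -- Look the node ids up by name; the numeric values (10 and 37 in old
    -- LuaTeX versions) are different in current releases.
    local glue_id, glyph_id = node.id("glue"), node.id("glyph")
    while head do
        local id = head.id
        if id == glue_id then
            space=true
        elseif space==true and id == glyph_id and utf_match(utf_char(head.char), "%a") then -- a letter
            -- head.next is nil at the end of the list
            if test_fn(utf_char(head.char), singlechars) and head.next and head.next.id == glue_id then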
                -- only if we are at a one letter word
                local p = node.new("penalty")
                p.penalty = 10000                          
                if debug then
                    local w = node.new("whatsit","pdf_literal")          
                    w.data = "q 1 0 1 RG 1 0 1 rg 0 0 m 0 5 l 2 5 l 2 0 l b Q"  
                    node.insert_after(head,head,w)
                    node.insert_after(head,w,p)
                else
                    node.insert_after(head,head,p)
                end
            end
            space = false
        end
        head = head.next
    end
    return  true
end              

M.preventsingle = prevent_single_letter
M.singlechars = set_singlechars
M.debug = set_debug
return M

prevent-single.tex

\input luatexbase.sty

% Modify pre_linebreak_filter callback so the spaces can be inserted
\directlua{%                                  
preventsingle = require "prevent-single" 
luatexbase.add_to_callback("pre_linebreak_filter", preventsingle.preventsingle,"~")
% Process string and make table of enabled single letters
% By default, spaces for all single letters are inserted
% This can be modified with \singlechars macro
set_singlechars = function(chars)
  local utf_gmatch = unicode.utf8.gmatch
  local chars = chars  or ""
    local singlechars = {}
    for char in utf_gmatch(chars,"(\%a)") do
    singlechars[char] = true
    end
    preventsingle.singlechars(singlechars)
end
}     

% Set letters which are prevented from breaking
\def\singlechars#1{%
\directlua{set_singlechars("#1")}
}

% Enable inserting of visual marks for debugging
\def\preventsingledebugon{%
\directlua{preventsingle.debug(true)}
}

% Disable inserting of visual marks for debugging
\def\preventsingledebugoff{%
\directlua{preventsingle.debug(false)}
}

Sample document:

\input ucode
\uselanguage{czech}

\input prevent-single
\singlechars{aAzZiIvVuUoO}
\preventsingledebugon
\input luaotfload.sty
\font\hello={name:Linux Libertine O:+rlig;+clig;+liga;+tlig} at 12pt 
\hsize=3in
\hello
Příliš žluťoučký kůň úpěl ďábelské ódy. 
Text s krátkými souhláskami a samohláskami i dalšími jevy z nabídky možností (v textu možnými). 

Grafika, ffinance -- pomlčka a tak.

I začátek odstavce je třeba řešit, i když výskyt zalomení není pravděpodobný.

Co třeba í znaky š diakritikou?

\preventsingledebugoff
Různé možnosti [v závorkách <a jiných znacích
\bye

And the result:

enter image description here

The code is also on GitHub, and I will post it to CTAN.

michal.h21
  • 50,697
2

The encxvlna package does this for Czech; maybe you can have a look at it. Instructions to get it working in Ubuntu (in Czech) are here.

doncherry
  • 54,637
sup
  • 357
2

You can accomplish this with a regex search and replace. The pattern below finds words of one to three characters and replaces the space after them with a non-breaking space (a tilde in LaTeX):

Replace: /(\s\b.{1,3}\b)\s/ 
With: /$1~/
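A variant restricted to one-letter words can be sketched in Python. This is only an illustration under a few assumptions: the function name tie_single_letters is made up, the input is plain prose (math, verbatim material and commands get no special treatment), and only spaces or tabs on the same line are replaced, so a single letter at the end of a source line is left alone.

```python
import re

# One standalone letter followed by spaces/tabs and then more text.
# (?<!\S)    : the letter is preceded by whitespace or starts the string
# [^\W\d_]   : exactly one Unicode letter (word character minus digits and "_")
# (?=\S)     : something other than whitespace follows the gap
SINGLE_LETTER = re.compile(r"(?<!\S)([^\W\d_])[ \t]+(?=\S)")

def tie_single_letters(text: str) -> str:
    """Replace the space after every one-letter word with a LaTeX tie (~)."""
    return SINGLE_LETTER.sub(r"\1~", text)

print(tie_single_letters("Mietek poszedl do sklepu i kupil jabola"))
# prints: Mietek poszedl do sklepu i~kupil jabola
```

Because the letter must be preceded by whitespace, a single letter directly after an opening bracket is not tied by this sketch.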
  • Your answer could be improved with additional supporting information. Please [edit] to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers in the help center. – Community Jun 06 '22 at 11:56
  • This isn't a LaTeX-based solution, and you aren't giving any information about the way to use it. – Miyase Jun 06 '22 at 11:57
  • You could add "for example in Overleaf, click button X, turn on option Y to do this" I guess. – user202729 Jun 06 '22 at 14:11
1

Perhaps the least "painful" solution is your editor's search-and-replace tool. For example, you can replace every "v " in the whole document with "v~". Afterwards, check whether some paragraphs need the good old \begin{sloppypar}\end{sloppypar}.

doncherry
  • 54,637
  • 2
    you need more sophisticated editing than you suggest — yours would tie things to the end of every word ending in “v” (not terribly uncommon in polish, iirc) – wasteofspace May 10 '13 at 13:09
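The difference between a blind replacement and a word-aware one can be illustrated with a short Python sketch. The helper name tie_letters is hypothetical, the default letter set is borrowed from the \singlechars call in the Czech sample document of another answer, and the sample phrase is made up for the demonstration.

```python
import re

def tie_letters(text: str, letters: str = "aAzZiIvVuUoO") -> str:
    """Tie only the listed one-letter words to the word that follows them."""
    # (?<!\S): the letter must start a word (preceded by whitespace or nothing).
    # (?=\S):  some non-space text must follow the space being replaced.
    pattern = re.compile(r"(?<!\S)([" + re.escape(letters) + r"])[ \t]+(?=\S)")
    return pattern.sub(r"\1~", text)

sample = "lev v kleci"
print(sample.replace("v ", "v~"))  # blind replacement: lev~v~kleci
print(tie_letters(sample))         # word-aware:        lev v~kleci
```

The blind version also glues "lev" to the next word because it ends in "v"; the word-aware version only touches the standalone "v".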