When I generate a PDF file, LaTeX often splits words between lines and adds a hyphen "-" so that the text fits, for example "exam-ple". When I copy such a word from the PDF, the "-" is copied too. What can I do so that the copied text contains only the word "example"?
Does this answer your question? How to prevent LaTeX from hyphenating the entire document? – Tom Solid Dec 17 '20 at 08:20
You might still have a problem with "ff". – John Kormylo Dec 17 '20 at 15:40
Setting a document without hyphenation would remove the problem, but sometimes omitting hyphenation results in a terminally ugly result. What is more important to you? – barbara beeton Dec 17 '20 at 17:02
1 Answer
After doing some research, I have found a pretty neat solution that works for LuaTeX.
The basic idea is that fonts in LuaTeX come with a tounicode property, which determines how each character of the font is translated into a UTF-16BE sequence when text is extracted from the PDF (for example, by copy and paste). An example of this mapping can be found here. We need to change this mapping so that the hyphenation character is translated to nothing. Fortunately, LuaTeX provides the \prehyphenchar parameter, which sets the character inserted at automatic hyphenation points. Therefore, the plan is as follows:
- Find a "burner" hyphenation character for this purpose, because we don't want to change the behavior of the normal hyphen. From this table, I selected U+2010 (8208 in decimal), so I set \prehyphenchar=8208.
- When the document ends, update all the internal fonts in LuaTeX so that character 8208 maps to nothing. (Of course, you can map it to something else, just for fun.) To do this, call create_new_font with a pattern matching the names of the fonts whose tounicode tables should be updated. I also print the names of all fonts to the log file, in case you don't know which ones to update. You can also drop the pattern-matching step in create_new_font and simply modify all available fonts.
After all these steps, in the compiled document, when you copy "contem-porary", the resulting text is "contemporary"; when you copy "a-b", the resulting text is still "a-b".
\documentclass[a4paper]{article}
\usepackage{fontspec}
\usepackage{luacode}
\setmainfont{DejaVu Serif}
% using U+2010
% http://jkorpela.fi/dashes.html
\prehyphenchar=8208
\begin{document}
contemporary contemporary contemporary contemporary contemporary contemporary contemporary
a-b
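\begin{luacode}
-- show all fonts in the log
for i, f in font.each() do
  texio.write_nl(f.name)
end

-- map U+2010 to the empty string in the tounicode table
-- of every font whose name matches the Lua pattern
function create_new_font(pattern)
  local tounicodevalues = {
    [8208] = "",
  }
  for i, f in font.each() do
    if string.match(f.name, pattern) then
      for u, v in pairs(tounicodevalues) do
        -- skip fonts that have no glyph for this code point
        if f.characters[u] then
          f.characters[u].tounicode = v
        end
      end
      font.define(i, f)
    end
  end
end
\end{luacode}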
\directlua{
create_new_font("DejaVuSerif")
}
\end{document}
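The answer notes that the pattern-matching step can be dropped so that every loaded font is updated. The following is a minimal, untested sketch of that variant, assuming the same preamble as above (fontspec, luacode, \prehyphenchar=8208) and replacing both the luacode block and the \directlua call just before \end{document}. The function name remap_all_fonts is hypothetical. It skips fonts without a U+2010 glyph (including the null font), and like the code above it relies on redefining each font in place with font.define(i, f). Because every U+2010 glyph is affected, it also assumes the document does not otherwise contain a real U+2010 character that should survive copying.

```latex
\begin{luacode}
-- Hypothetical variant: update every loaded font instead of
-- only those whose name matches a pattern.
function remap_all_fonts()
  for i, f in font.each() do
    local c = f.characters and f.characters[8208]
    -- only fonts that actually contain U+2010 can be remapped
    if c then
      c.tounicode = ""   -- copy U+2010 as nothing
      font.define(i, f)
      texio.write_nl("remapped U+2010 in " .. tostring(f.name))
    end
  end
end
\end{luacode}
\directlua{remap_all_fonts()}
```

The log lines make it easy to check which fonts were actually changed, which replaces the manual step of reading the font names and choosing a pattern.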
If you want to dig deeper into this problem or figure out how to implement this in other TeX engines, these links might be helpful:
- How do I customize a LuaLaTeX cmap?
- Select a font via luaotfload on lua side (If you don't want to use a "burner" hyphenation character, maybe this approach needs to be used)
- The cmap package
- How to get rid of the hyphenchar in XeLaTeX
- \input{glyphtounicode} with \pdfgentounicode=1 creates unwanted hyperlinks from link-like text
- What are good ways to make pdflatex output copy-and-pasteable?
- XeLaTeX hyphens aren't hyphens?