18

When I create a pdf with pdflatex and copy text from that pdf (using Adobe Reader DC on Windows 10), some of the spaces are missing. Here's an MWE:

\documentclass{article}
\usepackage{newtxtext}
\begin{document}
    Therefore, this work ... \hspace*{\linewidth}
\end{document}

When I copy text from that pdf, this is what I get (1 being the page number):

Therefore, thiswork ...
1

Removing the \hspace*, OR removing newtxtext (or both) fixes the problem, but that's not what I want, of course (as \hspace* represents some text following "this work").

I have come across "Problem copying text from pdf - spaces being stripped" and "XeLaTeX and missing spaces in PDF text", which proposed \pdfgeninterwordspace, which is now \pdfinterwordspaceon (thanks, @egreg). So I tried that:

\documentclass{article}
\usepackage{newtxtext}
\pdfmapline{+dummy-space <dummy-space.pfb}
\pdfinterwordspaceon
\begin{document}
    Therefore, this work ... \hspace*{\linewidth}
\end{document}

(See "Use pdfinterwordspaceon with pdflatex from MiKTeX on Windows" if that does not compile for you.)

Now, when I copy text from that pdf, I get this:

Therefore,  this work  ... 
1

So basically, additional space has been introduced regardless of whether or not it was needed. Yes, the missing space in "thiswork" has been added, which is good; but so have three extra spaces after "Therefore,", "work", and "...", which is not good.

Is there a better solution? Am I using \pdfinterwordspaceon correctly?

bers
  • 5,404
  • 2
    It should be \pdfinterwordspaceon – egreg Jul 01 '16 at 22:48
  • Sorry, I can't check with MiKTeX. – egreg Jul 01 '16 at 23:13
  • 1
    Same issue on ubuntu with acroread, but copying from the system pdf reader evince works as desired. – JPi Jul 08 '16 at 19:15
  • 1
    @JPi: Good point. I tried the Chrome (Browser) pdf plugin, and both pdf files work as expected in terms of copying text from it - Therefore, this work ... is the result in both cases. – bers Jul 08 '16 at 19:21
  • 3
    As far as I know, this is at least partly a known issue with Adobe Reader. It is an issue in the viewer, not the file, and there's not much to be done on the TeX side of things. – cfr Jul 08 '16 at 23:11
  • FWIW, the original code works fine in Preview in Mac OS X. I see no additional spaces. – lhf Jul 09 '16 at 02:06
  • Have you tried the cmap package? That worked for me having a slightly different issue: cmap on ctan – Paul Aug 30 '16 at 03:01
  • @Paul I have tried it just now, but it did not seem to have any effect. (I followed the instructions in cmap.sty, Usage: put \usepackage{cmap} immediately after the \documentclass line, in my original, top-most MWE.) – bers Aug 30 '16 at 15:15
  • @cfr -- having just been through an experience with this, i think this is the answer (at least if cm fonts are used). can you post an answer, please? – barbara beeton Sep 03 '16 at 21:21
  • @barbarabeeton Well, OK. Doesn't seem like much of an answer, though. Anything you'd like me to add? – cfr Sep 03 '16 at 21:29
  • 1
    If you're like me, and don't like the prospect of switching to a new pdf reader just so that copying works, I found the following ghostscript call which fixes the weird spaces:

    gs -dBATCH -dNOPAUSE -sDEVICE=pdfwrite -sOutputFile=new.pdf old.pdf

    – rien333 Oct 27 '19 at 23:48

5 Answers

6

This is at least partly a known issue with Adobe Reader. Adobe Reader fails to recognise spaces between words in certain cases (e.g. where the spacing is smaller than average) or recognises one space as multiple spaces (e.g. where the spacing is larger than average).

It is an issue in the viewer, not the file - as demonstrated by the fact that other viewers work fine - and there's not much to be done on the TeX side of things.

cfr
  • 198,882
  • 1
    But a pdf file, with \usepackage[T1]{fontenc}, is able to make the "fi" ligature copy-and-pasteable as "fi" (2 letters). Why is there no way to make a space copy-and-pasteable as a space? – bers Sep 08 '16 at 15:12
  • 4
    @bers Space isn't a character. It is just space. – cfr Sep 08 '16 at 21:02
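
For what it's worth, a pdfTeX that provides \pdfinterwordspaceon also provides \pdffakespace, which puts one space character into the PDF text at a chosen spot instead of at every inter-word gap. The following is only a sketch based on the question's MWE. It assumes the dummy-space font is installed as in the question, and whether Adobe Reader then copies exactly one space at that spot is untested:

```latex
\documentclass{article}
\usepackage{newtxtext}
% Same map line as in the question; assumes dummy-space.pfb is available.
\pdfmapline{+dummy-space <dummy-space.pfb}
\begin{document}
    % \pdffakespace adds one space character to the PDF text at this spot only;
    % the visible inter-word space still comes from the ordinary space after {}.
    Therefore, this\pdffakespace{} work ... \hspace*{\linewidth}
\end{document}
```

Because \pdfinterwordspaceon is not switched on here, the other word gaps are left as they were in the original MWE.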
4

Old thread, but I had the same problem as you and after a lot of searching online I was able to find a fix that really helped me.

\fontdimen2 controls the spacing between words in LaTeX, so we can use it to make sure a PDF reader can tell where one word ends and the next begins.

Here's a simple example using an environment to define an area that we want different spacing in:

\documentclass{article}
\usepackage{newtxtext}

\newenvironment{goodspacing}{%
    \fontdimen2\font=0.5em
}{}

\begin{document}
\begin{goodspacing}
    Therefore, this work ...
\end{goodspacing}
\end{document}

Wrapping your content in this environment should be sufficient for copying and pasting to work correctly.

Adjust the number after \font= to your desired amount of inter-word spacing.
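
One caveat: TeX treats \fontdimen assignments as global, so the value set inside the environment stays in effect for that font after \end{goodspacing}. Below is a minimal sketch of a variant that saves the previous value and restores it at the end. The names widewordspace and \savedwordspace are made up for illustration, and I have not checked how Adobe Reader handles the result:

```latex
\documentclass{article}
\usepackage{newtxtext}

% Hypothetical helper names, not part of the answer above.
\newdimen\savedwordspace
\newenvironment{widewordspace}[1][0.5em]{%
    % Remember the current inter-word space of the current font ...
    \savedwordspace=\fontdimen2\font
    % ... then widen it (optional argument, default 0.5em).
    \fontdimen2\font=#1\relax
}{%
    % \fontdimen assignments ignore grouping, so restore explicitly.
    \fontdimen2\font=\savedwordspace
}

\begin{document}
\begin{widewordspace}
    Therefore, this work ... \hspace*{\linewidth}
\end{widewordspace}

Text after the environment uses the original spacing again.
\end{document}
```

The optional argument allows a different width per use, e.g. \begin{widewordspace}[0.4em]. The sketch assumes the font at \end is the same one that was current at \begin and that the environment is not nested.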

0

Because this seems to be a problem with the PDF viewer, I simply tried different viewers until I found one that let me copy the text with the spaces between words intact.

Leos313
  • 639
0

Try using Google Chrome as the viewer; it let me copy the text with normal spaces.

-1

I had a similar issue in which a user was copying text from a PDF viewed in Adobe Reader. When pasted, the text would have no spaces between words.

I found a workaround which helped in her particular case. When viewing the PDF in File Explorer, right-click the file and select Open With > Word 2016. Word will then open and display a message that it will convert your PDF to a Word document. Click OK to have it do this for you.

Once Word finished converting the file (how long that takes will depend on the size of the PDF), it correctly identified the spaces and created them. I verified by copying the text in this Word document and pasting it into Notepad, where the spaces remained.

This process probably depends on how far apart the spaces are on the PDF and thus whether Word will correctly identify them, but this was a big help to the end user in this case.