I needed to make spaces at the beginning of a line copyable in the PDF, and came up with the following trick.

In lm-rmtt.enc I change /tilde to /nbspace and then use this code:

\font\myfont=rm-lmtt10
\def\myspace{{\myfont\char`~}}
{\obeyspaces\global\let =\myspace}

This way, when I say \obeyspaces, I get copyable spaces at the beginning of a line in the PDF:

\font\tentt=cmtt10
\tentt
\obeyspaces
         hello

This works beautifully, but I wanted to ask whether this approach is "legal". What happens to the metrics if pdftex is used directly, and what if dvipdfmx is used? As far as I know, they take metrics from the Type1 fonts without paying attention to the tfm files. Which metrics are used in this case?

Igor Liferenko
  • Neither takes metrics directly from Type1 fonts. They both rely on the .tfm files. The only Type1 font files they use are the .pfb (or .pfa) files, which do not contain any metrics. The metrics are contained in the .afm (or .pfm) files, but neither TeX nor pdfTeX uses those. When you create font support files, you use those files to create the .tfm files, along with other information. (Or, for pdfTeX, you can also create them from .ttf files for use with TrueType fonts.) – cfr Dec 24 '15 at 02:32
  • @cfr: I don't understand .enc, .map and .pfb files. I just stumbled on this trick by accident. Why does it work? If the .enc file is changed, which metrics are taken for /nbspace? This character is not in the .tfm at all. – Igor Liferenko Dec 24 '15 at 02:43
  • \newdimen\tempadimen \advance\tempadimen by 10pt ?? – cfr Dec 24 '15 at 02:46
  • @cfr: what do you mean? – Igor Liferenko Dec 24 '15 at 02:47
  • In your example, the change is global. Doesn't that do horrible things? If you do something like the above afterwards, you get errors. Could you clarify in what sense this works? I don't see any difference in the PDF. What is supposed to happen when I copy the line? I just get 'hello' as I would anyway. – cfr Dec 24 '15 at 02:53
  • @cfr: copy text from 'o' to beginning of line and insert into a text editor - it must appear with leading spaces, like \ \ \ \ \ \ \ \ \ hello, instead of hello, as it would by default. Sorry, I do not know how to format spaces properly so that they will not disappear – Igor Liferenko Dec 24 '15 at 03:02
  • @cfr: as for global change, I specifically use a font which is not used otherwise. To be more clean, it is possible to create e.g. mytt10 and change encoding of that and use it in \myfont – Igor Liferenko Dec 24 '15 at 03:06
  • So you are typing \ \ \ and not in your source .tex document? – cfr Dec 24 '15 at 03:15
  • Even if I put that in the source, copying the line back to a text editor, I still get the first character as h. However, this is probably dependent on the PDF viewer. I think they likely deal differently with spaces. – cfr Dec 24 '15 at 03:18
  • @cfr: the example from the question should be taken literally in the source .tex document. The backslashes are for the forum markup. – Igor Liferenko Dec 24 '15 at 03:19
  • Neither makes any difference to what gets copied back. As I say, I suspect this depends on your viewer. Obviously it works in yours and not in mine. – cfr Dec 24 '15 at 03:21
  • @cfr: I read on forums that non-breakable-space (U+00A0) is copied in all viewers correctly. Anyway, in atril viewer it works correctly. – Igor Liferenko Dec 24 '15 at 03:33
  • It doesn't work in Okular. (So likely not other poppler-based viewers.) I doubt you are actually getting that character.... – cfr Dec 24 '15 at 03:52
  • @cfr: my test.pdf is at http://expirebox.com/download/e0c3d1d21ae16d7eb2726a6b49a21ef9.html - please check if it works with your viewer. from my viewer the spaces are copied as U+0020, which is the needed behavior – Igor Liferenko Dec 24 '15 at 06:11
  • @cfr: I checked it with okular - indeed the leading spaces are not copied, but in atril they are – Igor Liferenko Dec 24 '15 at 06:29
  • It clearly depends on the PDF viewer. PDFKit based ones (on Mac OS X) don't copy spaces. Adobe Reader does. – egreg Dec 24 '15 at 10:42

2 Answers

You could also try \pdffakespace and \pdfinterwordspaceon. However, while the "f"s in the following example copy fine from the PDF, spaces are sometimes missing, or only one is copied instead of several, and 00A0 is copied as 0020. I don't think TeX is removing them, since it also happens with boxes between the spaces. So IMHO the PDF viewer is trying to be too intelligent.

\documentclass{article}
\begin{document}

\pdfglyphtounicode{space}{0066} %f to see it better
{\obeyspaces\gdef {~\pdffakespace}}

hello\pdffakespace hello

\obeyspaces 
x       hello

        g
\end{document}

This copies then as

hellofhello
x f f f f f f fhello
f f f f f f f fg
Ulrike Fischer

First I tried compiling your example with \pdfcompresslevel=0 and I got

BT
/F51 9.9626 Tf 91.925 759.927 Td [(~~~~~~~~~)]TJ/F30 9.9626 Tf 47.074 0 Td [(hello)]TJ/F1 9.9626 Tf 164.51 -654.747 Td [(1)]TJ
ET

As you see, the PDF file contains the repeated tilde, but the font resource has the modified encoding and so this tilde will appear as something else to the PDF viewer.
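On the metrics question: the horizontal offset 47.074 in the second Td above is the distance TeX advanced over the nine characters set in \myfont, and TeX takes that advance from the .tfm file, whatever glyph name the .enc file puts in the slot. A minimal sketch for checking this, assuming rm-lmtt10.tfm is installed and the file is run with pdftex (the 5.25pt figure below is my assumption about the font, not something I measured):

```tex
% Plain TeX sketch (run with pdftex); assumes rm-lmtt10.tfm is installed.
\font\myfont=rm-lmtt10
% Width of slot 126 (the slot addressed by \char`~), as read from the .tfm:
\setbox0=\hbox{\myfont\char`\~}
\dimen0=9\wd0            % nine leading spaces, in TeX points
\dimen2=0.99626\dimen0   % scaled by 72/72.27, so the number can be read as bp
\message{9 x slot 126 = \the\dimen0\space (TeX pt);
         scaled: \the\dimen2\space (read the number as bp)}
\bye
```

If the slot is 5.25pt wide, as I would expect for a 10pt monospaced Latin Modern font, nine of them come to 47.25pt, which is roughly 47.07bp and so would agree with the Td offset in the stream. Changing /tilde to /nbspace in the .enc file changes which glyph is drawn and how the viewer interprets the slot, but not the width TeX uses for it.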

I then went a step further: I also changed /i into /nbspace and modified your example file into

\pdfcompresslevel=0
\pdfmapline{=rm-lmtt10 LMMono10-Regular " enclmrmtt ReEncodeFont " <lm-rmtt-mod.enc <lmtt10.pfb}

\font\myfont=rm-lmtt10
\def\myspace{{\myfont\char`~}}
{\obeyspaces\global\def {\myspace}}

\font\tentt=cmtt10
\tentt
\obeyspaces
         hello

\def\myspace{{\myfont\char`i}}%
         hello

\bye

(note that lm-rmtt-mod.enc is the modified .enc file, because I didn't want to tamper with default files). Here's what I get in the PDF file

BT
/F51 9.9626 Tf 91.925 759.927 Td [(~~~~~~~~~)]TJ/F30 9.9626 Tf 47.074 0 Td [(hello)]TJ/F51 9.9626 Tf -47.074 -11.955 Td [(iiiiiiiii)]TJ/F30 9.9626 Tf 47.074 0 Td [(hello)]TJ/F1 9.9626 Tf 164.51 -642.792 Td [(1)]TJ
ET

Here's what I see when I select all the text in Adobe Reader

[screenshot]

which shows all spaces are “seen”.

If I do the same “select all” operation on Skim (an Apple PDFKit based previewer) I see instead

[screenshot]

No spaces are copied. Therefore, the possibility of copying the spaces depends on the previewer.

Note however that, if I perform “copy” from Adobe Reader on the top line and then paste in an editor window, I get

[screenshot]

What's “seen” are spaces, but the underlying text still has tildes.

egreg
  • Yes. Okular (and so probably Evince and other poppler-based viewers) behaves as Skim. – cfr Dec 25 '15 at 01:35