44

I need to include a couple of Python code listings, where the indentation of the lines (using some number of spaces) is significant. I would like the code listings to be copyable, so the spaces at the beginning of each line need to be copied along with the text.

This question has been asked in various ways before (e.g. "How to make listings code correct copyable from PDF and with hyperlink", or "How can I make source code included with minted copyable?"). Those questions focus on making line numbers uncopyable, though.

Making the spaces at the beginning of a line copyable seems to be harder: "I am not sure it is possible to specify in the PDF (at least in a viewer-independent way) that the indentation should be copied too" (CyberSingularity). In "How to make listings code indentation remain unchanged when copied from PDF?", Philippe Goutet suggests a solution (turning the spaces into visible spaces, and coloring them in the background color so that they appear invisible) that works in Acrobat Reader, but not in all readers. He says "It works under Acrobat Reader and it's extremely pleasant to be able to quickly copy/paste code without problem (perhaps the problem can be circumvented by writing direct PDF code to tell that it's a space, I've never had the time to try)".

Is it possible to produce a PDF with a code listing with copyable real spaces at the beginning of a line?

Minimal example: The line return x should start with four spaces.

\documentclass{article}

\begin{document}
\begin{verbatim}
def myfunction(x):
    return x
\end{verbatim}
\end{document}

I know that I could attach the code to the PDF as a file, but that's not what I want.

Jake
  • 232,450
  • 1
    To add an example, my experiments have shown that simply converting to text using pdftotext produces a non-linear relationship between the number of spaces at the start of a line that TeX says there are and the number of spaces produced after pdftotext. It can also vary depending on the indentation of surrounding lines. When I looked at the PDF produced by TeX it appeared that spaces are not characters but are literally gaps so it is up to the viewer to interpret them as a given number of characters. – Andrew Stacey Dec 03 '13 at 12:40
  • 1
    This seems to be a viewer issue rather than a problem with the PDF itself: if I cut your example from xpdf the spaces are preserved, but they are lost from acrobat. I think basically you need to use \char32 rather than a space, so that TeX puts in a character rather than its inter-word skip; then you need a character from some font (any font) that looks white but that acrobat doesn't drop. I failed in that last bit without using explicit color: I either see a visible character or acrobat drops it on copy. \makeatletter\def\@xobeysp{\textcolor{white}{\char32}}\makeatother works for me in xpdf and acrobat – David Carlisle Dec 03 '13 at 15:02
  • 1
    @DavidCarlisle If I use your trick on Jake's example, compile the code, copy the output from Mac Preview, and then paste it in MacVim, I get visible-space characters instead of spaces. – jub0bs Dec 03 '13 at 17:42
  • I typically just attach the code as an attachment. I use ConTeXt, but I believe that there are LaTeX packages for attachment, and it should be possible to interface them with the verbatim environment – Aditya Dec 03 '13 at 18:35
  • @Aditya: Thanks, I'm aware of that (see last sentence of my question), but especially for short code snippets, I'd like to avoid that route. – Jake Dec 03 '13 at 18:37
  • @Jubobs which character exactly (ie what byte stream do you get) ? the "visible space" character in cmtt is character32 which is ascii space and renders as such in any sanely encoded system Unicode doesn't have a visible space character. – David Carlisle Dec 03 '13 at 19:00
  • @cgnieder: Thanks for the link! Unfortunately, that solution doesn't work with Acrobat Reader, and with Evince, I get two spaces instead of one. – Jake Dec 04 '13 at 08:05
  • @Jake Maybe the method from this answer could be a solution? – Stephan Lehmke Dec 06 '13 at 11:06
  • 1
    As a (discouraging) side note, I just tried to copy some code from some Adobe API manual. Same problem: indentation is lost :-( So if they can't get it right, maybe nobody can? (same on Linux and Windows with Acrobat Professional) – Stephan Lehmke Dec 06 '13 at 11:57
  • I can only wish you luck, but I think copying from Acrobat in particular and PDFs in general is awful. I know it's not what you're looking for, but for test purposes, what happens if you use a PDF viewer's save as - .txt function? – Chris H Dec 06 '13 at 12:26
  • I'm afraid your problems won't be over after you've resolved this. Acrobat likes to insert spaces between tokens, so (at least with my code-display set-up) fp.open("Name.txt") may become fp . open ( " Name . txt " ). Not only is this ugly, it opens the wrong filename! Do you have a solution to this? – alexis Dec 08 '13 at 12:02

2 Answers

16

(it seems this works everywhere apart from acrobat reader)

This is based on the example by @DavidCarlisle.

The cmtt visible space character seems to be labelled differently in different cmtt variants. For cm-super (which is loaded here when I use \usepackage[T1]{fontenc}), the respective character is named uni2423 which seems to cause problems with evince when copying that character.

So I rigorously mapped every glyph that looks like a space to a no-break space (U+00A0).

You might want to restrict this to verbatim ;-)

\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage{color}
\input{glyphtounicode}
\pdfglyphtounicode{visiblespace}{A0}
\pdfglyphtounicode{blank}{A0}
\pdfglyphtounicode{visualspace}{A0}
\pdfglyphtounicode{uni2423}{A0}
\pdfgentounicode=1
\begin{document}\showoutput
\makeatletter
\def\@xobeysp{\textcolor{white}{\char32}}
\makeatother
\begin{verbatim}
def myfunction(x):
    return x
\end{verbatim}
\end{document}
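
The remark above about restricting this to verbatim could be realized, for instance, by making the redefinition inside a TeX group. This is only a sketch under that assumption: it reuses the `\@xobeysp` redefinition and the glyph mappings from the listing above, and the `\begingroup`/`\endgroup` wrapper is an illustration, not part of the original answer.

```latex
\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage{color}
\input{glyphtounicode}
\pdfglyphtounicode{visiblespace}{A0}
\pdfglyphtounicode{blank}{A0}
\pdfglyphtounicode{visualspace}{A0}
\pdfglyphtounicode{uni2423}{A0}
\pdfgentounicode=1
\begin{document}
% Keep the redefinition local: it is undone at \endgroup.
\begingroup
\makeatletter
\def\@xobeysp{\textcolor{white}{\char32}}
\makeatother
\begin{verbatim}
def myfunction(x):
    return x
\end{verbatim}
\endgroup
% Outside the group, \verb and verbatim use the default space handling again.
Inline code such as \verb|return x| is unaffected here.
\end{document}
```

The glyph-to-Unicode mappings stay global in this sketch, so only the white visible-space substitution is limited to the one listing.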

I am inclined to consider the fact that apparently no (consecutive or beginning-of-line) spaces can be copied from Acrobat a bug.

Or is this specified anywhere?

At least it's completely the same with official Adobe documents like the PDF Reference.

So I consider this answer valid no matter what :-)

  • Ah, that looks great, but unfortunately, Acrobat Reader seems to copy the actual U+3000 instead of U+0020, which breaks the copied code... – Jake Dec 06 '13 at 10:49
  • @Jake That's a pity. In that case I'm out of ideas, as using 20 or A0 leads to nothing at all being copied :-( – Stephan Lehmke Dec 06 '13 at 10:50
  • 1
    for me xpdf drops all spaces and acrobat reader puts in U+3000 so it looks OK but doesn't work in code. – David Carlisle Dec 06 '13 at 10:51
  • @Jake Hm. I swear I tried using A0 but now it seems to work. Please try again. – Stephan Lehmke Dec 06 '13 at 11:20
  • @StephanLehmke: *sad sigh* nope, unfortunately Acrobat swallows the spaces. Evince works fine. – Jake Dec 06 '13 at 11:23
  • @Jake I think I'm getting hallucinations. I tested it with acroread, but now I can't reproduce it :-( – Stephan Lehmke Dec 06 '13 at 11:26
  • With Stephan's solution, the character that gets copied from Preview and then pasted (in Matlab, MacVim, Emacs) is U+0020. This may be off-topic, but what about newlines? Preview doesn't seem to copy blank lines at all... – jub0bs Dec 06 '13 at 11:33
  • 1
    This is really tough. acroread seems to strictly refuse to copy anything which is even remotely a space character :-( – Stephan Lehmke Dec 06 '13 at 11:43
  • 2
    @Jubobs This shouldn't be much of a problem. verbatim puts \null in a blank line; one would only need to add some character here. But it's incredible acroread can't be convinced to copy spaces at all :-( – Stephan Lehmke Dec 06 '13 at 11:45
  • with okular on an old fedora 13 the text pasted to emacs uses in my testing 0xA0 NO-BREAK SPACE, not the standard space. –  May 15 '14 at 16:27
12

The following doesn't work in evince, see the discussion in comments below

As noted in comments I suspect using colour is the most reliable way:

\documentclass{article}
\usepackage[T1]{fontenc}
\usepackage{color}
\begin{document}\showoutput
\makeatletter
\def\@xobeysp{\textcolor{white}{\char32}}
\makeatother
\begin{verbatim}
def myfunction(x):
    return x
\end{verbatim}
\end{document}

If I process with pdflatex and cut from acrobat I get:

def myfunction(x):
    return x
David Carlisle
  • 757,742
  • 2
    For me, that works with Acrobat, but not with Evince. Evince gives me ␣ instead of spaces... – Jake Dec 04 '13 at 08:06
  • @Jake surprising. The interesting question though (which I asked in the comments on the question without reply) is: is it U+0020 and your editor has "helpfully" put the text in cmtt, which has a _ glyph in the space slot, or is it U+2423, in which case something has added a mapping to that character (presumably in the font itself, since the cmap package is not loaded)? In any case it could be tracked down if you could say what it is. From acrobat and xpdf I get space characters U+0020 for the indentation. – David Carlisle Dec 04 '13 at 10:01
  • It's U+2423.... – Jake Dec 04 '13 at 10:52
  • @Jake where the heck did that come from?:-) pdftex itself knows nothing about slots above 256:-) It must have been added to a table within the font. You could try locally within the definition changing font to pick up one that doesn't map the space slot. I'll look later. – David Carlisle Dec 04 '13 at 10:58
  • 2
    @Jake I installed evince on windows and can confirm. I give up, I can't make it work in evince. Other readers don't see the unicode mapping in the font so add a real space, so it works; evince is too clever for its own good. If you give it a character with a real space but from a font without the map to U+2423, it drops it on copy; if you give it a non-space character coloured white, it drops the colour on copy. I could delete my answer as basically it is just what you said didn't work, but perhaps it's best to leave it in case anyone has any better ideas. I'll make it cw as no answer – David Carlisle Dec 04 '13 at 15:00
  • Did you try (as last resort) explicitly setting space as replacement text with the accsupp package? – Stephan Lehmke Dec 04 '13 at 15:08
  • @StephanLehmke No I returned to the day job:-) – David Carlisle Dec 04 '13 at 15:11
  • 1
    btw, I see this as caused by the \usepackage[T1]{fontenc} above. For me, that loads cm super fonts, and if I open the tt font in fontforge, I get that character labeled as "uni2423", so possibly that's where evince is taking it from... Funnily, if I say \pdfgentounicode=1, then acroread also copies to U+2423 – Stephan Lehmke Dec 06 '13 at 10:35
  • @StephanLehmke ah well spotted: (I had tried OT1 and T1 and various other combinations:-) – David Carlisle Dec 06 '13 at 10:43
  • @DavidCarlisle Sorry for not replying about -_-' I'll get back to you shortly. – jub0bs Dec 06 '13 at 10:57
  • @DavidCarlisle I can confirm that, in my case, the character that gets pasted (in Emacs, or MacVim) is also U+2423, not U+0020. – jub0bs Dec 06 '13 at 11:27
  • @Jubobs shame it ends in emacs as U+2423 (it doesn't matter what happens in vi(m) :-) – David Carlisle Dec 06 '13 at 11:31
  • @DavidCarlisle Fanning the flames of the flame war, are we? ;) I'm not partisan. I should have mentioned my viewer is Preview, not Evince, by the way. – jub0bs Dec 06 '13 at 11:34
  • @Jubobs I think you did mention preview. But either way it seems the situation is hopeless, random hacks work for random subsets of common pdf viewers, but nothing seems to work in general. – David Carlisle Dec 06 '13 at 11:37
  • 1
    @Jubobs from okular too (on an old Fedora 13) the pasted character in emacs is the U+2423 –  May 15 '14 at 16:35
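
Stephan Lehmke's accsupp suggestion in the comments above was not tried in this thread. A minimal sketch of what it might look like follows, assuming the accsupp package's `\BeginAccSupp`/`\EndAccSupp` pair with the `ActualText` key. Whether a given viewer (Acrobat, Evince, Preview, xpdf, okular) honours the replacement text when copying is untested here and may well vary, given the results reported above.

```latex
\documentclass{article}
\usepackage{accsupp}
\begin{document}
% Keep the redefinition local to this one listing.
\begingroup
\makeatletter
% Assumption: tag each verbatim space with ActualText U+0020, so that
% viewers which support /ActualText might copy a real space.
\def\@xobeysp{%
  \leavevmode
  \BeginAccSupp{method=hex,unicode,ActualText=0020}%
  \nobreak\ % an unbreakable inter-word space as the visible content
  \EndAccSupp{}%
}
\makeatother
\begin{verbatim}
def myfunction(x):
    return x
\end{verbatim}
\endgroup
\end{document}
```

This sketch keeps an ordinary inter-word space as the visible content. Replacing it with the white `\char32` from the answer above is another option, since some viewers may ignore marked content that contains no glyph.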