After rendering a document containing this code block

\begin{verbatim}
if [ ! -d .git ]; then git init; fi         # Initialises a new Git repository, if doesn't already exist.
if [ ! -f README.md ]; touch README.md; fi  # Creates an empty README.md file,  if doesn't already exist.
git add -A                                  # Stages any files/directories present, in preparation to commit them to local Git repo.
git commit -m 'first commit'                # Commits the staged files/dirs to the local Git repo.
git remote add origin GIT_REMOTE_URL        # Adds the GitHub repo created above as a "Git remote" with the alias "origin".
\end{verbatim}

to PDF using pdflatex, and viewing the PDF in Apple's Preview application, the rendered code block looked exactly as expected:

if [ ! -d .git ]; then git init; fi         # Initialises a new Git repository, if doesn't already exist.
if [ ! -f README.md ]; touch README.md; fi  # Creates an empty README.md file,  if doesn't already exist.
git add -A                                  # Stages any files/directories present, in preparation to commit them to local Git repo.
git commit -m 'first commit'                # Commits the staged files/dirs to the local Git repo.
git remote add origin GIT_REMOTE_URL        # Adds the GitHub repo created above as a "Git remote" with the alias "origin".

However, I then tried copying and pasting the rendered code block from the PDF into a text file. I had been expecting the result to be exactly like the original, but instead it was as follows:

if[!-d.git];thengitinit;fi if [ ! -f README.md ]; touch README.md; fi git add -A git commit -m 'first commit' git remote add origin GIT_REMOTE_URL
# Initialises a # Creates an em # Stages any fi # Commits the s # Adds the GitH

Obviously, this is rather different to the original!

Using Acrobat Professional 8, the result is also wrong, but in a different way:

if [ ! -d .git ]; then git init; fi # Initialises a if [ ! -f README.md ]; touch README.md; fi # Creates an emgit add -A # Stages any figit commit -m 'first commit' # Commits the sgit remote add origin GIT_REMOTE_URL # Adds the GitHEN

My question is: is there a way to ensure that the original contents of every \begin{verbatim}...\end{verbatim} environment are preserved in the PDF output, not only as seen by the eye but also as "seen" by the text selection tools in PDF viewing software?

  • It seems the "listings" package is similarly flawed. –  Jul 04 '12 at 22:47
  • Apparently even with diligent attention to settings, the "listings" package still does not preserve all whitespace via copy/paste! Obviously I'm concerned with the "verbatim" environment rather than "listings", but if the latter can't do it then I wonder what hope there is for the less sophisticated former. –  Jul 04 '12 at 23:38
  • Note once again the answers show the problem is essentially external to the code you showed, it depends on the current font size and page width. Please always provide full documents when posting a question. – David Carlisle Jul 05 '12 at 08:38
  • @DavidCarlisle, interesting point, and adds to my increasingly strong conviction that the "verbatim" environment is utterly misnamed! –  Jul 05 '12 at 10:08
  • In that case the truncation of long lines is not done by verbatim, or even by TeX; it is a feature of the viewer you are using, which does not let you select text that is in the PDF but outside the page area. – David Carlisle Jul 05 '12 at 10:13
  • Sorry if I'm being dense, David, but which case are you referring to? –  Jul 05 '12 at 10:23
  • Not directly related to your question, but I usually use the upquote package which gives upright-quote and grave-accent glyphs in verbatim. – Paulo Cereda Jul 05 '12 at 10:30
  • @PauloCereda, yes, I'm doing that. I still think it's odd that the need for upquote isn't regarded as a bug. –  Jul 05 '12 at 10:32
  • Note that TeX is a typesetting system and PDF a "graphical" page description format. The ability of getting your original text back from the typeset result will always be rather limited. This is different from HTML for instance, where the browser is directly displaying marked-up text and not the microtypographic result of a complex rendering process. There might be ways to amend this somewhat, but however, it's not the fault of the typesetting system! How about adding the original source to the PDF as an attachment? That way people could just get the source file. – Stephan Lehmke Jul 05 '12 at 17:08
  • @StephanLehmke The whole point of a PDF is to be, as its correct name suggests, a portable document format. There is no need for the output of the verbatim environment to be a "microtypographic result of a complex rendering process": it needs to do little more than pick a monospaced font and a starting co-ordinate, and then lay the characters down in sequence. For a document containing dozens or hundreds of code snippets closely referenced in the text, it would be maddening if they were provided as attachments instead of as code blocks. –  Jul 05 '12 at 17:33
  • Of course you can insist on your point of view despite being told otherwise by a lot of people. PDF might have some features for accessibility (which are not supported by TeX out of the box), but the way you are interpreting portability is really stretching it very far. – Stephan Lehmke Jul 05 '12 at 17:42
  • It isn't stretching it far at all. Loads of document formats are capable of being viewed on a range of platforms and of representing text verbatim in a manner that also allows verbatim copying and pasting. I'm merely asking how to achieve that with this one. –  Jul 05 '12 at 17:45
  • (Specifically, using the native feature - i.e. the verbatim environment - which is ostensibly for this sort of purpose.) –  Jul 05 '12 at 17:53
  • What in particular do you mean by "document formats"? Other than PDF? Or PDF by other tools than TeX? – Stephan Lehmke Jul 05 '12 at 20:32
  • Other than PDF: certainly. PDF by tools other than TeX: maybe, but I haven't checked. –  Jul 05 '12 at 21:52
  • Sorry, this is getting very weird. I don't think it'll lead to a constructive solution to compare PDF to other document formats in this respect. Maybe you should present here a PDF (not created by TeX) which has the properties you desire, and then we can try to get TeX to do something similar. – Stephan Lehmke Jul 06 '12 at 03:45
  • If it seems weird, that's probably because we've been working from different assumptions. My approach is that this is all just software, so any computationally feasible result should be possible; but that if for some reason PDFs or TeX are limited in a particular way that makes what I am asking for (i.e. genuinely verbatim text blocks) strictly impossible, then a knowledgeable user here will point out this limitation and explain why it is insurmountable. Unless that happens, I remain optimistic. –  Jul 06 '12 at 08:40
  • @Jubobs: Since the PDF viewers depend on the contents of the PDF file, and the PDF file is generated by (...)TeX, I do not see that this is off-topic. – Heiko Oberdiek Mar 18 '14 at 16:48
  • @HeikoOberdiek The last paragraph of the question prompted my vote to close, but perhaps you have a solution. If you disagree with closing this question, you can always vote to reopen. – jub0bs Mar 18 '14 at 17:30
  • @Jubobs: The last paragraph of the question is completely on-topic. Closing a question should not depend on having a (good/easy/...) solution. – Heiko Oberdiek Mar 18 '14 at 17:37
  • @HeikoOberdiek The question seems to me impossible (not just difficult) to answer on the TeX side, because copy & paste behaviour varies widely from one viewer to another. I still stand by my closing vote, but I invite you to vote to reopen if you want. – jub0bs Mar 18 '14 at 17:52
  • @Jubobs: AFAIK the main point is doing it right on the TeX side. For example, TeX uses skips instead of space characters (an answer solves this by using package accsupp). Other issues are font encodings, mapping to Unicode (e.g. package cmap) and others. All of them are on-topic. – Heiko Oberdiek Mar 18 '14 at 18:01
  • @HeikoOberdiek Alright; I'll vote to reopen, then. – jub0bs Mar 18 '14 at 18:27
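
The font-related suggestions in the comments above (upquote for upright quotes, cmap for the mapping to Unicode, an explicit font encoding) could be collected in a preamble along the following lines. This is only a sketch: the choice of the T1 encoding and the one-line sample are assumptions, not taken from the question, and none of these packages changes how spaces are written to the PDF, so it concerns the copying of glyphs only.

```latex
\documentclass{article}
\usepackage{cmap}          % load before fontenc so the Unicode maps are picked up
\usepackage[T1]{fontenc}   % assumption: T1 font encoding
\usepackage{upquote}       % upright ' and ` inside verbatim
\begin{document}
\begin{verbatim}
git commit -m 'first commit'
\end{verbatim}
\end{document}
```

Whether the quote characters then paste correctly still depends on the viewer, as the rest of this thread shows.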

2 Answers

Since you mentioned that page on using the listings package with PDF tagging, I thought I'd mention that I'd had a little bit more luck with getting the spacing to work -- see my PDF. As others pointed out, you still need to make sure it doesn't break the hboxes (line length). In this case I've split up the comments and made the page landscape. The method below also works with file inclusion by using \lstinputlisting{script.txt} instead of \begin{lstlisting}.

Since I am still an amateur at this kind of LaTeX voodoo, it may be that someone can make some more improvements, but I've made sure this method works with all printable ASCII characters. There are a couple of things which are not perfect, but they may not be much of a problem, or they may not be particularly difficult to fix (by someone more experienced):

  • I didn't test it with the vast number of possible listings options, so I don't know if it plays nicely or not.
  • I went to quite some effort to ensure that all special printable ASCII characters were handled properly, but I can't make any promises.
  • Handling spacing was really painful, and in the end all I could do to get it working was to replace every two spaces with a small dot from textcomp which is displayed in the PDF (it still copies as space though!) and hope that it's not too distracting. It may be possible to put some colour formatting in there to make it vanish; I don't really know. The thing is, you're only really ever likely to see this for indented code; normal text doesn't tend to have two spaces in a row.
  • I hear you ask: since it only replaces two spaces in a row, what happens to the other spaces? Well, since it replaces two spaces at a time, even numbers of spaces are no problem. What about single spaces though? Most single spaces are not replaced but are preserved fine in the output. The two cases where they are not preserved are at the very end or beginning of a line. That is, a line which ends with an odd number of spaces will lose one at the end, and a line that begins with a single space (followed immediately by a printable character) will lose one at the start.
  • Edit: Oh, I forgot to mention; I didn't figure out a way to make it copy blank lines. It's still a lot better than no copy & paste though.

\documentclass{article}
\usepackage[landscape]{geometry}
\usepackage{listings}
\usepackage{textcomp}
\usepackage[space=true]{accsupp}

\newcommand{\pdfactualhex}[3]{\newcommand{#1}{%
\BeginAccSupp{method=hex,ActualText=#2}#3\EndAccSupp{}}}

\pdfactualhex{\pdfactualdspace}{2020}{\textperiodcentered\textperiodcentered}
\pdfactualhex{\pdfactualsquote}{27}{'}
\pdfactualhex{\pdfactualbtick}{60}{`}

\lstset{tabsize=4,basicstyle=\ttfamily,columns=flexible,emptylines=10000}
\lstset{literate={'}{\pdfactualsquote}1
                 {`}{\pdfactualbtick}1
                 {\ \ }{\pdfactualdspace}2
}

\begin{document}
\begin{lstlisting}
if [ ! -d .git ]; then git init; fi         # Initialises a new Git repository,
                                            # if doesn't already exist.

if [ ! -f README.md ]; touch README.md; fi  # Creates an empty README.md file,
                                            # if doesn't already exist.

git add -A                                  # Stages any files/directories
                                            # present, in preparation to commit
                                            # them to local Git repo.

git commit -m 'first commit'                # Commits the staged files/dirs
                                            # to the local Git repo.

git remote add origin GIT_REMOTE_URL        # Adds the GitHub repo created
                                            # above as a "Git remote" with the
                                            # alias "origin".
\end{lstlisting}
\end{document}

Here's a link to my PDF output: http://goo.gl/9Ds75

codebeard
  • Thanks. This doesn't really answer my question, but it might help people who are using the "listings" package. –  Jul 05 '12 at 12:37
  • The options I set above should make it pretty much exactly the same as the default verbatim package. – codebeard Jul 05 '12 at 12:40
  • One way or another, you're going to need to include some other packages if you want to make this work. – codebeard Jul 05 '12 at 12:42
  • Copying and pasting from your PDF (as viewed in Safari 5) to TextEdit yields a completely unusable result, I'm afraid. –  Jul 05 '12 at 15:11
  • Works fine in my PDF viewer... How does it fare in Adobe? Do you have an example of a typeset code block which does copy and paste properly? – codebeard Jul 05 '12 at 21:50
  • At least in foxit reader both the two spaces and the dots are copied. – remmy Feb 10 '13 at 20:33
  • It doesn't work in Firefox' PDF reader, but it does kind of work in Evince, if you select the right parts. – Clément Aug 07 '16 at 21:26

The contents of the verbatim environment are too wide for the page, and the lines are truncated before the CR/LF. In the PDF file it ends up as one paragraph. Try the following with shorter lines and you will see that it is just fine:

\documentclass{article}
\usepackage{verbatim}
\begin{document}
\begin{verbatim}
if [ ! -d .git ]; then git init; fi         #
if [ ! -f README.md ]; touch README.md; fi  #
git add -A                                  #
git commit -m 'first commit'                #
git remote add origin GIT_REMOTE_URL        #
\end{verbatim}
\end{document}
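
If the full comments have to stay on their lines, another option is to give the lines more room instead of shortening them. The following is a minimal sketch under assumptions that are not part of the original question: it uses the fancyvrb package's Verbatim environment with a smaller font, and a landscape page with 2cm margins via geometry. It only addresses lines running past the text block; it does not change how spaces are stored in the PDF.

```latex
\documentclass{article}
\usepackage[landscape,margin=2cm]{geometry} % assumption: wider text block
\usepackage{fancyvrb}                       % provides the Verbatim environment
\begin{document}
% fontsize is a fancyvrb key; \footnotesize is an arbitrary choice here
\begin{Verbatim}[fontsize=\footnotesize]
if [ ! -d .git ]; then git init; fi         # Initialises a new Git repository, if doesn't already exist.
git add -A                                  # Stages any files/directories present, in preparation to commit them to local Git repo.
\end{Verbatim}
\end{document}
```

Check the log for overfull \hbox warnings after compiling; if none are reported for the listing, the lines fit inside the text block.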

Another problem that is often overlooked when copying and pasting verbatim code is that the minus symbol "-" is sometimes not a character but a rule (depending on the font). To ensure that it is the right character, use the definition of the verbatim package to define your own verbfont for listings

\makeatletter
\newcommand*\verbfont{\normalfont\ttfamily
    \hyphenchar\font\m@ne
    \@noligs}
\makeatother

\usepackage{listings}
\lstset{basicstyle  = \verbfont}
Danie Els
  • Shortening the lines as you suggested doesn't solve the problem. As copied from Preview, the code ends up all on one line; and as copied from Acrobat Pro, newlines are preserved but every other bit of whitespace is replaced with a single space. I'm afraid I don't understand the relevance of your second point, since I'm not using the "listings" package. –  Jul 05 '12 at 10:20
  • @sampablokuper: As far as I know, TeX does not output a space character for spacing in PDF (or DVI?) but positions words according to coordinates. Therefore space characters are not available for copying. Regarding the second point about the \verbfont command: if you do not load the verbatim package, your "-" characters can also disappear during copying. In your original question (before editing it) you also mentioned "listings". – Danie Els Jul 05 '12 at 11:36
  • Danie Els, thanks, but unless I'm mistaken, I only mentioned "listings" in the comments below my question. As for the positioning, surely verbatim ought only to position the first character of the block by co-ordinates. To do anything else would be buggy and not "verbatim" at all. –  Jul 05 '12 at 12:34
  • @sampablokuper You are forgetting that TeX is a typesetting system, and in typesetting 'space' is the invisible material between glyphs, not a glyph in its own right. It's also primarily aimed at print output (Knuth still prints his slides off to acetate to do a talk!). If you want to include a copy of a file with a PDF, you can attach the file to the PDF itself. – Joseph Wright Jul 05 '12 at 16:56
  • @JosephWright I appreciate that TeX is a typesetting system aimed primarily at print output. However, (a) it is neither novel nor esoteric to read PDFs onscreen instead of as printouts (especially in contexts like computer system tutorials where dozens or even hundreds of code snippets are to be copy/pasted), (b) LaTeX should facilitate the creation of suitable PDFs, and (c) I think perhaps you are forgetting that I'm talking about the verbatim environment rather than about flowing or body text; I totally appreciate that in the latter context, normal typesetting rules apply. –  Jul 05 '12 at 17:11
  • @sampablokuper Whatever you might see in a verbatim environment, please accept that from a typesetting system's POV, only the typesetting capabilities of verbatim are relevant. – Stephan Lehmke Jul 05 '12 at 17:32
  • @StephanLehmke my point is that other than using a monospaced font, LaTeX does not seem to be typesetting verbatim environments correctly. I.e. it is treating them as though various normal typesetting rules apply, when in fact they don't. My question can be restated as: is there a way to tell LaTeX to apply only appropriate typesetting rules to verbatim environments? –  Jul 05 '12 at 17:39
  • @sampablokuper: Read the documentation of the verbatim package to get the typesetting rules. Go look inside the Latex and verbatim package code how the verbatim* environment is defined where spaces are shown as characters. You can go and redefine the commands on your own to make space glyphs invisible. – Danie Els Jul 05 '12 at 18:46
  • @DanieEls if I already knew how to do all that, I wouldn't be here asking how to do it :) –  Jul 05 '12 at 19:02
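
The attachment route suggested in the comments could look roughly like this under pdfLaTeX. It is a sketch, not a fix for copy and paste: the file name setup.sh is hypothetical, the filecontents* block is there only to make the example self-contained, and the embedfile package is one of several ways to attach a file. The same file is typeset with \verbatiminput from the verbatim package, so the printed listing and the attached source cannot drift apart.

```latex
% Writes the hypothetical file setup.sh if it does not already exist
\begin{filecontents*}{setup.sh}
if [ ! -d .git ]; then git init; fi
git add -A
git commit -m 'first commit'
\end{filecontents*}
\documentclass{article}
\usepackage{verbatim}   % provides \verbatiminput
\usepackage{embedfile}  % provides \embedfile
\begin{document}
\embedfile{setup.sh}      % attach the untouched source file to the PDF
\verbatiminput{setup.sh}  % typeset the same file for reading
\end{document}
```

Readers would then retrieve the script from the viewer's attachments panel instead of selecting text on the page, which sidesteps the whitespace problem without solving it.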