
I have a project that involves converting a large batch of multi-page PDFs into single-page PDFs.

Each PDF file has anywhere from 1 to 16 pages, and the pages have varying heights. For example, I have a two-page PDF that looks like this – my goal is to stitch or impose the two pages together vertically to form a single PDF page:

What I want to happen:

Before            After
┌─────┐           ┌─────┐
│  A  │           │  A  │
│  A  │           │  A  │
│  A  │  ----->   │  A  │
└─────┘           │     │
┌─────┐           │  B  │
│  B  │           └─────┘
└─────┘

I set up a Bash script that loops through all of the PDF files in a directory, and for each one, it runs a pdflatex command:

pdflatex "\def\pdfpath{Example.pdf} \input{stitch_pages.tex}"

Which runs this LaTeX script:

\documentclass{article}
\usepackage{pdfpages}
\usepackage[paperwidth=12in, paperheight=200in]{geometry} % Bigger than the combined width/height of the PDF pages (will be cropped after)
\pagestyle{plain}
\begin{document}
      \includepdfmerge[nup=1x16,          % Assuming there are no PDFs with more than 16 pages
                       noautoscale=true,
                       delta=0 0,
                       frame=true]
                       {\pdfpath, -}      % Path is passed in when running from the command line
\end{document}

As you can see above, I set a large paper width and height so that the PDF pages are not resized. The excess margins around the edges are trimmed off in an additional step, using the Python tool pdfCropMargins.

Unfortunately, after the LaTeX script runs, I end up with a single-page PDF that looks like this – with a large margin between the two pages that can't be cropped away:

What actually happens:

Before            After
┌─────┐           ┌─────┐
│  A  │           │  A  │
│  A  │           │  A  │
│  A  │  ----->   │  A  │
└─────┘           │     │
┌─────┐           │     │
│  B  │           │     │
└─────┘           │  B  │
                  └─────┘

I found a similar question here – Stitch differently-sized pages together – but the proposed solution, using delta, seems to require the pages to be known, fixed sizes. In my case, I don't know how tall any of the pages are for a given PDF in advance. Is it possible to prevent margins from being added between the pages in the first place, or to dynamically calculate what the margin would be and subtract it, or to vertically align the pages to the top of their "box" when combining them (which would cause the large margin above page B to move below)?

This is my first time using pdflatex, pdfpages, and LaTeX in general, so it's possible I'm missing something simple. Non-LaTeX suggestions for how to do this are also welcome, if anyone has any expertise (LaTeX is the most promising command-line option I've found so far).

2 Answers


By default, pdfpages determines the width and height of the first page and scales all further pages so that they fit into a rectangle of that width and height. With the option pagetemplate you can tell pdfpages to take the width and height from another page, and with the option templatesize you can even define the width and height explicitly. In the end, though, all pages are scaled to fit into the same rectangle, given by one (and only one) width/height pair. It is not possible to specify a separate width/height pair for each page.
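As a rough illustration of the two options just mentioned (a sketch only; file.pdf, the page number, and the dimensions are placeholders, not values from the question):

```latex
\documentclass{article}
\usepackage{pdfpages}
\begin{document}
% Take the template size from page 2 instead of page 1
\includepdf[pages=-, nup=1x2, pagetemplate=2]{file.pdf}

% Or state the template size explicitly as {width}{height}
\includepdf[pages=-, nup=1x2, templatesize={6in}{4in}]{file.pdf}
\end{document}
```

In both calls a single width/height pair still applies to every inserted page, which is why neither option helps when the page heights vary.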

I'm afraid you have to use \includegraphics for your task and do page imposition manually.

\noindent
\parbox{\linewidth}{%
  % One \includegraphics per page, each followed by a line break
  \includegraphics[page=1, scale=.6]{file.pdf}\\
  \includegraphics[page=2, scale=.6]{file.pdf}\\
  \includegraphics[page=3, scale=.6]{file.pdf}}
  • Thank you! I ended up going with @Werner's solution, which also uses \includegraphics, but in a more automated way. The background information you provided about how pagetemplate and templatesize work in pdfpages was helpful, though. – Samuel Bradshaw Feb 04 '21 at 01:53

Here is a proposed solution:

  1. Use the same input as you currently have:

    pdflatex "\def\pdfpath{Example.pdf} \input{stitch_pages.tex}"
    
  2. Now, stitch_pages.tex will perform a number of tasks:

    • Read the first page of \pdfpath in order to determine the page width of the entire (resulting) document. This assumes that all the pages within \pdfpath have the same width.

    • Set the geometry of the resulting document to match that width and be overly tall (200in, say).

    • Remove all page margins (margin=0pt).

    • Load the entire \pdfpath to count the number of pages.

    • Remove all paragraph indentation (\setlength{\parindent}{0pt}).

    • Sequentially insert one page after another from \pdfpath into the document using \foreach (from pgffor)

Here is stitch_pages.tex:

\documentclass{article}

\usepackage{graphicx,pgffor}

% Retrieve input page width
\newlength{\inputpagewidth}
\settowidth{\inputpagewidth}{\includegraphics[page=1]{\pdfpath}}

% Set geometry according to document page width
\usepackage{geometry}
\geometry{
  paperwidth=\inputpagewidth,
  paperheight=200in,
  margin=0pt
}

% Retrieve number of pages in document
% https://tex.stackexchange.com/a/198117/5764
\pdfximage{\pdfpath}
\edef\numberofpages{\the\pdflastximagepages}

\setlength{\parindent}{0pt}% Remove paragraph indentation

\begin{document}

\foreach \curpage in {1,...,\numberofpages} {
  \includegraphics[page=\curpage]{\pdfpath}

}

\end{document}

You can then crop the surrounding whitespace using pdfcrop or some other, similar, software as part of your batch script.

Werner
  • This looks like it might work really well. A couple of questions: 1) What does \setlength{\parindent}{0pt} do – I didn't expect I'd need to do anything with paragraph indentation; does it serve a special purpose when combining pages? 2) What would be the best way to add, say, a 0.5in margin between the pages? – Samuel Bradshaw Feb 03 '21 at 02:40
  • @SamuelBradshaw: (1) With \geometry{...} I create a page that has no margins, and set each page from \pdfpath sequentially just like you would regular text. So, consider each page to be a paragraph which, by default, would be indented by \parindent. That's why I remove \parindent (by setting it to 0pt). (2) I used a blank line within the \foreach to denote a paragraph separation between each page. Instead, use \foreach \curpage in {1,...,\numberofpages} { \includegraphics[page=\curpage]{\pdfpath}\par\vspace{0.5in} } - an explicit \par followed by the desired \vspace{0.5in}. – Werner Feb 03 '21 at 02:52
  • This worked perfectly for me. Thank you!! – Samuel Bradshaw Feb 04 '21 at 01:50
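
For reference, the spacing variant described in the comments could be put together roughly as follows. This is only a sketch that combines stitch_pages.tex from the answer with the \par\vspace{0.5in} suggestion; the length name \pagegap is a hypothetical helper introduced here, not part of the original answer, and \pdfpath is still assumed to be defined on the command line.

```latex
\documentclass{article}
\usepackage{graphicx,pgffor}

% Width of the first input page (assumes all pages share this width)
\newlength{\inputpagewidth}
\settowidth{\inputpagewidth}{\includegraphics[page=1]{\pdfpath}}

% Overly tall page with no margins; cropped in a later step
\usepackage{geometry}
\geometry{paperwidth=\inputpagewidth, paperheight=200in, margin=0pt}

% Number of pages in the input (pdfTeX primitives, so run with pdflatex)
\pdfximage{\pdfpath}
\edef\numberofpages{\the\pdflastximagepages}

\setlength{\parindent}{0pt}% Remove paragraph indentation

% Hypothetical helper: vertical gap inserted after each page
\newlength{\pagegap}
\setlength{\pagegap}{0.5in}

\begin{document}
\foreach \curpage in {1,...,\numberofpages} {%
  \includegraphics[page=\curpage]{\pdfpath}\par\vspace{\pagegap}%
}
\end{document}
```

The gap is also added after the last page, but since it is plain whitespace it should disappear in the cropping step along with the rest of the unused page.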