
I have multiple single page PDFs I'd like to join together into a single file. (Obviously I mean the file format and not the probability distributions here!)

I'm aware of free online services such as this one and I have used them several times. On this occasion, however, I have info I'd prefer not to upload to a server I don't know anything about.

I know Mathematica can Import and Export PDFs, but is there any easy way to join them together? I've tried using Join, but it produces a single large page instead of a page 1, page 2, page 3 format. The file size also balloons: joining a 500 kB and a 208 kB PDF results in a 3,503 kB file.

Jens
fizzics
  • Why insist on using Mathematica, when there are tools like Multivalent? – J. M.'s missing motivation Oct 18 '12 at 10:03
  • @J.M. sometimes it's fun to stretch the expected domain of the system. It rarely results in pleasing performance, but it can teach you interesting things, and some day the application may even be practical. – Mr.Wizard Oct 18 '12 at 10:08
  • This is no job for Mathematica, really. You want to use Ghostscript, which is available for all systems and can be used like this: gswin32 -dNOPAUSE -sDEVICE=pdfwrite -sOutputFile=merged.pdf -dBATCH 1.pdf 2.pdf 3.pdf – halirutan Oct 18 '12 at 10:09
  • On Mac, use Preview, of course. But I like Mr Wizard's gung-ho attitude. – cormullion Oct 18 '12 at 10:13
  • Thanks all for the advice. I had a look at Multivalent, but it gets stuck at the splash screen. Our IT dept has our PCs locked down pretty tightly, so I tend to use Mathematica as my go-to program for all sorts of stuff, and it rarely lets me down. I'll have a look at Ghostscript too. Thanks again all. – fizzics Oct 18 '12 at 10:16
  • I've used PDF Split & Merge (http://www.pdfsam.org/); just don't get fooled by the ads that show up (even through Adblock Plus in Firefox). – tkott Oct 18 '12 at 15:05
  • I use pdftk: http://www.pdflabs.com/tools/pdftk-the-pdf-toolkit/ – s0rce Oct 18 '12 at 17:30
  • If you're on a Mac you can use Automator. – Jonathan Shock Jun 08 '13 at 10:06
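The command-line tools suggested in the comments can also be driven from a script. Below is a minimal Python sketch, assuming Ghostscript and/or pdftk are installed and on the PATH (on Windows the Ghostscript console executable is usually gswin64c or gswin32c rather than gs). The helper names are hypothetical. The Ghostscript command uses -sOutputFile, Ghostscript's parameter for naming the output file.

```python
import shutil
import subprocess


def ghostscript_merge_cmd(output, inputs, gs="gs"):
    """Build a Ghostscript command that concatenates `inputs` into `output`.

    Pass gs="gswin64c" (or "gswin32c") on Windows.
    """
    return [gs, "-dNOPAUSE", "-dBATCH", "-sDEVICE=pdfwrite",
            f"-sOutputFile={output}", *inputs]


def pdftk_merge_cmd(output, inputs, pdftk="pdftk"):
    """Build a pdftk command of the form: pdftk in1.pdf in2.pdf cat output out.pdf"""
    return [pdftk, *inputs, "cat", "output", output]


def run_if_available(cmd):
    """Run `cmd` only if its executable is on PATH; return True if it ran successfully."""
    if shutil.which(cmd[0]) is None:
        return False
    subprocess.run(cmd, check=True)  # raises CalledProcessError on a non-zero exit
    return True
```

For example, `run_if_available(ghostscript_merge_cmd("merged.pdf", ["1.pdf", "2.pdf", "3.pdf"]))` merges the three files if gs is found and returns False otherwise. Building the argument list separately from running it keeps the exact command easy to inspect before anything is executed.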

2 Answers


Update

As @VLC pointed out in a comment, just exporting a list of graphics does not work, since the braces and commas of the list are exported as well. Additionally, the PDF is one single page. Here is an updated approach, which takes all imported PDF pages and inserts them into a new notebook where the pages are separated by page breaks.

The resulting PDF does have multiple pages, but the page content is scaled and, unless turned off, the headers are printed into the PDF as well.

First, we simply import the "Pages" element from a file:

pages = 
  Import["http://amath.colorado.edu/documentation/LaTeX/Symbols.pdf", "Pages"];

Now we try to make a new PDF consisting of the original file appended to itself. For this I join the pages together, riffle a page-break cell between them, and create a hidden notebook, which is then exported to a PDF file:

nb = CreateDocument[
   Riffle[Join[pages, pages], Cell["", "PageBreak", PageBreakBelow -> True]],
   Visible -> False];
Export["tmp/test.pdf", nb];
NotebookClose[nb] (* close the hidden notebook once it has been exported *)

This works here (Mac OS X), but when you look closer at the created PDF, you see things like

[Image: excerpt from the exported PDF]

while in the original document this was typeset properly:

[Image: the same excerpt in the original PDF]

Also note that the resulting PDF is much bigger than two copies of the original file would have been! Input PDF file size: 256 kB, Mathematica output: 3.2 MB, Ghostscript output: 176 kB.

halirutan
  • +1 for doing the job and pointing out the limitation of the exercise! – chris Oct 18 '12 at 12:07
  • There are several problems with this solution: (a) the pdf is a single long page, (b) at the beginning and at the end the opening and closing brackets are printed, and (c) each segment has a comma printed on the right margin. – VLC Oct 18 '12 at 12:22
  • @VLC, you are completely right. I didn't notice (a) because I have continuous view turned on in my PDF viewer. (b) and (c) just slipped through because I was inspecting the other errors on the pages. I thought that since importing "Pages" results in a list of graphics, exporting that list would again give a working PDF, just as exporting a list of graphics as a GIF works. I assume you have to create a notebook, insert the pages, and put page breaks between them to get this working, but I'm sure this destroys the page sizes and everything. – halirutan Oct 18 '12 at 13:30
  • @VLC, see my update. – halirutan Oct 18 '12 at 13:51
  • @halirutan +1 and accepted. It solved my problem and I learned something interesting. Thanks for the effort! – fizzics Oct 19 '12 at 14:22

For merging PDF files one can use the PyMuPDF library for Python, as shown here.

First, install PyMuPDF (it requires Python 3.6 or later):

python -m pip install --upgrade pip
python -m pip install --upgrade pymupdf

Define mergePDFs via ExternalFunction:

mergePDFs = ExternalFunction["Python", "import fitz
def merge_pdfs(sources, output):
    result = fitz.open()
    for pdf in sources:
        with fitz.open(pdf) as mfile:
            result.insert_pdf(mfile)
    result.save(output)
    return output"]
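The string passed to ExternalFunction is ordinary Python, so the same loop can also be run without Mathematica. A minimal sketch, assuming PyMuPDF is installed and importable as fitz; check_merge_args is a hypothetical helper added here to reject an empty input list and an output path that is also one of the sources, which the P.S. in this answer says PyMuPDF 1.19.4 cannot handle.

```python
import os


def check_merge_args(sources, output):
    """Reject an empty source list and an output path that is also a source."""
    if not sources:
        raise ValueError("no source PDFs given")
    out = os.path.realpath(output)
    if any(os.path.realpath(s) == out for s in sources):
        raise ValueError(f"output {output!r} is also one of the sources")


def merge_pdfs(sources, output):
    """Same loop as the ExternalFunction body, run as plain Python."""
    check_merge_args(sources, output)
    import fitz  # PyMuPDF; imported lazily so the argument check needs no install

    result = fitz.open()  # new, empty PDF document
    for pdf in sources:
        with fitz.open(pdf) as mfile:
            result.insert_pdf(mfile)  # append every page of mfile
    result.save(output)
    result.close()
    return output
```

Called as `merge_pdfs(["a.pdf", "b.pdf"], "result.pdf")`, it returns "result.pdf" just like the Mathematica wrapper. Validating the paths before importing fitz means a bad call fails with a clear message instead of inside PyMuPDF.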

It can be used as follows:

SetDirectory[$UserDocumentsDirectory];

mergePDFs[DeleteCases[FileNames["*.pdf"], "result.pdf"], "result.pdf"]

"result.pdf"
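Outside Mathematica, the file list passed as `sources` can be built the same way: every *.pdf in a directory except the output file. A plain-Python sketch; pdfs_to_merge and natural_key are hypothetical helpers, and the natural sort is an assumption about the desired page order (plain string sorting would put page10.pdf before page2.pdf).

```python
import re
from pathlib import Path


def natural_key(name):
    """Split out digit runs so that 'page2.pdf' sorts before 'page10.pdf'."""
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", name)]


def pdfs_to_merge(directory, output="result.pdf"):
    """List the *.pdf files in `directory`, excluding `output`, in natural order."""
    names = [p.name for p in Path(directory).glob("*.pdf") if p.name != output]
    return [str(Path(directory) / n) for n in sorted(names, key=natural_key)]
```

Because re.split with a capturing group always alternates text and digit runs, the keys compare strings with strings and integers with integers, so the sort never mixes types.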

It works extremely fast. For example, merging 505 PDF files takes only about 0.7 seconds on my laptop, producing an 11 MB PDF.

P.S. There is a limitation due to a bug in the current version (1.19.4) of PyMuPDF: it can merge no more than 508 files in one call and cannot merge into one of the source files (fixed in version 1.19.5).
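Since the limitation depends on the PyMuPDF version, it may be worth checking which version Python actually sees before merging large batches. A small standard-library sketch; FIXED_IN encodes the 1.19.5 fix mentioned above, and the simple numeric parsing is an assumption that ignores pre-release suffixes.

```python
import re
from importlib.metadata import PackageNotFoundError, version

FIXED_IN = (1, 19, 5)  # version in which, per the answer, the merge bug was fixed


def parse_version(text):
    """Leading numeric part of a version string: '1.19.4' -> (1, 19, 4)."""
    m = re.match(r"\d+(?:\.\d+)*", text)
    if m is None:
        raise ValueError(f"unrecognised version: {text!r}")
    return tuple(int(x) for x in m.group(0).split("."))


def is_fixed(text, fixed_in=FIXED_IN):
    """True if version `text` is at least `fixed_in`."""
    return parse_version(text) >= fixed_in


def installed_pymupdf_version():
    """Installed PyMuPDF version string, or None if it is not installed."""
    try:
        return version("PyMuPDF")
    except PackageNotFoundError:
        return None
```

For instance, `is_fixed("1.19.4")` is False and `is_fixed("1.19.5")` is True; combining it with `installed_pymupdf_version()` gives a quick pre-flight check.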


Here is an example of what we can get with this function:

plots = {Plot[{Sin[x], Sin[2 x], Sin[3 x]}, {x, 0, 2 Pi}, PlotLegends -> "Expressions"],
   Plot[{Sin[x], Sin[2 x], Sin[3 x]}, {x, 0, 10}, PlotLayout -> "Column"],
   Plot[{BesselJ[0, x], BesselJ[1, x], BesselJ[2, x], BesselJ[3, x], BesselJ[4, x], 
     BesselJ[5, x]}, {x, -20, 20}, PlotLayout -> {"Column", UpTo[4]}]};
tempFiles = 
  Table[Export[StringTemplate["~temp~``.pdf"][i], plots[[i]]], {i, 1, Length[plots]}];
If[mergePDFs[tempFiles, "plots.pdf"] === "plots.pdf", 
  DeleteFile@tempFiles; "plots.pdf", $Failed] // SystemOpen

[Screenshot: the merged plots.pdf]

Alexey Popkov