
arXiv is smart enough to detect whether your PDF is generated by LaTeX. If it is, arXiv asks for the sources of the PDF instead of the PDF file itself. This is really annoying and inconvenient.

I am looking for a way to fool arXiv so that PDFs are not detected as generated by LaTeX (even if they actually were).

I have tried this: http://www.hrstc.org/node/62: "create a pdf from word and then simply insert pages from your article, and delete the word's page".

And it did not work.

strpeter
  • pdfpages is your friend here. Create a wrapper latex file that includes the other as a series of pdf pages. – Andrew Stacey Jun 22 '14 at 17:38
  • What's wrong with uploading the sources? It's much better from an archival point of view. – Alex Jun 22 '14 at 18:40
  • @Alex The arXiv doesn't always have the most up to date versions of packages, meaning that a document using new features might not compile. So it's not necessarily about not uploading the sources, but ensuring that the arXiv doesn't actually try to compile those sources. – Andrew Stacey Jun 22 '14 at 19:34
  • @LoopSpace I would never use the latest features for writing a paper. Publishers use even older versions of LaTeX than arXiv. – Alex Jun 22 '14 at 20:34
  • @Alex From my point of view, it is much more convenient to just upload the PDF rather than the source. arXiv makes me feel that Word is better than LaTeX. That is it. – Changwang Zhang Jun 22 '14 at 20:58
  • I don't know how the detection is done but you can customise the metadata so that references to the software used to create the file are not included. By default pdftex, for example, will advertise itself through the file's metadata but you can easily override this by specifying alternative values in your source. – cfr Jun 22 '14 at 21:19
  • @Alex Some of us think that publishers are as outdated as LaTeX 2.09 ... Seriously, once a publisher agrees to publish my paper then I'll downgrade it (and I have done this: I downgraded one paper from TikZ to xy). Before that, I'm not going to anticipate something that might not happen and so I'll take advantage of the latest features to make my paper as nice to read as possible. – Andrew Stacey Jun 22 '14 at 21:28
  • @LoopSpace Do they do that, or just put the source up for download? – vonbrand Jun 23 '14 at 03:12
  • This looks borderline for on-topic to me. We can't actually know what method(s) arXiv uses to detect (La)TeX sources, though we can of course speculate. What we can answer is 'How to remove data X from a (La)TeX generated PDF?' where 'X' would be whatever arXiv use. – Joseph Wright Jun 23 '14 at 07:32
  • @vonbrand If the source is detected as LaTeX, then the PDF that you download from the arXiv has been automatically generated by them. – Andrew Stacey Jun 23 '14 at 14:00
  • I would appreciate knowing how to upload PDF files directly to arXiv, for the simple reason that my PDF files are created by Omega or LuaTeX and contain special fonts. If arXiv wants us to upload TeX code then it must give us the means to compile this code. If it doesn't, then it is clearly arXiv's fault. Twice already I had to ask for special permission to upload a PDF; permission was granted, but I find it unacceptable to have to beg for a PDF file to be accepted and to have to justify why it is necessary. If somebody knows how the arXiv filter works, please let me know! – yannis Feb 28 '16 at 22:06
  • @Alex My TeX "build system" is too complex for arXiv. That's what's wrong with uploading the sources. If TeX/LaTeX had a single, everybody-uses-it build system, then this wouldn't be a problem, but it doesn't. – DanielSank Jan 14 '18 at 23:27
  • Recently they have shifted to TeX Live 2020 and it works seamlessly. Except for the very small extra effort of including the .bbl file, things proceed quickly now. – Mohit Lamba Nov 28 '20 at 14:04

11 Answers

40

Update 2018-11-26: According to the comment from Andrew MacFie, this method no longer works: the arXiv have put in place a check specifically for this. Whether or not they have solved the underlying problem, I have no idea. Whether or not a variation of this solution would work, I have no idea.


I strongly advise uploading the source code to the arXiv for archival purposes. Once the document itself is made public, there is no reason I can think of for not making the source code public. That said, there can be reasons for avoiding the arXiv's own compiler if it doesn't have up-to-date versions of packages that you use; for example, uploading the entirety of TikZ/PGF with your document seems a little excessive. (NB For TikZ/PGF specifically, the external library is a big help here.)

Include a PDF copy of the article with your submission (with a slightly modified filename), and at the top of the document put the following (suitably edited):

\documentclass[a4paper]{article}
\pdfoutput=1
\usepackage{hyperref}
\hypersetup{
  pdfinfo={
    Title={Your Title Here},
    Author={Your Name Here},
    Subject={If you want to put something here, do so},
    Keywords={Add some keywords if you feel so inclined}
  }
}
\usepackage{pdfpages}

\begin{document}
\includepdf[pages=1-last]{the_real_article.pdf}
\end{document}

<rest of source code goes here>

To comply with the spirit of the arXiv, which is about making material generally available, you should then go the extra mile and include something like a list of the packages used, with their versions, so that someone who downloads the source knows what they need in order to compile it.
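One low-effort way to produce such a list is a sketch like the following, assuming the document was compiled with \listfiles in its preamble: LaTeX then appends a "*File List*" section to the .log (one line per class, package and file, with date and version), closed by a row of asterisks. The Python script below only copies that section out so it can travel with the sources; the script and file names are hypothetical, and entries that LaTeX wraps across log lines are not re-joined.

```python
"""Copy the package list that \\listfiles writes into a LaTeX .log file.

Assumes the document was compiled with \\listfiles in the preamble, so the log
contains a ' *File List*' section terminated by a line of asterisks.
"""
import sys


def file_list(log_text: str) -> list[str]:
    """Return the entries listed between '*File List*' and the closing asterisks."""
    entries: list[str] = []
    inside = False
    for line in log_text.splitlines():
        stripped = line.strip()
        if not inside:
            # Skip everything until the header line written by \listfiles.
            inside = stripped == "*File List*"
            continue
        if stripped and set(stripped) == {"*"}:
            break  # a line made only of asterisks ends the section
        if stripped:
            entries.append(stripped)
    return entries


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python listfiles.py mydoc.log > PACKAGES.txt
    with open(sys.argv[1], encoding="latin-1") as log:
        print("\n".join(file_list(log.read())))
```

Reading the log as latin-1 is a deliberate choice: it accepts any byte sequence, so the script never fails on odd characters that TeX wrote to the log.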

I can attest to the fact that this has worked in the past.

Andrew Stacey
  • This didn't work for me -- the processing proclaimed success but I ended up with a blank one-page document.

    Still trying to get my submission working...

    – Andrew D. King Feb 06 '15 at 23:49
  • Add \pdfoutput=1 in the second line – Grzegorz Chrupała Jun 11 '15 at 14:53
  • Note that you lose all hyperref (linking) information when using this approach. PAX is a program designed to help with this, but it didn't work for me. – David Spivak Aug 04 '15 at 16:03
  • @DavidSpivak Hadn't heard of pax (though I now see that it is mentioned in the pdfpages documentation). Looks worth a try if I ever end up doing this again, though one would need pax to work on the arXiv and one reason for doing this is that debugging arXiv's tex is haphazard. – Andrew Stacey Aug 05 '15 at 12:14
  • "Once the document itself is made public, there is no reason I can think of for not making the source code public." How about because you have comments that include old versions of individual sentences/paragraphs, notes between collaborators, etc. And you'd rather not make those public, yet the easiest and most useful place to have them is as comments right there in your TeX... (Or even just because maybe your collaborator added such comments and you don't want to have to check your entire doc for such comments before submitting source...) – Joshua Grochow Nov 26 '15 at 04:51
  • @JoshuaGrochow Simple: remove all comments before uploading. You can do that with a straightforward script - assuming you haven't changed the catcode of %. After all, comments don't affect the output so it is completely safe to remove them. (To forestall nitpickers: the script should do s/%.*/%/gm if I have my modifiers right.) – Andrew Stacey Nov 26 '15 at 18:31
  • @LoopSpace: I knew that. The point is that this requires you to add another step to your workflow ("Ah, shoot, I forgot to remove comments before uploading! Let me do that now and re-upload..." - or worse, figure it out after it's already appeared on the arXiv). And if it's not you but collaborators, you may have to help your collaborator setup that script. You'd really rather just upload the PDF and not have to worry about comments in your source. – Joshua Grochow Nov 26 '15 at 18:50
  • @JoshuaGrochow You may have known that, but others reading these comments may not. Experience tells me that I am extremely unlikely to convince you that your purported reason does not outweigh the arXiv's reason for wanting them. So at this point I'm "talking to the crowd" and my aim is to show that if one is worried about comments then it's quite easy to remove them and still comply with the arXiv's policy. – Andrew Stacey Nov 26 '15 at 20:09
  • @LoopSpace: Fair enough. I agree that you are unlikely to convince me :), but, given that the arXiv does allow PDF-only submissions in general, it seems somewhat...unfair? draconian? I'm not sure of the right word...to detect when a PDF in fact came from LaTeX and then essentially force the author to upload the source. – Joshua Grochow Nov 27 '15 at 05:37
  • "there is no reason I can think of for not making the source code public" On the other hand, what reasons are there to make the source code public? Who cares about the source code? It's the content that's important. And since arXiv garbles articles written in older versions of TeX (look at some papers from the 90s), why shouldn't we be able to include PDFs the way we wanted them to be rendered? – Turion Nov 03 '16 at 09:44
  • Adding to the answer of Loop Space: To make sure that the arXiv compiler detects the use of pdfTeX, add the following line in line 2: \pdfoutput=1 – denizb Dec 10 '14 at 15:55
  • @Turion I have (incomplete) code that changes the layout of a TeX file from the arXiv to make it fit better on my screen and make the PDF more comfortably readable. This would be impossible/very hard without the source. – Bruno Le Floch Oct 04 '17 at 09:35
  • @BrunoLeFloch, ok, that's a possible use case. How often does such a thing happen? I'm working in maths and have never met a person (except you) who told me they downloaded a source from arXiv. – Turion Oct 04 '17 at 09:41
  • @Turion: I know several people who download the source from the arXiv to (1) see how something is done or (2) read the comments (ok, that last point is an argument against uploading). I'm working in hep-th. – Bruno Le Floch Oct 04 '17 at 10:11
  • "Once the document itself is made public, there is no reason I can think of for not making the source code public." If you upload sources to arXiv, then those sources have to build on arXiv's servers to produce your readable artifact. If they don't, no upload for you. Now take into account that (La)TeX doesn't have a universally used build system, e.g. arXiv cannot build any of my projects because of how I use packages to import sub-documents, and you've got a damned good reason to not want to upload sources to arXiv. – DanielSank Jan 14 '18 at 23:30
  • So far, I always have uploaded my sources to arXiv, but it always requires effort to work around arXiv's build capabilities. In the future, I will not spend the time. My sources are in a public git repository. I don't blame arXiv for this; the TeX community needs to create and support a not-horrible build system. – DanielSank Jan 14 '18 at 23:33
  • @DanielSank If you look at my answer, you'll see that I provide a method whereby one can upload the sources to the arXiv without compiling them on the arXiv's servers. – Andrew Stacey Jan 15 '18 at 17:37
  • I know! And I appreciate it. I was simply explaining why one might not want to upload sources to arXiv and was trying to show how badly the TeX community needs a standardized build system. Your answer already acknowledges that sometimes one wants to avoid arXiv's compiler, and I agree. – DanielSank Jan 15 '18 at 17:57
  • Please note that this does not work (or no longer works). I got 'Your submission appears to be a PDFLaTeX wrapper using pdfpages. This is an inappropriate submission, as it circumvents our TeX system. As a result, we have moved your submission to “Incomplete”.' – Andrew Nov 26 '18 at 15:08
  • @AndrewMacFie Oh, that's a shame. Unless they've made it easier to use specific packages, that is. Which I somehow doubt. I suspect it would be possible to circumvent whatever test they've put in place, but as I don't have an article to test it with at the moment I will pass on figuring it out. – Andrew Stacey Nov 26 '18 at 21:27
  • I have used this method to append another PDF file (kind of an appendix) created with pdflatex to my main file (for which I uploaded the TeX source code). As of today, I can confirm that at least this works. – JPW Apr 09 '20 at 15:05
  • If you only want to use a PDF page, consider using fitpaper=true in your \includepdf, as in \includepdf[fitpaper=true, pages=-]{test.pdf}, for a full fit to the page. – Jason Angel Sep 07 '20 at 20:39
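If you take the remove-the-comments route discussed in the comments above, note that the one-liner s/%.*/%/gm also truncates any line containing an escaped percent sign (\%). Below is a minimal Python sketch that treats % as a comment start only when it is preceded by an even number of backslashes, so \% survives while \\% is still a line break followed by a comment. Like the sed version it keeps the % itself, so line endings behave exactly as before. It assumes the catcode of % is unchanged and does not understand verbatim-like environments or \url arguments, where % can be literal; the script name is hypothetical.

```python
"""Strip LaTeX comments, keeping a bare '%' where each comment was."""
import re
import sys

# '%' begins a comment unless it is escaped by an odd number of backslashes:
# "\%" is a literal percent sign, but "\\%" is a line break plus a comment.
# Group 1 captures the run of "\\" pairs in front of the '%' so it is kept.
# [^\r\n]* stops before the line ending, so CRLF files keep their '\r'.
_COMMENT = re.compile(r"(?<!\\)((?:\\\\)*)%[^\r\n]*")


def strip_comments(tex: str) -> str:
    """Like sed 's/%.*/%/gm', but leaves escaped percent signs alone."""
    return _COMMENT.sub(r"\1%", tex)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python strip_comments.py paper.tex > upload/paper.tex
    with open(sys.argv[1], encoding="utf-8") as src:
        sys.stdout.write(strip_comments(src.read()))
```

For example, strip_comments("50\\% off % note") returns "50\\% off %": the escaped percent sign is kept and only the real comment is emptied.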
18

The solution is simple. Open your PDF in a PDF viewer/editor, for instance Foxit Reader. Then print the file to a virtual printer, for example Foxit PDF printer. The printed version of your PDF will be image-based, so to improve the quality choose a higher resolution, such as 300 dpi. The printed PDF file can then be uploaded to arXiv.org without trouble.

NKN
  • For some reason most pdf readers don't allow you to print to pdf anymore. – dorien Dec 01 '18 at 03:30
  • There are many free print-to-PDF programs you can install that integrate with your computer as a virtual printer, for example CutePDF, Win2PDF, CUPS-PDF and many more. – NKN Dec 07 '18 at 15:50
  • I have tried this technique, but am getting the message that my submission is on hold because the output from pdf to text conversion was too small to allow further processing. I have printed to PDF virtually, but there seems to be some issue with the resolution? – Tom Jul 19 '19 at 05:28
  • The size of my .pdf file has increased 100x with this solution and I get the error message: Total submission size 100862.70 KB exceeds maximum allowed size 50000 KB – jperezmartin Mar 26 '21 at 17:29
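The file-size complaints above follow from how an image-based PDF works: every page is stored as pixels, so the amount of data grows with the square of the chosen resolution. The back-of-the-envelope Python sketch below assumes US Letter pages and 24-bit RGB and gives the uncompressed size per page. Real print-to-PDF drivers compress the image, so actual files are smaller, but halving the resolution still cuts the pixel count by a factor of four.

```python
"""Back-of-the-envelope size of one PDF page stored as an uncompressed raster."""


def raw_raster_bytes(width_in: float, height_in: float, dpi: int,
                     bytes_per_pixel: int = 3) -> int:
    """Pixels across x pixels down x bytes per pixel (3 = 24-bit RGB)."""
    return round(width_in * dpi) * round(height_in * dpi) * bytes_per_pixel


for dpi in (150, 300, 600):
    size = raw_raster_bytes(8.5, 11, dpi)  # US Letter, 8.5 x 11 inches
    print(f"{dpi} dpi: {size / 1e6:.1f} MB per page (uncompressed)")
```

This prints 6.3, 25.2 and 101.0 MB per page respectively, which is why a rasterized copy can end up far larger than the vector original produced by LaTeX.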
14

Since several other methods didn't work (e.g. resetting the PDF producer metadata via hyperref or a PDF editor, or trying a Print As PDF hack through a web browser or Google Drive), here's a simple one that worked for me in Nov 2019. It is specific to command-line users, but that also means it can be conveniently automated as part of your document build process.

Just process your PDF to Postscript with pdf2ps, and then back again with ps2pdf:

pdf2ps mydoc.pdf
ps2pdf mydoc.ps

That's it.

This seems to reset enough that arXiv doesn't detect the TeX origin of the document, but I don't see any significant detriment to the output. In particular I wanted to upload in this form for a special document -- not a paper, but of interest to the arXiv community -- where I was using custom fonts and XeLaTeX, and those fonts were preserved by the PDF/PS/PDF round-trip.

Here's a sample Makefile to automate this along with the rest of the build (for a simple 2-pass, standalone XeLaTeX compilation):

.PHONY: all clean
BASENAME = mydoc

all: $(BASENAME).pdf $(BASENAME)-arxiv.pdf

clean:
	rm -f $(BASENAME).pdf $(BASENAME).aux $(BASENAME).log $(BASENAME).out $(BASENAME)-arxiv.*

$(BASENAME).pdf: $(BASENAME).tex
	xelatex $< && xelatex $<

# Round-trip through PostScript, then delete only the intermediate .ps file
$(BASENAME)-arxiv.pdf: $(BASENAME).pdf
	cp $< $@ && pdf2ps $@ && ps2pdf $(@:.pdf=.ps) && rm $(@:.pdf=.ps)

Needless to say -- I hope -- this is to be used for exceptional cases where you really want/need features unsupported by arXiv's AutoTeX processing.

It would be nice if it were possible to also provide the source without it being compiled, but my attempt to include the .tex file in the anc subdirectory for ancillary material itself led to a processing error, because currently "Please note that ancillary files are not supported with PDF submissions at this time." on https://arxiv.org/help/ancillary_files . This may be fixed by the time you're reading this.

andybuckley
  • it's 2020, and it didn't work for me yet. – JWL May 09 '20 at 15:04
  • does not work for me either, and in addition, kills all hyperlinks (URLs, references, etc.) – Antoine May 16 '20 at 10:16
  • arXiv seem to have guarded against it now. Annoying, since imho there are valid reasons to post rendered PDFs... but it's their platform, their rules – andybuckley May 16 '20 at 13:22
  • I disagree that this does not cause a detriment to the output -- it removes all the hyperlinks! – E.P. Jul 30 '20 at 10:38
9

arXiv detected my PDF (made using MS Word!) as sourced from a TeX file, kept asking me to upload the source file, and kept rejecting my PDF submission. Eventually I figured out that the MS Word 2016 equation editor uses a number of TeX identifiers, which get flagged by arXiv. After scouring the internet for fixes without success, I serendipitously found that if you save your Word manuscript in 'Word 97-2003' format, it removes those TeX identifiers and converts all equations to images. You can then safely Save As PDF and upload to arXiv without any red flags.

4

November 2021: arXiv accepted my xelatex-generated PDF after:

exiftool -all:all= -overwrite_original FILE.pdf

From: https://stackoverflow.com/questions/60738960/remove-pdf-metadata-removing-complete-pdf-metadata

4

This worked for me today, June 2022.

Apparently, arXiv is smart enough to detect when you import the entire PDF using pdfpages, as in https://tex.stackexchange.com/a/186206/272006. However, you are free to import your PDF file page by page using graphicx and its \includegraphics command. I would speculate that this approach is harder for arXiv to guard against, for a few reasons:

  • correct LaTeX source is being uploaded, and detecting issues in it is not as easy as looking for TeX macros in a PDF
  • \includegraphics is a legitimate command, used by almost every submission, so flagging a submission just for using it is impractical
  • even if arXiv detects the for-loop (which is also a legitimate command), one is free to simply regenerate the include statements and even obfuscate them

The code is below, and here are its explanation and advantages:

  • I tested it myself (for a 173-page thesis): arXiv accepted the submission and the PDF looks correct
  • it preserves internal and external PDF links if you use PAX
  • it generates a generic PDF document from the file you supply, for any number of pages
  • it does not produce any extra blank pages
  • it produces the .bbl file that arXiv uses for bibliography processing

To preserve links, follow the PAX approach (arXiv works correctly with it, and why wouldn't it: the .pax file is simply a map that is read by the pax package). Install PAX following its instructions, or use the one in my (large) Docker image (the Perl script is in the root). At the end of installation you should have the pdfannotextractor.pl script, and you should be able to generate the .pax file with pdfannotextractor.pl path/to/file.pdf; the file will be written to path/to/file.pax. If the .pax file is in the same directory and has the same name as the PDF, the pax package will pick it up, and the links will magically work.

To generate the .bbl file, it is enough to load biblatex, add your .bib file, use the \nocite{*} command, and run biber as usual.

Also, because we place an image that fills the page entirely, LaTeX reasonably assumes that the image does not fit and moves it to the next page, resulting in an extra blank first page. The \vspace*{-4ex} hack seems to fix this without a negative effect on the included image.

Finally, I do support the arXiv idea of generating the PDF from source, although possibly not as such a strict requirement. In my case, the thesis was so complex, and relied on such a recent TeX distribution, that I simply couldn't compile it with arXiv's engine. Please use this hack wisely.

% cSpell:disable

\documentclass[letterpaper]{report}

\usepackage{graphicx}
\usepackage{pgffor}
\usepackage[margin=0in]{geometry} % remove all margins
\usepackage{pax} % process .pax file to bring back internal links

\newcounter{pdfpages}
\newcommand*{\getpdfpages}[1]{%
  \begingroup
  \sbox0{%
    \includegraphics{#1}%
    \setcounter{pdfpages}{\pdflastximagepages}%
  }%
  \endgroup
}

\usepackage[hidelinks]{hyperref} % remove ugly link borders

% Add metadata to resulting file.
\hypersetup{
  pdfauthor = {Dmytro Bogatov <dmytro@bu.edu>},
  pdftitle = {Secure and Efficient Query Processing in Outsourced Databases},
  pdfsubject = {Doctoral Dissertation},
  pdfkeywords = {OPE, ORE, Range Query Protocols, Epsolute, kNN},
  pdfcreator = {LaTeX with hyperref package},
  pdfproducer = {dvips + ps2pdf}
}

% Adapt to your case.
\usepackage[
  backend=biber,
  style=alphabetic,
  giveninits=false,
  sorting=nyt,
  maxbibnames=1000,
  maxalphanames=4
]{biblatex}

\bibliography{bibfile}

\begin{document}

% This is to remove the first blank page. Feel free to improve.
\vspace*{-4ex}

\getpdfpages{file}
\foreach\x in {1,...,\value{pdfpages}} { % chktex 11
    \begin{center}
        \includegraphics[width=\paperwidth,keepaspectratio,page=\x]{file}
    \end{center}
}

% This is to generate all the citations from your bib-file to *.blg.
\nocite{*}

\end{document}

3

The arXiv LaTeX compiler is a pain in the ass and never compiled my paper correctly. That is why I wanted to upload my own PDF without the source.

arXiv identifies a LaTeX PDF based on some metadata and on the embedded fonts.

Following these steps, I tricked the pdf check:

1) Create an empty PDF, for example with Word. Insert your LaTeX PDF into the Word-generated PDF. I used Mac Preview; other software such as Adobe Acrobat should work too.

2) Open the generated PDF with Adobe Acrobat. Go to: File > Save > Save as Other > Optimized PDF... > Fonts > unembed the troubling fonts (CMR10 & rsfs10). If this does not work, try unembedding other fonts as well.

(worked on 18th of August 2019)

Guest
2

This worked for me 06 November 2019.

For people who do not have access to the paid Adobe Acrobat features:

  1. Convert your PDF to a .docx or similar. I used this one; there may be better options.
  2. Open the result in MS Word or equivalent and check the file (I noticed this conversion wreaks havoc on .eps images); replace images with .png or similar where necessary.
  3. Export as .pdf with MS Word or whichever program you are using.
  3. Export as .pdf with MS Word or the program you are using.

I really hope they at some point either make their auto-compiler more user-friendly or at least make the generated error messages more informative. I tried all the tips I could find on this site and really wanted to upload it as .tex but nothing seemed to work. The help desk was also no help at all.

Maarten
1

The best way to do this is to upload the PDF to Google Docs, issue the print command, and save the file as PDF. Then upload this new PDF to arXiv. This works as of 7 August 2019; if you want to know more, read the documentation at https://arxiv.org/help/submit_pdf

Manish
  • If you want to feed Google all your data, do it; I think it is better to avoid doing this whenever you can ... And there are other answers with valid approaches to this question. – Mensch Aug 07 '19 at 17:14
  • I tried this, but the import to GDocs resets the custom fonts etc. which was my reason for wanting to exceptionally submit this (not a normal paper) as PDF rather than TeX sources. Trying a print-as-PDF from the original as viewed in a browser doesn't seem to overwrite enough information to bypass arXiv's tests. – andybuckley Nov 07 '19 at 11:48
  • I tried this and it did not work for me either. – Tom Jan 06 '20 at 17:29
0

It is super easy if you have Adobe Acrobat XI Pro (other versions may work too). In Adobe Acrobat XI Pro, go to "Create" --> "Combine Files into a Single PDF" --> add your LaTeX-generated .pdf file, combine it, then save the result. That's it. (Worked on 6th of August 2020.)

The good thing about this approach is that the quality of the original LaTeX-generated .pdf file is not altered, unlike other approaches, which require uploading the file to Google and then downloading it again.

albert
0

I just found a way to do it (worked 2023-07-27), using Foxit Reader (free version) on Windows. Open your PDF file, press Ctrl+P to print, and select the printer named "Microsoft Print to PDF". Choose the path where the new PDF file should be saved, then upload this PDF file to arXiv.

Ingmar