I made a shell script on Ubuntu 14.04 (taking ideas from "Google Docs as team editor") that grabs a Google Docs file as plain text, saves it as a LaTeX file, compiles it to PDF with latexmk, and then opens the PDF with evince, as follows:

#!/bin/bash

wget -O $2.tex "https://docs.google.com/document/export?format=txt&id=$1" &&    latexmk -pdf -f -interaction=nonstopmode $2.tex || evince $2.pdf

The first argument of the script, $1, is the sharing ID of the Google Doc I want to grab the text from (I create a shareable link for the document and take the ID from there). The second, $2, is the name of the file I want to produce, without the file extension.
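A note on how the shell reads the `&& ... ||` chain used in the script: in `A && B || C`, C runs whenever A fails, or A succeeds and B fails, and C is skipped only when both succeed. The sketch below illustrates this with hypothetical stand-ins (subshells that simply exit with a given status, and an echo in place of evince), not the real wget, latexmk, and evince commands.

```shell
#!/bin/bash
# run_chain A_STATUS B_STATUS
# Hypothetical stand-in for: wget ... && latexmk ... || evince ...
# $1 is the exit status of step A, $2 the exit status of step B.
run_chain() {
    # "viewer opened" is printed only when A fails, or when A succeeds and B fails.
    (exit "$1") && (exit "$2") || echo "viewer opened"
}
```

Calling `run_chain 0 0` prints nothing, while `run_chain 0 1` and `run_chain 1 0` both print the message. So in a chain of this shape, the viewer step is reached only when an earlier step reports failure.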

It works okay, but for some reason it produces a blank first page. It also produces lots of errors. I suspect that the blank first page results from this error:

! LaTeX Error: Missing \begin{document}.

My google docs document ends up producing a LaTeX file that looks like this:

\documentclass{article} 
\begin{document}

I am the walrus.

\end{document}

This is exactly how the Google Doc looks, so I don't know why LaTeX complains about a missing \begin{document} when I have one in my file. Having little to no programming skill, I haven't been able to figure out why the output starts on the second page. Any help is greatly appreciated.

A Feldman
  • 1
    Debug the process to figure out where stuff breaks. Download the file. Then compile separately. If that fails, check with a non-downloaded file. Does compilation fail? If so, troubleshoot that issue first because that problem has nothing to do with Google Docs. If not try to get the Google Docs download to match the working test file. You need to narrow down what is going wrong in order to debug any issue like this. Even if it seems to work, a shell script you want to rely on should do some checks and give you some useful feedback when things go wrong. For now, don't script at all. – cfr Dec 28 '15 at 03:29
  • 1
    If you can download the file and compile with pdflatex, troubleshoot the latexmk step. Try latexmk -pdf -verbose <filename>.tex. You don't want it to continue despite errors. You want the first error. Similarly, you don't want no interaction. You want it to stop and tell you stuff. Is a .log file produced? Anything there? Make sure that you have no strange characters in any path or file names. Stick to ASCII letters and numerals and hyphens. No spaces. Nothing strange. (Underscores can be OK but avoid for now.) – cfr Dec 28 '15 at 03:32
  • 1
    Does the same with pdflatex. Thanks, I'll try your suggestions. – A Feldman Dec 28 '15 at 03:36
  • 1
    So you download the file as test.tex. Then you do pdflatex test.tex and get the error about missing \begin{document}? If so, create a document from scratch with that content in your editor and call it test2.tex. Now, pdflatex it. Do you get the error? – cfr Dec 28 '15 at 03:38
  • 1
    If I had to guess, I'd say there's something else in that document. It may not be visible, but it is there and it is before \begin{document}. If you've got invisible characters, TeX may still see them. – cfr Dec 28 '15 at 03:43
  • 1
    The strange thing is that when I open the TeX file created by the script in TeXstudio, it compiles fine with latexmk -pdf -silent -latexoption="-synctex=1" "Another". I also tried changing the viewer to "acroread" rather than "evince"; still the same problem. – A Feldman Dec 28 '15 at 03:50
  • 1
    If you remove ||... does it work? – cfr Dec 28 '15 at 03:56
  • 1
    I don't know what the || is supposed to do anyway. Why would you open the PDF only in case the compilation fails? – cfr Dec 28 '15 at 04:02
  • 1
    Also adding set -x to the bash script may be helpful. It will make the script very chatty but it can be very, very helpful in seeing at least what is going wrong, even if it does not tell you why it is happening. – cfr Dec 28 '15 at 04:08
  • 1
    I think I found the problem when I did a "cat filename.tex | less". There seems to be something prepended to the beginning of the file, so the command "\documentclass[letterpaper]{article}" shows up on screen as "<U+FEFF>\documentclass[letterpaper]{article}". – A Feldman Dec 28 '15 at 04:20
  • 1
    Invisible characters, then. This is a problem with the file you are getting from Google Docs. – cfr Dec 28 '15 at 04:32
  • 1
    The problem turns out to concern BOM characters. The issue and some solutions are discussed here. – cfr Dec 28 '15 at 04:45
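For anyone wanting to check a file for this directly: a UTF-8 BOM is the three bytes EF BB BF at the very start of the file (the encoding of U+FEFF). The following is a minimal sketch, not something taken from the thread; the function name `has_bom` is made up for illustration.

```shell
#!/bin/bash
# has_bom FILE: succeed (exit status 0) if FILE starts with the UTF-8 BOM.
# The name has_bom is hypothetical, chosen for this example.
has_bom() {
    # Dump the first three bytes as hex, strip spaces and newlines,
    # and compare against the BOM byte sequence EF BB BF.
    [ "$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')" = "efbbbf" ]
}
```

It could be used as `has_bom filename.tex && echo "BOM found"`.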

1 Answer

I had to remove the BOM before latexmk processes the file. I used a BOM stripper script (BOM Remover Script), put it in /usr/local/bin, and changed my script as follows:

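#!/bin/bash

# Download the Google Doc ($1 = document ID) as plain text into $2.tex.
wget -O "$2.tex" "https://docs.google.com/document/export?format=txt&id=$1"
# Strip the BOM that the export puts at the start of the file.
bom-remove.sh "$2.tex"
# Compile to PDF; -f keeps going despite errors.
latexmk -pdf -f -interaction=nonstopmode "$2.tex"
# Open the resulting PDF.
evince "$2.pdf"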

And now it works. To use the script, I call it by name; on my system it is "glatex". After the script name I put the Google Doc ID, something like "1-JdNOZ_73GfRgsRF4u2pWc16HGze-VZKxMgjofTXf5k", and finally the base name I want for both my LaTeX file and my PDF file, e.g. "filename", which the script turns into filename.tex and filename.pdf using the Google Doc whose ID was given.
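If you would rather not depend on a separate BOM remover script, the same step can probably be done with standard tools. This is only a sketch under that assumption, not the script linked above; the function name `strip_bom` is made up, and it relies on `head -c` being available (it is on Ubuntu).

```shell
#!/bin/bash
# strip_bom FILE: remove a leading UTF-8 BOM (bytes EF BB BF) from FILE in place.
# Does nothing if the file does not start with a BOM.
# The name strip_bom is hypothetical, chosen for this example.
strip_bom() {
    local file="$1" tmp
    # Compare the first three bytes, rendered as hex, against the BOM.
    if [ "$(head -c 3 "$file" | od -An -tx1 | tr -d ' \n')" = "efbbbf" ]; then
        tmp=$(mktemp) || return 1
        # tail -c +4 prints everything from the fourth byte onward.
        # Writing back with cat keeps the original file's permissions.
        tail -c +4 "$file" > "$tmp" && cat "$tmp" > "$file"
        rm -f "$tmp"
    fi
}
```

In the script above, the function definition would go near the top and `strip_bom "$2.tex"` would take the place of the `bom-remove.sh` line.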

What I like about this is that I get to edit with Google Docs and use its collaboration and time tracking features, while still compiling with my own LaTeX environment. It really is the best of both worlds.

A Feldman