Edit. My question can be reworded as: "How can I check, from the command line, whether different sources generate exactly the same .pdf layout?"

I'm improving some Emacs scripts that I wrote to remove comments and/or unused macros from a .tex source file.

Here's a sample of my code:

(save-buffer)
(call-process-shell-command
 (concat "latex \"\\let\\oldExecuteOptions\\ExecuteOptions\"\\\n"
     "\"\\def\\ExecuteOptions#1{\\oldExecuteOptions{#1,draft}}\"\\\n"
     "\"\\nonstopmode\\input{" (buffer-name) "}\";"
     "mv " (file-name-sans-extension (buffer-name)) ".pdf /tmp/"  ) nil nil)

;;; =========================================================================
;;; ========================= "COMMENT" ENVIRONMENT =========================
;;; =========================================================================

;; The "comment" environment is defined in the "verbatim" and "comment" packages
(if (string-match-p "\\\\includecomment[\s\t\n]*{comment}" (buffer-string))
    (read-string "SOME WARNING HERE...")

  ;; *ELSE*
  (progn
    (goto-char (point-min))
    (while (search-forward-regexp "\\\\begin[\s\t\n]*{comment}" nil t)
      (save-excursion
        (let ((b (make-marker))
              (e (make-marker)))
          (set-marker b (match-beginning 0))
          (search-forward-regexp "\\\\end[\s\t\n]*{comment}" nil t)
          (set-marker e (point))
          (goto-char b)
          ;; Skip environments that are already inside a comment.
          (unless (nth 4 (syntax-ppss))
            (comment-region b e)))))
    (save-buffer)
    (call-process-shell-command
     (concat "latex \"\\let\\oldExecuteOptions\\ExecuteOptions\"\\\n"
         "\"\\def\\ExecuteOptions#1{\\oldExecuteOptions{#1,draft}}\"\\\n"
         "\"\\nonstopmode\\input{" (buffer-name) "}\"" ) nil nil)
    )
  )

As can be seen:

  1. I first compile my source, passing the draft option on the command line, and move the output to the /tmp/ directory.
  2. Then I comment out (with %) and remove the "comment" environments from the .tex source file.
  3. Finally I compile the cleaned source, again with the draft option.

So I first have:

\documentclass[11pt]{article}
\pdfoutput=1
\usepackage{comment}
\usepackage{blindtext}

\pagestyle{empty}

\begin{document}

\blindtext

\begin{comment}

\blindtext

\end{comment}

\end{document}

and then

\documentclass[11pt]{article}
\pdfoutput=1
\usepackage{comment}
\usepackage{blindtext}

\pagestyle{empty}

\begin{document}

\blindtext

\end{document}

My question is: "How can I check, from the command line, whether the layout has changed?"

I tried to compare the two PDFs on the command line:

cmp mydoc.pdf /tmp/mydoc.pdf

but it always reports that the files differ, even though I know the layout is unchanged.
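As a side note, one likely reason for the byte-level difference is metadata rather than layout: pdftex writes a creation date into the PDF. A minimal sketch to check this is below. The function name `pdf_dates` is made up for illustration, and it assumes the date entries are stored uncompressed in the file so that grep can see them.

```shell
# Print the date entries found in a PDF file, one per line.
# Assumption: the entries are stored uncompressed, so a plain text
# search can find them (-a makes grep treat the binary file as text).
pdf_dates() {
    grep -a -o -E '/(CreationDate|ModDate) ?\(D:[^)]*\)' "$1"
}

# Hypothetical usage with the two files from the question:
#   pdf_dates mydoc.pdf
#   pdf_dates /tmp/mydoc.pdf
```

If the two runs print different dates, the files cannot be byte-identical, whatever the layout looks like.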

If I try diffpdf (a GUI program), it says "the pdfs appear to be the same", but, as I said, I need a command line solution. I need the comparison result on the stdout of my terminal so that I can use it in my scripts (e.g. with Emacs' shell-command-to-string function). I do not need (and do not want) to check the layout visually at this stage.

Please, let me know if I've been able to explain my purpose in a clear way.

NOTE. This is a dummy example, but in some cases it can be tricky to tell whether a change will affect the layout.

My considerations. After many attempts with a "file content" approach (I tried a faketime command approach (see here), but it works only with small and simple files), I'm convinced that I need a "layout approach". A tool like diffpdf (with the --appearance option) that could print its result to stdout could be the solution.
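To make the "layout approach" concrete, here is a rough sketch of what I have in mind, not a tested solution. It assumes `pdftoppm` (from poppler-utils) is installed, and the function names `page_sums` and `same_layout` are made up for illustration. Each PDF is rendered to one bitmap per page and only the pixels are compared, so metadata such as dates cannot influence the result.

```shell
# Print one checksum per line for every file in a directory, in name order.
page_sums() {
    ( cd "$1" && for f in *; do cksum < "$f"; done )
}

# Render both PDFs to bitmaps (one .ppm file per page) and compare them.
# Prints "same" or "different" on stdout.
# Exit status: 0 = same, 1 = different, 2 = rendering failed.
# Assumption: 72 dpi is fine enough; a higher -r value catches smaller shifts.
same_layout() {
    a=$(mktemp -d) && b=$(mktemp -d) || return 2
    pdftoppm -r 72 "$1" "$a/p" && pdftoppm -r 72 "$2" "$b/p" ||
        { rm -rf "$a" "$b"; return 2; }
    if [ "$(page_sums "$a")" = "$(page_sums "$b")" ]; then
        echo same; status=0
    else
        echo different; status=1
    fi
    rm -rf "$a" "$b"
    return "$status"
}

# Hypothetical usage with the two files from the question:
#   same_layout mydoc.pdf /tmp/mydoc.pdf
```

Saved as a script, the "same"/"different" line could be read from Emacs with shell-command-to-string. A different page count also shows up as "different", because the two checksum lists then have different lengths.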

Gabriele
  • I don't think you mean layout itself. The content of the document has changed and you can check this difference by something simple like a md5sum hash or similar programs or \pdfmdfivesum from within pdflatex itself. If there's a single white space character more or less than before the md5 hash sums will differ. –  Aug 06 '17 at 23:17
  • @ChristianHupfer It's weird... The md5sum technique works with latex/dvi compilation but doesn't with pdflatex or latex/pdfoutput=1 compilation. – Gabriele Aug 06 '17 at 23:39
  • @ChristianHupfer I made some tests and found that the md5sum hash of a .dvi file depends on the system clock. If I compile a file at 9:03 and then compile the same file again at 9:04, I get two different hashes. I suppose the same happens with the pdflatex compiler, but with a smaller time interval. – Gabriele Aug 07 '17 at 07:07
  • pdftex inserts a creation date in the pdf, this will normally change at every compilation. See SOURCE_DATE_EPOCH and related subjects in the pdftex documentation. – Ulrike Fischer Aug 07 '17 at 07:21
  • In many PDF viewers it is fairly easy to copy the content of the entire PDF file as text. Then you can compare the content of these texts. – Name Aug 07 '17 at 08:33
  • @Name Comparing texts isn't enough for my purpose. I need to check whether the two PDFs are identical in every single detail. – Gabriele Aug 07 '17 at 08:43
  • If you copy the text of a page in the sumatrapdf reader, for example, not only is the text copied, but the structure and formulas are also converted to some text. In my opinion it is worth a try. – Name Aug 07 '17 at 09:40
  • @Name Since I need to perform this operation in my "copy editor" workflow, this is a very sensitive issue. I also need a fast command line solution that spares me, in most cases, from visually inspecting the layout. – Gabriele Aug 07 '17 at 09:48
