
Assume you have the form shown in Fig. 1 (PDF output or equivalent). I am wondering how to design the LaTeX form so that its data can be extracted easily (and eventually seeded into a PostgreSQL database). The following pieces of data should be extracted from the form:

  1. Question 1 answer
  2. Question 2 answer
  3. Summary result
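
Since each filled-out form yields exactly these three values, one natural target is one database row per submitted form. The following is a minimal sketch under my own assumptions, not something the form itself provides: the class FormResult, the column name ice_hockey (the PDF field name ice-hockey contains a hyphen, which would need a quoted identifier in SQL) and the validation rules are hypothetical. It also assumes the summary field is given a name such as summary, which the form code in this question does not set.

```python
from dataclasses import dataclass

# Export values of the radio buttons: Much (2)=2, Little (1)=1, Not at all (0)=0
ALLOWED = {0, 1, 2}


@dataclass(frozen=True)
class FormResult:
    """Hypothetical row layout for one submitted form."""
    football: int
    ice_hockey: int
    summary: int

    @classmethod
    def from_fields(cls, fields):
        """Build a row from a {PDF field name: value string} mapping."""
        football = int(fields["football"])
        ice_hockey = int(fields["ice-hockey"])  # PDF field name contains a hyphen
        summary = int(fields["summary"])        # assumes the field is named "summary"
        for name, value in (("football", football), ("ice-hockey", ice_hockey)):
            if value not in ALLOWED:
                raise ValueError(f"{name}: unexpected value {value}")
        # The summary is computed by JavaScript in the viewer, so recompute
        # it here instead of trusting the submitted value blindly.
        if summary != football + ice_hockey:
            raise ValueError(f"summary {summary} != {football} + {ice_hockey}")
        return cls(football, ice_hockey, summary)
```

For example, FormResult.from_fields({"football": "1", "ice-hockey": "2", "summary": "3"}) yields a row with football=1, ice_hockey=2 and summary=3. Recomputing the sum is a deliberate choice: not every PDF viewer runs the JavaScript behind the calculate option, so the stored summary may be stale.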

Code to generate the PDF file

% https://tex.stackexchange.com/a/384801/13173
\documentclass{article}
\usepackage{hyperref}
\begin{document}

\begin{Form}
\begin{enumerate}
\item \ChoiceMenu[name=football,radio,default=0]{Do you play football?}{Much (2)=2,Little (1)=1,Not at all (0)=0}
\item \ChoiceMenu[name=ice-hockey,radio,default=0]{Do you play ice-hockey?}{Much (2)=2,Little (1)=1,Not at all (0)=0}
\end{enumerate}

\TextField[readonly=true,value=0,calculate={event.value=this.getField("football").value+this.getField("ice-hockey").value;}]{Summary score:}
\end{Form}

\end{document}

Fig. 1 Output


Testing accsupp (Steven): rejected because it cannot take user input

The following code is not a good example, because its values are hard-coded and not taken from the user:

\documentclass{beamer}    
\usepackage[english]{babel}    
\usetheme{Berkeley} 
\usepackage{accsupp} % https://ctan.org/pkg/accsupp

\begin{document}

\begin{frame}
\frametitle{Field}
\section{Field 2}

\begin{equation}
    \BeginAccSupp{
        method=pdfstringdef,
        unicode,
        ActualText={%
            a\texttwosuperior +b\texttwosuperior
            =c\texttwosuperior
            }
        }
    a^2 + b^2 = c^2
    \EndAccSupp{}
\end{equation}

\end{frame}

\end{document}

The output is shown in Fig. 2. I do not see how this package helps with user input, since the form does not ask the user for any values.

Fig. 2 Output of the basic accsupp example


OS: Debian 9
TeXLive: 2017

Answer

As described in the question "Save fillable forms", you can create forms that send their values back to you by email when the user clicks the Submit button:

\documentclass{article}
\usepackage{hyperref}
\begin{document}

\begin{Form}[action=mailto:forms <forms@stackexchange.invalid>?subject=The submitted form]
\begin{enumerate}
\item \ChoiceMenu[name=football,radio,default=0]{Do you play football?}{Much (2)=2,Little (1)=1,Not at all (0)=0}
\item \ChoiceMenu[name=ice-hockey,radio,default=0]{Do you play ice-hockey?}{Much (2)=2,Little (1)=1,Not at all (0)=0}
\end{enumerate}

\TextField[name=summary,readonly=true,value=0,calculate={event.value=this.getField("football").value+this.getField("ice-hockey").value;}]{Summary score:}

\Submit[export=xfdf]{Submit}
\end{Form}

\end{document}

If you click the Submit button, your default email program composes an email to the given address (here forms@stackexchange.invalid) with an attached .xfdf file. This attachment contains the submitted data as XML:

<?xml version="1.0" encoding="UTF-8"?>
<xfdf xmlns="http://ns.adobe.com/xfdf/" xml:space="preserve"
><fields
><field name="Submit"
/><field name="football"
><value
>1</value
></field
><field name="ice-hockey"
><value
>2</value
></field
><field name="summary"
><value
>3</value
></field
></fields
><ids original="4C4F1F968A20B15FEDBFD76188D43221" modified="4C4F1F968A20B15FEDBFD76188D43221"
/></xfdf
>

The XML file generated by Adobe Acrobat (Reader) looks a bit peculiar, but it can be processed further by any XML parser. Other possible formats for the attachment, selected with the export option of the \Submit button, are:

  • export=html: exports the data in query string syntax.
  • export=fdf: uses Adobe's own Forms Data Format (FDF), basically a simplified version of the PDF file format, which might be useful if you want to process the data further using Adobe software.
  • export=pdf: attaches the whole, filled-out PDF file to the email, which you can then process as described in the second part of this answer.
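
For the default export=xfdf case, the attachment is plain XML, so Python's standard xml.etree.ElementTree is enough to read it. A minimal sketch; the function name parse_xfdf is hypothetical, and it assumes a flat form like this one (hierarchical field names would produce nested field elements whose parent names this sketch does not join).

```python
import xml.etree.ElementTree as ET

# Namespace declared on the <xfdf> root element of the attachment
XFDF_NS = "{http://ns.adobe.com/xfdf/}"


def parse_xfdf(data):
    """Return {field name: value} for every field that carries a <value>.

    `data` is the raw file content (bytes or str). Fields without a
    value, such as the Submit button, are skipped.
    """
    root = ET.fromstring(data)
    result = {}
    for field in root.iter(f"{XFDF_NS}field"):
        value = field.find(f"{XFDF_NS}value")
        if value is not None:
            # Strip surrounding whitespace left by the unusual line breaks
            result[field.get("name")] = (value.text or "").strip()
    return result
```

Applied to the attachment shown above (read in binary mode), it returns {'football': '1', 'ice-hockey': '2', 'summary': '3'}; the line breaks inside the tags are legal XML and need no preprocessing.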

As an alternative, you could consider using the open-source tool PDFtk to extract data from a saved filled-out PDF document: running

pdftk document.pdf dump_data_fields

on a filled-out document document.pdf will report something like

---
FieldType: Button
FieldName: football
FieldFlags: 49152
FieldValue: 0
FieldValue: 1
FieldJustification: Left
FieldStateOption: 0
FieldStateOption: 1
FieldStateOption: 2
FieldStateOption: Off
---
FieldType: Button
FieldName: ice-hockey
FieldFlags: 49152
FieldValue: 0
FieldValue: 2
FieldJustification: Left
FieldStateOption: 0
FieldStateOption: 1
FieldStateOption: 2
FieldStateOption: Off
---
FieldType: Text
FieldName: summary
FieldFlags: 1
FieldValue: 3
FieldJustification: Left
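
This dump is a simple key/value format, so it can also be read directly in Python instead of preprocessing it with sed or awk. A minimal sketch with hypothetical function names; note that in the sample above each radio button reports two FieldValue lines, and the last one (1 and 2) matches the submitted summary 3, so the sketch assumes the last FieldValue line is the current value. That is an inference from this sample, not documented pdftk behaviour.

```python
def parse_pdftk_fields(dump):
    """Parse `pdftk ... dump_data_fields` output into a list of dicts.

    Records are separated by lines consisting of '---'. Every key maps to
    a list, because keys such as FieldValue and FieldStateOption repeat.
    """
    records = []
    current = None
    for line in dump.splitlines():
        if line.strip() == "---":
            current = {}
            records.append(current)
            continue
        key, sep, value = line.partition(":")
        if current is None or not sep:
            continue  # skip text before the first separator or non key/value lines
        current.setdefault(key.strip(), []).append(value.strip())
    return records


def current_values(records):
    """Map FieldName -> value, assuming the LAST FieldValue line is current."""
    return {
        rec["FieldName"][0]: rec["FieldValue"][-1]
        for rec in records
        if "FieldName" in rec and "FieldValue" in rec
    }
```

Feeding the output above through both functions gives {'football': '1', 'ice-hockey': '2', 'summary': '3'}.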
diabonas
  • I do not know OpenRefine, so I am afraid I cannot help you with that in detail. Glancing over the docs, it should be able to import XML files, which can be produced using the first approach presented in this answer, which I modified a bit for this purpose. The second approach is certainly usable as well, but you might have to do some preprocessing using e.g. sed or awk to get the data into a form OpenRefine can work with. – diabonas Aug 18 '17 at 08:43
  • That being said, PDF forms are always a bit annoying to deal with, as many features only work with the proprietary Adobe products and fail with other software, e.g. on Linux, where the latest available version is the ancient and no longer supported Adobe Reader 9. Have you considered using a simple HTML page instead? It should work on virtually all systems and would also make processing the data easier, using a backend such as PHP or Ruby on Rails. – diabonas Aug 18 '17 at 08:44
  • If you need to do it offline, i.e. no connection to a central server is possible, and all participants already have Adobe software installed, I agree that an interactive PDF is a viable solution. In this case, the best approach depends on how you want to receive the results: is sending them back by email an option? Then the first approach is probably the easiest one for you to handle. If not, you could also save the resulting XML directly to a file that you can then collect offline (I can extend the answer if that would be useful for you). – diabonas Aug 22 '17 at 10:22
  • I tried the action=mailto:... variant. It only works for me (Windows, current Adobe Reader) if I send it to my main private mail address; all other addresses (my own aliases and other accounts, foreign, invalid) give an error message that the address could not be resolved. – Ulrike Fischer Oct 04 '17 at 10:21
  • @UlrikeFischer That's weird, I have never seen this error message before. Actually, I suspect the error message is generated by the email program that Adobe Reader passes the message to, rather than by the PDF viewer itself. Which mail client are you using? A screenshot of the error message, or at least its exact wording, might be helpful as well. – diabonas Oct 06 '17 at 09:06
  • @LéoLéopoldHertz준영 Regarding saving the data to a file (as opposed to sending it back by mail), I've already covered that in a different answer. Either way, you will get one .xfdf file per filled-out form: by mail if you opt for the method given in this answer, or collected "manually" if your users save them to their local hard drive with the method given in the linked answer. You can then batch import all these files at once into OpenRefine to get a table with all form results. – diabonas Oct 06 '17 at 09:16
  • I'm using TheBat. With Thunderbird it works, but I can't tell whether it is a general problem with TheBat or due to my more complicated setup there (I manage more than one email account with it, and not all are IMAP). Simple \href{mailto:forms@stackexchange.invalid}{mailto-test} URLs work fine. – Ulrike Fischer Oct 06 '17 at 09:23
  • @UlrikeFischer I can reproduce the problem with TheBat, which apparently has an issue with the format of the email address: adding a display name to the email address seems to solve the problem (see the edited example code). – diabonas Oct 06 '17 at 14:52
  • Hey, that's good. I'm impressed. How did you find that out? – Ulrike Fischer Oct 06 '17 at 15:01
  • @UlrikeFischer Luckily somebody suggested this approach on the Adobe forum, otherwise I probably wouldn't have guessed ;) – diabonas Oct 06 '17 at 15:31
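
The question's final goal is seeding a PostgreSQL database rather than OpenRefine. Once each collected .xfdf file (or pdftk dump) has been turned into a {field name: value} mapping, one possible route is to write a CSV file and load it with PostgreSQL's COPY, here via psql's \copy. A minimal sketch; the table name form_results, the column names and the function to_copy_csv are hypothetical.

```python
import csv
import io

# Hypothetical mapping from PDF field names to PostgreSQL column names;
# "ice-hockey" is renamed because a hyphen would require a quoted identifier.
COLUMNS = {"football": "football", "ice-hockey": "ice_hockey", "summary": "summary"}


def to_copy_csv(forms):
    """Render a header row plus one CSV row per submitted form.

    `forms` is an iterable of {PDF field name: value} dicts. A missing
    field becomes an unquoted empty cell, which COPY in CSV format
    reads as NULL by default.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(COLUMNS.values())
    for fields in forms:
        writer.writerow(fields.get(name, "") for name in COLUMNS)
    return buf.getvalue()


# Save the result as results.csv and load it with psql, e.g.:
#   \copy form_results (football, ice_hockey, summary) FROM 'results.csv' WITH (FORMAT csv, HEADER true)
```

For a single form with football=1, ice-hockey=2 and summary=3 this produces the header line football,ice_hockey,summary followed by 1,2,3. Going through CSV keeps the extraction step independent of any database driver; inserting rows directly with a PostgreSQL client library would work just as well.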