10

I'm writing my master's thesis, and my supervisor has pointed out that my sentences are too long. She commented that Microsoft Word has a feature for finding 'large' sentences. How can I do this with TeX? I'm using a Mac and TeXShop.

David Carlisle

1 Answer

9

I think this is the easiest way. First, make sure you have pdftotext and diction installed (the diction package also provides the style command used below). Both should be available via MacPorts.

  1. Render your document to a PDF. Let's assume it's called paper.pdf.
  2. Grab the plain text from the PDF using pdftotext. At the terminal, run this: pdftotext paper.pdf paper.txt
  3. Now run style -l N paper.txt, replacing N with a number. This prints every sentence in your document that is longer than N words.

Alternatively, you can do it all as a one-liner:

$ pdflatex paper.tex && pdftotext paper.pdf - | style -l 20

style is extremely powerful and has many other features. For a good overview, see here.
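If diction is not installed, the idea behind step 3 can be roughly approximated with standard tools. The sketch below is not how style works internally. It is a crude stand-in that assumes a sentence ends at '.', '!' or '?' followed by a space, so abbreviations will split sentences early. The function name long_sentences is made up for this example.

```shell
# long_sentences N [FILE...]: print each sentence with more than N words,
# prefixed by its word count. Hypothetical helper, not part of diction.
long_sentences() {
  n="$1"
  shift
  # Join all lines into one stream, start a new line after each
  # sentence-ending mark, then let awk count the words (fields) per line.
  cat "$@" |
    tr '\n' ' ' |
    sed 's/\([.!?]\)  */\1\
/g' |
    awk -v n="$n" 'NF > n { print NF ": " $0 }'
}
```

It reads standard input when no file is given, so it can take the place of style at the end of the pipeline: pdftotext paper.pdf - | long_sentences 20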

ESultanik
  • One could alternatively use DeTeX instead of pdftotext to get a plain text version to send to style. I find, however, that pdftotext works a lot better, especially if one has a lot of markup in his or her document that DeTeX doesn't understand (e.g., TikZ). pdftotext also works a lot better if one splits one's original document up into multiple .tex files that one \include{}'s. – ESultanik Aug 26 '10 at 12:38
  • Using style without any parameters will give you a summary that will tell you the % of long ("at least 27 word") sentences. – Geoff Aug 26 '10 at 13:35
  • @ESultanik, do you know how to tell diction/style that abbreviations are not (usually) ends of sentences? (Dr., Fig., etc.) – Geoff Aug 26 '10 at 13:42
  • @Geoff This should work for most cases (except for 'etc.', but it's easy to extend the regex for those special cases): pdftotext paper.pdf - | sed 's/\([A-Z][a-zA-Z]*\)\./\1/g' | ... – ESultanik Aug 26 '10 at 14:37
  • @ESultanik. Just drop the period, okay. It seems to me that style should support this. – Geoff Aug 26 '10 at 15:33
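
The period-dropping substitution discussed in the comments can be tried on a sample line before it goes into the pipeline. This is a sketch that assumes the group parentheses and the period are backslash-escaped, as basic sed syntax requires. The function name strip_abbrev_periods is made up for this example.

```shell
# Drop the period that directly follows any capitalized word,
# so that "Fig." and "Dr." no longer look like sentence ends.
strip_abbrev_periods() {
  sed 's/\([A-Z][a-zA-Z]*\)\./\1/g'
}

printf 'See Fig. 3 by Dr. Jones for details.\n' | strip_abbrev_periods
# prints: See Fig 3 by Dr Jones for details.
```

One limitation: a sentence that ends in a capitalized word (for example "Ask Dr. Jones.") also loses its final period, so it will be merged with the sentence that follows it.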