Is there any way to do a correct word count of a LaTeX document?

Question

Often assignments (or even papers) have a word count limit. That is not a big deal when using Word, but I don't know how to do that using LaTeX. My solution so far has been to compile the document and then do a rough word count of my PDF file, sometimes even copying the contents of the PDF file and pasting them into Word to get a mostly correct word count.

Is there any tool (maybe even an online tool), package, script or software to do that directly from my .tex document and still get the right word count (i.e., ignore commands, equations, etc)?

Same question on Stack Exchange: http://stackoverflow.com/questions/2974954/correct-word-count-of-a-latex-document — Charles Stewart, Jun 18 '11 at 12:07
Under Linux I normally do it over the PDF to get a rough count: pdftotext file.pdf - | wc -w, but this also counts page numbers etc. as words. — Martin Scharrer, Jun 29 '11 at 18:27
Word count is never perfectly defined: How many words are there in "can't"? In an algorithm? In a figure with several texts? So, the notion of a correct word count does not exist... — Paul Gaborit, Jul 17 '12 at 11:43
Emacs native tex-mode has a word count function: M-x tex-count-words. — giordano, Jun 04 '13 at 08:33
watches your LaTeX word limit as you type and save: % watch -d "detex index.tex | wc -w" — Vaibhav Bajpai, Feb 28 '14 at 19:43
"Word count" may not mean what you think it means. In English, should "a" or "I" count as a word, on the same basis as "inexhaustible" or "electrodynamics"? In some contexts (notably book publishing) a word is regarded as a certain number of characters, including spaces. The number is often 6. So, if the printed document (not TeX code, but printed) has 60000 characters including spaces, it has 10000 words. Then, the space occupied by ordinary language, not tech, would occupy a predictable amount of space, on average. — , Jan 02 '17 at 23:19

score 229 · Accepted Answer · edited Oct 01 '22 at 02:20

This is in the TeX FAQ. The solutions suggested are:

detex filename (which tries to strip LaTeX commands), then use any word count tool. (e.g. wc)
latexcount.pl, a Perl script for word count
texcount, another script, which even has an online interface
wordcount, which has a script that runs LaTeX with some settings, then counts word indications in the log file.

edited Oct 01 '22 at 02:20

barbara beeton

answered Jul 29 '10 at 00:52

ShreevatsaR

texcount is pretty neat, especially the friendly online interface. One caveat: it recognizes align, equation, \[ \], and $ as defining math environments (and possibly more). But it somehow misses align*. – Willie Wong Jul 29 '10 at 01:01
Just want to address my previous comment: the author got back to me and it will be fixed in the next version (2.3) – Willie Wong Jul 29 '10 at 11:19
Do these methods ignore all the words inside math environments? – Malabarba Sep 28 '10 at 16:33
If you use texcount you'll want to run it with options like -inc -incbib -sum to get a more accurate total. – Seamus Jul 17 '12 at 09:56
another caveat with texcount: it works off the raw tex file, so it will ignore any inline references placed using bibtex. this makes it worthless for most academic articles. – Jeff Nov 28 '12 at 01:27
@Jeff though quite often word limits exclude the references. – Chris H Jan 01 '14 at 16:25
yet another caveat with texcount - it doesn't understand packages such as acronym which can cause quite a lot of expansion. It also doesn't include 2nd level \inputs. So misses all my big tables and their captions, tablenotes etc. – Chris H Jan 01 '14 at 16:27
@ChrisH in my experience, it excludes the bibliography only; i don't think i've seen any guidelines that exclude inline references. YMMV. – Jeff Jan 02 '14 at 00:56
@Jeff in my field inline references are a number, either in square brackets or superscripted, so are neither here nor there in assessing word count. Occasionally we might write out "... as Smith and Jones proposed in 1987 [1]..." but that's quite rare - the total words added by cite commands in a long review would be a few tens. – Chris H Jan 02 '14 at 09:51
@ChrisH yes, that's fine, and I wouldn't consider that an inline citation (such as APA or MLA, as opposed to Vancouver style). my comment was specifically directed at those who use inline citations. – Jeff Jan 02 '14 at 19:56
Make sure to run texcount.pl FROM the folder where your .tex files reside, otherwise you get these errors: !!! File not found: <filename> in [./] !!! Of course you don't have to copy files to your latex dir but just run it from there :) – Michahell Jul 31 '15 at 13:26
Texcount is the best in my opinion; I find the usability superb compared to some others I have tried. The detailed breakdown (of how many words) per (sub(sub))section is especially useful.
+1 for mentioning it to the commenter and response.
– gktscrk Jan 24 '17 at 21:20

matth · Answer 2 · 2013-04-08T05:51:20.280

103

The integrated PDF viewer in Texmaker has offered a word count feature since version 3.4.
Just right-click in the PDF document, then click Number of words in the document.

edited Apr 08 '13 at 05:51

answered Jul 17 '12 at 08:53

matth

Wow, I’ve been using that editor for a while but didn’t know this! I just played around with it for a little bit and it seems that the word count includes everything that has a space between itself and the next thing, i.e. the page numbers, the section numbers and titles, the section numbers and titles in the ToC, the ToC title, and even every dot in the dotted line for the \subsections in the ToC. – doncherry Apr 07 '13 at 17:08
issue 593 has been deleted from the issue tracker, but is archived in the internet archive: https://web.archive.org/web/20130502001436/https://code.google.com/p/texmaker/issues/detail?id=593 – matth Jul 10 '15 at 08:00

Konrad Rudolph · Answer 3 · 2021-09-27T09:30:44.193

42

Here’s an excerpt from my .vimrc that gives me a comfortable word count in Vim:

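" Count the words of the current LaTeX file with detex and wc.
function! WC()
    " shellescape() protects file names containing spaces or special characters
    let filename = shellescape(expand("%"))
    let cmd = "detex " . filename . " | wc -w | tr -d '[:space:]'"
    let result = system(cmd)
    echo result . " words"
endfunction
command! WC call WC()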

Now I can invoke :WC in command mode to have the word count echoed in the status line.

edited Sep 27 '21 at 09:30

answered Jul 29 '10 at 16:48

Konrad Rudolph

I cannot help but think that calling perl just to do chomp and a regex is way too much of an overkill. =) Why not just ' ... | wc -w | tr -d [:space:]'? – Willie Wong Aug 04 '10 at 00:06
@Willie: good call. Don’t remember why I used Perl here. – Konrad Rudolph Aug 04 '10 at 08:38
now I would be much happier if you had given me something I can use in Emacs! :P – Vivi Aug 04 '10 at 09:45
@Vivi http://superuser.com/questions/125027/word-count-for-latex-within-emacs – Seamus Feb 15 '12 at 12:05
@williewong: what does wc count? i saw there was potential for confusion, so i created a test latex doc. the body contained "how many words?", and wc told me 6 words (huh?); i could have forgiven it for counting all the macro names and their arguments, or for counting the arguments only, but what it counts is blocks of text between spaces or line ends. so wc doesn't hack it for a tex user, i guess. – wasteofspace Nov 24 '12 at 19:05
@wasteofspace: MWE? Issuing echo "how many words?" | wc -w the output is 3 as expected. If your "test latex doc" includes the standard \documentclass{article} \begin{document}... \end{document}, each of the three counts as one additional word. Those are supposed to be stripped out using the detex command in the OP. Lastly, man wc. – Willie Wong Nov 26 '12 at 09:46
Another option is command! -range=% WC <line1>,<line2>w !detex | wc -w, which accepts an optional range. – Brian McCutchon Nov 20 '15 at 18:26
Popping by after getting tr -d [:space:] to complain about not finding [:space:]. Had to enclose it in quotation marks. Full command for me is now let cmd = "sed -n '/\\\\begin{document}/,/\\\\end{document}/p' " . filename . " | detex | wc -w | tr -d '[:space:]'". I also changed the filename to expand("%:p") to avoid any potential relative path errors. Yes, that is indeed four units of backslashes. One pair for nvim, one pair for sed. – mazunki Sep 27 '21 at 08:41
@mazunki The quotes should definitely not be required: none of the following characters are special characters for the shell. And for me it works without, both on Linux and on macOS. – Konrad Rudolph Sep 27 '21 at 08:47
@KonradRudolph Could it be because of zsh? You can clearly see the quotes being required here: https://imgur.com/YQoNpmW. Either way, the quotes will never hurt, will they? – mazunki Sep 27 '21 at 08:48
@mazunki Oh yes, zsh behaves differently; I didn’t realise system used the $SHELL to run commands, I thought it always used /bin/sh. And you’re right, the quotes don’t hurt. – Konrad Rudolph Sep 27 '21 at 09:29

score 31 · Answer 4 · answered Apr 07 '13 at 16:36

I use texcount with the following parameters:

texcount file.tex -inc -incbib -sum -1

Output is simple like this:

If you remove the -1, then you can get more information:

word count (#headers/#floats/#inlines/#displayed)
3996+48+99 (22/9/0/0) Included file: parts/blup.tex
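If you want to post-process these brief lines in a script, the format shown above can be parsed mechanically. Here is a minimal Python sketch. It assumes the three numbers joined by + are the words in text, in headers and in captions (that is my reading of texcount's brief format), and that the four numbers in parentheses follow the header line above. The name parse_brief_line is made up for illustration; it is not part of texcount.

```python
import re

# One brief texcount line, e.g.
# "3996+48+99 (22/9/0/0) Included file: parts/blup.tex".
BRIEF_LINE = re.compile(
    r"^\s*(\d+)\+(\d+)\+(\d+)\s+\((\d+)/(\d+)/(\d+)/(\d+)\)\s*(.*)$"
)


def parse_brief_line(line):
    """Split a brief texcount line into named counts (hypothetical helper)."""
    match = BRIEF_LINE.match(line)
    if match is None:
        raise ValueError(f"not a brief texcount line: {line!r}")
    numbers = [int(group) for group in match.groups()[:7]]
    text, headers, captions, n_headers, n_floats, n_inlines, n_displayed = numbers
    return {
        "text": text,              # assumed: words in running text
        "headers": headers,        # assumed: words in headers
        "captions": captions,      # assumed: words in captions
        "n_headers": n_headers,
        "n_floats": n_floats,
        "n_inlines": n_inlines,
        "n_displayed": n_displayed,
        "total": text + headers + captions,
        "label": match.group(8).strip(),
    }


if __name__ == "__main__":
    print(parse_brief_line("3996+48+99 (22/9/0/0) Included file: parts/blup.tex"))
```

For the sample line above, the three counts add up to a total of 4143.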

answered Apr 07 '13 at 16:36

aphex

great one, best answer – Gery Jul 04 '19 at 20:54
But the documentation mentions that TeXcount can be asked to count the number of letters/characters (not including spaces). Any suggestions on how to do this, please? – MadyYuvi Feb 05 '22 at 04:03

Fran · Answer 5 · 2013-06-04T10:12:52.020

28

You can include the texcount results in the LaTeX document itself:

MWE

Note that this MWE requires the file to be named borra.tex (or modify the code accordingly).

% CAUTION !!!
% 1) Need --enable-write18 or --shell-escape 
% 2) This file MUST be saved 
%    as "borra.tex" before the compilation
%    in your working directory
% 3) This code will write wordcount.tex
%    and charcount.tex in /tmp of your disk.
%    (Windows users must change this path)
% 4) Do not compile if you are unsure
%    of what you are doing.

\documentclass{article}
\usepackage{moreverb} % for verbatim output

% Count of words

\immediate\write18{texcount -inc -incbib 
-sum borra.tex > /tmp/wordcount.tex}
\newcommand\wordcount{
\verbatiminput{/tmp/wordcount.tex}}

% Count of characters

\immediate\write18{texcount -char -freq
 borra.tex > /tmp/charcount.tex}
\newcommand\charcount{
\verbatiminput{/tmp/charcount.tex}}


\begin{document}


\section{Section: text example with a float}

Words and characters of this example file are 
automatically counted from the source file 
when compiled (therefore generated text as 
\textbackslash{}lipsum[1-10] is {\bfseries not} 
counted). The results are showed at the end 
of the compiled version.
Counts are made in headers, caption floats 
and normal text for the whole file. Subcounts 
for structured parts (sections, subsections, 
etc.) are also made. Number of headers, 
floats and math chunks are also counted. 

\begin{figure}[h]
\centering
\framebox{This is only a example float} 
\caption{This is a example caption}
\end{figure}

\subsection{Subsection: Little text with math chunks}

In line math: $\pi +2 = 2+\pi$ \\   
Display math: \[\pi +2 = 2+\pi\] 

%TC:ignore  
\dotfill End of the example \dotfill 

\subsubsection*{Counts of words} 
\wordcount

%TC:endignore   

\end{document}

edited Jun 04 '13 at 10:12

answered Jun 04 '13 at 01:01

Fran

I suggest you change it to:
\immediate\write18{texcount -nc -inc -sum \jobname.tex > wordcount.tex} \newcommand\wordcount{\verbatiminput{wordcount.tex}}
% Count of characters
\immediate\write18{texcount -char -nc -freq \jobname.tex > charcount.tex} \newcommand\charcount{\verbatiminput{charcount.tex}}
It then becomes filename independent.
– Louis Apr 05 '14 at 16:31
@Louis, I have done what you suggested, but this only displays the contents of the file at the end of the document. Where can I find out the number of words? – Edy Jo Jan 27 '15 at 20:32
@EdyJo I am not sure if I understand your question, but the macro \wordcount produces it: the line that says Words in text: 75 gives the count, and the other statistics below or above that line give you the sum and the counts in other areas of the document. The detailed report below provides the information per chapter, section, etc. I use it when compiling a large document for my own motivation and purposes, but when I send a revision to the reviewers I comment it out before compilation. I even added some Lua code yesterday to produce real tables. – Louis Jan 27 '15 at 21:13
Good answer – personally I wouldn't give the output a .tex extension, though – that's traditionally reserved for user-generated content. – Sean Allred Apr 21 '15 at 13:47
For reference: I counted 114 words in this example. (up to, and including, "End of the example"). Emacs' tex-count-words counted 229 – fiacobelli May 16 '17 at 18:44
detex filename.tex | wc yielded 101; texcount, as expected, counted 94 – fiacobelli May 16 '17 at 18:52
@fiacobelli "End of example" should not be in the count (it is between the %TC:ignore tags). I suppose the remaining difference comes from the caption label and the section counters. – Fran May 16 '17 at 19:03
A note for Windows users (using PowerShell): the redirection operator > creates a file in the shell encoding (UTF-16 in my case). This cannot be read by LaTeX during compilation, resulting in a lot of "text line contains an invalid character. <read 3> ^^@" errors. Instead, use texcount <args> | Out-File -Encoding <utf8 or ASCII> wordcount.txt and include it with \verbatiminput{wordcount.txt} – Fee Apr 30 '21 at 08:28

score 19 · Answer 6 · answered May 06 '12 at 13:15

You can use the word count code from ConTeXt (lang-wrd.lua). I took the liberty of adapting it for Plain (it should work with the LaTeX format as well). The code is stripped of the more ConTeXt-specific features and relies on the character property definitions from char-def.lua. This way there’s no need for external tools and as a bonus you can insert the current word count wherever you like inside the document itself.

The usage example has some explanations.

\setwordthreshold{3} %%% min chars in a row to count as word
\startwordcount      %%% start callback
\input knuth\par     %%% counted
\currentwordcount    %%% => 94 with threshold == 3
\input knuth         %%% counted
\stopwordcount       %%% deregister callback
\input knuth         %%% not counted
\dumpwordcount       %%% => 188

Everything between \startwordcount and \stopwordcount is picked up; the rest is ignored, so you can manually exempt passages from being counted. The word threshold would have to be set to 1 for English.

Due to the nature of the pre_linebreak_filter you will get word counts only by paragraph, though.
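To illustrate what the word threshold does, here is a rough Python sketch. It is not the lang-wrd.lua code and does not use TeX callbacks; it simply assumes that a word is a maximal run of letters and only counts runs that are at least threshold characters long. The name count_words is made up for this example.

```python
import re

# A run of letters; digits, punctuation, underscores and spaces end a run.
LETTER_RUN = re.compile(r"[^\W\d_]+")


def count_words(text, threshold=1):
    """Count letter runs that are at least `threshold` characters long."""
    return sum(1 for run in LETTER_RUN.findall(text) if len(run) >= threshold)


sample = "It is a far better thing"
print(count_words(sample, threshold=1))  # 6
print(count_words(sample, threshold=3))  # 3
```

With a threshold of 3 the short words "It", "is" and "a" are dropped, which is why a threshold of 1 is the sensible setting for English.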

This looks promising.. Is it possible to dump the total wordcount before starting to count? — Frederik, Aug 21 '14 at 07:23
You’d have to store it somewhere between TeX runs. Write the result of packagedata.word_count.current_word_count() to some temporary file and re-read it at the start of the next run. — Philipp Gesang, Aug 21 '14 at 16:25
This code gives me word_count.lua:91: attempt to index local 'data' (a nil value). Maybe line 91 should read as if data ~= nil and is_letter[data.category] then ? — bonanza, Mar 02 '19 at 11:26

score 15 · Answer 7 · edited Feb 15 '12 at 10:40

Way back in the depths of time, I scribbled my own perl script to do this. My reason for doing this myself was that sometimes I wanted to count words in command arguments and sometimes not, so I built in a selection routine. Plus I figured that a bit of maths was worth a word so added that in. As the script is really simple, I'm copying it here (which automatically makes it some sort of free-to-use, I guess!).

I don't think that I've used it for years, though - it's been a long time since "number of words" mattered to me at all.

#!/usr/bin/perl -w

@ARGV and $ARGV[0] =~ /^-+h(elp)?$/ && die "Usage:\t$0 files\n\t$0 < files\n\t$0\n";

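my $count = 0;
my $first = "";
my $tex = 0;
my $n = 0;    # true while inside a multi-line \[ ... \] display

# Skip leading blank lines; stop at end of input so an empty file cannot loop forever.
while (defined $first && $first =~ /^\s*$/) {
    $first = <>;
}
$first = "" unless defined $first;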

if ($first =~ /^\\(input|section|setlength|documentstyle|chapter|documentclass|relax|contentsline|indexentry|begin|glossaryentry)/) {
    $tex = sub { $r = $_[0];
                 $m = $_[1];
                 $r =~ s/\\(emph|textbf|textit|texttt|em)\{//g;
                 $r =~ s/\\(sub)*section\*?\{[^\}]*\}//;
                 $r =~ s/\\title\{[^\}]*\}//;
                 $r =~ s/\\\(.*?\\\)/maths/g;
                 $r =~ s/\\\(.*?$/maths/;
                 $r =~ s/^.*?\\\)/maths/;
                 $r =~ s/\\\[.*?\\\]/maths/g;
                 $r =~ s/.*?\\\]// and $m = 0;
                 $m and $r = "";
                 $r =~ s/\\\[.*?$// and $m = 1;
                 $r =~ s/\\\S*//g;
                 $r =~ s/%.*//;
                 return ($r,$m) };
} else {
    $tex = sub { return ($_[0],0) };
    @split = split(" ", $first);
    $count += $#split + 1;
}

while ($s = <>) {
    ($t,$n) = &$tex($s,$n);
    @split = split(" ", $t);
    $count += $#split + 1;
}

print "Number of words: $count\n";

edited Feb 15 '12 at 10:40

doncherry

answered Jul 29 '10 at 08:31

Andrew Stacey

@Andrew: again the issue: what do you do with it? how I am supposed to use it? Can you add that to your answer? – Vivi Jul 29 '10 at 08:44
@Viv: to be honest, I would say that if you don't know what to do with this then you aren't supposed to use it and should use one of the answers in the accepted answer. – Andrew Stacey Jul 29 '10 at 09:18
I am glad you were born knowing how to do it, otherwise you wouldn't be using it now, right? Because if you don't know it, you are not supposed to learn it, isn't that right? – Vivi Jul 29 '10 at 10:59
@Vivi: I apologise for the fact that my remark has come across other than I intended it. I did start writing out the instructions, but they are so OS and user specific that I gave up. I'd be happy to help with this sort of thing (though I think that the comments here are not the best format for such help), but in this specific case I really do think that if you don't know how to do it, then this is not the right way to do it. It's an old script that I dug up and it is very me-specific so almost certainly would need a little tweaking to make it useful to anyone. (contd) – Andrew Stacey Jul 29 '10 at 11:09
@Vivi: (contd) I do learn lots from seeing little scripts and ideas that people have written like this one so I make my scripts available for others to learn from. But - as you sort-of say - I already know the basics so all I need is the script itself. To someone who knows a little perl, this might be a useful skeleton for building their own script, but if someone has never encountered perl scripts before then it really isn't going to help and you'd be better off using one of the methods used elsewhere. (PS I apologise also for spelling your name incorrectly in #2) – Andrew Stacey Jul 29 '10 at 11:13
@Andrew: In Brazil I am Vivi, but here I am Viv, so you didn't spell it wrong after all! Thanks for your answer. I understand some things are too complicated to explain here, and it is not really the place. This thing (scripts, and Perl) keeps coming up so often that I decided it is time to understand it! I will ask someone who lives here and can explain it to me in person (about using scripts in general, but I will probably leave yours for later, given what you said). Thanks again for taking the time to explain, and thanks for sharing the code :) – Vivi Jul 29 '10 at 11:58

score 13 · Answer 8 · edited Sep 07 '22 at 09:49

Compile the (La)TeX document to DVI and then execute:

 catdvi document.dvi 2> /dev/null | wc -w

Redirecting the STDERR stream (2>) to /dev/null prevents excessive output of errors and warnings like unknown font encoding, etc.

This converts your DVI file to plain text and counts the words using wc. The count does include page numbers and section numbers, but it is a simple and reasonable solution.
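Text extracted from a DVI (or PDF) keeps end-of-line hyphenation, so a word broken across two lines can be counted twice. A rough Python sketch of a workaround, assuming the extracted text is available as a string (for example read from the output of catdvi); count_words_dehyphenated is a hypothetical helper, not part of catdvi or wc:

```python
import re

# A hyphen at the end of a line, directly between two word characters.
LINE_END_HYPHEN = re.compile(r"(?<=\w)-[ \t]*\n[ \t]*(?=\w)")


def count_words_dehyphenated(text):
    """Rejoin words split by end-of-line hyphens, then count whitespace-separated tokens."""
    return len(LINE_END_HYPHEN.sub("", text).split())


extracted = "This is an exam-\nple of hyphen-\nation."
print(len(extracted.split()))               # 8: both halves of each broken word are counted
print(count_words_dehyphenated(extracted))  # 6
```

Genuinely hyphenated compounds that happen to break at the hyphen are also joined, but they still count as one word, so the total is unaffected.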

edited Sep 07 '22 at 09:49

dexteritas

answered Mar 22 '12 at 21:52

Bob

This gets fooled by hyphenation but should be a good approximation. – lhf Jul 17 '12 at 10:46

score 13 · Answer 9 · answered Jul 29 '10 at 00:43

The first one to come to mind is detex which strips a tex file of commands. You will then have to pass it through wc or some other word counting software. A search on the internet also brought up two items on Sourceforge: word counter 1 and word counter 2.

Disclaimer: out of the three, I've only used detex before. It worked reasonably well, but I was working with an English essay and it had no equations, so I don't know how it plays with math mode stuff. (Currently I don't have it installed so I can't check.)

answered Jul 29 '10 at 00:43

Willie Wong

It's been a while since I used it but if I remember correctly, I don't think detex completely strips out the math content of a file, so that might skew the word count. – David Z Jul 29 '10 at 00:47
I had opportunity to use detex recently and it leaves many TeX-related words which have nothing to do with the content. I would almost go as far as saying that compiling to PDF and then using pdftotext might produce a more accurate count, even when it contains page numbers and repeats the headers. – José Figueroa-O'Farrill Jul 29 '10 at 00:50
Just tried it on a recent paper I wrote: yeah, detex is wildly off. I get even better results from piping dvi2tty to wc. – Willie Wong Aug 04 '10 at 00:51
Never used detex, but untex which comes in Debian seems to do the job. – helcim Aug 06 '10 at 15:23
@José: I found pdftotext to be much better. Especially since I tend to write macros that generate text so stripping them produces wildly incorrect word counts. – TH. Nov 27 '10 at 17:41

Niko Z. · Answer 10 · 2017-12-14T03:51:34.153

TeXstudio offers an advanced word count. It is located in the menus under Tools --> Word Analysis.

It refers to words as 'phrases' and offers different options and filters. It can also do a word count on a specific selection.

I have compared the output to MS Word and LibreOffice Writer, and they are mostly the same. The advantage of TeXstudio is that by default it will not count the table of contents and bibliography in the total word ('phrase') count. That makes it really convenient to get a reliable estimate on the go as one is editing the document.

score 10 · Answer 11 · answered Jul 29 '10 at 05:36

The last time I had to worry about this, I compiled my LaTeX document to PDF and ran it through pdftotext.

answered Jul 29 '10 at 05:36

Blake Stacey

This approach cannot deliver reliable results due to unreliable conversion by pdftotext. – helcim Aug 06 '10 at 15:26
Do you have an example of how pdftotext fails? I got reasonable results when I used it, but the documents I was using were not particularly elaborate (I don't think they had figures, for example). – Blake Stacey Aug 08 '10 at 18:46
This will count headers and page numbers in your word count. This will also count lots of words in mathmode. Try doing pdftotext and then wc on a file containing the equation $x+y=z$ . That counts as something like 5 words for this method… – Seamus Jul 17 '12 at 09:55
@Seamus For documents without much logic or maths, though, I've found it more accurate than alternatives when combined with a script to remove headers, footers etc. For documents with a lot of logic or maths, it would be hopeless. (I have no idea how those should be calculated, either.) – cfr Sep 12 '14 at 23:45

score 9 · Answer 12 · answered Feb 12 '16 at 00:46

In the specific case where Sublime Text is used for writing latex documents, one can use the package LaTeX Word Count.

answered Feb 12 '16 at 00:46

sodiumnitrate

score 9 · Answer 13 · answered Apr 23 '17 at 10:20

If you are using Overleaf you can click the word count button.

This will show the word count statistics.

You can also easily import an existing document.

answered Apr 23 '17 at 10:20

Freya

score 9 · Answer 14 · edited Feb 15 '12 at 10:40

For Windows users, the LaTeX Word Counter is pretty neat.

edited Feb 15 '12 at 10:40

doncherry

answered Nov 27 '10 at 14:01

MSpeed

The Sourceforge page says it's also available for Linux and Mac. – doncherry Feb 15 '12 at 10:41

score 8 · Answer 15 · answered Aug 12 '16 at 13:11

In addition to Philipp Gesang's answer I'd like to mention how to use the spellchecker module in ConTeXt to count words. It is adapted from the Spellchecker wiki page.

The word count extracted in the wiki includes inline math and content set with \type, though. To have the word count per language without math and \type you have to query categories.document.languages.en.total of the words file array.

\setupspellchecking[state=start,method=2]
\ctxlua{languages.words.threshold=1}

\starttext
\input knuth
\startformula
  x_{1,2} = \frac{-b\pm\sqrt{b^2-4ac}}{2a}
\stopformula
\input ward
\m{E = m c^2}

\startluacode
local wordfile = "\jobname.words"
if file.is_readable(wordfile) then
    local data = dofile(wordfile)
    context.startitemize({"packed"})
    context.item("Total words (including inline math): " .. data.total)
    context.item("Total words (in language \\type{en}): "
                 .. data.categories.document.languages.en.total)
    context.item("Total unique words (in language \\type{en}): "
                 .. data.categories.document.languages.en.unique)
    context.stopitemize()
end
\stopluacode

\stoptext

Sorry for the potentially stupid question, but is there any way to use this with lualatex? — bonanza, Feb 28 '19 at 18:03
@bonanza Yes, see Philipp's answer: https://tex.stackexchange.com/a/54630 It might need some adjustments to run with current LuaTeX though. — Henri Menke, Feb 28 '19 at 20:28
Thanks for your reply. Unfortunately, I am a complete noob regarding (lua)latex. Could I bribe you with some extra reputation to write a complete/noob-safe answer? — bonanza, Mar 01 '19 at 08:25

score 8 · Answer 16 · 2012-03-06T05:40:37.393

If you are on Windows and do not mind purchasing software, use WinEdt. It has a built-in word count feature (Document->word count).

edited Mar 06 '12 at 05:40

answered Feb 15 '12 at 10:00

score 8 · Answer 17 · answered May 06 '12 at 18:29

In general the answer is NO.

Nearly all requesters of word counts are not interested in the number of words but rather in the amount of space (pages) that the document will need when printed. If there are figures should the words in captions be counted without the space required by the illustration being taken into account? Are equations words, and if so is it one 'word' per variable/symbol or one 'word' per equation? If a paper consists of nothing more than title, author, a sentence and 100 math expressions is that about 50 or 500 'words'? Is a hyphenated word one or two? Does a document that mainly consists of 3 or 4 letter words compare equally with one that has a preponderance of 8 to 10 letter words?

I think that the traditional method is best: print the document, count the average number of 'words' per line in a typical page and multiply by the average number of lines per page and by the number of pages.
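The arithmetic of that traditional estimate can be written down as a small Python sketch. The function names and the sample numbers are made up for illustration; the only assumption is that a few typical lines have been counted by hand or copied out as text.

```python
def average_words_per_line(sample_lines):
    """Average number of whitespace-separated words over a few typical lines."""
    counts = [len(line.split()) for line in sample_lines]
    return sum(counts) / len(counts)


def estimate_words(words_per_line, lines_per_page, pages):
    """Traditional estimate: words per line x lines per page x number of pages."""
    return round(words_per_line * lines_per_page * pages)


# Hypothetical document: about 12 words per line, 40 lines per page, 10 pages.
print(estimate_words(12, 40, 10))  # 4800
```

The result is only as good as the sampled page is typical, which is exactly the point of the answer: it measures space rather than words.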

It is highly unlikely that the recipient of your work will actually count the number of words.

@peterwilson: i don't think that's the case; universities are notorious for setting word counts for dissertations, and then employing a menial to count the words when the copy is submitted. in this case, they don't (really) care about the printed size, they care about the idiot letter of their idiot regulations. (iirc, my university (cambridge) has recently followed down that crazy path, having previously set a page limit.) — wasteofspace, Jul 17 '12 at 10:40

score 7 · Answer 18 · answered Dec 18 '13 at 15:09

For Mac users, TeXShop (at least version 3.26) has a line, word and character count under Edit>Statistics. I never tested how well it works, but since TeXShop recognises syntax for colour-coding, I assume it is able to ignore most commands for the text.

Omar Wasow · Answer 19 · 2020-02-02T08:00:03.977

Combining texcount + knitr + R allows for dynamic in-text word count estimation. The code chunk below works on a Mac by calling the TeXcount Perl script, grabbing the name of the current file (or running it on myfile.tex) and then returning a limited set of stats (the -total option) including the sum of all words (the -sum option). As noted elsewhere in this thread, you may want to adjust the texcount options to include things like the bibliography. Once the word count is extracted, a thousands separator is added (if appropriate) and the result can then be referenced inline with the Sweave command \Sexpr{}.

The word count will always be for the second-to-last compile but compiling twice will solve that (much as with bibtex or table/figure references). I believe the code to call Perl from within R varies by platform so you may need to adjust the system() command below for non-Macs.

<<wordcount, echo = FALSE, cache = FALSE>>=
# adds comma for printing numbers, from scales package by Hadley Wickham
comma <- function (x, ...) {
  format(x, ..., big.mark = ",", scientific = FALSE, trim = TRUE)
} 

# To dynamically extract name of the current file, use code below 
file_name     <- current_input() # get name of file
file_name     <- strsplit(file_name,"\\.")[[1]][1] # extract name, drop extension
file_name_tex <- paste0(file_name, ".tex") # add .tex extension

system_call   <- paste0("system('texcount -inc -incbib -total -sum ", file_name_tex, "', intern=TRUE)") # paste together texcount system command  
texcount_out  <- eval(parse(text=system_call)) # run texcount on current last compiled .tex file

# Or, to manually write name of `myfile.tex`, uncomment and modify line below
# texcount_out <- system("texcount -total -sum myfile.tex", intern=TRUE) 

sum_row <- grep("Sum count", texcount_out, value=TRUE) # extract row
pattern <- "(\\d)+" # regex pattern for digits

count   <- regmatches(sum_row, regexpr(pattern, sum_row) ) # extract digits
count   <- comma(as.numeric(count)) # add comma
@

Word count: \Sexpr{count} % reference R variable in Latex prose
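For anyone not using knitr, the extraction step of the chunk above (find the "Sum count" row, take the first run of digits, add a thousands separator) could look roughly like this in Python. This is only a sketch: the function names are made up, and the sample output string in the usage lines is invented to show the shape of the input, not copied from a real texcount run.

```python
import re


def extract_sum_count(texcount_output):
    """Return the number on the first line containing 'Sum count', or None."""
    for line in texcount_output.splitlines():
        if "Sum count" in line:
            digits = re.search(r"\d+", line)
            if digits:
                return int(digits.group())
    return None


def with_commas(number):
    """Format an integer with a comma as thousands separator."""
    return f"{number:,}"


# Invented sample, only to illustrate the expected shape of the input.
sample_output = "File: myfile.tex\nSum count: 12345\n"
print(with_commas(extract_sum_count(sample_output)))  # 12,345
```

In a real setup the string would come from running texcount with the same options as in the R chunk, for example through the subprocess module.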

score 7 · Answer 20 · answered Nov 18 '15 at 12:27

If you use the online tool ShareLaTeX, it now has a built-in word count:

https://www.sharelatex.com/blog/2015/09/15/word-count.html

answered Nov 18 '15 at 12:27

Peter C

Nowadays Overleaf. – Tommi Jun 05 '23 at 12:30

score 7 · Answer 21 · answered Jun 20 '17 at 01:43

In bash, try:

detex file.tex | wc -w

The first command detex strips latex commands/comments from the file. The output of that is piped to wc -w, which counts the number of words.
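It helps to know what wc -w actually counts: blocks of text separated by whitespace, nothing smarter. The Python sketch below mimics that behaviour (str.split() is used as a stand-in for wc -w, which is an approximation) and shows why the detex step matters: on raw LaTeX source, every command glued to its argument counts as a word.

```python
def wc_w(text):
    """Approximate `wc -w`: count whitespace-separated tokens."""
    return len(text.split())


plain = "how many words?"
raw_tex = (
    "\\documentclass{article}\n"
    "\\begin{document}\n"
    "how many words?\n"
    "\\end{document}\n"
)

print(wc_w(plain))    # 3
print(wc_w(raw_tex))  # 6: the three LaTeX lines each count as one extra word
```

So the quality of the final number depends almost entirely on how well detex strips the markup before wc sees it.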

answered Jun 20 '17 at 01:43

innisfree

There are already two other answers focusing on detex. Does your answer add anything new? – Teepeemm Jun 20 '17 at 03:03
Yes, it adds a bash one-liner for the problem. The others circled around it. – innisfree Jun 20 '17 at 04:22

score 7 · Answer 22 · answered Jan 05 '20 at 10:13

If you happen to have the wonderful document conversion package pandoc installed:

pandoc -f latex -t plain main.tex | wc -w

Note: This works with any other format pandoc supports, not just LaTeX.

answered Jan 05 '20 at 10:13

Mahomet

score 6 · Answer 23 · edited Feb 15 '12 at 10:42

You can try Microspell. It's very robust software that knows if you have a main tex document and other subsidiary ones.

edited Feb 15 '12 at 10:42

doncherry

answered Sep 16 '10 at 17:44

yCalleecharan

score 5 · Answer 24 · answered Sep 01 '17 at 23:21

The prior answers are (I believe) more than adequate for the original question. But for the benefit of others who find this via search, I would like to provide more information.

"Word count" can mean many things. It is not necessarily determined by looking for word boundaries (space and return).

One widely-used measure, at least for U.S. English, is to visualize an old-fashioned typewriter, where each keystroke generates a character (including quote, period, comma, and space). Carriage return is also a character. Then, take the number of characters, and divide by six. This assumes an average word length (in U.S. English) of five letters, plus a space.

The above definition is useful for estimating how many pages will be used in a lengthy, printed book or manuscript. Of course, if you are preparing a PDF with TeX, you know exactly how many pages it uses.
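As a worked example of the characters/6 measure, here is a short Python sketch. It assumes the text has already been extracted from the printed document (not the TeX source) into a string; the function name words_by_characters is made up for this illustration.

```python
def words_by_characters(text, chars_per_word=6):
    """Estimate a word count as total characters (spaces included) divided by six."""
    return len(text) / chars_per_word


# A document of 60000 characters, spaces included, counts as 10000 words.
print(words_by_characters("x" * 60000))  # 10000.0
```

Note that every character counts, including punctuation and spaces, so the estimate does not depend on where the word boundaries are.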

Note that this criterion is not useful for academic papers containing illustrations, tables, and images.

I do not know whether MS Word counts word boundaries, or characters/6. In theory, the result should be almost the same, for lengthy flowing text (U.S. English).

I recently wrote a book, for which the page count measured by characters/6 was 220. The actual page count, using TeX with 5.5"x8.5" layout, was 240 pages including blanks. Not a bad estimate.

You may ask: In the case of a term paper, why not specify number of pages instead of word count? The obvious answer is that the number of pages can be gamed using different fonts, font sizes, or leading.

score 4 · Answer 25 · answered Jan 08 '13 at 00:24

Kile, the LaTeX editor for the KDE (Ubuntu) desktop, has a word count. It is under the statistics menu.

answered Jan 08 '13 at 00:24

Magpie

To have that clear, kile has pretty much nothing to do with ubuntu, or its desktop. – mafp Jan 08 '13 at 00:42
It is not at all accurate. It is OK to get a rough sense of whether document 1 has more words than document 2 but you could not use it for anything requiring an actual count, even an approximate one. – cfr Sep 12 '14 at 23:39

score 2 · Answer 26 · answered Aug 23 '18 at 19:58

Here is a quick and simple way to include a word count in a LaTeX document with TeXcount:

Download and extract TeXcount into the directory of the document
Make sure Perl is installed and accessible via the perl command (should already be the case on Linux, you can get Perl from here for all OSes)
Copy and paste this where you want the word count to appear: Word count: \input{|"perl DOCUMENT_FULL_PATH/texcount.pl -brief -sum -total DOCUMENT_FULL_PATH/mytexfile.tex"}
Make sure to run your LaTeX engine with the option --shell-escape (or --tex-option=--shell-escape if you use TeXworks/MiKTeX/texify)

Done!

JohnBig · Answer 27 · 2021-01-04T08:57:10.337

My solution to this problem was a bit of a workaround, but I wanted to share it, because somebody might have an interest in selectively defining what should be counted in the word count.

I've put all content that should not be counted, such as tables and footnotes, into a command. Then I can comment it out to get a document of only the content that should be counted. To do this, define this before the document starts:

\newif\iftables\newcommand{\tables}[1]{\iftables#1\fi}\tablestrue

You can change the word 'tables' to anything else. Then I would put all my tables into a \tables{} command. When I wanted to count the words, I could set that command to false to compile a document without it. Then I could just copy-paste the entire pdf content into a word processor to word-count it. It's just a bit cumbersome to add, but I can manually decide what I want counted or not.

score 1 · Answer 28 · answered Jan 04 '21 at 15:49

As many answers pointed out, an accurate word count is nearly impossible. Atom has a LaTeX word count package that is aware of that difficulty, but I believe it works fine.

answered Jan 04 '21 at 15:49

Luis Turcio
