How can I make a macro in LaTeX similar to ConTeXt's \averagecharwidth, which calculates the average width of a character based on the frequency of each character in my document? That macro is shown in this post.


The macros are fairly low-level TeX, so it is easy to use them in LaTeX by adding a few missing definitions. With these definitions in place, you can simply input lang-frq.mkii,
lang-frd.mkii, and the helper file supp-mis.mkii (on the destination page, click raw to download) and use ConTeXt's \averagecharwidth directly.
% Copy definition of \emptybox from supp-box.mkii
\ifx\voidbox\undefined \newbox\voidbox \fi
\def\emptybox{\box\voidbox}
% Copy definition of \startnointerference from syst-new.mkii
\newbox\nointerferencebox
\def\startnointerference
{\setbox\nointerferencebox\vbox
\bgroup}
\def\stopnointerference
{\egroup
\setbox\nointerferencebox\emptybox}
% Load a trimmed down version of ConTeXt macros
\input supp-mis.mkii
\input lang-frq.mkii
\input lang-frd.mkii
% Set the main language. (I don't know what the LaTeX equivalent of
% \currentmainlanguage is.)
\def\currentmainlanguage{en}
\documentclass{article}
\begin{document}
The average character width is \the\averagecharwidth
\end{document}
NOTE: Comment out line 116 of lang-frd.mkii (the one that reads \startcharactertable[en] 100 x \stopcharactertable % kind of default).
Here's a naive approach.
Some notes:
\begin{environment} and \par won't match any alphabetic characters; this is an advantage.
\text{some text} won't get counted; this is a disadvantage.
Anyway, for straight text, this gives an exact average character width. The result becomes less accurate if more printed text is hidden in brace groups.
\documentclass{article}
\usepackage{xparse}
\usepackage{siunitx}
\usepackage{booktabs}
\usepackage{longtable}
\usepackage{environ}
\ExplSyntaxOn
\bool_new:N \g_has_run_bool
\tl_new:N \l_aw_text_tl
\int_new:N \l_aw_tot_int
\int_new:N \g_aw_tot_alph_int
\fp_new:N \g_wid_space_fp
\int_new:N \g_space_int
\fp_new:N \g_rat_space_fp
\fp_new:N \g_aw_avg_width_fp
\dim_new:N \myalphabetwidth
\dim_new:N \mytextwidth
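% read back the \mytextwidth written to the aux file on the previous run;
% replace "testing" by the file name of your document
\input{testing.aux}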
\tl_const:Nx \c_aw_the_alphabet_tl {abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ,.;?()!' \token_to_str:N :}
% this can be changed to an environment or renamed or whatever
\NewDocumentCommand {\avgwidthstart} {}
{
\aw_avg_width:w
}
\NewDocumentCommand {\avgwidthend}{}{}
% Here is the environment version, using just "text" as a name is probably a bad idea.
\NewEnviron{awtext}
{
\expandafter\avgwidthstart\BODY\avgwidthend
}
\makeatletter
\cs_new:Npn \aw_avg_width:w #1 \avgwidthend
{
% if first run, then generate variables to be used
\bool_if:NF \g_has_run_bool
{
\tl_map_inline:Nn \c_aw_the_alphabet_tl
{
\int_new:c {g_##1_int}
\fp_new:c {g_rat_##1_fp}
\fp_new:c {g_wid_##1_fp}
}
}
\tl_set:Nn \l_aw_text_tl {#1}
% this can be used rather than the preceding line to take capital
% letters into account, but is Slooooooow
%\tl_set:Nx \l_aw_text_tl {\tl_expandable_lowercase:n {#1}}
\int_set:Nn \l_aw_tot_int {\tl_count:N \l_aw_text_tl}
\tl_map_function:NN \c_aw_the_alphabet_tl \aw_get_counts:n
\deal_with_spaces:n {#1}
\tl_map_function:NN \c_aw_the_alphabet_tl \aw_calc_ratios:n
\tl_map_function:NN \c_aw_the_alphabet_tl \aw_calc_avg_width:n
\fp_gset_eq:NN \g_aw_avg_width_fp \l_tmpa_fp
\fp_zero:N \l_tmpa_fp
% the dimension \myalphabetwidth gives the width of the alphabet based on your character freq,
% can be accessed by \the\myalphabetwidth
\dim_gset:Nn \myalphabetwidth {\fp_to_dim:n {\fp_eval:n {61*\g_aw_avg_width_fp}}}
% the dimension \mytextwidth gives the recommended \textwidth based on 66 chars per line.
% can be accessed by \the\mytextwidth
\dim_gset:Nn \mytextwidth {\fp_to_dim:n {\fp_eval:n {66*\g_aw_avg_width_fp}}}
\protected@write\@mainaux{}{\mytextwidth=\the\mytextwidth}
\bool_gset_true:N \g_has_run_bool
% and lastly print the content
#1
}
\makeatother
\cs_new:Npn \aw_get_counts:n #1
{
% make a temporary token list from the document body
\tl_set_eq:NN \l_tmpb_tl \l_aw_text_tl
% remove all occurrences of the character
\tl_remove_all:Nn \l_tmpb_tl {#1}
% count the occurrences of that character in the current block
\int_set:Nn \l_tmpa_int {\int_eval:n{\l_aw_tot_int -\tl_count:N \l_tmpb_tl}}
% add to appropriate int the number of occurrences of that character in current block
\int_gadd:cn {g_#1_int} {\l_tmpa_int}
% add this to the total
\int_gadd:Nn \g_aw_tot_alph_int {\l_tmpa_int}
}
\cs_new:Npn \deal_with_spaces:n #1
{
\tl_set:Nn \l_tmpa_tl {#1}
% rescan body with spaces as characters
\tl_set_rescan:Nnn \l_tmpb_tl {\char_set_catcode_letter:N \ }{#1}
% find number of new characters introduced. add to number of spaces and alph chars
\int_set:Nn \l_tmpa_int {\tl_count:N \l_tmpb_tl -\tl_count:N \l_tmpa_tl}
\int_gadd:Nn \g_space_int {\l_tmpa_int}
\int_gadd:Nn \g_aw_tot_alph_int {\l_tmpa_int}
% since this comes after the rest of chars are dealt with, tot_alph is final total
\fp_gset:Nn \g_rat_space_fp {\g_space_int/\g_aw_tot_alph_int}
% get width of space and use it. obviously space is stretchable, so i'll assume
% that the expansions and contractions cancel one another over large text. is this
% a terrible assumption???
\hbox_set:Nn \l_tmpa_box {\ }
\fp_gset:Nn \g_wid_space_fp {\dim_to_fp:n {\box_wd:N \l_tmpa_box}}
\fp_add:Nn \l_tmpa_fp {\g_wid_space_fp*\g_rat_space_fp}
}
\cs_new:Npn \aw_calc_ratios:n #1
{
% divide number of occurrences of char by total alphabetic chars
\fp_gset:cn {g_rat_#1_fp}{{\int_use:c {g_#1_int}}/\g_aw_tot_alph_int}
}
\cs_new:Npn \aw_calc_avg_width:n #1
{
% only need to find char widths once
\bool_if:NF \g_has_run_bool
{
% find width of char box
\hbox_set:Nn \l_tmpa_box {#1}
\fp_gset:cn {g_wid_#1_fp}{\dim_to_fp:n {\box_wd:N \l_tmpa_box}}
}
% multiply it by char frequency and add to avg width
\fp_add:Nn \l_tmpa_fp {{\fp_use:c {g_wid_#1_fp}}*{\fp_use:c {g_rat_#1_fp}}}
}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% This part is just for fun. Delete it and the showtable command from the document if
% it isn't wanted
\tl_new:N \l_aw_tab_rows_tl
\seq_new:N \g_aw_the_alphabet_seq
\NewDocumentCommand {\showtable}{}
{
\clearpage
\aw_make_table:
}
\cs_generate_variant:Nn \seq_set_split:Nnn {NnV}
\cs_new:Npn \aw_make_table:
{
\thispagestyle{empty}
\seq_set_split:NnV \g_aw_the_alphabet_seq {} \c_aw_the_alphabet_tl
\seq_map_function:NN \g_aw_the_alphabet_seq \aw_generate_row:n
\begin{table}
\centering
\sisetup{round-mode = places,round-precision = 5,output-decimal-marker={,},table-format = 3.5}
\begin{tabular}{lll}
\toprule
{Average\,text\,width}&{Average\,character\,width}&{Average\,alphabet\,width}\\
\midrule
\the\mytextwidth&\fp_eval:n {round(\g_aw_avg_width_fp,5)}pt&\the\myalphabetwidth\\
\bottomrule
\end{tabular}\par
\end{table}
\vfil
\centering
\sisetup{round-mode = places,round-precision = 5,output-decimal-marker={,},table-format = 3.5}
\begin{longtable}{cS}
\toprule
{Letter}&{Actual}\\
\midrule
spaces&\fp_eval:n {\g_rat_space_fp*100}\%\\
\tl_use:N \l_aw_tab_rows_tl
\bottomrule
\end{longtable}\par
}
\cs_new:Npn \aw_generate_row:n #1
{
\tl_put_right:Nn \l_aw_tab_rows_tl {#1&}
\tl_put_right:Nx \l_aw_tab_rows_tl {\fp_eval:n {100*{\fp_use:c {g_rat_#1_fp}}}\%}
\tl_put_right:Nn \l_aw_tab_rows_tl {\\}
}
\ExplSyntaxOff
\begin{document}
\avgwidthstart
My audit group's Group Manager and his wife have an infant I can describe only as fierce.
Its expression is fierce; its demeanor is fierce; its gaze over bottle or pacifier or finger-fierce,
intimidating, aggressive. I have never heard it cry. When it feeds or sleeps, its pale face reddens,
which makes it look all the fiercer.
\avgwidthend
\avgwidthstart
On those workdays when our Group Manager, Mr. Yeagle, brought it in to the District office, hanging papoose-style in a nylon device on his back, the infant appeared to
be riding him as a mahout does an elephant. It hung there, radiating authority. Its back lay directly
against Mr. Yeagle's, its large head resting in the hollow of its father's neck and forcing our Group
Manager's head out and down into a posture of classic oppression. They made a creature with two faces,
one of which was calm and blandly adult and the other unformed and yet emphatically fierce. The infant
never wiggled or fussed in the device. Its gaze around the corridor at the rest of us gathered waiting
for the morning elevator was level and unblinking and (it seemed) almost accusing. The infant's face, as
I experienced it, was mostly eyes and lower lip, its nose a mere pinch, its forehead milky and domed,
its pale red hair wispy, no eyebrows or lashes or even eyelids I could see. I never saw it blink. Its
features seemed suggestions only. It had roughly as much face as a whale does. I did not like it at all.\par\noindent
http://harpers.org/media/pdf/dfw/HarpersMagazine-2008-02-0081893.pdf
\avgwidthend
\begin{awtext}
Here is some more text in an environment this time. This text is included in the calculation of the average width.
\end{awtext}
\showtable{}
\end{document}
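
The quantity that the code above accumulates can be written as a frequency-weighted mean. The symbols $n_c$, $w_c$, $N$, $\bar{w}$ and $W_{66}$ are notation introduced for this sketch, not names from the code: $c$ ranges over the characters in \c_aw_the_alphabet_tl plus the space, $n_c$ is the number of times $c$ was counted, and $w_c$ is the width of a box containing $c$ in the current font.

```latex
\[
  \bar{w} \;=\; \sum_{c} \frac{n_c}{N}\, w_c ,
  \qquad
  N \;=\; \sum_{c} n_c ,
  \qquad
  W_{66} \;=\; 66\,\bar{w}
\]
```

Here $\bar{w}$ corresponds to \g_aw_avg_width_fp and $W_{66}$ to \mytextwidth; \myalphabetwidth is $61\,\bar{w}$, since the alphabet token list holds 61 characters. For two characters that occur equally often, the sum reduces to the plain arithmetic mean of their two widths.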

Explanation: The gist I get from this "average width of a character" thing is the following.
A line of all m's will contain fewer characters than a line of all i's, since an m is wider than an i.
If a document consists of m's and i's in an equal ratio (50/50), then "the average character" has a width somewhere between the width of an m and that of an i. Specifically, the average character has width x=(wd(m)+wd(i))/2 and we should set our \textwidth to 66*x.
Extrapolating to an arbitrary document, we calculate the weighted average of the widths of the characters used according to their relative frequencies within the document, and multiply this by 66 (or use it in whatever way) to get the \textwidth that best accommodates the 66-characters-per-line criterion.
~4.69pt value. If I understand correctly, you would like to use this value to set a \textwidth of ~66 characters per line?
– Scott H.
Aug 22 '12 at 17:17
i's and j's then the average width of a character will be smaller than if your document consisted of all m's and w's. Maybe I'll edit the question to update and explain.
– Scott H.
Aug 22 '12 at 17:51
\fp_use:c { l_rat_#1_fp }. Also, use \fp_use:c not \fp_eval:c. In general, you should only make c variants of N-type arguments, not n-types, because those typically expect a braced argument rather than a single token.
– Bruno Le Floch
Aug 22 '12 at 18:56
\seq_mapthread_function:NNN is expandable, and since you can make the function it maps expandable too (with some work), you could move that to the table body. Actually, I'm now wondering why we don't have a \seq_mapthread_inline:NNn. By the way, any opinion on the name mapthread?
– Bruno Le Floch
Aug 22 '12 at 19:04
mapthread seems as good as any. Possibly, map_paired?
– Scott H.
Aug 22 '12 at 19:12
\begin{document} \avgwidthstart \input{Dolor} \avgwidthend \end{document}
\avgwidthstart \avgwidthend into the whole document and finally sum all the results of the declarations, so you might as well choose what counts in the average
– Aurelius
Aug 22 '12 at 19:21
\input{Dolor} case that FormlessCloud cites is why I advised to do it in LuaTeX: then it would be possible to really count characters which are typeset.
– Bruno Le Floch
Aug 23 '12 at 11:14
\mytextwidth has not yet been defined! Even mid-document, you would first need to make sure that all of the commands collecting text to calculate the average width have been processed before trying to explicitly set it. There are likely other ways but I don't think you can avoid two compilations. Either (1) compile once, determine the width and then explicitly set it for the next compile, or (2) write mytextwidth to the aux file or something at the end of the doc and then process that on the second compile.
– Scott H.
Aug 24 '12 at 22:44
\input{jobname.aux} where jobname is the filename of your document, and (2) add \protected@write\@mainaux{}{\textwidth=\the\mytextwidth} after the line \dim_gset:Nn \mytextwidth in the definition of \aw_avg_width. You'll need to put a \makeatletter \makeatother pair around the definition of \aw_avg_width. Actually, I'll just do that.
– Scott H.
Aug 25 '12 at 01:43
\mytextwidth=\the\mytextwidth, and move the \input command to immediately after the *_new commands at the start. Also, you'll need to put \makeatletter and \makeatother around the definition of \aw_avg_width.
– Scott H.
Aug 25 '12 at 17:03
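
The two-compilation idea described in these comments could be sketched as follows. This is only a minimal sketch under stated assumptions: it writes to a separate file rather than to the .aux file used in the answer, and \measuredtextwidth, \awdfile and the .awd extension are hypothetical names chosen for illustration. The fixed 310pt stands in for whatever value the averaging code computes (for example \mytextwidth).

```latex
\documentclass{article}
% \measuredtextwidth and the .awd extension are hypothetical names for this sketch.
\newlength{\measuredtextwidth}
% Second and later runs: apply the width saved by the previous run.
% On the first run the file does not exist and nothing happens.
\InputIfFileExists{\jobname.awd}{}{}
\newwrite\awdfile
\AtEndDocument{%
  \immediate\openout\awdfile=\jobname.awd
  % Writes a line of the form \setlength{\textwidth}{<value>} to the file.
  \immediate\write\awdfile{%
    \string\setlength{\string\textwidth}{\the\measuredtextwidth}}%
  \immediate\closeout\awdfile
}
\begin{document}
% Stand-in for the value computed from the text.
\setlength{\measuredtextwidth}{310pt}
Some text. The current text width is \the\textwidth.
\end{document}
```

The first compilation only records the width; the second one picks it up in the preamble, before the page layout is fixed at \begin{document}.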
Total\,characters\,=\,\fp_eval:n {\g_aw_tot_alph_int} ?
– Aurelius
Aug 28 '12 at 10:00
\tl_const:Nn \c_aw_the_alphabet_tl {abcdefghijklmnopqrstuvwxyz} the symbols .,!?:; like this: \tl_const:Nn \c_aw_the_alphabet_tl {abcdefghijklmnopqrstuvwxyz.,!?:;} the final count is more precise. What do you think about this? Is there a more correct way to count those symbols? For example, in a text of 5570 characters, LaTeX counts 5301 characters; for me the difference comes from those symbols, and also from the capital letters.
– Aurelius
Aug 28 '12 at 16:42
awtext environment so that it can also be used without the \avgwidthstart\avgwidthend pair, and so remove that pair of commands? Thanks.
– Aurelius
Sep 16 '12 at 20:07
mapthread_function to map_function and removing the theor_rats sequence. I'm not at a computer right now however. I'm not sure I understand why you want to remove the start and end macros from the environment?
– Scott H.
Sep 17 '12 at 03:00
start/end pair and so, since they are no longer necessary, delete those commands. (I want to remove the theoretical sequence because I also need to write in non-English languages.)
– Aurelius
Sep 17 '12 at 10:05
\fp_eval:n {\g_rat_space_fp*100} if I remove it I don't have any problem... Also, I added \usepackage{longtable} to compile your answer. Where can I read about this programming language? Only in the xparse PDF?
– Aurelius
Sep 18 '12 at 10:51
expl3 here: http://mirror.hmc.edu/ctan/macros/latex/contrib/l3kernel/interface3.pdf
– Scott H.
Sep 18 '12 at 18:17