With Plain TeX (and eTeX for convenience):
Initial approach: limited by TeX arithmetic to N = 159486 as maximal starting point.
Enlarged approach: does big-integer arithmetic and can handle, say, N = 1,000,000.
Technical variant: uses font dimensions to store all the Collatz lengths for starting points from 1 up to N. The maximal N is 5,000,000 (on my TeX Live I could go up to a bit less than 8,000,000; going beyond requires enlarging TeX's memory). At the end of the day, \number\fontdimen<number>\cz holds the Collatz length for every starting point <number> from 1 to N = 5,000,000.
Finally, a somewhat simpler algorithm brings close to a 2x speed improvement.
Edit: my initial post did not execute at all the part of the code that stores known results in control sequences, so it was far slower. When I corrected that, it quickly hit TeX's memory limit, because the expansion was enclosed in \begingroup and \endgroup. But after all we are storing absolute data, so there is no need for a group. I also decided to store only data with indices up to the maximal N, again to escape memory limitations.
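As a minimal sketch of the storage idea just described (not the code of this answer itself): each known length is kept in a control sequence named collatz<n>, defined at the outer level rather than inside a group, and looked up with \ifcsname. The helper names \Remember and \Recall are made up for this illustration only.

```tex
% pdftex (or etex), since \ifcsname is an eTeX primitive
% \Remember and \Recall are names made up for this sketch
\def\Remember#1#2{\expandafter\edef\csname collatz#1\endcsname{#2}}
\def\Recall#1{\ifcsname collatz#1\endcsname
                \csname collatz#1\endcsname
              \else 0\fi}
\Remember{1}{1}% no surrounding group: the definitions are meant to persist
\Remember{2}{2}%
\message{\Recall{2}, \Recall{3}}% a stored length, then 0 for "not stored yet"
\bye
```

Unlike a bare \csname...\endcsname, the \ifcsname test does not define the control sequence as \relax when it is missing, so a lookup that fails costs no extra storage.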
% pdftex (or etex)
\newcount\inputNmax
\newcount\intN
\newcount\intNa
\newcount\intNtop
\newcount\intL
\newcount\intLtop
\def\CollOne{%
\advance\intNa 1
\ifnum\intNa > \inputNmax
\CollDone
\else
\intL 0 % will store number of steps starting at Na
\intN = \intNa
\expandafter\CollTwo
\fi
}
% store in
\def\CollTwo{%
\ifcsname collatz\the\intN\endcsname
\expandafter\CollThreeA
\else
\expandafter\CollThreeB
\fi
}
\def\CollThreeA{%
\advance\intL\csname collatz\the\intN\endcsname\relax
\expandafter\edef\csname collatz\the\intNa\endcsname{\the\intL}%
\ifnum\intL > \intLtop
\intLtop = \intL
\intNtop = \intNa
\fi
\intN = \intNa
\CollUpdate
}
\def\CollUpdate{%
\advance\intL -1
\ifodd\intN
\multiply\intN 3
\advance\intN 1
\else
\divide\intN 2
\fi
\let\next\CollOne
\ifnum\intN>\inputNmax
\else
\ifcsname collatz\the\intN\endcsname
\else
\expandafter\edef\csname collatz\the\intN\endcsname{\the\intL}%
\let\next\CollUpdate
\fi
\fi
\next
%% this variant seems to impact a bit negatively execution time
% \ifcsname collatz\the\intN\endcsname
% \expandafter\CollOne
% \else
% \ifnum\intN>\inputNmax
% \else
% \expandafter\edef\csname collatz\the\intN\endcsname{\the\intL}%
% \fi
% \expandafter\CollUpdate
% \fi
}
\def\CollThreeB{%
\advance\intL 1
\ifodd\intN
\multiply\intN 3
\advance\intN 1
\else
\divide\intN 2
\fi
\CollTwo
}
\def\CollDone{%
From 1 to \the\inputNmax, the longest sequence with smallest starting
point was observed to start at \the\intNtop, and contained
\the\intLtop\relax\
elements.\par
}
\def\CollMax #1{% #1 integer at least 1
%\begingroup
\inputNmax=#1\relax
\intNa = 1
\intNtop = 1
\intLtop = 1
% I would prefer counting steps to reach 1, so here 0
% but it seems the question asks for number of elements, so here 1
\expandafter\def\csname collatz1\endcsname{1}%
\CollOne
%\endgroup
}
\hsize10cm
\CollMax {10}
\CollMax {100}
\CollMax {1000}
\CollMax {10000}
\CollMax {100000}
\bye
Output:

Execution time:
$ time pdftex -interaction batchmode pcollatz.tex
This is pdfTeX, Version 3.14159265-2.6-1.40.18 (TeX Live 2017) (preloaded format=pdftex)
restricted \write18 enabled.
entering extended mode
real 0m6.667s
user 0m6.636s
sys 0m0.027s
There is an arithmetic overflow for 1000000.
I will add a variant using xint at some later stage.
EXTENSION TO BIG INTEGERS
Actually, 159487 is the smallest starting point leading to a TeX arithmetic overflow (its 56th iterate is 2265333694 >= 2**31).
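The overflow can be located without triggering it, by testing an odd iterate against the largest n for which 3n+1 still fits in a TeX integer. The following is only a sketch under that assumption; the names \czN, \czK and \czSafe are made up for it and appear nowhere else in this answer.

```tex
% etex or pdftex; \czN, \czK and \czSafe are names made up for this sketch
\newcount\czN  % current iterate
\newcount\czK  % number of steps done so far
\def\czSafe{715827882 }% largest n with 3n+1 <= 2147483647 = 2**31-1
\czN=159487
\czK=0
\loop
  \ifodd\czN
    \ifnum\czN>\czSafe
      \message{after \the\czK\space steps the iterate \the\czN\space is odd
               and 3n+1 would exceed 2**31-1}%
      \czN=1 % force the loop to stop
    \else
      \multiply\czN 3 \advance\czN 1 \advance\czK 1
    \fi
  \else
    \divide\czN 2 \advance\czK 1
  \fi
\ifnum\czN>1
\repeat
\bye
```

If the figures quoted above are right, the message should show the odd iterate whose successor 3n+1 is the value 2265333694.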
After I modified the first version of the code to use the xintcore macros for big arithmetic, another problem appeared: the "TeX pool" fills up when the starting point reaches N=317893. I did run it with 317892, obtaining
\CollMax {317892}
From 1 to 317892, the longest sequence with smallest starting point was
observed to start at 230631, and contained 443 elements.
Then I modified the code so that it stores the sequence lengths in memory only for the first 100000 integers, thus overcoming that "TeX pool" limitation. Big arithmetic is still handled via the xintcore macros.
% pdftex (or etex)
\input xintcore.sty % to handle big integers.
% there is also bnumexpr but it is LaTeX interface only
% or xint, xintexpr which are more extensive than xintcore
\newcount\inputNmax
\newcount\intN
\newcount\intNa
\newcount\intNtop
\newcount\intL
\newcount\intLtop
\def\CollLoop{%
\advance\intNa 1
% \immediate\write-1{\the\intNa}%
\ifnum\intNa > \inputNmax
\CollDone
\else
\intL 0 % will store number of steps starting at Na
\intN = \intNa
\expandafter\CollTwo
\fi
}
\def\CollTwo{%
\ifcsname collatz\the\intN\endcsname
\expandafter\CollThreeA
\else
\expandafter\CollThreeB
\fi
}
% \CollThreeA will be either \CollThreeAwithUpdate or \CollThreeAwithNoUpdate
\def\CollThreeAwithUpdate{%
\advance\intL\csname collatz\the\intN\endcsname\relax
\expandafter\edef\csname collatz\the\intNa\endcsname{\the\intL}%
\ifnum\intL > \intLtop
\intLtop = \intL
\intNtop = \intNa
\fi
\intN = \intNa
\CollUpdate
}
\def\CollUpdate{%
\advance\intL -1
\ifodd\intN
\multiply\intN 3
\advance\intN 1
\else
\divide\intN 2
\fi
\let\next\CollLoop
\ifnum\intN>\inputNmax
\else
\ifcsname collatz\the\intN\endcsname
\else
\expandafter\edef\csname collatz\the\intN\endcsname{\the\intL}%
\let\next\CollUpdate
\fi
\fi
\next
}
\def\CollThreeAwithNoUpdate{%
\advance\intL\csname collatz\the\intN\endcsname\relax
\ifnum\intL > \intLtop
\intLtop = \intL
\intNtop = \intNa
\fi
\CollLoop
}
\def\CollThreeB{%
\let\next\CollTwo
\ifodd\intN
\ifnum\intN>\maxdimen
% notice that necessarily this first happens with previous execution
% had done (3x+1)/2, so the real antecedent was > "7FFFFFFF
% and would have created arithmetic overflow if we had done
% x->3x+1->(3x+1)/2
\edef\bigintN{\the\intN}%
\let\next\CollThreeBig
\else
\advance\intL 2
\divide\intN 2
\multiply\intN 3
\advance\intN 2
\fi
\else
\advance\intL 1
\divide\intN 2
\fi
\next
}%
% \def\error{\immediate\write-1{\the\intNa, \the\intL}\csname end\endcsname}
\def\CollThreeBig{%
% 159487 is the smallest starting integer which triggers this, as its
% 56th iterate 2265333694 exceeds 2**31, and the 57th is thus > \maxdimen
% \error
\advance\intL 1
% \xintLastItem does no expansion ...
\ifodd\expandafter\xintLastItem\expandafter{\bigintN}
\advance\intL 1
\edef\bigintN{\xintHalf{\xintiiMul{\bigintN}3}}% Half truncates
% possibly faster to use \xintDouble and an addition, not tested
\else
\edef\bigintN{\xintHalf{\bigintN}}%
\fi
% \xintLength does no expansion ...
\ifnum\expandafter\xintLength\expandafter{\bigintN}>9
\expandafter\CollThreeBig
\else
\intN = \bigintN\relax
\expandafter\CollThreeB
\fi
}%
\def\CollReport{%
From 1 to \the\inputNmax, the longest sequence with smallest starting
point was observed to start at \the\intNtop, and contained
\the\intLtop\relax\
elements.\par
}
\let\CollDone\CollReport
\def\CollMaxInitial {%
\let\CollThreeA\CollThreeAwithUpdate % faster but uses macro storage
\intNa = 1
\intNtop = 1
% I would prefer counting steps to reach 1, so here 0
% but it seems the question asks for number of elements, so 1
% (and not 4 although 1->4->2->1, as I consider that 1 is sequence in itself)
\expandafter\def\csname collatz1\endcsname{1}%
\intLtop = 1
\CollLoop
}
\def\CollMax #1{% #1 integer at least 1
\ifnum#1>100000
\inputNmax 100000
\let\CollDone\empty
\CollMaxInitial
\let\CollDone\CollReport
\let\CollThreeA\CollThreeAwithNoUpdate
\inputNmax=#1\relax
\intNa = 100000
\CollLoop
\else
\inputNmax = #1\relax
\CollMaxInitial
\fi
}
\hsize10cm
% \CollMax {10}
% \CollMax {100}
% \CollMax {1000}
% \CollMax {10000}
% \CollMax {100000}
% \CollMax {200000}
% From 1 to 200000, the longest sequence with smallest starting
% point was observed to start at 156159, and contained 383 elements.
\CollMax {1000000}
% From 1 to 1000000, the longest sequence with smallest start-
% ing point was observed to start at 837799, and contained 525
% elements.
\bye
This gives

and execution time is
$ time pdftex -interaction batchmode pcollatz-big.tex
This is pdfTeX, Version 3.14159265-2.6-1.40.18 (TeX Live 2017) (preloaded format=pdftex)
restricted \write18 enabled.
entering extended mode
real 0m40.345s
user 0m40.247s
sys 0m0.087s
edit: This is on a computer which is usually about 15% faster than the one used for testing the first version of the code with 100000. Trying this code now with N=1000000 on the slower computer, I observe about 1m execution time (thus +50%...). For reasons I do not know, pdftex becomes significantly slower on my laptop when TeX memory is used a lot. I have observed this in the past when testing xint on computations with thousands of digits.
USE OF \fontdimen PARAMETERS
Here we go
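Before the full code, a minimal sketch of the storage trick on its own, assuming only standard TeX behaviour: the \fontdimen parameters of a freshly loaded font can be extended and then used as a global integer array, with values held in sp units. The names \store and \czTmp are made up for this sketch.

```tex
% pdftex (or etex); \store and \czTmp are names made up for this sketch
\font\store=cmr10 at 1pt    % a fresh font instance, used only as storage
\fontdimen 1000\store=0sp   % done right after loading: grows the array to 1000 entries
\fontdimen 42\store=17sp    % write entry 42
\newcount\czTmp
\czTmp=\fontdimen 42\store  % read it back as an integer (number of sp)
\advance\czTmp 1
\message{\the\czTmp, \number\fontdimen 43\store}% shows 18, 0
\bye
```

TeX only lets the number of parameters grow for the most recently loaded font, which is why the enlarging assignment comes immediately after \font. Entries created that way start at zero, and \fontdimen assignments are always global.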
% pdftex (or etex)
\input xintcore.sty % to handle big integers.
% there is also bnumexpr but it is LaTeX interface only
% or xint, xintexpr which are more extensive than xintcore
% use fontdimen parameters to have an array where to store
% "Collatz lengths". At the end of the day
% \number\fontdimen<number>\cz gives the length of the
% sequence starting at <number> (for all numbers up to 5,000,000) and
% reaching 1 (included).
\newcount\inputNmax
\newcount\intN
\newcount\intNa
\newcount\intNtop
\newcount\intL
\newcount\intLtop
\newcount\czsize
\czsize 5000000
\font\cz=cmr10 at 1pt
\fontdimen \czsize\cz = 0sp % make room ...
% vz texmf.cnf
% Words of font info for TeX (total size of all TFM files, approximately).
% Must be >= 20000 and <= 147483647 (without tex.ch changes).
% font_mem_size = 8000000
% make sure array entries are zero (only \fontdimen2 to 7 are populated for cmr10)
\intN 1
\loop
\fontdimen\intN\cz = 0sp
\advance\intN 1
\ifnum\intN < 8
\repeat
% do I need to do that for all, aren't they zero except the first few ones?
% no, it's ok (paranoid check here, done once)
% \intN 1
% \loop
% \ifnum\fontdimen\intN\cz>0 \error\fi
% \ifnum\intN < \czsize
% \advance\intN 1
% \repeat
\def\CollLoop{%
\advance\intNa 1
\ifnum\intNa > \inputNmax
\CollDone
\else
\intL 0 % will store number of steps starting at Na
\intN = \intNa
\expandafter\CollTwo
\fi
}
\def\CollTwo{%
\let\next\CollThreeB
\unless\ifnum\intN>\czsize
\ifnum\fontdimen\intN\cz>0
\let\next\CollThreeA
\fi
\fi
\next
}
% \CollThreeA will be either \CollThreeAwithUpdate or \CollThreeAwithNoUpdate
% but we are going to use it always with update...
\def\CollThreeAwithUpdate{%
\advance\intL\fontdimen\intN\cz
\fontdimen\intNa\cz=\intL sp
\ifnum\intL > \intLtop
\intLtop = \intL
\intNtop = \intNa
\fi
\intN = \intNa
\CollUpdate
}
\def\CollUpdate{%
\advance\intL -1
\ifodd\intN
\multiply\intN 3
\advance\intN 1
\else
\divide\intN 2
\fi
\let\next\CollLoop
\ifnum\intN>\inputNmax % always at most \czsize in this macro
\else
\ifnum\fontdimen\intN\cz=0
\fontdimen\intN\cz=\intL sp
\let\next\CollUpdate
\fi
\fi
\next
}
\def\CollThreeAwithNoUpdate{%
\advance\intL\fontdimen\intN\cz
\ifnum\intL > \intLtop
\intLtop = \intL
\intNtop = \intNa
\fi
\CollLoop
}
\def\CollThreeB{%
\let\next\CollTwo
\ifodd\intN
\ifnum\intN>\maxdimen
% notice that necessarily this first happens with previous execution
% had done (3x+1)/2, so the real antecedent was > "7FFFFFFF
% and would have created arithmetic overflow if we had done
% x->3x+1->(3x+1)/2
\edef\bigintN{\the\intN}%
\let\next\CollThreeBig
\else
\advance\intL 2
\divide\intN 2
\multiply\intN 3
\advance\intN 2
\fi
\else
\advance\intL 1
\divide\intN 2
\fi
\next
}%
% \def\error{\immediate\write-1{\the\intNa, \the\intL}\csname end\endcsname}
\def\CollThreeBig{%
% 159487 is the smallest starting integer which triggers this, as its
% 56th iterate 2265333694 exceeds 2**31, and the 57th is thus > \maxdimen
% \error
\advance\intL 1
% \xintLastItem does no expansion ...
\ifodd\expandafter\xintLastItem\expandafter{\bigintN}
\advance\intL 1
\edef\bigintN{\xintHalf{\xintiiMul{\bigintN}3}}% Half truncates
% possibly faster to use \xintDouble and an addition, not tested
\else
\edef\bigintN{\xintHalf{\bigintN}}%
\fi
% \xintLength does no expansion ...
\ifnum\expandafter\xintLength\expandafter{\bigintN}>9
\expandafter\CollThreeBig
\else
\intN = \bigintN\relax
\expandafter\CollThreeB
\fi
}%
\def\CollReport{%
From 1 to \the\inputNmax, the longest sequence with smallest starting
point was observed to start at \the\intNtop, and contained
\the\intLtop\relax\
elements.\par
}
\let\CollDone\CollReport
\def\CollMaxInitial {%
\let\CollThreeA\CollThreeAwithUpdate % limited by font array storage
\intNa = 1
\intNtop = 1
% I would prefer counting steps to reach 1, so here 0
% but it seems the question asks for number of elements, so 1
% (and not 4 although 1->4->2->1, as I consider that 1 is sequence in itself)
\fontdimen1 \cz= 1sp
\intLtop = 1
\CollLoop
}
\def\CollMax #1{% #1 integer at least 1
\ifnum#1>\czsize
\inputNmax \czsize
\let\CollDone\empty
\CollMaxInitial
\let\CollDone\CollReport
\let\CollThreeA\CollThreeAwithNoUpdate
\inputNmax=#1\relax
\intNa = \czsize
\CollLoop
\else
\inputNmax = #1\relax
\CollMaxInitial
\fi
}
\hsize10cm
% check it does give same results as earlier...
% \CollMax {10}
% \CollMax {100}
% \CollMax {1000}
% \CollMax {10000}
%\CollMax {100000}
%\CollMax {1000000}
\CollMax {5000000}
% From 1 to 5000000, the longest sequence with smallest starting point was
% observed to start at 2929311, and contained 550 elements.
\bye
produces:

For 1000000 it is about 4x-6x faster than the method with csname (hard to tell exactly, because it depends on which computer I test with). Perhaps I can improve it further; I am not very used to that technique.
Here is with my laptop (the "slow" computer) for N=1,000,000:
$ time pdftex --interaction=batchmode pcollatz-array.tex
This is pdfTeX, Version 3.14159265-2.6-1.40.18 (TeX Live 2017) (preloaded format=pdftex)
restricted \write18 enabled.
entering extended mode
real 0m10.408s
user 0m10.313s
sys 0m0.041s
For 5,000,000 it takes about five times as much as for 1,000,000.
An ultimate rewrite brings a 45% speed improvement.
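As far as the code below can be read, the gain comes from two shortcuts that fill an entry from an already stored smaller one: for even n = 2m the length is one more than that of m, and for n = 3j+2 the length is two fewer than that of the odd number 2j+1 (because 2j+1 -> 6j+4 -> 3j+2). The following brute-force check of both relations on small values is only a sketch; \CollLen, \czN, \czLen and \czLenA are names made up for it.

```tex
% etex or pdftex; small values only, so no overflow guard
% \CollLen, \czN, \czLen and \czLenA are names made up for this sketch
\newcount\czN \newcount\czLen \newcount\czLenA
\def\CollLen#1{% sets \czLen to the number of elements from #1 down to 1
  \czN=#1\relax \czLen=1
  \loop\ifnum\czN>1
    \ifodd\czN \multiply\czN 3 \advance\czN 1 \else \divide\czN 2 \fi
    \advance\czLen 1
  \repeat}
\CollLen{7}\czLenA=\czLen
% even n=2m with m=7: one element more than m
\CollLen{14}\message{len(14)-len(7)=\the\numexpr\czLen-\czLenA\relax}%
% n=3j+2 with j=3: two elements fewer than 2j+1=7
\CollLen{11}\message{len(11)-len(7)=\the\numexpr\czLen-\czLenA\relax}%
\bye
```

The two messages show the differences 1 and -2. The same relations suggest why the rewrite skips the check for a new maximum on the shortcut entries: a number of the form 3j+2, or an even number of the form 3k+1, has a smaller odd predecessor with a longer sequence.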
\input xintcore.sty
\newcount\inputNmax
\newcount\inputNstop
\newcount\intN
\newcount\intNa
\newcount\intNb
\newcount\intNtop
\newcount\intL
\newcount\intLtop
\newcount\czsize
\czsize 5000000
\font\cz=cmr10 at 1pt
\fontdimen \czsize\cz = 0sp
\intN 1
\loop
\fontdimen\intN\cz = 0sp
\advance\intN 1
\ifnum\intN < 8
\repeat
\catcode`@ 11
\long\def\@gobble#1{}
\edef\sp{\expandafter\@gobble\string\sp}% (no space needed?)
\def\CollLoopA{%
\advance\intNa 1
\intNb \intNa
\divide\intNb 2
\intL\fontdimen\intNb\cz
\advance\intL 1
\fontdimen\intNa\cz=\intL\sp
\advance\intNa 1
\intNb \intNa
\divide\intNb 3
\multiply\intNb 2
\advance\intNb 1
\intL\fontdimen\intNb\cz
\advance\intL -2
\fontdimen\intNa\cz=\intL\sp
\advance\intNa 1
\ifnum\intNa > \inputNstop
\CollDone
\else
\let\CollLoopBack\CollLoopB
\intL 0
\intN = \intNa
\expandafter\CollThreeB
\fi
}
\def\CollLoopB{%
\advance\intNa 1
\ifnum\intNa > \inputNstop
\CollDone
\else
\let\CollLoopBack\CollLoopC
\intL 0
\intN = \intNa
\expandafter\CollThreeB
\fi
}
\def\CollLoopC{%
\advance\intNa 1
\intNb \intNa
\divide\intNb 2
\intL\fontdimen\intNb\cz
\advance\intL 1
\fontdimen\intNa\cz=\intL\sp
\advance\intNa 1
\ifnum\intNa > \inputNstop
\CollDone
\else
\let\CollLoopBack\CollLoopA
\intL 0
\intN = \intNa
\expandafter\CollThreeB
\fi
}
\def\CollTwo{%
\ifnum\intN<\intNa
\advance\intL\fontdimen\intN\cz
\fontdimen\intNa\cz=\intL\sp
\CollCheckTop
\expandafter\CollLoopBack
\else
\expandafter\CollThreeB
\fi
}
\def\CollCheckTopyes {%
\ifnum\intL > \intLtop
\intLtop = \intL
\intNtop = \intNa
\fi
}%
\let\CollCheckTopno\empty
\let\CollCheckTop\CollCheckTopyes
\def\CollThreeB{%
\let\next\CollTwo
\ifodd\intN
\ifnum\intN>\maxdimen
\edef\bigintN{\the\intN}%
\let\next\CollThreeBig
\else
\advance\intL 2
\divide\intN 2
\multiply\intN 3
\advance\intN 2
\fi
\else
\advance\intL 1
\divide\intN 2
\fi
\next
}%
\def\CollThreeBig{%
\advance\intL 1
% \xintLastItem does no expansion ...
\ifodd\expandafter\xintLastItem\expandafter{\bigintN}
\advance\intL 1
\edef\bigintN{\xintHalf{\xintiiMul{\bigintN}3}}% Half truncates
% possibly faster to use \xintDouble and an addition, not tested
\else
\edef\bigintN{\xintHalf{\bigintN}}%
\fi
% \xintLength does no expansion ...
\ifnum\expandafter\xintLength\expandafter{\bigintN}>9
\expandafter\CollThreeBig
\else
\intN = \bigintN\relax
\expandafter\CollThreeB
\fi
}%
\def\CollReport{%
From 1 to \the\inputNmax, the longest sequence with smallest starting
point was observed to start at \the\intNtop, and contained
\the\intLtop\relax\
elements.\par
}
\let\CollDone\empty
\def\CollMax #1{%
\inputNmax #1\relax
\inputNstop \inputNmax
\divide\inputNstop 2
\intNa = 1
\intNtop = 1
\fontdimen1 \cz= 1sp
\intLtop = 1
\let\CollCheckTop\CollCheckTopno
\CollLoopC
\inputNstop \inputNmax
\let\CollCheckTop\CollCheckTopyes
\ifcase \numexpr3+\intNa - 6*(\intNa/6)\relax
\advance\intNa-2 \let\next\CollLoopC\or
\error\or
\error\or
\advance\intNa-3 \let\next\CollLoopA\or
\advance\intNa-1 \let\next\CollLoopB\else
\error
\fi
\next
\CollReport
}
\catcode`@ 12
\hsize10cm
\CollMax {1000000}
\bye
With time pdftex --interaction=batchmode pcollatz-arrayIII.tex I get a user time of 0m5.780s compared to the former 0m10.313s. On @HenriMenke's machine, execution time of this pure eTeX approach should be around 2s (for N=1000000).
Python. But still, if it's not difficult for you, please bring your pythontex decision, because it is also within the scope of the section – sergiokapone Jun 02 '17 at 10:46