I have a series: $$f(n) = 0^0 + 1^1 + 2^2 + \cdots + n^n.$$ I want to calculate $f(n) \pmod m$, where $n \le 10^9$ and $m \le 10^3$. I have tried the approach from this accepted answer, but its complexity is not acceptable. I have used this approach for calculating the modular exponentiation. However, I am not able to optimize it further. Kindly help. Thanks in advance.
2 Answers
For an efficient computation, we have to use the well-known fast exponentiation by repeated squaring, based on the simple observations $x^{2k}=(x^k)^2$ and $x^{2k+1}=x^{2k}\cdot x$. There's no need for a recursive function, because it's easy to transform it into an iteration through the binary expansion of the exponent. But that would still give an algorithm of complexity $O(n\log n),$ and that's quite forbidding already for $n=10^8,$ let alone $n=10^9.$ Savings are possible since we need the results only modulo $m$, and that's assumed to be much smaller than $n$.
Obviously, large exponents aren't the problem, so I consider reduction of the exponents modulo $\varphi(m)$ a blind alley: we're summing over all bases up to $n$, many of them having common divisors with $m$ (60% in the obviously allowed case $m=1000$).
But certainly $l\equiv j\pmod m$ implies $l^l\equiv j^l\pmod m$, so we can group our summands accordingly. Let $n=mk+r$ with $0\le r\le m-1.$ Then,
$$f(n) = 1+\sum^n_{l=1}l^l=1+\sum^{m-1}_{j=1}j^j\sum^{k-1}_{i=0}j^{mi}+\sum^{r}_{j=1}j^j\cdot j^{mk}\pmod m.$$ With an auxiliary function
$$g(x,k)=\sum^{k-1}_{i=0}x^i,$$ we can write this
$$f(n) = 1+\sum^{r}_{j=1}j^j\,g(j^m, k+1)+\sum^{m-1}_{j=r+1}j^j\,g(j^m, k)\pmod m.$$ Unfortunately, using the fast formula $$g(x,k)=\frac{x^k-1}{x-1}$$ is not an option, as division modulo a composite $m$ can get very tricky.
But there's an alternative: Obviously, we have $g(x,2k)=(1+x^k)\,g(x,k)$ and $g(x,2k+1)=1+x\,g(x,2k)$, and we don't even have to calculate $x^k,$ since $1+x^k=(x-1)\,g(x,k)+2,$ i.e. $g(x,2k)=((x-1)\,g(x,k)+2)\,g(x,k)$.
As one can see easily, the resulting algorithm has complexity $O(m\log n)$, and that's reasonably fast (milliseconds) for the values of $n,m$ in the question.
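As a small sanity check of this decomposition (the numbers are mine, not part of the original answer): for $n=7$ and $m=3$ we have $k=2$, $r=1$, and $$f(7)\equiv 1+1^1\,g(1^3,3)+2^2\,g(2^3,2)=1+3+4\cdot 9\equiv 1\pmod 3,$$ in agreement with the direct sum $0^0+1^1+\cdots+7^7\equiv 1+1+1+0+1+2+0+1\equiv 1\pmod 3.$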
The crucial functions are implemented like this (in Java):
private static long f(long n, int mod) {
    long s = 1;                                        // the term 0^0 = 1
    long k = n / mod;
    long r = n % mod;
    for (int j = 1; j < mod; j++) {
        long x = Power.pow(j, mod, mod);               // x = j^m mod m
        // residues j <= r occur k+1 times up to n, the others k times
        s = (s + Power.pow(j, j, mod) * g(x, k + ((j <= r) ? 1 : 0), mod)) % mod;
    }
    return s;
}

private static long g(long x, long k, int mod) {       // g(x,k) = 1 + x + ... + x^(k-1) mod m
    if (k == 0) return 0;
    long mask = Long.highestOneBit(k);
    long g = 1;                                         // g(x,1), for the highest bit of k
    while ((mask >>= 1) > 0) {                          // process the remaining bits of k
        g = ((x - 1) * g % mod + 2) * g % mod;          // g(x,2k) = ((x-1) g(x,k) + 2) g(x,k)
        if ((k & mask) != 0) {
            g = (g * x + 1) % mod;                      // g(x,2k+1) = 1 + x g(x,2k)
        }
    }
    return g;
}
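The code above calls a Power.pow(base, exponent, mod) helper that isn't shown; it is presumably the modular exponentiation routine the question refers to. A minimal sketch of such a helper, under that assumption (the signature is chosen to match the calls above):

class Power {
    // Iterative square-and-multiply modular exponentiation.
    // Assumes exponent >= 0 and mod >= 1; for mod <= 1000 all products fit in a long.
    static long pow(long base, long exponent, int mod) {
        long result = 1 % mod;                   // 1 % mod also handles mod == 1
        base %= mod;
        while (exponent > 0) {
            if ((exponent & 1) == 1) {
                result = result * base % mod;    // multiply in the current bit
            }
            base = base * base % mod;            // square for the next bit
            exponent >>= 1;
        }
        return result;
    }
}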
@Professor Vector Wrong answer for input n = 2 and m = 29. Expected output = 6. My output = 22. – Brij Raj Kishore Jul 25 '17 at 08:01
@Brij Raj Kishore I'm sorry, I've only checked cases where $n>m$, so I didn't notice that $g$ should return $0$ for $k==0$, and not $1$ (that happens because the algorithm starts with the highest bit of $k$, that's always set, except for $k==0$). Should be fixed, now. – Jul 25 '17 at 08:27
$$\sum_{k=1}^n k^k\equiv \sum_{k=1}^n(k\bmod m)^k\pmod m.$$
You can precompute $k^k\bmod m$ for all $k\in[0,m-1]$ (this takes $O(m\log m)$ modular multiplications, by repeated squaring). For the next terms, $(k+pm)^{k+pm}\equiv k^k(k^m)^p$, so it suffices to also keep the values of $k^m$.
The total cost for the summation will be
$$\color{green}{O(m\log m+n)}.$$
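A minimal sketch of this direct summation (my own illustration, in Java to match the code in the other answer; the names sumDirect and modPow are mine, and $0^0$ is taken as $0$ as in this answer):

static long sumDirect(long n, int m) {
    long[] cur = new long[m];   // cur[j] = j^k mod m for the next index k with k mod m == j
    long[] jm  = new long[m];   // jm[j]  = j^m mod m, the factor advancing the exponent by m
    for (int j = 1; j < m; j++) {
        cur[j] = modPow(j, j, m);
        jm[j]  = modPow(j, m, m);
    }
    // cur[0] stays 0: multiples of m contribute 0 (and 0^0 = 0 by this answer's convention)
    long s = 0;
    for (long k = 1; k <= n; k++) {
        int j = (int) (k % m);
        s = (s + cur[j]) % m;
        cur[j] = cur[j] * jm[j] % m;   // j^k -> j^(k+m), ready for the next term of this column
    }
    return s;
}

static long modPow(long b, long e, int m) {   // standard square-and-multiply
    long r = 1 % m;
    for (b %= m; e > 0; e >>= 1, b = b * b % m) {
        if ((e & 1) == 1) r = r * b % m;
    }
    return r;
}

For $n=10^9$ the second loop still performs about $10^9$ modular multiplications; removing that dependency on $n$ is what the block argument below achieves.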
For example, with $m=5$, and assuming $0^0=0$,
$$\begin{array}{c|ccccc}
k & 0 & 1 & 2 & 3 & 4\\\hline
k^5 & 0 & 1 & 2 & 3 & 4\\
k^k & 0 & 1 & 4 & 2 & 1\\
k^{k+5} & 0 & 1 & 3 & 1 & 4\\
k^{k+10} & 0 & 1 & 1 & 3 & 1\\
k^{k+15} & 0 & 1 & 2 & 4 & 4\\
k^{k+20} & 0 & 1 & 4 & 2 & 1\\
k^{k+25} & 0 & 1 & 3 & 1 & 4\\
k^{k+30} & 0 & 1 & 1 & 3 & 1\\
k^{k+35} & 0 & 1 & 2 & 4 & 4\\
\vdots &&&&&
\end{array}$$
In this table, each column lists the successive powers of a fixed $k$, the exponent growing by $m$ from row to row; it shows a period of at most $m-1$ (in fact dividing $\phi(m)$, Euler's totient). Hence, blocks of $m\cdot(m-1)$ elements (or $m\cdot\phi(m)$ elements for composite $m$) bring a constant contribution.
After computing a complete block by the above method, the complexity lowers to
$$\color{green}{O(m^2)},$$
using $$f(n)=f(n\bmod (m-1)m)+\left\lfloor\frac n{(m-1)m}\right\rfloor b,$$ where $b$ is the sum over a whole block (for composite $m$, replace $(m-1)m$ by $\phi(m)\,m$).
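To illustrate with the $m=5$ table above (the numbers here are my own check, not from the answer): one block sums to $b=8+9+6+11=34\equiv 4\pmod 5$, and for $n=47$ the reduction gives $$f(47)=f(47\bmod 20)+\left\lfloor\tfrac{47}{20}\right\rfloor b\equiv 2+2\cdot 4\equiv 0\pmod 5,$$ since $f(7)=1+4+2+1+0+1+3\equiv 2\pmod 5$ by direct summation.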
Finally, we can further speed up the process by accumulating the columns of a block (be they complete or not) with the geometric summation formula
$$k^k+k^{k+m}+k^{k+2m}+\cdots+k^{k+pm}=k^k\,\frac{k^{m(p+1)}-1}{k^m-1}.$$
After precomputing a table of inverses, the last expression can be computed in time $O(\log m)$, hence the whole process takes
$$\color{green}{O(m\log m)}.$$
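As a quick check of the closed form (again my own numbers): with $m=5$, $k=2$, $p=2$, the direct sum is $2^2+2^7+2^{12}\equiv 4+3+1\equiv 3\pmod 5$, while the formula gives $2^2\,\dfrac{2^{15}-1}{2^5-1}\equiv 4\cdot\dfrac{3-1}{1}\equiv 3\pmod 5$, since $2^5-1\equiv 1$ is invertible here. Whether $k^m-1$ is invertible modulo $m$ in general is exactly the concern raised in the comments below.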
Don't you think that the $n$ on top of the first $\sum$ could be expressed modulo some function of $m$? It seems to me that the sequence of terms of the sum is periodic with period $m(m-1)$ or a divisor of it. The partial sums seem to be periodic with period at most $m^2(m-1)$, again possibly only a factor of it and possibly much smaller (I don't have a complete decoding yet). In the case of $m=5$ the partial sums are periodic with period $t=100$, for instance; we could thus take $n \pmod {100}$ (because the partial sum up to $100$ is zero). – Gottfried Helms Jul 24 '17 at 11:23
As another instance, the sequence of single terms for $m=14$ is periodic with cycle length $7 \cdot 6$, and the cyclic sum is $251$, which is $-1$ modulo $14$. This would simplify the evaluation for arbitrarily large $n$, since we can express this by a small residue calculation. – Gottfried Helms Jul 24 '17 at 11:43
Well, since $m$ is far smaller than $n$, $O(m\log n)$ is still better. How are the results in practice? For the values slightly outside the required range ($n=10^{16}, m=2000003$), my algorithm needs almost 6 seconds. – Jul 24 '17 at 11:58
@GottfriedHelms: you are quite right. The period of $k^n\bmod m$ is at most $m-1$ ($m-1$ is achieved by the primitive roots of $1$ in $\mathbb Z_m$). Hence complete blocks of $m(m-1)$ elements can be computed in a single go. The dependency on $n$ disappears! – Jul 24 '17 at 12:16
There's a small adjustment needed. $0^0$ is supposed to be $1$ here, but working mod $m$, we have $m^m\equiv 0$ and $(2m)^{2m}\equiv 0$. @GottfriedHelms too, because he's involved here. – B. Goddard Jul 24 '17 at 12:27
@B.Goddard: I mentioned that my convention is $0^0=0$. The adjustment is minor. – Jul 24 '17 at 12:32
It seems that for prime $m$ the value for a block (running from $k=1$ to $m(m-1)$) is $b\equiv-1 \pmod m$. – Gottfried Helms Jul 24 '17 at 14:06
@GottfriedHelms: the remaining challenge is to lower the $m \log m$ bound for incomplete blocks. As this involves computing partial rows or columns, it seems hard to achieve. – Jul 24 '17 at 14:13
What do you get for $m=999$ and $n=10^9$ ? I think it should give $w=261$ ? – Gottfried Helms Jul 24 '17 at 17:49
Just to compare: a procedure with no more optimization than evaluating blocks of $\phi(m^2)$ (or $m\,\phi(m)$) terms, using Pari/GP:

{f(m,N)=local(mphi,b,s1,r1,r2,res,Nres);
  mphi=m*eulerphi(m);
  Nres=N % mphi;
  r1=sum(k=1,Nres, Mod(k,m)^k);
  r2=sum(k=Nres+1, mphi, Mod(k,m)^k);
  b=r1+r2;
  s1=(N \ mphi)*b;
  res=1+s1+r1;
  return(res); }

and

gettime();w=f(999,10^9);t=gettime();[t,w]

gave $w=261$ in $468$ msec. – Gottfried Helms Jul 24 '17 at 18:18
@ProfessorVector: would you mind doing a time comparison with the bounds of the problem parameters, $n=10^9$ and $m=999$? It need not be exact (there are even software-implementation differences), but just to get a rough impression of the efficiency of your optimizations... (I didn't implement your procedure in Pari/GP so far.) – Gottfried Helms Jul 24 '17 at 18:42
No, sorry, that's not meaningful. The 40 msec or so are just the start-up of the jvm (it's in Java, and much more verbose ;-). My procedure is mainly dominated by the magnitude of $m$, that would have to be much larger to make a difference. – Jul 24 '17 at 19:00
@ProfessorVector: well, I see... I'll try to implement your optimization in Pari/GP tomorrow. Hope I'll get it running... – Gottfried Helms Jul 24 '17 at 19:07
@GottfriedHelms: an important contribution of this post (thanks to you) is to show that there is no dependency on $n$. Besides this theoretical result, the difference between $O(m \log m)$ and $O(m\log n)$ is tiny (if measurable). – Jul 24 '17 at 19:10
Yves, true. I just didn't think deeply about ProfessorVector's idea and didn't think about the relevant difference due to the software implementation using Java; it was just curiosity of mine concerning that ansatz and its implementation. – Gottfried Helms Jul 24 '17 at 19:17
Did you notice that the complete block which you noted as running from $k^k$ to $k^{k+15}$ (which of course means $\sum_{j=0}^{19} k^{k+j}$ where $19=\varphi(5^2)-1$) sums up to $b \equiv -1 \pmod 5$? It seems that for $m \in \mathbb P$ this is a constant/an analytically determinable value and thus need not be sequentially computed. – Gottfried Helms Jul 25 '17 at 01:19
@GottfriedHelms: that doesn't change the overall complexity because the incomplete block requires $O(m\log m)$ operations. – Jul 25 '17 at 06:10
Hmm, I'm not very well experienced with the concept of complexity of algorithms in connection with the $O()$-notation. So I assume the complexity must be determined by the worst-case scenario? – Gottfried Helms Jul 25 '17 at 06:35
@GottfriedHelms: well, I am indeed referring to the complexity of the worst-case scenario, which is $O(m\log m)$ for the incomplete blocks. In the asymptotic notation, adding other $m\log m$ or linear or constant... terms doesn't make a difference. By the way, with your observation the best case would be $O(1)$! (But only occurring when $n$ is a multiple of $(m-1)m$ with $m$ prime.) – Jul 25 '17 at 06:44
Yves, thanks for your reply. I'm also trying to find some improvement over the $\varphi(m^2) = m \cdot \varphi(m)$ term, in the same sense as the multiplicative order modulo a prime improves over the totient for exponentials, such that we can have $m \cdot c$ where $c$ is a divisor of $\varphi(m)$. A small table for $m=pq$ with $p,q \in \Bbb P$ is at https://math.stackexchange.com/q/2370917/ , in case you are interested... – Gottfried Helms Jul 25 '17 at 07:14
@GottfriedHelms: I am more worried by the feasibility of the division by $k^m-1$, or the possibility to compute the sum of a complete row in time $O(\log m)$. – Jul 25 '17 at 08:26
Mod(k^k,m)-function you can evaluate this to really high $k$ and $m$, btw. – Gottfried Helms Jul 24 '17 at 07:39