I have a series: $$f(n) = 0^0 + 1^1 + 2^2 + \cdots + n^n.$$ I want to calculate $f(n) \pmod m$, where $n \le 10^9$ and $m \le 10^3$. I have tried the approach from this accepted answer, but its complexity is not acceptable. I have used this approach for calculating the modular exponentiation. However, I am not able to optimize it further. Kindly help. Thanks in advance.
2 Answers
For an efficient computation, we have to use the well-known fast exponentiation by repeated squaring, based on the simple observations $x^{2k}=(x^k)^2$ and $x^{2k+1}=x^{2k}\cdot x$. There's no need for a recursive function, because it's easy to transform it into an iteration through the binary expansion of the exponent. But that would still give an algorithm of complexity $O(n\log n),$ and that's quite forbidding already for $n=10^8,$ let alone $n=10^9.$ Savings are possible since we need the results only modulo $m$, and that's assumed to be much smaller than $n$.
Obviously, large exponents aren't the problem, so I consider reduction of the exponents modulo $\varphi(m)$ a blind alley: we're summing over all bases up to $n$, many of them having common divisors with $m$ (60% in the obviously allowed case $m=1000$).
But certainly $l\equiv j\pmod m$ implies $l^l\equiv j^l\pmod m$, so we can group our summands accordingly. Let $n=mk+r$ with $0\le r\le m-1.$ Then,
$$f(n) = 1+\sum^n_{l=1}l^l=1+\sum^{m-1}_{j=1}j^j\sum^{k-1}_{i=0}j^{mi}+\sum^{r}_{j=1}j^j\cdot j^{mk}\pmod m.$$ With an auxiliary function
$$g(x,k)=\sum^{k-1}_{i=0}x^i,$$ we can write this
$$f(n) = 1+\sum^{r}_{j=1}j^j\,g(j^m, k+1)+\sum^{m-1}_{j=r+1}j^j\,g(j^m, k)\pmod m.$$ Unfortunately, using the fast formula $$g(x,k)=\frac{x^k-1}{x-1}$$ is not an option, as division modulo a composite $m$ can get very tricky.
But there's an alternative: Obviously, we have $g(x,2k)=(1+x^k)\,g(x,k)$ and $g(x,2k+1)=1+x\,g(x,2k)$, and we don't even have to calculate $x^k,$ since $1+x^k=(x-1)\,g(x,k)+2,$ i.e. $g(x,2k)=((x-1)\,g(x,k)+2)\,g(x,k)$.
As one can see easily, the resulting algorithm has complexity $O(m\log n)$, and that's reasonably fast (milliseconds) for the values of $n,m$ in the question.
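As a small sanity check of this decomposition (the numbers are mine, not part of the original answer): for $n=7$ and $m=3$ we have $k=2$, $r=1$, and $$f(7)\equiv 1+1^1\,g(1^3,3)+2^2\,g(2^3,2)=1+3+4\cdot 9\equiv 1\pmod 3,$$ in agreement with the direct sum $0^0+1^1+\cdots+7^7\equiv 1+1+1+0+1+2+0+1\equiv 1\pmod 3.$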
The crucial functions are implemented like this (in Java):
private static long f(long n, int mod) {
    long s = 1;                                        // the term 0^0 = 1
    long k = n / mod;
    long r = n % mod;
    for (int j = 1; j < mod; j++) {
        long x = Power.pow(j, mod, mod);               // x = j^m mod m
        // residues j <= r occur k+1 times up to n, the others k times
        s = (s + Power.pow(j, j, mod) * g(x, k + ((j <= r) ? 1 : 0), mod)) % mod;
    }
    return s;
}

private static long g(long x, long k, int mod) {       // g(x,k) = 1 + x + ... + x^(k-1) mod m
    if (k == 0) return 0;
    long mask = Long.highestOneBit(k);
    long g = 1;                                         // g(x,1), for the highest bit of k
    while ((mask >>= 1) > 0) {                          // process the remaining bits of k
        g = ((x - 1) * g % mod + 2) * g % mod;          // g(x,2k) = ((x-1) g(x,k) + 2) g(x,k)
        if ((k & mask) != 0) {
            g = (g * x + 1) % mod;                      // g(x,2k+1) = 1 + x g(x,2k)
        }
    }
    return g;
}
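The code above calls a Power.pow(base, exponent, mod) helper that isn't shown; it is presumably the modular exponentiation routine the question refers to. A minimal sketch of such a helper, under that assumption (the signature is chosen to match the calls above):

class Power {
    // Iterative square-and-multiply modular exponentiation.
    // Assumes exponent >= 0 and mod >= 1; for mod <= 1000 all products fit in a long.
    static long pow(long base, long exponent, int mod) {
        long result = 1 % mod;                   // 1 % mod also handles mod == 1
        base %= mod;
        while (exponent > 0) {
            if ((exponent & 1) == 1) {
                result = result * base % mod;    // multiply in the current bit
            }
            base = base * base % mod;            // square for the next bit
            exponent >>= 1;
        }
        return result;
    }
}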
@Professor Vector Wrong answer for input n = 2 and m = 29. Expected output = 6. My output = 22. – Brij Raj Kishore Jul 25 '17 at 08:01
@Brij Raj Kishore I'm sorry, I've only checked cases where $n>m$, so I didn't notice that $g$ should return $0$ for $k==0$, and not $1$ (that happens because the algorithm starts with the highest bit of $k$, that's always set, except for $k==0$). Should be fixed, now. – Jul 25 '17 at 08:27
$$\sum_{k=1}^n k^k\equiv \sum_{k=1}^n(k\bmod m)^k\pmod m.$$
You can precompute $k^k\bmod m$ for all $k\in[0,m-1]$ (this takes $O(m\log m)$ modular multiplications, by repeated squaring). For the next terms, $(k+pm)^{k+pm}\equiv k^k(k^m)^p$, so it suffices to also keep the values of $k^m$.
The total cost for the summation will be
$$\color{green}{O(m\log m+n)}.$$
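A minimal sketch of this direct summation (my own illustration, in Java to match the code in the other answer; the names sumDirect and modPow are mine, and $0^0$ is taken as $0$ as in this answer):

static long sumDirect(long n, int m) {
    long[] cur = new long[m];   // cur[j] = j^k mod m for the next index k with k mod m == j
    long[] jm  = new long[m];   // jm[j]  = j^m mod m, the factor advancing the exponent by m
    for (int j = 1; j < m; j++) {
        cur[j] = modPow(j, j, m);
        jm[j]  = modPow(j, m, m);
    }
    // cur[0] stays 0: multiples of m contribute 0 (and 0^0 = 0 by this answer's convention)
    long s = 0;
    for (long k = 1; k <= n; k++) {
        int j = (int) (k % m);
        s = (s + cur[j]) % m;
        cur[j] = cur[j] * jm[j] % m;   // j^k -> j^(k+m), ready for the next term of this column
    }
    return s;
}

static long modPow(long b, long e, int m) {   // standard square-and-multiply
    long r = 1 % m;
    for (b %= m; e > 0; e >>= 1, b = b * b % m) {
        if ((e & 1) == 1) r = r * b % m;
    }
    return r;
}

For $n=10^9$ the second loop still performs about $10^9$ modular multiplications; removing that dependency on $n$ is what the block argument below achieves.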
For example, with $m=5$, and assuming $0^0=0$,
$$\begin{array}{c|ccccc}
k & 0 & 1 & 2 & 3 & 4\\\hline
k^5 & 0 & 1 & 2 & 3 & 4\\
k^k & 0 & 1 & 4 & 2 & 1\\
k^{k+5} & 0 & 1 & 3 & 1 & 4\\
k^{k+10} & 0 & 1 & 1 & 3 & 1\\
k^{k+15} & 0 & 1 & 2 & 4 & 4\\
k^{k+20} & 0 & 1 & 4 & 2 & 1\\
k^{k+25} & 0 & 1 & 3 & 1 & 4\\
k^{k+30} & 0 & 1 & 1 & 3 & 1\\
k^{k+35} & 0 & 1 & 2 & 4 & 4\\
\vdots &&&&&
\end{array}$$
In this table, each column lists the successive powers of a fixed $k$, the exponent growing by $m$ from row to row; it shows a period of at most $m-1$ (in fact dividing $\phi(m)$, Euler's totient). Hence, blocks of $m\cdot(m-1)$ elements (or $m\cdot\phi(m)$ elements for composite $m$) bring a constant contribution.
After computing a complete block by the above method, the complexity lowers to
$$\color{green}{O(m^2)},$$
using $$f(n)=f(n\bmod (m-1)m)+\left\lfloor\frac n{(m-1)m}\right\rfloor b,$$ where $b$ is the sum over a whole block (for composite $m$, replace $(m-1)m$ by $\phi(m)\,m$).
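To illustrate with the $m=5$ table above (the numbers here are my own check, not from the answer): one block sums to $b=8+9+6+11=34\equiv 4\pmod 5$, and for $n=47$ the reduction gives $$f(47)=f(47\bmod 20)+\left\lfloor\tfrac{47}{20}\right\rfloor b\equiv 2+2\cdot 4\equiv 0\pmod 5,$$ since $f(7)=1+4+2+1+0+1+3\equiv 2\pmod 5$ by direct summation.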
Finally, we can further speed up the process by accumulating the columns of a block (be they complete or not) with the geometric summation formula
$$k^k+k^{k+m}+k^{k+2m}+\cdots+k^{k+pm}=k^k\,\frac{k^{m(p+1)}-1}{k^m-1}.$$
After precomputing a table of inverses, the last expression can be computed in time $O(\log m)$, hence the whole process takes
$$\color{green}{O(m\log m)}.$$
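As a quick check of the closed form (again my own numbers): with $m=5$, $k=2$, $p=2$, the direct sum is $2^2+2^7+2^{12}\equiv 4+3+1\equiv 3\pmod 5$, while the formula gives $2^2\,\dfrac{2^{15}-1}{2^5-1}\equiv 4\cdot\dfrac{3-1}{1}\equiv 3\pmod 5$, since $2^5-1\equiv 1$ is invertible here. Whether $k^m-1$ is invertible modulo $m$ in general is exactly the concern raised in the comments below.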
Don't you think that the $n$ on top of the first $\sum$ could be expressed modulo some function of $m$? It seems to me that the sequence of terms of the sum is periodic with period $m(m-1)$ or a divisor of it. The partial sums seem to be periodic with period at most $m^2(m-1)$, again possibly only a factor of it and possibly much smaller (I don't have a complete decoding yet). In the case of $m=5$ the partial sums are periodic with period $t=100$, for instance; we could thus take $n \pmod {100}$ (because the partial sum up to $100$ is zero). – Gottfried Helms Jul 24 '17 at 11:23
As another instance, the sequence of single terms for $m=14$ is periodic with cycle length $7 \cdot 6$, and the cyclic sum is $251$, which is $-1$ modulo $14$. This would simplify the evaluation for arbitrarily large $n$, since we can express this by a small residue calculation. – Gottfried Helms Jul 24 '17 at 11:43
Well, since $m$ is far smaller than $n$, $O(m\log n)$ is still better. How are the results in practice? For the values slightly outside the required range ($n=10^{16}, m=2000003$), my algorithm needs almost 6 seconds. – Jul 24 '17 at 11:58
@GottfriedHelms: you are quite right. The period of $k^n\bmod m$ is at most $m-1$ ($m-1$ is achieved by the primitive roots of $1$ in $\mathbb Z_m$). Hence complete blocks of $m(m-1)$ elements can be computed in a single go. The dependency on $n$ disappears! – Jul 24 '17 at 12:16
There's a small adjustment needed. $0^0$ is supposed to be $1$ here, but working mod $m$, we have $m^m\equiv 0$ and $(2m)^{2m}\equiv 0$. @GottfriedHelms too, because he's involved here. – B. Goddard Jul 24 '17 at 12:27
@B.Goddard: I mentioned that my convention is $0^0=0$. The adjustment is minor. – Jul 24 '17 at 12:32
It seems that for prime $m$ the value for a block (running from $k=1$ to $m(m-1)$) is $b\equiv-1 \pmod m$. – Gottfried Helms Jul 24 '17 at 14:06
@GottfriedHelms: the remaining challenge is to lower the $m \log m$ bound for incomplete blocks. As this involves computing partial rows or columns, it seems hard to achieve. – Jul 24 '17 at 14:13
What do you get for $m=999$ and $n=10^9$ ? I think it should give $w=261$ ? – Gottfried Helms Jul 24 '17 at 17:49
Just to compare: a procedure with no more optimization than evaluating blocks of $\phi(m^2)$ (or $m\,\phi(m)$) terms, using Pari/GP:

{f(m,N)=local(mphi,b,s1,r1,r2,res,Nres);
  mphi=m*eulerphi(m);
  Nres=N % mphi;
  r1=sum(k=1,Nres, Mod(k,m)^k);
  r2=sum(k=Nres+1, mphi, Mod(k,m)^k);
  b=r1+r2;
  s1=(N \ mphi)*b;
  res=1+s1+r1;
  return(res); }

and

gettime();w=f(999,10^9);t=gettime();[t,w]

gave $w=261$ in $468$ msec. – Gottfried Helms Jul 24 '17 at 18:18
@ProfessorVector: would you mind doing a time comparison with the bounds of the problem parameters, $n=10^9$ and $m=999$? It need not be exact (there are even software-implementation differences), but just to get a rough impression of the efficiency of your optimizations... (I didn't implement your procedure in Pari/GP so far.) – Gottfried Helms Jul 24 '17 at 18:42
No, sorry, that's not meaningful. The 40 msec or so are just the start-up of the jvm (it's in Java, and much more verbose ;-). My procedure is mainly dominated by the magnitude of $m$, that would have to be much larger to make a difference. – Jul 24 '17 at 19:00
@ProfessorVector: well, I see... I'll try to implement your optimization in Pari/GP tomorrow. Hope I'll get it running... – Gottfried Helms Jul 24 '17 at 19:07
@GottfriedHelms: an important contribution of this post (thanks to you) is to show that there is no dependency on $n$. Besides this theoretical result, the difference between $O(m \log m)$ and $O(m\log n)$ is tiny (if measurable). – Jul 24 '17 at 19:10
Yves, true. I just didn't think deeply about ProfessorVector's idea and didn't think about the relevant difference due to the software implementation using Java; it was just curiosity of mine concerning that ansatz and its implementation. – Gottfried Helms Jul 24 '17 at 19:17
Did you notice that the complete block which you noted as running from $k^k$ to $k^{k+15}$ (which of course means $\sum_{j=0}^{19} k^{k+j}$ where $19=\varphi(5^2)-1$) sums up to $b \equiv -1 \pmod 5$? It seems that for $m \in \mathbb P$ this is a constant/an analytically determinable value and thus need not be sequentially computed. – Gottfried Helms Jul 25 '17 at 01:19
@GottfriedHelms: that doesn't change the overall complexity because the incomplete block requires $O(m\log m)$ operations. – Jul 25 '17 at 06:10
Hmm, I'm not very well experienced with the concept of complexity of algorithms in connection with the $O()$-notation. So I assume the complexity must be determined by the worst-case scenario? – Gottfried Helms Jul 25 '17 at 06:35
@GottfriedHelms: well, I am indeed referring to the complexity of the worst-case scenario, which is $O(m\log m)$ for the incomplete blocks. In the asymptotic notation, adding other $m\log m$ or linear or constant... terms doesn't make a difference. By the way, with your observation the best case would be $O(1)$! (But only occurring when $n$ is a multiple of $(m-1)m$ with $m$ prime.) – Jul 25 '17 at 06:44
Yves, thanks for your reply. I'm also trying to find some improvement over the $\varphi(m^2) = m \cdot \varphi(m)$ term, in the same sense as the multiplicative order modulo a prime improves over the totient for exponentials, such that we can have $m \cdot c$ where $c$ is a divisor of $\varphi(m)$. A small table for $m=pq$ with $p,q \in \Bbb P$ is at https://math.stackexchange.com/q/2370917/ , in case you are interested... – Gottfried Helms Jul 25 '17 at 07:14
@GottfriedHelms: I am more worried by the feasibility of the division by $k^m-1$, or the possibility to compute the sum of a complete row in time $O(\log m)$. – Jul 25 '17 at 08:26
Mod(k^k,m)-function you can evaluate this to really high $k$ and $m$, btw. – Gottfried Helms Jul 24 '17 at 07:39