Optimizing K^(2^N) Mod 10^9+7 in C++

Saitama · Jan 25, 2015

Hello again everyone! :)

I have been trying to implement K^(2^N) mod 10^9+7 in C++ where both K and N are integers and $1\le K\le 10^5$ and $1\le N\le 10^9$.

My idea is to use the fact that

$$a_n=K^{2^N}=K^{2^{N-1}}\cdot K^{2^{N-1}}=a_{n-1}^2$$

Following is the code I wrote:

Code:

#include<iostream>
#define modulo 1000000007
using namespace std;
typedef unsigned long long ull;
int K;
ull power(int N)
{
    if(N==0)
        return K;
    ull x=power(N-1)%modulo;
    ull ans=(x*x)%modulo;
    return ans;
}
int main()
{
    int T, N;
    cin>>T;
    while(T--)
    {
        cin>>K>>N;
        cout<<power(N)<<endl;
    }
    return 0;
}

But it fails to work for $K=10^5$ and $N=10^6$. So I suppose I need to find another way to solve the problem. I am out of ideas.

Any help is appreciated. Many thanks!

kaliprasad · Jan 25, 2015

Pranav said:
Hello again everyone! :)

I have been trying to implement K^(2^N) in C++ where both K and N are integers and $1\le K\le 10^5$ and $1\le N\le 10^9$.

My idea is to use the fact that

$$a_n=K^{2^N}=K^{2^{N-1}}\cdot K^{2^{N-1}}=a_{n-1}^2$$

Following is the code I wrote:
Code:
#include<iostream>
#define modulo 1000000007
using namespace std;
typedef unsigned long long ull;
int K;
ull power(int N)
{
    if(N==0)
        return 1;
    if(N==1 || N==2)
        return K;
    ull x=power(N-1)%modulo;
    ull ans=(x*x)%modulo;
    return ans;
}
int main()
{
    int T, N;
    cin>>T;
    while(T--)
    {
        cin>>K>>N;
        cout<<power(N-1)<<endl;
    }
    return 0;
}
But it fails to work for $K=10^5$ and $N=10^6$. So I suppose I need to find another way to solve the problem. I am out of ideas.

Any help is appreciated. Many thanks!

if N== 0 we should return k

as $k^{2^0}= k^1 = k$

if((N==1) || (N==2)
return K;
need to be removed

Saitama · Jan 25, 2015

kaliprasad said:

if N== 0 we should return k

as $k^{2^0}= k^1 = k$

if((N==1) || (N==2)
return K;
need to be removed

Oops! Those were the base conditions I used while solving the original problem. I have fixed the code, please see the edited version. :)

I like Serena · Jan 25, 2015

Hey Pranav! (Wave)

At first look it seems correct to me now.
Why do you think it is failing?

Oh, and you didn't mention the modulo yet.
I assume the answer should be modulo 1000000007?
At least that means that all results will indeed fit in an [m]ull[/m].

Saitama · Jan 25, 2015

I like Serena said:

Hey Pranav! (Wave)

At first look it seems correct to me now.
Why do you think it is failing?

Oh, and you didn't mention the modulo yet.
I assume the answer should be modulo 1000000007?
At least that means that all results will indeed fit in an [m]ull[/m].

Hi ILS! :)

Yes, I need to find the answer modulo 10^9+7. I do not know why my code fails for the mentioned values. It seems to work fine with $K=10^5$ and $N=10^5$, maybe its a problem with the number of recursive calls?

I like Serena · Jan 25, 2015

Pranav said:

Hi ILS! :)

Yes, I need to find the answer modulo 10^9+7. I do not know why my code fails for the mentioned values. It seems to work fine with $K=10^5$ and $N=10^5$, maybe its a problem with the number of recursive calls?

It seems unlikely that the number of recursive calls is a problem.
If it were, your program would crash. Does it? (Wondering)
And your function only takes 1 argument of 4 bytes.
The required stack space should be ~800 kB, which is big, but I don't expect it to be too big.

How does your code fail?
Maybe there is some kind of problem with the input or output conditions? (Thinking)

Nono713 · Jan 25, 2015

I like Serena said:

It seems unlikely that the number of recursive calls is a problem.
If it were, your program would crash. Does it? (Wondering)
And your function only takes 1 argument of 4 bytes.
The required stack space should be ~800 kB, which is big, but I don't expect it to be too big.

Assuming Pranav is building his program with any kind of optimizations. Without any optimizations (the "normal" mode), the stack frames are quite a bit larger for debugging purposes. For instance, on this 64-bit Linux box with no optimizations the program tops out at N about 175000. The stack size is a generous 8MB, giving a total overhead of 48 bytes per recursive call. With full optimizations it tops out at around 1600000, which is better but still nowhere near the 10^9 maximum. Basically, this is not a practical use case for recursion, and the iterative version isn't that complicated (though the program seems correct assuming it doesn't run out of stack space).

Test input/outputs:

Code:

100000^(2^1)          mod M = 999999937
100000^(2^1000)       mod M = 939692776
100000^(2^1000000)    mod M = 637173350
100000^(2^1000000000) mod M = 401919425

Saitama · Jan 25, 2015

I like Serena said:

It seems unlikely that the number of recursive calls is a problem.
If it were, your program would crash. Does it? (Wondering)

I think so, the compiler gives me a runtime error with the mentioned values.

How does your code fail?
Maybe there is some kind of problem with the input or output conditions? (Thinking)

I am using CodeChef's online compiler - gcc 4.9.2, I doubt there is a problem with input/output conditions.

I like Serena · Jan 25, 2015

Pranav said:

I think so, the compiler gives me a runtime error with the mentioned values.

I am using CodeChef's online compiler - gcc 4.9.2, I doubt there is a problem with input/output conditions.

Right. As Bacterius also confirmed, stack overflow it is. (Nod)

I mixed the numbers up.
With $N=10^9$ you would need $\ge 8$ GB stack space, which is too much. (Wasntme)

It means that you need a for-loop instead of a recursive function call.
I suspect the problem is intentionally set up to make this necessary.

Saitama · Jan 25, 2015

I like Serena said:

Right. As Bacterius also confirmed, stack overflow it is. (Nod)

I mixed the numbers up.
With $N=10^9$ you would need $\ge 8$ GB stack space, which is too much. (Wasntme)

It means that you need a for-loop instead of a recursive function call.
I suspect the problem is intentionally set up to make this necessary.

I tried your suggestion but this time I get a "time limit exceeded" error. Here is the link to original problem: Chef and Pattern | CodeChef

Any other ideas about tackling this?

I like Serena · Jan 25, 2015

Pranav said:

I tried your suggestion but this time I get a "time limit exceeded" error. Here is the link to original problem: Chef and Pattern | CodeChef

Any other ideas about tackling this?

Yeah. I was already afraid of that. $10^9$ iterations is quite a lot. (Thinking)

I believe the key is in the fact that 1000000007 is a prime number.
It means we can apply Fermat's Little Theorem.
I seem to recall that you are familiar with it? (Thinking)

Saitama · Jan 25, 2015

I like Serena said:

Yeah. I was already afraid of that. $10^9$ iterations is quite a lot. (Thinking)

I believe the key is in the fact that 1000000007 is a prime number.
It means we can apply Fermat's Little Theorem.
I seem to recall that you are familiar with it? (Thinking)

Yes I am familiar with it but how do I use it here?

I like Serena · Jan 25, 2015

Pranav said:

Yes I am familiar with it but how do I use it here?

$\displaystyle K^{2^N} \equiv K^{2^N \bmod (p-1)} \pmod p$

The first step is to find $2^N \bmod{100000006}$. (Wasntme)

Saitama · Jan 25, 2015

I like Serena said:

$\displaystyle K^{2^N} \equiv K^{2^N \bmod (p-1)} \pmod p$

The first step is to find $2^N \bmod{100000006}$. (Wasntme)

That's what I am already trying to do but I am not getting anywhere. Please give me some more hints.

I like Serena · Jan 25, 2015

Pranav said:

That's what I am already trying to do but I am not getting anywhere. Please give me some more hints.

You're supposed to show what you tried. (Doh)

Anyway, how about a recursive algorithm to find it?

In general the algorithm to find something like that is:
$$x^k \bmod l = \begin{cases}
(x^{k/2} \bmod l)^2 &\bmod l & \text{ if $k$ even}\\
(x^{k-1} \bmod l)\cdot x &\bmod l & \text{ if $k$ odd}\end{cases}
$$

You can use it to find $y= 2^N \bmod (p-1)$.
And after that, you can use it to find $K^y\bmod p$. (Wasntme)

Saitama · Jan 25, 2015

I like Serena said:

You're supposed to show what you tried. (Doh)

Anyway, how about a recursive algorithm to find it?

In general the algorithm to find something like that is:
$$x^k \bmod l = \begin{cases}
(x^{k/2} \bmod l)^2 &\bmod l & \text{ if $k$ even}\\
(x^{k-1} \bmod l)\cdot x &\bmod l & \text{ if $k$ odd}\end{cases}
$$

You can use it to find $y= 2^N \bmod (p-1)$.
And after that, you can use it to find $K^y\bmod p$. (Wasntme)

This time I am getting a "wrong answer"error. (Crying)

The code below works even for large values of $K$ and $N$. I can verify using Wolfram Alpha that it gives correct answer for small values of K and N.

Code:

#include<iostream>
#define modulo 1000000007
using namespace std;
typedef unsigned long long ull;
ull power(ull N, ull M, ull P)
{
    if(N==1)
        return P;
    if(N%2==0)
        return (power(N/2, M, P)*power(N/2, M, P))%M;
    else
        return (power(N/2, M, P)*power(N/2, M, P)*P)%M;
}
int main()
{
    int T;
    ull y, N, K;
    cin>>T;
    while(T--)
    {
        cin>>K>>N;
        if(N==1)
            cout<<1<<endl;
        else if(N==2 || N==3)
            cout<<K<<endl;
        else
        {
            y=power(N-3, modulo-1, 2);
            cout<<power(y, modulo, K)<<endl;
        }
    }
    return 0;
}

Any idea when my code can run into problems? Thanks for all the help so far.

johng1 · Jan 25, 2015

Hi,
Integer overflow is a hard bug to find since normally in C++ you get no signal that this has occurred. Following I Like Serena's suggestion, you wrote a theoretically correct function, but overflow killed you. Here's a slight variation that is okay (assuming long long is at least 64 bits wide):

Code:

/* returns x^m mod n.  Here to be safe it must be the case that n*n and x*n are valid ulls.
*/
  ull power(ull x, ull m, ull n) {
    ull temp;
    if (m==1) {
      return(x);
    }
    temp=power(x,m/2,n);
    temp*=temp;
    temp%=n;
    if ((m & 1)==1) { // m is odd
      temp*=x;
      temp%=n;
    }
    return(temp);
  }

At the first recursive call you know temp is less than n. If you don't reduce the product temp*temp mod n and m is odd, you might get overflow. I'm quite certain this is what happened. If you remove the first temp%=n, I believe the above function and your function will be the same.

Btw, Java's BigInteger class has a method modPow which has exactly the functionality of power written above. Java's powMod is really fine tuned for efficiency, so it looks much more complicated that the above. However, in general it's much faster.

Saitama · Jan 26, 2015

johng said:
Hi,
Integer overflow is a hard bug to find since normally in C++ you get no signal that this has occurred. Following I Like Serena's suggestion, you wrote a theoretically correct function, but overflow killed you. Here's a slight variation that is okay (assuming long long is at least 64 bits wide):
Code:
/* returns x^m mod n.  Here to be safe it must be the case that n*n and x*n are valid ulls.
*/
  ull power(ull x, ull m, ull n) {
    ull temp;
    if (m==1) {
      return(x);
    }
    temp=power(x,m/2,n);
    temp*=temp;
    temp%=n;
    if ((m & 1)==1) { // m is odd
      temp*=x;
      temp%=n;
    }
    return(temp);
  }
At the first recursive call you know temp is less than n. If you don't reduce the product temp*temp mod n and m is odd, you might get overflow. I'm quite certain this is what happened. If you remove the first temp%=n, I believe the above function and your function will be the same.

Btw, Java's BigInteger class has a method modPow which has exactly the functionality of power written above. Java's powMod is really fine tuned for efficiency, so it looks much more complicated that the above. However, in general it's much faster.

Thank you very much johng! The problem is solved. :)

johng1 · Jan 26, 2015

Hi again,
My bad. I forgot to mention a "bad" feature of your function:

Code:

 if(N%2==0)
        return (power(N/2, M, P)*power(N/2, M, P))%M;
    else
        return (power(N/2, M, P)*power(N/2, M, P)*P)%M;
}

You make two recursive calls in succession:
(power(N/2, M, P)*power(N/2, M, P))

Let T(N) be the total number of recursive calls. Then it's easy to see T satisfies T(N)=2T(N/2) and T(2)=2. So for N a power of 2, T(N)=N. Then for N about a billion, you make about a billion recursive calls. This is going to be pretty slow.
For the function as I wrote it, the number of recursive calls is about $\log_2(N)$, a big improvement.

Saitama · Jan 26, 2015

johng said:
Hi again,
My bad. I forgot to mention a "bad" feature of your function:
Code:
 if(N%2==0)
        return (power(N/2, M, P)*power(N/2, M, P))%M;
    else
        return (power(N/2, M, P)*power(N/2, M, P)*P)%M;
}
You make two recursive calls in succession:
(power(N/2, M, P)*power(N/2, M, P))

Let T(N) be the total number of recursive calls. Then it's easy to see T satisfies T(N)=2T(N/2) and T(2)=2. So for N a power of 2, T(N)=N. Then for N about a billion, you make about a billion recursive calls. This is going to be pretty slow.
For the function as I wrote it, the number of recursive calls is about $\log_2(N)$, a big improvement.

I did notice that when I read your post. Thank you once again, johng! :)

Optimizing K^(2^N) Mod 10^9+7 in C++

FAQ: Optimizing K^(2^N) Mod 10^9+7 in C++

What is K^(2^N) and why is it important in C++?

How can K^(2^N) be implemented in C++?

What are the advantages of using K^(2^N) in C++?

Are there any limitations to using K^(2^N) in C++?

How can one avoid potential errors when implementing K^(2^N) in C++?

Similar threads

Hot Threads

Recent Insights