hermann on Tue, 20 Jun 2023 02:06:57 +0200


[Date Prev] [Date Next] [Thread Prev] [Thread Next] [Date Index] [Thread Index]

efficient transformations between "sum of squares" and "sqrt(-1) (mod p)" for 1M-digit prime


I learned on this mailing list about "halfgcd()" allowing to efficiently compute sum of squares given "sqrtm1 = sqrt(-1) (mod p)", and for computing "sqrtm1" from sum of squares as "x*y^(-1) (mod p)".


I implemented those computations for

C++ with libgmpxx and PARI
https://github.com/Hermann-SW/RSA_numbers_factored/blob/main/c%2B%2B/sqrtm1.smallest_known_1million_digit_prime.cc

Python with cypari
https://github.com/Hermann-SW/RSA_numbers_factored/blob/main/python/sqrtm1.smallest_known_1million_digit_prime.py

PARI/GP
https://github.com/Hermann-SW/RSA_numbers_factored/blob/main/pari/sqrtm1.smallest_known_1million_digit_prime.gp

for smallest known 1,000,000-digit prime number 10^999999+308267*10^292000+1.


.cc/.py/.gp runtimes for "x*y^(-1) (mod p)" are (GP slower than the other languages):
249/268/309 ms

.cc/.py/.gp runtimes for "halfgcd()" are (big overhead for cypari call of "halfgcd()"):
267/445/269 ms

Same numbers in diagram:
https://stamm-wilbrandt.de/images/xy_sqrtm1__xy.transformation_times.png


Outputs, with single process running and showing near maximal boost CPU frequency of 4.8GHz of i7-11850H:


c++$ perf stat ./sqrtm1.smallest_known_1million_digit_prime 2>err
a = y^(-1) (mod p) [powm]; a *= x; a %= p
0.249437s
[M,V] = halfgcdii(sqrtm1, p)
0.266697s
[x,y] = [V[2], M[2,1]]
0s
done
c++$ grep GHz err
     2,805,719,697      cycles:u                  #    4.701 GHz
c++$


python$ perf stat python sqrtm1.smallest_known_1million_digit_prime.py 2>err
1000000 -digit prime p ( 3321925  bits)
PARI stack size set to 1000000000 bytes, maximum size set to 1000001536
[M,V] = pari.halfgcd(sqrtm1, p)
444.5ms
[x,y] = [mpz(V[1]), mpz(M[1,0])]
0.5ms
sqrtm1 = x * pow(y, -1, p) % p
268.4ms
done, all asserts OK
python$ grep GHz err
     4,903,646,857      cycles:u                  #    4.606 GHz
python$


pari$ perf stat gp-2.15 -q < sqrtm1.smallest_known_1million_digit_prime.gp 2>err
1000000-digit prime p (3321925 bits)
[M,V] = halfgcd(sqrtm1, p)
  ***   last result computed in 269 ms.
[x,y] = [V[2], M[2,1]]
  ***   last result computed in 0 ms.
sqrtm1 = lift(Mod(x/y, p))
  ***   last result computed in 432 ms.
sqrtm1 = lift(x*Mod(1/y, p))
  ***   last result computed in 311 ms.
sqrtm1 = lift(Mod(x, p)/y)
  ***   last result computed in 309 ms.
done, all asserts OK
pari$ grep GHz err
     7,549,438,197      cycles:u                  #    4.703 GHz
pari$


Regards,

Hermann.