hermann on Wed, 02 Aug 2023 13:43:54 +0200


[Date Prev] [Date Next] [Thread Prev] [Thread Next] [Date Index] [Thread Index]

Re: efficient transformations between "sum of squares" and "sqrt(-1) (mod p)" for 1M-digit prime


On 2023-06-20 02:02, hermann@stamm-wilbrandt.de wrote:
I learned on this mailing list about "halfgcd()" allowing to
efficiently compute sum of squares given "sqrtm1 = sqrt(-1) (mod p)",
and for computing "sqrtm1" from sum of squares as "x*y^(-1) (mod p)".

I just finished big project to determine "sqrt(-1) (mod p)" for largest known 9,383,761-digit prime p =1 (mod 4):
https://github.com/Hermann-SW/9383761-digit-prime#readme

I stopped the 75.4 days expected runtime run after 9d 20.9h, because I learned about llr tool based on gwnum library, and computed "sqrt(-1) (mod p)" in 13.2h plus 10s post processing. That is 137x faster than with my libgmpxx based code.

gwnum library is x86 specific, but allows for huge speedups using AVX512 instructions, and multi-core processing a single square or square+mult (mod p). And 31,172,177 such operations had to be done ...

New PARI/GP script with that sqrtm1 constant:
https://github.com/Hermann-SW/RSA_numbers_factored/blob/main/pari/sqrtm1.9383761_digit.largest_known_1mod4_prime.gp

Nice, computing sum of squares for that huge prime from sqrtm1 in 2.9s only, and computing sqrtm1 given x and y in 4.2s:

hermann@7600x:~/RSA_numbers_factored/pari$ gp -q
? \r sqrtm1.9383761_digit.largest_known_1mod4_prime
9383761-digit prime p (31172179 bits)
[M,V] = halfgcd(sqrtm1, p)
  ***   last result computed in 2,854 ms.
[x,y] = [V[2], M[2,1]]
  ***   last result computed in 1 ms.
sqrtm1 = lift(Mod(x/y, p))
  ***   last result computed in 5,929 ms.
sqrtm1 = lift(x*Mod(1/y, p))
  ***   last result computed in 4,531 ms.
sqrtm1 = lift(Mod(x, p)/y)
  ***   last result computed in 4,203 ms.
done, all asserts OK
?


Reards,

Hermann.