hermann on Wed, 30 Aug 2023 11:05:50 +0200


[Date Prev] [Date Next] [Thread Prev] [Thread Next] [Date Index] [Thread Index]

Re: Pi with MT


On 2023-08-30 09:32, Bill Allombert wrote:
There is a library for fast parallel multiplication of integers ...
..., but I do not remember its name.

A superfast multiplication library is gwnum lib.
There is a version that matches PARI/GP license.


Maybe we should consider implementing parallel multiplication for very very
large integers, using a two-stage NTT process.

As said, gwnum library does that, superfast, but with a restriction:
Only for x86, so Intel and AMD CPUs only.


I do not know of a separate gwnum library.
But it is part of Jean Penn's llr software:
http://jpenne.free.fr/index2.html

There are binary distributions, as well as complete sources that can build the binaries.


If you really consider implementing something on top of gwnum, then please either based on 3.8.21 version (the last that works on AMD CPUs for >2^100000)
or based on latest with fix so that it will work for AMD CPUs as well.


Regarding possible gain, I did determine with patched LLR software 3^((p-1)/4) (mod p) for
9.4million decimal digit prime number in 10:45h on AMD 7600X CPU:
https://github.com/Hermann-SW/9383761-digit-prime#fast-sqrt-1-mod-p-for-9383761-digit-prime-p-1-mod-4

Expected runtime of stopped after 10 days libgmpxx code for computing same was 75 days. So here 6 LLR threads on 7600X was 168x faster than single thread libgmpxx code!


What would be needed is a GMP "powm()" or PARI "Mod(a,p)" equivalent using gwnum.
Not a patch of prime testing LLR software as I did.

You can find my small patches as well as sample code verifying sqrt(-1) (mod p) here:
https://github.com/Hermann-SW/RSA_numbers_factored/tree/main/llr#introduction
Just sync and do "make" to see.


Regards,

Hermann Stamm-Wilbrandt.


P.S:
I did determine sqrt(-1) (mod p) as well as sum of squares for 6 Colbert numbers with
patched LLR software:
https://github.com/Hermann-SW/Colbert_numbers#readme
There are 6 entries [k,n,s,x,y] with p=k*2^n+1, s^2 == p-1 (mod p) and p==x^2+y^2
for use with PARI/GP, C++ and Python.