hermann on Wed, 20 Mar 2024 23:24:06 +0100


[Date Prev] [Date Next] [Thread Prev] [Thread Next] [Date Index] [Thread Index]

Re: How to efficiently count Proth primes with GP parfor?


Missing gist link:
https://gist.github.com/Hermann-SW/558728c025a1f2d3dace008292bac897

On 2024-03-20 23:20, hermann@stamm-wilbrandt.de wrote:
On 2024-03-20 15:53, hermann@stamm-wilbrandt.de wrote:
? export(isProth2)
?
my(nbt=default(nbthreads));my(c=0,B=4636016641\nbt);parfor(ii=0,(4636016641+B-1)\B,my(cc=0,i=ii*B+1);forprime(p=i,min(i+B-1,4636016641),cc+=isProth2(p));cc,C,c+=C);c
cpu time = 2min, 27,991 ms, real time = 5,000 ms.
%12 = 10000
?


Wow -- 3193% CPU !!!

While using Karim's isProth2() I improved Bill's parallel work
distribution scheme.

Instead of blocks of size B=n\nbt now block are of size B=n\nbt\if(l,l,1).
Since distribution of primes as well as distribution of Proth primes
is not the same for lower half ad higher half, making each of the 32
hardware threaas work l smaller but more evenly distributed work
areas.

All in new proth.gp gist with comments showing increased l effects.
Overall speedup is limited by Amdahls law.
I increased from n slghtly more than 4 billiom for first 10,000 Proth
primes to 100 billion:

hermann@7950x:~$ n=10^11 l=32 gp -q < proth.gp
32
1min, 10,058 ms × 31.784 = 37min, 6,695 ms
39170
15min, 13,952 ms (13.046×)
39170
hermann@7950x:~$


Very close to factor 32× between gettime() overall time and
getwalltime() sequential time.
Not much better factor than 13× between sequential getwalltime() and
parallel getwalltime().

Regards,

Hermann.