hermann on Sat, 15 Jul 2023 01:09:55 +0200


[Date Prev] [Date Next] [Thread Prev] [Thread Next] [Date Index] [Thread Index]

Re: Why is "lift(Mod(qnr, n)^(n\4))" 16% slower than C libgmp "powm(r, qnr, n/4, n)" ?


On 2023-07-15 00:19, Bill Allombert wrote:
gp:  19.6s (5.440 GHz)
g++: 26.9s (5.320 GHz)

Yes, I got the same result... but in fact, this is rather annoying!
If I change PARI to use mpz_powm then I will slowdown this case.

There are two factors that can effect this:
1) the base 2 is handled specially by PARI
2) the bit pattern of the exponent, in particular the number of 1s.

If you could find in which case PARI is faster then GMP, this would be helpful.

I think you already answered the question with (1).
I changed base to 3, and now g++ with libgmpxx is slightly faster than gp.

hermann@7600x:~$ cat 95369.gp
p=(2^95369 + 1)/3;
x=lift(Mod(3, p)^((p-1)\5));
##
write("/dev/stderr", x);
print(vecsum(binary((p-1)\5)), " 1-bits in exponent");
hermann@7600x:~$
hermann@7600x:~$ gp -q < 95369.gp 2>out.gp
  ***   last result computed in 27,583 ms.
23842 1-bits in exponent
hermann@7600x:~$ ./95369 > out.cc
28709-digit prime p (95368 bits)
26.0986s
done
hermann@7600x:~$ diff out.gp out.cc
hermann@7600x:~$ wc --bytes out.gp
28710 out.gp
hermann@7600x:~$


Since Pari/GP user guide proposes in section 1.2 to use gmp kernel, you might want to upstream your base 2 powm() enhancements to gmplib?

gmp-devel list is quite responsive. I learned from principal GMP author Torbjörn Granlund some tips to get improved gmpbench performance for 7600X CPU:
https://gmplib.org/list-archives/gmp-devel/2023-July/006186.html
Increased by 14.5% by using daily snapshot, now rank 2 compared to published list. But most importantly he stated that the "multiply" parts of gmpbench showed that for Zen4 CPUs GMP is not as good as it should be ...


Regards,

Hermann.