hermann on Sat, 15 Jul 2023 00:03:06 +0200


[Date Prev] [Date Next] [Thread Prev] [Thread Next] [Date Index] [Thread Index]

Re: Why is "lift(Mod(qnr, n)^(n\4))" 16% slower than C libgmp "powm(r, qnr, n/4, n)" ?


On 2023-07-14 19:13, Bill Allombert wrote:
The short answer is that the GNU MP library does not provide a function mpn_powm that PARI could use. mpz_powm use a lot of internal mpn functions for fast modular
reduction which are very efficient but not public.

Now, I could add a wrapper for mpz_powm for large entries but 16% slower is not
that bad and we need fast modular reduction in more general setting.

Could you compare

p=(2^95369 + 1)/3; Mod(2,p)^((p-1)/5)
with mpz_powm and PARI ?

Sure.

Performance problem was related to the base, exponent and modulus I used. For your base, exponent and modulus gp outperforms C++ with mpz_powm() by 27% !
That is very good to know.
OS is Ubuntu 22.04 Desktop.

gp:  19.6s (5.440 GHz)
g++: 26.9s (5.320 GHz)


hermann@7600x:~$ cat 95369.gp
p=(2^95369 + 1)/3;
x=lift(Mod(2, p)^((p-1)\5));
##
print(x);
hermann@7600x:~$ perf stat -e cycles,task-clock gp < 95369.gp
Reading GPRC: /etc/gprc
GPRC Done.

                  GP/PARI CALCULATOR Version 2.15.3 (released)
          amd64 running linux (x86-64/GMP-6.2.1 kernel) 64-bit version
compiled: Jun 30 2023, gcc version 11.3.0 (Ubuntu 11.3.0-1ubuntu1~22.04.1)
                            threading engine: single
               (readline not compiled in, extended help enabled)

                     Copyright (C) 2000-2022 The PARI Group

PARI/GP is free software, covered by the GNU General Public License, and comes
WITHOUT ANY WARRANTY WHATSOEVER.

Type ? for help, \q to quit.
Type ?18 for how to get moral (and possibly technical) support.

parisizemax = 2000003072, primelimit = 500000
  ***   last result computed in 19,604 ms.
1
Goodbye!

 Performance counter stats for 'gp':

   107,245,113,513      cycles                    #    5.440 GHz
19,715.98 msec task-clock # 1.000 CPUs utilized

      19.717168440 seconds time elapsed

      19.605913000 seconds user
       0.110405000 seconds sys


hermann@7600x:~$



hermann@7600x:~$ g++ 95369.cc -lgmp -lgmpxx -O3 -Wall -pedantic -Wextra -o 95369
hermann@7600x:~$ perf stat -e cycles,task-clock ./95369
28709-digit prime p (95368 bits)
26.8999s
1
done

 Performance counter stats for './95369':

   143,114,482,703      cycles                    #    5.320 GHz
26,901.18 msec task-clock # 1.000 CPUs utilized

      26.902467744 seconds time elapsed

      26.901533000 seconds user
       0.000000000 seconds sys


hermann@7600x:~$ cat 95369.cc
// g++ 95369.cc -lgmp -lgmpxx -O3 -Wall -pedantic -Wextra -o 95369
// (cpplinted and cppchecked)
//
#include <time.h>
#include <math.h>
#include <gmpxx.h>
#include <assert.h>

#include <iostream>

int main() {
    mpz_class a, b, c, p;
    char *buf;
    int bits;
    float dt;
    buf = new char[32000000];
    assert(buf);

    mpz_ui_pow_ui(a.get_mpz_t(), 2, 95369);
    a += 1;
    a /= mpz_class("3");

    p = a;
    c = (p-1) / 5;

    std::cerr
<< strlen(mpz_get_str(buf, 10, p.get_mpz_t())) << "-digit prime p (" << (bits = strlen(mpz_get_str(buf, 2, p.get_mpz_t()))) << " bits)\n";

    b = 2;

    clock_t start = clock();
mpz_powm(a.get_mpz_t(), b.get_mpz_t(), c.get_mpz_t(), p.get_mpz_t());
    dt = static_cast<float>(clock() - start) / CLOCKS_PER_SEC;

    std::cerr << dt << "s\n";
    std::cout << a << "\n";

    std::cerr << "done\n";
    delete buf;

    return 0;
}
hermann@7600x:~$


Regards,

Hermann.