Loïc Grenié on Mon, 05 Jun 2023 15:52:02 +0200


Re: parsum and nbthreads


    Hi,

On Mon, May 29, 2023 at 15:21, Bill wrote:
On Wed, May 24, 2023 at 10:32:47AM +0200, Loïc Grenié wrote:
>    Hi,
>
>    I've recently used parsum(m2=1,20,<code>) on a 16-threaded
>   machine. parsum parallelizes the sum only up to floor(sqrt(20))
>   threads (hence 4 in my case, for a 16-threaded computer). While
>   this is certainly a good limit if the number of things to sum is
>   >= nbthreads^2, it is less good in the other case.

Use vecsum(parvector(20,m2,<code>))
parsum only makes sense for large sums when storing all the terms would
use too much memory.

vecsum(parvector) has the big advantage of always giving the same result.
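
For illustration, the two forms side by side, as a minimal sketch: the summand
m2^2 is only a hypothetical placeholder for the elided <code>. With exact
integers both give the same value; the difference matters for memory use and,
presumably, for inexact terms where the order of additions can change the
result.

```gp
\\ Hypothetical summand m2^2 standing in for the elided <code>.
s1 = parsum(m2 = 1, 20, m2^2);            \\ terms are not all stored at once
s2 = vecsum(parvector(20, m2, m2^2));     \\ all 20 terms stored, then summed
s1 == s2
```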

    I'm not entirely convinced by parsum's implementation. Clearly I can
  use vecsum(parvector) or parfor() or whatever, but I do not see why I
  should. I need to sum code that would benefit from being summed in
  parallel, so why shouldn't parsum try to be as efficient as possible in the
  largest number of reasonably foreseeable cases?

     Right now, parsum(n=a,b,...) is equivalent to

my(b=b,N=b-a+1,m=sqrtint(N),s);parfor(n0=a,a+m-1,vector((b-n0)\m+1,k,my(n=n0+(k-1)*m);...),d,s+=vecsum(d));s

   (sorry if I made a mistake, but that should be the rough idea).
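
   A sequential model of that split may make the arithmetic easier to check.
   This is only a sketch of the rough idea above, not parsum's actual code:
   strided_sum and the closure f are hypothetical names, the loop is a plain
   for() instead of parfor(), and it assumes a <= b.

```gp
\\ Sequential model of the strided split (no parallelism involved).
\\ f is a hypothetical closure standing in for the elided summand.
strided_sum(a, b, f) =
{
  my(N = b - a + 1, m = sqrtint(N), s = 0);
  for (n0 = a, a + m - 1,
    \\ chunk n0 holds the terms n0, n0+m, n0+2*m, ... not exceeding b
    s += vecsum(vector((b - n0) \ m + 1, k, f(n0 + (k-1)*m))));
  s;
}

sqrtint(20)   \\ number of chunks for N = 20, namely 4
strided_sum(1, 20, n -> n^2) == sum(n = 1, 20, n^2)
```

   For N = 20 there are only sqrtint(20) = 4 chunks, so at most 4 of the
   16 threads can receive work in this model.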
   This is inefficient in several cases: if N < nbthreads^2, if the
  objects to sum are very large, or if most of the time is spent in the actual
  summing (rather than in computing ...). Moreover, all the objects to be
  summed are transferred from the peripheral nodes to the central node.

    I've pushed a "loic-parsum" branch to pari git. It does not change the
  number of threads (I still think the current choice is not optimal), but it
  addresses most of the problems described above, should not hurt performance
  as far as I can tell, and passes the tests involving parsum (export,
  parallel, programming).

    The drawback is that it exports one more function (+1 line in paripriv, and
  +7 non-empty lines in src/functions/programming/parsum) and replaces
  two functions with a longer one (+14 non-empty lines in src/language/eval.c).

       Could you consider it for inclusion, possibly with modifications?

          Thank you, best,

                  Loïc